Worldwide Differential Equations
with Linear Algebra
Robert McOwen
© 2012, Worldwide Center of Mathematics, LLC
ISBN 978-0-9842071-2-1
v.1217161311
Contents

0.1 Preface

1 First-Order Differential Equations
1.1 Differential Equations and Mathematical Models
1.2 Geometric Analysis and Existence/Uniqueness
1.3 Separable Equations & Applications
1.4 Linear Equations & Applications
1.5 Other Methods
1.6 Additional Exercises

2 Second-Order Differential Equations
2.1 Introduction to Higher-Order Equations
2.2 General Solutions for Second-Order Equations
2.3 Homogeneous Equations with Constant Coefficients
2.4 Free Mechanical Vibrations
2.5 Nonhomogeneous Equations with Constant Coefficients
2.6 Forced Mechanical Vibrations
2.7 Electrical Circuits
2.8 Additional Exercises

3 Laplace Transform
3.1 Laplace Transform and Its Inverse
3.2 Transforms of Derivatives, Initial-Value Problems
3.3 Shifting Theorems
3.4 Discontinuous Inputs
3.5 Convolutions
3.6 Additional Exercises

4 Systems of Linear Equations and Matrices
4.1 Introduction to Systems and Matrices
4.2 Gaussian Elimination
4.3 Reduced Row-Echelon Form and Rank
4.4 Inverse of a Square Matrix
4.5 The Determinant of a Square Matrix
4.6 Cofactor Expansions
4.7 Additional Exercises

5 Vector Spaces
5.1 Vectors in R^n
5.2 General Vector Spaces
5.3 Subspaces and Spanning Sets
5.4 Linear Independence
5.5 Bases and Dimension
5.6 Row and Column Spaces
5.7 Inner Products and Orthogonality
5.8 Additional Exercises

6 Linear Transformations and Eigenvalues
6.1 Introduction to Transformations and Eigenvalues
6.2 Diagonalization and Similarity
6.3 Symmetric and Orthogonal Matrices
6.4 Additional Exercises

7 Systems of First-Order Equations
7.1 Introduction to First-Order Systems
7.2 Theory of First-Order Linear Systems
7.3 Eigenvalue Method for Homogeneous Systems
7.4 Applications to Multiple Tank Mixing
7.5 Applications to Mechanical Vibrations
7.6 Additional Exercises

Appendix A Complex Numbers
Appendix B Review of Partial Fractions
Appendix C Table of Integrals
Appendix D Table of Laplace Transforms
Appendix E Answers to Some Exercises

Index
0.1 Preface
This textbook is designed for a one-semester undergraduate course in ordinary differential equations and linear algebra. We have had such a course at Northeastern University
since our conversion from the quarter to semester system required us to offer one course
instead of two. Many other institutions have a similarly combined course, perhaps for a
similar reason; consequently, there are many other textbooks available that cover both
differential equations and linear algebra. Let me describe some of the features of my
book and draw some contrasts with the other texts on this subject.
Because many students taking the course at Northeastern are electrical engineering
majors who concurrently take a course in circuits, we always include the Laplace transform in the first half of the course. For this reason, in my textbook I cover first and
second-order differential equations as well as the Laplace transform in the first three
chapters, then I turn to linear algebra in Chapters 4-6, and finally draw on both in the
analysis of systems of differential equations in Chapter 7. This ordering of material is
unusual (perhaps unique) amongst other textbooks for this course, which generally alternate more between differential equations and linear algebra, and put Laplace transform
near the end of the book.
Another feature of my textbook is a fairly concise writing style and selection of
topics. I find that many textbooks on this subject are excessively long: they use a
verbose writing style, include too many sections, and many of the sections contain too
much material. As an instructor using such a book for a one-semester course, I am
constantly deciding what to not cover: not only what sections to skip, but what topics
in each section to leave out. I think that students using such a textbook also find it
difficult to know what has been covered and what has not. On the other hand, I think it
is good to have some additional or optional material, to provide some flexibility for the
instructor, and to make the book more appropriate for advanced or honors students.
Consequently, in my book I have tried to make judicious choices about what material to
include, and to arrange it in such a way as to conveniently allow the instructor to omit
certain topics. For example, an instructor can cover separable first-order differential
equations with applications to unlimited population growth and Newton’s law of cooling,
and then decide whether or not to include the subsections on resistive force models and
on the logistic model for population growth.
The careful selection and arrangement of material is also reflected in the exercises
for the student. At the end of each section I have provided exercises that are designed
to develop fairly basic skills, and I grouped problems together according to the skill
that they are intended to develop. For example, Exercise #1 may address a certain
skill, and there are six specific problems (a-f) to do this. Moreover, the answers to all
of these exercises (not just the odd-numbered ones) are provided in the Appendix. In
fact, some exercises have solution videos on YouTube, which is indicated by Solution ;
a full list of the solution videos can be found at
http://www.centerofmath.org/textbooks/diff eq/supplements.html
In addition to the exercises at the end of each section, I have provided at the end of
each chapter a list of Additional Exercises. These include exercises involving additional
applications and some more challenging problems. Only the odd-numbered problems
from the Additional Exercises sections are given answers in the Appendix.
Let me add that, in order to keep this book at a length that is convenient for a single
semester course, I have had to leave out some important topics. For example, I have not
tried to cover numerical methods in this book. While I believe that numerical methods
(including the use of computational software) should be taught along with theoretical
techniques, there are so many of the latter in a course that covers both differential
equations and linear algebra, that it seemed inadvisable to try to also include numerical
methods. Consequently, I made the difficult decision to leave numerical methods out of
this textbook.
On the other hand, I have taken some advantage of the fact that this book is being
primarily distributed in electronic form to expand the coverage. For example, I have
included links to online resources (especially Wikipedia articles) that provide more information about topics that are only briefly mentioned in this book. Again, I have tried
to make judicious choices about this: if a Wikipedia article on a certain topic exists but
does not provide significantly more information than is given in the text, then I chose
not to include it.
I hope that the choices that I have made in writing this book make it a valuable
learning tool for the students and instructors alike.
Robert McOwen
June 2012
Chapter 1
First-Order Differential Equations

1.1 Differential Equations and Mathematical Models
A differential equation is an equation that involves an unknown function and its
derivatives. These arise naturally in the physical sciences. For example, Newton’s
second law of motion F = ma concerns the acceleration a of an object of mass m under
a force F . But if we denote the object’s velocity by v and assume that F could depend
on both v and t, then this can be written as a first-order differential equation for v
m dv/dt = F(t, v).    (1.1)
The simplest example of (1.1) is when F is a constant, such as the gravitational force Fg
near the surface of the earth. In this case, Fg = mg where g is the constant acceleration
due to gravity, which is given approximately by g ≈ 9.8 m/sec2 ≈ 32 ft/sec2 . If we use
this in (1.1), we can easily integrate to find v(t):
m dv/dt = mg  ⇒  dv/dt = g  ⇒  v(t) − v0 = ∫_0^t g dt  ⇒  v(t) = gt + v0,
where v0 is the initial velocity. Notice that we need to know the initial velocity in order
to determine the velocity at time t.
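Readers who like to verify such calculations by computer algebra can reproduce this with a minimal sympy sketch; the symbol names below are our own choices, not the text's, and the only assumption is that the sympy library is installed.

import sympy as sp

t, g, v0, m = sp.symbols('t g v0 m', positive=True)
v = sp.Function('v')

# integrate m dv/dt = m g subject to v(0) = v0
sol = sp.dsolve(sp.Eq(m * v(t).diff(t), m * g), v(t), ics={v(0): v0})
print(sol)   # Eq(v(t), g*t + v0), matching v(t) = g t + v0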
While equations in which time is the independent variable occur frequently in applications, it is often more convenient to consider x as the independent variable. Let us
use this notation and consider a first-order differential equation in which we can
solve for dy/dx in terms of x and y:
dy/dx = f(x, y).    (1.2)
We are frequently interested in finding a solution of (1.2) that also satisfies an initial
condition:
y(x0) = y0.    (1.3)
Fig.1. Gravitational force
The combination of (1.2) and (1.3) is called an initial-value problem. The class
of all solutions of (1.2) is called the general solution and it usually depends upon a
constant that can be evaluated to find the particular solution satisfying a given initial
condition. Most of this chapter is devoted to finding general solutions for (1.2), as well
as particular solutions of initial-value problems, when f (x, y) takes various forms.
A very easy case of (1.2) occurs when f is independent of y, i.e.
dy/dx = f(x),
since we can simply integrate to obtain the general solution as
y(x) = ∫ f(x) dx + C,   where C is an arbitrary constant.
On the other hand, if we also require y to satisfy the initial condition (1.3), then we
can evaluate C to find the particular solution. This technique was used to solve the
gravitational force problem in the first paragraph and should be familiar from calculus,
but further examples are given in the Exercises.
In (1.1), if we replace v by dx/dt where x denotes the position of the object and we
assume that F could also depend on x, then we obtain an example of a second-order
differential equation for x:
m d^2x/dt^2 = F(t, x, dx/dt).    (1.4)
An instance of (1.4) is the damped, forced spring-mass system considered in Chapter 2:
m d^2x/dt^2 + c dx/dt + kx = F(t).    (1.5)
But now initial conditions at t0 must specify the values of both x and dx/dt:
x(t0) = x0,   dx/dt(t0) = v0.    (1.6)

Fig.2. Spring-mass system
In general, the order of the differential equation is determined by the highest-order
derivative of the unknown function appearing in the equation. Moreover, an initial-value
problem for an n-th-order differential equation should specify the values of the unknown function and its first n − 1 derivatives at some initial point.
An important concept for differential equations is linearity: a differential equation
is linear if the unknown function and all of its derivatives occur linearly. For example,
(1.1) is linear if F(t, v) = f(t) + g(t)v, but not if F(t, v) = v^2. Similarly, (1.4) is linear if F(t, x, v) = f(t) + g(t)x + h(t)v, but not if F(t, x, v) = sin x or F(t, x, v) = e^v. For example, (1.5) is linear. (Note that the coefficient functions f(t), etc., do not have to be linear in t.)
In the above examples, the independent variable is sometimes x and sometimes t.
However, when it is clear what the independent variable is, we may use a prime (') to denote derivatives; for example, we can write (1.5) as

m x'' + c x' + k x = F(t).
Notation: for simplicity, we generally do not show the dependence of the unknown function on the independent variable, so we will not write m x''(t) + c x'(t) + k x(t) = F(t).
Mathematical Models
Before we begin the analysis of differential equations, let us consider a little more carefully how they arise in mathematical models. Mathematical models are used to reach
conclusions and make predictions about the physical world. Suppose there is a particular physical system that we want to study. Let us describe the modeling process for
the system in several steps:
1. Abstraction: Describe the physical system using mathematical terms and relationships; this provides the model itself.
2. Analysis: Apply mathematical analysis of the model to obtain mathematical
conclusions.
3. Interpretation: Use the mathematical conclusions to obtain conclusions about
the physical system.
4. Refinement (if necessary): If the conclusions of the model do not agree with
experiments, it may be necessary to replace or at least refine the model to make
it more accurate.
In this textbook, of course, we consider models involving differential equations. Perhaps the simplest case is population growth. It is based upon the observation that
populations of all kinds grow (or shrink) at a rate proportional to the size of the population. If we let P (t) denote the size of the population at time t, then this translates
to the mathematical statement dP/dt = kP, where k is the proportionality constant: if
k > 0 then the population is growing, and if k < 0 then it is shrinking. If we know the
population is P0 at time t = 0, then we have an initial-value problem for a first-order
linear differential equation:
dP/dt = kP,   P(0) = P0.    (1.7)
This is our mathematical model for population growth. It can easily be analyzed by
separation of variables (see Section 1.3), and the solution is found to be P(t) = P0 e^{kt}.
When k > 0, the interpretation of this analysis is that the population grows exponentially, and without any upper bound. While this may be true for a while, growing
populations eventually begin to slow down due to additional factors like overcrowding
or limited food supply. This means that the model must be refined to account for these
additional factors; we will discuss one such refinement of (1.7) in Section 1.3. We should
also mention that the case k < 0 in (1.7) provides a model for radioactive decay.
Similar reasoning lies behind Newton’s law of cooling in heat transfer: it is
observed that a body with temperature T that is higher than the ambient temperature
A will cool at a rate proportional to the temperature differential T − A. Consequently,
if the initial temperature T0 is greater than A, then the body will cool: rapidly at first,
Fig.3. Population growth
but then gradually as it decreases to A. Our mathematical model for cooling is the
initial-value problem for a first-order linear differential equation:
dT/dt = −k(T − A),   T(0) = T0,    (1.8)

Fig.4. Newton's law of cooling
where k > 0. In fact, if T0 is less than A, then (1.8) also governs the warming of
the body; cf. Example 4 in Section 1.3 where we shall solve (1.8) by separation of
variables. But for now, observe that (1.8) implies that T = A is an equilibrium, i.e.
dT /dt = 0 and the object remains at the ambient temperature. We shall have more
to say about equilibria in the next section. Also, note that we have assumed that the
proportionality constant k is independent of T ; of course, this may not be strictly true
for some materials, which means that a refinement of the model may be necessary.
We can also use mathematical models to study resistive forces that depend on the
velocity of a moving body; these are of the form (1.1) and will be discussed in Section 1.3. Other mathematical models discussed in this textbook include the damped,
forced spring-mass system (1.5) and other mechanical vibrations, electrical circuits, and
mixture problems.
Remark. Differential equations involving unknown functions of a single variable, e.g.
x(t) or y(x), are often called ordinary differential equations. On the other hand,
differential equations involving unknown functions of several variables, such as u(x, y),
are called partial differential equations since the derivatives are partial derivatives,
u_x and u_y. We shall not consider partial differential equations in this textbook.
Exercises
1. For each differential equation, determine (i) the order, and (ii) whether it is linear:
(a) y' + x y^2 = cos x
(b) x'' + 2x' + x = sin t
(c) y''' + y = x^2
(d) x''' + t x = x^2
2. For the given differential equation, use integration to (i) find the general solution,
and (ii) find the particular solution satisfying the initial condition y(0) = 1.
(a) dy/dt = sin t
(b) dy/dx = x e^{x^2}
(c) dy/dx = x cos x
(d) dy/dx = 1/√(1 − x)
3. A rock is hurled into the air with an initial velocity of 64 ft/sec. Assuming only
gravitational force with g = 32 ft/sec2 applies, when does the rock reach its
maximum height? What is the maximum height that the rock achieves?
4. A ball is dropped from a tower that is 100 m tall. Assuming only gravitational
force g = 9.8 m/sec2 , how long does it take to reach the ground? What is the
speed upon impact?
1.2 Geometric Analysis and Existence/Uniqueness
A first-order differential equation of the form
dy/dx = f(x, y)    (1.9)
defines a slope field (or direction field) in the xy-plane: the value f (x, y) is the slope
of a tiny line segment at the point (x, y). It has geometrical significance for solution
curves (also called integral curves) which are simply graphs of solutions y(x) of the
equation: at each point (x0 , y0 ) on a solution curve, f (x0 , y0 ) is the slope of the tangent
line to the curve. If we sketch the slope field, and then try to draw curves which are
tangent to this field at each point, we get a good idea how the various solutions behave.
In particular, if we pick (x0 , y0 ) and try to draw a curve passing through (x0 , y0 ) which
is everywhere tangent to the slope field, we should get the graph of the solution for the
initial-value problem dy/dx = f (x, y), y(x0 ) = y0 .
Let us illustrate this with a simple example:
dy/dx = 2x + y.    (1.10)
We first take a very low-tech approach and simply calculate the slope at various values
of x and y:
          y = −2   −1.5    −1   −0.5     0    0.5     1    1.5     2
x = −2        −6   −5.5    −5   −4.5    −4   −3.5    −3   −2.5    −2
x = −1        −4   −3.5    −3   −2.5    −2   −1.5    −1   −0.5     0
x =  0        −2   −1.5    −1   −0.5     0    0.5     1    1.5     2
x =  1         0    0.5     1    1.5     2    2.5     3    3.5     4
x =  2         2    2.5     3    3.5     4    4.5     5    5.5     6

Figure 1. Values of f(x, y) = 2x + y at various values of x and y
Then we plot these values in the xy plane: see Figure 2 which includes the slope field
at even more points than those computed above. (The slopes are color-coded according
to how “steep” they are.) Using the slope field, we can then sketch solution curves: in
Figure 2 we see the solution curve starting at (−1, 0); note that it is everywhere tangent
to the slope field.
Obviously, this is a very crude and labor-intensive analysis. Happily, technology can
come to our assistance: there are many excellent computational and graphing programs
such as MATLAB, Mathematica, Maple, etc that can plot slope fields and compute
solution curves (not just sketch them). These programs utilize numerical methods
to “solve” differential equations; however, we shall not discuss them here.
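As an illustration only (not the book's own software), here is a minimal Python sketch, assuming numpy and matplotlib are installed, that draws the slope field of (1.10) and traces the solution curve through (−1, 0) with small Euler steps; the grid spacing and step size are our own arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

f = lambda x, y: 2 * x + y

# slope field: a short segment of slope f(x, y) centered at each grid point
for xi in np.linspace(-2, 2, 17):
    for yi in np.linspace(-2, 2, 17):
        s = f(xi, yi)
        dx = 0.08 / np.hypot(1.0, s)   # keep every segment the same length
        plt.plot([xi - dx, xi + dx], [yi - s * dx, yi + s * dx], color='gray', lw=0.8)

# crude Euler trace of the solution curve starting at (-1, 0)
x, y, h = -1.0, 0.0, 0.01
pts = [(x, y)]
while x < 2:
    y += h * f(x, y)
    x += h
    pts.append((x, y))
plt.plot(*zip(*pts), color='red')
plt.xlim(-2, 2); plt.ylim(-2, 2)
plt.show()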
Existence and Uniqueness of Solutions
This graphical analysis of (1.9) suggests that we can always find a solution curve passing
through any point (x0 , y0 ), i.e. there exists a solution of (1.9) satisfying the initial
condition y(x0 ) = y0 . It also seems that there is only one such solution curve, i.e.
the solution of the initial-value problem is unique. This question of the existence
and uniqueness of solutions to an initial-value problem is so fundamental that we now
carefully formulate the conditions under which it is true.
Fig.2. Slope field and the solution curve from (−1, 0).
Theorem 1. If f(x, y) is continuous in an open rectangle R = (a, b) × (c, d) in the xy-plane that contains the point (x0, y0), then there exists a solution y(x) to the initial-value problem

dy/dx = f(x, y),   y(x0) = y0,    (1.11)
that is defined in an open interval I = (α, β) containing x0 . In addition, if the partial
derivative ∂f /∂y is continuous in R, then the solution y(x) of (1.11) is unique.
Figure 3. Existence and Uniqueness of a Solution.

This theorem is proved using successive approximations, but we shall not give the details; see, for example, [1]. However, let us make a couple of remarks that will help clarify the need for the various hypotheses.

Remark 1. We might hope that existence holds with α = a and β = b, and indeed this is the case for linear equations f(x, y) = a(x)y + b(x). But for nonlinear equations, this need not be the case. For example, if we plot the slope field and some solution curves for y' = y^2, we see that the solutions seem to approach vertical asymptotes; this suggests that they are not defined on I = (−∞, ∞) even though f = y^2 is continuous on the whole xy-plane. (When we discuss separation of variables in Section 1.3, we will be able to confirm the existence of these vertical asymptotes for many nonlinear equations.)

Fig.4. Slope field and solution curves for y' = y^2.
Remark 2. Regarding the condition for uniqueness, Exercise 3 in Section 1.3 shows that
the initial-value problem y' = y^{2/3}, y(0) = 0 has two different solutions; the uniqueness
claim in the theorem does not apply to this example since ∂f /∂y is not continuous in
any rectangle containing (0, 0).
Qualitative Analysis
When the function f in (1.9) does not depend on x, the equation is called autonomous:
dy/dx = f(y).    (1.12)
For such equations, the slope fields do not depend upon the value of x, so changing the
value of x0 merely translates the solution curve horizontally. But more importantly,
values of y0 for which f(y0) = 0 are called critical points, and provide equilibrium
solutions of (1.12), namely y(x) ≡ y0 is a constant solution. (We mentioned this phenomenon in connection with Newton’s law of cooling in the previous section.)
Given an equilibrium solution y0 for the autonomous equation (1.12), nearby solution
curves often approach y0 as x increases, or move away from y0 as x increases. This
phenomenon is called stability:
1. If all nearby solution curves approach y0 , then the equilibrium is called a sink,
which is stable;
2. If at least some nearby solution curves do not approach y0 , then the equilibrium
is called unstable;
3. If all nearby solution curves move away from y0, then the equilibrium is called a source, which is unstable.

Fig.5. Stability of critical points.
The reason for the terms “sink” and “source” can be seen from the graph. We
illustrate this with an example.
Example 1. Find all equilibrium solutions for y' = y^2 − 4y + 3 and determine the stability of each one.

Solution. We find f(y) = y^2 − 4y + 3 = (y − 1)(y − 3) = 0 has solutions y = 1 and
y = 3, so these are the equilibrium solutions. To determine their stability, we need
to consider the behavior of solution curves in the three intervals (−∞, 1), (1, 3), and
(3, ∞); this is determined by the sign of f (y) in these intervals. But it is clear (for
example, by sampling f at the points y = 0, 2, 4) that f (y) < 0 on the interval (1, 3)
and f (y) > 0 on the intervals (−∞, 1) and (3, ∞). From this we determine that all
solutions near y = 1 tend towards y = 1 (like water flows towards a drain), so this
critical point is asymptotically stable, and it resembles a “sink”. On the other hand,
near y = 3 solutions tend away from y = 3 (like water flowing from a spigot), so this
critical point is unstable, and in fact is a “source”.
□
Note that an equilibrium point y0 may be unstable without being a source: this occurs
if some nearby solution curves approach y0 as x increases while others move away; such
an equilibrium may be called semistable. Let us consider an example of this.
Fig.6. Stability of critical points in Example 1.
Example 2. Find all equilibrium solutions for y' = y^3 − 2y^2 and determine the stability of each one.

Solution. We find f(y) = y^2(y − 2) = 0 has solutions y = 0 and y = 2, so these are the
equilibrium solutions. For y < 0 and 0 < y < 2 we have f (y) < 0, so nearby solutions
starting below y = 0 tend towards −∞ while solutions starting between y = 0 and y = 2
tend towards y = 0. This means that y = 0 is semistable. On the other hand, solutions
starting above y = 2 tend towards +∞, so y = 2 is a source.
□
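The sign check used in Examples 1 and 2 is easy to automate. The following small Python sketch (ours, assuming sympy is available) classifies the equilibria of y' = f(y) by sampling the sign of f just below and just above each root; the threshold eps is an arbitrary choice.

import sympy as sp

y = sp.symbols('y')

def classify(f, eps=1e-6):
    for r in sorted(sp.solve(sp.Eq(f, 0), y)):
        left = f.subs(y, r - eps)    # sign of f just below the equilibrium
        right = f.subs(y, r + eps)   # sign of f just above the equilibrium
        if left > 0 and right < 0:
            kind = 'sink (stable)'
        elif left < 0 and right > 0:
            kind = 'source (unstable)'
        else:
            kind = 'semistable (unstable)'
        print(f'y = {r}: {kind}')

classify(y**2 - 4*y + 3)   # y = 1: sink,        y = 3: source   (Example 1)
classify(y**3 - 2*y**2)    # y = 0: semistable,  y = 2: source   (Example 2)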
Exercises
1. Sketch the slope field and some solution curves for the following equations
(a) dy/dx = x − y
(b) dy/dx = y − sin x
2. For each initial-value problem, determine whether Theorem 1 applies to show the
existence of a unique solution in an open interval I containing x = 0.
(a) dy/dx = x y^{4/3},   y(0) = 0
(b) dy/dx = x y^{1/3},   y(0) = 0
(c) dy/dx = x y^{1/3},   y(0) = 1
3. Find all equilibrium solutions for the following autonomous equations, and determine the stability of each equilibrium.
(a) dy/dt = y − y^2
(b) dy/dt = y^2 − y^3
(c) dy/dt = y − y^3
(d) dy/dt = y sin y

Fig.7. Stability of critical points in Example 2.
4. A cylindrical tank with cross-sectional area A is being filled with water at the rate
of k ft3 /sec, but there is a small hole in the bottom of the tank of cross-sectional
area a. Torricelli's law in hydrodynamics states that the water exits the hole with velocity v = √(2gy), where g = 32 ft/sec2 is acceleration due to gravity and y is the depth of water in the tank.

(a) Show that y(t) satisfies the differential equation

dy/dt = (k − a√(2gy)) / A.
(b) Find the equilibrium depth of water in the tank. Is it a stable equilibrium?
Fig.8. Tank with Hole.
1.3 Separable Equations & Applications
A first-order differential equation is called separable if it can be put into the form
p(y) dy/dx = q(x).    (1.13)
The equation is called “separable” because, if we formally treat dy/dx as a fraction of
differentials, then (1.13) can be put into the form
p(y) dy = q(x) dx,
in which the variables are “separated” by the equality sign. We can now integrate both
sides of this equation to obtain
∫ p(y) dy = ∫ q(x) dx,

which yields

P(y) = Q(x) + C,    (1.14)
where P(y) is an antiderivative of p(y), Q(x) is an antiderivative of q(x), and C is an
arbitrary constant. Since we were a little cavalier in our derivation of (1.14), let us
verify that it solves (1.13): if we differentiate both sides with respect to x and use the
chain rule on the left-hand side to calculate
(d/dx) P(y) = p(y) dy/dx,
we see that (1.14) implies (1.13). This means that we have indeed solved (1.13), although
(1.14) only gives the solution y(x) in implicit form. It may be possible to solve (1.14)
explicitly for y as a function of x; otherwise, it is acceptable to leave the solution in
the implicit form (1.14). If an initial condition is given, the constant C in (1.14) can be
evaluated.
Example 1. Let us solve the initial-value problem
dy/dx = 2xy^2,   y(0) = 1.
Solution. We separate variables and integrate both sides to find

∫ y^{−2} dy = ∫ 2x dx,

and then evaluate both integrals to obtain

−y^{−1} = x^2 + C.

We can solve for y to obtain the general solution of the differential equation:

y(x) = −1/(x^2 + C).

Finally we use the initial condition y(0) = 1 to evaluate C = −1 and obtain the solution of the initial-value problem as

y(x) = 1/(1 − x^2).

Notice that this solution is not defined at x = ±1, so the largest interval containing x = 0 on which the solution is continuous is I = (−1, 1). On the other hand, f(x, y) = 2xy^2 is continuous for all −∞ < x, y < ∞, so this provides another nonlinear example that in Theorem 1 of Section 1.2 we do not always have α = a and β = b. □
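For readers who like a symbolic cross-check, this is a minimal sympy sketch (ours; it assumes only that sympy is installed) that solves the same initial-value problem and recovers 1/(1 − x^2).

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# dy/dx = 2 x y^2 with y(0) = 1
sol = sp.dsolve(sp.Eq(y(x).diff(x), 2*x*y(x)**2), y(x), ics={y(0): 1})
print(sp.simplify(sol.rhs))   # -1/(x**2 - 1), i.e. 1/(1 - x**2)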
Example 2. Let us find the general solution for
dy/dx = (1 + y^2) cos x.
Solution. We separate variables and integrate both sides to obtain
∫ dy/(1 + y^2) = ∫ cos x dx.
Evaluating the integrals yields
tan^{−1} y = sin x + C.
This defines y(x) implicitly, but we can find y(x) explicitly by taking tan of both sides:
y(x) = tan[sin x + C].
□
The following example involves the model for population growth that was mentioned
in Section 1.1.
Example 3. A population of bacteria grows at a rate proportional to its size. In 1
hour it increases from 500 to 1,500. What will the population be after 2 hours?
Solution. Letting P (t) denote the population at time t, we have the initial-value
problem
dP/dt = kP,   P(0) = 500.
Fig.1. y(x) = 1/(1 − x^2).
We do not yet know the value of k, but we have the additional data point P(1) = 1,500. If we separate variables and integrate we obtain

∫ dP/P = ∫ k dt.

This yields ln |P| = kt + c where c is an arbitrary constant. Exponentiating both sides, we find

|P(t)| = e^{kt+c} = C e^{kt},   where C = e^c.

Since P is nonnegative, we may remove the absolute value sign, and using the initial condition P(0) = 500 we obtain

P(t) = 500 e^{kt}.

Now letting t = 1 we find 1,500 = 500 e^k, which we may solve for k to obtain k = ln 3. Hence (see Fig.2)

P(t) = 500 e^{t ln 3}.

We now evaluate at t = 2 to obtain P(2) = 500 e^{2 ln 3} = 500 e^{ln 9} = 500 (9) = 4,500. □

Fig.2. P(t) = 500 e^{t ln 3}.
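As a quick numeric cross-check (ours, using only the Python standard library):

import math

k = math.log(3)                    # from P(1) = 1500 = 500 e^k
print(500 * math.exp(2 * k))       # ≈ 4500, the predicted population after 2 hours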
The next example concerns Newton’s law of cooling/warming, which was also discussed in Section 1.1.
Example 4. A dish of leftovers is removed from a refrigerator at 35◦ F and placed in
an oven at 200◦ F. After 10 minutes, it is 75◦ F. How long will it take until it is 100◦ F?
Solution. Let the temperature of the leftovers at time t be T (t). Since the ambient
temperature is A = 200, our mathematical model (1.8) involves the differential equation
dT/dt = −k(T − 200).
If we separate variables and integrate we find
∫ dT/(T − 200) = −∫ k dt.
Evaluating the integrals yields ln |T − 200| = −kt + c where c is an arbitrary constant. But in this problem we have T < 200, so ln |T − 200| = ln(200 − T) = −kt + c, and exponentiating yields 200 − T = Ce^{−kt} where C = e^c. Let us write this as

T(t) = 200 − Ce^{−kt},
and then use the initial condition T (0) = 35 to evaluate C:
35 = 200 − C e^0  ⇒  C = 165.
In order to determine k we use T (10) = 75:
75 = 200 − 165 e^{−10k}  ⇒  e^{−10k} = 25/33  ⇒  k = (1/10) ln(33/25) = 0.02776.
Therefore, we have obtained

T(t) = 200 − 165 e^{−0.02776 t}.

Finally, to determine when the dish is 100°F, we solve T(t) = 100 for t:

200 − 165 e^{−0.02776 t} = 100  ⇒  e^{−0.02776 t} = 100/165  ⇒  t = 18.04.

So the dish is 100°F after approximately 18 minutes. □
Fig.3. Graph of T(t) = 200 − 165 e^{−0.02776 t}.
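A short numeric check of Example 4 (ours, standard library only) recomputes k from T(10) = 75 and then solves T(t) = 100 for t.

import math

A, T0, Ttarget = 200.0, 35.0, 100.0
k = math.log(33 / 25) / 10                    # from 75 = 200 - 165 e^{-10k}
t = math.log((A - T0) / (A - Ttarget)) / k    # solve A - (A - T0) e^{-k t} = Ttarget
print(round(k, 5), round(t, 2))               # 0.02776  18.04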
Resistive Force Models
Consider an object that is moving with velocity v, but whose motion is subject to a
resistive force Fr (sometimes called drag). For example, such a resistive force is
experienced by an object moving along a surface in the form of friction, or an object
moving through the atmosphere in the form of air resistance. In all cases, the resistive
force is zero when v = 0 and increases as v increases; but the specific relationship
between v and Fr depends on the particular physical system.
Let us consider a relatively simple relationship between v and Fr . If c and α are
positive constants, then
Fr = −c v^α    (1.15)
has the properties we are looking for: Fr = 0 when v = 0 and Fr increases as v increases.
(The minus sign accounts for the fact that the force decelerates the object.) If α = 1,
then Fr depends linearly upon v; this linear relationship is convenient for analysis,
but may not be appropriate in all cases. In fact, for objects moving through a fluid
(including the air), physical principles may be used to predict that α = 1 is reasonable
for relatively low velocities, but α = 2 is more appropriate for high velocities. Let us use
separation of variables to solve a problem involving low velocity.
Example 5. Predict the motion of an object of mass m falling near the earth’s surface
under the force of gravity and air resistance that is proportional to velocity.
Solution. Let y(t) denote the distance travelled at time t, and v = dy/dt. According
to Newton’s second law of motion, ma = Fg + Fr , where Fg = mg is the force due to
gravity (positive since y is measured downwards) and Fr = − c v as in (1.15) with α = 1
is air resistance. Consequently, we have the first-order differential equation
m dv/dt = mg − c v.    (1.16)

Notice that (1.16) is an autonomous equation with equilibrium

v* = mg/c;    (1.17)

at this velocity, the object has zero acceleration. (Using qualitative analysis, we can reach some additional conclusions at this point; see Exercise 12.) But (1.16) is also a separable equation, so we can solve it. If we separate the variables v and t and then integrate, we obtain

∫ m dv/(c v − mg) = −∫ dt.

Fig.4. Forces on a falling object
We can use a substitution u = c v − mg to evaluate the left-hand side:

∫ m dv/(c v − mg) = (m/c) ∫ du/u = (m/c) ln |u| + C = (m/c) ln |c v − mg| + C.

Since we easily integrate −∫ dt = −t + C, we conclude

(m/c) ln |c v − mg| = −t + C1  ⇒  ln |c v − mg| = −ct/m + C2  ⇒  c v − mg = C3 e^{−ct/m}  ⇒  v = mg/c + C4 e^{−ct/m}.

We can use the initial velocity v0 to evaluate the constant C4 and conclude

v(t) = mg/c + (v0 − mg/c) e^{−ct/m}.

In particular, we note that

v(t) → mg/c = v*   as t → ∞,

leading to the interpretation of the equilibrium v* as the terminal velocity. □

Fig.5. v(t) approaching the terminal velocity.
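A minimal sympy sketch (ours; symbol names are assumptions, and sympy must be installed) confirms both the formula for v(t) and the terminal velocity mg/c.

import sympy as sp

t, m, g, c, v0 = sp.symbols('t m g c v0', positive=True)
v = sp.Function('v')

# m dv/dt = m g - c v with v(0) = v0
sol = sp.dsolve(sp.Eq(m * v(t).diff(t), m * g - c * v(t)), v(t), ics={v(0): v0})
print(sp.simplify(sol.rhs))          # equivalent to mg/c + (v0 - mg/c) e^{-ct/m}
print(sp.limit(sol.rhs, t, sp.oo))   # g*m/c, the terminal velocity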
The Logistic Model for Population Growth
In Section 1.1 we described the simple model for population growth
dP/dt = k P,   where k > 0 is a constant,
and analyzed it in Example 3 above to conclude that a population of bacteria will
continue to grow exponentially. The model may be accurate for a while, but eventually
other factors such as overcrowding or limited food supply will tend to slow the population
growth. How can we refine the model to account for this?
Suppose that the population is in an environment that supports a maximum population M : as long as P (t) < M then P (t) will increase, but if P (t) > M then P (t) will
decrease. A simple model for this is a nonlinear equation called the logistic model :
dP/dt = k P (M − P),   where k, M > 0 are constants.    (1.18)
This has the important features that we are looking for:
0 < P < M  ⇒  dP/dt > 0,  i.e. P(t) is increasing;
P > M  ⇒  dP/dt < 0,  i.e. P(t) is decreasing.
Moreover, (1.18) is an autonomous equation with critical points at P = 0 and P = M ;
qualitative analysis shows that P = M is a stable equilibrium, so for any positive initial
population P0 we have P (t) → M as t → ∞. (Of course, the critical point at P = 0
corresponds to the trivial solution of a zero population; qualitative analysis shows it is
unstable, which is what we expect since any positive initial population will grow and
move away from P = 0.)
Beyond qualitative analysis, we notice that (1.18) is separable, and we can solve it
by first writing:
dP/(P(M − P)) = k dt.
To integrate the left-hand side, we use a partial fraction decomposition (cf. Appendix
B)
1/(P(M − P)) = (1/M) [1/P + 1/(M − P)],

so

∫ dP/(P(M − P)) = (1/M) (ln P − ln |M − P|) + c = (1/M) ln(P/|M − P|) + c.
This yields
ln(P/|M − P|) = M kt + c1,   where c1 = −M c,
and we can exponentiate to conclude
P/|M − P| = C e^{Mkt},   where C = e^{c1} > 0.
In this formula, we need C > 0; but if we allow C to be negative then we can remove
the absolute value signs on M − P and simply write
P/(M − P) = C e^{Mkt},   for some constant C.
If we let t = 0, then we can evaluate C and conclude
P/(M − P) = P0 e^{Mkt}/(M − P0).
A little algebra enables us to solve for P and write our solution as
P(t) = M P0 / (P0 + (M − P0) e^{−Mkt}).    (1.19)
Notice that, regardless of the value of P0 , we have P (t) → M as t → ∞, as expected
from our qualitative analysis.
Example 3 (revisited). If, after 2 hours, the population of bacteria in Example 3 has
only increased to 3, 500 instead of 4, 500 as predicted in Example 3, we might suppose
that a logistic model is more accurate. Find the maximum population M .
Solution. In addition to the initial population P0 = 500, we now have two data points:
P (1) = 1, 500 and P (2) = 3, 500. This should be enough to evaluate the constants k
and M in (1.19), although it is only M that we are asked to find. In particular, we have
the following two equations involving k and M :
500 M / (500 + (M − 500) e^{−Mk}) = 1,500   and   500 M / (500 + (M − 500) e^{−2Mk}) = 3,500.
In general, solving such a system of nonlinear equations may require the assistance of a
computer algebra system. However, in this case, we can use simple algebra to find M .
First, in each equation we solve for the exponential involving k:
e^{−Mk} = (M − 1,500) / (3 (M − 500))   and   e^{−2Mk} = (M − 3,500) / (7 (M − 500)).

But e^{−2Mk} = (e^{−Mk})^2, so we obtain

(M − 1,500)^2 / (9 (M − 500)^2) = (M − 3,500) / (7 (M − 500)).

If we cross-multiply and simplify, we obtain

2M^2 − 15,000 M = 0  ⇒  M = 7,500.

Thus the maximum population that this collection of bacteria can achieve is 7,500. Of course, once we have found M, we can easily find k from

k = −(1/M) ln[(M − 1,500) / (3 (M − 500))] ≈ 0.000167,

and graph the solution as in Figure 6. □

Fig.6. Solution of Example 3 (revisited).
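Since the text notes that such nonlinear systems often call for a computer algebra system, here is a minimal sympy sketch (ours; the symbol names and the elimination of k are our own choices) that recovers M and k from the two data points.

import sympy as sp

M = sp.symbols('M', positive=True)
e1 = (M - 1500) / (3 * (M - 500))    # e^{-Mk} from P(1) = 1500
e2 = (M - 3500) / (7 * (M - 500))    # e^{-2Mk} from P(2) = 3500

# (e^{-Mk})^2 = e^{-2Mk}; cross-multiply and cancel the common factor (M - 500)
poly = sp.expand(7 * (M - 1500)**2 - 9 * (M - 500) * (M - 3500))
print(sp.factor(poly), sp.solve(poly, M))   # -2*M*(M - 7500); the meaningful root is M = 7500

k = sp.log(1 / e1.subs(M, 7500)) / 7500
print(k, float(k))                          # log(7/2)/7500 ≈ 0.000167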
Exercises
1. Find the general solution of the following differential equations
(a) dy/dx + 2xy = 0
(b) dy/dx = y^2/(1 + x^2)
(c) dy/dx = (cos x)/y^2
(d) dy/dx = e^x/(1 + y^2)
(e) x dx/dt = x^2 + 1
(f) dx/dt = 3√(xt)  (x, t > 0)
2. Find the solution of the following initial-value problems
(a) dy/dx = y cos x,   y(0) = 1
(b) dy/dx = (x^2 + 1)/y,   y(1) = √3
(c) dy/dt = y (3t^2 − 1),   y(1) = −2
3. Consider the initial-value problem
dy/dx = y^{2/3},   y(0) = 0.

(a) Use separation of variables to obtain the solution y(x) = x^3/27.
(b) Observe that another solution is y(x) ≡ 0.
(c) Since this initial-value problem has two distinct solutions, uniqueness does
not apply. What hypothesis in Theorem 1 in Section 1.2 does not hold?
4. A city has a population of 50,000 in the year 2000 and 63,000 in 2010. Assuming
the population grows at a rate proportional to its size, find the population in the
year 2011.
Solution
5. A population of 1, 000 bacteria grows to 1, 500 in an hour. When will it reach
2, 000?
6. Radioactive substances decay at a rate proportional to the amount, i.e. dN/dt = −kN with k a positive constant. Uranium-238 has a half-life of 4.5 × 10^9 years.
How long will it take 500 grams of Uranium-238 to decay to 400 grams?
7. Radiocarbon dating is based on the decay of the radioactive isotope ^{14}C of carbon, which has a half-life of 5,730 years. While alive, an organism maintains equal amounts of ^{14}C and ordinary carbon ^{12}C, but upon death the ratio of ^{14}C to ^{12}C starts to decrease. If a bone is found to contain 70% as much ^{14}C as ^{12}C,
how old is it?
8. A cake is removed from a 350◦ F oven and placed on a cooling rack in a 70◦ F room.
After 30 minutes the cake is 200◦ F. When will it be 100◦ F?
9. A piece of paper is placed in a 500°F oven, and after 1 minute its temperature rises from 70°F to 250°F. How long until the paper reaches its combustion temperature
of 451◦ F?
10. A dead body is found at Noon in an office that is maintained at 72◦ F. If the body
is 82◦ F when it is found, and has cooled to 80◦ F at 1 pm, estimate the time of
death. (Assume a living body maintains a temperature of 98.6◦ F.)
11. A drug is being administered intravenously to a patient’s body at the rate of 1
mg/min. The body removes 2% of the drug per minute. If there is initially no
drug in the patient’s body, find the amount at time t.
12. Perform a qualitative analysis on (1.16) to determine the stability of the equilibrium (1.17). Use this stability to conclude that (1.17) is the terminal velocity.
13. A rock with mass 1 kg is hurled into the air with an initial velocity of 10 m/sec.
In addition to gravitational force Fg with g = 9.8 m/sec2 , assume the moving rock
experiences air resistance (1.15) with c = .02 and α = 1. When does the rock
achieve its maximum height? What is the maximum height that it achieves?
14. Consider an object falling near the earth’s surface under the force of gravity and
air resistance that is proportional to the square of the velocity, i.e. (1.15) with
α = 2. Use a qualitative analysis to find the terminal velocity v ∗ (in terms of m,
g, and c).
1.4 Linear Equations & Applications
As defined in Section 1.1, a linear first-order differential equation is one in which the
unknown function and its first-order derivative occur linearly. If the unknown function
is y(x), then this means that the equation is of the form a(x)y' + b(x)y = c(x). Provided a(x) is nonzero, we can put the equation into standard form

y' + p(x)y = q(x).    (1.20)
We want to find a procedure for obtaining the general solution of (1.20).
The idea is to multiply the equation by a function I(x) so that the left-hand side
is the derivative of a product: this means that we can simply integrate both sides and
then solve for y(x). Because multiplication by I(x) reduces the problem to integration,
I(x) is called an integrating factor. Let us see how to define I(x). Multiplication of
the left-hand side of (1.20) by I(x) yields (using y' instead of dy/dx)

I(x)y' + I(x) p(x) y.
On the other hand, the product rule implies
[I(x)y]' = I(x)y' + I'(x)y.

Comparing these formulas, we see that we want I(x) to satisfy I'(x) = I(x)p(x). But
this is a separable differential equation for I, which we know how to solve:
dI/dx = I p  ⇒  ∫ dI/I = ∫ p dx  ⇒  ln |I(x)| = ∫ p dx + c,

where c is an arbitrary constant. Exponentiating, we obtain |I(x)| = C exp(∫ p dx).
But we only need one integrating factor, so we can choose C as convenient. In fact, we
can choose C = ±1 so as to obtain the following formula for our integrating factor:
I(x) = e^{∫ p(x) dx}.    (1.21)
At this point, we can use I(x) as described to solve (1.20) and obtain a general solution
formula. However, it is more important to remember the method than to remember the
solution formula, so let us first apply the method to a simple example.
Example 1. Find the general solution of
xy' + 2y = 9x.
Solution. We first need to put the equation in standard form by dividing by x:
y' + (2/x) y = 9.
(Of course, dividing by x makes us worry about the possibility that x = 0, but the
analysis will at least be valid for −∞ < x < 0 and 0 < x < ∞.) Comparing with (1.20) we see that p(x) = 2/x, so ∫ p dx = 2 ln |x|, and according to (1.21) we have

I(x) = e^{2 ln |x|} = x^2.
Multiplying the standard form of our equation by I(x) we obtain
x^2 y' + 2xy = 9x^2.
Recognizing the left-hand side as (x^2 y)', we can integrate:

(x^2 y)' = 9x^2  ⇒  x^2 y = ∫ 9x^2 dx = 3x^3 + C,

where C is an arbitrary constant. Thus our general solution is

y = 3x + C x^{−2}. □
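A quick sympy check (ours, assuming sympy is installed) of Example 1:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(x * y(x).diff(x) + 2 * y(x), 9 * x), y(x))
print(sol)   # Eq(y(x), C1/x**2 + 3*x), matching y = 3x + C x^{-2}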
Of course, we can also solve initial-value problems for linear equations.
Example 2. Let us find the solution of the initial-value problem
y' + 2x y = x,   y(0) = 1.

Solution. We see that p(x) = 2x, so ∫ p dx = x^2 and our integrating factor is I(x) = e^{x^2}. Multiplying by I(x) we have

e^{x^2} y' + 2x e^{x^2} y = x e^{x^2}.
But by the product rule, we see that this can be written

(d/dx)[e^{x^2} y] = x e^{x^2}.
Integrating both sides (using substitution on the right hand side) yields
e^{x^2} y = ∫ x e^{x^2} dx = (1/2) ∫ e^u du = (1/2) e^u + C = (1/2) e^{x^2} + C.

We can divide by our integrating factor to obtain our general solution
y(x) = 1/2 + C e^{−x^2}.

Now we can use our initial condition at x = 0 to conclude

1 = y(0) = 1/2 + C  ⇒  C = 1/2,

and our solution may be written as y(x) = (1 + e^{−x^2})/2; see Fig.1. □

Fig.1. y(x) = (1 + e^{−x^2})/2.

In Example 2 we were lucky that it was possible to integrate x e^{x^2} by substitution.
However, even if we cannot evaluate some of the integrals, the method can still be used.
In fact, if we carry out the procedure in the general case (1.20), we obtain the solution
formula
y(x) = e^{−∫ p dx} [ ∫ q(x) e^{∫ p(x) dx} dx + c ].    (1.22)
But we repeat that it is better to remember the method than the solution formula.
In the next example, we encounter an integral that must be evaluated piecewise.
Example 3. Solve the initial-value problem
y' + 2y = q(x),   y(0) = 2,
where q(x) is the function
q(x) = 2  if 0 ≤ x ≤ 1,   and   q(x) = 0  if x > 1.
Solution. Since p(x) = 2 we find I(x) = e^{2x}, and multiplication as before gives

(d/dx)[e^{2x} y] = q(x) e^{2x}.

Fig.2. Graph of q(x)

Integration yields

e^{2x} y = ∫ q(x) e^{2x} dx.
This shows that the solution is continuous for x ≥ 0, but we need to evaluate the integral
separately for 0 ≤ x ≤ 1 and x > 1. For 0 ≤ x ≤ 1 we have q(x) = 2, and integration
yields
∫ 2 e^{2x} dx = e^{2x} + C0,
where C0 is an arbitrary constant. Solving for y yields y(x) = 1 + C0 e^{−2x}. Using the initial condition y(0) = 2, we have C0 = 1 and our solution is

y(x) = 1 + e^{−2x},   for 0 ≤ x ≤ 1.
To find the solution for x > 1, we need to perform the integration with q(x) = 0:
e^{2x} y = ∫ 0 dx = C1  ⇒  y(x) = C1 e^{−2x}.
To find the constant C1 we need to know the value of y(x) at some x > 1. We do not
have this information, but if the solution y(x) is to be continuous for x ≥ 0, the limiting
value of y(x) as x ↓ 1 must agree with the value y(1) provided by our solution formula
on 0 ≤ x ≤ 1. In other words,
C1 e^{−2} = y(1) = 1 + e^{−2}.

Fig.3. Graph of solution to Example 3
We conclude C1 = 1 + e^2, so y(x) = (1 + e^2) e^{−2x} for x > 1. Putting these formulas together, we conclude that the solution is given by

y(x) = 1 + e^{−2x}  if 0 ≤ x ≤ 1,   and   y(x) = (1 + e^2) e^{−2x}  if x > 1.  □
Mixture Problems
Fig 4. Mixing tank.
Suppose we have a tank containing a solution, a solute dissolved in a solvent such as salt
in water (i.e. brine). A different concentration of the solution flows into the tank at some
rate, and the well-mixed solution is drawn off at a possibly different rate. Obviously,
the amount of solute in the tank can vary with time, and we want to derive a differential
equation that governs this system.
Suppose we know ci , the concentration of the solute in the inflow, and ri , the rate
of the inflow; then the product ci ri is the rate at which the solute is entering the tank.
Suppose we also know the rate of the outflow ro . However, the concentration of the
outflow co can vary with time and is yet to be determined. Let x(t) denote the amount
of solute in the tank at time t > 0 and suppose that we are given x0 , the amount of salt
in the tank at time t = 0. If V (t) denotes the volume of solution in the tank at time t
(which will vary with time unless ri = ro ), then we have
co(t) = x(t)/V(t).
Moreover, the rate at which the solute is flowing out of the tank is the product co ro , so
the rate of change of x(t) is the difference
dx/dt = ci ri − co ro = ci ri − (x(t)/V(t)) ro.
Rearranging this, we have an initial-value problem for a first-order linear differential equation for x(t):

dx/dt + (ro/V(t)) x(t) = ci ri,   x(0) = x0.
Since it is easy to find V (t) from ri , ro , and V0 , the initial volume of solution in the
tank, we can solve this problem using an integrating factor. Let us consider an example.
Example 4. A 40 liter tank is initially half-full of water. A solution containing 10
grams per liter of salt begins to flow in at 4 liters per minute and the mixed solution
flows out at 2 liters per minute. How much salt is in the tank just before it overflows?
Solution. We know that ri = 4, ci = 10, and ro = 2. We also know that V (0) = 20
and dV /dt = ri − ro = 2, so V (t) = 2t + 20. If we let x(t) denote the amount of salt in
the tank at time t, then x(0) = 0 and the rate of change of x is
dx/dt = (4)(10) − (2) x/(2t + 20).
We can rewrite this as an initial-value problem
dx/dt + (1/(t + 10)) x = 40,   x(0) = 0.

Since ∫ p dt = ∫ dt/(t + 10) = ln(t + 10), the integrating factor is I(t) = e^{ln(t+10)} = t + 10. This yields

(d/dt)[(t + 10)x] = 40(t + 10),

which we integrate to find

(t + 10)x = 20(t + 10)^2 + C  ⇒  x(t) = 20(t + 10) + C(t + 10)^{−1}.

But we can use the initial condition x(0) = 0 to evaluate C = −2,000, and we know that overflow occurs when V(t) = 40, i.e. t = 10, so the amount of salt at overflow is

x(10) = 20 · 20 − 2,000/20 = 300 grams. □
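A short sympy check (ours; it assumes sympy is installed) solves the same mixing-tank problem and evaluates the salt content at overflow.

import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')

ode = sp.Eq(x(t).diff(t) + x(t) / (t + 10), 40)
sol = sp.dsolve(ode, x(t), ics={x(0): 0})
print(sp.simplify(sol.rhs))                 # equivalent to 20(t + 10) - 2000/(t + 10)
print(sp.simplify(sol.rhs.subs(t, 10)))     # 300, the grams of salt at overflow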
Exercises
1. Find the general solution for each of the following linear differential equations.
(Prime ' denotes d/dx.)
(a) y' + y = 2
(b) y' + 3y = x e^{−3x}
(c) y' + 3y = x e^{−2x}
(d) xy' + y = √x  (x > 0)
(e) dx/dt + (1/(t + 1)) x = 2
(f) t^2 dy/dt − 3ty = t^6 sin t  (t > 0)
2. Solve the following initial-value problems. (Prime ' denotes d/dx.)
(a) y' + y = e^x,   y(0) = 1
(b) y' + (cot x) y = cos x,   y(π/2) = 1
(c) t dy/dt + y = t^{−1},   y(−1) = 1
(d) y' + y = q(x),   y(0) = 2,   where q(x) = 1 if 0 ≤ x ≤ 1 and q(x) = 0 if x > 1
(e) y' − (1/x) y = p(x),   y(1) = 1,   where p(x) = 1 − x if 0 ≤ x ≤ 1 and p(x) = 0 if x > 1
3. A 100 gallon tank initially contains 10 lbs salt dissolved in 40 gallons of water.
Brine containing 1 lb salt per gallon begins to flow into the tank at the rate of 3
gal/min and the well-mixed solution is drawn off at the rate of 1 gal/min. How
much salt is in the tank when it is about to overflow?
Solution
4. A tank initially contains 100 liters of pure water. Brine containing 3 lb salt/liter
begins to enter the tank at 1 liter/min, and the well-mixed solution is drawn off
at 2 liters/min.
(a) How much salt is in the solution after 10 min?
(b) What is the maximum amount of salt in the tank during the 100 minutes it
takes for the tank to drain?
5. A reservoir is filled with 1 billion cubic feet of polluted water that initially contains
0.2% pollutant. Every day 400 million cubic feet of pure water enters the reservoir
and the well-mixed solution flows out at the same rate. When will the pollutant
concentration in the lake be reduced to 0.1%?
6. In Example 4, suppose that the inflow only contains 5 grams per liter of salt,
but all other quantities are the same. In addition, suppose that when the tank
becomes full at t = 10 minutes, the inflow is shut off but the solution continues
to be drawn off at 2 liters per minute. How much salt will be in the tank when it
is once again half-full?
7. The rate of change of the temperature T (t) of a body is still governed by (1.8)
when the ambient temperature A(t) varies with time. Suppose the body is known
to have k = 0.2 and initially is at 20◦ C; suppose also that A(t) = 20e−t . Find the
temperature T (t).
1.5 Other Methods
In this section we gather together some additional methods for solving first-order differential equations. The first basic method is to find a substitution that simplifies the
equation, enabling us to solve it using the techniques that we have already discussed; the
two specific cases that we shall discuss are homogeneous equations and Bernoulli
equations. The second basic method is to use multi-variable calculus techniques to
find solutions using the level curves of a potential function for a gradient vector field;
the equations to which this method applies are called exact equations.
Homogeneous Equations
A first-order differential equation in the form

dy/dx = F(y/x)    (1.23)
is called homogeneous. Since the right-hand side only depends upon v = y/x, let us
introduce this as the new dependent variable:
v = y/x  ⇒  y = xv  ⇒  dy/dx = v + x dv/dx.
Substituting this into (1.23) we obtain
x dv/dx = F(v) − v.    (1.24)
But (1.24) is separable, so we can find the solution v(x). Then we obtain the solution
of (1.23) simply by letting y(x) = x v(x). Let us consider an example.
Example 1. Find the general solution of
(x + y) y' = x − y.
Solution. This equation is not separable and it is not linear, so we cannot use those
techniques. However, if we divide through by (x + y), we realize the equation is homogeneous:
dy/dx = (x − y)/(x + y) = (1 − (y/x))/(1 + (y/x)).
Introducing v = y/x, we obtain
v + x dv/dx = (1 − v)/(1 + v)  ⇒  x dv/dx = (1 − 2v − v^2)/(1 + v).
Separating variables and integrating both sides, we find
∫ (1 + v)/(1 − 2v − v^2) dv = ∫ dx/x = ln |x| + c.
To integrate the left-hand side, we use the substitution w = 1 − 2v − v^2 to calculate

∫ (1 + v)/(1 − 2v − v^2) dv = −(1/2) ∫ dw/w = −(1/2) ln |w| + c = ln(|w|^{−1/2}) + c.
A function f(x, y) is called "homogeneous of degree d" if f(tx, ty) = t^d f(x, y) for any t > 0. If f(x, y) = F(y/x), then f(x, y) is homogeneous of degree 0.
If we exponentiate ln(|w|^{−1/2}) = ln |x| + c, we obtain

w = C x^{−2},

where we have removed the absolute value sign on w by allowing C to be positive or negative. Using w = 1 − 2v − v^2, v = y/x, and some algebra, we obtain

x^2 − 2xy − y^2 = C. □
Notice that this defines the solution y(x) implicitly.
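The implicit solution can be verified directly; here is a short sympy sketch (ours, assuming sympy is installed) that differentiates x^2 − 2xy − y^2 = C implicitly and recovers the original equation.

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

Phi = x**2 - 2*x*y(x) - y(x)**2
# implicit differentiation: d/dx Phi = 0, solved for y'
dydx = sp.solve(sp.Eq(sp.diff(Phi, x), 0), y(x).diff(x))[0]
print(sp.simplify(dydx - (x - y(x)) / (x + y(x))))   # 0, so (x + y) y' = x - y holds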
Of course, we can also solve initial-value problems for homogeneous equations.
Example 2. Find the solution of the initial-value problem
xy^2 y' = x^3 + y^3,   y(1) = 2.
Solution. If we divide through by xy^2, we see that the equation is homogeneous:

dy/dx = x^2/y^2 + y/x = (y/x)^{−2} + (y/x).
In terms of v = y/x, we obtain

v + x dv/dx = 1/v^2 + v,

and separating variables yields

∫ v^2 dv = ∫ dx/x = ln |x| + c.
Integrating and then replacing v by y/x, we obtain
y^3 = x^3 (3 ln |x| + c).
Using the initial condition y(1) = 2, we can evaluate c = 8. Moreover, since ln |x| is
not defined at x = 0 and our initial condition occurs at x = 1, this solution should be
restricted to x > 0; thus we can remove the absolute value signs on x. Then take the
cube root of both sides to find
y(x) = x (3 ln x + 8)^{1/3}   for x > 0. □
Bernoulli Equations
A first-order differential equation in the form
dy/dx + p(x)y = q(x) y^α,    (1.25)
where α is a real number, is called a Bernoulli equation. If α = 0 or α = 1, then
(1.25) is linear; otherwise it is not linear. However, there is a simple substitution that
reduces (1.25) to a linear equation. In fact, if we replace y by
v = y^{1−α},    (1.26)
then

dy/dx = (1/(1 − α)) y^α dv/dx   and   p(x) y = p(x) y^α v,

so (1.25) becomes

dv/dx + (1 − α) p(x) v = (1 − α) q(x).    (1.27)
Now we can use an integrating factor to solve (1.27) for v, and then use (1.26) to recover
y. Let us perform a simple example.
Example 3. Find the general solution of
x^2 y' + 2xy = 3y^4.
Solution. We divide by x^2 to put this into the form (1.25) with α = 4:

y' + (2/x) y = (3/x^2) y^4.
We introduce v = y^{−3}, but rather than just plugging into (1.27), it is better to derive the equation that v satisfies:

v = y^{−3}  ⇒  y = v^{−1/3}  ⇒  y' = −(1/3) v^{−4/3} v',

⇒  −(1/3) v^{−4/3} v' + (2/x) v^{−1/3} = (3/x^2) v^{−4/3},

and after some elementary algebra we obtain the linear equation

v' − (6/x) v = −9/x^2.
As integrating factor, we take
I(x) = e^(−6 ∫ x^(−1) dx) = e^(−6 ln |x|) = x^(−6),
which enables us to find v:
(x^(−6) v)′ = −9 x^(−8)   ⇒   x^(−6) v = (9/7) x^(−7) + C   ⇒   v = (9/7) x^(−1) + C x⁶.
Finally, we use y = v^(−1/3) to find our desired solution:
y(x) = ( (9/7) x^(−1) + C x⁶ )^(−1/3). □
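As a quick sanity check, one can substitute this formula back into the original equation. Here is a minimal sketch, assuming Python with sympy (illustrative only, not part of the text):

import sympy as sp

x, C = sp.symbols('x C')

# Candidate general solution of x^2 y' + 2 x y = 3 y^4 obtained in Example 3.
y = (sp.Rational(9, 7)/x + C*x**6) ** sp.Rational(-1, 3)

# Substitute into the equation; multiplying by y^(-4) clears the fractional
# powers so that simplify() can reduce the residual to zero.
residual = (x**2 * sp.diff(y, x) + 2*x*y - 3*y**4) * y**(-4)
print(sp.simplify(residual))   # expected output: 0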
Exact Equations
Let us consider a first-order equation in differential form:
M (x, y) dx + N (x, y) dy = 0.
(1.28)
For example, dy/dx = f (x, y) is equivalent to (1.28) if we take M (x, y) = −f (x, y) and
N (x, y) = 1, but there are many other possibilities. In fact, we are interested in the
case that there is a differentiable function Φ(x, y) so that
M = ∂Φ/∂x   and   N = ∂Φ/∂y.     (1.29)
If we can find Φ(x, y) satisfying (1.29), then we say that (1.28) is exact and we call
Φ a potential function for the vector field (M, N ). The significance of the potential
function is that its level curves, i.e.
Φ(x, y) = c,
(1.30)
implicitly define functions y(x) which are solutions of (1.28); we know this since taking
the differential of (1.30) is exactly (1.28). Let us consider an example.
Example 4. Suppose we want the general solution of
2x sin y dx + x2 cos y dy = 0.
By inspection, we see that Φ(x, y) = x² sin y has the desired partial derivatives
∂Φ/∂x = 2x sin y   and   ∂Φ/∂y = x² cos y.
Consequently,
x² sin y = c
provides the general solution (in implicit form). □
But how do we know when (1.28) is exact, and how do we construct Φ(x, y)? Recall
from multivariable calculus that if Φ(x, y) has continuous second-order derivatives (i.e.
∂ 2 Φ/∂x2 , ∂ 2 Φ/∂y∂x, ∂ 2 Φ/∂x∂y, and ∂ 2 Φ/∂y 2 are all continuous functions of x and y)
then “mixed partials are equal”:
∂²Φ/∂x∂y = ∂²Φ/∂y∂x.     (1.31)
Putting (1.31) together with (1.29), we find that (1.28) being exact implies
∂M/∂y = ∂N/∂x.     (1.32)
On the other hand, if M , N satisfy (1.32), then we want to construct Φ so that (1.29)
holds. Let us define
Φ(x, y) = ∫ M(x, y) dx + g(y)     (1.33)
where g(y) is a function that is to be determined. Whatever g(y) is, we have ∂Φ/∂x =
M , but we want to choose g(y) so that
N(x, y) = ∂Φ/∂y = ∫ ∂M/∂y (x, y) dx + g′(y).
Consequently, g′(y) must satisfy
g′(y) = N(x, y) − ∫ ∂M/∂y (x, y) dx.     (1.34)
But this requires the right-hand side in (1.34) to be independent of x. Is this true? We
check by differentiating it with respect to x:
∂/∂x [ N(x, y) − ∫ ∂M/∂y (x, y) dx ] = ∂N/∂x − ∂M/∂y = 0
since we have assumed (1.32). We summarize these conclusions as a theorem.
Theorem 1. Suppose M (x, y) and N (x, y) are continuously differentiable functions on
a rectangle R given by a < x < b and c < y < d. The equation (1.28) is exact if and
only if (1.32) holds. Moreover, in this case, the potential function Φ(x, y) is given by
(1.33) where g(y) is determined by (1.34).
Example 5. Find the solution of the initial-value problem
y 2 ex dx + (2y ex + cos y) dy = 0,
y(0) = π.
Solution. Let us first check to see whether the differential equation is exact. Letting
M = y 2 ex and N = 2y ex + cos y, we compute
∂M/∂y = 2y e^x   and   ∂N/∂x = 2y e^x,
so we conclude the equation is exact. Next, as in (1.33), we define
Φ(x, y) = ∫ y² e^x dx + g(y) = y² e^x + g(y).
We know that Φx = M , so we check Φy = N , i.e. we want
2y ex + g 0 (y) = 2y ex + cos y.
Notice that the terms involving x drop out and we are left with g 0 (y) = cos y, which
we easily solve to find g(y) = sin y. We conclude that the general solution is given
implicitly by
Φ(x, y) = y 2 ex + sin y = c.
To find the solution satisfying the initial condition, we simply plug in x = 0 and y = π
to evaluate c = π 2 . Consequently, the solution of the initial-value problem is given
implicitly by
y 2 ex + sin y = π 2 .
Remark 1. Of course, instead of defining Φ by (1.33), (1.34) we could have defined
Φ(x, y) = ∫ N(x, y) dy + h(x),
where h(x) is chosen to satisfy
h′(x) = M(x, y) − ∫ ∂N/∂x (x, y) dy. □
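The construction in (1.32)–(1.34) is mechanical enough to automate. The following is a minimal sketch, assuming Python with sympy (the helper name potential is ours): it tests the exactness condition (1.32) and builds a potential function, reproducing Example 5.

import sympy as sp

x, y = sp.symbols('x y')

def potential(M, N):
    """Return Phi with M = Phi_x and N = Phi_y, or None if (1.32) fails."""
    if sp.simplify(sp.diff(M, y) - sp.diff(N, x)) != 0:
        return None                               # not exact
    Phi = sp.integrate(M, x)                      # (1.33), up to g(y)
    gprime = sp.simplify(N - sp.diff(Phi, y))     # (1.34)
    return Phi + sp.integrate(gprime, y)

# Example 5: M = y^2 e^x, N = 2 y e^x + cos y.
Phi = potential(y**2*sp.exp(x), 2*y*sp.exp(x) + sp.cos(y))
print(Phi)   # y**2*exp(x) + sin(y), so the solutions are Phi(x, y) = c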
Exercises
1. Determine whether the given differential equation is homogeneous, and if so find
the general solution.
(a) 2xy y′ = x² + 2y²,
(b) x y′ = y + 2√(xy)   (x, y > 0),
(c) x(x + y) y′ = y (x − y),
(d) (x² − y²) y′ = 2xy.
2. Find the general solution of the following Bernoulli equations
(a) x² y′ + 2xy = 5y³,
(b) x y′ + 6y = 3x y^(4/3).
3. Determine whether the given equation in differential form is exact, and if so find
the general solution.
(a) 2x e^y dx + (x² e^y − sin y) dy = 0,
(b) (sin y + cos x) dx + x cos y dy = 0,
(c) sin x e^y dx + (cos x + e^y) dy = 0,
(d) (cos x + ln y) dx + (x/y + e^y) dy = 0.
4. Solve the following initial-value problems (which could be homogeneous, Bernoulli,
or exact).
(a) (2x² + y²) dx − xy dy = 0, y(1) = 2,
(b) (2xy² + 3x²) dx + (2x²y + 4y³) dy = 0, y(1) = 0,
(c) (y + y³) dx − dy = 0, y(0) = 1.
1.6 Additional Exercises
1. A car is traveling at 30 m/sec when the driver slams on the brakes and the car
skids 100 m. Assuming the braking system provided constant deceleration, how
long did it take for the car to stop?
2. Suppose a car skids 50 m after the brakes are applied at 100 km/hr. How far will
the same car skid if the brakes are applied at 150 km/hr?
For the next two problems, the graph represents the velocity v(t) of a particle moving
along the x-axis for 0 ≤ t ≤ 10. Sketch the graph of the position function x(t).
3.–4. [Graphs of v(t) omitted; the labeled points include (3, 2), (5, 2), and (7, 2).]
For the following two problems, identify which differential equation corresponds to the
given slope field.
5. (a) dy/dx = xy²   (b) dy/dx = y²   (c) dy/dx = y + x²   (d) dy/dx = x + y²
[Slope field omitted.]
6. (a) dy/dx = x sin y   (b) dy/dx = x cos y   (c) dy/dx = y sin x   (d) dy/dx = y cos y
[Slope field omitted.]
The next two problems concern the autonomous equation dy/dx = f (y). For the function
f (y) given by the graph, find the equilibrium solutions and determine their stability.
7.–8. [Graphs of f(y) omitted; axis marks at y = −1, 0, 1.]
9. For all values of c, determine whether the autonomous equation
dy/dt = y² + 2y + c
has equilibrium solutions; if so, determine their stability.
10. Is it possible for a non-autonomous equation dy/dx = f (x, y) to have an equilibrium solution y(x) ≡ y0 ? If not, explain. If so, give an example.
11. A hemispherical tank of radius R contains water at a depth of y, but a small hole in the bottom of the tank has cross-sectional area a, so Torricelli's law says that water exits the tank with velocity v = √(2gy). The volume of water in the tank depends upon y, and can be computed by V(y) = ∫₀^y A(u) du, where A(y) is the horizontal cross-sectional area of the tank at height y above the hole.
(a) Show that A(y) = π[R² − (R − y)²].
(b) Show that y(t) satisfies the differential equation
A(y) dy/dt = −a √(2gy).
Fig.1. Hemispherical Tank with a Hole.
12. A hemispherical tank as in the previous problem has radius R = 1 m and the hole
has cross-sectional area a = 1 cm2 . Suppose the tank begins full. How long will
it take for all the water to drain out through the hole?
13. The population of Country X grows at a rate proportional to its size with proportionality constant k = 0.1. However, during hard times, there is continuous
emigration. If the initial population is one million, what annual emigration rate
will keep the population to one and a half million in ten years?
14. The rate of change of the temperature T (t) of a body is still governed by (1.8)
when the ambient temperature A(t) varies with time. Suppose the dish of leftovers
in Example 4 of Section 1.3 is placed in the oven at room temperature 70◦ F, and
then the oven is turned up to 200◦ F. Assume the oven heats at a constant rate
from 70◦ to 200◦ in 10 minutes, after which it remains at 200◦ . How long until
the leftovers are 100◦ ? (Note: the same value of k applies here as in Example 4.)
Find the general solution using whatever method is appropriate.
15. dy/dx = 1 + x + y + xy
16. dy/dx = sin(y + x)/(sin y cos x) − 1
17. y 0 + (x ln x)−1 y = x
18. dy/dx = −(x3 + y 2 )/(2xy)
Solve the following initial-value problems using whatever method is appropriate.
19. y′ + y² sin x = 0, y(π) = 1/2
20. t dy/dt − y = t² e^(−t), y(1) = 3
21. (y + e^y) dy + (e^(−x) − x) dx = 0, y(0) = 1
22. (x² + 1) y′ + 2x³ y = 6x e^(−x²), y(0) = −1
23. tx dx/dt = t² + 3x², x(1) = 1
24. (y + 2x sin y cos y) y′ = 3x² − sin² y, y(0) = π
Chapter 2
Second-Order Differential Equations
2.1 Introduction to Higher-Order Equations
An nth-order differential equation, or differential equation of order n, involves
an unknown function y(x) and its derivatives up to order n. Generally such an equation
is in the form y (n) = f (x, y, . . . , y (n−1) ) where y (k) denotes the k-th order derivative of
y(x); note that this generalizes (1.2) in Section 1.1. But we usually assume the equation
is linear, so can be put in the form
y^(n) + a1(x) y^(n−1) + · · · + an(x) y = f(x),
(2.1)
where the coefficients a1 (x), . . . , an (x) are functions of x. We can also write y (k) as
Dk y and aj (x)y (k) as aj Dk y. This enables us to write (2.1) as
D^n y + a1 D^(n−1) y + · · · + an y = (D^n + a1 D^(n−1) + · · · + an) y = f.
If we introduce the linear differential operator of order n as
L = Dn + a1 Dn−1 + · · · + an ,
Differential operators L
provide a convenient way
of writing linear differential
equations as Ly = f
then we can simplify (2.1) even further: Ly = f. When f is nonzero, then (2.1) (in any
of its notational forms) is called a nonhomogeneous equation. On the other hand,
we say that
Ly = y^(n) + a1(x) y^(n−1) + · · · + an(x) y = 0,
(2.2)
is a homogeneous equation. Homogeneous differential equations are not only important for their own sake, but will prove important in studying the solutions of an
associated nonhomogeneous equation.
Let us further discuss the linearity of the differential operator L. If y1 and y2 are
two functions that are sufficiently differentiable that L(y1 ) and L(y2 ) are both defined,
then L(y1 + y2 ) is also defined and in fact L(y1 + y2 ) = L(y1 ) + L(y2 ). Moreover, if
The use of the term “homogeneous” to refer to (2.2)
is distinct from its usage in
Section 1.5.
c1 is any constant, then L(c1 y1 ) = c1 L(y1 ). We can combine these statements in one
formula that expresses the linearity of L:
L(c1 y1 + c2 y2) = c1 L(y1) + c2 L(y2)   for any constants c1 and c2.     (2.3)
Superposition just means that solutions of a homogeneous equation can be added together or multiplied by a constant; they can also be added to a solution of an associated nonhomogeneous equation.
This linearity provides an important property for the solutions of both (2.1) and (2.2)
called superposition.
Theorem 1. (a) If y1 and y2 are two solutions of the homogeneous equation (2.2), then
the linear combination
y(x) = c1 y1 (x) + c2 y2 (x),
where c1 and c2 are arbitrary constants,
is also a solution of (2.2).
(b) If yp is a particular solution of the nonhomogeneous equation (2.1) and y0 is any
solution of the associated homogeneous equation (2.2), then the linear combination
y(x) = yp (x) + c y0 (x),
where c is an arbitrary constant,
is also a solution of the nonhomogeneous equation (2.1).
Proof. We simply use the linearity of L:
L(c1 y1 + c2 y2 ) = c1 Ly1 + c2 Ly2 = 0 + 0 = 0
L(yp + c y0 ) = L(yp ) + cL(y0 ) = f + 0 = f.
2
Example 1. (a) y 00 + y = 0 and (b) y 00 + y = 2ex .
(a) The functions y1 (x) = cos x and y2 (x) = sin x clearly both satisfy y 00 + y = 0, hence
so does the linear combination
y(x) = c1 cos x + c2 sin x.
(b) The function yp (x) = ex clearly satisfies y 00 + y = 2ex , and hence so does
y(x) = ex + c1 cos x + c2 sin x.
2
In the next section we will discuss the general solution for a second-order equation
and its role in solving initial-value problems. But in the remainder of this section
we discuss another useful approach to analyzing higher-order equations, and then an
important application of second-order equations to mechanical vibrations.
Conversion to a System of First-Order Differential Equations
One approach to analyzing a higher-order equation is to convert it to a system of
first-order equations: this is especially important since it can be used for higher-order
nonlinear equations. Let us see how to do this with the linear equation (2.1). We begin
by renaming y as y1 , then renaming y 0 as y2 so that y10 = y2 . Similarly, if we rename y 00
as y3 , then we have y20 = y3 . We continue in this fashion; however, instead of renaming
y (n) , we use (2.1) to express it in terms of y1 , . . . , yn . We summarize this as follows:
y1′ = y2
y2′ = y3
⋮
yn′ = f − a1 yn − a2 yn−1 − · · · − an y1.     (2.4)
This is an example of a system of first-order differential equations, also called a first-order system. Our experience with first-order equations suggests that an initial-value problem for (2.4) involves specifying the values η1, . . . , ηn for y1, . . . , yn at some point x0:
y1(x0) = η1,   y2(x0) = η2,   . . . ,   yn(x0) = ηn.     (2.5)
Recalling how y1, . . . , yn were defined, this means that an initial-value problem for (2.1) should specify the value of y and its first n − 1 derivatives at x = x0:
y(x0) = η1,   y′(x0) = η2,   . . . ,   y^(n−1)(x0) = ηn.     (2.6)
Let us consider an example.
An initial-value problem for a differential equation of order n specifies the unknown function and its first n − 1 derivatives at the initial point.
Example 2. Convert the initial-value problem for the second-order equation
y 00 + 2y 0 + 3y = sin x,
y(0) = 0, y 0 (0) = 1
to an initial-value problem for a first-order system.
Solution. Let y1 = y and y2 = y′. We find that y1′ = y′ = y2 and y2′ = y″ = sin x − 2y′ − 3y = sin x − 2y2 − 3y1. Moreover, y1(0) = y(0) = 0 and y2(0) = y′(0) = 1. Consequently, the initial-value problem can be written as
y1′ = y2,                        y1(0) = 0
y2′ = sin x − 2y2 − 3y1,         y2(0) = 1. □
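Writing a higher-order equation as a first-order system is exactly the form that standard numerical solvers expect. The following is a minimal sketch, assuming Python with numpy and scipy (not part of the text), which integrates the system just obtained.

import numpy as np
from scipy.integrate import solve_ivp

# First-order system from Example 2: y1' = y2, y2' = sin x - 2 y2 - 3 y1.
def rhs(x, Y):
    y1, y2 = Y
    return [y2, np.sin(x) - 2*y2 - 3*y1]

# Initial conditions y1(0) = 0, y2(0) = 1, integrated for 0 <= x <= 10.
sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 1.0], dense_output=True)

# sol.sol(x)[0] approximates the original unknown y(x); for instance:
print(sol.sol(1.0)[0])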
Of course, replacing a single (higher-order) equation by a system of (first-order)
equations introduces complications associated with manipulating systems of equations;
but this is the purpose of linear algebra, that we begin to study in Chapter 4. We shall
further develop the theory of systems of first-order differential equations in Chapter 7,
after we have developed the tools from linear algebra that we will need.
Mechanical Vibrations: Spring-Mass Systems
Suppose an object of mass m is attached to a spring. Compressing or stretching the
spring causes it to exert a restorative force Fs that tries to return the spring to its
natural length. Moreover, it has been observed that the magnitude of this restorative
force is proportional to the amount it has been stretched or compressed; this is called
Hooke’s law. If we denote by x the amount that the spring is stretched beyond its
natural length, then Hooke’s law may be written
Fs = −kx,   where k > 0 is called the spring constant.     (2.7)
Fig.1. Spring's Force.
This may be viewed as our mathematical model for the force of the spring: if x > 0
then Fs acts in the negative x direction, and if x < 0 then Fs acts in the positive x
direction, in both cases working to restore the equilibrium position x = 0.
Now suppose the object is in motion. Invoking Newton’s second law, F = ma, we
see that the motion x(t) of the object is governed by m d2 x/dt2 = −kx. Writing this in
the form (2.1) we obtain
m d²x/dt² + kx = 0.     (2.8)
This is an example of a homogeneous second-order differential equation. What should
we take as initial conditions for (2.8)? At this point, we can either call upon (2.6), or
our experience that the motion is only determined if we know both the initial position
of the object and its initial velocity. Thus we take as initial conditions for (2.8):
x(0) = x0   and   dx/dt (0) = v0.     (2.9)
Fig.2. Vertical Spring.
The equation (2.8) together with the conditions (2.9) is an initial-value problem.
Notice in (2.8) that the motion was horizontal, so we did not consider the effect
of gravity. If the spring is hanging vertically and the motion of the object is also
vertical, then gravity does play a role. Let us now denote by y the amount the spring
is stretched beyond its natural length. The positive y direction is downward, which is
also the direction of the gravitational force Fg , so the sum of forces is −ky + mg. This
leads to the equation
m d²y/dt² + ky = mg.     (2.10)
This is an example of a nonhomogeneous second-order differential equation. Letting a
mass stretch a vertically hung spring is also a good way to determine the spring constant
k, since both the force mg and the displacement y are known (cf. Exercises 4 and 5).
Of course, other forces could be involved as well. For example, the object attached
to the horizontal spring may be resting on a table, and its motion could be impeded by
the resistive force of friction. As discussed in Section 1.3, resistive forces Fr generally
depend upon the velocity of the object, in this case dx/dt. If we assume that the
frictional force is proportional to the velocity, then Fr = −c dx/dt where c is called the
damping coefficient, and in place of (2.8) we have
m d²x/dt² + c dx/dt + kx = 0.     (2.11)
Fig.3. Spring with Friction.
This is another example of a homogeneous second-order differential equation. The
vertical spring, of course, is not subject to friction, but we could consider air resistance
as a resistive force; it is clear how to modify (2.10) to account for this.
Restorative and resistive forces are called internal forces in the system because
they depend upon position or velocity. Mechanical vibrations involving only internal
forces are called free vibrations; hence, (2.8) and (2.11) are both examples of free
vibrations, and these will be analyzed in Section 2.4. However, there could be other
forces involved in mechanical vibrations that are independent of the position or velocity;
these are called external forces. Of course, gravity is an example of a constant external
force, but more interesting external forces depend on time t, for example a periodic
electromagnetic force. This leads us to the nonhomogeneous second-order differential
equation mentioned in Section 1.1. We shall study such forced vibrations later in this
chapter.
Exercises
1. Verify that the given functions y1 and y2 satisfy the given homogeneous differential
equation.
(a) y1 (x) = ex , y2 (x) = e−x ; y 00 − y = 0.
(b) y1 (x) = cos 2x, y2 (x) = sin 2x; y 00 + 4y = 0.
(c) y1 (x) = e−x cos x, y2 (x) = e−x sin x; y 00 + 2y 0 + 2y = 0.
(d) y1(x) = x^(−1), y2(x) = x^(−2);   y″ + (4/x) y′ + (2/x²) y = 0.
2. Convert the initial-value problem for the second-order equation to an initial-value
problem for a first-order system:
(a) y 00 − 9y = 0, y(0) = 1, y 0 (0) = −1.
(b) y 00 + 3y 0 − y = ex , y(0) = 1, y 0 (0) = 0.
(c) y 00 = y 2 , y(0) = 1, y 0 (0) = 0.
(d) y 00 + y 0 + 5 sin y = ex , y(0) = 1, y 0 (0) = 1.
3. Convert each of the second order equations (2.8), (2.10), and (2.11) to a system
of first-order equations.
4. A vertically hung spring is stretched .5 m when a 10 kg mass is attached. Assuming
the spring obeys Hooke’s law (2.7), find the spring constant k. (Include the units
in your answer.)
5. A 10 lb weight stretches a vertical spring 2 ft. Assuming the spring obeys Hooke’s
law (2.7), find the spring constant k. (Include the units in your answer.)
2.2 General Solutions for Second-Order Equations
Let us write our second-order linear differential equation in the form
y 00 + p(x)y 0 + q(x)y = f (x).
(2.12)
An initial-value problem for (2.12) consists of specifying both the value of y and y 0
at some point x0 :
y(x0 ) = y0 and y 0 (x0 ) = y1 .
(2.13)
The existence and uniqueness of a solution to this initial-value problem is provided by
the following:
Theorem 1. Suppose that the functions p, q, and f are all continuous on an open
interval I containing the point x0 . Then for any numbers y0 and y1 , there is a unique
solution of (2.12) satisfying (2.13).
For a linear equation with
continuous coefficients on
an interval, the solution
of an initial-value problem
exists throughout the interval and is unique
Like Theorem 1 in Section 1.2, this existence and uniqueness theorem is proved using
successive approximations, although we shall not give the details here. However, let us
observe that existence and uniqueness holds on all of I since the equation is linear.
Of course, Theorem 1 above applies to the case f = 0, i.e. a homogeneous second-order linear differential equation
y 00 + p(x)y 0 + q(x)y = 0.
(2.14)
Notice that (2.14) admits the trivial solution y(x) ≡ 0. In fact, as an immediate
consequence of the uniqueness statement in Theorem 1, we have the following:
Corollary 1. Suppose that p and q are continuous on an open interval I containing
the point x0 , and y(x) is a solution of (2.14) satisfying y(x0 ) = 0 = y 0 (x0 ). Then y is
the trivial solution.
Let us see how to solve an initial-value problem for (2.14) with an example.
Example 1. Find the solution for the initial-value problem
y″ + y = 0,   y(0) = 1, y′(0) = −1.
Using a linear combination of solutions to solve an initial-value problem.
Solution. As we saw in the previous section, the linear combination
y(x) = c1 cos x + c2 sin x
provides a two-parameter family of solutions of y 00 + y = 0. If we evaluate this and its
derivative at x = 0 we obtain
y(0) = c1 cos 0 + c2 sin 0 = c1 ,
y 0 (0) = −c1 sin 0 + c2 cos 0 = c2 .
Using the given initial conditions, we find c1 = 1 and c2 = −1, so the unique solution is
y(x) = cos x − sin x.
2
This example demonstrates the usefulness of taking linear combinations of solutions
in order to solve initial-value problems for homogeneous equations. However, we need
to make sure that we are taking linear combinations of solutions that are truly different
from each other; in Example 1, we could not use y1(x) = cos x and y2(x) = 2 cos x because
the linear combination
y(x) = c1 y1 (x) + c2 y2 (x) = c1 cos x + 2c2 cos x = (c1 + 2c2 ) cos x
Linear independence for
more than two functions
will be discussed in Chapter 5
will not have enough flexibility to satisfy both initial conditions y(0) = 1 and y 0 (0) = −1.
This leads us to the notion of linear independence: two functions defined on an interval
I are said to be linearly dependent if they are constant multiples of each other;
otherwise, they are linearly independent.
Theorem 2. Suppose the functions p and q are continuous on an interval I, and y1 and
y2 are two solutions of (2.14) that are linearly independent on I. Then every solution
y of this equation can be expressed as a linear combination of y1 and y2 , i.e.
y(x) = c1 y1 (x) + c2 y2 (x)
for all x ∈ I.
Thus, if y1 and y2 are linearly independent solutions of (2.14), we call y(x) = c1 y1 (x) +
c2 y2 (x) the general solution of (2.14).
To prove this theorem, we need to use linear algebra for a system of two equations
in two unknowns x1 , x2 . In particular, for any constants a, b, c, and d, the system
a x1 + b x2 = 0
c x1 + d x2 = 0
(2.15)
always admits the trivial solution x1 = 0 = x2 , but we want to know when it admits a
nontrivial solution: at least one of x1 or x2 is nonzero. Also, we want to know when,
for any values of y1 and y2 , the system
a x1 + b x2 = y1
c x1 + d x2 = y2
(2.16)
has a solution x1 , x2 .
Lemma 1. (a) The equations (2.15) admit a nontrivial solution (x1 , x2 ) if and only if
ad − bc = 0.
(b) The equations (2.16) admit a unique solution (x1 , x2 ) for each choice of (y1 , y2 ) if
and only if ad − bc ≠ 0.
Since the quantity ad − bc is so important for the solvability of (2.15) and (2.16), we give it a special name, the determinant:
det [ a  b ; c  d ] ≡ ad − bc.     (2.17)
Lemma 1 is easily proved using elementary algebra (cf. Exercise 13), but it is also a
special case of the n variable version that we shall derive in Chapter 4; its appearance
here provides extra motivation for the material presented in Chapter 4.
Given two differentiable functions f and g on an interval I, let us define the Wronskian of f and g to be the function
W(f, g) = det [ f  g ; f′  g′ ] = f g′ − f′ g.
The relevance of the Wronskian for linear independence is given in the following:
Theorem 3. Suppose y1 and y2 are differentiable functions on the interval I.
(a) If W(y1, y2)(x0) ≠ 0 for some x0 ∈ I, then y1 and y2 are linearly independent on I.
(b) If y1 and y2 are both solutions of (2.14) such that W (y1 , y2 )(x0 ) = 0 for some x0 ∈ I,
then y1 and y2 are linearly dependent on I.
Remark 1. In this theorem, it is somewhat remarkable that the condition on the Wronskian is only made at one point x0, but the conclusions hold on the whole interval I.
Proof. (a) We will prove the contrapositive: if y1 and y2 are linearly dependent, then
W (y1 , y2 ) ≡ 0 on I. In fact, y1 and y2 being linearly dependent means y2 = c y1 for
some constant c. Hence
W (y1 , y2 ) = y1 y20 − y10 y2 = y1 (c y1 )0 − y10 (c y1 ) = c(y1 y10 − y10 y1 ) ≡ 0.
The Wronskian is named after the Polish mathematician and philosopher Józef Hoene-Wroński (1776–1853).
(b) Suppose W (y1 , y2 )(x0 ) = 0 for some point x0 in I; we want to show that y1 and y2
must be linearly dependent, i.e. for some constants c1 and c2, not both zero, we have c1 y1(x) +
c2 y2 (x) = 0 on I. Consider the 2×2 system
c1 y1 (x0 ) + c2 y2 (x0 ) = 0
c1 y10 (x0 ) + c2 y20 (x0 ) = 0.
The condition in Lemma 1 (a) that we can find a nontrivial solution c1 , c2 is just
W (y1 , y2 )(x0 ) = 0, which we have assumed. Using these nonzero values c1 , c2 , define
y(x) = c1 y1 (x) + c2 y2 (x).
As a linear combination of solutions of (2.14), y is also a solution of (2.14), and it
satisfies the zero initial conditions at x0 :
y(x0 ) = 0,
y 0 (x0 ) = 0.
(2.18)
By Corollary 1, we must have y(x) ≡ 0. But this means that y1 and y2 are linearly
dependent on I.
2
Example 2. (a) Show that f (x) = 2 and g(x) = sin2 x + cos2 x are linearly dependent.
(b) Show that f (x) = e2x and g(x) = e3x are linearly independent.
Solution. (a) By trigonometry, g(x) ≡ 1 so f = 2g, and hence f and g are linearly
dependent. (We may compute W (f, g) ≡ f g 0 − f 0 g ≡ 0, but we cannot use Theorem 3
to conclude that f and g are linearly dependent without knowing that both functions
satisfy an equation of the form (2.14).) (b) W(f, g) = e^(2x) · 3e^(3x) − 2e^(2x) · e^(3x) = e^(5x) ≠ 0, so
Theorem 3 (a) implies that f and g are linearly independent. (This was already obvious
since f is not a constant multiple of g.)
2
Now we are ready to prove that the general solution of (2.14) is given as a linear
combination of two linearly independent solutions.
Proof of Theorem 2. Consider any solution y of (2.14) on I and pick any point x0
in I. Given our two linearly independent solutions y1 and y2 , we ask whether we can
solve the following system for constants c1 and c2 :
c1 y1 (x0 ) + c2 y2 (x0 ) = y(x0 )
c1 y10 (x0 ) + c2 y20 (x0 ) = y 0 (x0 ).
Since W(y1, y2)(x0) = y1(x0) y2′(x0) − y1′(x0) y2(x0) ≠ 0, Lemma 1 (b) assures us that
we can find c1 , c2 . Using these constants, let us define
z(x) = c1 y1 (x) + c2 y2 (x).
But z is a solution of (2.14) with z(x0 ) = y(x0 ) and z 0 (x0 ) = y 0 (x0 ), so by uniqueness
we must have z(x) ≡ y(x) for x in I. In other words, y is a linear combination of y1
and y2 .
2
Consequently, if we have two linearly independent solutions of a second-order homogeneous equation, we can use them to solve an initial-value problem.
Example 3. Use y1 = e2x and y2 = e3x to solve the initial-value problem
y 00 − 5y 0 + 6y = 0,
y(0) = 1, y 0 (0) = 4.
Solution. We can easily check that both y1 and y2 satisfy the differential equation, and
they are obviously linearly independent (or we can use Theorem 3 (a) as in Example
2). So the general solution is y(x) = c1 e2x + c2 e3x , and we need only choose c1 and c2
to satisfy the initial conditions. But y(0) = c1 + c2 = 1 and y′(x) = 2c1 e^(2x) + 3c2 e^(3x) ⇒ y′(0) = 2c1 + 3c2 = 4, so c1 = −1 and c2 = 2. We conclude
y(x) = 2 e^(3x) − e^(2x). □
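Determining c1 and c2 amounts to solving a 2×2 linear system of the kind covered by Lemma 1, a first taste of the linear algebra developed in Chapter 4. A minimal sketch, assuming Python with numpy (illustrative only):

import numpy as np

# Example 3: y = c1 e^{2x} + c2 e^{3x} with y(0) = 1 and y'(0) = 4.
# Evaluating y and y' at x = 0 gives  c1 + c2 = 1  and  2 c1 + 3 c2 = 4.
A = np.array([[1.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 4.0])

c1, c2 = np.linalg.solve(A, b)
print(c1, c2)   # -1.0 2.0, i.e. y(x) = 2 e^{3x} - e^{2x}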
Now that we know how to find the general solution of a second-order homogeneous
equation, we naturally want to know how to do the same for a nonhomogeneous equation.
This is provided by the following.
Theorem 4. Suppose that the functions p, q, and f are all continuous on an open
interval I, yp is a particular solution of the nonhomogeneous equation (2.12), and y1 ,
y2 are linearly independent solutions of the homogeneous equation (2.14). Then every
solution y of (2.12) can be written in the form
y(x) = yp (x) + c1 y1 (x) + c2 y2 (x)
for some constants c1 , c2 .
Proof . Since L(yp ) = f = L(y), by linearity we have L(y − yp ) = L(y) − L(yp ) = 0.
So y − yp is a solution of the homogeneous equation (2.14), and we can use Theorem 2
to find constants c1 , c2 so that
y(x) − yp (x) = c1 y1 (x) + c2 y2 (x).
But this is what we wanted to show. □
Another way of stating this theorem is that the general solution of (2.12) can be
written as
y(x) = yp (x) + yc (x),
(2.19)
where yp (x) is any particular solution of (2.12) and yc (x) is the general solution of the
associated homogeneous equation (2.14); yc is called the complementary solution for
(2.12).
Let us consider an example of using the general solution to solve an initial-value
problem for a nonhomogeneous equation.
Example 4. Find the solution for the initial-value problem
y 00 + y = 2ex ,
y(0) = 1, y 0 (0) = 0.
Solution. We first want to find two linearly independent solutions of the associated
homogeneous equation y″ + y = 0. But, in Example 1, we observed that y1(x) = cos x
and y2 (x) = sin x are solutions, and they are obviously linearly independent (or we can
compute the Wronskian to confirm this). So our complementary solution is
yc (x) = c1 cos x + c2 sin x.
The general solution for a
nonhomogeneous equation
requires a particular solution and the general solution of the homogeneous
equation
We need to have a particular solution yp of y 00 + y = 2ex . In Section 2.5 we will discuss
a systematic way of finding yp , but in this case we might notice that yp (x) = ex works.
Thus our general solution is
y(x) = ex + c1 cos x + c2 sin x,
and it is just a matter of finding the constants c1 and c2 so that the initial conditions
are satisfied:
y(x) = e^x + c1 cos x + c2 sin x ⇒ y(0) = 1 + c1 = 1 ⇒ c1 = 0,
y′(x) = e^x − c1 sin x + c2 cos x ⇒ y′(0) = 1 + c2 = 0 ⇒ c2 = −1.
So the solution is y(x) = e^x − sin x. □
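The same initial-value problem can be checked with a computer algebra system. A minimal sketch, assuming Python with sympy (not part of the text):

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# y'' + y = 2 e^x with y(0) = 1 and y'(0) = 0, as in Example 4.
ode = sp.Eq(y(x).diff(x, 2) + y(x), 2*sp.exp(x))
sol = sp.dsolve(ode, y(x),
                ics={y(0): 1, y(x).diff(x).subs(x, 0): 0})
print(sol)   # y(x) = exp(x) - sin(x)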
Exercises
For Exercises 1-4 below, (a) verify that y1 and y2 satisfy the given second-order equation,
and (b) find the solution satisfying the given initial conditions (I.C.).
1. y 00 − y = 0; y1 (x) = ex , y2 (x) = e−x . I.C. y(0) = 1, y 0 (0) = 0.
2. y 00 − 3y 0 + 2y = 0; y1 (x) = ex , y2 (x) = e2x . I.C. y(0) = 0, y 0 (0) = −1.
3. y 00 − 2y 0 + y = 0; y1 (x) = ex , y2 (x) = x ex . I.C. y(0) = 1, y 0 (0) = 3.
4. y 00 + 2y 0 + 2y = 0; y1 (x) = e−x cos x, y2 (x) = e−x sin x. I.C. y(0) = 1, y 0 (0) = 1.
For Exercises 5-8 below, determine whether the given pair of functions is linearly independent on I = (0, ∞).
5. f (x) = 1 + x2 , g(x) = 1 − x2 .
7. f (x) = 3 x2 , g(x) = 4 e2 ln x .
6. f (x) = cos x, g(x) = sin x.
8. f (x) = x, g(x) = x ex .
For Exercises 9-12 below, (a) verify that yp satisfies the given second-order equation,
(b) verify that y1 and y2 satisfy the associated homogeneous equation, and (c) find the
solution of the given equation with initial conditions (I.C.).
9. y 00 − y = x; yp (x) = −x, y1 (x) = ex , y2 (x) = e−x . I.C. y(0) = 1, y 0 (0) = 0.
10. y 00 + 9y = 3; yp (x) = 1/3, y1 (x) = cos 3x, y2 (x) = sin 3x. I.C. y(0) = 0, y 0 (0) = 1.
11. y 00 − 2y 0 + 2y = 4x; yp (x) = 2x + 2, y1 (x) = ex cos x, y2 (x) = ex sin x.
I.C. y(0) = 0 = y 0 (0).
12. x2 y 00 − 2x y 0 + 2y = 2; yp = 1, y1 (x) = x, y2 (x) = x2 . I.C. y(1) = 0 = y 0 (1).
The following exercise proves Lemma 1 concerning the linear systems (2.15) and (2.16).
13. a. If a = c = 0 or b = d = 0, show (2.15) has a nontrivial solution (x1, x2).
b. Assuming a ≠ 0 and ad = bc, show that (2.15) has a nontrivial solution.
c. If (2.15) has a nontrivial solution, show that ad − bc = 0.
d. If ad − bc ≠ 0, show that (2.16) has a solution (x1, x2) for any choice of (y1, y2).
e. If ad − bc ≠ 0, show that the solution (x1, x2) in (d) is unique.
2.3 Homogeneous Equations with Constant Coefficients
In the previous section we saw the importance of having linearly independent solutions
in order to obtain the general solution for a homogeneous 2nd-order linear differential
equation. In this section we shall describe how to find these linearly independent solutions when the equation has constant coefficients. Since the method works for nth-order
equations, not just for n = 2, we shall initially describe it in this more general context.
Let us recall from Section 2.1 that a homogeneous linear differential equation
of order n with constant coefficients can be written in the form
y (n) + a1 y (n−1) + · · · + an y = 0,
(2.20)
where the a1 , . . . , an are constants. Alternatively, (2.20) can be written as Ly = 0,
where L is the nth-order differential operator
L = Dn + a1 Dn−1 + · · · + an .
(2.21)
In fact, if we introduce the characteristic polynomial for (2.20),
p(r) = rn + a1 rn−1 + · · · + an ,
(2.22)
then we can write the differential operator L as L = p(D).
We begin our analysis of (2.20) with the simplest case n = 1, i.e. we want to solve
(D + a1 )y = 0. Let us change notation slightly and solve
(D − r1 )y = 0.
(2.23)
But this is just y 0 = r1 y which we can easily solve to find y(x) = C er1 x . In particular,
with C = 1 we have the exponential solution
y1 (x) = er1 x .
(2.24)
But this calculation has implications for the general case. Namely, suppose that r1 is a
root of the characteristic polynomial p, i.e. satisfies the characteristic equation
p(r) = 0,
(2.25)
Then p(r) = q(r)(r − r1 ) for some polynomial q of degree n − 1, and so
Ly1 = p(D)y1 = q(D)(D − r1 )y1 = 0
In other words, we see that if r1 is a root of the characteristic polynomial p, then
y1 (x) = er1 x is a solution of (2.20). Since an nth-order polynomial can have up to n
roots, this promises to generate several solutions of (2.20).
Example 1. Find solutions of the 3rd-order differential equation y 000 + 3y 00 + 2y 0 = 0.
Solution. The characteristic equation is r3 + 3r2 + 2r = 0. Now cubic polynomials can
generally be difficult to factor, but in this case r itself is a factor so we easily obtain
r3 + 3r2 + 2r = r (r2 + 3r + 2) = r (r + 1) (r + 2).
Since we have three roots r = 0, −1, −2, we have three solutions:
y1(x) = e^(0·x) = 1,   y2(x) = e^(−x),   y3(x) = e^(−2x). □
Generalizing Example 1, if the characteristic equation for (2.20) has n distinct real
roots r1 , . . . , rn , then we get n solutions y1 (x) = er1 x , . . . , yn (x) = ern x . But the
fact that the characteristic polynomial factors as (r − r1 ) · · · (r − rn ) means that the
differential operator L can be similarly factored:
L = (D − r1 )(D − r2 ) · · · (D − rn ).
(2.26)
(Incidentally, notice that the order in which the factors (D − ri ) appear does not matter
since (D − ri)(D − rj) = D² − (ri + rj) D + ri rj = (D − rj)(D − ri).)
But now suppose that the roots of p(r) are not all distinct; can we still generate as
many solutions? For example, let us generalize (2.23) and consider
(D − r1 )m1 y = 0,
(2.27)
for an integer m1 ≥ 1; can we generate m1 solutions? Let us try to solve this by
generalizing (2.24) to consider
y(x) = u(x) er1 x ,
(2.28)
where the function u(x) is to be determined. But we can easily calculate
(D − r1) y = u′ e^(r1 x) + u r1 e^(r1 x) − r1 u e^(r1 x) = u′ e^(r1 x)
⋮
(D − r1)^(m1) y = (D^(m1) u) e^(r1 x).
Therefore, we want u to satisfy Dm1 u = 0, which means that u can be any polynomial
of degree less than m1 : u(x) = c0 + c1 x + c2 x2 + · · · + cm1 −1 xm1 −1 . But this means
that we have indeed generated m1 solutions of (2.27):
y1 (x) = er1 x , y2 (x) = x er1 x , . . . , ym1 (x) = xm1 −1 er1 x .
(2.29)
For the same reasons as before, this construction can be extended to handle operators
L whose characteristic polynomial factors into powers of linear terms:
L = p(D) = (D − r1 )m1 (D − r2 )m2 · · · (D − rk )mk ,
(2.30)
where the r1 , . . . , rk are distinct.
Example 2. Find solutions of the 3rd-order differential equation y 000 − 2y 00 + y 0 = 0.
Solution. The characteristic polynomial factors as p(r) = r³ − 2r² + r = r (r − 1)² = 0.
The roots are r = 0 and r = 1 (double). So r = 0 contributes one solution, namely
y0 = 1, and r = 1 contributes two solutions, namely y1 (x) = ex and y2 (x) = x ex .
2
Evidently, generating all solutions by this method requires us to be able to completely
factor the characteristic polynomial p(r). For higher-order equations, this could be
problematic; but we know how to factor quadratic polynomials (possibly encountering
complex roots), so we now restrict our attention to second-order equations.
Second-Order Equations
We shall describe how to find the general solution for all second-order homogeneous
equations. Let us change notation slightly and write (2.20) for n = 2 as
a y 00 + b y 0 + c y = 0,
(2.31)
where a ≠ 0. The characteristic equation for (2.31) is
a r2 + b r + c = 0,
(2.32)
which is solved using the quadratic formula
r = ( −b ± √(b² − 4ac) ) / (2a).
As is often the case with quadratic equations, we encounter different cases depending
on the roots. The case of distinct real roots is the simplest.
Theorem 1. If (2.32) has distinct real roots r1 , r2 , then the general solution of (2.31)
is given by
y(x) = c1 er1 x + c2 er2 x .
Proof. If we let y1 (x) = er1 x and y2 (x) = er2 x , then we have two solutions that
are not constant multiples of each other, so they are linearly independent. (Linear
independence can also be checked using the Wronskian; cf. Exercise 1.) Therefore,
Theorem 2 in Section 2.2 implies the general solution is given by c1 er1 x + c2 er2 x . 2
Case 1: Distinct Real
Roots
Example 3. Find the general solution of y 00 + 2y 0 − 3y = 0.
Solution. The characteristic equation is r2 + 2r − 3 = 0. We can factor it to find the
roots:
r2 + 2r − 3 = (r + 3)(r − 1) ⇒ r = −3, 1.
Consequently, the general solution is
y(x) = c1 ex + c2 e−3x .
2
Now let us consider the case that (2.32) has a double real root r1 = −b/(2a) (since
b2 − 4ac = 0). This means that the characteristic polynomial factors as ar2 + br + c =
a(r − r1 )2 , which in turn means that (2.31) can be written as
(aD2 + bD + c)y = a(D − r1 )2 y = 0.
(2.33)
But we have seen above that this has two distinct solutions
y1 (x) = er1 x
and y2 (x) = x er1 x .
Observing that these solutions are not constant multiples of each other (or using the
Wronskian; cf. Exercise 1), we conclude that they are linearly independent and hence
generate the general solution of (2.31):
Case 2: Double Real
Root
Theorem 2. If (2.32) has a double real root r1 , then the general solution of (2.31) is
given by
y(x) = c1 er1 x + c2 x er1 x .
Example 4. Find the general solution of y 00 + 2y 0 + y = 0.
Solution. The characteristic equation is r2 + 2r + 1 = (r + 1)2 = 0, so r = −1 is a
double root. The general solution is therefore
y(x) = c1 e−x + c2 x e−x .
2
Finally we recall that (2.32) need not have any real roots: for example, the quadratic
formula shows that r2 + 2r + 2 = 0 has two complex roots, r = −1 ± i. In fact, since
we are assuming that a, b, and c are all real numbers, if r = α + iβ is a complex root
(where α, β are real numbers), then the complex conjugate r̄ = α − iβ is also a root:
a r² + b r + c = 0   ⇒   a r̄² + b r̄ + c = (the conjugate of a r² + b r + c) = 0.
Case 3: Complex Roots. (For a review of complex numbers, see the Appendix.)
So we have two solutions of (2.31):
y1(x) = e^(rx) = e^((α+iβ)x)   and   y2(x) = e^(r̄x) = e^((α−iβ)x).
However, these are both complex-valued solutions of the real-valued equation (2.31); if
possible, can we find real-valued solutions? To do this, we need to use Euler’s formula
(see Appendix A):
eiθ = cos θ + i sin θ.
Applying this to both y1 and y2 we find
y1 (x) = e(α+iβ)x = eαx eiβx = eαx (cos βx + i sin βx)
y2 (x) = e(α−iβ)x = eαx e−iβx = eαx (cos βx − i sin βx),
where in the last step we used cos(−βx) = cos βx and sin(−βx) = − sin βx, since cosine
is an even function and sine is an odd function. But then by linearity we find
(y1 + y2)/2 = e^(αx) cos βx is a real-valued solution of (2.31), and
(y1 − y2)/(2i) = e^(αx) sin βx is a real-valued solution of (2.31).
Theorem 3. If (2.32) has a complex-valued root r = α + iβ where β ≠ 0, then the
general solution of (2.31) is given by
y(x) = eαx (c1 cos βx + c2 sin βx).
Proof. We have seen that ỹ1 (x) = eαx cos βx and ỹ2 (x) = eαx sin βx are both solutions
of (2.31), and they are not constant multiples of each other, so they are linearly independent. (Linear independence can also be checked using the Wronskian; cf. Exercise
1.)
2
Example 5. Find the general solution of y″ + 8y′ + 20y = 0.
Solution. The characteristic equation is r² + 8r + 20 = 0, which we solve using the quadratic formula
r = ( −8 ± √(64 − 80) ) / 2 = −4 ± 2i.
We see that we have complex conjugate roots, so let us select one, say r = −4 + 2i. Theorem 3 tells us that the general solution is
y(x) = e^(−4x) (c1 cos 2x + c2 sin 2x). □
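Finding and classifying the roots can also be done numerically. A minimal sketch, assuming Python with numpy (illustrative only), applied to the characteristic equation of Example 5:

import numpy as np

# Coefficients [a, b, c] of the characteristic polynomial r^2 + 8 r + 20.
a, b, c = 1.0, 8.0, 20.0
print(np.roots([a, b, c]))        # the complex pair -4 + 2j, -4 - 2j

# The three cases of this section, read off from the discriminant b^2 - 4ac.
disc = b**2 - 4*a*c
if disc > 0:
    print("Case 1: distinct real roots")
elif disc == 0:
    print("Case 2: double real root")
else:
    print("Case 3: complex conjugate roots")   # printed here, since disc = -16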
Let us recall that the general solution is used to solve initial-value problems by
evaluating the constants. Thus, in all cases we can now find the solution of (2.31)
satisfying the initial conditions
y(x0 ) = y0
and y 0 (x0 ) = y1 .
Examples are given in the Exercises.
Exercises
1. Use the Wronskian to verify that the following pairs of functions encountered in
the proofs of Theorems 1, 2, and 3 are linearly independent:
(a) y1 (x) = er1 x and y2 (x) = er2 x where r1 , r2 are distinct real numbers,
(b) y1 (x) = erx and y2 (x) = x erx where r is a real number,
(c) ỹ1 (x) = eαx cos βx and ỹ2 (x) = eαx sin βx where α, β are real numbers with
β 6= 0.
2. Find the general solution
(a) y″ − 4y = 0.
(b) y″ + 4y = 0.
(c) y″ + 6y′ + 9y = 0.
(d) y″ + 2y′ − 15y = 0.
(e) 9y″ + 12y′ + 4y = 0.
(f) y″ + 8y′ + 25y = 0.
(g) y″ + 2y′ − 2y = 0.
(h) y″ − 6y′ + 11y = 0.
3. Find the solution of the initial-value problem
(a) y″ − 9y = 0, y(0) = 1, y′(0) = −1.
(b) y″ + 9y = 0, y(0) = 1, y′(0) = −1.
(c) y″ − 10y′ + 25y = 0, y(0) = −1, y′(0) = 1.
(d) y″ − 6y′ + 25y = 0, y(0) = 1, y′(0) = −1.
(e) 2y″ − 7y′ + 3y = 0, y(0) = 0, y′(0) = 2.
(f) y″ − 4y′ + 5y = 0, y(0) = 1, y′(0) = 0.
2.4 Free Mechanical Vibrations
In this section we will apply the techniques developed in the preceding sections of this
chapter to study the homogeneous second-order linear differential equations associated
with a free (i.e. unforced) mechanical vibration. One simple model of such a vibration is
the spring-mass system described in Section 2.1, namely we consider an object of mass
m attached to a spring which exerts a linear restorative force with spring-constant k
and whose motion is subject to a linear resistive force with damping coefficient c:
m d²x/dt² + c dx/dt + kx = 0,     (2.34)
Fig.1. Spring-mass-dashpot System.
where x denotes the displacement from the equilibrium position at x = 0. Note that
the damping force could be due to friction if the mass is supported by a horizontal
surface, or due to air resistance if the mass is moving through the air. In a controlled
experiment, the damping force is usually induced by a “dashpot,” a device designed
to exert a resistive force proportional to the velocity of the object; an example of a
dashpot is a shock absorber, like one has in a car suspension. Note that we have used x
as the unknown function, suggesting that the motion is lateral; we can also use (2.34)
for vertical vibrations provided we factor out the effect of gravity (see Exercise 8).
Since (2.34) has constant coefficients, we know that we can find the general solution
(and hence the solution for initial-value problems) using the roots of the characteristic
equation. However, there are different cases depending upon the values of m, k, and c.
We start with the simplest case of undamped motion, i.e. c = 0.
Undamped Motion
Suppose that the motion occurs in a vacuum so there is no air resistance, and there are
no other damping factors. Then we take c = 0 in (2.34) and obtain
m d²x/dt² + kx = 0.     (2.35)
The characteristic equation is m r² + k = 0, which has purely imaginary roots
r = ± iω,   where ω = √(k/m),
and so the general solution of (2.35) is
x(t) = c1 cos ωt + c2 sin ωt.
(2.36)
This function is clearly a periodic function with period 2π/ω, and the quantity ω is
called the circular frequency.
Given initial conditions on x and dx/dt, we can evaluate the constants c1 and c2 and
obtain a particular solution. While the solution is clearly periodic, it is difficult to see
the amplitude and shape of the graph. However, a little trigonometry will enable us to
put the solution (2.36) into the more convenient amplitude-phase form
x(t) = A cos(ωt − φ)
where A > 0 and 0 ≤ φ < 2π,
(2.37)
in which A is clearly the amplitude of the periodic motion. The angle φ is called the
phase shift, and controls how much the graph is shifted from that of A cos ωt: the first
peak in the graph of (2.37) occurs at
t* = φ/ω     (2.38)
instead of at t = 0 (see Fig.2). The quantity (2.38) is called the time lag.
Fig.2. Undamped Motion.
In order to put the solution (2.36) in the form (2.37), we use the difference of angles formula for cosine
cos(α − β) = cos α cos β + sin α sin β,
to conclude
cos(ωt − φ) = cos ωt cos φ + sin ωt sin φ.
Comparing this with (2.36), we see that we need A and φ to satisfy
A cos φ = c1
and A sin φ = c2 .
(2.39)
To determine A from these equations, we square and add them, then take the square root:
A² cos² φ + A² sin² φ = A² = c1² + c2²   ⇒   A = √(c1² + c2²).
To obtain the value of φ, we can divide them:
A sin φ / (A cos φ) = tan φ = c2/c1.
From this formula, we may be tempted to conclude that φ = tan−1 (c2 /c1 ), but we
need to be careful: tan−1 returns a value in (−π/2, π/2) and we want 0 ≤ φ < 2π!
Consequently, we may need to add either π or 2π to tan−1 (c2 /c1 ) in order to ensure
that (2.39) is indeed satisfied. Let us see how this works in an example.
Example 1. Suppose a mass of 0.5 kg is attached to a spring with spring constant
k = 2 N/m. Find the motion in amplitude-phase form if (a) the spring is compressed
1 m and then released, and (b) the spring is compressed 1 m and then given an initial
velocity of 1 m/sec towards the equilibrium position.
Solution. With m = 0.5 and k = 2, the equation (2.35) becomes
x00 + 4x = 0.
The characteristic equation r2 + 4 = 0 has imaginary roots r = ± 2 i, which means the
general solution is
x(t) = c1 cos 2t + c2 sin 2t.
We see that the oscillation has circular frequency ω = 2, and we can use the initial
conditions to determine c1 , c2 . In case (a), the mass is released with zero velocity from
the point x = −1, so the initial conditions are
x(0) = −1
and x0 (0) = 0.
(2.40)
Recall that tan−1 is the
principal branch of the
inverse tangent function
and returns values in
(−π/2, π/2).
Using these conditions to evaluate the constants, we find c1 = −1 and c2 = 0, so our
particular solution is x(t) = − cos 2t. However, this is not in amplitude-phase form
(since we want a positive amplitude); we want to write
− cos 2t = A cos(2t − φ).
Obviously we take A = 1 and from (2.39) we find that we want
cos φ = −1   and   sin φ = 0.
But this implies φ = π and our solution can be written
x(t) = cos(2t − π).
The graph appears in Figure 3; note that the time lag t* = π/2 is found by setting 2t − π = 0.
Fig.3. Solution x(t) = cos(2t − π) for Example 1a.
Now let us consider the initial conditions (b), which can be expressed as
x(0) = −1
and x0 (0) = 1.
(2.41)
(Note that x0 (0) > 0 since x(0) < 0 and the initial velocity is towards the equilibrium
at x = 0.) Using these to evaluate the constants, we find c1 = −1 and c2 = 1/2, so our
particular solution is
x(t) = − cos 2t + (1/2) sin 2t.
To put this in amplitude-phase form we want
A = √( (−1)² + (1/2)² ) = √5 / 2   and   tan φ = −1/2.
Now tan⁻¹(−0.5) ≈ −0.4636 radians, but we cannot take φ = −0.4636 since we want 0 ≤ φ < 2π. In fact, −0.4636 is in the fourth quadrant, but A cos φ = −1 < 0 and A sin φ = 1/2 > 0 imply that φ should be in the second quadrant. Consequently, we add π to tan⁻¹(−0.5):
φ = −0.4636 + π ≈ 2.678 radians.
Therefore, our solution in amplitude-phase form is
x(t) ≈ (√5 / 2) cos(2t − 2.678).
Note that the time lag is t* = 2.678/2 = 1.339. □
Fig.4. Solution to Example 1b has amplitude √5/2 and time lag 1.339.
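The quadrant bookkeeping for φ is exactly what the two-argument arctangent does automatically. Here is a minimal sketch, assuming Python (the helper name amplitude_phase is ours), that converts (c1, c2) into amplitude-phase form and reproduces both parts of Example 1.

import math

def amplitude_phase(c1, c2):
    # c1 cos(wt) + c2 sin(wt) = A cos(wt - phi) with A > 0 and 0 <= phi < 2*pi.
    A = math.hypot(c1, c2)
    phi = math.atan2(c2, c1) % (2*math.pi)   # atan2 picks the correct quadrant
    return A, phi

print(amplitude_phase(-1.0, 0.0))   # (1.0, pi)              -- Example 1(a)
print(amplitude_phase(-1.0, 0.5))   # (about 1.118, 2.678)   -- Example 1(b)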
Damped Motion: Three Cases
With c > 0, we want to find the general solution of (2.34). The characteristic equation
is mr² + cr + k = 0 with roots given by the quadratic formula:
r = ( −c ± √(c² − 4mk) ) / (2m).
We see that there are three cases:
• c2 > 4mk ⇒ two distinct real roots. This case is called overdamped.
• c2 = 4mk ⇒ one double real root. This case is called critically damped.
• c2 < 4mk ⇒ two complex conjugate roots. This case is called underdamped.
Let us investigate each case separately.
Overdamped Motion. In this case c² > 4mk implies that √(c² − 4mk) is real and 0 < √(c² − 4mk) < c. Hence the characteristic equation has two negative real roots:
r1 = ( −c − √(c² − 4mk) ) / (2m)   <   r2 = ( −c + √(c² − 4mk) ) / (2m)   <   0.
As we saw in the previous section, the general solution is of the form
x(t) = c1 er1 t + c2 er2 t ,
and we can evaluate the constants if we are given initial conditions. But regardless of
the values of c1 , c2 the fact that r1 and r2 are negative implies that x(t) decays rapidly
(exponentially) to x = 0: this is not surprising since the damping factor c is so large
that it rapidly overcomes the tendency for oscillatory motion that we observed when
c = 0. In fact, if c1 and c2 are both positive, we see that x(t) remains positive (as
it rapidly approaches 0) and never passes through the equilibrium x = 0. For certain
values of c1 and c2 it is possible for the motion to pass through the equilibrium at x = 0
once, but that is the most oscillation that can occur. See Figure 5 for the graphs of
several particular solutions satisfying x(0) = 1 but with various values for x0 (0).
Critically Damped Motion. In this case c2 = 4mk implies that there is one negative
(double) root
r = −c/(2m) < 0.
As we saw in the previous section, the general solution is of the form
x(t) = c1 e^(−ct/2m) + c2 t e^(−ct/2m) = e^(−ct/2m) (c1 + c2 t).
Although (c1 + c2 t) may grow (if c2 ≠ 0), the factor e^(−ct/2m) decays so much more rapidly that x(t) behaves much like the overdamped case: solutions can pass through x = 0 at most once.
Underdamped Motion. In this case c2 < 4mk implies that there are two complex
conjugate roots
r = −c/(2m) ± iµ,   where µ = √(4mk − c²) / (2m) > 0.
Fig.5. Overdamped vibration with x(0) = 1 and various values of x′(0).
As we saw in the previous section, the general solution is of the form
x(t) = e^(−ct/2m) (c1 cos µt + c2 sin µt).
But the parenthetical term is of the same form as the general solution in the undamped case, i.e. (2.36), so we can put it in amplitude-phase form and obtain
x(t) = A e^(−ct/2m) cos(µt − φ),   where A cos φ = c1, A sin φ = c2.     (2.42)
Fig.6. Underdamped vibration with time-varying amplitude (dotted curve).
Now this is interesting: we have exponential decay (due to the factor e^(−ct/2m)) but also oscillation (due to cos(µt − φ)). This is sometimes described as pseudo-periodic motion (with pseudo-frequency µ and pseudo-period T = 2π/µ) with time-varying amplitude A e^(−ct/2m). We are not surprised to find that, for very small damping c, the pseudo-frequency µ is very close to the frequency ω of the undamped vibration; perhaps more interesting is to note that c > 0 implies µ < ω, so the presence of damping slows down the frequency of the oscillation. See Figure 6 for a graph of pseudo-periodic motion.
Example 2. Suppose that the mass and spring of Example 1 is now attached to a
dashpot which exerts a resistive force proportional to velocity with c = 0.2 N/(m/s).
Find the motion in the two cases (a) and (b) described in Example 1.
Solution. We find that (2.34) becomes
0.5 x00 + 0.2 x0 + 2 x = 0.
The characteristic equation can be written as r² + 0.4r + 4 = 0, so the quadratic formula gives us the roots
r = ( −0.4 ± √(−15.84) ) / 2 ≈ −0.2 ± 1.99 i.
(In particular, since c² = 0.04 < 4 = 4mk, we see that the system is underdamped, which is consistent with having complex roots.) The general solution can be written as
x(t) = e^(−0.2 t) (c1 cos µt + c2 sin µt),   where µ = √3.96 ≈ 1.99.
Notice that the pseudo-frequency of this oscillation is µ ≈ 1.990, which indeed is a little
smaller than the circular frequency ω = 2 of Example 1.
Now we want to use initial conditions to evaluate the constants c1 , c2 . In case a),
we use initial conditions (2.40). First, we evaluate x(t) at t = 0 to find c1 :
x(0) = e⁰ (c1 cos 0 + c2 sin 0) = −1   ⇒   c1 = −1.
Then we differentiate x(t) to find
x′(t) = −0.2 e^(−0.2 t) (c1 cos µt + c2 sin µt) + e^(−0.2 t) (−c1 µ sin µt + c2 µ cos µt),
and we evaluate this to find c2:
x′(0) = −0.2 c1 + c2 µ = 0   ⇒   c2 = −0.2/µ ≈ −0.2/1.99 ≈ −0.101.
We conclude that the solution of our initial-value problem is
x(t) = e^(−0.2 t) ( −cos µt − (0.2/µ) sin µt ).
But now we'd like to put it in the form (2.42), so we want to find A and φ so that
A cos φ = −1   and   A sin φ = −0.2/µ ≈ −0.101.
We immediately see that A ≈ √(1 + (0.101)²) ≈ 1.01 and φ should be in the 3rd quadrant (where cos and sin are negative) satisfying tan φ = c2/c1 ≈ 0.101. Since tan⁻¹(0.101) = 0.101 is in the first quadrant, we must add π:
φ ≈ 0.101 + π ≈ 3.24.
Therefore, we may approximate the solution in case a) by
x(t) ≈ (1.01) e^(−0.2 t) cos(µt − 3.24),   µ ≈ 1.99.     (2.43)
The graph of this solution appears in Figure 7.
Fig.7. Solution to Example 2a.
Finally, in case b) we use the initial conditions (2.41) to find c1, c2. Proceeding as before, we again find c1 = −1 but now c2 = 0.8/µ, so our solution may be written
x(t) = e^(−0.2 t) ( −cos µt + (0.8/µ) sin µt ).
To obtain the form (2.42), we want A and φ to satisfy
A cos φ = −1   and   A sin φ = 0.8/µ ≈ 0.8/1.99 ≈ 0.40.
So A ≈ √(1 + (0.40)²) ≈ 1.08 and φ should be in the second quadrant (where cos is negative and sin is positive) satisfying tan φ = c2/c1 ≈ −0.40. Since tan⁻¹(−0.40) ≈ −0.38 is in the fourth quadrant, we again must add π:
φ ≈ −0.38 + π ≈ 2.76.
Therefore, we may approximate the solution in case b) by
x(t) ≈ (1.08) e^(−0.2 t) cos(µt − 2.76),   µ ≈ 1.99.     (2.44)
The graph of this solution appears in Figure 8. □
Fig.8. Solution to Example 2b.
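As a numerical cross-check, one can integrate 0.5 x″ + 0.2 x′ + 2x = 0 directly with the initial conditions of case (a) and compare against the amplitude-phase approximation (2.43). A minimal sketch, assuming Python with numpy and scipy (not part of the text):

import numpy as np
from scipy.integrate import solve_ivp

m, c, k = 0.5, 0.2, 2.0

# x1 = x, x2 = x'; the second-order equation becomes a first-order system.
def rhs(t, X):
    x1, x2 = X
    return [x2, -(c*x2 + k*x1)/m]

# Case (a): x(0) = -1, x'(0) = 0.
sol = solve_ivp(rhs, (0.0, 10.0), [-1.0, 0.0], dense_output=True, rtol=1e-8)

# Amplitude-phase approximation (2.43).
mu = np.sqrt(3.96)
approx = lambda t: 1.01 * np.exp(-0.2*t) * np.cos(mu*t - 3.24)

t = 5.0
print(sol.sol(t)[0], approx(t))   # the two values agree to about two decimals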
Other Mechanical Vibrations
The model for a spring-mass system was discussed in great detail in this section, but
the principles can be applied to many other forms of mechanical vibrations. One other
example that we shall now discuss is the motion of a pendulum consisting of a mass
m attached to one end of a rod of length L. We assume that the rod has negligible
mass and the end not attached to the mass is fixed in place, but allowed to pivot as the
pendulum swings back and forth. To describe the position of the mass at time t, let
θ(t) denote the angle that the rod makes with the vertical, so θ = 0 corresponds to the
rod hanging straight down. We want to derive a differential equation satisfied by θ(t).
We shall use the physical principle of conservation of energy, namely the sum of
the kinetic energy and the potential energy must remain constant. The kinetic energy
is Ekin = (1/2) m v², so we need to find v in terms of θ and L. But the distance along the arc of motion is Lθ and L is fixed, so v = L dθ/dt. Consequently, the kinetic energy is
Ekin = (1/2) m L² (dθ/dt)².
Fig.9. Pendulum.
The potential energy is Epot = mgh where h is the height above its equilibrium position
(when θ = 0). Figure 10 shows that L − h = L cos θ or h = L(1 − cos θ), so
Epot = mgL(1 − cos θ).
By conservation of energy, we know
(1/2) m L² (dθ/dt)² + mgL(1 − cos θ) = C,
where C is a constant.
If we differentiate both sides of this equation using the chain rule, we obtain
m L² (dθ/dt)(d²θ/dt²) + mgL sin θ (dθ/dt) = 0.
Fig.10. Pendulum trigonometry.
We can factor mL dθ/dt out of this equation and conclude
L d²θ/dt² + g sin θ = 0.     (2.45)
Now this is a nonlinear second-order differential equation, so we may fear that the techniques of this section will not apply. However, if the oscillations are small, then θ is small and by Taylor series approximation we know sin θ ≈ θ. Thus we might expect that (2.45) is well-approximated by the linear equation
L d²θ/dt² + g θ = 0.    (2.46)
This process of approximating (under certain circumstances) a nonlinear equation by a linear one is called linearization. Again, we emphasize that we can only expect solutions of (2.46) to be good approximations of the solutions of (2.45) when the oscillations are small.
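The quality of the linearization is easy to check numerically. Here is a short Python sketch (illustrative only; the values of g, L, and the initial angle are assumptions, not from the text) that integrates (2.45) and (2.46) side by side with scipy:

```python
import numpy as np
from scipy.integrate import solve_ivp

g, L = 9.8, 0.5          # illustrative values (SI units)
theta0 = 0.1             # small initial angle in radians

def nonlinear(t, y):     # y = [theta, dtheta/dt], equation (2.45)
    return [y[1], -(g / L) * np.sin(y[0])]

def linearized(t, y):    # equation (2.46)
    return [y[1], -(g / L) * y[0]]

t_eval = np.linspace(0, 10, 500)
sol_nl = solve_ivp(nonlinear, (0, 10), [theta0, 0.0], t_eval=t_eval, rtol=1e-8)
sol_ln = solve_ivp(linearized, (0, 10), [theta0, 0.0], t_eval=t_eval, rtol=1e-8)

# The two angle histories stay close for theta0 = 0.1; repeating the run with
# theta0 = 1.0 shows the linearized solution drifting noticeably out of phase.
print(np.max(np.abs(sol_nl.y[0] - sol_ln.y[0])))
```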
Exercises
In Exercises 1-5, we ignore damping forces.
1. If a 2 kg mass is attached to a spring with constant k = 8 N/m and set in motion,
find the period and circular frequency of the motion. Solution
2. A 16 lb weight is attached to a spring with constant k = 8 lb/ft and set in motion.
Find the period and circular frequency of the motion.
3. A mass of 3 kg is attached to a spring with constant k = 12 N/m, then the spring
is stretched 1 m beyond its natural length and given an initial velocity of 1 m/sec
back towards its equilibrium position. Find the circular frequency, period, and
amplitude of the motion.
4. A 2 kg mass is attached to a spring with constant k = 18 N/m. Given initial conditions x(0) = 1 = x'(0), find the motion x(t) in amplitude-phase form (2.37).
5. A 1 kg mass is attached to a spring with constant k = 16 N/m. Find the motion x(t) in amplitude-phase form (2.37) if x(0) = 1 and x'(0) = −1.
6. For the given values of mass m, damping coefficient c, spring constant k, initial
position x0 , and initial velocity v0 : i) find the solution x(t), and ii) state whether
the motion is overdamped, critically damped, or underdamped.
(a) m = 1, c = 6, k = 8, x0 = 0, v0 = 2.
(b) m = 2, c = 5, k = 2, x0 = 3, v0 = 0.
(c) m = 1, c = 2, k = 2, x0 = 1, v0 = 0.
(d) m = 4, c = 12, k = 9, x0 = 1, v0 = 1.
(e) m = 2, c = 12, k = 50, x0 = 1, v0 = 2.
7. For the following underdamped systems, find x(t) in the form A e^{−αt} cos(µt − φ),
and identify the pseudo-frequency of the oscillation. What would the circular
frequency ω be if the damping were removed?
(a) m = 1, c = 2, k = 2, x0 = 1, v0 = 0. Solution
(b) m = 2, c = 1, k = 8, x0 = −4, v0 = 2.
8. If we consider a vertical spring-mass-dashpot system, then we need to include
gravity as a force. Consequently, in place of (2.34) we have
m d²y/dt² + c dy/dt + ky = mg,
where y is the distance (measured downward) that the spring has been stretched
beyond its natural length. However, when we first attach the mass m, the vertical
spring is stretched so that the equilibrium position is now at y = mg/k. (Why?) If
we let x(t) = y(t) − mg/k denote the distance that the spring is stretched beyond
its new equilibrium position, show that x satisfies (2.34).
9. A pendulum has a mass of 10 kg attached to a rod of length 1/2 m. Use linearization to find the circular frequency and period of small oscillations. If the mass is
doubled to 20 kg, what is the effect on the period?
2.5 Nonhomogeneous Equations with Constant Coefficients
In this section we want to develop a method for finding the general solution of a second-order nonhomogeneous linear differential equation with constant coefficients. Using the
theory in Section 2.2, we know that the general solution is of the form y = yp +yc , where
yp is a particular solution and yc is the general solution for the associated homogeneous
equation. In Section 2.3 we developed a method for finding yc , so now we focus on yp .
As in the homogeneous case, the basic method works for equations of order n so we first
develop the method in that context.
Let us consider a nonhomogeneous linear differential equation of order n with constant coefficients

Ly = y^(n) + a1 y^(n−1) + · · · + an y = f(x),    (2.47)

where L is the differential operator D^n + a1 D^{n−1} + · · · + an and the coefficients a1, . . . , an
are constants; the given function f is what makes (2.47) nonhomogeneous. The technique for finding a particular solution of (2.47) is called the method of undetermined
coefficients. It involves making an educated guess about what form yp should take
based upon the form of f . This yields a trial solution yp involving one or more “undetermined coefficients” that may be evaluated simply by plugging into (2.47). Let us
proceed with some examples to see how easy it is.
Example 1. Find the general solution of y'' + 4y = 3 e^{2x}.

Solution. Although we are anxious to find yp, it is usually best to start with yc, which we need for the general solution anyway. So we solve y'' + 4y = 0 by using the roots of the characteristic equation r² + 4 = 0, which are r = ±2i. This gives us the complementary solution

yc = c1 cos 2x + c2 sin 2x.

Now what is our educated guess for yp based upon f(x) = 3 e^{2x}? We know that exponentials differentiate as exponentials, so a reasonable assumption is

yp(x) = A e^{2x},

where A is our undetermined coefficient. To see if this works and to evaluate A, we simply plug into y'' + 4y = 3 e^{2x}:

yp'' + 4yp = 4A e^{2x} + 4A e^{2x} = 8A e^{2x} = 3 e^{2x}.

We see that 8A = 3 or A = 3/8. In other words yp(x) = (3/8) e^{2x} and our general solution is

y(x) = (3/8) e^{2x} + c1 cos 2x + c2 sin 2x.
Example 2. Find the general solution of y'' + 3y' + 2y = sin x.

Solution. We start with the homogeneous equation y'' + 3y' + 2y = 0. The characteristic equation r² + 3r + 2 = 0 factors as (r + 1)(r + 2) = 0, so the complementary solution is

yc(x) = c1 e^{−x} + c2 e^{−2x}.

What does f(x) = sin x suggest that we use for yp(x)? We could try yp(x) = A sin x, but when we plug into y'' + 3y' + 2y = sin x we'll get cosine as well as sine terms on the left hand side; learning from this experience, we are led to the trial solution

yp(x) = A cos x + B sin x,

where A and B are to be determined. We plug into y'' + 3y' + 2y = sin x to find

(−A cos x − B sin x) + 3(−A sin x + B cos x) + 2(A cos x + B sin x) = sin x.

Rearranging terms on the left hand side, we obtain

(−3A + B) sin x + (A + 3B) cos x = sin x.

Comparing left and right hand sides gives us two equations that A and B must satisfy:

−3A + B = 1,    A + 3B = 0.

We easily solve these equations to find B = 1/10 and A = −3/10, so yp(x) = (1/10) sin x − (3/10) cos x and our general solution is

y(x) = (1/10) sin x − (3/10) cos x + c1 e^{−x} + c2 e^{−2x}.
Example 3. Find the general solution of y'' + 4y' + 5y = x².

Solution. The associated homogeneous equation y'' + 4y' + 5y = 0 has characteristic equation r² + 4r + 5 = 0. The quadratic formula yields roots r = −2 ± i, so the complementary solution is

yc(x) = e^{−2x} (c1 cos x + c2 sin x).

Turning to the nonhomogeneous equation, f(x) = x² is a polynomial, and derivatives of polynomials are polynomials, so we suspect yp should be a polynomial. Let us take as our trial solution

yp(x) = Ax² + Bx + C.

Plugging this into y'' + 4y' + 5y = x² and collecting terms on the left according to the power of x:

5Ax² + (8A + 5B)x + (2A + 4B + 5C) = x².

Comparing the left and right hand sides, we get the following equations:

5A = 1,    8A + 5B = 0,    2A + 4B + 5C = 0.

We easily solve these equations to find A = 1/5, B = −8/25, and C = 22/125, so the general solution is

y(x) = x²/5 − (8/25) x + 22/125 + e^{−2x} (c1 cos x + c2 sin x).
At this point, we have managed to find yp when f(x) in (2.47) is an exponential function, a periodic function involving sine or cosine, or a polynomial. We can summarize these results in a table:

f(x)                                      Expected trial yp(x)
e^{αx}, α a real number                   A e^{αx}
cos βx or sin βx, β a real number         A cos βx + B sin βx
x^k, k an integer ≥ 0                     Ak x^k + · · · + A0

Table 1. The expected trial solutions for certain functions f(x)
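As a quick sanity check (a sympy sketch, not part of the text), a computer algebra system confirms the particular solutions found by hand in Examples 1-3; the output of dsolve contains the same particular terms alongside the complementary solution:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

odes = [
    sp.Eq(y(x).diff(x, 2) + 4 * y(x), 3 * sp.exp(2 * x)),             # Example 1
    sp.Eq(y(x).diff(x, 2) + 3 * y(x).diff(x) + 2 * y(x), sp.sin(x)),  # Example 2
    sp.Eq(y(x).diff(x, 2) + 4 * y(x).diff(x) + 5 * y(x), x ** 2),     # Example 3
]
for ode in odes:
    # dsolve returns the general solution y = y_c + y_p for each equation
    print(sp.dsolve(ode))
```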
But does this table always work? Let us consider one more example:
Example 4. Find the general solution of y'' − 3y' + 2y = e^x.

Solution. According to the table, we should take yp(x) = A e^x. But when we plug that into the equation, on the left hand side we obtain

A e^x − 3A e^x + 2A e^x = 0.

However, comparing with the right hand side of the equation we get 0 = e^x, which cannot be satisfied no matter what A is! What went wrong?

For one thing, we forgot to find the complementary solution, so let us do that now. The associated homogeneous equation is Ly = y'' − 3y' + 2y = 0, and its characteristic equation r² − 3r + 2 = (r − 1)(r − 2) = 0 has roots r = 1, 2, so we obtain

yc(x) = c1 e^x + c2 e^{2x}.

We now see that the trial function yp(x) = A e^x is a solution of the homogeneous equation Ly = 0, so there is no way that it can be a particular solution of the nonhomogeneous equation. What can we do?

We shall use a trick called the annihilator method. We first write our nonhomogeneous equation as

(D − 1)(D − 2)y = e^x.

Now let us apply the operator D − 1 to both sides; since D − 1 annihilates e^x, we get a third-order homogeneous equation

(D − 1)²(D − 2)y = 0.

But in Section 2.3, we found three distinct solutions of this homogeneous equation:

y1(x) = e^x,    y2(x) = x e^x,    and    y3(x) = e^{2x}.

The first and last of these are included in yc, so we discard them and use the middle one for our trial solution with an undetermined coefficient:

yp(x) = A x e^x.

Calculating yp' = A e^x + A x e^x and yp'' = 2A e^x + A x e^x, we plug into y'' − 3y' + 2y = e^x:

(2A e^x + A x e^x) − 3(A e^x + A x e^x) + 2A x e^x = e^x.

But the terms involving x e^x on the left hand side cancel each other, leaving us with only −A e^x on the left hand side. Comparing with e^x on the right hand side, we obtain A = −1. We conclude that yp(x) = −x e^x is a particular solution of y'' − 3y' + 2y = e^x and the general solution is

y(x) = −x e^x + c1 e^x + c2 e^{2x}.
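A brief sympy sketch (illustrative only) confirms the key point of Example 4: the naive trial solution A e^x is annihilated by the operator L, while the corrected trial solution A x e^x is not:

```python
import sympy as sp

x, A = sp.symbols('x A')
L = lambda f: f.diff(x, 2) - 3 * f.diff(x) + 2 * f   # the operator L of Example 4

# The naive trial solution is annihilated by L ...
print(sp.simplify(L(A * sp.exp(x))))                 # -> 0

# ... while A*x*e^x leaves a usable equation to solve for A:
residual = sp.simplify(L(A * x * sp.exp(x)) - sp.exp(x))
print(sp.solve(residual, A))                         # -> [-1], i.e. y_p = -x e^x
```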
Let us now consider how to generalize Table 1 to include more complicated functions
f (x), as well as cases (as in Example 4) where the expected trial solution yp fails. At this
point we restrict our attention to second-order equations for which we have complete
knowledge of the roots of the characteristic polynomial.
Second-Order Equations
Let us consider the second-order nonhomogeneous linear differential equation

Ly = a y'' + b y' + c y = f(x),    (2.48)

which has characteristic polynomial

p(r) = a r² + b r + c,

so that we can write the differential operator as L = p(D). Let us recall that there are three cases for the roots of p(r) = 0: i) distinct real roots r1, r2, ii) a double real root r1 = r2, or iii) complex conjugate roots r = α ± iβ.

The type of function f(x) that we shall consider in (2.48) is

f(x) = x^k e^{αx} cos βx  or  x^k e^{αx} sin βx,    (2.49)

where k is a nonnegative integer and α, β are real numbers. Based upon Table 1, our expected trial solution is

yp(x) = (Ak x^k + · · · + A0) e^{αx} cos βx + (Bk x^k + · · · + B0) e^{αx} sin βx.    (2.50)

What could go wrong? Some term in (2.50) could be annihilated by p(D). What are the possibilities?

• If β = 0 and p(α) = 0, then α is a real root with multiplicity m = 1 or 2. Then p(D)[A0 e^{αx}] = 0, so we must multiply yp by x^m.

• If p(α ± iβ) = 0, then p(D)[A0 e^{αx} cos βx] = 0 = p(D)[B0 e^{αx} sin βx], so we must multiply yp by x. (Note that p(α + iβ) = 0 ⇔ p(α − iβ) = 0.)
We summarize these results in the following:

f(x)                                        General trial yp(x)
x^k e^{αx} cos βx or x^k e^{αx} sin βx      x^m (Ak x^k + · · · + A0) e^{αx} cos βx + x^m (Bk x^k + · · · + B0) e^{αx} sin βx,
                                            where m is the smallest integer 0, 1, or 2 such that no term is annihilated by L

Table 2. The general case for undetermined coefficients
Example 5. Find the general solution of y'' − 4y' + 5y = e^{2x} sin x.

Solution. We first find the complementary solution by solving y'' − 4y' + 5y = 0. The characteristic equation r² − 4r + 5 = 0 has complex roots r = 2 ± i, so

yc(x) = e^{2x} (c1 cos x + c2 sin x).

Our first guess for finding a particular solution would be yp = e^{2x} (A cos x + B sin x), but this is a solution of the homogeneous equation, so we must multiply by x:

yp(x) = x e^{2x} (A cos x + B sin x).

Let us differentiate this twice:

yp'(x) = e^{2x} (A cos x + B sin x) + x e^{2x} [(2A + B) cos x + (−A + 2B) sin x],

yp''(x) = e^{2x} [(4A + 2B) cos x + (−2A + 4B) sin x] + x e^{2x} [(3A + 4B) cos x + (−4A + 3B) sin x].

If we plug this into y'' − 4y' + 5y = e^{2x} sin x and simplify, after some algebra we get

2B cos x − 2A sin x = sin x    ⇒    A = −1/2  and  B = 0.

We conclude that our particular solution is

yp(x) = −(1/2) x e^{2x} cos x,

and the general solution is

y(x) = −(1/2) x e^{2x} cos x + e^{2x} (c1 cos x + c2 sin x).
Let us do one more example to illustrate the different roles played by x^m and Ak x^k + · · · + A0 in Table 2.
Example 6. Find the general solution for y'' + y = x sin x.

Solution. We first note that the general solution of the homogeneous equation y'' + y = 0 is

yc(x) = c1 cos x + c2 sin x.

Our first guess for a particular solution would be to use m = 0 in Table 2, i.e. yp(x) = (A1 x + A0) cos x + (B1 x + B0) sin x. But both A0 cos x and B0 sin x are solutions of the homogeneous equation, so we must use m = 1 in Table 2:

yp(x) = x (A1 x + A0) cos x + x (B1 x + B0) sin x = (A1 x² + A0 x) cos x + (B1 x² + B0 x) sin x.

Now no terms in yp are solutions of the homogeneous equation, so this should work! Let us proceed by differentiating yp:

yp' = (2A1 x + A0) cos x − (A1 x² + A0 x) sin x + (2B1 x + B0) sin x + (B1 x² + B0 x) cos x,

yp'' = 2A1 cos x − (2A1 x + A0) sin x − (2A1 x + A0) sin x − (A1 x² + A0 x) cos x + 2B1 sin x + (2B1 x + B0) cos x + (2B1 x + B0) cos x − (B1 x² + B0 x) sin x.

Plug this into the equation:

y'' + y = [4B1 x + 2A1 + 2B0] cos x + [−4A1 x − 2A0 + 2B1] sin x = x sin x.

Comparing the coefficients, we have the following equations:

−4A1 = 1,    4B1 = 0,    2A1 + 2B0 = 0,    −2A0 + 2B1 = 0.

We easily solve these to find A1 = −1/4, B1 = 0, A0 = 0, and B0 = 1/4, so our particular solution is

yp(x) = −(x²/4) cos x + (x/4) sin x.

Therefore, the general solution is

y(x) = −(x²/4) cos x + (x/4) sin x + c1 cos x + c2 sin x.
Finally, let us observe that it is easy to generalize the method of undetermined
coefficients to functions of the form f (x) = f1 (x) + f2 (x), where f1 and f2 are of
different types in the table: take yp to be the sum of the two trial solutions indicated.
(Be careful to use different m parameters for the two indicated trial solutions.) Let us
illustrate this with an example.
Example 7. Find the general solution for y'' − 4y = 8 e^{2x} + 5 cos x.

Solution. We first find the general solution of the associated homogeneous equation y'' − 4y = 0. The characteristic equation is r² − 4 = (r − 2)(r + 2) = 0, so

yc(x) = c1 e^{2x} + c2 e^{−2x}.

In f(x) = 8 e^{2x} + 5 cos x, the term e^{2x} is a solution of the homogeneous equation, but cos x and sin x are not. Consequently, we take as our trial solution

yp(x) = A x e^{2x} + B cos x + C sin x.

We calculate yp' = A e^{2x} + 2A x e^{2x} − B sin x + C cos x and yp'' = 4A e^{2x} + 4A x e^{2x} − B cos x − C sin x. We plug these into our differential equation (the x e^{2x} terms cancel) to obtain

4A e^{2x} − 5B cos x − 5C sin x = 8 e^{2x} + 5 cos x.

We conclude A = 2, B = −1, and C = 0, so yp(x) = 2 x e^{2x} − cos x, and the general solution is

y(x) = 2 x e^{2x} − cos x + c1 e^{2x} + c2 e^{−2x}.
Remark 1. There is another method for finding a particular solution of a nonhomogeneous equation when undetermined coefficients fails, either because the equation has
nonconstant coefficients, or f (x) is not of the form appearing in Table 2. This method,
called “variation of parameters,” is discussed in Exercises 3 and 4.
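For readers who want to see variation of parameters in action before working the exercises, here is a minimal sympy sketch (not from the text) of the recipe, assuming two independent homogeneous solutions y1, y2 are already known; the worked case is Exercise 4(a):

```python
import sympy as sp

x = sp.symbols('x')

def variation_of_parameters(y1, y2, f):
    """Particular solution of y'' + p y' + q y = f from two homogeneous solutions."""
    W = y1 * sp.diff(y2, x) - y2 * sp.diff(y1, x)        # Wronskian W(y1, y2)
    u1 = sp.integrate(-y2 * f / W, x)
    u2 = sp.integrate(y1 * f / W, x)
    return sp.simplify(u1 * y1 + u2 * y2)

# y'' + 3y' + 2y = e^x with y1 = e^{-x}, y2 = e^{-2x}:
print(variation_of_parameters(sp.exp(-x), sp.exp(-2 * x), sp.exp(x)))   # exp(x)/6
```

The printed result, e^x/6, agrees with the answer undetermined coefficients would give for this right-hand side.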
Exercises
1. Find the general solution y(x):
(a) y'' + 3y' + 2y = 4 e^{−3x}.
(b) y'' + 4y = 2 cos 3x.
(c) y'' + y' − 2y = 4x + 1.
(d) y'' + 9y = 2 sin 3x.
(e) y'' + 2y' = 4x + 3.
(f) y'' − y' + 2y = 3 e^x + 4 cos 2x.
2. Find the solution of the initial-value problem:
(a) y'' − 4y = 8 e^{2x}, y(0) = 1, y'(0) = 0.
(b) y'' + 4y' + 4y = 5 e^{−2x}, y(0) = 1, y'(0) = −1.
(c) y'' + 3y' + 2y = x + 6 e^x, y(0) = 0, y'(0) = 1.
(d) y'' + y = 1 + 3 sin x, y(0) = 2, y'(0) = 0.
3. Another method for finding a particular solution of the nonhomogeneous equation

y'' + p(x) y' + q(x) y = f(x)    (2.51)

is called variation of parameters. Starting with linearly independent solutions y1, y2 of the associated homogeneous equation, we write

yp(x) = u1(x) y1(x) + u2(x) y2(x),    (2.52)

and try to find functions u1, u2 so that yp satisfies (2.51).

(a) Impose the additional condition on u1, u2 that

u1' y1 + u2' y2 = 0,    (2.53)

and plug (2.52) into (2.51) to obtain

u1' y1' + u2' y2' = f.    (2.54)

(b) Eliminate u2' from (2.53)-(2.54) and solve for u1' to find

u1' = − y2 f / W(y1, y2).

Obtain an analogous expression for u2'.

(c) Show that yp is given by

yp(x) = −y1(x) ∫ [y2(x) f(x) / W(y1, y2)(x)] dx + y2(x) ∫ [y1(x) f(x) / W(y1, y2)(x)] dx.    (2.55)
4. Use (2.55) to find a particular solution:
(a) y'' + 3y' + 2y = e^x
(b) y'' + y = sin² x
(c) y'' + y = tan x
(d) y'' + 9y = sec 3x
2.6 Forced Mechanical Vibrations
In this section we apply the techniques of the preceding section to study the nonhomogeneous second-order linear differential equation associated to a forced mechanical
vibration. In particular, we consider the spring-mass system (damped or undamped)
that we studied in Section 2.4, but add a time-varying external forcing function f (t).
According to Newton’s Second Law, the equation governing this motion is
m d²x/dt² + c dx/dt + kx = f(t).    (2.56)
We know that the general solution of (2.56) is of the form
x(t) = xp (t) + xc (t),
where xc(t) is the general solution of the homogeneous equation m x'' + c x' + kx = 0 that
we discussed in detail in Section 2.4. So the problem is reduced to finding a particular
solution xp (t), which we can do using the method of undetermined coefficients.
We saw that we can use the method of undetermined coefficients when f(t) is a linear combination of terms of the form t^k e^{αt} cos βt and t^k e^{αt} sin βt. However, we shall usually consider a simple periodic forcing function f(t) of the form

f(t) = F0 cos ωt.    (2.57)
Notice that f (t) has amplitude F0 and a circular frequency ω that we call the forcing
frequency. We know that damping can make a big difference in the form of the general
solution, so let us begin by considering the undamped case.
If we take c = 0 in (2.56), we obtain

m d²x/dt² + kx = F0 cos ωt.    (2.58)
We know that the complementary solution is

xc(t) = c1 cos ω0 t + c2 sin ω0 t,    where ω0 = √(k/m).
The quantity ω0 is called the natural frequency of the spring-mass system, and is
independent of the forcing frequency ω. It is not surprising that the behavior of the
particular solution depends on whether these two frequencies are equal or not.
Fig.1. Spring-mass system
with external forcing.
Undamped, Forced Vibrations: ω ≠ ω0
(We can use A cos ωt instead of A cos ωt + B sin ωt since the left hand side does not involve x'.)
As our trial solution for (2.58), let us take
xp (t) = A cos ωt.
Since xp''(t) = −ω² A cos ωt, we obtain from (2.58)

(−m ω² + k) A cos ωt = F0 cos ωt,

so

A = F0 / (k − m ω²) = (F0/m) / (ω0² − ω²).

Thus our particular solution is

xp(t) = [(F0/m) / (ω0² − ω²)] cos ωt,

and our general solution is

x(t) = [(F0/m) / (ω0² − ω²)] cos ωt + c1 cos ω0 t + c2 sin ω0 t.    (2.59)
Once we have determined c1 and c2 from given initial conditions, we can put that part
of the solution into amplitude-phase form and write the solution as
Fig.2. Spring-mass system with periodic external forcing.

x(t) = [(F0/m) / (ω0² − ω²)] cos ωt + C cos(ω0 t − φ).    (2.60)
In this form, it is clear that the solution is a superposition of two vibrations, one with
circular frequency ω0 (the natural frequency) and one with frequency ω (the forcing
frequency). If ω and ω0 happen to satisfy pω = qω0 for some positive integers p and q,
then the solution is periodic with period T = 2πp/ω0 = 2πq/ω; otherwise, the vibration
is not periodic, and can appear quite complicated (see Figure 2).
Although we have assumed ω 6= ω0 , it is possible for ω to be close to ω0 ; in this case
something quite interesting occurs. For simplicity, let us assume that the mass is initially
at rest when the external force begins, i.e. we have initial conditions x(0) = 0 = x0 (0).
Using these in (2.59), we find

(F0/m) / (ω0² − ω²) + c1 = 0    and    c2 = 0,

so our solution is

x(t) = [(F0/m) / (ω0² − ω²)] (cos ωt − cos ω0 t).
Now we use the trigonometric identity 2 sin A sin B = cos(A − B) − cos(A + B) with
A = (ω0 + ω)t/2 and B = (ω0 − ω)t/2 to put our solution in the form
x(t) = [(2F0/m) / (ω0² − ω²)] sin((ω0 − ω)t/2) sin((ω0 + ω)t/2).    (2.61)
Since ω ≈ ω0 , we see that sin(ω0 − ω)t/2 has very small frequency (i.e. is slowly varying)
compared with sin(ω0 + ω)t/2. Therefore, we can consider our solution to be rapidly
varying with frequency (ω0 + ω)/2 and a slowly varying amplitude (see Figure 3):
x(t) = A(t) sin((ω0 + ω)t/2),    where A(t) = [(2F0/m) / (ω0² − ω²)] sin((ω0 − ω)t/2).
This phenomenon is experienced in acoustics in the form of beats: if two musical
instruments play two notes very close in pitch (i.e. frequency), then their combined
volume (i.e. amplitude) will be heard to vary slowly in regular “beats.”
Fig.3. “Beats” in amplitude result when natural and forcing frequencies are close.
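The beat pattern in (2.61) is easy to verify numerically. The following Python sketch (illustrative values only, not from the text) evaluates the solution and checks that the slowly varying amplitude A(t) envelopes the fast oscillation:

```python
import numpy as np

m, F0 = 1.0, 1.0
omega0, omega = 3.0, 2.8       # natural and forcing frequencies, chosen close together

t = np.linspace(0, 60, 4000)
x = F0 / m / (omega0**2 - omega**2) * (np.cos(omega * t) - np.cos(omega0 * t))

# Slowly varying amplitude A(t) from (2.61); |A(t)| is the beat envelope.
A = 2 * F0 / m / (omega0**2 - omega**2) * np.sin((omega0 - omega) * t / 2)

print(np.all(np.abs(x) <= np.abs(A) + 1e-12))   # x never exceeds its envelope
```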
Undamped, Forced Vibrations: ω = ω0 (Resonance)

If ω = ω0, then the denominator in (2.59) is zero, so that formula cannot be used for the solution. In fact, f(t) = F0 cos ω0 t is a solution of the homogeneous equation, so we need to modify our trial solution in the method of undetermined coefficients:

xp(t) = t (A cos ω0 t + B sin ω0 t).

(We need to include both A cos ω0 t and B sin ω0 t because we have multiplied by t.) If we plug into the equation (2.58), we obtain

−2Aω0 sin ω0 t + 2Bω0 cos ω0 t = (F0/m) cos ω0 t,

which implies

A = 0    and    B = F0 / (2mω0),

and so our particular solution is

xp(t) = [F0 / (2mω0)] t sin ω0 t.    (2.62)
This solution oscillates with frequency ω0 but ever-increasing amplitude due to the
factor t (see Figure 4). This is called resonance.
The phenomenon of resonance is very important in physical structures like buildings
and bridges which have a natural frequency of motion, and are subject to external
forces such as wind or earthquakes. If the external force is applied periodically with
the same frequency as the natural frequency of the structure, it will lead to larger and
larger vibrations which could eventually destroy the structure. Moreover, this can occur
even if the external force is very small but is sustained over a long period of time. For
example, the Broughton Suspension Bridge near Manchester, England, collapsed in 1831
when British troops marched across in step; as a result of this event, the British military
issued an order that troops should “break step” when marching across a bridge.
Damped, Forced Vibrations
Physical systems usually involve some damping forces, so it is important to analyze
(2.56) when c is not zero. As above, we consider the periodic forcing function (2.57), so
we want to study the behavior of solutions to
m
d2 x
dx
+c
+ kx = F0 cos ωt,
2
dt
dt
(2.63)
Fig.4. Resonance results
when natural and forcing
frequencies coincide.
Recall that we solved the homogeneous equation in Section 2.4, and found the complementary solution xc (t) in the three cases: overdamped, critically damped, and underdamped. In each case, we found that xc (t) → 0 exponentially quickly as t → ∞. For
this reason, xc (t) is called the transient part of the general solution; this terminology
is also applied to a solution of an initial-value problem (see Example 1 below).
Now let us find a particular solution xp (t) for (2.63). We take as our trial solution
xp(t) = A cos ωt + B sin ωt.    (2.64)
We can plug this into our equation and evaluate the coefficients A and B to find (see Exercise 4):
A = (k − mω²) F0 / [(k − mω²)² + c²ω²]    and    B = cωF0 / [(k − mω²)² + c²ω²].    (2.65)
However, we would like to write (2.64) in amplitude-phase form C cos(ωt − φ), so let us
introduce C > 0 and 0 ≤ φ < 2π satisfying
C cos φ = A and C sin φ = B.
We immediately find (after some easy algebra)

C = √(A² + B²) = F0 / √((k − mω²)² + c²ω²).
To find φ, we observe that C, B > 0 ⇒ sin φ > 0 ⇒ 0 < φ < π; of course, we also have tan φ = B/A, so we will have

φ = tan⁻¹(B/A)  if A > 0 (i.e. if k > mω²),
φ = π/2  if A = 0 (i.e. if k = mω²),    (2.66)
φ = tan⁻¹(B/A) + π  if A < 0 (i.e. if k < mω²).
In any case, we can write our particular solution as

xp(t) = [F0 / √((k − mω²)² + c²ω²)] cos(ωt − φ).    (2.67)

Fig.5. Vibration becomes steady-periodic.
Fig.6. Practical resonance.
This is, of course, a periodic function. Since the general solution of (2.63) is of the form
x(t) = xp (t) + xc (t) and we have already observed that xc (t) → 0 as t → ∞, we see that
every solution of (2.63) tends towards xp (t) as t → ∞. For this reason, xp is called the
steady-periodic solution of (2.63), and sometimes denoted xsp .
Since c > 0, we do not have true resonance: in particular, the trial solution (2.64) is never a solution of the associated homogeneous equation, so we do not need to multiply by t. However, if c is very small and ω is close to ω0 = √(k/m), the denominator in (2.67) can be very close to zero, which means that the amplitude of the steady-periodic solution can be very large. This is called practical resonance: starting from equilibrium, the increasing amplitudes can look a lot like resonance before they taper off into the steady-periodic solution (see Figure 6). Moreover, a very large amplitude in the steady-periodic solution can be as destructive for buildings and bridges as true resonance, so it needs to be avoided. In fact, all physical structures experience at least a small amount of damping, so practical resonance is actually more relevant than true resonance. Incidentally, one might suspect that maximal practical resonance occurs when ω = ω0, i.e. k − mω² = 0, but this is not quite correct (cf. Exercise 5).
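The steady-periodic amplitude in (2.67), and the maximizing frequency described in Exercise 5, are simple to evaluate numerically. A short Python sketch (the numbers are illustrative, lightly damped values chosen in the spirit of Exercise 8, not taken from the text):

```python
import numpy as np

def steady_periodic_amplitude(m, c, k, F0, omega):
    """Amplitude of x_sp from (2.67)."""
    return F0 / np.sqrt((k - m * omega**2) ** 2 + (c * omega) ** 2)

def practical_resonance_frequency(m, c, k):
    """Maximizer of the steady-periodic amplitude (Exercise 5); needs c**2 < 2*m*k."""
    return np.sqrt((2 * m * k - c**2) / (2 * m**2))

m, c, k, F0 = 1.0, 0.2, 16.0, 5.0          # illustrative, lightly damped system
omega_star = practical_resonance_frequency(m, c, k)
for omega in (2.0, omega_star, 6.0):
    print(omega, steady_periodic_amplitude(m, c, k, F0, omega))
```

The amplitude printed at omega_star is far larger than at the off-resonant frequencies, which is exactly the practical-resonance effect described above.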
Example 1. Consider a mass of 1 kg attached to a spring with k = 17 N/m and a dashpot with c = 2 N/(m/s). If the mass begins at equilibrium and then is subjected to a periodic force f(t) = 5 cos 4t, (a) find the solution, (b) identify the transient and steady-periodic parts of the solution, and (c) discuss practical resonance for various periodic forcing functions.

Solution. The initial-value problem governing the motion is

x'' + 2x' + 17x = 5 cos 4t,    x(0) = 0 = x'(0).
Rather than simply plugging into the solution formulas derived above, let us apply the
solution method; this is usually the best way to approach a specific problem. So we
begin with finding the general solution of the associated homogeneous equation
x'' + 2x' + 17x = 0.

The characteristic equation is r² + 2r + 17 = 0, which has complex solutions r = −1 ± 4i. So the system is underdamped and the complementary solution is

xc(t) = e^{−t} (c1 cos 4t + c2 sin 4t).

To find a particular solution of the nonhomogeneous equation, we use

xp(t) = A cos 4t + B sin 4t.
Plugging into the nonhomogeneous equation and equating coefficients of cosine and sine,
we obtain
A + 8B = 5    and    B − 8A = 0.

We easily solve these equations to find A = 1/13 and B = 8/13, so our particular solution is

xp(t) = (1/13) cos 4t + (8/13) sin 4t.

We now have our general solution

x(t) = (1/13) cos 4t + (8/13) sin 4t + e^{−t} (c1 cos 4t + c2 sin 4t).

We next use our initial conditions to evaluate c1 and c2:

x(0) = 1/13 + c1 = 0  ⇒  c1 = −1/13,
x'(0) = 32/13 − c1 + 4c2 = 0  ⇒  c2 = −33/52.

We conclude that the solution of the initial-value problem is

x(t) = (1/13) cos 4t + (8/13) sin 4t − (1/13) e^{−t} cos 4t − (33/52) e^{−t} sin 4t.

Fig.7. Example 1 solution.
We easily identify the transient and steady-periodic parts of the solution:

xsp(t) = (1/13) cos 4t + (8/13) sin 4t,    xtr(t) = −(1/13) e^{−t} cos 4t − (33/52) e^{−t} sin 4t.

Note that the amplitude of the steady-periodic solution is √65/13 = √(5/13) ≈ 0.620, and we can put the steady-periodic solution into amplitude-phase form (as in (2.67))

xsp(t) = (√65/13) cos(4t − 1.45),

where we have used phase angle φ = tan⁻¹(8) ≈ 1.45.
Now let us discuss practical resonance. Notice that the natural frequency of the spring-mass system (without damping) is ω0 = √17 ≈ 4.123, so the forcing frequency ω = 4 is relatively close to ω0; this means that practical resonance should be contributing to the steady-periodic amplitude of √65/13 ≈ 0.620. On the other hand, if we had taken a forcing function with the same magnitude but a frequency ω̃ further from ω0, then the amplitude of the resulting steady-periodic solution should be reduced. For example, if we take f̃(t) = 5 cos 2t, then the steady-periodic solution according to (2.67) is

x̃sp(t) = (5/√185) cos(2t − φ)

for some phase angle φ; notice that the amplitude 5/√185 ≈ 0.368 is much reduced from 0.620! (Of course, we could choose another frequency that would increase the effect of practical resonance. If we use the value ω* = √15 found in Exercise 5, we find that the forcing function f̃(t) = 5 cos(ω*t) results in a steady-periodic solution with amplitude 5/8 = 0.625, which is slightly larger than 0.620.)
While Example 1 illustrates the effect of practical resonance, one might protest
that an increase in amplitude from 0.368 to 0.620 (or even to 0.625) does not seem as
catastrophic as the effect of resonance when there is no damping. But the value c = 2
that was used in Example 1 is larger (compared with m = 1 and k = 17) than would
be expected in cases in which practical resonance might be a concern. If a very small
value of c is used, then the amplitude of the steady-periodic solution that results from
using a near-resonant forcing frequency can become very large; cf. Exercise 8.
Exercises
1. A mass of 1 kg is attached to a spring with constant k = 4 N/m. Initially at
equilibrium, a periodic external force of f (t) = cos 3t begins to affect the mass for
t > 0. Find the resultant motion x(t).
2. A 100 lb weight is attached to a vertical spring, which stretches the spring 1 ft.
Then the weight is subjected to a periodic external force f (t) = f0 cos ωt. For
what value of the circular frequency ω will resonance occur?
3. For the following undamped, forced vibrations m x'' + kx = f(t), identify the natural and forcing frequencies, ω0 and ω, and determine whether resonance occurs.
If resonance does not occur but ω ≈ ω0 , find the amplitude and frequency of the
beats.
(a) 2x'' + 18x = 5 cos 3t.
(b) 3x'' + 11x = 2 cos 2t.
(c) 6x'' + 7x = 3 cos t.
(d) 3x'' + 12x = 4 cos 2t.
4. Evaluate the constants A and B in (2.64) and confirm that they have the values
(2.65).
5. To find the frequency ω* that will produce maximal practical resonance, we want to minimize the denominator in (2.67). Use calculus to show that, provided c² < 2mk, such a minimum occurs at

ω* = √((2mk − c²)/(2m²)).

(Note that ω* ≈ √(k/m) for c very small.)
6. For the given values of m, c, k, and f (t), assume the forced vibration is initially
at equilibrium. For t > 0, find the motion x(t), and identify the steady-periodic
and transient parts.
(a) m = 1, c = 4, k = 5, f (t) = 10 cos 3t.
Solution
(b) m = 1, c = 6, k = 13, f (t) = 29 sin 5t.
(c) m = 2, c = 2, k = 1, f (t) = 5 cos t.
7. For the given values of m, c, k, and f (t), find the steady-periodic solution in
amplitude-phase form A cos(ωt − φ).
(a) m = 1, c = 2, k = 5, f (t) = 26 cos 3t.
(b) m = 1, c = 2, k = 6, f (t) = 5 sin 4t.
8. Let m = 1, c = 0.1, k = 16, and f (t) = 5 cos ωt where ω is to be determined.
For each of the values of ω below, calculate the amplitude of the steady-periodic
solution. Can you account for the dramatically different numbers?
(a) ω = 2,    (b) ω = 4,    (c) ω = 6.
2.7 Electrical Circuits
In this section we consider another application of second-order differential equations,
namely to an electrical circuit involving a resistor, an inductor, and a capacitor. If we
let vR , vL , and vC denote the respective voltage changes across these three components,
then Kirchhoff's circuit laws tell us
vR + vL + vC = v(t),    (2.68)
where v(t) is the (possibly) time-varying voltage provided by an electrical source in the
circuit. The quantities vR , vL , and vC can be expressed in terms of the electric charge
on the capacitor q(t), which is measured in coulombs (C), and the electric current in
the circuit i(t), which is measured in amperes (A):
Fig.1. An RLC Circuit.
• According to Ohm’s Law, the voltage drop across the resistor is proportional to
the current. If we denote this proportionality constant by R, measured in ohms
(Ω), then we obtain
vR = R i.
• A capacitor stores charge and opposes the passage of current. The resultant
difference in voltage across the capacitor is proportional to the charge, and it is
customary to denote this proportionality constant by 1/C, where C is measured
in farads (F). Consequently, we have
vC = (1/C) q.
• An inductor stores energy and opposes the passage of current. The resultant
difference in voltage across the inductor is proportional to the rate of change of
the current; if we denote this constant by L, measured in henrys (H), then we
have
vL = L di/dt.
Substituting these relations into Kirchhoff's voltage law (2.68), we obtain

L di/dt + R i + (1/C) q = v(t).
But the current is just the rate of change of the charge, so
i(t) = dq/dt,    (2.69)
and we obtain the second-order linear nonhomogeneous equation for q:
L d²q/dt² + R dq/dt + (1/C) q = v(t).    (2.70)
In applications, the current is usually more important than the charge, but if we solve
(2.70) with initial conditions
q(0) = q0    and    i(0) = i0    (2.71)
to find q(t), then we can find i(t) using (2.69).
If we compare (2.70) with (2.56), we see that there is a mechanical-electric analogue
provided by the following table:
LRC Circuit      Spring-Mass System
q(t)             x(t)
L                m
R                c
1/C              k
v(t)             f(t)
To apply this analogue, let us consider the homogeneous equation associated with (2.70):

L d²q/dt² + R dq/dt + (1/C) q = 0.    (2.72)
We find the characteristic equation is L r² + R r + C⁻¹ = 0, which has roots

r = [−R ± √(R² − 4L/C)] / (2L).

Analogous to mechanical vibrations, we have the three cases:

• R² > 4L/C is overdamped and the general solution of (2.72) is

q(t) = e^{−Rt/2L} (c1 e^{µt} + c2 e^{−µt}),  where µ = √(R² − 4L/C) / (2L).

• R² = 4L/C is critically damped and the general solution of (2.72) is

q(t) = e^{−Rt/2L} (c1 + c2 t).

• R² < 4L/C is underdamped and the general solution of (2.72) is

q(t) = e^{−Rt/2L} (c1 cos µt + c2 sin µt),  where µ = √(4L/C − R²) / (2L).

In particular, if R = 0 then µ simplifies to 1/√(LC); we generally write

ω0 = 1/√(LC),
as this corresponds to the natural frequency of the “undamped circuit.”
Recall that the general solution of (2.70) is given by
q(t) = qp (t) + qc (t),
where qp is a particular solution of (2.70) and qc is the general solution of (2.72). Since
lim_{t→∞} qc(t) = 0,
we call qc the transient part of the solution. To determine a particular solution qp of
(2.70), we can use undetermined coefficients, depending on the form of v(t). The two
most common cases are when v(t) ≡ v0 is a constant (this corresponds to direct current
such as supplied by a battery) or v(t) is periodic (this corresponds to alternating
current, such as household power). We consider the periodic case first.
Let us assume our periodic electric source is v(t) = V0 cos ωt, where V0 is the amplitude and ω is the frequency, so (2.70) becomes
L d²q/dt² + R dq/dt + (1/C) q = V0 cos ωt.    (2.73)
But we used undetermined coefficients to find a particular solution for (2.73) under the
mechanical vibration analogue, so we can simply avail ourselves of the formula (2.67),
and make the appropriate substitutions to obtain
qp(t) = [V0 / √((C⁻¹ − Lω²)² + R²ω²)] cos(ωt − φ),    (2.74)
where 0 < φ < π is the phase shift given by (2.66). Since q(t) → qp (t) as t → ∞, we
call qp the steady-periodic charge and denote it by qsp , just as we did for mechanical
vibrations. By differentiating qsp we obtain the steady-periodic current:
isp(t) = [V0 / √((1/(Cω) − Lω)² + R²)] sin(ωt − φ − π),    (2.75)

where we have used −sin(θ) = sin(θ − π) to remove the negative sign that results from differentiating the cosine.
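Formulas (2.74) and (2.75) are convenient to evaluate by computer. Here is a minimal Python sketch (assuming only the two formulas above); the circuit values are those of Example 1 below, so the printed amplitudes can be compared with the example:

```python
import numpy as np

def steady_periodic_amplitudes(L, R, C, V0, omega):
    """Amplitudes of q_sp from (2.74) and of i_sp from (2.75)."""
    q_amp = V0 / np.sqrt((1 / C - L * omega**2) ** 2 + (R * omega) ** 2)
    i_amp = V0 / np.sqrt((1 / (C * omega) - L * omega) ** 2 + R ** 2)
    return q_amp, i_amp

# R = 3 ohms, C = 5e-3 F, L = 1e-2 H, 110 V at 60 Hz (omega = 120*pi)
q_amp, i_amp = steady_periodic_amplitudes(1e-2, 3, 5e-3, 110, 2 * np.pi * 60)
print(q_amp, i_amp)        # roughly 6.6e-2 and 24.9
```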
Example 1. Suppose a resistor of 3 ohms, a capacitor of 5 × 10⁻³ farads, and an inductor of 10⁻² henrys are connected in an electric circuit with a 110 volt, 60 Hz alternating current generator. Find the steady-periodic current isp(t).

Solution. We have R = 3, C = 5 × 10⁻³, and L = 10⁻². Frequency 60 Hz means ω = (2π)(60) = 120π, so the differential equation for q(t) is

10⁻² q'' + 3q' + 200 q = 110 cos(120π t).
We first find qsp. We can plug our values for L, R, and C into (2.74) to obtain

qsp(t) = [110 / √((200 − 10⁻² ω²)² + 9 ω²)] cos(ωt − φ),    ω = 120π,

but to obtain the phase shift φ we need to appeal again to our mechanical vibration analogue: making the appropriate substitutions into (2.65), we find φ satisfies

cos φ = (200 − 10⁻² ω²) / √((200 − 10⁻² ω²)² + 9 ω²)    and    sin φ = 3ω / √((200 − 10⁻² ω²)² + 9 ω²).

Since 200 − 10⁻² ω² = 200 − 10⁻²(120π)² < 0, we see that cos φ < 0, so φ is in the second quadrant, and we compute

φ = tan⁻¹( 3ω / (200 − 10⁻² ω²) ) + π ≈ 2.39.

Evaluating the amplitude in qsp, we can write

qsp(t) ≈ 6.61 × 10⁻² cos(120π t − 2.39).

This is the steady-periodic charge, but to find the steady-periodic current we need to differentiate. Since 120π × 6.61 × 10⁻² ≈ 24.9 and −sin(θ) = sin(θ − π), we obtain

isp(t) ≈ −24.9 sin(120π t − 2.39) = 24.9 sin(120π t − 5.53).
In the previous problem, we did not need (and were not given) initial conditions.
But if we had been given initial conditions, then we could have used them to find the
solution q(t) satisfying the initial conditions. Since q(t) = qp (t) + qc (t), this means we
would also have found the transient part of the solution, namely qc (t). Some examples of
this occur in the Exercises. But now let us consider a problem involving direct current.
Example 2. Suppose the circuit in the previous example is disconnected from the
alternating current generator, and all the charge is allowed to decay. At t = 0, a
battery supplying a constant power of 110 volts is attached. Find the charge q(t) and
the current i(t).
Solution. At t = 0 there is no charge (q(0) = 0) and no current (i(0) = q 0 (0) = 0), so
our initial-value problem for q(t) is
10⁻² q'' + 3q' + 200 q = 110,    q(0) = 0 = q'(0).    (2.76)
This is a nonhomogeneous equation that we can easily solve to find q(t) and then
differentiate to find i(t). But if we differentiate the equation and use q'' = i' and q''' = i'', we obtain a homogeneous equation for i(t), namely
10⁻² i'' + 3i' + 200 i = 0.
However, we need initial conditions to solve this equation. We certainly have i(0) = q'(0) = 0, but what about i'(0)? We simply evaluate 10⁻² i' + 3i + 200q = 110 at t = 0 to obtain

10⁻² i'(0) + 3 i(0) + 200 q(0) = 110    ⇒    i'(0) = 1.1 × 10⁴.
Consequently, our initial-value problem for i(t) becomes
10⁻² i'' + 3 i' + 200 i = 0,    i(0) = 0,  i'(0) = 1.1 × 10⁴.
To solve this initial-value problem, we consider the characteristic equation:
r² + 300 r + 2 × 10⁴ = 0    ⇒    r = [−300 ± √(9 × 10⁴ − 8 × 10⁴)] / 2 = −100, −200.
We see the circuit is overdamped and the general solution is
i(t) = c1 e^{−100t} + c2 e^{−200t}.
We next use the initial conditions to evaluate c1 and c2 : c1 = 1.1 × 102 , c2 = −1.1 × 102 .
So the current in the circuit is
i(t) = 1.1 × 10² (e^{−100t} − e^{−200t}).
In particular, we see that the current decays rapidly to zero even though there is a
constant electric source.
Now we can simply integrate i(t) to find
q(t) = −1.1 e^{−100t} + 0.55 e^{−200t} + c,
and then use q(0) = 0 to obtain c = 0.55. We conclude that the charge is given by
q(t) = 0.55 − 1.1 e^{−100t} + 0.55 e^{−200t}.
Note that q(t) → 0.55 as t → ∞, so it does not decay to zero. (Of course, q(t) could also have been found directly by solving the nonhomogeneous equation (2.76).)
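The charge found in Example 2 can be double-checked symbolically. A sympy sketch (illustrative only, not part of the text):

```python
import sympy as sp

t = sp.symbols('t', positive=True)
q = sp.Function('q')

# Initial-value problem (2.76): 10^{-2} q'' + 3 q' + 200 q = 110, q(0) = q'(0) = 0
ode = sp.Eq(sp.Rational(1, 100) * q(t).diff(t, 2) + 3 * q(t).diff(t) + 200 * q(t), 110)
sol = sp.dsolve(ode, q(t), ics={q(0): 0, q(t).diff(t).subs(t, 0): 0})

# Expect q(t) = 0.55 - 1.1 e^{-100 t} + 0.55 e^{-200 t}, as found in the example.
print(sol)
```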
Electrical Resonance
Resonance can occur in electrical circuits in the same way that it occurs in mechanical vibrations, except that we need to specify whether we are considering q(t) or i(t). In the undamped case (R = 0), we see that true resonance occurs at ω0 = 1/√(LC); this can be found by making the denominator in either (2.74) or (2.75) equal to zero, or by considering the roots of the characteristic equation L r² + C⁻¹ = 0 that is associated
with (2.73). Moreover, we see that practical resonance for qsp (t) can occur if the
denominator in (2.74) is small; as for mechanical vibrations, this requires R to be
small and ω close to ω0 , but the resonant frequency ω ∗ is that which minimizes the
denominator in (2.74). We can also consider practical resonance for isp : it occurs at
the frequency ω # that minimizes the denominator in (2.75). It is interesting that the
resonant frequencies ω ∗ and ω # are not exactly the same (see Exercises 1 and 2).
Recall from Section 2.6 that resonance in mechanical vibrations can be a very destructive phenomenon. However, in electrical circuits, resonance is used in some very
useful applications. One example is tuning a radio. To tune in a radio station broadcasting with frequency ω, we can adjust
the capacitor C so that ω coincides with the resonant frequency of the circuit 1/√(LC), i.e. we let C = 1/(Lω²). This maximizes the
amplitude of isp , i.e. the volume of the sound of that station, enabling us to hear it
above stations broadcasting at other frequencies.
Exercises
1. Find the frequency ω ∗ in the electric source that will induce practical resonance
for the steady-periodic charge qsp (t) in (2.74).
2. Find the frequency ω # in the electric source that will induce practical resonance
for the steady-periodic current isp (t).
3. For the following LRC circuits with periodic electric source v(t), find the steady-periodic current in the form isp(t) = I0 sin(ωt − δ), where I0 > 0 and 0 ≤ δ < 2π.
(a) R = 30, L = 10, C = 0.02, v(t) = 50 cos 2t.
(b) R = 20, L = 10, C = 0.01, v(t) = 200 cos 5t
(c) R = 3/2, L = 1/2, C = 2/3, v(t) = 13 cos 3t
4. For the following LRC circuits with electric source v(t), assume the charge and
current are initially zero, and find the current i(t) for t > 0. Is the circuit overdamped, critically damped, or underdamped?
(a) L = 1, R = 2, C = 1/5, v(t) = 10.
(b) L = 2, R = 60, C = 0.0025, v(t) = 100e−5t , q(0) = 1, i(0) = 0.
(c) L = 10, R = 20, C = 0.01, v(t) = 130 cos 2t, q(0) = 0, i(0) = 0.
(d) L = 1/2, R = 3/2, C = 1, v(t) = 5 cos 2t, q(0) = 1, i(0) = 0
2.8 Additional Exercises
1. An object of unknown mass m stretches a vertical spring 2 feet. Then the object
is given an upward velocity of 3 ft/sec. Find the period of the resultant motion.
2. An object of unknown mass m is attached to a spring with spring constant k = 1
N/m. If the mass is pulled 1 m beyond its equilibrium position and given velocity
1 m/sec back towards its equilibrium position, this results in an oscillation with
amplitude 3 m. Find the mass m.
3. Two springs S1 and S2 both have natural length 1 m, but different spring constants: k1 = 2 N/m and k2 = 3 N/m. Suppose the springs are attached to two
walls 1 m apart with a mass m = 1 kg between them.
(a) In the equilibrium position for this new arrangement, what are the lengths
of S1 and S2 ? (Neglect the “length” of the mass.)
(b) If the mass is displaced from the equilibrium position and released, find the
frequency of the resultant vibration.
4. Consider the springs S1 , S2 and mass m = 1 kg as in the previous exercise, but
now assume the two walls are 3 m apart. Answer the same questions (a) and (b).
5. A mass m = 1 kg is attached to a spring with constant k = 4 N/m and a dashpot
with variable damping coefficient c. If the mass is to be pulled 2 m beyond its
equilibrium (stretching the spring) and released with zero velocity, what value of
c ensures that the mass will pass through the equilibrium position and compress
the spring exactly 1 m before reversing direction?
6. A mass m = 1 kg is attached to a spring with constant k = 4 N/m and a dashpot exerting a resistive force proportional to velocity but with unknown coefficient c. The mass is pulled 1 m beyond its equilibrium, resulting in a motion with pseudo-frequency µ = √3. Find c.
7. Consider the homogeneous equation with nonconstant coefficients

y'' + (1/x) y' − (1/x²) y = 0    for x > 0.
(a) Assuming y = x^r, find two linearly independent solutions y1, y2.
(b) Use variation of parameters (see Exercise 3 in Section 2.5) to find a particular
solution of
y'' + (1/x) y' − (1/x²) y = √x    for x > 0.
8. Consider the homogeneous equation with nonconstant coefficients

y'' + (1/x) y' + (1/x²) y = 0    for x > 0.

(a) Find two linearly independent (real-valued) solutions y1, y2. (Hint: first try y = x^r.)
Fig.1. Exercise 3.
(b) Use variation of parameters to find a particular solution of

y'' + (1/x) y' − (1/x²) y = (ln x)/x²    for x > 0.
9. A car's suspension system of springs and shock absorbers may be modeled as a spring-mass-dashpot system (see Figure 2); irregularities in the road surface act as a forcing function. Suppose the mass of the car is 500 kg, the springs in the suspension system have constant k = 10⁴ N/m, and the shock absorbers have damping coefficient c = 10³ N/(m/sec). Suppose the road surface has periodic vertical displacements 0.5 cos(πx/5), where x is the position along the road.

(a) Show that y(t), the vertical displacement from the car's equilibrium position, satisfies

y'' + 2y' + 20y = 10 cos(πvt/5),

where v is the speed of the car.
Fig.2. Exercise 9.
(b) If the shock absorbers break, what speed of the car will induce resonance
along this road?
10. Let us model a tall building as a spring-mass system: a horizontal force F = kx is
required to displace the top floor of the building a distance x from its equilibrium
position. (The model treats the top floor as the mass and the rest of the building
as the spring.)
(a) Suppose the top floor has mass 10³ kg and a force of 500 N is required to displace it 1 m. If a wind exerts a periodic force f(t) = 100 cos(πt/2) N on the top floor, find the magnitude of the horizontal displacements.
(b) What frequency ω of the periodic wind force f (t) = 100 cos ωt will produce
resonance?
Fig.3. Exercise 10.
(c) If the building is initially in equilibrium and a wind force f(t) = 100 cos ωt with the resonant frequency ω found in (b) begins to act, how long will it take until the magnitude of the horizontal displacement of the top floor reaches 10 m?
11. Suppose that a pendulum of rod length L and mass m experiences a horizontal force f(t) as in Figure 8.

Fig.8. Forced Pendulum.

(a) Show that the nonlinear model (2.45) becomes

mL d²θ/dt² + mg sin θ = cos θ f(t).

(b) For small oscillations, show that this is well-approximated by the linear equation

mL d²θ/dt² + mg θ = f(t).
Chapter 3
Laplace Transform
3.1 Laplace Transform and Its Inverse
The basic idea of the Laplace transform is to replace a differential equation with an
algebraic equation. This is achieved by starting with a function f of the variable t,
and transforming the function into a new function F of a new variable s, in such a
way that derivatives in t are transformed into multiplication by s. As we shall see, this
is particularly useful when solving initial-value problems for differential equations that
involve discontinuous terms. Let us discuss the details.
Definition 1. If f(t) is defined for t ≥ 0, then its Laplace transform F(s), also denoted Lf(s) or L[f(t)], is defined by

F(s) = L[f(t)] = ∫₀^∞ e^{−st} f(t) dt,    (3.1)

for values of s for which the improper integral converges.

(The Laplace transform is named after the French mathematician Pierre Simon de Laplace (1749-1827).)
Recall that the improper integral is defined by

∫₀^∞ e^{−st} f(t) dt = lim_{N→∞} ∫₀^N e^{−st} f(t) dt,

and if the limit exists then we say that the improper integral converges. Notice that the integrand e^{−st} f(t) depends on s and t; it is typical that the improper integral converges for some values of s and not for others. Let us calculate some examples.
Example 1. If f(t) ≡ 1 for t ≥ 0, then for s ≠ 0 we calculate

F(s) = L[1] = ∫₀^∞ e^{−st} dt = lim_{N→∞} ∫₀^N e^{−st} dt = lim_{N→∞} [ −(1/s) e^{−st} ]₀^N = lim_{N→∞} ( 1/s − (1/s) e^{−sN} ).

Now, provided s > 0, we have (1/s) e^{−sN} → 0 as N → ∞, so we conclude

L[1] = 1/s    for s > 0.    (3.2)
Example 2. If f(t) = e^{at} for a real number a, then

F(s) = ∫₀^∞ e^{−st} e^{at} dt = ∫₀^∞ e^{(a−s)t} dt = [ e^{(a−s)t} / (a − s) ]₀^∞.

Now if s > a, then lim_{N→∞} e^{(a−s)N}/(a − s) = 0; and since evaluating e^{(a−s)t}/(a − s) at t = 0 gives 1/(a − s), we conclude

L[e^{at}] = 1/(s − a)    for s > a.    (3.3)

It is worth noting that (3.3) continues to hold if a is a complex number and we replace the condition s > a by s > Re(a).
As we do more examples, we shall gradually treat improper integrals less formally.
Example 3. If f(t) = t^n for a positive integer n, then we can use integration by parts to compute

F(s) = L[t^n] = ∫₀^∞ e^{−st} t^n dt = [ −e^{−st} t^n / s ]₀^∞ + (n/s) ∫₀^∞ e^{−st} t^{n−1} dt.

Provided s > 0, we can say lim_{t→∞} e^{−st} t^n = 0 since exponential decay is greater than polynomial growth, and evaluating e^{−st} t^n at t = 0 also gives zero. So

F(s) = (n/s) ∫₀^∞ e^{−st} t^{n−1} dt.

We have not reached our answer yet, but we have made progress: the power of t has been reduced from n to n − 1. This suggests that we apply integration by parts iteratively until we reduce to the Laplace transform of t⁰ = 1, which we know from Example 1:

(n/s) ∫₀^∞ e^{−st} t^{n−1} dt = (n/s)((n−1)/s) ∫₀^∞ e^{−st} t^{n−2} dt = · · · = (n!/s^n) ∫₀^∞ e^{−st} dt = n!/s^{n+1}.

We conclude

L[t^n] = n!/s^{n+1}    for s > 0.    (3.4)
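The transforms derived so far can be reproduced with a computer algebra system. A brief sympy sketch (illustrative only; concrete functions are used so the output matches (3.2)-(3.6) directly):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

for f in (sp.S.One, sp.exp(2 * t), t**3, sp.sin(3 * t), sp.cos(3 * t)):
    print(f, '->', sp.laplace_transform(f, t, s, noconds=True))
# Expected: 1 -> 1/s,  exp(2t) -> 1/(s - 2),  t**3 -> 6/s**4,
#           sin(3t) -> 3/(s**2 + 9),  cos(3t) -> s/(s**2 + 9)
```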
Example 4. If f(t) = sin bt for a real number b ≠ 0, then we can use a table of integrals (cf. Appendix C) to compute

F(s) = ∫₀^∞ e^{−st} sin bt dt = [ e^{−st} (−s sin bt − b cos bt) / (s² + b²) ]₀^∞ = b / (s² + b²)    for s > 0.

We have shown

L[sin bt] = b / (s² + b²)    for s > 0.    (3.5)

A similar calculation shows

L[cos bt] = s / (s² + b²)    for s > 0.    (3.6)
The next example generalizes Example 3.
Example 5. Suppose f(t) = t^a for a real number a > −1. Then we use the substitution u = st (so du = s dt) to evaluate the Laplace transform:

L[t^a] = ∫₀^∞ e^{−st} t^a dt = ∫₀^∞ e^{−u} (u^a/s^a) (du/s) = (1/s^{a+1}) ∫₀^∞ e^{−u} u^a du.

We conclude that

L[t^a] = Γ(a + 1) / s^{a+1}.    (3.7)
In (3.7) we have used the gamma function Γ(x) that is defined for x > 0 by

Γ(x) = ∫₀^∞ e^{−t} t^{x−1} dt.    (3.8)
The gamma function arises in several fields of mathematics, including probability and statistics. Some of its important properties include

Γ(1) = 1,    (3.9a)
Γ(x + 1) = x Γ(x),    (3.9b)
Γ(1/2) = √π.    (3.9c)

(These properties are not difficult to check: see Exercise 1.) Using (3.9b) we see that Γ(n + 1) = n!, so if a = n then (3.7) agrees with (3.4).
Now let us discuss the linearity of the Laplace transform. The next result follows
immediately from the linearity of the integral in the definition of the Laplace transform.
Theorem 1. If f (t) and g(t) are defined for t ≥ 0, and their Laplace transforms F (s)
and G(s) are defined for s > a, then the Laplace transform of f (t) + g(t) exists for s > a
and equals F (s) + G(s). In other words,
L[f (t) + g(t)] = L[f (t)] + L[g(t)].
Moreover, if λ is a real or complex number, then
L[λf (t)] = λL[f (t)].
This result enables us to compute the Laplace transform of more complicated-looking
functions.
Example 6. If f(t) = 2t³ + 5e^{−t} − 4 sin 2t, then the Laplace transform is

F(s) = L[f(t)] = 2L[t³] + 5L[e^{−t}] − 4L[sin 2t] = 2(3!/s⁴) + 5(1/(s + 1)) − 4(2/(s² + 4)) = 12/s⁴ + 5/(s + 1) − 8/(s² + 4)    for s > 0.
Note: There is a table of frequently encountered Laplace transforms in Appendix D.
Existence of the Laplace Transform
We have defined the Laplace transform and computed it for several examples. However, we have not discussed conditions on f (t) that are sufficient to guarantee that its
Laplace transform F (s) exists, at least for sufficiently large s. We do this now.
For the Laplace transform to exist, we need the improper integral in (3.1) to converge.
We can allow f to be discontinuous, but not too wild.
Fig.1. A piecewise continuous function.
Definition 2. A function f defined on a closed interval [a, b] is piecewise continuous
if [a, b] can be divided into a finite number of subintervals such that
(a) f is continuous on each subinterval, and
(b) f has finite limits (from within) at the endpoints of each subinterval.
A function is piecewise continuous on [0, ∞) if it is piecewise continuous on [0, b]
for each finite value b > 0.
An important example of a piecewise continuous function is the unit step function

u(t) = { 0, for t < 0;  1, for t ≥ 0 }.    (3.10)
Notice that u is continuous on the subinterval (−∞, 0) with a finite limit (from below) at the endpoint 0:

u(0−) = lim_{t→0−} u(t) = 0.

And u is continuous on the subinterval [0, ∞) with a finite limit (from above) at the endpoint 0:

u(0+) = lim_{t→0+} u(t) = 1.

Fig.2. Graph of u(t − a).
The discontinuity of u at t = 0 is called a finite jump discontinuity; this is exactly
the kind of discontinuity allowed for piecewise continuous functions. We can also shift
the discontinuity from 0 to another point a:
ua(t) = u(t − a) = { 0, for t < a;  1, for t ≥ a }.    (3.11)
Example 7. If a > 0, then let us compute the Laplace transform of ua(t):

L[ua(t)] = ∫₀^∞ e^{−st} ua(t) dt = ∫_a^∞ e^{−st} dt = [ −e^{−st}/s ]_a^∞ = e^{−as}/s.    (3.12)
Even if f is continuous, the improper integral in (3.1) might diverge if f grows too
rapidly at infinity, so we need to impose some limit on its growth.
Definition 3. A function f defined on [0, ∞) is of exponential order c as t → ∞
(where c is a real number) if there exist positive constants M and T so that
|f(t)| ≤ M e^{ct}    for all t > T.    (3.13)
All of the functions that we have considered so far are certainly of exponential order as t → ∞; on the other hand, f(t) = e^{t²} grows too rapidly as t → ∞ for the Laplace transform to exist.
We can now state and prove our existence theorem for the Laplace transform.
Theorem 2. If f is piecewise continuous on [0, ∞) and of exponential order c as t → ∞,
then the Laplace transform F (s) = L[f (t)] exists for all s > c (where c is the constant
in (3.13)).
Proof. Assume s > c. By the comparison theorem of integral calculus, we need only show ∫₀^∞ g(t) dt < ∞ for some positive function g satisfying e^{−st} |f(t)| ≤ g(t) for t ≥ 0. Let g(t) = M̃ e^{(c−s)t} where M̃ is greater than or equal to the M in (3.13). Then we have e^{−st} |f(t)| ≤ g(t) for t ≥ T, and since f(t) is bounded on [0, T], we can increase M̃ if necessary to have

e^{−st} |f(t)| ≤ g(t) ≡ M̃ e^{(c−s)t}    for t ≥ 0.

Now we simply compute (recalling that s > c)

∫₀^∞ g(t) dt = [ M̃ e^{(c−s)t} / (c − s) ]₀^∞ = M̃ / (s − c) < ∞.
Inverse Laplace Transform
In applications, we also need to be able to recover f (t) from its Laplace transform
F (s): this is called the inverse Laplace transform and is denoted L−1 . Of course, we
need to make sure that f is uniquely determined by its transform F . While this need
not be true for piecewise continuous f (see Exercise 2), it is true if we assume that f is
continuous.
Theorem 3. Suppose that f (t) and g(t) are continuous on [0, ∞) and of exponential
order c as t → ∞. By Theorem 2, Laplace transforms F (s) and G(s) exist. If F (s) =
G(s) for s > c, then f (t) = g(t) for all t.
The proof of this theorem is not difficult, but for the details we refer to [2]. However, let
us comment on the significance of the assumption that f is continuous. Recall that we
want to use the Laplace transform to solve differential equations, and these differential
equations could have discontinuous terms which need to be transformed. (More will
be said about discontinuous inputs in Section 3.4.) However, the solution that we will
recover using the inverse Laplace transform is usually a continuous function, so the
assumption of continuity in Theorem 3 is no restriction for us.
Formulas for L can now be inverted to obtain formulas for L⁻¹:

L[t^n] = n!/s^{n+1}    ⇒    L⁻¹[1/s^{n+1}] = t^n/n!
L[e^{at}] = 1/(s − a)    ⇒    L⁻¹[1/(s − a)] = e^{at}
L[sin bt] = b/(s² + b²)    ⇒    L⁻¹[b/(s² + b²)] = sin bt
L[cos bt] = s/(s² + b²)    ⇒    L⁻¹[s/(s² + b²)] = cos bt
Moreover, the linearity of L means that L−1 is also linear, so the above formulas may
be used to handle more complicated expressions.
Example 8. Find the inverse Laplace transform of F(s) = 5/s³ + 1/(s² + 4).

Solution. We use linearity and then the above formulas:

    L^{−1}[F(s)] = (5/2) L^{−1}[ 2/s³ ] + (1/2) L^{−1}[ 2/(s² + 4) ] = (5/2) t² + (1/2) sin 2t.  □
Frequently we encounter functions F (s) that need to be simplified using a partial
fraction decomposition before we can apply the inverse Laplace transform. (For a review
of partial fraction decompositions, see Appendix B.)
Example 9. Find the inverse Laplace transform of F(s) = 1/(s(s + 1)).
Solution. We cannot find F(s) in our table, but we can use partial fractions to express it using terms that do appear in the table. Let us write

    1/(s(s + 1)) = A/s + B/(s + 1) = ((A + B)s + A)/(s(s + 1)).
This means A + B = 0 and A = 1, so B = −1. We now can take the inverse Laplace
transform:
    L^{−1}[ 1/(s(s + 1)) ] = L^{−1}[ 1/s − 1/(s + 1) ] = L^{−1}[ 1/s ] − L^{−1}[ 1/(s + 1) ] = 1 − e^{−t}.  □
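As a quick software check of Example 9 (a sketch assuming Python with SymPy, not part of the text), both the partial fraction step and the inverse transform can be reproduced:

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s')

    F = 1/(s*(s + 1))
    print(sp.apart(F, s))                          # should print 1/s - 1/(s + 1)
    print(sp.inverse_laplace_transform(F, s, t))   # should print 1 - exp(-t), valid for t >= 0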
Hyperbolic Sine and Cosine

The hyperbolic sine and cosine functions defined by

    sinh t = (e^t − e^{−t})/2   and   cosh t = (e^t + e^{−t})/2    (3.14)

[Fig.3. Graphs of cosh and sinh.]

share properties similar to those of sin and cos (see Exercise 3); their definitions are also reminiscent of the expressions for sin and cos in terms of exponentials (cf. (A.9) in Appendix A). However, our interest in sinh and cosh stems from the fact that their Laplace transforms are similar to those for sin and cos. In fact, using the definition of sinh bt and cosh bt in terms of e^{bt} and e^{−bt}, for which we know the Laplace transform, it is easy to verify (see Exercise 3c) the following:

    L[sinh bt] = b/(s² − b²)   and   L[cosh bt] = s/(s² − b²).    (3.15)
We add these formulas to the table in Appendix C since they can be useful in finding
inverse Laplace transforms.
Exercises
1. In this exercise we verify the gamma function’s properties (3.9).
(a) Verify directly that Γ(1) = 1.
(b) Use an integration by parts in Γ(x + 1) to obtain x Γ(x) when x > 0.
   (c) Use the substitution u = √t in the formula for Γ(1/2) and then apply the well-known Gaussian formula ∫_{−∞}^{∞} e^{−u²} du = √π to show Γ(1/2) = √π.
2. In Example 7, we showed that L[u_a(t)] = s^{−1} e^{−as}. Let ũ_a(t) be defined like u_a except with ũ_a(a) = 0 (whereas u_a(a) = 1). Compute L[ũ_a(t)]. Why does this not contradict Theorem 3?
3. (a) Show that sinh(−t) = − sinh t and cosh(−t) = cosh t, i.e. sinh is an odd function and cosh is an even function.
   (b) Show that d/dt sinh t = cosh t and d/dt cosh t = sinh t. (In particular, you do not need to worry about the pesky minus sign that occurs in d/dt cos t = − sin t.)
   (c) Verify the Laplace transform formulas in (3.15).
4. Apply Definition 1 to find the Laplace transform of the following functions:
   (a) f(t) = e^{2t+1}
   (b) f(t) = t e^t
   (c) f(t) = e^t sin 2t   (Solution)
   (d) f(t) = { 1, if 0 ≤ t ≤ 1;  0, if t > 1 }   (Sol'n)
   (e) f(t) = { t, if 0 ≤ t ≤ 1;  0, if t > 1 }
5. Use the formulas derived in this section (or the table in Appendix C) to find the Laplace transform of the following functions:
   (a) f(t) = cos 3t
   (b) g(t) = 3t² − 5e^{3t}
   (c) h(t) = 2 cos 3t − 3 sin 2t
   (d) f(t) = t^{3/2}
   (e) g(t) = √t + 1
   (f) h(t) = (1 + t)³
   (g) f(t) = 3 sinh 2t + 2 sin 3t
   (h) g(t) = 3 u(t − π)
6. Find the inverse Laplace transform of the following functions:
   (a) F(s) = 2/(s − 2)
   (b) F(s) = 2/(s√s)
   (c) F(s) = 1/(s² − 4)
   (d) F(s) = (2s − 3)/(s² + 4)
   (e) F(s) = (3s − 2)/(s² − 16)
   (f) F(s) = e^{−2s}/(3s)
7. Use a partial fraction decomposition in order to find the inverse Laplace transform:
   (a) F(s) = 1/(s(s + 2))
   (b) F(s) = (s − 1)/((s + 1)(s² + 1))   (Solution)
   (c) F(s) = −2/(s²(s − 2))
   (d) F(s) = s/((s + 1)(s + 2)(s + 3))
8. Using complex partial fraction decompositions (CPFD's, see the last subsection in Appendix B), we can avoid having to take the inverse Laplace transform of a rational function with an irreducible quadratic denominator. For example,

       b/(s² + b²) = b/((s + ib)(s − ib)) = A/(s + ib) + B/(s − ib) = (i/2) · 1/(s + ib) − (i/2) · 1/(s − ib),

   so

       L^{−1}[ b/(s² + b²) ] = (i/2) L^{−1}[ 1/(s + ib) ] − (i/2) L^{−1}[ 1/(s − ib) ]
                             = (i/2) e^{−ibt} − (i/2) e^{ibt} = (e^{ibt} − e^{−ibt})/(2i) = sin bt,

   where we used (3.3) with a = ±ib, and then (A.9) in the last step. Use a CPFD to find the inverse Laplace transform of the following:

   (a) F(s) = s/(s² + b²)
   (b) F(s) = 1/(s² + 2s + 2)
   (c) F(s) = 1/(s³ + 4s)
   (d) F(s) = (2s + 4)/(s² + 4s + 8)

3.2  Transforms of Derivatives, Initial-Value Problems
Recall that we want to use the Laplace transform to solve differential equations, so we need to know how to evaluate L[f′(t)]. To allow the possibility that f′(t) has jump discontinuities, we define a continuous function f to be piecewise differentiable on [a, b] if f′(t) exists at all but finitely many points and it is piecewise continuous on [a, b]; we say f is piecewise differentiable on [0, ∞) if it is piecewise differentiable on [0, b] for every finite b.

Theorem 1. Suppose f(t) is piecewise differentiable on [0, ∞) and of exponential order c as t → ∞. Then L[f′(t)](s) exists for s > c and is given by

    L[f′(t)] = s L[f(t)] − f(0) = s F(s) − f(0).    (3.16)
Proof. For simplicity, we assume that f′ is continuous on [0, ∞). We use integration by parts to evaluate L[f′(t)]:

    L[f′(t)] = ∫_0^∞ e^{−st} f′(t) dt = [ e^{−st} f(t) ]_{t=0}^{t=∞} − ∫_0^∞ (−s e^{−st}) f(t) dt.

But since f is of exponential order c as t → ∞, we have e^{−st}|f(t)| ≤ M e^{(c−s)t}, and this tends to zero as t → ∞ for s > c. Consequently,

    [ e^{−st} f(t) ]_{t=0}^{t=∞} = lim_{t→∞} e^{−st} f(t) − f(0) = 0 − f(0),

and −∫_0^∞ (−s e^{−st}) f(t) dt = ∫_0^∞ s e^{−st} f(t) dt = s L[f(t)], so we obtain (3.16).  □
Let us immediately apply this to an initial-value problem.
Example 1. Solve y′ + 3y = 1, y(0) = 1.

Solution. Let us denote the Laplace transform of y(t) by Y(s), i.e. L[y(t)] = Y(s). Using (3.16) we have

    L[y′(t)] = s Y(s) − y(0) = s Y(s) − 1.

So, taking the Laplace transform of the differential equation and solving for Y(s) yields

    s Y + 3Y − 1 = 1/s   ⇒   Y(s) = 1/(s + 3) + 1/(s(s + 3)).

Before we can take the inverse Laplace transform, we need to perform a partial fraction decomposition on the last term:

    1/(s(s + 3)) = (1/3)/s + (−1/3)/(s + 3).

Using this in Y(s) and then taking the inverse Laplace transform, we find our solution:

    Y(s) = (1/3)/s + (2/3)/(s + 3)   ⇒   y(t) = 1/3 + (2/3) e^{−3t}.  □
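As a brief software cross-check of Example 1 (a sketch assuming Python with SymPy, which is not part of this text), one can invert the Y(s) found above and also solve the initial-value problem directly:

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s')
    y = sp.Function('y')

    Y = sp.Rational(1, 3)/s + sp.Rational(2, 3)/(s + 3)
    print(sp.inverse_laplace_transform(Y, s, t))        # should print 1/3 + 2*exp(-3*t)/3

    ode = sp.Eq(y(t).diff(t) + 3*y(t), 1)
    print(sp.dsolve(ode, y(t), ics={y(0): 1}))          # should agree with the answer above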
Of course, we would also like to use the Laplace transform to solve second-order (and possibly higher-order) differential equations, so we want to evaluate L[f″(t)]. But if we assume that f and f′ are both piecewise differentiable and of exponential type on [0, ∞), then we can apply (3.16) twice, once with f′ in place of f, and once as it stands:

    L[f″(t)] = s L[f′(t)] − f′(0) = s ( s L[f(t)] − f(0) ) − f′(0).

Let us record the result as

    L[f″(t)] = s² F(s) − s f(0) − f′(0).    (3.17)

We leave as Exercise 1 the generalization of (3.17) to n-th order derivatives.
We can apply this to initial-value problems.
Example 2. Solve y″ + 4y = 1, y(0) = 1, y′(0) = 3.

Solution. We apply the Laplace transform to the equation to obtain

    s² Y(s) − s − 3 + 4Y = 1/s.

Next we solve for Y(s):

    (s² + 4) Y(s) = 1/s + s + 3 = (s² + 3s + 1)/s   ⇒   Y(s) = (s² + 3s + 1)/(s(s² + 4)).

We now find the partial fraction decomposition for Y(s):

    Y(s) = (1/4)/s + ((3/4)s + 3)/(s² + 4) = (1/4)·(1/s) + (3/4)·s/(s² + 4) + (3/2)·2/(s² + 4).

In this last form, we can easily apply the inverse Laplace transform to obtain our solution:

    y(t) = 1/4 + (3/4) cos 2t + (3/2) sin 2t.  □
We can, of course, use the Laplace transform to solve initial-value problems for
mass-spring-dashpot systems as in Sections 2.4 and 2.6.
Example 3. Suppose a mass m = 1 is attached to a spring with constant k = 9, and subject to an external force of f(t) = sin 2t; ignore friction. If the spring is initially at equilibrium (x₀ = 0 = v₀), then use the Laplace transform to find the motion x(t).

Solution. The initial-value problem that we must solve is

    x″ + 9x = sin 2t,   x(0) = 0 = x′(0).

If we take the Laplace transform of this equation and let X(s) = L[x(t)], we obtain

    s² X(s) + 9X(s) = 2/(s² + 4).

Solving for X(s) and using a partial fraction decomposition, we obtain

    X(s) = 2/((s² + 4)(s² + 9)) = (2/5)/(s² + 4) − (2/5)/(s² + 9) = (1/5)·2/(s² + 4) − (2/15)·3/(s² + 9).

In this last form it is easy to apply the inverse Laplace transform to obtain

    x(t) = (1/5) sin 2t − (2/15) sin 3t.  □
We now illustrate one more use of Theorem 1, namely providing a short-cut in the
calculation of some Laplace transforms.
Example 4. Show that

    L[t sin bt] = 2bs/(s² + b²)².    (3.18)

Solution. Let f(t) = t sin bt, so we want to find F(s). Let us differentiate f(t) twice:

    f′(t) = sin bt + bt cos bt,   f″(t) = 2b cos bt − b² t sin bt,

and observe that f(0) = 0 = f′(0). Now we can apply the Laplace transform to f″(t) to conclude

    s² F(s) = 2b · s/(s² + b²) − b² F(s).

Finally, we solve this for F(s) and obtain (3.18).  □
Derivatives of Transforms
Theorem 1 shows that differentiation of f with respect to t corresponds under L to
multiplication of F by s (at least when f (0) = 0). We might wonder if differentiation
of F with respect to s corresponds to multiplication of f by t. We can check this
by differentiating (3.1) with respect to s. If we are allowed to differentiate under the
integral sign, we obtain
    F′(s) = d/ds ∫_0^∞ e^{−st} f(t) dt = ∫_0^∞ ∂/∂s [ e^{−st} f(t) ] dt = −∫_0^∞ e^{−st} [ t f(t) ] dt = −L[t f(t)],
so we see that our expectation was off by a minus sign! The differentiation under the
integral sign is justified provided the improper integral converges “uniformly” (see, for
example, [2]), so we conclude the following:
Theorem 2. Suppose f(t) is piecewise continuous on [0, ∞) and of exponential order c as t → ∞. Then, for s > c, its Laplace transform F(s) satisfies

    F′(s) = −L[t f(t)].

Repeating the differentiation multiple times yields the following useful formula:

    L[t^n f(t)] = (−1)^n F^{(n)}(s),    (3.19)
where n is a positive integer. In fact, with n = 1 and f (t) = sin bt, we see that (3.18) is
also a consequence of this formula. Let us consider another example.
Example 5. Show that

    L[t e^{at}] = 1/(s − a)²   for s > a.    (3.20)

Solution. Letting f(t) = e^{at}, we have F(s) = 1/(s − a) for s > a, and so (3.19) implies

    L[t e^{at}] = −d/ds [ 1/(s − a) ] = 1/(s − a)².  □
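Theorem 2 is easy to illustrate in software: differentiating a known transform reproduces (3.18) and (3.20). Here is a minimal sketch, assuming Python with SymPy (not part of the text).

    import sympy as sp

    s, b, a = sp.symbols('s b a', positive=True)

    F_sin = b/(s**2 + b**2)                    # L[sin bt]
    print(sp.simplify(-sp.diff(F_sin, s)))     # should print 2*b*s/(s**2 + b**2)**2, i.e. L[t sin bt]

    F_exp = 1/(s - a)                          # L[e^{at}] (for s > a)
    print(sp.simplify(-sp.diff(F_exp, s)))     # should print (s - a)**(-2), i.e. L[t e^{at}]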
Exercises
1. If f, f′, …, f^{(n−1)} are piecewise differentiable functions of exponential type on [0, ∞), show that the Laplace transform of the nth-order derivative is given by

       L[f^{(n)}(t)] = s^n F(s) − s^{n−1} f(0) − s^{n−2} f′(0) − ··· − f^{(n−1)}(0).
2. Use the Laplace transform to solve the following initial-value problems for first-order equations.
   (a) y′ + y = e^t,        y(0) = 2.
   (b) y′ + 3y = 2e^{−t},   y(0) = 1.
   (c) y′ − y = 6 cos t,    y(0) = 2.
3. Use the Laplace transform to solve the following initial-value problems for second-order equations.
   (a) y″ + 4y = 0,              y(0) = −3, y′(0) = 5.
   (b) y″ + y′ − 2y = 0,         y(0) = 1, y′(0) = −1.
   (c) y″ − y = 12 e^{2t},       y(0) = 1 = y′(0).   (Solution)
   (d) y″ − y = 8 cos t,         y(0) = 0 = y′(0).
   (e) y″ − 5y′ + 6y = 0,        y(0) = 1, y′(0) = −1.
   (f) y″ + 4y′ + 3y = cosh 2t,  y(0) = 0 = y′(0).
4. Use the Laplace transform to find the motion x(t) of a mass-spring-dashpot system
with m = 1, c = 6, k = 8, x0 = 0, and v0 = 2.
5. Use the Laplace transform to find the motion x(t) of a forced mass-spring-dashpot system with m = 1, c = 5, k = 6, and F(t) = cos t. Assume the mass is initially at equilibrium.
6. Use the technique of Example 4 or Theorem 2 to take the Laplace transform of the following functions:
   (a) t² e^{at}      (b) t² sin bt      (c) t cos bt      (d) t u₁(t)

3.3  Shifting Theorems
Suppose f(t) has a Laplace transform F(s). What is the Laplace transform of e^{at} f(t)? The answer is found by a simple calculation:

    L[e^{at} f(t)] = ∫_0^∞ e^{−st} e^{at} f(t) dt = ∫_0^∞ e^{−(s−a)t} f(t) dt = F(s − a).

Now the function F(s − a) is just the function F(s) "shifted" or "translated" by the number a, so we see that multiplication of f by e^{at} corresponds to shifting F(s) by a. We summarize this as our first "shifting theorem":

Theorem 1. Suppose f has Laplace transform F(s) defined for s > c, and a is a real number. Then e^{at} f(t) has Laplace transform defined for s > a + c by

    L[e^{at} f(t)] = F(s − a).    (3.21)
A rather trivial application of this theorem shows

    L[1] = 1/s   ⇒   L[e^{at}] = L[e^{at} · 1] = 1/(s − a),

but of course, we already knew this. More importantly, we obtain the following Laplace transforms that we did not know before:

    L[e^{at} t^n] = n!/(s − a)^{n+1}              (s > a)    (3.22a)
    L[e^{at} cos bt] = (s − a)/((s − a)² + b²)    (s > a)    (3.22b)
    L[e^{at} sin bt] = b/((s − a)² + b²)          (s > a)    (3.22c)
Example 1. Find the inverse Laplace transform of the following functions:

    F(s) = 2/(s − 2)³,        G(s) = (s + 2)/(s² + 2s + 5).
Solution. For L^{−1}[F(s)] we can apply (3.22a) with a = 2 = n; but to see how Theorem 1 applies in this case, observe that L[t²] = 2/s³ implies L[e^{2t} t²] = 2/(s − 2)³, and hence

    f(t) = L^{−1}[ 2/(s − 2)³ ] = e^{2t} t².

Now let us consider G(s). Since the denominator does not factor into linear factors, we cannot use a partial fraction decomposition. However, we can "complete the square" to write it as

    s² + 2s + 5 = (s + 1)² + 4 = (s + 1)² + 2².

Hence

    G(s) = (s + 2)/(s² + 2s + 5) = ((s + 1) + 1)/((s + 1)² + 2²) = (s + 1)/((s + 1)² + 2²) + (1/2)·2/((s + 1)² + 2²).

In this form we can easily use Theorem 1 to take the inverse Laplace transform to conclude

    g(t) = L^{−1}[ (s + 1)/((s + 1)² + 2²) ] + (1/2) L^{−1}[ 2/((s + 1)² + 2²) ] = e^{−t} cos 2t + (1/2) e^{−t} sin 2t.  □
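One way to check the answer for G(s) is to transform the claimed g(t) forward and compare; here is a brief sketch assuming Python with SymPy (not part of the text).

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s', positive=True)

    g = sp.exp(-t)*sp.cos(2*t) + sp.exp(-t)*sp.sin(2*t)/2
    G = sp.laplace_transform(g, t, s, noconds=True)
    print(sp.simplify(G - (s + 2)/(s**2 + 2*s + 5)))   # should print 0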
Theorem 1 is frequently useful in solving initial-value problems.
Example 2. Solve y″ + 6y′ + 34y = 0, y(0) = 3, y′(0) = 1.

Solution. We take the Laplace transform of the equation and use the initial conditions to obtain

    (s² Y − 3s − 1) + 6(sY − 3) + 34Y = 0.

We can rearrange this and solve for Y to find

    Y(s) = (3s + 19)/(s² + 6s + 34).

The denominator does not factor, but we can complete the square to write it as

    s² + 6s + 34 = s² + 6s + 9 + 25 = (s + 3)² + 5².

Using this and rearranging the numerator yields

    Y(s) = (3s + 19)/((s + 3)² + 5²) = 3·(s + 3)/((s + 3)² + 5²) + 2·5/((s + 3)² + 5²).

Finally, we can apply (3.22) to obtain our solution

    y(t) = 3 e^{−3t} cos 5t + 2 e^{−3t} sin 5t.  □
Since multiplication by the exponential function e^{at} corresponds to a shift in the Laplace transform, it is natural to wonder whether multiplication of the Laplace transform by an exponential corresponds to a shift in the original function. This is essentially true, but there is an additional subtlety due to the fact that the original function is only assumed defined for t ≥ 0. In fact, we know L[1] = 1/s, and in Example 7 in Section 3.1 we found that L[u_a(t)] = e^{−as}/s. So multiplication of L[1] by e^{−as} does not correspond to a simple shift of the function 1 (which would have no effect), but corresponds to multiplication of 1 by u_a(t) = u(t − a). This observation is generalized in our second shifting theorem:
Theorem 2. Suppose f(t) has Laplace transform F(s) defined for s > c, and suppose a > 0. Then

    L[u(t − a) f(t − a)] = e^{−as} F(s).    (3.23)

Proof. We simply compute, using the definition of u(t − a) and a change of the integration variable t → t̃ = t − a:

    L[u(t − a) f(t − a)] = ∫_0^∞ e^{−st} u(t − a) f(t − a) dt = ∫_a^∞ e^{−st} f(t − a) dt
                         = ∫_0^∞ e^{−s(t̃ + a)} f(t̃) dt̃ = e^{−sa} ∫_0^∞ e^{−st̃} f(t̃) dt̃ = e^{−sa} F(s).  □
Theorem 2 is useful in taking the Laplace transform of functions that are defined piecewise, provided they can be written in the form u(t − a)f (t − a).
Example 2. Find the Laplace transform of

    g(t) = { 0, if 0 ≤ t < 1;  t − 1, if t ≥ 1 }.

[Fig.1. Graph of g(t) in Example 2.]

Solution. If we let f(t) = t then g(t) = u(t − 1) f(t − 1). Since f(t) has Laplace transform F(s) = 1/s², we use Theorem 2 with a = 1 to conclude

    L[g(t)] = L[u(t − 1) f(t − 1)] = e^{−s} · 1/s².  □
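Because g(t) vanishes for t < 1, Example 2 can also be checked straight from the definition of L; a minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t = sp.Symbol('t')
    s = sp.Symbol('s', positive=True)

    # g(t) = t - 1 for t >= 1 and 0 otherwise, so the integral starts at t = 1.
    G = sp.integrate(sp.exp(-s*t)*(t - 1), (t, 1, sp.oo))
    print(sp.simplify(G))   # should print exp(-s)/s**2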
On the other hand, Theorem 2 is also useful in taking the inverse Laplace transform of
functions involving exponentials.
Example 3. Find the inverse Laplace transform of e^{−πs} s/(s² + 4).

Solution. If we let F(s) = s/(s² + 4), then its inverse Laplace transform is f(t) = cos 2t. Consequently, using Theorem 2 with a = π, we find

    L^{−1}[ e^{−πs} s/(s² + 4) ] = u(t − π) cos 2(t − π) = u(t − π) cos 2t,

where in the last step we used the fact that cosine has period 2π. The graph of this solution is shown in Figure 2.

[Fig.2. Graph of u(t − π) cos 2t.]  □
Now let us apply Theorem 2 to an initial-value problem.
Example 4. Solve x″ + 4x = 2u_π(t), x(0) = 0, x′(0) = 1.

Solution. We take the Laplace transform of the problem to obtain

    s² X(s) − 1 + 4X(s) = 2e^{−πs}/s.

Solving for X(s), we obtain

    X(s) = 2e^{−πs}/(s(s² + 4)) + 1/(s² + 4).

If we use partial fractions, we can write

    2/(s(s² + 4)) = A/s + (Bs + C)/(s² + 4) = (1/2)( 1/s − s/(s² + 4) ),

so X(s) becomes

    X(s) = (1/2) e^{−πs} ( 1/s − s/(s² + 4) ) + (1/2)·2/(s² + 4).

Now we can apply Theorem 2 to conclude

    x(t) = (1/2) u(t − π)(1 − cos 2(t − π)) + (1/2) sin 2t = (1/2) u(t − π)(1 − cos 2t) + (1/2) sin 2t,

where in the last step we used cos 2(t − π) = cos 2t.  □
Exercises
1. Use Theorem 1 to calculate the Laplace transform of the following functions:
   (a) t e^{2t}          (d) e^{−4t} sin 3t
   (b) t² e^{−3t}        (e) t (e^t + e^{−t})   (Solution)
   (c) e^{3t} cos 5t     (f) t³ e^t + e^{−t} cos 2t
2. Use Theorem 1 to find the inverse Laplace transform of the following functions:
   (a) F(s) = 3/(s − 2)³                 (c) F(s) = (s + 3)/(s² + 4s + 13)
   (b) F(s) = 1/(s² + 4s + 4)  (Solution)   (d) F(s) = 1/√(s + 3)
3. Solve the following initial-value problems involving homogeneous equations:
   (a) y″ − 4y′ + 8y = 0,   y(0) = 0, y′(0) = 1.
   (b) y″ + 6y′ + 10y = 0,  y(0) = 2, y′(0) = 1.
   (c) y″ − 6y′ + 25y = 0,  y(0) = 2, y′(0) = 3.   (Solution)
4. Solve the following initial-value problems involving nonhomogeneous equations:
   (a) y″ + 4y′ + 8y = e^{−t},   y(0) = 0 = y′(0)
   (b) y″ − 4y = 3 t e^t,        y(0) = 0 = y′(0)
   (c) y″ − y = 8 e^t sin 2t,    y(0) = 0 = y′(0)
5. Use Theorem 2 to find the Laplace transform of the following functions:
   (a) f(t) = { 0 for 0 ≤ t < π;  sin(t − π) for t ≥ π }
   (b) g(t) = { 0 for 0 ≤ t < 1;  e^t for t ≥ 1 }
6. Use Theorem 2 to find the inverse Laplace transforms of the following functions:
   (a) 2 e^{−s}/(s² + 9)   (Solution)
   (b) e^{−3s}/(s² + 2s + 5)
7. Use the Laplace transform to find the motion x(t) of a forced mass-spring-dashpot
system with m = 1, c = 4, k = 5, and F (t) = −20 cos 3t. Assume the mass is
initially at equilibrium.
8. Solve the initial-value problem x″ + 9x = 4 u_π(t) sin(t − π), x(0) = 0, x′(0) = 1.
3.4  Discontinuous Inputs
In this section we focus on differential equations involving discontinuous inputs. An example is the damped spring-mass-dashpot model with external forcing that we studied in Section 2.6,

    m d²x/dt² + c dx/dt + k x = f(t),    (3.24)

but now we allow the forcing term f(t) to be a discontinuous function. Such discontinuous forcing functions arise naturally in applications where the forcing may suddenly be turned on or off. In fact, we also want to consider systems in which the external force occurs in the form of an impulse, i.e. a very sudden and short input.
When f(t) is a piecewise continuous function, the first step is to express it using unit step functions; then we can use Theorem 2 in Section 3.3 to take the Laplace transform, and carry on with the solution. In this context, it is worth observing that if a > 0 and f(t) is a function that is defined for t ≥ 0, then multiplication by u(t − a) has the effect of "turning off the input" for 0 ≤ t < a:

    u(t − a) f(t) = { 0 for 0 ≤ t < a;  f(t) for t ≥ a }.

[Fig.1. Graph of u(t − a)f(t).]

On the other hand, multiplication by (1 − u(t − a)) has the effect of turning off the input for t ≥ a:

    (1 − u(t − a)) f(t) = { f(t) for 0 ≤ t < a;  0 for t ≥ a }.

[Fig.2. Graph of (1 − u(t − a))f(t).]

These observations can be useful in expressing a piecewise continuous function in terms of unit step functions.
Example 1. The function

    f(t) = { 0 for 0 ≤ t < 2π;  sin t for 2π ≤ t < 4π;  0 for t ≥ 4π }

can be thought of as sin t, but turned off for 0 ≤ t < 2π and also for t ≥ 4π. To turn off sin t for 0 ≤ t < 2π, we multiply by u(t − 2π) to obtain u(t − 2π) sin t. Now we multiply this by (1 − u(t − 4π)) in order to also turn it off for t ≥ 4π. The result is

    f(t) = (1 − u(t − 4π)) u(t − 2π) sin t = u(t − 2π) sin t − u(t − 4π) u(t − 2π) sin t
         = u(t − 2π) sin t − u(t − 4π) sin t,

where in the last step we have used the fact that u(t − 4π)u(t − 2π) = u(t − 4π). Since sine has period 2π, sin t = sin(t − 2π) = sin(t − 4π), so we can rewrite this as

    f(t) = u(t − 2π) sin(t − 2π) − u(t − 4π) sin(t − 4π).

The advantage of this last form of f(t) is that we are ready to apply (3.23) to compute its Laplace transform.  □
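Since this f(t) vanishes outside [2π, 4π), its transform can also be computed directly and compared with the answer (3.23) predicts. A minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t = sp.Symbol('t')
    s = sp.Symbol('s', positive=True)

    F_direct = sp.integrate(sp.exp(-s*t)*sp.sin(t), (t, 2*sp.pi, 4*sp.pi))
    F_shifted = sp.exp(-2*sp.pi*s)/(s**2 + 1) - sp.exp(-4*sp.pi*s)/(s**2 + 1)
    print(sp.simplify(F_direct - F_shifted))   # should print 0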
Now let us apply this technique to solving initial-value problems involving piecewise
continuous functions. We begin with a first-order equation.
Example 2. Solve the initial-value problem

    y′ − y = g(t),  y(0) = 0,   where g(t) = { cos t for 0 ≤ t < π;  0 for t ≥ π }.

Solution. We can express g(t) as (1 − u_π(t)) cos t, where by "turning off" cos t at t = π we have a jump discontinuity. Now we want to take the Laplace transform of g(t), but u_π(t) cos t is not in the form to apply (3.23). However, if we recall from trigonometry that cos(t − π) = − cos t, then we can write

    g(t) = cos t + u(t − π) cos(t − π).

This allows us to use (3.23) to compute the Laplace transform of g(t) and obtain

    G(s) = s/(s² + 1) + e^{−πs} s/(s² + 1).

We use this to take the Laplace transform of the equation and then solve for Y(s):

    Y(s) = s/((s − 1)(s² + 1)) · (1 + e^{−πs}).

We can use partial fractions to write

    s/((s − 1)(s² + 1)) = (1/2)( 1/(s − 1) − s/(s² + 1) + 1/(s² + 1) ),

so

    L^{−1}[ s/((s − 1)(s² + 1)) ] = (1/2)( e^t − cos t + sin t ).

If we use (3.23), we compute

    L^{−1}[ s e^{−πs}/((s − 1)(s² + 1)) ] = (1/2) u(t − π)( e^{t−π} − cos(t − π) + sin(t − π) ).

Using cos(t − π) = − cos t and sin(t − π) = − sin t, we conclude that our solution is given by

    y(t) = (1/2)(e^t − cos t + sin t) + (1/2) u(t − π)(e^{t−π} + cos t − sin t).  □

[Fig.3. Graph of f(t) in Example 1.]
Finally, we apply this to (3.24) when f is piecewise continuous.
Example 3. Suppose a mass of 1 kg is attached to a spring with constant k = 4. The mass is initially in equilibrium, but for 0 ≤ t < 2π experiences a periodic force cos 2t, and then the force is turned off. Determine the motion, ignoring friction (i.e. let c = 0).

Solution. As usual we let x denote the displacement from equilibrium that stretches the spring, so the initial-value problem that we want to solve is

    x″ + 4x = f(t),  x(0) = 0 = x′(0),   where f(t) = { cos 2t for 0 ≤ t < 2π;  0 for t ≥ 2π }.    (3.25)

We can write f(t) as

    f(t) = (1 − u(t − 2π)) cos 2t = cos 2t − u(t − 2π) cos 2(t − 2π),

where cos 2t = cos 2(t − 2π) since cos is periodic with period 2π. We can now take the Laplace transform of f(t) to obtain

    F(s) = s/(s² + 4) − e^{−2πs} s/(s² + 4).

Now we can apply the Laplace transform to the differential equation and conclude that

    X(s) = s/(s² + 4)² − e^{−2πs} s/(s² + 4)².

To take the inverse Laplace transform, we can apply (3.18) to conclude

    L^{−1}[ s/(s² + 4)² ] = (1/4) t sin 2t.

If we combine this with (3.23), we obtain

    L^{−1}[ e^{−2πs} s/(s² + 4)² ] = (1/4) u(t − 2π)(t − 2π) sin 2(t − 2π) = (1/4) u(t − 2π)(t − 2π) sin 2t.

Thus we can express our solution as

    x(t) = (1/4)[ t − u(t − 2π)(t − 2π) ] sin 2t = { (t/4) sin 2t for 0 ≤ t < 2π;  (π/2) sin 2t for t ≥ 2π }.

[Fig.4. Graph of x(t) in Example 3.]

The graph of this solution may be seen in Figure 4. Notice that, for 0 ≤ t < 2π, the amplitude of the oscillations steadily increases since the forcing frequency matches the natural frequency, i.e. we have resonance. But once the force is removed at t = 2π, the oscillations continue at the amplitude that has been attained.  □
Impulsive Force: Dirac Delta Function

If a force f(t) is applied to an object over a time interval [t₁, t₂], then the impulse I represents the resultant change in the momentum mv of the object. Since this change in momentum can be calculated by mv(t₂) − mv(t₁) = m ∫_{t₁}^{t₂} v′(t) dt = m ∫_{t₁}^{t₂} a(t) dt, we see that the impulse over [t₁, t₂] is given by

    I = ∫_{t₁}^{t₂} f(t) dt.    (3.26)
Now suppose that the force is applied over a very short time interval [t₁, t₁ + ε], but has such a large magnitude that its impulse is 1. For example, for ε > 0 very small, consider the function

    d_ε(t) = { 1/ε for 0 ≤ t < ε;  0 for t < 0 and t ≥ ε },

[Fig.5. Graph of d_ε(t).]

whose graph is shown in Figure 5. Then we calculate its impulse over (−∞, ∞) and find

    ∫_{−∞}^{∞} d_ε(t) dt = ∫_0^ε (1/ε) dt = 1.

So the impulse of d_ε is 1 for all ε > 0. But if we take ε → 0, what happens to d_ε? For t < 0 we have d_ε(t) = 0 for all ε > 0, and for t₀ > 0 we have d_ε(t₀) = 0 for ε < t₀. On the other hand, d_ε(0) = 1/ε → +∞ as ε → 0, so

    lim_{ε→0} d_ε(t) = { 0 for all t ≠ 0;  +∞ for t = 0 }.

Now let us be bold and define the Dirac delta function δ(t) to be the limit of d_ε(t) as ε → 0. It has the following significant properties:

    δ(t) = 0 for t ≠ 0   and   ∫_{−∞}^{∞} δ(t) dt = 1.    (3.27)
If we think of δ(t) as a function, this is very strange indeed: it is zero for all t ≠ 0 and yet its integral is 1! In fact, δ(t) is not a function at all, but a "generalized function" or "distribution." Such objects are analyzed by their effect on actual functions through integration, and in this capacity δ(t) performs perfectly well. For example, if g(t) is any continuous function, then by the mean value theorem of calculus, we know

    (1/ε) ∫_0^ε g(t) dt = g(t*)   for some 0 < t* < ε.

[Fig.6. The mean value theorem.]

But as ε → 0, we have t* → 0, so

    ∫_{−∞}^{∞} δ(t) g(t) dt = lim_{ε→0} ∫_{−∞}^{∞} d_ε(t) g(t) dt = lim_{ε→0} (1/ε) ∫_0^ε g(t) dt = g(0).

In fact, if we consider δ(t − a) for any real number a, we may generalize this calculation to obtain the important property

    ∫_{−∞}^{∞} δ(t − a) g(t) dt = g(a)   for any continuous function g(t).    (3.28)

Now we are in business, since (3.28) enables us to compute the Laplace transform of δ(t − a):
    L[δ(t − a)] = ∫_0^∞ e^{−st} δ(t − a) dt = e^{−as}   for a > 0,    (3.29)

and in the special case a = 0 we can take the limit a → 0 to obtain

    L[δ(t)] = 1.    (3.30)
We can now use the Laplace transform to solve initial-value problems involving the
Dirac delta function.
Example 4. Solve the initial-value problem y′ + 3y = δ(t − 1), y(0) = 1.

Solution. If we take the Laplace transform of this equation, we obtain

    sY − 1 + 3Y = e^{−s},

where we have used (3.29) with a = 1 to calculate L[δ(t − 1)]. We easily solve this for Y(s),

    Y(s) = 1/(s + 3) + e^{−s}/(s + 3),

and then take the inverse Laplace transform, using (3.23) on the second term:

    y(t) = e^{−3t} + u₁(t) e^{−3(t−1)}.

[Fig.7. The solution for Example 4.]

A sketch of this solution appears in Figure 7. Notice that the jump discontinuity at t = 1 corresponds to the time when the impulse δ(t − 1) is applied.  □
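The two transform facts used in Example 4 are easy to reproduce in software; a minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s', positive=True)

    # L[delta(t - 1)] from the definition; should print exp(-s)
    print(sp.integrate(sp.exp(-s*t)*sp.DiracDelta(t - 1), (t, 0, sp.oo)))

    # inverse of e^{-s}/(s + 3); should print a shifted decay, u_1(t) e^{-3(t-1)},
    # written with a Heaviside(t - 1) factor
    print(sp.inverse_laplace_transform(sp.exp(-s)/(s + 3), s, t))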
The Dirac delta function is especially useful in applications where it can be used to
express an instantaneous force. For example, if we consider a spring-mass system, then
a sudden, sharp blow to the mass that imparts an impulse I0 over a very short time
interval near the time t0 can be modeled using
±I0 δ(t − t0 ),
where ± depends on the direction of the impulse. Let us illustrate this with an example.
Example 5. A vertically hung spring with constant k = 1 is attached to a mass of 1 kg. At t = 0 the spring is in its equilibrium position but with a downward velocity of 1 m/sec. At t = π sec, the mass is given an instantaneous blow that imparts an impulse of 2 units of momentum in an upward direction. Find the resultant motion.

Solution. As usual, we measure y in a downward direction and assume y = 0 is the equilibrium position after the mass has been attached. Thus the impulse imparted by the instantaneous blow at t = π has the value −2 and the initial-value problem that we must solve is

    y″ + y = −2δ(t − π),  y(0) = 0, y′(0) = 1.

If we take the Laplace transform of this equation we obtain

    s² Y − 1 + Y = −2 e^{−πs},

which we can solve for Y to find

    Y(s) = 1/(s² + 1) − 2 e^{−πs}/(s² + 1).

Now we can take the inverse Laplace transform to find

    y(t) = sin t − 2 u(t − π) sin(t − π).

But if we recall from trigonometry that sin(t − π) = − sin t, then we can write our solution as

    y(t) = (1 + 2u_π(t)) sin t = { sin t for 0 ≤ t < π;  3 sin t for t ≥ π }.

[Fig.8. The solution for Example 5.]

The graph appears in Figure 8. Notice that the amplitude of the vibration is 1 until the impulse occurs at t = π; thereafter the amplitude is 3.  □
Exercises
1. Express the following functions using step functions u_a(t) = u(t − a):
   (a) f(t) = { 0 for 0 ≤ t < 1;  t − 1 for 1 ≤ t < 2;  1 for t ≥ 2 }
   (b) g(t) = { 1 for 0 ≤ t < 2π;  cos t for t ≥ 2π }
2. Solve the following first-order initial-value problems:
   (a) y′ − y = 3u₁(t),    y(0) = 0.
   (b) y′ + 2y = u₅(t),    y(0) = 1.
3. Solve the following second-order initial-value problems:
   (a) y″ − y = u₁(t),                    y(0) = 2, y′(0) = 0.
   (b) y″ − 4y = u₁(t) − u₂(t),           y(0) = 0, y′(0) = 2.
   (c) y″ + 4y′ + 5y = 5u₃(t),            y(0) = 2, y′(0) = 0.
   (d) y″ + 3y′ + 2y = 10 u_π(t) sin t,   y(0) = 0 = y′(0).   (Solution)
4. Solve the following first-order initial-value problems:
   (a) y′ − 2y = δ(t − 1),            y(0) = 1.
   (b) y′ − 5y = e^{−t} + δ(t − 2),   y(0) = 0.
5. Solve the following second-order initial-value problems:
   (a) y″ + 4y = δ(t − 3),                     y(0) = 1, y′(0) = 0.   (Solution)
   (b) y″ + 4y′ + 4y = 1 + δ(t − 2),           y(0) = 0 = y′(0).
   (c) y″ + 4y′ + 5y = δ(t − π) + δ(t − 2π),   y(0) = 0, y′(0) = 2.
   (d) y″ + 16y = cos 3t + 2 δ(t − π/2),       y(0) = 0 = y′(0).
6. A 1 kg mass is attached to a spring with constant k = 9 N/m. Initially, the mass
is in equilibrium, but for 0 ≤ t ≤ 2π experiences a periodic force of 4 sin t N, and
then the force is turned off. Ignoring friction (let c = 0), determine the motion.
7. A mass m = 1 is attached to a spring with constant k = 5 and a dashpot with
coefficient c = 2. Initially in equilibrium, the mass is subjected to a constant force
f = 1 for 0 ≤ t ≤ π, and then the force is turned off. Find the motion.
8. A mass m = 1 is attached to a spring with constant k = 1 and no friction (c = 0).
At t = 0, the mass is pulled from its equilibrium at x = 0 to x = 1, extending the
spring, and released; then at t = 1 sec, the mass is given a unit impulse in the
positive x-direction. Determine the motion x(t).
3.5  Convolutions
When solving an initial-value problem using the Laplace transform, we may want to take the inverse Laplace transform of a product of two functions whose inverse Laplace transforms are known, i.e. F(s)G(s) where F(s) = L[f(t)] and G(s) = L[g(t)]. In general, L^{−1}[F(s)G(s)] ≠ f(t)g(t), so we need to know how to take the inverse Laplace transform of a product. This leads us to the "convolution" of two functions.

Definition 1. If f and g are piecewise continuous functions for t ≥ 0, then the convolution is the function defined for t ≥ 0 by

    f ⋆ g(t) = ∫_0^t f(t − τ) g(τ) dτ.
If we make the substitution u = t − τ in the integral, then du = −dτ and

    ∫_0^t f(t − τ) g(τ) dτ = −∫_t^0 f(u) g(t − u) du = ∫_0^t g(t − u) f(u) du = g ⋆ f(t),

so convolution is commutative: f ⋆ g = g ⋆ f. It is also easy to check that convolution is associative, i.e. f ⋆ (g ⋆ h) = (f ⋆ g) ⋆ h, and distributive, i.e. f ⋆ (g + h) = f ⋆ g + f ⋆ h.
Let us compute an example.

Example 1. If f(t) = t and g(t) = sin t, then we use integration by parts to calculate

    f ⋆ g(t) = ∫_0^t (t − τ) sin τ dτ = t ∫_0^t sin τ dτ − ∫_0^t τ sin τ dτ
             = −t [ cos τ ]_{τ=0}^{τ=t} + [ τ cos τ ]_{τ=0}^{τ=t} − ∫_0^t cos τ dτ
             = −t (cos t − 1) + t cos t − [ sin τ ]_{τ=0}^{τ=t} = t − sin t.

We conclude that t ⋆ sin t = t − sin t.  □
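Example 1 is also easy to confirm with software, and at the same time one can see Theorem 1 below in action: the transform of the convolution equals the product (1/s²)·(1/(s² + 1)). A minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t, tau = sp.symbols('t tau', positive=True)
    s = sp.Symbol('s', positive=True)

    conv = sp.integrate((t - tau)*sp.sin(tau), (tau, 0, t))
    print(sp.simplify(conv))                                        # should print t - sin(t)
    print(sp.simplify(sp.laplace_transform(conv, t, s, noconds=True)
                      - 1/(s**2*(s**2 + 1))))                       # should print 0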
Now let us see how convolution relates to the Laplace transform.
Theorem 1. If f and g are both piecewise continuous on [0, ∞) and of exponential order c as t → ∞, then

    L[f ⋆ g(t)] = F(s) G(s)   for s > c.
Proof. Let us write G(s) as

    G(s) = ∫_0^∞ e^{−su} g(u) du = ∫_τ^∞ e^{−s(t−τ)} g(t − τ) dt = e^{sτ} ∫_τ^∞ e^{−st} g(t − τ) dt,

where we have replaced the integration variable u by t = u + τ; here τ > 0 is fixed but below we will allow it to vary as an integration variable. Using this we find

    F(s) G(s) = ∫_0^∞ e^{−sτ} f(τ) dτ · G(s) = ∫_0^∞ e^{−sτ} f(τ) G(s) dτ
              = ∫_0^∞ f(τ) ∫_τ^∞ e^{−st} g(t − τ) dt dτ.

Now in this last integral, we want to change the order of integration: instead of integrating τ ≤ t < ∞ and then 0 ≤ τ < ∞, we want to integrate 0 ≤ τ ≤ t and then 0 ≤ t < ∞ (see Figure 1). But this means

    F(s) G(s) = ∫_0^∞ ∫_0^t f(τ) e^{−st} g(t − τ) dτ dt = ∫_0^∞ e^{−st} ( ∫_0^t f(τ) g(t − τ) dτ ) dt = L[f ⋆ g(t)],

which establishes our result.  □

[Fig.1. Domain of integration.]
One application of Theorem 1 is simply to compute inverse Laplace transforms.
Example 2. Find the inverse Laplace transform:

    L^{−1}[ 1/(s²(s + 1)) ].

Solution. We could use partial fractions to express s^{−2}(s + 1)^{−1} as a sum of terms that are easier to invert through the Laplace transform. But, since f(t) = t has Laplace transform F(s) = 1/s² and g(t) = e^{−t} has Laplace transform G(s) = 1/(s + 1), we can also use Theorem 1:

    L^{−1}[ 1/(s²(s + 1)) ] = f ⋆ g(t) = ∫_0^t (t − τ) e^{−τ} dτ = t ∫_0^t e^{−τ} dτ − ∫_0^t τ e^{−τ} dτ
                            = −t [ e^{−τ} ]_{τ=0}^{τ=t} + [ τ e^{−τ} ]_{τ=0}^{τ=t} − ∫_0^t e^{−τ} dτ = e^{−t} + t − 1.  □
Another application of Theorem 1 is to derive solution formulas when the input
function is not yet known. This can be important in applications where the system is
fixed but the input function is allowed to vary; for example, consider a fixed spring-mass-dashpot system with different forcing functions f(t).
Example 3. Show that the solution of the initial-value problem

    x″ + ω² x = f(t),   x(0) = 0 = x′(0),

is given by

    x(t) = (1/ω) ∫_0^t sin ω(t − τ) f(τ) dτ.    (3.31)

Solution. We assume that f(t) is piecewise continuous on [0, ∞) and of exponential order as t → ∞. Taking the Laplace transform, we obtain

    (s² + ω²) X(s) = F(s),

where F(s) = L[f(t)]. Solving for X(s), we find

    X(s) = 1/(s² + ω²) · F(s) = (1/ω) · ω/(s² + ω²) · F(s).

But we recognize ω/(s² + ω²) as the Laplace transform of sin ωt, so we may apply Theorem 1 to obtain (3.31).  □
Remark 1. The solution formula (3.31) provides the mapping of the system’s input f (t)
to its output or response, x(t). Under the Laplace transform, the relationship of the input
to the output is given by X(s) = H(s)F(s), where H(s) is called the transfer function. Of course, for the system in Example 3, the transfer function is H(s) = (s² + ω²)^{−1}.
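Formula (3.31) is easy to test on a sample input; here ω = 2 and f(t) = 1 are illustrative choices, and the sketch assumes Python with SymPy (not part of the text).

    import sympy as sp

    t, tau = sp.symbols('t tau')
    omega = 2
    f = sp.Integer(1)                                   # sample forcing f(tau) = 1

    x = sp.integrate(sp.sin(omega*(t - tau))*f, (tau, 0, t))/omega
    print(sp.simplify(x))                               # should print 1/4 - cos(2*t)/4
    print(sp.simplify(x.diff(t, 2) + omega**2*x))       # should print 1, i.e. equals f(t)
    print(x.subs(t, 0), x.diff(t).subs(t, 0))           # should print 0 0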
Exercises
1. Calculate the convolution of the following functions:
   (a) f(t) = t, g(t) = 1
   (b) f(t) = t, g(t) = e^{at} (a ≠ 0)
   (c) f(t) = t², g(t) = sin t
   (d) f(t) = e^{at}, g(t) = e^{bt} (a ≠ b)
2. Find the inverse Laplace transform using Theorem 1. (You may want to use the table of integrals in Appendix C.)
   (a) 2/((s − 1)(s − 2))      (b) s/((s + 1)(s² + 1))      (c) 1/(s³ + s)
3. Find the Laplace transform of the given function:
   (a) f(t) = ∫_0^t (t − τ)² cos 2τ dτ
   (b) f(t) = ∫_0^t e^{t−τ} sin 3τ dτ
   (c) f(t) = ∫_0^t sin(t − τ) cos τ dτ
4. Determine solution formulas like (3.31) for the following initial-value problems:
   (a) y″ − y = f(t),          y(0) = 0 = y′(0)
   (b) y″ + 2y′ + 5y = f(t),   y(0) = 0 = y′(0)
   (c) y″ + 2y′ + y = f(t),    y(0) = 0 = y′(0)
3.6  Additional Exercises
1. Find the values of c so that f(t) = t^{100} e^{2t} is of exponential order c.

2. Show that f(t) = sin(e^{t²}) is of exponential order 0, but f′(t) is not of exponential order at all!
3. Consider the function

       f(t) = { 1, for 2k ≤ t < 2k + 1;  −1, for 2k + 1 ≤ t < 2k + 2 },

   where k = 0, 1, … . Use Definition 1 and a geometric series to show

       L[f(t)] = (1 − e^{−s}) / (s(1 + e^{−s}))   for s > 0.
4. Consider the function

       f(t) = { 1, for 2k ≤ t < 2k + 1;  0, for 2k + 1 ≤ t < 2k + 2 },

   where k = 0, 1, … . Use Definition 1 and a geometric series to show

       L[f(t)] = 1 / (s(1 + e^{−s}))   for s > 0.
5. Consider the saw-tooth function f(t) given in the sketch. Find L[f(t)]. (Hint: Sketch f′(t).)

   [Fig.1. f(t) in Exercise 5.]

6. Consider the modified saw-tooth function f(t) given in the sketch. Find L[f(t)]. (Hint: Sketch f′(t).)

   [Fig.2. f(t) in Exercise 6.]

7. Solve the following initial-value problem (see Exercise 1 in Section 3.2):

       x^{(4)} − x = 0,   x(0) = 1, x′(0) = x″(0) = x^{(3)}(0) = 0.

8. Solve the following initial-value problem (see Exercise 1 in Section 3.2):

       x^{(4)} + 2x″ + x = 1,   x(0) = x′(0) = x″(0) = x^{(3)}(0) = 0.

9. Find the inverse Laplace transform of F(s) = 1/(s⁴ − a⁴).

10. Find the inverse Laplace transform of F(s) = s²/(s⁴ − a⁴).

11. Consider the equation

        y″ + y = f(t)   where   f(t) = u₀(t) + 2 Σ_{k=1}^{∞} (−1)^k u_{kπ}(t).
(a) Explain why the infinite series converges for each t in [0, ∞).
(b) Sketch the graph of f (t). Is it periodic? If so, find the period.
(c) Use the Laplace transform to find the solution satisfying y(0) = 0 = y 0 (0).
(d) Do you see resonance? Explain.
12. Consider the equation

        y″ + y = g(t)   where   g(t) = Σ_{k=0}^{∞} u_{kπ}(t).
(a) Explain why the infinite series converges for each t in [0, ∞).
(b) Sketch the graph of g(t). Is it periodic? If so, find the period.
(c) Use the Laplace transform to find the solution satisfying y(0) = 0 = y 0 (0).
(d) Would you describe this as resonance? Explain.
13. Suppose a mass m = 1 is connected to a spring with constant k = 4 and experiences a series of unit impulses at t = nπ for n = 0, 1, 2, …, so that the following equation holds:

        x″ + 4x = Σ_{n=0}^{∞} δ_{nπ}(t).
(a) If the mass is initially at equilibrium, find the motion x(t) for t > 0.
(b) Sketch x(t). Do you see resonance?
14. Suppose the spring in the previous problem is replaced by one for which k = 1, so that the equation becomes

        x″ + x = Σ_{n=0}^{∞} δ_{nπ}(t).
(a) If the mass is initially at equilibrium, find the motion x(t) for t > 0.
(b) Sketch x(t). Explain why there is no resonance.
Chapter 4

Systems of Linear Equations and Matrices

4.1  Introduction to Systems and Matrices
In this chapter we want to study the solvability of linear systems, i.e. systems of
linear (algebraic) equations. A simple example is the system of two equations in two
unknowns x1 , x2 :
    a x₁ + b x₂ = y₁
    c x₁ + d x₂ = y₂,    (4.1)
where a, b, c, d, y1 , y2 are known. Note that these are algebraic rather than differential
equations; however, as we saw in Section 2.2, the solvability theory for (4.1) is essential
for proving the linear independence of solutions of 2nd-order differential equations.
Moreover, the linear algebra theory and notation that we shall develop will be useful
for our study of systems of differential equations in Chapter 7. However, before we
generalize (4.1) to more equations and unknowns, let us review the results for (4.1) that
should be familiar from high school.
One way to solve (4.1) is by substitution: solve for one variable in one equation,
say x1 in terms of x2 , and then substitute it in the other equation to obtain an equation
only involving x2 ; solve this equation for x2 and use this to find x1 . Another method
for solving (4.1) is elimination: multiply one equation by a constant so that adding it
to the other equation eliminates one of the variables, say x1 ; solve this equation for x2
and substitute this back into one of the original equations to find x1 . Let us illustrate
the method of elimination with a simple example.
Example 1. Find the solution of
x1 + 2 x2 = 6
3 x1 − x2 = 4.
Solution. We multiply the second equation by 2 to obtain
x1 + 2 x2 = 6
6 x1 − 2x2 = 8.
105
106
CHAPTER 4. SYSTEMS OF LINEAR EQUATIONS AND MATRICES
If we add these equations, we obtain 7 x1 = 14, i.e. x2 has been eliminated. We find
x1 = 2 and then plug back into either of the original equations to find x2 = 2.
2
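For a quick numerical confirmation of this unique solution, here is a sketch assuming Python with NumPy (not part of the text):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, -1.0]])
    b = np.array([6.0, 4.0])
    print(np.linalg.solve(A, b))   # should print [2. 2.], i.e. x1 = 2, x2 = 2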
In this example we not only found a solution of the linear system, but showed that it
is the unique solution (i.e. there are no other values of x1 , x2 satisfying the equation).
In fact, we can interpret this result geometrically if we consider each equation in the
system as defining a straight line in the x₁, x₂-plane:

    x₁ + 2x₂ = 6    ⇔   x₂ = −(1/2) x₁ + 3   (slope −1/2, x₂-intercept 3)
    6x₁ − 2x₂ = 8   ⇔   x₂ = 3x₁ − 4         (slope 3, x₂-intercept −4).

[Fig.1. Geometric solution of Example 1.]
The fact that there is a unique solution is geometrically the fact that the two lines
intersect in one point (see Figure 1). Of course, two lines do not necessarily intersect
(they could be parallel); in this case the two equations do not admit any solution and
are called inconsistent. There is one other possibility that we illustrate in the next
example.
Example 2. Discuss the solvability of

    a)  x₁ − x₂ = 1           b)  x₁ − x₂ = 1
        2x₁ − 2x₂ = 3             2x₁ − 2x₂ = 2.
Solution. a) If we multiply the 1st equation by 2 and subtract it from the second
equation we obtain 0=1! There is no choice of x1 , x2 to make this statement true, so we
conclude that the equations are inconsistent; if we graph the two lines, we find they are
parallel. b) If we perform the same algebra on these equations, we obtain 0=0! While
this is certainly true, it does not enable us to find the values of x1 , x2 . In fact, the two
equations are just constant multiples of each other, so we have an infinite number of
solutions all satisfying x1 − x2 = 1.
2
We conclude that there are three possibilities for the solution set of (4.1):
• There is a unique solution;
• There is no solution;
• There are infinitely many solutions.
We shall see that these three possibilities also apply to linear systems involving more
equations and more unknowns.
Matrix Notation and Algebra
Let us now begin to develop the notation and theory for the general case of a linear
system of m equations in n unknowns:
    a₁₁ x₁ + a₁₂ x₂ + ··· + a₁ₙ xₙ = c₁
    a₂₁ x₁ + a₂₂ x₂ + ··· + a₂ₙ xₙ = c₂
      ⋮
    a_{m1} x₁ + a_{m2} x₂ + ··· + a_{mn} xₙ = c_m.    (4.2)
We want to find the solution set, i.e. all values of x₁, …, xₙ satisfying (4.2). The numbers a_{ij} and c_k are usually real numbers, but they could also be complex numbers; in that case we must allow the unknowns x_j to also be complex. The a_{ij} are called the coefficients of the system (4.2) and we can gather them together in an array called a matrix:

    A = [ a₁₁   a₁₂   ···  a₁ₙ
          a₂₁   a₂₂   ···  a₂ₙ
           ⋮     ⋮          ⋮
          a_{m1}  a_{m2}  ···  a_{mn} ].    (4.3)
We also want to introduce vector notation for the unknowns x₁, …, xₙ and the values on the right-hand side c₁, …, c_m. In fact, let us represent these as "column vectors":

    x = [ x₁ ]        c = [ c₁ ]
        [ x₂ ]            [ c₂ ]
        [ ⋮  ]            [ ⋮  ]
        [ xₙ ],           [ c_m ].    (4.4)

We want to be able to perform various algebraic operations with matrices and vectors, including a definition of multiplication that will enable us to write (4.2) in the form

    A x = c,    (4.5)

where Ax represents the product of A and x that we have yet to define.
Let us now discuss the definition and algebra of matrices more systematically.
Definition 2. An (m × n)-matrix is a rectangular array of numbers arranged in m
rows and n columns. The numbers in the array are called the elements of the matrix.
An (m × 1)-matrix is called a column vector and a (1 × n)-matrix is called a row
vector; the elements of a vector are also called its components.
We shall generally denote matrices by bold faced capital letters such as A or B, although
we sometimes write A = (aij ) or B = (bij ) to identify the elements of the matrix. We
shall generally denote vectors by bold faced lower-case letters such as x or b. To
distinguish numbers from vectors and matrices, we frequently use the term scalar for
any number (usually a real number, but complex numbers can be used as well).
Now let us discuss matrix algebra. Here are the two simplest rules:
Multiplication by a Scalar. For a matrix A and a scalar λ we define λA by multiplying each element by λ.
Matrix Addition. For (m × n)-matrices A and B, we define their sum A+B by
adding elementwise.
Let us illustrate these rules with an example:
Example 3. Compute λA, λB, A + B, and where
1 2 3
1
λ = −2,
A=
,
B=
4 5 6
0
0
1
−1
.
−1
In the notation aij , the
first subscript refers to the
row and the second refers
to the column
Solution. To compute λA and λB, we multiply each element of the matrix by −2:

    λA = [ −2   −4   −6        λB = [ −2   0  2
           −8  −10  −12 ]  and         0  −2  2 ].

To compute A + B we simply add the corresponding elements:

    A + B = [ 1  2  3     [ 1  0  −1     [ 2  2  2
              4  5  6 ] +   0  1  −1 ] =   4  6  5 ].  □
We also want to be able to multiply two matrices together, at least if they are of the correct sizes. Before defining this, let us recall the definition of the dot product of two vectors x = (x₁, x₂, …, xₙ) and y = (y₁, y₂, …, yₙ):

    x · y = x₁y₁ + x₂y₂ + ··· + xₙyₙ.    (4.6)
Now we use the dot product to define matrix multiplication. If we are given matrices A and B, we want to define the matrix C so that AB = C. To determine the element c_{ij} in the ith row and jth column of C, we consider the ith row of A and the jth column of B as vectors and take their dot product:

    c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + ··· + a_{in} b_{nj}.

Note that this is only possible if the number of columns in A equals the number of rows in B. Let us summarize this as follows:

Matrix Multiplication. For an (m × n)-matrix A with elements a_{ij} and an (n × p)-matrix B with elements b_{ij}, we define their product C = AB to be the (m × p)-matrix with elements

    c_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj}.    (4.7)
Before we compute numerical examples, let us observe that this definition of matrix multiplication is exactly what we need to represent the linear system (4.2) as (4.5):

    A x = [ a₁₁x₁ + a₁₂x₂ + ··· + a₁ₙxₙ
            a₂₁x₁ + a₂₂x₂ + ··· + a₂ₙxₙ
                        ⋮
            a_{m1}x₁ + a_{m2}x₂ + ··· + a_{mn}xₙ ],

so A x = c is exactly the same thing as (4.2).
Now let us compute some numerical examples of matrix multiplication. Notice that
the matrices A and B in Example 3 are both (2 × 3)-matrices so the number of columns
of A does not match the number of rows of B, and they cannot be multiplied. Let us
try another example where the matrices are of the correct sizes.
Example 4. Compute AB and BA where

    A = [ 1  2  3        B = [  1   0
          4  5  6 ]  and        1   0
                               −1  −1 ].

Solution. Note that A is a (2 × 3)-matrix and B is a (3 × 2)-matrix, so the product AB is defined and is a (2 × 2)-matrix:

    AB = [ 1 + 2 − 3   0 + 0 − 3     [ 0  −3
           4 + 5 − 6   0 + 0 − 6 ] =   3  −6 ].

Note that the product BA is also defined and results in a (3 × 3)-matrix:

    BA = [  1   0   [ 1  2  3     [  1 + 0    2 + 0    3 + 0       [  1   2   3
            1   0     4  5  6 ] =    1 + 0    2 + 0    3 + 0    =     1   2   3
           −1  −1 ]                 −1 − 4   −2 − 5   −3 − 6 ]       −5  −7  −9 ].  □
This last example shows that, even if two matrices A and B are of compatible sizes so
that both A B and B A are defined, we need not have A B = B A; in fact A B and
B A need not even be of the same size!
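A short numerical sketch of Example 4 (assuming Python with NumPy, not part of the text) makes the non-commutativity plain; here AB and BA are not even the same size:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    B = np.array([[1, 0],
                  [1, 0],
                  [-1, -1]])
    print(A @ B)   # should print the (2 x 2) matrix [[0, -3], [3, -6]]
    print(B @ A)   # should print the (3 x 3) matrix [[1, 2, 3], [1, 2, 3], [-5, -7, -9]]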
Let us summarize some of the properties of matrix algebra that are valid:
Properties of Matrix Addition and Multiplication. Let A, B, and C be matrices of
the appropriate sizes so that the indicated algebraic operations are well-defined.
Commutativity of Addition: A + B = B + A
Associativity of Addition: A + (B + C) = (A + B) + C
Associativity of Multiplication: A (B C) = (A B) C
Distributivity: A (B + C) = A B + A C and (A + B) C = A C + B C
Again, notice that commutativity of matrix multiplication is conspicuously absent since
in general A B 6= B A.
Now let us introduce an important class of matrices and a special member.
Definition 3. An (n × n)-matrix, i.e. one with the same number of rows and columns,
is called a square matrix. The square matrix with 1’s on its main diagonal, i.e. from
the upper left to lower right, and 0’s everywhere else is called the identity matrix. The
identity matrix is denoted by I, or Iₙ if we want to emphasize the size is (n × n). The identity matrix is

    I = [ 1  0  ···  0
          0  1  ···  0
          ⋮  ⋮   ⋱   ⋮
          0  0  ···  1 ].
If A and B are square matrices of the same size (n × n) then both A B and B A are
defined (although they need not be equal to each other). In particular, the identity
matrix has the property
IA = A = AI
(4.8)
for any square matrix A (of the same size).
Transpose of a Matrix
If A = (a_{ij}) is an (m × n)-matrix, then its transpose Aᵀ is the (n × m)-matrix obtained by using the rows of A as the columns of Aᵀ; hence the columns of A become the rows of Aᵀ. If we denote the elements of Aᵀ by aᵀ_{ij}, then we find

    aᵀ_{ij} = a_{ji}.    (4.9)

Here are a couple of simple examples:

    [ 1  2  3 ]ᵀ   [ 1  4 ]           [ 1  2   3 ]ᵀ   [ 1   1  4 ]
    [ 4  5  6 ]  = [ 2  5 ]    and    [ 1  0  −1 ]  = [ 2   0  5 ]
                   [ 3  6 ]           [ 4  5   6 ]    [ 3  −1  6 ].
Note that the transpose of a square matrix is also a square matrix (of the same size), and may be obtained by "mirror reflection" of the elements of A across the main diagonal. A square matrix A is symmetric if Aᵀ = A, i.e. a_{ij} = a_{ji} for i, j. Here are some examples:

    [ 1  0   1 ]                        [  1  0   1 ]
    [ 0  0   0 ]  is symmetric, but     [  0  0   0 ]  is not symmetric.
    [ 1  0  −1 ]                        [ −1  0  −1 ]
The following properties of the transpose are easily verified and left as exercises:

    i) (Aᵀ)ᵀ = A,    ii) (A + B)ᵀ = Aᵀ + Bᵀ,    iii) (AB)ᵀ = BᵀAᵀ.
But the true significance of the transpose lies in its relationship to the dot product. In particular, if A is an (m × n)-matrix, x is an m-vector, and y is an n-vector, then

    x · Ay = Aᵀx · y.    (4.10)

It is elementary (although perhaps a little confusing) to directly verify (4.10):

    x · Ay = Σ_{i=1}^{m} x_i ( Σ_{j=1}^{n} a_{ij} y_j ) = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij} x_i y_j,

    Aᵀx · y = Σ_{i=1}^{n} ( Σ_{j=1}^{m} aᵀ_{ij} x_j ) y_i = Σ_{i=1}^{n} ( Σ_{j=1}^{m} a_{ji} x_j ) y_i = Σ_{i=1}^{n} Σ_{j=1}^{m} a_{ji} x_j y_i.

We see these two quantities are the same (only the roles of i and j are different), so this verifies (4.10).
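Identity (4.10) is also easy to spot-check numerically on randomly chosen A, x, and y; a minimal sketch, assuming Python with NumPy (not part of the text):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 3, 4
    A = rng.standard_normal((m, n))
    x = rng.standard_normal(m)
    y = rng.standard_normal(n)

    lhs = x @ (A @ y)        # x . (A y)
    rhs = (A.T @ x) @ y      # (A^T x) . y
    print(np.isclose(lhs, rhs))   # should print True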
Exercises
1. For the following systems of two equations with two unknowns, determine whether
(i) there is a unique solution, and if so find it, (ii) there is no solution, or (iii)
there are infinitely many solutions:
   (a) 3x₁ + 2x₂ = 9            (c) x₁ − 2x₂ = 3
       x₁ − x₂ = 8                  2x₁ − 4x₂ = 5

   (b) 2x₁ + 3x₂ = 1  (Sol'n)   (d) 4x₁ − 2x₂ = 12
       3x₁ + 5x₂ = 3                6x₁ − 3x₂ = 18
2. For the following matrices A and B calculate 2A, 3B, A + B, and A − 2B:

       A = [ 1  0  −1        B = [ 0  1  2
             1  2   3 ],           2  1  0 ]
3. For the following matrices, calculate AB if it exists:

   (a) A = [ 1  2      B = [ 2   1        (Sol'n)
             3  4 ],         0  −1 ]

   (b) A = [ 1  2      B = [ 2   1
             3  4 ],         0  −1 ]

   (c) A = [ 1  2      B = [ 2   1
             3  4 ],         0  −1
                             1   3 ]

   (d) A = [ 1  2      B = [  1  0
             4  3            −1  1 ]
             5  6 ],
4. Calculate Aᵀ, Bᵀ, AB, BA, AᵀBᵀ, and BᵀAᵀ:

       A = [ 3  1  −1        B = [ −1   1
             2  4   5 ],            2   3
                                    0  −4 ]
5. Determine which of the following square matrices are symmetric:

       A = [ 1   2        B = [  1  2
             2  −1 ],          −2  1 ],

       C = [ 1   2   3        D = [ 0  −1  −2
             2   0  −1              1   0   3
             3  −1   4 ],           2   3   0 ]
4.2  Gaussian Elimination
In this section we want to use matrices to address the solvability of (4.2). In addition to the coefficient matrix A in (4.3) and the vector c in (4.4), we can gather both together in a single matrix called the augmented coefficient matrix:

    [ a₁₁   a₁₂   ···  a₁ₙ  | c₁
      a₂₁   a₂₂   ···  a₂ₙ  | c₂
       ⋮     ⋮          ⋮   |  ⋮
      a_{m1}  a_{m2}  ···  a_{mn} | c_m ]    (4.11)

(The optional vertical line distinguishes the vector column from the coefficient matrix.) The objective of "Gaussian elimination" is to perform operations on (4.11) that do not change the solution set, but convert (4.11) to a new form from which the solution set is easily described. The operations that we are allowed to perform are exactly those used in the method of elimination reviewed in the previous section. Let us illustrate this with an example:
Example 1. Apply the method of elimination to solve
x + 2y + z = 4
3x + 6y + 7z = 20
2x + 5y + 9z = 19.
Solution. We can multiply the first equation by −3 and add it to the second equation:
x + 2y + z = 4
4z = 8
2x + 5y + 9z = 19.
Now multiply the first equation by −2 and add it to the third equation:
x + 2y + z = 4
4z = 8
y + 7z = 11.
Now multiply the second equation by 1/4 and then interchange the second and third
equations:
x + 2y + z = 4
y + 7z = 11
z = 2.
The last row tells us that z = 2, but the other rows have a nice triangular form allowing
us to iteratively substitute back into the previous equation to find the other unknowns:
z = 2 into 2nd equation ⇒ y + 7(2) = 11 ⇒ y = −3,
z = 2, y = −3 into 1st equation ⇒ x + 2(−3) + (2) = 4 ⇒ x = 8.
We have our (unique) solution: x = 8, y = −3, z = 2.  □
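For a quick numerical confirmation of this solution, here is a sketch assuming Python with NumPy (not part of the text):

    import numpy as np

    A = np.array([[1.0, 2.0, 1.0],
                  [3.0, 6.0, 7.0],
                  [2.0, 5.0, 9.0]])
    c = np.array([4.0, 20.0, 19.0])
    print(np.linalg.solve(A, c))   # should print [ 8. -3.  2.]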
In this example, we used three types of operations that did not affect the solution set: i) multiply one equation by a number and add it to another equation, ii) multiply one equation by a nonzero number, and iii) interchange two equations. However, we carried along the unknowns x, y, and z unnecessarily; we could have performed these operations on the augmented coefficient matrix:

    [ 1 2 1 | 4       [ 1 2 1 | 4       [ 1 2 1 | 4        [ 1 2 1 | 4
      3 6 7 | 20   →    0 0 4 | 8    →    0 0 4 | 8     →     0 1 7 | 11
      2 5 9 | 19 ]      2 5 9 | 19 ]      0 1 7 | 11 ]        0 0 1 | 2 ]

Here we see that the final version of the augmented matrix is indeed in a triangular form that enables us to conclude z = 2 and then "back substitute" as before to find y = −3 and x = 8. When performed on a matrix, we call these "elementary row operations":
Elementary Row Operations (EROs). The following actions on an (m × n)-matrix
are called elementary row operations:
• Multiply any row by a nonzero number.
• Interchange any two rows.
• Multiply one row by a number and add it to another row (leaving the first row
unchanged).
It will often be convenient to denote the rows of a matrix by R1 , R2 , etc.
Notice that elementary row operations are reversible: if an ERO transforms the
matrix A into the matrix B, then there is an ERO that transforms B to A. This
enables us to make the following definition.
Definition 1. Matrices A and B are called row-equivalent if one may be transformed
into the other by means of elementary row operations. In this case we write A ∼ B.
The fact that EROs are reversible also means that the following holds:
Theorem 1. If the augmented coefficient matrix for a system of linear equations can be
transformed by EROs to the augmented coefficient matrix for another system of linear
equations, then the two systems have the same solution set.
We can now specify that Gaussian elimination involves the use of elementary row
operations to convert a given augmented coefficient matrix into a form from which we
can easily determine the solution set. However, we have not specified what this final
form should look like. We address this next:
Row-Echelon Form (REF). An (m × n)-matrix is in echelon or row-echelon form
if it meets the following conditions:
1. All rows consisting entirely of zeros lie beneath all nonzero rows.
2. The first nonzero element in any row is a 1; this element is called a leading 1.
3. Any leading 1 must lie to the right of any leading 1’s above it.
If you add λR1 to R2 , be
sure to leave R1 in place.
At first glance, it is difficult to tell what this definition means, so let us give some simple examples. It is easy to check that the following three matrices are all in row-echelon form:

    [ 1 2 3      [ 1 0 −1  1      [ 1 2 3 4
      0 1 2        0 0  1 −1        0 1 0 1
      0 0 1 ]      0 0  0  0 ]      0 0 0 1
                                    0 0 0 0 ]

In particular, note that a matrix does not need to be a square matrix to be in row-echelon form. However, here are some examples of matrices that are not in row-echelon form:

    [ 1 2 3      [ 1 2 3      [ 1 1 −1  1
      0 0 0        0 1 2        1 0  1 −1
      0 0 1 ]      0 0 2 ]      0 0  0  0 ]

In fact, the first of these violates condition 1, the second violates condition 2, and the third violates condition 3.
Now let us suppose that we have used elementary row operations to put the augmented coefficient matrix for a linear system into row-echelon form. How do we determine the solution set?
Determining the Solution Set from the REF. Suppose an augmented coefficient matrix
is in row-echelon form.
1. The leading variables correspond to columns that contain a leading 1.
2. All other variables are called free variables; set each free variable equal to a
free parameter.
3. Use the first nonzero row (from the bottom) to solve for the corresponding
leading variable in terms of the free parameters.
4. Continue up the matrix until all leading variables have been expressed in terms
of the free parameters. (This process is called back substitution.)
Let us illustrate this with two examples.
Example 2. Suppose a system of linear equations has an augmented coefficient matrix
in the following form:
[ 1 2 −1 1; 0 0 1 −1 ].
Determine the solution set.
Solution. This matrix has 4 columns, but the last one corresponds to the right hand
side of the system of equations, so there are 3 variables x1 , x2 , and x3 (although we
could also have called them x, y, z). Since the first and third columns both contain
leading 1’s, we identify x1 and x3 as leading variables. Since the second column does
not contain a leading 1, we recognize x2 as a free variable, and we let x2 = t, where t is
a free parameter. Now the first nonzero row from the bottom is the second row and it
tells us
x3 = −1.
The remaining nonzero row is the first one and it tells us
x1 + 2x2 − x3 = 1.
But we know x2 = t and x3 = −1, so we “back substitute” to obtain x1 + 2t − (−1) = 1
or x1 = −2t. The solutions set is
x1 = −2t,
x2 = t,
x3 = −1
where t is any real number.
In particular, we have found that the system has an infinite number of solutions.
2
Example 3. Suppose a system of linear equations has an augmented coefficient matrix
in the following form:
[ 1 2 3 0; 0 1 0 −1; 0 0 0 1 ].
Determine the solution set.
Solution. The last equation reads 0 = 1! This is impossible, so the system is inconsistent, i.e. there are no solutions to this system.
2
Remark 1. Generalizing Example 3, we see that if an augmented coefficient matrix has
a row in the form (0 . . . 0 b) where b ≠ 0, then there cannot be a solution, i.e. the system
is inconsistent.
There is one more piece of our analysis: we would like an algorithmic procedure to
go from the original augmented coefficient matrix to the row-echelon form. It is this
algorithm that often is called “Gaussian elimination.”
Gaussian Elimination Algorithm. Suppose A is an (m × n)-matrix.
1. Find the leftmost nonzero column; this is called a pivot column and the top
position in this column is called the pivot position. (If A = 0, go to Step 6.)
2. Use elementary row operations to put a 1 in the pivot position.
3. Use elementary row operations to put zeros in all positions below the pivot
position.
4. If there are no more nonzero rows below the pivot position, go to Step 6; otherwise go to Step 5.
5. Repeat Steps 1-4 with the sub-matrix below and to the right of the pivot position.
6. The matrix is in row-echelon form.
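The algorithm above translates almost line for line into code. Here is a minimal Python sketch (the function name and the use of floating-point arithmetic are my own choices for illustration): it scans for a pivot column, swaps a nonzero entry into the pivot position, scales the pivot to 1, and clears the entries below it.

def row_echelon(M):
    """Return a row-echelon form of the matrix M (a list of rows)."""
    A = [row[:] for row in M]          # work on a copy
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):               # Step 1: scan columns left to right
        r = next((i for i in range(pivot_row, m) if A[i][col] != 0), None)
        if r is None:
            continue                   # no pivot in this column
        A[pivot_row], A[r] = A[r], A[pivot_row]        # ERO: interchange rows
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]   # Step 2: make the pivot a 1
        for i in range(pivot_row + 1, m):              # Step 3: zeros below the pivot
            f = A[i][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break                      # Steps 4/6: no nonzero rows left below
    return A

# Example 4 below:
print(row_echelon([[1, -1, 3, -3, 0], [2, 0, 10, -6, 2], [3, 0, 15, -4, -2]]))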
Example 4. Use Gaussian elimination to put the following matrix into row-echelon form:
[ 1 −1 3 −3 0; 2 0 10 −6 2; 3 0 15 −4 −2 ].
Solution. The first column is nonzero, so it is the pivot column and we already have 1 in the pivot position, which we have colored red in the displayed matrices below. We use EROs to put zeros below the pivot position:
[ 1 −1 3 −3 0; 2 0 10 −6 2; 3 0 15 −4 −2 ] ∼ [ 1 −1 3 −3 0; 0 2 4 0 2; 3 0 15 −4 −2 ]   (add −2R1 to R2)
∼ [ 1 −1 3 −3 0; 0 2 4 0 2; 0 3 6 5 −2 ]   (add −3R1 to R3)
Now we repeat the process on the 2 × 4-matrix containing the new pivot column [ 2; 3 ]. There is a 2 in the pivot position, so we color it red below as we replace it with a 1:
[ 1 −1 3 −3 0; 0 2 4 0 2; 0 3 6 5 −2 ] ∼ [ 1 −1 3 −3 0; 0 1 2 0 1; 0 3 6 5 −2 ]   (multiply R2 by 1/2)
Now we want zeros below the pivot position:
[ 1 −1 3 −3 0; 0 1 2 0 1; 0 3 6 5 −2 ] ∼ [ 1 −1 3 −3 0; 0 1 2 0 1; 0 0 0 5 −5 ]   (add −3R2 to R3)
Finally, we look at the last row, and see that there is a 5 in the pivot position, which we make a 1:
[ 1 −1 3 −3 0; 0 1 2 0 1; 0 0 0 5 −5 ] ∼ [ 1 −1 3 −3 0; 0 1 2 0 1; 0 0 0 1 −1 ]   (multiply R3 by 1/5)
This is in row-echelon form.   □
Let us do one more example where we start with a system of linear equations and
finish with the solution set.
Example 5. Find the solution set for
x1 + 2 x2 + x4 = 3
x1 + x2 + x3 + x4 = 1
x2 − x3 = 2.
Solution. Notice that there are three equations with four unknowns, so putting the
augmented coefficient matrix into row-echelon form will result in at most three leading
variables. Consequently, there must be at least one free variable; even at this stage we
know that there must be an infinite number of solutions.
Now let us put the augmented coefficient matrix into row-echelon form:
[ 1 2 0 1 3; 1 1 1 1 1; 0 1 −1 0 2 ] ∼ [ 1 2 0 1 3; 0 −1 1 0 −2; 0 1 −1 0 2 ]   (add −R1 to R2)
∼ [ 1 2 0 1 3; 0 1 −1 0 2; 0 0 0 0 0 ]   (add R2 to R3, then multiply R2 by −1).
We see that x3 = s and x4 = t are both free variables and we can solve for x1 and x2
in terms of them:
R2 : x2 − s = 2 ⇒ x2 = s + 2
R1 : x1 + 2(s + 2) + t = 3 ⇒ x1 = −1 − 2s − t.
We can write the solution set as (x1 , x2 , x3 , x4 ) = (−1 − 2s − t, s + 2, s, t), where s, t are
any real numbers.
2
Homogeneous Systems
Of particular interest is (4.5) when c is the vector 0, whose components are all zero. In
this case, we want to solve Ax = 0, which is called a homogeneous system. (Note the
similarity with a homogeneous linear differential equation L(D)y = 0, as discussed in
Chapter 2.) One solution is immediately obvious, namely the trivial solution x = 0.
The question is whether there are nontrivial solutions. Gaussian elimination enables us
to answer this question. Moreover, since the augmented coefficient matrix has all zeros
in the last column, we can ignore it in our Gaussian elimination.
Example 6. Find the solution set for
x1 + x2 + x3 − x4 = 0
−x1 − x3 + 2 x4 = 0
x1 + 3 x2 + x3 + 2x4 = 0.
Solution. To begin with, the number of equations m = 3 is less than the number of
variables n = 4, so we suspect there will be an infinite number of solutions. We apply
Gaussian elimination to the coefficient matrix to find the solution set:
[ 1 1 1 −1; −1 0 −1 2; 1 3 1 2 ] ∼ [ 1 1 1 −1; 0 1 0 1; 0 2 0 3 ]   (add R1 to R2, then −R1 to R3)
∼ [ 1 1 1 −1; 0 1 0 1; 0 0 0 1 ]   (add −2R2 to R3).
Remember that the augmented coefficient matrix has another column of zeros on the
right, so the last row means we must have x4 = 0. We also see that x3 is a free variable,
so we let x3 = s. From the second row we conclude x2 = 0. Finally, the first row tells
us x1 + 0 + s − 0 = 0, so x1 = −s. Our infinite solution set is
x1 = −s, x2 = 0, x3 = s, x4 = 0.
It includes the trivial solution x = 0 since we can take s = 0.
2
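If a computer algebra system is available, this kind of calculation is easy to check. A small sketch using SymPy (an assumed dependency, not something the text relies on) for the homogeneous system of Example 6:

from sympy import Matrix

# Coefficient matrix of Example 6 (the homogeneous system Ax = 0).
A = Matrix([[1, 1, 1, -1],
            [-1, 0, -1, 2],
            [1, 3, 1, 2]])

print(A.rref())        # reduced row-echelon form and the pivot columns
print(A.nullspace())   # a basis for the solution set; here a multiple of (-1, 0, 1, 0)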
Systems with Complex Coefficients
The linear systems and matrices that we have considered so far all had real coefficients,
but the principles that we have followed work equally well when the coefficients are
complex numbers; of course, in this case the solutions will also be complex numbers.
In order to put a 1 in the leading position, we often need to multiply by the complex
conjugate; for a review of this and other properties of complex numbers, cf. Appendix
A. Let us consider an example.
Example 7. Find the solution set for
(1 + i) x1 + (1 − 3i) x2 = 0
(−1 + i) x1 + (3 + i) x2 = 0.
Solution. The augmented coefficient matrix for this system is
[ 1+i  1−3i  0; −1+i  3+i  0 ].
We want to use elementary row operations to put this into row-echelon form. But since
the system is homogeneous, as we saw above we may omit the last column and just
work with the coefficient matrix. We begin by making (1 + i) real by multiplying by its
complex conjugate (1 − i) and then dividing by the resultant real number:
[ 1+i  1−3i; −1+i  3+i ] ∼ [ 2  −2−4i; −1+i  3+i ] ∼ [ 1  −1−2i; −1+i  3+i ].
Now we want to make the entry below the leading 1 a zero, so we multiply the first row by 1 − i and add it to the second row:
[ 1  −1−2i; −1+i  3+i ] ∼ [ 1  −1−2i; 0  0 ].
This is in row-echelon form and we see that x2 is a free variable, so we let x2 = t and
we solve for x1 to find x1 = (1 + 2i)t. But since we are allowing complex numbers, we
can allow t to take complex values, so our solution set is (x1 , x2 ) = ((1 + 2i)t, t) where
t is any complex number.
2
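The same software check works over the complex numbers. A minimal SymPy sketch for Example 7 (again an assumed dependency; the scaling step is mine, just to match the form of the answer):

from sympy import Matrix, I, simplify

A = Matrix([[1 + I, 1 - 3*I],
            [-1 + I, 3 + I]])

null = A.nullspace()                # basis vectors for the solutions of Ax = 0
v = simplify(null[0] / null[0][1])  # scale so that x2 = 1
print(v)                            # expect (1 + 2i, 1), matching x1 = (1 + 2i)t, x2 = t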
Linear systems with complex coefficients are of interest in their own right. However,
they also come up when studying real linear systems; for example when the coefficient
matrix has complex eigenvalues (cf. Section 6.1).
Exercises
1. Determine whether the following matrices are in row-echelon form:
(a) [ 1 2 3 0; 0 1 2 3; 0 0 0 1 ]        (b) [ 1 2 3 0; 0 0 1 0; 0 1 0 0 ]
(c) [ 2 1 3 0; 0 1 0 0; 0 0 1 0 ]        (d) [ 0 1 2 0; 0 1 0 0; 0 0 0 0 ]
2. Use Gaussian elimination to put the following matrices into row-echelon form:
(a) [ 2 3; 1 −1 ]        (b) [ 0 1 2; 0 1 3; 0 2 4 ]
(c) [ 2 1 0 −1; 0 0 1 0; 1 2 3 4 ]        (d) [ 0 1; 1 2; 1 0 ]
3. For the following linear systems, put the augmented coefficient matrix into rowechelon form, and then use back substitution to find all solutions:
(a)
(b)
2x1 + 8x2 + 3x3 = 2
x1 + 3x2 + 2x3 = 5
3x1 − 6x2 − 2x3 = 1
Sol 0 n
2x1 − 4x2 + x3 = 17
x1 − 2x2 − 2x3 = −9
2x1 + 7x2 + 4x3 = 8
(c)
4x1 + 3x2 + x3 = 8
(d)
2x1 + 5x2 + 12x3 = 6
2x1 + x2 = 3
3x1 + x2 + 5x3 = 12
−x1 + x3 = 1
5x1 + 8x2 + 21x3 = 17
x1 + 2x2 + x3 = 3
4. Apply Gaussian elimination to the coefficient matrix of the following homogeneous systems to determine the solution set.
(a) 3x1 + 2x2 − 3x3 = 0,  2x1 + x2 + x3 = 0,  5x1 − 4x2 + x3 = 0
(b) 2x1 − x2 − x3 = 0,  5x1 − x2 + 2x3 = 0,  x1 + x2 + 4x3 = 0
(c) x1 − 5x3 + 4x4 = 0,  x2 + 2x3 − 7x4 = 0
(d) x1 − 3x2 + 6x4 = 0,  x3 + 9x4 = 0
5. Find all solutions for the following systems with complex coefficients:
(a) (1 − i) x1 + 2i x2 = 0,  (1 + i) x1 − 2 x2 = 0
(b) i x1 + x2 = 1,  2 x1 + (1 − i) x2 = i
(c) i x1 + (1 − i) x2 = 0,  (−1 + i) x1 + 2 x2 = 0
(d) x1 + i x2 = 1,  x1 − x2 = 1 + i
4.3 Reduced Row-Echelon Form and Rank
As we saw in the previous section, every (m × n)-matrix can be put into row-echelon
form. However, the row-echelon form is not unique: any row containing a leading 1
can be added to any row above it, and the result is still in row-echelon form. On the
other hand, every (m × n)-matrix can be put into a particular row-echelon form that is
unique: this is called reduced row-echelon form.
Reduced Row-Echelon Form (RREF). An (m×n)-matrix is in reduced row-echelon
form if it meets the following conditions:
1. It is in row-echelon form.
2. Any leading 1 is the only nonzero element in that column.
Of course, we need to know how to put a matrix into reduced row-echelon form; this is
straight-forward, but it is given the name “Gauss-Jordan elimination”:
Gauss-Jordan Elimination Algorithm. Suppose A is an (m × n)-matrix.
1. Use Gaussian elimination to put A in row-echelon form.
2. Use each leading 1 to make all entries above it equal to zero.
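In practice the RREF is easy to compute with software. A short sketch using SymPy's built-in rref() (an assumed dependency; the matrix used is the one from Example 1 below):

from sympy import Matrix

A = Matrix([[1, 1, 5, 2, -1],
            [0, 1, 3, -1, 1],
            [2, 0, 4, 1, 1]])

R, pivots = A.rref()   # R is rref(A); pivots gives the indices of the pivot columns
print(R)               # expect rows (1 0 2 0 1), (0 1 3 0 0), (0 0 0 1 -1)
print(pivots)          # expect (0, 1, 3): x1, x2, x4 are the leading variables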
The following are examples of matrices in reduced row-echelon form:
[ 1 0 0; 0 1 0; 0 0 0 ]        [ 0 1 0 2; 0 0 1 −1; 0 0 0 0 ]        [ 1 2 0; 0 0 1; 0 0 0 ].
The Gauss-Jordan algorithm shows that every matrix can be put into reduced row-echelon form. A little experimentation should convince you that two distinct matrices
in reduced row-echelon form cannot be row-equivalent; this means the RREF is unique.
While this is not really a proof, let us record this as a theorem:
Theorem 1. Every (m × n)-matrix A is row-equivalent to a unique matrix in reduced
row-echelon form denoted rref(A).
Let us illustrate the theorem with an example:
Example 1. Put the following matrix A into reduced row-echelon form, and use it to find all solutions of Ax = 0:
A = [ 1 1 5 2 −1; 0 1 3 −1 1; 2 0 4 1 1 ].
Solution. We first put the matrix into row-echelon form. The first column is the pivot column, and it already has a 1 in the pivot position, so we want to put zeros in the positions below it:
[ 1 1 5 2 −1; 0 1 3 −1 1; 2 0 4 1 1 ] ∼ [ 1 1 5 2 −1; 0 1 3 −1 1; 0 −2 −6 −3 3 ]   (add −2R1 to R3)
Now we consider the (2 × 4)-matrix beginning with the column [ 1; −2 ]. We have a 1 in the pivot position, so we put a zero in the position below:
[ 1 1 5 2 −1; 0 1 3 −1 1; 0 −2 −6 −3 3 ] ∼ [ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 −5 5 ]   (add 2R2 to R3)
Finally, we consider the bottom row, and we want a leading 1 where −5 is:
[ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 −5 5 ] ∼ [ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 1 −1 ]   (multiply R3 by −1/5)
This is in row-echelon form. For reduced row-echelon form, we want to put zeros above each leading 1. We will move a little more quickly:
[ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 1 −1 ] ∼ [ 1 1 5 2 −1; 0 1 3 0 0; 0 0 0 1 −1 ]   (add R3 to R2)
∼ [ 1 1 5 0 1; 0 1 3 0 0; 0 0 0 1 −1 ]   (add −2R3 to R1)
∼ [ 1 0 2 0 1; 0 1 3 0 0; 0 0 0 1 −1 ]   (add −R2 to R1)
The last matrix is rref(A). From this we conclude that x3 = s and x5 = t are free variables, and we can use them to find the other variables:
x1 + 2x3 + x5 = 0   ⇒   x1 = −2s − t
x2 + 3x3 = 0   ⇒   x2 = −3s
x4 − x5 = 0   ⇒   x4 = t.
Notice that we did not need to use back substitution to solve for the leading variables x1 , x2 , and x4 .   □
Remark 1. The RREF is more convenient than the REF for finding solutions since
it does not require back substitution. However, for very large systems, the REF plus
back substitution is computationally more efficient than using extra ERO’s to convert
the REF to RREF.
It is often useful to express solutions of linear systems in vector form, in which free
parameters appear as multiplicative factors for fixed vectors. This is most conveniently
done using column vectors. We illustrate this with two examples.
Example 2. Express the solutions in vector form for Example 1.
Solution. We found above that the solutions are (x1 , x2 , x3 , x4 , x5 ) = (−2s − t, −3s, s, t, t)
where s and t are free parameters, but we want to express the solution in the vector
form sv1 + tv2 . To find the vectors v1 and v2 we separate the s and t parts:
x = [ −2s−t; −3s; s; t; t ] = s [ −2; −3; 1; 0; 0 ] + t [ −1; 0; 0; 1; 1 ] = s v1 + t v2 .
We can also express solutions of nonhomogeneous systems in vector form, but now
there will be a fixed vector v0 that has no free parameters as multiplicative factors.
Example 3. Express the solutions in vector form for Example 5 of Section 4.2.
Solution. We found (x1 , x2 , x3 , x4 ) = (−1 − 2s − t, 2 + s, s, t), so
x = [ −1−2s−t; 2+s; s; t ] = [ −1; 2; 0; 0 ] + s [ −2; 1; 1; 0 ] + t [ −1; 0; 0; 1 ] = v0 + s v1 + t v2 .   □
The Rank of a Matrix
Definition 1. The rank of an (m × n)-matrix is the number of nonzero rows in its
reduced row-echelon form.
It is clear that rank(A) ≤ m. Moreover, since each nonzero row in rref(A) contains
exactly one leading 1, we see that rank(A) is also the number of leading variables; in
particular, rank(A) cannot exceed the total number of variables, so rank(A)≤ n.
Example 4. Find the rank of the following matrices
A = [ −1 1 2; 3 2 4; 2 3 7 ]        B = [ −1 1 2; 3 2 4; 2 3 6 ].        (4.12)
Solution. We put A into reduced row-echelon form (moving still more quickly):
[ −1 1 2; 3 2 4; 2 3 7 ] ∼ [ 1 −1 −2; 0 5 10; 0 5 11 ] ∼ [ 1 −1 −2; 0 1 2; 0 0 1 ] ∼ [ 1 0 0; 0 1 0; 0 0 1 ].
So we conclude that rank(A) = 3. Similarly, we put B into reduced row-echelon form:
[ −1 1 2; 3 2 4; 2 3 6 ] ∼ [ 1 −1 −2; 0 5 10; 0 5 10 ] ∼ [ 1 −1 −2; 0 1 2; 0 0 0 ] ∼ [ 1 0 0; 0 1 2; 0 0 0 ].
We conclude that rank(B) = 2.   □
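For reference, a quick software check of Example 4 (SymPy assumed; rank() simply counts the nonzero rows of the internally computed RREF):

from sympy import Matrix

A = Matrix([[-1, 1, 2], [3, 2, 4], [2, 3, 7]])
B = Matrix([[-1, 1, 2], [3, 2, 4], [2, 3, 6]])

print(A.rank(), B.rank())   # expect 3 and 2, as found above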
Remark 2. Since all row-echelon forms of a matrix have the same number of zero rows,
the rank is actually equal to the number of nonzero rows in any of its row-echelon forms.
The rank of the (m × n)-matrix A has important implications for the solvability
of a linear system Ax = c. If rank (A) = m then rref(A) has no zero rows; therefore
the RREF of the augmented coefficient matrix has no rows of the form (0 0 · · · 0 1),
so Ax = c is solvable (no matter what c is!). If rank (A) = n then there is a leading
1 in each column, which means there are no free variables and there is at most one
solution. On the other hand, if rank (A) < n, then there is at least one free variable;
if the system is consistent, there will be infinitely many solutions, but if inconsistent,
then no solutions. We summarize this discussion in the following:
Theorem 2. Consider a system of m linear equations in n unknowns and let A be the
coefficient matrix.
(a) We always have rank (A) ≤ m and rank (A) ≤ n;
(b) If rank (A) = m then the system is consistent and has at least one solution;
(c) If rank (A) = n then the system has at most one solution;
(d) If rank (A) < n then the system has either infinitely many solutions or none.
This theorem is of more theoretical than practical significance: for a given linear system one should use Gaussian elimination (or Gauss-Jordan elimination) to obtain the
solution set, which provides more information than the conclusions of the theorem.
Nevertheless, the theorem yields some useful observations.
Corollary 1. A linear system with fewer equations than unknowns either has infinitely
many solutions or none.
A homogeneous linear system always admits the trivial solution, so we can sharpen the
conclusions in this case.
Corollary 2. Suppose A is the coefficient matrix for a homogeneous linear system with
n unknowns. If rank(A) = n then 0 is the only solution, but if rank(A) < n then there
are infinitely many solutions.
Finally, let us consider the special case of a linear system with the same number of
equations and unknowns; in this case the coefficient matrix A is a square matrix.
Corollary 3. If A is an (n × n)-matrix, then the linear system Ax = c has a unique
solution x for every choice of c if and only if rank(A) = n.
This last corollary will be very useful in the next section when we consider the inverse
of a square matrix.
As we observed, in practice one simply uses the RREF to determine solvability, so
let us do an example, using it to illustrate the theoretical results above.
Example 5. Determine the solution set for the linear system
x1 + 2x2 + 3x3 = −2
2x1 + 5x2 + 5x3 = −3
x1 + 3x2 + 2x3 = −1.
Solution. We put the augmented coefficient matrix into RREF:
[ 1 2 3 −2; 2 5 5 −3; 1 3 2 −1 ] ∼ [ 1 2 3 −2; 0 1 −1 1; 0 1 −1 1 ] ∼ [ 1 0 5 −4; 0 1 −1 1; 0 0 0 0 ].
We see that the coefficient matrix A has rank 2, but this system is consistent. We let
x3 = t be a free variable and the solution set is (x1 , x2 , x3 ) = (−4 − 5t, 1 + t, t). (Note
that, for other values of c, the system Ax = c may be inconsistent!)
2
Exercises
1. The following matrices are in row-echelon form; find the reduced row-echelon form:
(a) [ 1 2; 0 1 ]        (b) [ 1 2 3; 0 1 4; 0 0 0 ]        (c) [ 1 2 3 4; 0 0 1 0; 0 0 0 1 ]        (d) [ 1 −1 3 0; 0 1 2 0; 0 0 1 −1 ]
2. For the following linear systems, put the augmented coefficient matrix into reduced
row-echelon form, and use this to find the solution set:
(a)
x1 + 2x2 + x3 = 1
3x1 + 5x2 − x3 = 14
(b)
3x1 + 5x2 + x3 = 0
x1 + 2x2 + x3 = 3
2x1 + 6x2 + 7x3 = 10
4x1 − 2x2 − 3x3 + x4 = 3
(c)
2x1 + 5x2 + 6x3 = 2
(d)
2x1 − 2x2 − 5x3 = −10
4x1 + x2 + 2x3 + x4 = 17
3x1 + x3 + x4 = 12
(e)
x1 − 2x2 + 3x3 + 2x4 + x5 = 10
2x1 − 4x2 + 8x3 + 3x4 + 10x5 = 7
3x1 − 6x2 + 10x3 + 6x4 + 5x5 = 27
3x1 + x2 + x3 + 6x4 = 14
x1 − 2x2 + 5x3 − 5x4 = −7
4x1 + x2 + 2x3 + 7x4 = 17
Solution
(f) x1 + x2 + x3 = 6
2x1 − 2x2 − 5x3 = −13
3x1 + x3 + x4 = 13
4x1 − 2x2 − 3x3 = −3
3. For the systems in Exercise 2, find the solution set in vector form. Sol’n (d)
4. Determine the solutions in vector form for Ax = 0.




2 1 5
1 −1 0 −1
(a) A = −1 1 −1 , Sol’n
(b) A = 2 1 3 7  ,
1 1 3
3 −2 1 0


0 1 0
1
1 0 −1 0 
1 0 −1 2 7


(c) A = 
,
(d) A =
.
0 0 1 −1
0 1 2 −3 4
1 1 0
0
5. Determine the rank of the following matrices:

0
−3 12
2 3
, (c) −1
(a)
Sol’n , (b)
2 −8
−1 4
0


1 2
−1
0 1 , (d)  1
0 −1
1
−2
3
−3

−2
1
7
4.4 Inverse of a Square Matrix
In this section we study the invertibility of a square matrix: if A is an (n × n)-matrix,
then we say that A is invertible if there exists an (n × n)-matrix B such that
A B = I   and   B A = I,        (4.13)
where I is the (n × n) identity matrix. We shall explain shortly why we are interested in
the invertibility of square matrices, but first let us observe that the matrix B in (4.13)
is unique:
Proposition 1. If A is an (n × n)-matrix and two (n × n)-matrices B and C satisfy
A B = I = B A and A C = I = C A, then B = C.
Proof. Since A C = I, we can multiply on the left by B to conclude B (A C) = B I = B.
But B(A C) = (BA) C = I C = C, so B = C.
2
This uniqueness enables us to introduce the following notation and terminology.
Definition 1. If A is an invertible (n × n)-matrix then we denote by A−1 the unique
(n × n)-matrix satisfying
A A−1 = I = A−1 A.
We call A−1 the inverse matrix (or just the inverse) of A.
Of course, the notation A−1 is reminiscent of a−1 , which is the inverse of the nonzero
real number a. And just like with real numbers, not all square matrices are invertible:
certainly the zero matrix 0 (i.e. with all elements equal to 0) is not invertible, but also
the (2 × 2)-matrix
A = [ a 0; 0 0 ],   where a is any real number,
is not invertible (cf. Exercise 1(a)). Matrices that are not invertible are called singular.
Let us observe that inverses have some interesting properties:
Proposition 2. If A and B are invertible matrices of the same size, then
(a) A−1 is invertible and (A−1 )−1 = A;
(b) AB is invertible and (AB)−1 = B−1 A−1 .
Proof. (a) The formula AA−1 = I = A−1 A actually shows that A−1 is invertible
and (A−1 )−1 = A. (b) We just observe (AB)(B−1 A−1 ) = A(BB−1 )A−1 = AIA−1 =
AA−1 = I and similarly (B−1 A−1 )(AB) = I.
2
Now let us explain our primary interest in the inverse of a matrix. Suppose we have
a linear system of n equations in n unknowns that we write as
A x = b,
(4.14)
where A is the coefficient matrix. If A is invertible, then we can solve (4.14) simply by
letting x = A−1 b :
x = A−1 b   ⇒   A x = A (A−1 b) = (A A−1 ) b = I b = b.
Solving Ax=b.
If A is an invertible (n × n)-matrix, then we can solve (4.14) by letting
x = A−1 b.
Consequently, we would like to find a method for calculating A−1 .
When A is a (2×2)-matrix, the construction of A−1 is provided by a simple formula:
Inverting a (2 × 2)-matrix.
Recall from Section 2.2 the determinant of a (2 × 2)-matrix:
A = [ a b; c d ]   ⇒   det(A) = ad − bc.
If det(A) ≠ 0, then A is invertible and
A−1 = (1/det(A)) [ d −b; −c a ].
If det(A) = 0, then A is not invertible.
To verify that the above formula for A−1 works when det(A) ≠ 0, just compute AA−1
and A−1 A. When det(A) = 0, then A is row-equivalent to a matrix with a row of
zeros; but then b can be chosen so that the row operations performed on the augmented
coefficient matrix will be inconsistent, so (4.14) is not solvable, and A is not invertible.
Example 1. Find the inverses for the following matrices:
a) A = [ 1 2; 3 4 ]        b) B = [ 2 3; 4 6 ]
Solution. det(A) = 4 − 6 = −2, so A is invertible and
A−1 = −(1/2) [ 4 −2; −3 1 ] = [ −2 1; 3/2 −1/2 ].
On the other hand, det(B) = 12 − 12 = 0, so B is not invertible.
Example 2. Solve the following linear systems
(a) 2x1 + 7x2 = 1,  x1 + 3x2 = −1        (b) 2x1 + 7x2 = 2,  x1 + 3x2 = 0.
Solution. Both systems (a) and (b) have the same coefficient matrix
A = [ 2 7; 1 3 ].
We calculate det(A) = −1 and
A−1 = −[ 3 −7; −1 2 ] = [ −3 7; 1 −2 ].
We can now use A−1 to solve both (a) and (b):
(a) x = [ −3 7; 1 −2 ][ 1; −1 ] = [ −10; 3 ]
(b) x = [ −3 7; 1 −2 ][ 2; 0 ] = [ −6; 2 ].   □
We now turn to the general case of trying to invert an n × n-matrix. We first observe
that the invertibility of A is determined by its rank.
Theorem 1. An (n × n)-matrix A is invertible if and only if rank(A) = n.
The proof of this theorem actually provides a method for computing A−1 , but we need
some notation. For j = 1, . . . , n, let ej denote the jth column vector of I:
I = ( e1 · · · en )   where   e1 = [ 1; 0; . . . ; 0 ],   e2 = [ 0; 1; . . . ; 0 ],   etc.
Proof. ⇒: If A−1 exists, then for any b we can solve Ax = b uniquely by x=A−1 b.
But we know from Theorem 2 in Section 4.3 that a unique solution of Ax = b requires
rank(A) = n. So A being invertible implies rank(A) = n.
⇐: We assume rank(A) = n, and we want to construct A−1 . For j = 1, . . . , n, let xj
be the unique solution of Axj = ej , and let X denote the matrix with column vectors
xj . Then we compute
AX = ( Ax1 Ax2 · · · Axn ) = ( e1 e2 · · · en ) = I.
We have shown AX = I, but in order to conclude X = A−1 we also need to show
XA = I. To do this, we use a little trick: multiply AX = I on the right by A to obtain
AXA = IA = A   ⇔   A(XA − I) = 0.
Now this last equation says that every column vector y in XA-I satisfies Ay = 0. But
using Corollary 2 of Section 4.3, we know that Ay = 0 only has the trivial solution
y = 0, so we conclude that every column vector in XA-I is zero, i.e. XA = I. We
indeed conclude that X = A−1 .
2
Now let us use the proof of Theorem 1 to obtain an algorithm for finding the inverse
of a square matrix. To be specific, let us take n = 3. The matrix X has column vectors
x1 , x2 , x3 found by solving Ax1 = e1 , Ax2 = e2 , Ax3 = e3 . To find x1 we use
Gauss-Jordan elimination:
[ a11 a12 a13 1; a21 a22 a23 0; a31 a32 a33 0 ] ∼ [ 1 0 0 x11; 0 1 0 x21; 0 0 1 x31 ]   for some values x11 , x21 , x31
and then we let
x1 = [ x11; x21; x31 ].
We then do the same thing with Ax2 = e2 and Ax3 = e3 to find x2 and x3 . But in each case, the row operations are the same, i.e. to transform A to I, so we might as well do all three at once:
[ a11 a12 a13 1 0 0; a21 a22 a23 0 1 0; a31 a32 a33 0 0 1 ] ∼ [ 1 0 0 x11 x12 x13; 0 1 0 x21 x22 x23; 0 0 1 x31 x32 x33 ].
Then we let X = A−1 be the (3 × 3)-matrix on the right. This process is called the
Gauss-Jordan method for inverting a matrix.
Gauss-Jordan Method for Inverting a Matrix.
If A and X are (n × n)-matrices such that
( A | I ) ∼ ( I | X ),
then A is invertible and X = A−1 .
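The boxed method can be sketched directly in code: form the augmented matrix (A | I), run Gauss-Jordan elimination, and read off X from the right half. The following minimal Python sketch is my own illustration (it uses floating-point arithmetic and raises an error when A is singular), not an implementation from the text:

def gauss_jordan_inverse(A):
    """Invert a square matrix by row-reducing (A | I) to (I | X)."""
    n = len(A)
    # build the augmented matrix (A | I)
    M = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]               # interchange rows if needed
        p = M[col][col]
        M[col] = [x / p for x in M[col]]              # make the pivot a 1
        for r in range(n):                            # clear the rest of the column
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                     # the right half is the inverse

# Example 3 below: expect approximately [[8, -29, 3], [-5, 19, -2], [2, -8, 1]].
print(gauss_jordan_inverse([[3, 5, 1], [1, 2, 1], [2, 6, 7]]))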
Example 3. Use the Gauss-Jordan method to determine whether the following matrix is invertible, and if so find A−1 :
A = [ 3 5 1; 1 2 1; 2 6 7 ].
Solution. We apply the Gauss-Jordan method:
[ 3 5 1 | 1 0 0; 1 2 1 | 0 1 0; 2 6 7 | 0 0 1 ] ∼ [ 1 2 1 | 0 1 0; 3 5 1 | 1 0 0; 2 6 7 | 0 0 1 ]
∼ [ 1 2 1 | 0 1 0; 0 −1 −2 | 1 −3 0; 0 2 5 | 0 −2 1 ]
∼ [ 1 2 1 | 0 1 0; 0 1 2 | −1 3 0; 0 0 1 | 2 −8 1 ]
∼ [ 1 2 0 | −2 9 −1; 0 1 0 | −5 19 −2; 0 0 1 | 2 −8 1 ]
∼ [ 1 0 0 | 8 −29 3; 0 1 0 | −5 19 −2; 0 0 1 | 2 −8 1 ]
Since we were able to transform A into I by ERO’s, we conclude A is invertible and
A−1 = [ 8 −29 3; −5 19 −2; 2 −8 1 ].
Of course, we can check our work by making sure that A−1 A = I.   □
Now let us illustrate the use of the inverse matrix to solve a linear system.
Example 4. Use the inverse of the coefficient matrix to solve
3 x1 + 5 x2 + x3 = 1
x1 + 2 x2 + x3 = 0
2 x1 + 6 x2 + 7 x3 = −1.
Solution. If we write the above linear system in the form Ax = b, then we see that
A = [ 3 5 1; 1 2 1; 2 6 7 ],        b = [ 1; 0; −1 ].
However, we computed A−1 in the previous example, so we can calculate x = A−1 b:
x = [ 8 −29 3; −5 19 −2; 2 −8 1 ][ 1; 0; −1 ] = [ 5; −3; 1 ].
In other words, the solution is x1 = 5, x2 = −3, x3 = 1.   □
Remark 1. Of course, instead of computing A−1 , we can solve (4.14) using Gaussian
elimination, which is computationally more efficient. However, the advantage of finding
A−1 is that we can use it repeatedly if we want to solve (4.14) for several values of b.
This was illustrated in Example 2.
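This remark is easy to see with NumPy (an assumed dependency). Computing A−1 once lets it be reused for several right-hand sides, as in Example 2; for a single right-hand side, np.linalg.solve (Gaussian elimination) is normally preferred:

import numpy as np

A = np.array([[2.0, 7.0], [1.0, 3.0]])     # coefficient matrix from Example 2
A_inv = np.linalg.inv(A)                   # computed once

for b in (np.array([1.0, -1.0]), np.array([2.0, 0.0])):
    print(A_inv @ b)                       # expect (-10, 3) and (-6, 2)

# For one right-hand side, solving directly is more efficient:
print(np.linalg.solve(A, np.array([1.0, -1.0])))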
Invertibility Conditions
Let us gather together various equivalent conditions for the invertibility of a square
matrix.
Invertibility Conditions.
For an (n × n)-matrix A, the following are equivalent.
1. A is invertible,
2. A is row-equivalent to I,
3. rank(A)=n,
4. Ax = 0 implies x = 0,
5. for any n-vector b there exists a unique solution of Ax = b,
6. det(A) ≠ 0.
The equivalence of Conditions 1 and 2 follows from the Gauss-Jordan method and the
equivalence of 1 and 3 is just Theorem 1. The equivalences with Conditions 4 and 5 are
not difficult to prove; see Exercise 3. For completeness, we have included Condition 6
involving the determinant, which we shall discuss in the following section.
Exercises
1. (a) For any number a, let A = [ a 0; 0 0 ]. Show that there is no (2 × 2)-matrix C so that AC = I.
(b) For any number b, let B = [ 0 b; 0 0 ]. Show that there is no (2 × 2)-matrix C so that BC = I.
2. If A is invertible and k is a positive integer, show that Ak is invertible and
(Ak )−1 = (A−1 )k .
3. (a) Show that Invertibility Conditions 3 and 4 are equivalent.
(b) Show that Invertibility Conditions 4 and 5 are equivalent.
Hint: are there free variables?
4. Find the inverses of the following matrices (if they exist):
1 −1
3 2
3 −2
(a)
,
(b)
,
(c)
,
1 2
4 3
−6 4






2 7 3
1 2 −3
1 5 1
(d) 2 5 0, (e) 1 3 2, (f)  2 6 −2 Sol’n
3 7 9
−1 1 4
2 7 1
5. Find the inverses of the
3 7
(a) A =
,
2 5

1 3
(c) A = 2 8
3 10
following matrices A and use them to solve Ax = b:
−1
3 2
5
b=
Sol’n ,
(b) A =
, b=
,
3
5 4
6

 


 
2
1
1 4 3
6
3 , b = 1,
(d) A = 1 4 5 , b = 0.
6
2
2 5 1
6
6. Write the following systems as Ax = b, then find A−1 and the solution x:
(a)
5x1 + 12x2 = 5
7x1 + 17x2 = 5
Sol’n
(b)
x1 + 4x2 + 13x3 = 5
3x1 + 2x2 + 12x3 = −1
7x1 + 9x2 = 3
5x1 + 7x2 = 2
2x1 + x2 + 3x3 = 3
x1 + x2 + 5x3 = 5
(c)
.
(d)
x1 − x2 + 2x3 = 6
3x1 + 3x2 + 5x3 = 9
4.5 The Determinant of a Square Matrix
For an (n × n)-matrix A we want to assign a number det(A) that can be used to
determine whether A is invertible. Recall the definition of det(A) for n = 2,
det [ a b; c d ] = ad − bc,        (4.15)
and the fact that Lemma 1 in Section 2.2 showed A is invertible if and only if det(A) ≠ 0.
The goal of this section is to achieve this for n > 2.
Before we give the definition of det(A) for general n ≥ 2, let us list some of the
properties that we want to hold. The first three pertain to the effects that elementary
row operations have upon the determinant; the fourth refers to matrices in upper
triangular form, i.e. having all zeros below the main diagonal:
[ a11 a12 a13 · · · a1n
   0  a22 a23 · · · a2n
   0   0  a33 · · · a3n
   ⋮    ⋮    ⋮   ⋱   ⋮
   0   0   0  · · · ann ]
A matrix with all zeros above the main diagonal is in lower triangular form.
Desired Properties of the Determinant. Let A and B be (n × n)-matrices.
P1: If B is obtained from A by interchanging two rows, then det(B) = −det(A).
P2: If B is obtained from A by multiplying one row by k then det(B) = k det(A).
P3: If B is obtained from A by adding a multiple of one row to another row
(leaving the first row unchanged), then det(B) = det(A).
P4: If A is an upper (or lower) triangular matrix, then det(A)= a11 a22 · · · ann .
It is easy to see that these properties are satisfied by the definition (4.15) for (2 × 2)matrices. Once we give the definition of det(A) for (n×n)-matrices, we shall show these
properties hold in general, but for now let us show that they imply the invertibility
condition that we mentioned in the previous section:
Theorem 1. If A is a square matrix, then A is invertible if and only if det(A) ≠ 0.
Proof. We already know that A is invertible if and only if Gaussian elimination makes
A ∼ I. But P1-P3 show that ERO’s cannot change a nonzero determinant to a zero
determinant. (Recall that multiplication of a row by k is an ERO only if k is nonzero.)
Since P4 implies det(I) = 1 ≠ 0, we conclude that det(A) ≠ 0 is equivalent to A being
invertible.
2
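Properties P1-P3 and P4 also give an efficient way to compute determinants: reduce the matrix toward triangular form while tracking row interchanges and the pivots. The following floating-point sketch is my own illustration of that idea (the matrix of Example 1 below is used as a check):

def det_by_elimination(M):
    """Compute det(M) by Gaussian elimination, using properties P1-P4."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    det = 1.0
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return 0.0                     # no pivot in this column: the matrix is singular
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            det = -det                     # P1: a row interchange flips the sign
        det *= A[col][col]                 # P4: each pivot becomes a diagonal factor
        p = A[col][col]
        for r in range(col + 1, n):        # P3: these operations do not change the determinant
            f = A[r][col] / p
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return det

print(det_by_elimination([[2, 1, 3], [-1, 2, 6], [4, 1, 12]]))   # expect 45.0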
Remark 1. Sometimes it is convenient to write |A| instead of det(A), but we must
remember that |A| could be negative!
Before turning to the formal definition, let us use the above properties to actually
compute the determinant of some square matrices.
Example 1. Calculate the determinant of
A = [ 2 1 3; −1 2 6; 4 1 12 ].
Solution. We use ERO’s with P1-P3 to put the matrix into upper triangular form and then use P4 to calculate the determinant:
det[ 2 1 3; −1 2 6; 4 1 12 ] = −det[ −1 2 6; 2 1 3; 4 1 12 ] = det[ 1 −2 −6; 2 1 3; 4 1 12 ]
   (apply P1 to R1 with R2; then apply P2 to R1 with k = −1)
= det[ 1 −2 −6; 0 5 15; 0 9 36 ]
   (apply P3 first to R1 and R2; then to R1 and R3)
= 45 det[ 1 −2 −6; 0 1 3; 0 1 4 ] = 45 det[ 1 −2 −6; 0 1 3; 0 0 1 ] = 45.
   (apply P2 to R2 and to R3; then apply P3 to R2 and R3; finally use P4)
Example 2. Calculate the determinant of
A = [ 0 1 −1 1; −1 0 1 1; 1 −1 0 1; −1 −1 −1 0 ].
Solution. Again we use ERO’s to put the matrix into upper triangular form so that we can use P4:
det[ 0 1 −1 1; −1 0 1 1; 1 −1 0 1; −1 −1 −1 0 ]
= det[ 1 0 −1 −1; 0 1 −1 1; 1 −1 0 1; −1 −1 −1 0 ]   (interchange R1 and R2, then multiply the new R1 by −1)
= det[ 1 0 −1 −1; 0 1 −1 1; 0 −1 1 2; 0 −1 −2 −1 ]   (add −R1 to R3 and R1 to R4)
= det[ 1 0 −1 −1; 0 1 −1 1; 0 0 0 3; 0 0 −3 0 ]   (add R2 to R3 and to R4)
= −det[ 1 0 −1 −1; 0 1 −1 1; 0 0 −3 0; 0 0 0 3 ] = −(1)(1)(−3)(3) = 9.   (interchange R3 and R4, then use P4)
Defining the Determinant using Permutations
A permutation p1 , . . . , pn of the integers 1, . . . , n is just a reordering of the ordered
n-tuple (1, . . . , n). For example, with n = 4, we could take (p1 , p2 , p3 , p4 ) = (2, 1, 3, 4)
or (p1 , p2 , p3 , p4 ) = (1, 2, 4, 3); these are both obtained by a simple interchange of
two elements of (1, 2, 3, 4). In general, every permutation can be realized as a sequence
of simple interchanges of the elements of (1, . . . , n), and it is easy to see that there
are n! permutations of (1, . . . , n). The permutation is called even or odd depending
on whether it requires an even or odd number of such interchanges. (Although a given
permutation can be realized in several different ways as sequences of simple interchanges,
the number of such interchanges is either even or odd; this fact is not entirely obvious,
but we shall not bother to give a formal proof.) For a given permutation (p1 , . . . , pn )
we define its sign by
σ(p1 , . . . , pn ) =   1,   if (p1 , . . . , pn ) is an even permutation of (1, . . . , n)
                       −1,  if (p1 , . . . , pn ) is an odd permutation of (1, . . . , n).        (4.16)
Note that a simple interchange of two elements of a permutation changes its sign. For
example, with n = 3, we have σ(1, 2, 3) = 1, σ(2, 1, 3) = −1, σ(2, 3, 1) = 1, etc.
Now suppose that we are given a square matrix A with elements aij . The product of
the elements on the diagonal is a11 a22 · · · ann ; notice that each row and each column of
A contributes a factor to this n-product. We want to consider other n-products to which
each row and column contributes a factor. Such a general n-product may be written
as a1p1 a2p2 · · · anpn where (p1 , . . . , pn ) is a permutation of (1, . . . , n). The determinant
of A is then defined by summing these n-products over all n! such permutations, and
using the sign of each permutation:
Definition 1. For an n × n-matrix A = (aij ), we define its determinant by
det(A) = Σ σ(p1 , p2 , . . . , pn ) a1p1 a2p2 · · · anpn ,
where the sum is taken over all n! permutations (p1 , p2 , . . . , pn ) of (1, 2, . . . , n).
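Definition 1 can be turned into a (deliberately naive) program: sum over all n! permutations, with the sign computed by counting inversions. The sketch below is mine, purely to make the definition concrete; it is far too slow for anything but small matrices.

from itertools import permutations

def sign(p):
    """Sign of a permutation: (-1) raised to the number of inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    n = len(A)
    total = 0
    for p in permutations(range(n)):       # all n! permutations of the column indices
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]             # one factor from each row and each column
        total += term
    return total

print(det_by_permutations([[2, 1, 3], [-1, 2, 6], [4, 1, 12]]))   # expect 45, matching Example 1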
Let us verify that this definition coincides with (4.15) when n = 2. But for n = 2,
the only permutations of (1, 2) are (1, 2) itself and (2, 1). Moreover, σ(1, 2) = 1 and
σ(2, 1) = −1, so
det [ a11 a12; a21 a22 ] = a11 a22 − a12 a21 ,
which agrees with (4.15).
For n = 3, let us carry out the calculation of det(A) and obtain a way of remembering
the result. We have 3! = 6 permutations of (1, 2, 3) and their signs are
σ(1, 2, 3) = σ(2, 3, 1) = σ(3, 1, 2) = 1
σ(1, 3, 2) = σ(2, 1, 3) = σ(3, 2, 1) = −1.
We conclude that
det[ a11 a12 a13; a21 a22 a23; a31 a32 a33 ] = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31 .        (4.17)
This result may be remembered as follows. First create a 3 × 5 matrix by repeating the
first two columns as columns four and five respectively:
a11 a12 a13 a11 a12
a21 a22 a23 a21 a22
a31 a32 a33 a31 a32
Then add together the 3-products on the three downward diagonals and subtract from
them the 3-products on the three upward diagonals. The result is exactly det(A).
Before we show that det indeed satisfies the desired properties P1-P4, let us add two
additional properties:
P5: If two rows of A are the same, then det(A) = 0.
P6: Suppose that A, B, and C are all (n × n)-matrices which are identical except
the ith row of A is the sum of the ith row of B and the ith row of C; then
det(A)=det(B)+det(C).
Theorem 2. The determinant, i.e. det, satisfies the properties P1-P6.
Proof. Interchanging two rows of a matrix affects the determinant by changing each
even permutation to an odd one and vice versa (see Exercise 3), so P1 holds. Since each
of the products a1p1 · · · anpn contains exactly one element of each row, multiplying one
row of a matrix by k multiplies the product a1p1 · · · anpn by k; thus P2 holds. Turning to
P5, if two rows of A are equal, then interchanging them does not affect A, but changes
the sign of det(A) according to P1; thus det(A)= −det(A), which implies det(A)=0.
Going on to P6, we suppose that the elements of A, B, and C are all ajk for j ≠ i and
aik = bik + cik   for k = 1, . . . , n.
Then
det(A) = Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · aipi · · · anpn
       = Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · (bipi + cipi ) · · · anpn
       = Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · bipi · · · anpn + Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · cipi · · · anpn
       = det(B) + det(C),
proving P6.
Finally, we prove P4. If A= (aij ) is upper triangular, then aij = 0 whenever i > j.
So the only nonzero terms
σ(p1 , . . . , pn ) a1p1 · · · anpn
occur when pi ≥ i. Since the pi must be distinct, the only possibility is pi = i for
i = 1, . . . , n. Thus the sum reduces to a single term:
det(A) = σ(1, 2, . . . , n)a11 a22 · · · ann = a11 a22 · · · ann .
This proves P4.
2
There is one more useful property of determinants that relates to the transpose that
was discussed at the end of Section 4.1:
P7: det(AT ) = det(A).
Proof of P7. We recall that aTij = aji , so
det(AT ) = Σ σ(p1 , p2 , . . . , pn ) ap1 1 ap2 2 · · · apn n .        (4.18)
But (p1 , p2 , . . . , pn ) is a permutation of (1, 2, . . . , n), so by rearranging the factors we
have
ap1 1 ap2 2 · · · apn n = a1q1 a2q2 · · · anqn ,
(4.19)
for some permutation (q1 , q2 , . . . , qn ) of (1, 2, . . . , n). Now we claim that
σ(p1 , p2 , . . . , pn ) = σ(q1 , q2 , . . . , qn ).
(4.20)
The reason (4.20) is true is that (4.19) implies the number of simple interchanges of
(1, 2, . . . , n) to achieve (p1 , p2 , . . . , pn ) must equal the number of simple interchanges of
(1, 2, . . . , n) to achieve
(q1 , q2 , . . . , qn ). But now plugging (4.19) and (4.20) into (4.18),
we find that det AT = det(A).
2
Remark 2. Using P7 it is clear that in properties P1-P3 and P5-P6 the word “row” may be replaced by the word “column,” and it explains why P4 holds for lower triangular matrices as well as upper triangular ones.
Exercises
1. Use properties P1-P4 to find the determinants of the following matrices:




3 1 −1
1 2 3
(b) 5 3 1  Solution
(a) 2 5 8 
2 2 0
3 7 13




0
1 −1 1
1 −1 0 4
−1 0
3
1 −1
1
2 4


(d) 
(c) 


−1 1
1 −1 0
1
3 2
−1 1 −1 0
2 −2 −2 2
2. Find the values of k for which the system has a nontrivial solution:
(a)
x1 + kx2 = 0
kx1 + 9x2 = 0
x1 + 2x2 + kx3 = 0
Solution
(b)
2x1 + kx2 + x3 = 0
−x1 − 2x2 + kx3 = 0
3. Calculate the following signs of permutations (with n = 4):
(a)
σ(1, 3, 2, 4),
4. Use properties

i
(a) 1
0
(b)
σ(3, 4, 1, 2),
(c)
σ(4, 3, 2, 1).
P1-P4 to find the determinants of the following


1+i
−1 0
−i −1 Solution
(b)  1
1
1
2i
complex matrices:

1 i
0 1
1 0
5. Another important property of determinants is det(AB) = det(A) det(B). By
direct calculation, verify this formula for any (2 × 2)−matrices A and B.
4.6 Cofactor Expansions
In this section we derive another method for calculating the determinant of a square
matrix. We also discuss another method for computing the inverse matrix. First we
introduce some terminology.
Definition 1. For any element aij of an (n × n)-matrix A, the minor Mij of aij is the
determinant of the (n − 1) × (n − 1) matrix obtained by deleting that row and column.
The cofactor Cij of aij is (−1)i+j Mij .
The signs of the cofactors alternate ±:
[ + − + · · ·
  − + − · · ·
  + − + · · ·
  ⋮  ⋮  ⋮  ⋱ ]
For example, if we consider the matrix
A = [ 1 2 3; 0 1 −1; 2 3 2 ],
we see that a23 = −1, its minor is
M23 = det[ 1 2; 2 3 ] = 3 − 4 = −1,
and its cofactor is
C23 = (−1)^(2+3) M23 = (−1)(−1) = 1.
If we choose any row of A, multiply each element by its cofactor and take the sum,
we obtain the cofactor expansion along that row. For example, the cofactor expansion
along the first row is
a11 C11 + a12 C12 + · · · + a1n C1n = a11 M11 − a12 M12 + · · · + (−1)^(1+n) a1n M1n .
We can also compute the cofactor expansion along any column of A. The significance
of cofactor expansions is that they may be used to compute the determinant.
Theorem 1. For an (n × n)-matrix A, the cofactor expansion along any row or
column is det(A).
We shall prove this theorem below, but first let us see it in action. To see how it works
for a (2 × 2)-matrix, let us select the first row for our cofactor expansion. The theorem
tells us
det [ a11 a12; a21 a22 ] = (−1)^(1+1) a11 M11 + (−1)^(1+2) a12 M12 = a11 M11 − a12 M12 .
But M11 = a22 and M12 = a21 so we obtain det(A) = a11 a22 − a12 a21 , which is the expected result. Similarly, let us use the first row for a (3 × 3)-matrix:
det[ a11 a12 a13; a21 a22 a23; a31 a32 a33 ] = a11 M11 − a12 M12 + a13 M13 ,
where
M11 = det[ a22 a23; a32 a33 ],   M12 = det[ a21 a23; a31 a33 ],   M13 = det[ a21 a22; a31 a32 ].
If we evaluate these 2 × 2 determinants, the result agrees with (4.17). Of course, we did
not have to use the first row for these cofactor expansions and in some cases it makes
more sense to choose other rows or columns.
Example 1. Calculate the determinant of the matrix
A = [ 1 2 3 2; 0 1 −1 0; 2 3 2 1; 1 0 1 0 ].
Solution. We first select a row or column for the cofactor expansion, and we may as well choose one that simplifies the calculation. Note that the second row has two zeros, so let us use that. (The last row or last column would work equally well.)
det[ 1 2 3 2; 0 1 −1 0; 2 3 2 1; 1 0 1 0 ] = 1·det[ 1 3 2; 2 2 1; 1 1 0 ] − (−1)·det[ 1 2 2; 2 3 1; 1 0 0 ]
= det[ 1 3 2; 2 2 1; 1 1 0 ] + det[ 1 2 2; 2 3 1; 1 0 0 ]
We now must evaluate the determinants of two (3 × 3)-matrices. For the first of these, let us use the last row for the cofactor expansion since it has a zero (although we might have chosen the last column for the same reason):
det[ 1 3 2; 2 2 1; 1 1 0 ] = det[ 3 2; 2 1 ] − det[ 1 2; 2 1 ] = (3 − 4) − (1 − 4) = 2.
We also use the last row to evaluate the determinant of the second 3 × 3 matrix:
det[ 1 2 2; 2 3 1; 1 0 0 ] = det[ 2 2; 3 1 ] = 2 − 6 = −4.
We conclude that det(A) = 2 − 4 = −2.   □
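Theorem 1 suggests a simple recursive program: expand along the first row, computing each minor by deleting a row and a column. The sketch below is mine, for illustration only; as discussed at the end of this section it is far too slow for large matrices.

def det_by_cofactors(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue                                       # zero entries contribute nothing
        minor = [row[:j] + row[j+1:] for row in A[1:]]     # delete row 1 and column j
        cofactor = (-1) ** j * det_by_cofactors(minor)     # (-1)^(1+j) with 0-based j
        total += A[0][j] * cofactor
    return total

# Example 1 above: expect -2.
print(det_by_cofactors([[1, 2, 3, 2], [0, 1, -1, 0], [2, 3, 2, 1], [1, 0, 1, 0]]))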
We can sometimes use elementary row operations to simplify the matrix before we
use a cofactor expansion.
Example 2. Calculate the determinant of the matrix
A = [ 1 2 3; −1 −1 −2; 2 3 4 ].
Solution. None of the rows or columns contain zeros, but we can use ERO’s to change this fact:
det[ 1 2 3; −1 −1 −2; 2 3 4 ] = det[ 1 2 3; 0 1 1; 0 −1 −2 ] = det[ 1 1; −1 −2 ] = −2 + 1 = −1.   □
Proof of Theorem 1. We first show that det(A) may be computed using a cofactor
expansion along the first row. Note that
det(A) = Σ σ(p1 , p2 , . . . , pn ) a1p1 a2p2 · · · anpn
       = a11 Σ_{pi ≠ 1} σ(1, p2 , . . . , pn ) a2p2 · · · anpn
       + a12 Σ_{pi ≠ 2} σ(2, p2 , . . . , pn ) a2p2 · · · anpn
         ⋮
       + a1n Σ_{pi ≠ n} σ(n, p2 , . . . , pn ) a2p2 · · · anpn        (4.21)
Now we observe that the (n − 1) × (n − 1)-matrix obtained by deleting the first row and column is (apq ) with p, q = 2, . . . , n, and its determinant is the minor for a11 :
M11 = Σ_{pi ≠ 1} σ(p2 , . . . , pn ) a2p2 · · · anpn = Σ_{pi ≠ 1} σ(1, p2 , . . . , pn ) a2p2 · · · anpn ,
since σ(1, p2 , . . . , pn ) = σ(p2 , . . . , pn ), which is the sign of the permutation (p2 , . . . , pn ) of (2, . . . , n). Similarly, the (n − 1) × (n − 1)-matrix obtained by deleting the first row and second column is (apq ) with p ≠ 1, q ≠ 2, and its determinant is
M12 = Σ_{pi ≠ 2} σ(p2 , . . . , pn ) a2p2 · · · anpn = − Σ_{pi ≠ 2} σ(2, p2 , . . . , pn ) a2p2 · · · anpn ,
since σ(2, p2 , . . . , pn ) = −σ(p2 , 2, p3 , . . . , pn ) = −σ(p2 , p3 , . . . , pn ) for any permutation
(p2 , . . . , pn ) of (1, 3, . . . , n). But recall that C12 = −M12 . Continuing in this fashion we
find that (4.21) is just the cofactor expansion along the first row:
det(A) = a11 C11 + a12 C12 + · · · + a1n C1n .
Now suppose that we want to use the second row for the cofactor expansion. Let
A0 denote the matrix obtained by switching the first and second rows of A. Then
the cofactor expansion along the first row of A0 , which we know from the above is
det(A0 ), is the negative of the cofactor expansion of A along its second row (since each
factor (−1)2+j has changed sign). But from Property P1 of determinants, we also have
det(A0 )= −det(A), so the cofactor expansion of A along its second row indeed yields
det(A). Expansion along other rows of A are treated analogously.
If we want to use a cofactor expansion along a column, we note that a column of A
is a row of AT and we have det(AT )=det(A).
2
Adjoint Method for Computing the Inverse of a Matrix
In Section 4.4 we used the Gauss-Jordan method to compute the inverse of a square
matrix. Here we discuss another method for computing the inverse that uses cofactors.
First we need some additional terminology:
Definition 2. If A is an n × n matrix with elements aij , let Cij denote the cofactor
for aij and C denote the cofactor matrix, i.e. C = (Cij ). Then the adjoint matrix
for A is the transpose of C:
Adj(A) = CT .
Now we show how to use the adjoint matrix to compute the inverse of a matrix.
Theorem 2. If A is a square matrix with det(A) ≠ 0, then
A−1 = (1/det(A)) Adj(A).        (4.22)
Let us confirm that this coincides with the formula in Section 4.4 for (2 × 2)-matrices:
A = [ a b; c d ]   ⇒   C = [ d −c; −b a ]   ⇒   Adj(A) = [ d −b; −c a ]
so we obtain the familiar formula
A−1 = (1/(ad − bc)) [ d −b; −c a ].
Now let us use (4.22) in an example.
Example 3. Find the inverse of the matrix
A = [ 1 2 3; −1 −1 −2; 2 3 4 ].
Solution. In Example 2 we computed det(A) = −1, so A is invertible. Let us compute the cofactors:
C11 = det[ −1 −2; 3 4 ] = 2,      C12 = −det[ −1 −2; 2 4 ] = 0,      C13 = det[ −1 −1; 2 3 ] = −1,
C21 = −det[ 2 3; 3 4 ] = 1,       C22 = det[ 1 3; 2 4 ] = −2,        C23 = −det[ 1 2; 2 3 ] = 1,
C31 = det[ 2 3; −1 −2 ] = −1,     C32 = −det[ 1 3; −1 −2 ] = −1,     C33 = det[ 1 2; −1 −1 ] = 1.
So
Adj(A) = CT = [ 2 1 −1; 0 −2 −1; −1 1 1 ]   ⇒   A−1 = [ −2 −1 1; 0 2 1; 1 −1 −1 ].
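Formula (4.22) is also available directly in SymPy (an assumed dependency), which makes it easy to check Example 3:

from sympy import Matrix

A = Matrix([[1, 2, 3], [-1, -1, -2], [2, 3, 4]])

print(A.det())                    # expect -1
print(A.adjugate())               # the transpose of the cofactor matrix, Adj(A)
print(A.adjugate() / A.det())     # formula (4.22); this should equal A.inv()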
Proof of Theorem 2. Theorem 1 states that, if we multiply the elements of any row
(or column) of A by their respective cofactors and take the sum, we obtain det(A):
ai1 Ci1 + · · · + ain Cin = det(A).        (4.23)
However, we now want to show that if we multiply the elements of any row by the
cofactors of a different row and take the sum, we get zero:
ai1 Cj1 + · · · + ain Cjn = 0   if i ≠ j.        (4.24)
To show this, let B denote the matrix obtained by adding the ith row to the jth row
of A. By Property P3, det(B)=det(A). If we take the cofactor expansion along the jth
row of B, we obtain
det(B) = Σ_{k=1}^{n} (ajk + aik ) Cjk = Σ_{k=1}^{n} ajk Cjk + Σ_{k=1}^{n} aik Cjk .
But the first sum on the right is just det(A), so we have
det(A) = det(A) + Σ_{k=1}^{n} aik Cjk ,
which proves (4.24). (A similar result holds for elements and cofactors in different
columns of A.)
Now let B= [det(A)]−1 Adj(A); to prove (4.22) we want to show AB = I = BA.
We compute the elements of AB using (4.23) and (4.24):
(AB)ij = Σ_{k=1}^{n} aik bkj = (1/det(A)) Σ_{k=1}^{n} aik Adj(A)kj = (1/det(A)) Σ_{k=1}^{n} aik Cjk = { 1, if i = j; 0, if i ≠ j }.
This shows that AB = I and a similar calculation shows BA = I.
2
Computational Efficiency
We now have two different methods for computing the determinant (and finding the
inverse) of an (n × n)-matrix: (a) using elementary row operations and properties of
the determinant as in Section 4.5, and (b) using cofactor expansions as in this section.
However, for large values of n, computing a determinant by method (b) requires many
more calculations than method (a). For example, if we compare the number of multiplications (which are computationally more demanding than additions) we find that
method (b) requires n! multiplications, while method (a) requires fewer than n3 . But
n! grows much more rapidly than n3 as n increases. For example, 20! ≈ 2.4 × 10^18, so to compute the determinant of a (20 × 20)-matrix by method (a) requires less than 4,000 multiplications, but by method (b) requires about 2.4 × 10^18 multiplications! Similarly, the
number of multiplications involved in using (4.22) to calculate the inverse of a matrix is
much larger than those involved in the Gauss-Jordan algorithm. For this reason, computational programs use ERO’s instead of cofactor expansions to compute determinants
and inverses of matrices.
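The comparison is dramatic even for modest n; a quick check using Python's standard math module:

import math

for n in (5, 10, 20):
    print(n, n**3, math.factorial(n))   # roughly n^3 multiplications for method (a) vs n! for method (b)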
Exercises
1. Use a cofactor expansion to compute the following determinants:
(a) det[ 1 2 3; 0 1 0; 4 5 6 ]        (b) det[ 1 2 3; 0 1 1; 4 5 6 ]
(c) det[ 0 1 −1; 2 0 −2; 4 2 1 ]        (d) det[ 1 2 3 4; 2 −1 1 0; 3 2 0 0; 4 3 0 0 ]
(e) det[ 2 0 3 1; 1 4 −2 3; 0 2 −1 0; 1 3 −2 4 ]        (f) det[ 1 0 −1 0; 0 1 0 −1; −1 0 −1 0; 0 1 0 1 ]
2. Use elementary row operations to simplify and then perform a cofactor expansion to evaluate the following determinants:
(a) det[ −1 1 2; 1 −1 −2; 2 −1 −2 ]        (b) det[ −1 1 2; 1 1 −2; 2 −1 −2 ]
(c) det[ 1 2 −1 3; 2 4 −1 6; 3 1 5 −1; 6 2 9 −2 ]        (d) det[ 2 1 4 2; 5 5 −3 7; 6 3 10 3; 4 2 −4 4 ]
3. Use Theorem 2 to find the inverses for the following matrices:
(a) [ 2 3 0; 2 1 5; 0 −1 2 ]        (b) [ 3 5 2; −2 3 −4; −5 0 5 ]
4. To illustrate the issue of computational efficiency, compute A−1 for the following (4 × 4)-matrix A in two different ways: a) using the Gauss-Jordan method, and b) using (4.22). Which did you find to be shorter?
A = [ 1 0 −1 0; 0 −1 1 0; 1 0 0 −1; 0 1 0 0 ]
4.7 Additional Exercises
1. The equations
a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2
define two planes in R3 . Are the usual three cases (a unique solution, no solution,
or an infinite number of solutions) all possibilities? Give a geometric explanation.
2. The equations
a1 x + b1 y + c1 z = 0
a2 x + b2 y + c2 z = 0
3
define two planes in R . Give a geometric explanation for why there must be an
infinite number of solutions.
3. If A and B are (n×n)-matrices, is it always true that (A+B)(A−B) = A2 −B2 ?
4. If A and B are upper triangular matrices of the same size, prove that AB is upper
triangular.
5. If A and B are symmetric (n×n)-matrices, is it always true that AB is symmetric?
6. If A is any matrix (not necessarily square), prove that AAT is a symmetric matrix.
7. If A is invertible and symmetric, show that A−1 is also symmetric.
8. If A is symmetric and B is any matrix of the same size, show that BT AB is
symmetric.
9. A square matrix A is called skew-symmetric if AT = −A. Show that a skewsymmetric matrix must have zeros along its main diagonal.
10. If A is an (n×n)-matrix, its trace tr(A) is the sum of the main diagonal elements:
a11 +a22 +· · ·+ann . If B is also an (n×n)-matrix, show tr(A+B) = tr(A)+tr(B).
11. A square matrix A is called nilpotent if Ak is the zero matrix for some positive
integer k. Determine which of the following matrices are nilpotent.
A = [ 0 1; 0 0 ],        B = [ 1 0; 0 −1 ],
C = [ 0 1 2; 0 0 3; 0 0 0 ],        D = [ 1 2 3; 0 0 0; 0 0 0 ]
12. (a) If A = [ a b; c d ], show that A2 = (a + d)A − (ad − bc)I.
(b) Find a matrix A without ±1 on the main diagonal such that A2 = I.
(c) Find a matrix A with zeros on its main diagonal such that A2 = I.
(d) Find a matrix A with zeros on its main diagonal such that A2 = −I.
(e) Find a matrix A ≠ 0 and A ≠ I such that A2 = A.
Determine the solution set for the system of linear equations
x1 + 3x2 + 2x3 = 5
x1 + 3x2 + 2x3 = 5
13.
x1 − x2 + 3x3 = 3
14.
x1 − x2 + 3x3 = 3
3x1 + x2 + 8x3 = 11
3x1 + x2 + 8x3 = 0
3x1 + 2x2 − x4 − 2x5 = 0
2x1 + x2 + 4x4 = −1
3x1 + x2 + 5x4 = −1
15.
5x1 + 2x2 + x3 − 3x4 − x5 = 0
ix1 + x2 + (i − 1)x4 = 0
(2 − i)x1 − x2 + 2(1 − i)x3 = 2 + i
ix1 + x2 + 2ix3 = i
17.
2x1 + x2 − x4 − x5 = 0
16.
4x1 + x2 + x3 + 9x4 = −3
18.
ix1 + 2x2 + (i − 2)x4 = 0
−ix1 + x3 = 0
x1 + ix2 = 1 + 2i
Determine the values of k for which the system of linear equations a) has a unique
solution, b) has no solutions, and c) has an infinite number of solutions.
x1 − x2 = 2
19.
x1 − x2 + ix3 = 1
3x1 − x2 + x3 = 7
20.
x1 − 3x2 − k 2 x3 = −k
ix1 + (1 − i)x2 = −1 + i
ix1 − ix2 + k 2 x3 = k
Find the rank of the matrix A and find the solutions in vector form for Ax=b.

 
9
3 −3
6
4  , b = 0
21. A =  2 −2
0
−7 7 −14

 

1
4 2 1
1
3 3 2

 
23. A = 
3 3 3 , b = 1
1
3 2 1


0
22. A = 0
0

1
3
24. A = 
2
2
1
3
2
1
1
3
−3
2
1
0

1
2 ,
1
 
1
b = 0
0

 
2
0 1
8
−2 3
, b =  
3
1 2
9
−5 2
For a linear system Ax = b, let r = rank(A) and r# = rank(A|b).
25. If r < r# , show that the system is inconsistent and has no solution.
26. If r = r# , show that the system is consistent, and
(a) there is a unique solution if and only if r# = n,
(b) there are an infinite number of solutions if and only if r# < n.
Find the inverse of A and use it to solve the matrix equation AX = B.
27.

1
A = 2
1
5
1
7

1
−2 ,
2

1
B= 0
−1

0 1
3 0
0 1
28.

6
A = 5
3
5
3
4

3
2 ,
2

2
B = 1
1

−1 1
0 1
−1 2
For the next two problems, assume that A and B are both (n × n)-matrices.
29. If AB = I, show that both A and B are invertible and B = A−1 .
30. If AB is invertible, show that both A and B are invertible.
Chapter 5
Vector Spaces
5.1 Vectors in Rn
In this chapter we want to discuss vector spaces and the concepts associated with
them: subspaces, linear independence of vectors, bases, and dimension. We begin in
this section with the vector space Rn , which generalizes the familiar two- and three-dimensional cases, R2 and R3 .
Rn is the collection of all n-tuples of real numbers (x1 , . . . , xn ); the quantities
x1 , . . . , xn are called coordinates. When n = 2, the coordinates are generally labelled (x, y) and values for x and y represent the location of a point P in the plane;
when n = 3, the coordinates are generally labelled (x, y, z) and locate a point P in
3-dimensional space. For both n = 2 and n = 3, the coordinates of P may also be
considered as the components of the vector OP that may be visualized as an arrow
pointing from the origin O to the point P , i.e. the arrow has its “tail” at the origin and
its “head” at P (see Figures 1 and 2). As in Section 4.1, we shall generally represent
vectors using bold-face letters such as v, u, etc. In fact, the origin itself may be considered as a vector: for n = 2, we have 0 = (0, 0) and for n = 3, we have 0 = (0, 0, 0). For
n ≥ 4, it is not so easy to visualize (x1 , . . . , xn ) as a point or a vector, but we shall be
able to treat it in exactly the same way as for n = 2 or 3.
Definition 3. A vector in Rn is an n-tuple v of real numbers (v1 , . . . , vn ) called the
components of v, and we write v = (v1 , . . . , vn ). The zero vector is 0 = (0, . . . , 0).
Two important algebraic operations on vectors are addition and scalar multiplication. For n = 2 or n = 3, vector addition is defined geometrically using the parallelogram rule: u + v is the vector obtained by placing the tail of v at the head of u (or
vice versa); see Figure 3. However, the components of u + v are obtained algebraically
simply by adding the components. Similarly, a vector for n = 2 or 3 may be multiplied
by a real number r simply by multiplying component-wise. Generalizing this, we are
able to define vector addition and scalar multiplication in Rn as follows:
Fig.1. (x, y) as a point P and as a vector v in R2
Fig.2. (x, y, z) as a point P and as a vector v in R3
Fig.3. The parallelogram rule for vector addition
Vector Addition. If u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) are vectors in Rn , then
their sum is the vector u + v in Rn with components:
u + v = (u1 + v1 , . . . , un + vn ).
Scalar Multiplication. If u = (u1 , . . . , un ) is a vector in Rn and r is any real number
then ru is the vector in Rn with components:
ru = (ru1 , . . . , run ).
Note that real numbers are often called scalars to distinguish them from vectors.
It is not difficult to verify that vector addition and scalar multiplication in Rn satisfy
the following properties:
Rn as a Vector Space. If u, v, w are vectors in Rn and r, s are real numbers, then:
• u + v = v + u   (commutativity of vector addition)
• u + (v + w) = (u + v) + w   (associativity of vector addition)
• u + 0 = 0 + u   (zero element)
• u + (−u) = −u + u = 0   (additive inverse)
• 1u = u   (multiplicative identity)
• (rs)u = r(su)   (associativity of scalar multiplication)
• r(u + v) = ru + rv   (distributivity over vector addition)
• (r + s)u = ru + su   (distributivity over scalar addition)
As we shall see in the next section, these properties of Rn pertain to a more general
class of objects called “vector spaces.”
There is another important aspect of vectors in R2 and R3 that carries over to Rn ,
and that is the notion of the magnitude of a vector. Recall that the distance to the
origin of P = (x, y) in R^2 or Q = (x, y, z) in R^3 is given by the square root of the sum
of the squares of the coordinates:
dist(O, P) = √(x^2 + y^2)   or   dist(O, Q) = √(x^2 + y^2 + z^2).
But these quantities also represent the respective lengths of the vectors \overrightarrow{OP} and \overrightarrow{OQ}.
Consequently, the following definition is natural:
Definition 4. The magnitude or length of the vector v = (v1 , . . . , vn ) in R^n is
‖v‖ = √(v1^2 + · · · + vn^2).
If ‖v‖ = 1, then v is called a unit vector.
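The text itself does not use software, but readers who want to experiment can reproduce the component-wise operations and the norm with NumPy. The following is only an illustrative sketch; the particular vectors are arbitrary examples.

```python
import numpy as np

u = np.array([1.0, 3.0])               # a vector in R^2
v = np.array([-1.0, 4.0])

print(u + v)                           # component-wise addition: [0. 7.]
print(2 * u)                           # scalar multiplication: [2. 6.]

w = np.array([1.0, -1.0, 0.0, 2.0])    # a vector in R^4
norm_w = np.linalg.norm(w)             # sqrt(1 + 1 + 0 + 4) = sqrt(6)
unit_w = w / norm_w                    # a unit vector in the direction of w
print(norm_w, np.linalg.norm(unit_w))  # sqrt(6) and 1.0
```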
Finally, let us mention the special unit vectors that point in the direction of the
coordinate axes:
i = (1, 0) and j = (0, 1) in R^2,
and
i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) in R^3.
Fig.4. The unit vectors i and j in R^2
In Rn , we use a different notation:
e1 = (1, 0, 0, . . . , 0)
e2 = (0, 1, 0, . . . , 0)
⋮
en = (0, 0, 0, . . . , 1)
The significance of these special unit vectors is that other vectors may be expressed in
terms of them. For example,
in R^2: v = (2, 3) ⇒ v = 2i + 3j,
in R^3: v = (1, −1, 5) ⇒ v = i − j + 5k.
Having a set of vectors that may be used to write other vectors is an important concept
that we shall return to later in this chapter.
Exercises
1. If u = (1, 3) and v = (−1, 4) are vectors in R2 , find the following vectors and
sketch them:
(a) 2u, (b) −v, (c) u + 2v.
2. If u = (2, 0, −1) and v = (−1, 4, 1) are vectors in R3 , find the following vectors
and sketch them:
(a) 2u, (b) −v, (c) u + 2v.
3. If u = (0, 2, 0, −1) and v = (−1, 1, 4, 1) are vectors in R4 , find the following
vectors:
(a) 2u, (b) −v, (c) u + 2v.
4. Find the length of the following vectors:
(a) u = (−1, 1), (b) v = (2, 0, 3), (c) w = (1, −1, 0, 2).
5. If u = (1, 0, −1, 2) and v = (−2, 3, 5, −1), find the length of w = u + v.
6. If v = (1, −1, 2), find a number r so that u = rv is a unit vector.
7. Express v = (2, −3) in terms of i and j.
8. Express v = (2, −3, 5) in terms of i, j, and k.
9. If u = (1, 0, −1) and v = (−2, 4, 6), find w so that 2u + v + 2w = 0.
Fig.5. The unit vectors i, j, and k in R^3
5.2 General Vector Spaces
In this section we give a general definition of a vector space V and discuss several
important examples. We shall also explore some additional properties of vector spaces
that follow from the definition. In the definition, the set of “scalars” is either the real
numbers R or the complex numbers C; if the former then we call V a real vector
space; if the latter then V is a complex vector space.
Definition of a Vector Space
Suppose V is a nonempty set, whose elements we call vectors, on which are defined
both vector addition and multiplication by scalars. We say that V (with its set of
scalars) is a vector space provided the following conditions hold:
Closure under Addition. If u and v are in V , then the sum u + v is in V .
Closure under Scalar Multiplication. If u is in V and r is a scalar, then ru is in V .
Commutativity of Addition. If u and v are in V , then u + v = v + u.
Associativity of Addition. If u, v, and w are in V , then (u + v) + w = u + (v + w).
Existence of Zero Vector. There is a vector 0 in V so that 0 + v = v for all v in
V.
Existence of Additive Inverses. For any v in V there exists a vector w in V such
that v + w = 0. (We usually write w = −v.)
Multiplicative Identity. The scalar 1 satisfies 1v = v for all v in V .
Associativity of Scalar Multiplication. If v is in V and r, s are scalars, then (rs)v =
r(sv).
Distributivity over Vector Addition. If u and v are in V and r is a scalar, then
r(u + v) = ru + rv.
Distributivity over Scalar Addition. If v is in V and r, s are scalars, then (r+s)v =
rv + sv.
This is a long list of conditions, but most of them are pretty intuitive. In fact, the first
two conditions may seem trivial, but they are the most critical conditions: we need to
know that we can add vectors and multiply by scalars and stay within the set V .
Let us now discuss several examples. In each example we need to identify both the
vectors V and the set of scalars, but usually the latter is understood.
Example 1. Let V = Rn . The fact that this is a real vector space was discussed in
the previous section.
2
Example 2. Let V = Cn . Of course, by Cn we mean n-tuples of complex numbers which can be added together component-wise, or multiplied by complex scalars
component-wise, exactly as we did in the previous example. We leave checking the
remaining conditions that V is a complex vector space as an exercise; see Exercise 1.
(Although it is not natural to do so, we could have taken V = Cn with real scalars and
obtain a “real vector space.”)
2
Example 3. Let M m×n (R) be the collection of (m × n)-matrices with real elements.
We claim this is a real vector space. Considering matrices as “vectors” may seem
strange, but we observed in Section 4.1 that we can add two (m × n)-matrices simply
by adding their corresponding elements, and we can multiply an (m × n)-matrix by a
scalar simply by multiplying element-wise. So V is closed under vector addition and
scalar multiplication; we leave checking the remaining conditions that V is a real vector
space as an exercise; see Exercise 2. (We also could have taken V = M m×n (C), i.e.
the collection of (m × n)-matrices with complex elements, to obtain a complex vector
space.)
2
The next two examples involve vector spaces of functions.
Example 4. Let F(I) denote the real-valued functions on an interval I. At first sight,
this seems a lot more complicated than the previous examples since the “vectors” are
now functions! But we know how to add two functions f and g together, simply by
adding their function values at each point:
(f + g)(x) = f (x) + g(x)
for any x in I.
And we can multiply any real-valued function f by a real number r, simply by multiplying the function value by r at each point:
(rf )(x) = rf (x)
for any x in I.
So V is closed under vector addition and scalar multiplication. Most of the other
conditions are rather easily verified, but let us observe that the “zero vector” is just the
function O(x) that is identically zero at each point of I, and the additive inverse of a
function f in V is just the function −f which takes the value −f (x) at each x in I. 2
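As an informal supplement to Example 4, the pointwise operations on F(I) can be mimicked in Python by building new functions from old ones; this sketch is not part of the text and is only meant to make the “functions as vectors” idea concrete.

```python
import math

# Pointwise operations mirroring (f + g)(x) = f(x) + g(x) and (r f)(x) = r f(x).
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(r, f):
    return lambda x: r * f(x)

f, g = math.sin, math.cos
h = add(scale(2.0, f), g)        # the "vector" 2f + g
print(h(0.0))                    # 2*sin(0) + cos(0) = 1.0

zero = lambda x: 0.0             # the zero "vector" O(x)
neg_f = scale(-1.0, f)           # the additive inverse -f
print(add(f, neg_f)(1.2))        # 0.0 at every point
```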
The next example shows that vector spaces of functions are relevant for differential
equations.
Example 5. The real-valued solutions of the second-order linear differential equation
y'' + p(x)y' + q(x)y = 0    (5.1)
form a real vector space V . To confirm this, recall from Chapter 2 that we can write
(5.1) as L y = 0, where L y = y'' + p(x)y' + q(x)y. Moreover, L is a linear operator, so
L(c1 y1 + c2 y2 ) = c1 L(y1 ) + c2 L(y2 ).    (5.2)
Now, to show that V is closed under vector addition, we let y1 and y2 be in V so that
L(y1 ) = 0 = L(y2 ). Then we use (5.2) with c1 = 1 = c2 to conclude
L(y1 + y2 ) = L(y1 ) + L(y2 ) = 0 + 0 = 0,
i.e. y1 + y2 is in V . Similarly, to show that V is closed under scalar multiplication, we
let r be a real number and y be in V , i.e. Ly = 0. Then r y is also in V since we can
use (5.2) with c1 = r, y1 = y, and c2 = 0 to conclude
L(r y) = r L y = r 0 = 0,
i.e. r y is in V . (We should be more precise about the differentiability properties of the
functions p and q, as well as the functions in the vector space V , but this would not
change the simple fact that solutions of (5.1) can be added together or multiplied by
scalars.)
2
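Example 5 can be illustrated symbolically for a particular choice of coefficients. The sketch below takes the special case p = 0, q = 1 (so the equation is y'' + y = 0, chosen only for illustration) and checks that an arbitrary linear combination of two solutions is again a solution.

```python
import sympy as sp

x, c1, c2 = sp.symbols('x c1 c2')
y1, y2 = sp.sin(x), sp.cos(x)          # two known solutions of y'' + y = 0

L = lambda y: sp.diff(y, x, 2) + y     # the operator L y = y'' + y
y = c1*y1 + c2*y2                      # an arbitrary linear combination
print(sp.simplify(L(y)))               # 0, so c1*y1 + c2*y2 solves the equation
```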
Additional Properties of Vector Spaces
Now let us consider some additional properties of vector spaces that follow from the
definition:
Theorem 1. Suppose V is a vector space. Then the following hold:
(a) 0 v = 0 for any vector v in V .
(b) r 0 = 0 for any scalar r.
(c) The zero vector is unique.
(d) For any vector v in V , its additive inverse is unique.
(e) For any vector v in V , its additive inverse is (−1)v.
(f ) If r v = 0 for some scalar r and v in V , then either r = 0 or v = 0.
Most of these properties, like 0 v = 0 or r 0 = 0, seem intuitively obvious because they
are trivial for V = Rn (and other examples). But we have introduced the definition of
a vector space in a general context, so we need to prove the properties in this context;
this means being rather pedantic in the proofs below. The benefit, however, is that once
we have shown that a property is true for a general vector space, we know it is true in
all instances, and we do not need to check it for each example.
Proof. (a) Let v be any vector in V . Multiplication of v by the scalar 0 is in V and
(since 0 = 0 + 0) we know by distributivity that 0v = (0 + 0)v = 0v + 0v. Moreover, the
additive inverse −0v for 0v is in V , so we can add it to both sides and use associativity
of addition to conclude 0 = 0v −0v = (0v +0v)−0v = 0v +(0v −0v) = 0v, as claimed.
(b) Let r be any scalar. The zero vector satisfies 0 = 0 + 0, so by distributivity we have
r0 = r0+r0. Now the additive inverse −r0 for r0 is in V , so we can add it to both sides
and use associativity to conclude 0 = −r0+r0 = −r0+(r0+r0) = (−r0+r0)+r0 = r0,
as claimed.
(c) Suppose there are two zero vectors, 01 and 02 , i.e. v + 01 = v for all v in V and
w + 02 = w for all w in V . Letting v = 02 and w = 01 , we obtain 02 + 01 = 02 and
01 + 02 = 01 . But by commutativity of addition, we know 02 + 01 = 01 + 02 , so we
conclude 01 = 02 , as desired.
(d) Suppose that v has two additive inverses w1 and w2 , i.e. v + w1 = 0 = v + w2 . But
w1 has an additive inverse −w1 which we can add to v + w1 = 0 to conclude v = −w1 .
Substitute this for v in 0 = v + w2 and conclude 0 = −w1 + w2 . Adding w1 to this
yields w1 = w2 , as desired.
(e) For any v in V , we use (a), 0 = 1 + (−1), and distributivity to conclude
0 = 0 v = (1 + (−1))v = v + (−1)v.
Now this shows that (−1)v is an additive inverse for v; by the uniqueness in (d) we
conclude that (−1)v is the additive inverse for v, i.e. −v = (−1)v.
(f) If r ≠ 0, then r^{-1} exists. Let us multiply rv = 0 by r^{-1} and use (b) to obtain
r^{-1}(rv) = r^{-1} 0 = 0. But r^{-1}(rv) = (r^{-1} r)v = v, so v = 0.
2
Exercises
1. Verify that V = Cn is a complex vector space.
2. Verify that V = M m×n (R) is a real vector space.
3. Determine whether the following sets are real vector spaces (and state what fails
if they are not):
(a) The set of all integers (. . . , −1, 0, 1, 2, . . . ), Solution
(b) The set of all rational numbers,
(c) The set of all nonnegative real numbers,
(d) The set of all upper triangular n × n matrices,
(e) The set of all upper triangular square matrices,
(f) The set of all polynomials of degree ≤ 2 (i.e. ax^2 + bx + c, a, b, c reals),
(g) The set of all polynomials of degree 2 (i.e. ax^2 + bx + c, a, b, c reals, a ≠ 0),
(h) The set of all solutions of the differential equation y'' + y = 0,*
(i) The set of all solutions of the differential equation y'' + y = sin x.*
* You need not solve the differential equation.
5.3 Subspaces and Spanning Sets
In this section we consider subsets of vector spaces that themselves form vector spaces.
Definition 1. Suppose that S is a nonempty subset of a vector space V . If S itself
is a vector space under the addition and scalar multiplication that is defined in V ,
then we say that S is a subspace of V .
Since S inherits its algebraic structure from V , the main issue is showing that S is
closed under vector addition and scalar multiplication. Thus we have the following.
Theorem 1. A nonempty subset S of a vector space V is itself a vector space if and
only if it is closed under vector addition and scalar multiplication.
Proof. If S is a vector space, by definition it is closed under vector addition and scalar
multiplication. Conversely, suppose S is closed under these operations; we must prove
that the remaining conditions of a vector space hold. But the various commutativity,
associativity, and distributivity conditions are inherited from V ; the multiplicative identity is also inherited, so we need only show that the zero vector and additive inverses
(which we know exist in V ) actually lie in S. But if we choose any vector v in S,
then 0 v and (−1) v are scalar multiples of v, so lie in S by hypothesis. However, by
Theorem 1 of Section 5.2, 0 v is the zero vector 0 and (−1) v is the additive inverse of
v. We conclude that S is indeed a vector space.
2
Let us discuss several examples. The first one has several variations within it.
Example 1. Let V = Rn and let a = (a1 , . . . , an ) be a nonzero vector. Let S be the
set of vectors v = (x1 , . . . , xn ) in Rn whose dot product with a is zero:
a · v = a1 x1 + · · · + an xn = 0.    (5.3)
We easily check that S is closed under addition: if v = (x1 , . . . , xn ) and w = (y1 , . . . , yn )
are in S, then v + w = (x1 + y1 , . . . , xn + yn ) and
a · (v + w) = a1 (x1 +y1 ) + · · · + an (xn +yn )
= (a1 x1 + · · · +an xn ) + (a1 y1 + · · · +an yn ) = 0 + 0 = 0
shows that v+w is in S. We also easily check that S is closed under scalar multiplication:
if v = (x1 , . . . , xn ) is in S and r is any scalar, then rv = (rx1 , . . . , rxn ) and
a · (rv) = a1 (r x1 ) + · · · + an (r xn ) = r(a1 x1 + · · · + an xn ) = r 0 = 0
shows that rv is in S. By Theorem 1, S is a subspace.
Let us consider the special case a = (0, . . . , 0, 1). Then (5.3) implies that xn = 0; in
other words, S is the set of vectors of the form v = (x1 , . . . , xn−1 , 0) where x1 , . . . , xn−1
are unrestricted. But this means that S can be identified with Rn−1 .
Now suppose that n = 3. For those familiar with multivariable calculus, (5.3) says
that S is the set of vectors perpendicular to the vector a. In other words, S is a two-dimensional plane in V = R^3 that passes through the origin (since S must contain the
zero vector).
2
Fig.1. S = R^{n−1} as a subspace in R^n
The next example shows that the solutions of a homogeneous linear system Ax = 0
form a subspace.
Example 2. Let A be an (m × n)-matrix and let S be the vectors x in Rn satisfying
Ax = 0. We easily verify that S is closed under addition: if x and y are in S, then
A(x + y) = Ax + Ay = 0 + 0 = 0,
so x + y is in S. We also easily verify that S is closed under scalar multiplication: if x is in
S and r is any scalar then
A(r x) = r Ax = r 0 = 0,
so rx is in S. We conclude that S is a subspace of Rn .
2
Example 2 is so important that it leads us to make the following definition.
Definition 2. If A is an (m × n)-matrix, then the solution space for Ax = 0 is a
subspace of Rn called the nullspace of A and is denoted N (A).
The next example concerns functions as vectors, as in Example 4 in Section 5.2. In
fact, let F denote the real-valued functions on (−∞, ∞), which is a real vector space.
Example 3. Let Pn denote the polynomials with real coefficients and degree ≤ n:
p(x) = a0 + a1 x + · · · + an x^n ,  where a0 , . . . , an are real numbers.
Then Pn is a subspace of F since the sum of two polynomials in Pn is also a polynomial
in Pn , and the scalar multiple of a polynomial in Pn is also in Pn .
2
Having given several examples of subspaces of vector spaces, let us consider some
subsets of R2 that fail to be subspaces (and the reason that they fail):
• The union of the x-coordinate axis and the y-coordinate axis (not closed under
vector addition);
• The upper half plane H+ = {(x, y) : y ≥ 0} (not closed under multiplication by
negative scalars);
• The points on the parabola y = x2 (not closed under either vector addition or
scalar multiplication).
Linear Combinations and Spanning Sets
There are another couple of important concepts concerning vector spaces and their
subspaces: linear combinations of vectors and spanning sets. To help explain these
concepts, let us consider a special case.
Suppose that v1 and v2 are two nonzero vectors in R3 . Let S be the vectors in R3
that can be written as v = c1 v1 + c2 v2 for some constants c1 , c2 . We see that S is
closed under addition since
v = c1 v1 + c2 v2 and w = d1 v1 + d2 v2  ⇒  v + w = (c1 + d1 )v1 + (c2 + d2 )v2 ,
and closed under scalar multiplication since
v = c1 v1 + c2 v2 and r is a scalar  ⇒  rv = (r c1 )v1 + (r c2 )v2 .
Consequently, S is a subspace of R3 . Note that S is generally a plane in R3 (see Figure
2); however, if v1 and v2 are co-linear, i.e. v1 = kv2 for some scalar k, then S is the
line in R3 containing v1 and v2 (see Figure 3). In any case, S passes through the origin.
Taking the sum of constant multiples of a given set of vectors as we did above is so
important that we give it a name:
Fig.2. Two nonzero vectors v1 , v2 in R^3
Definition 3. If v1 , . . . , vn are vectors in a vector space V , then a linear combination of these vectors is any vector of the form
v = c1 v1 + · · · + cn vn    (5.4)
where the c1 , . . . , cn are scalars.
The collection of all linear combinations (5.4) is called the linear span of
v1 , . . . , vn and is denoted by span(v1 , . . . , vn ). If every vector v in V can be
written as in (5.4), then we say that V is spanned by v1 , . . . , vn , or v1 , . . . , vn
is a spanning set for V .
Since span(v1 , . . . , vn ) is closed under vector addition and scalar multiplication, we have
the following:
Theorem 2. If v1 , . . . , vn are vectors in a vector space V , then span(v1 , . . . , vn ) is a
subspace of V .
In the particular case of two nonzero vectors v1 and v2 in R3 , we see that span(v1 ,v2 )
is either a line or a plane (depending on whether v1 and v2 are colinear or not). In
fact, if v1 and v2 are colinear, then span(v1 ,v2 ) = span(v1 ) = span(v2 ).
Fig.3. Two colinear vectors v1 , v2 in R^3
Clearly, whether
or not v1 and v2 are colinear makes a big difference to the number of vectors required
in a spanning set; this is related to the more general notions of “linear independence”
and “dimension” that we shall discuss in the next two sections.
For the rest of this section, we begin to address two important questions concerning
spanning sets:
Question 1. Given a set of vectors v1 , . . . , vn , what other vectors lie in their span?
Question 2. Given a subspace S of V , can we find a set of vectors that spans S?
To answer Question 1, we must solve for the constants c1 , . . . , cn in (5.4). But we can
use Gaussian elimination to do exactly this! Let us consider two examples; in the first
example, the vector v in (5.4) is specified.
Example 4. Does v = (3, 3, 4) lie in the span of v1 = (1, −1, 2) and v2 = (2, 1, 3) in
R3 ?
Solution. We want to know whether there exist constants c1 and c2 so that
c_1 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 4 \end{pmatrix}.
But we can write this as a linear system for c1 and c2
c1 + 2c2 = 3
−c1 + c2 = 3
2c1 + 3c2 = 4,
and then solve by applying Gaussian elimination to the augmented coefficient matrix:
\begin{pmatrix} 1 & 2 & 3 \\ -1 & 1 & 3 \\ 2 & 3 & 4 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 3 & 6 \\ 0 & -1 & -2 \end{pmatrix}  (add R1 to R2 and (−2)R1 to R3)
\sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & -1 & -2 \end{pmatrix}  (multiply R2 by 1/3)
\sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}  (add R2 to R3).
From this we see that c2 = 2 and then we back substitute to find c1 : c1 + 2(2) = 3
implies c1 = −1. So v does lie in span(v1 ,v2 ), and in fact
\begin{pmatrix} 3 \\ 3 \\ 4 \end{pmatrix} = -\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + 2 \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}.
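The coefficients found in Example 4 can also be obtained numerically. A minimal sketch using NumPy's least-squares solver (exact here, since v really does lie in the span) is given below; it is only a check of the hand computation.

```python
import numpy as np

v1 = np.array([1.0, -1.0, 2.0])
v2 = np.array([2.0, 1.0, 3.0])
v  = np.array([3.0, 3.0, 4.0])

A = np.column_stack([v1, v2])                  # the 3x2 matrix [v1 v2]
c, residual, rank, _ = np.linalg.lstsq(A, v, rcond=None)
print(c)                                       # approximately [-1.  2.]
print(np.allclose(A @ c, v))                   # True: v = -v1 + 2*v2
```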
Another variant of Question 1 is whether a given set of n (or more) vectors spans
all of Rn ? In this case, the vector v in (5.4) is allowed to be any vector in Rn ; and yet
we can still use Gaussian elimination to obtain our answer.
Example 5. Do the three vectors v1 = (1, −1, 2), v2 = (3, −4, 7), and v3 = (2, −1, 3)
span all of R3 ?
Solution. For any vector v = (x1 , x2 , x3 ) we want to know whether we can find
constants c1 , c2 , and c3 so that
c_1 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} 3 \\ -4 \\ 7 \end{pmatrix} + c_3 \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
Even though we do not know the values x1 , x2 , x3 , we can perform Gaussian elimination
on the augmented coefficient matrix:
\begin{pmatrix} 1 & 3 & 2 & x_1 \\ -1 & -4 & -1 & x_2 \\ 2 & 7 & 3 & x_3 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 2 & x_1 \\ 0 & -1 & 1 & x_2 + x_1 \\ 0 & 1 & -1 & x_3 - 2x_1 \end{pmatrix}
\sim \begin{pmatrix} 1 & 3 & 2 & x_1 \\ 0 & 0 & 0 & x_2 - x_1 + x_3 \\ 0 & 1 & -1 & x_3 - 2x_1 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 2 & x_1 \\ 0 & 1 & -1 & x_3 - 2x_1 \\ 0 & 0 & 0 & x_2 - x_1 + x_3 \end{pmatrix}.
From this row-echelon form, we see that there are vectors v for which c1 , c2 , and c3
cannot be found: just choose a vector v = (x1 , x2 , x3 ) for which x2 − x1 + x3 ≠ 0.
Consequently, the vectors v1 , v2 , v3 do not span all of R3 . (In fact, to answer the
question in this example, we did not even need to introduce x1 , x2 , x3 ; it would have
sufficed to show that a row-echelon form of the matrix A = [v1 v2 v3 ] contains a row
of zeros.)
2
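The conclusion of Example 5 can be checked quickly with a rank computation: a zero row in the row-echelon form of [v1 v2 v3] means the rank is less than 3, so the vectors cannot span R^3. The following SymPy snippet is only a sanity test of that fact.

```python
import sympy as sp

A = sp.Matrix([[ 1,  3, 2],
               [-1, -4, -1],
               [ 2,  7, 3]])     # columns are v1, v2, v3

print(A.rank())                  # 2 < 3, so v1, v2, v3 do not span R^3
print(A.rref()[0])               # the reduced row-echelon form has a row of zeros
```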
Now let us turn to Question 2. An important case is when the subspace S is the
nullspace of a matrix. In this case we can find a spanning set by using Gauss-Jordan
elimination to achieve reduced row-echelon form, and then expressing the solution in
terms of free variables that play the role of the constants in (5.4).
Example 6. Find vectors that span the nullspace of
A = \begin{pmatrix} 1 & -4 & 1 & -4 \\ 1 & 2 & 1 & 8 \\ 1 & 1 & 1 & 6 \end{pmatrix}.
Solution. We want to find a spanning set of vectors for the solutions x of Ax = 0. Let
us use ERO’s to convert A to reduced row-echelon form:
\begin{pmatrix} 1 & -4 & 1 & -4 \\ 1 & 2 & 1 & 8 \\ 1 & 1 & 1 & 6 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & 1 & 6 \\ 1 & 2 & 1 & 8 \\ 1 & -4 & 1 & -4 \end{pmatrix}  (switching R1 and R3)
\sim \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 0 & 2 \\ 0 & -5 & 0 & -10 \end{pmatrix}  (adding (−1)R1 to R2 and to R3)
\sim \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (adding 5R2 to R3)
\sim \begin{pmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (adding (−1)R2 to R1).
From this we see that x3 = s and x4 = t are free variables, and we easily determine x1
and x2 in terms of s and t: x1 = −s − 4t and x2 = −2t. So we can write our solution
in vector form as
x = \begin{pmatrix} -s - 4t \\ -2t \\ s \\ t \end{pmatrix} = \begin{pmatrix} -s \\ 0 \\ s \\ 0 \end{pmatrix} + \begin{pmatrix} -4t \\ -2t \\ 0 \\ t \end{pmatrix} = s \begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -4 \\ -2 \\ 0 \\ 1 \end{pmatrix}.
In other words, we have expressed our solution x as a linear combination of the two
vectors,
v1 = (−1, 0, 1, 0) and v2 = (−4, −2, 0, 1).
Therefore these two vectors span the nullspace of A.
2
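A symbolic nullspace computation reproduces the spanning vectors of Example 6 (up to the labelling of the free variables). The short SymPy check below is offered only as a verification.

```python
import sympy as sp

A = sp.Matrix([[1, -4, 1, -4],
               [1,  2, 1,  8],
               [1,  1, 1,  6]])

for vec in A.nullspace():
    print(vec.T)      # Matrix([[-1, 0, 1, 0]]) and Matrix([[-4, -2, 0, 1]])
```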
Exercises
1. Determine whether the following subsets of R2 are subspaces (and state at least
one condition that fails if not). Sketch the set:
(a) The set of all vectors v = (x, y) such that 2x + 3y = 0, Solution
(b) The set of all vectors v = (x, y) such that x + y = 1,
(c) The set of all vectors v = (x, y) such that x y = 0,
(d) The set of all vectors v = (x, y) such that |x| = |y|.
2. Determine whether the following subsets of R3 are subspaces (and state at least
one condition that fails if not). Sketch the set:
(a) The set of all vectors v = (x, y, z) such that z = 0, Solution
(b) The set of all vectors v = (x, y, z) such that x + y + z = 0,
(c) The set of all vectors v = (x, y, z) such that z = 2y,
(d) The set of all vectors v = (x, y, z) such that x^2 + y^2 + z^2 = 1.
3. Let M 2 (R) denote the (2 × 2)-matrices A with real elements. Determine whether
the following subsets are subspaces (and state at least one condition that fails if
not):
(a) The invertible matrices,
(b) The matrices with determinant equal to 1,
(c) The lower triangular matrices,
(d) The symmetric matrices (AT = A).
4. Determine whether v lies in span(v1 ,v2 ):
(a) v = (5, 6, 7), v1 = (1, 0, −1), v2 = (1, 2, 3), Solution
(b) v = (0, −2, 0), v1 = (1, 0, −1), v2 = (1, 2, 3),
(c) v = (2, 7, −1, 2), v1 = (1, 2, −2, −1), v2 = (0, 3, 3, 4),
(d) v = (1, 2, 3, 4), v1 = (1, 2, −2, −1), v2 = (0, 3, 3, 4).
5. If possible, express w as a linear combination of v1 , v2 , and v3
(a) w = (4, 5, 6), v1 = (2, −1, 4), v2 = (3, 0, 1), v3 = (1, 2, −1), Solution
(b) w = (2, 1, 2), v1 = (1, 2, 3), v2 = (−1, 1, −2), v3 = (1, 5, 4),
(c) w = (1, 0, 0), v1 = (1, 0, 1), v2 = (2, −3, 4), v3 = (3, 5, 2).
6. Find vectors

1
(a) 4
7

1
(d) 2
2
that span the nullspace of the


2 3
3
5 6 Sol’n
(b) −1
8 9
2


1 −4
5 13 14
5 11 13
(e) 2 −1
1 2
7 17 22
following matrices:


1 2i
1
1 −2i
(c) 2
1
i
1

−3 −7
1
7
3
11
−1
−1
0

0 2
2 7
2 5
7. Find vectors that span the solution sets of the homogeneous linear systems:
(a) x1 − x2 + 2x3 = 0
    2x1 + x2 − 2x3 = 0
    x1 − 4x2 + 8x3 = 0
(b) x1 + 3x2 + 8x3 − x4 = 0
    x1 − 3x2 − 10x3 + 5x4 = 0
    x1 + 4x2 + 11x3 − 2x4 = 0.

5.4 Linear Independence
For vectors v1 , . . . , vn in a vector space V , we know that S = span(v1 , . . . , vn ) is a
subspace of V . But it could be that not all the vi are needed to generate S. For
example, if v1 , v2 are nonzero colinear vectors in R3 , then S is the line containing both
v1 and v2 , so is generated by v1 alone (or by v2 alone), i.e. we do not need both v1
and v2 . Two vectors that are scalar multiples of each other are not only called colinear,
but “linearly dependent;” in fact, we encountered this terminology for two functions in
Section 2.2. The generalization of the concept to n vectors is provided by the following:
Definition 1. A finite collection v1 , . . . , vn of vectors in a vector space V is linearly
independent if the only scalars c1 , . . . , cn for which
c1 v1 + c2 v2 + · · · + cn vn = 0    (5.5)
are c1 = c2 = · · · = cn = 0. Thus v1 , . . . , vn are linearly dependent if (5.5)
holds for some scalars c1 , . . . , cn that are not all zero.
Technically, the second sentence in this definition should read “Thus the collection
{v1 , . . . , vn } is linearly dependent if. . . ”, because linear independence is a property of
a collection of vectors rather than the vectors themselves. However, we will frequently
use the more casual wording when the meaning is clear.
The linear independence of v1 , . . . , vn ensures that vectors in their linear span can
be represented uniquely as a linear combination of the v1 , . . . , vn :
Theorem 1. If v1 , . . . , vn are linearly independent vectors in a vector space V and
v is any vector in span(v1 , . . . , vn ), then there are unique scalars c1 , . . . , cn for
which
v = c1 v1 + · · · + cn vn .
Proof. If we have v = c1 v1 + · · · + cn vn and v = d1 v1 + · · · + dn vn , then v − v = 0
implies
(c1 − d1 )v1 + · · · + (cn − dn )vn = 0.
But linear independence then implies c1 − d1 = · · · = cn − dn = 0, i.e. di = ci for all
i = 1, . . . , n.
2
Let us consider some examples of linearly independent vectors in Rn .
Example 1. The standard unit vectors e1 = (1, 0, . . . , 0),. . . ,en = (0, 0, . . . , 1) in Rn
are linearly independent since the vector equation
c1 e1 + · · · + cn en = 0
means (c1 , c2 , . . . , cn ) = (0, 0, . . . , 0), i.e. c1 = c2 = · · · = cn = 0.  □
Example 2. Let v1 = (1, 0, 1), v2 = (2, −3, 4), and v3 = (3, 5, 2) in R^3. Determine
whether these vectors are linearly independent.
Solution. We want to know whether c1 v1 + c2 v2 + c3 v3 = 0 has a nontrivial solution
c1 , c2 , c3 . Now we can write this equation as the homogeneous linear system
c1 + 2c2 + 3c3 = 0
−3c2 + 5c3 = 0
c1 + 4c2 + 2c3 = 0,
and we can determine the solution set by applying Gaussian elimination to the coefficient
matrix:
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & 5 \\ 1 & 4 & 2 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}.
But this tells us c3 = 0 and, by back substitution, c1 = c2 = 0. So the vectors are
linearly independent.
2
In Example 2, we observed that the issue of linear independence for three vectors in
R3 reduces to solving a homogeneous linear system of three equations for the three unknowns, c1 , c2 , c3 . But this observation generalizes to vectors v1 , . . . , vk in Rn . Namely,
let A denote the (n × k)-matrix using v1 , . . . , vk as column vectors, which we write as
A = [ v1 v2 · · · vk ],
and then study the solutions of
A c = 0.
(5.6)
Let us summarize this as follows:
Theorem 2. Let v1 , . . . , vk be vectors in Rn and A = [ v1 · · · vk ]. Then v1 , . . . , vk
are linearly independent if and only if A c = 0 has only the trivial solution c = 0.
Our study in Chapter 4 of solving linear systems such as (5.6) implies the following:
Corollary 1. Let v1 , . . . , vk be vectors in Rn and A = [ v1 · · · vk ].
(a) If k > n then v1 , . . . , vk are linearly dependent.
(b) If k = n then v1 , . . . , vk are linearly dependent if and only if det(A)=0.
Remark 1. If k < n, then we need to use Theorem 2 instead of Corollary 1.
Example 3. Determine whether the given vectors are linearly independent in R4 :
(a) v1 = (1, 0, 1, 2), v2 = (2, 0, 4, 8), v3 = (3, 2, 1, 0), v4 = (4, 0, 0, 0), v5 = (5, 4, 3, 2).
(b) v1 = (1, 0, 1, 2), v2 = (2, 0, 4, 8), v3 = (3, 2, 1, 0), v4 = (4, 0, 0, 0).
(c) v1 = (1, 0, 1, 2), v2 = (2, 0, 4, 8), v3 = (3, 2, 1, 0).
Solution. (a) We have k = 5 vectors in R4 , so Corollary 1 (a) implies that v1 , . . . , v5
are linearly dependent.
(b) We have k = 4 vectors in R4 , so we appeal to Corollary 1 (b). We form the
(4 × 4)-matrix A and compute its determinant:
det(A) = \begin{vmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 0 \\ 1 & 4 & 1 & 0 \\ 2 & 8 & 0 & 0 \end{vmatrix} = -2 \begin{vmatrix} 1 & 2 & 4 \\ 1 & 4 & 0 \\ 2 & 8 & 0 \end{vmatrix} = -8 \begin{vmatrix} 1 & 4 \\ 2 & 8 \end{vmatrix} = -8(8 - 8) = 0.
So det(A)= 0, and Corollary 1 implies that v1 , . . . , v4 are linearly dependent.
(c) We have k = 3 vectors in R^4, so we use Theorem 2 instead of Corollary 1:
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 2 \\ 1 & 4 & 1 \\ 2 & 8 & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 2 & -2 \\ 0 & 4 & -6 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
shows that Ac = 0 has only the trivial solution c = 0. Hence v1 , v2 , v3 are linearly
independent.
2
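The three parts of Example 3 translate directly into rank and determinant checks in the spirit of Theorem 2 and Corollary 1. The sketch below is only a numerical confirmation of the hand computations.

```python
import numpy as np

v1 = [1, 0, 1, 2]; v2 = [2, 0, 4, 8]; v3 = [3, 2, 1, 0]
v4 = [4, 0, 0, 0]; v5 = [5, 4, 3, 2]

# (a) five vectors in R^4: k > n, necessarily dependent
A = np.column_stack([v1, v2, v3, v4, v5])
print(np.linalg.matrix_rank(A) < 5)          # True -> linearly dependent

# (b) four vectors in R^4: dependent if and only if det = 0
B = np.column_stack([v1, v2, v3, v4])
print(np.isclose(np.linalg.det(B), 0.0))     # True -> linearly dependent

# (c) three vectors in R^4: independent if and only if the rank equals 3
C = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(C) == 3)         # True -> linearly independent
```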
While Corollary 1 provides us with nice shortcuts for determining whether or not a
collection of vectors is linearly independent, to find the nonzero constants so that (5.5)
holds, we need to invoke Theorem 2.
Example 3 (revisited). For the collection of vectors in (a) and (b), find a nontrivial
linear combination satisfying (5.5).
Solution. (a) We let A = [ v1 · · · v5 ] and solve (5.6) by Gauss-Jordan elimination:
A = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 0 & 2 & 0 & 4 \\ 1 & 4 & 1 & 0 & 3 \\ 2 & 8 & 0 & 0 & 2 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 0 & 0 & 8 & -3 \\ 0 & 1 & 0 & -2 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We see that c4 = s and c5 = t are free variables, and in terms of these we find c1 =
−8s + 3t, c2 = 2s − t, c3 = −2t. We have an infinite number of solutions, but we can
arbitrarily pick s = 0 and t = 1 to obtain c1 = 3, c2 = −1, c3 = −2, c4 = 0, c5 = 1:
3v1 − v2 − 2v3 + v5 = 0.
(b) We let A = [ v1 · · · v4 ] and solve (5.6) by Gauss-Jordan elimination as in (a):
A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 0 \\ 1 & 4 & 1 & 0 \\ 2 & 8 & 0 & 0 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 0 & 0 & 8 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
We see that c4 = s is a free variable, and c1 = −8s, c2 = 2s, c3 = 0. We arbitrarily pick
s = 1 to write
−8v1 + 2v2 + v4 = 0.
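Both dependencies just found are exactly what a symbolic nullspace computation returns, one vector per free variable. The following sketch recovers them for part (a); it is only a verification.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3, 4, 5],
               [0, 0, 2, 0, 4],
               [1, 4, 1, 0, 3],
               [2, 8, 0, 0, 2]])       # columns are v1, ..., v5

for vec in A.nullspace():              # one basis vector per free variable
    print(vec.T, (A * vec).T)          # one of them is (3, -1, -2, 0, 1);
                                       # A*vec is the zero vector in every case
```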
Linear Independence for Functions and the Wronskian
Now let us consider linear independence for a set of functions. Recall that in Section 2.2
we said two functions are linearly independent if, like two vectors, they are not scalar
multiples of each other. For a larger number of functions, we apply Definition 1 to
functions and conclude:
Definition 2. A finite collection of functions {f1 , f2 , . . . , fn } is linearly independent
on an interval I if the only scalars c1 , c2 , . . . , cn for which
c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x) = 0
for all x in I
are c1 = c2 = · · · = cn = 0.
Also recall from Section 2.2 that the Wronskian of two differentiable functions was
useful in determining their linear independence. For a larger number of functions, we
require more differentiability, so let us recall
C^k(I) = {the functions on the interval I with continuous derivatives up to order k}.
Definition 3. For functions f1 , f2 , . . . , fn in C^{n−1}(I), we define the Wronskian as
W [f1 , f2 , . . . , fn ] = \begin{vmatrix} f_1 & f_2 & \cdots & f_n \\ f_1' & f_2' & \cdots & f_n' \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)} & f_2^{(n-1)} & \cdots & f_n^{(n-1)} \end{vmatrix}.
Let us consider an example.
Example 4. Let f1 (x) = x, f2 (x) = x^2, and f3 (x) = x^3 on I = (−∞, ∞). We compute
W (f1 , f2 , f3 )(x) = \begin{vmatrix} x & x^2 & x^3 \\ 1 & 2x & 3x^2 \\ 0 & 2 & 6x \end{vmatrix} = x \begin{vmatrix} 2x & 3x^2 \\ 2 & 6x \end{vmatrix} - \begin{vmatrix} x^2 & x^3 \\ 2 & 6x \end{vmatrix} = 2x^3.  □
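The 3 × 3 determinant in Example 4 is easy to reproduce symbolically. The sketch below builds the Wronskian matrix by differentiating, exactly as in Definition 3, and then takes its determinant; it is included only as an optional check.

```python
import sympy as sp

x = sp.symbols('x')
fs = [x, x**2, x**3]

# Row k holds the k-th derivatives of the functions (k = 0, 1, 2).
W = sp.Matrix([[sp.diff(f, x, k) for f in fs] for k in range(3)])
print(sp.simplify(W.det()))      # 2*x**3
```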
Let us now show how the Wronskian is useful for establishing the linear independence
of a collection of functions.
Theorem 3. Let f1 , f2 , . . . , fn be functions in C n−1 (I) with W [f1 , f2 , . . . , fn ](x0 ) 6= 0
for at least one point x0 in I. Then {f1 , f2 , . . . , fn } is linearly independent.
Proof. To prove the theorem, we assume
c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x) = 0
for all x in I,
and we want to show c1 = c2 = · · · = cn = 0. But we can differentiate this equation
n − 1 times to obtain
c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x) = 0
c1 f1'(x) + c2 f2'(x) + · · · + cn fn'(x) = 0
⋮
c1 f1^{(n−1)}(x) + c2 f2^{(n−1)}(x) + · · · + cn fn^{(n−1)}(x) = 0,
which we view as a linear system for the unknowns c1 , c2 , . . . , cn . Now this linear
system can have a nontrivial solution (c1 , c2 , . . . , cn ) only if the determinant of its coefficient matrix is zero; however, the determinant of the coefficient matrix is precisely
W [f1 , f2 , . . . , fn ], which we assumed is nonzero at x0 . We conclude that c1 = c2 = · · · =
cn = 0, and hence that {f1 , f2 , . . . , fn } is linearly independent.
2
Example 4 (revisited). Using this theorem and W [x, x^2, x^3] = 2x^3, we see that
{x, x^2, x^3} is linearly independent.
2
It is important to note that, as we saw for n = 2 in Section 2.2, Theorem 3 can not
be used to conclude that a collection of functions is linearly dependent: it does not say
that if W [f1 , f2 , . . . , fn ](x) = 0 for all x in I then {f1 , f2 , . . . , fn } is linearly dependent.
Instead, to show a collection {f1 , f2 , . . . , fn } is linearly dependent on I, we need to
display constants c1 , . . . , cn (not all zero) so that c1 f1 (x) + · · · + cn fn (x) = 0 for all x ∈ I.
Example 5. Let f1 (x) = 1, f2 (x) = x^2, and f3 (x) = 3 + 5x^2. Are these functions
linearly independent?
Solution. If we think they might be linearly independent, we can try computing the
Wronskian:
W [f1 , f2 , f3 ] = \begin{vmatrix} 1 & x^2 & 3 + 5x^2 \\ 0 & 2x & 10x \\ 0 & 2 & 10 \end{vmatrix} = \begin{vmatrix} 2x & 10x \\ 2 & 10 \end{vmatrix} = 0.
Since the Wronskian is identically zero, Theorem 3 does not tell us anything. We now
suspect that the functions are linearly dependent, but to verify this, we must find a
linear combination that vanishes. We observe that f3 = 3f1 + 5f2 , i.e.
3f1 + 5f2 − f3 = 0.
That this linear combination of f1 , f2 , f3 vanishes shows that the functions are linearly
dependent.
2
Exercises
1. Determine whether the given vectors are linearly independent; if linearly dependent, find a linear combination that vanishes.
(a) v1 = (1, 2), v2 = (−1, 0)
(b) v1 = (1, 2), v2 = (−1, 0), v3 = (0, 1)
(c) v1 = (1, −1, 0), v2 = (0, 1, −1), v3 = (1, 1, 1) Solution
(d) v1 = (2, −4, 6), v2 = (−5, 10, −15) Solution
(e) v1 = (2, 1, 0, 0), v2 = (3, 0, 1, 0), v3 = (4, 0, 0, 1)
(f) v1 = (1, 2, 0, 0), v2 = (0, −1, 0, 1), v3 = (0, 0, 1, 1), v4 = (1, 0, −1, 0)
2. Find all values of c for which the vectors are linearly independent.
(a) v1 = (1, c), v2 = (−1, 2) Solution
(b) v1 = (1, c, 0), v2 = (1, 0, 1), v3 = (0, 1, −c)
(c) v1 = (1, 0, 0, c), v2 = (0, c, 0, 1), v3 = (0, 0, 1, 1), v4 = (0, 0, 2, 2)
3. Use the Wronskian to show the given functions are linearly independent on the
given interval I.
(a) f1 (x) = sin x, f2 (x) = cos x, f3 (x) = x, I = (−∞, ∞). Solution
(b) f1 (x) = 1, f2 (x) = x, f3 (x) = x^2, f4 (x) = x^3, I = (−∞, ∞).
(c) f1 (x) = 1, f2 (x) = x^{−1}, f3 (x) = x^{−2}, I = (0, ∞).
4. The Wronskian of the given functions vanishes. Show they are linearly dependent.
(a) f1 (x) = x, f2 (x) = x + x^2, f3 (x) = x − x^2. Solution
(b) f1 (x) = 1 − 2 cos^2 x, f2 (x) = 3 + sin^2 x, f3 (x) = π.
(c) f1 (x) = e^x, f2 (x) = cosh x, f3 (x) = sinh x.
5.5 Bases and Dimension
We are now able to introduce the notion of a basis for a vector space:
Definition 1. A finite set of vectors B = {v1 , v2 , . . . , vn } in a vector space V is a
basis if
(a) the vectors are linearly independent, and
(b) the vectors span V .
The role of a basis for V is to provide a representation for vectors in V : if v is any
vector in V , then we know (since {v1 , v2 , . . . , vn } spans V ) that there are constants
c1 , c2 , . . . , cn so that
v = c1 v1 + c2 v2 + · · · + cn vn ,
(5.7)
and we know (by Theorem 1 in Sec. 5.4) that the constants c1 , c2 , . . . , cn are unique.
The most familiar example of a basis is the standard basis for Rn :
e1 = (1, 0, 0, . . . , 0)
e2 = (0, 1, 0, . . . , 0)
⋮
en = (0, 0, 0, . . . , 1).
That {e1 , e2 , . . . , en } is linearly independent was confirmed in Example 1 in Sec. 5.4.
That {e1 , e2 , . . . , en } spans Rn follows from the fact that any v = (v1 , v2 , . . . , vn ) in Rn
can be realized as a linear combination of e1 , e2 , . . . , en :
v = v1 e1 + v2 e2 + · · · + vn en .
A basis for a vector space is not unique. In Rn , for example, there are many choices
of bases and we can easily determine when a collection of n vectors forms a basis. In
fact, given vectors v1 , . . . , vn in Rn , we form the matrix A = [v1 · · · vn ] with the vj
as column vectors, and use Corollary 1 (b) of Section 5.4 to conclude that v1 , . . . , vn
are linearly independent if and only if det(A) ≠ 0. But the invertibility conditions in
Section 4.4 show that det(A) ≠ 0 is also equivalent to being able to solve Ac = v for
any v in Rn , i.e. to showing that v1 , . . . , vn span Rn . In other words, we have just
proved the following:
Theorem 1. A collection {v1 , . . . , vn } of n vectors in Rn is a basis if and only if
A = [v1 · · · vn ] satisfies det(A) ≠ 0.
Example 1. Show that v1 = (1, 2, 3), v2 = (1, 0, −1), v3 = (0, 1, 0) is a basis for R3 .
Solution. Let A = [v1 v2 v3 ]. We compute
det(A) = \begin{vmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \\ 3 & -1 & 0 \end{vmatrix} = -\begin{vmatrix} 1 & 1 \\ 3 & -1 \end{vmatrix} = -(-1 - 3) = 4 ≠ 0.  □
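Theorem 1 reduces the basis question in R^n to a single determinant, so Example 1 can be confirmed with one line of NumPy; this is offered only as a numerical check.

```python
import numpy as np

A = np.column_stack([[1, 2, 3], [1, 0, -1], [0, 1, 0]])   # columns v1, v2, v3
print(np.linalg.det(A))   # approximately 4 (nonzero), so {v1, v2, v3} is a basis for R^3
```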
Now let us return to a general vector space V . Although a basis for V is not unique,
the number of vectors in each basis is, and that is what we will call the “dimension” of
V . Towards that end we have the following:
Theorem 2. If a vector space V has a basis consisting of n vectors, then any set of
more than n vectors is linearly dependent.
Proof. Let {v1 , v2 , . . . , vn } be a basis for V and let A = {w1 , . . . , wm } where m > n.
Let us write each wi in terms of the basis vectors:
w1 = a11 v1 + · · · + a1n vn
w2 = a21 v1 + · · · + a2n vn
⋮    (5.8)
wm = am1 v1 + · · · + amn vn .
To show that A is linearly dependent, we want to find c1 , . . . , cm , not all zero, so that
c1 w1 + · · · + cm wm = 0.    (5.9)
Replace each wi in (5.9) by its expression in terms of v1 , . . . , vn in (5.8):
c1 (a11 v1 + · · · + a1n vn ) + · · · + cm (am1 v1 + · · · + amn vn ) = 0.
Let us rearrange this equation, collecting the vi terms together:
(c1 a11 + · · · + cm am1 )v1 + · · · + (c1 a1n + · · · + cm amn )vn = 0.
However, since {v1 , v2 , . . . , vn } is linearly independent, the coefficients of this linear
combination must vanish:
c1 a11 + · · · + cm am1 = 0
⋮
c1 a1n + · · · + cm amn = 0.
But this is a homogeneous linear system with m unknowns c1 , . . . , cm and n equations;
since m > n, we know from Chapter 4 that it has an infinite number of solutions; in
particular, it has a nontrivial solution. But this shows that A is linearly dependent. 2
Now suppose we have two bases A = {v1 , . . . , vn } and B = {w1 , . . . , wm } for a
vector space. Since A is a basis and B is linearly independent, Theorem 2 implies
m ≤ n. Interchanging the roles of A and B we conclude n ≤ m. Thus m = n and we
obtain the following:
Corollary 1. Any two bases for a vector space V have the same number of vectors.
Using this corollary, we can make the following definition of “dimension.”
Definition 2. If a vector space V has a basis containing n vectors, then n is the
dimension of V which we write as dim(V ). We also say that V is finite-dimensional.
In particular, since we showed above that e1 , . . . , en is a basis for Rn , we see that
dim(Rn ) = n, as expected.
Remark 1. This definition does not mean that every vector space has finite dimension,
i.e. a basis of finitely many vectors. If there is no such finite basis, then the vector
space is called “infinite dimensional.” Although we have limited exposure to infinite
dimensional vector spaces in this book, an example is discussed in Exercise 2.
If we know that V has dimension n, then, for a collection of n vectors, the conditions
(a) and (b) in Definition 1 are equivalent.
Theorem 3. Suppose dim(V ) = n and B = {v1 , . . . , vn } is a set of n vectors in V .
Then: B is a basis for V ⇔ B is linearly independent ⇔ B spans V .
Proof. It suffices to show (a) B is linearly independent ⇒ B spans V , and (b) B spans
V ⇒ B is linearly independent.
(a) Let {v1 , v2 , . . . , vn } be linearly independent and let v be any vector in V . We
want to show that v lies in span(v1 , . . . , vn ). But by Theorem 2, we know that
{v, v1 , v2 , . . . , vn } is linearly dependent, so we must have
cv + c1 v1 + c2 v2 + · · · + cn vn = 0
for some nontrivial choice of constants c, c1 , . . . , cn . If c = 0, then we would have
c1 v1 + c2 v2 + · · · + cn vn = 0, which by the linear independence of {v1 , v2 , . . . , vn }
means c1 = c2 = · · · = cn = 0, i.e. this is a trivial choice of the constants c, c1 , . . . , cn . So
we must have c ≠ 0, and so we can divide by it and rearrange to have
v = c̃1 v1 + c̃2 v2 + · · · + c̃n vn ,
where c̃i = −ci /c.
But this says that v is in span(v1 , . . . , vn ). Hence, {v1 , v2 , . . . , vn } spans V .
(b) Assume that B = {v1 , v2 , . . . , vn } spans V . If B is not linearly independent, then we
will show that B contains a proper subset B′ that spans V and is linearly independent,
i.e. is a basis for V ; since B′ has fewer than n vectors, this contradicts dim(V ) = n. So
assume B is linearly dependent: c1 v1 + · · · + ck vk = 0 for some constants ci not all zero.
Suppose cj ≠ 0. Then we can express vj as a linear combination of the other vectors:
vj = c′1 v1 + · · · + c′j−1 vj−1 + c′j+1 vj+1 + · · · + c′k vk ,  where c′i = −ci /cj .
Thus vj is in the span of A = {v1 , . . . , vj−1 , vj+1 , . . . , vk }, so A also spans V . If A is
also linearly dependent, then we repeat the procedure above. Since we started with a
finite set of vectors, this process will eventually terminate in a linearly independent set
B′ that spans V and contains fewer than n vectors, contradicting dim(V ) = n. This
contradiction shows that B itself must be linearly independent.
2
In a sense, Theorem 3 can be viewed as a generalization of Theorem 1 to vector
spaces other than R^n where we do not have access to a condition like det(A) ≠ 0. For
example, it applies to the important case of subspaces S of Rn : if dim(S) = k < n
and we are given a collection of k vectors v1 , . . . , vk in S, then each vi is an n-vector
and so the matrix A = [v1 · · · vk ] has n rows and k columns; thus det(A) is not even
defined! However, using Theorem 3 to show that {v1 , . . . , vk } is a basis for S, we need
only show either (a) they are linearly independent or (b) they span S.
Example 3. The equation x + 2y − z = 0 defines a plane, which is a subspace S of
R3 . The vectors v1 = (1, 1, 3) and v2 = (2, 1, 4) clearly lie on the plane. Moreover,
v1 ≠ kv2 , so the vectors are linearly independent. By Theorem 3, {v1 , v2 } forms a
basis for S.
2
Let us continue to discuss subspaces in the context of bases and dimension. If S is
a subspace of a vector space V , then (by Theorem 2) we have
dim(S) ≤ dim(V ).
But a stronger statement is that we can add vectors to a basis for S to obtain a basis
for V :
Theorem 4. A basis {v1 , . . . , vk } for S can be extended to a basis {v1 , . . . , vk , . . . , vn }
for V .
Proof. Let {w1 , . . . , wn } be a basis for V . Is w1 in S? If yes, discard it and let
S 0 = S; if not, let vk+1 = w1 so {v1 , . . . , vk , vk+1 } is a linearly independent set and
let S 0 = span{v1 , . . . , vk , vk+1 }. Now repeat this process with w2 , w3 , etc. After
we exhaust {w1 , . . . , wn }, we will have a linearly independent set {v1 , . . . , vℓ } whose
linear span includes all the {w1 , . . . , wn }, i.e. {v1 , . . . , vℓ } spans V . But this means
{v1 , . . . , vℓ } is a basis for V and we must have ℓ = n.
2
Example 3 (revisited). To extend the basis {v1 , v2 } for S to a basis for R3 , we
follow the proof of Theorem 4: try adding one of e1 , e2 , e3 . Let us try {v1 , v2 , e1 }. We
compute
\begin{vmatrix} 1 & 2 & 1 \\ 1 & 1 & 0 \\ 3 & 4 & 0 \end{vmatrix} = \begin{vmatrix} 1 & 1 \\ 3 & 4 \end{vmatrix} = 1 ≠ 0.
So by Theorem 1, {v1 , v2 , e1 } is a basis for R3 .
2
Now let us consider a different problem: can we find a basis for a subspace S? For
example, suppose we are given vectors w1 , . . . , wm and we let S = span{w1 , . . . , wm }.
We don’t want to remove the wj one by one until we get a linearly independent set!
There is a neat answer to this problem, but we must put it off until the next section.
On the other hand, there is another important class of subspaces, namely the nullspace
of an (m × n)-matrix (or, equivalently, the solution space for a homogeneous linear
system), for which we know how to find a spanning set that in fact is a basis.
Basis for a Solution Space
Recall that in Section 5.3 we constructed a spanning set for the nullspace of an (m × n)-matrix A, i.e. the solution space of
Ax = 0.    (5.10)
To summarize, we used Gauss-Jordan elimination to put the matrix into reduced row-echelon form. In order for there to be a nontrivial solution, there must be free variables. In that
case, we introduced free parameters, and expressed the leading variables in terms of
them. Putting the resultant solutions x into vector form, the spanning set of vectors
appears with the free parameters as coefficients. We now show that this spanning set of
vectors is also linearly independent, so forms a basis for the solution space.
Suppose that there are k free variables and ℓ = n − k leading variables in the
reduced row-echelon form of A. For simplicity of notation, let us suppose the leading
variables are the first ℓ variables x1 , . . . , xℓ and the free variables are the last k variables
xℓ+1 , . . . , xn . We introduce k free parameters t1 , . . . , tk for the free variables: xℓ+j = tj
for j = 1, . . . , k. After we express the leading variables in terms of the free parameters,
the solution in vector form looks like
x = \begin{pmatrix} b_{11} t_1 + \cdots + b_{1k} t_k \\ \vdots \\ b_{\ell 1} t_1 + \cdots + b_{\ell k} t_k \\ t_1 \\ t_2 \\ \vdots \\ t_k \end{pmatrix} = t_1 \begin{pmatrix} b_{11} \\ \vdots \\ b_{\ell 1} \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + t_2 \begin{pmatrix} b_{12} \\ \vdots \\ b_{\ell 2} \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + t_k \begin{pmatrix} b_{1k} \\ \vdots \\ b_{\ell k} \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.
We see that the k vectors
v_1 = \begin{pmatrix} b_{11} \\ \vdots \\ b_{\ell 1} \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} b_{12} \\ \vdots \\ b_{\ell 2} \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad v_k = \begin{pmatrix} b_{1k} \\ \vdots \\ b_{\ell k} \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}    (5.11)
span the solution space. A linear combination c1 v1 + · · · + ck vk = 0 takes the form
c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \begin{pmatrix} * \\ \vdots \\ * \\ c_1 \\ c_2 \\ \vdots \\ c_k \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},
where ∗ indicates terms that we don’t care about; we see that c1 = c2 = · · · = ck =
0. Consequently, v1 , . . . , vk are linearly independent, and hence form a basis for the
solution space of (5.10).
Let us discuss an example.
Example 4. Find the dimension and a basis for the solution space of the linear system
x1 + 3x2 + 4x3 + 5x4 = 0
2x1 + 6x2 + 9x3 + 5x4 = 0.
Solution. We write the coefficient matrix and use Gauss-Jordan elimination to put it
in reduced row-echelon form:
\begin{pmatrix} 1 & 3 & 4 & 5 \\ 2 & 6 & 9 & 5 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 0 & 25 \\ 0 & 0 & 1 & -5 \end{pmatrix}.
We see that x2 = s and x4 = t are free variables, and the leading variables are x1 =
−3s − 25t and x3 = 5t. In vector form we have
x = \begin{pmatrix} -3s - 25t \\ s \\ 5t \\ t \end{pmatrix} = s \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} -25 \\ 0 \\ 5 \\ 1 \end{pmatrix}.
We find that the vectors v1 = (−3, 1, 0, 0) and v2 = (−25, 0, 5, 1) span the solution
space. Notice that, where one vector has a 1 the other vector has a 0, so v1 , v2 are
linearly independent and form a basis for the solution space. In particular, the dimension
of the solution space is 2.
2
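The same answer drops out of a symbolic nullspace computation, which also reports the dimension as the number of basis vectors returned; the following is merely a check of Example 4.

```python
import sympy as sp

A = sp.Matrix([[1, 3, 4, 5],
               [2, 6, 9, 5]])

basis = A.nullspace()
print(len(basis))          # 2, the dimension of the solution space
for vec in basis:
    print(vec.T)           # (-3, 1, 0, 0) and (-25, 0, 5, 1)
```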
Exercises
1. Determine whether the given set of vectors is a basis for Rn .
(a) v1 = (1, 2), v2 = (3, 4);
(b) v1 = (1, 2, 3), v2 = (2, 3, 4), v3 = (1, 0, −1); Solution
(c) v1 = (1, 0, −1), v2 = (1, 2, 3), v3 = (0, 1, 0);
(d) v1 = (1, 2, 3, 4), v2 = (1, 0, −1, 0), v3 = (0, 1, 0, 1);
(e) v1 = (1, 0, 0, 0), v2 = (1, 2, 0, 0), v3 = (0, −1, 0, 1), v4 = (1, 2, 3, 4).
2. Show that the collection P of all real polynomials on (−∞, ∞) is an “infinite-dimensional” vector space.
3. Find the dimension and a basis for the solution space.
(a) x1 − x2 + 3x3 = 0
    2x1 − 3x2 − x3 = 0   Sol’n
(b) 3x1 + x2 + 6x3 + x4 = 0
    2x1 + x2 + 5x3 − 2x4 = 0
(c) x1 − 3x2 − 10x3 + 5x4 = 0
    x1 + 4x2 + 11x3 − 2x4 = 0
    x1 + 3x2 + 8x3 − x4 = 0
(d) x1 + 3x2 − 4x3 − 8x4 + 6x5 = 0
    x1 + 2x3 + x4 + 3x5 = 0
    2x1 + 7x2 − 10x3 − 19x4 + 13x5 = 0
4. Find the dimension and a basis for the nullspace of the given matrix A.
(a) \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} Sol’n ;  (b) \begin{pmatrix} -2 & 4 \\ 3 & -6 \end{pmatrix} ;  (c) \begin{pmatrix} 1 & 2 & 3 \\ -1 & 0 & 1 \\ 1 & 6 & 11 \end{pmatrix} ;  (d) \begin{pmatrix} 1 & -1 & 2 & 3 \\ 2 & -1 & 3 & 4 \\ 1 & 0 & 1 & 1 \\ 3 & -1 & 4 & 5 \end{pmatrix}.

5.6 Row and Column Spaces
Consider an (m × n)-matrix A. We may think of each row of A as an n-vector and
consider their linear span: this forms a vector space called the row space of A and is
denoted Row(A). Since elementary row operations on A involve no more than linear
combinations of the row vectors and are reversible, this has no effect on their linear
span. We have proved the following:
Theorem 1. If two matrices A and B are row-equivalent, then Row(A) = Row(B).
Since A has m rows, the dimension of Row(A) is at most m, but it could be less. In
particular, if E is a row-echelon form for A, then the dimension of Row(A) is the same
as the number of nonzero rows in E, which we identified in Section 4.3 as rank(A). Thus
we have the following:
Corollary 1. dim(Row(A)) = rank(A).
In fact, we can use the same reasoning to find a basis for Row(A):
Algorithm 1: Finding a Basis for Row(A)
Use ERO’s to put A into row-echelon form E. Then the nonzero rows of E form
a basis for Row(A).
Example 1. Find a basis for the row space of
A = \begin{pmatrix} 1 & 2 & 1 & 0 & 2 \\ 2 & 3 & 3 & -2 & 1 \\ 3 & 4 & 5 & -3 & -1 \\ 1 & 3 & 0 & 2 & 5 \end{pmatrix}.
Solution. We use ERO’s to find the reduced row-echelon form
E = \begin{pmatrix} 1 & 0 & 3 & 0 & -8 \\ 0 & 1 & -1 & 0 & 5 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We conclude that rank(A)=3 and a basis for the row space is v1 = (1, 0, 3, 0, −8), v2 =
(0, 1, −1, 0, 5), v3 = (0, 0, 0, 1, −1). Of course, we did not need to find the reduced row-echelon form; if we had used another echelon form, we would obtain a different (but
equivalent) basis for Row(A).
2
Having used the row vectors of A to form a vector space, we can do the same with
the column vectors: the linear span of the n column vectors of A is a vector space
called the column space and denoted Col(A). We are interested in finding a basis
for Col(A), but obtaining it is more subtle than for Row(A) since elementary row
operations need not preserve Col(A). However, if we let E denote a row-echelon form
for A, then we shall see that it can be used to select a basis for Col(A). In fact, let us
denote the column vectors of A by c1 , . . . , cn and the column vectors of E by d1 , . . . , dn ,
i.e.
A = [ c1 · · · cn ]   and   E = [ d1 · · · dn ].
Because A and E are row-equivalent, they have the same solution sets, so
Ax = 0
⇔
E x = 0.
But if x = (x1 , . . . , xn ), then we can write this equivalence using column vectors as
x 1 c1 + · · · + x n cn = 0
⇔
x1 d1 + · · · + xn dn = 0.
This shows that linear dependence amongst the vectors d1 , . . . , dn is exactly mirrored
by linear dependence amongst the vectors c1 , . . . , cn . However, we know a subset of the
column vectors of E that is linearly independent: they are the pivot columns, i.e. the
columns that contain the leading 1’s. In fact, if E is in reduced row-echelon form (as
we had in Example 1), then these pivot columns are the vectors
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , er = (0, . . . , 0, 1, . . . , 0) where r = rank(E),
which are clearly independent. But in any row-echelon form, the pivot columns still
take the form
(1, 0, . . . , 0), (∗, 1, 0, . . . , 0), . . . , (∗, ∗, . . . , ∗, 1, . . . , 0)
where ∗ denotes some number, so their linear independence remains clear. Moreover,
the pivot columns span Col(E). We conclude that exactly the same column vectors of
A are linearly independent and span Col(A). We can summarize this as follows:
Algorithm 2: Finding a Basis for Col(A)
Use ERO’s to put A into row-echelon form E. Then a basis for Col(A) is obtained by selecting the column vectors of A that correspond to the columns of E
containing the leading 1’s.
Since the number of leading ones is the same as the number of nonzero rows, we immediately have the following consequence:
Theorem 2. For any matrix A we have dim(Row(A)) = dim(Col(A)).
We sometimes abbreviate the above theorem with the statement: for any matrix, the
row and column ranks are equal. We already acknowledged that the term rank(A)
that we introduced in Section 4.3 coincides with the row rank of A, but we now see
that it also coincides with the column rank of A, further justifying its simple moniker
“rank(A).”
Example 2. Find a basis for the column space of the matrix in Example 1.
Solution. We see that the column vectors in E containing leading 1’s are the first,
second, and fourth columns; notice that these three vectors are indeed linearly independent and span Col(E). We select the corresponding columns of A as a basis for
Col(A): v1 = (1, 2, 3, 1), v2 = (2, 3, 4, 3), v3 = (0, −2, −3, 2). As expected, we have
dim(Col(A)) = 3 = dim(Row(A)).
2
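Algorithms 1 and 2 and the rank-nullity identity can all be read off from one reduced row-echelon computation. The sketch below redoes Examples 1 and 2 with SymPy, purely as a check.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1,  0,  2],
               [2, 3, 3, -2,  1],
               [3, 4, 5, -3, -1],
               [1, 3, 0,  2,  5]])

E, pivots = A.rref()
print(pivots)                                        # (0, 1, 3): columns 1, 2, 4 of A

row_basis = [E.row(i) for i in range(len(pivots))]   # nonzero rows of E (Algorithm 1)
col_basis = [A.col(j) for j in pivots]               # matching columns of A (Algorithm 2)
print(A.rank() + len(A.nullspace()))                 # 3 + 2 = 5 = n (rank-nullity)
```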
We want to mention one more result that is similar to the dimension results that we
have obtained in this section. It concerns the dimension of the nullspace of a matrix.
Definition 1. For any matrix A, the dimension of its nullspace N (A) is called the
nullity of A and is denoted by null(A).
Notice that null(A) is just the number of free variables in E = rref(A), and the number
of free variables is the total number of variables minus the number of leading 1’s in E,
which is just rank(A). In other words, we obtain the following:
The Rank-Nullity Identity
For an (m × n)-matrix A we have:
rank(A) + null(A) = n.
Finding a Basis for a Linear Span
Given a collection of (possibly) linearly dependent vectors v1 , . . . , vk in Rn that
span a subspace W , we can use either Algorithm 1 or Algorithm 2 to find a basis for
W : we can let v1 , . . . , vk denote the rows of a (k × n)-matrix and apply Algorithm 1,
or we can let v1 , . . . , vk denote the columns of an (n × k)-matrix and use Algorithm 2.
However, there is a difference: only Algorithm 2 will select a basis from the collection
v1 , . . . , vk .
Example 3. Let S be the linear span of the vectors v1 = (1, 3, 0, 1), v2 = (0, 1, 1, −1),
v3 = (−1, −1, 2, −3), v4 = (3, 7, −1, 4) in R4 . Select a basis for S from these vectors.
Solution. It is not immediately clear whether S is a two or three dimensional subspace,
or possibly all of R^4. However, if we use the vectors as column vectors in the 4×4-matrix
A = \begin{pmatrix} 1 & 0 & -1 & 3 \\ 3 & 1 & -1 & 7 \\ 0 & 1 & 2 & -1 \\ 1 & -1 & -3 & 4 \end{pmatrix},
then S is just the column space of A, and Algorithm 2 will achieve exactly what we
want. In fact, if we put A in reduced row-echelon form, we obtain
E = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
The pivot columns of E are the first, second, and fourth columns, so we conclude that
S is spanned by v1 , v2 , and v4 .
2
Exercises
1. For the given matrix A, find a basis for Row(A) and for Col(A).
(a) \begin{pmatrix} 3 & -9 \\ -1 & 3 \end{pmatrix},  (b) \begin{pmatrix} 3 & 2 & -1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix},  (c) \begin{pmatrix} 3 & 1 & 0 & 2 \\ 2 & 1 & 1 & 1 \\ 1 & 0 & -1 & 1 \end{pmatrix}
2. Let W denote the linear span of the given set of vectors. Select from the vectors
a basis for W .
(a) v1 = (0, 3, 2), v2 = (1, 2, 1), v3 = (−1, 1, 1)
(b) v1 = (2, 3, 0, 1), v2 = (1, 1, 1, 0), v3 = (3, 5, −1, 2), v4 = (−1, 0, 1, 0)
(c) v1 = (1, −1, 1, −1), v2 = (2, −1, 3, 1), v3 = (3, −4, 2, −6), v4 = (4, −2, 6, 2)
5.7 Inner Products and Orthogonality
In Section 4.1, we defined the dot product of vectors v and w in R^n: v · w = v1 w1 +
· · · + vn wn . In particular, we have v · v = v1^2 + · · · + vn^2, so the magnitude or length of v can
be expressed using the dot product as
‖v‖ = √(v · v).    (5.12)
In this section we will also call ‖v‖ the norm of the vector v. Recall from analytic
geometry in R3 that two vectors v and w are orthogonal if v · w = 0. We can use the
same definition in Rn , namely v and w are orthogonal if
v · w = 0.
(5.13)
The dot product in R^n is an example of an “inner product”: if we write
⟨v, w⟩ = v · w   for vectors v and w in R^n,    (5.14)
then we see that the following properties hold:
• hv, vi ≥ 0, and hv, vi = 0 if and only if v = 0;
• hv, wi = hw, vi for any vectors v and w;
• hλv, wi = λhv, wi for any scalar λ and vectors v and w;
• hu + v, wi = hu, wi + hv, wi for any vectors u, v, and w;
• kvk = √hv, vi is the norm of v.
What about an inner product for vectors in Cn ? We no longer want to use hv, wi = v · w, since the norm/magnitude of a complex number is |z| = √(z z̄) and not √(z²). Instead, let us define
hv, wi = v · w̄ = v1 w̄1 + v2 w̄2 + · · · + vn w̄n   for vectors v and w in Cn .   (5.15)
With this definition, we see that we need to modify one of the properties listed above, since we no longer have hv, wi = hw, vi, but rather hv, wi = \overline{hw, vi}. Of course, for real vectors v and w, we have v · w̄ = v · w, so the definitions (5.14) and (5.15) coincide, and there is no harm in using (5.15) for Rn as well.
Let us now give the general definition of an inner product on a vector space; the
complex conjugate signs may be removed in case V is a real vector space.
Definition 1. An inner product on a vector space V associates to each pair of vectors
v and w a scalar hv, wi satisfying the following properties:
1. hv, vi ≥ 0 for any vector v, and hv, vi = 0 if and only if v = 0;
2. hv, wi = \overline{hw, vi} for any vectors v and w;
3. hλv, wi = λhv, wi for any scalar λ and vectors v and w;
4. hu + v, wi = hu, wi + hv, wi for any vectors u, v, and w.
A vector space with an inner product is called an inner product space. The norm of
a vector v in an inner product space is defined by
kvk = √hv, vi.   (5.16)
As a consequence of Properties 2 and 3 we have (see Exercise 2):
hv, λwi = λ̄ hv, wi   for any scalar λ and vectors v and w.   (5.17)
It is important to realize that subspaces of inner product spaces inherit the inner
product.
Example 1. Let V be a subspace of Rn . Then V is an inner product space under the
dot product. (The analogous statement about subspaces of Cn is also true.)
2
In Rn we know that the norm of a vector v corresponds to its length, so a vector with
norm 1 is a unit vector. Of particular interest in Rn are collections of vectors such as
{e1 , e2 , . . . , en } which form a basis. Notice that the collection {e1 , e2 , . . . , en } has two
additional properties: i) each element is a unit vector, and ii) any two distinct elements
are orthogonal. We want to consider these properties in any inner product space.
Definition 2. Let V be an inner product space.
• A vector v is a unit vector if kvk = 1;
• Two vectors v and w are orthogonal (and we write v ⊥ w) if hv, wi = 0;
• A collection of vectors {v1 , . . . , vn } is an orthogonal set of vectors if vi ⊥ vj
for all i 6= j;
• An orthogonal set of unit vectors {v1 , . . . , vn } is called an orthonormal set.
Example 2. The zero vector 0 is orthogonal to every vector v in V . This is a simple
consequence of the linearity of the inner product:
h0, vi = hv − v, vi = hv, vi − hv, vi = 0.
2
Example 3. In R3 , the vectors v1 = (1, 0, −1) and v2 = (1, 2, 1) are orthogonal since
hv1 , v2 i = (1, 0, −1) · (1, 2, 1) = 1 − 1 = 0.
Consequently, {v1 , v2 } is an orthogonal set. But v1 and v2 are not unit vectors since kv1 k = √2 and kv2 k = √6. However, if we let
u1 = (1/√2)(1, 0, −1)   and   u2 = (1/√6)(1, 2, 1),
then {u1 , u2 } is an orthonormal set. 2
If {v1 , . . . , vn } is an orthogonal set that forms a basis for V , then we say that
{v1 , . . . , vn } is an orthogonal basis; similarly, an orthonormal basis is an orthonormal set that is also a basis.
Example 4. We know that i = (1, 0) and j = (0, 1) form a basis for R2 . Since i and j
are orthogonal unit vectors, they form an orthonormal basis for R2 . But there are many
other orthonormal bases for R2 . For example, v1 = (1, 1) and v2 = (1, −1) are linearly
independent (since the determinant of the matrix [v1 v2 ] is not zero), so {v1 , v2 } is a
basis for R2 . They are also orthogonal since v1 · v2 = 0, so {v1 , v2 } is an orthogonal
basis. Note v1 and v2 are not unit vectors, but we can normalize them by defining
u1 = (1/√2)(1, 1)   and   u2 = (1/√2)(1, −1).
Now we see that {u1 , u2 } is an orthonormal basis for R2 .
2
We are beginning to suspect that orthogonality is somehow related to linear independence. The following result confirms our suspicions.
Theorem 1. If {v1 , . . . , vn } is an orthogonal set of nonzero vectors in an inner product
space V , then the vectors v1 , . . . , vn are linearly independent.
Proof. Suppose {v1 , . . . , vn } is an orthogonal set of nonzero vectors, and we have
scalars c1 , . . . , cn so that
c1 v1 + · · · + cn vn = 0.
Now let us take the inner product of both sides with v1 :
hv1 , c1 v1 + · · · + cn vn i = hv1 , 0i.
Using linearity and the fact that hv1 , vj i = 0 for all j 6= 1, the left-hand side is just
c1 hv1 , v1 i = c1 kv1 k2 . Since the right hand side is just 0, we have
c1 kv1 k2 = 0.
But since v1 is nonzero, we know kv1 k 6= 0, so we must have c1 = 0. We can repeat
this for j = 1, . . . , n and conclude c1 = c2 = · · · = cn = 0. This means that v1 , . . . , vn
are linearly independent.
2
Corollary 1. In an n-dimensional inner product space V , an orthogonal set of n nonzero vectors forms a basis for V .
Orthogonal Projections and the Gram-Schmidt Procedure
Does a finite-dimensional inner product space always have an orthonormal basis?
The answer is not only “yes”, but there is a procedure for finding it. Let h, i be an
inner product on a vector space V and suppose {w1 , . . . , wn } is a basis for V . We shall
show how to use {w1 , . . . , wn } to define an orthonormal basis {u1 , . . . , un } for V . The
procedure involves “orthogonal projection,” which we now discuss.
Let u be a unit vector in V . For any vector v in V , we define the projection of v
onto u to be the vector
proju (v) = hv, uiu.
(5.18)
Notice that proju (v) is a scalar multiple of u, so it points in the same (or directly
opposite) direction as u. Moreover, the vector
orthu (v) = v − proju (v),
(5.19)
is orthogonal to u since
hv − proju (v), ui = hv, ui − hhv, uiu, ui = hv, ui − hv, uikuk2 = 0.
(Notice that we need u to be a unit vector in the last equality.) So we can write v as
the sum of two vectors that are orthogonal to each other:
v = proju (v) + orthu (v).
(5.20)
Fig.1. Projection of v
onto the unit vector u
Orthogonal projection onto a unit vector can be generalized to orthogonal projection
onto the span U of a finite set {u1 , u2 , . . . , un } of orthonormal vectors. Namely, we
define the projection of v onto U = span({u1 , u2 , . . . , un }) to be the vector
projU (v) = hv, u1 iu1 + hv, u2 iu2 + · · · + hv, un iun ,
(5.21)
which is a linear combination of {u1 , u2 , . . . , un }, and so lies in U . Analogous to (5.19),
we can define
orthU (v) = v − projU (v),
(5.22)
which is orthogonal to each of the basis vectors {u1 , u2 , . . . , un } since
Fig. 1. Projection of v onto the subspace U
hv − projU (v), ui i = hv, ui i − hhv, u1 iu1 + · · · + hv, un iun , ui i
= hv, ui i − hv, u1 ihu1 , ui i − · · · − hv, un ihun , ui i
= hv, ui i − hv, ui i = 0.
Consequently, we can write v as the sum of two vectors
v = projU (v) + orthU (v),   (5.23)
where projU (v) lies in U and orthU (v) is orthogonal to U .
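For readers who want to experiment, the projection formula (5.21) is easy to code. The sketch below assumes NumPy is available and uses made-up vectors in R3 purely for illustration.

```python
import numpy as np

def proj_onto(v, orthonormal_set):
    """Projection of v onto span(u1, ..., uk), where the u_i are assumed
    to be an orthonormal set of real vectors, as in (5.21)."""
    U = np.column_stack(orthonormal_set)
    return U @ (U.T @ v)          # sum of <v, u_i> u_i

# Hypothetical example: U spanned by e1 and e2 in R^3.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
v  = np.array([3.0, -2.0, 5.0])

p = proj_onto(v, [u1, u2])        # (3, -2, 0), lies in U
orth = v - p                      # (0, 0, 5), orthogonal to u1 and u2
print(p, orth, np.dot(orth, u1), np.dot(orth, u2))
```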
The Gram-Schmidt Procedure.
Suppose V is an n-dimensional inner product space with basis {v1 , . . . , vn }. First let
u1 = v1 /kv1 k   and   V1 = span{u1 }.
Next we let
w2 = v2 − projV1 (v2 ) = v2 − proju1 (v2 )
so that w2 is orthogonal to u1 , and then we let
u2 = w2 /kw2 k   and   V2 = span{u1 , u2 }.
We can continue this process iteratively: having defined {u1 , . . . , uk } and Vk = span(u1 , . . . , uk ), we define wk+1 by
wk+1 = vk+1 − projVk (vk+1 ),
so that wk+1 is orthogonal to {u1 , . . . , uk }, and then we normalize to obtain uk+1 .
We stop when we have exhausted the set {v1 , . . . , vn }.
We have established the following.
Theorem 2. Let {v1 , . . . , vn } be a basis for an inner product space V . Then V has an
orthonormal basis {u1 , . . . , un } generated by the Gram-Schmidt procedure.
Example 5. Find an orthonormal basis for the plane in R3 spanned by the vectors
v1 = (1, −1, 0) and v2 = (0, 1, −1).
Solution. We follow the Gram-Schmidt procedure. Let
u1 = v1 /kv1 k = (1/√2)(1, −1, 0).
Next we calculate
proju1 (v2 ) = hv2 , u1 i u1 = (−1/√2) · (1/√2)(1, −1, 0) = (1/2)(−1, 1, 0),
and let
w2 = v2 − proju1 (v2 ) = (0, 1, −1) − (1/2)(−1, 1, 0) = (1/2)(1, 1, −2).
Finally, we normalize w2 to obtain u2 :
kw2 k = √6/2   ⇒   u2 = (1/√6)(1, 1, −2).
Thus {u1 , u2 } is an orthonormal basis for the plane spanned by {v1 , v2 }.
2
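The boxed procedure translates directly into a few lines of code. The following is a minimal sketch, assuming NumPy is available and that the input vectors are linearly independent; it is checked here on the vectors of Example 5.

```python
import numpy as np

def gram_schmidt(vectors):
    """Gram-Schmidt procedure for linearly independent vectors in R^n,
    a direct transcription of the boxed procedure using the dot product."""
    ortho = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in ortho)   # v - proj_{V_k}(v)
        ortho.append(w / np.linalg.norm(w))            # normalize
    return ortho

v1 = np.array([1.0, -1.0, 0.0])
v2 = np.array([0.0,  1.0, -1.0])
u1, u2 = gram_schmidt([v1, v2])
print(u1)   # approximately (1, -1, 0)/sqrt(2)
print(u2)   # approximately (1, 1, -2)/sqrt(6)
```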
Exercises
1. Let V = C([0, 1]) be the real-valued continuous functions on [0, 1], and let hf, gi = ∫_0^1 f (x) g(x) dx.
(a) Show that V is a real vector space.
(b) Show that h , i is an inner product on V .
(c) Show that f (x) = sin πx and g(x) = cos πx are orthogonal.
2. Prove (5.17).
3. The following are special cases of the orthogonal projections (5.19) and (5.21).
(a) If v is a scalar multiple of a unit vector u, show that orthu (v) = 0.
(b) If v is in U = span(u1 , . . . , un ), where {u1 , . . . , un } is an orthonormal set,
show that orthU (v) = 0.
4. (a) Find all values of c so that v = (2, c) and w = (3, 1) are orthogonal.
(b) Find all values of c so that v = (−1, 2, c) and w = (3, 4, 5) are orthogonal.
(c) Find all values of c so that v = (1, 2, c) and w = (1, −2, c) are orthogonal.
5. Use the Gram-Schmidt procedure to find an orthonormal basis for the vector space
spanned by the given vectors
(a) v1 = (6, 3, 2), v2 = (2, −6, 3)
(b) v1 = (1, −1, −1), v2 = (2, 1, −1)
(c) v1 = (1, 0, 1, 0), v2 = (0, 1, 1, 0), v3 = (0, 1, 0, 1)
5.8  Additional Exercises
1. Let V be the collection of all infinite series Σ_{k=1}^∞ ak of real numbers that converge absolutely, i.e. Σ_{k=1}^∞ |ak | < ∞. Is V a real vector space?
2. Let V be the collection of all infinite sequences {zk }_{k=1}^∞ of complex numbers that converge to zero, i.e. lim_{k→∞} zk = 0. Is V a complex vector space?
3. Let F[0, ∞) denote the vector space of real-valued functions on [0, ∞) and EO[0, ∞)
denote those functions that are of exponential order, i.e. |f (t)| ≤ M ect for some
positive constants M and c. Is EO a subspace of F?
4. Let F denote the vector space of real-valued functions on (−∞, ∞) and P denote
those functions that are periodic on (−∞, ∞). Is P a subspace of F?
5. Let S1 and S2 be subspaces of a vector space V . Show that the intersection S1 ∩S2
is also a subspace of V .
6. Let S1 and S2 be subspaces of a vector space V and let S1 + S2 denote the vectors
in V of the form v = v1 + v2 , where vi is in Si . Show that S1 + S2 is a subspace
of V .
7. Let V = M 2×2 (R) be the vector space of all real (2×2)-matrices. Are the following
four matrices linearly independent?
A1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},  A2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},  A3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},  A4 = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}.
8. Let V = M 2×2 (C) be the vector space of all complex (2 × 2)-matrices. Are the
following three matrices linearly independent?
A1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},  A2 = \begin{bmatrix} 0 & i \\ 0 & 0 \end{bmatrix},  A3 = \begin{bmatrix} 0 & 0 \\ 1+i & 0 \end{bmatrix}.
9. Let Sym3×3 (R) denote all real symmetric (3 × 3)-matrices.
(a) Show that Sym3×3 (R) is a subspace of M 3×3 (R).
(b) Find a basis for Sym3×3 (R).
(c) What is the dimension of Sym3×3 (R)?
10. Let Tri+^{3×3}(R) denote the vector space of all upper triangular real (3×3)-matrices. Find a basis and the dimension for Tri+^{3×3}(R).
11. Let f1 (x) = x and f2 (x) = |x| for −∞ < x < ∞.
(a) Show that f2 is not in C 1 (−∞, ∞).
(b) Show that {f1 , f2 } is linearly independent on (−∞, ∞).
12. Let f1 (x) = x, f2 (x) = |x|, and
f3 (x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } x > 0. \end{cases}
Is {f1 , f2 , f3 } linearly independent on (−∞, ∞)?
13. Let P2 (x) denote the quadratic polynomials in x (i.e. degree ≤ 2), which is a
vector space (cf. Example 3 in Section 5.3). Find a basis for P2 (x).
14. Let P2 (x, y) denote the quadratic polynomials in x and y (i.e. degree ≤ 2). Confirm
that P2 (x, y) is a vector space and find its dimension.
15. Show that C n (I), where n is any nonnegative integer and I is an interval, is an
infinite-dimensional vector space.
16. Show that the complex numbers C can be considered to be a real vector space.
Find its dimension.
17. If S is a subspace of a finite-dimensional vector space V and dim(S) = dim(V ),
then show S = V .
18. Give an example of an infinite-dimensional subspace S of an infinite-dimensional
vector space V , such that S 6= V .
19. Without performing calculations, determine the rank of the matrix and use the rank-nullity identity to determine the nullity.
A = \begin{bmatrix} 1 & -2 & 3 & -4 \\ -2 & 4 & -6 & 8 \\ 3 & -6 & 9 & -12 \end{bmatrix}
20. Without performing calculations, determine the rank of the matrix and use the rank-nullity identity to determine the nullity.
B = \begin{bmatrix} 2 & 1 & -3 & 0 & 4 \\ 0 & 1 & 2 & 3 & 4 \\ -4 & -2 & 6 & 0 & -8 \\ 0 & -2 & -4 & -6 & -8 \end{bmatrix}
21. If v and w are orthogonal vectors in a real inner product space, show that
kv + wk2 = kvk2 + kwk2 .
(*)
22. If v and w are vectors in a real inner product space that satisfy the above equality
(∗), show that v and w are orthogonal.
If S is a subspace of a real inner product space V , let its orthogonal complement S ⊥
be the set of vectors v in V satisfying hv, si = 0 for all s in S. The next four problems
concern this object.
23. Show that S ⊥ is a subspace of V .
24. Show that S ∩ S ⊥ = {0}.
25. If S1 is a subspace of S2 , show that S2⊥ is a subspace of S1⊥ .
26. Show that S is a subspace of (S ⊥ )⊥ .
Chapter 6
Linear Transformations and Eigenvalues
6.1  Introduction to Transformations and Eigenvalues
In this chapter we will consider linear transformations between vector spaces, T : V →
W , and especially linear transformations on a vector space, T : V → V . Let us give a
careful definition.
Definition 3. A linear transformation between vector spaces V and W is a mapping
T : V → W that assigns to each vector v in V a vector w = T v in W satisfying
• T (u + v) = T (u) + T (v) for all vectors u and v in V , and
• T (s v) = s T (v) for all vectors v and scalars s.
If V = W , then T : V → V is a linear transformation on V .
Two additional properties that follow immediately are
T (0) = 0   and   T (−v) = −T (v)   for all v in V .   (6.1)
There are linear transformations between infinite-dimensional vector spaces (see Exercises 1 & 2 in Section 6.4), but the most common linear transformations are given by
matrices. In fact, any (m × n)-matrix A defines a linear transformation A : Rn → Rm ,
and any (n × n)-matrix A defines a linear transformation on Rn .
Fig.1. Linear
transformation between
vector spaces.
Example 1. The (2 × 2)-matrix
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
defines a linear transformation on R2 ; let us investigate its behavior. Recalling the basis vectors i = (1, 0) and j = (0, 1) for R2 , we see that
Ai = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} = j   and   Aj = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \end{bmatrix} = -i.
Fig. 2. Rotation of the plane by 90◦
In both cases, we see that A has the effect of rotating the vector 90◦ counterclockwise.
Since i and j form a basis for R2 , we see that A rotates every vector in R2 in this way;
in particular, for every nonzero vector v, Av is perpendicular to v.
2
Example 2. The (2 × 2)-matrix
A = \begin{bmatrix} 2 & 0 \\ 0 & 1/2 \end{bmatrix}
also defines a linear transformation on R2 ; let us explore its properties. Notice that Ai = 2i and Aj = (1/2)j. Thus A stretches vectors in the i-direction by a factor of 2, and it compresses vectors in the j-direction by a factor of 1/2. Clearly the two vectors i and j play a special role for the matrix A. 2
Fig. 3. Stretching in the x-direction, compressing in the y-direction.
Eigenvalues and Eigenvectors
In the case of a linear transformation on a vector space T : V → V , of particular interest
is any vector v that is mapped to a scalar multiple of itself, i.e.
T v = λv,
for some scalar λ.
(6.2)
Of course, (6.2) always holds when v = 0 (by (6.1)), but we are interested in nonzero
solutions of (6.2).
Definition 4. An eigenvalue for a linear transformation T on a vector space V is a
scalar λ so that (6.2) holds for some nonzero vector v ∈ V . The vector v is called an
eigenvector associated with the eigenvalue λ, and (λ, v) is called an eigenpair.
Fig.4. The action of T on
an eigenvector.
Notice that an eigenvector is not unique. In fact, when V = Rn and T = A is an
(n × n)-matrix, we may write (6.2) as
(A − λI) v = 0.
(6.3)
We see that the eigenvectors for the eigenvalue λ (together with the zero vector 0)
form the nullspace of the matrix A − λI. In particular, the solution space of (6.3) is a
subspace of Rn ; consequently, if v is an eigenvector for λ then so is s v for any nonzero
scalar s, and if v and w are both eigenvectors for λ then so is the sum v + w. We call
this subspace the eigenspace for λ; the dimension of the eigenspace could be 1 or it
could be greater than 1.
In order to find eigenvalues and eigenvectors for A, we use the linear algebra of
matrices that we studied in Chapter 4. In particular, the existence of a nontrivial
solution of (6.3) means that the matrix A − λI is singular, i.e. that
det(A − λI) = 0.
(6.4)
But p(λ) = det(A−λI) is just an nth-order polynomial called the characteristic polynomial and (6.4) is called the characteristic equation. So finding the eigenvalues of
A reduces to finding the roots of its characteristic polynomial, and for each eigenvalue
we can then solve (6.3) to find its eigenvectors. We summarize this as follows:
Finding the eigenvalues & eigenvectors of a matrix A
1. Find all roots λ of the characteristic equation (6.4); these are the eigenvalues.
2. For each eigenvalue λ, solve (6.3) to find all eigenvectors v associated with λ.
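As a computational aside, numerical software carries out both steps at once. The sketch below assumes NumPy is available and uses the matrix of Example 3 below; it is meant only as a check on hand calculations.

```python
import numpy as np

A = np.array([[ 5.0,  7.0],
              [-2.0, -4.0]])

# numpy.linalg.eig returns the eigenvalues (roots of the characteristic
# polynomial) and a matrix whose columns are corresponding eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                       # the eigenvalues 3 and -2 (in some order)

# Each column v satisfies (A - lambda I) v = 0, i.e. A v = lambda v:
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True
```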
Example 3. Find the eigenvalues and a basis for each eigenspace for the (2 × 2)-matrix
A = \begin{bmatrix} 5 & 7 \\ -2 & -4 \end{bmatrix}.
Solution. We first calculate the characteristic polynomial
det(A − λI) = \begin{vmatrix} 5-\lambda & 7 \\ -2 & -4-\lambda \end{vmatrix} = (5 − λ)(−4 − λ) − (−2)(7) = λ² − λ − 6.
Next we find the roots of the characteristic polynomial by factoring it:
λ² − λ − 6 = (λ − 3)(λ + 2) = 0   ⇒   λ1 = 3, λ2 = −2.
Let us first find an eigenvector associated with the eigenvalue λ1 = 3. We must solve
(A − λ1 I) v1 = (A − 3I) v1 = 0. Writing v1 = (x, y), we must solve
\begin{bmatrix} 2 & 7 \\ -2 & -7 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
Since the system is homogeneous, we recall from Chapter 4 that we want to use elementary row operations to put the coefficient matrix into row-echelon form. Adding the top
row to the bottom row, and then dividing the top row by 2, we obtain
\begin{bmatrix} 2 & 7 \\ -2 & -7 \end{bmatrix} \sim \begin{bmatrix} 2 & 7 \\ 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 7/2 \\ 0 & 0 \end{bmatrix}.
We see that y is a free variable, so we can let y = s and solve for x to find x = −7s/2.
To get one eigenvector, it is convenient to let s = 2:
v1 = (−7, 2).
(Of course, we could as easily have taken s = −2 to obtain v1 = (7, −2).)
Now we turn to the eigenvalue λ2 = −2. We must solve (A + 2I) v2 = 0. Writing
v2 = (x, y), we must solve
\begin{bmatrix} 7 & 7 \\ -2 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
We can put the coefficient matrix into row echelon form
\begin{bmatrix} 7 & 7 \\ -2 & -2 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},
and conclude that y = s is free and x = −s. For example, we can take s = 1 to obtain
v2 = (−1, 1).
(Of course, using s = −1 to obtain v2 = (1, −1) works just as well.)
2
In Example 3 we found each eigenvalue had a one-dimensional eigenspace spanned
by a single eigenvector. However, eigenspaces could be multi-dimensional, as the next
example illustrates.
Example 4. Find the eigenvalues and a basis for each eigenspace for the (3 × 3)-matrix


A = \begin{bmatrix} 4 & -3 & 1 \\ 2 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix}.
Solution. We calculate the characteristic polynomial
det(A − λI) = \begin{vmatrix} 4-\lambda & -3 & 1 \\ 2 & -1-\lambda & 1 \\ 0 & 0 & 2-\lambda \end{vmatrix} = (4 − λ) \begin{vmatrix} -1-\lambda & 1 \\ 0 & 2-\lambda \end{vmatrix} − 2 \begin{vmatrix} -3 & 1 \\ 0 & 2-\lambda \end{vmatrix}
= (λ² − 3λ + 2)(2 − λ) = (λ − 1)(λ − 2)(2 − λ).
We find that there are just two eigenvalues, λ1 = 1 and λ2 = 2, the latter having
multiplicity 2.
To find the eigenvectors associated with λ1 = 1, we put A − I into row-echelon form:
 
 
 


A − I = \begin{bmatrix} 3 & -3 & 1 \\ 2 & -2 & 1 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 2 & -2 & 1 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
We see that x2 = t is a free variable, x3 = 0, and x1 = t. We can choose t = 1 to get
v1 = (1, 1, 0).
For the eigenvectors associated with λ2 = 2, we put A − 2I into row-echelon form:

 

A − 2I = \begin{bmatrix} 2 & -3 & 1 \\ 2 & -3 & 1 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3/2 & 1/2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
We see that x2 = s and x3 = t are free variables, and x1 = (3/2)s − (1/2)t. Taking
s = 2 and t = 0 we get one eigenvector associated with λ2 = 2
v2 = (3, 2, 0),
and if we take s = 0 and t = 2, we get another eigenvector associated with λ2 = 2
v3 = (−1, 0, 2).
2
Example 4 shows that an eigenvalue with multiplicity 2 can have two linearly independent eigenvectors, so the eigenspace has dimension 2. Note that there are two
notions of multiplicity for an eigenvalue λ: the algebraic multiplicity ma is the multiplicity as a root of the characteristic polynomial and the geometric multiplicity
mg is the dimension of the associated eigenspace. The following example shows that, if
ma > 1, these two notions of multiplicity need not coincide.
Example 5. Find the eigenvalues and a basis for each eigenspace for the (3 × 3)-matrix

B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{bmatrix}.
Solution. The upper triangular form of this matrix makes it particularly easy to
calculate its eigenvalues:

det(B − λI) = \begin{vmatrix} 1-\lambda & 2 & 3 \\ 0 & 1-\lambda & 2 \\ 0 & 0 & 2-\lambda \end{vmatrix} = (1 − λ)²(2 − λ).
We see that the characteristic polynomial has two roots: λ1 = 1 has algebraic multiplicity 2, and λ2 = 2 has algebraic multiplicity 1. To find the geometric multiplicities,
let us find the eigenvectors for both eigenvalues.
For λ1 = 1, we put the matrix B − I into row echelon form:

B − I = \begin{bmatrix} 0 & 2 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 0 & 1 & 3/2 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
We see the first variable is free and the other two must be zero, i.e. v1 = s (1, 0, 0) for
any value s. We see that the eigenspace has dimension 1 and, if we take s = 1, we get
a particular eigenvector
v1 = (1, 0, 0).
Notice that λ1 = 1 has ma = 2 but mg = 1.
For λ2 = 2 we proceed similarly:

B − 2I = \begin{bmatrix} -1 & 2 & 3 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 & -3 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -7 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix},
so the third variable is free and we obtain the general solution v2 = t (7, 2, 1). We see
that this eigenspace has dimension 1 (so ma = 1 = mg ), and we can choose t = 1 to
obtain a particular eigenvector
v2 = (7, 2, 1).
2
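The two multiplicities can also be read off symbolically. The sketch below assumes SymPy is available and uses the matrix B of Example 5; it is only a check, not part of the text's method.

```python
import sympy as sp

# Algebraic vs. geometric multiplicity for the matrix B of Example 5.
B = sp.Matrix([[1, 2, 3],
               [0, 1, 2],
               [0, 0, 2]])

# eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis).
for lam, ma, basis in B.eigenvects():
    print(lam, "m_a =", ma, "m_g =", len(basis))
# Expected: eigenvalue 1 has m_a = 2 but m_g = 1; eigenvalue 2 has m_a = m_g = 1.
```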
Complex Eigenvalues
Even when A is a real (n × n)-matrix, some of the roots of the characteristic polynomial
p(λ) could be complex numbers. But if λ is complex, then the solutions of (6.3) are
complex vectors, so do not qualify as eigenvectors for A : Rn → Rn ; consequently, λ
is not an eigenvalue. In particular, we see that there are linear transformations on Rn
without any eigenvalues; this is illustrated by Example 1 above.
However, if we now let V = Cn and consider A : Cn → Cn , then a complex root
of p(λ) is an eigenvalue since we can now find (complex) solutions of (6.3) which are
perfectly valid as eigenvectors. Of course, when V = Cn , we could allow the matrix
A to have complex elements, but let us continue to assume that A has real elements
and see what benefits we obtain. One benefit is that the coefficients of p(λ) being real
means that the complex roots of p(λ) = 0 occur in complex conjugate pairs:
p(λ) = 0   ⇔   p(λ̄) = \overline{p(λ)} = 0.
A second benefit is that if v is an eigenvector for λ, then v̄ is an eigenvector for λ̄ since
Av = λv   ⇔   Av̄ = \overline{Av} = λ̄v̄.
We restate this as:
If A is a square matrix with real elements, then (λ, v) is an eigenpair if and only if (λ̄, v̄) is an eigenpair.
This is a useful observation: it means that, having solved the linear system (6.3) to find an eigenvector for a complex eigenvalue λ, we do not need to solve a linear system to find an eigenvector for λ̄! Let us illustrate this with an example.
Example 6. Find the eigenvalues and a basis for each eigenspace for the (2 × 2)-matrix
A = \begin{bmatrix} -3 & -1 \\ 5 & 1 \end{bmatrix}.
Solution. We first find the roots of the characteristic equation:
det(A − λI) = \begin{vmatrix} -3-\lambda & -1 \\ 5 & 1-\lambda \end{vmatrix} = λ² + 2λ + 2 = 0   ⇒   λ = −1 ± i.
As expected, the eigenvalues λ1 = −1 + i and λ2 = −1 − i are conjugate pairs. Let us
find an eigenvector for λ1 :
A − λ1 I = \begin{bmatrix} -2-i & -1 \\ 5 & 2-i \end{bmatrix} \sim \begin{bmatrix} 5 & 2-i \\ 5 & 2-i \end{bmatrix} \sim \begin{bmatrix} 5 & 2-i \\ 0 & 0 \end{bmatrix}.
From this we can conclude that an eigenvector for λ1 is v1 = (−2 + i, 5). To find an eigenvector for the eigenvalue λ2 = −1 − i, instead of solving a linear system, we simply take the complex conjugate of v1 : v2 = v̄1 = (−2 − i, 5). Thus we have two eigenpairs:
λ1 = −1 + i, v1 = (−2 + i, 5)
λ2 = −1 − i, v2 = (−2 − i, 5).
2
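Numerically, the conjugate-pair structure is visible immediately. The sketch below (assuming NumPy) uses the matrix of Example 6 and checks that the conjugate of an eigenpair is again an eigenpair.

```python
import numpy as np

# Complex eigenvalues of the real matrix in Example 6 come in conjugate pairs.
A = np.array([[-3.0, -1.0],
              [ 5.0,  1.0]])

lams, vecs = np.linalg.eig(A)
print(lams)                       # approximately [-1.+1.j, -1.-1.j]

# If (lam, v) is an eigenpair, so is the conjugate pair (conj(lam), conj(v)):
lam, v = lams[0], vecs[:, 0]
print(np.allclose(A @ np.conj(v), np.conj(lam) * np.conj(v)))   # True
```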
Exercises
1. Find all eigenvalues and a basis for each eigenspace for the following matrices. If
an eigenvalue has algebraic multiplicity ma > 1, find its geometric multiplicity mg .
(a) \begin{bmatrix} 4 & -3 \\ 2 & -1 \end{bmatrix} (Solution);   (b) \begin{bmatrix} 10 & -8 \\ 6 & -4 \end{bmatrix};   (c) \begin{bmatrix} 1 & 6 \\ 2 & -3 \end{bmatrix};
(d) \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix};   (e) \begin{bmatrix} 3 & 6 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix};   (f) \begin{bmatrix} 7 & -8 & 6 \\ 8 & -9 & 6 \\ 0 & 0 & -1 \end{bmatrix}.
2. The following matrices have (some) complex eigenvalues. Find all eigenvalues and
associated eigenvectors.




(a) \begin{bmatrix} -2 & 1 \\ -1 & -2 \end{bmatrix} (Sol'n);   (b) \begin{bmatrix} -2 & -6 \\ 3 & 4 \end{bmatrix};   (c) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -2 \\ 0 & 2 & 0 \end{bmatrix};   (d) \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
3. Suppose λ is an eigenvalue with associated eigenvector v for a square matrix A.
Show that λn is an eigenvalue with associated eigenvector v for the matrix An .
4. Consider the linear transformation T : R2 → R2 that reflects a vector in the x-axis: see Figure 5. Geometrically determine all eigenvectors and their eigenvalues.
6.2  Diagonalization and Similarity
In this section we investigate when an (n × n)-matrix A has n eigenvectors v1 , . . . , vn
that form a basis, and how we can use such a basis to transform A into a particularly
simple form.
Definition 1. An (n × n)-matrix A is called diagonalizable if it has n linearly
independent eigenvectors v1 , . . . , vn . In this case, we call v1 , . . . , vn an eigenbasis.
We shall explain shortly why we use the term “diagonalizable,” but first let us explore
when we have linearly independent eigenvectors.
Suppose (λ1 , v1 ) and (λ2 , v2 ) are two eigenpairs for A with λ1 6= λ2 ; we claim that
v1 and v2 are linearly independent. To check this, we want to show that c1 v1 +c2 v2 = 0
implies c1 = 0 = c2 . But if we apply (A − λ1 I) to c1 v1 + c2 v2 and note (A − λ1 I)v1 = 0
while (A − λ1 I)v2 = (λ2 − λ1 )v2 , then we conclude
c1 v1 + c2 v2 = 0
⇒
c2 (λ2 − λ1 )v2 = 0.
Since λ2 − λ1 6= 0, we must have c2 = 0, and hence c1 = 0. This reasoning can be
extended to prove the following useful result.
Theorem 1. Suppose v1 , . . . , vk are eigenvectors for A, corresponding to distinct
eigenvalues λ1 , . . . , λk . Then v1 , . . . , vk are linearly independent.
Fig. 5. Reflection in the x-axis
Proof. The proof is by mathematical induction. By the reasoning above, we know the
result is true for k = 2. Now we assume it is true for k − 1 and we prove it for k. Let
c1 v1 + c2 v2 + · · · + ck vk = 0.
(6.5)
We want to show c1 = c2 = · · · = ck = 0. If we apply (A − λ1 I) to this equation and
use (A − λ1 I)v1 = 0 as well as (A − λ1 I)vj = (λj − λ1 )vj for j 6= 1, we obtain
c2 (λ2 − λ1 )v2 + · · · + ck (λk − λ1 )vk = 0.
By the induction hypothesis, we know that v2 , . . . , vk are linearly independent, so
c2 (λ2 − λ1 ) = · · · = ck (λk − λ1 ) = 0.
But (λj − λ1 ) 6= 0 for j 6= 1, so c2 = · · · = ck = 0. But if we plug these into (6.5), we
conclude c1 = 0 too. We conclude that v1 , . . . , vk are linearly independent.
2
Of course, if A has n distinct eigenvalues, then there are n linearly independent eigenvectors:
Corollary 1. If an (n × n)-matrix A has n distinct eigenvalues, then A is diagonalizable.
However, it is not necessary for a matrix to have distinct eigenvalues in order for it to
be diagonalizable. The (3 × 3)-matrix A in Example 4 in the previous section has only
two eigenvalues, but it has three linearly independent eigenvectors, and so A is diagonalizable. On the other hand, the (3×3)-matrix B in Example 5 in the previous section has only two linearly independent eigenvectors, and so B is not diagonalizable. Clearly, the
issue is whether the algebraic multiplicity ma (λ) and the geometric multiplicity mg (λ)
coincide for each eigenvalue λ. We restate this observation as a theorem:
Theorem 2. A square matrix A is diagonalizable if and only if ma (λ) = mg (λ) for
each eigenvalue λ.
Remark 1. All of the above results apply to complex eigenvalues and eigenvectors as
well as real ones. However, as observed at the end of the previous section, we then need
to view A as a transformation on Cn ; see Exercise 2.
Now let us explain the significance of the term “diagonalizable”. If A has n linearly
independent eigenvectors v1 , . . . , vn , then let us introduce the (n×n)-matrix E obtained
by using the vj as column vectors:
E = v1 v2 · · · vn .
Since the column vectors are linearly independent, the column rank of E is n. But, as
we saw in Section 4.6, this means that the row rank of E is also n, which implies that E
is invertible: E−1 exists. Now let us denote by (λ1 , . . . , λn ) the eigenvalues associated
with these eigenvectors (although the λj need not be distinct). Since Avj = λj vj for
each j = 1, . . . , n, we see that the product of A and E yields
AE = Av1 Av2 · · · Avn = λ1 v1 λ2 v2 · · · λn vn .
Now let D denote the diagonal matrix with the λj on its main diagonal:


D = Diag(λ1 , . . . , λn ) = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.
If we take the product of E and D we get
ED = [λ1 v1  λ2 v2  · · ·  λn vn ].
We conclude that AE = ED. But if we multiply this equation on the left by E−1 , we
obtain
E−1 AE = D = Diag(λ1 , . . . , λn ).
(6.6)
Thus, by multiplying A on the right by its eigenvector matrix E and on the left by E−1 ,
we have turned A into the diagonal matrix with its eigenvalues on the main diagonal.
The relationship (6.6) is important enough that it is given a name:
Definition 2. Two matrices A and B are similar if there is an invertible matrix S
such that S−1 AS = B.
Moreover, the above process is reversible, so we have proved the following result that
expresses the true meaning of saying that A is diagonalizable:
Theorem 3. A square matrix has a basis of eigenvectors if and only if it is similar
to a diagonal matrix.
Let us illustrate this theorem with an example.
Example 1. The (3 × 3)-matrix

A = \begin{bmatrix} 4 & -3 & 1 \\ 2 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix}
in Example 4 of Section 6.1 was found to have an eigenvalue λ1 = 1 with eigenvector
v1 = (1, 1, 0), and an eigenvalue λ2 = 2 with two eigenvectors, v2 = (3, 2, 0) and
v3 = (−1, 0, 2); for notational convenience we write λ3 = 2. Let E be the matrix with
these eigenvectors as column vectors:


E = \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
Using our methods of Section 4.4, we compute the inverse of E to find


E−1 = \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix}.
Now we simply compute

E−1 AE = \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix} \begin{bmatrix} 4 & -3 & 1 \\ 2 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
The result is indeed a diagonal matrix with the eigenvalues on the main diagonal.
2
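The same verification can be done numerically; the following sketch assumes NumPy is available and simply repeats the computation of Example 1.

```python
import numpy as np

# Verifying E^{-1} A E = D for Example 1.
A = np.array([[4.0, -3.0, 1.0],
              [2.0, -1.0, 1.0],
              [0.0,  0.0, 2.0]])
E = np.array([[1.0, 3.0, -1.0],
              [1.0, 2.0,  0.0],
              [0.0, 0.0,  2.0]])

D = np.linalg.inv(E) @ A @ E
print(np.round(D, 10))            # Diag(1, 2, 2)
```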
In (6.6), A and D clearly have the same eigenvalues; but is there a relationship
between the eigenvalues of similar matrices in general? Yes: if (λ, v) is an eigenpair for
B, i.e. Bv = λv, and B = S−1 AS, then S−1 ASv = λv and we can multiply on the left
by S to obtain Aw = λw where w = Sv. In particular, we have shown the following:
Theorem 4. Similar matrices have the same eigenvalues and geometric multiplicities.
Application to Computing Powers of a Matrix
In certain applications, we may have a square matrix A and need to compute its successive powers A2 , A3 , etc. Let us mention such an application:
Transition Matrix. Suppose that a sequence of vectors x0 , x1 , x2 , . . . is defined iteratively by
xk+1 = A xk ,
where A is a square matrix. Then A is called the transition matrix, and we can use a
power of A to express each xk in terms of x0 : x1 = Ax0 , x2 = Ax1 = AAx0 = A2 x0 ,
x3 = Ax2 = AA2 x0 = A3 x0 , etc. In general, we have
x k = Ak x 0 .
Thus, to generate the sequence xk , we need to compute the matrix powers Ak .
Fig.3. Transition matrix
generates a sequence
Now Ak can be computed simply by the usual matrix multiplication, but the calculation becomes increasingly lengthy as k increases. However, if A is diagonalizable,
then we can use (6.6) to greatly simplify the calculation. In fact, observe
A = E D E−1 ,
A2 = E D E−1 E D E−1 = E D2 E−1 ,
A3 = E D2 E−1 E D E−1 = E D3 E−1 ,
⋮
Ak = E Dk E−1 .
Moreover, if D = Diag(λ1 , λ2 , . . . , λn ), then D2 = Diag(λ1^2 , λ2^2 , . . . , λn^2 ), and in general Dk = Diag(λ1^k , λ2^k , . . . , λn^k ).
We conclude that
Ak = E Diag(λ1^k , λ2^k , . . . , λn^k ) E−1 .   (6.7)
Example 2. Find A5 where A is the (3 × 3)-matrix in Example 1.
Solution. In Example 1 we found E as well as E−1 . Consequently, we can apply (6.7):
A5 = E Diag(λ1^5 , λ2^5 , λ3^5 ) E−1
= \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1^5 & 0 & 0 \\ 0 & 2^5 & 0 \\ 0 & 0 & 2^5 \end{bmatrix} \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix} = \begin{bmatrix} 94 & -93 & 31 \\ 62 & -61 & 31 \\ 0 & 0 & 32 \end{bmatrix}.
This method has certainly simplified the calculation of A5 !
2
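Formula (6.7) is straightforward to check in software. The sketch below assumes NumPy and compares the eigendecomposition route with direct repeated multiplication, for the matrix of Example 2.

```python
import numpy as np

A = np.array([[4.0, -3.0, 1.0],
              [2.0, -1.0, 1.0],
              [0.0,  0.0, 2.0]])
E = np.array([[1.0, 3.0, -1.0],
              [1.0, 2.0,  0.0],
              [0.0, 0.0,  2.0]])
lams = np.array([1.0, 2.0, 2.0])

# A^5 via (6.7): E Diag(lambda_i^5) E^{-1}
A5 = E @ np.diag(lams**5) @ np.linalg.inv(E)
print(np.round(A5))                                   # [[94 -93 31], [62 -61 31], [0 0 32]]
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))  # True
```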
Application to the Matrix Exponential
We are familiar with the exponential function ex where x is a real or complex number.
It turns out that we can apply the exponential function to an (n × n)-matrix A to
obtain an (n × n)-matrix eA called the matrix exponential. This is defined formally by simply replacing x with A in the familiar formula e^x = 1 + x + (1/2)x² + · · · :
eA = I + A + A²/2! + A³/3! + · · ·   (6.8)
It can be shown that the series (6.8) always converges to an (n × n)-matrix; the problem
that we address here is how to compute eA when A is diagonalizable.
The simplest case is when A itself is a diagonal matrix: A = Diag(λ1 , . . . , λn ). In
this case, we know that Ak = Diag(λ1^k , . . . , λn^k ) for k = 1, 2, . . . , and so
eA = I + Diag(λ1 , . . . , λn ) + (1/2!) Diag(λ1^2 , . . . , λn^2 ) + · · ·
= Diag(1 + λ1 + (1/2!)λ1^2 + · · · , . . . , 1 + λn + (1/2!)λn^2 + · · · )
= Diag(e^{λ1} , . . . , e^{λn} ).
Example 3.
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}   ⇒   eA = \begin{bmatrix} e & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e^3 \end{bmatrix}. 2
It is only slightly more work to calculate eA when A is diagonalizable:
A = E D E−1   ⇒   Ak = E Dk E−1 for k = 1, 2, . . .
⇒   eA = I + E D E−1 + (E D2 E−1 )/2! + · · · = E (I + D + D2 /2! + · · · ) E−1 = E eD E−1 .
We summarize this as
eA = E Diag(e^{λ1} , . . . , e^{λn} ) E−1 .   (6.9)
Example 4. Let us compute eA for the matrix A in Example 1. There we had
E = \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix},   D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix},   E−1 = \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix}.
Consequently,
eA = \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} e & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e^2 \end{bmatrix} \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix} = \begin{bmatrix} -2e + 3e^2 & 3e - 3e^2 & -e + e^2 \\ -2e + 2e^2 & 3e - 2e^2 & -e + e^2 \\ 0 & 0 & e^2 \end{bmatrix}. 2
We shall find the matrix exponential useful when studying systems of first-order equations in Chapter 7.
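As a computational aside, formula (6.9) can be compared against a general-purpose matrix exponential routine. The sketch below assumes NumPy and SciPy are available and uses the matrix of Example 4.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, -3.0, 1.0],
              [2.0, -1.0, 1.0],
              [0.0,  0.0, 2.0]])
E = np.array([[1.0, 3.0, -1.0],
              [1.0, 2.0,  0.0],
              [0.0, 0.0,  2.0]])
lams = np.array([1.0, 2.0, 2.0])

# e^A via (6.9): E Diag(e^{lambda_i}) E^{-1}
expA = E @ np.diag(np.exp(lams)) @ np.linalg.inv(E)
print(np.allclose(expA, expm(A)))    # True: agrees with SciPy's expm
```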
Exercises
1. Determine whether the given matrix is diagonalizable; if so, find a matrix E and a diagonal matrix D so that E−1 AE = D.
(a) \begin{bmatrix} -7 & 4 \\ -4 & 1 \end{bmatrix} (Solution);   (b) \begin{bmatrix} 4 & -2 \\ 1 & 1 \end{bmatrix};   (c) \begin{bmatrix} 6 & -6 \\ 4 & -4 \end{bmatrix};
(d) \begin{bmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix};   (e) \begin{bmatrix} 1 & 3 & 0 \\ -1 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix};   (f) \begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
2. The following matrices have (some) complex eigenvalues. Determine whether the given matrix is diagonalizable; if so, find a matrix E and a diagonal matrix D so that E−1 AE = D.
(a) \begin{bmatrix} 0 & 1 \\ -2 & 2 \end{bmatrix} (Sol'n);   (b) \begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix};   (c) \begin{bmatrix} 0 & 0 & 5 \\ 1 & 0 & -7 \\ 0 & 1 & 3 \end{bmatrix}.
3. Calculate Ak for the given matrix A and integer k.
(a) \begin{bmatrix} 3 & -2 \\ 1 & 0 \end{bmatrix}, k = 5;   (b) \begin{bmatrix} 1 & 3 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, k = 6;   (c) \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 0 \\ 0 & -2 & 2 \end{bmatrix}, k = 8.
4. Calculate eA for the following matrices A:
(a) \begin{bmatrix} 3 & -2 \\ 1 & 0 \end{bmatrix};   (b) \begin{bmatrix} 1 & 3 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix};   (c) \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 0 \\ 0 & -2 & 2 \end{bmatrix}.
6.3  Symmetric and Orthogonal Matrices
Recall that a square matrix A is symmetric if AT = A. In this section we shall see
that symmetric matrices are always diagonalizable. In fact, the matrix S for which
S−1 AS = Diag(λ1 , . . . , λn ) can be chosen to have some special additional properties.
To discuss the results, let h , i denote the natural inner product on Rn or Cn :
hv, wi = v · w̄. Recall that v and w are orthogonal if hv, wi = 0. Now, if A is a real symmetric (n × n)-matrix, then for any (real or complex) vectors v and w we have
hAv, wi = Av · w̄          (by definition of h , i)
 = v · AT w̄                (using (4.10) in Section 4.1)
 = v · A w̄                 (since A is symmetric)
 = v · \overline{Aw}        (since A is real)
 = hv, Awi                 (by definition of h , i).
We summarize this calculation as a lemma:
Lemma 1. If A is a real symmetric (n × n)-matrix, then
hAv, wi = hv, Awi
for any vectors v and w in Rn or Cn .
Now let us state the main result for symmetric matrices.
Theorem 1. If A is a real symmetric (n × n)-matrix, then it is diagonalizable.
Moreover:
(a) All eigenvalues of A are real;
(b) Eigenvectors corresponding to distinct eigenvalues are orthogonal;
(c) A has a set of n orthonormal eigenvectors.
Proof. We here prove (a) and (b); part (c) will be proved at the end of this section.
(a) If λ is an eigenvalue with eigenvector v, then Av = λv implies
hAv, vi = hλv, vi = λhv, vi = λkvk2 .
Using this and the inner product property hu, wi = \overline{hw, ui}, we obtain
hv, Avi = \overline{hAv, vi} = λ̄kvk2 .
But by Lemma 1 we have hAv, vi = hv, Avi, so
0 = hAv, vi − hv, Avi = (λ − λ̄)kvk2 .
Since kvk ≠ 0, we conclude λ − λ̄ = 0, i.e. λ is real.
(b) Suppose we have distinct eigenvalues λ1 and λ2 with associated eigenvectors v1 and
v2 . Then
hAv1 , v2 i = hλ1 v1 , v2 i = λ1 hv1 , v2 i
and
hv1 , Av2 i = hv1 , λ2 v2 i = λ̄2 hv1 , v2 i = λ2 hv1 , v2 i,
where in the last step λ̄2 = λ2 follows from (a).
hv1 , Av2 i, so
0 = λ1 hv1 , v2 i − λ2 hv1 , v2 i = (λ1 − λ2 )hv1 , v2 i.
But since we assumed λ1 − λ2 6= 0, we conclude hv1 , v2 i = 0, i.e. v1 ⊥ v2 .
Example 1. Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix
A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}.
Solution. First we find the eigenvalues:
\begin{vmatrix} 1-\lambda & 2 & 1 \\ 2 & 4-\lambda & 2 \\ 1 & 2 & 1-\lambda \end{vmatrix} = \begin{vmatrix} 0 & 2\lambda & -\lambda^2 + 2\lambda \\ 0 & -\lambda & 2\lambda \\ 1 & 2 & 1-\lambda \end{vmatrix} = λ²(6 − λ).
So λ = 0 is a double eigenvalue and λ = 6 is a single eigenvalue.
For λ = 0 we find its eigenvectors by finding the RREF of the matrix:

 

\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}   ⇒   x2 = s,  x3 = t,  x1 = −2s − t.
If we express the solution in vector form


 
 
x = \begin{bmatrix} -2s - t \\ s \\ t \end{bmatrix} = s \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
We see that the zero eigenspace is spanned by the vectors
v1 = (−1, 0, 1)
and v2 = (−2, 1, 0).
(We chose v1 to be the simpler vector to make subsequent calculations a little easier.)
Now let us apply Gram-Schmidt to convert these to an orthonormal basis:
u1 = v1 /kv1 k = (1/√2)(−1, 0, 1),
w2 = v2 − hv2 , u1 iu1 = (−2, 1, 0) − (2/√2) · (1/√2)(−1, 0, 1) = (−1, 1, −1),
u2 = w2 /kw2 k = (1/√3)(−1, 1, −1).
Now let us find the eigenvector associated with the eigenvalue λ = 6:

 

A − 6I = \begin{bmatrix} -5 & 2 & 1 \\ 2 & -2 & 2 \\ 1 & 2 & -5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}   ⇒   x3 = t,  x2 = 2t,  x1 = t.
Choosing t = 1 we obtain the eigenvector v3 = (1, 2, 1), and we normalize to find
u3 = v3 /kv3 k = (1/√6)(1, 2, 1).
We have found our orthonormal set of eigenvectors {u1 , u2 , u3 }. 2
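For a quick numerical check of Theorem 1, NumPy's routine for symmetric matrices returns exactly this kind of data. The sketch below (assuming NumPy) uses the matrix of Example 1.

```python
import numpy as np

# For a real symmetric matrix, numpy.linalg.eigh returns real eigenvalues and
# a matrix U whose columns are an orthonormal set of eigenvectors.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 1.0]])

lams, U = np.linalg.eigh(A)
print(np.round(lams, 10))                     # eigenvalues 0, 0, 6
print(np.allclose(U.T @ U, np.eye(3)))        # columns are orthonormal
print(np.allclose(A @ U, U @ np.diag(lams)))  # each column is an eigenvector
```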
Remark 1. Since a real symmetric matrix A has all real eigenvalues, we can also take
eigenvectors to be real. Consequently, we henceforth use Rn as our vector space and dot
product as the inner product.
Orthogonal Matrices
Theorem 1 shows that a real symmetric matrix A has an eigenbasis, so (by the results
of the previous section) A is diagonalizable: E−1 AE = D, where E is the matrix with
the eigenbasis as column vectors and D is the diagonal matrix with the eigenvalues on
the diagonal. But since the eigenbasis provided by Theorem 1 is orthonormal, E has
some nice additional features.
Let O be an invertible real (n × n)-matrix whose column vectors u1 , . . . , un are
orthonormal. Then OT has u1 , . . . , un as its row vectors, so
 


OT O = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} u_1 \cdot u_1 & u_1 \cdot u_2 & \cdots & u_1 \cdot u_n \\ u_2 \cdot u_1 & u_2 \cdot u_2 & \cdots & u_2 \cdot u_n \\ \vdots & \vdots & & \vdots \\ u_n \cdot u_1 & u_n \cdot u_2 & \cdots & u_n \cdot u_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I.   (6.10)
Now, if we multiply the matrix equation OT O = I on the right by O−1 , we obtain
OT = O−1 . This property is so important that we give it a special name.
Definition 1. A square real matrix O is called orthogonal if
OT = O−1 .
(6.11)
The calculation (6.10) shows the following
Proposition 1. A real (n × n)-matrix O is orthogonal if and only if its column vectors
form an orthonormal basis for Rn .
Example 2. Find the inverse of the matrix
O = \begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \end{bmatrix}.
Solution. We observe that the column vectors are orthonormal, so O is an orthogonal
matrix and we can use (6.11) to compute its inverse:
O−1 = OT = \begin{bmatrix} 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 0 & 1 & 0 \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \end{bmatrix}. 2
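The defining properties (6.10) and (6.11) are easy to confirm numerically; the sketch below assumes NumPy and uses the matrix O of Example 2.

```python
import numpy as np

s = 1 / np.sqrt(2)
O = np.array([[  s, 0.0,   s],
              [0.0, 1.0, 0.0],
              [ -s, 0.0,   s]])

print(np.allclose(O.T @ O, np.eye(3)))        # (6.10): O^T O = I
print(np.allclose(np.linalg.inv(O), O.T))     # (6.11): O^{-1} = O^T
```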
Another important property enjoyed by orthogonal matrices is that they preserve
the length of vectors. In fact, this property characterizes orthogonal matrices:
Proposition 2. A real (n × n)-matrix O is orthogonal if and only if kO vk = kvk for
all v ∈ Rn .
Proof. It is easy to see that OT = O−1 implies kOvk = kvk for all v:
kO vk2 = O v · O v = OT O v · v = O−1 O v · v = v · v = kvk2 .
Conversely, let us assume kO vk = kvk for all v ∈ Rn . Then in particular kOei k =
kei k = 1, so the vectors u1 = Oe1 , . . . , un = Oen are all unit vectors. Moreover, for
i ≠ j we have
kui + uj k2 = kOei + Oej k2           (by definition of ui )
 = kO(ei + ej )k2                      (by linearity)
 = kei + ej k2                         (by our assumption kO vk = kvk for all v ∈ Rn )
 = kei k2 + 2 ei · ej + kej k2          (by expanding (ei + ej ) · (ei + ej ))
 = kei k2 + kej k2                      (since ei · ej = 0)
 = kOei k2 + kOej k2                    (by our assumption kO vk = kvk for all v ∈ Rn )
 = kui k2 + kuj k2                      (by definition of ui ).
On the other hand, kui + uj k2 = kui k2 + 2 ui · uj + kuj k2 , so we conclude ui · uj = 0.
Consequently, {u1 , . . . , un } is an orthonormal basis for Rn . But u1 = Oe1 , . . . , un =
Oen are the column vectors of O, so Proposition 1 implies that O is an orthogonal
matrix, as we wanted to show.
2
It is still not immediately clear why a matrix satisfying (6.11) should be called
“orthogonal.” The following result shows the terminology is justified since an orthogonal
matrix maps orthogonal vectors to orthogonal vectors.
Corollary 2. If O is an orthogonal (n × n)-matrix and v, w are orthogonal vectors in
Rn , then Ov and Ow are orthogonal vectors in Rn .
Proof. Using kOvk = kvk and kOwk = kwk, we compute
kOv + Owk2 = kOvk2 + 2 Ov · Ow + kOwk2
= kvk2 + 2 Ov · Ow + kwk2 .
Meanwhile, using v · w = 0, we find
kv + wk2 = kvk2 + kwk2 .
But these must be equal since kO(v + w)k = kv + wk. We conclude Ov · Ow = 0. 2
Now let us return to our diagonalization of a real symmetric matrix A. If we use
the orthonormal eigenvectors provided by Theorem 1, then E is orthogonal, so let us
denote it by O. Hence we obtain
Theorem 2. If A is a real symmetric (n × n)-matrix, then there is an orthogonal
matrix O so that
OT A O = Diag(λ1 , . . . , λn ),
(6.12)
where λ1 , . . . , λn are the eigenvalues for A.
An (n × n)-matrix A satisfying (6.12) is called orthogonally diagonalizable. Note
that “orthogonally diagonalizable” is a stronger condition than just “diagonalizable.”
Example 1 (revisited). For the matrix A given in Example 1, the orthogonal matrix
O that provides the diagonalization OT A O = Diag(0, 0, 6) is
 1

− √2 − √13 √16

√1
√2 
O= 0
.
2
3
6
1
1
√1
√
√
− 3
2
6
Change of Basis
There is an important interpretation of (6.12) as expressing the action of A in terms
of a nonstandard basis for Rn . Let us discuss what changing the basis means for the
action of A before we explore its application to (6.12) and the proof of Theorem 1(c).
Suppose {v1 , . . . , vn } is an orthonormal basis for Rn (different from the standard
basis {e1 , . . . , en }). Then any vector w in Rn can be expressed either as a linear
combination of the standard basis vectors {e1 , . . . , en } or this new basis {v1 , . . . , vn }.
Let us denote the coefficients of these linear combinations respectively as x1 , . . . , xn and
y1 , . . . , y n :
w = x1 e1 + · · · + xn en = y1 v1 + · · · + yn vn .
(6.13)
Notice that, like x1 , . . . , xn , the coefficients y1 , . . . , yn are unique (cf. Exercise 4). Therefore, we can think of y1 , . . . , yn as the coordinates of w in the basis {v1 , . . . , vn }
in the same way that x1 , . . . , xn are the coordinates of w in the basis {e1 , . . . , en },
provided we do not change the order of the v1 , . . . , vn . Therefore we specify that
B = {v1 , . . . , vn } is an ordered orthonormal basis for Rn , and we call y1 , . . . , yn in
(6.13) the B-coordinates for w.
What is the relationship between the standard coordinates xi and the B-coordinates
yi in (6.13)? If we let O be the matrix [v1 · · · vn ], then we see from (6.13) that x = O y.
Moreover, O is an orthogonal matrix, so we can multiply both sides of this last equation
by OT = O−1 to conclude y = OT x.
Now suppose that we have a linear transformation T : Rn → Rn . Usually such a
linear transformation is defined using an (n × n)-matrix A and matrix multiplication:
T w = A x,
where w = x1 e1 + · · · + xn en .
(6.14)
On the other hand, if we are first given the linear transformation T , then we can find
the matrix A so that (6.14) holds simply by computing T e1 , . . . , T en , and using them
as column vectors. But we could also express T using matrix multiplication on the
coordinates in the basis B = {v1 , . . . , vn }:
If w = y1 v1 + · · · + yn vn , then
T w = z1 v1 + · · · + zn vn ,
where z = AB y.
Here AB is the (n × n)-matrix that transforms the B-coordinates for w to the B-coordinates for T w. What is the relationship between the matrices A and AB ? Using
y = OT x, we must have AB OT x = OT A x, which we can abbreviate as AB OT =
OT A. But now we can multiply on the right by O to express this as AB = OT A O.
We summarize this discussion in the following:
Proposition 3. Suppose B = {v1 , . . . , vn } is an ordered orthonormal basis for Rn .
Then, for any vector w in Rn , its coordinates x1 , . . . , xn in the basis {e1 , . . . , en }
and its coordinates y1 , . . . , yn in the basis {v1 , . . . , vn } are related by
Oy = x
or equivalently
y = OT x,
where O = [v1 · · · vn ]. Moreover, the action of an (n × n)-matrix A : Rn → Rn
(using the x-coordinates) can be expressed in terms of the y-coordinates for the
basis B using the matrix
AB = OT A O.
(6.15)
Now let us interpret the formula (6.12) in light of Proposition 3: it simply says that
if we compute the action of a real symmetric (n × n)-matrix A in terms of its basis of
orthonormal eigenvectors B, we get a diagonal matrix:
AB = Diag(λ1 , . . . , λn ).
(6.16)
In other words, a real symmetric matrix admits a basis {v1 , . . . , vn } in which the matrix
has been diagonalized.
Proof of Theorem 1(c). We want to show that a real symmetric (n × n)-matrix
A has n orthonormal eigenvectors. We proceed by induction, namely we assume that
any real symmetric (n − 1) × (n − 1)-matrix has n − 1 orthonormal eigenvectors. But
we know all eigenvalues of A are real, so let us pick one and call it λ1 . Let u1 be a
normalized eigenvector associated with λ1 , and let V1 be the set of vectors in Rn that
are orthogonal to u1 :
V1 = {w in Rn such that w · u1 = 0}.
Notice that V1 is an (n − 1)-dimensional real vector space, so we can use the Gram-Schmidt procedure (cf. Section 5.7) to find an orthonormal basis for V1 ; we denote this
basis for V1 by u2 , . . . , un . Note that u1 , u2 , . . . , un is an orthonormal basis for Rn .
If v is in V1 , then Av is also in V1 since Av · u1 = 0:
Av · u1 = v · Au1 = v · λ1 u1 = λ1 v · u1 = 0.
So A maps V1 to itself. In terms of the basis B = {u1 , . . . , un } for Rn , we see that
AB = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \tilde{A} \end{bmatrix},
where Ã is a real (n − 1) × (n − 1)-matrix. Is Ã symmetric? Clearly it is if AB is symmetric. But by (6.15), AB = OT A O, where O = [u1 · · · un ], so
(AB )T = (OT A O)T = OT A O = AB .
We conclude that AB is symmetric and hence Ã is symmetric. Consequently, by hypothesis, Ã has (n − 1) orthonormal eigenvectors ũ2 , . . . , ũn in V1 . (These are most likely
different from u2 , . . . , un , which were not required to be eigenvectors.) Thus AB has n
orthonormal eigenvectors: u1 , ũ2 , . . . , ũn . But this implies that A has n orthonormal
eigenvectors (cf. Exercise 5).
2
Example 3. Let B = {u1 , u2 } be the ordered orthonormal basis for R2 consisting of u1 = (1/√2)(1, −1) and u2 = (1/√2)(1, 1), and T : R2 → R2 be a linear transformation satisfying T u1 = u1 + 2u2 and T u2 = u1 − u2 . Find the matrix A satisfying (6.14).
Solution. In this case we have
O = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix},   OT = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix},   and   AB = \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}.
Consequently
A = O AB OT = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 3/2 & -3/2 \\ -1/2 & -3/2 \end{bmatrix}. 2
Exercises
1. For the following symmetric (n × n)-matrices, find a set of n orthonormal eigenvectors:
(a) \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}   (b) \begin{bmatrix} 4 & 6 \\ 6 & 9 \end{bmatrix}
(c) \begin{bmatrix} 3 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}   (d) \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
2. For the following matrices A, find an orthogonal matrix O and a diagonal matrix D so that OT AO = D:
(a) \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}   (b) \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
(c) \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}   (d) \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}
3. Use (6.11) to find the inverses of the following matrices:
√


 √
1 −1
1/√3 1/ 2 0
(b) √12 0 0
(a)  1/ √3
0√ 1
1 1
−1/ 3 1/ 2 0
√


2
1 √0 −1
1
0 

√2 1
(c) 
−1
2 −1 √0 
−1 0
1
2

√0
2
0
4. Show that the coefficients y1 , . . . , yn in (6.13) are unique.
5. If A is an (n × n)-matrix with n orthonormal eigenvectors and B = OT AO where
O is an orthogonal (n × n)-matrix, show that B has n orthonormal eigenvectors.
6. Let B = {u1 , u2 , u3 } be the ordered orthonormal basis for R3 consisting of vectors u1 = (1/√2)(1, 0, 1), u2 = (0, 1, 0), and u3 = (1/√2)(−1, 0, 1), and let T : R3 → R3 be a linear transformation satisfying T u1 = u1 + u3 , T u2 = u2 − u3 , and T u3 = u1 − u2 . Find the matrix A satisfying (6.14).
6.4  Additional Exercises
1. Let P denote the polynomials with real coefficients on (−∞, ∞) and let T denote
differentiation: T (f ) = f ′ . (Recall from Exercise 2 in Section 5.5 that P is infinite-dimensional.)
(a) Show that T : P → P is a linear transformation.
(b) Find all eigenvalues of T : P → P and describe the eigenspace.
2. Let V = C ∞ (R) denote the “smooth” real-valued functions on (−∞, ∞), i.e.
functions for which all derivatives are continuous. As in the previous problem, let
T denote differentiation: T (f ) = f 0 .
(a) Show that V is infinite-dimensional.
(b) Find all eigenvalues of T : V → V and describe the eigenspace.
3. If A is a square matrix, show that it is invertible if and only if 0 is not an
eigenvalue.
4. If A is an invertible matrix, show that λ is an eigenvalue for A if and only if λ−1
is an eigenvalue for A−1 .
5. If B is a square matrix, show that B and BT have the same eigenvalues. (Recall
from Section 4.5 that det(A) = det(AT ).)
6. Suppose v is an eigenvector with eigenvalue λ for A and also an eigenvector with
eigenvalue µ for B.
(a) Show v is an eigenvector for AB and find its eigenvalue.
(b) Show v is an eigenvector for A + B and find its eigenvalue.
7. Let M 2 (R) denote the (2 × 2)-matrices with real coefficients and
A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
(a) Show that TA : B 7→ AB defines a linear transformation on the vector space
M 2 (R).
(b) Find all eigenvalues and an eigenbasis for TA .
8. As in the previous problem, consider
A = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}
as defining a linear transformation TA on M 2 (R). Find all eigenvalues and an
eigenbasis for TA .
9. For a square matrix A, show that the constant term (i.e. coefficient of λ0 ) in the
characteristic polynomial p(λ) is det(A).
A stochastic matrix is one for which the sum of the elements in each column is 1;
such matrices arise naturally in certain applications. A (2 × 2) stochastic matrix takes
the simple form
A = \begin{bmatrix} p & 1-q \\ 1-p & q \end{bmatrix}   where 0 < p, q < 1.   (6.17)
An interesting feature of stochastic matrices is that the limit Ak v as k → ∞ is the same
for all vectors v = (x, y) with the same value for x + y. This will be illustrated in the
exercises below.
10. Let A be the stochastic matrix (6.17).
(a) Show that the eigenvalues of A are λ1 = 1 and λ2 = p + q − 1 where |λ2 | < 1.
(b) If v1 and v2 are eigenvectors for λ1 and λ2 respectively, show that Ak v1 = v1
for all k = 1, 2, . . . and Ak v2 → 0 as k → ∞.
(c) If v0 = (x0 , y0 ) has the property x0 + y0 = 1, show that v1 = Av0 has
the same property: v1 = (x1 , y1 ) satisfies x1 + y1 = 1.
(d) Show that Ak → [v1∗ v1∗ ] as k → ∞, where v1∗ = (x∗1 , y1∗ ) is the eigenvector
for the eigenvalue λ1 = 1 satisfying x∗1 + y1∗ = 1.
11. The population of a state is fixed at 10 million, but divided into x million in urban
areas and y million in rural areas, so x + y = 10. Each year, 20% of the urban
population moves to rural areas, and 30% of the rural population moves to urban
areas.
(a) If the initial populations are x0 , y0 , introduce a stochastic matrix A as in
(6.17) so that the populations x1 , y1 after one year satisfy
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = A \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}.
(b) If initially x0 = 5 = y0 , find the populations after 10 years, i.e. x10 and y10 .
(c) If initially x0 = 7 and y0 = 3, find x10 and y10 .
(d) Explain the similar results that you obtained in (b) and (c) by referring to
part (d) of the previous exercise.
Chapter 7
Systems of First-Order Equations
7.1  Introduction to First-Order Systems
In this chapter we want to study a first-order system of the form
dx1 /dt = f1 (x1 , . . . , xn , t)
dx2 /dt = f2 (x1 , . . . , xn , t)
⋮
dxn /dt = fn (x1 , . . . , xn , t),   (7.1)
with initial conditions of the form
x1 (t0 ) = b1 , x2 (t0 ) = b2 , . . . , xn (t0 ) = bn .
(7.2)
Since all differential equations in this chapter will have t as the independent variable, we
often will write xi ′ in place of dxi /dt. As we saw in Section 2.1, a system of first-order
equations arises naturally in the study of an nth-order differential equation. For example, recall the initial-value problem for a forced mechanical vibration that we studied in
Section 2.6:
mx′′ + cx′ + kx = f (t)   (7.3)
x(0) = x0 ,   x′ (0) = v0 .
If we introduce x1 = x and x2 = x′ , then (7.3) is equivalent to the first-order system with initial conditions
x1 ′ = x2 ,   x1 (0) = x0 ,
x2 ′ = (1/m)(f (t) − c x2 − k x1 ),   x2 (0) = v0 .   (7.4)
Fig. 1. Spring-mass system
We shall study this system and its generalization to coupled mechanical vibrations later
in this chapter.
But first-order systems like (7.1) also arise naturally in applications involving first-order processes. For example, let us generalize the mixing problems of Section 1.4 to
two tanks as in Figure 1. If we let r0 and c0 denote respectively the rate and the
concentration of the solute, say salt, in the inflow to Tank 1, and we let rj denote
the rate of outflow from Tank j (for j = 1, 2) then we get a first-order system for the
amounts of salt x1 (t) and x2 (t) in the two tanks:
x1 ′ = k0 − k1 x1
x2 ′ = k1 x1 − k2 x2 ,   (7.5)
Fig. 1. Mixing with Two Tanks
where k0 = c0 r0 , k1 (t) = r1 /V1 (t), and k2 (t) = r2 /V2 (t). If we are given the initial
amounts of salt in each tank, x1 (0) and x2 (0), then we expect that we can solve (7.5)
to find the amounts x1 (t) and x2 (t) at any time t.
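As a computational aside, a system like (7.5) can be integrated numerically. The sketch below assumes SciPy is available; for simplicity it also assumes constant tank volumes (so k1 and k2 are constants), and the rate values are made up purely for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical constant rates for the two-tank system (7.5).
k0, k1, k2 = 2.0, 0.5, 0.25

def rhs(t, x):
    x1, x2 = x
    return [k0 - k1 * x1,           # x1' = k0 - k1 x1
            k1 * x1 - k2 * x2]      # x2' = k1 x1 - k2 x2

# Start both tanks with no salt and integrate over 0 <= t <= 40.
sol = solve_ivp(rhs, (0.0, 40.0), [0.0, 0.0], t_eval=np.linspace(0.0, 40.0, 5))
print(sol.t)
print(sol.y)    # x1(t) approaches k0/k1 = 4 and x2(t) approaches k0/k2 = 8
```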
Existence and Uniqueness of Solutions
Notice that we can simplify (7.1) by using vector notation: if we let x = (x1 , . . . , xn )
and f = (f1 , . . . , fn ), then we can write (7.1) and its initial condition (7.2) as
dx/dt = f (x, t),   x(t0 ) = x0 ,   (7.6)
where we have let x0 = (b1 , . . . , bn ). If a solution x(t) exists, then it can be considered as
defining a curve in n-dimensional space starting at the point x0 . Let us now address the
question of whether a unique solution exists and for how long. In the following, we use
fx to denote ∂fi /∂xj for i, j = 1, . . . , n, i.e. all first-order derivatives of the components
fi with respect to the xj .
Theorem 1. Suppose t0 is a fixed point in an interval I = (a, b) and x0 is a fixed point
in Rn such that f (x, t) and fx (x, t) are continuous for a < t < b and |x − x0 | < R.
Then there is a unique solution x(t) of (7.6) defined for t in a neighborhood of t0 , i.e.
for |t − t0 | < ε, where ε is positive but may be small.
This theorem is quite analogous to Theorem 1 in Section 1.2, and is also proved using
successive approximations. In particular, since the dependence of f upon x may be
nonlinear, the solution x(t) need not be defined on all of the interval I.
Autonomous Systems and Geometric Analysis
Fig. 2. Vector Field f and a Solution Curve
Like first-order equations, first-order systems are called autonomous if the function f
in (7.6) does not depend on t :
dx
= f (x).
(7.7)
dt
Notice that f assigns an n-vector to each point x; in other words, f is a vector field
on Rn . Moreover, a solution x(t) of (7.7) is a curve in Rn whose tangent vector at each
point is given by the vector field at that point; this is illustrated in Figure 2 for n = 2.
By sketching the vector field and a few sample solution curves, we obtain geometric
7.1. INTRODUCTION TO FIRST-ORDER SYSTEMS
205
insight into the behavior of solutions of a first-order autonomous system. When n = 2,
such pictures are called phase plane portraits. Some examples are shown in Figures
3 and 4; we have added arrows to indicate the direction of increasing t.
Like first-order autonomous equations, first-order autonomous systems admit equilibrium solutions and stability analysis. Values x0 for which f (x0 ) = 0, i.e. for which the
vector field vanishes, are called critical points and they correspond to equilibrium
solutions of (7.7) if we let x(t) ≡ x0 . The notion of stability for an equilibrium solution of a first-order system is similar to the case of a single equation that we discussed
in Section 1.2, but is a little different since it involves solution curves in n-dimensional
space:
• An equilibrium solution x0 for (7.7) is stable if all nearby solutions remain nearby.
This is certainly true if all nearby solutions approach x0 ; as in Section 1.2 such an
equilibrium is called a sink. But for systems it is possible for solutions to remain
nearby by orbiting around the equilibrium; such an equilibrium is called a center.
These stable equilibria are illustrated below.
x2
x1
Fig.3. Phase Plane
x2
Fig.5. Two Stable Equilibria: a Sink (left) and a Center (right).
• An equilibrium solution x0 for (7.7) is unstable if at least some nearby solutions
move away from x0 . This is certainly true if all nearby solutions move away from
the equilibrium; as in Section 1.2 such an equilibrium is called a source. However,
for systems it could be that some solutions approach x0 while others move away;
such an equilibrium is called a saddle. These unstable equilibria are illustrated
below.
Fig.6. Two Unstable Equilibria: a Source (left) and a Saddle (right).
x1
Fig.4. Phase Plane
206
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
In applications where initial conditions can only be approximated, an unstable equilibrium is of limited significance since a very small perturbation could yield an extremely
different outcome. On the other hand, stable equilibria generally represent very important states of a physical system.
Example 1. Let us consider damped free vibrations of a mechanical system, i.e. (7.3)
with f ≡ 0:
mx00 + cx0 + kx = 0.
(7.8)
If we introduce x1 = x and x2 = x0 , then we obtain the first-order system
x01 = x2
x02 = −
x2
x1
We see that the only critical point of (7.9) is x1 = 0, x2 = 0, corresponding to the
trivial solution of (7.8) x(t) ≡ 0. We shall study (7.9) in detail in Section 7.5, but for
now let us use the solutions that we obtained in Chapter 2 for (7.8) to reach conclusions
about the stability of the critical point (0, 0) for (7.9). Let us consider the overdamped
(c2 > 4mk) and underdamped (c2 < 4mk) cases separately.
If (7.8) is overdamped, then we know from Section 2.6 that the general solution is
x(t) = c1 er1 t + c2 er2 t
Fig.7. Phase Plane for an
Overdamped Vibration
(7.9)
c
k
x1 −
x2 .
m
m
where r1 < r2 < 0.
Recalling that x1 = x and x2 = x0 , we see that the general solution of (7.9) can be
written
c1 er1 t + c2 er2 t
x(t) = 0 r1 t
where c0j = cj rj .
c1 e + c02 er2 t
We see that both components of x(t) tend to zero exponentially as t → ∞, so (0, 0) is
a sink; this is illustrated in Figure 7 (with m = k = 1, c = 3).
If (7.8) is underdamped, then we know from Section 2.6 that the general solution is
ct
x(t) = e− 2m (c1 cos µt + c2 sin µt)
x2
where µ > 0.
In terms of our vector formulation, we have
x1
Fig.8. Phase Plane for an
Underdamped Vibration
− ct
e 2m (c1 cos µt + c2 sin µt)
x(t) = − ct 0
e 2m (c1 cos µt + c02 sin µt)
for certain constants c0j .
We again have both components decaying exponentially to zero as t → ∞, but both
also oscillate about 0. If we plot the solution curves in the phase plane, we see that they
spiral about (0, 0) as they approach it; such an equilibrium is called a stable spiral.
Thus for both overdamped and underdamped mechanical vibrations (as well as critically damped vibrations whose phase plane diagram looks much like that in Figure 7),
the origin is a stable equilibrium. Clearly this equilibrium is important as the “end
state” of the damped system.
2
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
207
Exercises
1. For the following second-order linear equations: i) use x1 = x and x2 = x0 to
replace the equation with an equivalent first-order system; ii) use the general
solution for the second-order equation to determine the stability of the critical
point (0, 0) for the first-order system.
(a) x00 − 4x0 + 3x = 0,
(b) x00 + 9x = 0,
(c) x00 − 2x0 − 3x = 0,
(d) x00 + 5x0 + 6x = 0.
2. For the following second-order nonlinear equations: i) use x1 = x and x2 = x0
to replace the equation with an equivalent first-order system; ii) find all critical
points for the system.
(a) x00 + x(x − 1) = 0,
(b) x00 + x0 + ex = 1,
(c) x00 = sin x,
(d) x00 = (x0 )2 + x(x − 1).
3. For each of the first-order systems obtained in Exercise 2, plot the vector field
near each critical point to determine its stability.
7.2
Theory of First-Order Linear Systems
Now let us bring our knowledge of linear algebra into the analysis of a system of
first-order linear equations or first-order linear system of the form
x01 = a11 (t)x1 + a12 (t)x2 + · · · + a1n (t)xn + f1 (t)
x02 = a21 (t)x1 + a22 (t)x2 + · · · + a2n (t)xn + f2 (t)
..
.
(7.10)
x0n = an1 (t)x1 + an2 (t)x2 + · · · + ann (t)xn + fn (t),
where aij (t) and fi (t) are known functions of t. If f1 = · · · fn = 0, then (7.10) is
homogeneous; otherwise it is nonhomogeneous. An initial-value problem for (7.10)
specifies the values of x1 , . . . xn at some value t0 of t:
x1 (t0 ) = b1 ,
x2 (t0 ) = b2 ,
...
xn (t0 ) = bn ,
(7.11)
where b1 , . . . , bn are given numbers.
As we saw in Section 2.1, higher-order equations can be reduced to first-order systems
by introducing additional unknown functions. Similarly, given a first-order system of
differential equations, we may be able to eliminate variables and obtain a single higherorder equation involving just one of the unknown functions, say x1 ; if we can solve that
equation for x1 , then we can use it to obtain the other unknown functions. This method
of elimination is particularly useful for a first-order linear system with two equations
and two unknowns since it reduces to a second-order linear differential equation.
Example 1. (a) Find the general solution of
x01 = x2
x02 = 2x1 + x2 .
208
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
(b) Find the particular solution satisfying the initial conditions
x1 (0) = −1,
x2 (0) = 0.
Solution. (a) We differentiate the first equation, and then use both equations to
eliminate x2 :
x001 = x02 = 2x1 + x2 = 2x1 + x01 .
This is a second-order equation for x1 : x001 − x01 − 2x1 = 0. The characteristic equation
is r2 − r − 2 = (r + 1)(r − 2), so the general solution is
x1 (t) = c1 e−t + c2 e2t .
Now we use the first equation to obtain x2 from x1 simply by differentiation:
x2 (t) = −c1 e−t + 2c2 e2t .
(b) We use the initial conditions to find the constants c1 and c2 :
x1 (0) = c1 + c2 = −1
x2 (0) = −c1 + 2c2 = 0.
We easily solve these two equations to find c1 = −2/3 and c2 = −1/3. So the solution
satisfying the initial conditions is
x1 (t) = − 32 e−t − 13 e2t ,
x2 (t) = 23 e−t − 23 e2t .
2
If we use vector and matrix notation, we can greatly simplify the expression (7.10).
We define an (n × n)-matrix valued function A(t) and n-vector valued function f (t) by

 

a11 a12 · · · a1n
f1
 a21 a22 · · · a2n 
 f2 

 

A= .
f =  . .
,
 ..
 .. 

an1
an2
· · · ann
fn
These are functions of t, which we generally assume are continuous on an interval I,
continuity being defined in terms of their element functions. Now we can write (7.10)
as
x0 = A(t) x + f (t),
(7.12)
where x is the n-vector valued function treated as a column vector


x1 (t)
 x2 (t) 


x(t) =  .  .
 .. 
xn (t)
Similarly, the initial condition is simplified using vector notation:
x(t0 ) = b.
(7.13)
The existence and uniqueness theorem discussed in the previous section assures us that
solutions of of the initial-value problem (7.12)-(7.13) exist, but the linearity of (7.12)
implies a stronger result:
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
209
Theorem 1. If A(t) and f (t) are continuous on an interval I containing t0 , then
(7.12)-(7.13) has a unique solution x defined on I.
Note that linearity implies that existence and uniqueness hold on all of the interval I,
and not just near t0 .
As was the case for second-order equations, the solution of the nonhomogeneous
system (7.12) is closely connected with the associated homogeneous system
x0 = A(t) x.
(7.14)
If c1 , c2 are constants and x1 , x2 are both solutions of (7.14), then by linearity, c1 x1 +
c2 x2 is also a solution, so the solutions of (7.14) form a vector space. In fact, if the
solutions x1 , . . . , xn are linearly independent, then we want to show that every solution
of (7.14) is of the form
x(t) = c1 x1 (t) + · · · cn xn (t)
(7.15)
for some choice of the constants c1 , . . . , cn . As in previous sections, the linear independence of functions is determined by the “Wronskian.”
Definition 1. Let x1 , . . . , xn be n-vector functions defined on an interval I. The Wronskian of x1 , . . . , xn is the determinant of the (n × n)-matrix obtained by using the xj
as column vectors:
W [x1 , . . . , xn ](t) = det x1 (t) x2 (t) · · ·
xn (t) .
Theorem 2. Suppose x1 , . . . , xn are n-vector-valued functions on an interval I.
(a) If W [x1 , . . . , xn ](t0 ) 6= 0 for some t0 in I, then x1 , . . . , xn are linearly independent
functions.
(b) If x1 , . . . , xn are all solutions of (7.14) and W [x1 , . . . , xn ](t0 ) = 0 for some t0 in
I, then x1 , . . . , xn are linearly dependent on I.
Proof. (a) Let c1 x1 (t) + · · · cn xn (t) ≡ 0 for t in I; we must show c1 = · · · = cn = 0.
But if we let X(t) denote the (n×n)-matrix with the xj as column vectors, and c denote
the vector (c1 , . . . , cn ), then we can write our assumption as X(t) c = 0; in particular,
we have X(t0 ) c = 0. But W [x1 , . . . , xn ](t0 ) = det X(t0 ) 6= 0, so X(t0 ) is invertible and
we must have c = 0.
(b) If W [x1 , . . . , xn ](t0 ) = 0 then the n vectors x1 (t0 ), . . . , xn (t0 ) are linearly dependent,
so there exists a nonzero vector c = (c1 , . . . , cn ) such that c1 x1 (t0 ) + · · · cn xn (t0 ) = 0.
Using these constants, define x(t) = c1 x1 (t) + · · · cn xn (t), which satisfies (7.14) and
x(t0 ) = 0. But the uniqueness statement in Theorem 1 implies that x(t) ≡ 0, i.e.
c1 x1 (t) + · · · cn xn (t) ≡ 0 for all t in I, so x1 , . . . , xn are linearly dependent on I.
2
Example 2. Show that the following three 3-vector functions are linearly independent
on I = (−∞, ∞):
x1 (t) = (1, 0, et ),
x2 (t) = (0, 1, et ),
x3 (t) = (sin t, cos t, 0).
210
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Solution. We compute the Wronskian
W [x1 , x2 , x3 ](t) =
1
0
et
0
1
et
sin t
1
cos t = t
e
0
0
cos t
+ sin t t
e
0
1
= −et (cos t + sin t).
et
Now the Wronskian does vanish at some points, but all we need is one point were
it is nonzero. Since W [x1 , x2 , x3 ](0) = −1, we conclude that x1 , x2 , x3 are linearly
independent.
2
Now we are ready to find all solutions for the homogeneous system (7.14).
Theorem 3. Suppose A(t) is continuous on an interval I and x1 , . . . , xn are linearly
independent solutions of (7.14). Then every solution of (7.14) is of the form (7.15).
Proof. Let S denote the vector space of all solutions of (7.14). It suffices to show
that S is n-dimensional, since then any linearly independent set of n solutions forms a
basis for S. Pick t0 in I and for each j = 1, . . . , n, let xj be the unique solution of the
initial-value problem
x0 = A x,
x(t0 ) = ej .
Now let x be any solution of (7.14) and consider its value at t0 . Since {e1 , . . . , en } is a
basis, we can find constants c1 , . . . , cn so that x(t0 ) = c1 e1 + · · · + cn en . Using these
same constants, let us define a solution of (7.14) by y = c1 x1 + · · · cn xn . Then x and y
are both solutions of (7.14) with the same value at t0 : x(t0 ) = y(t0 ). So the uniqueness
statement in Theorem 1 implies that x ≡ y, i.e. we have written x in the form (7.15).
2
As a consequence of this theorem, if x1 , . . . , xn are linearly independent solutions of
(7.14) then we define (7.15) to be the general solution of (7.14). As with secondorder equations, the general solution may be used to solve initial-value problems.
Example 3. Show that the following functions are linearly independent solutions of
the given homogeneous system, and use them to find the solution satisfying the given
initial conditions:
t 2t 4
2
1
2e
e
0
.
x1 (t) =
, x2 (t) =
; x =
x, x(0) =
−3et
−e2t
−3 −1
0
Solution. To show x1 and x2 are linearly independent, we compute the Wronskian:
W [x1 , x2 ](t) =
2et
−3et
e2t
= −2e3t + 3e3t = e3t 6= 0.
−e2t
Before proceeding, let us observe that we can rewrite x1 and x2 as
2
1
x 1 = et
and x2 = e2t
.
−3
−1
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
211
This slightly simplifies the calculation showing that x1 and x2 are solutions, e.g.
1
4
2
4
2
1
2
1
x02 = 2e2t
and
x2 = e2t
= e2t
= 2e2t
.
−1
−3 −1
−3 −1 −1
−2
−1
It also allows us to express the general solution in multiple ways:
t 2t 2
1
2e
e
2c1 et + c2 e2t
.
x(t) = c1 et
+ c2 e2t
= c1
+
c
=
2
−3
−1
−3et
−e2t
−3c1 et − c2 e2t
Finally, we can use the initial conditions to specify the constants:
2c1 + c2
1
x(0) =
=
⇒ c1 = −1, c2 = 3.
−3c1 − c2
0
We conclude that the solution is
−2et + 3e2t
x(t) =
.
3et − 3e2t
2
As with second-order equations, the general solution for the nonhomogeneous system
(7.12) is obtained from a particular solution and the general solution of (7.14).
Theorem 4. Suppose A(t) and f (t) are continuous on an interval I and xp is a
particular solution of (7.12). Then every solution of (7.12) is of the form
x(t) = xp (t) + xh (t),
where xh is the general solution of (7.14).
To find a particular solution xp for (7.12), the simplest case is when A and f are both
independent of t, since the system is then autonomous and we can use an equilibrium
solution as xp . This reduces to solving the algebraic equation Ax + f = 0.
Example 4. Find the general solution for
4
2
2
x0 =
x+
.
−3 −1
0
Solution. We observe that the associated homogeneous system is the same as in Example 3, so we know that
2
1
xh (t) = c1 et
+ c2 e2t
.
(7.16)
−3
−1
Since the system is autonomous, we find the equilibrium solution by solving the algebraic
equation:
4
2
2
0
xp +
=
.
−3 −1
0
0
212
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
We achieve this, as usual, by putting the augmented coefficient matrix into REF:
1 1/2 −1/2
1
4
2 −2
∼
⇒ xp =
.
−3
−3 −1 0
0 1
−3
We conclude that the general solution is
1
2
1
t
2t
x(t) =
+ c1 e
+ c2 e
.
−3
−3
−1
2
When A is constant and the components of f are functions of t, then we can try to
find xp using the method of undetermined coefficients, as we did for second-order
equations in Section 2.5. Since we are dealing with systems, the method can become a
little more complicated; but rather than trying to cover all cases, let us illustrate what
is involved with an example.
Example 5. Find the general solution for
4
2
2
0
x =
x+ t .
−3 −1
e
Solution. We only need xp since xh is given by (7.16). The functions 2 and et require
different forms in the method of undetermined coefficients, so let us write
2
2
0
f (t) = t =
+ t ,
e
0
e
and try to find separate particular solutions x1p and x2p for
2
0
f1 (t) =
and f2 (t) = t .
0
e
For f1 , we get the autonomous system as in Example 4, so we could find x1p as an
equilibrium as we did there. But let us take the perspective of undetermined coefficients
and assume x1p is a constant vector a = (A, B). We plug-in to evaluate A and B:
0
4
2
2
4A + 2B + 2
4A + 2B = −2
0
x1p =
and
x +
=
⇒
0
−3 −1 p
0
−3A − B
−3A − B = 0.
We can solve the coefficients to conclude A = 1 and B = −3 to obtain x1p = (1, −3).
For f2 , we might try x2p = et a where a is a constant vector, but this fails; the reason
is that vectors of the form et a may be a solution of the associated homogeneous system
given by (7.16). Instead, let us play it safe by taking:
x2p = tet a + et b,
where a and b are constant vectors.
If we plug into x0 = Ax + f2 , we find
−2
−2/3
−2/3
a=
and b =
+s
,
3
0
1
where s is a free variable.
We may choose s = −1 and conclude
x2p = tet
−2t
.
3t − 1
So the general solution is x = x1p + x2p + xh where xh is given by (7.16).
2
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
213
Remark 1. While Theorems 3 and 4 have important and useful conclusions, they do
not provide a method for actually finding n linearly independent solutions of (7.14).
This topic will be discussed in detail in the next section.
Exercises
1. Use the method of elimination to obtain the general solution for the following
first-order linear systems. If initial conditions are given, also find the particular
solution satisfying them.
(a) x01 = 2x2 , x02 = −2x1
Solution
(b) x01 = 3x1 + x2 , x02 = −2x1 , x1 (0) = 1, x2 (0) = 0
(c) x01 = 2x1 − 3x2 , x02 = x1 − 2x2
(d) x01 = x1 + 2x2 + 5e4t , x02 = 2x1 + x2 .
2. Express each of the systems in the previous problem in matrix form x0 = Ax + f .
3. (i) Verify that the given vector functions xj (t) are linearly independent, (ii) Verify
that they satisfy the given homogeneous first-order system, and (iii) Use them to
solve the initial-value problem.
−3 2
0
3t 1
−2t 2
0
(a) x1 = e
, x2 = e
; x =
x, x(0) =
.
3
1
−3 4
5
sin 2t
− cos 2t
0 2
1
(b) x1 =
, x2 =
; x0 =
x, x(0) =
;
cos 2t
sin 2t
−2 0
1
 2t 
 −t 
 −t 


 
e
−e
−e
0 1 1
0
(c) x1 = e2t , x2 =  e−t , x3 =  0 ; x0 = 1 0 1 x, x(0) = 0;
e2t
0
e−t
1 1 0
5
 t






3t
5t
3 −2 0
2e
−2e
2e
(d) x1 = 2et , x2 =  0 , x3 = −2e5t ; x0 = −1 3 −2 x,
et
e3t
e5t
0 −1 3
x(0) = (4, 0, 0).
4. Find a particular solution xp as an equilibrium for the following nonhomogeneous
(but autonomous) systems.
1 2
0
1 2
1
0
0
(b) x =
x+
(a) x =
x+
, Solution
3 4
1
2 1
−1


 


 
1 −1 −2
0
1 −1 −2
−1
(c) x0 = 2 −1 −3 x + −2
(d) x0 = 2 −1 −3 x +  1 .
3 −3 −5
1
3 −2 −5
0
5. Use undetermined coefficients to find a particular solution xp of the following
nonhomogeneous equations.
1 3
4 e2t
0 1
sin 2t
0
0
(a) x =
x+
;
(b) x =
x−
;
2 1
−3 e2t
2 0
4 cos 2t
t t
1 2
e
1 1
e
Atet
0
x+
;
(d)
x
=
x
+
,
try
x
=
.
(c) x0 =
p
2 1
3 e2t
0 2
3et
Bet
214
7.3
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Eigenvalue Method for Homogeneous Systems
In this section we shall investigate how to find the general solution for the homogeneous
system of first-order equations (7.14) when A is a constant matrix:
x0 = A x.
(7.17)
Notice that (7.17) is autonomous, and the trivial solution x(t) ≡ 0 is an equilibrium.
(Assuming that A is invertible, x = 0 is the only equilibrium solution of (7.17).) We
are anxious to construct nonconstant solutions. Suppose that λ is a real eigenvalue for
A with real eigenvector v, and consider the vector function
x(t) = eλt v.
(7.18)
Then x0 = λeλt v and Ax = A(eλt v) = eλt Av = eλt λv, so (7.18) is a solution of (7.17).
Note that, for all values of t, x(t) lies on the ray passing through the vector v, so a
solution of the form (7.18) is called a straight-line solution of (7.17). Moreover, if
λ < 0 then x(t) → 0 as t → ∞, while if λ > 0 then x(t) tends away from 0 as t → ∞. So
the signs of the eigenvalues determines the stability of the equilibrium solution x(t) ≡ 0.
Example 1. Find the general solution of
5
0
x =
8
−4
x,
−7
and determine the stability of the equilibrium solution x(t) ≡ 0.
Solution. We first find the eigenvalues and eigenvectors for A:
det(A − λI) =
5−λ
8
−4
= λ2 + 2λ − 3 = (λ + 3)(λ − 1).
−7 − λ
So A has eigenvalues λ1 = 1 and λ2 = −3. For λ1 = 1 we find an eigenvector
4 −4
1 −1
1
A − λ1 I =
∼
⇒ select v1 =
,
8 −8
0 0
1
Fig.1.
Straight-line solu-
tion x1 (t).
and then use (7.18) to define a straight-line solution of (7.17):
t 1
x1 (t) = e
.
1
Notice that x1 (t) moves away from 0 as t → ∞; see Figure 1. Similarly, for λ2 = −3
we find
8 −4
2 −1
1
A − λ2 I =
∼
⇒ select v2 =
,
8 −4
0 0
2
and obtain the straight-line solution
x2 (t) = e
−3t
1
.
2
Notice that x2 (t) → 0 as t → ∞; see Figure 2. Moreover, x1 and x2 are linearly
Fig.2.
Straight-line solu-
tion x2 (t).
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
215
independent since
W [x1 , x2 ](t) =
et
et
e−3t
= e−2t 6= 0,
2e−3t
x2
so the general solution is
x(t) = c1 et
1
1
+ c2 e−3t
.
1
2
x1
If we choose an initial condition for which c1 = 0, then x(t) → 0. However, if c1 6= 0,
then x(t) tends to infinity (in some direction) as t → ∞. Consequently, x = 0 is an
unstable equilibrium, in fact a saddle point; see Figure 3.
2
This example shows how we can proceed in general. Suppose A has n linearly independent eigenvectors v1 , . . . , vn with associated eigenvalues λ1 , . . . , λn (not necessarily
distinct). Then we have n solutions of (7.17) that are in the special form (7.18):
x1 (t) = eλ1 t v1 , . . . , xn (t) = eλn t vn .
Moreover, the solutions are linearly independent since
W [x1 , . . . , xn ](t) = det([ eλ1 t v1 · · · eλn t vn ]) = e(λ1 +···+λn )t det([ v1 · · · vn ]) 6= 0
for any t. Notice that the above remarks apply to complex eigenvalues and eigenvectors
as well as real ones. However, since we are interested in real-valued solutions of (7.17),
let us first restrict our attention to the case of real eigenvalues and eigenvectors. We
have shown the following:
Theorem 1. If A is an (n×n)-matrix of real constants that has an eigenbasis v1 , . . . , vn
for Rn , then the general solution of (7.17) is given by
x(t) = c1 eλ1 t v1 + · · · + cn eλn t vn ,
where λ1 , . . . , λn are the (not necessarily distinct) eigenvalues associated respectively
with the eigenvectors v1 , . . . , vn .
Example 2. Find the general solution of

4 −3
x0 =  2 −1
0 0

1
1  x,
2
and determine the stability of the equilibrium solution x(t) ≡ 0.
Solution. Notice that the coefficient matrix is the matrix A of Example 4 in Section
6.1. There we found that A has an eigenvalue λ1 = 1 with eigenvector v1 = (1, 1, 0);
this lead to the solution
 
1
x1 (t) = et 1 .
0
Fig.3. Full Phase Plane for
Example 1.
216
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
We also found that A has an eigenvalue λ2 = 2 with two linearly independent eigenvectors, v2 = (3, 2, 0) and v3 = (−1, 0, 2); these lead to the two solutions
 
 
3
−1
x2 (t) = e2t 2 and x3 (t) = e2t  0  .
0
2
We conclude that the general solution is
 
 
 
1
3
−1
x(t) = c1 et 1 + c2 e2t 2 + c3 e2t  0  .
0
0
2
Notice that all solutions tend away from (0, 0, 0) as t increases, so x = 0 is an unstable
equilibrium, in fact a source point.
2
Complex Eigenvalues
Suppose that the characteristic polynomial p(λ) for the constant matrix A has a complex
root λ = α + βi leading to a complex eigenvector v. For the reasons above, we see that
eλt v is a solution of (7.17), but it is complex-valued. However, if A is real-valued then
we would prefer to have real-valued solutions of (7.17). Is there anything to be done?
Recall that complex eigenvalues and eigenvectors for a real-valued matrix A come
in complex conjugate pairs. So w(t) = eλt v and w(t) = eλt v are linearly independent
complex-valued solutions of (7.17). If we decompose w(t) into its real and imaginary
parts, then we can write
w(t) = a(t) + ib(t)
and w(t) = a(t) − ib(t).
But a linear combination of two solutions of (7.17) is again a solution, so
a(t) =
1
(w(t) + w(t))
2
and b(t) =
1
(w(t) − w(t))
2i
are both real-valued solutions of (7.17). Moreover, a(t) and b(t) are linearly independent
since their linear span (over C) includes both w(t) and w(t). We have shown the
following.
Theorem 2. If A is an (n × n)-matrix of real constants that has a complex eigenvalue
λ and eigenvector v, then the real and imaginary parts of w(t) = eλt v are linearly
independent real-valued solutions of (7.17): x1 (t) = Re(w(t)) and x2 (t) = Im(w(t)).
Example 3. Find the general solution of
−1
x0 =
−2
2
x.
−1
Solution. We find the eigenvalues of A:
det(A − λI) =
−1 − λ
−2
2
= λ2 + 2λ + 5 = 0.
−1 − λ
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
217
Using the quadratic formula, we find λ = −1 ± 2i. Let us take λ1 = −1 + 2i and find
an associated eigenvector:
−2i
2
2
2i
1 i
1
A − λ1 I =
∼
∼
⇒ select v1 =
.
−2 −2i
−2 −2i
0 0
i
We now have a complex-valued solution
(−1+2i)t
w(t) = e
1
−t 2it 1
.
=e e
i
i
In order to find the real and imaginary parts of this solution, we need to use Euler’s
formula (see Appendix A):
e2it = cos 2t + i sin 2t.
Now we compute
x2
w(t) = e−t
cos 2t + i sin 2t
cos 2t
sin 2t
= e−t
+ i e−t
,
− sin 2t + i cos 2t
− sin 2t
cos 2t
x1
and take the real and imaginary parts to obtain two real-valued solutions:
cos 2t
−t
−t sin 2t
x1 (t) = e
and x2 (t) = e
.
− sin 2t
cos 2t
We conclude that the general solution is
cos 2t
sin 2t
x(t) = c1 e−t
+ c2 e−t
.
− sin 2t
cos 2t
We see that for any choice of c1 and c2 , x(t) → 0 as t → ∞, so the equilibrium at x = 0
is stable; in fact is a stable spiral point (see Figure 2).
2
Remark 1. When (λ, v) is a complex eigenpair, we should not call w(t) = eλt v a
“straight-line solution” since we have seen that the real-valued solutions do not move in
straight lines, but rather circle the equilibrium 0 in spirals or closed curves.
Defective Eigenvalues and Generalized Eigenvectors
Recall from Section 6.1, that a real eigenvalue may have algebraic multiplicity ma greater
than its geometric multiplicity mg ; in this case, the eigenvalue is called defective and
d = ma − mg is called its defect. When A has a defective eigenvalue, it fails to have
an eigenbasis, so we cannot use Theorem 1. However, we would still like to find the
general solution for (7.17). What can be done?
In answering this question, we shall encounter the following concept.
Definition 1. If A is a square matrix and p is a positive integer, then a nonzero
solution v of
(A − λ)p v = 0
(7.19)
is called a generalized eigenvector for the eigenvalue λ.
Fig.2. Phase Plane for Example 3.
218
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Of course, eigenvectors correspond to the special case p = 1 in (7.19). Moreover, if
(7.19) holds for p > 1, then it could hold for a smaller value of p; but if we let p0 denote
the smallest positive integer for which (7.19) holds, then v1 = (A − λ)p0 −1 v 6= 0 and
satisfies (A−λI)v1 = (A−λ)p0 v = 0, so v1 is an eigenvector for λ. In particular, we see
that generalized eigenvectors for an eigenvalue λ only exist when an actual eigenvector
exists. But when the eigenvalue is defective, generalized eigenvectors can be used to
find solutions of (7.17) by generalizing the definition (7.18).
Let us first discuss the case of an eigenvalue λ with ma = 2 and mg = 1. We have
one linearly independent eigenvector v which can be used to create a solution of (7.17):
⇒
Av = λv
x1 (t) = eλt v.
Recalling the case of multiple roots of the characteristic polynomial for a higher-order
differential equation, we might try multiplying this by t to create another solution; but
defining x2 (t) = t eλt v does not work. Before we give up on this idea, let us generalize
it and try
x2 (t) = eλt (tu + w),
where the vectors u and w are to be determined. Compute and compare x02 and Ax2 :
x02 = eλt (tλu + λw + u)
and Ax2 = eλt (tAu + Aw).
For these to be equal for all t, we require λu = Au and λw + u = Aw. In other words,
if we take u = v and w satisfies (A − λ)w = v, then x2 (t) is indeed a solution of (7.17)!
Moreover, we claim that x1 (t) and x2 (t) are linearly independent. To verify this, we
compute the Wronskian at t = 0 to find W [x1 , x2 ](0) = det( v w ), so we only need
show that v and w are linearly independent vectors. But
c1 v + c2 w = 0 ⇒ (A − λ)(c1 v + c2 w) = c2 v = 0
shows c2 = 0 and hence c1 = 0, confirming that v and w are linearly independent. With
a change in notation, let us summarize this analysis as follows
If λ is an eigenvalue for A with ma = 2 and mg = 1, then two linearly independent
solutions of (7.17) are
• x1 (t) = eλt v1 where v1 is an eigenvector for λ
• x2 (t) = eλt (tv1 + v2 ) where (A − λI)v2 = v1 .
Note that v2 is a generalized eigenvector for λ since (A − λI)2 v2 = 0 although
(A − λI)v2 6= 0. Below, we shall discuss why we can always solve (A − λI)v2 = v1 .
Example 4. Find two linearly independent solutions of
1 −2
0
x =
x.
2 5
Solution. We find the eigenvalues of the matrix A:
det(A − λI) =
1−λ
2
−2
= λ2 − 6λ + 9 = (λ − 3)2 ,
5−λ
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
219
so we have only one eigenvalue, namely λ = 3. Now we find its eigenvectors:
−2 −2
1 1
1
A − λI =
∼
⇒ select v1 =
.
2
2
0 0
−1
x2
We see that the eigenspace is one-dimensional so mg = 1, whereas we had ma = 2; so
λ = 3 is defective. Now we want to solve (A − λ)v2 = v1 :
−2
2
−2
2
1
−1
∼
1
0
1
0
−1/2
0
x1
implies that x2 = s is free and x1 = −1/2 − s, so we can choose s = 0 to obtain
Fig.3. Phase Plane for Ex-
−1/2
v2 =
.
0
ample 4.
We conclude that we have the two linearly independent solutions
1
1
−1/2
x1 (t) = e3t
and x2 (t) = e3t t
+
.
−1
−1
0
Both of these solutions grow exponentially as t → ∞, so the equilibrium at x = 0 is
unstable; in fact, it is a source (see Figure 3).
2
The above analysis naturally raise two questions:
• How did we know that we could solve (A − λI)v2 = v1 ?
• What happens if the defect d is greater than 1?
To address the first question, we change the logic. We first pick v2 to be a generalized
eigenvector, so that (A − λ)2 v2 = 0 but (A − λ)v2 6= 0. Then we define v1 = (A − λ)v2 .
We know v1 is nonzero and (A − λ)v1 = 0, so v1 is an eigenvector; we are done.
Let us use this logic to address the second question. We suppose v is a generalized
eigenvector for the eigenvalue λ and let p > 1 be the smallest integer such that (7.19)
holds, i.e.
(A − λI)p v = 0 and (A − λI)p−1 v 6= 0.
If we let
v1 := (A − λI)p−1 v,
(7.20)
then we see that (A − λI) annihilates v1 , i.e. (A − λI)v1 = 0, so v1 is a true eigenvector
for the eigenvalue λ. On the other hand,
v2 := (A − λI)p−2 v, . . . , vp := (A − λI)0 v = v,
are generalized eigenvectors because they are annihilated by powers of (A − λI):
(A − λI)2 v2 = 0, . . . , (A − λI)p vp = 0.
220
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
This sequence is called a chain of generalized eigenvectors of length p associated
with λ. Notice that the chain is based upon a true eigenvector v1 . Moreover, the chain
leads to the construction of p distinct solutions of (7.17) in the following way:
x1 (t) = eλt v1 ,
x2 (t) = eλt (tv1 + v2 ) ,
2
t
λt
v1 + tv2 + v3 ,
x3 (t) = e
2
..
.
p−1
tp−2
t
λt
v1 +
v2 + · · · + vp .
xp (t) = e
(p − 1)!
(p − 2)!
(7.21)
We can easily show x1 (t),. . . ,xp (t) are linearly independent, as we did for p = 2.
If mg = 1, then there is only one linearly independent eigenvector on which to
base a chain of generalized eigenvectors. But if mg > 1, then we may have multiple
chains associated with the same eigenvalue. However, it turns out that the sum of the
lengths of all chains of generalized eigenvectors associated with an eigenvalue λ is always
equal to its algebraic multiplicity ma . Moreover, if we have two chains of generalized
eigenvectors based upon linearly independent eigenvectors, then the union of all these
vectors is linearly independent. (These are both consequences of the “Jordan normal
form” for A; for more details, see [4].) If we follow the above prescription for the
construction of solutions of (7.17) from each chain of generalized eigenvectors, then we
obtain a collection of ma linearly independent solutions, as desired.
Let us illustrate this procedure with one more example.
Example 5. Find the general solution for

−3 0
x0 =  −1 −1
1
0

−4
−1  x.
1
Solution. Let us find the eigenvalues of the coefficient matrix A:
det(A − λI) =
−3 − λ
−1
1
0
−1 − λ
0
so λ = −1 is the only eigenvalue and ma = 3. Let

 
−2 0 −4
1 0
A − λI = A + I = −1 0 −1 ∼ 0 0
1 0 2
0 0
−4
−1 = −(1 + λ)3 ,
1−λ
us find its eigenvectors:
 

0
0
1 ⇒ select v1 = 1 .
0
0
We see that mg = 1 and so the defect is d = 2. We want to construct a chain of
generalized eigenvectors of length p = 3 for the eigenvalue λ = −1. Let us find the
vector v by solving (7.20), i.e. (A + I)2 v = v1 . But we compute

2 

0 0 0
−2 0 −4
(A + I)2 = −1 0 −1 = 1 0 2 ,
1 0 2
0 0 0
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
221
so we can solve for v:

0
1
0
0
0
0

 
0
0
2 v = 1
0
0
 
1
⇒ select v = 0 .
0
We then take v2 = (A + I)v and v3 = v:

   
−2 0 −4
1
−2
v2 = −1 0 −1 0 = −1
1 0 2
0
1
 
1
and v3 = 0 .
0
Our three linearly independent solutions are
   
 
    
  
−2
1
0
0
−2
2 0
t
x1 (t) = e−t 1, x2 (t) = e−t t 1 + −1, x3 (t) = e−t  1 + t −1 + 0.
2
0
1
0
0
0
1
Exercises
1. Use Theorem 1 to obtain the general solution of the given first-order systems; if
initial conditions are given, find the particular solution satisfying them.
2 3
4 1
7
0
0
(a) x =
x;
(b) x =
x, x(0) =
; Sol’n
2 1
6 −1
0

 



1
9
4 0
5 0 −6
(d) x0 = −6 −1 0 x, x(0) =  0 .
(c) x0 = 2 −1 −2 x;
−1
6
4 3
4 −2 −4
2. The matrices in the following systems have complex eigenvalues; use Theorem 2
to find the general (real-valued) solution; if initial conditions are given, find the
particular solution satisfying them.
4 −3
1 −5
3
0
0
(a) x =
x;
(b) x =
x, x(0) =
; Sol’n
3 4
1 −1
−1




 
1 0
0
0 2 0
1
(c) x0 = 0 −1 −6 x;
(d) x0 = −2 0 0 x, x(0) = 2.
0 3
5
0 0 3
3
3. The matrices in the following systems have (some) defective eigenvalues; use generalized eigenvectors to find the general solution.
−2 1
3 −1
(a) x0 =
x;
(b) x0 =
x.
−1 −4
1 5
4. Find the general solution for the nonhomogeneous first-order system. If initial
conditions are given, also find the solution satisfying them.
t
3 4
1
6 −7
8e
1
0
0
(a) x =
x+
; (b) x =
x+
, x(0) =
; Sol’n
3 2
2
1 −2
0
0


 


 2t 
5 0 −6
3e
9
4 0
−1
(c) x0 = 2 −1 −2 x +  0 ;
(d) x0 = −6 −1 0 x +  4 .
4 −2 −4
0
6
4 3
−7
222
7.4
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Applications to Multiple Tank Mixing
Suppose that we have two tanks that contain solutions involving the same solute and
solvent (such as salt dissolved in water) but of different concentrations; the two solutions
are allowed to flow between the two tanks, changing the concentrations in each tank.
This will lead to a first-order linear system involving the rates of change of x1 (t), the
amount of solute in Tank 1 at time t, and x2 (t), the amount of solute in Tank 2. We
must solve this first-order linear system to find x1 and x1 .
Fig.1. An Open System involving Two Tanks
Let us consider one arrangement, illustrated in Figure 1, in which the solutions not
only flow between the two tanks, but there is inflow to Tank 1 and outflow from Tank
2; such a system that is open to the outside is called an open system. Suppose the
inflow has rate ri and concentration ci while the outflow has rate ro . In addition, the
solution in Tank 1 is pumped into Tank 2 at the rate r12 and the solution in Tank 2 is
pumped into Tank 1 at the rate r21 . The concentration of the solution in Tank 1, c1 (t),
is determined by x1 (t) and the volume of solution in Tank 1 at time t, which we denote
by V1 (t); similarly, the concentration of the solution in Tank 2, c2 (t), is determined by
x2 (t) and V2 (t):
x1 (t)
x2 (t)
c1 (t) =
and
c2 (t) =
.
V1 (t)
V2 (t)
Now let us find the equations for the rate of change of x1 (t) and x2 (t). After a moment’s
thought, it is easy to see that the following equations hold:
dx1
x1
x2
= ci ri −
r12 +
r21
dt
V1
V2
dx2
x1
x2
x2
=
r12 −
r21 −
ro .
dt
V1
V2
V2
If we introduce the vector function x(t) = (x1 (t), x2 (t)), then we recognize this as a
nonhomogeneous first-order system as in (7.12):
0 r12
− V1
x1
= r12
x2
V1
r21
x1
V2
−r21 −ro
x2
V2
+
ri ci
.
0
(7.22)
If we are also given initial conditions on x, then we can solve this system using the
techniques discussed in this section.
7.4. APPLICATIONS TO MULTIPLE TANK MIXING
223
Example 1. Suppose that each tank initially contains 2 grams of salt dissolved in 10
Liters of water, so
x1 (0) = 2 g = x2 (0)
and V1 (0) = 10 L = V2 (0).
Moreover, suppose the concentration and flow-rate parameters in (7.22) are
ci = 1 g/L, ri = 6 L/min, r12 = 8 L/min, r21 = 2 L/min, ro = 6 L/min.
Notice that both tanks have the same net in-flow and out-flow, so V1 (t) = 10 = V2 (t)
for all t. Consequently, the system in (7.22) with initial-conditions becomes
0 x1
−0.8
=
x2
0.8
0.2
x1
6
+
,
−0.8 x2
0
x1 (0)
2
=
.
x2 (0)
2
Since the matrix A is constant, we can apply the eigenvalue method to solve this
problem. We compute the eigenvalues of A:
−0.8 − λ
0.2
det
= (0.8 + λ)2 − 0.16 = 0 ⇒ λ = −0.8 ± 0.4.
0.8
−0.8 − λ
Let us compute an eigenvector for λ1 = −1.2:
0.4 0.2
2 1
∼
⇒
0.8 0.4
0 0
v1 =
Similarly, we compute an eigenvector for λ2 = −0.4:
−0.4 0.2
−2 1
∼
⇒
0.8 −0.4
0 0
1
.
−2
1
v2 =
.
2
The general solution of the homogeneous equation is
1
−1.2t
−0.4t 1
xh (t) = c1 e
+ c2 e
.
−2
2
Now we need a particular solution of the nonhomogeneous equation, so let us use undetermined coefficients:
A
−0.8A + 0.2B + 6 = 0
xp =
⇒
⇒ A = 10 = B.
B
0.8A − 0.8B = 0
We conclude that our general solution is
1
10
−1.2t
−0.4t 1
x(t) = c1 e
+ c2 e
+
.
−2
2
10
Now let us use the initial conditions to evaluate c1 and c2 :
c1 + c2 + 10 = 2
−2c1 + 2c2 + 10 = 2
⇒
c1 = −2
c2 = −6.
224
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Finally, we are able to write our solution as
x1 (t) = 10 − 2 e−1.2t − 6 e−0.4t
x2 (t) = 10 + 4 e−1.2t − 12 e−0.4t .
Recalling that the volume in each tank remains 10 L, we see that the concentration in
both tanks tends to 1 g/L as t → ∞, which matches the concentration of the in-flow. 2
Of course, the same ideas apply to mixing between more than two tanks, and will
lead to a larger system of first-order linear equations. For example, consider the system
of three tanks in Figure 2 in which the solution in Tank 1 is pumped into Tank 2, the
solution in Tank 2 is pumped into Tank 3, and the solution in Tank 3 is pumped into
Tank 1, all at the same rate r. In this system notice that no solution is added from
outside the system or removed from the system: for this reason, the system in Figure 2
is called closed.
Fig.2. A Closed System involving Three Tanks
For j = 1, 2, 3, we let xj denote the amount of solute and Vj denote the volume of solution in Tank j; notice that Vj is independent of t. We then find that the concentration
in Tank j is given by cj = xj /Vj , so the rates of change are given by
dx1
x1
x3
= −r
+r
= −k1 x1 + k3 x3
dt
V1
V3
dx2
x2
x1
= −r
+r
= k1 x1 − k2 x2
dt
V2
V1
dx3
x3
x2
= −r
+r
= k2 x2 − k3 x3
dt
V3
V2
where kj = r/Vj are constants. If we introduce the vector function x(t) = (x1 (t), x2 (t), x3 (t)),
then we can write this homogeneous system as


−k1
0
k3
0 x
x0 =  k1 −k2
(7.23)
0
k2 −k3
7.4. APPLICATIONS TO MULTIPLE TANK MIXING
225
Since the coefficient matrix is constant, we can use the methods of this chapter to find
the general solution of (7.23), and if we are given initial conditions then we can find the
particular solution.
Example 2. Suppose that r = 20 L/min, V1 = V2 = 40 L, and V3 = 100 L. Moreover,
suppose that initially, there is 18 grams of salt in Tank 1 and no salt in Tanks 2 and 3.
With these values, we find that the initial-value problem for (7.23) becomes


 
−0.5
0
0.2
18
0  x,
x0 =  0.5 −0.5
x(0) =  0  .
0
0.5 −0.2
0
We find the eigenvalues by solving

−0.5 − λ
0
−0.5 − λ
det  0.5
0
0.5

0.2
 = −λ(λ2 + 1.2 λ + 0.45) = 0,
0
−0.2 − λ
which yields one real and two complex eigenvalues
λ1 = 0,
λ2 = −0.6 + 0.3 i,
Now we compute an eigenvector for λ1
 

1
−0.5
0
0.2
 0.5 −0.5
0  ∼ 0
0
0
0.5 −0.2
λ3 = −0.6 − 0.3 i.
= 0 by
0
1
0

−0.4
−0.4
0
⇒
select
 
2
v1 = 2 .
5
Using (7.18), we obtain one solution that is a constant:
 
2
x1 (t) ≡ 2 .
5
For λ2 = −0.6 + 0.3 i, we compute a complex eigenvector by
 

1 0.2 − 0.6 i
0.1 − 0.3 i
0
0.2
 ∼ 0

1
0.5
0.1 − 0.3 i
0
0
0
0
0.5
0.4 − 0.3 i


−1 − 3 i
⇒ select v2 = −4 + 3 i .
5

0
0.8 − 0.6 i
0
This provides a complex-valued solution




−1 − 3 i
−1 − 3 i
w(t) = e(−0.6+0.3 i)t −4 + 3 i = e−0.6 t (cos .3 t + i sin .3 t) −4 + 3 i
5
5


− cos .3 t + 3 sin .3 t − i(3 cos .3 t + sin .3 t)
= e−0.6 t −4 cos .3 t − 3 sin .3 t + i(3 cos .3 t − 4 sin .3 t) ,
5 cos .3 t + i 5 sin .3 t
226
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
so we take real and imaginary parts to get two linearly independent real-valued solutions




− cos .3 t + 3 sin .3 t
−3 cos .3 t − sin .3 t
x2 (t) = e−0.6 t −4 cos .3 t − 3 sin .3 t and x3 (t) = e−0.6 t  3 cos .3 t − 4 sin .3 t  .
5 cos .3 t
5 sin .3 t
The general solution is
x(t) = c1 x1 (t) + c2 x2 (t) + c3 x3 (t).
If we evaluate this at t = 0 and use the initial condition, we obtain
2c1 − c2 − 3c3 = 18
2c1 − 4c2 + 3c3 = 0
⇒
c1 = 2, c2 = −2, c3 = −4,
5c1 + 5c2 = 0
and our final solution is


 
14 cos .3t − 2 sin .3t
4
x(t) =  4  + e−0.6 t  −4 cos .3t + 22 sin .3t  .
−10 cos .3t − 20 sin .3t
10
Notice that, as t → ∞, we have xj /Vj → 0.1 g/L for each j = 1, 2, 3, so the three tanks
will eventually have a uniform concentration of salt, as expected.
2
Exercises
1. As in Figure 1, suppose that a salt solution of 4 g/L is pumped into Tank 1 at
the rate of 3 L/min, the solution in Tank 2 is pumped out at the same rate, but
the solutions in the two tanks are mixed by pumping from Tank 1 to 2 at r12 = 4
L/min and from Tank 2 to 1 at r21 = 1 L/min. Suppose both tanks initially
contain 20 L of salt solution, but the solution in Tank 1 has 30 g of salt and the
solution in Tank 2 has 60 g of salt. Find the amount of salt in each tank at time
t. What happens to the concentration in both tanks as t → ∞?
2. Suppose we have a closed system of two tanks with flow between them of 3 L/min
in each direction. Initially both tanks contain 15 grams of a certain chemical, but
in Tank 1 it is dissolved in 6 Liters of water and in Tank 2 it is dissolved in 12
Liters of water. Find the amounts of the chemical in each tank at time t.
3. For the closed system with three tanks as in Figure 2, suppose that the rate of flow
is r = 120 L/min, and the volumes of the three tanks are V1 = 20, V2 = 6, and
V3 = 40 Liters. Assume initially that Tank 1 contains 37 g salt, Tank 2 contains
29 g salt, but there is no salt in Tank 3. Find the amounts of salt in each tank,
x1 , x2 , and x3 , and time t.
4. Three 10 L tanks are arranged as in Figure 3. At t = 0 all tanks are full of water
in which the following amounts of salt are dissolved: Tank 1 has 10 g, Tank 2 has
5 g, and Tank 3 has no salt. A salt solution of 1 g/L begins to flow into Tank 1
at 2 L/min and the mixed solution flows out of Tank 2 at the same rate; the flow
between the tanks is 1 L/min. Find the amount of salt in each tank at time t.
Fig.3. Exercise 4.
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
7.5
227
Applications to Mechanical Vibrations
In this section we use first-order systems to study mechanical vibrations. Let us begin with the simple spring-mass-dashpot vibration that we studied in Chapter 2 using
second-order equations. Recall (Example 1 in Section 7.1) that introducing x = (x, x0 )
enables us to replace the second-order equation with the first-order system
0
1
x0 = Ax, where A =
.
(7.24)
−k/m −c/m
Let us compute the eigenvalues for A:
det(A − λI) =
−λ
−k/m
c
k
1
= λ2 + λ + .
−c/m − λ
m
m
So the eigenvalues of A are given by
λ=
−c ±
√
c2 − 4mk
.
2m
As we found in Section 2.4, the behavior of the solutions is heavily dependent on the
sign of c2 − 4mk.
In the overdamped case c2 − 4mk > 0, A has two distinct real eigenvalues, both
negative:
√
√
−c + c2 − 4mk
−c − c2 − 4mk
< λ2 =
< 0.
λ1 =
2m
2m
We know that each eigenvalue has a one-dimensional eigenspace, so we can find an
eigenbasis {v1 , v2 } for R2 . The general solution in this case is
x(t) = c1 eλ1 t v1 + c2 eλ2 t v2 .
We see that all solutions decay without oscillation to zero as t → ∞, just as we found
in Section 2.4. (Recall that x(t) is simply the first component of x.)
In the critically damped case c2 − 4mk = 0, we have one (double) eigenvalue λ =
−c/2m. We leave it as an exercise (Exercise 1) to show that this eigenvalue is defective
and to find the general solution.
In the underdamped case c2 − 4mk < 0, A has two complex conjugate eigenvalues:
√
−c
4mk − c2
λ± =
± iµ, where µ =
.
2m
2m
The associated eigenvectors v± also occur in complex conjugate pairs, yielding linearly
independent complex solutions w(t) = eλt v and w(t) = eλt v (where, say, λ = λ+ ).
However, as we saw in Section 7.3, we can obtain real-valued solutions by taking the
real and imaginary parts of one of these:
c
x1 (t) = Re(eλt v) = e− 2m t Re(eiµ v)
c
x2 (t) = Im(eλt v) = e− 2m t Im(eiµ v).
Fig.1.
Spring-mass-dashpot
system
228
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Recalling Euler’s formula eiµ = cos µ + i sin µ, we see that both x1 and x2 oscillate as
c
they decay to zero (due to the factor e− 2m t ), just as we found in Section 2.4.
Example 1. Let us consider m = 0.5 kg, c = 0.2 N/(m/sec), and k = 2 N/m, as in
Example 2 of Section 2.4. Then k/m = 4 and c/m = 0.4, so
√
0
1
A=
with eigenvalues λ± = −0.2 ± i 3.96.
−4 −0.4
√
Using µ = 3.96 and λ = −0.2 + iµ, we find
0.2 − iµ
1
1 .05 + i µ4
A − λI =
,
∼
0
0
−4
−0.2 − iµ
so the eigenspace is t(−.05 − i µ4 , 1) where t is a free complex parameter. Let us choose
t = −(.05 + i µ4 )−1 so that the resultant eigenvector v has first component v1 = 1. We
let
1
λt
−0.2t
w(t) = e v = e
(cos µt + i sin µt)
,
v2
and then take real and imaginary parts to obtain
−0.2t cos µt
−0.2t sin µt
x1 (t) = e
and x2 (t) = e
,
y1 (t)
y2 (t)
where y1 (t) and y2 (t) oscillate with frequency µ. The general solution of our first-order
system is x(t) = c1 x1 (t) + c2 x2 (t). But if we are only interested in the position function
x(t), we just consider the first component of x(t) to obtain
x(t) = c1 e−0.2t cos µt + c2 e−0.2t sin µt,
exactly as we found in Section 2.4.
2
Coupled Mechanical Vibrations
Now let us consider a more complicated system in which two masses m1 and m2 are
attached to three springs with spring constants k1 , k2 , and k12 as in Figure 2; we also
assume mi is subject to a linear damping force with coefficient ci . Let x denote the
displacement of m1 from its equilibrium position and y denote the displacement of m2
from its equilibrium (both measured from left to right). Then Newton’s law implies
that the following second-order equations hold (cf. Exercise 2):
Fig.2. Coupled
spring-mass system
m1 x00 = −k1 x + k12 (y − x) − c1 x0
m2 y 00 = −k2 y − k12 (y − x) − c2 y 0
(7.25)
We can express (7.25) as a first-order system by introducing x = (x1 , x2 , x3 , x4 ), where
x1 = x, x2 = x0 , x3 = y, and x4 = y 0 . Then the two second-order equations can be
replaced by a first-order system:
x01 = x2
k1 + k12
c1
k12
x1 −
x2 +
x3
x02 = −
m1
m1
m1
x03 = x4
k12
k12 + k2
c2
x04 =
x1 −
x3 −
x4 .
m2
m2
m2
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
To simplify our equations, let us now assume c1 =
system can be written in matrix form:


0
1
0
0
−q1 0 q2 0
 x, where
x0 = 
 0
0
0
1
q3 0 −q4 0
229
c2 = 0. In this case, our first-order

q1 = (k1 + k12 )/m1



q2 = k12 /m1
.
 q3 = k12 /m2


q4 = (k12 + k2 )/m2
(7.26)
If we calculate the characteristic polynomial of the (4×4)-matrix, we find its eigenvalues
satisfy
(λ2 + q1 )(λ2 + q4 ) − q2 q3 = 0
(7.27)
It turns out that there are no real eigenvalues: there will be four complex eigenvalues
that occur in two complex conjugate pairs ± i ω1 and ± i ω2 (see Exercise 3). The
quantities ω1 and ω2 appear as circular frequencies in the general solution, and hence
are called the natural frequencies of the coupled system. Let us consider an example.
Example 2. Suppose m1 = 2, m2 = 1, k1 = 2, k2 = 1, and k12 = 2 in (7.26). Let
us determine the natural frequencies of the system and find the general solution. We
calculate q1 = 2 = q3 , q2 = 1, and q4 = 3, and use (7.27) to find the eigenvalues:
(λ2 + 2)(λ2 + 3) − 2 = λ4 + 5λ2 + 4 = (λ2 + 1)(λ2 + 4) = 0
With λ1 = i we find

−i 1
−2 −i
A − iI = 
0
0
2
0
0
1
−i
−3
 
1 0 0
0
0 1 0
0
∼
1  0 0 1
0 0 0
−i

i
−1

i 
0
⇒
⇒
λ = ±i, ±2i.
v1 = (−i, 1, −i, 1).
(We can find v2 = v1 for λ2 = −i, but we won’t need it.) The eigenpair (λ1 , v1 )
produces the complex-valued solution


sin t − i cos t
cos t + i sin t

w1 (t) = eit v1 = (cos t + i sin t)v1 = 
sin t − i cos t ,
cos t + i sin t
and we take the real and imaginary parts to obtain two linearly independent solutions




sin t
− cos t
cos t
 sin t 



x1 (t) = 
 sin t  and x2 (t) = − cos t .
cos t
sin t
With λ = 2i we find

−2i
1
 −2 −2i
A − 2i I = 
 0
0
2
0
0
1
−2i
−3
 
0
1
0
0 
∼
1  0
−2i
0
i/2
1
0
0

0 0
i 0 

1 i/2
0 0
⇒
v3 = (i, −2, −2i, 4).
230
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
This produces the complex-valued solution


− sin 2t + i cos 2t
−2 cos 2t − 2i sin 2t

w3 (t) = e2it v3 = (cos 2t + i sin 2t) v3 = 
 2 sin 2t − 2i cos 2t 
4 cos 2t + 4i sin 2t
and we take the real and imaginary parts to obtain two linearly independent solutions




cos 2t
− sin 2t
 −2 sin 2t 
−2 cos 2t



x3 (t) = 
 2 sin 2t  and x4 (t) = −2 cos 2t .
4 sin 2t
4 cos 2t
We see that the natural frequencies of the system are ω1 = 1 and ω2 = 2, and the
general solution of the first-order system is x = c1 x1 + c2 x2 + c3 x3 + c4 x4 . However, if
we only want the position functions x(t) and y(t), then we pick out the first and third
components of x:
x(t) = c1 sin t − c2 cos t − c3 sin 2t + c4 cos 2t
y(t) = c1 sin t − c2 cos t + 2c3 sin 2t − 2c4 cos 2t.
Note that this is oscillatory about the equilibrium solution (0, 0, 0, 0), so the equilibrium
is a center.
2
We can also consider coupled mechanical vibrations with external forces, which leads
to a nonhomogeneous first-order system. When there is no damping and the external
force is periodic with forcing frequency equal to one of the natural frequencies, then we
encounter resonance. However, this analysis is simplified if we use second-order systems,
which we discuss next.
Undamped, Forced Vibrations as Second-Order Systems
When the coupled second-order equations (7.25) do not involve damping (i.e. when c1 =
c2 = 0), then it is actually simpler to treat the equations as a second-order system;
this is particularly advantageous if we also have an external force f (t) = (f1 (t), f2 (t)).
In fact, the two second-order equations with the forcing terms added
m1 x00 = −(k1 + k12 ) x + k12 y + f1 (t)
m2 x00 = k12 x − (k2 + k12 ) y + f2 (t)
(7.28)
can be represented in matrix form as
x
x=
,
y
M=
m1
0
M x00 = K x + f (t) where
0
−k1 − k12
k12
, K=
,
m2
k12
−k2 − k12
(7.29)
f1 (t)
f (t) =
.
f2 (t)
The diagonal matrix M is called the mass matrix and the symmetric matrix K is
called the stiffness matrix ; note that these are (2 × 2)-matrices and simpler than the
(4 × 4)-matrix in (7.26).
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
231
To analyze (7.29), we multiply on the left by
−1
m1
0
M−1 =
0
m−1
2
to obtain
x00 = Ax + g(t)
−1
A=M
K
where
−1
and g(t) = M
(7.30)
f (t).
Using our experience with second-order linear equations, it is not difficult to show that
the general solution of (7.30) is
x(t) = xp (t) + xh (t),
(7.31)
where xp (t) is a particular solution of (7.31) and xh (t) is the general solution of the
homogenous second-order system
x00 = A x.
(7.32)
Since (7.32) is a second-order (2 × 2)-system, we expect four linearly independent
solutions, but how do we find them? From our experience with first-order homogenous
systems, we try solutions in the form
x(t) = eµt v,
(7.33)
where µ is a (possibly complex) scalar and v is a fixed vector. If we plug (7.33) into
(7.32), we obtain
eµt µ2 v = eµt A v ⇒ µ2 v = A v,
in other words, (µ2 , v) is an eigenpair for A. If A has a negative eigenvalue −ω12 (where
ω1 > 0) with real eigenvector v1 , then we can take µ = iω1 and obtain a complex-valued
solution of (7.32):
x(t) = eiω1 t v1 = (cos ω1 t + i sin ω1 t) v1 .
Taking real and imaginary parts, we obtain two linearly independent solutions of (7.32):
x1 (t) = (cos ω1 t) v1
and x2 (t) = (sin ω1 t) v1 .
If A has another negative eigenvalue − ω22 with ω1 6= ω2 , then we can repeat this process
and obtain two more solutions that are linearly independent of each other and the two
solutions above; taking a linear combination of these gives us the general solution of
(7.32). This argument can be modified to cover λ = 0 (see Exercise 7). Generalizing
this process to (n × n)-matrices A, we have the following theorem:
Theorem 1. If A is a real (n × n)-matrix with negative eigenvalues λ1 = −ω12 > · · · >
λn = −ωn2 and associated eigenvectors v1 , . . . , vn , then the general solution of (7.32) is
given by
n
X
x(t) =
(ai cos ωi t + bi sin ωi t) vi ,
i=1
where ai and bi are arbitrary constants. If λ1 = −ω12 is replaced by λ1 = 0 then
(a1 cos ω1 t + b1 sin ωi t) v1 should be replaced by (a1 + b1 t)v1 .
232
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
How do we find a particular solution xp of (7.31)? Based upon our experience with
second-order equations and first-order systems, we shall use the method of undetermined
coefficients. As in Section 2.6, let us consider the case of a periodic forcing function
f (t), which in (7.30) means that we want
g(t) = (cos ωt) g0
(7.34)
where ω is the forcing frequency and g0 is a fixed vector. Let us try as our trial solution
xp (t) = (cos ωt) u,
where the constant vector u is to be determined. But
x00p (t) = (−ω 2 cos ωt) u
and Axp (t) = (cos ωt) Au,
so u should satisfy
(A + ω 2 I) u = −g0 .
(7.35)
Notice that (7.35) has a unique solution u provided ω is not an eigenvalue of A, i.e.
provided ω is not one of the natural frequencies of the system. On the other hand, if ω
is one of the natural frequencies of the system, then we encounter resonance.
Example 3. Suppose m1 = 2, m2 = 1, k1 = 2, k2 = 1, and k12 = 2 as in Example 2,
but now let us also assume that mass m2 is subject to a periodic force f2 (t) = cos ωt.
Let us find the general solution using a second-order system. In this case (7.29) becomes
2 0 00
−4 2
0
x =
x+
,
0 1
2 −3
cos ωt
and after multiplying by M−1 we obtain for (7.30)
−2 1
0
00
x =
x+
.
2 −3
cos ωt
Now we compute the eigenvalues and eigenvectors of A to find λ1 = −1 with eigenvector
v1 = (1, 1) and λ2 = −4 with eigenvector v2 = (1, −2). So we take ω1 = 1 and ω2 = 2
and obtain our four linearly independent solutions of the homogenous equation:
1
1
1
1
x1 (t) = sin t
, x2 (t) = cos t
, x3 (t) = sin 2t
, x4 (t) = cos 2t
.
1
1
−2
−2
To find xp we want to solve (7.35) with g0 = (0, 1):
−2 + ω 2
1
0
u=
.
2
−3 + ω 2
−1
We can solve this for u using the inverse matrix
−2 + ω 2
2
1
−3 + ω 2
−1
=
2
1
ω −3
−2
(ω 2 − 1)(ω 2 − 4)
−1
,
ω2 − 2
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
233
provided ω 6= 1, 2. We find
−1 1
−2 + ω 2
1
0
1
u=
= 2
,
2
−3 + ω 2
−1
(ω − 1)(ω 2 − 4) 2 − ω 2
and consequently the particular solution is
xp (t) =
cos ωt
1
.
(ω 2 − 1)(ω 2 − 4) 2 − ω 2
(7.36)
Putting this together, we obtain the general solution
cos ωt
1
1
x(t) = 2
+
2 + c1 sin t
2
2
−
ω
1
(ω − 1)(ω − 4)
1
1
1
c2 cos t
+ c3 sin 2t
+ c4 cos 2t
.
1
−2
−2
Of course, we needed to assume that the forcing frequency ω is not equal to either of the
natural frequencies ω1 = 1 and ω2 = 2 in order for xp to be defined in (7.36). Moreover,
for ω close to either of the natural frequencies we see that amplitude of the oscillations
of xp become very large; this is resonance.
2
Exercises
1. In the critically damped case c2 − 4mk = 0, show that the eigenvalue λ = −c/2m
for (7.24) is defective and find the general solution. Compare this with the result
for critically damped vibrations in Section 2.4.
2. Explain how Newton’s law implies that (7.25) holds.
3. For the parameters q1 , . . . , q4 satisfying (7.26), show that (7.27) always has four
roots of the form ± i ω1 , ± i ω2 where ω1 , ω2 > 0.
Fig.3. Exercise 4.
4. Suppose that masses m1 and m2 are only connected by two springs as in Figure
3. If m1 = 2, m2 = 1, k1 = 4 and k12 = 2, find the natural frequencies and the
general solution x(t) of the first-order system (7.26).
5. Replace the spring with constant k2 in (7.25) with a dashpot having damping
coefficient c2 > 0; see Figure 4. Find the first-order system that replaces (7.26).
6. Consider the two spring configuration of Exercise 4, but add an external force
of cos ωt that acts on m2 . For what forcing frequencies ω does resonance occur?
Assuming ω is nonresonant, find the general solution x(t) of the second-order
system (7.29).
Fig.4. Exercise 5.
7. If the (n × n)-matrix A has eigenvalue λ = 0 with (nonzero) eigenvector v, show
that two linearly independent solutions of x″ = Ax are x1 (t) = v and x2 (t) = tv.
8. Consider three masses m1 , m2 , and m3 connected by two springs as in Figure 5.
Suppose m1 = 3 = m3 , m2 = 2, and k12 = 12 = k23 . Use a second-order system
to find the general solution.
9. Consider the three mass and two spring system in Exercise 8, but add an external
force of 12 cos 3t acting on m2 . Find the general solution.
Fig.5. Exercise 8.
7.6 Additional Exercises
1. Consider the vector-valued functions x1 (t) = (t, t2 ) and x2 (t) = (t2 , t3 ). Show
that there is no first-order system x′ = A(t) x with A(t) continuous in t for which
x1 and x2 are both solutions.
2. Suppose A is a constant (n × n)-matrix and x′ = Ax admits a constant solution x(t) ≡ x0 where x0 ≠ 0. Can you identify one of the eigenvectors of A and its
eigenvalue?
3. If A is an (n × n)-matrix, then so is tA and we can define e^{tA} as in Section 6.2:
   e^{tA} = I + tA + (tA)²/2! + · · ·
If v is any vector, show that the unique solution of x′ = Ax, x(0) = v is given by x(t) = e^{tA} v.
4. If A is a diagonalizable (n×n)-matrix so that A = E D E⁻¹ where E is the matrix of eigenvectors for A and D = Diag(λ1, . . . , λn), show that
   e^{tA} = E Diag(e^{tλ1}, . . . , e^{tλn}) E⁻¹.
5. For the first-order system, compute the matrix exponential e^{tA} discussed in the previous two exercises and use it to find the solution of the initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} 2 & -1 \\ -4 & 2 \end{pmatrix}\mathbf{x}, \qquad \mathbf{x}(0) = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$
6. For the first-order system, compute the matrix exponential e^{tA} and use it to find the solution of the initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} -3 & -2 \\ 9 & 3 \end{pmatrix}\mathbf{x}, \qquad \mathbf{x}(0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Consider two masses m1 and m2 connected to two walls by dashpots with respective
damping coefficients c1 and c2 , and connected to each other by a spring with constant
k; see Figure 1. If x and y denote the displacements from the equilibrium positions of
m1 and m2 respectively, then the following second-order equations hold:
m1 x″ = −c1 x′ − k(x − y)
m2 y″ = −c2 y′ − k(y − x).    (7.37)
Fig.1. Exercise 7.
7. (a) Let c1 = c2 = 1, m1 = m2 = 1, and k = 1. Using x1 = x, x2 = x′, x3 = y, and x4 = y′, replace the second-order system (7.37) by a first-order system.
(b) Find the general solution of the first-order system.
(c) Would you describe this system as overdamped, underdamped, or critically
damped?
(d) Do all solutions satisfy x(t), y(t) → 0 as t → ∞? What is the significance of
the zero eigenvalue?
8. (a) Let c1 = c2 = 3, m1 = m2 = 1, and k = 1. Using x1 = x, x2 = x′, x3 = y, and x4 = y′, replace the second-order system (7.37) by a first-order system.
(b) Find the general solution of the first-order system.
(c) Would you describe this system as overdamped, underdamped, or critically
damped?
(d) Do all solutions satisfy x(t), y(t) → 0 as t → ∞? What is the significance of
the zero eigenvalue?
Two railroad cars of masses m1 and m2 collide on a railroad track. At the time of
contact, t = 0, suppose m1 is moving with velocity v0 but m2 is stationary. However,
there is a buffer spring with constant k that prevents the cars from actually hitting each
other. Let x(t) denote the displacement of m1 and y(t) denote the displacement of m2
after contact. As long as contact is maintained, they satisfy the following system:
m1 x″ = −k(x − y),    x(0) = 0, x′(0) = v0,
m2 y″ = k(x − y),    y(0) = 0, y′(0) = 0.    (7.38)
Of course, at some t∗ > 0 the cars will separate and will continue with constant velocities
v1 and v2 for t > t∗ , but it is not immediately clear whether v1 and v2 are both positive.
9. Let m1 = 3, m2 = 2, k = 6, and v0 = 20.
(a) Find the solution of (7.38) for 0 < t < t∗ .
(b) Find the value of t∗ .
(c) What are the velocities v1 and v2 for t > t∗ ?
(d) Verify that momentum and energy are both conserved (i.e. check their values
for t < 0 and t > t∗ ).
10. Let m1 = 3, m2 = 4, k = 12, and v0 = 35.
(a) Find the solution of (7.38) for 0 < t < t∗ .
(b) Find the value of t∗ .
(c) What are the velocities v1 and v2 for t > t∗ ?
(d) Verify that momentum and energy are both conserved (i.e. check their values
for t < 0 and t > t∗ ).
Fig.2. Exercise 9.
When A is a diagonalizable (n × n)-matrix, we can use a change of variable to solve the nonhomogeneous system x′ = Ax + f(t). Let E denote the matrix of eigenvectors for A so that (see Section 6.2)
E⁻¹AE = D = Diag(λ1, . . . , λn)
where λ1, . . . , λn are the eigenvalues for A. Letting y = E⁻¹x yields (see Exercise 11) a first-order system for y:
y′ = D y + g(t)    where g = E⁻¹f.    (7.39)
The advantage of (7.39) is that it has been decoupled into n linear equations y′j = λj yj + gj(t), which can be solved (see Exercise 12) by an integrating factor. The solution of the original system is then obtained by computing x = Ey.
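A brief SymPy sketch of this change of variables follows; the 2 × 2 matrix A, the forcing f(t), and the constants c1, c2 are made-up placeholders for illustration only (they are not taken from the exercises).

```python
# A hedged SymPy sketch of the decoupling method above; A, f(t), c1, c2 are
# placeholders, not data from the exercises.
import sympy as sp

t, s = sp.symbols('t s')
c1, c2 = sp.symbols('c1 c2')

A = sp.Matrix([[1, 2], [2, 1]])     # assumed diagonalizable coefficient matrix
f = sp.Matrix([sp.exp(t), 0])       # assumed forcing term f(t)

E, D = A.diagonalize()              # columns of E are eigenvectors; D is diagonal
g = sp.simplify(E.inv() * f)        # g = E^{-1} f, as in (7.39)

# Each decoupled equation y_j' = lambda_j y_j + g_j(t) is solved by an
# integrating factor (compare Exercise 12).
consts = [c1, c2]
y = sp.Matrix([
    sp.exp(D[j, j]*t) * (sp.integrate(sp.exp(-D[j, j]*s) * g[j].subs(t, s), (s, 0, t))
                         + consts[j])
    for j in range(2)
])

x = sp.simplify(E * y)              # back to the original unknowns: x = E y
print(x)
print(sp.simplify(x.diff(t) - A*x - f))   # residual should be the zero vector
```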
11. Show that letting x = E y in x′ = Ax + f(t) yields (7.39).
12. Show that the system (7.39) with initial condition y(0) = b can be solved to find
$$y_j(t) = e^{\lambda_j t}\int_0^t e^{-\lambda_j s}\,g_j(s)\,ds + b_j\,e^{\lambda_j t}.$$
13. Use the method described above to solve the following initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\mathbf{x} + \begin{pmatrix} \sin t \\ 0 \end{pmatrix}, \qquad \mathbf{x}(0) = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$
14. Use the method described above to solve the following initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} 4 & -2 \\ 3 & -1 \end{pmatrix}\mathbf{x} + \begin{pmatrix} t \\ 2t \end{pmatrix}, \qquad \mathbf{x}(0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Appendix A
Complex Numbers
A complex number is of the form
z = a + b i,
(A.1)
where a and b are real numbers and i has the property that i² = −1. (In some books, the letter j is used in place of i.) The number a in (A.1) is called the real part of z, denoted a = Re(z), and b is called the imaginary part of z, denoted b = Im(z). Complex numbers can be added or multiplied together (using i² = −1) to again obtain a complex number.
Example 1. If w = 2 − i and z = 1 + 3i, find Re(w), Im(z), w + z, and wz.
Solution. Re(w) = 2, Im(z) = 3, w + z = 2 + 1 − i + 3i = 3 + 3i, and
wz = (2 − i)(1 + 3i) = 2 − i + 6i − 3i² = 2 + 5i + 3 = 5 + 5i.
2
Fig.1. The complex plane
If we identify the complex number z = x + y i with the point (x, y), we can represent complex numbers as points in the complex plane. In the complex plane, the
horizontal axis (or “x-axis”) is called the real axis and the vertical axis (or “y-axis”) is
called the imaginary axis. The addition of two complex numbers can be performed
in the complex plane simply by adding their coordinates, but multiplying two complex
numbers looks complicated in the complex plane; we shall soon see how to make sense
of it as well.
If z is a complex number, its complex conjugate, denoted z̄, is obtained by changing the sign of the imaginary part:
z = a + bi   ⇒   z̄ = a − b i.    (A.2)
If we take the product of z and z̄, we obtain the sum of the squares of its real and
imaginary parts:
z = a + bi   ⇒   z z̄ = (a + b i)(a − b i) = a² + b².
Since z z̄ is a nonnegative real number, we may take its square root and obtain a nonnegative real number called the modulus of z and denoted by |z|:
z = a + bi   ⇒   |z|² = z z̄ = a² + b².    (A.3)
Fig.2. The complex conjugate of a complex number
Note that the complex conjugate and modulus can be represented geometrically: if
z = x + y i is plotted in the complex plane, then z̄ = x − y i is just its reflection in the
horizontal axis, and |z| is the distance of the point (x, y) to the origin in the complex
plane.
Example 2. If z = −1 + 3i, find the complex conjugate z̄ and the modulus |z|.
Represent these geometrically in the complex plane.
Solution. We find z̄ = −1 − 3i and |z| = √((−1)² + 3²) = √10. We represent them in the complex plane in Figure 2.
2
In addition to multiplying complex numbers, we want to be able to divide by a
(nonzero) complex number. In particular, for a nonzero complex number z = a + bi, we
want to realize 1/(a + bi) in the form (A.1). This can be achieved by multiplying the
numerator and denominator by z̄ = a − bi:
$$\frac{1}{a+bi} = \frac{1}{a+bi}\cdot\frac{a-bi}{a-bi} = \frac{a-bi}{a^2+b^2} = \frac{a}{a^2+b^2} + \frac{-b}{a^2+b^2}\,i. \tag{A.4}$$
Now we can perform the division w/z simply by multiplying w and 1/z.
Example 3. If w = 2 − i and z = −1 + 3i, find z⁻¹ and w/z.
Solution.
$$z^{-1} = \frac{1}{-1+3i} = \frac{1}{-1+3i}\cdot\frac{-1-3i}{-1-3i} = -\frac{1}{10} - \frac{3}{10}\,i,$$
$$\frac{w}{z} = w\,z^{-1} = (2-i)\Bigl(-\frac{1}{10} - \frac{3}{10}\,i\Bigr) = -\frac{1}{2} - \frac{1}{2}\,i.$$
2
There is another way of writing complex numbers. The polar form is
z = r e^{iθ} = r (cos θ + i sin θ),    (A.5)
where r = |z| is the modulus of z and θ is the angle that z makes with the real axis in the complex plane; θ is called the argument of z and takes values in [0, 2π). Behind (A.5) is Euler’s formula, which states
e^{iθ} = cos θ + i sin θ.    (A.6)
Fig.3. The modulus and argument of a complex number
We can derive (A.6) by using complex numbers in the familiar power series
$$e^x = 1 + x + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty}\frac{x^n}{n!}, \qquad \cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots, \qquad \sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \tag{A.7}$$
In fact, replacing x by iθ in (A.7) and collecting terms involving i, we find
$$e^{i\theta} = 1 + i\theta + \frac{1}{2!}(i\theta)^2 + \frac{1}{3!}(i\theta)^3 + \frac{1}{4!}(i\theta)^4 + \cdots = \Bigl(1 - \frac{1}{2!}\theta^2 + \frac{1}{4!}\theta^4 - \cdots\Bigr) + i\,\Bigl(\theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 - \cdots\Bigr) = \cos\theta + i\sin\theta, \tag{A.8}$$
which is just (A.6). We can use (A.6) to express cos θ and sin θ as complex exponentials:
$$\cos\theta = \frac{e^{i\theta}+e^{-i\theta}}{2} \qquad\text{and}\qquad \sin\theta = \frac{e^{i\theta}-e^{-i\theta}}{2i}. \tag{A.9}$$
Moreover, using (A.5) we see that the product of two complex numbers is given by the
product of their moduli and the sum of their arguments:
$$z_1 = r_1 e^{i\theta_1},\ z_2 = r_2 e^{i\theta_2} \quad\Longrightarrow\quad z_1 z_2 = r_1 r_2\, e^{i(\theta_1+\theta_2)}. \tag{A.10}$$
This provides the geometric interpretation of multiplication that we referred to above.
Example 4. Let z1 = 1 + √3 i and z2 = −1 + √3 i. Express both z1 and z2 in polar form, and use them to illustrate the validity of the product rule (A.10).
Solution. Both z1 and z2 have modulus √(1 + 3) = 2. To find the argument θ1 for z1, we write
1 + √3 i = 2e^{iθ1} = 2(cos θ1 + i sin θ1).
So cos θ1 = 1/2 and sin θ1 = √3/2. We see that θ1 is in the 1st quadrant and tan θ1 = √3, so θ1 = π/3. To find the argument θ2 for z2 we write
−1 + √3 i = 2e^{iθ2} = 2(cos θ2 + i sin θ2).
So cos θ2 = −1/2 and sin θ2 = √3/2. We see that θ2 is in the 2nd quadrant and tan θ2 = −√3, so θ2 = 2π/3. We conclude that the polar forms are
z1 = 2 e^{iπ/3}   and   z2 = 2 e^{i 2π/3}.
Let us compute the product z1 z2 two ways:
z1 z2 = (1 + √3 i)(−1 + √3 i) = −4   and   z1 z2 = (2 e^{iπ/3})(2 e^{i 2π/3}) = 4 e^{iπ}.
Since e^{iπ} = −1, we have the same answer, as asserted by (A.10).
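For readers who like to check such arithmetic numerically, here is a small Python sketch (not part of the text) that redoes Example 4 with the standard library's cmath module; note that cmath reports arguments in (−π, π] rather than [0, 2π).

```python
# Sketch (not from the text): checking Example 4 with Python's cmath.
import cmath
import math

z1 = 1 + math.sqrt(3)*1j
z2 = -1 + math.sqrt(3)*1j

r1, t1 = cmath.polar(z1)        # expect (2, pi/3)
r2, t2 = cmath.polar(z2)        # expect (2, 2*pi/3)
print(r1, t1, r2, t2)

# Product two ways, as in the example:
print(z1*z2)                                 # (-4+0j), up to rounding
print(cmath.rect(r1*r2, t1 + t2))            # same point, via the rule (A.10)
```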
Exercises
1. For the following complex numbers z, find z̄ and |z|.
(a) 1 + 2i,
(b) 2 − 3i,
(c) −3 + 4i,
(d) 5 − 7i
2. For the following complex numbers w and z, find wz and w/z.
(a) w = 3 + 4i, z = 5i; (b) w = 1 + 2i, z = 1 − 2i; (c) w = 2 − 3i, z = 1 + i; (d) w = 4 + i, z = 1 + 4i.
3. Express the following complex numbers in polar form.
(a) −3i, (b) −3 − 3i, (c) 1 − i√3, (d) −3 + 4i
4. Use (A.6) to prove (A.9).
2
Fig.4. The geometric interpretation of complex multiplication
Appendix B
Review of Partial Fractions
In high school algebra, we learn to put fractions over a common denominator. This
applies not just to numbers like
$$\frac{1}{2} + \frac{1}{3} = \frac{5}{6},$$
but to rational functions, i.e. quotients of polynomials:
$$\frac{1}{x+1} + \frac{1}{x+2} = \frac{2x+3}{(x+1)(x+2)}. \tag{B.1}$$
However, when we want to integrate a complicated rational function, we often want
to “go the other way” and separate it into a sum of simpler rational functions. The
resulting sum is called a partial fraction decomposition (PFD). For example, if we
started with the rational function on the right in (B.1), the expression on the left is its
partial fraction decomposition.
In general, suppose we have a rational function
$$R(x) = \frac{p(x)}{q(x)} \qquad\text{where } \deg p < \deg q. \tag{B.2}$$
(Recall that if we do not have deg p < deg q, then we can perform polynomial division to
achieve this.) Now we factor q(x) into its linear and irreducible quadratic terms. (Recall
that “irreducible quadratic” means it does not contain any real linear factors, so x² + 1 is irreducible but x² − 1 is not irreducible since it can be factored as (x − 1)(x + 1).)
• Each factor of q(x) in the form (ax − b)^k contributes the following to the PFD:
$$\frac{A_1}{ax-b} + \frac{A_2}{(ax-b)^2} + \cdots + \frac{A_k}{(ax-b)^k}.$$
• Each factor of q(x) in the form (ax² + bx + c)^k contributes the following to the PFD:
$$\frac{A_1x+B_1}{ax^2+bx+c} + \frac{A_2x+B_2}{(ax^2+bx+c)^2} + \cdots + \frac{A_kx+B_k}{(ax^2+bx+c)^k}.$$
Once we have the correct form of the partial fraction decomposition, we simply recombine over the common denominator and compare with the original expression to
evaluate the constants.
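A quick way to check a decomposition obtained by hand is SymPy's apart function; the sketch below is not part of the text and simply reproduces the decompositions worked out in Examples 1 and 2 that follow.

```python
# A sketch (not from the text): checking partial fraction decompositions with SymPy.
import sympy as sp

x = sp.symbols('x')

# Example 1 below: (3x + 14)/(x^2 + x - 6) = 4/(x - 2) - 1/(x + 3)
print(sp.apart((3*x + 14)/(x**2 + x - 6), x))

# Example 2 below: (x^2 - 2x + 7)/((x + 1)(x^2 + 4)) = 2/(x + 1) - (x + 1)/(x^2 + 4)
print(sp.apart((x**2 - 2*x + 7)/((x + 1)*(x**2 + 4)), x))
```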
Example 1. Find the PFD for
$$\frac{3x+14}{x^2+x-6}.$$
Solution. The denominator factors as x² + x − 6 = (x − 2)(x + 3), so the partial fraction decomposition takes the form
$$\frac{3x+14}{x^2+x-6} = \frac{A}{x-2} + \frac{B}{x+3}.$$
We want to find A and B. We recombine over the common denominator and collect
terms in the numerator according to the power of x:
$$\frac{A}{x-2} + \frac{B}{x+3} = \frac{A(x+3)+B(x-2)}{(x-2)(x+3)} = \frac{(A+B)x+(3A-2B)}{(x-2)(x+3)}.$$
Comparing the numerator of this last expression with that of the original function, we
see that A and B must satisfy
A+B =3
and
3A − 2B = 14.
We can easily solve these simultaneously to obtain A = 4 and B = −1, so our PFD is
$$\frac{3x+14}{x^2+x-6} = \frac{4}{x-2} - \frac{1}{x+3}.$$
2
Example 2. Find the PFD for
$$\frac{x^2-2x+7}{(x+1)(x^2+4)}.$$
Solution. The denominator has already been factored into a linear and an irreducible quadratic factor, so the PFD takes the form
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{A}{x+1} + \frac{Bx+C}{x^2+4}.$$
Now we recombine over the common denominator and collect terms in the numerator
according to the power of x:
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{A(x^2+4)+(x+1)(Bx+C)}{(x+1)(x^2+4)} = \frac{(A+B)x^2+(B+C)x+4A+C}{(x+1)(x^2+4)}.$$
Comparing both sides of this equation, we see that A, B, C must satisfy the following
three equations:
A + B = 1, B + C = −2, 4A + C = 7.
These can be solved simultaneously (e.g. one can use Gaussian elimination as in Chapter 4) to obtain A = 2, B = −1, and C = −1. In other words, we have the PFD
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{2}{x+1} - \frac{x+1}{x^2+4}.$$
Example 3. Find the PFD for
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)}.$$
Solution. The denominator has already been factored into linear and irreducible quadratic factors, so the PFD takes the form
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)} = \frac{A_1}{x+1} + \frac{A_2}{(x+1)^2} + \frac{B_1x+C_1}{x^2+4} = \frac{(A_1+B_1)x^3+(A_1+A_2+2B_1+C_1)x^2+(4A_1+B_1+2C_1)x+4A_1+4A_2+C_1}{(x+1)^2(x^2+4)}.$$
So we get the following system
A1 + B 1 = 2
A1 + A2 + 2B1 + C1 = 4
4A1 + B1 + 2C1 = 0
4A1 + 4A2 + C1 = 3.
These may be solved simultaneously (for example, using Gaussian elimination) to find
A1 = 0, A2 = 1, B1 = 2, C1 = −1. Hence the PFD is
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)} = \frac{1}{(x+1)^2} + \frac{2x-1}{x^2+4}.$$
Trick for Evaluating the Constants
After recombining the terms in our PFD over the common denominator, we want the
resultant numerator to equal the numerator of the original rational function. But these
are both functions of x, so the equality must hold for all x. In particular, we can
choose convenient values of x which simplify the calculation of the constants; this works
especially well with linear factors. Let us redo Examples 1 & 2 using this method.
Example 1 (revisited). We wanted to find A and B so that
$$\frac{3x+14}{x^2+x-6} = \frac{A(x+3)+B(x-2)}{(x-2)(x+3)}.$$
Since there are two constants to find, we choose two values of x; the obvious choices are
x = 2, −3. If we plug x = 2 into both numerators, the term involving B vanishes and
we obtain 3(2) + 14 = A(5), i.e. 5A = 20 or A = 4. Similarly, we can plug x = −3 into
both numerators and obtain 3(−3) + 14 = B(−5), which means B = −1. (These agree
with the values found previously.)
2
Example 2 (revisited). We wanted to find A, B, and C so that
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{A(x^2+4)+(x+1)(Bx+C)}{(x+1)(x^2+4)}.$$
Since there are three constants, we choose three values of x. One obvious choice is
x = −1, and evaluating both numerators yields 10 = 5A, or A = 2. No other choices of
x will cause terms to vanish, so let us choose simple values like x = 0, 1:
x = 0  ⇒  7 = 4A + C = 8 + C  ⇒  C = −1,
x = 1  ⇒  6 = 5A + 2(B + C) = 8 + 2B  ⇒  B = −1.
(These agree with the values found previously.)
Allowing Complex Factorization
If we allow complex numbers in our PFD, then an “irreducible quadratic” polynomial
such as x² + 1 can be factored as (x + i)(x − i). This allows us to treat any quadratic polynomial in the denominator as a product of linear factors; we shall call this a complex
partial fraction decomposition (CPFD).
Example 3 (revisited). Find the CPFD for
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)}.$$
We factor x² + 4 as (x + 2i)(x − 2i), so we want to write
$$\frac{2x^3+4x^2+3}{(x+1)^2(x+2i)(x-2i)} = \frac{A_1}{x+1} + \frac{A_2}{(x+1)^2} + \frac{B}{x+2i} + \frac{C}{x-2i}.$$
But the work that we did before evaluated A1 = 0 and A2 = 1, so let us use the answer
that we obtained before and concentrate on B and C:
$$\frac{2x-1}{x^2+4} = \frac{2x-1}{(x+2i)(x-2i)} = \frac{B}{x+2i} + \frac{C}{x-2i} = \frac{B(x-2i)+C(x+2i)}{x^2+4} = \frac{(B+C)x+2i(C-B)}{x^2+4}.$$
Hence we get the system
B + C = 2   and   2i(C − B) = −1,
which can easily be solved to find B = 1 − i/4 and C = 1 + i/4. Thus
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)} = \frac{1}{(x+1)^2} + \frac{1-\tfrac{i}{4}}{x+2i} + \frac{1+\tfrac{i}{4}}{x-2i}.$$
One application of a CPFD is to the Laplace transform; see Exercise 8 in Section 3.1.
2
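If you want to verify a complex decomposition like the one above, SymPy can compute it with apart(..., full=True); a brief sketch, not part of the text:

```python
# Sketch (not from the text): real and complex partial fraction decompositions in SymPy.
import sympy as sp

x = sp.symbols('x')
expr = (2*x**3 + 4*x**2 + 3)/((x + 1)**2 * (x**2 + 4))

print(sp.apart(expr, x))                    # real PFD, as in Example 3
print(sp.apart(expr, x, full=True).doit())  # CPFD with the poles x = -1, +/- 2i
```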
Appendix C
Table of Integrals
∫ u^n du = u^{n+1}/(n + 1) + C,  if n ≠ −1
∫ (1/u) du = ln |u| + C
∫ e^{au} du = (1/a) e^{au} + C,  if a ≠ 0
∫ a^u du = a^u/ln a + C,  if a > 0
∫ cos u du = sin u + C
∫ sin u du = −cos u + C
∫ sec² u du = tan u + C
∫ csc² u du = −cot u + C
∫ sec u tan u du = sec u + C
∫ csc u cot u du = −csc u + C
∫ tan u du = ln |sec u| + C
∫ cot u du = ln |sin u| + C
∫ sec u du = ln |sec u + tan u| + C
∫ csc u du = ln |csc u − cot u| + C
∫ du/√(a² − u²) = sin⁻¹(u/a) + C,  if a ≠ 0
∫ du/(a² + u²) = (1/a) tan⁻¹(u/a) + C,  if a ≠ 0
∫ du/(a² − u²) = (1/(2a)) ln |(u + a)/(u − a)| + C,  if a ≠ 0
∫ sin² u du = u/2 − (1/4) sin 2u + C
∫ cos² u du = u/2 + (1/4) sin 2u + C
∫ tan² u du = tan u − u + C
∫ cot² u du = −cot u − u + C
∫ u dv = uv − ∫ v du
∫ sin au sin bu du = sin((a − b)u)/(2(a − b)) − sin((a + b)u)/(2(a + b)) + C,  if a² ≠ b²
∫ cos au cos bu du = sin((a − b)u)/(2(a − b)) + sin((a + b)u)/(2(a + b)) + C,  if a² ≠ b²
∫ sin au cos bu du = −cos((a − b)u)/(2(a − b)) − cos((a + b)u)/(2(a + b)) + C,  if a² ≠ b²
∫ e^{au} sin bu du = e^{au}(a sin bu − b cos bu)/(a² + b²) + C
∫ e^{au} cos bu du = e^{au}(a cos bu + b sin bu)/(a² + b²) + C
∫ √(a² − u²) du = (u/2)√(a² − u²) + (a²/2) sin⁻¹(u/a) + C
∫ (√(a² − u²)/u) du = √(a² − u²) − a ln |(a + √(a² − u²))/u| + C
∫ √(u² ± a²) du = (u/2)√(u² ± a²) ± (a²/2) ln |u + √(u² ± a²)| + C
∫ du/√(u² ± a²) = ln |u + √(u² ± a²)| + C
Appendix D
Table of Laplace Transforms
Here a and b are real numbers, and the transforms will exist for sufficiently large s.
Function → Transform
f(t) → F(s)
f′(t) → s F(s) − f(0)
f″(t) → s² F(s) − s f(0) − f′(0)
∫₀ᵗ f(τ) dτ → F(s)/s
e^{at} f(t) → F(s − a)
u(t − a) f(t − a) → e^{−as} F(s)
∫₀ᵗ f(τ) g(t − τ) dτ → F(s) G(s)
t f(t) → −F′(s)
f(t)/t → ∫ₛ^∞ F(σ) dσ
1 → 1/s
t^n → n!/s^{n+1}
t^a → Γ(a + 1)/s^{a+1}
e^{at} → 1/(s − a)
t^n e^{at} → n!/(s − a)^{n+1}
cos bt → s/(s² + b²)
sin bt → b/(s² + b²)
cosh bt → s/(s² − b²)
sinh bt → b/(s² − b²)
e^{at} cos bt → (s − a)/((s − a)² + b²)
e^{at} sin bt → b/((s − a)² + b²)
u(t − a) → e^{−as}/s
δ(t − a) → e^{−as}
Appendix E
Answers to Some Exercises
Section 1.1
(d) Equilibria at y
=
nπ:
semistable for n = 0; sink
for n = 1, 3, . . . and n =
−2, −4, . . . ; source for n =
−1, −3, . . . and n = 2, 4, . . .
1. (a) 1st-order nonlinear
(b) 2nd-order linear
(c) 3rd-order linear
4. (b) y = k 2 /(2ga2 ) is a stable equilibrium.
(d) 3rd-order nonlinear
2. (a) y = 2 − cos t
Section 1.3
2
(b) y = 12 (1 + ex )
(c) y = x sin x + cos x
(d) y = 3 − 2(1 − x)1/2
3. The maximum height of 64 feet is
achieved after 2 sec.
4. The ball hits after 4.5 sec with a
speed of 44.3 m/sec.
Section 1.2
1. (a) y = C exp(−x2 )
−1
(b) y = tan−1
x+C
√
(c) y = 3 3 sin x + C
(d) y + 13 y 3 = ex + C
(e) x2 = C exp(2t) − 1
(f) x = (t3/2 + C)2
2. (a) y(x) = exp(sin x)
q
(b) y(x) = 23 x3 + 2x +
1
3
(c) y(t) = −2 exp(t3 − t)
2. (a) Yes
3. (c) ∂f /∂y not continuous at y = 0
(b) No
4. 64,473
(c) Yes
3. (a) y = 0 is a source, y = 1 is a
sink.
(b) y = 1 is a sink, y = 0 is a
semistable equilibrium.
(c) y = 1 is a sink, y = 0 is a source,
y = −1 is a sink.
5. 1.71 hours
6. t = 1.45 × 109 years
7. 2,948 years
8. 87 31 minutes
9. 4 minutes
10. 7:37 am
11. A(t) = 50(1 − e−.02t )
13. Rock achieves max height of 5.03 m
at the time t = 1.01 sec
p
14. v ∗ = mg/c
Section 1.4
1. (a) y = 2 + C exp(−x)
(b) y = (x2 /2 + c) exp(−3x)
(c) y = xe−2x − e−2x + ce−3x
(d) y = 23 x1/2 + cx−1
(e) x = t + 1 + c/(t + 1)
3
(f) y = t [−t cos t + sin t + c ]
2. (a) y =
(b) y =
(c) y =
3. (a)
(b)
(c)
(d)
Exact: x2 ey + cos y = C
Exact: x sin y + sin x = C
Not Exact
Exact: sin x + x ln y + ey = C
4. (a) y 2 = 4x2 (ln x + 1) (x > 0)
(b) x2 y 2 + x3 + y 4 = 1
(c) y = (2e−2x − 1)−1/2 (x < 21 ln 2)
Section 1.6: Odd numbers only
1. 6.7 seconds
3.
10
9
7
(
2
(
−x
1 + e if 0 ≤ x ≤ 1,
(e + 1)e−x if x > 1.
ln x
x
− 1 + x2 if 0 ≤ x ≤ 1,
1 if x > 1.
3. 81 lbs
4. (a) 27 lb, (b) 75 lb
5. 1.7 days
6. When the tank becomes full, there is
150 g of salt; and when it becomes
half-full again, there is 75 g of salt.
7. T (t) = 25e−0.2t − 5e−t
Section 1.5
1. (a) y 2 = x2 (ln |x| + C)
(b) y = x(ln x + C)2
(c) ln |xy| = C +
x
y
(d) y = C(x2 + y 2 )
x
8
1 x
−x
)
2 (e + e
1
2 (sin x + csc x)
ln |t|−1
t
(d) y =
(e) y =
2. (a) y = ( x2 + Cx4 )−1/2
(b) y = (x + Cx2 )−3
6
5
4
3
1
0 1
t
2
3
4
5
6
7
8
9 10
5. (c) dy/dx = y 2 + x
7. y = ±1 are sources, y = 0 is a sink.
9. If c < 1 then√2 equilibrium solutions:
y+ = −1 + √1 − c unstable (source)
y− = −1 − 1 − c stable (sink)
If c = 1 then 1 equilibrium solution:
y0 = −1 unstable
If c > 1 then no equilibrium solution.
11. (a) A(y) = πr2 where r is the base
of a right triangle with height R − y
and hypotenuse R.
Ry
(b) Differentiate V (y) = 0 A(u)du,
with respect to t using the chain rule
to obtain dV /dt = A(y)dy/dt. On
the other hand, the rate at which the
volume of water changes due to the
hole is the the cross-sectional area of
the hole times the velocity
√ through
the hole, i.e. −a v = −a 2gy.
13. Emigration of 71, 000 annually.
15. y(x) = C exp[x +
17. y(x) =
x2
2
−
x2
4 ln x
x2
2 ]
+
−1
C
ln x
19. y(x) = (1 − cos x)−1 .
21.
y2
2
+ ey − e−x −
23. x2 =
t2
4
2 (3t
x2
2
= e − 12 .
10. (c) y = 13 (1 − cos 3x + sin 3x)
11. (c) y = 2x + 2 − 2ex cos x.
12. (c) y = 1 − 2x + x2 .
Section 2.3
2. (a) y = c1 e2x + c2 e−2x
(b) y = c1 cos 2x + c2 sin 2x
− 1).
(c) y = c1 e−3x + c2 x e−3x
(d) y = c1 e3x + c2 e−5x
Section 2.1
2
2. (a) y10 = y2 , y1 (0) = 1
y20 = 9y1 , y2 (0) = −1
(b) y10 = y2 , y1 (0) = 1
y20 = ex − 3y2 + y1 , y2 (0) = 0
(c) y10 = y2 , y1 (0) = 1
y20 = y12 , y2 (0) = 0
(d) y10 = y2 , y1 (0) = 1
y20 = ex −y2 −5 sin y1 , y2 (0) = 1
3. (a) x01 = x2 , x02 = −kx1 /m
(b)
(c)
y10
x01
= y2 , y20 = g − ky1 /m
= x2 , x02 = −kx1 /m−cx2 /m
4. k = 196 N/m
2
(e) y = c1 e− 3 x + c2 x e− 3 x
(f) y = e−4x (c1 cos 3x + c2 sin 3x)
√
√
+ c2 e(−1+ 3)x
√
√
3x
(h) y = e (c1 cos 2x+c2 sin 2x)
(g) y = c1 e(−1−
3)x
3. (a) y = 31 e3x + 23 e−3x
(b) y = cos 3x −
1
3 sin 3x
5x
(c) y = 6xe5x − e
(d) y = e3x (cos 4x − sin 4x)
(e) y = 54 (e3x − ex/2 )
(f) y = e2x cos x − 2 e2x sin x
Section 2.4
1. T = π, ω = 2
5. k = 5 lb/ft
2. T = π/2, ω = 4
Section 2.2
3. ω = 2, T = π, A =
4. x(t) =
2. y = ex − e2x
5. x(t) =
x
3. y = e + 2x e
4. y = e−x (cos x + 2 sin x)
5. linearly independent
6. linearly independent
7. dependent: 4f − 3g = 0
8. linearly independent
9. (c) y = −x + ex
5/2
√
1. y = 12 (ex + e−x )
x
√
10
3
√
17
4
cos(3t − 0.322)
cos(4t − 6.038)
6. (a) i) x(t) = e−2t − e−4t , ii) overdamped
(b) i) x(t) = 4e−t/2 − e−2t , ii) overdamped
(c) i) x(t) = e−t (cos t + sin t),
ii) underdamped
−3t/2
,
(d) i) x(t) = e−3t/2 + 5t
2e
ii) critically damped
(e) i) x(t) = e−3t (cos 4t +
ii) underdamped
5
4
sin 4t),
7. (a)
√
2 e−t cos(t√− π/4),
µ = 1, ω = 2 ≈ 1.414
(b) 4.03 e−t/4 cos(1.98 t − 3.01),
µ = 1.98, ω = 2
√
9. ω = 19.6 = 4.43, T = 1.42. Doubling the mass has no effect on T !
Section 2.5
(b) xsp (t) = − 56 cos 5t − 31 sin 5t
xtr (t)
=
e−3t ( 56 cos 2t +
25
12 sin 2t)
(c) xsp (t) = − cos t + 2 sin t
xtr (t) = e−t/2 (cos(t/2) −
3 sin(t/2))
√
7. (a) xsp (t) = 13 cos(3t − 2.16)
√5
2 41
(b) xsp (t) =
1. (a) y(x) = 2e
−3x
− 25
(b) y(x) =
c2 sin 2x
+ c1 e
−x
+ c2 e
cos(4t − 4.04)
−2x
cos 3x + c1 cos 2x +
(c) y(x) = −2x − 32 + c1 e−2x + c2 ex
(d) y(x) = − 13 x cos 3x + c1 cos 3x +
c2 sin 3x
8. (a) 0.42, (b) 12.5, (c) 0.25
Section 2.7
q
1. ω ∗ = ω02 −
R2
2L2
(e) y(x) = x2 + 12 x + c1 + c2 e−2x
2. ω # = ω0
(f) y(x) = 32 ex − cos 2x − sin 2x +
√
√
ex/2 (c1 cos 27 x + c2 sin 27 x)
3. (a) isp (t) =
√10
37
sin(2t − 4.548)
(b) isp (t) =
√20
13
sin(5t − 5.69)
2. (a) y(x) = 2xe2x + e−2x
√
(b) y(x) = ( 52 x2 + x + 1)e−2x
x
2
(c) y(x) =
− 34 + ex − e−x + 34 e−2x
(d) y(x) = 1 −
3
2 sin x
4. (a)
(b)
3
2 x cos x
+ cos x +
1 x
6e
2
3
−
x
3
sin x
3
sin 3x +
4. (a) i(t) = 5e−t sin 2t, underdamped
−5t
(b) i(t) = − 10
− 10e−10t +
3 e
40 −20t
, overdamped
3 e
(c) i(t) = 2 cos 2t − 3 sin 2t +
e−t (−2 cos 3t+ 17
3 sin 3t), underdamped
2
(c) − cos x ln | sec x + tan x|
(d)
(c) isp (t) = 2 13 sin(3t − 5.30)
cos 3x
9
ln | cos 3x|
Section 2.6
(d) i(t) = sin 2t + 3 cos 2t − 3e−2t ,
overdamped
Section 2.8: Odd numbers only
1
5
1. x(t) = cos 2t −
√
2. ω = 4 2
1
5
cos 3t
3. (a) Resonance
(b) Beats: ampl. 4, freq. 0.045
(c) Beats: ampl. 6, freq. 0.04
(d) Resonance
6. (a) xsp (t) = − 41 cos 3t + 34 sin 3t,
xtr (t) = e−2t ( 14 cos t − 74 sin t).
1. π/2
3. (a) S1 , S√
2 have lengths 2/5, 3/5 resp.
(b) ω = 5
p
5. c = 4 ln 2/ π 2 + (ln 2)2
7. (a) y1 = x, y2 = x−1 ,
4
(b) yp = 21
x5/2
9. v = 7.1 m/sec
11. (a) Use ma = F in the direction tan(d) 2e−2t cos 2t
2
2
gent to the arc, so a = Ld θ/dt and
F = Fg + Ff , where Fg = −mg sin θ Section 3.2
is the component of the gravitational
2. (a) y = 23 e−t + 12 et
force, and Ff = f (t) cos θ is the
(b) y = e−t
component of the external horizontal
force.
(c) y = 5et − 3 cos t + 3 sin t
3. (a) y = −3 cos 2t +
Section 3.1
2 −2t
3e
−t
(b) y =
+
(b)
(c)
2
(s−1)2 +4
(e) y = 4e2t − 3e3t
(d)
1−e−s
s
(e)
1
s2
(f) y = 61 e−t −
3 −3t
10 e
5. (a)
(b)
(c)
−
(d) y = 2et + 2e−t − 4 cos t
e−s
s
−
e−s
s2
(e)
5. x(t) =
6. (a)
(f)
(g)
(h)
1
s
+
2s3/2
1
3
6
6
s + s2 + s3 + s4
6
6
s2 −4 + s2 +9
3 −πs
s e
6. (a) 2e
(d)
(e)
(f)
7. (a)
1
−3t
−4e−2t +cos t+sin t)
10 (3e
2
(s−a)3
(b)
6bs2 −2b3
(s2 +b2 )3
(c)
s2 −b2
(s2 +b2 )2
(d)
e−s
s
+
e−s
s2
1. (a) 1/(s − 2)2
sin 2t
(c) (s − 3)/((s − 3)2 + 25)
(e) 1/(s − 1)2 + 1/(s + 1)2
sinh 2t
e4t +
7
4
(f) 6/(s−1)4 +(s+1)/((s+1)2 +4)
e−4t
2. (a)
u(t − 2)
(c) e
+ cos t
(c)
1
2
(d)
− 12 e−t
(d)
+ t − 12 e2t
+ 2e
3 2t 2
2e t
−2t
(b) e
1
−2t
)
2 (1 − e
−t
−2t
8. (a) cos bt
(b) e−t sin t
1
4 (1
+
(d) 3/((s + 4)2 + 9)
(b) −e
(c)
1 2t
30 e
(b) 2/(s + 3)3
3
2
√
4√ t
π
1
2
5
4
1
3
+
Section 3.3
2t
(b) 2 cos 2t −
(c)
1 −2t
2e
4. x(t) = e−2t − e−4t
√
(d) 3 π/(4s5/2 )
π
− 5e + 4e2t
(c) y = 2e
s
s2 +9
6
5
s3 − s−3
2s
6
s2 +9 − s2 +4
√
sin 2t
1 t
3e
t
e
s−2
1
(1−s)2
4. (a)
5
2
− cos 2t)
−
3 −3t
2e
3. (a)
t
−2t
(cos 3t +
1
3
sin 3t)
−3t
e√
πt
1 2t
2 e sin 2t
−3t
(b) 2e
cos t + 7e−3t sin t
(c) 2e3t cos 4t − 43 e3t sin 4t
4. (a)
1 −t
5e
−
1 −2t
(sin 2t
10 e
+ 2 cos 2t)
(b) et (− 23 − t) −
1 −2t
12 e
+ 43 e2t
8. cos t + u1 (t) sin(t − 1)
−t
t
(c) e (2 − cos 2t − sin 2t) − e
Section 3.5
5. (a) e−πs s21+1
(b)
6. (a)
(b)
1. (a) f ? g(t) = t2 /2
e1−s
s−1
(b) f ? g(t) =
2
3 u1 (t) sin 3(t − 1)
1
−(t−3)
sin 2(t
2 u3 (t)e
(d) f ? g(t) = (ebt − eat )/(b − a)
1
2 (cos 3t
−2t
Section 3.4
2. (a) 2(e2t − et )
(b)
2. (a) 3u1 (t) (et−1 − 1)
(b) e−2t + 12 u5 (t)(1 − e−2(t−5) )
3. (a) 2 cosh t
+ u1 (t)(cosh(t − 1) − 1)
(b) sinh 2t
+ 14 u1 (t)(cosh 2(t − 1) − 1)
− 14 u2 (t)(cosh 2(t − 2) − 1)
−2t
2e
cos t + 4 e
sin t +
u(t − 3)[1 − e2(t−3) cos(t − 3) −
2e−2(t−3) sin(t − 3)]
(d) uπ (t)[sin t − cos t − 5e
2e−2(t−π) ]
2t
4. (a) e + u1 (t)e
1 5t
6 (e
−(t−π)
+
2(t−1)
− e−t ) + u2 (t)e5(t−2)
5. (a) cos 2t + 12 u3 (t) sin 2(t − 3)
1
−2t
) − 12 te−2t
4 (1 − e
−2(t−2)
+ u2 (t)(t −
2)e
(c) [2−e2π uπ (t)+e4π u2π (t)]e−2t sin t
(d)
+t−1
(b) 3/[(s − 1)(s2 + 9)]
1
1
7 (cos 3t−cos 4t)+ 2 uπ/2 (t) sin 4t
6.
1
2 (1
7.
1
−t
cos 2t + 12 e−t sin 2t] +
5 [−1 + e
uπ (t)
−(t−π)
cos 2t+ 12 e−(t−π) sin 2t]
5 [1−e
− u2π (t))(sin t −
1
3
(c) s/(s2 + 1)2
Rt
4. (a) y(t) = 0 sinh(t − τ )f (τ ) dτ
Rt
(b) 21 0 e−(t−τ ) sin 2(t − τ )f (τ ) dτ
Rt
(c) y(t) = 0 e−(t−τ ) (t − τ )f (τ ) dτ
Section 3.6: Odd numbers only
−2t
1
2
(b)
+ sin t − e−t ]
3. (a) 2/(s4 + 4s2 )
(b) g(t) = 1 − u2π (t) + u2π (t) cos t
(b)
1
2 [cos t
−t
(c) e
1. (a) f (t) = u1 (t) (t−1)−u2 (t) (t−2)
(c)
− at − 1)
(c) f ? g(t) = t − 2 + 2 cos t
− 3)
− 3 sin 3t − e−2t cos t +
7e
sin t)
1
8. 3 sin 3t + uπ (t) 16 sin 3t − 12 sin t
7.
1
at
a2 (e
2
sin 3t)
1. Any c > 2.
−s
3. 1s 1−e
−s
1+e
5.
1
s2
1−e−s
1+e−s
7. x(t) = 12 (cos t + cosh t)
9. f (t) = (2a3 )−1 (sinh at − sin at)
11. (a) For any t, only finitely many
terms are nonzero
(b) Period is 2π
(c) 1 − cos t + 2
cos(t − kπ))
P∞
k=1
ukπ (t)(1 −
(d) Resonance occurs since the forcing frequency equals the natural
frequency
13. (a)
1
2
sin 2t (1 + uπ (t) + u2π (t) + · · · )
(b) Yes, we have resonance: each
time the mass passes through
the equilibrium position in the
positive x direction, it is given
an additional unit impulse.

1
(d) 0
0

0
1
0
3. (a) x1 = 3, x2 = −2, x3 = 4
(b) x1 = 5 + 2t, x2 = t, x3 = 7
Section 4.1
1. (a) Unique solution x1 = 5, x2 = −3
(b) No solution
(c) Unique solution x1 = −4, x2 = 3
(d) Infinite number: 2x1 − x2 = 6
1 −2 −5
2. A − 2B =
−3 0
3
2 −1
3. (a) AB =
6 −1
2 −1 2
(b) AB =
6 −1 4
(c) AB does not exist
 
−1
(d) AB =  1 
−1
−1 10
4. AB =
6 −6


−1
3
6
14
13 
BA =  12
−8 −16 −20
5. A and C are symmetric.
Section 4.2
1. (a) Yes, (b) No, (c) No, (d) No
2. The REF is not unique, so other answers are possible:
1 −1
(a)
0 1


0 1 2
(b) 0 0 1
0 0 0


1 2 3 4
(c) 0 1 2 3
0 0 1 0
(c) x1 = 2, x2 = −1, x3 = 3
(d) Inconsistent (no solution)
4. (a) Only the trivial solution
(b) (x1 , x2 , x3 ) = (−t, −3t, t)
(c) (x1 , x2 , x3 , x4 ) =
(5s − 4t, −2s + 7t, s, t)
(d) (x1 , x2 , x3 , x4 ) =
(3s − 6t, s, −9t, t)
5. (a) x1 = (1 − i)t, x2 = t
(b) x1 = (−3 + i)/2,
x2 = (3 + 3i)/2
(c) x1 = (1 + i)t, x2 = t
(d) x1 = (1 + i)/2, x2 = (−1 − i)/2
Section 4.3
1 0
1. (a)
0 1

1 0
(b) 0 1
0 0

1 2
(c) 0 0
0 0

1 0
(d) 0 1
0 0

−5
4
0

0 0
1 0
0 1

0 5
0 2
1 −1
2. (a) x1 = 1, x2 = −1, x3 = 2
(b) No solution
(c) x1 = 3, x2 = −2, x3 = 4,
x4 = −1
(d) x1 = 3 − s − t, x2 = 5 + 2s − 3t,
x3 = s, x4 = t
(e) x1 = 5 + 2s − 3t, x2 = s,
x3 = −3−2t, x4 = 7+4t, x5 = t
(f) x1 = 2, x2 = 1, x3 = 3, x4 = 4
 
 
3
1
−2

3. (a) x = −1
(c) x = 
4
2
−1
 
 
 
−1
3
−1
−3
5
2
 

 
(d) x = 
0 + s  1  + t  0 
1
0
0
 
 
 
−3
2
5
0
1
0
 
 
 
 
 

(e) x = 
−3 + s 0 + t −2
4
0
7
1
0
0
 
2
1

(f) x = 
3
4
4. (a)
(b)
(c)
(d)
 
−2
x = t −1
1
 
 
−2
−1
−3
−1

 
x = s
 1  + t 0 
1
0
 
1
−1

x = t
1
1
 
 
 
1
−2
−7
−2
3
−4
 
 
 

 
 
x = r
 1 +s 0 +t 0 
0
1
0
0
0
1
5. (a) rank 2, (b) rank 1, (c) rank 3,
(d) rank 2
Section 4.4
2
4. (a) 13
−1
1
1
(b)
3
−4
−2
3

−5 −2
1
(c) No inverse. (d)  2
−4 −3


−13 42 −5
−9 1 
(e)  3
2
−7 1


−26 11 −14
1 
6
−1
4 
(f) 10
−8
3
−2
−26
5. (a) x =
11
 
6
(c) x = −1
−1

5
−2
5
8
(b) x =
−7


−13
(d) x =  7 
−3
1
2
3
25
(b) x = 41
−1
−10




−12
−107
(c) x = −128 (d) x =  0 
9
48
6. (a) x =
Section 4.5
1. (a) 2, (b) −8, (c) −24, (d) 1
2. (a) k = ±3, (b) k = 0, 4
3. (a) −1, (b) 1, (c) 1
4. (a) 2 + i, (b) −1 + 2i
Section 4.6
1. (a) −6, (b) −3, (c) −14,
(d) −4, (e) −4, (f) 1
2. (a) 0, (b) −4, (c) 0, (d) −180


7 −6 15
3. (a) A−1 = 21 −4 4 −10
−2 2
−4


15 −25 −26
1 
30 25
8 
(b) A−1 = 225
15 −25 19
4.
A−1

1
0
=
0
1
1
0
1
1
0
0
0
−1

1
1

1
1
Section 4.7: Odd numbers only
1. Two planes either intersect in a line,
a plane, or not at all. So cannot have
a unique solution: either no solution
or an infinite number of solutions.
3. No. Requires AB = BA.
Section 5.1
1. (a) (2, 6), (b) (1, −4), (c) (−1, 11)
2. (a) (4, 0, −2), (b) (1, −4, −1),
(c) (0, 8, 1)
3. (a) (0, 4, 0, −2), (b) (1, −1, −4, −1),
(c) (−2, 4, 8, 1)
√
√
√
4. (a) 2, (b) 13, (c) 6
√
5. 3 3
√
6. r = 1/ 6
7. 2i − 3j
5. No.
8. 2i − 3j + 5k
7. Take the transpose of I = AA−1 to
obtain I = (A−1 )T AT = (A−1 )T A.
9. w = (0, −2, −2)
Similarly, the transpose of I =
A−1 A yields I = AT (A−1 )T = Section 5.2
A(A−1 )T . These together imply
3. (a) No (not closed
A−1 = (A−1 )T .
multiplication by
9. Skew-symmetric ⇒ aji = −aij all i, j
(b) No (not closed
⇒ aii = −aii all i ⇒ aii = 0 all i.
multiplication by
11. (a) Nilpotent: A2 = 0
(c) No (not closed
(b) Not nilpotent (B3 = B)
(c) Nilpotent: C3 = 0
2
(d) Not nilpotent (D = D)
13. Inconsistent: no solution.
15. x1 = −t, x2 = −1 − 2t, x3 = −2 − 3t,
x4 = t
17. x1 = 1 + i − t, x2 = 1 − it, x3 = t
under scalar
all reals)
under scalar
all reals)
under scalar
multiplication by all reals)
(d) Yes
(e) No (need to be the same size to
define addition)
(f) Yes
(g) No (has no zero vector)
(h) Yes
(i) No (not closed under addition)
2
19. a) k 6= 1, b) k = 1, c) k = −1
Section 5.3
21. Rank is 1. No solutions.
23. Rank is 3. No solutions.
25. If we put A|b into reduced rowechelon form, there will be a row of
the form 0 · · · 0|1, so inconsistent.


−27 9 −5
27. X =  10 −3 2 
−22 6 −4
1. (a) Yes.
(b) No (has no zero vector).
(c) No (not closed under addition).
(d) No (not closed under addition).
2. (a) Yes.
(b) Yes.
(c) Yes.
(b) f1 − 2f2 + (7/π)f3 = 0
(d) No (has no zero vector).
(c) f1 − f2 − f3 = 0
3. (a) No (not closed under addition).
(b) No (not closed under scalar
multiplication).
(c) Yes.
(d) Yes.
4. (a) Yes. (b) No. (c) Yes. (d) No.
5. (a) w = 3v1 − 2v2 + 4v3
(b) Not possible.
(c) w =
26
7 v1
− 75 v2 − 37 v3
6. (a) v = (1, −2, 1)
(b) v = (−i, i, 1)
(c) v1 = (−2, −2, 1, 0)
v2 = (−5, −3, 0, 1)
(d) v = (2, −3, 1, 0)
(e) v1 = (−1, −1, 1, 0)
v2 = (−5, −3, 0, 1)
7. (a) v = (0, 2, 1)
(b) v1 = (1, −3, 1, 0)
v2 = (−2, 1, 0, 1)
Section 5.4
1. (a) linearly independent
(b) dependent
(c) linearly independent
(d) dependent
(e) linearly independent
(f) linearly independent
2. (a) linearly independent if c 6= −2
Section 5.5
1. (a) Yes, (b) No, (c) Yes,
(d) No, (e) Yes
2. Any finite collection {1, x, x2 , . . . , xk }
is linearly independent.
3. (a) v = (−10, −7, 1), dim=1
(b) v1 = (−1, −3, 1, 0),
v2 = (−3, 8, 0, 1), dim=2
(c) v1 = (1, −3, 1, 0),
v2 = (−2, 1, 0, 1), dim=2
(d) v1 = (−2, 2, 1, 0, 0),
v2 = (−1, 3, 0, 1, 0),
v3 = (−3, −1, 0, 0, 1), dim=3
4. (a) v = (2, 1), dim=1
(b) dim=0,
(c) v = (1, −2, 1), dim=1
(d) v1 = (−1, 1, 1, 0),
v2 = (−1, 2, 0, 1), dim=2
Section 5.6
1. (a) (1, −3) is a basis for Row(A);
(3, −1) is a basis for Col(A)
(b) v1 = (1, 0, −1), v2 = (0, 1, 1) is
a basis for Row(A);
w1 = (3, 1, 1), w2 = (2, 3, 2) is
a basis for Col(A)
(c) v1 = (1, 0, −1, 1), v2 =
(0, 1, 3, −1) is a basis for
Row(A);
w1 = (3, 2, 1), w2 = (1, 1, 0) is
a basis for Col(A)
(b) linearly independent if c 6= ±1
(c) v4 = 2v3 so the collection is
never linearly independent!
3. (a) W = −x
(b) W = 12
(c) W = −2x−6
4. (a) −2f1 + f2 + f3 = 0
2. (a) v1 , v2 is a basis
(b) v1 , v2 , v4 is a basis
(c) v1 , v2 is a basis
Section 5.7
2. hv, λwi = hλw, vi = λhw, vi =
λ hw, vi = λ hv, wi
3. (a) orthu (v) = λu − proju (v)
= λu − hv, uiu = λu − hλu, uiu
= λu − λhu, uiu = λu − λu = 0
√
4. (a) c = −6, (b) c = −1, (c) c = ± 3
5. (a) u1 = 17 (6, 3, 2), u2 = 17 (2, −6, 3)
(b) u1 =
u2 =
(c) u1 =
u2 =
u3 =
√1 (1, −1, −1),
3
√1 (4, 5, −1)
42
√1 (1, 0, 1, 0)
2
√1 (−1, 2, 1, 0),
6
√1 (1, 1, −1, 3)
12
15. C n (I) is a vector space since:
(rf )(n) = rf (n) .
It has no finite basis since for any
k, the set {1, x, x2 , . . . , xk } is linearly
independent.
1. Yes.
3. Yes.
17. Let n = dim(S) and {s1 , . . . , sn } be
a basis for S. Show that it is also a
basis for V .
19. nullity = 3.
5. (a) If f, g are in S1 ∩ S2 , then
f, g in S1 ⇒ f + g in S1 , and
f, g ∈ S2 ⇒ f + g in S2 , so
f + g is in S1 ∩ S2 .
(b) If f is in S1 ∩ S2 , then
f in S1 ⇒ rf in S1 , and
f in S2 ⇒ rf in S2 , so
rf is in S1 ∩ S2 .
7. No: A1 + 2A2 + 3A3 − A4 = 0

0
1
0
13. {1, x, x2 } is a basis.
(f + g)(n) = f (n) + g (n) and
Section 5.8: Odd numbers only
9. (b)

1
0
0
(b) If c1 f1 + c2 f2 = 0 then for x > 0
we have c1 x + c2 x = 0 so c1 = −c2 ,
while for x < 0 we have c1 x−c2 x = 0
so c1 = c2 . This means c1 = c2 = 0.
Basis:
 
0
0 0
0 0 , 0
0
0 0
0
1
0
 
0
0
0 , 0
0
0
 
0
0
0 , 0
0
1
0
0
0
 
1
0
0 , 0
0
0
1
0
0
(c) dim=6
kv+wk2 = kvk2 +hv, wi+hw, vi+kwk2 .
23. If v1 , v2 are in S ⊥ and s is in S, then
hv1 + v2 , si = hv1 , si + hv2 , si = 0 so
v1 + v2 is in S ⊥ . Similarly show rv
is in S ⊥ if v is.
25. If v is in S2⊥ then hv, si = 0 for all s
in S2 . But S1 ⊂ S2 so hv, si = 0 for
all s in S1 , i.e. v is in S1⊥ .

0
Section 6.1
0 ,
1
1. (a) λ1
λ2

0 0
(b) λ1
0 1 ,
λ2
1 0
(c) λ1
λ2
0
0
0
11. (a)
(
1
f20 (x) =
−1
21. Compute
for x > 0
for x < 0
so f20 is not continuous at x = 0.
= 3, v1 = (3, 1);
= −5, v2 = (−1, 1).
= 1, v1 = (1, 1);
= 2, v2 = (3, 2).
= 2, v1 = (1, 1);
= 4, v2 = (4, 3).
(d) λ1 = 1, v1 = (1, 0, 0);
λ2 = 2, v2 = (1, 1, 0);
λ3 = 3, v3 = (0, 0, 1).
(e) λ1 = 1 with ma = 2 = mg ,
v1 = (1, 0, 1), v2 = (−3, 1, 0);
λ2 = 3, v3 = (1, 0, 0).
(f) λ1 = −1, ma = 3, mg = 2,
v1 = (1, 1, 0), v2 = (−3, 0, 4)
(no 3rd eigenvector).
(b)
2. (a) λ1 = 1 + 3i, v1 = (−1 + i, 1);
λ2 = 1 − 3i, v2 = (−1 − i, 1).
(b) λ1 = −2 + i, v1 = (1, i);
λ2 = −2 − i, v2 = (1, −i).
(c)
(c) λ1 = 1, v1 = (1, 0, 0);
λ2 = 2i, v2 = (0, i, 1);
λ3 = −2i, v3 = (0, −i, 1).
(d) λ1 = i, v1 = (−i, 1, 0, 0),
v2 = (0, 0, i, 1);
λ2 = −i, v3 = (i, 1, 0, 0),
v4 = (0, 0, −i, 1).
3. (a)
3. An v = An−1 Av = λAn−1 v = · · · =
λn v.
(b)
4. λ1 = 1, v1 = (1, 0),
λ2 = −1, v2 = (0, 1)
(c)
Section 6.2
1
1. (a) E =
1
4. (a)
2
2
,D=
1
0
(b) Not diagonalizable
1 3
0
(c) E =
,D=
1 2
0


1 3 0
(d) E = 0 1 0,
0 0 1


1 0 0
D = 0 2 0
0 0 2
(e) Not diagonalizable


1 0 0 1
0 1 0 1

(f) E = 
0 0 0 1,
0 0 1 0


1 0 0 0
0 1 0 0 

D=
0 0 1 0 
0 0 0 −1
1 1
2i
0
,D=
i −i
0 −2i
1−i 1+i
E=
,
2
2
1+i
0
D=
0
1−i


1 1 − 2i 1 + 2i
E = 1 1 − 2i 1 + 2i,
1
5
5


1
0
0
0 
D = 0 1 + 2i
0
0
1 − 2i
63 −62
31 −30


1 189 0
0 64 0 
0 0 64


1 −510 255
0
1
0 
0 −510 256
−e + 2e2 2e − 2e2
−e + e2
2e − e2


e −3e + 3e2 0
0
e2
0
0
0
e2


e 2e − 2e2 −e + e2
0
e
0 
0 2e − 2e2
e2
2. (a) E =
0
3
(b)
0
2
(c)
Section 6.3
1. (a) u1 =
(b) u1 =
u2 =
(c) u1 =
√1 (1, −1), u2
2
√1 (3, −2),
13
√1 (2, 3)
13
√1 (−1, 1, 0),
2
u2 = (0, 0, 1),
u3 = √12 (1, 1, 0)
(d) u1 =
u2 =
u3 =
√1 (−1, 0, 1),
2
√
1
(1,
2, 1),
2
√
1
2 (1, − 2, 1)
=
√1 (1, 1)
2
√ √
1/ √2 1/√2
O=
−1/ 2 1/ 2
−2 0
D=
0 4
√ √
1/√2 1/ √2
O=
1/ 2 −1/ 2
0 0
D=
0 2


1
0√
0√
O = 0 1/ √2 1/√2
0 −1/ 2 1/ 2


−1 0 0
D =  0 −1 0
0
0 3


1
0√
0√
O = 0 1/ √2 1/√2
0 −1/ 2 1/ 2


2 0 0
D = 0 2 0
0 0 4
2. (a)
(b)
(c)
(d)
3. (a) O−1
(b) O−1
(c) O−1
 √
1/√3
= 1/ 2
0

1
= √12 −1
0

1

1  0
= 4
−1
√
2
√
1/ 3
0
1
√ 
−1/√ 3
1/ 2 
0

0 1

0
√ 1
2 0
√1
2
1
0

−1 −1
√
2 0 

−1 √1 
0
2
4. y1 v1 + · · · yn vn = z1 v1 + · · · zn vn
⇒ (y1 −z1 )v1 +· · ·+(yn −zn )vn = 0.
Use linearly independence.
5. If Auj = λj uj , let vj = Ouj .
Show that {v1 , . . . , vn } are orthnormal eigenvectors for B with the same
eigenvalues.
√


1/2
−1/2
1/ 2
√
√
6. A = 1/ 2
1√
−1/ 2
1/2 −1/ 2
3/2
Section 6.4: Odd numbers only
1. (a) (p(x) + q(x))0 = p0 (x) + q 0 (x)
and (rp(x))0 = rp0 (x)
(b) Only eigenvalue is λ = 0 and
eigenspace is the constants
3. A invertible ⇔ det(A) 6= 0
⇔ det(A − 0I) 6= 0
⇔ λ = 0 is not an eigenvalue
5. (B−λI)T = BT −λI so det(B−λI) =
det(BT − λI) ⇒ BT and B have
the same characteristic polynomial,
hence the same roots.
7. λ = ±1 and eigenbasis
1 0
0 1
0 0
0
,
,
,
0 0
0 0
1 0
0
0
1
9. Set λ = 0 in the characteristic equation.
.8 .3
11. (a) A =
.2 .7
(b) (x10 , y10 ) = (5.999, 4.001) ≈ (6, 4)
(c) (x10 , y10 ) = (6.001, 3.999) ≈ (6, 4)
(d) From (d) in previous problem,
1 3
10
∗ ∗
∗
,
A ≈ [v1 v1 ] where v1 = 5
2
5
6
so A10
≈
5 4 7
6
and A10
≈
.
3
4
Section 7.1
1. (a) i) x01 = x2 , x02 = 4x2 − 3x1
ii) (0, 0) is unstable (source)
(b) i) x01 = x2 , x02 = −9x1
ii) (0, 0) is stable (center)
(c) i) x01 = x2 , x02 = 3x1 + 2x2
ii) (0, 0) is unstable (saddle)
(d) i) x01 = x2 , x02 = −6x1 − 5x2
ii) (0, 0) is stable (sink)
2. (a) i) x01 = x2 , x02 = −x1 (x1 − 1)
ii) critical pts (0, 0) and (1, 0)
(b) i) x01 = x2 , x02 = −x2 − ex1 + 1
ii) critical pt (0, 0)
(c) i) x01 = x2 , x02 = sin x1
ii) critical pts (±nπ, 0),
n = 0, 1, 2, . . .
(d) i) x01 = x2 ,
x02 = (x2 )2 + x1 (x1 − 1)
ii) critical pts (0, 0), (1, 0)
3. (a) (0, 0) is a saddle, and
(1, 0) is a center.
(b) (0, 0) is a stable spiral.
(c) (±nπ, 0) is a saddle for n even,
and a center for n odd.
(d) (0, 0) is a center, and
(1, 0) is a saddle.
Section 7.2
1. (a) x1 = c1 cos 2t + c2 sin 2t
x2 = −c1 sin 2t + c2 cos 2t
(b) x1 = −et + 2e2t
x2 = 2et − 2e2t
(c) x1 = c1 et + c2 e−t
x2 = 13 c1 et + c2 e−t
(d) x1 = c1 e−t + c2 e3t + 3e4t
x2 = −c1 e−t + c2 e3t + 2e4t
0 x1
0 2 x1
2. (a)
=
x2
−2 0 x2
0 x1
3 1 x1
(b)
=
x2
−2 0 x2
0 x1
2 −3 x1
(c)
=
x2
1 −2 x2
0 4t x1
1 2 x1
5e
(d)
=
+
x2
2 1 x2
0
3t
2e − 2e−2t
3. (a) x =
6e3t − e−2t
sin 2t + cos 2t
(b) x =
cos 2t − sin 2t
 2t

e − e−t
(c) x = 53  e2t − e−t 
e2t + 2 e−t

et + 2e3t + e5t

et − e5t
(d) x = 
1 5t
1 t
3t
2e − e + 2e
1
−1
4. (a) xp =
(b) xp =
−1
1/2
 
 
1
−2
(c) xp =  3  (d) xp = −3
−1
0
1
5. (a) xp = e2t
−1
cos 2t
(b) xp =
− sin 2t
−2 e2t
(c) x =
− 1 et − e2t
2 t
−2te
(d) xp =
−3et

Section 7.3
1
3
1. (a) x = c1 e−t
+ c2 e4t
−1
2
−2t
5t
e
+ 6e
(b) x =
−6 e−2t + 6 e5t
 
 
 
2
3
6
(c) x = c1 2+c2 et 1+c3 e−t 1
2
2
5
 5t

3t
3e − 2e
(d) x(t) = −3e5t + 3e3t 
3e5t − 4e3t
− sin 3t
2. (a) x = c1 e4t
cos3t
cos 3t
+ c2 e4t
sin 3t
4 sin 2t + 3 cos 2t
(b) x =
2 sin 2t − cos 2t
 
1
(c) x = c1 et 0
 0

0
+ c2 e2t cos 3t + sin 3t
 − cos 3t

0
+ c3 e2t − sin 3t + cos 3t
sin 3t


2 sin 2t + cos 2t
(d) x = 2 cos 2t − sin 2t
3 e3t
c1 + c2 (t + 1)
−c1 − c2 t
4t c1 + c2 (t − 1)
(b) x = e
−c1 − c2 t
3. (a) x = e−3t
4. (a)
(b)
(c)
(d)
−1
6t 4
x(t) =
+ c1 e
1/2
3
−1
+ c2 e−t
1
1 5t 7
t −3
x(t) = e
+2e
1
−1
1
+ 12 e−t
1
 
 
6
7
x(t) = e2t 2 + c1 2
5
4
 
 
2
3
+ c2 et 1 + c3 e−t 1
2
2
 
 
1
1
x(t) = −2 + c1 e5t −1 +
3
1
 
 
0
−2
c2 e3t  3  + c3 e3t 0
1
0
Section 7.4
1. x1 (t) = 80 − 30 e−0.1t − 20 e−0.3t
x2 (t) = 80 − 60 e−0.1t + 40 e−0.3t
−3t/4
2. x1 (t) = 10 + 5 e
x2 (t) = 20 − 5 e−3t/4
3. x1 (t) = 20 + 12 e−11t + 5 e−18t
x2 (t) = 6 + 8 e−11t + 15 e−18t
x3 (t) = 40 − 20 e−11t − 20 e−18t
4. x1 (t) = 10
x2 (t) = 10 − 5 e−t/10
x3 (t) = 10 − 5 e−t/10 − 5e−t/5
Section 7.5
1. The general
solution is
ct
c1 + c2 t
− 2m
x=e
c
c
−c1 2m
− c2 t 2m
+ c2
The first component is the position
ct
and x(t) = e− 2m (c1 +c2 t) agrees with
the solution found in Section 2.4.
2. The spring forces on m1 come from
the left and middle springs, and
on m2 from the middle and right
springs. Calculate these.
3. Let µ = λ2 , apply the quadratic formula to find µ, and compare terms
to show µ± < 0.
4. Natural frequencies
and
 are ω = 1, 2 

− cos t
sin t
 sin t 
 cos t 



x(t) = c1 
 2 sin t  + c2 −2 cos t +
2 sin t
2 cos t




cos 2t
− sin 2t
−2 sin 2t
−2 cos 2t



c3 
 sin 2t  + c4  − cos 2t 
2 sin 2t
2 cos 2t


0
1
0
0
−q1 −λ q2
0 
 where
5. x0 = 
 0
0
−λ
1 
q3
0 −q3 −q4
q4 = c2 /m2 while q1 , q2 , q3 are as for
case of three springs.
6. Resonant frequencies are ω = 1, 2.
For nonresonant ω,
1
cos ωt
x(t) = (ω2 −1)(ω
+
2 −4)
3 − ω2
1
1
c1 cos t
+
c2 sin t
+
2
2
1
1
c3 cos 2t
+ c4 sin 2t
−2
−2
7. x00i (t) ≡ 0 and Ax1 = Av = 0 =
A(tv) = A(x2 (t)).
 
 
1
1
8. x(t)
=
a0 1 + b0 t 1 +
1
1
 
 
1
1
a1 cos 2t  0  + b1 sin 2t  0  +
−1
−1
 
 
1
1
a2 cos 4t −3 + b2 sin 4t −3
1
1
9. To the solution xh (t) in Exercise 8,
add the particular
 solution

−4
3t 
5
xp (t) = 2 cos
21
−4
Section 7.6: Odd numbers only
1. Since x1 (0) = (0, 0) = x2 (0), uniqueness of solutions would imply x1 (t) ≡
x2 (t), which is not the case.
3. We differentiate the series term-byterm and then factor 2out A:
x(t) = (I + tA + (tA)
2! + · · · )v
2
0
2
x (t) = (A + tA + t2! A3 + · · · )v
2
= A(I + tA + (tA)
2! + · · · )v
= AetA v = Ax(t).
Check I.C.: x(0) = e0 v = Iv = v.
"
#
4t
4t
5. etA =
1+e
2
4t
1−e
4
1+e4t
2
√
√
√

7
t
7 cos 27 t+sin
2
√


7
−4 sin

2 t √ 
√
+c4 e−t/2  √

7
7
− 7 cos 2 t−sin 2 t
√
4 sin 27 t
(c) The system is underdamped
since x1 = x and x3 = y oscillate as they decay.
(d) Not all solutions decay to zero
since can have c1 6= 0; but this
represents a shift in the location of the spring rather than a
change in the motion.
√
9. (a) x(t) = 12t + √85 sin 5t
√
12
y(t) = 12t − √
sin 5t
5
√
(b) t∗ = π/ 5
(c) v1 = 4, v2 = 24
(d) momentum is 60, energy is 600
11. x = Ey ⇒ x0 = Ey0 and Ax + f =
AEy + f , so Ey0 = AEy + f . Multiply both sides on the left by E−1 and
use E−1 AE = D.
cos t 5 t 5 −t − 2 + 4e + 4e
13. x =
− sin2 t + 45 et − 54 e−t
1−e
Appendix A
1 + e4t
x(t) =
√
2(1 − e4t )
1. (a) z = 1 − 2i, |z| = 5


√
0
1
0
0
(b) z = 2 + 3i, |z| = 13
−1 −1 1
0
x
7. (a) x0 = 
(c) z = −3 − 4i, |z| = 5
0
0
0
1
√
1
0 −1 −1
(d) z = 5 + 7i, |z| = 74
 
 
1
−1
2. (a) wz = −20 + 15i, wz = 54 − 35 i
0


−t  1 


(b) x(t) = c1   +c2 e  
1
−1
(b) wz = 5, wz = − 35 + 45 i
0
1
(c) wz = 5 − i, wz = − 12 − 52 i
√
√


√
7
7
cos 2 t− 7√sin 2 t
8
(d) wz = 17i, wz = 17
− 15
17 i


7
−4
cos
t


−t/2
2
√
√
√
+c3 e


− cos 27 t+ 7 sin 27 t 3. (a) 3 ei 3π/2 , (b) 3 ei 5π/4 ,
√
(c) 2 ei 5π/3 , (d) 5 ei 2.498 .
4 cos 27 t
Bibliography
[1] G. Birkhoff & G.-C. Rota, Ordinary Differential Equations (4th ed.), John Wiley,
New York, 1989.
[2] R. Churchill, Operational Mathematics (3rd ed.), McGraw-Hill, New York, 1972.
[3] M. Hirsch, S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York, 1974.
[4] G. Strang, Linear Algebra and its Applications (2nd ed.), Academic Press, 1980.
Index
nth-order differential equation, 35
adjoint matrix, 139
algebraic multiplicity, 185
alternating current, 73
amplitude-phase form, 51
argument of a complex number, 238
augmented coefficient matrix, 112
autonomous, 12, 204
back substitution, 114
basis
for a solution space, 167
for a vector space, 163
beats, 67
Bernoulli equations, 28
capacitor, 71
center, 205
chain of generalized eigenvectors, 219
characteristic equation
for a square matrix, 182
for an nth order equation, 45
characteristic polynomial
for a square matrix, 182
for an nth order equation, 45
circuits, electric, 71
circular frequency, 50
closed system (of tanks), 224
co-linear vectors, 153
coefficients, 35
coefficients of a system/matrix, 107
cofactor expansion, 136
cofactor matrix, 139
cofactor of a matrix, 136
column space of a matrix, 170
column vector, 107
complementary solution, 43
complex
conjugate, 237
number, 237
plane, 237
complex vector space, 148
components of a vector, 107, 145
conservation of energy, 56
convolution of functions, 100
cooling, 9
coordinates in Rn , 145
coupled mechanical vibrations, 228, 230
critical points, 12, 205
critically damped, 53
damped
forced vibrations, 67
free vibration, 53
damping coefficient, 38
dashpot, 50
defect (of an eigenvalue), 217
defective eigenvalue, 217
delta function, 96
determinant
of a (2 × 2)-matrix, 41, 126
of a square matrix, 131
diagonalizable matrix, 187
differential form, 29
Dirac delta function, 96
direct current, 73
direction field, 11
discontinuous inputs, 94
dot product, 108
drag, 17
echelon form, 113
eigenbasis, 187
eigenpair, 182
eigenvalue, 182
eigenvalue method, 214
eigenvector, 182
electric circuits, 71
electrical resonance, 76
elementary row operations, 113
elements of a matrix, 107
elimination
for first-order systems, 207
for linear systems, 105
equilibrium, 10, 12, 205
ERO, 113
Euler’s formula, 238
even permutation, 133
exact equations, 29
existence of solutions
for first-order equations, 11
for first-order systems, 204
for second-order equations, 39
exponential order, 82
external force, 38
first-order differential equation, 7
first-order linear system, 207
first-order system, 37, 203
forced vibrations, 65
forcing frequency, 65
free variables, 114
free vibrations, 38, 50
gamma function, 81
Gauss-Jordan method, 128
Gaussian elimination, 112
general solution
for 1st-order systems, 210
for 2nd-order equations, 41
for first-order equations, 8
generalized eigenvector, 217
geometric multiplicity, 185
Gram-Schmidt procedure, 176
homogeneous nth-order equation, 35
homogeneous equations (1st-order), 27
homogeneous first-order system, 207
homogeneous linear system, 117
Hooke’s law, 37
hyperbolic cosine, 84
hyperbolic sine, 84
identity matrix, 109
imaginary axis, 237
imaginary part, 237
implicit form, 14
impulsive force, 96
inconsistent equations, 106
inductor, 71
initial-value problem
for first-order equations, 8
for first-order system, 37
for second-order equation, 39
inner product, 173
integrating factor, 22
internal forces, 38
inverse Laplace transform, 83
inverse of a matrix, 125
invertible matrix, 125
jump discontinuity, 82
Laplace transform, 79
inverse, 83
leading 1, 113
leading variables, 114
length of a vector in Rn , 147
linear combination of vectors, 153
linear dependence
for functions, 40, 160
for vectors, 158
linear differential equation, 8
nth-order equations, 35
first-order equations, 21
second-order equations, 39
linear differential operator, 35
linear independence
for functions, 40, 160
for vectors, 158
linear span of vectors, 153
linear systems, 105
linear transformation, 181
linearization, 56
logistic model, 18
lower triangular form, 131
magnitude of a vector in Rn , 147
main diagonal of a matrix, 109
mass matrix, 230
mathematical models, 9
matrix, 107
addition, 107
determinant of, 131
elements of, 107
inverse, 125
multiplication, 108
scalar multiplication, 107
matrix exponential, 191
minor of a matrix, 136
mixture problems, 24
modulus, 237
multiple tank mixing, 222
multiplication of matrices, 108
multiplicity for an eigenvalue, 185
natural frequencies, 229
natural frequency, 65
nonhomogeneous nth-order equation, 35
nonhomogeneous first-order system, 207
norm of a vector, 172, 173
nullity of a matrix, 171
nullspace of a matrix, 152
odd permutation, 133
Ohm’s law, 72
open system (of tanks), 222
order (of a differential equation), 8
ordered basis, 198
ordinary differential equation, 10
orthogonal
basis, 174
matrix, 196
vectors, 172, 173, 193
orthogonally diagonalizable, 197
orthonormal
basis, 174
set, 173
overdamped, 53
parallelogram rule, 145
partial differential equation, 10
particular solution, 8
pendulum, 56
permutation, 133
phase plane portraits, 205
phase shift, 51
piecewise continuous, 82
piecewise differentiable, 86
pivot column, 115
pivot position, 115
polar form of a complex number, 238
population growth, 9
potential function, 30
practical resonance, 68, 76
projection
onto a subspace, 175
onto a vector, 175
pseudo-frequency, 54
pseudo-periodic, 54
radioactive decay, 9
radiocarbon dating, 21
rank of a matrix, 122
rank-nullity identity, 171
real axis, 237
real part, 237
real vector space, 148
reduced row-echelon form, 120
REF, 113
resistive force, 17
resistor, 71
resonance
for second-order equations, 67
for second-order systems, 232
restorative force, 37
row space of a matrix, 169
row vector, 107
row-echelon form, 113
row-equivalent matrices, 113
RREF, 120
saddle, 205
scalars, 146, 148
second-order system, 230
separable, 14
shifting theorems, 90
similar matrices, 189
singular matrix, 125
sink, 12, 205
skew-symmetric matrix, 142
slope field, 11
solution curve, 11
source, 12, 205
span(v1 , . . . , vn ), 153
spanning set, 153
spring constant, 37
square matrix, 109
stability
for first-order equations, 12
for first-order systems, 205
stable equilibrium, 12, 205
standard basis for Rn , 163
steady-periodic solution, 68
steady-periodic charge, 74
steady-periodic current, 74
step function, 82
stiffness matrix, 230
stochastic matrix, 202
straight-line solution, 214
subspace of a vector space, 151
substitution, 27, 105
successive approximations, 12
superposition, 36
symmetric matrix, 110
system
of first-order equations, 37, 203
of first-order linear equations, 207
of linear equations, 105
of second-order equations, 230
terminal velocity, 18
time lag, 51
Torricelli’s law, 14, 34
trace of a square matrix, 142
transfer function, 102
transient part, 68, 73
transition matrix, 190
transpose of a matrix, 110
trial solution, 58
triangular form, 131
trivial solution, 40, 117
undamped
forced vibrations, 66
free vibrations, 50
underdamped, 53
undetermined coefficients, 58, 212
uniqueness of solutions
for first-order equations, 11
unit step function, 82
unit vector
in Rn , 147
in an inner product space, 173
unstable equilibrium, 12, 205
upper triangular form, 131
variation of parameters, 64
vector field, 204
vector notation, 107
vector space, 148
vectors
in Rn , 145
in a vector space, 148
warming, 10
Wronskian, 41, 161
zero vector
in Rn , 145
in a vector space, 148