Differential Equations and Linear Algebra
Jason Underdown
December 8, 2014
Contents
Chapter 1. First Order Equations
1. Differential Equations and Modeling
2. Integrals as General and Particular Solutions
3. Slope Fields and Solution Curves
4. Separable Equations and Applications
5. Linear First–Order Equations
6. Application: Salmon Smolt Migration Model
7. Homogeneous Equations
Chapter 2. Models and Numerical Methods
1. Population Models
2. Equilibrium Solutions and Stability
3. Acceleration–Velocity Models
4. Numerical Solutions
Chapter 3. Linear Systems and Matrices
1. Linear and Homogeneous Equations
2. Introduction to Linear Systems
3. Matrices and Gaussian Elimination
4. Reduced Row–Echelon Matrices
5. Matrix Arithmetic and Matrix Equations
6. Matrices are Functions
7. Inverses of Matrices
8. Determinants
Chapter 4. Vector Spaces
1. Basics
2. Linear Independence
3. Vector Subspaces
4. Affine Spaces
5. Bases and Dimension
6. Abstract Vector Spaces
Chapter 5. Higher Order Linear Differential Equations
1. Homogeneous Differential Equations
2. Linear Equations with Constant Coefficients
3. Mechanical Vibrations
4. The Method of Undetermined Coefficients
5. The Method of Variation of Parameters
6. Forced Oscillators and Resonance
7. Damped Driven Oscillators
Chapter 6. Laplace Transforms
1. The Laplace Transform
2. The Inverse Laplace Transform
3. Laplace Transform Method of Solving IVPs
4. Switching
5. Convolution
Chapter 7. Eigenvalues and Eigenvectors
1. Introduction to Eigenvalues and Eigenvectors
2. Algorithm for Computing Eigenvalues and Eigenvectors
Chapter 8. Systems of Differential Equations
1. First Order Systems
2. Transforming a Linear DE Into a System of First Order DEs
3. Complex Eigenvalues and Eigenvectors
4. Second Order Systems
Chapter 1
First Order Equations
1. Differential Equations and Modeling
A differential equation is simply any equation that involves a function, say y(x)
and any of its derivatives. For example,
(1)    y'' = −y.
The above equation uses the prime notation ( ' ) to denote the derivative, which has the benefit of resulting in compact equations. However, the prime notation has the drawback that it does not indicate what the independent variable is. By just looking at equation 1 you can’t tell if the independent variable is x or t or some other variable. That is, we don’t know if we’re looking for y(x) or y(t). So sometimes we will write our differential equations using the more verbose, but also clearer, Leibniz notation.
(1)    d²y/dx² = −y
In the Leibniz notation, the dependent variable, in this case y, always appears
in the numerator of the derivative, and the independent variable always appears in
the denominator of the derivative.
Definition 1.1. The order of a differential equation is the order of the highest
derivative that appears in it.
So the order of the previous equation is two. The order of the following equation
is also two:
(2)    x(y'')² = 36(y + x).
Even though y'' is squared in the equation, the highest order derivative is still just a second order derivative.
Our primary goal is to solve differential equations. Solving a differential equation requires us to find a function that satisfies the equation. This simply means that if you replace every occurrence of y in the differential equation with the found function, you get a valid equation.
There are some similarities between solving differential equations and solving
polynomial equations. For example, given a polynomial equation such as
3x² − 4x = 4,
it is easy to verify that x = 2 is a solution to the equation simply by substituting
2 in for x in the equation and checking whether the resulting statement is true.
Analogously, it is easy to verify that y(x) = cos x satisfies, or is a solution to
equation 1 by simply substituting cos x in for y in the equation and then checking
if the resulting statement is true.
(cos x)'' ?= −cos x
(−sin x)' ?= −cos x
−cos x = −cos x ✓
The biggest difference is that in the case of a polynomial equation our solutions
took the form of real numbers, but in the differential equation case, our solutions
take the form of functions.
Example 1.2. Verify that y(x) = x³ − x is a solution of equation 2.
y'' = 6x  ⇒  x(y'')² = x(6x)² = 36x³ = 36(y + x)
♦
A basic study of differential equations involves two facets. The first is creating differential equations which encode the behavior of some real-life situation; this is called modeling. The other facet is, of course, developing systematic solution techniques. We will examine both, but we will focus on developing solution techniques.
1.1. Mathematical Modeling. Imagine a large population or colony of bacteria
in a petri dish. Suppose we wish to model the growth of bacteria in the dish. How
could we go about that? Well, we have to start with some educated guesses or
assumptions.
Assume that the rate of change of this colony in terms of population is directly
proportional to the current number of bacteria. That is to say that a larger population will produce more offspring than a smaller population during the same time
interval. This seems reasonable, since we know that a single bacterium reproduces
by splitting into two bacteria, and hence more bacteria will result in more offspring.
How do we translate this into symbolic language?
(3)    ∆P = P ∆t
This says that the change in a population depends on the size of the population
and the length of the time interval over which we make our population measurements. So if the time interval is short, then the population change will also be
small. Similarly it roughly says that more bacteria correspond to more offspring,
and vice versa.
But if you look closely, the left hand side of equation 3 has units of number
of bacteria, while the right hand side has units of number of bacteria times time.
The equation can’t possibly be correct if the units don’t match. However to fix this
we can multiply the left hand side by some parameter which has units of time, or
we can multiply the right hand side by some parameter which has units of 1/time.
Let’s multiply the right hand side by a parameter k which has units of 1/time.
Then our equation becomes:
(4)    ∆P = kP ∆t
Dividing both sides of the equation by ∆t and taking the limit as ∆t goes to zero, we get:
lim_{∆t→0} ∆P/∆t = dP/dt = kP
(5)    dP/dt = kP
Here k is a constant of proportionality, a real number which allows us to balance the units on both sides of the equation, and it also affords some freedom. In essence it allows us to defer saying how closely P and its derivative are related. If k is a large positive number, then that would imply a large rate of change, while a positive k less than one would imply a small rate of change. If k is negative, then the population is shrinking in number.
Example 1.3. If we let P(t) = Ce^(kt), then a simple differentiation reveals that this is a solution to our population model in equation 5.
Suppose that at time 0, there are 1000 bacteria in the dish. After one hour the population doubles to 2000. This data corresponds to the following two equations which allow us to solve for both C and k:
1000 = P(0) = Ce⁰ = C    ⟹    C = 1000
2000 = P(1) = Ce^k
The second equation implies 2000 = 1000e^k, which is equivalent to 2 = e^k, which is equivalent to k = ln 2. Thus we see that with these two bits of data we now know:
P(t) = 1000e^(ln(2)·t) = 1000(e^(ln 2))^t = 1000 · 2^t
This agrees exactly with our knowledge that bacteria multiply by splitting into two.
♦
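As a quick check, Maple’s dsolve reproduces this particular solution directly (a minimal sketch applied to the same model and data):
dsolve({diff(P(t), t) = k*P(t), P(0) = 1000}, P(t));   # expect P(t) = 1000*exp(k*t)
solve(1000*exp(k) = 2000, k);                          # expect ln(2)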
1.2. Linear vs. Nonlinear. As you may have surmised we will not be able
to exactly solve every differential equation that you can imagine. So it will be
important to recognize which equations we can solve and those which we can’t.
It turns out that a certain class of equations called linear equations are very
amenable to several solution techniques and will always have a solution (under
modest assumptions), whereas the complementary set of nonlinear equations are
not always solvable.
A linear differential equation is any differential equation where solution functions can be summed or scaled to get new solutions. Stated precisely, we mean:
Definition 1.4. To say a differential equation is linear is equivalent to saying: if y₁(x) and y₂(x) are any solutions to the differential equation, and c is any scalar (real) number, then
(1) y₁(x) + y₂(x) will be a solution and,
(2) cy₁(x) will be a solution.
This is a working definition, which we will change later. We will use it for
now because it is simple to remember and does capture the essence of linearity,
but we will see later on that we can make the definition more inclusive. That is
to say that there are linear differential equations which don’t satisfy our current
definition until after a certain piece of the equation has been removed.
Example 1.5. Show that y₁(x) + y₂(x) is a solution to equation 1 when y₁(x) = cos x and y₂(x) = sin x.
(y₁ + y₂)'' = (cos x + sin x)''
            = (−sin x + cos x)'
            = −cos x − sin x
            = −(cos x + sin x)
            = −(y₁ + y₂) ✓
♦
Notice that the above calculation does not prove that y'' = −y is a linear
differential equation. The reason for this is that summability and scalability have
to hold for any solutions, but the above calculation just proves that summability
holds for the two given solutions. We have no idea if there may be solutions which
satisfy equation 1 but fail the summability test.
The previous definition is useless for proving that a differential equation is
linear. However, the negation of the definition is very useful for showing that a
differential equation is nonlinear, because the requirements are much less stringent.
Definition 1.6. To say a differential equation is nonlinear is equivalent to saying: there exist solutions y₁ and y₂ of the differential equation, and a scalar (real) number c, such that
(1) y₁ + y₂ is not a solution, or
(2) cy₁ is not a solution.
Again, this is only a working definition. It captures the essence of nonlinearity,
but since we will expand the definition of linearity to be more inclusive, we must by
the same token change the definition of nonlinear in the future to be less inclusive.
So let’s look at a nonlinear equation. Let y = 1/(c − x), then y will satisfy the differential equation:
(6)    y' = y²
because:
y' = (1/(c − x))'
   = ((c − x)⁻¹)'
   = −(c − x)⁻² · (−1)
   = 1/(c − x)²
   = y²
We see that actually, y = 1/(c − x) is a whole family of solutions, because c can be any real number.
Example 1.7. Use definition 1.6 to show that equation 6 is nonlinear.
Let y₁(x) = 1/(5 − x) and y₂(x) = 1/(3 − x). We know from the previous paragraph that both of these are solutions to equation 6, but
(y₁ + y₂)' = y₁' + y₂'
           = 1/(5 − x)² + 1/(3 − x)²
           ≠ 1/(5 − x)² + 2/((5 − x)(3 − x)) + 1/(3 − x)²
           = (1/(5 − x) + 1/(3 − x))²
           = (y₁ + y₂)²
♦
2. Integrals as General and Particular Solutions
You probably didn’t realize it at the time, but every time you computed an indefinite integral in Calculus, you were solving a differential equation. For example, if you were asked to compute an indefinite integral such as ∫ f(x) dx where the integrand is some function f(x), then you were actually solving the differential equation
(7)    dy/dx = f(x).
This is due to the fact that differentiation and integration are inverses of each other up to a constant, which can be phrased mathematically as:
y(x) = ∫ (dy/dx) dx = ∫ f(x) dx = F(x) + C
if F(x) is an antiderivative of f(x). Notice that the integration constant C can be any real number, so our solution y(x) = F(x) + C to equation 7 is not a single solution but actually a whole family of solutions, one for each value of C.
Definition 1.8. A general solution to a differential equation is any solution
which has an integration constant in it.
As noted above, since a constant of integration is allowed to be any real number,
a general solution is actually an infinite set of solutions, one for each value of the
integration constant. We often say that a general solution with one integration
constant forms a one parameter family of solutions.
Example 1.9. Solve y' = x² − 3 for y(x).
y(x) = ∫ y' dx = ∫ (x² − 3) dx = x³/3 − 3x + C
Thus our general solution is y(x) = (1/3)x³ − 3x + C. Figure 2.1 shows plots of several solution curves for C values ranging from 0 to 3.
Figure 2.1. Family of solution curves for y' = x² − 3.
♦
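The same general solution can be recovered with Maple’s dsolve (a minimal check; _C1 is Maple’s name for the arbitrary constant):
dsolve(diff(y(x), x) = x^2 - 3, y(x));   # expect y(x) = x^3/3 - 3*x + _C1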
Thus we see that whenever we can write a differential equation in the form
y' = f(x) where the right hand side is only a function of x (or whatever the
independent variable is, e.g. t) and does not involve y (or whatever the dependent
variable is), then we can solve the equation merely by integrating. This is very
useful.
2.1. Initial Value Problems (IVPs) and Particular Solutions.
Definition 1.10. An initial value problem or IVP is a differential equation and
a specific point which our solution curve must pass through. It is usually written:
(8)    y' = f(x, y),    y(a) = b.
Differential equations had their genesis in solving problems of motion, where the independent variable is time, t, hence the use of the word “initial”, to convey the notion of a starting point in time.
Solving an IVP is a two step process. First you must find the general solution. Second you use the initial value y(a) = b to select one particular solution out of the whole family or set of solutions. Thus a particular solution is a single function which satisfies the governing differential equation and passes through the initial value, a.k.a. the initial condition.
Definition 1.11. A particular solution is a solution to an IVP.
Example 1.12. Solve the IVP: y' = 3x − 2,    y(2) = 5.
y(x) = ∫ (3x − 2) dx = (3/2)x² − 2x + C
y(2) = (3/2)·2² − 2·2 + C = 5    ⟹    C = 3
y(x) = (3/2)x² − 2x + 3
♦
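Handing the same IVP to dsolve should reproduce the particular solution of example 1.12 (a small check):
dsolve({diff(y(x), x) = 3*x - 2, y(2) = 5}, y(x));   # expect y(x) = (3/2)*x^2 - 2*x + 3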
2.2. Acceleration, Velocity, Position. The method of integration extends to higher order equations. For example, when confronted with a differential equation of the form:
(9)    d²y/dx² = f(x),
we simply integrate twice to solve for y(x), gaining two integration constants along the way.
y(x) = ∫ (dy/dx) dx
     = ∫ (∫ (d²y/dx²) dx) dx
     = ∫ (∫ f(x) dx) dx
     = ∫ (F(x) + C₁) dx
     = G(x) + C₁x + C₂
where we are assuming G''(x) = F'(x) = f(x).
Acceleration is the time derivative of velocity (a(t) = v'(t)), and velocity is the time derivative of position (v(t) = x'(t)). Thus acceleration a(t) is the second derivative of position x(t) with respect to time, or a(t) = x''(t).
If we let x(t) denote the position of a body, and we assume that the acceleration
that the body experiences is constant with value a, then in the language of math
this is written as:
(10)    x''(t) = a
The right hand side of this is just the constant function f(t) = a, so this equation conforms to the form of equation 9. However, the function name is x instead of y and the independent variable is t instead of x, but no matter, they are just names. To solve for x(t) we must integrate twice with respect to t, time.
(11)    v(t) = x'(t) = ∫ x''(t) dt = ∫ a dt = at + v₀
Here we’ve named our integration constant v₀ because it must match the initial velocity, i.e. the velocity of the body at time t = 0. Now we integrate again.
(12)    x(t) = ∫ v(t) dt = ∫ (at + v₀) dt = (1/2)at² + v₀t + x₀
Again, we have named the integration constant x₀ because it must match the initial position of the body, i.e. the position of the body at time t = 0.
Example 1.13. Suppose we wish to know how long it will take an object to fall
from a height of 500 feet down to the ground, and we want to know its velocity
when it hits the ground. We know from Physics that near the surface of the Earth
the acceleration due to gravity is roughly constant with a value of 32 feet per second per second (ft/s²).
Let x(t) represent the vertical position of the object with x = 0 corresponding to the ground and x(0) = x₀ = 500. Since up is the positive direction and since the acceleration of the body is down towards the earth, a = −32. Although the problem says nothing about an initial velocity, it is safe to assume that v₀ = 0.
x(t) = (1/2)at² + v₀t + x₀
x(t) = (1/2)(−32)t² + 0·t + 500
x(t) = −16t² + 500
We wish to know the time when the object will hit the ground so we wish to solve the following equation for t:
0 = −16t² + 500
t² = 500/16
t = ±√(500/16)
t = ±(5/2)√5
t ≈ ±5.59
So we find that it will take approximately 5.59 seconds to hit the earth. We
can use this knowledge and equation 11 to compute its velocity at the moment of
impact.
v(t) = at + v₀
v(t) = −32t
v(5.59) = −32 · 5.59
v(5.59) = −178.88 ft/s
v(5.59) ≈ −122 mi/hr.
♦
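The impact time and velocity in example 1.13 are easy to verify numerically in Maple (a small check; only the positive root is physically meaningful):
solve(-16*t^2 + 500 = 0, t);    # expect (5/2)*sqrt(5), -(5/2)*sqrt(5)
evalf((5/2)*sqrt(5));           # about 5.59 seconds
evalf(-32*(5/2)*sqrt(5));       # about -178.9 ft/s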
3. Slope Fields and Solution Curves
In section 1 we noticed that there are some similarities between solving polynomial
equations and solving differential equations. Specifically, we noted that it is very
easy to verify whether a function is a solution to a differential equation simply by
plugging it into the equation and checking that the resulting statement is true.
This is exactly analogous to checking whether a real number is a solution to a
polynomial equation. Here we will explore another similarity. You are certainly
familiar with using the quadratic formula for solving quadratic equations, i.e. degree
two polynomial equations. But you may not know that there are similar formulas
for solving third degree and even fourth degree polynomial equations. Interestingly,
it was proved early in the nineteenth century that there is no general formula similar
to the quadratic formula which will tell us the roots of all fifth and higher degree
polynomial equations in terms of the coefficients. Put simply, we don’t have a
formulaic way of solving all polynomial equations. We do have numerical techniques
(e.g. Newton’s Method) of approximating the roots which work very well, but these
do not reveal the exact value.
As you might suspect, since differential equations are generally more complicated than polynomial equations the situation is even worse. No procedure exists by
which a general differential equation can be solved explicitly. Thus, we are forced to
use ad hoc methods which work on certain classes of differential equations. Therefore any study of differential equations necessarily requires one to learn various ways
to classify equations based upon which method(s) will solve the equation. This is
unfortunate.
3.1. Slope Fields and Graphing Approximate Solutions. Luckily, in the case
of first order equations a simple graphical method exists by which we may estimate
solutions by constraints on their graphs. This method of approximate solution uses
a special plot called a slope field. Specifically, if we can write a differential equation
in the form:
(13)    dy/dx = f(x, y)
then we can approximate solutions via the slope field plot. So how does one construct such a plot?
The answer lies in noticing that the right hand side of equation 13 is a function of points in the xy–plane whose value is exactly the slope of y(x), the solution function we seek, at that point. If we know the slope of a solution at every point, then we can graphically reconstruct the solution function y(x).
Creating a slope field plot is normally done via software on a computer. The
basic algorithm that a computer employs to do this is essentially the following:
(1) Divide the xy plane evenly into a grid of squares.
(2) For each point (xᵢ, yᵢ) in the grid do the following:
(a) Compute the slope, dy/dx = f(xᵢ, yᵢ).
(b) Draw a small bar centered on the point (xᵢ, yᵢ) with the slope computed above. (Each bar should be of equal length and short enough so that they do not overlap.)
Let’s use Maple to create a slope field plot for the differential equation
(14)    y' = y/(x² + 1).
with(DEtools):
DE := diff(y(x), x) = y(x)/(x^2+1);
dfieldplot(DE, y(x), x = -4..4, y = -4..4, arrows = line);
Maple Listing 1. Slope field plot example. See figure 3.1.
Because any solution curve must be tangent to the bars in the slope field plot, it is fairly easy for your eye to detect possible routes that a solution curve could take.
Figure 3.1. Slope field plot for y' = y/(x² + 1).
One can immediately gain a feel for the qualitative behavior of a solution
which is often more valuable than a quantitative solution when modeling.
3.2. Creating Slope Field Plots By Hand. The simple algorithm given above
is fine for a computer program, but is very hard for a human to use in practice.
However there is a simpler algorithm which can be done by hand with pencil and
graph paper. The main idea is to find the isoclines in the slope field, and plot regularly spaced, identical slope bars over the entire length of each isocline.
Definition 1.14. An isocline is a line or curve along which the slope f(x, y) is constant; in the plot it is decorated by regularly spaced short bars of that constant slope.
Example 1.15. Suppose we wish to create a slope–field plot for the differential equation
dy/dx = x − y = f(x, y).
The method involves two steps. First, we create a table. Each row in the
table corresponds to one isocline. Second, for each row in the table we graph the
corresponding isocline and decorate it with regularly spaced bars, all of which have
equal slope. The slope corresponds to the value in the first column of the table.
Table 1 contains the data for seven isoclines, one for each integer slope value
from −3, . . . , 3. We must graph each equation of a line from the third column, and
decorate it with regularly spaced bars where the slope comes from the first column.
Figure 3.2. Isocline slope–field plot for y' = x − y.
m     m = f(x, y)     y = h(x)
−3    −3 = x − y      y = x + 3
−2    −2 = x − y      y = x + 2
−1    −1 = x − y      y = x + 1
 0     0 = x − y      y = x
 1     1 = x − y      y = x − 1
 2     2 = x − y      y = x − 2
 3     3 = x − y      y = x − 3
Table 1. Isocline method.
♦
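The hand-drawn isocline plot can be compared with a computer-generated slope field. A minimal Maple sketch in the style of listing 1 (the plotting ranges here are my own choice):
with(DEtools):
dfieldplot(diff(y(x), x) = x - y(x), y(x), x = -4..4, y = -4..4, arrows = line);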
3.3. Existence and Uniqueness Theorem. It would be useful to have a simple
test that tells us when a differential equation actually has a solution. We need
to be careful here though, because recall that a general solution to a differential
equation is actually an infinite family of solution functions, one for each value of
the integration constant. We need to be more specific. What we should really ask
is, “Does my IVP have a solution?” Recall that an IVP (Initial Value Problem) is
a differential equation and an initial value,
(8)    y' = f(x, y),    y(a) = b.
If a particular solution exists, then our follow-up question should be, “Is my particular solution unique?”. The following theorem gives a test that can be performed to answer both questions.
Theorem 1.16 (Existence and Uniqueness). Consider the IVP
dy/dx = f(x, y),    y(a) = b.
(1) Existence: If f(x, y) is continuous on some rectangle R in the xy–plane which contains the point (a, b), then there exists a solution to the IVP on some open interval I containing the point a.
(2) Uniqueness: If, in addition to the conditions in (1), ∂f/∂y is continuous on R, then the solution to the IVP is unique in I.
Example 1.17. Consider the IVP: y' = ∛y,    y(0) = 0. Use theorem 1.16 to determine (1) whether or not a solution to the IVP exists, and (2) if one does, whether it is unique.
(1) The cube root function is defined for all real numbers, and is continuous everywhere, thus a solution to the IVP exists.
(2) f(x, y) = ∛y = y^(1/3)
∂f/∂y = (1/3)y^(−2/3) = 1/(3·∛(y²))
which is discontinuous at (0, 0), thus the solution is not unique.
♦
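To see the failure of uniqueness concretely, note that both y(x) ≡ 0 and y(x) = (2x/3)^(3/2) (for x ≥ 0) satisfy this IVP: for the second function, y' = (2x/3)^(1/2) = ((2x/3)^(3/2))^(1/3) = ∛y, and y(0) = 0. Two different solution curves pass through (0, 0).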
4. Separable Equations and Applications
In the previous section we explored a method of approximately solving a large class of first order equations of the form dy/dx = f(x, y), where the right hand side is any
function of both the independent variable x and the dependent variable y. The
graphical method of creating a slope field plot is useful, but not ideal because it
does not yield an exact solution function.
Luckily, a large subclass (subset) of these equations, the so–called separable
equations can be solved exactly. Essentially an equation is separable if the right
hand side can be factored into a product of two functions, one a function of the
independent variable, and the other a function of the dependent variable.
Definition 1.18. A separable equation is any differential equation that can be
written in the form:
(15)    dy/dx = f(x)g(y).
Example 1.19. Determine whether the following equations are separable or not.
(1) dy/dx = 3x²y − 5xy
(2) dy/dx = (x − 4)/(y² + y + 1)
(3) dy/dx = √(xy)
(4) dy/dx = y²
(5) dy/dx = 3y − x
(6) dy/dx = sin(x + y) + sin(x − y)
(7) dy/dx = e^(xy)
(8) dy/dx = e^(x+y)
Solutions:
(1) separable: 3x²y − 5xy = (3x² − 5x)y
(2) separable: (x − 4)/(y² + y + 1) = (x − 4) · 1/(y² + y + 1)
(3) separable: √(xy) = √x · √y
(4) separable: y² = y² · 1
(5) not separable
(6) separable: sin(x + y) + sin(x − y) = 2 sin(x) cos(y)
(7) not separable
(8) separable: e^(x+y) = e^x · e^y
♦
Before explaining and justifying the method of separation of variables formally,
it is helpful to see an example of how it works. A good way to remember this
method is to remember that it allows us to treat derivatives written using the
Leibniz notation as if they were actual fractions.
Example 1.20. Solve the initial value problem:
dy/dx = −kxy,    y(0) = 4,
assuming k is a positive constant.
dy/y = −kx dx
∫ dy/y = −k ∫ x dx
ln|y| = −k·x²/2 + C
e^(ln|y|) = e^(−k·x²/2 + C)
|y| = e^(−(k/2)x²) · e^C
y = C₀·e^(−(k/2)x²)        (let C₀ = e^C)
Now plug in x = 0 and set y = 4 to solve for our parameter C₀.
4 = C₀e⁰ = C₀    ⟹    y(x) = 4e^(−(k/2)x²)
♦
There are several steps in the above solution which should raise an eyebrow.
First, how can you pretend that the derivative dy/dx is a fraction when clearly it
is just a symbol which represents a function? Second, why are we able to integrate
with respect to x on the right hand side, but with respect to y which is a function of x
on the left hand side? The rest of the solution just involves algebraic manipulations
and is fine.
The answer to both questions above is that what we did is simply “shorthand”
for a more detailed, fully correct solution. Let’s start over and solve equation 15.
dy/dx = f(x)g(y)
(1/g(y)) dy/dx = f(x)
So far, so good, all we have to watch out for is when g(y) = 0, but that just means that our solutions y(x) might not be defined for the whole real line. Next, let’s integrate both sides of the equation with respect to x, and we’ll rewrite y as y(x) to remind us that it is a function of x.
∫ (1/g(y(x))) (dy/dx) dx = ∫ f(x) dx
Now, to help us integrate the left hand side, we will make a u–substitution.
u = y(x),    du = (dy/dx) dx
∫ (1/g(u)) du = ∫ f(x) dx
This equation matches up with the second line in the example above. The
“shorthand” technique used in the example skips the step of making the u–substitution.
If we can integrate both sides, then on the left hand side we will have some
function of u = y(x), which we can hopefully solve for y(x). However, even if we
cannot solve for y(x) explicitly, we will still have an implicit solution which can be
useful.
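As a sanity check on example 1.20, Maple’s dsolve applied to the same IVP should return the same particular solution (a minimal sketch):
dsolve({diff(y(x), x) = -k*x*y(x), y(0) = 4}, y(x));   # expect y(x) = 4*exp(-(k/2)*x^2)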
Now, let’s use the above technique of separation of variables to solve the Population model from section 1.
Example 1.21. Find the general solution to the population model:
(5)    dP/dt = kP.
dP/P = k dt
∫ dP/P = ∫ k dt
ln|P| = kt + C
e^(ln|P|) = e^(kt+C)
e^(ln|P|) = e^(kt) · e^C,    let P₀ = e^C
(16)    P(t) = P₀e^(kt)
♦
The separation of variables solution technique is important because it allows
us to solve several nonlinear equations. Let’s use the technique to solve equation 6
which is the first order, nonlinear differential equation we examined in section 1.
Example 1.22. Solve y' = y².
dy/dx = y²
∫ dy/y² = ∫ dx
∫ y⁻² dy = ∫ dx
−y⁻¹ = x + C
−1/y = x + C
1/y = C − x        (absorb negative sign into C)
y(x) = 1/(C − x)
♦
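A quick dsolve check of example 1.22 (a sketch; Maple writes the arbitrary constant as _C1, so the output may appear in an algebraically equivalent form):
dsolve(diff(y(x), x) = y(x)^2, y(x));   # expect a family equivalent to y(x) = 1/(C - x)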
4.1. Radioactive Decay. Most of the carbon in the world is of the isotope carbon–12 (¹²₆C), but there are small amounts of carbon–14 (¹⁴₆C) continuously being created in the upper atmosphere as a result of cosmic rays (neutrons in this case) colliding with nitrogen.
¹₀n + ¹⁴₇N → ¹⁴₆C + ¹₁p
The resulting ¹⁴₆C is radioactive and will eventually beta decay to ¹⁴₇N, an electron and an anti–neutrino:
¹⁴₆C → ¹⁴₇N + e⁻ + ν̄ₑ
The half–life of ¹⁴₆C is 5732 years. This is the time it takes for half of the ¹⁴₆C in a sample to decay to ¹⁴₇N. The half–life is determined experimentally. From this knowledge we can solve for the constant of proportionality k:
(1/2)P₀ = P₀e^(k·5732)
1/2 = e^(k·5732)
ln(1/2) = 5732k
k = (ln(1) − ln(2))/5732
k = −ln(2)/5732
k ≈ −0.00012092589
The fact that k is negative is to be expected, because we are expecting the
population of carbon–14 atoms to diminish as time goes on since we are modeling
exponential decay. Let us now see how we can use our new knowledge to reliably
date ancient artifacts.
All living things contain trace amounts of carbon–14. The proportion of carbon–
14 to carbon–12 in an organism is equal to the proportion in the atmosphere. This
is because although carbon atoms in the organism continually decay, new radioactive carbon–14 atoms are taken in through respiration or consumption. That is to
say that a living organism whether it be a plant or animal continually replenishes
its supply of carbon–14. However, once it dies the process stops.
If we assume that the amount of carbon–14 in the atmosphere has remained
constant for the past several thousand years, then we can use our knowledge of differential equations to carbon date ancient artifacts that contain once living material
such as wood.
Example 1.23 (Carbon Dating). The logs of an old fort contain only 92% of the
carbon–14 that modern day logs of the same type of wood contain. Assuming that
the fort was built at about the same time as the logs were cut down, how old is the
fort?
Let’s assume that the decrease in the population of carbon–14 atoms is governed by the population equation dy/dt = ky, where y represents the number of carbon–14 atoms. From previous work, we know that the solution to this equation is y(t) = y₀e^(kt), where y₀ is the initial amount of carbon–14. We know that currently the wood contains 92% of the carbon–14 that it would have had upon being cut down, thus we can solve:
0.92y₀ = y₀e^(kt)
ln(0.92) = kt
t = ln(0.92)/k
t = 5732 ln(0.92)/(−ln(2))
t ≈ 690 years
♦
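The age in example 1.23 follows from evaluating the final formula numerically (a small check):
evalf(5732*ln(0.92)/(-ln(2)));   # about 689.5, i.e. roughly 690 years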
4.2. Diffusion. Another extremely important separable
equation comes about from modeling diffusion. Diffusion is
the spreading of something from a concentrated state to a
less concentrated state.
Figure 4.1. Cell in salt bath.
We will model the diffusion of salt across a semi–permeable membrane such as a cell wall. Imagine a cell containing a salt solution, immersed in a bath of saline solution. If the salt concentration inside the cell is higher than outside the cell, then salt will, on average, mostly flow out of the cell, and vice versa. Let’s assume that the rate of change of salt concentration in the cell is proportional to the difference between the concentrations outside and inside the cell. Also, let’s assume that the surrounding bath is so much larger in volume than the cell that its concentration remains essentially constant, because the outflow from the cell is minuscule. We must translate these ideas into
a model. If we let y(t) represent the salt concentration inside the cell, and A the
constant concentration of the surrounding bath, then we get the diffusion equation:
(17)    dy/dt = k(A − y)
Again, k is a constant of proportionality with units, 1/time, and we assume k > 0.
This is a separable equation, so we know how to solve it.
∫ dy/(A − y) = ∫ k dt
−∫ du/u = ∫ k dt        (u = A − y, −du = dy)
−ln|A − y| = kt + C
|A − y| = e^(−kt−C)        (let C₀ = e^(−C))
|A − y| = C₀e^(−kt)
A − y = C₀e^(−kt)  if A > y,    A − y = −C₀e^(−kt)  if A < y
y = A − C₀e^(−kt)  if A > y,    y = A + C₀e^(−kt)  if A < y
Thus we get two solutions depending on which concentration is initially higher.
(18)    y(t) = A − C₀e^(−kt)    (A > y)
(19)    y(t) = A + C₀e^(−kt)    (A < y)
Actually, there is a third rather uninteresting solution which occurs when A = y, but then the right hand side of equation 17 is simply 0, which forces y(t) = A, the constant solution. A remark is in order here. Rather than memorizing the solution, it is far better to become familiar with the steps of the solution.
Example 1.24. Suppose a cell with a salt concentration of 5% is immersed in a
bath of 15% salt solution. If the concentration in the cell doubles to 10% in 10
minutes, how long will it take for the salt concentration in the cell to reach 14%?
We wish to solve the IVP:
dy/dt = k(.15 − y),    y(0) = .05,
along with the extra information y(10) = .10.
∫ dy/(.15 − y) = ∫ k dt
−∫ du/u = ∫ k dt        (u = .15 − y, −du = dy)
−ln|.15 − y| = kt + C
|.15 − y| = e^(−kt−C)
.15 − y = C₀e^(−kt)
y = .15 − C₀e^(−kt)
.05 = .15 − C₀e⁰    ⇒    C₀ = .10
Now we can use the second condition (a known point on the solution curve) to determine k:
y(t) = .15 − .10e^(−kt)
.10 = .15 − .10e^(−k·10)
e^(−k·10) = (.15 − .10)/.10 = 1/2
−k·10 = ln(1/2)
k = (ln(2) − ln(1))/10
k = ln(2)/10
Figure 4.2 graphs a couple of solution curves, for a few different starting cell
concentrations. Notice that in the limit, as time goes to infinity all cells placed in
this salt bath will approach a concentration of 15%. In other words, all cells will
eventually come to equilibrium with their environment.
with(DEtools):
DE := diff(y(t), t) = k*(A - y(t));
A := .15;
k := ln(2)/10;
IVS := [y(0) = .25, y(0) = .15, y(0) = .05];  # initial values
DEplot(DE, y(t), t = 0..60, IVS, y = 0..0.3, linecolor = navy);
Maple Listing 2. Diffusion example. See figure 4.2.
Figure 4.2. Three solution curves for example 1.24, showing the change in
salt concentration due to diffusion.
Finally, we wish to find the time at which the salt concentration of the cell will
be exactly 14%. To find this time, we solve the following equation for t:
.14 = .15 − .10e^(−kt)
e^(−kt) = (.15 − .14)/.10 = .1
−kt = ln(1/10)
−kt = ln(1) − ln(10)
−kt = −ln(10)
t = ln(10)/k
t = 10 ln(10)/ln(2)
t ≈ 33.22 minutes
♦
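The final time in example 1.24 is easy to confirm numerically (a small check):
evalf(10*ln(10)/ln(2));   # about 33.22 minutes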
5. Linear First–Order Equations
5.1. Linear vs. Nonlinear Redux. In section 1 we defined a differential equation to be linear if all of its solutions satisfied summability and scalability.
A first–order, linear differential equation is any differential equation which can
be written in the following form:
(20)    a(x)y' + b(x)y = c(x).
If we think of y' and y as variables, then this equation is reminiscent of linear equations from algebra, except that the coefficients are now allowed to be functions of the independent variable x, instead of real numbers. Of course, y and y' are functions, not variables, but the analogy is useful. Notice that the coefficient functions are strictly forbidden from being functions of y or any of its derivatives.
The above definition of linear extends to higher–order equations. For example,
a fourth order, linear differential equation can be written in the form:
(21)    a₄(x)y⁽⁴⁾ + a₃(x)y''' + a₂(x)y'' + a₁(x)y' + a₀(x)y = f(x)
Definition 1.25. In general, an n–th order, linear differential equation is any
equation which can be written in the form:
(22)    aₙ(x)y⁽ⁿ⁾ + aₙ₋₁(x)y⁽ⁿ⁻¹⁾ + ··· + a₁(x)y' + a₀(x)y = f(x).
This is not just a working definition. It is the definition that we will continue
to use throughout the text. Notice that this definition is very different from the
previous definition in section 1. That definition suffered from the defect that it was
impossible to positively determine whether an equation was linear. We could only
use it to determine when a differential equation is nonlinear. The above definition
is totally different. You can use the above definition to tell on sight (with practice)
whether or not a given differential equation is linear. Also notice that it suffers from
being a poor tool for determining whether a given differential equation is nonlinear.
This is because you don’t know if perhaps you are just not being clever enough to write the equation in the form of the definition.
These notes focus on solving linear equations, however recall from section 4
that we can solve nonlinear, first–order equations when they are separable. However, in general, solving higher order, nonlinear differential equations is much more
difficult. However, not all is lost. A Norwegian mathematician named Sophus Lie
(pronounced “lee”) discovered that if a differential equation possesses a type of transformational symmetry, then that symmetry can be used to find solutions of the
equation. His work led a German mathematician, Hermann Weyl, to extend Lie’s
ideas and today Weyl’s work forms the foundations of much of modern Quantum
Mechanics. Lie’s symmetry methods are beyond the scope of this book, but if you
are a Physics student, you should definitely look into them after completing this
course.
5.2. The Integrating Factor Method. Good news. We can solve any first
order, linear differential equation! The caveat here is that the method involves
integration, so a solution function might have to be defined in terms of an integral,
that is, it might be an accumulation function.
The first step in this method is to divide both sides of equation 20 by the
coefficient function of y', i.e. a(x).
a(x)y' + b(x)y = c(x)    ⟹    y' + (b(x)/a(x))y = c(x)/a(x)
We will rename b(x)/a(x) to p(x) and c(x)/a(x) to q(x) and rewrite this equation in what we will call standard form for a first order, linear equation.
(23)    y' + p(x)y = q(x)
The reason for using p(x) and q(x) is simply because they are easier to write
than b(x)/a(x) and c(x)/a(x). The heart of the method is what follows. If the
left hand side of equation 23 were the derivative of some expression, then we could
perhaps get rid of the prime on y' by integrating both sides and then algebraically solve for y(x). Notice that the left hand side of equation 23 almost resembles the result of differentiating the product of two functions. Recall the product rule:
d/dx [uv] = u'v + uv'.
Perhaps we can multiply both sides of equation 23 by something that will make
the left hand side into an expression which is the derivative of a product of two
functions. Remember, however, that we must multiply both sides of the equation
by the same factor or else we will be solving an entirely different equation. Let’s
call this factor ρ(x) because the Greek letter “rho” resembles the Latin letter “p”,
and we will see that p(x) must be related to ρ(x). That is we want:
(24)    d/dx [yρ] = y'ρ + ypρ
By comparing with the product rule, we find that if ρ' = pρ, then the expression y'ρ + ypρ will indeed be the derivative of the product yρ. Notice that we have reduced the problem down to solving a first order, separable equation that we know how to solve.
(25)    ρ' = p(x)ρ    ⟹    ρ = e^(∫p(x)dx)
Upon multiplying both sides of equation 23 by the integrating factor ρ from
equation 25, we get:
(y·e^(∫p(x)dx))' = q(x)·e^(∫p(x)dx)
∫ (y·e^(∫p(x)dx))' dx = ∫ q(x)·e^(∫p(x)dx) dx
y·e^(∫p(x)dx) = ∫ q(x)·e^(∫p(x)dx) dx
y = e^(−∫p(x)dx) ∫ q(x)·e^(∫p(x)dx) dx
You should not try to memorize the formula above. Instead remember the
following steps:
(1) Put the first order linear equation in standard form.
(2) Calculate ρ(x) = e^(∫p(x)dx).
(3) Multiply both sides of the equation by ρ(x).
(4) Integrate both sides.
(5) Solve for y(x).
Example 1.26. Solve xy' − y = x³ for y(x).
(1) y' − (1/x)y = x²
(2) ρ(x) = e^(−∫dx/x) = e^(−ln|x|) = e^(ln(|x|⁻¹)) = 1/|x| = 1/x    (x > 0)
(3) (1/x)y' − (1/x²)y = x
(4) ∫ (y/x)' dx = ∫ x dx    ⟹    y/x = (1/2)x² + C    (x > 0)
(5) y = (1/2)x³ + Cx
♦
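Maple’s dsolve can be used to confirm example 1.26 (a minimal sketch; _C1 plays the role of C):
dsolve(x*diff(y(x), x) - y(x) = x^3, y(x));   # expect y(x) = (1/2)*x^3 + _C1*x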
An important fact to notice is that we ignore the constant of integration when
computing the integrating factor ρ. This is because the constant of integration is
part of the exponent of e. Assume P(x) is an antiderivative of p(x), then
ρ = e^(∫p(x)dx) = e^(P(x)+C) = e^C · e^(P(x)) = C₁e^(P(x)).
Since we multiply both sides of the equation by the integrating factor, the C₁’s cancel out.
5.3. Mixture Problems. One very common modeling technique heavily used
throughout the sciences is called compartmental analysis. The idea is to model
the spread of some measurable quantity such as a chemical as it travels from one
compartment to the next. Compartment models are used in many fields including
medicine, epidemiology, engineering, physics, climate science and the social sciences.
Figure 5.1. A brine mixing tank
We will build a simple model based upon a brine mixing tank. Imagine a mixing
tank with a brine solution flowing into the tank, being well mixed, and then flowing
out a spigot. If we let x(t) represent the amount of salt in the tank at time t, then
the main idea of the model is:
dx/dt = “rate in − rate out”.
We will use the following names/symbols for the different quantities in the model:
Symbol    Interpretation
x(t)   =  amount of salt in the tank (lbs)
cᵢ(t)  =  concentration of incoming solution (lbs/gal)
fᵢ(t)  =  flow rate of incoming solution (gal/min)
cₒ(t)  =  concentration of outgoing solution (lbs/gal)
fₒ(t)  =  flow rate of outgoing solution (gal/min)
v(t)   =  amount of brine in the tank (gal)
Notice that if you multiply a concentration by a flow rate, then the units will
be lbs/min which exactly match the units of the derivative dx/dt, hence our model
is:
(26)    dx/dt = cᵢ(t)fᵢ(t) − cₒ(t)fₒ(t)
Often, cᵢ, fᵢ and fₒ will be fixed quantities, but cₒ(t) depends upon the amount of salt in the tank at time t, and the volume of brine in the tank at that time. If we assume that the incoming salt solution and the solution in the tank are perfectly mixed, then:
(27)    cₒ(t) = x(t)/v(t).
Often the flow rate in, fᵢ, and the flow rate out, fₒ, will be equal. When this is the case, the volume of the tank will remain constant. However, if the two flow rates do not match, then v(t) = [fᵢ(t) − fₒ(t)]t + v₀, where v₀ is the initial volume of the tank. Now we can rewrite equation 26 in the same form as the standard first order linear equation.
(28)    dx/dt + (fₒ(t)/v(t))x = cᵢ(t)fᵢ(t)
Example 1.27 (Brine Tank). A tank initially contains 200 gallons of brine, holding
50 lbs of salt. Salt water (brine) containing 2 lbs of salt per gallon flows into the
tank at a constant rate of 4 gal/min. The mixture is kept uniform by constant
stirring, and the mixture flows out at a rate of 4 gal/min. Find the amount of salt
in the tank after 40 minutes.
dx/dt = cᵢfᵢ − cₒfₒ
dx/dt = (2 lb/gal)(4 gal/min) − (x lb / 200 gal)(4 gal/min)
dx/dt = 8 − (1/50)x
dx/dt + (1/50)x = 8
This equation can be solved via the integrating factor technique.
ρ(t) = e^(∫(1/50)dt) = e^(t/50)
x'e^(t/50) + (1/50)x·e^(t/50) = 8e^(t/50)
d/dt [x·e^(t/50)] = 8e^(t/50)
∫ d/dt [x·e^(t/50)] dt = 8 ∫ e^(t/50) dt
x·e^(t/50) = 8·50·e^(t/50) + C
x(t) = e^(−t/50)[400e^(t/50) + C]
x(t) = 400 + Ce^(−t/50)
Next we apply the initial condition x(0) = 50:
50 = 400 + Ce⁰    ⟹    C = −350
Finally, we compute x(40).
x(t) = 400 − 350e^(−t/50)
x(40) = 400 − 350e^(−40/50)
x(40) ≈ 242.7 lbs
Notice that lim_{t→∞} x(t) = 400, which is exactly how much salt would be in a 200 gallon tank filled with brine at the incoming concentration of 2 lbs/gal.
♦
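The brine tank IVP from example 1.27 can be checked the same way (a sketch; rhs extracts the right hand side of the returned equation):
sol := dsolve({diff(x(t), t) + x(t)/50 = 8, x(0) = 50}, x(t));   # expect x(t) = 400 - 350*exp(-t/50)
evalf(subs(t = 40, rhs(sol)));                                   # about 242.7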
In the previous example the inflow rate, fᵢ, and the outflow rate, fₒ, were equal. This results in a convenient situation where the volume in the tank remains constant. However, this does not have to be the case. If fᵢ ≠ fₒ, then we need to find a valid expression for v(t).
Example 1.28. Suppose we again have a 200 gallon tank that is initially filled
with 50 gallons of pure water. If water flows in at a rate of 5 gal/min and flows out
at a rate of 3 gal/min, when will the tank be full?
The volume of fluid in the tank depends on two factors: the initial volume of the tank, and the difference in flow rates.
v(t) = v₀ + [fᵢ(t) − fₒ(t)]t
In this example, we have:
v(t) = 50 + [5 − 3]t = 50 + 2t.
The tank will be completely full when v(t) = 200, and this will occur when t = 75.
♦
6. Application: Salmon Smolt Migration Model
Salmon spend their early life in rivers, and then swim out to sea where they live
their adult lives and gain most of their body mass. When they have matured, they
return to the rivers to spawn. Usually they return with uncanny precision to the
river where they were born, and even to the very spawning ground of their birth.
The salmon run is the time when adult salmon, which have migrated from
the ocean, swim to the upper reaches of rivers where they spawn on gravel beds.
Unfortunately, the building of dams and the reservoirs produced by these dams have
disrupted both the salmon run and the subsequent migration of their offspring to
the ocean.
Luckily, the problem of how to allow the adult salmon to migrate upstream past
the tall dams has been solved with the introduction of fish ladders and in a few
circumstances fish elevators. These devices allow the salmon to rise up in elevation
to the level of the reservoir and thus overcome the dam.
However, the reservoirs still cause problems for the new generation of salmon.
About 90 to 150 days after deposition, the eggs or roe hatch. These young salmon
called fry remain near their birthplace for 12 to 18 months before traveling downstream towards the ocean. Once they begin this migration to the ocean they are
called smolts.
The problem is that the reservoirs tend to be quite large, so the smolt population becomes far less concentrated in the reservoir water than their original
the reservoir through the spillway has a very low concentration of smolts. This
increases the time required for the migration. The more time the smolts spend in
the reservoir, the more likely it is that they will be preyed upon by larger fish.
The question is how to speed up smolt migration through reservoirs in order
to keep the salmon population at normal levels.
Let s(t) be the number of smolts in the reservoir. It is impractical to measure
the concentration of smolts in the river which feeds the reservoir (the tank). Instead
we will assume that the smolts arrive at a steady rate, r which has units of fish/day.
If we assume the smolts spread out thoroughly through the reservoir, then the
outflow concentration of the smolts is simply the number of smolts in the reservoir,
s(t) divided by the volume of the reservoir which for this part of the problem we
will assume remains constant, v. Finally, assume the outflow of water from the
reservoir is constant and denote it by f . We have the following IVP:
(29)    ds/dt = r − (s(t)/v)·f,    s(0) = s₀
We can use the integrating factor method to show that the solution to this IVP is:
(30)    s(t) = vr/f + (s₀ − vr/f)·e^(−(f/v)t).
ds/dt + (f/v)s = r
ρ(t) = e^(∫(f/v)dt) = e^((f/v)t)
s·e^((f/v)t) = ∫ r·e^((f/v)t) dt
s·e^((f/v)t) = (vr/f)·e^((f/v)t) + C
Multiply both sides by e^(−(f/v)t):
s(t) = vr/f + C·e^(−(f/v)t)        (general solution)
Use the initial value, s(0) = s₀, to find C:
s₀ = vr/f + C·e⁰
C = s₀ − vr/f
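Equation (30) can also be confirmed directly with dsolve (a minimal check; here s0 is just a symbolic name for the initial value):
dsolve({diff(s(t), t) = r - (f/v)*s(t), s(0) = s0}, s(t));
# expect s(t) = v*r/f + (s0 - v*r/f)*exp(-(f/v)*t), which is equation (30)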
In the questions that follow, assume the following values, and keep all water
measurements in millions of gallons.
r = 1000 fish/day
v = 50 million gallons
f = 1 million gallons/day
s₀ = 25000 fish
(1) How many fish are initially exiting the reservoir per day?
(2) How many days will it take for the smolt population in the reservoir to reach
40000?
One way to allow the smolts to pass through the reservoir more quickly is to
draw down the reservoir. This means letting more water flow out than is flowing in.
Reducing the volume of the reservoir increases the concentration of smolts resulting
in a higher rate of smolts exiting the reservoir through the spillway.
This situation can be modeled by the following IVP:
ds/dt = r − (s(t)/(v₀ + ∆f·t))·fₒᵤₜ,    s(0) = s₀,
where v₀ is the initial volume of the reservoir and ∆f = fᵢₙ − fₒᵤₜ. Use this model and
fᵢₙ = 1 mil gal/day
fₒᵤₜ = 2 mil gal/day
to find a function s(t) which gives the number of smolts in the reservoir at time t.
(3) How many days will it take to reduce the smolt population from 25000 down
to 20000? And what will the volume of the reservoir be?
7. Homogeneous Equations
A homogeneous function is a function with multiplicative scaling behaviour. If the
input is multiplied by some factor then the output is multiplied by some power of
this factor. Symbolically, if we let α be a scalar (any real number), then a function f(x) is homogeneous if f(αx) = α^k f(x) for some positive integer k. For example, f(x) = 3x is homogeneous of degree 1 because
f(αx) = 3(αx) = α·3x = αf(x).
In this example k = 1, hence we say f is homogeneous of degree one. A function
whose graph is a line which does not pass through the origin, such as g(x) = 3x + 1
is not homogeneous because,
g(αx) = 3(αx) + 1 = α(3x) + 1 ≠ α(3x + 1) = αg(x).
Definition 1.29. A multivariable function, f(x, y, z) is homogeneous of degree k, if given a real number α the following holds
f(αx, αy, αz) = α^k f(x, y, z).
In other words, scaling all of the inputs by the same factor results in the output being scaled by some power of that factor.
Monomials in n variables form homogeneous functions. For example, the monomial in three variables f(x, y, z) = 4x³y⁵z² is homogeneous of degree 10 since
f(αx, αy, αz) = 4(αx)³(αy)⁵(αz)² = α¹⁰(4x³y⁵z²) = α¹⁰f(x, y, z).
Clearly, the degree of a monomial function is simply the sum of the exponents
on each variable. Polynomials formed from monomials of the same degree are
homogeneous functions. For example, the polynomial function
g(x, y) = x³ + 5x²y + 9xy² + y³
is homogeneous of degree three since g(αx, αy) = α³g(x, y).
Definition 1.30. A first order differential equation is homogeneous if it can be
written in the form
(31)    a(x, y) dy/dx + b(x, y) = 0,
where a(x, y) and b(x, y) are homogeneous functions of the same degree.
Suppose both a(x, y) and b(x, y) from equation (31) are of degree k, then we
can rewrite equation (31) in the following manner:
(32)    dy/dx = −b(x, y)/a(x, y) = −(xᵏ·b(1, y/x))/(xᵏ·a(1, y/x)) = −b(1, y/x)/a(1, y/x) = F(y/x).
An example will illustrate the rewrite rule demonstrated in equation (32).
Example 1.31. Transform the following first order, homogeneous equation into the form dy/dx = F(y/x).
(x² + y²) dy/dx + (x² + 2xy + y²) = 0
dy/dx = −(x² + 2xy + y²)/(x² + y²)
dy/dx = −(1 + 2(y/x) + (y/x)²)/(1 + (y/x)²)
♦
Definition 1.32. A multivariable function, f (x, y, z) is called scale invariant if
given any scalar α,
f (αx, αy, αz) = f (x, y, z).
Lemma 1.33. A function of two variables f(x, y) is scale invariant iff the function depends only on the ratio y/x of the two variables. In other words, there exists a function F such that
f(x, y) = F(y/x).
Proof.
(⇒) Assume f(x, y) is scale invariant, then for all scalars α, f(αx, αy) = f(x, y). Pick α = 1/x, then f(αx, αy) = f(x/x, y/x) = f(1, y/x) = F(y/x).
(⇐) Assume f(x, y) = F(y/x), then f(αx, αy) = F(αy/αx) = F(y/x) = f(x, y).
Thus by the lemma, we could have defined a first order, homogeneous equation
as one where the derivative is a scale invariant function. Equivalently we could
have defined it to be an equation which has the form:
(33)    dy/dx = F(y/x).
7.1. Solution Method. Homogeneous differential equations are special because
they can be transformed into separable equations.
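A sketch of the standard substitution: let v = y/x, so that y = vx and dy/dx = v + x·dv/dx. Substituting into dy/dx = F(y/x) gives
v + x·dv/dx = F(v),    i.e.    dv/(F(v) − v) = dx/x    (wherever F(v) ≠ v),
which is separable and can be handled by the method of section 4.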
Chapter 2
Models and Numerical Methods
1. Population Models
1.1. The Logistic Model. Our earlier population model suffered from the fact
that eventually the population would “blow up” and grow at unrealistic rates. This
was due to the fact that the solution involved an exponential function. Recall the
model and solution:
(5)     dP/dt = kP
(16)    P(t) = P₀e^(kt).
Bacteria in a petri dish can’t reproduce forever because they eventually run out
of food and space. In our previous population model, the constant of proportionality
was actually the birth rate minus the death rate: k = β − δ, where k and therefore
also β and δ have units of 1/time.
To make our model more realistic, we need the birth rate to taper off as the population reaches a certain number or size. Perhaps the simplest way to accomplish
this is to have it decrease linearly with population size.
β(P) = β₀ − β₁P
For this to make sense in the original equation, β₀ must have units of 1/time, and β₁ must have units of 1/(population·time). Let’s incorporate this new, decreasing birth rate into the original population model.
dP/dt = [(β₀ − β₁P) − δ]P
      = P[(β₀ − δ) − β₁P]
      = β₁P((β₀ − δ)/β₁ − P)
In order to get a simple, easy to remember equation, let’s let k = β₁ and M = (β₀ − δ)/β₁.
(34)    dP/dt = kP(M − P)
Notice that M has units of population. We have specifically written equation 34 in the form at the bottom of the derivation because M has a special meaning: it is the carrying capacity of the population.
Notice that equation 34 is separable, so we know how to go about solving it.
However, before we solve the logistic model, let’s refresh our memory of solving
integrals via partial fractions, because we will need to use this when solving the
logistic model. Let’s solve a simplified version of the logistic model, with k = 1 and
M = 1.
(35)    dx/dt = x(1 − x)
∫ 1/(x(1 − x)) dx = ∫ dt
∫ (A/x + B/(1 − x)) dx = ∫ dt
A(1 − x) + Bx = 1
x = 0 :  A(1 − 0) + B·0 = 1  ⇒  A = 1
x = 1 :  A(1 − 1) + B·1 = 1  ⇒  B = 1
∫ (1/x + 1/(1 − x)) dx = ∫ dt
ln|x| − ln|1 − x| = t + C₀
ln|x/(1 − x)| = t + C₀
|x/(1 − x)| = e^(t+C₀) = C₁e^t
|x/(1 − x)| = x/(1 − x)  if  x/(1 − x) ≥ 0  (i.e. 0 ≤ x < 1),    and    = x/(x − 1)  if  x/(1 − x) < 0  (i.e. x < 0 or x > 1)
Let’s solve for x(t) for 0 ≤ x < 1:
x = (1 − x)C₁e^t
x + C₁xe^t = C₁e^t
x(1 + C₁e^t) = C₁e^t
x = C₁e^t/(1 + C₁e^t)
(36)    x(t) = 1/(1 + Ce^(−t))
When x < 0 or x > 1, then we get:
(37)    x(t) = 1/(1 − Ce^(−t))
The last solution occurs when x(t) = 1, because this forces dx/dt = 0.
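A quick dsolve check of the simplified model (a sketch, using the initial value x(0) = 1/2 so that C = 1 in equation 36):
dsolve({diff(x(t), t) = x(t)*(1 - x(t)), x(0) = 1/2}, x(t));
# expect a result equivalent to x(t) = 1/(1 + exp(-t))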
Figure 1.1 shows some solution curves superimposed on the slope field plot for x' = x(1 − x). Notice that the solution x(t) = 1 seems to “attract” solution curves, but the solution x(t) = 0 “repels” solution curves.
Figure 1.1. Slope field plot and solution curves for x' = x(1 − x).
Let us now use what we just learned to solve the logistic model, with an initial condition.
(34)    dP/dt = kP(M − P),    P(0) = P₀
∫ 1/(P(M − P)) dP = ∫ k dt
∫ ((1/M)/P + (1/M)/(M − P)) dP = ∫ k dt
∫ (1/P + 1/(M − P)) dP = ∫ kM dt
ln|P/(M − P)| = kMt + C₀
|P/(M − P)| = C₁e^(kMt)
|P/(M − P)| = P/(M − P)  if  0 ≤ P < M,    and    = P/(P − M)  if  P < 0 or P > M
If we solve for the first case, we find:
P = (M − P)C₁e^(kMt)
P + PC₁e^(kMt) = MC₁e^(kMt)
P = (MC₁e^(kMt))/(1 + C₁e^(kMt)) · (e^(−kMt)/e^(−kMt))
(38)    P = MC₁/(e^(−kMt) + C₁).
Now we can plug in the initial condition to get a particular solution:
P₀ = MC₁/(1 + C₁)
P₀ + P₀C₁ = MC₁
P₀ = MC₁ − P₀C₁
C₁ = P₀/(M − P₀)
P(t) = (M·P₀/(M − P₀))/(e^(−kMt) + P₀/(M − P₀))
(39)    P(t) = MP₀/(P₀ + (M − P₀)e^(−kMt)).
2. Equilibrium Solutions and Stability
Whenever the right hand side of a first order equation only involves the dependent
variable, then we can quickly determine the qualitative behavior of its solutions.
For example, if a differential equation has the form:
(40)    dy/dx = f(y).
Definition 2.1. When the independent variable does not appear explicitly in a
differential equation, we say that equation is autonomous.
Recall from section 3 how a computer makes a slope field plot. It simply grids off the xy–plane and then at each vertex of the grid draws a short bar with slope corresponding to f(xᵢ, yᵢ). However, if the right hand side function is only a function of the dependent variable, y in this case, then the slope field does not depend on the independent variable, i.e. location on the x–axis. This means that for an autonomous equation, the slopes which lie on a horizontal line such as y = 2 are all equivalent and thus parallel.
This means that if a solution curve is shifted (translated) left or right along
the x-axis, then this shifted curve will also be a solution curve, because it will
still fit the slope field. We have established an important property of autonomous
equations, namely translation invariance.
2.1. Phase Diagrams. Consider the following autonomous differential equation:
(41)    y' = y(y − 2).
Notice that the two constant functions y(x) = 0, and y(x) = 2 are solutions
to equation 41. In fact any time you have an autonomous equation, any constant
function which makes the right hand side of the equation zero will be a solution.
This is because constant functions have slope zero. Thus as long as this constant
value of y is a root of the right hand side, then that particular constant function
will satisfy the equation. Notice that other constant functions such as y(x) = 1
and y(x) = 3 are not solutions of equation 41, because y′ = 1(1 − 2) = −1 ≠ 0 and
y′ = 3(3 − 2) = 1 ≠ 0 respectively.
Definition 2.2. Given an autonomous first order equation y′ = f(y), the solutions
of f(y) = 0 are called critical points of the equation.
So the critical points of equation 41 are y = 0 and y = 2.
Definition 2.3. If c is a critical point of the autonomous first order equation
y′ = f(y), then y(x) ≡ c is an equilibrium solution of the equation.
So the equilibrium solutions of equation 41 are y(x) = 0 and y(x) = 2. Something
in equilibrium is something that has settled and does not change with time, i.e. is
constant.
To create the phase diagram for this function we pick y values surrounding the
critical points to determine whether the slope is positive or negative.
y = −1 : −1(−1 − 2) = (−) (−) = +
y = 1 : 1(1 − 2) = (+) (−) = −
y = 3 : 3(3 − 2) = (+) (+) = +
Example 2.4. Create a phase diagram and plot several solution curves by hand
for the differential equation: dx/dt = x³ − 7x² + 10x.
We factor the right hand side to find the critical points and hence equilibrium
solutions.
x³ − 7x² + 10x = 0
x(x² − 7x + 10) = 0
x(x − 2)(x − 5) = 0
The critical points are x = 0, 2, 5, and thus the equilibrium solutions are x(t) =
0, x(t) = 2 and x(t) = 5.
Figure 2.1. Phase diagram for y′ = y(y − 2).
Figure 2.2. Phase diagram for x′ = x³ − 7x² + 10x.
Figure 2.3. Hand drawn solution curves for x′ = x³ − 7x² + 10x.
4
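The sign bookkeeping behind a phase diagram is easy to automate. The following Python sketch of the sign analysis in Example 2.4 is purely illustrative (the function and variable names are arbitrary choices, not part of the development above):

    # Sketch: sample the sign of f between (and beyond) the critical points
    # of the autonomous equation x' = f(x) = x^3 - 7x^2 + 10x.

    def f(x):
        return x**3 - 7*x**2 + 10*x

    critical_points = [0, 2, 5]       # roots of f found by factoring
    test_points = [-1, 1, 3, 6]       # one sample point in each interval

    for t in test_points:
        direction = "increasing" if f(t) > 0 else "decreasing"
        print(f"at x = {t}: f(x) = {f(t)}, solutions are {direction}")

Running it reproduces the +, −, +, − pattern that the hand-drawn phase diagram records.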
2.2. Logistic Model with Harvesting. A population of fish in a lake is often
modeled accurately via the logistic model. But the question, “How do you take
into account the decrease in fish numbers as a result of fishing?”, soon arises. If
the amount of fish harvested from the lake is relatively constant per time period,
then we can modify the original logistic model, equation 34, by simply subtracting
the amount harvested.
(42) dx/dt = kx(M − x) − h
Where h is the amount harvested, and where we have switched from the population being represented by the variable P to the variable x, simply because it is
more familiar.
Example 2.5. Suppose a lake has a carrying capacity of M = 16,000 fish, and a
k value of k = .125 = 1/8. What is a safe yearly harvest rate?
To simplify the numbers we have to deal with, let’s let x(t) measure the fish
population in thousands. Then the equation we wish to examine is:
(43) x′ = (1/8) x(16 − x) − h.
We don’t need to actually solve this differential equation to understand the
behavior of its solutions. We just need to determine for which range of h values
will the right hand side of the equation result in equilibrium solutions. Thus we
only need to solve a quadratic equation with parameter h:
(1/8) x(16 − x) − h = 0
x(16 − x) − 8h = 0
16x − x² − 8h = 0
(44) x² − 16x + 8h = 0

x = (−b ± √(b² − 4ac)) / (2a)
x = (16 ± √(256 − 32h)) / 2
x = (16 ± 4√(16 − 2h)) / 2
(45) x = 8 ± 2√(16 − 2h)
Recall that if the discriminant is positive, i.e. 16 − 2h > 0, then we get two
distinct real roots. When the discriminant is zero, i.e. 16 − 2h = 0, we get a
repeated real root. And finally, when the discriminant is negative, i.e. 16 − 2h < 0,
then we get two complex conjugate roots.
Figure 2.4. Logistic model with harvesting: (a) h = 10, (b) h = 8, (c) h = 7.5, (d) h = 6.

The critical values are exactly the roots of the right hand side polynomial, and
we only get equilibrium solutions for real critical values, thus if the fish population
is to survive the harvesting, then we must choose h so that we get at least one real
root. Notice that for any value of h ≤ 8 we get at least one real root. Further,
letting h = 8, 7.5, 6, 3.5 all result in the discriminant being a perfect square, which
allows us to factor equation 44 nicely.
x² − 16x + 8(8) = x² − 16x + 64 = (x − 8)(x − 8)
x² − 16x + 8(7.5) = x² − 16x + 60 = (x − 6)(x − 10)
x² − 16x + 8(6) = x² − 16x + 48 = (x − 4)(x − 12)
x² − 16x + 8(3.5) = x² − 16x + 28 = (x − 2)(x − 14)
Thus we find that any harvesting rate above 8,000 fish per year is sure to result
in the depletion of all fish. But actually harvesting 8,000 fish per year is risky,
because if you accidentally overharvest one year, you could eventually cause the
depletion of all fish. So perhaps a harvesting level somewhere between 6,000 and
7,500 fish per year would be acceptable.
4
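As a quick numerical check of the discriminant argument in Example 2.5, here is a small illustrative Python sketch that evaluates the equilibrium populations 8 ± 2√(16 − 2h) for several harvest rates; once 16 − 2h is negative there are no equilibria and the population collapses.

    # Sketch: equilibrium fish populations (in thousands) for x' = (1/8)x(16 - x) - h.
    import math

    for h in [3.5, 6, 7.5, 8, 10]:
        disc = 16 - 2*h
        if disc < 0:
            print(f"h = {h}: no equilibria, the population collapses")
        else:
            lo, hi = 8 - 2*math.sqrt(disc), 8 + 2*math.sqrt(disc)
            print(f"h = {h}: equilibria at x = {lo:g} and x = {hi:g}")

The printed pairs (2, 14), (4, 12), (6, 10) and (8, 8) match the factorizations above.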
3. Acceleration–Velocity Models
In section 2 we modeled a falling object, but we ignored the frictional force due to
wind resistance. Let’s fix that omission.
The force due to wind resistance can be modeled by positing that the force will
be in the opposite direction of motion, but proportional to velocity.
(46) FR = −kv
Recall from physics that Newton’s second law of motion, ΣF = ma = m(dv/dt),
relates the sum of the forces acting on a body with its rate of change of momentum.
There are two forces acting on a falling body, one is the pull of gravity, and the
other is a buoying force due to wind resistance. If we set up our y–axis with the
positive y direction pointing upward and let zero correspond to ground level, then
FR = −kv = −k(dy/dt). Note that this is an upward force because v is negative,
thus the sum of the forces is:
(47) ΣF = FR + FG = −kv − mg.
Hence our governing IVP becomes:

(48) m dv/dt = −kv − mg
     dv/dt = −(k/m)v − g
     dv/dt = −ρv − g,    v(0) = v0,

where ρ = k/m.
This is a separable, first–order equation. Let’s solve it.
∫ 1/(ρv + g) dv = −∫ dt

(1/ρ) ln |ρv + g| = −t + C
ln |ρv + g| = −ρt + C
e^(ln|ρv+g|) = e^(−ρt+C)
|ρv + g| = C e^(−ρt)

ρv + g = { C e^(−ρt),     ρv + g ≥ 0
         { −C e^(−ρt),    ρv + g < 0

v(t) = { C e^(−ρt) − g/ρ,     v ≥ −g/ρ
       { −C e^(−ρt) − g/ρ,    v < −g/ρ
Next, we plug in the initial condition v(0) = v0 to get a particular solution.
v0 = C − g/ρ
C = v0 + g/ρ

(49) v(t) = { (v0 + g/ρ) e^(−ρt) − g/ρ,     v ≥ −g/ρ
            { −(v0 + g/ρ) e^(−ρt) − g/ρ,    v < −g/ρ
Notice that the limit as time goes to infinity of both solutions is the same.
(50) lim_{t→∞} v(t) = lim_{t→∞} [ ±(v0 + g/ρ) e^(−ρt) − g/ρ ] = −g/ρ = −mg/k
This limiting velocity is called terminal velocity. It is the fastest speed that a
dropped object can achieve. Notice that it is negative because it is a downward
velocity.
The first solution of equation 49 handles the situation where the body is falling
more slowly than terminal velocity. The second solution handles the case where
the body or object is falling faster than terminal velocity, for example a projectile
shot downward.
Example 2.6. In example 1.13 we calculated that it would take approximately
5.59 seconds for an object to fall 500 feet, but we neglected the effects of wind
resistance. Compute how long it will take for an object to fall 500 feet if ρ = .16,
and compute its final velocity.
Recall that v(t) = dy/dt, and since v(t) is only a function of the independent
variable, t, we can integrate the velocity to find the position as a function of time.
Since we are dropping the object from 500 feet, y0 = 500 and v0 = 0.
y(t) = ∫ (dy/dt) dt = ∫ v(t) dt

y(t) = ∫ [ (v0 + g/ρ) e^(−ρt) − g/ρ ] dt

y(t) = (v0 + g/ρ) ∫ e^(−ρt) dt − ∫ (g/ρ) dt

y(t) = (v0 + g/ρ)(−1/ρ) e^(−ρt) − (g/ρ) t + C

With v0 = 0, g = 32 ft/s² and ρ = .16, the initial condition y(0) = 500 gives

500 = −32/(.16)² + C
C = 500 + 1250
C = 1750

(51) y(t) = −1250 e^(−.16t) − 200t + 1750
Figure 3.1. Falling object with and without wind resistance

As you can see in figure 3.1, when we model the force due to wind resistance it
adds almost a full second to the amount of time that it takes for an object to fall
500 feet. In fact it takes approximately 6.56 seconds to reach the ground. Knowing
this, we can compute the final velocity.
v(6.56) = 200 e^(−.16(6.56)) − 200
        ≈ −130 ft/s
        ≈ −130 ft/s · (60 mi/hr / 88 ft/s)
        ≈ −89 mi/hr
Thus it takes almost a full second longer to reach the ground (6.56 s vs. 5.59 s)
and will be travelling at approximately -89 miles per hour as opposed to -122 miles
per hour.
4
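The 6.56 second figure can be reproduced numerically. The sketch below is illustrative only (it uses a simple bisection search, which is not a method developed in this text) to find the root of y(t) from equation 51 and then evaluates the velocity there.

    # Sketch: fall time and impact velocity for the model with wind resistance.
    import math

    rho, g, y0, v0 = 0.16, 32.0, 500.0, 0.0

    def y(t):   # position, equation (51)
        return -1250*math.exp(-rho*t) - 200*t + 1750

    def v(t):   # velocity, first case of equation (49)
        return (v0 + g/rho)*math.exp(-rho*t) - g/rho

    # bisection for the root of y(t) = 0 on [0, 20]
    a, b = 0.0, 20.0
    for _ in range(60):
        m = (a + b) / 2
        if y(a) * y(m) <= 0:
            b = m
        else:
            a = m
    t_hit = (a + b) / 2
    print(f"hits the ground at t = {t_hit:.2f} s, v = {v(t_hit):.0f} ft/s")
    # prints roughly t = 6.56 s and v = -130 ft/s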
4. Numerical Solutions
In actual real–world applications, more often than not, you won’t be able to find
an analytic solution to the general first order IVP.
(13) dy/dx = f(x, y),    y(a) = b
In this situation it often makes sense to approximate a solution via simulation.
We will look at an algorithm for creating approximate solutions called Euler's
Method, named after Leonhard Euler (pronounced like “oiler”). The algorithm
is easier to explain if the independent variable is time, so let’s rewrite the general
first order equation above using time, t as the independent variable:
(13) dy/dt = f(t, y),    y(t0) = y0.
The fundamental idea behind Euler’s Method and all numerical/simulation
techniques is discretization. Essentially the idea is to change the independent variable time, t, from something that can take on any real number, to a variable that
is only allowed to have values from a discrete, i.e. finite sequence. Each time value
is separated from the next by a fixed period of time, the “tick” of our clock. The
length of this “tick” depends on how accurately we wish to approximate exact
solutions. Shorter tick lengths will result in more accurate approximations.
Normally a solution function must be continuous, and smooth, a.k.a. differentiable. Discretizing time forces us to relax the smoothness requirement. The
approximate solution curves we create will be continuous, but not smooth. They
will have small angular corners at each tick of the clock, i.e. at each time in the
discrete sequence of allowed time values.
The goal of the algorithm is to create a sequence of pairs (ti , yi ) which when
plotted and connected by straight line segments will approximate exact solution
curves. The method of generating this sequence is recursive, i.e. computing the
next pair in the sequence will require us to know the values of the previous pair in
the sequence. This recursion is written via two equations:
ti+1 = ti + ∆t
yi+1 = yi + ∆y,
where the subscript i+1 refers to the “next” value in the sequence, and the subscript
i refers to the “previous” value in the sequence. There are two values in the above
equations that we must compute, ∆t, and ∆y. ∆t is simply the length of each clock
tick, which is a constant that we choose. ∆y on the other hand changes and must
be computed using the discretized version of equation 13:
(52) ∆y = f(ti, yi)∆t,    y(t0) = y0
We start the clock at time t0 , which we call time zero. This will often be zero,
however any starting time will work. The time after one tick is labelled t1 , and the
time after two ticks is labelled t2 and so on and so forth. We know from the initial
condition y(t0 ) = y0 what y value corresponds to time zero, and with equation 52
we can approximate y1 as follows:
(53) y1 ≈ y0 + ∆y = y0 + f(t0, y0)∆t.
If we continue in this fashion, we can generate a table of (ti , yi ) pairs which,
as long as ∆t is “small” will approximate a particular solution through the point
(t0 , y0 ) in the ty plane. Generating the left hand column of our table of values
couldn’t be easier. It is done via adding the same small time interval, ∆t to the
current time, to get the next time, i.e.
t1 = t0 + ∆t
t2 = t1 + ∆t = t0 + 2∆t
t3 = t2 + ∆t = t0 + 3∆t
t4 = t3 + ∆t = t0 + 4∆t
⋮
(54) tn+1 = tn + ∆t = t0 + (n + 1)∆t
Generating the yi values for this table is harder because unlike ∆t which stays
constant, ∆y depends on the previous time and y value.
y1 ≈ y0 + ∆y = y0 + f (t0 , y0 )∆t
y2 ≈ y1 + ∆y = y1 + f (t1 , y1 )∆t
y3 ≈ y2 + ∆y = y2 + f (t2 , y2 )∆t
y4 ≈ y3 + ∆y = y3 + f (t3 , y3 )∆t
⋮
(55) yn+1 ≈ yn + ∆y = yn + f(tn, yn)∆t
Equations 54 and 55, together with the initial condition y(t0 ) = y0 constitute
the numerical solution technique known as Euler’s Method.
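Translated into code, equations 54 and 55 become a short loop. The following Python sketch is one possible implementation (the function name euler and the sample right hand side are illustrative choices, not part of the text):

    # Sketch: Euler's Method for y' = f(t, y), y(t0) = y0.

    def euler(f, t0, y0, dt, n):
        """Return the list of (t_i, y_i) pairs produced by Euler's Method."""
        points = [(t0, y0)]
        t, y = t0, y0
        for _ in range(n):
            y = y + f(t, y) * dt    # y_{i+1} = y_i + f(t_i, y_i) * dt
            t = t + dt              # t_{i+1} = t_i + dt
            points.append((t, y))
        return points

    # Example: approximate the logistic equation x' = x(1 - x), x(0) = 0.1.
    for t, x in euler(lambda t, x: x*(1 - x), 0.0, 0.1, 0.5, 10):
        print(f"t = {t:4.1f}   x = {x:.4f}")

Shrinking dt produces a table closer to the exact solution curve through (t0, y0).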
Chapter 3
Linear Systems and Matrices
1. Linear and Homogeneous Equations
In order to understand solution methods for higher–order differential equations,
we need to switch from discussing differential equations to discussing algebraic
equations, specifically linear algebraic equations. However, since the phrase “linear
algebraic equation” is a bit of a mouthful, we will shorten it to the simpler “linear
equation”.
Recall how we provisionally defined a linear differential equation in definition 1.4 to be any differential equation where all of its solutions satisfy summability
and scalability. If there is any justice in mathematical nomenclature, then linear
equations should also involve summing and scaling, and indeed they do.
A linear equation is any equation that can be written as a finite sum of scaled
variables set equal to a scalar. For example, 2x + 3y = 1 is an example of a linear
equation in two variables, namely x and y. Another example is:
x − 2y + 3 = 7z − 11,
because this can be rearranged to the equivalent equation:
x − 2y − 7z = −14.
Notice that there is no restriction on the number of variables other than that
there must be a finite number. So for example, 2x = 3 is an example of a linear
equation in one variable, and 4w − x + 3y + z = 0 is an example of a linear equation
in four variables. Typically, once we go beyond four variables we begin to run out
of the usual variable names, and thus switch to using subscripted variables and
subscripted coefficients as in the following definition.
Definition 3.1. A linear equation is a finite sum of scaled variables set equal to
a scalar. More generally, a linear equation is any equation that can be written in
the form:
(56)
a1 x1 + a2 x2 + a3 x3 + · · · + an xn = b
In this case we would say that the equation has n variables, which stands for
some as yet undetermined but finite number of variables.
Notice the similarity between the definition of a linear equation given above and
definition 1.25, which defined a linear differential equation as a differential equation
that can be written as a scaled sum of the derivatives of a function set equal to
a scalar function. If you think of each derivative as a distinct variable, then the
above definition is very similar to our current definition for linear equations. Here
we reproduce the equation from each definition for comparison.
(56) a1x1 + a2x2 + a3x3 + · · · + anxn = b
(22) an(x)y^(n) + an−1(x)y^(n−1) + · · · + a1(x)y′ + a0(x)y = f(x)
We can add another variable and rearrange equation 56 to increase the similarity.
(56′) anxn + an−1xn−1 + · · · + a1x1 + a0x0 = b
(22) an(x)y^(n) + an−1(x)y^(n−1) + · · · + a1(x)y′ + a0(x)y = f(x)
The difference between these two equations is that we switch from scalar coefficients and variables to scalar coefficient functions and derivatives of a function.
Definition 3.2. A homogeneous equation is a linear equation where the right
hand side of the equation is zero.
Every linear equation has an associated homogeneous equation which can be
obtained by changing the constant term on the right hand side of the equation to
0. Later, we will see that understanding the set of all solutions of a linear equation,
what we will eventually call the solution space, will be facilitated by understanding
the solutions to the homogeneous equation.
Homogeneous equations are special because they always have a solution, namely
the origin. For example the following homogeneous equation in three variables has
solution (0, 0, 0), as you can easily check.
2x − 7y − z = 0
There are an infinite number of other solutions as well, for example (0, 1, −7)
or (1/2, 0, 1) or (7, 1, 7). The interesting thing to realize is that we can take any
number of solutions and sum them together to get a new solution. For example,
(0, 1, −7) + (1/2, 0, 1) + (7, 1, 7) = (15/2, 2, 1) is yet another solution. Thus homogeneous equations have the summability property. And as you might guess, they
also have the scalability property. For example 3 · (0, 1, −7) = (0, 3, −21) is again a
solution to the above homogeneous equation.
Notice that summability and scalability do not hold for regular linear equations,
just the homogeneous ones. For example
2x − 7y − z = 2
has solutions (7/2, 1, −2) and (1, 0, 0) but their sum, (9/2, 1, −2) is not a solution.
Nor can we scale the two solutions above to get new solutions. At this point it is
natural to wonder why we call these equations “linear” at all if they don’t satisfy
the summability or scalability properties.
The reason that we do classify these equations as linear is rather simple. Notice
that when we plug any solution of the homogeneous equation 2x − 7y − z = 0 into
the left hand side of the non–homogeneous equation 2x − 7y − z = 2 we get zero. But if we add a solution to the
homogeneous equation to a solution to the non–homogeneous linear equation we
get a new solution to the linear equation. For example, (7/2, 1, −2) is a solution
to the linear equation, and (7, 1, 7) is a solution to the corresponding homogeneous
equation. Their sum,
(7/2, 1, −2) + (7, 1, 7) = (21/2, 2, 5),
is a solution to the linear equation:
2(21/2) − 7(2) − (5) = 21 − 14 − 5 = 2
The explanation for this situation is simple. It works because the distributive
property of multiplication over addition and subtraction holds. That is a(x + y) =
ax + ay.
2(21/2) − 7(2) − (5) = 2(7/2 + 7) − 7(1 + 1) − (−2 + 7)
= 2(7/2) + 2(7) − 7(1) − 7(1) − (−2) − (7)
= [2(7) − 7(1) − (7)] + [2(7/2) − 7(1) − (−2)]
= 0 + 2 = 2
What the above calculation shows is that given a solution to any linear equation,
we can always add to this solution any solution of the corresponding homogeneous
equation to get a new solution. This is perhaps the fundamental concept of linear
equations.
We will exploit this fact repeatedly when solving systems of linear equations,
and later when we solve second order and higher order differential equations.
2. Introduction to Linear Systems
A linear system is a collection of one or more linear equations. For example,
2x − 2y = −2
3x + 4y = 11.
The above linear system is an example of a 2 × 2 system, pronounced “two by two”,
because it has two equations in two unknowns, (x and y).
When a system has equations with two unknowns as is the case above, then
the solution set will be the set of all pairs of real numbers, (x, y), that satisfy both
equations simultaneously. Geometrically, since each equation above is the equation
of a line in the xy–plane, the solution set will be the set of all points in the xy–plane
that lie on both lines.
2.1. Method of Elimination. Finding the solution set is done by the method
of elimination, which in the 2 × 2 case has three steps:
(1) Add a multiple of equation (1) to equation (2) such that one of the variables,
perhaps x, will sum to 0 and hence be eliminated.
(2) Solve the resulting equation for the remaining variable, in this case y.
(3) Back–substitute the value found in step two into either of the original two
equations, and solve for the remaining variable, in this case x.
Example 3.3. Use the method of elimination to solve the following system.

2x − 2y = −2    (1)
3x + 4y = 11    (2)

(1) Multiplying the first equation by −3/2 and adding to the second yields:

−3x + 3y = 3     −(3/2)·(1)
3x + 4y = 11     (2)
7y = 14          (1′)
(2) 7y = 14 implies y = 2.
(3) Now plug y = 2 into equation (1):
2x − 2(2) = −2
2x − 4 = −2
2x = 2
x=1
The solution set is therefore {(1, 2)}, which we’ll often simply report as the pair
(1,2).
4
Notice that the method of elimination transformed a 2 × 2 system down to a
1 × 1 system, namely 7y = 14. In other words, the method works by transforming
the problem down to one that we already know how to solve. This is a common
problem solving technique in math.
A natural question to ask at this point is: “What are the possible outcomes of
the method of elimination when applied to a 2×2 system?” Since the transformation
down to a 1×1 system can always be performed this question is equivalent to asking,
“What are the possible outcomes of solving a 1 × 1 system?”
Let’s consider the following linear equation in the single variable x:
ax = b,
where a and b are real constants. We solve this system by multiplying both sides
of the equation by the multiplicative inverse of a, namely a⁻¹ or equivalently 1/a, to
yield:

x = a⁻¹ b = b/a.
However, upon a little reflection we realize that if a = 0, then we have a problem
because 0 has no multiplicative inverse, or equivalently, division by 0 is undefined.
So clearly not every 1 × 1 system has a solution. There is also another interesting
possibility, if both a and b are zero, then we get
0 · x = 0,
and this equation is true for all possible values of x, or in other words it has infinitely
many solutions. Let’s summarize our findings regarding ax = b:
(1) a ≠ 0: one unique solution.
(2) a = 0 and b 6= 0: no solution.
(3) a = 0 and b = 0: infinitely many solutions.
Thus we have shown that solving a two by two system may result in three
distinct possibilities. What do these three distinct possibilities correspond to geometrically? Recall that solving a two by two system corresponds to finding the set
of all points in the intersection of two lines in the plane. The geometric analogies
are as follows:
(1) One unique solution :: two lines intersecting in a single point.
(2) No solution :: two parallel lines which never intersect.
(3) Infinitely many solutions :: two overlapping lines.
The method of elimination extends naturally to solving a 3 × 3 system, that is
a set of three linear equations in three unknowns. We will do one more example
and then use the knowledge gained to generalize the method to n × n systems.
Example 3.4. Since we know how to solve a 2 × 2 system, we will transform a
3 × 3 system down to a 2 × 2 system and proceed as we did in the previous example.
Consider the following system,

3x − 2y + z = −1    (1)
x + 5y + 2z = 11    (2)
−x + 2y − z = 3     (3)
The basic idea is to pick one equation and use it to eliminate a single variable
from the other two equations. First we’ll use equation (2) to eliminate x from
equation (1), and number the resulting equation (1’).
3x − 2y + z = −1         (1)
−3x − 15y − 6z = −33     −3·(2)
−17y − 5z = −34          (1′)
Now we use equation (2) to eliminate x from equation (3) and number the resulting
equation (2′).

−x + 2y − z = 3      (3)
x + 5y + 2z = 11     (2)
7y + z = 14          (2′)
Note, we could just as well have numbered the resulting equation (3′) since we
eliminated x from equation (3). I numbered it (2′) because I like to think of it as
the second equation in a 2 × 2 system. You can use whichever numbering scheme
you prefer.
We see that we have reduced the problem down to a 2 × 2 system:

−17y − 5z = −34    (1′)
7y + z = 14        (2′)

Now we must pick the next variable to eliminate. Eliminating z from the above
system will be a little easier than eliminating y.

−17y − 5z = −34    (1′)
35y + 5z = 70      5·(2′)
18y = 36           (1″)
Equation (1″) becomes y = 2. Now that we know one of the values in the single
prime system we can use either (1′) or (2′) to solve for z. Let’s use equation (1′).
−17(2) − 5z = −34
−34 − 5z = −34
−5z = 0
z = 0
Finally, we can use any one of equations (1), (2) or (3) to solve for x, let’s use
equation (1).
3x − 2(2) + (0) = −1
3x − 4 = −1
3x = 3
x=1
The solution set to a 3 × 3 system is the set of all triples (x, y, z) that satisfy
all three equations simultaneously. Geometrically, it is the set of all points at the
intersection of three planes. For this problem the solution set is {(1, 2, 0)}.
Clearly the point (1, 2, 0) is a solution of equation (1), but we should check that
it also satisfies equations (2) and (3).
(1) + 5(2) + 2(0) = 11 ✓
−(1) + 2(2) − (0) = 3 ✓
4
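For larger systems this bookkeeping is usually handed to software. As a check of Example 3.4, here is a short illustrative sketch using the NumPy library (an outside tool, not something developed in this text):

    # Sketch: checking Example 3.4 with NumPy's linear solver.
    import numpy as np

    A = np.array([[ 3, -2,  1],
                  [ 1,  5,  2],
                  [-1,  2, -1]], dtype=float)
    b = np.array([-1, 11, 3], dtype=float)

    print(np.linalg.solve(A, b))   # expect [1. 2. 0.]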
3. Matrices and Gaussian Elimination
The method of elimination for solving linear systems introduced in the previous
section can be streamlined. The method works well for small systems of equations
such as a 3 × 3 system, but as n grows, the number of variables one must write for
an n × n system grows as n². In addition, one must repeatedly write other symbols
such as addition, “+”, and “=”. However, we could just use the columns as a proxy
for the variables and just retain the coefficients of the variables and the column of
constants.
Remark 3.5. The coefficients and column of constants encode all of the information in a linear system.
For example, a linear system can be represented as a grid of numbers, which
we will call an augmented matrix.

 3x − 2y + z = 2                [  3  −2   1 |  2 ]
      5y + 2z = 16      ←→      [  0   5   2 | 16 ]
−x + 2y − z = 0                 [ −1   2  −1 |  0 ]
Gaussian Elimination is simply the method of elimination from the previous section
applied systematically to an augmented matrix. In short, the goal of the algorithm
is to transform an augmented matrix until it can be solved via back–substitution.
This corresponds to transforming the augmented matrix until the left side forms
a descending staircase (echelon) of zeros. To be precise, we wish to transform the
left side of the augmented matrix into row–echelon form.
Definition 3.6 (Row–Echelon Form). A matrix is in row–echelon form if it satisfies
the following two properties:
(1) Every row consisting of all zeros must lie beneath every row that has a nonzero
element.
(2) In each row that contains a nonzero element, the leading nonzero element of
that row must be strictly to the right of the leading nonzero element in the
row above it.
Remark 3.7. The definition for row–echelon form is just a precise way of saying
that the linear system has been transformed to a point where back–substitution
can be used to solve the system.
The operations that one is allowed to use in transforming an augmented matrix
to row–echelon form are called the elementary row operations, and there are three
of them.
Definition 3.8 (Elementary Row Operations). There are three elementary row
operations, which can be used to transform a matrix to row–echelon form.
(1) Multiply any row by a nonzero constant.
(2) Swap any two rows.
(3) Add a constant multiple of any row to another row.
Example 3.9. Solve the following linear system:

−x + 2y − z = −2
2x − 3y + 4z = 1
2x + 3y + z = −2

[ −1   2  −1 | −2 ]   R2+2R1   [ −1   2  −1 | −2 ]   R3+2R1   [ −1   2  −1 | −2 ]
[  2  −3   4 |  1 ]    −→      [  0   1   2 | −3 ]    −→      [  0   1   2 | −3 ]
[  2   3   1 | −2 ]            [  2   3   1 | −2 ]            [  0   7  −1 | −6 ]

         R3−7R2   [ −1   2    −1 | −2 ]
          −→      [  0   1     2 | −3 ]
                  [  0   0   −15 | 15 ]
At this point, the last augmented matrix is in row–echelon form, however we can
do two more elementary row ops to make back–substitution easier.

    −1·R1        [ 1  −2   1 |  2 ]
 −(1/15)·R3      [ 0   1   2 | −3 ]
     −→          [ 0   0   1 | −1 ]
The last augmented matrix corresponds to the linear system.

x − 2y + z = 2
y + 2z = −3
z = −1
Back–substituting z = −1 into the second equation yields:
y + 2(−1) = −3
y = −1
Back–substituting z = −1, y = −1 into the first equation yields:
x − 2(−1) + (−1) = 2
x+1=2
x=1
Thus the solution set is: {(1, −1, −1)}, which is just a single point of R³.
4
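The row reduction in Example 3.9 is mechanical enough to code directly. The sketch below is a simplified illustration of the idea (it assumes every pivot it meets is nonzero, so it never needs to swap rows):

    # Sketch: naive Gaussian elimination with back-substitution.
    # Assumes every pivot encountered is nonzero (no row swaps needed).

    def solve(aug):
        n = len(aug)
        # forward elimination to row-echelon form
        for i in range(n):
            for r in range(i + 1, n):
                m = aug[r][i] / aug[i][i]
                for c in range(i, n + 1):
                    aug[r][c] -= m * aug[i][c]
        # back-substitution
        x = [0.0] * n
        for i in reversed(range(n)):
            s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (aug[i][n] - s) / aug[i][i]
        return x

    # augmented matrix from Example 3.9
    aug = [[-1.0,  2.0, -1.0, -2.0],
           [ 2.0, -3.0,  4.0,  1.0],
           [ 2.0,  3.0,  1.0, -2.0]]
    print(solve(aug))   # expect [1.0, -1.0, -1.0]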
Definition 3.10. Two matrices are called row equivalent if one can be obtained
from the other by a (finite) sequence of elementary row operations.
Theorem 3.11. If the augmented coefficient matrices of two linear systems are
row equivalent, then the two systems have the same solution set.
3.1. Geometric Interpretation of Row–Echelon Forms. Recall that an augmented matrix directly corresponds to a linear system. The solution set of a linear
system has a geometric interpretation. In the tables which follow, the following
symbols will have a special meaning.
∗ = any nonzero number
• = any number
The number of n×n, row–echelon matrix forms follows a pattern. It follows the
pattern found in Pascal’s triangle. Once you pick n, then just read the nth row of
Pascal’s triangle from left to right to determine the number of n × n matrices that
have 0, 1, 2, . . . , n rows of all zeroes. So for example, when n = 4 there are exactly
4 unique row–echelon forms that correspond to one row of all zeros, 6 unique row–
echelon forms that correspond to two rows of all zeros, and 4 unique row–echelon
forms that correspond to three rows of all zeros.
Table 1. All possible 2 × 2 row–echelon matrix forms

All Zero Rows   Representative Matrices        Solution Set

     0          [ ∗ • ]                        Unique point in R²
                [ 0 ∗ ]

     1          [ ∗ • ]    [ 0 ∗ ]             Line of points in R²
                [ 0 0 ]    [ 0 0 ]

     2          [ 0 0 ]                        All points in the R² plane
                [ 0 0 ]
Table 2. All possible 3 × 3 row–echelon matrix forms

All Zero Rows   Representative Matrices                       Solution Set

     0          [ ∗ • • ]                                     Unique point in R³
                [ 0 ∗ • ]
                [ 0 0 ∗ ]

     1          [ ∗ • • ]   [ ∗ • • ]   [ 0 ∗ • ]             Line of points in R³
                [ 0 ∗ • ]   [ 0 0 ∗ ]   [ 0 0 ∗ ]
                [ 0 0 0 ]   [ 0 0 0 ]   [ 0 0 0 ]

     2          [ ∗ • • ]   [ 0 ∗ • ]   [ 0 0 ∗ ]             Plane of points in R³
                [ 0 0 0 ]   [ 0 0 0 ]   [ 0 0 0 ]
                [ 0 0 0 ]   [ 0 0 0 ]   [ 0 0 0 ]

     3          [ 0 0 0 ]                                     All points in R³
                [ 0 0 0 ]
                [ 0 0 0 ]
4. Reduced Row–Echelon Matrices
5. Matrix Arithmetic and Matrix Equations
6. Matrices are Functions
It is conceptually advantageous to change our perspective from viewing matrices
and matrix equations as devices for solving linear systems to functions in their own
right. Recall that the reason we defined matrix multiplication the way we did was
so that we could write a linear system with many equations into a single matrix
equation.
That is, matrix multiplication allows us to write linear systems in a more
compact form. The following illustrates the equivalence of the two notations.
Table 3. Pascal’s Triangle

n = 0:                  1
n = 1:                 1 1
n = 2:                1 2 1
n = 3:               1 3 3 1
n = 4:              1 4 6 4 1
n = 5:            1 5 10 10 5 1
n = 6:           1 6 15 20 15 6 1

2x − y = 2                [ 2  −1 ] [ x ]   [ 2 ]
x + 3y = 1       ←→       [ 1   3 ] [ y ] = [ 1 ]
If you squint a little bit, the above matrix equation evokes the notion of a
function. This function takes a pair of numbers as input and outputs a pair of
numbers. A function f : R → R, often written f (x) is defined by some expression
such as
f(x) = x² + x + 5
When a function is defined in terms of an expression in x, then function application
is achieved by replacing all occurences of the variable x in the definition with the
input value and evaluating. For example f (3) is computed by
f(3) = 3² + 3 + 5 = 17.
In the matrix case, the function definition is the matrix itself, and application to
an input is achieved via matrix multiplication. For example,
[ 2  −1 ] [ 4 ]   [  3 ]
[ 1   3 ] [ 5 ] = [ 19 ]
In general if A is an m × n matrix, then we can place any column vector with
n rows, i.e. a vector from Rn to the right of the matrix and multiplication will be
defined. Upon multiplication we will get a new vector with m rows, a vector from
Rm . In other words an m × n matrix is a function from Rn to Rm . We often denote
this:
Am×n : Rn → Rm .
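Viewed this way, “applying” a matrix is a single matrix–vector product. A short illustrative NumPy sketch using the 2 × 2 matrix from above:

    # Sketch: a 2 x 2 matrix acting as a function from R^2 to R^2.
    import numpy as np

    A = np.array([[2, -1],
                  [1,  3]])

    def apply(matrix, vec):
        """Function application is matrix-vector multiplication."""
        return matrix @ vec

    print(apply(A, np.array([4, 5])))   # [ 3 19], as computed above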
6.1. Matrix Multiplication is Function Composition. The real reason matrix
multiplication is defined the way it is, is so that it agrees with function composition.
For example, if you have two n×n matrices, say A and B, then we know that,
A(B~x) = (AB)~x
because matrix multiplication is associative. But in the language of functions, the
above equation says that applying function B to your input vector, ~x, and then
6. Matrices are Functions
55
applying A to the result is the same as first composing (multiplying) A and B and
then applying the composite function (product) to the input vector.
Example 3.12. This example demonstrates that matrix multiplication does indeed
correspond with function composition.
Let
~u = (1, 0),    ~v = (0, 2).
If we plot both of these vectors on coordinate axes then we get the “L” shaped
figure you see to the right.
Figure 6.1. ~u and ~v
The following matrix is called a rotation matrix, because it rotates all vectors
by θ radians (or degrees) in a counter–clockwise direction around the origin.
A = [ cos θ   −sin θ ]
    [ sin θ    cos θ ]
Let θ = π/4, and apply A to both ~u and ~v.

A = (√2/2) [ 1  −1 ]
           [ 1   1 ]

A~u = (√2/2) [ 1  −1 ] [ 1 ]  =  (√2/2) [ 1 ]
             [ 1   1 ] [ 0 ]            [ 1 ]

A~v = (√2/2) [ 1  −1 ] [ 0 ]  =  √2 [ −1 ]
             [ 1   1 ] [ 2 ]        [  1 ]

Figure 6.2. A~u and A~v
Next we introduce matrix B, which flips all vectors with respect to the y axis.
Flipping with respect to the y axis simply entails changing the sign of the x component and leaving the y component untouched, and this is exactly what B does.
B = [ −1   0 ]
    [  0   1 ]
B~u = [ −1   0 ] [ 1 ]  =  [ −1 ]
      [  0   1 ] [ 0 ]     [  0 ]

B~v = [ −1   0 ] [ 0 ]  =  [ 0 ]
      [  0   1 ] [ 2 ]     [ 2 ]

Figure 6.3. B~u and B~v
Next, we multiply matrices A and B in both orders and apply them to ~u and ~v.

C = BA = [ −1   0 ] · (√2/2) [ 1  −1 ]  =  (√2/2) [ −1   1 ]
         [  0   1 ]          [ 1   1 ]            [  1   1 ]

D = AB = (√2/2) [ 1  −1 ] · [ −1   0 ]  =  (√2/2) [ −1  −1 ]
                [ 1   1 ]   [  0   1 ]            [ −1   1 ]

C~u = (√2/2) [ −1   1 ] [ 1 ]  =  (√2/2) [ −1 ]
             [  1   1 ] [ 0 ]            [  1 ]

C~v = (√2/2) [ −1   1 ] [ 0 ]  =  √2 [ 1 ]
             [  1   1 ] [ 2 ]        [ 1 ]

Figure 6.4. BA~u and BA~v

D~u = (√2/2) [ −1  −1 ] [ 1 ]  =  (√2/2) [ −1 ]
             [ −1   1 ] [ 0 ]            [ −1 ]

D~v = (√2/2) [ −1  −1 ] [ 0 ]  =  √2 [ −1 ]
             [ −1   1 ] [ 2 ]        [  1 ]

Figure 6.5. AB~u and AB~v
You can use your thumb and index finger on your left hand, which form an
upright “L” shape to verify that first applying A (the rotation) to both vectors
followed by applying B (the flip) results in figure 6.4. Next, change the order and
first apply B (the flip) followed by A (the rotation), and that results in figure 6.5.
Recall that function application is read from right to left, so AB~u corresponds to
first applying B and then applying A. Adding parentheses may help: AB~u = A(B~u).
4
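A numerical version of Example 3.12 (an illustrative sketch using NumPy) makes the order dependence plain: BA and AB send ~v to different places.

    # Sketch: matrix multiplication as composition; order matters.
    import numpy as np

    theta = np.pi / 4
    A = np.array([[np.cos(theta), -np.sin(theta)],     # rotate by 45 degrees
                  [np.sin(theta),  np.cos(theta)]])
    B = np.array([[-1, 0],                              # flip across the y axis
                  [ 0, 1]])

    u = np.array([1, 0])
    v = np.array([0, 2])

    print(B @ (A @ u), (B @ A) @ u)   # the same vector: rotate first, then flip
    print((B @ A) @ v, (A @ B) @ v)   # different vectors: BA is not AB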
7. Inverses of Matrices
Perhaps the most common problem in algebra is solving an equation. But you’ve
probably never thought much about exactly what algebraic properties of arithmetic
allow us to solve as simple an equation as 2x = 3. Undoubtedly, you can look at the
equation and quickly arrive at an answer of x = 3/2, but what are the underlying
algebraic principles which you are subconsciously employing to allow you to draw
that conclusion?
Suppose a, b are real numbers, can we always solve the equation:
ax = b
for any unknown x?
No, not always. For example if a = 0 and b ≠ 0, then there is no solution. This
is the only case that does not have a solution because 0 is the only real number that
does not have a multiplicative inverse. Assuming a ≠ 0, you solve the equation in
the following manner:
ax = b
a⁻¹(ax) = a⁻¹b        (existence of multiplicative inverses)
(a⁻¹a)x = a⁻¹b        (associativity of multiplication)
1x = a⁻¹b             (multiplicative inverse property)
x = a⁻¹b              (multiplicative identity property)
Notice that we never needed to use the commutative property of multiplication
nor distributivity. Associativity, inverses, and identity form the core of any algebraic
system.
Now we wish to solve matrix equations in a similar fashion, i.e. we wish to
solve matrix equations by multipying both sides of an equation by the inverse of a
matrix, e.g.
A~x = ~b
A⁻¹(A~x) = A⁻¹~b
(A⁻¹A)~x = A⁻¹~b
I~x = A⁻¹~b
~x = A⁻¹~b
where A is a matrix and ~x and ~b are vectors. Since matrix multiplication is
the same as composition of maps (functions), this method of solution amounts to
finding the inverse of the map A and then applying it to the vector ~b.
However, not all matrices have an inverse.
7.1. Inverse of a General 2 × 2 Matrix. In what follows, we will often need to
compute the inverse of a 2 × 2 matrix. It will save time if we can generate a formula or
simple rule for determining when such a matrix is invertible and what the inverse
is.
To derive such a formula, we must compute the inverse of the matrix A = [ a  b ; c  d ]
by row reducing the augmented matrix [ A | I ]:

[ a  b | 1  0 ]   (1/a)R1   [ 1  b/a | 1/a  0 ]   −cR1+R2   [ 1     b/a      |  1/a   0 ]
[ c  d | 0  1 ]     −→      [ c   d  |  0   1 ]     −→      [ 0  (ad−bc)/a   | −c/a   1 ]

  (a/(ad−bc))R2   [ 1  b/a |     1/a         0       ]   −(b/a)R2+R1   [ 1  0 |  d/(ad−bc)  −b/(ad−bc) ]
        −→        [ 0   1  | −c/(ad−bc)  a/(ad−bc)   ]        −→       [ 0  1 | −c/(ad−bc)   a/(ad−bc) ]

Thus the inverse of A is:

A⁻¹ = 1/(ad − bc) [  d  −b ]
                  [ −c   a ]
Clearly, we will not be able to invert A if ad − bc = 0, thus we have found the
condition for the general 2 × 2 matrix which determines whether or not it is invertible.
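The formula translates directly into a few lines of code. The following sketch is illustrative only (the function name inverse_2x2 is an arbitrary choice); it computes the inverse when ad − bc ≠ 0 and checks the result against the identity matrix.

    # Sketch: inverse of a 2 x 2 matrix [[a, b], [c, d]] via the formula above.

    def inverse_2x2(a, b, c, d):
        det = a*d - b*c
        if det == 0:
            raise ValueError("matrix is not invertible")
        return [[ d/det, -b/det],
                [-c/det,  a/det]]

    A    = [[2, -1],
            [1,  3]]
    Ainv = inverse_2x2(2, -1, 1, 3)

    # multiply A by its inverse; the product should be the 2 x 2 identity
    prod = [[sum(A[i][k]*Ainv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    print(prod)   # [[1.0, 0.0], [0.0, 1.0]]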
8. Determinants
The determinant is a function which takes a square, n × n matrix and returns a
real number. If we let Mn (R) denote the set of all n × n matrices with entries from
R, then the determinant function has the following signature:
det : Mn (R) → R.
We denote this function two ways, det(A) = |A|. The algorithm for computing
this function is defined recursively, similar to how the elimination algorithm was
defined. Thus the first definition below, the definition for the minor of a matrix
will use the term determinant which is defined later. This is just the nature of
recursive algorithms.
Definition 3.13. The ij th minor of a matrix A, denoted Mij , is the determinant
of the matrix A with its ith row and j th column removed.
For example, if A is a 4 × 4 matrix, then M23 is the determinant of A with row 2
and column 3 removed:

      | a11  a12  a14 |
M23 = | a31  a32  a34 |
      | a41  a42  a44 |
a14 a34 a44 Definition 3.14. The ij th cofactor of a matrix A, denoted Aij , is defined to be,
Aij = (−1)(i+j) Mij .
Notice that cofactors are defined in terms of minors. Next we define the determinant in terms of cofactors.
Definition 3.15. The determinant function is defined recursively, so we need two
cases, a base case and a recursive case.
• The determinant of a 1 × 1 matrix or scalar is just the scalar.
• The determinant of a square, n × n matrix A, is
det(A) = a11 A11 + a12 A12 + · · · + a1n A1n .
Notice that a11 , a12 , . . . , a1n are just the elements in the first row of the matrix
A. The A11 , A12 , . . . , A1n are cofactors of the matrix A. We call this a cofactor
expansion along the first row.
Let’s unwind the previous definitions to compute the determinant of the matrix

A = [ a  b ]
    [ c  d ]

det(A) = a11 A11 + a12 A12
       = a A11 + b A12
       = a(−1)^(1+1) d + b(−1)^(1+2) c
       = ad − bc
Notice that this same quantity appeared when we found the inverse of A in the
previous section. This is no accident. The determinant is closely related to invertibility.
Example 3.16. Compute det(A), if A is the following matrix.

A = [  1   0  3 ]
    [  0  −2  2 ]
    [ −5   4  1 ]

det(A) = a11 A11 + a12 A12 + a13 A13
       = a11 (−1)^(1+1) M11 + a12 (−1)^(1+2) M12 + a13 (−1)^(1+3) M13

       = 1(−1)^(1+1) | −2  2 |  +  0(−1)^(1+2) |  0  2 |  +  3(−1)^(1+3) |  0  −2 |
                     |  4  1 |                 | −5  1 |                 | −5   4 |

       = | −2  2 |  +  0  +  3 |  0  −2 |
         |  4  1 |             | −5   4 |

Using the definition for the determinant of a 2 × 2 matrix found in the previous
section we get:

       = −10 + 0 + 3(−10)
       = −40
4
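Definition 3.15 is already an algorithm, and it can be transcribed almost word for word. The sketch below is an illustrative, unoptimized implementation; it computes a determinant by cofactor expansion along the first row and reproduces Example 3.16.

    # Sketch: determinant by cofactor expansion along the first row (Definition 3.15).

    def minor(A, i, j):
        """Matrix A with row i and column j removed."""
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det(A):
        if len(A) == 1:                    # base case: 1 x 1 matrix
            return A[0][0]
        return sum((-1)**j * A[0][j] * det(minor(A, 0, j))   # cofactor expansion
                   for j in range(len(A)))

    A = [[ 1,  0, 3],
         [ 0, -2, 2],
         [-5,  4, 1]]
    print(det(A))   # -40, as in Example 3.16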
The recursive nature of the determinant makes it difficult to compute the determinants of large matrices even with computers. However, there are several key facts
about the determinant which make computations easier. Most computer systems
use the following theorem to make the computation of determinants for large matrices
feasible.
Theorem 3.17 (Elementary Row Operations and the Determinant). Recall there
are three elementary row operations. They each affect the computation of |A| differently.
(1) Suppose B is obtained by swapping two rows of the matrix A, then
|B| = − |A| .
(2) Suppose B is obtained by multiplying a row of matrix A by a nonzero constant
k, then
|B| = k |A| .
(3) Suppose B is obtained by adding a multiple of one row to another row in matrix
A, then
|B| = |A| .
Theorem 3.18 (Determinants of Matrix Products). If A and B are n×n matrices,
then
|AB| = |A| |B|
Definition 3.19. The transpose of a matrix is obtained by changing its rows into
columns, or vice versa, and keeping their order intact. The transpose of a matrix
A is denoted AT .
Example 3.20. The rows of a matrix become the columns of its transpose: the
first row of A becomes the first column of Aᵀ, the second row of A becomes the
second column of Aᵀ, and so on.
4
Theorem 3.21 (Transpose Facts). The following properties of transposes are often
useful.
(1) (AT )T = A.
(2) (A + B)T = AT + B T
(3) (cA)T = c(AT )
(4) (AB)T = B T AT
Theorem 3.22 (Determinants of Transposed Matrices). If A is a square matrix,
then det(AT ) = det(A).
8.1. Geometric Interpretation of Determinants.
Chapter 4
Vector Spaces
1. Basics
Definition 4.1. A vector is an ordered list of real numbers. Vectors will be
denoted by a lower case letter with an arrow over it, e.g. ~v = (√2, −3, 0).
Definition 4.2 (n-space). Rn = {(a1 , a2 , . . . , an ) | ai ∈ R for i = 1 . . . n}, which
in words reads: Rn (pronounced “r”, “n”) is the set of all vectors with n real
components.
Example 4.3. A vector in R2 is a pair of real numbers. For example (3, 2) is a
vector. We can interpret this pair in two ways:
(1) a point in the xy–plane, or
(2) an arrow whose tail is at the origin and whose tip is at (3, 2).
A vector in R3 is a triple of real numbers, and can be interpreted as a point
(x, y, z) in space, or as an arrow whose tail is at (0, 0, 0) and whose tip is at (x, y, z)
4
We sometimes call a vector with n components an n–tuple. The “tuple” part
of n–tuple comes from quadruple, quintuple, sextuple, septuple, etc..
In this chapter we will mostly use the arrow interpretation of vectors, i.e. the
notion of directional displacement. Since the arrow really only represents displacement, it doesn’t matter where we put the tail of the arrow. In fact you can translate
(or move) a vector in R2 all around the plane and it remains the same vector, as
long as you don’t rotate it or scale it. Similarly for a vector with three components
or any number of components for that matter. Thus the defining characteristic of
a vector is its direction and magnitude (or length).
Definition 4.4. The magnitude of a vector is the distance from the origin to the
point in Rn the vector represents. Magnitude is denoted by |~v | and is computed
via the following generalization of the Pythagorean theorem:

|~v| = √(v1² + v2² + · · · + vn²) = ( Σ_{i=1}^{n} vi² )^(1/2)
Example 4.5. Given the vector ~v = (3, −2, 4) ∈ R³,

|~v| = √(3² + (−2)² + 4²) = √29
4
Definition 4.6 (vector addition). Two vectors are added component–wise, meaning that given two vectors say, ~u = (u1 , u2 , . . . , un ) and ~v = (v1 , v2 , . . . , vn ), then
~u + ~v = (u1 + v1 , u2 + v2 , . . . , un + vn ).
Notice that this way of defining addition of vectors only makes sense if both
vectors have the same number of components.
Definition 4.7 (scalar multiplication). A vector can be scaled by any real
number, meaning that if a ∈ R and ~u = (u1 , u2 , . . . , un ), then
a~u = (au1 , au2 , . . . , aun ).
Scaling literally corresponds to stretching or contracting the magnitude of the
vector. Scaling by a negative number both scales the vector and reverses its
direction, i.e. it reflects the vector through the origin. For example, in R² the
vector (3, 2) scaled by −1 becomes (−3, −2).
Definition 4.8. A vector space is a nonempty set, V , of vectors, along with the
operations of vector addition and scalar multiplication, which satisfies the following
requirements for all ~u, ~v, ~w ∈ V , and for all scalars a, b ∈ R.
(1) ~u + ~v = ~v + ~u (commutativity)
(2) (~u + ~v) + ~w = ~u + (~v + ~w) (additive associativity)
(3) ~v + ~0 = ~v (additive identity)
(4) ~v + (−~v) = ~0 (additive inverses)
(5) a(~u + ~v) = a~u + a~v (distributivity over vector addition)
(6) (a + b)~u = a~u + b~u (distributivity over scalar addition)
(7) (ab)~u = a(b~u) (multiplicative associativity)
(8) 1~u = ~u (multiplicative identity)
Remark 4.9. Notice that the definition for a vector space does not require a way
of multiplying two vectors to yield another vector. If you studied multi–variable
calculus then you may be familiar with the cross product, which is a form of vector
multiplication, but the definition of a vector space does not mention the cross
product. However, being able to scale vectors by real numbers is a vital piece of
the vector space definition.
A vector space is probably your first introduction to a mathematical definition
involving a “set with structure”. The structure here is provided by the operations
of vector addition and scalar multiplication, as well as the eight requirements that
they must satisfy. We will see at the end of this chapter that a vector space isn’t
defined by the objects in the set as much as by the rigid relationships between these
objects that the eight requirements enforce.
In short, a vector space is a nonempty set that is closed under vector addition
and scalar multiplication. Here, “closed” means that if you take any two vectors in
the vector space and add them, then their sum will also be a vector in the vector
space. Likewise if you scale any vector in the vector space, then the scaled version
will also be an element of the vector space.
Definition 4.10. A linear combination of vectors is a scaled sum of vectors.
For example, a linear combination of the vectors ~v1 , ~v2 , . . . , ~vn could be written:
c1~v1 + c2~v2 + · · · + cn~vn , where c1 , c2 , . . . , cn are real numbers (scalars).
The concept defined above, of generating a new vector from other vectors is a
foundational concept in the study of Linear Algebra. It will occur throughout the
rest of this book.
Given this definition, one can think of a vector space as a nonempty set of
vectors that is closed under linear combinations. Which is to say that any linear
combination of vectors from a vector space will again be an element of the vector
space. You may be wondering why we make the requirement that a vector space
must be a nonempty set. This is because every vector space must contain the zero
vector, ~0. Since vector spaces are closed under scalar multiplication, any nonempty
vector space will necessarily contain the zero vector. This is because you can always
take any vector and scale it by the scalar, 0, to get the vector, ~0. As we develop
the theory, you will see that it makes more sense for the set {~0} to be the smallest
possible vector space rather than the empty set {}. This is the reason for requiring
vector spaces to be nonempty.
Linear combinations are important because they give us a new geometric perspective on the calculations we did in the previous chapter.
Linear System            Matrix Equation               Linear Combination

1x + 1y = 3       ⇐⇒     [ 1  1 ] [ x ]   [ 3 ]   ⇐⇒    x [ 1 ] + y [ 1 ] = [ 3 ]
3x + 0y = 3              [ 3  0 ] [ y ] = [ 3 ]           [ 3 ]     [ 0 ]   [ 3 ]

The accompanying plots show the two geometric views of the solution: on the left,
the lines ℓ1 : y = −x + 3 and ℓ2 : x = 1 intersect at the point (1, 2); on the right,
the linear combination x~u + y~v = ~w reaches ~w = (3, 3) using ~u = (1, 3), ~v = (1, 0),
x = 1 and y = 2.
2. Linear Independence
The following definition is one of the most fundamental concepts in all of linear
algebra and will be used again and again. You must memorize it.
Definition 4.11. A set of vectors, {~v1 , . . . , ~vn } is linearly dependent if there exist
scalars, c1 , . . . , cn not all 0, such that
(57) c1~v1 + c2~v2 + · · · + cn~vn = ~0.
Remark 4.12. Clearly if c1 = c2 = · · · = cn = 0, that is if all the scalars are zero,
then the equation is true, thus the “not all 0” phrase of the definition is key.
A linearly dependent set of vectors is a set that possesses a relationship amongst
the vectors. Specifically, you can write at least one vector in the set as a linear
combination of the other vectors. For example we could solve equation (57) for ~v2
(assuming c2 ≠ 0), as follows:
(58) ~v2 = −(c1/c2)~v1 − (c3/c2)~v3 − · · · − (cn/c2)~vn.
Of course, ~v2 is not special. We can solve equation (57) for any of the vectors whose
coefficient is nonzero. Thus, one of the vectors is redundant because we can generate
it via a linear combination of the others. It may be the case that there are other
vectors in the set that are also redundant, but at least one is.
So how does one figure out if there are scalars, c1 , . . . , cn not all zero such that
equation (57) is satisfied?
Example 4.13. Determine whether the following set of four vectors in R³ is linearly
dependent.

{ [  1 ]   [ 0 ]   [ 3 ]   [ 0 ] }
{ [  0 ] , [ 2 ] , [ 0 ] , [ 1 ] }
{ [ −1 ]   [ 0 ]   [ 1 ]   [ 1 ] }
We must figure out if there exist scalars c1, c2, c3, c4 not all zero such that

(59)  c1 [  1 ]  +  c2 [ 0 ]  +  c3 [ 3 ]  +  c4 [ 0 ]  =  [ 0 ]
         [  0 ]        [ 2 ]        [ 0 ]        [ 1 ]     [ 0 ]
         [ −1 ]        [ 0 ]        [ 1 ]        [ 1 ]     [ 0 ]
Let’s rewrite the last equation by absorbing the scalars into the vectors and summing them:

[  1 · c1 + 0 · c2 + 3 · c3 + 0 · c4 ]   [ 0 ]
[  0 · c1 + 2 · c2 + 0 · c3 + 1 · c4 ] = [ 0 ]
[ −1 · c1 + 0 · c2 + 1 · c3 + 1 · c4 ]   [ 0 ]
The last equation corresponds to a linear system of three equations in four unknowns
c1, . . . , c4!

1c1 + 0c2 + 3c3 + 0c4 = 0
0c1 + 2c2 + 0c3 + 1c4 = 0
−1c1 + 0c2 + 1c3 + 1c4 = 0
And this system of equations can be written as a coefficient matrix times a column
vector of the unknowns:

(60)  [  1  0  3  0 ] [ c1 ]   [ 0 ]
      [  0  2  0  1 ] [ c2 ] = [ 0 ]
      [ −1  0  1  1 ] [ c3 ]   [ 0 ]
                      [ c4 ]
Equations (59) and (60) are two different ways of writing the exact same thing!
In other words, multiplying a matrix by a column vector is equivalent to making
a linear combination of the columns of the matrix. The columns of the matrix in
equation (60) are exactly the vectors of equation (59).
We can write this as an augmented matrix and perform elementary row operations
to determine the solution set, if any. However, since it is a homogeneous
system, no matter what elementary row ops we apply, the rightmost column will
always remain as all zeros, thus there is no point in writing it. Instead we only
need to perform row ops on the coefficient matrix.
4
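Carrying out the row reduction, or letting software do it, settles the question. Here is an illustrative sketch using NumPy: the rank of the coefficient matrix is 3, which is less than the number of columns, so the four vectors are linearly dependent.

    # Sketch: testing the vectors of Example 4.13 for linear dependence via rank.
    import numpy as np

    # columns are the four vectors from the example
    A = np.array([[ 1, 0, 3, 0],
                  [ 0, 2, 0, 1],
                  [-1, 0, 1, 1]])

    rank = np.linalg.matrix_rank(A)
    print(rank, "independent columns out of", A.shape[1])
    # rank 3 < 4 columns, so the set is linearly dependent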
3. Vector Subspaces
4. Affine Spaces
There is an interesting connection between solutions to homogeneous and nonhomogeneous linear systems.
Lemma 4.14. If ~u and ~v are both solutions to the nonhomogeneous equation,
(61)
A~x = ~b,
then their difference ~y = ~u − ~v is a solution to the associated homogeneous system:
(62)
A~x = ~0,
Proof. This is a simple consequence of the fact that matrix multiplication distributes over vector addition.
A~y = A(~u − ~v )
= A~u − A~v
= ~b − ~b
= ~0
This idea of the lemma is illustrated in figure 4.1 for the system:

(63)  [ 1  2 ] [ x ]   [ 12 ]
      [ 0  0 ] [ y ] = [  0 ],

which has the following associated homogeneous system:

(64)  [ 1  2 ] [ x ]   [ 0 ]
      [ 0  0 ] [ y ] = [ 0 ].
Figure 4.1. Affine solution space for equation 63 and vector subspace of
solutions to equation 64. The solutions of equation 63 form the line x + 2y = 12,
which contains ~u = (4, 4) and ~v = (8, 2); the differences ~u − ~v = (−4, 2) and
~v − ~u = (4, −2) lie on the line x + 2y = 0 of solutions to equation 64.
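A two–line numerical check of the lemma for this particular system (an illustrative sketch using NumPy):

    # Sketch: the difference of two solutions of A x = b solves A x = 0.
    import numpy as np

    A = np.array([[1, 2],
                  [0, 0]])
    u = np.array([4, 4])    # a solution of A x = (12, 0)
    v = np.array([8, 2])    # another solution

    print(A @ u, A @ v)     # both give [12  0]
    print(A @ (u - v))      # gives [0 0]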
5. Bases and Dimension
We know that a vector space is simply a nonempty set of vectors that is closed under
taking linear combinations, so a natural question to ask is, “Is there some subset of
vectors which allows us to generate (via linear combinations) every vector in the
vector space?”. Since a set is always considered a subset of itself, the answer to
this question is clearly yes, because the vector space itself can generate any vector
in the vector space. But can we find a proper subset, or perhaps even a finite
subset from which we can generate all vectors in the vector space by taking linear
combinations?
The answer to this question is yes, in fact such a set exists for every vector
space, although it might not always be finite.
6. Abstract Vector Spaces
When we first defined a vector, we defined it to be an ordered list of real numbers,
and we noted that the defining characteristic was that a vector had a direction
and a magnitude. We then proceeded to define a vector space as a set of objects
that obeyed eight rules. These eight rules revolved around addition or summing
and multiplication or scaling. Finally, we observed that all but one of these eight
rules could be summed up by saying that a vector space is a nonempty set of vectors
that is closed under linear combinations (scaled sums). Where “closed” meant that
if ~u, ~v ∈ V , then ~u + ~v ∈ V and if c ∈ R, then c~v ∈ V . The one vector space
requirement that this “definition” does not satisfy is the very first requirement that
addition must be commutative, i.e. ~u + ~v = ~v + ~u for all ~u, ~v ∈ V . It turns out
that vector spaces are ubiquitous in mathematics. This section will give several
examples.
Example 4.15 (Matrices as a Vector Space). Consider the set of m × n matrices
with real entries which we will denote, Mmn (R). This set is a vector space. To see
why let A, B be elements of Mmn (R), and let c be any real number then
(1) Mmn (R) is not empty, specifically it contains an m × n matrix made of all
zeros which serves as our zero vector.
(2) This set is closed under addition, A + B ∈ Mmn (R).
(3) This set is closed under scalar multiplication, cA ∈ Mmn (R).
(4) Matrix addition is commutative, A + B = B + A for all matrices in Mmn (R).
More concretely, consider M22 (R), the set of 2 × 2 matrices with real entries. What
subset of M22 (R), is a basis for this vector space? Well, whatever it is it must allow
us to write any matrix as a linear combination of its elements. The simplest choice
is called the standard basis, and in this case the simplest choice is the set:
B = { [ 1  0 ] ,  [ 0  1 ] ,  [ 0  0 ] ,  [ 0  0 ] }
    { [ 0  0 ]    [ 0  0 ]    [ 1  0 ]    [ 0  1 ] }
This allows us to write, for example,

[ a  b ]       [ 1  0 ]       [ 0  1 ]       [ 0  0 ]       [ 0  0 ]
[ c  d ]  =  a [ 0  0 ]  +  b [ 0  0 ]  +  c [ 1  0 ]  +  d [ 0  1 ]
Is B really a basis? Clearly it spans the vector space M22 (R), but is there perhaps a
smaller set which still spans M22 (R)? Also, how do we know that the four matrices
in B are linearly independent?
4
Example 4.16 (Solution Space of Homogeneous, Linear Differential Equations).
Consider the differential equation
(65) y′′ + y = 0.
You can check that both
y1 = cos x,   and   y2 = sin x
are solutions. But also, any linear combination of these two solutions is again a
solution. To see this let y = ay1 + by2 where a and b are scalars, then:

y = a cos x + b sin x
y′ = −a sin x + b cos x
y′′ = −a cos x − b sin x

So y′′ + y = (−a cos x − b sin x) + (a cos x + b sin x) = 0, and thus we see that the
set of solutions is nonempty and closed under linear combinations and therefore a
vector space.
4
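This closure property can also be checked symbolically. A small illustrative sketch using the SymPy library (an outside tool, not part of this text):

    # Sketch: any linear combination a*cos(x) + b*sin(x) satisfies y'' + y = 0.
    import sympy as sp

    x, a, b = sp.symbols('x a b')
    y = a*sp.cos(x) + b*sp.sin(x)

    print(sp.simplify(sp.diff(y, x, 2) + y))   # prints 0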
Notice that if equation (65) were not homogeneous, that is if the right hand
side of the equation were not zero, then the set of solutions would not form a vector
space.
Example 4.17 (Solution Space of Nonhomogeneous, Linear Differential Equations). Consider the differential equation
(66) y′′ + y = e^x.
You can check that both
y1 = cos x + (1/2)e^x,   and   y2 = sin x + (1/2)e^x
are solutions. However, linear combinations of these two solutions are not, in
general, solutions. To see this let y = ay1 + by2 where a and b are scalars, then:

y = a(cos x + (1/2)e^x) + b(sin x + (1/2)e^x)
y′ = a(−sin x + (1/2)e^x) + b(cos x + (1/2)e^x)
y′′ = a(−cos x + (1/2)e^x) + b(−sin x + (1/2)e^x)

y′′ + y = a(1/2)e^x + b(1/2)e^x + a(1/2)e^x + b(1/2)e^x
        = (a + b)e^x

Thus y = ay1 + by2 will only be a solution of equation 66 when a + b = 1.
4
Chapter 5
Higher Order Linear
Differential Equations
1. Homogeneous Differential Equations
Similar to how homogeneous systems of linear equations played an important role
in developing the theory of vector spaces, a similar class of differential equations
will be instrumental in understanding the theory behind solving higher order linear
differential equations.
Recall definition 1.25, which states that a differential equation is defined to be
linear if it can be written in the form:
(22) an(x)y^(n) + an−1(x)y^(n−1) + · · · + a1(x)y′ + a0(x)y = f(x).
Definition 5.1. A linear differential equation is called homogeneous if f (x) = 0.
That is if it can be written in the form:
(67) an(x)y^(n) + an−1(x)y^(n−1) + · · · + a1(x)y′ + a0(x)y = 0,
where the right hand side of the equation is exactly zero.
Similar to how the solution space of a homogeneous system of linear equations
formed a vector subspace of Rn , the solutions of linear, homogeneous differential
equations form a vector subspace of F, the vector space of functions of a real variable.
Theorem 5.2. The set of solutions to a linear homogeneous differential equation
(equation 67) form a vector subspace of F, the set of all functions of a real variable.
Proof. For sake of simplicity, we will only prove this for a general second order,
homogeneous differential equation such as:
(68) a2(x)y′′ + a1(x)y′ + a0(x)y = 0.
The necessary modifications for a general nth order equation are left as an exercise.
Let V be the set of all solutions to equation 68. We must show three things:
1) V is nonempty, 2) V is closed under vector addition, 3) V is closed under scalar
multiplication.
(1) Clearly the function y(x) ≡ 0 is a solution to equation 68, thus V is nonempty.
(2) Let y1, y2 ∈ V and let y = y1 + y2. If we plug this into equation 68 we get:
    a2(x)y′′ + a1(x)y′ + a0(x)y
    = a2(x)(y1 + y2)′′ + a1(x)(y1 + y2)′ + a0(x)(y1 + y2)
    = a2(x)(y1′′ + y2′′) + a1(x)(y1′ + y2′) + a0(x)(y1 + y2)
    = [a2(x)y1′′ + a1(x)y1′ + a0(x)y1] + [a2(x)y2′′ + a1(x)y2′ + a0(x)y2]
    = 0
(3) Let y1 ∈ V, α ∈ R and let y = αy1. If we plug y into equation 68 we get:
    a2(x)(αy1)′′ + a1(x)(αy1)′ + a0(x)(αy1)
    = αa2(x)y1′′ + αa1(x)y1′ + αa0(x)y1
    = α[a2(x)y1′′ + a1(x)y1′ + a0(x)y1]
    = 0
Now that we know that the set of solutions of a linear, homogeneous equation
form a vector space, the next obvious thing to do is figure out a way to generate a
basis for the solution space. Then we will be able to write the general solution as
a linear combination of the basis functions. For example, suppose a basis contains
three functions y1 (x), y2 (x) and y3 (x), then a general solution would have the form:
c1 y1 (x) + c2 y2 (x) + c3 y3 (x).
At this point we don’t even know how to determine the dimension of a basis for
the solution space let alone actual basis functions. Thus, it makes sense to impose
some simplifying assumptions. Instead of considering general nth order, linear,
homogeneous equations like equation 67, let’s only consider second order equations.
In fact, it will be advantageous to be even more restrictive, let’s only consider
equations with constant coefficients. Thus let’s consider DEs of the form:
(69) y'' + ay' + by = 0,
where a and b are elements of R, the set of real numbers.
2. Linear Equations with Constant Coefficients
2.1. Linear Operators. If we let D = d/dx, and let D² = d²/dx², then we can write equation 69 like so:
(70) D²y + aDy + by = 0.
Each term on the left now involves y, instead of different derivatives of y. This allows us to rewrite the equation as:
(71) (D² + aD + b)y = 0.
Equation 71 deserves some explanation. It is analogous to the matrix equation:
(A² + aA + b)~v = A²~v + aA~v + b~v = ~0,
where the differentiation operator D has been replaced with a square matrix A, and the function y has been replaced with the vector ~v. The expression (D² + aD + b) is a function with
domain F and codomain (range) also F. In other words it is a function on functions.
You should already be familiar with the notion of the derivative as a function on
functions. The only thing new we have introduced is combining multiple derivatives
along with scalars into a single function. Also, we are using multiplicative notation
to denote function application. That is, we will usually forgo the parentheses in
(D2 + aD + b)(y) in favor of (D2 + aD + b)y. Thus we can rewrite a second order,
linear, homogeneous DE with constant coefficients in two equivalent ways:
y 00 + ay 0 + by = 0
⇐⇒
(D2 + aD + b)y = 0.
Finally, if we let
L = D2 + aD + b,
(72)
then we can write equation 69 very compactly:
Ly = 0.
Solving this DE amounts to figuring out the set of functions which get mapped to
the zero function, by the linear operator L = D2 + aD + b.
Definition 5.3. A linear operator is a function often denoted L with signature:
L:F →F
that is, it is a function which maps functions to other functions such that
L(c1 y1 + c2 y2 ) = c1 Ly1 + c2 Ly2 ,
where c1 , c2 are scalars and y1 , y2 are functions.
Remark 5.4. The linear operator notation: (D2 + aD + b) is simply shorthand for
a function on functions. In other words it is a name for a function similar to how we
might associate the symbol f with the expression x2 + 1 by writing f (x) = x2 + 1.
The difference is that the name also serves as the function definition.
Remark 5.5. The linear operator notation: (D2 + aD + b) does not represent
an expression that can be evaluated. Recall that this is analogous to the matrix
equation A2 + aA + b, but although the sum A2 + aA is defined for square n × n
matrices the second sum, aA + b is not even defined. For example, it makes no
sense to add a 2 × 2 matrix to a scalar. Similarly it makes no sense to add the
derivative operator to a scalar. However it does make sense to add the derivative
of a function to a scalar multiple of a function.
Of course, this notation works for first order and higher order differential equations as well. For example,
y' − 3y = 0 ⟺ (D − 3)y = 0,
y^(4) + 6y'' + 12y = 0 ⟺ (D⁴ + 6D² + 12)y = 0.
Lemma 5.6. First order linear operators commute. That is, if a and b are any real numbers and y is a function, then
(73) (D − a)(D − b)y = (D − b)(D − a)y.
Proof.
(D − a)(D − b)y = (D − a)(Dy − by)
= D²y − bDy − aDy + aby
= (D²y − aDy) − b(Dy − ay)
= D(Dy − ay) − b(Dy − ay)
= (D − b)(Dy − ay)
= (D − b)(D − a)y
This lemma provides a method for finding the solutions of linear, homogeneous
DEs with constant coefficients.
Example 5.7. Suppose we wish to find all solutions of
(74)
y 000 − 2y 00 − y 0 + 2y = 0.
First rewrite the equation using linear operator notation,
(D3 − 2D2 − D + 2)y = 0
and then factor the linear operator exactly like you factor a polynomial.
(D − 2)(D2 − 1)y = 0
(D − 2)(D − 1)(D + 1)y = 0
Since first–order, linear operators commute, any function annihilated by one of the three factors is a solution of the whole equation: each factor can be applied last, and if one of the linear operators maps y to the constant zero function, then the remaining operators map it to zero as well. In other words (D − a)0 = 0.
Thus the three factors give three independent solutions:
(D − 2)y = 0 ⟺ y' = 2y ⟺ y = c1 e^(2x)
(D − 1)y = 0 ⟺ y' = y ⟺ y = c2 e^x
(D + 1)y = 0 ⟺ y' = −y ⟺ y = c3 e^(−x).
Therefore the general solution of equation 74 is a linear combination of these three solutions:
(75) y(x) = c1 e^(2x) + c2 e^x + c3 e^(−x).
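For reference, a computer algebra system reproduces this general solution directly; a minimal sketch, assuming SymPy is available:

# Check Example 5.7 with SymPy's ODE solver.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x, 3) - 2*y(x).diff(x, 2) - y(x).diff(x) + 2*y(x), 0)
print(sp.dsolve(ode, y(x)))
# Expected (up to the labeling of the constants):
#   y(x) = C1*exp(-x) + C2*exp(x) + C3*exp(2*x), matching equation (75).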
The previous example leads us to believe that solutions to linear, homogeneous
equations with constant coefficients will have the form y = erx . If we adopt the
notation:
p(D) = a_n D^n + a_(n−1) D^(n−1) + · · · + a_1 D + a_0,
then we see that we can write a homogeneous linear equation as p(D)y = 0. In particular, we can write second order, homogeneous, linear equations in the following way:
a_2 y'' + a_1 y' + a_0 y = 0
(a_2 D² + a_1 D + a_0)y = 0
p(D)y = 0
This means that we can interpret p(D) as a linear operator.
2.2. Repeated Roots. What if our linear operator has a repeated factor? For
example (D − r)(D − r)y = 0, does this mean that the differential equation
(76) (D − r)²y = 0 ⟺ y'' − 2ry' + r²y = 0
only has the solution y1 = e^(rx)? This is equivalent to saying that the solution space is
one dimensional. But could the solution space still be two dimensional? Let’s guess
that there is another solution y2 of the form
y2 = u(x)y1 ,
where u(x) is some undetermined function, but with the restriction u(x) 6≡ c. We
must not allow u(x) to be a constant function, otherwise y1 and y2 will be linearly
dependent and will not form a basis for the solution space.
(D − r)²y2 = (D − r)²[u(x)y1]
= (D − r)²[u(x)e^(rx)]
= (D − r)[(D − r)u(x)e^(rx)]
= (D − r)[Du(x)e^(rx) − ru(x)e^(rx)]
= (D − r)[u'(x)e^(rx) + ru(x)e^(rx) − ru(x)e^(rx)]
= (D − r)[u'(x)e^(rx)]
= D[u'(x)e^(rx)] − ru'(x)e^(rx)
= u''(x)e^(rx) + ru'(x)e^(rx) − ru'(x)e^(rx)
= u''(x)e^(rx)
= [D²u(x)]e^(rx) = 0.
It follows that if y2 = u(x)y1 (x) is to be a solution of equation 76, then:
D2 u(x) = 0.
In other words u(x) must satisfy u00 (x) = 0. We already know that degree one
polynomials satisfy this constraint. Thus u(x) can be any linear polynomial, for
example:
u(x) = a0 + a1 x.
Hence y2 (x) = (a0 + a1 x)erx and thus our general solution to equation 76 is a linear
combination of the two solutions y1 and y2 :
y = c1 y1 + c2 y2
= c1 e^(rx) + c2 (a0 + a1 x)e^(rx)
= c1 e^(rx) + c2 a0 e^(rx) + c2 a1 x e^(rx)
= (c1 + c2 a0)e^(rx) + (c2 a1)x e^(rx)
= (c1* + c2* x)e^(rx)
Notice that the general solution is equivalent to y2 alone. Since y2 necessarily has
two unknowns in it (a0 and a1 from the linear polynomial), this is reasonable. Hence
the general solution to a second order, linear equation with the single repeated root
r in its characteristic equation is given by:
y = (c1 + c2 x)erx .
The steps above can be extended to the situation where a linear operator consists of a product of k equal first order linear operators.
Lemma 5.8. If the characteristic equation of a linear, homogeneous differential
equation with constant coefficients has a repeated root of multiplicity k, for example
(77) (D − r1)(D − r2) · · · (D − rn)(D − r)^k y = 0,
then the part of the general solution corresponding to the root r has the form
(78) (c1 + c2 x + c3 x² + · · · + ck x^(k−1))e^(rx).
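A quick machine check of the repeated–root case is possible as well; the sketch below picks r = 3 purely for illustration and assumes SymPy is available.

# (D - 3)^2 y = 0, i.e. y'' - 6y' + 9y = 0, should give (c1 + c2 x)e^(3x).
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x, 2) - 6*y(x).diff(x) + 9*y(x), 0)
print(sp.dsolve(ode, y(x)))
# Expected: y(x) = (C1 + C2*x)*exp(3*x)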
2.3. Complex Conjugate Roots.
3. Mechanical Vibrations
A good physical example of a second order, linear differential equation with constant
coefficients is provided by a mass, spring and dashpot setup as depicted below. A
dashpot is simply a piston like device which provides resistance proportional to the
rate at which it is compressed or pulled. It is like the shock absorbers found in
cars. In this setup there are three separate forces acting on the mass.
Figure 3.1. Mass, spring, dashpot mechanical system
(1) Spring: Fs = −kx
(2) Dashpot: Fd = −cx'
(3) Driving Force: F(t)
The spring provides a restorative force meaning that its force is proportional to
but in the opposite direction of the displacement of the mass. Similarly, the force
due to the dashpot is proportional to the velocity of the mass, but in the opposite
direction. Finally, the driving force could correspond to any function we are capable
of generating by physical means. For example, if the mass is made of iron, then we
could use an electromagnet to periodically push and pull it in a sinusoidal fashion.
We use Newton’s second law, which states that the sum of the forces applied
to a body is equal to its mass times acceleration to derive the governing differential
equation.
Σf = ma
Fs + Fd + F(t) = mx''
−kx − cx' + F(t) = mx''
(79) mx'' + cx' + kx = F(t)
As in the case of solving affine systems in chapter 4 finding the general solution
of equation 79 is a two step process. First we must find all solutions to the associated
homogeneous equation:
mx00 + cx0 + kx = 0.
(80)
Next, we must find a single solution to the nonhomogeneous (or original) equation
and add them together to get the general solution, i.e. family of all solutions.
3.1. Free Undamped Motion. First we will consider the simplest mass, spring
dashpot system, one where there is no dashpot, and there is no driving force. Setting
F (t) = 0 makes the equation homogeneous. In this case, equation 79 becomes
(81) mx'' + kx = 0.
We define ω0 = √(k/m), which allows us to write the previous equation as:
(82) x'' + ω0²x = 0.
The characteristic equation of this DE is r² + ω0² = 0, which has conjugate pure
imaginary roots and yields the general solution:
(83) x(t) = A cos ω0t + B sin ω0t
It is difficult to graph the solution by hand because it is the sum of two trigonometric
functions. However, we can always write a sum of two sinusoids as a single sinusoid.
That is, we can rewrite our solution in the form:
x(t) = C cos(ω0 t − α),
(84)
which is much easier to graph by hand. We just need a way to compute the
amplitude C and the phase shift α.
What makes this possible is the cosine subtraction trigonometric identity:
(85)
cos(θ − α) = cos θ cos α + sin θ sin α,
which we rearrange to:
(86)
cos(θ − α) = cos α cos θ + sin α sin θ.
This formula allows us to rewrite our solution, equation 83 as follows:
x(t) = A cos ω0t + B sin ω0t
= C[(A/C) cos ω0t + (B/C) sin ω0t]
= C(cos α cos ω0t + sin α sin ω0t)
= C cos(ω0t − α)
where the substitutions
A/C = cos α,   B/C = sin α,   C = √(A² + B²)
are justified by the right triangle in figure 3.2. The final step follows from the cosine
subtraction formula in equation 86.
Figure 3.2. Right triangle for phase angle α (legs A and B, hypotenuse C)
Note that α cannot always be computed as tan⁻¹(B/A): the arctangent only returns angles in (−π/2, π/2), so in general one must take the quadrant of the point (A, B) into account in order to obtain a phase angle α in [0, 2π).
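Numerically, that quadrant bookkeeping is exactly what the two–argument arctangent does; the helper below is an illustrative sketch (the function name amplitude_phase is ours), assuming NumPy is available.

import numpy as np

def amplitude_phase(A, B):
    """Return (C, alpha) so that A*cos(w*t) + B*sin(w*t) = C*cos(w*t - alpha)."""
    C = np.hypot(A, B)                       # sqrt(A**2 + B**2)
    alpha = np.arctan2(B, A) % (2 * np.pi)   # place alpha in [0, 2*pi)
    return C, alpha

print(amplitude_phase(1.0, -1.0))            # C = sqrt(2), alpha = 7*pi/4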
3.2. Free Damped Motion.
4. The Method of Undetermined Coefficients
Before we explain the method of undetermined coefficients we need to make a simple
observation about nonhomogeneous or driven equations such as
(87) y'' + ay' + by = F(x).
Solving such equations where the right hand side is nonzero will require us to actually find two different solutions, yp and yh. The p stands for particular and the h stands
for homogeneous. The following theorem explains why.
Theorem 5.9. Suppose yp is a solution to:
(88)
y 00 + ay 0 + by = F (x).
And suppose y1 and y2 are solutions to the associated homogeneous equation:
(89)
y 00 + ay 0 + by = 0.
Then the function defined by
y = yh + yp
(90) y = c1 y1 + c2 y2 + yp,   where yh = c1 y1 + c2 y2,
is a solution to the original nonhomogeneous system, equation 88.
Proof.
(c1 y1 + c2 y2 + yp )00 + a(c1 y1 + c2 y2 + yp )0 + b(c1 y1 + c2 y2 + yp )
=(c1 y100 + c2 y200 + yp00 ) + a(c1 y10 + c2 y20 + yp0 ) + b(c1 y1 + c2 y2 + yp )
=c1 (y100 + ay10 + by1 ) + c2 (y200 + ay20 + by2 ) + (yp00 + ayp0 + byp )
=c1 · 0 + c2 · 0 + F (x)
=F (x)
Solving homogeneous, linear DEs with constant coefficients is simply a matter
of finding the roots of the characteristic equation and then writing the general
solution according to the types of roots and their multiplicities. But the method
relies entirely on the fact that the equation is homogeneous, that is that the right
hand side of the equation is zero. If we have a driven or nonhomogeneous equation such as
(91) y^(n) + a_(n−1)y^(n−1) + · · · + a_1 y' + a_0 y = F(x),
then the roots of the characteristic equation r^n + a_(n−1)r^(n−1) + · · · + a_1 r + a_0 = 0 only produce the homogeneous solution yh; by theorem 5.9 the general solution is
(92) y = yh + yp.
Example 5.10. Find a particular solution of
(93)
y 00 + y 0 − 12y = 2x + 5.
A particular solution will have the same form as the forcing function which in this
case is F (x) = 2x + 5, that is it will have the form:
yp = Ax + B.
Here A and B are real coefficients which are as of yet “undetermined”, hence the
name of the method. Our task is to determine what values for A and B will make yp
a solution of equation 93. We can determine values for the undetermined coefficients
by differentiating our candidate function (twice) and plugging the derivatives into
equation 93. Since yp' = A and yp'' = 0, substituting gives A − 12(Ax + B) = 2x + 5, so −12A = 2 and A − 12B = 5, which yields A = −1/6 and B = −31/72. Hence yp = −x/6 − 31/72.
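The remaining algebra can also be delegated to a computer algebra system; a minimal sketch, assuming SymPy is available:

# Undetermined coefficients for y'' + y' - 12y = 2x + 5 with the guess yp = A x + B.
import sympy as sp

x, A, B = sp.symbols('x A B')
yp = A*x + B

residual = sp.expand(yp.diff(x, 2) + yp.diff(x) - 12*yp - (2*x + 5))
coeffs = sp.solve([residual.coeff(x, 1), residual.coeff(x, 0)], [A, B])
print(coeffs)   # {A: -1/6, B: -31/72}, so yp = -x/6 - 31/72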
5. The Method of Variation of Parameters
The method of undetermined coefficients examined in the previous section relied
upon the fact that the forcing function f (x) on the right hand side of the differential
equation had a finite number of linearly independent derivatives. What if this isn’t
the case? For example consider the equation
(94)
y 00 + P (x)y 0 + Q(x)y = tan x.
The derivatives of tan x are as follows:
sec2 x, 2 sec2 x tan x, 4 sec2 x tan2 x + 2 sec4 x, . . .
These functions are all linearly independent. In fact, tan x has an infinite number
of linearly independent derivatives. Thus, clearly, the method of undetermined
coefficients won’t work as a solution method for equation 94.
The method of variation of parameters can handle this situation. It is a more
general solution method, so in principle it can be used to solve any linear, non–
homogeneous differential equation, but the method does force us to compute indefinite integrals, so it does not always yield closed form solutions. However, it will
allow us to solve linear equations with non–constant coefficients such as:
(95) y^(n) + p_(n−1)(x)y^(n−1) + · · · + p_2(x)y'' + p_1(x)y' + p_0(x)y = f(x)
Recall that the general solution to equation 95 will have the form
y = yh + yp
where yh is a solution to the associated homogeneous equation and is obtained via
methods explained previously if the coefficients are all constant. If they are not
all constants, then your only recourse at this point will be trial and error. This
method assumes we already have a set of n linearly independent solutions to the
associated homogeneous equation. The method of variation of parameters is only
a method for finding a particular solution, yp .
For sake of simplicity, we first derive the formula for a general second order
linear equation, such as:
(96)
y 00 + P (x)y 0 + Q(x)y = f (x).
We begin by assuming, or guessing that a particular solution might have the form
(97)
yp = u1 (x)y1 + u2 (x)y2 ,
where u1 , u2 are unknown functions, and y1 and y2 are known, linearly independent
solutions to the homogeneous equation associated with equation 96.
Our goal is to determine u1 and u2 . Since we have two unknown functions,
we will need two equations which these functions must satisfy. One equation is
obvious, our guess for yp must satisfy equation 96, but there is no other obvious
equation. However, when we plug our guess for yp into equation 96, then we will
find another equation which will greatly simplify the calculations.
Before we can plug our guess for yp into equation 96 we need to compute two
derivatives of yp :
yp0 = u01 y1 + u1 y10 + u02 y2 + u2 y20
yp0 = (u1 y10 + u2 y20 ) + (u01 y1 + u02 y2 ).
Since we have the freedom to impose one more equation’s worth of restrictions on
u1 and u2 , it makes sense to impose the following condition:
u01 y1 + u02 y2 = 0,
(*)
because then when we compute yp00 it won’t involve second derivatives of u1 or u2 .
This will make solving for u1 and u2 possible. Assuming condition (*), yp0 and yp00
become:
yp0 = u1 y10 + u2 y20
(98)
yp00 = u01 y10 + u1 y100 + u02 y20 + u2 y200
yp00 = u01 y10 + u02 y20 + u1 y100 + u2 y200
Recall that by assumption, y1 and y2 both satisfy the homogeneous version of
equation 96, thus we can write:
yi'' = −P(x)yi' − Q(x)yi   for i = 1, 2.
Substituting this in for y1'' and y2'' in the equation above yields:
(99) yp'' = u1'y1' + u2'y2' + u1(−P(x)y1' − Q(x)y1) + u2(−P(x)y2' − Q(x)y2)
= u1'y1' + u2'y2' − P(x)(u1y1' + u2y2') − Q(x)(u1y1 + u2y2).
If we plug yp , yp0 and yp00 found in equations 97, 98 and 99 into the governing
equation, 96, then we get:
yp'' + P(x)yp' + Q(x)yp
= [u1'y1' + u2'y2' − P(x)(u1y1' + u2y2') − Q(x)(u1y1 + u2y2)] + P(x)(u1y1' + u2y2') + Q(x)(u1y1 + u2y2);
the P(x) and Q(x) terms cancel, leaving
f(x) = u1'y1' + u2'y2'.
The last line above is our second condition which the unknowns u1 and u2 must
satisfy. Combining the above condition with the previous condition (*), we get the
following linear system of equations:
u1'y1 + u2'y2 = 0
u1'y1' + u2'y2' = f(x)
Which when written as a matrix equation becomes:
[y1 y2; y1' y2'][u1'; u2'] = [0; f(x)]
Notice that this system will have a unique solution if and only if the determinant
of the 2 × 2 matrix is nonzero. This is the same condition as saying the Wronskian,
W = W (y1 , y2 ) 6= 0. Since y1 and y2 were assumed to be linearly independent this
will be guaranteed. Therefore we can solve the system by multiplying both sides of
the matrix equation by the inverse matrix:
[u1'; u2'] = (1/W)[y2' −y2; −y1' y1][0; f(x)]
[u1'; u2'] = (1/W)[−y2 f(x); y1 f(x)]
Integrating both of these equations with respect to x yields our solution:
u1(x) = −∫ (y2 f(x)/W) dx,   u2(x) = ∫ (y1 f(x)/W) dx.
Assuming these integrals can be computed, then a particular solution to equation 96 will be given by:
(100) yp(x) = −(∫ y2(x)f(x)/W(x) dx) y1(x) + (∫ y1(x)f(x)/W(x) dx) y2(x).
It is interesting to point out that our solution for yp does not depend on the
coefficient functions P (x) nor Q(x) at all. Of course, if P (x) and Q(x) are anything
other than constant functions, then we don’t have an algorithmic way of finding
the required linearly independent solutions to the associated homogeneous equation
anyway. This method is wholly dependent on being able to solve the associated
homogeneous equation.
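Formula (100) is straightforward to script; the sketch below applies it to y'' + y = tan x, taking y1 = cos x and y2 = sin x as the known homogeneous solutions, and assumes SymPy is available.

# Variation of parameters for y'' + y = tan x, following equation (100).
import sympy as sp

x = sp.symbols('x')
y1, y2 = sp.cos(x), sp.sin(x)                    # independent solutions of y'' + y = 0
f = sp.tan(x)
W = sp.simplify(y1*y2.diff(x) - y2*y1.diff(x))   # Wronskian, here equal to 1

u1 = sp.integrate(-y2*f/W, x)
u2 = sp.integrate( y1*f/W, x)
yp = sp.simplify(u1*y1 + u2*y2)

# Verify that yp really satisfies the equation (expect 0).
print(sp.simplify(yp.diff(x, 2) + yp - f))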
6. Forced Oscillators and Resonance
Recall the mass, spring, dashpot setup of section 3. In that section we derived the
following governing differential equation for such systems:
(101)
mx00 + cx0 + kx = F (t).
Recall that depending on m, c and k the system will be either overdamped, critically
damped or underdamped. In the last case we get oscillatory behavior. It is this
last case that we are interested in now. Our goal for this section is to analyze the
behavior of such systems when a periodic force is applied to the mass. In particular,
we are interested in the situation where the period of the forcing function almost
matches or exactly matches the natural period of the mass and spring.
There are many ways in which to impart a force to the mass. One clever way to
impart a periodic force to the mass is to construct it such that it contains a vertical,
motorized, flywheel with an off–center, center of mass. A flywheel is simply any
wheel with mass. Any rotating flywheel which does not have its center of mass at
the physical center, will impart a force through the axle to its housing.
For a real life example, consider a top–loading, washing machine. The water
and clothes filled basket is a flywheel. These machines typically spin the load of
clothes to remove the water. Sometimes, however, the clothes become unevenly
distributed in the basket during the spin cycle. This causes the spinning basket to
impart a, sometimes quite strong, oscillatory force to the washing machine. If the
basket is unbalanced in the right (wrong?) way, then the back and forth vibrations
of the machine might even knock the washing machine off its base.
Yet another way to force a mass, spring, dashpot system is to drive or force the
mass via an electromagnet. If the electromagnet has a controllable power source,
then it can be used to drive the mass in numerous different ways, i.e. F (t) can take
on numerous shapes.
Undamped Forced Oscillations. If there is no frictional damping, that is if
c = 0, then the associated homogeneous equation for equation 101,
mx00 + kx = 0,
always results in oscillatory solutions. In this case, we let ω02 = k/m and rewrite
the equation as:
x00 + ω02 x = 0,
which has characteristic equation, r2 + ω02 = 0, and hence has solution xh (t) =
c1 cos ω0 t + c2 sin ω0 t. Here, ω0 is the natural frequency of the mass spring system,
or the frequency at which the system naturally vibrates if pushed out of equilibrium.
When we periodically force the system then we must find a particular solution,
xp , in order to assemble the full solution x(t) = xh (t) + xp (t). The method of
undetermined coefficients tells us to make a guess for xp which matches the forcing
function F (t) and any of its linearly independent derivatives. If the periodic forcing
function is modeled by:
F(t) = F0 cos ωt,   ω ≠ ω0,
then our guess for xp should be: xp = A cos ωt + B sin ωt, however since the governing equation does not have any first derivatives, B will necessarily be zero, thus
we guess: xp = A cos ωt. Plugging this into equation 101, still with c = 0 yields:
−mω²A cos ωt + kA cos ωt = F0 cos ωt
so
A = F0/(k − mω²) = (F0/m)/(ω0² − ω²).
Thus the solution to equation 101, without damping, i.e. c = 0, is:
x(t) = xh(t) + xp(t)
x(t) = c1 cos ω0t + c2 sin ω0t + (F0/m)/(ω0² − ω²) · cos ωt.
Which with the technique from section 3 can be rewritten:
(102) x(t) = C cos(ω0t − α) + (F0/m)/(ω0² − ω²) · cos ωt.
This is an important result because it helps us to understand the roles of the homogeneous and particular solutions! In words, this equation says that the response of a mass spring system to being periodically forced is a superposition of two separate responses. Recall that C = √(c1² + c2²), and similarly α only depends on c1 and c2, which in turn only depend on the initial conditions. Also ω0 is simply a function of the properties of the mass and spring, thus the function on the left of (102), i.e. xh, represents the system's response to the initial conditions. Notice that the function
on the right of (102) depends on the driving amplitude (F0 ), driving frequency (ω)
and also m and k, but not at all on the initial conditions. That is, the function on
the right, i.e. xp is the system’s response to being driven or forced.
The homogeneous solution is the system’s response to being disturbed from
equilibrium. The particular solution is the system’s response to being periodically
driven. The interesting thing is that these two functions are not intertwined in
some complicated way. This observation is common to all linear systems, that is, a
solution function to any linear system will consist of a superposition of the system’s
response to intitial conditions and the system’s response to being driven.
Beats. In the previous solution, we assumed that ω 6= ω0 . We had to do this
so that xp would be linearly independent from xh . Now we will examine what
happens as ω → ω0 , that is if we let the driving frequency (ω) get close to the
natural frequency of oscillation for the mass and spring, (ω0 ). Clearly, as we let
these two frequencies get close, the amplitude of the particular solution blows up!
lim_(ω→ω0) A(ω) = lim_(ω→ω0) (F0/m)/(ω0² − ω²) = ∞.
We will solve for this situation exactly in the next subsection. But what can we
say about the solution when ω ≈ ω0 ?
This is easiest to analyze if we impose the initial conditions, x(0) = x0 (0) = 0
on the solution in (102). If we do so, then it is easy to compute the three unknowns:
c1 = −F0/(m(ω0² − ω²)),   c2 = 0,   α = π + tan⁻¹0 = π.
Recall that cos(ωt − π) = − cos(ωt), hence
xh = C cos(ω0t − π) = F0/(m(ω0² − ω²)) · cos(ω0t − π) = −F0/(m(ω0² − ω²)) · cos ω0t.
Therefore, the solution to the IVP is:
x(t) = F0/(m(ω0² − ω²)) [cos ωt − cos ω0t]
= F0/(m(ω0² − ω²)) [cos((1/2)(ω0 + ω)t − (1/2)(ω0 − ω)t) − cos((1/2)(ω0 + ω)t + (1/2)(ω0 − ω)t)]
= F0/(m(ω0² − ω²)) [cos(A − B) − cos(A + B)]
= F0/(m(ω0² − ω²)) [2 sin A sin B]
= 2F0/(m(ω0² − ω²)) sin((1/2)(ω0 + ω)t) sin((1/2)(ω0 − ω)t)
= [2F0/(m(ω0² − ω²)) sin((1/2)(ω0 − ω)t)] sin((1/2)(ω0 + ω)t)
= A(t) sin((1/2)(ω0 + ω)t).
Here we have used a trigonometric substitution, so that we could write the solution
as the product of two sine waves. We renamed the expression in large square brackets to A(t) which is suggestive of amplitude. Notice that A(t) varies sinusoidally,
but does so at a much slower frequency than the remaining sinusoidal factor. Thus
the solution corresponds to a rapid oscillation with a slowly varying amplitude.
This phenomenon is known as beats.
Figure 6.1. Example of beats
In our mechanical example of a driven mass, spring system, this solution corresponds to the mass moving back and forth at a frequency equal to the average of
the natural frequency and the driving frequency, i.e. (ω0 + ω)/2. However, the amplitude of each oscillation varies smoothly from zero amplitude to some maximum
amplitude and then back again.
When the waves are sound waves, this corresponds to a single pitch played with
varying amplitude or volume. It creates a “wah, wah” kind of sound. Musicians
actually use beats to tune their instruments. For example when tuning a piano or
guitar you can play a note with something known to be at the correct pitch and
then tighten or loosen the string depending on whether the amplitude changes are
getting closer in time or more spread out. Faster beats (amplitude changes) mean
you are moving away from matching the pitch, whereas slower beats correspond to
getting closer to the correct pitch.
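The beats picture is easy to reproduce numerically; the script below is an illustrative sketch with arbitrarily chosen parameter values, assuming NumPy and Matplotlib are available.

# Plot the beats solution and its slowly varying envelope A(t).
import numpy as np
import matplotlib.pyplot as plt

m, F0 = 1.0, 1.0
w0, w = 5.0, 4.5                     # natural and driving frequencies, nearly equal
t = np.linspace(0, 40, 4000)

x = F0 / (m*(w0**2 - w**2)) * (np.cos(w*t) - np.cos(w0*t))
envelope = 2*F0 / (m*(w0**2 - w**2)) * np.sin(0.5*(w0 - w)*t)

plt.plot(t, x, label='x(t)')
plt.plot(t, envelope, '--', label='envelope A(t)')
plt.plot(t, -envelope, '--')
plt.xlabel('t'); plt.legend(); plt.show()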
Resonance. What if we let the driving frequency match the natural frequency?
That is, what if we let ω = ω0 ? Our governing equation is:
(103) x'' + ω0²x = (F0/m) cos ω0t.
Since our usual guess for xp will match the homogeneous solution we must multiply
our guess for xp by t, the independent variable. So our guess, and its derivatives
are:
xp(t) = t(A cos ω0t + B sin ω0t)
xp'(t) = (A cos ω0t + B sin ω0t) + ω0t(B cos ω0t − A sin ω0t)
xp''(t) = 2ω0(B cos ω0t − A sin ω0t) + ω0²t(−A cos ω0t − B sin ω0t).
Upon plugging these derivatives into equation (103), we get:
xp'' + ω0²xp = 2ω0(B cos ω0t − A sin ω0t) + ω0²t(−A cos ω0t − B sin ω0t) + ω0²t(A cos ω0t + B sin ω0t)
= 2ω0(B cos ω0t − A sin ω0t) = (F0/m) cos ω0t.
Thus A = 0 and B = F0/(2mω0), and our particular solution is:
(104) xp(t) = (F0/(2mω0)) t sin ω0t.
Figure 6.2. An example of resonance. Functions plotted are: xp(t) = t sin(πt) and the lines x(t) = ±t.
Figure 6.2 shows the graph of xp(t) for the values F0 = ω0 = π, m = 1/2. Notice how the amplitude of oscillation grows linearly without bound; this is resonance.
Physically, the mass spring system has a natural frequency at which it changes
kinetic energy to potential energy and vice versa. When a driving force matches
that natural frequency, work is done on the system and hence its total energy
increases.
7. Damped Driven Oscillators
Figure 6.3. http://xkcd.com/228/
Chapter 6
Laplace Transforms
The Laplace transform is an integral transform that can be used to solve IVPs.
Essentially, it transforms a differential equation along with initial values into a rational function. The task then becomes rewriting the rational function into its partial fractions decomposition. After rewriting the rational function in this simplified form, you can then perform the inverse Laplace transform to find the solution.
Just as with all other solution methods for higher–order, linear, differential
equations, the Laplace transform method reduces the problem to an algebraic one,
in this case a partial fractions decomposition problem.
Unfortunately, the Laplace transform method typically requires more work, or
computation than the previous methods of undetermined coefficients and variation
of parameters. But the Laplace method is more powerful. It will allow us to solve
equations with more complicated forcing functions than before. It is especially
useful for analyzing electric circuits where the power is periodically switched on
and off.
1. The Laplace Transform
Definition 6.1. If a function f (t) is defined for t ≥ 0, then its Laplace transform
is denoted F (s) and defined by the integral:
F(s) = L{f(t)} = ∫_0^∞ e^(−st) f(t) dt
for all values of s for which the improper integral converges.
Notice that the last sentence of the above definition reminds us that improper
integrals do not necessarily converge, i.e. equal a finite number. Thus when computing Laplace transforms of functions, we must be careful to state any assumptions
on s which we make to ensure convergence. Another way to think of this is that the
domain of a transformed function, say F (s) is almost never the whole real number
line.
Example 6.2. L{k} = k/s
Recall that both the integral and the limit operator are linear, so we can pull constants outside of these operations.
L{k} = ∫_0^∞ e^(−st) k dt
= k lim_(b→∞) ∫_0^b e^(−st) dt
= k lim_(b→∞) [−(1/s)e^(−st)]_(t=0)^(t=b)
= k lim_(b→∞) (−(1/s)e^(−sb) + 1/s)        (s > 0)
= k/s   for s > 0.
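For checking table entries such as this one (and the ones computed in the examples that follow), a computer algebra system can evaluate the defining integral directly; a minimal sketch, assuming SymPy is available:

# A few Laplace transforms computed symbolically (concrete constants chosen for illustration).
from sympy import symbols, laplace_transform, exp, cos

t, s = symbols('t s', positive=True)

print(laplace_transform(1, t, s, noconds=True))          # 1/s
print(laplace_transform(exp(3*t), t, s, noconds=True))   # 1/(s - 3), valid for s > 3
print(laplace_transform(t, t, s, noconds=True))          # 1/s**2
print(laplace_transform(cos(2*t), t, s, noconds=True))   # s/(s**2 + 4)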
Example 6.3. L{e^(at)} = 1/(s − a)
L{e^(at)} = ∫_0^∞ e^(−st) e^(at) dt
= ∫_0^∞ e^(−(s−a)t) dt        (using the substitution u = −(s − a)t, du = −(s − a) dt)
= lim_(b→∞) [−e^(−(s−a)t)/(s − a)]_(t=0)^(t=b)
= lim_(b→∞) (−e^(−(s−a)b)/(s − a) + 1/(s − a))        (s > a)
= 1/(s − a)   for s > a.
Notice that the integral in the previous example diverges if s ≤ a, thus we must
restrict the domain of the transformed function to s > a.
Example 6.4. L{t} = 1/s²
This integral will require us to use integration by parts with the following assignments:
u = t,  dv = e^(−st) dt,  du = dt,  v = −e^(−st)/s.
L{t} = ∫_0^∞ e^(−st) t dt
= [−t e^(−st)/s]_(t=0)^(t=∞) − ∫_0^∞ (−e^(−st)/s) dt
= lim_(b→∞) (−b e^(−sb)/s) + (1/s)∫_0^∞ e^(−st) dt        (s > 0)
= −lim_(b→∞) [e^(−st)/s²]_(t=0)^(t=b)
= −lim_(b→∞) (e^(−sb)/s² − 1/s²)
= 1/s²   for s > 0.
We will need to know the Laplace transforms of both sin kt and cos kt where k
is any real number. Each of these transforms can be computed in a straightforward
manner from the definition and using integration by parts twice. However, it is
less work to compute both of them simultaneously by making a clever observation
and then solving a system of linear equations. This way, instead of having to do
integration by parts four times we will only need to do it twice, and it illustrates a
nice relationship between the two transforms. First, we set up each integral in the
definition to be solved via integration by parts.
For A = L{cos kt}, integrate by parts with
u = e^(−st),  dv = cos kt dt,  du = −s e^(−st) dt,  v = (1/k) sin kt:
A = L{cos kt} = ∫_0^∞ e^(−st) cos kt dt
= (1/k) lim_(b→∞) [e^(−st) sin kt]_0^b + (s/k) ∫_0^∞ e^(−st) sin kt dt
= 0 + (s/k) L{sin kt}        (s > 0)
= (s/k) B        (see below for the definition of B)
For B = L{sin kt}, integrate by parts with
u = e^(−st),  dv = sin kt dt,  du = −s e^(−st) dt,  v = −(1/k) cos kt:
B = L{sin kt} = ∫_0^∞ e^(−st) sin kt dt
= −(1/k) lim_(b→∞) [e^(−st) cos kt]_0^b − (s/k) ∫_0^∞ e^(−st) cos kt dt
= −(1/k)(0 − 1) − (s/k) L{cos kt}        (s > 0)
= 1/k − (s/k) A
Thus we have the following system:
A − (s/k)B = 0
(s/k)A + B = 1/k,
which upon solving and recalling A = L{cos kt} and B = L{sin kt} yields:
L{cos kt} = s/(s² + k²)
L{sin kt} = k/(s² + k²)
Theorem 6.5. Linearity of the Laplace Transform
If a, b ∈ R are constants, and f and g are any two functions whose Laplace
transforms exist, then:
(105)
L {af (t) + bg(t)} = a L {f (t)} + b L {g(t)} = aF (s) + b G(s),
for all s such that the Laplace transforms of both functions f and g exist.
Proof. Recall that both the integral and the limit operators are linear, thus:
L{af(t) + bg(t)} = ∫_0^∞ e^(−st)[af(t) + bg(t)] dt
= lim_(c→∞) ∫_0^c e^(−st)[af(t) + bg(t)] dt
= a lim_(c→∞) ∫_0^c e^(−st) f(t) dt + b lim_(c→∞) ∫_0^c e^(−st) g(t) dt
= a L{f(t)} + b L{g(t)}
= aF(s) + bG(s)
Example 6.6. L{cosh kt} = s/(s² − k²)
L{cosh kt} = L{(e^(kt) + e^(−kt))/2}
= (1/2)(L{e^(kt)} + L{e^(−kt)})
= (1/2)(1/(s − k) + 1/(s + k))
= (1/2)·(s + k + s − k)/(s² − k²)
= s/(s² − k²)
Example 6.7. L{sinh kt} = k/(s² − k²)
L{sinh kt} = L{(e^(kt) − e^(−kt))/2}
= (1/2)(L{e^(kt)} − L{e^(−kt)})
= (1/2)(1/(s − k) − 1/(s + k))
= (1/2)·(s + k − s + k)/(s² − k²)
= k/(s² − k²)
Theorem 6.8. Translation on the s-Axis
If the Laplace transform of y(t) exists for s > b, then
L{e^(at) y(t)} = Y(s − a)   for s > a + b.
Proof.
L{e^(at) y(t)} = ∫_0^∞ e^(−st) e^(at) y(t) dt = ∫_0^∞ e^(−(s−a)t) y(t) dt = Y(s − a)
We have computed Laplace transforms of a few different functions, but the
question naturally arises, can we compute a Laplace transform for every function?
The answer is no. So the next logical question is, what properties must a function
have in order for its Laplace transform to exist? This is what we will examine here.
Definition 6.9. A function f (t) is piecewise continuous on an interval [a, b] if
the interval can be divided into a finite number of subintervals such that
(1) f (t) is continuous on the interior of each subinterval, and
(2) f (t) has a finite limit as t approaches each endpoint of each subinterval.
Definition 6.10. A function f(t) is said to be of exponential order a, or exponential of order a, if there exist positive constants M, a and T such that
(106) |f(t)| ≤ M e^(at)   for all t ≥ T.
Theorem 6.11. Existence of Laplace Transforms
If f(t) is piecewise continuous for t ≥ 0 and of exponential order a, then its Laplace transform F(s) exists for s > a.
2. The Inverse Laplace Transform
Although there is a way to define the inverse Laplace transform as an integral
transform, it is generally not necessary and actually more convenient to use other
techniques, especially table lookup. For example, we already know,
L{cos kt} = s/(s² + k²),
so certainly the inverse must satisfy:
L⁻¹{s/(s² + k²)} = cos kt.
Thus, we will define the inverse Laplace transform simply to be the transform
satisfying:
L {y(t)} = Y (s)
⇐⇒
y(t) = L −1 {Y (s)}
That special double headed arrow has a specific meaning in Mathematics. It
is often read, “if and only if”, which has a specific meaning in formal logic, but
the colloquial way to understand it is simply that it means the two statements
it connects are exactly equivalent. This means that one can be interchanged for
the other in any logical chain of reasoning without changing the validity of the
argument.
Let’s do several examples of how to find inverse Laplace transforms.
Example 6.12. Find L⁻¹{1/s}.
This follows directly from the computation we did in the previous section which showed that L{k} = k/s, thus if k = 1, then L⁻¹{1/s} = 1.
Example 6.13. Find L⁻¹{1/(s + 3)}.
Again, we previously showed L{e^(at)} = 1/(s − a). Therefore if we set a = −3, then we see that L⁻¹{1/(s − (−3))} = e^(−3t).
Example 6.14. Find L⁻¹{s/(s² + 5)}.
This matches the formula L{cos kt} = s/(s² + k²) with k = √5 so that k² = 5, thus L⁻¹{s/(s² + 5)} = cos √5 t.
Example 6.15. Find L⁻¹{(s + 1)/(s² − 4)}.
After a simple rewrite, we see that the transforms of cosh kt and sinh kt apply.
L⁻¹{(s + 1)/(s² − 4)} = L⁻¹{s/(s² − 4)} + L⁻¹{1/(s² − 4)}
= cosh 2t + (1/2) L⁻¹{2/(s² − 4)}
= cosh 2t + (1/2) sinh 2t
Example 6.16. Find L⁻¹{s/((s − 2)² + 9)}.
This example will rely on the translation on the s-axis theorem, or theorem 6.8 from the previous section, which summarized says:
L{e^(at) y(t)} = Y(s − a)   ⟺   e^(at) y(t) = L⁻¹{Y(s − a)}
L⁻¹{s/((s − 2)² + 9)} = L⁻¹{(s − 2 + 2)/((s − 2)² + 9)}
= L⁻¹{(s − 2)/((s − 2)² + 9)} + L⁻¹{2/((s − 2)² + 9)}
= e^(2t) cos 3t + (2/3) L⁻¹{3/((s − 2)² + 9)}
= e^(2t) cos 3t + (2/3) e^(2t) sin 3t
Example 6.17. Find L⁻¹{1/(s² + 4s + 8)}.
First we complete the square in the denominator.
1/(s² + 4s + 8) = 1/((s² + 4s + 4) − 4 + 8) = 1/((s + 2)² + 4)
Thus,
L⁻¹{1/(s² + 4s + 8)} = L⁻¹{1/((s + 2)² + 4)}
= (1/2) L⁻¹{2/((s + 2)² + 4)}
= (1/2) e^(−2t) sin 2t
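These table–lookup manipulations can be cross–checked mechanically; a minimal sketch, assuming SymPy is available:

# Inverse Laplace transforms of the rational functions from Examples 6.16 and 6.17.
from sympy import symbols, inverse_laplace_transform

s, t = symbols('s t', positive=True)

print(inverse_laplace_transform(s/((s - 2)**2 + 9), s, t))
# Example 6.16: e^(2t)(cos 3t + (2/3) sin 3t), possibly times Heaviside(t)
print(inverse_laplace_transform(1/(s**2 + 4*s + 8), s, t))
# Example 6.17: (1/2) e^(-2t) sin 2t, possibly times Heaviside(t)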
3. Laplace Transform Method of Solving IVPs
To solve a differential equation via the Laplace transform we will begin by taking
the Laplace transform of both sides of the equation. But by definition, a differential
equation will involve derivatives of some unknown function, say y(t), thus we need to
figure out what the Laplace transform does to derivatives such as y 0 (t), y 00 (t), y 000 (t)
and so on and so forth. We will start by making a simplifying assumption: we will
assume that y 0 (t) is continuous. Later, we will revise the following theorem such
that y(t) is just required to be a piecewise continuous function.
Theorem 6.18. Laplace Transforms of t–Derivatives
If y 0 (t) is continuous, piecewise smooth and of exponential order a, then L {y 0 (t)}
exists for s > a and is given by:
L {y 0 (t)} = s L {y(t)} − y(0)
s > a.
Or the equivalent but more compact form:
(107)
L {y 0 (t)} = sY (s) − y(0)
s > a.
Proof. We begin with the definition of the Laplace transform and use integration by parts with the following substitutions:
u = e^(−st),  dv = y'(t) dt,  du = −s e^(−st) dt,  v = y(t).
L{y'(t)} = ∫_0^∞ e^(−st) y'(t) dt
= [e^(−st) y(t)]_0^∞ + s ∫_0^∞ e^(−st) y(t) dt
= lim_(b→∞) e^(−sb) y(b) − y(0) + s L{y(t)}
= s L{y(t)} − y(0)   for s > a,
where lim_(b→∞) [e^(−sb) y(b)] = 0 provided s > a because y(t) was assumed to be of exponential order a. The reason we had to assume that y'(t) is continuous is
because we used the second Fundamental Theorem of Calculus to evaluate the
definite integrals, and therefore we need the endpoints of
Example 6.19. Solve the IVP: y 0 − 5y = 0 y(0) = 2
Taking the Laplace transform of both sides of the equation and using the linearity property we get:
L {y 0 − 5y} = L {0}
L {y 0 } − 5 L {y} = 0
sY (s) − y(0) − 5Y (s) = 0
Y (s)[s − 5] = y(0)
Y(s) = 2/(s − 5)
Now we take the inverse Laplace transform of both sides, to solve for y(t):
(108) y(t) = L⁻¹{Y(s)} = L⁻¹{2/(s − 5)} = 2 L⁻¹{1/(s − 5)} = 2e^(5t).
Where in the last step, we used the inverse Laplace transform L⁻¹{1/(s − a)} = e^(at).
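As a cross–check, the same IVP can be handed directly to an ODE solver; a minimal sketch, assuming SymPy is available:

# Solve y' - 5y = 0, y(0) = 2 with dsolve and compare with equation (108).
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(t).diff(t) - 5*y(t), 0), y(t), ics={y(0): 2})
print(sol)   # Eq(y(t), 2*exp(5*t))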
We can use the previous theorem and the linearity property of the Laplace
transform to compute the Laplace transform of a second derivative. First, let
v 0 (t) = y 00 (t), so v(t) = y 0 (t) + C, where C is a constant of integration, then:
L {y 00 (t)} = L {v 0 (t)}
= s L {v(t)} − v(0)
= s L {y 0 (t) + C} − [y 0 (0) + C]
= s L {y 0 (t)} + s L {C} − [y 0 (0) + C]
= s[sY(s) − y(0)] + s(C/s) − [y'(0) + C]
= s²Y(s) − sy(0) − y'(0).
This formula is worth remembering:
(109) L{y''(t)} = s²Y(s) − sy(0) − y'(0)
Of course we can repeat the above procedure several times to obtain a corollary
to the previous theorem.
Corollary 6.20. Laplace Transforms of Higher Derivatives
Suppose the function y(t) and all of its derivatives up to the (n − 1)-st derivative are continuous and piecewise smooth for t ≥ 0, and suppose that each is of exponential order a. Then L{y^(n)(t)} exists when s > a and is:
(110) L{y^(n)(t)} = s^n Y(s) − s^(n−1) y(0) − s^(n−2) y'(0) − · · · − y^(n−1)(0).
If you examine all three results for Laplace transforms of derivatives in this
section, you will notice that if the graph of a function passes through the origin,
that is if y(0) = 0, and assuming y(t) meets the hypotheses of theorem 6.18, then
differentiating in the t domain corresponds to multiplication by s in the s domain,
or L {y 0 (t)} = sY (s). We can sometimes exploit this to our advantage as the next
example illustrates.
Example 6.21. L{t e^(at)} = 1/(s − a)²
Let y(t) = t e^(at), then y(0) = 0, thus
L{y'(t)} = L{e^(at) + a t e^(at)} = L{e^(at)} + a L{t e^(at)} = 1/(s − a) + a Y(s).
Now using theorem 6.18, y(0) = 0 and the result just calculated, we get:
sY(s) = 1/(s − a) + a Y(s)
(s − a)Y(s) = 1/(s − a)
Y(s) = 1/(s − a)²
The previous example exploited the fact that if y(0) = 0, then the Laplace
transform of the derivative of y(t) is obtained simply by multiplying the Laplace
transform of y(t) by s. In symbols this can be concisely stated:
L {y 0 (t)} = sY (s) y(0) = 0.
Thus multiplying by s in the s domain corresponds to differentiating with respect
to t in the t domain, under the precise circumstance y(0) = 0. It is natural to
wonder whether the inverse operation of multiplying by s, namely dividing by s
corresponds to the inverse of the derivative namely integrating in the t domain.
And it does!
Theorem 6.22. Laplace Transforms of Integrals
If y(t) is a piecewise continuous function and is of exponential order a for
t ≥ T , then:
L{∫_0^t y(τ) dτ} = (1/s) L{y(t)} = (1/s) Y(s),   s > a.
The inverse transform way to interpret the previous theorem is simply:
∫_0^t y(τ) dτ = L⁻¹{(1/s) Y(s)}.
Example 6.23. Use theorem 6.22 to find L⁻¹{1/(s(s² + 1))}.
L⁻¹{1/(s(s² + 1))} = ∫_0^t L⁻¹{1/(s² + 1)} dτ
= ∫_0^t sin τ dτ
= [−cos τ]_0^t
= −cos t − (−cos 0)
= 1 − cos t
Example 6.24. Solve the following IVP: y 0 + y = sin t
y(0) = 1
First we take the Laplace transform of both sides of the equation.
L{y'(t) + y(t)} = L{sin t}
L{y'(t)} + L{y(t)} = 1/(s² + 1)        (by theorem 6.5)
sY(s) − y(0) + Y(s) = 1/(s² + 1)        (by theorem 6.18)
sY(s) − 1 + Y(s) = 1/(s² + 1)        (applied initial condition y(0) = 1)
(s + 1)Y(s) = 1/(s² + 1) + 1
Y(s) = 1/((s + 1)(s² + 1)) + 1/(s + 1)
Now if we apply the inverse Laplace transform to both sides of the last equation,
we will get y(t) on the left, which is the solution function we seek! But in order to
compute the inverse Laplace transform of the right hand side, we need to recognize
it as the Laplace transform of some function or sum of functions. Since 1/(s + 1) = 1/(s − (−1)), the term on the right has inverse Laplace transform e^(−t) (recall L{e^(at)} = 1/(s − a)). But the term on the left has no obvious inverse Laplace transform. Since the denominator is a product of irreducible factors, we can do a partial fractions decomposition. That is,
1/((s + 1)(s² + 1)) = A/(s + 1) + (Bs + C)/(s² + 1)
1/((s + 1)(s² + 1)) = [A(s² + 1) + (Bs + C)(s + 1)] / ((s + 1)(s² + 1))
Equating just the numerators yields,
1 = A(s2 + 1) + (Bs + C)(s + 1)
1 = As2 + A + Bs2 + Bs + Cs + C
1 = (A + B)s2 + (B + C)s + (A + C)
By equating the coefficients of powers of s on both sides we get three equations in
the three unknowns, A, B and C.
A + B = 0
B + C = 0
A + C = 1
Which you can check by inspection has solution A = 1/2, B = −1/2, C = 1/2.
Thus,
Y(s) = (1/2)/(s + 1) + (−(1/2)s + 1/2)/(s² + 1) + 1/(s + 1)
Y(s) = (3/2)·1/(s + 1) − (1/2)·s/(s² + 1) + (1/2)·1/(s² + 1)
y(t) = (3/2) L⁻¹{1/(s + 1)} − (1/2) L⁻¹{s/(s² + 1)} + (1/2) L⁻¹{1/(s² + 1)}
y(t) = (3/2)e^(−t) − (1/2)cos t + (1/2)sin t
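Both the partial fractions step and the final inverse transform can be verified mechanically; a minimal sketch, assuming SymPy is available:

# Check the decomposition and inversion of Y(s) from Example 6.24.
import sympy as sp

s, t = sp.symbols('s t', positive=True)
Y = 1/((s + 1)*(s**2 + 1)) + 1/(s + 1)

print(sp.apart(Y, s))
# 3/(2*(s + 1)) + (1 - s)/(2*(s**2 + 1))
print(sp.inverse_laplace_transform(Y, s, t))
# 3*exp(-t)/2 - cos(t)/2 + sin(t)/2  (possibly times Heaviside(t))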
Example 6.25. Solve the following IVP: y 00 (t) + y(t) = cos 2t
y(0) = 0, y 0 (0) = 1.
We proceed by taking the Laplace transform of both sides of the equation and
use the linearity property of the Laplace transform (theorem 6.5).
L{y''(t) + y(t)} = L{cos 2t}
L{y''(t)} + L{y(t)} = s/(s² + 4)
s²Y(s) − sy(0) − y'(0) + Y(s) = s/(s² + 4)
s²Y(s) − 1 + Y(s) = s/(s² + 4)
(s² + 1)Y(s) − 1 = s/(s² + 4)
(s² + 1)Y(s) = s/(s² + 4) + (s² + 4)/(s² + 4)
Y(s) = (s² + s + 4)/((s² + 1)(s² + 4))
Now we must do a partial fractions decomposition of this rational function.
(s² + s + 4)/((s² + 1)(s² + 4)) = (As + B)/(s² + 1) + (Cs + D)/(s² + 4)
s² + s + 4 = (As + B)(s² + 4) + (Cs + D)(s² + 1)
s² + s + 4 = (As³ + 4As + Bs² + 4B) + (Cs³ + Cs + Ds² + D)
s² + s + 4 = (A + C)s³ + (B + D)s² + (4A + C)s + (4B + D)
Equating the coefficients of each power of s gives the linear system
A + C = 0,   B + D = 1,   4A + C = 1,   4B + D = 4,
or in matrix form
[1 0 1 0; 0 1 0 1; 4 0 1 0; 0 4 0 1][A; B; C; D] = [0; 1; 1; 4]   ⟹   [A; B; C; D] = [1/3; 1; −1/3; 0].
Thus
Y(s) = (1/3)·s/(s² + 1) + 1/(s² + 1) − (1/3)·s/(s² + 4)
Finally, we apply the inverse Laplace transform to both sides to yield:
y(t) = (1/3)cos t + sin t − (1/3)cos 2t.
3.1. Electrical Circuits. A series electrical RLC circuit is analogous to a mass, spring, dashpot system. The resistor with resistance R measured in ohms (Ω) is like the dashpot because it resists the flow of electrons. The capacitor with capacitance C measured in Farads is like the spring, because it converts electron flow into potential energy similar to how a spring converts kinetic energy into potential energy. Finally, the inductor with inductance L measured in Henries is like the mass, because it resists the flow of electrons initially, but once the current reaches its maximum, the inductor also resists any decrease in current.
If we sit at some point in the circuit and count the amount of charge which passes as a function of time, and denote this by q(t), then Kirchhoff's Current Law and Kirchhoff's Voltage Law yield the following equation for a series RLC circuit.
(111) Lq'' + Rq' + (1/C)q = e(t)
By definition the current i(t) is the time rate of change of charge q(t), thus:
i(t) = dq/dt = q'(t)   ⟹   q(t) = ∫_0^t i(τ) dτ.
This allows us to rewrite equation 111, in the following way.
(112) Li' + Ri + (1/C)∫_0^t i(τ) dτ = e(t)
Figure 3.1. RLC Circuits: (a) Series RLC configuration. (b) Parallel RLC configuration.
Example 6.26. Consider the series RLC circuit shown in figure 3.1, with R =
110Ω, L = 1 H, C = 0.001 F, and a battery supplying E0 = 90 V. Initially there is
no current in the circuit and no charge on the capacitor. At time t = 0, the switch
is closed and left closed for 1 second. At time t = 1 the switch is opened and left
open. Find i(t), the current in the circuit as a function of time.
If we substitute the values for R, L and C into equation 112, then we get:
(113) i' + 110i + 1000 ∫_0^t i(τ) dτ = 90[1 − u(t − 1)].
Because
L{∫_0^t i(τ) dτ} = (1/s) I(s),
the transformed equation becomes:
sI(s) + 110 I(s) + 1000 I(s)/s = (90/s)(1 − e^(−s)).
We solve this equation for I(s) to obtain:
I(s) = 90(1 − e^(−s)) / (s² + 110s + 1000).
But we can use partial fractions to simplify:
90/(s² + 110s + 1000) = 1/(s + 10) − 1/(s + 100),
so we have
I(s) = [1/(s + 10) − 1/(s + 100)] − e^(−s)[1/(s + 10) − 1/(s + 100)].
Whereupon we take the inverse Laplace transform and get:
(114) i(t) = e^(−10t) − e^(−100t) − u(t − 1)[e^(−10(t−1)) − e^(−100(t−1))]
See figure 3.2 for the graph of the solution.
Figure 3.2. Current as a function of time in a series RLC circuit.
4
4. Switching
Definition 6.27. The unit step function corresponds to an “on switch”. It is
defined by
(115) u_a(t) = u(t − a) = { 0, t < a;  1, t > a }
Figure 4.1. Examples of step functions: 2u(t − 1) and u(t − 2)
This function acts like a switch for turning something on. For example, if you
want to turn on the function f (t) = t2 at time t = 1, then you could multiply f (t)
by u(t − 1). But more likely, you would probably like the function f (t) = t2 to act
as if time begins at time t = 1. This is accomplished by first shifting the input to
f , for example f (t − 1), and then multiplying by u(t − 1).
Figure 4.2. Switching on t² versus (t − 1)² via the step function u(t − 1)
We can also repurpose the unit step function as a way to turn things off.
Lemma 6.28. The unit step function, u(t − a), changes to a “switch off at time
a” function when its input is multiplied by -1.
u(a − t) = { 1, t < a;  0, t > a }
Proof. The unit step function is defined to be
u(t − a) = { 0, t < a;  1, t > a }   ⟺   u(t − a) = { 0, t − a < 0;  1, t − a > 0 }
Multiplying the input by (-1) requires us to flip the inequalities in the above definition yielding:
u(a − t) = { 0, a − t < 0;  1, a − t > 0 }   ⟺   u(a − t) = { 0, t > a;  1, t < a }
5. Convolution
Definition 6.29. Let f (t), g(t) be piecewise continuous functions for t ≥ 0. The
convolution of f with g denoted by f ∗ g is defined by
(116) (f ∗ g)(t) = ∫_0^t f(τ) g(t − τ) dτ.
Theorem 6.30 (Convolution is Commutative). Let f (t) and g(t) be piecewise continuous on [0, ∞), then
f ∗ g = g ∗ f.
Proof. We can rewrite the convolution integral using the following substitution:
v = t − τ   ⟺   τ = t − v   ⟹   dτ = −dv.
f ∗ g = ∫_0^t f(τ) g(t − τ) dτ
= ∫_(τ=0)^(τ=t) f(t − v) g(v) (−dv)
When τ = 0, v = t − 0 = t and when τ = t, v = t − t = 0.
= −∫_t^0 f(t − v) g(v) dv
= ∫_0^t g(v) f(t − v) dv
= g ∗ f
Theorem 6.31 (The Convolution Theorem). If f (t) and g(t) are piecewise continuous and of exponential order c, then the Laplace transform of f ∗ g exists for
s > c and is given by
(117)
L {f (t) ∗ g(t)} = F (s) · G(s),
or equivalently,
(118)
L −1 {F (s) · G(s)} = f (t) ∗ g(t).
Proof. We start with the definitions of the Laplace transform and of convolution
and get the iterated integral:
L{f(t) ∗ g(t)} = ∫_0^∞ e^(−st) [∫_0^t f(τ) g(t − τ) dτ] dt.
Next, notice that we can change the bounds of integration on the second integral if we multiply the integrand by the unit step function u(t − τ), where τ is the variable and t is the switch off time (see lemma 6.28):
L{f(t) ∗ g(t)} = ∫_0^∞ e^(−st) [∫_0^∞ f(τ) u(t − τ) g(t − τ) dτ] dt.
Reversing the order of integration gives
L{f(t) ∗ g(t)} = ∫_0^∞ f(τ) [∫_0^∞ e^(−st) u(t − τ) g(t − τ) dt] dτ.
The integral in square brackets can be rewritten by theorem ?? as e^(−sτ) G(s), giving
L{f(t) ∗ g(t)} = ∫_0^∞ f(τ) e^(−sτ) G(s) dτ = G(s) ∫_0^∞ e^(−sτ) f(τ) dτ = F(s) · G(s).
Definition 6.32. The transfer function, H(s), of a linear system is the ratio of
the Laplace transform of the output function to the Laplace transform of the input
function when all initial conditions are zero.
H(s) = X(s)/F(s).
Definition 6.33. The function h(t) = L⁻¹{H(s)} is called the impulse response
function. It is called this because it is the system’s response to receiving a unit
impulse of force at time zero.
The impulse response function is also the unique solution to the following undriven (homogeneous) IVP:
mx'' + cx' + kx = 0;   x(0) = 0,  x'(0) = 1/m.
Taking the Laplace transform with x(0) = 0 and x'(0) = 1/m gives:
m[s²X(s) − s x(0) − x'(0)] + c[sX(s) − x(0)] + kX(s) = 0
m[s²X(s) − 1/m] + c sX(s) + kX(s) = 0
(ms² + cs + k)X(s) = 1
X(s) = 1/(ms² + cs + k)
The importance of the impulse response function, h(t) is that once we know
how the system responds to a unit impulse, then we can convolve that response
with any forcing function, f (t), to determine the system’s response to being driven
in any manner.
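The convolution theorem is easy to spot–check on a concrete pair of functions; the sketch below uses f(t) = e^(−t) and g(t) = sin t, chosen only for illustration, and assumes SymPy is available.

# Verify equation (117): L{f * g} = F(s) G(s) for one concrete example.
from sympy import symbols, exp, sin, integrate, laplace_transform, simplify

t, tau, s = symbols('t tau s', positive=True)
f = exp(-t)
g = sin(t)

# (f * g)(t) = integral from 0 to t of f(tau) g(t - tau) dtau
conv = integrate(f.subs(t, tau) * g.subs(t, t - tau), (tau, 0, t))

lhs = laplace_transform(conv, t, s, noconds=True)    # L{f * g}
rhs = (laplace_transform(f, t, s, noconds=True) *
       laplace_transform(g, t, s, noconds=True))     # F(s) G(s)

print(simplify(lhs - rhs))   # expect 0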
Chapter 7
Eigenvalues and Eigenvectors
In the next chapter, we will see how some problems are more naturally modeled
via a system or collection of differential equations. We will solve these systems of
equations by first transforming the system into a matrix equation, then finding the
eigenvalues and eigenvectors which belong to that matrix and finally constructing
a solution from those eigenvalues and eigenvectors.
We will also develop a simple algorithm to transform a single, higher order,
linear, differential equation into a system of first order equations. For example, a
third order, linear equation will transform into a system of three first order, linear
equations. In general, an n–th order, linear differential equation can always be
transformed into a system of n, first order, linear, differential equations. This system can be solved via eigenvalues and eigenvectors and then the reverse algorithm
translates the matrix solution back to the context of the original problem.
Thus the theory behind eigenvalues and eigenvectors has direct application
to solving differential equations, but it actually does much more! In chapter 9,
eigenvalues and eigenvectors will allow us to understand differential equations from
a geometric perspective. Perhaps most surprising is that although the theory arises
from a study of linear systems, it will also allow us to qualitatively understand
nonlinear systems!
A very large number of problems in science and engineering eventually distill
down to “the eigenvalue problem”. From web search to petroleum exploration to
archiving fingerprints to modeling the human heart, the variety of applications of
this theory are so myriad that it would be a daunting task to try to enumerate
them.
1. Introduction to Eigenvalues and Eigenvectors
Let’s start with a square n × n matrix. Recall that such a matrix can be thought of
as a function which maps Rn to itself. Unfortunately, even for the simplest case of
a 2 × 2 matrix, we can’t graph this function like we did the functions in Calculus.
This is because the graph would have to exist in a four dimensional space. The
graph of a 3 × 3 matrix requires a six dimensional space, and in general the graph
of an n × n matrix requires a 2n dimensional space. Since direct visualization of
matrix mappings is not possible, we must get clever!
A mapping takes an input and maps it to an output. That is, it changes
one thing into another. But sometimes a mapping maps an input back to itself.
Matrices map input vectors to output vectors. Some matrices have special vectors
which get mapped exactly back to themselves, but usually this is not the case.
However, many matrices do map certain vectors to scalar multiples of themselves.
This situation is very common. A vector ~v which gets mapped to a scalar multiple
of itself under the matrix A is called an eigenvector of A. In symbols we write:
(119)
A~v = λ~v .
In the above equation, the symbol, λ (pronounced “lambda”), is the scalar
multiplier of ~v . We call λ the eigenvalue associated with the eigenvector, ~v . An
eigenvalue can be any real number, even zero. However, since ~v = ~0 is always a
solution to equation (119) we will disallow the zero vector from being called an
eigenvector. In other words we are only interested in the nontrivial, i.e. non zero–
vector solutions.
Example 7.1. Consider the matrix A = [3 −1; −1 3] (rows separated by semicolons).
The vector ~v1 = [1; 1] is an eigenvector of A with eigenvalue λ1 = 2, because:
A~v1 = [3 −1; −1 3][1; 1] = [2; 2] = 2[1; 1] = λ1~v1.
The vector ~v2 = [−1; 1] is an eigenvector of A with eigenvalue λ2 = 4, because:
A~v2 = [3 −1; −1 3][−1; 1] = [−4; 4] = 4[−1; 1] = λ2~v2.
Any scalar multiple of an eigenvector is again an eigenvector corresponding to the same eigenvalue. For example,
(1/2)~v1 = (1/2)[1; 1] = [1/2; 1/2]
is an eigenvector because:
A(1/2)~v1 = [3 −1; −1 3][1/2; 1/2] = [1; 1] = 2[1/2; 1/2] = λ1(1/2)~v1.
The fact that a scalar multiple of an eigenvector is again an eigenvector corresponding to the same eigenvalue is simply a consequence of the fact that scalar
multiplication of matrices and hence vectors commutes. That is, if λ, ~v form an
eigenvalue, eigenvector pair for the matrix A, and c is any scalar, then
A~v = λ~v
cA~v = cλ~v
A(c~v ) = λ(c~v ).
2. Algorithm for Computing Eigenvalues and Eigenvectors
Given a square n × n matrix, A, how can we compute its eigenvalues and eigenvectors? We need to solve equation (119): A~v = λ~v , but this equation has two
unknowns: λ which is a scalar and ~v which is a vector. The trick is to transform
this equation into a homogeneous equation and use our knowledge of linear systems.
First we rewrite equation (119) as
A~v − λ~v = ~0.
Notice that both terms on the left hand side involve ~v so let’s factor ~v out:
(A − λ)~v = ~0.
The last equation is problematic because it makes no sense to subtract the scalar
λ from the matrix A! However there is an easy fix. Recall that the identity matrix
I is called the identity exactly because it maps all vectors to themselves. That is,
I~v = ~v for all vectors ~v . Thus we can rewrite the previous two equations as follows:
A~v − λI~v = ~0
(120)
(A − λI)~v = ~0.
Now the quantity in parentheses makes sense because λI is an n×n matrix just like
A. This last linear system is homogeneous and thus at least has solution ~v = ~0, but
by definition we disallow the zero vector from being an eigenvector simply because
it is an eigenvector for every matrix, and thus provides no information about A.
Instead we are interested in the nonzero vectors which solve equation (120).
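Eigenvalues and eigenvectors of small matrices like the one in Example 7.1 can also be computed numerically; a minimal sketch, assuming NumPy is available (remember that any nonzero scalar multiple of a computed eigenvector is equally valid):

# Numerical eigenvalues/eigenvectors of the matrix from Example 7.1.
import numpy as np

A = np.array([[3.0, -1.0],
              [-1.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

print(eigenvalues)    # e.g. [4. 2.]  (the order is not guaranteed)
print(eigenvectors)   # columns are unit eigenvectors, multiples of [1, -1] and [1, 1]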
Chapter 8
Systems of Differential
Equations
1. First Order Systems
A system of differential equations is simply a set or collection of DEs. A first order
system is simply a set of first order, linear DEs. For example,

dx/dt = 3x − y
dy/dt = −x + 3y
Solving a first order system is usually not as simple as solving two individual first
order DEs. Notice that in the system above, we cannot solve for x(t) without
also simultaneously solving for y(t). This is because dx/dt depends on two varying
quantities. When at least one of the equations in a system depends on more than
one variable we say the system is coupled.
This system can be rewritten as an equivalent matrix equation:
(121) [x'; y'] = [3 −1; −1 3][x; y]
which in turn can be written in the very compact form:
(122) ~x' = A~x
where ~x' = [x'; y'], A = [3 −1; −1 3], and ~x = [x; y].
When we wish to emphasize the fact that both ~x 0 and ~x are vectors of functions
and not just constant vectors, we will write equation (122) in the following way:
(122)
~x'(t) = A~x(t)
Our method of solution will closely parallel that of chapter 5, where we guessed
what form the solution might take and then plugged that guess into the governing
DE to determine constraints on our guess.
For a system of n first order equations, we guess that solutions have the form
(123)
~x(t) = ~v e^{λt}
where λ is a scalar (possibly complex), and where ~v is an n–dimensional vector of
scalars (again, possibly complex). To be clear, we are assuming that ~x(t) can be
written in the following way:
~x(t) = ~v e^{λt}
\begin{bmatrix} x1(t) \\ x2(t) \\ \vdots \\ xn(t) \end{bmatrix} = \begin{bmatrix} v1 \\ v2 \\ \vdots \\ vn \end{bmatrix} e^{λt}
\begin{bmatrix} x1(t) \\ x2(t) \\ \vdots \\ xn(t) \end{bmatrix} = \begin{bmatrix} v1 e^{λt} \\ v2 e^{λt} \\ \vdots \\ vn e^{λt} \end{bmatrix}
If the vector–valued function ~x(t) = ~v e^{λt} is to be a solution to equation (122) then
its derivative must equal A~x. Computing its derivative yields:
~x'(t) = (d/dt)(~v e^{λt})
\begin{bmatrix} x1'(t) \\ x2'(t) \\ \vdots \\ xn'(t) \end{bmatrix} = \begin{bmatrix} λ v1 e^{λt} \\ λ v2 e^{λt} \\ \vdots \\ λ vn e^{λt} \end{bmatrix}
\begin{bmatrix} x1'(t) \\ x2'(t) \\ \vdots \\ xn'(t) \end{bmatrix} = λ \begin{bmatrix} v1 e^{λt} \\ v2 e^{λt} \\ \vdots \\ vn e^{λt} \end{bmatrix}
~x'(t) = λ ~v e^{λt}
(124)
~x'(t) = λ ~x(t)
Equating the right hand sides of equation (122) and equation (124) gives:
(125)
A~x(t) = λ~x(t)
which is the eigenvalue–eigenvector equation from chapter 7. The only difference
is that now the eigenvector is a vector of functions of t rather than a vector of
scalars. Since the solutions of equation (125) are actually vector–valued functions,
we will usually refer to them as eigenfunctions rather than eigenvectors, although
it is common to just say eigenvector as well.
Guessing that our solution has the form ~x(t) = ~v e^{λt} forces our solution to
satisfy equation (125). Thus, solving a system of first order DEs is equivalent
to computing the eigenvalues and eigenvectors of the matrix A which encodes the
salient features of the system. We know from chapter 7 that if A is an n × n matrix
with n distinct eigenvalues, then it will have n linearly independent eigenvectors.
The set of eigenpairs,
{ {λ1, ~v1}, {λ2, ~v2}, . . . , {λn, ~vn} }
allows us to form a basis of eigenfunctions,
{ ~x1(t) = ~v1 e^{λ1 t}, ~x2(t) = ~v2 e^{λ2 t}, . . . , ~xn(t) = ~vn e^{λn t} }
which span the solution space of equation (125). Since the eigenfunctions form a
basis for the solution space, we can express the solution ~x(t) as a linear combination
of them,
(126)
~x(t) = c1 ~x1(t) + c2 ~x2(t) + · · · + cn ~xn(t)
~x(t) = c1 ~v1 e^{λ1 t} + c2 ~v2 e^{λ2 t} + · · · + cn ~vn e^{λn t}
Example 8.1. Let’s solve the example system from the beginning of the chapter,
which we reproduce here, but let’s also add initial values. Since we have two first
order DEs, we need two initial values.
(121)
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & −1 \\ −1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix},    x(0) = 6, y(0) = 0
For convenience we will refer to the coefficient matrix above as A. Computing the
eigenvalues of matrix A yields two distinct eigenvalues λ1 = 2 and λ2 = 4, because
|A − λI| = \begin{vmatrix} 3 − λ & −1 \\ −1 & 3 − λ \end{vmatrix} = (3 − λ)^2 − 1
= (λ − 3)^2 − 1
= λ^2 − 6λ + 8
= (λ − 2)(λ − 4) = 0.
Solving the eigenvector equation, (A − λI)~v = ~0 for each eigenvalue yields
• λ1 = 2 : A − 2I = \begin{bmatrix} 1 & −1 \\ −1 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & −1 \\ 0 & 0 \end{bmatrix}  =⇒  ~v1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
• λ2 = 4 : A − 4I = \begin{bmatrix} −1 & −1 \\ −1 & −1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}  =⇒  ~v2 = \begin{bmatrix} 1 \\ −1 \end{bmatrix}
Thus the general solution is
~x(t) = c1 ~v1 e^{λ1 t} + c2 ~v2 e^{λ2 t}
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = c1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{2t} + c2 \begin{bmatrix} 1 \\ −1 \end{bmatrix} e^{4t}
Plugging in the initial values x(0) = 6, y(0) = 0 yields the following linear system
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = c1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c2 \begin{bmatrix} 1 \\ −1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix} \begin{bmatrix} c1 \\ c2 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}
This system has solution c1 = 3 and c2 = 3, so the solution functions are:
x(t) = 3e^{2t} + 3e^{4t}
y(t) = 3e^{2t} − 3e^{4t}.
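As a sanity check (a sketch only, assuming SciPy is available; this is not part of the text's method), we can compare these formulas against a numerical ODE solver applied to the same initial value problem:

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[3.0, -1.0],
                  [-1.0, 3.0]])

    def closed_form(t):
        # x(t) and y(t) from Example 8.1
        return np.array([3*np.exp(2*t) + 3*np.exp(4*t),
                         3*np.exp(2*t) - 3*np.exp(4*t)])

    sol = solve_ivp(lambda t, x: A @ x, (0.0, 1.0), [6.0, 0.0],
                    rtol=1e-10, atol=1e-10)
    # Difference between the numerical solution and the formulas at t = 1 (should be tiny)
    print(sol.y[:, -1] - closed_form(sol.t[-1]))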
2. Transforming a Linear DE Into a System of First Order DEs
The eigenvalue method can also be applied to second and higher order linear DEs.
We start with a simple example.
Example 8.2. Consider the homogeneous, linear second order DE,
(127)
y'' + 5y' + 6y = 0.
Suppose y(t) represents the displacement (position) of a mass in an undriven,
damped, mass–spring system; then it is natural to let v(t) = y'(t) represent the
velocity of the mass. Of course, v'(t) = y''(t), and this allows us to rewrite equation (127) as follows:
v' + 5v + 6y = 0    =⇒    v' = −6y − 5v.
Combining our substitution, v = y', and our rewrite of equation (127) together
yields the following system of first–order equations
y' = v
v' = −6y − 5v
=⇒ \begin{bmatrix} y' \\ v' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ −6 & −5 \end{bmatrix} \begin{bmatrix} y \\ v \end{bmatrix}
The coefficient matrix has eigenvalues λ1 = −2 and λ2 = −3, since
|A − λI| = \begin{vmatrix} 0 − λ & 1 \\ −6 & −5 − λ \end{vmatrix} = λ(λ + 5) + 6
= λ^2 + 5λ + 6
= (λ + 2)(λ + 3) = 0.
Notice that the eigenvalue equation is exactly the characteristic equation which we
studied in chapter 5. Solving the eigenvector equation, (A − λI)~v = ~0 for each
eigenvalue yields
• λ1 = −2 : A − (−2)I = \begin{bmatrix} 2 & 1 \\ −6 & −3 \end{bmatrix} ∼ \begin{bmatrix} 2 & 1 \\ 0 & 0 \end{bmatrix}  =⇒  ~v1 = \begin{bmatrix} 1 \\ −2 \end{bmatrix}
• λ2 = −3 : A − (−3)I = \begin{bmatrix} 3 & 1 \\ −6 & −2 \end{bmatrix} ∼ \begin{bmatrix} 3 & 1 \\ 0 & 0 \end{bmatrix}  =⇒  ~v2 = \begin{bmatrix} 1 \\ −3 \end{bmatrix}
The general solution follows the pattern ~x(t) = c1 ~v1 e^{λ1 t} + c2 ~v2 e^{λ2 t} and is thus
\begin{bmatrix} y(t) \\ v(t) \end{bmatrix} = c1 \begin{bmatrix} 1 \\ −2 \end{bmatrix} e^{−2t} + c2 \begin{bmatrix} 1 \\ −3 \end{bmatrix} e^{−3t}
But we are only interested in y(t). That is, the general solution to equation (127)
is just:
y(t) = c1 e^{−2t} + c2 e^{−3t}.
Notice that v(t) is of course just y'(t) and is superfluous information in this case.
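A short numerical cross-check of this example (a sketch assuming NumPy; not part of the text's method): build the coefficient matrix of the first-order system and confirm that its eigenvalues are the roots of the characteristic equation from chapter 5.

    import numpy as np

    # y'' + 5y' + 6y = 0 with v = y' becomes [y, v]' = A [y, v]
    A = np.array([[ 0.0,  1.0],
                  [-6.0, -5.0]])

    print(np.linalg.eigvals(A))       # -2 and -3 (in some order)
    print(np.roots([1.0, 5.0, 6.0]))  # same roots, straight from lambda^2 + 5*lambda + 6 = 0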
Clearly this method of solution requires more work than the method of chapter 5, so it would appear that there is no advantage to transforming a linear equation
into a system of first order equations. However, we will see in the next chapter
that this method allows us to study linear DEs geometrically. In the case of second order, linear DEs, the graphical methods of the next chapter will allow us to
understand mechanical systems and RLC circuits in a whole new way.
3. Complex Eigenvalues and Eigenvectors
Recall that our method of solving the linear system
(122)
~x'(t) = A~x(t),
involves guessing that the solution will have the form
(123)
~x(t) = ~v e^{λt}.
This forces ~x(t) to satisfy the eigenvalue–eigenvector equation:
(125)
A~x(t) = λ~x(t).
The eigenfunctions which satisfy this equation form a basis for the solution space
of equation (122). Thus the general solution of the system is a linear combination
of the eigenfunctions:
(126)
~x(t) = c1 ~v1 e^{λ1 t} + c2 ~v2 e^{λ2 t} + · · · + cn ~vn e^{λn t}
However, if any eigenvalue λi in the general solution is complex, then the solution
will be complex–valued. We want real–valued solutions. The way out of this
dilemma is to realize that a single eigenpair {λi, ~vi}, where both λi and ~vi are
complex–valued, can actually yield two real–valued eigenfunctions.
Suppose ~v e^{λt} satisfies equation (125) and both ~v and λ are complex–valued.
Then we can expand ~v and λ yielding:
~v e^{λt} = \begin{bmatrix} a1 + ib1 \\ a2 + ib2 \\ \vdots \\ an + ibn \end{bmatrix} e^{(p+iq)t}
= \left( \begin{bmatrix} a1 \\ a2 \\ \vdots \\ an \end{bmatrix} + i \begin{bmatrix} b1 \\ b2 \\ \vdots \\ bn \end{bmatrix} \right) e^{pt} (\cos qt + i \sin qt)
= e^{pt} \left( \begin{bmatrix} a1 \\ a2 \\ \vdots \\ an \end{bmatrix} \cos qt − \begin{bmatrix} b1 \\ b2 \\ \vdots \\ bn \end{bmatrix} \sin qt \right) + i\, e^{pt} \left( \begin{bmatrix} a1 \\ a2 \\ \vdots \\ an \end{bmatrix} \sin qt + \begin{bmatrix} b1 \\ b2 \\ \vdots \\ bn \end{bmatrix} \cos qt \right)
= e^{pt} ( ~a \cos qt − ~b \sin qt ) + i\, e^{pt} ( ~a \sin qt + ~b \cos qt )
The first group of terms is the real part, ~x1(t), and the second group (the coefficient of i) is the imaginary part, ~x2(t).
The above just demonstrates that we can break up any complex–valued function
into its real and imaginary parts. That is, we can rewrite ~v e^{λt} as:
~v e^{λt} = Re(~v e^{λt}) + i Im(~v e^{λt}) = ~x1(t) + i ~x2(t),
where both ~x1(t) and ~x2(t) are real–valued functions. The function ~x1(t) + i ~x2(t) satisfies the system (122), and because the matrix A has real entries and is a linear
operator (i.e. matrix multiplication distributes over linear combinations), its real and imaginary parts satisfy the system individually as well:
~x1'(t) + i ~x2'(t) = A~x1(t) + i A~x2(t)
Equating the real and imaginary parts of both sides yields the desired result:
~x1'(t) = A~x1(t),    ~x2'(t) = A~x2(t).
In other words, ~x1(t) and ~x2(t) are two real–valued solutions of equation (122).
In practice, you don’t need to memorize any formulas. The only thing from
above that you need to remember is that when confronted with a pair of complex
conjugate eigenvalues, pick one of them and find its corresponding complex
eigenvector; then with this eigenpair form the eigenfunction ~x(t) = ~v e^{λt}. The real
and imaginary parts of this eigenfunction will be real–valued eigenfunctions of the
coefficient matrix. That is, find the two eigenfunctions:
~x1(t) = Re(~v e^{λt})   and   ~x2(t) = Im(~v e^{λt}).
Then form the general solution by making a linear combination of all the eigenfunctions which correspond with the coefficient matrix:
~x(t) = c1 ~x1 (t) + c2 ~x2 (t) + · · · + cn ~xn (t).
Example 8.3. Consider the first–order, linear system:
x1' = 2x1 − 3x2
x2' = 3x1 + 2x2
⇐⇒
\begin{bmatrix} x1' \\ x2' \end{bmatrix} = \begin{bmatrix} 2 & −3 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \end{bmatrix}
where we will call the coefficient matrix A.
|A − λI| = \begin{vmatrix} 2 − λ & −3 \\ 3 & 2 − λ \end{vmatrix} = (2 − λ)(2 − λ) + 9 = 0
(λ − 2)^2 = −9
λ − 2 = ±3i
λ = 2 ± 3i
A − λI = \begin{bmatrix} 2 − (2 + 3i) & −3 \\ 3 & 2 − (2 + 3i) \end{bmatrix} = \begin{bmatrix} −3i & −3 \\ 3 & −3i \end{bmatrix}
Next, we need to solve the eigenvector equation: (A − λI)~v = ~0, but elementary
row ops preserve the solution space and it is easier to solve the equation when the
matrix A − λI is in reduced row–echelon form (RREF) or at least row–echelon form
(REF).
\begin{bmatrix} −3i & −3 \\ 3 & −3i \end{bmatrix} \xrightarrow{R1 + iR2} \begin{bmatrix} 0 & 0 \\ 3 & −3i \end{bmatrix} \xrightarrow{R1 ↔ R2} \begin{bmatrix} 3 & −3i \\ 0 & 0 \end{bmatrix} \xrightarrow{(1/3)R1} \begin{bmatrix} 1 & −i \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & −i \\ 0 & 0 \end{bmatrix} \begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}  =⇒  ~v = \begin{bmatrix} i \\ 1 \end{bmatrix}
Now that we have an eigenpair, we can form a complex–valued eigenfunction which
we rearrange into real and imaginary parts:
~v e^{λt} = \begin{bmatrix} i \\ 1 \end{bmatrix} e^{(2+3i)t}
= \begin{bmatrix} i \\ 1 \end{bmatrix} e^{2t} (\cos 3t + i \sin 3t)
= e^{2t} \begin{bmatrix} i\cos 3t − \sin 3t \\ \cos 3t + i\sin 3t \end{bmatrix}
= e^{2t} \begin{bmatrix} −\sin 3t \\ \cos 3t \end{bmatrix} + i\, e^{2t} \begin{bmatrix} \cos 3t \\ \sin 3t \end{bmatrix}
where the first term is ~x1(t) and the second term (the coefficient of i) is ~x2(t).
Finally, we form the general solution:
~x(t) = c1 ~x1(t) + c2 ~x2(t)
\begin{bmatrix} x1(t) \\ x2(t) \end{bmatrix} = c1 e^{2t} \begin{bmatrix} −\sin 3t \\ \cos 3t \end{bmatrix} + c2 e^{2t} \begin{bmatrix} \cos 3t \\ \sin 3t \end{bmatrix}
x1(t) = −c1 e^{2t} \sin 3t + c2 e^{2t} \cos 3t
x2(t) = c1 e^{2t} \cos 3t + c2 e^{2t} \sin 3t
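The bookkeeping above is also easy to check numerically. The following sketch (assuming NumPy; not part of the text's method) starts from the complex eigenpair λ = 2 + 3i, ~v = (i, 1) and evaluates the real and imaginary parts of ~v e^{λt}, which should reproduce the ~x1(t) and ~x2(t) found above:

    import numpy as np

    lam = 2 + 3j
    v = np.array([1j, 1.0])

    def x1(t):
        return np.real(v * np.exp(lam * t))   # should equal e^{2t} (-sin 3t, cos 3t)

    def x2(t):
        return np.imag(v * np.exp(lam * t))   # should equal e^{2t} ( cos 3t, sin 3t)

    t = 0.7
    print(x1(t), np.exp(2*t) * np.array([-np.sin(3*t), np.cos(3*t)]))
    print(x2(t), np.exp(2*t) * np.array([ np.cos(3*t), np.sin(3*t)]))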
4. Second Order Systems
Consider two masses connected with springs as shown in figure 4.1. Since this is
a mechanical system, Newton's laws of motion apply, specifically the second law
ma = ΣF.
Figure 4.1. Two mass system (spring k1 — mass m1 — spring k2 — mass m2 — spring k3)
Since each mass is attached to two springs, there are two forces which act upon
each mass. Let’s derive the equation of motion for mass one, m1 . If we displace
m1 a small amount to the right then spring one, labelled k1 , will pull it back. The
force, according to Hooke’s law, will be equal to −k1 x1. The negative sign simply
indicates that the force will be in the negative direction.
The force on m1 due to spring two is complicated by the fact that both m1 and
m2 can be displaced simultaneously. However, a simple thought experiment will
clarify. Imagine displacing m2 two units to the right from its equilibrium position,
and imagine displacing m1 only one unit to the right from its equilibrium. In this
configuration, since spring two is stretched, it will pull m1 to the right with a force
proportional to k2 times the amount of stretch in spring two. This stretch is
exactly one unit, because x2 − x1 = 2 − 1 = 1. Therefore the equation of motion
for m1 is:
m1 x1'' = −k1 x1 + k2 (x2 − x1).
To derive the equation of motion for mass two, m2 , we will again imagine displacing
m2 to the right by two units and m1 to the right by one unit. In this configuration,
since spring two is stretched, it will pull m2 to the left. Spring three will be
compressed two units and hence push m2 to the left as well. Thus
m2 x2'' = −k2 (x2 − x1) − k3 x2.
We wish to write this system as a matrix equation so we can apply the
eigenvalue–eigenvector method. Thus we need to rearrange these two equations
so that the variables x1 and x2 line up in columns.
m1 x1'' = −(k1 + k2) x1 + k2 x2
m2 x2'' = k2 x1 − (k2 + k3) x2
\begin{bmatrix} m1 & 0 \\ 0 & m2 \end{bmatrix} \begin{bmatrix} x1'' \\ x2'' \end{bmatrix} = \begin{bmatrix} −(k1 + k2) & k2 \\ k2 & −(k2 + k3) \end{bmatrix} \begin{bmatrix} x1 \\ x2 \end{bmatrix}
This matrix equation can be written very compactly as
(128)
M~x''(t) = K~x(t)
We will call matrix M the mass matrix and matrix K the stiffness matrix. Before
we can apply the eigenvalue–eigenvector method we need to rewrite equation (128)
so that it contains only one matrix. Luckily, the mass matrix is invertible with
inverse
inverse
M^{−1} = \begin{bmatrix} 1/m1 & 0 \\ 0 & 1/m2 \end{bmatrix}.
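As a concrete illustration (a sketch assuming NumPy, with hypothetical numeric values for the masses and spring constants, which the text keeps symbolic), the mass and stiffness matrices and the inverse of M can be built directly:

    import numpy as np

    m1, m2 = 2.0, 1.0           # hypothetical masses
    k1, k2, k3 = 4.0, 2.0, 4.0  # hypothetical spring constants

    M = np.array([[m1, 0.0],                # mass matrix
                  [0.0, m2]])
    K = np.array([[-(k1 + k2),  k2       ],  # stiffness matrix
                  [ k2,        -(k2 + k3)]])

    print(np.linalg.inv(M))     # diagonal with entries 1/m1 and 1/m2, as claimed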
This allows us to rewrite equation (128) as
(129)
~x''(t) = A~x(t)
where A = M^{−1}K. To solve this system we will employ the same method as before,
but since each equation now involves a second derivative we will have to take that
into account. We guess that the solution has the form:
~x(t) = ~v e^{αt}.
Differentiating our guess solution twice yields:
(130)
~x(t) = ~v e^{αt}  ⇒  ~x'(t) = α~v e^{αt}  ⇒  ~x''(t) = α^2 ~v e^{αt} = α^2 ~x(t).
Equating the right hand sides of equation (129) and equation (130) yields
(131)
A~x(t) = α^2 ~x(t).
This is essentially the eigenvalue–eigenvector equation again if λ = α^2. But we
need to
Figure 4.2. Three mass system (k1 — m1 — k2 — m2 — k3 — m3 — k4)
5. Nonhomogeneous Linear Systems