MAE 376: Applied Mathematics for MAEs
Dr. Paul T. Bauman, Dr. Ehsan Esfahani, Dr. Abani K. Patra
with contributions from Dr. Souma Chowdhury
Fall Semester 2016
© 2015 Paul T. Bauman, Ehsan Esfahani, Abani K. Patra
All rights reserved.
This work is intended for students of MAE 376 at the University at Buffalo and should not be
publicly distributed.
Contents

Introduction
   0.1 Overview of Course
   0.2 Historical Perspective

Part I: Mathematical Background

1 Linear Systems
   1.1 Matrix Algebra
   1.2 Representing Linear Systems of Equations
   1.3 More Than Just an Array of Numbers...
   1.4 Solving Linear Systems (by hand)

2 Eigenvalues and Eigenvectors
   2.1 Core Idea
   2.2 Applications
   2.3 Extending the idea
       2.3.1 Representation
       2.3.2 Special Properties

3 Differential Equations
   3.1 Introduction
       3.1.1 Classification of Differential Equations
       3.1.2 Notations
       3.1.3 Operators
   3.2 Solutions of Linear ODEs
       3.2.1 2nd Order ODE with constant coefficient
   3.3 Solutions of Partial Differential Equations
       3.3.1 Wave Equation
       3.3.2 Heat Equation
       3.3.3 Different Boundary conditions

Part II: Numerical Methods

4 Numerical Solution of Linear Systems
   4.1 Automating Gaussian Elimination
   4.2 Computational Work of Gaussian Elimination
   4.3 Partial Pivoting
   4.4 LU Decomposition
   4.5 Cholesky Decomposition
   4.6 Computing the Inverse of a Matrix
   4.7 Practice Problems

5 Numerical Error and Conditioning
   5.1 Error Considerations
   5.2 Floating Point Representation
   5.3 Review of Vector and Matrix Norms
   5.4 Conditioning of Linear Systems
   5.5 Practice Problems

6 Numerical Differentiation
   6.1 Approximating Derivatives
       6.1.1 What Are Finite Differences?
       6.1.2 Taylor Series and Approximate Derivative
       6.1.3 Taylor Series and Finite Differences
       6.1.4 What if there is error in evaluation of f(x)?
   6.2 Higher Dimensions and Partial Derivatives
   6.3 Practice Problems

7 Solution of Initial Value Problems
   7.1 A Simple Illustration
   7.2 Stability
   7.3 Multistage Methods
       7.3.1 First Order Explicit RK Methods
       7.3.2 Second Order Explicit RK Methods
       7.3.3 Fourth Order Explicit RK
       7.3.4 MATLAB and RK methods
   7.4 Practice Problems

8 Solution of Boundary Value Problems
   8.1 Heat transfer in a One-Dimensional Rod
       8.1.1 One-Dimensional Rod with Fixed Temperature Ends
       8.1.2 One-Dimensional Rod with Mixed Boundary Conditions
   8.2 General Linear Second Order ODEs with Nonconstant Coefficients
   8.3 Two dimensional Equations
   8.4 Practice Problems

9 Solution of Eigenproblems
   9.1 Power Method
   9.2 Inverse Power Method
   9.3 Shifted Inverse Power Methods
   9.4 QR Iteration

10 Nonlinear Equations
   10.1 Example Problem: Rocket
        10.1.1 Problem Formulation
        10.1.2 Problem Solution
   10.2 Solving Non-Linear Equations
        10.2.1 Test for Linearity
        10.2.2 Methods of Solution
   10.3 Convergence
        10.3.1 Bisection
        10.3.2 Newton-Raphson
        10.3.3 Fixed Point
   10.4 Nonlinear Systems of Equations
        10.4.1 Fixed-Point Method
        10.4.2 Newton-Raphson Method
        10.4.3 Case Study: Four Bar Mechanism

Part III: Data Analysis

11 Linear Regression
   11.1 Least Squares Fit: Two Parameter Functions
   11.2 Polynomial Regression
   11.3 Multiple Linear Regression
   11.4 General Linear Least Squares Regression

12 Interpolation
   12.1 Polynomial Interpolation
        12.1.1 Monomial Functions and the Vandermonde Matrix
        12.1.2 Lagrange Polynomials
   12.2 Splines
        12.2.1 Linear Splines
        12.2.2 Cubic Splines

13 Numerical Integration
   13.1 Newton-Cotes Rules
        13.1.1 Trapezoidal Rule
        13.1.2 Simpson's Rule
        13.1.3 Composite Rules
   13.2 Gauss Quadrature

Bibliography
Introduction

0.1 Overview of Course
This course builds on “core” mechanics, engineering, and mathematics courses and
brings these ideas together to continue training you (the student) how to take an
engineering problem and solve it systematically using mathematical and computing
tools. As such, there are two themes that will continually arise during the course:
• Formulating an engineering problem as a mathematical problem
• Solving a mathematical problem using mathematical and computational tools
Conceptually, in any engineering problem, we are approximating the behavior of the
system using a model. The vast majority of models are mathematical models.
That is, we make a number of assumptions and approximations to arrive at equations
that model the behavior of the system. Examples include Newtonian mechanics,
Bernoulli-Euler beams, Euler equations for compressible inviscid gas dynamics,
Navier-Stokes equations for viscous fluid flow, general relativity, quantum mechanics,
etc. In a limited number of instances, an analytical solution may be feasible, but in
the vast majority of cases, we must resort to numerical methods to compute solutions
to our models. The flow chart in Figure 1 shows a qualitative overview of this process.
In this course, we will focus on all aspects, starting from the engineering problem all
the way to computing a solution.

Figure 1: Flow chart for solving an engineering problem: Engineering or Physics Problem →
Mathematical Model (Approximations & Assumptions) → Numerical Formulation of Governing
Equations → Analytical & Numerical Methods → Solutions → Applications/Decisions.

The coverage of these topics is structured into three modules: Applied Mathematics,
Numerical Methods, and Data Analysis. During the applied mathematics portion of
the course, we will cover the mathematical foundation of linear systems of equations,
eigenvalues and eigenvectors, and ordinary and partial differential equations. In
particular, we will heavily emphasize transforming an engineering problem into a
mathematical problem. During this time, the lab portion of the course will focus on
introducing
Matlab. The second portion of the course will focus on formulating and solving the
mathematical problems using numerical methods. We will be heavily using Matlab
to program our algorithms to solve these problems. As such, programming will be a
core component of this course. The final module of the course is focused on methods
of data analysis, particularly regression, interpolation, and numerical integration.
Again, Matlab programming will be a heavy component of the data analysis module
of the course.
0.2 Historical Perspective
Before the advent of computers, scientific inquiry proceeded in two stages: theory
and experimentation. Theories are hypothesized on the basis of observations of
nature and logical arguments. Subsequently, physical experiments are conducted in
order to test the theory. As limitations of the theory are uncovered, the theory is
revised and new experiments are conducted.
Beginning in 1947, a dramatic change was initiated: the invention of the transistor;
see Figure 2. The transistor is a fundamental logical unit in every computer processor.
An increasing number of transistors in a computer chip allows for a greater number
of operations per unit time. An exponential growth in the number of transistors in a
computer processor was observed for a number of decades and is attributed to Gordon
Moore of Intel: Moore's Law. Figure 3 graphically illustrates Moore's Law (note the
log scale on the vertical axis). The tremendous growth in computing capability has
brought modeling, numerical methods, and computer simulation to the forefront of
scientific inquiry and engineering analysis. Indeed, it is now accepted that computation
is the third pillar of science, next to theory and experimentation. Thus, a critical set
of tools for scientists and engineers is mathematical modeling and numerical methods.
This need to understand and use these tools is the purpose of this course.

Figure 2: Image of the first transistor. Taken from https://en.wikipedia.org/wiki/Transistor.
Figure 3: Growth of the number of transistors in a single computer chip. Exponential
trend called Moore’s Law. From https://en.wikipedia.org/wiki/Moore’s_law
Part I
Mathematical Background
Chapter 1
Linear Systems
In this chapter, we focus on engineering systems that can be cast into linear systems of equations. Although a more complete treatment of linear systems is the
subject of courses in linear algebra, we first review necessary topics in matrices and
matrix algebra in Section 1.1 and then apply these ideas to constructing linear systems based on problems encountered in statics, circuits, and spring-mass systems in
Section 1.2. In Section 1.3, we briefly discuss matrices in the context of transformations. Finally, in Section 1.4, we introduce solving linear systems manually using
Gaussian Elimination.
1.1 Matrix Algebra
We first review the basic notation and algebra of vectors and matrices. Symbolically
a vector is an array of numbers, oriented in a row or column with only a single row
or single column, respectively. For example, take the vectors a and b:
$$a = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 4 \\ 5 \\ 6 \\ 7 \end{bmatrix}$$
a is a row vector with three entries while b is a column vector with four entries. As
scientists and engineers, we use the vector notation in a variety of ways including
the description of coordinates and points of bodies in physical space. We will use
vectors (and matrices) in more interesting ways to describe physical systems.
A matrix generalizes the notion of a vector. Instead of a single row or column of
entries, we now have an ordered array of entries. A matrix A is said to be n × m (“n
by m”) if it has n rows and m columns. Equation (1.1) illustrates a generic matrix
A.
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & & a_{ij} & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \tag{1.1}$$
The scalar entry aij corresponds to the entry on the ith row and jth column. If
n = 1, we have a row vector; if m = 1, we have a column vector. The matrix is said
to be square if m = n. If the matrix is not square, it is said to be rectangular.
By convention, we typically use lower-case roman letters for vectors and upper-case
roman letters for matrices; for entries in a vector or a matrix, we use a lower-case
roman letter with numeric subscripts indicating the location of that entry in the
vector or matrix.
Example 1.1.1. Matrices
$$A_1 = \begin{bmatrix} 5 & 1 & 2 \\ 1 & 3 & 7 \\ 2 & 7 & 8 \end{bmatrix}, \quad
b = \begin{bmatrix} 4 \\ 1 \\ 2 \\ 7 \end{bmatrix}, \quad
c = \begin{bmatrix} 3 & 7 & 4 & 6 \end{bmatrix}, \quad
B = \begin{bmatrix} 5 & 0 \\ 2 & 5 \\ 6 & 7 \end{bmatrix} \tag{1.2}$$
A1 is a 3 × 3 square matrix with a23 = 7. b is 4 × 1 column vector with b2 = 1. c is
a 1 × 4 row vector with c1 = 3. B is a 3 × 2 rectangular matrix with b12 = 0.
There are several special forms of matrices that occur frequently in the study
of linear systems. A symmetric matrix is such that for each i and j, aij = aji .
The matrix A1 in Equation 1.2 is a symmetric matrix. Note that a matrix must be
square for it to be symmetric. A diagonal matrix is one with zeros in the off-diagonal
entries; that is, aij = 0 for i ≠ j.
Example 1.1.2. Diagonal Matrix
$$D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{1.3}$$
D is a diagonal matrix since its off-diagonal entries are zero. Its diagonal entries are
d11 = 3, d22 = 2, d33 = 1.
The identity matrix is a particular diagonal matrix: all the diagonal entries are
1; that is, aii = 1 and aij = 0 for i ≠ j.
Example 1.1.3. Identity Matrix
$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{1.4}$$
I is a 3 × 3 identity matrix. The symbol I is typically reserved for referring to the
identity matrix. The Matlab command eye(3) will generate a 3×3 identity matrix.
The zero matrix is what you’d expect: a matrix of all zeros. That is, aij = 0
for all i, j. A 4 × 5 zero matrix can be generated in Matlab with the command
zeros(4,5). A final set of special matrices that we'll need are those with all zeros
below the diagonal or all zeros above the diagonal; these are called upper triangular
and lower triangular, respectively.
Example 1.1.4.
$$U = \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}, \qquad
L = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \tag{1.5}$$
U is a 3 × 3 upper triangular matrix while L is a 3 × 3 lower triangular matrix.
Matrix objects have their own rules for algebraic manipulation for which we will
make extensive use. Two matrices are said to be equal, A = B, if aij = bij for each
entry i, j. That is, each entry in the matrix A must be equal to the corresponding
entry in matrix B in order for the condition A = B to be true.
We can add and subtract matrices: C = A + B is defined by adding each corresponding entry of A and B and setting that value in C. That is, cij = aij + bij
for each i, j. Similarly, D = B − A is computed as dij = bij − aij. Matrix addition is
commutative, A + B = B + A, and associative, (A + B) + C = A + (B + C).
We can multiply matrices by a scalar number α ∈ R (α is a real number). Symbolically, we write α ∗ A = A ∗ α = αA; computationally, we multiply each entry in
the matrix by the scalar α: αA = αaij for each i, j.
We can multiply two matrices, but this is slightly more involved. Given an m × n
matrix A and an n × p matrix B, we wish to compute “A times B”: A ∗ B = AB.
C = AB is defined as
$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj} \tag{1.6}$$
So, note, in particular, that the inner dimensions of the matrices must be equal.
Otherwise, the multiplication of two matrices does not make sense and is not defined.
The dimensions of C will be equal to the outer dimensions of the matrices A and
B. In words, the matrix product proceeds by taking the ith row of the matrix A
and multiplying it by the j th column in B (a “dot-product” of those vectors) which
yields the scalar value for cij .
Example 1.1.5. Take
$$A = \begin{bmatrix} 3 & 1 \\ 8 & 6 \\ 0 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 5 & 9 \\ 7 & 2 \end{bmatrix} \tag{1.7}$$
First, we confirm the inner dimensions of A and B are equal, namely 2. The outer
dimensions of A and B are 3 and 2, and so C = AB will have dimensions of 3 × 2.
$$\begin{bmatrix} 3 & 1 \\ 8 & 6 \\ 0 & 4 \end{bmatrix}
\begin{bmatrix} 5 & 9 \\ 7 & 2 \end{bmatrix}
= \begin{bmatrix} 3\times 5 + 1\times 7 & 3\times 9 + 1\times 2 \\ 8\times 5 + 6\times 7 & 8\times 9 + 6\times 2 \\ 0\times 5 + 4\times 7 & 0\times 9 + 4\times 2 \end{bmatrix}
= \begin{bmatrix} 22 & 29 \\ 82 & 84 \\ 28 & 8 \end{bmatrix} \tag{1.8}$$
Note that BA is not defined.
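Matrix products like this are easy to check numerically. The short MATLAB snippet below is a sketch (not part of the original notes) that reproduces Equation (1.8):

```matlab
% Verify the matrix product in Example 1.1.5
A = [3 1; 8 6; 0 4];     % 3 x 2
B = [5 9; 7 2];          % 2 x 2
C = A*B                  % 3 x 2 result: [22 29; 82 84; 28 8]
% B*A is not defined; attempting it makes MATLAB report an inner-dimension error.
```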
Matrix multiplication enjoys some of the standard multiplicative algebraic properties. Matrix multiplication is associative: (AB) C = A (BC); it is distributive:
A (B + C) = AB + AC. However, matrix multiplication is not commutative:
AB ≠ BA. We saw this already in Example 1.1.5. We can also now understand why
the identity matrix, I, is named as such: AI = IA = A.
We will have use for the transpose of a matrix. Given the matrix A, we denote
the transpose as AT and it is defined as aTij = aji .
Example 1.1.6. Take the matrix A as follows.
$$A = \begin{bmatrix} 5 & 1 \\ 2 & 3 \\ 4 & 6 \end{bmatrix}$$
Then,
$$A^T = \begin{bmatrix} 5 & 2 & 4 \\ 1 & 3 & 6 \end{bmatrix}$$
We’ve seen that many of the operations we use for scalars translate analogously to
vectors and matrices, but what about division? Division is not defined for matrices,
but there is an analogous operation: the inverse. The inverse of a matrix A, A−1 ,
is defined such that A−1 A = AA−1 = I. We will discuss the inverse matrix more
in-depth later.
We need one final operation on square matrices for later use: the determinant.
The determinant appears in a number of different algorithms and formulae throughout engineering science. We denote the determinant as |A| or det A. It has a recursive
definition based on the size of the matrix. For a 1 × 1 matrix, |A| = |a11 | = a11 . For
a 2 × 2 matrix,
$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21} \tag{1.9}$$
For a 3 × 3 matrix,
$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11} \underbrace{\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}}_{\text{Minor, } M_{11}}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \tag{1.10}$$
The general definition is in terms of the minors Mij , which are merely determinants
of the submatrices formed by removing the ith row and the j th column from the
original matrix. For an n × n matrix A,
$$\det A = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} M_{ij} = \sum_{j=1}^{n} a_{ij} C_{ij} \tag{1.11}$$
where Cij is a cofactor; we select a particular i, usually 1. There are more efficient
ways to compute the determinant that will be considered later.
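As a small illustration of Equation (1.11), the MATLAB function below performs the cofactor expansion along the first row (i = 1). It is a minimal sketch with an illustrative name (cofactor_det is not defined in these notes), and MATLAB's built-in det is what should be used in practice:

```matlab
function d = cofactor_det(A)
% Determinant by cofactor expansion along the first row (Equation 1.11).
% Illustration only: the cost grows factorially, so use det(A) for real work.
n = size(A, 1);
if n == 1
    d = A(1,1);
    return
end
d = 0;
for j = 1:n
    M = A(2:n, [1:j-1, j+1:n]);               % minor M_1j: delete row 1 and column j
    d = d + A(1,j) * (-1)^(1+j) * cofactor_det(M);
end
end
```

For instance, cofactor_det([5 1 2; 1 3 7; 2 7 8]) agrees with det([5 1 2; 1 3 7; 2 7 8]).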
There are several useful properties of determinants. Let In be the n × n identity
matrix and let α be a scalar.
$$\det(AB) = \det(A)\,\det(B) \tag{1.12}$$
$$\det(\alpha I_n) = \alpha^n \tag{1.13}$$
$$\det(\alpha A) = \det(\alpha I_n A) = \alpha^n \det(A) \tag{1.14}$$
$$\det(A^{-1}) = \frac{1}{\det(A)} \tag{1.15}$$
$$\det(A^T) = \det(A) \tag{1.16}$$

1.2 Representing Linear Systems of Equations
Now, we will study how to form linear systems of equations for several typical undergraduate engineering problems. First, however, we review the notion of linearity
and systems of equations. Let L(x) be some mathematical operator acting on x. x
could be a number and L could be a function; L could also be a matrix, x a vector.
We say that L is linear if it satisfies the following properties:
$$L(x + y) = L(x) + L(y) \tag{1.17}$$
$$L(\alpha x) = \alpha L(x), \quad \alpha \in \mathbb{R} \tag{1.18}$$
If L does not satisfy these properties, then it is said to be nonlinear. For example,
if L is a one-dimensional function, say L(x) = 3x, then L is linear. If L(x) =
2x^3 − 5x^2 + 1, then it is nonlinear.
We can also have systems of equations. Systems of equations are multiple sets
of equations that share the same unknowns. Many engineering problems can be
modeled as systems of linear equations.
Example 1.2.1. Linear System of Equations Take the following system of equations.
2x1 + 0x2 = 3
−1x1 + 2x2 = 4
This is a system of two equations and two unknowns. The unknowns are x1 and x2 .
This is a linear system because each of the equations in the system is linear
with respect to the unknowns. Later in the course, we will encounter nonlinear
systems — systems where some or all of the equations are nonlinear in one or more
of the unknowns. Our goal will be to solve such systems of equations in a systematic
way such that we can write computer programs to automate their solution.
It is prudent to ask several questions about these systems of equations. First,
are there any solutions? This is called existence of solutions. Second, how many
solutions are possible? This is called uniqueness of solutions. We say that the
system is non-singular if there exists one and only one solution. Otherwise, the
system is singular. For linear systems, there are only three possibilities about the
number of solutions: 0, 1, ∞.
Example 1.2.2. What is the/are solution(s) to the following linear system?
x1 + x 2 = 2
2x1 + 2x2 = 4
Take x1 = 2, x2 = 0. We can also take x1 = 1, x2 = 1. There are ∞ many solutions
to this linear system.
Example 1.2.3. What is the/are solution(s) to the following linear system?
x1 + 0x2 = 5
7x1 + 0x2 = 2
The first equation of the system suggests that x1 = 5 while the second shows x1 =
2/7. This contradiction indicates there is no solution to the system of equations.
Example 1.2.4. What is the/are solution(s) to the following linear system?
x1 + 0x2 = 5
x1 + x2 = 17
Take x1 = 5. Now substitute into the second equation to get 5 + x2 = 17. This yields
x2 = 12. This system has exactly one solution: x1 = 5, x2 = 12.
For systems of equations that only have two components, we can visualize what’s
happening graphically. We can plot each of the lines on a graph and where they
intersect represents the solution of the linear system. Figure 1.1a shows a clear
unique solution. Figure 1.1b shows a case where the lines nowhere intersect, i.e.
they are parallel — this case has no solutions. Finally, Figure 1.1c show a case where
the lines overlap — this case has infinitely many solutions, i.e. every point on the
line is a solution.
Figure 1.1: Figures illustrating the various possibilities of solutions for a two-dimensional
linear system: (a) a unique solution, (b) no solution (parallel lines), (c) infinitely many
solutions (coincident lines). Figures taken from Chapra [1].
We now turn to the study of linear systems using tools of matrix algebra. This
step is critical to allowing us to systematically solve such systems. Before turning
to more interesting examples, we begin with a generic 3 × 3 linear system.
$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1 \tag{1.19}$$
$$a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = b_2 \tag{1.20}$$
$$a_{31} x_1 + a_{32} x_2 + a_{33} x_3 = b_3 \tag{1.21}$$
The first step is to recognize that we can write Equation 1.19 as two vectors in
equality:
$$\begin{bmatrix} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \\ a_{31} x_1 + a_{32} x_2 + a_{33} x_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \tag{1.22}$$
Now Equation 1.22 is a vector equation. Finally, we recognize that the vector on the
left of Equation 1.22 can actually be expressed as a matrix-vector product:
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \tag{1.23}$$
We now have a system that can be concisely written as Ax = b: x is the (unknown)
solution of interest while A and b will be given. This form is more conducive to
systematic solutions by computer; this is especially critical on systems with large
numbers of unknowns. We now focus on typical applications that you’ve encountered
in previous coursework such that we write the problems in this form.
Example 1.2.5. Spring-Mass System in Static Equilibrium
Figure 1.2 illustrates a simple spring-mass system. We will assume that each of
the masses is loaded with a constant force and is in static equilibrium. We denote
the force on mass m1 as f1 and the force on mass m2 as f2 . We wish to compute the
displacements of each of the masses.
Figure 1.2: Spring-mass system. (Masses m1 and m2 connected by springs of stiffness k,
loaded by forces f1 and f2, with displacements x1 and x2.)
First we draw free body diagrams for each of the masses.
Figure 1.3: Free body diagram of masses.
Since the system is in static equilibrium, we know that the sum of the forces on
each of the masses must be zero. This yields the following system of equations:
$$f_1 + k(x_2 - x_1) - k x_1 = 0 \tag{1.24}$$
$$f_2 - k x_2 - k(x_2 - x_1) = 0 \tag{1.25}$$
We can rewrite this system as
$$2k x_1 - k x_2 = f_1 \tag{1.26}$$
$$-k x_1 + 2k x_2 = f_2 \tag{1.27}$$
Now we can rewrite this system in matrix form:
$$\begin{bmatrix} 2k & -k \\ -k & 2k \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} \tag{1.28}$$
Example 1.2.6. Electrical Circuit with Resistors
Figure 1.4 illustrates a simple circuit with several resistors and voltage loadings.
We wish to compute the current in each of the closed loops in the circuit.
Figure 1.4: Simple electrical circuit. (Resistors R1 through R5 and voltage sources V1, V2, V3.)
Kirchoff’s law states that the sum of the voltages around a closed must be zero. In
this example, we orient the loops counter-clockwise and label as shown in Figure 1.5.
Figure 1.5: Simple electrical circuit with oriented loops for summing voltages.
Further, we use Ohm’s law which relates the voltage drop, V , across a resistor to
the current, I, and the resistance, R: V = IR. Summing around each of the voltage
loops yields the following system of equations:
$$\text{Loop 1:} \quad I_1 R_5 + I_1 R_4 - I_2 R_4 + I_1 R_2 - I_3 R_2 - V_3 = 0 \tag{1.29}$$
$$\text{Loop 2:} \quad I_2 R_4 - I_1 R_4 + I_2 R_3 + V_2 + I_2 R_1 - I_3 R_1 = 0 \tag{1.30}$$
$$\text{Loop 3:} \quad I_3 R_2 - I_1 R_2 + I_3 R_1 - I_2 R_1 + V_1 = 0 \tag{1.31}$$
We can rewrite the system as follows:
$$(R_5 + R_4 + R_2)\, I_1 - R_4 I_2 - R_2 I_3 = V_3 \tag{1.32}$$
$$-R_4 I_1 + (R_4 + R_3 + R_1)\, I_2 - R_1 I_3 = -V_2 \tag{1.33}$$
$$-R_2 I_1 - R_1 I_2 + (R_2 + R_1)\, I_3 = -V_1 \tag{1.34}$$
Now we can write the system in matrix form:
$$\begin{bmatrix} R_5 + R_4 + R_2 & -R_4 & -R_2 \\ -R_4 & R_4 + R_3 + R_1 & -R_1 \\ -R_2 & -R_1 & R_2 + R_1 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}
= \begin{bmatrix} V_3 \\ -V_2 \\ -V_1 \end{bmatrix} \tag{1.35}$$
Notice that the matrix is symmetric.
Example 1.2.7. Static Truss with Two-Force Members
Figure 1.6 shows a simple truss under loading. The truss consists of members
with loadings at two points only. We will assume the truss is in static equilibrium.
We wish to compute the force response throughout the truss, in particular the forces
in the rods and at the pins at the walls.
Figure 1.6: Static truss under loading with two-force members only.
We begin by drawing free body diagrams at each of the points B, A, and C. We
denote the force in the rod between points B and A as FBA and the force in the rod
between points A and C as FAC . The reaction forces at C are denoted Cx and Cy
for the x- and y-components, respectively. Similarly, the reaction forces at pin B are
denoted Bx and By .
Figure 1.7: Free body diagrams of truss with two-force members.
Because the truss is in static equilibrium the forces in the x- and y-directions
must each sum to zero. This gives six static equilibrium equations:
Point B:
$$B_x + F_{BA}\sin(30^\circ) = 0 \tag{1.36}$$
$$B_y + F_{BA}\cos(30^\circ) = 0 \tag{1.37}$$
Point C:
$$C_x + F_{AC}\cos(20^\circ) = 0 \tag{1.38}$$
$$C_y + F_{AC}\sin(20^\circ) = 0 \tag{1.39}$$
Point A:
$$F_{BA}\sin(30^\circ) + F_{AC}\cos(20^\circ) = 0 \tag{1.40}$$
$$F_{BA}\cos(30^\circ) - F_{AC}\sin(20^\circ) - W = 0 \tag{1.41}$$
As with the previous examples, we can now rewrite this system into matrix form.
$$\begin{bmatrix}
\sin(30^\circ) & 0 & 1 & 0 & 0 & 0 \\
\cos(30^\circ) & 0 & 0 & 1 & 0 & 0 \\
0 & \cos(20^\circ) & 0 & 0 & 1 & 0 \\
0 & \sin(20^\circ) & 0 & 0 & 0 & 1 \\
\sin(30^\circ) & \cos(20^\circ) & 0 & 0 & 0 & 0 \\
\cos(30^\circ) & -\sin(20^\circ) & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} F_{BA} \\ F_{AC} \\ B_x \\ B_y \\ C_x \\ C_y \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ W \end{bmatrix} \tag{1.42}$$
Example 1.2.8. Static Truss with Three-Force Members
Figure 1.8 shows a simple truss under loading. The truss consists of members
with loadings at three points in the members, requiring additional considerations
beyond Example 1.2.7.
Figure 1.8: Static truss under loading with three-force members.
First we draw free body diagrams of the system as well as each of the members,
noting that the bar BE is a two force member.
Figure 1.9: Free body diagrams of forces in static truss shown in Figure 1.8.
We see, then, that the unknowns are Ax , Ay , FBE , CX , Cy , Dx , Gx , Gy , eight in all.
With eight unknowns, we’ll need eight equations in order to determine the unknowns.
We start by summing forces.
$$A_x + F_{BE}\sin(53.13^\circ) - C_x + W + D_x = 0$$
$$A_y + F_{BE}\cos(53.13^\circ) + C_y = 0$$
$$C_x - F_{BE}\cos(36.8698^\circ) + G_x = 0$$
$$-C_y - F_{BE}\sin(36.8698^\circ) + G_y = 0$$
$$-G_x - W = 0$$
$$-G_y - W = 0$$
This gives us six equations. We need two more. We'll use balance of moments. First,
taking moments of bar CEG around point C gives
$$-F_{BE}\sin(36.8698^\circ)\cdot 8 + G_y \cdot 16 = 0$$
Now, taking moments of bar ABCD around point A gives
$$-F_{BE}\sin(53.13^\circ)\cdot 6 + C_x \cdot 12 - W \cdot 15 - D_x \cdot 18 = 0$$
This gives us now 8 total equations. We can then write this system of equations in
matrix form:
$$\begin{bmatrix}
1 & 0 & \sin(53.13^\circ) & -1 & 0 & 1 & 0 & 0 \\
0 & 1 & \cos(53.13^\circ) & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & -\cos(36.8698^\circ) & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & -\sin(36.8698^\circ) & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & -8\sin(36.8698^\circ) & 0 & 0 & 0 & 0 & 16 \\
0 & 0 & -6\sin(53.13^\circ) & 12 & 0 & -18 & 0 & 0
\end{bmatrix}
\begin{bmatrix} A_x \\ A_y \\ F_{BE} \\ C_x \\ C_y \\ D_x \\ G_x \\ G_y \end{bmatrix}
= \begin{bmatrix} -W \\ 0 \\ 0 \\ 0 \\ W \\ W \\ 0 \\ 15W \end{bmatrix} \tag{1.43}$$
1.3 More Than Just an Array of Numbers...
Up to this point, we’ve discussed basic matrix algebra and representing some of our
favorite engineering problems as linear systems of equations and, subsequently, in
matrix form. But what are matrices, really? Are they just convenient arrays of
numbers?
Yes, they are convenient, but they are more. They encapsulate information about
changing vectors. In mathematical parlance, they are linear operators. An “operator” is fancy language for “takes one object and converts it to another”. For
matrices that we’ve been discussing, we use the following notation to indicate that
the operator takes an n-dimensional vector and returns an m-dimensional vector:
A : x ∈ R^n → R^m. A could be many kinds of operators, but in the present context,
A is a matrix. (What must be the size of the matrix to take an n-vector to give an
m-vector?) In such cases, we tend to refer to A as a linear transformation — it
transforms an input vector into an output vector in a linear way.
Example 1.3.1. Scaling a Vector
Take the vector xT = [1, 1]. If we multiply by a real number, α, we get αxT =
[α, α]. We can also accomplish this using a matrix. Take the matrix
$$A = \begin{bmatrix} \alpha & 0 \\ 0 & \alpha \end{bmatrix} \tag{1.44}$$
Then the operation Ax will yield the vector [α, α]T .
Example 1.3.2. Scaling One Component of a Vector
Take the vector xT = [1, 1]. We can generalize the previous example by only
scaling the x-component of a vector:
$$A = \begin{bmatrix} \alpha & 0 \\ 0 & 1 \end{bmatrix} \tag{1.45}$$
Now, Ax will yield the vector [α, 1]T . Already, we’re now beyond the simple algebra
of vectors.
Example 1.3.3. Reflect a Vector across the Axis
Take the vector xT = [1, 1]. We can use linear transformations to operate on
vectors by reflecting them across the x-axis.
$$A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \tag{1.46}$$
How do we change A to give a reflection across the y-axis?
Example 1.3.4. Rotate a Vector by an Angle θ
Take the vector xT = [1, 1]. An extremely useful linear transformation is a
rotation.
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{1.47}$$
This linear transformation will rotate a two-dimensional vector counterclockwise by
the angle θ. Notice that the length of the vector is unchanged.
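The four transformations in Examples 1.3.1–1.3.4 are easy to experiment with in MATLAB. The snippet below is a small sketch; the particular values of α and θ are arbitrary choices, not from the notes:

```matlab
x = [1; 1];                      % the vector used in Examples 1.3.1-1.3.4
alpha = 2; theta = pi/4;
S  = [alpha 0; 0 alpha];         % uniform scaling
Sx = [alpha 0; 0 1];             % scale only the x-component
Rx = [1 0; 0 -1];                % reflection across the x-axis
R  = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % counterclockwise rotation
disp([S*x, Sx*x, Rx*x, R*x])     % each column is one transformed vector
norm(R*x) - norm(x)              % rotation preserves length: result is (numerically) 0
```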
1.4 Solving Linear Systems (by hand)
In this section, we consider our first systematic strategy for solving linear systems
of equations called Gaussian Elimination. We need such strategies as directly
solving for unknowns by substitution is not conducive to programming. Gaussian
Elimination is a direct method as we directly compute a solution in a fixed number
of steps (as opposed to iterative algorithms where the solution cannot be computed
in a fixed number of steps).
Gaussian Elimination proceeds in two broad steps: 1. The elimination phase
where we transform the matrix into upper triangular form; 2. Because the matrix
is now in upper triangular form, we can perform backward substitution where
the last unknown is readily computed, from which the remaining unknowns can be
systematically computed. Let’s start with a 2 × 2 matrix.
Example 1.4.1. Gaussian Elimination of a 2 × 2 Matrix
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \tag{1.48}$$
In the first stage of Gaussian Elimination, we need to transform the matrix into
upper triangular form. We do this by eliminating the entries in the matrix that are
below the diagonal, in this case only a21 . We use information in the first equation to
accomplish this, namely we rescale and add the first equation to the second equation.
$$(-a_{21}/a_{11})\,(a_{11} x_1 + a_{12} x_2 = b_1)$$
$$+\;\; (a_{21} x_1 + a_{22} x_2 = b_2)$$
Adding the first equation into the second yields
$$(a_{22} - a_{21} a_{12}/a_{11})\, x_2 = b_2 - (a_{21}/a_{11})\, b_1 \tag{1.49}$$
For simplicity, define a′22 = a22 − a21a12/a11 and b′2 = b2 − (a21/a11)b1; then the linear
system from Equation (1.48) becomes
$$\begin{bmatrix} a_{11} & a_{12} \\ 0 & a'_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b'_2 \end{bmatrix} \tag{1.50}$$
Now we can begin the backward substitution phase. The value of x2 is trivial:
x2 = b′2/a′22. Now that we have the value of x2, we can substitute back into the
first equation:
$$a_{11} x_1 + a_{12}\,(b'_2/a'_{22}) = b_1$$
Now we can readily compute x1 = (b1 − a12(b′2/a′22))/a11.
Note that we did not create an x′2 — our transformation of the system did not
change the solution. We effectively rotated the system about the solution.
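The same two phases — elimination followed by backward substitution — can be written as a short MATLAB function for a general n × n system. The sketch below (the name gauss_solve is illustrative, not from the notes) assumes no zero pivots are encountered; pivoting and the fully automated algorithm are the subject of Chapter 4.

```matlab
function x = gauss_solve(A, b)
% Solve A*x = b by naive Gaussian elimination and backward substitution.
% Assumes every pivot A(k,k) is nonzero (no pivoting; see Chapter 4).
n = length(b);
for k = 1:n-1                      % elimination phase
    for i = k+1:n
        m = A(i,k)/A(k,k);         % multiplier for row i
        A(i,k:n) = A(i,k:n) - m*A(k,k:n);
        b(i) = b(i) - m*b(k);
    end
end
x = zeros(n,1);                    % backward substitution phase
for i = n:-1:1
    x(i) = (b(i) - A(i,i+1:n)*x(i+1:n)) / A(i,i);
end
end
```

For instance, gauss_solve([4 -2; -2 4], [-1; 3]) reproduces the hand computation of Example 1.4.2 below.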
Let’s now consider the previous examples where we constructed the linear systems
of equations.
Example 1.4.2. Solution of Example 1.2.5
Take k = 2, f1 = −1, and f2 = 3. Then, the linear system we constructed before
is now
$$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} -1 \\ 3 \end{bmatrix} \tag{1.51}$$
As before, we eliminate the (2, 1) entry by scaling the first equation by 1/2 and
adding it to the second. This gives
$$\begin{bmatrix} 4 & -2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} -1 \\ 5/2 \end{bmatrix} \tag{1.52}$$
Then, we readily see x2 = 5/6. Substituting into the first equation gives
4x1 − 5/3 = −1
This gives x1 = 1/6.
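A quick MATLAB check of this hand computation (a sketch; the backslash operator performs a similar elimination internally):

```matlab
A = [4 -2; -2 4];
b = [-1; 3];
x = A\b          % approximately [0.1667; 0.8333], i.e. [1/6; 5/6]
```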
Example 1.4.3. Solution of Example 1.2.6
Take R1 = 12, R2 = 4, R3 = 5, R4 = 2, R5 = 10, all in Ohms and V1 = 100, and
V2 = V3 = 0, all in Volts. Then the system from Example 1.2.6 becomes
$$\begin{bmatrix} 18 & -2 & -4 \\ -2 & 19 & -12 \\ -4 & -12 & 16 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -100 \end{bmatrix} \tag{1.53}$$
We need to eliminate all the entries below the diagonal; we’ll start with the (2, 1)
entry. We scale the first equation (row 1) by 1/9 and add it to the second equation
(row 2). This gives
$$\begin{bmatrix} 18 & -2 & -4 \\ 0 & 169/9 & -112/9 \\ -4 & -12 & 16 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -100 \end{bmatrix} \tag{1.54}$$
Now we need to eliminate the (3, 1) entry. We scale the first equation (row 1) by 2/9
and add it to the third equation (row 3). This gives
$$\begin{bmatrix} 18 & -2 & -4 \\ 0 & 169/9 & -112/9 \\ 0 & -112/9 & 136/9 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -100 \end{bmatrix} \tag{1.55}$$
Finally, we need to eliminate the (3, 2) entry. We scale the second equation (row 2)
by (112/9)/(169/9) = 112/169 and add it to the third equation (row 3). This gives
$$\begin{bmatrix} 18 & -2 & -4 \\ 0 & 169/9 & -112/9 \\ 0 & 0 & 1160/169 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -100 \end{bmatrix} \tag{1.56}$$
Now we may proceed with backward substitution. Clearly, I3 = −845/58. Substituting I3 into the second equation gives
$$(169/9)\, I_2 - (112/9)(-845/58) = 0$$
This gives I2 = −280/29. Now substituting I2 and I3 into the first equation gives
$$18 I_1 - 2(-280/29) - 4(-845/58) = 0$$
This gives I1 = −125/29.
Example 1.4.4. Solution of Example 1.2.7
Take W = 100 Newtons. Rearranging the equations we have the following system
of equations:
$$\begin{bmatrix}
1 & 0 & 0 & 0 & \sin(30^\circ) & 0 \\
0 & 1 & 0 & 0 & \cos(30^\circ) & 0 \\
0 & 0 & 1 & 0 & 0 & \cos(20^\circ) \\
0 & 0 & 0 & 1 & 0 & \sin(20^\circ) \\
0 & 0 & 0 & 0 & \sin(30^\circ) & \cos(20^\circ) \\
0 & 0 & 0 & 0 & \cos(30^\circ) & -\sin(20^\circ)
\end{bmatrix}
\begin{bmatrix} B_x \\ B_y \\ C_x \\ C_y \\ F_{BA} \\ F_{AC} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 100 \end{bmatrix} \tag{1.57}$$
Now we eliminate all the entries below the diagonal. Because of our rearrangement,
there is only one entry: the (6, 5) entry. First, scale the fifth equation (row 5)
by −cos(30°)/sin(30°) and add to the sixth equation (row 6). This gives (switching
to numerical values)
$$\begin{bmatrix}
1 & 0 & 0 & 0 & \sin(30^\circ) & 0 \\
0 & 1 & 0 & 0 & \cos(30^\circ) & 0 \\
0 & 0 & 1 & 0 & 0 & \cos(20^\circ) \\
0 & 0 & 0 & 1 & 0 & \sin(20^\circ) \\
0 & 0 & 0 & 0 & \sin(30^\circ) & \cos(20^\circ) \\
0 & 0 & 0 & 0 & 0 & a'_{66}
\end{bmatrix}
\begin{bmatrix} B_x \\ B_y \\ C_x \\ C_y \\ F_{BA} \\ F_{AC} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 100 \end{bmatrix} \tag{1.58}$$
where a′66 = −sin(20°) − cos(20°)cos(30°)/sin(30°). Now we can proceed with backward
substitution. Clearly FAC = 100/a′66 = −50.7713 Newtons. Substituting into
the fifth equation gives
$$\sin(30^\circ)\, F_{BA} + \cos(20^\circ)(-50.7713) = 0$$
which yields FBA = 95.4189 Newtons. Substituting into the fourth equation gives
$$C_y + \sin(20^\circ)(-50.7713) = 0$$
yielding Cy = 17.3648. Substituting into the third equation gives
$$C_x + \cos(20^\circ)(-50.7713) = 0$$
yielding Cx = 47.7094. Now substituting into the second equation gives
$$B_y + \cos(30^\circ)(95.4189) = 0$$
yielding By = −82.6352. Finally, substituting into the first equation gives
$$B_x + \sin(30^\circ)(95.4189) = 0$$
yielding Bx = −47.7094.
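The rearranged truss system (1.57) can also be solved directly in MATLAB. The snippet below is a sketch using the degree-argument functions sind and cosd; it reproduces the back-substituted values above.

```matlab
W = 100;                                  % load in Newtons
A = [1 0 0 0 sind(30)  0;
     0 1 0 0 cosd(30)  0;
     0 0 1 0 0         cosd(20);
     0 0 0 1 0         sind(20);
     0 0 0 0 sind(30)  cosd(20);
     0 0 0 0 cosd(30) -sind(20)];
b = [0; 0; 0; 0; 0; W];
u = A\b     % u = [Bx; By; Cx; Cy; FBA; FAC]
% Expect approximately [-47.71; -82.64; 47.71; 17.36; 95.42; -50.77]
```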
Chapter 2
Eigenvalues and Eigenvectors
2.1 Core Idea
Consider a simple 2 × 2 matrix A and 2 × 1 vectors x, y
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \tag{2.1}$$
such that
$$Ax = y, \qquad y_1 = a_{11} x_1 + a_{12} x_2, \quad y_2 = a_{21} x_1 + a_{22} x_2 \tag{2.2}$$
Let
$$a_{11} = 1,\; a_{12} = 2,\; a_{21} = 2,\; a_{22} = 3; \qquad x_1 = 1,\; x_2 = 0 \;\Rightarrow\; y_1 = 1,\; y_2 = 2$$
$$[A]\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
or
$$A : \text{vector } (1, 0) \rightarrow \text{vector } (1, 2)$$
Alternately stated, the operator A : x → y. Geometrically speaking (see Fig.
2.1) A stretched and rotated x to get y.
Clearly, the stretching and rotating must depend on both A and x. Use the
MATLAB command eigshow with a matrix of your choice and watch. What happens
when the matrix is symmetric and real? (For some value of x, y and x will line up.)

Figure 2.1: A : x → y generated using the eigshow command in Matlab. a) Note
the stretching and rotation. b) Note the simple scaling for the case when x is an
eigenvector.

Let us now see if for a given A, we can find x such that y1 = λx1 and y2 = λx2,
i.e. x and y are on the same line and y is just a simple multiple of x. This implies
$$a_{11} x_1 + a_{12} x_2 = \lambda x_1$$
$$a_{21} x_1 + a_{22} x_2 = \lambda x_2$$
or
$$Ax = \lambda x \tag{2.3}$$
Rearranging,
$$(a_{11} - \lambda)\, x_1 + a_{12}\, x_2 = 0$$
$$a_{21}\, x_1 + (a_{22} - \lambda)\, x_2 = 0$$
or
$$(A - \lambda I)\, x = 0 \tag{2.4}$$
The trivial answer to (2.4) is 0. Much more interesting is the possible non-trivial
answer.
In the last chapter, we discussed the idea of existence and uniqueness of solutions
of linear systems. One method of examining the solvability of the linear system is
examining its determinant. If the determinant of a matrix is non-zero, then the
matrix is invertible. If the determinant is exactly zero, then the matrix is singular.
Using this fact, then we know that for (A−λI)x = 0 to have a non-trivial solution
we must have det(A − λI) = 0, i.e.
$$\det(A - \lambda I) = 0 \;\Rightarrow\; (a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21} = 0
\;\Rightarrow\; \lambda^2 + (-a_{11} - a_{22})\lambda + (a_{11} a_{22} - a_{12} a_{21}) = 0$$
This polynomial is named the characteristic polynomial and has roots
$$\lambda_1 = \frac{(a_{11} + a_{22}) + \sqrt{(a_{11} + a_{22})^2 - 4(a_{11} a_{22} - a_{12} a_{21})}}{2}
= \frac{(a_{11} + a_{22}) + \sqrt{(a_{11} - a_{22})^2 + 4 a_{12} a_{21}}}{2} \tag{2.5}$$
$$\lambda_2 = \frac{(a_{11} + a_{22}) - \sqrt{(a_{11} + a_{22})^2 - 4(a_{11} a_{22} - a_{12} a_{21})}}{2}
= \frac{(a_{11} + a_{22}) - \sqrt{(a_{11} - a_{22})^2 + 4 a_{12} a_{21}}}{2} \tag{2.6}$$
Now let us plug in these choices λ1 , λ2 in (2.4). We recover two relationships among
the components of x as
$$\frac{x_1}{x_2} = -\frac{a_{12}}{a_{11} - \lambda_1} \tag{2.7}$$
$$\frac{x_1}{x_2} = -\frac{a_{12}}{a_{11} - \lambda_2} \tag{2.8}$$
Note that (2.4) has 0 on the right-hand side (a homogeneous equation); thus a solution
(x1, x2) can be multiplied by any scalar α to produce another valid solution. Thus, (2.4)
only defines a relationship among x1, x2 and not a unique solution. (2.7) and (2.8)
produce two such independent relationships. If we assume x1 = 1 it follows that:
$$\xi_1 \equiv (x_1, x_2) = \left(1, -\frac{a_{11} - \lambda_1}{a_{12}}\right)$$
$$\xi_2 \equiv (x_1, x_2) = \left(1, -\frac{a_{11} - \lambda_2}{a_{12}}\right) \tag{2.9}$$
are two linearly independent vectors for whom
$$A\xi_1 = \lambda_1 \xi_1 \quad \text{and} \quad A\xi_2 = \lambda_2 \xi_2 \tag{2.10}$$
(2.10)
Clearly – the vectors ξ1 , ξ2 are special. A will only stretch (NOT ROTATE) them
by λ1 , λ2 . (λ1 , ξ1 ), (λ2 , ξ2 ) are called (eigen value, eigen vector) pairs.
2.2
Applications
• Example 1 Consider now the mass spring system from the previous chapter.
Setting k = 1
2 −1
A=
−1 2
p
p
(2 + 2) + 0 + 4(−1)(−1)
(a11 + a22 ) + (a11 − a22 )2 + 4a12 a21 )
=
=3
λ1 =
p 2
p2
(a11 + a22 ) − (a11 − a22 )2 + 4a12 a21 )
(2 + 2) − 0 + 4(−1)(−1)
λ1 =
=
=1
2
2
1
1
Corresponding to λ1 = 3 we get ξ1 =
and for λ2 = 1 we get ξ2 =
by
−1
1
using (2.9).
These modes and natural frequencies can be calculated from a finite element model
using a process that we illustrate now. Sample computation of modes/frequencies for
a 2 DOF system:

Figure 2.2: Vibration of two mass spring system.
• Example 2: For mechanical systems like the one in the first example there
is a nice physical interpretation of eigen values and eigen vectors. Consider
a variant as in Figure 2.2. The equations of motion from summing forces for
each mass in Fig 2.2 are
$$\begin{bmatrix} m_1 & 0 \\ 0 & m_2 \end{bmatrix}
\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix}
+ \begin{bmatrix} k_1 & -k_1 \\ -k_1 & k_1 + k_2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2.11}$$
Let
$$x_1 = a_1 \sin\omega t \quad \text{and} \quad x_2 = a_2 \sin\omega t$$
Differentiating twice with respect to t,
$$\ddot{x}_1 = -\omega^2 a_1 \sin\omega t \quad \text{and} \quad \ddot{x}_2 = -\omega^2 a_2 \sin\omega t$$
Substituting above,
$$\begin{bmatrix} k_1 & -k_1 \\ -k_1 & k_1 + k_2 \end{bmatrix}
\begin{bmatrix} a_1 \sin\omega t \\ a_2 \sin\omega t \end{bmatrix}
= \omega^2 \begin{bmatrix} m_1 & 0 \\ 0 & m_2 \end{bmatrix}
\begin{bmatrix} a_1 \sin\omega t \\ a_2 \sin\omega t \end{bmatrix} \tag{2.12}$$
Or
$$Kx = \lambda M x$$
where λ ≡ ω². Multiplying both sides by M⁻¹ (note that since M is diagonal,
M⁻¹ is simply a diagonal matrix with the inverse of each scalar value of M),
$$M^{-1} K x = \lambda M^{-1} M x = \lambda x$$
An eigenvalue problem!
λ ≡ ω² is the eigenvalue of the matrix M⁻¹K. By our choice of x1 and x2, ω is
the natural frequency. Thus, the eigen value here is the square of the natural
frequency.
Figure 2.3: Original configuration and vibration at the two modes corresponding to
the two eigen vectors of M⁻¹K.
$$\begin{bmatrix} \frac{k_1}{m_1} - \lambda & -\frac{k_1}{m_1} \\ -\frac{k_1}{m_2} & \frac{k_1 + k_2}{m_2} - \lambda \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \sin\omega t
= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2.13}$$
As before, to find the eigen values we need to set det(M⁻¹K − λI) = 0.
Let k1 = 10, k2 = 20, m1 = 5, m2 = 10:
$$(10 - 5\lambda)(30 - 10\lambda) - 100 = 0$$
$$50(\lambda^2 - 5\lambda + 4) = 0$$
$$\lambda = 1, 4 \;\Rightarrow\; \omega_1 = 1, \;\omega_2 = 2$$
Plugging these in we have for the eigen vectors
$$\xi_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \quad \text{and} \quad \xi_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
Thus for vibrations at frequency ω1 = 1
⇒ x1 = 2 sin ωt and x2 = 1 sin ωt
and for vibrations at frequency ω2 = 2
⇒ x1 = −1 sin ωt and x2 = 1 sin ωt
Figure 2.3 shows the original configuration and modes of vibration.
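The same natural frequencies and mode shapes fall out of MATLAB's eig applied to M⁻¹K; a sketch with the values used above:

```matlab
k1 = 10; k2 = 20; m1 = 5; m2 = 10;
K = [k1 -k1; -k1 k1+k2];
M = [m1 0; 0 m2];
[V, D] = eig(M\K);        % M\K forms inv(M)*K without building an explicit inverse
lambda = diag(D)          % eigen values: 1 and 4
omega  = sqrt(lambda)     % natural frequencies: 1 and 2
V                         % columns are the mode shapes (scaled versions of [2;1] and [-1;1])
```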
• Example 3 Consider the 2D plane stress state σxx = 2, σxy = 5, σyy = 5; in matrix
  form,
  $$\sigma = \begin{bmatrix} 2 & 5 \\ 5 & 5 \end{bmatrix}$$
  Rotating the stress element (using, e.g., Mohr's circle) – see Figure 2.4 – to where
  the shear stresses are zero gives us the following principal stresses:
  $$\begin{bmatrix} -1.7202 \\ 8.7202 \end{bmatrix}$$
  Now recall that if the axes line up with the eigen vectors we will only have a stretch
  equal to the eigen value, i.e. Ax′ = λ1 x′, Ay′ = λ2 y′. Thus, the principal stresses
  are the eigen values of the stress matrix and the principal directions are the eigen
  vectors.

Figure 2.4: 2D plane stress and rotated form showing principal stresses. The principal
stresses are the eigen values of the stress matrix and the directions are the eigen
vectors.
• Example 4 Consider the following electrical circuit: the voltage across the inductors,
  each with inductance Lj, is VL,j = Lj di/dt, where t is time. Additionally, the
  voltage across the capacitors with capacitance Cj is given by (1/Cj) ∫ i dt. Using
  Kirchhoff's voltage law and summing voltages around each loop, we arrive at the
  following system of equations:
  $$E - L_1 \frac{di_1}{dt} - \frac{1}{C_1}\int (i_1 - i_2)\, dt - \frac{1}{C_3}\int i_1\, dt = 0 \tag{2.14}$$
  $$\frac{1}{C_1}\int (i_1 - i_2)\, dt - \frac{1}{C_2}\int i_2\, dt - L_2 \frac{di_2}{dt} = 0 \tag{2.15}$$
If we differentiate both equations with respect to time, then the equations become
$$L_1 \frac{d^2 i_1}{dt^2} + i_1\left(\frac{1}{C_1} + \frac{1}{C_3}\right) - i_2\,\frac{1}{C_1} = \frac{dE}{dt} \tag{2.16}$$
$$L_2 \frac{d^2 i_2}{dt^2} - i_1\,\frac{1}{C_1} + i_2\left(\frac{1}{C_1} + \frac{1}{C_2}\right) = 0 \tag{2.17}$$
d2 i1 1
+ C13
L1 0
C1
dt2
+
2
0 L2 ddti22
− C11
− C11
1
+ C12
C1
i1
0
=
i2
0
(2.18)
As with the spring-mass system, we take the currents ij to be of the form
ij = Ij sin(ωt + φ). Substituting into Equation 2.18 gives
$$\omega^2 \begin{bmatrix} L_1 & 0 \\ 0 & L_2 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \end{bmatrix}
= \begin{bmatrix} \frac{1}{C_1} + \frac{1}{C_3} & -\frac{1}{C_1} \\ -\frac{1}{C_1} & \frac{1}{C_1} + \frac{1}{C_2} \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \end{bmatrix} \tag{2.19}$$
Again, we have an eigenproblem, now for the frequency of the current in the
circuits.
• Example 5 Consider the example that we had at the beginning of this chapter
where matrix multiplication can be seen as a geometrical scaling and rotation.
Figure 2.5 illustrates n equally spaced 2D points on a circle (blue line) which
are represented by the matrix P_{n×2}. Using a transformation matrix M, we can find
P 0 which is the transformed version of the original points (red line) using the
following equation.
$$P = \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad
P' = P M. \tag{2.20}$$
Eigenvectors of M are also shown in Figure 2.5. In fact, the eigenvectors
demonstrate the main directions in which the data have been transformed.
Moreover, the absolute value of the eigenvalue associated with each eigenvector
represents the importance of that direction. For instance, the eigenvector
associated with the largest eigenvalue demonstrates the main direction of
transformation.
Now use M = αI, where α is a scalar value and I is the identity matrix. What
are the eigenvalues and eigenvectors? How would the shape change after the
new transformation? Can you explain the physical meaning of the eigenvectors
of the new transformation matrix?
Figure 2.5: 2D Geometrical Transformation. (The arrows α1V1 and α2V2 are the
eigenvectors of M scaled by their eigenvalues.)
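A sketch of the construction in Example 5 (Figure 2.5): generate n points on the unit circle, transform them with M, and compare with the eigenvectors of M. The number of points is an arbitrary choice.

```matlab
n = 100;
t = linspace(0, 2*pi, n)';
P  = [cos(t), sin(t)];          % n x 2 matrix of points on the unit circle
M  = [1 2; 2 1];                % transformation matrix from Equation (2.20)
Pp = P*M;                       % transformed points: the circle becomes an ellipse
[V, D] = eig(M)                 % eigenvalues -1 and 3; eigenvectors give the ellipse axes
plot(P(:,1), P(:,2), 'b', Pp(:,1), Pp(:,2), 'r'); axis equal
```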
Figure 2.6: Vibration of a cantilever beam.
• Example 6 Now consider one more example – a continuous system – a beam
that is either cantilevered or simply supported – see Figure 2.6. While the same
ideas extend here, we will need to use a differential equation to model this. We
will develop this after the next chapter on differential equations. The picture
suggests the identification of natural frequencies with eigen values and mode
shapes with eigen vectors.
[Figure: for A = [2 −1; −1 2], the eigen pairs are λ1 = 1, ξ1 = (0.707, 0.707) and
λ2 = 3, ξ2 = (−0.707, 0.707); a vector z with coordinates β1, β2 in the eigenvector
basis is mapped to y = Az = β1 λ1 ξ1 + β2 λ2 ξ2.]

2.3 Extending the idea

2.3.1 Representation
Now consider some vector z and product Az. Since ξ1 , ξ2 are linearly independent
we can write z = β1 ξ1 + β2 ξ2 . Thus using the linearity of A
$$Az = A(\beta_1 \xi_1 + \beta_2 \xi_2) = \beta_1 A\xi_1 + \beta_2 A\xi_2 = \beta_1 \lambda_1 \xi_1 + \beta_2 \lambda_2 \xi_2 \tag{2.21}$$
We can represent the action of matrix A on z by using only a simple combination of
the eigen vectors Az = β1 λ1 ξ1 + β2 λ2 ξ2 . Thus, if we know the eigen values and eigen
vectors λ1 , λ2 , ξ1 , ξ2 we can easily compute the action of A on any z, Az by simply
scaling the eigen vectors with the components of z β1 , β2 and the eigen values. Thus,
the effect of operator A on z is conveniently represented using the eigenvectors.
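A quick MATLAB check of this eigenvector representation (a sketch, using the matrix from the figure above and an arbitrary z):

```matlab
A = [2 -1; -1 2];
[V, D] = eig(A);            % columns are xi_1, xi_2; eigen values on the diagonal of D
z = [3; -1];                % any vector z
beta = V\z;                 % coordinates of z in the eigenvector basis: z = beta(1)*xi1 + beta(2)*xi2
Az_direct = A*z;
Az_eig    = beta(1)*D(1,1)*V(:,1) + beta(2)*D(2,2)*V(:,2);
Az_direct - Az_eig          % zero (up to round-off)
```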
2.3.2 Special Properties
Now we list some special properties of eigen values and eigen vectors that are useful:
P 1 Eigenvalues and eigen vectors of symmetric matrices are real.
P 2 Eigen vectors ξi, ξj of a symmetric matrix corresponding to distinct eigen values
λi ≠ λj are orthogonal: ξi · ξj = 0.
Chapter 3
Differential Equations
3.1 Introduction
In solving engineering problems, we always use the laws of physics to represent our
problems with a set of mathematical equations, which are often in the form of
'differential equations'. This chapter provides the basic definitions of differential
equations and briefly reviews the solutions to the common differential equations in
the domain of Mechanical and Aerospace Engineering.
Differential Equations are equations containing derivatives of one or more
variables. Variables indicating the value of a function are called dependent variables,
and the ones that take different values in the domain are referred to as independent
variables. For example, Equations 3.1 and 3.2 are two differential equations: in the
first one, 'y' is the dependent variable and 'x' is the independent variable, whereas in
the second equation 'u' is the dependent variable and 'x' and 't' are two independent
variables.
$$\frac{dy}{dx} = x^2 \tag{3.1}$$
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} \tag{3.2}$$

3.1.1 Classification of Differential Equations
Ordinary Differential Equation (ODE): a differential equation whose dependent variable is a function of a single independent variable. Equation 3.1 is an ODE
because y = y(x).
Partial Differential Equation (PDE): a differential equation whose dependent variable is a function of two or more independent variables. Equation 3.2 is a
PDE because u = u(x, t).
Order of Differential Equation: The highest derivative that appears in the
differential equation.
Homogeneous Equation: If the scalar terms (terms which do not include the
dependent variable) in the differential equation are equal to zero, the equation is
called 'homogeneous'; otherwise it is 'non-homogeneous'. In other words, if there
exists a trivial solution (x = 0) for a differential equation, it is a homogeneous
equation; otherwise it is non-homogeneous.
For example, d²x/dy² + sin(x) = 0 is a homogeneous equation, but if you replace the
'0' with a constant value or, in general, a function of the independent variable g(y),
it becomes a non-homogeneous equation.
Linear/Non-linear: A differential equation is called linear if it satisfies the
superposition criteria. That is, if x1 and x2 are two different solutions of the
differential equation, their linear combination (C1 x1 + C2 x2) is a solution too.
Examples 3.1 Classify the following Differential Equations:
• d³y/dx³ + y·d²y/dx² + x = 0: 3rd order, Non-homogeneous ODE.
  Variable: (Dependent y, Independent x).
• d³x/dy³ + y·d²x/dy² + x = 0: 3rd order, Homogeneous ODE.
  Variable: (Dependent x, Independent y).
• ∂²u/∂x² + ∂²u/∂y² = ∂²u/∂t²: 2nd order, Homogeneous PDE.
  Variable: (Dependent u, Independent x, y, t).
• ∂²u/∂x² + ∂²u/∂y² = 5: 2nd order, Non-Homogeneous PDE.
  Variable: (Dependent u, Independent x, y).
3.1.2 Notations
In most engineering problems, we are dealing with a physical parameter (such as
velocity, temperature, current, etc.) which varies in a domain. Let us use the dependent
variable 'u' to represent such a physical parameter. If 'u' only varies in time, then
there is only one independent variable, time (t), and u = u(t). The differential
equations describing the physics of u(t) will be ODE problems, and for simplicity we
often use u̇ and ü instead of du/dt and d²u/dt².
If ‘u’ varies in time as well spatial domain (e.g. x or y direction), then there
are multiple independent variables and u = u(t, x, y). The differential equations
3.1. INTRODUCTION
33
describing the physics of u(t, x, y) will be PDE problems and for simplicity we may
use the following notations:
2
2
2
2
ut , ux , uy , uxy , utt , uxx , uyy instead of ∂u
, ∂u ∂u , ∂ u , ∂ u , ∂ u and ∂∂yu2 respectively.
∂t ∂x ∂y ∂x∂y ∂t2 ∂x2
3.1.3 Operators
Let φ(x, y) be a scalar parameter (such as temperature) and u⃗(x, y) = u1(x, y)î +
u2(x, y)ĵ a vector parameter such as velocity. Both φ and u⃗ can vary from one point
of the domain (xi, yi) to another (xj, yj). Then the following operators may be
applied to one or both of these two fields.
Gradient: The operator ∇⃗φ, or simply ∇φ, represents the level of variation (gradient)
of φ in different directions. The mathematical representation is shown in
Equation 3.3. Note that the gradient will convert the scalar into a vector.
$$\vec{\nabla}\phi(x, y) = \nabla\phi(x, y) = \frac{\partial \phi}{\partial x}\,\hat{i} + \frac{\partial \phi}{\partial y}\,\hat{j} = \phi_x\,\hat{i} + \phi_y\,\hat{j}. \tag{3.3}$$
Divergence: The operator ∇ · u⃗ represents the magnitude of u⃗ at different
points of the domain according to Equation 3.4. Note that the divergence will convert
a vector field into a scalar one.
$$\nabla \cdot \vec{u}(x, y) = \frac{\partial u_1(x, y)}{\partial x} + \frac{\partial u_2(x, y)}{\partial y}. \tag{3.4}$$
Laplacian: The Laplace operator ∇² is the divergence of a gradient, ∇ · ∇φ. It
will convert a scalar field to another scalar field.
$$\nabla^2 \phi(x, y) = \nabla\cdot\nabla\phi
= \left(\frac{\partial}{\partial x}\,\hat{i} + \frac{\partial}{\partial y}\,\hat{j}\right)\cdot\left(\frac{\partial \phi}{\partial x}\,\hat{i} + \frac{\partial \phi}{\partial y}\,\hat{j}\right)
= \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = \phi_{xx} + \phi_{yy}. \tag{3.5}$$
Curl: The Curl operator ∇ × u⃗ will transform the vector field to another vector
field according to Equation 3.6. For example, if u⃗ is the velocity field, ∇ × u⃗ will
represent the angular velocity field.
$$\nabla \times \vec{u}(x, y)
= \left(\frac{\partial}{\partial x}\,\hat{i} + \frac{\partial}{\partial y}\,\hat{j}\right)\times\left(u_1(x, y)\,\hat{i} + u_2(x, y)\,\hat{j}\right)
= \left(\frac{\partial u_2}{\partial x} - \frac{\partial u_1}{\partial y}\right)\hat{k}. \tag{3.6}$$
Example 3.2 Find the following operations if M = xy î + sin(x) ĵ and T = zx^2 + 2y + e^(-z):
• ∇T = ∂T/∂x î + ∂T/∂y ĵ + ∂T/∂z k̂ = 2xz î + 2 ĵ + (x^2 − e^(-z)) k̂.
• ∇M: beyond the scope of this class.
• ∇ · T: is not defined.
• ∇ · M = ∂(xy)/∂x + ∂(sin(x))/∂y = y.
• ∇ × T: is not defined.
• ∇ × M = (∂/∂x î + ∂/∂y ĵ) × (xy î + sin(x) ĵ) = (∂sin(x)/∂x − ∂(xy)/∂y) k̂ = (cos(x) − x) k̂.
• ∇²T = Txx + Tyy + Tzz = 2z + e^(-z).
• ∇²M: beyond the scope of this class.
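Several of these operations can be checked directly in MATLAB, assuming the Symbolic Math Toolbox is available; the sketch below uses the fields from Example 3.2:

```matlab
syms x y z
T = z*x^2 + 2*y + exp(-z);          % scalar field
M = [x*y; sin(x)];                  % 2D vector field
gradient(T, [x y z])                % [2*x*z; 2; x^2 - exp(-z)]
laplacian(T, [x y z])               % 2*z + exp(-z)
divergence(M, [x y])                % y
```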
Table 3.1: Summary of the Operators

Operation   | Input  | Equation                                         | Output
------------|--------|--------------------------------------------------|-------
Gradient    | Scalar | ∇φ(x, y) = ∂φ/∂x î + ∂φ/∂y ĵ                    | Vector
Divergence  | Vector | ∇ · u⃗(x, y) = ∂u1/∂x + ∂u2/∂y                   | Scalar
Laplacian   | Scalar | ∇²φ(x, y) = ∂²φ/∂x² + ∂²φ/∂y²                   | Scalar
Curl        | Vector | ∇ × u⃗(x, y) = (∂/∂x î + ∂/∂y ĵ) × (u1 î + u2 ĵ) | Vector

3.2 Solutions of Linear ODEs
A linear homogeneous ODE of order n has precisely n linearly independent solutions:
x1(t), x2(t), ..., xn(t). The general solution of an nth order homogeneous ODE is a
linear combination of its n independent solutions. That is,
$$x(t) = C_1 x_1(t) + C_2 x_2(t) + ... + C_n x_n(t)$$
C1, C2, ..., Cn are n unknown scalar values. To find a unique solution we need n
additional conditions. They are known as initial conditions if they represent
the dependent variable at a specific time, or boundary conditions if they provide
spatial information. If all the additional constraints are given at an initial point of the
independent variable (time or space), then the ODE problem may be referred to as
an initial value problem; otherwise it will be called a boundary value problem.
Example 3.3 Classify the following ODE problems.
• x d²y/dx² + (x² − y) = 0, constraints: at x = 0, y = 1 and dy/dx = 0.
  Answer: Initial value problem with two initial conditions.
• d²x/dt² + x = sin(t), constraints: at t = 0, x = 1 and at t = 1, x = 1.
  Answer: Boundary value problem with two boundary conditions.
3.2.1 2nd Order ODE with constant coefficient
The general solution of any inhomogeneous ODE has the general form
$$x(t) = x_h(t) + x_p(t)$$
where xh(t) is the general solution to the homogeneous equation and xp(t) is the
particular solution in response to the inhomogeneous part of the ODE. In many
textbooks xh(t) and xp(t) may be referred to as the transient solution and the
steady state solution, respectively.
In this section, we seek to find the 'homogeneous' and 'particular' solutions of the
2nd order ODE with constant coefficients, which is shown in Equation 3.7:
$$a\ddot{x} + b\dot{x} + cx = f(t) \tag{3.7}$$
Homogeneous/Transient Solution
If f(t) = 0, Equation 3.7 is a homogeneous ODE. The constant coefficients
suggest that the solution of this ODE may have the general form xh(t) = Ae^(αt).
We calculate the first and 2nd derivatives of this candidate solution and substitute them
into 3.7:

ẋh = αAe^(αt) = αxh,    ẍh = αẋh = α²xh    (3.8)

→ aα²xh + bαxh + cxh = 0 → (aα² + bα + c)xh = 0    (3.9)

In order to have a non-trivial solution, we need to find the roots of aα² + bα + c = 0.
The roots of this quadratic equation can be studied in three different cases.
Case 1: b² − 4ac > 0: There are two distinct real roots α1 and α2. Each one of these
roots gives a possible solution. The general solution of this ODE is the
linear combination of the two solutions:

xh(t) = A1 e^(α1 t) + A2 e^(α2 t)

Case 2: b² − 4ac = 0: There is a real repeated root α1. In this case, e^(α1 t) and
t e^(α1 t) are two possible solutions (review your ODE textbook for the proof of the 2nd
solution). The general solution of this ODE is the linear combination of the
two solutions:

xh(t) = A1 e^(α1 t) + A2 t e^(α1 t)

Case 3: b² − 4ac < 0: There are complex conjugate roots α1 = p + iq and
α2 = p − iq. In this case, e^((p+iq)t) and e^((p−iq)t) are two possible solutions. The
general solution of this ODE is the linear combination of the two solutions:
x(t) = C1 e^((p+iq)t) + C2 e^((p−iq)t). We can use Euler's formula e^(iφ) = cos(φ) + i sin(φ) to
further simplify the solution into a real format.
xh (t) = A1 ept cos(qt) + A2 ept sin(qt) = ept [A1 cos(qt) + A2 sin(qt)]
Particular Solution
The particular solution of an ODE has the general form of the inhomogeneous part. For
instance, if the inhomogeneous part is sin(4t), the particular solution will have the
form A sin(4t) + B cos(4t), where A and B are two constants that must satisfy the ODE.
The most common inhomogeneous functions and their corresponding particular solutions
are listed in Table 3.2.
Table 3.2: Particular solutions for the most common inhomogeneous functions

Inhomogeneous function          Particular solution
C sin(ωt)                       A sin(ωt) + B cos(ωt)
C cos(ωt)                       A sin(ωt) + B cos(ωt)
a1 t³ + a2 t² + a3 t + a4       b1 t³ + b2 t² + b3 t + b4
nth order polynomial            nth order polynomial
c1 e^(αt)                       d1 e^(αt)
Note that if the inhomogeneous function is a linear combination of different functions, the
particular solution will be a linear combination of their associated solutions. Moreover, if the inhomogeneous function is an nth order polynomial, the solution will also
be an nth order polynomial. All the polynomial coefficients should be assumed to
be non-zero and evaluated in the main ODE. See the following examples for more
clarification.
Example 3.4 Find the particular solution of the following ODEs:
• ẋ + 4x = 8t2 .
Answer: f(t) = 8t² → xp(t) = a1 t² + a2 t + a3. Substitute xp(t) in the ODE
to find a1, a2, a3:

ẋp + 4xp = 8t² → (2a1 t + a2) + 4(a1 t² + a2 t + a3) = 8t²
→ (4a1 − 8)t² + (2a1 + 4a2)t + (4a3 + a2) = 0

4a1 − 8 = 0 → a1 = 2
2a1 + 4a2 = 0 → a2 = −(1/2)a1 = −1          → xp = 2t² − t + 0.25
4a3 + a2 = 0 → a3 = −(1/4)a2 = 0.25
• ẍ + ẋ − x = t + sin(t).
Answer: f(t) = t + sin(t) → xp(t) = a1 t + a2 + a3 sin(t) + a4 cos(t). Substitute
xp(t) in the ODE to find a1, ..., a4:

ẍp + ẋp − xp = t + sin(t)
→ (−a1 − 1)t + (a1 − a2) + (−2a3 − a4 − 1) sin(t) + (a3 − 2a4) cos(t) = 0

−a1 − 1 = 0 → a1 = −1
a1 − a2 = 0 → a2 = a1 = −1
a3 − 2a4 = 0 → a3 = 2a4
−2a3 − a4 − 1 = 0 → −5a4 = 1 → a4 = −0.2, a3 = −0.4

→ xp = −t − 1 − 0.4 sin(t) − 0.2 cos(t)
Example 3.5: Find the complete solution of ẍ + 2ẋ + x = sin(2t). Assume that
x(0) = 0 and ẋ(0) = 0.
Answer. Step I – Finding the homogeneous solution: ẍ + 2ẋ + x = 0

xh(t) = Ae^(αt) → α² + 2α + 1 = 0 → (α + 1)² = 0 → α1 = α2 = −1
Case 2 → xh(t) = C1 e^(−t) + C2 t e^(−t) = e^(−t)(C1 + C2 t)

Step II – Finding the particular solution: xp(t) = A sin(2t) + B cos(2t)

→ ẋp = 2A cos(2t) − 2B sin(2t),   ẍp = −4A sin(2t) − 4B cos(2t)
ẍp + 2ẋp + xp = sin(2t) → (−3A − 4B − 1) sin(2t) + (−3B + 4A) cos(2t) = 0

3A + 4B = −1,  −3B + 4A = 0  →  A = −0.12, B = −0.16

x(t) = xh(t) + xp(t) = e^(−t)(C1 + C2 t) − 0.12 sin(2t) − 0.16 cos(2t)

Step III – Apply the initial conditions.

x(0) = 0 → C1 = 0.16
ẋ(0) = 0 → C2 = C1 + 0.24 → C2 = 0.4

→ x(t) = (0.16 + 0.4t) e^(−t) − 0.12 sin(2t) − 0.16 cos(2t)
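As a quick check, the analytic solution above can be compared against a numerical integration in Matlab. The sketch below is our own illustration (the variable names and the time interval are our choices, not from the text); it integrates the ODE with the built-in ode45 solver and overlays the closed-form result.

    % Sketch: verify Example 3.5 numerically against the analytic solution
    f = @(t, z) [z(2); sin(2*t) - 2*z(2) - z(1)];   % state z = [x; xdot]
    [t, z] = ode45(f, [0 10], [0; 0]);              % ICs: x(0) = 0, xdot(0) = 0

    x_exact = (0.16 + 0.4*t).*exp(-t) - 0.12*sin(2*t) - 0.16*cos(2*t);

    plot(t, z(:,1), 'o', t, x_exact, '-');
    legend('ode45', 'analytic'); xlabel('t'); ylabel('x(t)');

The two curves should lie on top of each other to within the solver tolerance.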
3.3
Solutions of Partial Differential Equations
The most common approach for solving PDEs is the method of separation of variables. Although this method is not universally applicable to all PDEs, it provides
a solution to many simple engineering problems. It assumes that the dependent
variable (e.g. y(x, t)) can be written as a product of two separate functions, each of
which depends on only one independent variable. Separation of variables converts
a PDE problem into a set of ODE problems. We will use this approach to solve some
of the most popular PDE problems in engineering.
3.3.1
Wave Equation
Consider an elastic string stretched under tension between two points on the x
axis. After releasing the stretched string, we are interested in calculating the vibration
of the string. In other words, we want to find y(x, t) in Figure 3.1.
The PDE describing the vibration of the elastic string is shown in Equation 3.10
and is known as 1-dimensional wave equation.
Figure 3.1: Elastic string under tension; the initial shape is y(x, t = 0) = f(x) between x = 0 and x = L.
∂²y/∂t² = α² ∂²y/∂x²    (3.10)
This equation is subjected to two boundary conditions: y(0, t) = 0 , y(L, t) = 0
and two initial conditions: y(x, 0) = f (x) , yt (x, 0) = 0.
Problem: Wave Equation: ∂²y/∂t² = α² ∂²y/∂x²
BCs: y(0, t) = 0, y(L, t) = 0    ICs: y(x, 0) = f(x), yt(x, 0) = 0
Solution: Step I – Separation of Variables
Let us assume that we can write the solution as a product of two independent
functions, one of space (X(x)) and one of time (T(t)):

y(x, t) = X(x)T(t)

yt = X(x) dT/dt,   ytt = X(x) d²T/dt²
yx = T(t) dX/dx,   yxx = T(t) d²X/dx²

→ X(x) d²T/dt² = α² T(t) d²X/dx²

→ (1/X(x)) d²X/dx² = (1/α²)(1/T(t)) d²T/dt²    (3.11)
In Equation 3.11, the right hand side is a function of time (t) while the left hand
side is a function of space (x). This is impossible unless they are both equal to a
constant value. It turns out that only a negative constant value will yield a nontrivial solution. Therefore the PDE problem can be written as two ODE problems.
1st ODE: (1/X(x)) d²X/dx² = −k² → X'' + k²X = 0
2nd ODE: (1/α²)(1/T(t)) d²T/dt² = −k² → T'' + α²k²T = 0
To see that the constant must be negative, let us consider the cases when the
constant is zero and when the constant is positive. If the constant is zero, then the
ODE for the x-variable is simply X 00 = 0. The solution is X(x) = ax+b for constants
a and b. However, if we now consider the boundary conditions X(0) = X(L) = 0
(see below), then that means a = b = 0, i.e. X(x) = 0. So only the trivial solution
can satisfy the equation if the constant is zero. Similarly, if the constant is positive,
call it k 2 , then the differential equation for X(x) becomes X 00 − k 2 X = 0. In this
case, the form of the solution will be a sum of exponentials. Again, applying the
boundary conditions, we find that the only function that satisfies the differential
equation is X(x) = 0. Thus, the constant must be negative in order for us to find
a non-trivial (i.e. nonzero) solution.
Step II – Solving the ODEs: The above ODEs are homogeneous equations of Case 3.
To solve them we need to find the new initial and boundary conditions.

y(0, t) = X(0)T(t) = 0 → X(0) = 0
y(L, t) = X(L)T(t) = 0 → X(L) = 0

and

yt(x, 0) = X(x)Ṫ(0) = 0 → Ṫ(0) = 0
y(x, 0) = X(x)T(0) = f(x).
Solving ODE 1: X'' + k²X = 0 with X(0) = X(L) = 0.

Case 3 → X(x) = C1 cos(kx) + C2 sin(kx).
X(0) = 0 → C1 = 0
X(L) = 0 → C2 sin(kL) = 0. There are infinitely many solutions: kn = nπ/L

→ Xn(x) = Cn sin(nπx/L),  n = 1, 2, ..., ∞
Solving ODE 2: T'' + (n²π²/L²)α²T = 0 with Ṫ(0) = 0 and X(x)T(0) = f(x).

Case 3 → T(t) = D1 cos(nπαt/L) + D2 sin(nπαt/L).
Ṫ(0) = 0 → D2 = 0

→ Tn(t) = Dn cos(nπαt/L),  n = 1, 2, ..., ∞
Final PDE Solution: There are infinitely many solutions in the space and time domain,
yn(x, t) = Xn(x)Tn(t). Therefore the final solution will be a linear combination of
these solutions.
y(x, t) = Σ_{n=1}^{∞} yn(x, t) = Σ_{n=1}^{∞} wn sin(nπx/L) cos(nπαt/L)
wn can be calculated using the remaining initial condition, y(x, 0) = f(x):

y(x, 0) = f(x) → Σ_{n=1}^{∞} wn sin(nπx/L) = f(x)
This is the Fourier series representation of the function f (x). Recognizing this, we
can compute the coefficients, wn , by applying the formula for computing the Fourier
coefficients:
wn = (2/L) ∫₀^L f(x) sin(nπx/L) dx
Example 3.6 Solve the wave equation for f (x) = sin(x) and L = π.
y(x, t) = Σ_{n=1}^{∞} yn(x, t) = Σ_{n=1}^{∞} wn sin(nx) cos(nαt)

wn = (2/π) ∫₀^π sin(x) sin(nx) dx → w1 = 1 and all other wn are zero.

y(x, t) = sin(x) cos(αt)
3.3.2
Heat Equation
The heat equation describes heat propagation through a medium. In this equation,
α is the thermal diffusivity. In this section, we will use the method of separation of
variables to solve the heat equation subject to different boundary conditions.

∂u/∂t = α∇²u
Problem 1. Find a temperature distribution in a rod with constant temperature
at both ends. Assume that the surface of the rod is insulated (no convection).
Figure 3.2: Heat transfer in a rod with homogeneous boundary conditions
This problem can be described by ∂u/∂t = α² ∂²u/∂x². This is the 1-dimensional heat
equation. Similar to the wave equation, we solve this problem in three different steps.
Solution: Step I – Separation of Variables
Let us assume that we can write the solution as a product of two independent
functions, one of space (X(x)) and one of time (T(t)):

u(x, t) = X(x)T(t)

ut = X(x) dT/dt
ux = T(t) dX/dx,   uxx = T(t) d²X/dx²   →   X(x) dT/dt = α² T(t) d²X/dx²

→ (1/X(x)) d²X/dx² = (1/α²)(1/T(t)) dT/dt    (3.12)
In Equation 3.12, the right hand side is a function of time (t) while the left hand
side is a function of space (x). This is impossible unless they are both equal to a constant
value. Using similar arguments to the wave equation development, it can be shown
that only a negative constant value will yield a non-trivial solution. Therefore the
PDE problem can be written as two ODE problems.

1st ODE: (1/X(x)) d²X/dx² = −k² → X'' + k²X = 0
2nd ODE: (1/α²)(1/T(t)) dT/dt = −k² → T' + α²k²T = 0
Step II – Solving the ODEs: The 1st ODE is a homogeneous equation of Case 3
and the 2nd one is a first order ODE. To solve them we need to find the
new initial and boundary conditions.

u(0, t) = X(0)T(t) = 0 → X(0) = 0
u(L, t) = X(L)T(t) = 0 → X(L) = 0

and u(x, 0) = X(x)T(0) = f(x).
Solving ODE 1: X'' + k²X = 0 with X(0) = X(L) = 0.

Case 3 → X(x) = C1 cos(kx) + C2 sin(kx).
X(0) = 0 → C1 = 0
X(L) = 0 → C2 sin(kL) = 0. There are infinitely many solutions: kn = nπ/L

→ Xn(x) = Cn sin(nπx/L),  n = 1, 2, ..., ∞
Solving ODE 2: dT/dt = −(n²π²/L²)α²T with X(x)T(0) = f(x).

1st order ODE → ∫ dT/T = −∫ (n²π²/L²)α² dt

→ Tn(t) = e^(−(n²π²/L²)α²t),  n = 1, 2, ..., ∞
Final PDE Solution: There are infinitely many solutions in the space and time domain,
un(x, t) = Xn(x)Tn(t). Therefore the final solution will be a linear combination of
these solutions.
u(x, t) = Σ_{n=1}^{∞} un(x, t) = Σ_{n=1}^{∞} wn e^(−(n²π²/L²)α²t) sin(nπx/L)
wn can be calculated using the remaining initial condition, u(x, 0) = f(x):
u(x, 0) = f(x) → Σ_{n=1}^{∞} wn sin(nπx/L) = f(x)
Again, we use the formula for the Fourier coefficients:
wn = (2/L) ∫₀^L f(x) sin(nπx/L) dx
Example 3.7 Solve the heat equation for f (x) = 100 , α = 1 and L = π.
u(x, t) = Σ_{n=1}^{∞} un(x, t) = Σ_{n=1}^{∞} wn e^(−(n²π²/L²)α²t) sin(nπx/L)

wn = (2/π) ∫₀^π 100 sin(nx) dx = (200/(nπ))(1 − (−1)ⁿ)

u(x, t) = Σ_{n=1}^{∞} (200/(nπ))(1 − (−1)ⁿ) sin(nx) e^(−n²t) = (400/π)(sin(x) e^(−t) + (1/3) sin(3x) e^(−9t) + ...)
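The truncated series is also easy to evaluate numerically. The Matlab sketch below is our own illustration (the number of terms N and the plotting times are arbitrary choices); it sums the first N terms of the solution of Example 3.7 and plots the temperature profile at a few times.

    % Sketch: evaluate the truncated Fourier series solution of Example 3.7
    L = pi; alpha = 1; N = 50;                 % N = number of series terms (our choice)
    x = linspace(0, L, 200);
    for tval = [0.01 0.1 1]
        u = zeros(size(x));
        for n = 1:N
            wn = 200/(n*pi)*(1 - (-1)^n);      % Fourier coefficients for f(x) = 100
            u = u + wn*exp(-(n*pi/L)^2*alpha^2*tval)*sin(n*pi*x/L);
        end
        plot(x, u); hold on
    end
    xlabel('x'); ylabel('u(x,t)'); hold off

As t grows, the higher modes decay rapidly and the profile is dominated by the n = 1 term.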
Problem 2. Find the temperature in a rod with an insulated end. (IC: u(x, 0) = f(x), BCs: u(L, t) = 0, ux(0, t) = 0.)
The formulation of this problem is similar to Problem 1 but with different boundary conditions. Therefore Step I is the same as in Problem 1, but we have to redo
Steps II and III.

ux(0, t) = X'(0)T(t) = 0 → X'(0) = 0
u(L, t) = X(L)T(t) = 0 → X(L) = 0

and u(x, 0) = f(x).

Solving ODE 1: X'' + k²X = 0 with X'(0) = 0 and X(L) = 0.

Case 3 → X(x) = C1 cos(kx) + C2 sin(kx).
X'(0) = 0 → C2 = 0
X(L) = 0 → C1 cos(kL) = 0. There are infinitely many solutions: kn = (2n−1)π/(2L)

→ Xn(x) = Cn cos((2n−1)πx/(2L)),  n = 1, 2, ..., ∞
Solving ODE 2: It is the same as in Problem 1, but note that the k-value has
changed to (2n−1)π/(2L):

→ Tn(t) = e^(−((2n−1)²π²/(4L²))α²t),  n = 1, 2, ..., ∞
Final PDE Solution: There are infinitely many solutions in the space and time domain,
un(x, t) = Xn(x)Tn(t). Therefore the final solution will be a linear combination of
these solutions.

u(x, t) = Σ_{n=1}^{∞} un(x, t) = Σ_{n=1}^{∞} wn e^(−((2n−1)²π²/(4L²))α²t) cos((2n−1)πx/(2L))
wn can be calculated using the remaining initial condition, u(x, 0) = f(x).
3.3.3
Different Boundary conditions
Comparing the solutions of problem 1 and 2, you can see that the solution of the same
PDE varies depending on the BCs. In general there are three classes of boundary
conditions that may be applied to the PDE problems.
• Dirichlet BCs: u is specified at the boundary. If u = 0 these are called homogeneous
boundary conditions.
• Neumann BCs: The first derivative of u is specified at the boundary.
• Robin BCs: A combination of u and its derivative is specified at the boundary.
Part II
Numerical Methods
Chapter 4
Numerical Solution of Linear
Systems
Previously, we have discussed Gaussian Elimination as a systematic way to compute
the solution to linear systems of equations. However, for large systems, say number of
unknowns greater than 10, this is very cumbersome and error-prone to do manually.
We would much rather have a computer do the work for us. Thus, we will first
discuss outlining an algorithm in “pseudo-code” to perform Gaussian Elimination for
arbitrary square matrices. Next, we consider the computational effort involved in the
algorithm. Next, we study the pitfalls of “naı̈ve” Gaussian Elimination and propose
strategies to overcome these pitfalls. Finally, we consider more efficient variations
when one has multiple right-hand-side vectors for the same matrix.
4.1
Automating Gaussian Elimination
To this point, we’ve only discussed Gaussian Elimination through examples and
illustrating in words. Now, we must convert this procedure into an algorithm that
can be implemented in computer code. When we write such algorithms, we typically
use “pseudo-code” — that is, we express the logic of the algorithm, adhering to
common programming syntax, but not conforming to a specific language.
We construct the algorithm by examining the Gaussian Elimination procedure.
We use each pivot row to eliminate the lower triangular portion of the matrix, each
column at a time. If we have an n × n matrix A, then for each pivot row, i = 1 to
n − 1, we zero out the column beneath the current pivot element, i.e. rows i + 1
to n. We zero out the column by scaling the pivot row using the pivot element
and subtracting it from the row in which we are trying to construct a zero in the
appropriate column; we also have to adjust the right-hand-side.
Figure 4.1 gives the pseudo-code for the Gaussian Elimination procedure we discussed previously.
% Forward elimination
for i = 1:n-1   % pivot row
    for j = i+1:n
        factor = A(j,i)/A(i,i)
        A(j,i) = 0
        b(j) = b(j) - factor*b(i)
        for k = i+1:n
            A(j,k) = A(j,k) - factor*A(i,k)
        end
    end
end

% Backward substitution
x(n) = b(n)/A(n,n)
for i = 1:n-1
    k = n-i
    x(k) = b(k)
    for j = k+1:n
        x(k) = x(k) - A(k,j)*x(j)
    end
    x(k) = x(k)/A(k,k)
end
Figure 4.1: Pseudo-code for naı̈ve Gaussian Elimination
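The pseudo-code of Figure 4.1 translates almost directly into a working Matlab function. The sketch below is one possible implementation (the function name and interface are our own); like the naïve algorithm, it assumes no zero pivots are encountered.

    function x = gauss_naive(A, b)
    % Solve Ax = b by naive Gaussian elimination (no pivoting).
    n = length(b);
    for i = 1:n-1                      % pivot row
        for j = i+1:n
            factor = A(j,i)/A(i,i);
            A(j,i:n) = A(j,i:n) - factor*A(i,i:n);
            b(j) = b(j) - factor*b(i);
        end
    end
    x = zeros(n,1);                    % backward substitution
    x(n) = b(n)/A(n,n);
    for k = n-1:-1:1
        x(k) = (b(k) - A(k,k+1:n)*x(k+1:n))/A(k,k);
    end
    end

For example, gauss_naive([1 2; 3 5], [1; 2]) returns the same answer as Matlab's backslash operator for that system.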
4.2
Computational Work of Gaussian Elimination
Before we discuss the operation count of Gaussian Elimination, we need to introduce
the concept of "Big O" notation. This is useful because when discussing the scaling of
algorithms, we often don't need to know exact counts, but only trends as the number
of entries gets large. We say a function f(x) is O(g(x)) if, as x → a, there exist
a δ and M such that |f(x)| ≤ M|g(x)| for |x − a| < δ. In the present context, we
might say that "the number of FLOPS in Gaussian Elimination is O(n³)." This
means that while we don't know exactly the constant in front of n³, we know that the cost
scales like the cube of the dimension of the matrix for which we'd perform Gaussian
Elimination.
Now, we can proceed to count the number of floating point operations (FLOPS)
involved in Gaussian elimination. We start with the elimination phase. Referring to
Figure 4.1, we have 1 division, 1 subtraction, and 1 multiplication for each i and for
each j. Additionally, for each k, we have 1 subtraction and 1 multiplication. Then,
the total number of FLOPS, T , is
T = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} ( 3 + Σ_{k=i+1}^{n} 2 )    (4.1)

Next, we can separate the terms and factor out the numbers from the sums since
they are independent of the counting index. Thus, we have

T = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 3 + Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Σ_{k=i+1}^{n} 2    (4.2)
  = 3 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 1 + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Σ_{k=i+1}^{n} 1    (4.3)
  = 3 Σ_{i=1}^{n−1} (n − i) + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (n − i)    (4.4)
  = 3 Σ_{i=1}^{n−1} (n − i) + 2 Σ_{i=1}^{n−1} (n − i)(n − i)    (4.5)
Now regrouping terms, we have,
T = Σ_{i=1}^{n−1} (n − i) + 2 Σ_{i=1}^{n−1} (n − i) + 2 Σ_{i=1}^{n−1} (n − i)(n − i)    (4.6)
  = Σ_{i=1}^{n−1} (n − i) + 2 Σ_{i=1}^{n−1} [(n − i) + (n − i)(n − i)]    (4.7)
  = Σ_{i=1}^{n−1} (n − i) + 2 Σ_{i=1}^{n−1} (n − i)(n − i + 1)    (4.8)

Now, there are two useful identities that we will use:

Σ_{i=1}^{n−1} i = (n − 1)n/2,    Σ_{i=1}^{n−1} i² = n(n − 1)(2n − 1)/6    (4.9)
Continuing, we can now separate Equation (4.8) into separate terms conducive to
applying the identities in Equation (4.9):
T = Σ_{i=1}^{n−1} n − Σ_{i=1}^{n−1} i + 2 Σ_{i=1}^{n−1} (n² − 2ni + n + i² − i)    (4.10)

  = n(n − 1) − (n − 1)n/2 + 2n²(n − 1) − 4n (n − 1)n/2
    + 2 n(n − 1)(2n − 1)/6 − 2 (n − 1)n/2 + 2n(n − 1)    (4.11)
Collecting the terms we see that
T = (2/3)n³ + O(n²)    (4.12)
In other words, there are (2/3)n³ flops plus some other terms that scale like n², but n²
grows much slower than the n³ term, so the constant on the n² term is unimportant.
A similar analysis can be performed for the backward substitution phase. It
turns out that backward substitution is only O(n2 ), so the elimination phase is by
far the dominant cost of Gaussian Elimination. Table 4.1 shows the cost of Gauss
Elimination for various matrix sizes.
Table 4.1: Cost of Gaussian Elimination as a function of matrix dimension n

n       Elimination   Back Substitution   Total Flops   (2/3)n³      Percentage due to Elimination
10      705           100                 805           667          87.58%
100     671550        10000               681550        666667       98.53%
1000    6.67 × 10⁸    1000000             6.68 × 10⁸    6.67 × 10⁸   99.85%

4.3    Partial Pivoting
Up to this point, we have not considered any of the potential modes of failure of
Gaussian Elimination, other than assuming that the system to be solved is nonsingular. However, examining Figure 4.1 we can see one particular mode of failure:
namely, if any entry of the diagonal is zero, then we will have a divide-by-zero
instance, resulting in failure of the algorithm. Such circumstances are common, even
for non-singular systems. What can we do to overcome this difficulty?
The main strategy that is followed is pivoting the rows. That is, we exchange two
rows in the system to remove the zero from the diagonal. The order of the equations
of our system doesn’t alter the solution, so we are free to interchange the rows.
We call this strategy “partial pivoting”. It is possible to additionally interchange
columns — such a strategy is called “complete pivoting”. However, the complexity
of the complete pivoting strategy far outweighs its utility. Indeed, the vast majority
of linear systems encountered in science and engineering can be adequately treated
with partial pivoting.
So, assuming we wish to adopt the partial pivoting strategy, how do we systematically implement it? The key is one extra step before eliminating entries in the
current pivot column. Namely, we search for the largest magnitude element in the
current column and then exchange the current row with that row. In this way,
we always ensure that we have a nonzero value for the pivot element.
Example 4.3.1. Gaussian Elimination with Partial Pivoting
Consider the following linear system of equations:

[  1   0   2   3 ] [x1]   [  1 ]
[ −1   2   2  −3 ] [x2] = [ −1 ]
[  0   1   1   4 ] [x3]   [  2 ]
[  6   2   2   4 ] [x4]   [  1 ]

We wish to solve this system using Gaussian Elimination with partial pivoting. First,
we examine column 1 and see that the largest element is the (4, 1) entry. Thus, we
pivot the 1st and 4th rows to get

[  6   2   2   4 ] [x1]   [  1 ]
[ −1   2   2  −3 ] [x2] = [ −1 ]
[  0   1   1   4 ] [x3]   [  2 ]
[  1   0   2   3 ] [x4]   [  1 ]

Now we proceed eliminating the subdiagonal entries in the first column. This leaves

[ 6    2     2     4  ] [x1]   [   1  ]
[ 0   7/3   7/3  −7/3 ] [x2] = [ −5/6 ]
[ 0    1     1     4  ] [x3]   [   2  ]
[ 0  −1/3   5/3   7/3 ] [x4]   [  5/6 ]

Now, the pivot element becomes the (2, 2) entry. Before eliminating the subdiagonal
elements of the second column, we search for the maximum element, in absolute
value, in the second column below the diagonal. Here, the current pivot element is
the maximum, so there is no need to pivot. We proceed with elimination yielding

[ 6   2    2     4  ] [x1]   [   1   ]
[ 0  7/3  7/3  −7/3 ] [x2] = [ −5/6  ]
[ 0   0    0     5  ] [x3]   [ 33/14 ]
[ 0   0    2     2  ] [x4]   [  5/7  ]

Now the pivot element is the (3, 3) entry. Now we have a zero pivot! There's
only one other remaining row, so we exchange rows 3 and 4. This gives

[ 6   2    2     4  ] [x1]   [   1   ]
[ 0  7/3  7/3  −7/3 ] [x2] = [ −5/6  ]
[ 0   0    2     2  ] [x3]   [  5/7  ]
[ 0   0    0     5  ] [x4]   [ 33/14 ]
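Only a few lines need to change in the naïve algorithm to add partial pivoting: before eliminating column i we find the largest-magnitude entry on or below the diagonal of that column and swap rows. The snippet below is a hedged sketch of that extra step (it builds on the gauss_naive sketch shown earlier, which is our own code, not part of the text):

    % Sketch: partial pivoting step inserted at the top of the elimination loop
    for i = 1:n-1
        [~, p] = max(abs(A(i:n, i)));      % largest magnitude entry in column i
        p = p + i - 1;                     % convert to a row index of A
        if p ~= i
            A([i p], :) = A([p i], :);     % swap rows of A ...
            b([i p])    = b([p i]);        % ... and of the right-hand side
        end
        % ... elimination of column i then proceeds as before ...
    end

With this modification, the zero pivot encountered in the example above is avoided automatically.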
4.4
LU Decomposition
Gaussian Elimination gives a “direct” route to a solution for a given linear system.
However, if we wish to change only the right-hand-side, Gaussian Elimination requires us to completely redo all stages of the solution process. As we saw in previous
examples in Chapter 1.2, such as the statics examples, the right-hand-side consists
of only external loads on a structure. When studying a structure, we may wish to
apply many different loads without changing the geometry or other properties of the
structure. Given that the elimination phase is by far the most compute-intensive, is
there a way we can reuse the elimination part of the solution phase so that we can
easily, and cheaply, change only the loading?
The answer to this question is yes and relies on the notion of matrix decompositions. The idea is to “decompose” a matrix into separate matrices. In the present
context, we focus on the LU decomposition:
A = LU
(4.13)
where L is a lower triangular matrix and U is an upper triangular matrix. We know
we ought to be able to accomplish such a decomposition because the elimination
phase of Gaussian Elimination transformed the system into the form
Ux = d
(4.14)
Assuming that we can compute the decomposition A = LU , then we can see
how we can alter our solution procedure. Consider the linear system Ax = b. If
we decompose A = LU , then we have LU x = b. Now, define y = U x. Then
we have Ly = b. Since L is lower triangular and b is given, we can easily solve
for y using forward substitution, in analogy with backward substitution only
starting at the “top” and moving forward through the system. With y computed,
y now becomes the data for the system U x = y. Now, we can solve U x = y using
backward substitution. Both substitution phases are cheap, i.e. O(n2 ), so that
once we’ve determined the LU decomposition of A, it is straight-forward to vary the
vector b without having to recompute the matrix decomposition. So, using the LU
decomposition, solving a linear system proceeds in three steps:
1. Factor: Compute the matrix decomposition A = LU , where L is lower triangular and U is upper triangular.
2. Forward Substitution: Solve Ly = b for the vector y.
3. Backward Substitution: Solve U x = y for the vector x.
How do we compute the LU decomposition? Let’s first consider a 2 × 2 case.
[ 1  2 ]   [ l11   0  ] [ u11  u12 ]
[ 3  5 ] = [ l21  l22 ] [  0   u22 ]

This gives us the relationships:

l11 u11 = 1
l11 u12 = 2
l21 u11 = 3
l21 u12 + l22 u22 = 5
We have 4 equations, but 6 unknowns — the system is underdetermined! This means
we have a choice in how to proceed. The choice we make is to set the diagonal
entries lii = 1; this is the so-called "Doolittle" decomposition (there are other variants,
but we will not discuss them). Now that we've made this choice, we can continue
developing the LU decomposition.
For the general 4 × 4 case, the factorization is

    [ a11 a12 a13 a14 ]   [  1   0   0   0 ] [ u11 u12 u13 u14 ]
A = [ a21 a22 a23 a24 ] = [ l21  1   0   0 ] [  0  u22 u23 u24 ]    (4.15)
    [ a31 a32 a33 a34 ]   [ l31 l32  1   0 ] [  0   0  u33 u34 ]
    [ a41 a42 a43 a44 ]   [ l41 l42 l43  1 ] [  0   0   0  u44 ]

We can multiply L and U back together and compare against A in order to deduce
the values of lij and uij:

A = [ u11       u12                u13                          u14                                 ]
    [ l21 u11   l21 u12 + u22      l21 u13 + u23                l21 u14 + u24                       ]
    [ l31 u11   l31 u12 + l32 u22  l31 u13 + l32 u23 + u33      l31 u14 + l32 u24 + u34             ]
    [ l41 u11   l41 u12 + l42 u22  l41 u13 + l42 u23 + l43 u33  l41 u14 + l42 u24 + l43 u34 + u44   ]
Right away, we see the top row matches directly with A, namely u11 = a11 , u12 = a12 ,
u13 = a13 , and u14 = a14 . With u11 determined, we can now determine the li1 values.
In particular, l21 = a21 /u11 = a21 /a11 , and similarly, l31 = a31 /a11 and l41 = a41 /a11 .
These are exactly the scaling factors from Gaussian Elimination! In fact, turning to
the 2nd row, the entries in U are the same as if we were to apply Gaussian Elimination
to A. Namely, u22 = a22 − l21 u12 , u23 = a23 − l21 u13 , and u24 = a24 − l21 u14 . This
pattern continues upon examination of the second column, the 3rd row, and so
on. So, to construct the LU decomposition of A, we apply the elimination step of
Gaussian Elimination to construct U and we store the factors in L.
Example 4.4.1. LU Decomposition
Consider the following matrix A:

    [ 1   2   3 ]
A = [ 2   6  10 ]
    [ 3  14  28 ]

Let us construct the LU decomposition of A. We apply Gaussian Elimination as
before, but now we'll keep track of the pivot factors and store them in L, and the
"eliminated" version of A will be U. First, we scale the first equation by 2/1 and
subtract it from the second equation. Thus, the l21 entry will be 2 and we have

    [ 1   2   3 ]       [  1    0   0 ]
U = [ 0   2   4 ],  L = [  2    1   0 ]
    [ 3  14  28 ]       [ l31  l32  1 ]

Now we eliminate the (3, 1) entry by scaling the first equation by 3/1 and subtracting
from the third equation. Thus, l31 = 3 and we have

    [ 1  2   3 ]       [ 1   0   0 ]
U = [ 0  2   4 ],  L = [ 2   1   0 ]
    [ 0  8  19 ]       [ 3  l32  1 ]

Finally, we eliminate the (3, 2) entry by scaling the second equation by 8/2 and
subtracting from the third equation. Thus, l32 = 4 and we have

    [ 1  2  3 ]       [ 1  0  0 ]
U = [ 0  2  4 ],  L = [ 2  1  0 ]
    [ 0  0  3 ]       [ 3  4  1 ]

The eliminated version of A is, in fact, U and we have filled L, so we have completed
the LU decomposition of A. Now, we may verify that

     [ 1  0  0 ] [ 1  2  3 ]   [ 1   2   3 ]
LU = [ 2  1  0 ] [ 0  2  4 ] = [ 2   6  10 ] = A
     [ 3  4  1 ] [ 0  0  3 ]   [ 3  14  28 ]
Example 4.4.2. Solving a Linear System using LU Decomposition
Consider the linear system

[ 1   2   3 ] [x1]   [ 1 ]
[ 2   6  10 ] [x2] = [ 2 ]
[ 3  14  28 ] [x3]   [ 3 ]

In the previous example, we computed the LU decomposition of A:

     [ 1  0  0 ] [ 1  2  3 ]
LU = [ 2  1  0 ] [ 0  2  4 ]
     [ 3  4  1 ] [ 0  0  3 ]

To solve the linear system using the LU decomposition, first we define the vector
y = Ux. Then, upon substituting for A and Ux, we have

     [ 1  0  0 ] [y1]   [ 1 ]
Ly = [ 2  1  0 ] [y2] = [ 2 ] = b
     [ 3  4  1 ] [y3]   [ 3 ]
We apply forward substitution to compute the y vector. So, trivially, y1 = 1. Then,
2y1 + y2 = 2 so that y2 = 0. Finally, 3y1 + 4y2 + y3 = 3 so that y3 = 0. Thus,
y = [1, 0, 0]T .
Now that we’ve computed y, we can perform backward substitution, i.e. solve

   
1 2 3 x1
1





U x = 0 2 4 x2 = 0 = y
0 0 3 x3
0
So we immediately see that x3 = 0. Then, moving “backwards”, 2x2 + 4x3 = 0 so
that x2 = 0. Finally, we have x1 + 2x2 + 3x3 = 1 so that x1 = 1. Thus, the solution
to our original linear system is x = [1, 0, 0]T .
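The whole procedure — Doolittle factorization followed by forward and backward substitution — fits in a short function. The sketch below is our own helper (no pivoting, and the name lu_solve is an assumption, not a built-in); it reproduces the factorization and solution of the example above.

    function x = lu_solve(A, b)
    % Doolittle LU factorization (no pivoting) followed by two triangular solves.
    n = length(b);
    L = eye(n); U = A;
    for i = 1:n-1
        for j = i+1:n
            L(j,i) = U(j,i)/U(i,i);            % store the elimination factor
            U(j,:) = U(j,:) - L(j,i)*U(i,:);   % eliminate entry (j,i)
        end
    end
    y = zeros(n,1);                            % forward substitution: L y = b
    for k = 1:n
        y(k) = b(k) - L(k,1:k-1)*y(1:k-1);
    end
    x = zeros(n,1);                            % backward substitution: U x = y
    for k = n:-1:1
        x(k) = (y(k) - U(k,k+1:n)*x(k+1:n))/U(k,k);
    end
    end

Calling lu_solve([1 2 3; 2 6 10; 3 14 28], [1; 2; 3]) returns x = [1; 0; 0], matching the hand calculation.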
Once we’ve computed the decomposition of our matrix, we can easily compute the
solution for a variety of right-hand-side vectors by using the LU decomposition and
forward and backward substitution. This potentially saves a great deal of computational effort as the LU decomposition phase is O(n³), but the forward and backward
substitution phases are only O(n2 ). So, we only need to do one LU decomposition
for our matrix, the expensive part, but then it’s relatively cheap to do many repeated
forward and backward substitutions for many right-hand-side vectors.
We may also use partial pivoting with the LU decomposition. The same strategy
is used as with Gaussian Elimination, namely we search the current column for the
largest magnitude entry and exchange the two rows. Additionally, we now keep
track of a permutation matrix, P , that encodes the row swapping operations. In
particular, P is just the identity matrix, but each time two rows are exchanged
during the decomposition, we also exchange the corresponding two rows of P . In
particular, at the end of the LU decomposition computation,
P A = LU
(4.16)
The important implication here is that, if we wish to solve a linear system, we must
also permute the right-hand-side in order to obtain a correct solution:
P Ax = LU x = P b
(4.17)
That is, we must apply P to b before performing the forward and backward substitution. This is because when we pivot the rows during elimination, we have to also
swap the rows of the right-hand side.
Example 4.4.3. LU Decomposition with Partial Pivoting
Consider again the matrix A:

    [  1   0   2   3 ]
A = [ −1   2   2  −3 ]
    [  0   1   1   4 ]
    [  6   2   2   4 ]
We will compute the LU decomposition including partial pivoting. In addition to
populating the lower triangular L matrix, we will also keep track of the permutation
matrix P.
As in Example (4.3.1), we first exchange rows 1 and 4 since the (4, 1) entry is the
largest in magnitude in the first column. So we exchange the rows in A and proceed
as before, eliminating the first column and storing the pivot factors in L, but we now
also permute rows 1 and 4 in our matrix P:

    [ 6    2     2     4  ]       [  1    0    0   0 ]       [ 0 0 0 1 ]
U = [ 0   7/3   7/3  −7/3 ],  L = [ −1/6  1    0   0 ],  P = [ 0 1 0 0 ]
    [ 0    1     1     4  ]       [  0   l32   1   0 ]       [ 0 0 1 0 ]
    [ 0  −1/3   5/3   7/3 ]       [ 1/6  l42  l43  1 ]       [ 1 0 0 0 ]

Now we proceed to eliminate the second column. Here, no permutation is needed, so
we proceed:

    [ 6   2    2     4  ]       [  1     0    0   0 ]       [ 0 0 0 1 ]
U = [ 0  7/3  7/3  −7/3 ],  L = [ −1/6   1    0   0 ],  P = [ 0 1 0 0 ]
    [ 0   0    0     5  ]       [  0    3/7   1   0 ]       [ 0 0 1 0 ]
    [ 0   0    2     2  ]       [ 1/6  −1/7  l43  1 ]       [ 1 0 0 0 ]

Now, in column 3, we must permute the 3rd and 4th rows according to the pivoting
algorithm. Here, we must exchange rows 3 and 4 of P, but we must also exchange
the subdiagonal rows of L:

    [ 6   2    2     4  ]       [  1     0    0   0 ]       [ 0 0 0 1 ]
U = [ 0  7/3  7/3  −7/3 ],  L = [ −1/6   1    0   0 ],  P = [ 0 1 0 0 ]
    [ 0   0    2     2  ]       [ 1/6  −1/7   1   0 ]       [ 1 0 0 0 ]
    [ 0   0    0     5  ]       [  0    3/7  l43  1 ]       [ 0 0 1 0 ]

Finally, l43 = 0/2 = 0 and U is already in upper triangular form so we are done:

    [  1     0    0  0 ]       [ 6   2    2     4  ]       [ 0 0 0 1 ]
L = [ −1/6   1    0  0 ],  U = [ 0  7/3  7/3  −7/3 ],  P = [ 0 1 0 0 ]
    [ 1/6  −1/7   1  0 ]       [ 0   0    2     2  ]       [ 1 0 0 0 ]
    [  0    3/7   0  1 ]       [ 0   0    0     5  ]       [ 0 0 1 0 ]
Example 4.4.4. Solving a linear system using LU Decomposition with Partial Pivoting
Consider the following linear system of equations:

[  1   0   2   3 ] [x1]   [  1 ]
[ −1   2   2  −3 ] [x2] = [ −1 ]
[  0   1   1   4 ] [x3]   [  2 ]
[  6   2   2   4 ] [x4]   [  1 ]
We will use the decomposition we computed previously to solve the linear system.
First, we must apply the permutation matrix, P , to our right-hand-side vector since
P Ax = LU x = P b
Thus,

     [ 0 0 0 1 ] [  1 ]   [  1 ]
Pb = [ 0 1 0 0 ] [ −1 ] = [ −1 ]
     [ 1 0 0 0 ] [  2 ]   [  1 ]
     [ 0 0 1 0 ] [  1 ]   [  2 ]
First, we perform forward substitution, Ly = Pb:

[  1     0    0  0 ] [y1]   [  1 ]       [y1]   [   1   ]
[ −1/6   1    0  0 ] [y2] = [ −1 ]   ⇒   [y2] = [ −5/6  ]
[ 1/6  −1/7   1  0 ] [y3]   [  1 ]       [y3]   [  5/7  ]
[  0    3/7   0  1 ] [y4]   [  2 ]       [y4]   [ 33/14 ]
And now we can perform backward substitution, Ux = y, to get the solution x:

[ 6   2    2     4  ] [x1]   [   1   ]       [x1]   [ −13/70 ]
[ 0  7/3  7/3  −7/3 ] [x2] = [ −5/6  ]   ⇒   [x2] = [   8/35 ]
[ 0   0    2     2  ] [x3]   [  5/7  ]       [x3]   [  −4/35 ]
[ 0   0    0     5  ] [x4]   [ 33/14 ]       [x4]   [  33/70 ]
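In practice one rarely codes this by hand: Matlab's built-in lu returns the same three matrices, and the two triangular solves can be done with the backslash operator. A short usage sketch for the system above:

    A = [1 0 2 3; -1 2 2 -3; 0 1 1 4; 6 2 2 4];
    b = [1; -1; 2; 1];

    [L, U, P] = lu(A);     % P*A = L*U, computed with partial pivoting
    y = L \ (P*b);         % forward substitution
    x = U \ y;             % backward substitution; same answer as A\b

Note that the rows of L, U, and P returned by Matlab may be ordered differently from the hand calculation, but the product P*A = L*U and the final solution are the same.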
4.5
Cholesky Decomposition
Many systems encountered in engineering analysis are symmetric and positive-definite.
The system is symmetric if, when written in matrix form Ax = b, A = AT . The
system is positive definite if, and only if, all the eigenvalues are real and positive.
There are many algorithms for which we can take advantage of this structure of the
problem. The Cholesky decomposition is one such example.
For symmetric, positive-definite systems, instead of computing the LU decomposition, we can compute the Cholesky Decomposition:
A = LLT
(4.18)
where L is a lower-triangular matrix. Note we already save half of the memory requirements of the LU decomposition because we only need to store L, not both L
and U . Additionally, the Cholesky decomposition requires approximately half the
computational effort of the LU decomposition.
To construct the Cholesky decomposition, we proceed as before, writing out the
matrix in general form and computing each entry term-by-term. For the 3 × 3 case,

          [ l11   0    0  ] [ l11  l21  l31 ]
A = LLᵀ = [ l21  l22   0  ] [  0   l22  l32 ]
          [ l31  l32  l33 ] [  0    0   l33 ]

          [ l11²                                    (symmetric)     ]
        = [ l21 l11    l21² + l22²                                  ]
          [ l31 l11    l31 l21 + l32 l22    l31² + l32² + l33²      ]

From this expression, we can see that each of the entries in L is

lij = (1/ljj) ( aij − Σ_{k=1}^{j−1} lik ljk ),   i > j

ljj = sqrt( ajj − Σ_{k=1}^{j−1} ljk² )
Using these expressions, we can construct pseudocode for the Cholesky algorithm.
for k = 1 : n
    % evaluate off-diagonal terms
    for i = 1 : k-1
        sum = 0
        for j = 1 : i-1
            sum = sum + A(i,j) * A(k,j)
        end
        A(k,i) = (A(k,i) - sum) / A(i,i)
    end
    % evaluate diagonal term
    sum = 0
    for j = 1 : k-1
        sum = sum + (A(k,j))^2
    end
    A(k,k) = sqrt(A(k,k) - sum)
end
Figure 4.2: Pseudo-code for Cholesky factorization.
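Wrapped up as a function, the same algorithm might look like the sketch below. This is our own arrangement, not part of the text; note also that Matlab's built-in chol(A) returns the upper-triangular factor R with A = R'R, so L = R'.

    function L = cholesky_factor(A)
    % Cholesky factorization A = L*L' for a symmetric positive-definite A.
    n = size(A,1);
    L = zeros(n);
    for k = 1:n
        for i = 1:k-1                                     % off-diagonal entries of row k
            L(k,i) = (A(k,i) - L(i,1:i-1)*L(k,1:i-1)')/L(i,i);
        end
        L(k,k) = sqrt(A(k,k) - L(k,1:k-1)*L(k,1:k-1)');   % diagonal entry
    end
    end

If A is not positive definite, the argument of the square root eventually becomes negative, which is one practical way the algorithm detects an unsuitable matrix.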
4.6
Computing the Inverse of a Matrix
We open this discussion by first saying that one should never compute the
inverse of a matrix directly. We will elaborate on this point later, but suffice it to
say for the time being, it is much more efficient to compute the matrix decomposition
and use forward and backward substitution to solve a linear system. That said, the
inverse of a matrix is a useful theoretical tool that we use often.
To compute the inverse of a matrix A, we use LU decomposition, but for a
sequence of right-hand-sides, bi :
Axi = bi
(4.19)
We choose bi as each of the unit vectors: b1 = [1, 0, 0, . . . , 0]T , b2 = [0, 1, 0, 0, . . . , 0]T ,
b3 = [0, 0, 1, 0, . . . , 0]T , etc. Once we solve for each xi , we combine each of the column
vectors into a matrix; this matrix is A−1 . That is
A−1 = [x1 , x2 , . . . , xn ]
This corresponds to nothing else than the definition of the inverse:

A A⁻¹ = I,   i.e.   A X = B   with B = I.

Now, if we apply the LU decomposition algorithm,

L U X = I   ⇒   U X = Y,   L Y = I

Notice now that the forward and backward substitutions must be done on matrix right-hand-sides, not vectors. This means that forward and backward substitution become O(n³)
algorithms instead of O(n²) — asymptotically as expensive as the matrix decomposition itself! This is why one should
never, except in the most trivial of circumstances (e.g. diagonal matrices), solve a
linear system by computing the inverse matrix.
Example 4.6.1. Computing the Inverse of a Matrix
Consider the matrix

    [  1  −1  2 ]
A = [ −2   1  1 ]
    [ −1   2  1 ]

The LU decomposition for this matrix is

    [  1   0  0 ]       [ 1  −1  2 ]
L = [ −2   1  0 ],  U = [ 0  −1  5 ]
    [ −1  −1  1 ]       [ 0   0  8 ]

First, we solve LY = I for the matrix Y:

[  1   0  0 ] [ y11  y12  y13 ]   [ 1  0  0 ]
[ −2   1  0 ] [ y21  y22  y23 ] = [ 0  1  0 ]
[ −1  −1  1 ] [ y31  y32  y33 ]   [ 0  0  1 ]

We do the forward substitution one column at a time.

[  1   0  0 ] [ y11 ]   [ 1 ]
[ −2   1  0 ] [ y21 ] = [ 0 ]
[ −1  −1  1 ] [ y31 ]   [ 0 ]

This gives y11 = 1, y21 = 2, and y31 = 3. We repeat this process again for the next
column:

[  1   0  0 ] [ y12 ]   [ 0 ]
[ −2   1  0 ] [ y22 ] = [ 1 ]
[ −1  −1  1 ] [ y32 ]   [ 0 ]

This gives y12 = 0, y22 = 1, and y32 = 1. Repeating again for the third column gives
y13 = y23 = 0 and y33 = 1. Now, having computed Y, we solve UX = Y:

[ 1  −1  2 ] [ x11  x12  x13 ]   [ 1  0  0 ]
[ 0  −1  5 ] [ x21  x22  x23 ] = [ 2  1  0 ]
[ 0   0  8 ] [ x31  x32  x33 ]   [ 3  1  1 ]

Proceeding as before, one column at a time, we can compute each column of X. When
completed, the matrix X will be exactly A⁻¹:

          [  1/8  −5/8  3/8 ]
X = A⁻¹ = [ −1/8  −3/8  5/8 ]
          [  3/8   1/8  1/8 ]
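The column-by-column procedure can be expressed compactly in Matlab. The sketch below is an illustration only (in practice one would still avoid forming the inverse); it reuses the built-in lu factorization and solves against each column of the identity matrix.

    A = [1 -1 2; -2 1 1; -1 2 1];
    n = size(A,1);
    [L, U, P] = lu(A);            % factor once
    Ainv = zeros(n);
    I = eye(n);
    for i = 1:n                   % one forward/backward solve per unit vector
        y = L \ (P*I(:,i));
        Ainv(:,i) = U \ y;
    end
    % Ainv should now equal inv(A); check that A*Ainv is (numerically) the identity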
4.7
Practice Problems
Work the following problems in Chapra [1]:
• 8.3
• 8.10
• 9.6
• 10.5
• 10.8
Chapter 5
Numerical Error and Conditioning
To this point, we have mainly focused on analytical solutions. The previous chapter was our first foray into numerical methods, but the focus was on turning our
“by-hand” algorithm into Matlab code that automated the solution procedure. Nevertheless, for many engineering problems, it is simply not possible to compute analytical solutions and we must rely on numerical approximations. While these approximations can be very accurate, they are nonetheless approximations. Therefore,
it is important for us to understand the errors involved in our approximations and
to judge whether the level of error is acceptable for studying engineering systems.
First, we will overview a potpourri of topics related to error considerations. Then, we
will discuss how real numbers are represented digitally in the computer. Before
studying the impact of floating point representation on the solution of linear systems,
we will need to review the concept of norms as they relate to vectors and matrices.
5.1
Error Considerations
When discussing the various errors encountered in our study, we discuss the accuracy of
algorithms, methods, etc. as well as the precision of the computer that we use, etc.
Accuracy relates to “how closely our computed values match the true values”. Precision, on the other hand, relates to “how closely computed values agree”. Figure 5.1
illustrates this concept with targets. The more accurate points on the target are
closer to the bullseye, while the more precise points are clustered closer together.
Notice that we can have more precision, but less accuracy.
Errors arise from many different sources, such as approximate algorithms, modeling approximations, and the numerical computations that take place in the computer.
For any particular scalar quantity, we can think of the error as a “shift” between the
“truth” (whatever that is) and our computed value:
x∗ = x + e
(5.1)
where x∗ is the true value, x is our computed value, and e is the error. So, if we
know x∗ , then the absolute error is
e = x∗ − x    (5.2)
Figure 5.1: Graphical depiction of accuracy and precision concepts. Taken from [1].
and the relative error is

erel = (x∗ − x)/x∗    (5.3)

We also can discuss relative error in terms of percent relative error:

erel% = erel × 100%    (5.4)
Of course, we never can compute the exact error — if we could, we would know the
exact solution! We are reduced to trying to estimate the error. The estimates used
depend on the context; they include comparing “old” and “new” values, looking at
the remainder of a Taylor series truncation, looking at the residual of our equations,
and using higher fidelity algorithms to gain insight into the error.
Another important concept that relates to reporting the precision of our answer,
or its error, is significant figures. The number of significant figures indicates precision. Significant digits of a number are those that can be used with confidence,
e.g., the number of certain digits plus one estimated digit. For example, if you read
the speedometer in your car (and assuming it is a traditional analog needle and
not digital), you wouldn’t purport to be going “48.958” miles-per-hour because you
cannot accurately read the gage to that many digits. When reporting values, it is
understood that leading zeros are not significant figures since these are eliminated
by “moving the decimal point” in scientific notation, whereas trailing zeros are significant. For example, the numbers 0.00001753, 0.0001753, and 0.001753 all have 4
significant figures whereas 5.38 × 104 , 5.380 × 104 , and 5.3800 × 104 have 3, 4, and 5
significant digits, respectively.
We must be careful to not report false significant figures. For example, if we
type 3.25/1.96 into Matlab, we will get back 1.65816326530162. But we will report
either 1.65 (chopping) or 1.66 (rounding). This is because we do not know what lies
beyond the second decimal place! Consider the following example. If we change the
third (unknown) digit and use chopping, we get
3.259/1.960 = 1.66275510204082...
3.250/1.969 = 1.65058405281869...
Similarly, if we use rounding and change the third (unknown) digit, we get
3.254/1.955 = 1.66445012787724...
3.250/1.969 = 1.65058405281869...
We see that we can easily end up with different decimal values in the second decimal
place! So, we only report 3 significant figures: the first two we are confident and the
third is uncertain.
5.2
Floating Point Representation
One source of error that is always present when performing computations using a
computer is the limitation of the computer to only be able to store a finite number of
digits. Because of this limitation, arithmetic operations will always have round-off
error present. To understand the source of round-off error, we need to study how
the computer stores numbers. First, however, we need to remind ourselves of the
basic number systems.
By now, all of us are very comfortable using a base 10 (decimal) number system.
However, because the memory in a computer is effectively just switches (on or off),
the computer works in binary number systems. Figure 5.2 illustrates how we represent the number 86, 409 in a decimal system vs. representing the number 173 in a
binary number system.
Figure 5.2: Illustration of decimal and binary number systems. Taken from [1].
Next, we focus on the storage of integers because integers are simpler than real-valued
numbers so we will start there.
Figure 5.3 illustrates the main idea for storing an integer in memory in a computer. Namely, each bit (an on-off switch in memory) is a zero or one. The first bit of a word is used to indicate the sign of the integer (0: positive, off; 1: negative, on), and the remaining bits are used to store the number associated with the integer.
Sign
Number
Figure 5.3: Illustration of integer storage in computer memory.
When discussing the size of the integers, we refer to how many bits are used to store
the integer. For example, if we consider an 8-bit integer, then we have 1 bit for the
sign and 7 bits to represent the numeral part of the integer. Figure 5.4 illustrates
such an 8-bit integer.
Integer$Representa2ons$
8-bit word
± 26 2 5 2 4 2 3 2 2 2 1 20
$#" $!!!!!!#!!!!!!"
Sign
Number
= 0000000
# smallest number
Figure 5.4: Illustration
of integer
storage
in0base10
computer memory.
base2 =
"
!largest number = 1111111base2 = 127 base10
•
+/- 0000000 are the same, therefore we may use
Notice, in particular, -0that
there are
to represent
-128bounds to the numbers we can represent:
Total numbers
28 = 256
(-128 ∼127)
the smallest number •is 0000000
in =base
2, which
is 0 in base 10, while the largest
number is 1111111 in base 2, which is 127 in base 10. Additionally, because “0” is mathematically the same as “+0”, we can use “-0” to represent -128. So
therefore, with an 8-bit integer, we can represent the numbers -128 to 127. Anything
outside of this range is an overflow (larger than the max) or an underflow error
(smaller than the min): we “flow under/over” the boundaries of memory we have
to represent the number. Of course, we need numbers much larger than 128. More
commonly used integers are 32-bit and 64-bit versions. With 32-bit integers, we can
stores numbers up to 231 = 2, 147, 483, 648, while with 64-bit integers, we can store
263 = 9.22337203685 × 1018 . Note that this explains why one could not have more
than 2 Gigabytes of memory in a computer that possessed only 32-bit hardware
and/or a 32-bit operating system: the memory could not be addressed!
The storage of real-valued numbers is similar. The format that is used is referred
to as “floating point representation”, alluding to the fact that scientific notation
is always used and the decimal “floats” to accommodate the normalization of the
number. In particular, there is a bit to track the sign of the number, then a block for
the signed exponent, and then the “mantissa”, the significant figures of the number.
Figure 5.5 illustrates generic floating point representation for an arbitrary base B
number system.
5.3. REVIEW OF VECTOR AND MATRIX NORMS
65
Floa2ng$Point$Representa2on$
e
m
$!!#
!!
" $!!#
!!"
$
±$ ± e 1 e 2 % e m d 1 d 2 d 3 % d p
sign of
number
signed exponent
mantissa
N = ± .d 1 d 2 d 3 !d p B e = mB e
• m: mantissa
Figure 5.5: Illustration
of Base
storage
of floating
numbers in computer memory.
• B:
of the
numberpoint
system
• e: signed exponent
• Note: the mantissa is usually normalized if
Of course, in the computer,
the number
used is always binary (base 2).
the leading
digit is system
zero
There are two primary types used for floating point numbers in the computer:
“single precision” (32-bit) and “double precision” (64-bit). In scientific computing,
double precision is the norm; in particular, by default, all floating point numbers in
Matlab are double precision. Of the 32-bits allocated for single precision numbers,
1 bit is for the sign, 8 bits are for the signed exponent, and 23 bits are for the
digits. Thus, for single precision numbers, the smallest number that can be stored
is ≈ ±1.17549 × 10−38 and the largest is ≈ ±3.40282 × 1038 . Analogously, for the
64-bits in a double precision number, 1 bit is for the sign, 11 bits are for the signed
exponent, and 52 bits are for the digits. The smallest double precision number is
≈ ±2.2251 × 10−308 while the largest is ≈ ±1.7977 × 10308 .
In addition to the limit on the magnitude of the numbers that can be stored,
there’s also a limit on the difference that can be stored during a floating point
operation. In particular, we only have the width of the mantissa available for the
digits, so any differences that exceed that width will be truncated. In particular,
if we are adding two numbers whose difference exceeds the width of the mantissa,
then the addition will be truncated completely.
Example 5.2.1. Double Precision Truncation
The width of the mantissa in double precision is 52 bits, so we expect differences
greater than about 2⁵² ≈ 4.5036 × 10¹⁵ to be truncated. In fact, the
number 2⁻⁵² ≈ 2.22 × 10⁻¹⁶ is called the "machine epsilon" and is the variable eps
in Matlab. So, if we execute the command 1 + eps/2 in Matlab, we see that we get
back exactly the value of 1, i.e. the eps/2 factor was truncated.
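You can see this truncation directly at the Matlab prompt; the comparison below uses only the built-in eps variable:

    eps                 % about 2.2204e-16, the double-precision machine epsilon
    (1 + eps/2) == 1    % returns logical 1 (true): eps/2 is truncated away
    (1 + eps)   == 1    % returns logical 0 (false): eps is just large enough to register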
5.3
Review of Vector and Matrix Norms
Before examining the impact of round-off error due to floating point truncation,
we first need to recall the definition of vector and matrix norms and some of their
properties. Given vectors u and v, their dot product is
u · v = u1 v1 + u2 v2 + · · · + un vn = Σ_{i=1}^{n} ui vi    (5.5)

The length of the vector u is written in terms of its dot product:

‖u‖₂ = sqrt(u · u)    (5.6)
66
CHAPTER 5. NUMERICAL ERROR AND CONDITIONING
In fact, the length of the vector is one type of vector norm. Norms, in general, are
used to measure the size of mathematical objects. This language allows us to study
of the behavior of more abstract mathematical entities. In general, norms are just
mathematical operators that take the object and return a number. So, vector norms
take vectors and return a number:
kuk : u ∈ Rn → R
(5.7)
The Euclidean norm in Equation (5.6) is one example. Other examples are the
general p-norms:
‖u‖p = ( Σ_{i=1}^{n} |ui|^p )^(1/p)    (5.8)

and the "infinity" norm:

‖u‖∞ = max_i |ui|    (5.9)
The idea of measuring the size of objects is quite general. We can apply it to
matrices as well. So, in the case of matrices, the norm operator takes a matrix and
returns a number:
kAk : A ∈ Rm × Rn → R
(5.10)
These norms are “induced” by the norms of vectors; their formal development is
beyond the scope of this course, but we provide a few examples. The one norm of a
matrix corresponds to the “column sum”:
‖A‖₁ = max_j Σ_{i=1}^{n} |aij|    (5.11)

while the infinity norm of a matrix is the "row sum":

‖A‖∞ = max_i Σ_{j=1}^{n} |aij|    (5.12)
The Frobenius norm is what you might’ve guessed as the two norm:
‖A‖F = sqrt( Σ_{i=1}^{n} Σ_{j=1}^{n} aij² )    (5.13)
For square matrices, it turns out that the two norm is the square root of the maximum
eigenvalue of AT A:
‖A‖₂ = sqrt( max_i λi(Aᵀ A) )    (5.14)
In this case, kAk2 is sometimes called the spectral norm.
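All of these norms are available through Matlab's built-in norm function, which makes it easy to experiment with them:

    u = [3; -4; 12];
    norm(u, 2)      % Euclidean norm, sqrt(u'*u) = 13
    norm(u, 1)      % sum of absolute values = 19
    norm(u, inf)    % largest absolute entry = 12

    A = [1 2; 3 4];
    norm(A, 1)      % maximum column sum = 6
    norm(A, inf)    % maximum row sum = 7
    norm(A, 'fro')  % Frobenius norm = sqrt(30)
    norm(A, 2)      % spectral norm: sqrt of the largest eigenvalue of A'*A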
All norms enjoy the following properties, by definition:
‖u‖ > 0 if u ≠ 0    (5.15)
‖αu‖ = |α| ‖u‖,  α ∈ R    (5.16)
‖u + v‖ ≤ ‖u‖ + ‖v‖    (5.17)
The last property is also referred to as the triangle inequality. There are also two
other important properties that vector and matrix norms satisfy:
‖AB‖ ≤ ‖A‖ ‖B‖    (5.18)
‖Ax‖ ≤ ‖A‖ ‖x‖    (5.19)

5.4    Conditioning of Linear Systems
Armed with the previously discussed notions of norms, we are now ready to study
the effect of truncation error in the solution of linear systems. Suppose we are
interested in solving the system Ax = b. Suppose now that as we construct b, we
have accumulated errors due to floating point truncation: b + ∆b. Here, b is the exact
vector and ∆b is error we incurred. This error will then induce error in our solution,
namely x + ∆x. So the system we are really solving is
A(x + ∆x) = (b + ∆b)
(5.20)
Expanding terms and using the fact that the exact problem, Ax = b is still satisfied,
Ax + A∆x = b + ∆b
A∆x = ∆b
Then, multiplying both sides by A−1 and taking norms, we have
∆x = A−1 ∆b
⇒ k∆xk = kA−1 ∆bk
⇒ k∆xk ≤ kA−1 kk∆bk
If we now examine our exact equation,
Ax = b  ⇒  ‖Ax‖ = ‖b‖  ⇒  ‖b‖ ≤ ‖A‖ ‖x‖  ⇒  1/‖x‖ ≤ ‖A‖/‖b‖
Putting together the relationships for k∆xk and kxk, we have
‖∆x‖/‖x‖ ≤ ‖A‖ ‖A⁻¹‖ ‖∆b‖/‖b‖    (5.21)
What this equation is saying is that our output error, ‖∆x‖/‖x‖, is our input error,
‖∆b‖/‖b‖, magnified by the quantity ‖A‖ ‖A⁻¹‖. The quantity ‖A‖ ‖A⁻¹‖ is called
the condition number and is written with the symbol κ(A). So if our input error is
due to floating point truncation, say ‖∆b‖/‖b‖ ≈ 10⁻¹⁵, and if our condition number
is ≈ 10⁵, then our output error is going to be ‖∆x‖/‖x‖ ≈ 10⁻¹⁰. We lost five digits
just due to the condition number!
In general, we also have errors in our matrix A. A similar calculation to that
above yields
‖∆x‖/‖x‖ ≤ κ(A) ( ‖∆A‖/‖A‖ + ‖∆b‖/‖b‖ )    (5.22)
So our output error is roughly the sum of our input errors, but then magnified by the
condition number. So the higher the condition number, the more error in our output.
Linear systems that possess a large condition number are said to be “ill-conditioned”
or “poorly conditioned” systems. The solution of such systems will possibly require
more care in order to obtain the necessary accuracy.
Example 5.4.1. Hilbert Matrix
One notoriously ill-conditioned matrix is the Hilbert matrix:

    [  1      1/2      1/3     ...   1/n      ]
    [ 1/2     1/3      1/4     ...   1/(n+1)  ]
A = [ 1/3     1/4      1/5     ...   1/(n+2)  ]    (5.23)
    [  ⋮       ⋮        ⋮       ⋱      ⋮       ]
    [ 1/n   1/(n+1)  1/(n+2)   ...   1/(2n−1) ]
In Matlab, the hilb command will generate a Hilbert matrix for the input dimension. Additionally, the condition number of a matrix can be computed using the
cond command. Thus, the Matlab command cond(hilb(5)) will compute the condition number of a 5 × 5 Hilbert matrix. Matlab reports the condition number as
4.7661e+05.
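A short Matlab experiment (our own, using only the built-ins hilb, cond, and the backslash solver) shows how a large condition number destroys accuracy even when the right-hand side contains nothing worse than round-off error:

    n = 12;
    A = hilb(n);                    % notoriously ill-conditioned
    x_true = ones(n, 1);
    b = A * x_true;                 % "exact" right-hand side, up to round-off

    x = A \ b;                      % Gaussian elimination with partial pivoting
    cond(A)                         % condition number, roughly 1e16 for n = 12
    norm(x - x_true)/norm(x_true)   % relative error: most digits are lost

The input error here is only ≈ 10⁻¹⁶, but after magnification by κ(A) the computed solution may have essentially no correct digits.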
5.5
Practice Problems
Work the following problems in Chapra [1]:
• 11.9
Chapter 6
Numerical Differentiation
We have so far spoken of solving differential equations where it was possible to find a
function that satisfied the differential equation and initial/ boundary conditions by
“guessing” a function – inserting it in the equation and solving for appropriate constants. However, this approach fails in almost all real problems where the geometry
is complex and/or the functions needed are not simple.
When no exact solution is possible the best we can do is to obtain approximate
solutions. A first step to this end is creating approximations of the derivatives that
are in the differential equations.
6.1
Approximating Derivatives
Most approaches to approximate derivatives fall into one of the following categories
1. Finite differences
2. Fitting a curve/surface to the desired function and using its slope as the derivative.
6.1.1
What Are Finite Differences?
Finite differences build on the definition of the derivative of a function f (x) as the
“rate of change of f (x) with respect to x”. Thus, the derivative will be the ratio of
the “variation in f (x) and the corresponding variation in x”. For some xi let
df/dx (xi) = lim_{∆x→0} [f(xi + ∆x) − f(xi)] / ∆x
Instead of passing to the limit where ∆x is infinitesimal we can simply approximate
the derivative using a finite value of ∆x as
df/dx (xi) ≈ [f(xi + ∆x) − f(xi)] / ∆x = f̂′(xi)    (6.1)
Thus, finite means not infinite and not infinitesimal, in other words non-zero. Depending on the choice of ∆x we will get more or less error. As we make ∆x smaller
Figure 6.1: Discretization for finite difference approximation of derivative at xi .
and smaller we will recover df/dx, but then the number of computations may become
unaffordable. The question then is how to pick a variation of f(x) and a variation
in x, ∆x, so that it minimizes error but is still easy to compute.
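The trade-off between truncation error and step size is easy to see numerically. The sketch below is our own example (f(x) = sin x at xi = 1 is an arbitrary choice); it evaluates the forward difference of Equation (6.1) for a sequence of step sizes, and the error shrinks roughly in proportion to ∆x, consistent with a first order accurate approximation.

    f = @(x) sin(x);
    xi = 1.0;
    exact = cos(xi);                          % true derivative of sin at xi
    dx = 10.^(-(1:8));                        % decreasing step sizes
    fd = (f(xi + dx) - f(xi))./dx;            % forward differences, Eq. (6.1)
    err = abs(fd - exact);
    loglog(dx, err, 'o-'); xlabel('\Delta x'); ylabel('error');

(For very small ∆x, round-off error in f eventually takes over, a point discussed later in this chapter.)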
6.1.2
Taylor Series and Approximate Derivative
Finite difference approximations of derivatives with a systematic definition of both
the variation in f (x) and ∆x can be generated by combining Taylor series expansions.
Recall the definition of Taylor series that any differentiable function f (x) can be
estimated in the neighborhood of a point xi as
f(x) = f(xi) + df/dx(xi)(x − xi) + (1/2) d²f/dx²(xi)(x − xi)² + (1/6) d³f/dx³(xi)(x − xi)³ + ...    (6.2)
Writing x − xi = ∆x or x = xi + ∆x yields
f(xi + ∆x) − f(xi) = df/dx(xi)∆x + (1/2) d²f/dx²(xi)∆x² + (1/6) d³f/dx³(xi)∆x³ + ...
Dividing by ∆x and rearranging we get an expression for the error in fˆ0 (xi )
f̂′(xi) − df/dx(xi) = [f(xi + ∆x) − f(xi)]/∆x − df/dx(xi) = (1/2) d²f/dx²(xi)∆x + (1/6) d³f/dx³(xi)∆x² + ...
The first term in the error depends on ∆x, the second term depends on ∆x2 ,
... Thus for ∆x << 1 the first term will dominate. In this case, since the leading
order term in the truncation error is O(∆x), we say that this is a first order accurate
approximation.
6.1.3
Taylor Series and Finite Differences
Let us now consider a systematic use of the Taylor series approximation to control the
error in the approximation of the derivative – for example if we can devise a scheme
to construct fˆ0 (xi ) so that the leading term in the error above is 0 then clearly the
error will go as ∆x2 . In this case, we say the error is second order accurate since the
exponent on ∆x is 2.
Consider, the line in Figure 6.1 with a set of points at equal intervals
{..., xi−4 , xi−3 , xi−2 , xi−1 , xi , xi+1 , xi+2 , xi+3 , xi+4 ...}
6.1. APPROXIMATING DERIVATIVES
71
Now let us express the values f (xi±∗ ) in terms of a Taylor series expansion about
xi :
f(xi−2) = f(xi) + df/dx(xi)(xi−2 − xi) + (1/2) d²f/dx²(xi)(xi−2 − xi)² + (1/6) d³f/dx³(xi)(xi−2 − xi)³ + ...
f(xi−1) = f(xi) + df/dx(xi)(xi−1 − xi) + (1/2) d²f/dx²(xi)(xi−1 − xi)² + (1/6) d³f/dx³(xi)(xi−1 − xi)³ + ...
f(xi)   = f(xi)
f(xi+1) = f(xi) + df/dx(xi)(xi+1 − xi) + (1/2) d²f/dx²(xi)(xi+1 − xi)² + (1/6) d³f/dx³(xi)(xi+1 − xi)³ + ...
f(xi+2) = f(xi) + df/dx(xi)(xi+2 − xi) + (1/2) d²f/dx²(xi)(xi+2 − xi)² + (1/6) d³f/dx³(xi)(xi+2 − xi)³ + ...
Rearranging into matrix form,

$$
\begin{Bmatrix} \vdots \\ f(x_{i-2}) \\ f(x_{i-1}) \\ f(x_i) \\ f(x_{i+1}) \\ f(x_{i+2}) \\ \vdots \end{Bmatrix}
=
\begin{bmatrix}
\vdots & \vdots & \vdots & & \vdots & \\
1 & (x_{i-2}-x_i) & \tfrac{1}{2}(x_{i-2}-x_i)^2 & \cdots & \tfrac{1}{n!}(x_{i-2}-x_i)^n & \cdots \\
1 & (x_{i-1}-x_i) & \tfrac{1}{2}(x_{i-1}-x_i)^2 & \cdots & \tfrac{1}{n!}(x_{i-1}-x_i)^n & \cdots \\
1 & 0 & 0 & \cdots & 0 & \cdots \\
1 & (x_{i+1}-x_i) & \tfrac{1}{2}(x_{i+1}-x_i)^2 & \cdots & \tfrac{1}{n!}(x_{i+1}-x_i)^n & \cdots \\
1 & (x_{i+2}-x_i) & \tfrac{1}{2}(x_{i+2}-x_i)^2 & \cdots & \tfrac{1}{n!}(x_{i+2}-x_i)^n & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{bmatrix}
\begin{Bmatrix} f(x_i) \\ \frac{df}{dx}(x_i) \\ \frac{d^2f}{dx^2}(x_i) \\ \vdots \\ \frac{d^nf}{dx^n}(x_i) \\ \vdots \end{Bmatrix}
$$

Note that a simpler form is possible if the spacing of the points is uniform, i.e. |x_{i-4} − x_i| = 4∆x, |x_{i-3} − x_i| = 3∆x, |x_{i-2} − x_i| = 2∆x, |x_{i-1} − x_i| = ∆x, |x_{i+1} − x_i| = ∆x, |x_{i+2} − x_i| = 2∆x, .... In compact notation,

$$\{f\} = [D]\{df\}, \qquad \{df\} = [D]^{-1}\{f\} \tag{6.3}$$
Let us consider the implication of (6.3). If we know the values of the function at a set of points f(x_{i±*}), then we are able to exactly compute the derivatives of different orders, d^i f/dx^i (x_i), at x_i. Thus, our original goal of solving a differential equation involving terms that look like d²f/dx²(x_i), etc., can be accomplished by replacing the derivative terms using suitable expressions from (6.3).

In reality, of course, we cannot afford to compute the whole Taylor series but will truncate it after a few terms. This implies that our choice of truncation point will define the approximation error, e.g. if we truncate after 3 terms, the error will be dominated by (1/6) d³f/dx³(x_i) ∆x³. Thus, assuming no error in f(x_{i±*}), the error is set by the lowest higher-order derivative that we do not include and the appropriate power of ∆x.

This also allows us to estimate the first few derivatives using only a few of the equations from (6.3) (e.g. df/dx, which involves 2 unknowns including f(x_i), needs only 2 equations from (6.3); or df/dx and d²f/dx², which involve 3 unknowns, can be obtained by choosing 3 equations from (6.3)).
For example, using only f(x_i) and f(x_{i+1}),

$$\begin{Bmatrix} f(x_i)\\ f(x_{i+1}) \end{Bmatrix} = \begin{bmatrix} 1 & 0\\ 1 & (x_{i+1}-x_i) \end{bmatrix} \begin{Bmatrix} f(x_i)\\ \frac{df}{dx}(x_i) \end{Bmatrix}$$

Or, setting (x_{i+1} − x_i) = ∆x,

$$\begin{Bmatrix} f(x_i)\\ \frac{df}{dx}(x_i) \end{Bmatrix} = \begin{bmatrix} 1 & 0\\ 1 & \Delta x \end{bmatrix}^{-1} \begin{Bmatrix} f(x_i)\\ f(x_{i+1}) \end{Bmatrix} = \frac{1}{\Delta x}\begin{bmatrix} \Delta x & 0\\ -1 & 1 \end{bmatrix} \begin{Bmatrix} f(x_i)\\ f(x_{i+1}) \end{Bmatrix}$$

which leads to

$$\frac{df}{dx}(x_i) = \frac{f(x_{i+1}) - f(x_i)}{\Delta x}$$

with a truncation error of O(∆x).
$$\begin{Bmatrix} f(x_{i-2})\\ f(x_{i-1})\\ f(x_i) \end{Bmatrix} = \begin{bmatrix} 1 & (x_{i-2}-x_i) & \tfrac{1}{2}(x_{i-2}-x_i)^2\\ 1 & (x_{i-1}-x_i) & \tfrac{1}{2}(x_{i-1}-x_i)^2\\ 1 & 0 & 0 \end{bmatrix} \begin{Bmatrix} f(x_i)\\ \frac{df}{dx}(x_i)\\ \frac{d^2 f}{dx^2}(x_i) \end{Bmatrix}$$

$$\begin{Bmatrix} f(x_i)\\ \frac{df}{dx}(x_i)\\ \frac{d^2 f}{dx^2}(x_i) \end{Bmatrix} = \begin{bmatrix} 1 & (x_{i-2}-x_i) & \tfrac{1}{2}(x_{i-2}-x_i)^2\\ 1 & (x_{i-1}-x_i) & \tfrac{1}{2}(x_{i-1}-x_i)^2\\ 1 & 0 & 0 \end{bmatrix}^{-1} \begin{Bmatrix} f(x_{i-2})\\ f(x_{i-1})\\ f(x_i) \end{Bmatrix} \tag{6.4}$$
Or rewriting, assuming uniform discretization,

$$\begin{Bmatrix} f(x_i)\\ \frac{df}{dx}(x_i)\\ \frac{d^2 f}{dx^2}(x_i) \end{Bmatrix} = \begin{bmatrix} 1 & -2\Delta x & \tfrac{1}{2}(-2\Delta x)^2\\ 1 & -\Delta x & \tfrac{1}{2}(-\Delta x)^2\\ 1 & 0 & 0 \end{bmatrix}^{-1} \begin{Bmatrix} f(x_{i-2})\\ f(x_{i-1})\\ f(x_i) \end{Bmatrix}$$

Inverting and solving leads to

$$\frac{df}{dx}(x_i) = \frac{f(x_{i-2}) - 4f(x_{i-1}) + 3f(x_i)}{2\Delta x} \tag{6.5}$$

with a truncation error of O(∆x²), and

$$\frac{d^2 f}{dx^2}(x_i) = \frac{f(x_{i-2}) - 2f(x_{i-1}) + f(x_i)}{\Delta x^2} \tag{6.6}$$
with a truncation error of O(∆x). This is of course by no means a unique choice – we could have chosen

$$\begin{Bmatrix} f(x_i)\\ \frac{df}{dx}(x_i)\\ \frac{d^2 f}{dx^2}(x_i) \end{Bmatrix} = \begin{bmatrix} 1 & 0 & 0\\ 1 & (x_{i+1}-x_i) & \tfrac{1}{2}(x_{i+1}-x_i)^2\\ 1 & (x_{i+2}-x_i) & \tfrac{1}{2}(x_{i+2}-x_i)^2 \end{bmatrix}^{-1} \begin{Bmatrix} f(x_i)\\ f(x_{i+1})\\ f(x_{i+2}) \end{Bmatrix}$$

to get

$$\frac{df}{dx}(x_i) = \frac{-f(x_{i+2}) + 4f(x_{i+1}) - 3f(x_i)}{2\Delta x} \tag{6.7}$$

with a truncation error of O(∆x²), or

$$\frac{d^2 f}{dx^2}(x_i) = \frac{f(x_{i+2}) - 2f(x_{i+1}) + f(x_i)}{\Delta x^2} \tag{6.8}$$

with a truncation error of O(∆x).
Two things stand out:

• The formulae (6.7) and (6.8) involve values f(x_{i+k}), k > 0, while (6.5) and (6.6) involve values f(x_{i+k}), k < 0. The first category, k > 0, are called forward differences, while the second category, k < 0, are called backward differences.

• The two-point formula derived above and (6.5) both approximate df/dx, but the two-point formula uses only f(x_i) and f(x_{i+1}) and has a truncation error of O(∆x), while (6.5) uses 3 points f(x_{i-2}), f(x_{i-1}), f(x_i) and gets a truncation error of O(∆x²). Note also that (6.7) achieves O(∆x²) with a different set of points.
What about not picking either k > 0 or k < 0 exclusively? This can indeed lead to many possible difference formulae. One of these is special: central finite differences using evenly spaced x, for which there is a natural cancellation of alternating terms in the Taylor series, so the order of accuracy is one degree higher than usual.

$$\frac{df}{dx}(x_i) = \frac{f(x_{i+1}) - f(x_{i-1})}{2\Delta x} \tag{6.9}$$

with a truncation error of O(∆x²).
Example 6.1
Let f(x) = sin(x). Consider a discretization

..., x_{i-2} = π/2 − 0.1, x_{i-1} = π/2 − 0.05, x_i = π/2, x_{i+1} = π/2 + 0.05, x_{i+2} = π/2 + 0.1, ...

and let us estimate df/dx(π/2). Using the two-point forward difference formula,

$$\frac{df}{dx}(x_i) \approx \frac{f(x_{i+1}) - f(x_i)}{\Delta x} = \frac{\sin(\pi/2 + 0.05) - \sin(\pi/2)}{0.05} = \frac{0.99875 - 1}{0.05} = -0.025$$

The exact answer is cos(π/2) = 0, so the error is comparable to ∆x = 0.05. Using (6.7),

$$\frac{df}{dx}(x_i) = \frac{-f(x_{i+2}) + 4f(x_{i+1}) - 3f(x_i)}{2\Delta x} = \frac{-\sin(\pi/2 + 0.1) + 4\sin(\pi/2 + 0.05) - 3\sin(\pi/2)}{2 \times 0.05} = -3.1237 \times 10^{-5} \tag{6.10}$$

whose magnitude is far smaller than ∆x² = 0.0025. Even more impressive, on this problem the central difference (6.9) is exact because of canceling errors!
The MATLAB diff function is a fast way to compute differences: df = diff(f); gives the same result as i = 1:length(f)-1; df = f(i+1) - f(i);
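As a quick sketch (not from the original notes), the formulae above can be checked in a few lines of MATLAB using the function, point, and spacing from Example 6.1:

% Sketch: forward, backward, and central differences for f(x) = sin(x) at x = pi/2
f  = @(x) sin(x);
xi = pi/2;  dx = 0.05;
fwd = (f(xi + dx) - f(xi))/dx;           % O(dx) forward difference
bwd = (f(xi) - f(xi - dx))/dx;           % O(dx) backward difference
ctr = (f(xi + dx) - f(xi - dx))/(2*dx);  % O(dx^2) central difference (6.9)
fprintf('forward %g, backward %g, central %g, exact %g\n', fwd, bwd, ctr, cos(xi));

Shrinking dx shows the first order errors falling linearly and the central difference error falling quadratically (here it is exactly zero by symmetry).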
Figure 6.2: Table of Forward Difference formulae. Taken from Chapra [1].
Figure 6.3: Table of Central Difference formulae. Taken from Chapra [1].
Figure 6.4: Table of Backward Difference formulae. Taken from Chapra [1].
6.1.4 What if there is error in evaluation of f(x)?
This is commonly encountered when processing empirical data. In that case, there is
no underlying function that we can readily evaluate, we only have data points. And,
of course, those points will have error in them. Further, we have no mechanism to
shrink the spacing without gathering new data (which can be very expensive). So
we are stuck using the points we have. Unfortunately, finite differences do not handle
measurement error very well. In particular, derivatives tend to get “noisier” as we
take more and more derivatives. Fig. 6.5 shows the effect of error in the evaluation.
If we are indeed in such a situation, the best thing to do is use the formulae that
possess more points. This can both increase the accuracy of the derivative as well as
help “smooth out” the noise.
6.2 Higher Dimensions and Partial Derivatives
With uniform grids it is “relatively straight forward” to combine one-dimensional
finite difference rules by a “tensor” product. It is just as easy, perhaps more so, to
generate multidimensional finite differences by combining multidimensional Taylor
series expansions. This is no more difficult to do for non-uniform grids. In two
dimensions the Taylor series will be:
$$
\begin{aligned}
f(x, y) = f(x_0, y_0) &+ \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x_0, y_0)(x - x_0)^2\\
&+ \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(x_0, y_0)(y - y_0)^2 + \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0)(x - x_0)(y - y_0) + \cdots
\end{aligned}
$$
For the first order partial derivatives, all the single variable derivative formulae
apply in the direction in which the derivative is acting. Similarly for the second
Figure 6.5: On the left, we have the “exact” function. On the right, we add just a
small amount of error in the function. When we use finite differences to estimate the
derivative, the error is greatly exaggerated.
derivatives, except for the mixed partial term. For the mixed partial term, we can
apply the finite difference formulae one-at-a-time. Namely,
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)$$

Then, using a central difference approximation in the y-direction,

$$\frac{\partial f}{\partial y}(x_0, y_0) \approx \frac{f(x_0, y_0 + \Delta y) - f(x_0, y_0 - \Delta y)}{2\Delta y}$$

Now, we can apply a central difference approximation in the x-direction to the previous expression:

$$
\begin{aligned}
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)(x_0, y_0) &\approx \frac{1}{2\Delta x}\left[\frac{f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0 - \Delta y)}{2\Delta y} - \frac{f(x_0 - \Delta x, y_0 + \Delta y) - f(x_0 - \Delta x, y_0 - \Delta y)}{2\Delta y}\right]\\
&= \frac{f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0 - \Delta y) - f(x_0 - \Delta x, y_0 + \Delta y) + f(x_0 - \Delta x, y_0 - \Delta y)}{4\Delta x \Delta y}
\end{aligned}
$$
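A minimal MATLAB sketch of the mixed-partial formula above; the test function f and the spacings are arbitrary illustration choices, not from the text:

% Sketch: central difference approximation of d2f/dxdy at (x0, y0)
f  = @(x,y) sin(x).*cos(y);       % test function; exact mixed partial is -cos(x)*sin(y)
x0 = 0.3;  y0 = 0.7;  dx = 1e-3;  dy = 1e-3;
fxy = ( f(x0+dx, y0+dy) - f(x0+dx, y0-dy) ...
      - f(x0-dx, y0+dy) + f(x0-dx, y0-dy) ) / (4*dx*dy);
fprintf('approx %g, exact %g\n', fxy, -cos(x0)*sin(y0));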
Example 6.2.1. Numerical Gradient in Matlab
See Chapra [1, Example 21.8, pg. 538]
6.3 Practice Problems
Work the following problems in Chapra [1]:
• 21.3
• 21.12
• 21.14
Chapter 7
Solution of Initial Value Problems
Solving initial value problems requires us to start with a given ordinary differential
equation and initial condition and march forward in time. Finite differences of the
types we just learned are very useful for that.
7.1 A Simple Illustration
Consider the simple ODE for y(t), used to find y(1), given

$$\frac{dy}{dt} + 2y = 0$$

with initial condition

$$y(0) = 1$$

First let us find the exact solution for comparison. Analytical solutions of these are of the form y = a e^{λt} + b. Differentiating with respect to time, t, we get dy/dt = λ a e^{λt}, so that

$$\lambda a e^{\lambda t} + 2(a e^{\lambda t} + b) = 0 \;\Rightarrow\; a e^{\lambda t}(\lambda + 2) + 2b = 0 \;\Rightarrow\; 2b = 0, \quad \lambda = -2$$

Applying the initial condition y(0) = 1 gives a e^{-2\cdot 0} = 1, so a = 1 and

$$y(t) = e^{-2t}, \qquad y(1) = e^{-2\times 1} = 0.13533$$

Now let us solve it with finite differences. Replace dy/dt by the first order forward difference formula (y_{i+1} − y_i)/∆t, with y_i = y(t_i), etc., and t_{i+1} = t_i + ∆t:

$$\frac{y_{i+1} - y_i}{\Delta t} + 2y_i = 0 \;\Rightarrow\; \frac{y_{i+1} - y_i}{\Delta t} = -2y_i \;\Rightarrow\; y_{i+1} = y_i - 2y_i\Delta t \tag{7.1}$$
Since we know y(0) = 1 we can start the iteration with y0 = y(0) = 1 and, after choosing a ∆t (say ∆t = 0.25), we have

t1 = 0 + ∆t = 0.25 → y1 = y0 − 2y0∆t = 1 − 2 × 1 × 0.25 = 0.5
t2 = t1 + ∆t = 0.50 → y2 = y1 − 2y1∆t = 0.5 − 2 × 0.5 × 0.25 = 0.25
t3 = t2 + ∆t = 0.75 → y3 = y2 − 2y2∆t = 0.25 − 2 × 0.25 × 0.25 = 0.125
t4 = t3 + ∆t = 1.00 → y4 = y3 − 2y3∆t = 0.125 − 2 × 0.125 × 0.25 = 0.0625

Clearly y4 = 0.0625 is a somewhat poor approximation of y(1) = 0.13533!

Can we do better? If ∆t → 0 then the difference formula reduces to the derivative. Thus, let us try with a smaller ∆t, e.g. ∆t = 0.1:

t1 = 0 + ∆t = 0.1 → y1 = y0 − 2y0∆t = 1 − 2 × 1 × 0.1 = 0.8
t2 = t1 + ∆t = 0.2 → y2 = y1 − 2y1∆t = 0.8 − 2 × 0.8 × 0.1 = 0.64
...
t10 = t9 + ∆t = 1.0 → y10 = y9 − 2y9∆t = 0.107374

This value using the smaller ∆t = 0.1 is clearly more accurate. Note that in this formula y_{i+1} depends only on y_i, so each step can be computed directly. This method, based on the forward difference, is called the Forward Euler method.
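As a minimal sketch (the step size and the printout are arbitrary choices), the hand iteration above is easily automated in MATLAB:

% Sketch: Forward Euler for dy/dt = -2*y, y(0) = 1, integrated to t = 1
dt = 0.1;  nsteps = round(1/dt);
y = 1;                          % y0 = y(0)
for i = 1:nsteps
    y = y - 2*y*dt;             % y_{i+1} = y_i - 2*y_i*dt, equation (7.1)
end
fprintf('Forward Euler y(1) = %g, exact = %g\n', y, exp(-2));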
Let us now consider a backward difference formula

$$\frac{dy}{dt} = \frac{y_i - y_{i-1}}{\Delta t}$$

Applying this to our problem leads to

$$\frac{y_i - y_{i-1}}{\Delta t} + 2y_i = 0 \;\Rightarrow\; \frac{y_i - y_{i-1}}{\Delta t} = -2y_i \;\Rightarrow\; y_i = y_{i-1} - 2y_i\Delta t$$

or, shifting the index,

$$y_{i+1} = y_i - 2y_{i+1}\Delta t \tag{7.2}$$

The new value is “implicitly” defined – unlike the forward difference. We can simplify here – though if this were a system of ODEs we would have to solve systems of equations!

$$y_{i+1}(1 + 2\Delta t) = y_i \;\Rightarrow\; y_{i+1} = \frac{y_i}{1 + 2\Delta t} \tag{7.3}$$

Solving this again for ∆t = 0.25 leads to y(1) = 0.197531. Reducing ∆t to 0.1 leads to y(1) = 0.161506; further reducing to ∆t = 0.01 gets y(1) = 0.138033. This method is called the Backward Euler method.
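The backward Euler update (7.3) is just as short to code, since for this scalar problem the implicit equation can be solved by hand (a sketch, with an arbitrary step size):

% Sketch: Backward Euler for dy/dt = -2*y, y(0) = 1, integrated to t = 1
dt = 0.01;  nsteps = round(1/dt);
y = 1;
for i = 1:nsteps
    y = y/(1 + 2*dt);           % y_{i+1} = y_i/(1 + 2*dt), equation (7.3)
end
fprintf('Backward Euler y(1) = %g, exact = %g\n', y, exp(-2));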
Example 7.1.1. Euler’s Method
See Chapra [1, Example 22.1, pg. 556].
Example 7.1.2. Euler’s Method for Systems of Equations
See Chapra [1, Example 22.4, pg. 572]
7.2 Stability
What is meant by a “stable” method? The finite difference integration introduces numerical round-off and other types of error at every update. Let us rework the previous example with a small perturbation and look for the behavior at y(10). The exact answer is y = e^{−20} = 2.061154 × 10^{−9}. Since we know y(0) = 1 we can start the iteration with y0 = y(0) = 1, choose a ∆t (say ∆t = 0.25), and add a perturbation of 0.001 at the first step:

t1 = 0 + ∆t = 0.25 → y1 = y0 − 2y0∆t + 0.001 = 0.5 + 0.001 = 0.501
t2 = t1 + ∆t = 0.50 → y2 = y1 − 2y1∆t = 0.501 − 2 × 0.501 × 0.25 = 0.2505
t3 = t2 + ∆t = 0.75 → y3 = y2 − 2y2∆t = 0.2505 − 2 × 0.2505 × 0.25 = 0.12525
...
t40 = t39 + ∆t = 10.0 → y40 = y39 − 2y39∆t = y39(1 − 2 × 0.25) = 9.094 × 10^{−13}

The error induced by the perturbation is clearly decaying quickly, from 0.001 to 0.00025 in 2 steps. Thus, for this calculation any perturbation (likely much smaller than the 0.001) will not affect the result.
Let us rework the previous example with a small perturbation and a larger time step. Since we know y(0) = 1 we can start the iteration with y0 = y(0) = 1 and, after choosing a ∆t (say ∆t = 2) and again including the perturbation, we have

t1 = 0 + ∆t = 2 → y1 = y0 − 2y0∆t (perturbed) = −3.001
t2 = t1 + ∆t = 4 → y2 = y1 − 2y1∆t = −3.001 − 2 × (−3.001) × 2 = 9.003
t3 = t2 + ∆t = 6 → y3 = y2 − 2y2∆t = 9.003 − 2 × 9.003 × 2 ≈ −27.01
...
The simple question arises – will the perturbations decay, leaving the solution uncorrupted, or will they keep growing and ultimately completely overwhelm the solution? Conditions under which the latter happens are what we seek. The answer lies in the eigenvalue problem:

$$\frac{dy}{dx} = \lambda y; \quad y(0) = y_0 \qquad\Rightarrow\qquad y(x) = y_0 e^{\lambda x}$$

Applying the first order forward difference formula,

$$y_{i+1} = y_i + \Delta x\,\lambda y_i = (1 + \Delta x\,\lambda)y_i$$

After n steps,

$$y_n = (1 + \Delta x\,\lambda)^n y_0 = A^n y_0$$

where A is an amplification factor. For stability we need

$$|A| < 1 \;\Rightarrow\; |1 + \Delta x\,\lambda| < 1 \;\Rightarrow\; \lambda < 0 \text{ and } \Delta x < \frac{2}{|\lambda|}$$

This idea of having a limited step size in order to preserve stability is inherent to all explicit methods.

Now let's repeat this procedure for the backward Euler method. In particular,
$$y_{i+1} = y_i + \Delta x\,\lambda y_{i+1} \;\Rightarrow\; y_{i+1}(1 - \Delta x\,\lambda) = y_i \;\Rightarrow\; y_{i+1} = \frac{1}{(1 - \Delta x\,\lambda)}y_i$$
Now, the amplification factor A = 1/(1 − ∆xλ). So, for λ < 0, there is no restriction
on ∆x to preserve stability! It is unconditionally stable! That is not to say that
you can take large step sizes and expect an accurate solution, but rather that the
errors do not grow exponentially with each time step. Although not all implicit
methods are unconditionally stable, generally implicit methods have much better
stability properties compared to explicit methods.
This increased stability comes at a cost, however. If we are evolving systems of
differential equations, then implicit methods will require the solution of a, potentially
nonlinear, system of equations. However, for problems with large time step restrictions, even the cost of the extra solves of linear or nonlinear systems can still leave
the implicit method being the clear winner in terms of cost. Equations with such
time step restrictions are said to be stiff. Chemical kinetics is one classical example
of highly stiff systems of differential equations (arising from the large magnitudes of
activation energies). Stiff systems often have very fast transients followed by a much
slower mode in the solution.
7.3 Multistage Methods
Any numerical method to solve the ODE

$$\frac{dy}{dt} = f(t, y)$$

that has the form

$$y_{n+1} = y_n + \Delta t \sum_{j=1}^{p} b_j k_j \tag{7.4}$$

$$k_1 = f(t_n, y_n) \tag{7.5}$$

$$k_i = f\left(t_n + c_{i-1}\Delta t,\; y_n + \Delta t \sum_{j=1}^{i-1} a_{i-1,j} k_j\right), \quad i = 2, \ldots, p \tag{7.6}$$
belongs to the Runge-Kutta family of methods. The forward and backward Euler
methods belong to the Runge-Kutta family. However, the most popular implementations of Runge-Kutta (RK) are “predictor-corrector” forms. The idea is that we use
“intermediate stages” of the time step interval to “predict” the value and then use the
slope at that point to “correct” the final result. The more predictions/corrections,
the more accurate you can make the scheme.
Indeed, this is how Runge-Kutta methods are derived in general: we have a
particular accuracy we wish to achieve. We use Taylor expansions up to the order
of accuracy we desire, and then match the terms in the Taylor expansion with the
form in equation (7.4) to determine the unknown coefficients.
Example 7.3.1. Heun’s Method
See Chapra [1, Example 22.2, pg. 563]
7.3.1 First Order Explicit RK Methods
In this case, we only are seeking first order accurate methods, so p = 1. That is,
yi+1 = yi + ∆tb1 k1
k1 = f (ti , yi )
so that
yi+1 = yi + ∆tb1 f (ti , yi )
Now we do a Taylor expansion of yi+1 about ti :
yi+1 = yi + ẏ(ti )∆t
⇒ yi+1 = yi + f (ti , yi )∆t
Comparing this Taylor expression to our Runge-Kutta step, we see that b1 = 1. This
is just Forward Euler!
7.3.2 Second Order Explicit RK Methods
We can repeat the same procedure for second order methods. Namely,
yi+1 = yi + ∆t (b1 k1 + b2 k2 )
k1 = f (ti , yi )
k2 = f (ti + c1 ∆t, yi + a1,1 k1 ∆t)
so that
yi+1 = yi + ∆t (b1 f (ti , yi ) + b2 f (ti + c1 ∆t, yi + a1,1 k1 ∆t))
Now, if we do a Taylor expansion of yi+1 about ti :
$$y_{i+1} = y_i + \dot{y}(t_i)\Delta t + \ddot{y}(t_i)\frac{\Delta t^2}{2} \;\Rightarrow\; y_{i+1} = y_i + f(t_i, y_i)\Delta t + \ddot{y}(t_i)\frac{\Delta t^2}{2}$$
To handle the ÿ term, we recognize that ÿ(ti ) = f˙(ti , yi ). Furthermore, y is implicitly
a function of time, y = y(t), so we can use the chain rule to differentiate f with respect
to time:
$$\dot{f}(t_i, y_i) = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial y}\frac{dy}{dt} = f_t + f_y f$$

Thus, our Taylor expansion of y is

$$y_{i+1} = y_i + f(t_i, y_i)\Delta t + \left(f_t(t_i, y_i) + f_y(t_i, y_i)f(t_i, y_i)\right)\frac{\Delta t^2}{2} \tag{7.7}$$
We can’t quite compare our Taylor series to our R-K step rule yet. Now, we need
to expand the k2 term in a Taylor expansion about (ti , yi ) so that we may directly
compare with our Taylor expansion in equation (7.7):
f (ti + c1 ∆t, yi + a1,1 k1 ∆t) = f (ti , yi ) + c1 ∆tft (ti , yi ) + a1,1 k1 ∆tfy (ti , yi ) + · · ·
Now, substituting back into our R-K equation
$$
\begin{aligned}
y_{i+1} &= y_i + \Delta t\left(b_1 f(t_i, y_i) + b_2\left(f(t_i, y_i) + c_1\Delta t\,f_t(t_i, y_i) + a_{1,1} k_1 \Delta t\,f_y(t_i, y_i)\right)\right)\\
&= y_i + \Delta t\,b_1 f(t_i, y_i) + \Delta t\,b_2 f(t_i, y_i) + \Delta t^2 b_2 c_1 f_t(t_i, y_i) + \Delta t^2 b_2 a_{1,1} k_1 f_y(t_i, y_i)\\
&= y_i + \Delta t\,f(t_i, y_i)(b_1 + b_2) + \Delta t^2 b_2 c_1 f_t(t_i, y_i) + \Delta t^2 b_2 a_{1,1} f(t_i, y_i) f_y(t_i, y_i)\\
&= y_i + f(t_i, y_i)\Delta t\,(b_1 + b_2) + \left(2 b_2 c_1 f_t(t_i, y_i) + 2 b_2 a_{1,1} f_y(t_i, y_i) f(t_i, y_i)\right)\frac{\Delta t^2}{2}
\end{aligned}
$$
Comparing the final equation to our Taylor expansion in equation (7.7), we get the
following conditions for our coefficients:
b1 + b2 = 1
2b2 c1 = 1
2b2 a1,1 = 1
We have 4 unknowns and only 3 equations to solve for them! This means there is an
infinite number of 2nd order accurate RK methods, we can choose 1 coefficient and
then determine the other three. The 3 most popular are
• The Midpoint Method — b2 = 1 ⇒ b1 = 0, c1 = a1,1 = 1/2
• Ralston’s Method — b2 = 3/4 ⇒ b1 = 1/4, c1 = a1,1 = 2/3
• Heun’s Method — b2 = 1/2 ⇒ b1 = 1/2, c1 = a1,1 = 1
The Midpoint Method
To solve

$$\frac{dy}{dt} = f(t, y), \qquad y(0) = y_0$$

we need a scheme that uses y_n ≡ y(t_n), f(t_n, y_n), and ∆t to find y_{n+1} ≈ y(t_{n+1}). Let us start by describing the midpoint method:

$$y_{n+\frac{1}{2}} = y_n + f(t_n, y_n)\frac{\Delta t}{2} \qquad \text{(predictor)} \tag{7.8}$$

$$y_{n+1} = y_n + f\left(t_{n+\frac{1}{2}},\, y_{n+\frac{1}{2}}\right)\Delta t \qquad \text{(corrector)} \tag{7.9}$$

Note that (7.8) uses the value of dy/dt(t_n) to predict the value of y(t_{n+1/2}). This predicted value is then used to estimate the corrected value y_{n+1} ≈ y(t_{n+1}).
value is then used to estimate the corrected value at yn+1 ≈ y(tn+1 )
Example Problem:
4t
1
dy
= 4.e 5 − y
dt
2
y(0) = 2; y(3) =?
(7.10)
(7.11)
The exact solution of this is
40 4t 14 −5
e 5 − e 2 ; y(3) = 33.677171176968
13
13
y(t) =
(7.12)
Using Midpoint rule
yn+ 1 =
2
4
yn + (4.e 5 tn − 12 yn ) ∆t
2
yn+1 = yn + (4.e
4
(t + ∆t
)
5 n
2
(7.13)
− 12 yn+ 1 )
2
(7.14)
Using ∆t = 1/2 and 12 function evaluations y(t = 3) = 33.770169 while the true
value= 33.67717. Why not just use forward Euler with half the time-step instead?
The answer is the midpoint method is more accurate. Its error is O(∆t2 ) (by design!)
while forward Euler’s error is O(∆t).
Using ∆t = 1/4 and 12 function evaluations Forward Euler yields y(t = 3) =
31.6432 true value= 33.67717 – a much larger error than the midpoint method above.
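A sketch of the predictor–corrector steps (7.13)–(7.14) for the example problem above; ∆t = 1/2 reproduces the 12-function-evaluation calculation quoted in the text:

% Sketch: midpoint method for dy/dt = 4*exp(4*t/5) - y/2, y(0) = 2
f  = @(t,y) 4*exp(4*t/5) - y/2;
dt = 1/2;  t = 0;  y = 2;
for n = 1:round(3/dt)
    yhalf = y + f(t, y)*dt/2;           % predictor, equation (7.8)
    y     = y + f(t + dt/2, yhalf)*dt;  % corrector, equation (7.9)
    t     = t + dt;
end
fprintf('midpoint y(3) = %f (exact 33.677171)\n', y);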
7.3.3 Fourth Order Explicit RK
A similar (rather tedious) process can be used to derive a fourth order method i.e.
error is O(∆t4 ). Again, there will be choices for the coefficients since there will be
fewer equations than coefficients when matching terms with the Taylor expansion.
The key point when designing these methods, then, is not only the order of accuracy,
but the number of times f (t, y) must be evaluated in the stage. That is, we want
to maximize the accuracy of the method and minimize the number of times f (t, y)
has to be evaluated. One can imagine for a large system of equations, f (t, y) may be
more expensive to evaluate. In this sense, the 4th order R-K methods are special:
fourth order accuracy can be achieved with four evaluations of f (t, y). Methods
higher than order 4 do not enjoy this benefit and, thus, the popularity of the fourth
order R-K methods. This method below is such an R-K scheme: it is fourth order
in accuracy and requires only four evaluations of f (t, y):
$$y_1 = y_n + f(t_n, y_n)\frac{\Delta t}{2} \tag{7.15}$$
$$y_2 = y_n + f\left(t_n + \frac{\Delta t}{2},\, y_1\right)\frac{\Delta t}{2} \tag{7.16}$$
$$y_3 = y_n + f\left(t_n + \frac{\Delta t}{2},\, y_2\right)\Delta t \tag{7.17}$$
$$y_{n+1} = y_n + \frac{1}{6}\left(f(t_n, y_n) + 2f\left(t_n + \frac{\Delta t}{2},\, y_1\right) + 2f\left(t_n + \frac{\Delta t}{2},\, y_2\right) + f(t_n + \Delta t,\, y_3)\right)\Delta t \tag{7.18}$$
Using ∆t = 1 (12 function evaluations), this scheme gives y(3) = 33.721348, versus the true value 33.67717.
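A sketch of the classical fourth order Runge–Kutta step (equivalent to (7.15)–(7.18), written in terms of the slopes k1–k4) applied to the same example problem; with ∆t = 1, three steps of four evaluations each give the 12 evaluations quoted above:

% Sketch: classical RK4 for dy/dt = 4*exp(4*t/5) - y/2, y(0) = 2
f  = @(t,y) 4*exp(4*t/5) - y/2;
dt = 1;  t = 0;  y = 2;
for n = 1:3
    k1 = f(t, y);
    k2 = f(t + dt/2, y + k1*dt/2);
    k3 = f(t + dt/2, y + k2*dt/2);
    k4 = f(t + dt,   y + k3*dt);
    y  = y + (k1 + 2*k2 + 2*k3 + k4)*dt/6;
    t  = t + dt;
end
fprintf('RK4 y(3) = %f (exact 33.677171)\n', y);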
Example 7.3.2. Fourth Order Runge-Kutta
See Chapra [1, Example 22.3, pg. 570]
7.3.4 MATLAB and RK methods
MATLAB has implemented several Runge-Kutta methods. In all cases, the user supplies a function handle for f(t, y), the time interval [t0, tfinal], and the initial condition y(0). ode45 is the de facto initial value problem solver in MATLAB. The 4 indicates that it's fourth order accurate. The 5 indicates that a fifth order error estimate is used (this embedded-pair idea is often called the Runge-Kutta-Fehlberg method). The idea is that we can compare the step produced by the fourth order method with the step produced by a fifth order method — this gives us an estimate of the error! Thus, if the error is too large, we shrink the time step size, ∆t, and try again. Such embedded methods are special because the fifth order error estimator reuses many of the function evaluations from the fourth order method. This is why the MATLAB methods return arrays for both the times and the solution — you very likely will not have uniform time steps following the application of the algorithm. There are other methods as well: ode23, ode113, etc. Additionally, there are methods designed for stiff problems: ode23s, ode15s, ode23t, etc. As always, use help ode45 etc. to get the full documentation.
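As a sketch, a minimal call for the example problem from Section 7.3.2 might look like the following (only the function handle, interval, and initial condition are required):

% Sketch: solving dy/dt = 4*exp(4*t/5) - y/2, y(0) = 2 on [0, 3] with ode45
f = @(t,y) 4*exp(4*t/5) - y/2;
[t, y] = ode45(f, [0 3], 2);    % t and y are column arrays; the steps are not uniform
fprintf('ode45 estimate of y(3): %f\n', y(end));
plot(t, y, '-o');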
7.4 Practice Problems
Work the following problems in Chapra [1]:
• 22.1
• 22.7
• 22.15
Chapter 8
Solution of Boundary Value Problems
Finite difference approximations are a very effective way of solving boundary value problems. First, we'll start with a specific example of heat transfer in a one-dimensional rod. Then, we'll consider a general second order, linear, ordinary differential equation with non-constant coefficients. Finally, we'll consider partial differential equations in a two-dimensional setting.
8.1 Heat Transfer in a One-Dimensional Rod
First, we’ll consider the case of pure Dirichlet boundary conditions, i.e. fixed temperatures on both ends of the rod. Then, we’ll consider the case of specifying a
Neumann boundary condition on one end, i.e. a specified heat flux.
8.1.1 One-Dimensional Rod with Fixed Temperature Ends
For the 1-D rod illustrated in Fig. 8.1, consider the simple problem of heat transfer along a rod between two defined temperatures at either end.

Figure 8.1: Heat transfer along a 1D rod from Ta to Tb (conduction along the rod of length L, convection to T∞, grid points 1–16).
$$\frac{d^2 T}{dx^2} + h(T_\infty - T) = 0, \qquad 0 < x < L \tag{8.1}$$
$$T(0) = T_a, \qquad T(L) = T_b \tag{8.2}$$
Equation 8.1 defines the temperatures on 0 < x < L and 8.2 the temperatures at
the boundaries x = 0, L. Equation 8.1 and 8.2 define the complete “boundary value
problem”.
To solve the differential equation, one possible approach could be to use the
formulae we developed for the 2nd order derivative. First, we “discretize” the one-dimensional domain into a set of N points xi, i = 1, . . . , N, shown in Figure 8.1 for
N = 16. Then, T1 = Ta and TN = Tb as those are our boundary conditions. For the
interior points, let Ti , i = 2, . . . , N − 1 be temperatures at fixed points along the rod.
We approximate the derivatives in the differential equation (8.1) by the appropriate
finite difference formulae. Here, we use the central difference formula for the second
derivative:
$$\frac{d^2 T}{dx^2} \approx \frac{T_{i+1} - 2T_i + T_{i-1}}{\Delta x^2}$$

Now, inserting into equation (8.1),

$$0 = \frac{T_{i+1} - 2T_i + T_{i-1}}{\Delta x^2} + h(T_\infty - T_i) \tag{8.3}$$

Or, multiplying through by ∆x² and rearranging, we have

$$-T_{i-1} + (2 + h\Delta x^2)T_i - T_{i+1} = h\Delta x^2 T_\infty \tag{8.4}$$
(8.4) is a template equation that holds inside the domain. At the ends we are given
the temperatures
$$T_1 = T_a, \qquad T_{16} = T_b \tag{8.5}$$

Using (8.4) and the above values for T1, T16 we can write the equations:

$$
\begin{aligned}
i = 2 &: \quad h\Delta x^2 T_\infty = -T_a + (2 + h\Delta x^2)T_2 - T_3\\
i = 3 &: \quad h\Delta x^2 T_\infty = -T_2 + (2 + h\Delta x^2)T_3 - T_4\\
i = 4 &: \quad h\Delta x^2 T_\infty = -T_3 + (2 + h\Delta x^2)T_4 - T_5\\
&\;\;\vdots\\
i = 15 &: \quad h\Delta x^2 T_\infty = -T_{14} + (2 + h\Delta x^2)T_{15} - T_b
\end{aligned}
$$
Combining and putting in matrix form,

$$
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
-1 & 2 + h\Delta x^2 & -1 & 0 & \cdots & 0\\
0 & -1 & 2 + h\Delta x^2 & -1 & \cdots & 0\\
\vdots & & \ddots & \ddots & \ddots & \vdots\\
0 & \cdots & \cdots & -1 & 2 + h\Delta x^2 & -1\\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{Bmatrix} T_1\\ T_2\\ T_3\\ \vdots\\ T_{15}\\ T_{16} \end{Bmatrix}
=
\begin{Bmatrix} T_a\\ h\Delta x^2 T_\infty\\ h\Delta x^2 T_\infty\\ \vdots\\ h\Delta x^2 T_\infty\\ T_b \end{Bmatrix}
\tag{8.6}
$$
Figure 8.2: Exact and approximate solution for Heat transfer along 1D rod
Or

$$[K]\{T\} = \{F\} \;\Rightarrow\; \{T\} = [K]^{-1}\{F\} \tag{8.10}$$

To get better results we will need to increase the number of intervals from 15 to much larger numbers, e.g. 100. For a test problem with Ta = 300 K, Tb = 400 K, T∞ = 200 K, L = 10 m, h = 0.05 m⁻², we can solve it for N = 100 and compare to the exact solution of the problem, obtained using the previous chapter's techniques:

$$T(x) = 200 + 20.4671\,e^{\sqrt{0.05}\,x} + 79.5329\,e^{-\sqrt{0.05}\,x}$$
The approximation is plotted against the exact in Fig 8.2.
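A sketch of assembling and solving the system (8.6) in MATLAB for the test problem just quoted; the number of points N is a free choice:

% Sketch: finite difference solution of d2T/dx2 + h*(Tinf - T) = 0, T(0)=Ta, T(L)=Tb
Ta = 300; Tb = 400; Tinf = 200; L = 10; h = 0.05; N = 100;
x  = linspace(0, L, N)';  dx = x(2) - x(1);
K  = zeros(N, N);  F = zeros(N, 1);
K(1,1) = 1;  F(1) = Ta;                      % Dirichlet condition at x = 0
K(N,N) = 1;  F(N) = Tb;                      % Dirichlet condition at x = L
for i = 2:N-1                                % template equation (8.4)
    K(i, i-1) = -1;  K(i, i) = 2 + h*dx^2;  K(i, i+1) = -1;
    F(i) = h*dx^2*Tinf;
end
T = K\F;                                     % solve [K]{T} = {F}
Texact = 200 + 20.4671*exp(sqrt(0.05)*x) + 79.5329*exp(-sqrt(0.05)*x);
plot(x, T, 'o', x, Texact, '-');  legend('finite difference', 'exact');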
Example 8.1.1. Finite Difference Approximation of BVPs
See Chapra [1, Example 24.5, pg. 629]
8.1.2 One-Dimensional Rod with Mixed Boundary Conditions
So far we have fixed T (0) = Ta , T (L) = Tb . These are called Dirichlet boundary conditions. Another option is to fix the derivative at one end; this is called a
Neumann boundary condition.
For example, at x = 0, set dT/dx = b. One way to treat this boundary condition is to replace the derivative by a suitable difference formula. To do so, we introduce a “ghost point”, T0. This point is not actually part of the grid, but allows us to write down the central difference formula at x1:

$$\frac{T_2 - T_0}{2\Delta x} = b \;\Rightarrow\; T_0 = T_2 - 2b\Delta x$$
Now write down the discretized governing equation at x1 and combine with the
difference formula above:
$$T_2 - (2 + h\Delta x^2)T_1 + T_0 = -h\Delta x^2 T_\infty$$
$$T_2 - (2 + h\Delta x^2)T_1 + T_2 - 2b\Delta x = -h\Delta x^2 T_\infty$$
$$(2 + h\Delta x^2)T_1 - 2T_2 = h\Delta x^2 T_\infty - 2b\Delta x \tag{8.11}$$
Note that because of the substitution, the “ghost point”, T0 is not part of the system,
since it could be written in terms of T2 . Now, we can write this linear system of
equations in matrix form:

  

$$
\begin{bmatrix}
2 + h\Delta x^2 & -2 & 0 & 0 & \cdots & 0\\
-1 & 2 + h\Delta x^2 & -1 & 0 & \cdots & 0\\
0 & -1 & 2 + h\Delta x^2 & -1 & \cdots & 0\\
\vdots & & \ddots & \ddots & \ddots & \vdots\\
0 & \cdots & \cdots & -1 & 2 + h\Delta x^2 & -1\\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{Bmatrix} T_1\\ T_2\\ T_3\\ \vdots\\ T_{15}\\ T_{16} \end{Bmatrix}
=
\begin{Bmatrix} h\Delta x^2 T_\infty - 2b\Delta x\\ h\Delta x^2 T_\infty\\ h\Delta x^2 T_\infty\\ \vdots\\ h\Delta x^2 T_\infty\\ T_b \end{Bmatrix}
$$
Now, as before, we can solve the linear system of equations for the values of Ti ,
i = 1, . . . , N .
Example 8.1.2. Incorporating Neumann Boundary Conditions
See Chapra [1, Example 24.6, pg. 632]
8.2 General Linear Second Order ODEs with Nonconstant Coefficients
Now we will consider the case of a general, linear, second order differential equation
with nonconstant coefficients. We’ll focus on the pure Dirichlet boundary condition
case, but it is easy enough to generalize to include Neumann boundary conditions,
as we saw in Section 8.1.2.
$$y'' + p(x)y' + q(x)y = r(x), \qquad 0 < x < L \tag{8.12}$$
$$y(0) = y_a \tag{8.13}$$
$$y(L) = y_b \tag{8.14}$$
for given functions p(x), q(x), and r(x).
As before, we discretize the line into points xi , i = 1, . . . , N . Then, we can apply
suitable finite difference formulae to the equation. Here, we’ll use a central difference
formula for the second derivative and a forward difference for the first derivative. As
before, we let pi = p(xi ), qi = q(xi ), and ri = r(xi ).
$$y_1 = y_a \tag{8.15}$$
$$\frac{y_{i+1} - 2y_i + y_{i-1}}{\Delta x^2} + p_i\frac{y_{i+1} - y_i}{\Delta x} + q_i y_i = r_i, \qquad i = 2, \ldots, N-1 \tag{8.16}$$
$$y_N = y_b \tag{8.17}$$
Rearranging terms, and multiplying by −∆x2 , we have
$$y_1 = y_a \tag{8.18}$$
$$-y_{i-1} + y_i\left(2 + p_i\Delta x - q_i\Delta x^2\right) + y_{i+1}\left(-1 - p_i\Delta x\right) = -r_i\Delta x^2, \quad i = 2, \ldots, N-1 \tag{8.19}$$
$$y_N = y_b \tag{8.20}$$
Rewriting in matrix form, we have

$$
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0\\
-1 & 2 + p_2\Delta x - q_2\Delta x^2 & -1 - p_2\Delta x & \cdots & 0 & 0\\
0 & -1 & 2 + p_3\Delta x - q_3\Delta x^2 & \ddots & 0 & 0\\
\vdots & & \ddots & \ddots & \ddots & \vdots\\
0 & 0 & \cdots & -1 & 2 + p_{N-1}\Delta x - q_{N-1}\Delta x^2 & -1 - p_{N-1}\Delta x\\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{Bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_{N-1}\\ y_N \end{Bmatrix}
=
\begin{Bmatrix} y_a\\ -r_2\Delta x^2\\ -r_3\Delta x^2\\ \vdots\\ -r_{N-1}\Delta x^2\\ y_b \end{Bmatrix}
$$
Notice that the example in Section 8.1 is captured here by setting p(x) = 0,
q(x) = −h, and r(x) = −hT∞ .
8.3 Two Dimensional Equations
Now we consider boundary value problems for two-dimensional domains. That is,
we’ll consider the application of finite difference methods to partial differential equations. All of the ideas follow very naturally from the one-dimensional case. The only
difference is now how to manage the indexing of the two-dimensional grid of points into
the linear system. We’ll focus on the Poisson equation: ∆u = f . Furthermore, we’ll
only consider Dirichlet boundary conditions; similar ideas to the one-dimensional
case apply for Neumann boundary conditions.
Consider the Poisson equation on the rectangular domain Ω = [0, a] × [0, b], with
Dirichlet boundary conditions:
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f, \qquad (x, y) \in \Omega \tag{8.21}$$
$$u(0, y) = u_l \tag{8.22}$$
$$u(a, y) = u_r \tag{8.23}$$
$$u(x, 0) = u_b \tag{8.24}$$
$$u(x, b) = u_t \tag{8.25}$$
for a given forcing function f(x, y). As before, we discretize the domain. Now we'll have a two-dimensional array of points, say N points in the x-direction and M points in the y-direction, and two indices to track points on the grid: (i, j), i = 1, . . . , N; j = 1, . . . , M. That is, each point in the grid lies at (xi, yj).
Now we apply central difference rules to both the partial derivatives:

$$\frac{\partial^2 u}{\partial x^2} \approx \frac{u_{i-1,j} - 2u_{i,j} + u_{i+1,j}}{\Delta x^2} \tag{8.26}$$
$$\frac{\partial^2 u}{\partial y^2} \approx \frac{u_{i,j-1} - 2u_{i,j} + u_{i,j+1}}{\Delta y^2} \tag{8.27}$$

Substituting into our original partial differential equation, we have

$$\frac{u_{i-1,j} - 2u_{i,j} + u_{i+1,j}}{\Delta x^2} + \frac{u_{i,j-1} - 2u_{i,j} + u_{i,j+1}}{\Delta y^2} = f_{i,j} \tag{8.28}$$
$$u_{1,j} = u_l, \quad j = 1, \ldots, M \tag{8.29}$$
$$u_{N,j} = u_r, \quad j = 1, \ldots, M \tag{8.30}$$
$$u_{i,1} = u_b, \quad i = 1, \ldots, N \tag{8.31}$$
$$u_{i,M} = u_t, \quad i = 1, \ldots, N \tag{8.32}$$
where, as before, fi,j = f (xi , yj ).
In the one-dimensional case, there was a direct correspondence between the index
of the equation and the index of the matrix row. Here, the situation is more complicated. We track the points in the grid using two indices, but have to convert those
two indices into a single index into the matrix equation. The simplest possibility,
since we have a nice structured grid, is to count from left to right, bottom to top;
i.e. start at j = 1, move i from 1 to N , then move to j = 2, then, again, move i from
1 to N , etc. This is conveniently expressed by the following integer function:
$$k = i + (j - 1)N \tag{8.33}$$

The integer k corresponds to the entry in the matrix for a given (i, j) and takes values k = 1, . . . , NM. So now, as we consider the equation at each point in the domain, i.e. a specific (i, j), we can directly map that to a row in the linear system.
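As a sketch, the index map (8.33) turns the assembly of the discrete Poisson system (8.28) into a double loop. The grid sizes, the forcing function, and the zero boundary values below are arbitrary illustration choices:

% Sketch: assemble and solve the discrete Poisson problem (8.28) with k = i + (j-1)*N
N = 20;  M = 20;  a = 1;  b = 1;
dx = a/(N-1);  dy = b/(M-1);
ffun = @(x,y) -2*pi^2*sin(pi*x).*sin(pi*y);   % assumed forcing, exact u = sin(pi*x)*sin(pi*y)
A = zeros(N*M);  rhs = zeros(N*M, 1);
for j = 1:M
    for i = 1:N
        k = i + (j-1)*N;                      % equation (8.33)
        if i == 1 || i == N || j == 1 || j == M
            A(k,k) = 1;  rhs(k) = 0;          % Dirichlet boundary value u = 0
        else
            A(k, k)   = -2/dx^2 - 2/dy^2;
            A(k, k-1) = 1/dx^2;   A(k, k+1) = 1/dx^2;   % neighbors in x
            A(k, k-N) = 1/dy^2;   A(k, k+N) = 1/dy^2;   % neighbors in y
            rhs(k) = ffun((i-1)*dx, (j-1)*dy);
        end
    end
end
u = reshape(A\rhs, N, M);                     % recover u(i,j) from the single index k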
8.4 Practice Problems
Work the following problems in Chapra [1]:
• 24.8(b)
• 24.12
Chapter 9
Solution of Eigenproblems
In Chapter 2, we discussed the formulation and solution of eigenproblems Ax = λx.
The methods of solution we considered were all based on finding the roots of the
characteristic polynomial. However, that’s only practical for small problems. In
this case, we can specify exactly how small: dimension 4. Why? It was proven in the 1800's (the Abel–Ruffini theorem) that the roots of general polynomials of degree 5 or higher cannot be written in closed form. That means we must use numerical methods for solving practical eigenproblems.
However, numerically finding the roots of polynomials is not the most practical
method. We will consider three methods: the power method, used for finding the
largest eigenvalue and corresponding eigenvector, the inverse power method, which
computes the smallest eigenvalue and its eigenvector, and the QR method, which
computes the entire spectrum at once.
This will also be our first encounter with iterative methods. That is, the numerical methods we’ve encountered so far have been “direct” methods, i.e. we can
directly compute the solution and we can count how many operations it will take to
do so. This is not the case with iterative methods. These methods update the solution and we must continually check if we solved the problem to the desired accuracy.
There’s no way to predict ahead of time, except in the most trivial of circumstances,
how many steps it will take. Thus, we will be introducing a new kind of error: the
error incurred by only approximately solving our numerical problem.
9.1 Power Method
The power method is perhaps the most classical approach to solving eigenproblems.
The power method (or power iteration) will yield an approximation of the largest
eigenvalue and its corresponding eigenvector. The idea is very simple. First, we
make a guess of what the eigenvector is, call it z. Then, compute w = Az. If z is
an eigenvector, then, for any component k,
$$w_k = \sum_j A_{kj} z_j = \lambda z_k \tag{9.1}$$
$$\Rightarrow\; \lambda = \frac{w_k}{z_k} \tag{9.2}$$
That is, if z was an eigenvector, we can directly compute the corresponding eigenvalue. If z is not an eigenvector, then we use w as the next guess for the eigenvector.
And we repeat this process. So, if our initial guess is z0 , then
w0 = Az0
z1 = w0
w1 = Az1 = Aw0 = A2 z0
...
This is where the name “Power Method” comes from: we are effectively repeatedly multiplying by A. How do we extract, then, the eigenvalue? At each iteration, we always normalize z so that ‖z‖∞ = 1, i.e. we rescale z so that its largest component is 1. Why? That largest component will converge to the eigenvalue. That is, looking at Equation (9.2), if z is getting closer to the eigenvector and its largest component has the value of 1, then the factor we use to normalize z (the largest value of w) will be the eigenvalue. So, the final procedure for the Power Method is as follows. Repeat until the convergence tolerance is reached:

1. Provide initial guess for z.
2. Compute w = Az.
3. Set z = w / ‖w‖∞.

The maximum value of w will be the eigenvalue and z will be the eigenvector.
How do we assess convergence? There are several choices. First, we can check
and see if the eigenvalue is not changing within each iteration:
|λi − λi−1 | < εtol
(9.3)
where i indicates the current iteration and εtol is a user-supplied tolerance. Although
such a test is practical, it can also be misleading since, in many iterative methods,
the convergence may stagnate, i.e. the value changes very little between iterations,
but we are not solving the original problem. Thus, it is also useful to check the
residual :
ri = Azi − λi zi
(9.4)
But r is a vector — how can we check if a vector is “small”? This is another use for
norms. Norms give us a way of measuring the size of vectors. So, in this case, our
residual error check would be
$$\|r_i\| = \|Az_i - \lambda_i z_i\| < \varepsilon_{tol} \tag{9.5}$$

Typically, we use the Euclidean norm.

Example 9.1.1. Power Method
Consider the matrix

$$A = \begin{bmatrix} 40 & -20 & 0\\ -20 & 40 & -20\\ 0 & -20 & 40 \end{bmatrix}$$
We will perform several iterations of the Power Method to estimate the largest eigenvalue. We use the initial guess z = [1, 1, 1]T . Then, w = Az = [20, 0, 20]T . We
need to normalize for the next iteration so we extract the largest value (in absolute
value) and normalize. So, max w = 20 and z = [1, 0, 1]T . So after 1 iteration, our
approximation of the largest eigenvalue is 20 and the corresponding eigenvector is
[1, 0, 1]T . Again, w = Az = [40, −40, 40]T . Extracting the largest value gives 40
and z = [1, −1, 1]T , our current approximation for the eigenpair. Repeating again,
w = Az = [60, −80, 60]. So, the largest value now is −80 and normalizing gives
z = [−3/4, 1, −3/4]T . Repeating again, w = Az = [−50, 70, −50]. Normalizing
gives the largest value as 70 and z = [−5/7, 1, −5/7]T ; this is our estimate of the
eigenpair after 4 iterations of the Power method. The exact values, according to the
MATLAB eig command are λ = 68.28427 . . . and x = [−0.707107, 1, −0.707107]T .
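A sketch of the Power Method iteration for the example matrix above; the tolerance and the maximum number of iterations are arbitrary choices:

% Sketch: Power Method for the largest (in magnitude) eigenvalue of A
A = [40 -20 0; -20 40 -20; 0 -20 40];
z = [1; 1; 1];  lam = 0;  tol = 1e-8;
for k = 1:100
    w = A*z;
    [~, idx] = max(abs(w));          % component largest in absolute value
    lam_new = w(idx);                % eigenvalue estimate (the normalizing factor)
    z = w/lam_new;                   % rescale so the largest component is 1
    if abs(lam_new - lam) < tol, break; end
    lam = lam_new;
end
fprintf('lambda = %g (eig gives 68.28427)\n', lam_new);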
9.2 Inverse Power Method
The Inverse Power Method is very easy to understand once we understand the Power
Method. We can rewrite our original eigenproblem as
$$A^{-1}x = \frac{1}{\lambda}x \tag{9.6}$$
If we now apply the power method to Equation (9.6), we will be computing the
largest value of 1/λ. This means that we’ll be computing the smallest value of λ, i.e.
the smallest eigenvalue of A! The algorithm is very similar to the Power Method.
Repeat until convergence tolerance is reached:
1. Provide initial guess for z.
2. Compute w = A−1 z.
3. Set z = w / ‖w‖∞.
The maximum value of w will be the eigenvalue (1/λ) and z will be the eigenvector.
Of course, the primary difference is now we’re not simply multiplying by A, but rather
we have to solve a linear system at each iteration. This is much more expensive than
the Power method! Of course, one can perform an LU decomposition and store L
and U to reuse for each linear solve, but this initial step can still be quite costly for
even modest sized matrices.
9.3 Shifted Inverse Power Methods
With the Power Method and Inverse Power Method, we can now compute the largest
and smallest eigenvalues. What about others? We will take advantage of a shifting
property of matrices to examine the other eigenvalues. Namely, if we take our original
eigenproblem and subtract the vector τ x, with τ some number, from both sides, we
have
(A − τ I) x = (λ − τ ) x
(9.7)
That is, if we shift the matrix, we also correspondingly shift the eigenvalues. In
particular, we use this to shift the eigenvalues for the Inverse Power Method, thereby
controlling which of the (shifted) eigenvalues is the smallest. Although this is not
practical for computing the entire spectrum, it can be useful for computing a handful
of eigenvalues and for accelerating convergence of desired eigenvalues.
9.4 QR Iteration
Although it can often be useful to compute a few of the smallest and/or largest eigenvalues of a matrix, it can also be useful to compute all eigenvalues and eigenvectors.
Computing this whole spectrum is the job of the QR iteration. The QR iteration
is extremely simple to write down, but has a heavy theoretical burden that we will not fully explore. Nevertheless, the QR iteration is based on the existence of the QR
decomposition of a matrix: A = QR where Q is an orthogonal matrix and R is an
upper triangular matrix. An orthogonal matrix satisfies two important properties:
det Q = ±1 and Q−1 = QT .
If we assume that such a QR decomposition exists, then the QR iteration is very
simple. Repeat until convergence:
1. Compute the QR decomposition of A = QR.
2. Set A = RQ
This second step in the repeating iteration can be seen in a different way. Namely, since QR = A, we have R = Q^T A (since Q is an orthogonal matrix), so when we reset A = RQ, what we have is A = Q^T AQ. Thus we are repeatedly transforming A
using the orthogonal matrices Q. These repeated similarity transformations will
yield the entire spectrum, namely the diagonal of the final A will be the eigenvalues
and the product of all the matrices Q will contain the eigenvectors.
Computing the QR decomposition is an O(n³) operation. Thus, we would be performing an O(n³) operation at every iteration of the QR iteration! In actual implementations of such methods, the matrix A is first reduced to upper Hessenberg form,
upper triangular form plus one nonzero subdiagonal, for which the QR decomposition is O(n2 ) (O(n) if the matrix is symmetric!). Such strategies are at the heart
of the MATLAB command eig for computing eigenvalues and eigenvectors. This
is also why the cost of computing the entire spectrum can be quite large for even
modest sized matrices.
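A sketch of the bare (unshifted) QR iteration using MATLAB's built-in qr, applied to the symmetric example matrix from Section 9.1; a practical implementation would first reduce A to Hessenberg form, as discussed above:

% Sketch: plain QR iteration; diag(A) converges to the eigenvalues
A = [40 -20 0; -20 40 -20; 0 -20 40];
V = eye(size(A));                 % accumulates the orthogonal transformations
for k = 1:50
    [Q, R] = qr(A);
    A = R*Q;                      % similarity transform A <- Q'*A*Q
    V = V*Q;                      % columns approach the eigenvectors
end
disp(diag(A));                    % compare with eig of the original matrix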
Chapter 10
Nonlinear Equations
Engineering analysis proceeds in a three stage iterative process – first we must carry
out a careful set of observations to determine what data and information about the
problem are available; second we must develop our basic knowledge of the physics,
feasibility constraints, etc., into a mathematical analysis of the problem; and, finally
we must solve the mathematical problem numerically to obtain the desired results
of the analysis. Let us now explore this in the context of a simple example.
10.1 Example Problem: Rocket
A rocket with initial mass m0 (shell + fuel) is fired vertically at time t0 . Fuel is
consumed at a constant rate:
$$q = \frac{dm}{dt} \tag{10.1}$$
and is expended at constant speed u relative to the rocket.
We wish to determine:
1. Velocity as a function of time t, neglecting air resistance.
2. The time t at which the rocket will reach velocity v.
3. The size of rocket (initial mass) needed to reach a given velocity at a given
time.
10.1.1 Problem Formulation

Principles of Impulse and Momentum:

$$\vec{I} = \Delta\vec{p} \tag{10.2}$$

At time t:
• Force: W(t) = m(t)g
• Momentum: Pz(t) = m(t)v(t)
Figure 10.1: Rocket Diagram
Figure 10.2: Time t
Figure 10.3: Time t + ∆t
At time t + ∆t
• W (t + ∆t) = [m(t) − ∆m]g + ∆mg
• Pz (t + ∆t) = [m(t) − ∆m]v(t + ∆t) + ∆m[v(t + ∆t) − u]
• ∆m = q∆t
10.1.2 Problem Solution

$$\int_t^{t+\Delta t} -W(t)\,dt = P_z(t + \Delta t) - P_z(t) \tag{10.3}$$
$$-m(t)g\Delta t = m(t)\left[v(t + \Delta t) - v(t)\right] - qu\Delta t \tag{10.4}$$

Divide by ∆t:

$$-m(t)g = m(t)\frac{v(t + \Delta t) - v(t)}{\Delta t} - qu \tag{10.5}$$

Take the limit as ∆t → 0:

$$-m(t)g = m(t)\frac{dv}{dt}(t) - qu \tag{10.6}$$

Separate variables and integrate:

$$\int_0^v dv = \int_0^t \left(\frac{qu}{m_0 - qt} - g\right) dt \tag{10.7}$$
$$v(t) = u\ln\left(\frac{m_0}{m_0 - qt}\right) - gt \tag{10.8}$$
$$f(t) = u\ln\left(\frac{m_0}{m_0 - qt}\right) - gt - v = 0 \tag{10.9}$$
To answer the first question we posed – velocity as a function of time t, all we have
to do is evaluate 10.8 for given values of m0 , q and g. To answer the second, time
at which the rocket will reach a desired velocity, we need to solve 10.9 i.e. given
v, m0 , q, g, find the roots of f (t) = 0.
For another example on finding roots of a non-linear equation, refer to Chapter
5 of Ref [1].
10.2 Solving Non-Linear Equations
First we note that the “equation” is not of the form t = g(...) – no explicit expression
for t as a function of everything else is possible. Secondly, the equation is non-linear
in the m0 , q, and t variable. How do we know this? We can test the equation for
linearity.
10.2.1 Test for Linearity
Suppose we have an equation
f (v) = 0
If we replace v by a linear combination of variables, e.g. 2v1 + v2, and if f(v) is a linear function (note: not an affine function), we will now get

f(2v1 + v2) = 2f(v1) + f(v2) = 0
For 10.9, clearly, this is not possible – the log function precludes this! Thus 10.9 is
non-linear in t, m0 and q. Most of the time you can find this by inspection.
10.2.2 Methods of Solution
Let’s now frame some solution strategies. The solutions have to be searched for
inside a possible range of values. The simplest and first thing to do is graph the
function and get a sense of its behavior.
1. Incremental Search This is the simplest strategy. Simply divide up a range t ∈ [a, b] into smaller increments [t0, t1, t2, ..., tM] and keep evaluating the function until, for some ti, |f(ti)| < ε, a defined tolerance. ti is then the desired root.
2. Bisection This is a smarter search strategy which uses the bisection method to speed up the search. The core idea is that for f(troot) = 0 the function must change sign to the left and right of troot. So if we start with an interval [a, b] on which f(t) changes sign, the root must lie in there. Checking the sign of f((a+b)/2) will then tell us which half contains the sign change, and hence the root. The procedure can be applied recursively until we are happy with the accuracy of the root. The algorithm to implement this scheme is:

   Input m0, vdesired, q and range [a, b]
   1. Check data for consistency
   2. Evaluate f(a), f(b) and check that the sign changes, i.e. f(a)·f(b) < 0
   3. Evaluate f(m) = f((a+b)/2)
   4. If f(m)·f(b) < 0 then set a = m, else set b = m
   Repeat 3, 4 until |f(m)| < ε

   A simple MATLAB code to implement this is shown in Figure 10.4.
It should be noted that if f 0 (troot ) = 0, the bisection method might not work
within the given bounds – this is the case when the function does not change
sign within the given bounds (even though a root does exist) .
Exercise: Modify the sample MATLAB code to answer the third question i.e.
estimate the size of rocket needed to attain a desired velocity after a specified
amount of time.
For further practice in using the Bisection method, see Examples 5.3 and 5.4
in Ref [1].
3. Newton-Raphson The Taylor series expansion of a function f(x) about a known value f(x1) is given by:

$$f(x) = f(x_1) + (x - x_1)f'(x_1) + (x - x_1)^2\frac{f''(x_1)}{2!} + \cdots \tag{10.10}$$

   We can approximate to “first order” as

$$f(x) \approx f(x_1) + (x - x_1)f'(x_1) + O\left((x - x_1)^2\right) \tag{10.11}$$

   Near a possible root x1,

$$f(x) = 0 = f(x_1) + (x - x_1)f'(x_1) \tag{10.12}$$

   What this equation is showing is that we can estimate f(x) by its value at x1, f(x1), plus a correction given by the slope f'(x1) times the change in x, x − x1, i.e. a straight-line approximation of the function. Solving for x,

$$x = x_1 - \frac{f(x_1)}{f'(x_1)} \tag{10.13}$$

   Repeating,

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \tag{10.14}$$

   For sufficiently small x − x1 this will always work, but “sufficiently small” may be too small to be practical! Newton's method converges fast, as the error is of order (x − x1)², BUT if this gap is large we might not converge at all. This method also needs the derivative to be computable – this may not always be possible, though often we can get an approximation. If f has repeated roots, e.g. f(x) = (x − 3)², then f'(x) = f(x) = 0 for x = 3; the method will break down since we will end up dividing by zero!
Example 10.2.1. Examples 6.2, 6.3, and 6.4 of Ref [1].
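A sketch of the Newton-Raphson iteration (10.14) applied to the rocket equation (10.9); the parameter values and the initial guess below are assumed illustration values, and the derivative f'(t) is obtained by hand from (10.9):

% Sketch: Newton-Raphson for f(t) = u*log(m0/(m0 - q*t)) - g*t - v = 0
u = 2000; m0 = 150000; q = 2700; g = 9.81; v = 750;   % assumed illustration values
f  = @(t) u*log(m0./(m0 - q*t)) - g*t - v;
fp = @(t) u*q./(m0 - q*t) - g;                        % derivative of f
t = 10;                                               % initial guess (assumed)
for k = 1:20
    dt = -f(t)/fp(t);
    t  = t + dt;                                      % x_{i+1} = x_i - f(x_i)/f'(x_i)
    if abs(dt) < 1e-10, break; end
end
fprintf('root t = %g s, f(t) = %g\n', t, f(t));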
4. Secant A variation of the Newton-Raphson method – replace f' by a difference of successive iterates:

$$f'(x_i) \approx \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}$$
Example 10.2.2. Example 6.5 of Ref [1].
MATLAB Code for Bisection to find roots of 10.9

% Adapted from Fausett, Applied Numerical Analysis Using Matlab (2008)
% Root finding by bisection
% Let's get the variables input
vdes = input('desired vel ');
M0 = input('initial mass ');
q  = input('Burn Rate ');
u  = input('exhaust vel ');
% Define the function (x is time, M0 is initial mass ...)
f = @(x) (u*(log(M0) - log(M0 - q*x)) - vdes)/9.81 - x;
% Set initial range
a = input('search left ');
b = input('search right ');
% Make sure that poorly chosen arguments don't kill the code
if (M0 - q*b) <= 0, error('negative argument to log'), end
% Set tolerance and max iterations
kmax = 500;
tol = 0.001;
fa = f(a);
fb = f(b);
% Check that bisection will work
if sign(fa) == sign(fb), error('function has same sign at end points - bisection will not work'), end
disp(' step a b m fm bound')
for k = 1:kmax
    m = (a + b)/2;
    fm = f(m);
    iter = k;
    bound = (b - a)/2;
    out = [iter, a, b, m, fm, bound];
    disp(out)
    if abs(fm) < tol, disp('bisection has converged'); break; end
    outerr = abs(fm);
    % Check which half of the domain contains the root
    if fm*fa > 0
        a = m;
        fa = fm;
    else
        b = m;
        fb = fm;
    end
    bb(k) = m;
    err(k) = fm;
    if (iter >= kmax), disp('zero not found to desired tolerance'),
        fprintf('err = %e \n', outerr), end
end
% Just an optional set of variables to make a nice plot of the function
for k = 1:kmax
    fr(k) = f(a + (b - a)*k/kmax);
    tim(k) = a + (b - a)*k/kmax;
end
subplot(1,2,1), plot(tim, fr);
subplot(1,2,2), plot(err, '*');

Figure 10.4: Code for solving 10.9 using the bisection method
5. Regula Falsi A variation of the bisection method. Instead of “bisecting” the interval at each iteration, we instead use a linear approximation. That is, we approximate the function f(x) by a linear function that passes through the points f(a) and f(b). Where this linear approximation is zero determines the next guess of the root, xr:

$$x_r = b - \frac{f(b)(a - b)}{f(a) - f(b)} \tag{10.15}$$
Example 10.2.3. Example 5.5 of Ref [1].
6. Fixed Point Iteration Rewrite f (x) = 0 as
x = g(x)
A point x that satisfies x = g(x) is said to be a fixed point of the function g(x).
So, we are looking for fixed points of g(x), but because the equation x = g(x)
was based on f (x) = 0, solving this fixed point problem will also solve our
original root problem. The iteration then proceeds as
xi+1 = g(xi ), i = 1, 2, 3, ...
(10.16)
   Stop when

   |x_{i+1} − g(x_{i+1})| < ε  or  |f(x_{i+1})| < ε

   This process converges (see below) if

   |g'(x)| < 1
Example 10.2.4. Example 6.1 of Ref [1].
10.3 Convergence
To be useful, iterative numerical methods like the ones we have discussed above
must reduce error to a tolerable level and in the limit (if we have infinite computing
resources!) reduce it to zero. We analyze the methods above to see if they satisfy
this. Usually, most methods will do it under restrictions – these are the conditions
we need to make sure our problem observes before we apply the method to that
problem.
10.3.1 Bisection
Let us look at the interval length h_{i+1} at iteration i + 1. Since the root is in the interval [a_{i+1}, b_{i+1}], the maximum error satisfies

$$|e_{i+1}| < h_{i+1} \tag{10.17}$$
$$h_{i+1} = b_{i+1} - a_{i+1} = \frac{1}{2}h_i = \frac{1}{2^2}h_{i-1} = \cdots = \frac{1}{2^{i+1}}h_0 = \frac{1}{2^{i+1}}(b - a) \tag{10.18}$$

Taking logarithms and rearranging, the number of iterations needed to guarantee an error |e_i| satisfies

$$i \geq \frac{\log_{10} h_0 - \log_{10}|e_i|}{\log_{10} 2} - 1 \tag{10.19}$$

Thus, given a desired accuracy e_i, we can estimate a bound on the number of iterations. Note that the above argument breaks down if there is more than 1 root. Thus, we can apply this analysis to bisection only if there is 1 root in [a_{i+1}, b_{i+1}].
10.3.2 Newton-Raphson
Let the correct root be x* for f(x) = 0, i.e. f(x*) = 0. Let x_n, x_{n+1} be estimates at the n and n+1 iterations such that |x* − x_n| = δ << 1. We can define errors e_n = x* − x_n, e_{n+1} = x* − x_{n+1}. By Taylor series,

$$0 = f(x^*) = f(x_n) + f'(x_n)(x^* - x_n) + \frac{f''(\xi)}{2}(x^* - x_n)^2 \tag{10.20}$$

for some ξ ∈ (x*, x_n), using the remainder formula for the Taylor series. By Newton-Raphson,

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \;\Rightarrow\; f(x_n) = f'(x_n)(x_n - x_{n+1}) \tag{10.21}$$

Using (10.21) in (10.20) we have

$$0 = f'(x_n)(x_n - x_{n+1}) + f'(x_n)(x^* - x_n) + \frac{f''(\xi)}{2}(x^* - x_n)^2 = f'(x_n)e_{n+1} + \frac{f''(\xi)}{2}e_n^2 \tag{10.22}$$

$$\Rightarrow\; e_{n+1} = -\frac{f''(\xi)}{2f'(x_n)}e_n^2, \qquad \text{or} \qquad e_{n+1} \propto e_n^2 \tag{10.23}$$
This is the quadratic convergence property of Newton’s method. This means
that, if we’re close enough to the solution, the error at the next iteration will be the
square of the error at the current iteration, i.e. if the error at our current iteration
is ≈ 10−3 , then the error at the next iteration will be ≈ 10−6 , and then ≈ 10−12 after
that. This is very fast convergence!
10.3.3 Fixed Point
Let the correct root be x*.

$$x^* = g(x^*) \tag{10.24}$$

Subtracting (10.16) and dividing by x* − x_i,

$$x^* - x_{i+1} = g(x^*) - g(x_i) \tag{10.25}$$
$$\frac{x^* - x_{i+1}}{x^* - x_i} = \frac{g(x^*) - g(x_i)}{x^* - x_i} \tag{10.26}$$

Using the mean value theorem of calculus (g is a continuous function defined on the interval [x*, x_i], so there must exist a value ξ ∈ [x*, x_i] for which g'(ξ) is given by the ratio of the difference of the values of g at each end and the length of the interval),

$$\frac{e_{i+1}}{e_i} = \frac{g(x^*) - g(x_i)}{x^* - x_i} = g'(\xi) \quad \text{for } \xi \in [x^*, x_i] \tag{10.27}$$
$$e_{i+1} = g'(\xi)e_i \tag{10.28}$$

For convergence the error must reduce, i.e. |e_{i+1}| < |e_i|. Thus,

$$|g'(\xi)| < 1$$
10.4 Nonlinear Systems of Equations
To this point, we have only considered scalar nonlinear equations, i.e. only a single equation with one independent variable. Just as we can have systems of linear
equations, studied extensively in previous chapters, we can also have systems of nonlinear equations. That is, we can have multiple equations with multiple independent
variables, with a nonlinear dependence on the independent variables. Such systems
arise in many different aspects of science and engineering.
The first step is always to transform the equations into a root problem, namely
f(x) = 0, or, written more explicitly
f1 (x1 , x2 , x3 , . . . , xn ) = 0
f2 (x1 , x2 , x3 , . . . , xn ) = 0
f3 (x1 , x2 , x3 , . . . , xn ) = 0
..
.
fn (x1 , x2 , x3 , . . . , xn ) = 0
Now we have n equations with n unknowns, with a nonlinear dependence in one or
more of the independent variables.
Example 10.4.1. Simple Nonlinear System of Equations
The following system of equations is nonlinear in the unknowns x1 , x2 , x3 :
$$x_1^2 + 2x_2^2 + 4x_3 = 7$$
$$2x_1 x_2 + 5x_2 - 7x_2 x_3^2 = 8$$
$$5x_1 - 2x_2 + x_3^2 = 4$$
Example 10.4.2. Intersection of a Circle and an Ellipse
The intersection of geometric shapes provides more interesting examples of nonlinear systems. Consider the intersection of a circle and an ellipse:

$$\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1$$
$$x^2 + y^2 = r^2$$
We are looking for points (x, y) where the two curves intersect, i.e. the values of
(x, y) that satisfy both of the equations. Depending on the parameters of the ellipse
and the circle, there will be differing numbers of solutions – i.e., there could be 0 to
4 points of intersection between the ellipse and the circle.
To solve such systems of equations, we can use strategies similar to those used
for scalar equations. We will consider two algorithms here: the fixed-point method
and Newton’s method (also called the Newton-Raphson method). By far, Newton’s
method is the most widely used. One key difference in the case of a nonlinear system
of equations is assessing convergence. In the case of a single equation (scalar case),
we could simply check |f (xi )| < εtol since f (x) was a scalar function. Now, we have
a vector f (x). How do we check the tolerance here? We use norms! Norms give
us a measure of the magnitude of a vector; in this case allowing us to check if the
magnitude of our vector is small enough, i.e. ‖f(x)‖ < εtol. Typically, the Euclidean norm (2-norm) is used, ‖f(x)‖₂ (cf. Equation (5.6)).
10.4.1 Fixed-Point Method
As with the scalar case, we will rewrite the root problem, f (x) = 0 into the form
x = g(x). Written more explicitly, we have
$$
\begin{aligned}
f_1(x_1, x_2, x_3, \ldots, x_n) = 0 &\quad\Rightarrow\quad x_1 = g_1(x_1, x_2, x_3, \ldots, x_n)\\
f_2(x_1, x_2, x_3, \ldots, x_n) = 0 &\quad\Rightarrow\quad x_2 = g_2(x_1, x_2, x_3, \ldots, x_n)\\
f_3(x_1, x_2, x_3, \ldots, x_n) = 0 &\quad\Rightarrow\quad x_3 = g_3(x_1, x_2, x_3, \ldots, x_n)\\
&\;\;\vdots\\
f_n(x_1, x_2, x_3, \ldots, x_n) = 0 &\quad\Rightarrow\quad x_n = g_n(x_1, x_2, x_3, \ldots, x_n)
\end{aligned}
$$
Then, as in the scalar case, given an initial guess x0 , the iteration proceeds as
xi+1 = g(xi ),
i = 0, 1, 2, . . .
We continue until we reach an acceptable tolerance.
(10.29)
Example 10.4.3. Formulate Nonlinear System for Fixed-Point Method
Consider the following root problem:
$$f_1(x_1, x_2, x_3) = x_1^2 + 50x_1 + x_2^2 + x_3^2 - 200 = 0$$
$$f_2(x_1, x_2, x_3) = x_1^2 + 20x_2 + x_3^2 - 50 = 0$$
$$f_3(x_1, x_2, x_3) = -x_1^2 - x_2^2 + 40x_3 + 75 = 0$$

We can rearrange the equations into fixed-point form:

$$x_1 = g_1(x_1, x_2, x_3) = \frac{200 - x_1^2 - x_2^2 - x_3^2}{50}$$
$$x_2 = g_2(x_1, x_2, x_3) = \frac{50 - x_1^2 - x_3^2}{20}$$
$$x_3 = g_3(x_1, x_2, x_3) = \frac{x_1^2 + x_2^2 - 75}{40}$$
Example 10.4.4. Example 12.3 of Ref [1].
10.4.2 Newton-Raphson Method
In the scalar case, we derived Newton’s method by considering a truncated Taylor
series. We will do the same here, but now we have a system of equations, so we must
do a Taylor series for each equation. Consider the system of nonlinear equations:
f1 (x1 , x2 , x3 , . . . , xn ) = 0
f2 (x1 , x2 , x3 , . . . , xn ) = 0
f3 (x1 , x2 , x3 , . . . , xn ) = 0
..
.
fn (x1 , x2 , x3 , . . . , xn ) = 0
Now we expand each equation in a Taylor series up to first order about the point xi .
Note that since we have multiple independent variables, the Taylor series will involve
partial derivatives with-respect-to each of the independent variables. We use the
notation f1,i+1 = f1 (x1,i+1 , x2,i+1 , x3,i+1 , . . . , xn,i+1 ), f1,i = f1 (x1,i , x2,i , x3,i , . . . , xn,i ),
etc.
$$
\begin{aligned}
f_{1,i+1} &= f_{1,i} + \frac{\partial f_{1,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{1,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{1,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})\\
f_{2,i+1} &= f_{2,i} + \frac{\partial f_{2,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{2,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{2,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})\\
f_{3,i+1} &= f_{3,i} + \frac{\partial f_{3,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{3,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{3,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})\\
&\;\;\vdots\\
f_{n,i+1} &= f_{n,i} + \frac{\partial f_{n,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{n,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{n,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})
\end{aligned}
$$
Now, setting f_{1,i+1} = f_{2,i+1} = f_{3,i+1} = · · · = f_{n,i+1} = 0 and moving f_{1,i} through f_{n,i} to the other side of the equation, we have

$$
\begin{aligned}
-f_{1,i} &= \frac{\partial f_{1,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{1,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{1,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})\\
-f_{2,i} &= \frac{\partial f_{2,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{2,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{2,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})\\
-f_{3,i} &= \frac{\partial f_{3,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{3,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{3,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})\\
&\;\;\vdots\\
-f_{n,i} &= \frac{\partial f_{n,i}}{\partial x_1}(x_{1,i+1} - x_{1,i}) + \frac{\partial f_{n,i}}{\partial x_2}(x_{2,i+1} - x_{2,i}) + \cdots + \frac{\partial f_{n,i}}{\partial x_n}(x_{n,i+1} - x_{n,i})
\end{aligned}
$$
We can rewrite these equations in matrix form as
 ∂f1,i ∂f1,i ∂f1,i

 

∂f
. . . ∂x1,i
x1,i+1 − x1,i
−f1,i
∂x1
∂x2
∂x3
n
 ∂f2,i ∂f2,i ∂f2,i
∂f  


. . . ∂x2,i
 ∂x1
  x2,i+1 − x2,i 
∂x2
∂x3
  −f2,i 
n
 ∂f3,i ∂f
∂f3,i
∂f3,i  


3,i

x3,i+1 − x3,i  =  −f3,i 
. . . ∂xn 

∂x2
∂x3
 ∂x1

  .. 
..
..
..
..  
 ..
..
  . 
.
.
 .
.
.
. 
∂fn,i
∂fn,i
∂fn,i
∂fn,i
xn,i+1 − xn,i
−fn,i
...
∂x1
∂x2
∂x3
∂xn
The matrix is called the Jacobian matrix, which we label Ji . So, the newton step
looks very familiar:
xi+1 = xi − J−1
(10.30)
i fi
Now, however, we have to solve a linear system of equations to compute the Newton
step! But the principle is still the same: we compute the derivative and the derivative
guides the next step in the iteration. Here, though, the derivative is the Jacobian
matrix.
Example 10.4.5. Newton Iteration for Intersection of Curves
Consider the nonlinear system
f1 (x1 , x2 ) = x21 + x22 − 1
f2 (x1 , x2 ) = x21 − x2
If we wish to use Newton’s method to solve the root problem f (x) = 0, we must
compute the Jacobian. In this case, the Jacobian matrix is
2x1 2x2
J=
2x1 −1
Now, we evaluate the nonlinear system, f (x) and the Jacobian J(x) at each iteration
and solve the linear system J∆x = −f to get the Newton update.
Example 10.4.6. Case Study 12.3 of Ref [1].
10.4. NONLINEAR SYSTEMS OF EQUATIONS
10.4.3
111
Case Study: Four Bar Mechanism
We now consider a prototypical engineering problem: the rigid body kinematics of a
structure. Such problems arise in the study of robotic systems, for example. Consider
the four bar mechanism shown in Figure 10.5. The goal is to predict the motion of
Figure 10.5: Simple four bar mechanism.
the structure as we change one of the angles, for example. We assume that the bars
are rigid and that we are given the lengths of each bar: r1 , r2 , r3 , and r4 . The angle θ1
is fixed since bar 1 cannot rotate (due to being pinned at both ends). The equation
for point P is merely the sum of the position vectors of each of the bars. Namely,
rP = r2 + r3 = r1 + r4
(10.31)
We can express Equation (10.31) in terms of unit vectors i and j:
r2 (cos θ2 i + sin θ2 j) + r3 (cos θ3 i + sin θ3 j) =r1 (cos θ1 i + sin θ1 j)
+r4 (cos θ4 i + sin θ4 j)
(10.32)
Since the equations must be satisfied for each component i and j, we can write these
as two separate equations, one for the x-direction and one for the y-direction:
r2 cos θ2 + r3 cos θ3 = r1 cos θ1 + r4 cos θ4
r2 sin θ2 + r3 sin θ3 = r1 sin θ1 + r4 sin θ4
(10.33)
(10.34)
Now, suppose were are changing θ4 (i.e. rotating bar 4), what will be the configuration of the system? This is a system of nonlinear equations in terms of θ2 and
θ3 !
f1 (θ2 , θ3 ) = r2 cos θ2 + r3 cos θ3 − r1 cos θ1 − r4 cos θ4
f2 (θ2 , θ3 ) = r2 sin θ2 + r3 sin θ3 − r1 sin θ1 − r4 sin θ4
(10.35)
(10.36)
112
CHAPTER 10. NONLINEAR EQUATIONS
So for the given data on the system and the current value of θ4 , we can solve the
nonlinear system (10.35) for θ2 and θ3 to get the position of the system. If we wish
to use Newton’s method, we’ll need the Jacobian matrix:
−r2 sin θ2 −r3 sin θ3
J(θ2 , θ3 ) =
(10.37)
r2 cos θ2
r3 cos θ3
We’ll also need to supply an initial guess for our numerical method. However,
we need to be careful. In addition to worrying about supplying initial values that
could lead to a singular Jacobian matrix (the scalar analog is a zero derivative), we
also need to worry about multiple solutions. Figure 10.6 illustrates a second valid
configuration for the four bar system, i.e. our system of nonlinear equations may
have multiple solutions! Which solution will be get? This depends on our initial
guess! Typically, in these scenarios, there is a preferred solution, either through
other physical constraints not posed in the system, preferred in the design process,
etc. Thus, we need to have an initial guess that will give us our preferred solution.
Figure 10.6: Multiple valid configurations for the four bar mechanism.
Part III
Data Analysis
113
Chapter 11
Linear Regression
As engineers, we are often faced with the task of determining parameters of materials
from experimental data. For example, one way to determine the Youngs modulus of
a material is to subject a sample of the material to controlled loading and examine
the resulting stress-strain curve, for example one similar to Figure 11.1. We know
that for linear elastic materials, the stress and strain are related through Hooke’s
Law σ = Eε. However, as we see in Figure 11.1, we have many different values of
stress and strain — which one should we use? Given that we have a model, Hooke’s
law in this case, we would like it to match all the data as much as possible. That is,
we want the best fit or best approximation.
11.1
Least Squares Fit: Two Parameter Functions
Our “best fit” problem can be stated as follows. Given a set of n data points
(xi , yi ), i = 1, . . . , n, we want to find a function f (x) that best fits the data. Note
that this means yi 6= f (xi ). What do we mean by “best fit”? Typically, we want to
minimize the error between the data, yi , and the function f (x):
ei = yi − f (xi )
(11.1)
See also Figure 11.2. But we can’t just use the error as is because we can have
cancellation! Some errors are positive and some are negative, so it’s possible to sum
the errors and have zero total error even though we clearly aren’t matching the data.
Figure 11.1: Example of Stress-Strain measurements for several types of Aluminum.
115
116
CHAPTER 11. LINEAR REGRESSION
y
f (x)
f (xi )
yi
ei
x
xi
Figure 11.2: Illustration of error between data and curve fit.
Instead, we will square each of the error components so that our total error is
Sr =
n
X
e2i
=
i=1
n
X
(yi − f (xi ))2
(11.2)
i=1
So, our goal is to minimize the total error or, in other words, find the least square
error.
So now we need to choose the form of f (x). This is typically dictated by our
understanding of the physical process, e.g. Hooke’s Law. We will first consider the
simple case of a line: f (x) = a0 + a1 x. Now the problem is to determine the values
of a0 and a1 that will minimize the error in Equation (11.2). Namely,
min Sr = min
a0 ,a1
a0 ,a1
n
X
i=1
e2i
= min
a0 ,a1
n
X
(yi − a0 − a1 xi )2
(11.3)
i=1
The minimum occurs when ∇Sr = 0. In this case, the variables we are varying are
the parameters a0 and a1 , so the gradient is with respect to each of these parameters:
∂Sr
=0
∂a0
∂Sr
=0
∂a1
(11.4)
(11.5)
Computing each of these derivatives gives us the following system of equations (again,
11.1. LEAST SQUARES FIT: TWO PARAMETER FUNCTIONS
117
a0 and a1 are the unknowns):
n
X
(yi − a0 − a1 xi ) = 0
i=1
n
X
(11.6)
xi (yi − a0 − a1 xi ) = 0
i=1
We can rewrite these equations in matrix form:
Pn
Pn
Pnn
Pni=1 x2i a0 = Pni=1 yi
a1
i=1 xi yi
i=1 xi
i=1 xi
In this case, we can compute the solution directly, namely
P
P
P
P
( ni=1 x2i ) ( ni=1 yi ) − ( ni=1 xi yi ) ( ni=1 xi )
a0 =
P
P
2
n ( ni=1 x2i ) − ( ni=1 xi )
Pn
Pn
Pn
n ( i=1 xi yi ) − ( i=1 xi ) ( i=1 yi )
a1 =
P
P
2
n ( ni=1 x2i ) − ( ni=1 xi )
(11.7)
(11.8)
(11.9)
Now that we have a solution for our line-fitting problem, we can use some simple
statistical-type measures to assess the fit. First, is the variance Sy :
r
St
n−1
n
X
St =
(yi − ȳ)2
Sy =
(11.10)
(11.11)
i=1
P
where ȳ = i yi /n is the average. This looks at the variation of all the data around
the mean. We can also look at the variation of the data around our best fit line:
r
Sr
Sy/x =
(11.12)
n−2
Figure 11.3 illustrates these two measures. Using these measures, we can define a
goodness-of-fit measure, so-called “coefficient of determination”:
r2 =
St − Sr
St
(11.13)
When Sr becomes very small, then we have very small mismatch between our function
f (x) and the data yi and r2 → 1, and the function captures the behavior of the data
very well. If, on the otherhand, r2 → 0, then this means that Sr is getting close to
St . Intuitively, this means that our total error, Sr , looks very similar to the spread
of the data around the mean, St , and the function, statistically, does not capture the
behavior of the data well.
118
CHAPTER 11. LINEAR REGRESSION
Standard Deviation for Regression Line
Sy
Sy/ x
St
n 1
Sy/x
Sy
Sr
n 2
n
St
( yi
y)2 ; y
i 1
n
Sr
( yi
a0
1
n
n
yi
i 1
a1 x i ) 2
i 1
Sy : Spread around the mean
Sy/x : Spread around the regression line
November 10, 2011
Figure
11.3: Illustration of variation of data around the mean of the data (St ) and
Slide 25
around the curve fit (Sr .
For the straight-line case that we have considered in this section, we can write
down the value of r explicitly:
P
P
P
n ( ni=1 xi yi ) − ( ni=1 xi ) ( ni=1 yi )
q P
(11.14)
r=q P
Pn
Pn
2
2
n
n
2
2
n ( i=1 xi ) − ( i=1 xi ) n ( i=1 yi ) − ( i=1 yi )
Exercise: Use the (wind tunnel experiment) data from Table 14.1 of Ref [1], to
perform a least squares fit (linear regression), and then assess the quality of the fit
using the “goodness-of-fit” measure described above.
Example 11.1.1. Example 14.5 of Ref [1].
Example 11.1.2. Example 14.6 of Ref [1].
Exercise: Learn to implement the MATLAB function, linregr, for linear regression and use it to again solve the three example problems in this section.
There are also several other two-parameter functions that on the surface appear
to be nonlinear in the parameters, but that we can easily transform into a straight
line regression problem.
Consider the exponential form. If we apply the ln to
both sides, we get
ln f = ln α1 + β1 x
(11.15)
So, after applying the ln, we have an equation that looks like a straight line. Now,
if we take ln yi and now using Equation 11.15 to fit (xi , ln yi ), we’ll get values for
ln α1 and β1 . The final step is then to take the exponential of ln α1 to retrieve the
value of α1 . Table 11.2 summarizes the transformations for each of the functions in
Table 11.1.
Example 11.1.3. Case Study 14.6 of Ref [1].
11.2. POLYNOMIAL REGRESSION
119
Table 11.1: Two Parameter Functions for Regression.
Exponential
f (x) = α1 eβ1 x
Power Law
f (x) = α2 xβ2
Saturation Growth Rate
f (x) = α3
x
x + β3
Table 11.2: Two Parameter Functions for Regression.
Transformed Function
Data Transformation
Fit Parameters
ln f (x) = ln α1 + β1 x
(xi , ln yi )
ln α1 , β1
log10 f (x) = log10 α2 + β2 log10 x
(log10 xi , log10 yi )
1 1
,
xi y i
log10 α2 , β2
1
β3 1
1
=
+
f (x)
α3 α3 x
11.2
1 β3
,
α3 α3
Polynomial Regression
Thus far, we have only considered line functions that have only two parameters. We
can easily generalize the procedure to arbitrary order polynomials. We’ll begin with
quadratic polynomials. Consider
f (x) = a0 + a1 x + a2 x2
(11.16)
The total square error between this function and our given data (xi , yi ) is
Sr =
n
X
yi − a0 − a1 xi − a2 x2i
2
(11.17)
i=1
Now we have three parameters: a0 , a1 , a2 . As before, the least square error occurs
when ∇Sr = 0. In this case
∂Sr
=0
∂a0
∂Sr
=0
∂a1
∂Sr
=0
∂a2
(11.18)
(11.19)
(11.20)
120
CHAPTER 11. LINEAR REGRESSION
This yields a (linear) system of equations that possesses three equations and three
unknowns. In matrix form, those equations are

Pn
Pn 2     Pn
a0
Pnn
Pni=1 x2i Pni=1 x3i
Pni=1 yi
 i=1 xi
 a1  =  i=1 xi yi 
Pn 2 Pni=1 x3i Pni=1 x4i
Pn 2
a2
i=1 xi
i=1 xi
i=1 xi
i=1 xi yi

(11.21)
This idea generalizes to any mth order polynomial. An mth order polynomial will
have m + 1 parameters. Following the procedure above will yield a linear system
m + 1 equations.
Example 11.2.1. Example 15.1 of Ref [1].
Exercise: Learn to implement the MATLAB function, polyfit, for polynomial
regression and use it to fit 2nd and 3rd order polynomials for the above example
problem.
11.3
Multiple Linear Regression
We have only considered fitting data with functions that have one independent variable. If our data has multiple indendent variables, it is possible to best-fit that data
with multivariable functions. Let us consider data [(xi , zi ), yi ]. That is, we now have
two independent variables (xi , zi ) and our dependent variable yi . Now we will be
fitting functions with two independent variables f (x, z).
Let’s consider the case of fitting a plane:
f (x, z) = a0 + a1 x + a2 z
(11.22)
As before, we construct the sum of the square of the errors:
Sr =
n
X
(yi − a0 − a1 xi − a2 zi )2
(11.23)
i=1
Here, again, we have three parameters a0 , a1 , a2 . The best-fit occurs when the gradient is zero: ∇Sr = 0. This yields a linear system with three equations:
    Pn


Pn
Pn
n
x
z
a
y
i
i
0
i
i=1
i=1
i=1
P
Pn 2 Pn
P
 ni=1 xi
 a1  =  ni=1 xi yi 
x
x
z
i
i
i
i=1
i=1
Pn
Pn
Pn 2
Pn
a2
i=1 zi
i=1 xi zi
i=1 zi
i=1 zi yi
(11.24)
In analogy with polynomial regression, we can easily extend this case to any number
of dimensions (independent variables) and follow the same procedure.
Example 11.3.1. Example 15.2 of Ref [1].
11.4. GENERAL LINEAR LEAST SQUARES REGRESSION
11.4
121
General Linear Least Squares Regression
To this point, we have considered three forms of linear regression: straight line,
polynomial, and multiple dimensions. We can easily encompass all these cases in a
general formulation of the linear regression problem. Given data (xi , yi ), i = 1, . . . , n,
where now xi is potentially a vector (the multiple dimensions case), we seek to fit a
function f (x) of the form
f (x) = a0 f0 (x) + a1 f1 (x) + · · · + am fm (x)
(11.25)
where now each fj (x), j = 0, . . . , m is some function of our independent variables.
In the straight line case, x = x, f0 = 1, and f1 = x. In the polynomial case, x = x,
f0 = 1, f1 = x, . . . , fm = xm . In multiple linear regression case x = (x, z), f0 = 1,
f1 = x, and f2 = z. So, in this general form, as long as the function, f (x) we wish
to use is linear in the unknown coefficients, we can represent the problem in the
generalized form as discussed above.
For formulating the solution in this general case, we begin at a slightly different
point. Instead of directly writing the sum of the square of the errors, we instead
express the relationship between the data yi , our fitting function, and the error in
matrix form as follows:
y = Aa + e
(11.26)
where y is the n × 1 vector of data, e is the n × 1 vector of errors, a is the (m + 1) × 1
vector of unknown coefficients, and A is an n × (m + 1) matrix:


f0 (x1 ) f1 (x1 ) . . . fm (x1 )
 f0 (x2 ) f1 (x2 ) . . . fm (x2 ) 


A =  ..
(11.27)
..
.. 
.
.
 .
.
.
. 
f0 (xn ) f1 (xn ) . . . fm (xn )
That is, each row i corresponds to evaluating our functions at the data points xi .
Now, we can rearrange to see the error is e = y − Aa and then compute the sum
of the square of the errors as
Sr = eT e =
n
X
i=1
yi −
m+1
X
!2
Aij aj
(11.28)
j=1
Upon computing ∇Sr , we get the following system of equations:
AT Aa = AT y
(11.29)
These are the so-called Normal Equations. This is the most general formulation
of the linear regression problem. An important point of consideration in the solution
of this system is that the condition number of this system can be easily be quite
large. In particular, κ(AT A) ≈ κ(A)2 . Thus, solving the normal equations using
the Cholesky decomposition could be numerically difficult if A is even moderately
122
CHAPTER 11. LINEAR REGRESSION
ill-conditioned. Instead, what is typically done is to use the QR decomposition
as it is more numerically stable. This is what is done in MATLAB for example.
In MATLAB, you can form the (non-square) matrix A and the data vector y and
simply use the “slash” command: A\y. The solution will be the vector of coefficients
a.
Example 11.4.1. Example 15.3 of Ref [1].
Chapter 12
Interpolation
In Chapter 11, we considered data that we wished to approximate using a specified
function. In this chapter, we consider the case in which we want to construct a
function that exactly matches the given data. Such instances arise in many places,
e.g. tabulated thermodynamic data, atmospheric data, material properties, etc.
Thus, we are given data (xi , yi ), i = 1, . . . , n and we wish to construct f (xi ) = yi
for all data points. This is called interpolation and a function f (x) that satisfies
f (xi ) = yi is said to interpolate the data.
12.1
Polynomial Interpolation
We first begin with the case of interpolating our data with a single polynomial, in
contrast with multiple polynomials considered in Section 12.2. See Figure 12.1 for
an illustration.
12.1.1
Monomial Functions and the Vandermonde Matrix
We first consider polynomial functions based on a sum of monomials:
f (x) = a1 + a2 x + a3 x2 + ... + an xn−1
(12.1)
As we can see right away, if we have n data points, there are n coefficients to be
determined. Thus, for n data points, we must interpolate using a polynomial of order
n − 1. Now, we use the interpolation condition, f (xi ) = yi for each data pair. This
gives us n equations we can express in matrix form:

   
1 x1 x21 . . . xn−1
a1
y1
1
1 x2 x2 . . . xn−1   a2   y2 
2
2

   
1 x3 x2 . . . xn−1   a3   y3 
(12.2)
3
3

  =  
 .. ..
.. . .
..   ..   .. 
. .
.
.
.  .   . 
2
n−1
1 xn xn . . . x n
an
yn
Solving this linear system will yield the coefficients for our polynomial and, therefore,
give us the interpolating function of our data. However, there is one problem. The
123
124
CHAPTER 12. INTERPOLATION
y
yi
f (x)
xi
x
Figure 12.1: Illustration of polynomial interpolation.
matrix in Equation (12.2) is known as the Vandermonde matrix and it is notoriously
ill-conditioned. So ill-conditioned, in fact, that it is effectively unusable.
Example 12.1.1. Vandermonde System for Interpolation
Consider the data (300, 0.616), (400, 0.525), and (500, 0.457). Since we have three
data points, we will interpolate with a quadratic polynomial. Assembling the linear
system, following Equation (12.2), we have

  

1 300 90, 000
a1
0.616
1 400 160, 000 a2  = 0.525
1 500 250, 000 a3
0.457
Matlab reports that the condition of this matrix is 5.89 × 106 .
Because of the ill-conditioning of such systems, we must resort to other forms of
polynomials for interpolation. These will still give the same curve. There is
only one polynomial that will interpolate the given data points, but we can write it
in a different form, more conducive to numerical implementation.
12.1.2
Lagrange Polynomials
We consider here functions that are a sum of Lagrange polynomials.
f (x) = y1 L1 (x) + y2 L2 (x) + · · · + yn Ln (x)
(12.3)
The key idea here is that the coefficient in front of each of the Lagrange polynomial
terms, Li (x), is the data we are trying to interpolate. In particular, the Lagrange
12.1. POLYNOMIAL INTERPOLATION
125
polynomials possess the property that Li (xi ) = 1, i = 1, . . . , n and Li (xj ) = 0, i 6=
j, j = 1, . . . , n. This naturally gives us our interpolation condition f (xi ) = yi .
Let’s begin with the linear case (i.e. two data points).
f (x) = y1 L1 (x) + y2 L2 (x)
(12.4)
We need L1 (x1 ) = 1 and L1 (x2 ) = 0. Similarly, we need L2 (x1 ) = 0 and L2 (x2 ) = 1.
These naturally give the following forms for the Lagrange polynomials:
x − x2
x1 − x2
x − x1
L2 (x) =
x2 − x1
L1 (x) =
and thus
f (x) = y1
x − x2
x1 − x2
+ y2
x − x1
x2 − x1
The quadratic (three data point) case is very similar.
(x − x2 ) (x − x3 )
(x1 − x2 ) (x1 − x3 )
(x − x1 ) (x − x3 )
L2 (x) =
(x2 − x1 ) (x2 − x3 )
(x − x1 ) (x − x2 )
L3 (x) =
(x3 − x1 ) (x3 − x2 )
L1 (x) =
and, therefore,
(x − x2 ) (x − x3 )
(x − x1 ) (x − x3 )
(x − x1 ) (x − x2 )
f (x) = y1
+y2
+y3
(x1 − x2 ) (x1 − x3 )
(x2 − x1 ) (x2 − x3 )
(x3 − x1 ) (x3 − x2 )
Thus, we see that interpolating functions that use Lagrange polynomials have n
coefficients, yi , and n Lagrange polynomials of order n − 1. We can succinctly write
such functions as
f (x) =
n
X
i=1
yi Li (x),
n
Y
(x − xj )
Li (x) =
(xi − xj )
j=1
(12.5)
j6=i
So far, we have developed interpolants based on single functions. However, one
major drawback of such an approach is that high order polynomials tend to be
highly oscillatory, even when the data is quite smooth. See Figure 12.2. Thus, for
large quantities of data, a single interpolating function is not a practical solution.
One remedy is, instead of a single high-order polynomial, we use many low-order
polynomials together. This is the notion of spline functions.
126
CHAPTER 12. INTERPOLATION
Figure 12.2: Runge function (red), 1/ (1 + 25x2 ), with 5th order interpolating polynomial (blue) and 9th order interpolating polynomial (green) – interpolations done
on equidistant points in the range [−1, 1]. Taken from https://en.wikipedia.org/
wiki/Runge%27s_phenomenon
12.2. SPLINES
127
y
fi
fi+1 (x)
1 (x)
yi
fi (x)
x
xi
Figure 12.3: Illustration of spline interpolation.
12.2
Splines
The idea of splines is to perform “piecewise-interpolation” of our data. So, between
each interval (xi , xi+1 ), interpolate the data using a lower order polynomial. The
points at which the functions meet are called “knots” (the knots tie together each of
the polynomials into a single function). Figure 12.3 illustrates a spline interpolant.
It is important to note that splines are in general capable of interpolating data
of size n (i.e., n points) where n orderofthesplinepolynomial.
12.2.1
Linear Splines
First, we consider a linear spline. In this case, we use a linear function in each data
interval (xi , xi+1 ), i = 1, . . . , n − 1. In particular,
fi (x) = ai + bi (x − xi ),
i = 1, . . . , n − 1
(12.6)
The first coefficient is determined by the interpolation condition, fi (xi ) = yi , giving
ai = yi . The second coefficient is also determined by interpolation, but at the other
point in the interval: fi (xi+1 ) = yi+1 . This gives bi = (yi+1 − yi )/(xi+1 − xi ).
So, for each interval, we have a different function. Thus, when comparing to
polynomial interpolation discussed in Section 12.1, we see we have an additional
step. Namely, given a value of x, we must determine in what interval x lies in order
to determine which of our splining functions are to be used. Thus, when using spline
interpolants, we must make use of efficient search algorithms to search for the correct
interval, e.g. binary search.
Example 12.2.1. Examples 18.1 and 18.2 of Ref [1].
128
CHAPTER 12. INTERPOLATION
y
x
fi0 (x)
Figure 12.4: Illustration of derivative of linear spline interpolant.
Another consideration arises when we need to consider derivatives of our interpolating function. In the linear spline case, the first derivatives will not be continuous
and the second derivatives are not even defined, see Figure 12.4. When derivative
information is needed, we need to use higher order spline functions so that we may
enforce continuity of derivatives. In general, splines of order n + 1 are needed to yield
n continuous derivatives. Perhaps the most typical case of higher-order splines arise
in the form of cubic splines.
12.2.2
Cubic Splines
For cubic splines, the function on each interval takes the form
fi (x) = ai + bi (x − xi ) + ci (x − xi )2 + di (x − xi )3 ,
i = 1, . . . n − 1
(12.7)
for n data points (xi , yi ). We have n − 1 intervals, with four coefficients per interval
— thus we need 4(n − 1) conditions to determine all the coefficients.
The first condition is that the function must interpolate the data: fi (xi ) = yi , i =
1, . . . , n − 1. This gives ai = yi . The second condition is that the spline must be
continuous at the knots: fi (xi+1 ) = yi+1 . Let hi = xi+1 − xi . Then, we have
ai + bi hi + ci h2i + di h3i = yi+1 ,
i = 1, . . . , n − 1
12.2. SPLINES
129
The third condition is that the derivative of the spline must be continuous at the
0
knots: fi0 (xi+1 ) = fi+1
(xi+1 ). The derivative is
fi0 (x) = bi + 2ci (x − xi ) + 3di (x − xi )2
(12.8)
Thus, applying our third condition, we have the following n − 2 conditions:
bi + 2ci hi + 3di h2i = bi+1 , i = 1, . . . , n − 2
(12.9)
The fourth set of conditions is that the second derivative must be continuous at the
00
(xi+1 ). The second derivative is
knots: fi00 (xi+1 ) = fi+1
fi00 (x) = 2ci + 6di (x − xi )
(12.10)
Thus, applying our fourth condition, we have the following n − 2 conditions:
ci hi + 3di hi = ci+1 , i = 1, . . . , n − 2
(12.11)
These interpolation and continuity conditions have given us 2(n − 1) + 2(n − 2)
constraints. We need still two more conditions to be able to fully constrain the
system for all the unknown coefficients. We have many choices, but there are three
common ones that are used:
00
1. Natural Condition: f100 (x1 ) = fn−1
(xn ) = 0.
0
(xn ) = A2 , where A1 , A2 are
2. Clamped End Condition: f10 (x1 ) = A1 , fn−1
given numbers.
000
000
3. “Not-a-knot” Condition: f1000 (x2 ) = f2000 (x2 ) and fn−2
(xn−1 ) = fn−1
(xn−1 )
Choosing one of these sets of conditions will yield a system of equations to solve that
will give the coefficients for each cubic spline in each of the n − 1 intervals. Actually,
this system can be written in tridiagonal form yielding an efficient solution strategy.
In Matlab, one may use the spline function to construct cubic splines utilizing the
“not-a-knot” condition as well as the interp1 function which will construct linear
splines, cubic splines, etc. depending on the method supplied by the user.
Example 12.2.2. Examples 18.3 of Ref [1].
Exercise: Generate and plot the upper half of an airfoil by fitting a cubic
spline to the following truncated airfoil data: x = [0, 1, 2, 4, 8, 16, 24, 40, 56, 72, 80]/80;
y = [0, 28, 39, 53, 70, 86, 90, 79, 55, 22, 2]/1000; where (x, y) represents 11 points on
the airfoil. Re-do this problem by using the MATLAB function, spline.
130
CHAPTER 12. INTERPOLATION
Chapter 13
Numerical Integration
The integration of functions is a common task in engineering applications. However,
we do not always have the luxury of functions that can be analytically integrated.
We may not even have analytical functions to begin with — we may have only data
points! Thus, we need to develop numerical schemes to integrate functions (or data)
in such circumstances.
The principal idea is that we approximate the integral of a function as the
weighted sum of evaluations of that function:
Z
b
f (x) dx ≈
a
n
X
ci f (xi )
(13.1)
i=1
where ci are weights and xi ∈ [a, b]. Different numerical integration methods will
have differing weights and evaluation points, but we can always reduce the methods
to this primitive form. There are two primary classes of methods that we’ll consider
here: so-called Newton-Cotes formulae, suitable in many circumstances, including
integrating when analytical functions are not available, and Gaussian quadrature
formulae, used typically for integrals of complex functions that are not analytically
available.
13.1
Newton-Cotes Rules
Newton-Cotes rules are based on a very simple idea. Namely, if we interpolate our
function at equally spaced points, or we only have equally spaced points, then we can
interpolate the data using polynomials and then integrate the resulting interpolant.
13.1.1
Trapezoidal Rule
If we use a linear interpolant, then we can integrate this interpolating function. The
area under this curve is a trapezoid, motivating the name trapezoidal rule for this
numerical method; see Figure 13.1. To derive the final rule, we simply perform the
131
132
CHAPTER 13. NUMERICAL INTEGRATION
y
f (x)
a
b
x
Figure 13.1: Illustration of trapezoidal rule.
integration.
Z
a
b
Z b
f (b) − f (a)
f (x) dx ≈
f (a) +
(x − a) dx
b−a
a
(f (b) − f (a))
= f (a)(b − a) +
(b − a)
2
f (b) + f (a)
=
(b − a)
2
So, then, the trapezoidal rule has coefficients c1 = c2 = (b − a)/2 and x1 = a and
x2 = b.
As has been done many times previously, we can examine the error in this approximation by considering Taylor expansions of our function. We omit the details
and simply state that, for the trapezoidal rule, the error is
|E| =
1 00
f (ξ)(b − a)3 ,
12
ξ ∈ [a, b]
(13.2)
In particular, we observe that, since the error is dominated by the second derivative,
that the trapezoidal rule is exact for constant and linear functions! This is not
surprising since the development of the trapezoidal rule began with using linear
interpolation.
Example 13.1.1. Example 19.1 of Ref [1].
Exercise:
problem.
Use the MATLAB function, trapz, to solve the above example
13.1. NEWTON-COTES RULES
13.1.2
133
Simpson’s Rule
If instead we use a quadratic polynomial to interpolate, as opposed to a linear function, we will arrive at Simpson’s Rule. Let x1 = a, x2 = (a + b)/2, x3 = b. Then,
we use a Lagrange interpolant to derive Simpson’s rule:
Z b
Z b
(x − x1 )(x − x3 )
(x − x2 )(x − x3 )
f (x0 ) +
f (x1 )+
f (x) dx ≈
(x1 − x2 )(x1 − x3 )
(x2 − x1 )(x2 − x3 )
a
a
(x − x1 )(x − x2 )
f (x2 ) dx
(x3 − x1 )(x3 − x2 )
(b − a)
(f (x1 ) + 4f (x2 ) + f (x3 ))
=
6
Therefore, we see that, for Simpson’s rule, c1 = c3 = (b − a)/6 and c2 = 2(b − a)/3.
As was the case with the Trapezoidal rule, we can use a Taylor analysis to examine
the error in Simpson’s rule. Again, omitting the details,
|E| =
(b − a)5 (4)
f (ξ),
90
ξ ∈ [a, b]
Interestingly, because the error is dominated by the fourth derivative, we see that
not only are quadratic polynomials integrated exactly, as we would expect, but also
cubic functions.
Example 13.1.2. Example 19.3 of Ref [1].
13.1.3
Composite Rules
Although we could conceptually proceed with higher-order polynomials to achieve
more accuracy with our integration rules, we can follow a simpler strategy. Just as
was the case with interpolation, instead of pursuing higher order oscillatory polynomials, we can subdivide the interval [a, b] into equally spaced segments, and then
apply our integration rules on each subinterval and the sum the result. See Figure 13.2 for an illustration. These rules are called composite integration rules.
We proceed simply by decomposing our integral over the subintervals:
Z b
Z x1
Z x2
Z xn
f (x) dx =
f (x) dx +
f (x) dx + · · · +
f (x) dx
a
x0
x1
xn−1
Suppose we have n equally space segments so that each interval length h = (b−a)/n.
Now if we apply the trapezoidal on each of the intervals, we’ll get the composite
trapezoidal rule:
Z b
h
h
h
f (x) dx = (f (x0 ) + f (x1 )) + (f (x1 ) + f (x2 )) + · · · + (f (xn−1 ) + f (xn ))
2
2
2
a
"
!
#
n−1
X
h
=
f (x0 ) + 2
f (xi ) + f (xn )
2
i=1
134
CHAPTER 13. NUMERICAL INTEGRATION
y
f (x)
a
b
h
x
Figure 13.2: Illustration of composite numerical integration rules.
Example 13.1.3. Example 19.2 of Ref [1].
Exercise: Use the MATLAB function, cumtrapz, to solve the above example
problem.
Similarly, we can apply Simpson’s rule. However, we must work with two intervals at-a-time since we need three points to evaluate the function. Thus, to apply
composite Simpson’s rule, we must have an even number of intervals (or an odd
number of points). Proceeding, we have
Z
b
2h
2h
(f (x0 ) + 4f (x1 ) + f (x2 )) +
(f (x2 ) + 4f (x3 ) + f (x4 ))
6
6
2h
+ ··· +
(f (xn−2 ) + 4f (xn−1 ) + f (xn ))
6
"
!
!
#
n−2
n−1
X
X
h
=
f (x0 ) + 2
f (xi ) + 4
f (xi ) + f (xn )
3
i=2,4,6,...
i=1,3,5,...
f (x) dx =
a
To assess the error, we can sum the contribution for each of the intervals. For
the composite trapezoidal rule, the error for an interval is given in Equation (13.2),
so the total error is
Et =
n
X
(b − a)3
i=1
12n3
f 00 (ξi )
n
(b − a)3 X 00
f (ξi )
=
12n3 i=1
13.2. GAUSS QUADRATURE
135
The sum is now only over the second derivative. But this just looks like the average:
n
1 X 00
f (ξi )
f¯00 =
n i=1
Thus, we have
(b − a)3 ¯00
nf
12n3
(b − a)3 ¯00
=
f
12n2
(b − a) 2 ¯00
=
hf
12
Et =
(13.3)
where, again, h is the interval spacing. Thus, we see that the composite trapezoidal
rule is O(h2 ). A similar argument holds for the composite Simpson’s rule:
(b − a)5 ¯(4)
nf
90n5
(b − a) 4 ¯(4)
=
hf
90
Et =
(13.4)
Here, we see Simpson’s rule is O(h4 ).
Example 13.1.4. Example 19.4 of Ref [1].
Additional Examples for applying Trapezoidal and Simpson’s rules of numerical
integration:
Example 13.1.5. Example 19.5 of Ref [1].
Example 13.1.6. Case Study 19.9 of Ref [1].
13.2
Gauss Quadrature
So far, we have considered numerical integration rules that can be applied both to
functions and to datasets that may not have an explicit function that we can evaluate.
If we further pursue the case where we do have a function that we can evaluate, there
are opportunities for more accurate integration rules. In particular, we can take
advantage of cancellation of errors. Consider the illustration in Figure 13.3. If we
carefully select at what points we evaluate the function, we can better balance the
amount of positive and negative errors that we incur in the integration. Gaussian
Quadrature rules are built by choosing (optimizing) the choice of the coefficients
ci and the evaluation points xi such that polynomials of up to a certain order are
integrated exactly. This procedure is called the method of undetermined coefficients.
136
CHAPTER 13. NUMERICAL INTEGRATION
y
error
x0
a
f (x)
x1
b
x
Figure 13.3: Illustration of error incurred in integration approximation.
To illustrate this process, we will re-derive the Trapezoidal by following the procedure for choosing the coefficients ci (the points xi are already determined for Trapezoidal rule). Here,
Z b
f (x) dx = c0 f (a) + c1 f (b)
a
So we have two coefficients to determine, c0 , c1 . Thus, we can enforce two conditions
to determine the coefficients. The first is that we wish to integrate constant functions
exactly; the second is that we wish to integrate linear functions exactly. So, if we
take f (x) = 1 (constant function), then
Z
a
1 dx = b − a = c0 (1) + c1 (1)
b
Similarly, if we choose f (x) = x (linear function), then
a
Z
x dx =
b
b 2 − a2
= c0 (a) + c1 (b)
2
This gives a linear system of equations. Solving we find, as we expect, c0 = c1 =
(b − a)/2.
Now, we follow this approach, but allow x0 and x1 to vary as well. For simplicity
of the derivation, we consider the integration over the interval [−1, 1]; we will discuss
later how to apply these results to the general interval [a, b]. Thus,
Z
1
f (x) dx ≈ c0 f (x0 ) + c1 f (x1 )
−1
13.2. GAUSS QUADRATURE
137
Now, we have 4 unknowns, we can enforce four constraints. So, we will seek to exactly
integrate constant, linear, quadratic, and cubic functions. Therefore, we have the
following system of nonlinear equations:
1
Z
1 dx = 2 = c0 + c1
(13.5)
x dx = 0 = c0 x0 + c1 x1
(13.6)
2
= c0 x20 + c1 x21
3
(13.7)
x3 dx = 0 = c0 x30 + c1 x31
(13.8)
−1
Z 1
Z
−1
1
x2 dx =
−1
Z 1
−1
In this case, we can solve the equations analytically. Solve (13.6) for c1 and substitute
into (13.8):
−c0 x0
x1
c0 x 0 3
x =0
⇒ c0 x30 −
x1 1
⇒ x20 = x21
c1 =
Since x0 6= x1 , then we must have x0 = −x1 . Substituting this result into (13.6), we
have c0 = c1 . Using this result with (13.5), we find c0 = c1 = 1 . Now, finally, using
√
√
this result in (13.7), we find x1 = 1/ 3 and, therefore, x0 = −1/ 3 .
Notice that this integration rule will evaluate up to cubic polynomials exactly
with only two function evaluations. As such, this is called a two-point quadrature
rule. Following the procedures above, one can derive the equations and integration
rules for n points. Such rules are tabulated. In general, n-point Gaussian quadrature
rules will integrate polynomials of order 2n − 1 exactly.
To this point, our two-point quadrature rule is valid for integrals posed on the
interval [−1, 1]. To apply this integration rule to an interval [a, b], we must use a
change of variables:
Z b
Z 1
f (x) dx =
f ((g(t))) g 0 (t) dt
(13.9)
−1
a
Take g(t) = a1 + a2 t, then
g(−1) = a = a1 + a2 (−1)
g(1) = b = a1 + a2 (1)
so that
a1 =
a+b
b−a
, a2 =
2
2
138
CHAPTER 13. NUMERICAL INTEGRATION
So, we can map t ∈ (−1, 1) to x ∈ (a, b) as
(b + a) + (b − a)t
2
b−a
dt
dx =
2
x=
Example 13.2.1. Example 20.3 of Ref [1].
(13.10)
Bibliography
[1] S. C. Chapra. Applied Numerical Methods with Matlab for Engineers & Scientists.
McGraw-Hill, 3rd edition, 2011.
139