
Computational Methods in Applied Sciences I
University of Wyoming MA 5310
Spring, 2013
Professor Craig C. Douglas
http://www.mgnet.org/~douglas/Classes/na-sc/notes/2013sw.pdf
Course Description: First semester of a three-semester computational methods
series. Review of basics (round off errors and matrix algebra review), finite
differences and Taylor expansions, solution of linear systems of equations
(Gaussian elimination variations (tridiagonal, general, and sparse matrices),
iterative methods (relaxation and conjugate gradient methods), and
overdetermined systems (least squares)), nonlinear equations (root finding of
functions), interpolation and approximation (polynomial, Lagrange, Hermite,
piecewise polynomial, Chebyshev, tensor product methods, and least squares
fit), numerical integration (traditional quadrature rules and automatic quadrature
rules), and one other topic (simple optimization methods, Monte-Carlo, etc.).
(3 hours)
Prerequisites: Math 3310 and COSC 1010. Identical to COSC 5310, CHE 5140,
ME 5140, and CE 5140.
Suggestion: Get a Calculus for a single variable textbook and reread it.
Textbook: George Em Karniadakis and Robert M. Kirby II, Parallel Scientific
Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and
Their Implementation, Cambridge University Press, 2003.
Preface: Outline of Course
Errors
In pure mathematics, a+b is well defined and exact. In computing, a and b might
not even be representable in floating point numbers (e.g., 1/3 is not representable
in IEEE floating point and is only approximately 1/3), which form a finite subset of
the Reals. In addition, a+b is subject to roundoff errors, a concept unknown in
the Reals. We will study computational and numerical errors in this unit.
See Chapter 2.
C++ and parallel communications
If you do not know simple C++, then you will learn enough to get by in this
class. While you will be using MPI (message passing interface), you will be
taught how to use another set of routines that will hide MPI from you. The
advantages of hiding MPI will be given during the lectures. MPI has an
enormous number of routines and great functionality. In fact, its vastness is also
a disadvantage for newcomers to parallel computing on cluster machines.
See Appendix A and a web link.
Solution of linear systems of equations Ax=b
We will first review matrix and vector algebra. Then we will study a variety of
direct methods (ones with a predictable number of steps) based on Gaussian
elimination. Then we will study a collection of iterative methods (ones with a
possibly unpredictable number of steps) based on splitting methods. Then we
will study Krylov space methods, which are a hybrid of the direct and iterative
paradigms. Finally we will study methods for sparse matrices (ones that are
almost all zero).
See Chapters 2, 7, and 9.
Solution of nonlinear equations
We will develop methods for finding specific values of one variable functions
using root finding and fixed point methods.
See Chapters 3 and 4.
Interpolation and approximation
Given {f(x_0), f(x_1), …, f(x_{N+1})}, what is f(x) for x_0 ≤ x ≤ x_{N+1}, where x_i < x_{i+1}?
See Chapter 3.
Numerical integration and differentiation
Suppose I give you an impossible to integrate (formally) function f(x) and a
domain of integration. How do you approximate the integral? Numerical
integration, using quadrature rules, turns out to be relatively simple.
Alternately, given a reasonable function g(x), how do I take its derivative using
just a computer? This turns out to be relatively difficult in comparison to
integration. Surprisingly, from a numerical viewpoint, this is the exact opposite of
what freshman calculus students find in terms of difficulty.
Finally, we will look at simple finite difference methods for approximating
ordinary differential equations.
See Chapters 4, 5, and 6.
Specialized topic(s)
If there is sufficient time at the end of the course, one or more other topics will
be covered, possibly by the graduate students in the class.
1. Errors
1. Initial errors
a. Inaccurate representation of constants (π, e, etc.)
b. Inaccurate measurement of data
c. Overly simplistic model
2. Truncation
a. From approximate mathematical techniques, e.g.,
e^x = 1 + x + x²/2! + … + x^n/n! + …
e^x ≈ 1 + x + … + x^k/k! + E  (truncation error E)
3. Rounding
a. From finite number of digits stored in some base
b. Chopping and symmetric rounding
Error types 1-2 are problem dependent whereas error type 3 is machine
dependent.
Floating Point Arithmetic
We can represent a real number x by
x ≈ ±(.a_1 a_2 … a_m)_b × b^c,
where 0 ≤ a_i < b, and m, b, and |c| ≤ M are machine dependent, with common bases
b of 2, 10, and 16.
IEEE 754 (circa 1985) floating point standard (all of ~6 pages):
Feature           Single precision    Double precision
Bits total        32                  64
Sign bits         1                   1
Mantissa bits     23                  52
Exponent bits     8                   11
Exponent range    [-44.85, 38.53]     [-323.3, 308.3]
Decimal digits    7                   16
Conversion between bases is simple for integers, but is really tricky for real
numbers. For example, given r in base 10, its base 16 equivalent (r)_10 → (r)_16 is
derived by computing the digits in
r = c_0·16^0 + c_1·16^1 + c_2·16^2 + … + d_1·16^{-1} + d_2·16^{-2} + …
Integers are relatively easy to convert. Real numbers are quite tricky, however.
Consider r_1 = 1/10:
16·r_1 = 1.6 = d_1 + d_2/16 + d_3/16² + …, so d_1 = 1 and r_2 = 0.6,
16·r_2 = 9.6 = d_2 + d_3/16 + d_4/16² + …, so d_2 = 9.
Hence, (.1)_10 = (.199999…)_16: a number with m digits in one base may not have a
terminating representation in another base. It is not just irrationals that are a
problem (e.g., consider (.3)_10 in base 2).
Consider r = .115 with b = 10 and m = 2. Then
r = .11 with chopping,
r = .12 with symmetric rounding (compute r + .5·b^{c−m} and then chop).
Most computers chop instead of round off. IEEE compliant CPUs can do both
and there may be a system call to switch, which is usually not user accessible.
Note: When the rounding changes, almost all nontrivial codes break.
Warning: On all common computers, none of the standard arithmetic operators
are associative. When dealing with multiple chained operations, none you would
expect are commutative, either, thanks to round off properties. (What a deal!)
Let’s take a look, one operator at a time.
Let e(x) = x̄ − x, where x̄ denotes the floating point representation of x, in the
arithmetic operations that follow in the remainder of this section.

Addition:
x̄ + ȳ = (x + e(x)) + (y + e(y)) = (x + y) + (e(x) + e(y)).
It is fun to construct an example where (x̄ + ȳ) − (x + y) is relatively large.
In addition, x̄ + ȳ can overflow (rounds off to ±∞) or underflow (rounds off to
zero) even though the number in infinite precision is neither. Overflow is a
major error, but underflow usually is not a big deal.
Warning: The people who defined IEEE arithmetic assumed that 0 is a signed
number, thus violating a basic mathematical definition of the number system.
Hence, on IEEE compliant CPUs, there is both +0 and -0 (but no signless 0),
which are different numbers in floating point. This seriously disrupts
comparisons with 0. The programming fix is to compare abs(expression) with 0,
which is computationally ridiculous and inefficient.
Decimal shifting can lead to errors.
Example: Consider b = 10 and m = 4. Then given x_1 = 0.5055×10^4 and
x_2 = … = x_11 = 0.4000×10^0 we have
x_1 + x_2 = 0.50554×10^4 ≈ 0.5055×10^4 = x_1.
Even worse, (…((x_1 + x_2) + x_3) + …) + x_11 = x_1,
but
(…((x_11 + x_10) + x_9) + …) + x_1 = 0.5059×10^4.
Rule of thumb: Sort the numbers by positive, negative, and zero values based on
their absolute values. Add them up in ascending order inside each category.
Then combine the numbers.
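As a concrete illustration of the rule of thumb, here is a minimal C++ sketch (not from the notes; the magnitudes are illustrative assumptions) that adds one large number and many small ones in single precision in both orders:

#include <cstdio>

int main() {
    const float big  = 1.0e8f;   // plays the role of x1
    const float tiny = 1.0f;     // plays the role of x2, ..., x_{n+1}
    const int   n    = 10000;

    float descending = big;      // big term first, then the small ones
    for (int i = 0; i < n; ++i) descending += tiny;   // each tiny term is absorbed

    float ascending = 0.0f;      // small terms first, big term last
    for (int i = 0; i < n; ++i) ascending += tiny;
    ascending += big;

    std::printf("descending order: %.1f\n", descending);  // loses the tiny terms
    std::printf("ascending  order: %.1f\n", ascending);   // keeps their sum
    return 0;
}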
Subtraction:
x̄ − ȳ = (x + e(x)) − (y + e(y)) = (x − y) + (e(x) − e(y)).
If x and y are close there is a loss of significant digits.

Multiplication:
x̄ · ȳ ≈ x·y + x·e(y) + y·e(x).
Note that the e(x)e(y) term is not present above. Why?
Division:
x̄/ȳ = (x + e(x)) / (y·(1 + e(y)/y)) ≈ [(x + e(x))/y]·(1 − e(y)/y + …) ≈ x/y + e(x)/y − x·e(y)/y²,
where we used 1/(1 − r) = 1 + r + r² + r³ + … (with r = −e(y)/y).
Note that when y is sufficiently close to 0, it is utterly and completely disastrous
in terms of rounding error.
2. An introduction to C++ and parallel computing basics
See the C++ Primer at
http://www.mgnet.org/~douglas/Classes/na-sc/notes/C++Primer.pdf.
A parallel computing communications interface (parlib) is available from
http://www.mgnet.org/~douglas/Classes/hpc-xtc/notes/parlib.tgz
or
http://www.mgnet.org/~douglas/Classes/hpc-xtc/notes/parlib.zip
with documentation available from
http://www.mgnet.org/~douglas/Classes/hpc-xtc/notes/parlib.pdf.
Assume there are p processors numbered from 0 to p-1 and labeled Pi. The
communication between the processors uses one or more high speed, high
bandwidth switches.
In the old days, various topologies were used, none of which scaled to more than
a modest number of processors. The Internet model saved parallel computing.
Today parallel computers come in several flavors (hybrids, too):
• Small shared memory (SMPs)
• Small clusters of PCs
• Blade servers (in one or more racks)
• Forests of racks
• GRID or Cloud computing
Google operates the world’s largest Cloud/GRID system. The Top 500 list
provides an ongoing list of the fastest computers willing to be measured. It is not
a comprehensive list and Google, Yahoo!, and many governments and
companies do not participate.
Data needs to be distributed sensibly among the p processors. Where the data
needs to be can change, depending on the operation, and communication is
usual.
Algorithms that essentially never need to communicate are known as
embarrassingly parallel. These algorithms scale wonderfully and are frequently
used as examples of how well so and so’s parallel system scales. Most
applications are not in this category, unfortunately.
To do parallel programming, you need only a few functions to get by:
• Initialize the environment and find out processor numbers i.
• Finalize or end parallel processing on one or all processors.
• Send data to one, a set, or all processors.
• Receive data from one, a set, or all processors.
• Cooperative operations on all processors (e.g., sum of a distributed vector).
Everything else is a bonus. Almost all of MPI is designed for compiler writers
and operating systems developers. Only a small subset is expected to be used by
regular people.
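parlib itself is documented at the links above; purely as a hypothetical sketch (the names and layout here are assumptions, not parlib's interface), the following shows how little of MPI such a wrapper needs to expose:

#include <mpi.h>

namespace par {
  // initialize / finalize the parallel environment
  int init(int* argc, char*** argv) { return MPI_Init(argc, argv); }
  int finalize()                    { return MPI_Finalize(); }

  // which processor am I, and how many are there?
  int rank() { int r; MPI_Comm_rank(MPI_COMM_WORLD, &r); return r; }
  int size() { int p; MPI_Comm_size(MPI_COMM_WORLD, &p); return p; }

  // send/receive a block of doubles to/from one processor
  void send(double* buf, int n, int dest, int tag = 0) {
    MPI_Send(buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
  }
  void recv(double* buf, int n, int source, int tag = 0) {
    MPI_Recv(buf, n, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }

  // cooperative operation: global sum of local partial sums (e.g., a dot product)
  double global_sum(double local) {
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
  }
}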
3. Solution of Linear Systems of Equations
3a. Matrix Algebra Review
Let R = (r_ij) be m×n and S = (s_ij) be n×p. Then T = RS = (t_ij) is m×p with
t_ij = Σ_{k=1}^n r_ik·s_kj.
SR exists if and only if m = p, and SR ≠ RS normally.
Q = (q_ij) = R + S = (r_ij + s_ij) exists if and only if R and S have the same dimensions.
Transpose: for R = (r_ij), R^T = (r_ji).
Inner product: for n-vectors x, y, (x,y) = x^T y and (Ax,y) = (Ax)^T y.
Matrix-Matrix Multiplication (an aside)
for i = 1,M do
  for j = 1,M do
    for k = 1,M do
      A(i,j) = A(i,j) + B(i,k)*C(k,j)

or the blocked form

for i = 1,M, step by s, do
  for j = 1,M, step by s, do
    for k = 1,M, step by s, do
      for l = i, i+s-1 do
        for m = j, j+s-1 do
          for n = k, k+s-1 do
            A(l,m) = A(l,m) + B(l,n)*C(n,m)
If you pick the block size right, the blocked algorithm runs 2X+ faster than the
standard algorithm.
Why does the blocked form work so much better? If you pick s correctly, the
blocks fit in cache and only have to be moved into cache once with double
usage. Arithmetic is no longer the limiting factor in run times for numerical
algorithms. Memory cache misses control the run times and are notoriously hard
to model or repeat in successive runs.
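A minimal C++ sketch of the blocked loop above (assumptions: row-major dense storage, A already zeroed, and a block size s that divides M; this is illustrative, not a tuned implementation):

#include <vector>

void matmul_blocked(const std::vector<double>& B, const std::vector<double>& C,
                    std::vector<double>& A, int M, int s) {
    for (int i = 0; i < M; i += s)
      for (int j = 0; j < M; j += s)
        for (int k = 0; k < M; k += s)
          // the three inner loops work on s-by-s blocks that fit in cache
          for (int l = i; l < i + s; ++l)
            for (int m = j; m < j + s; ++m) {
              double sum = A[l*M + m];
              for (int n = k; n < k + s; ++n)
                sum += B[l*M + n] * C[n*M + m];
              A[l*M + m] = sum;
            }
}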
An even better way of multiplying matrices is a Strassen style algorithm (the
Winograd variant is the fastest in practical usage). A good implementation is the
GEMMW code (see http://www.mgnet.org/~douglas/ccd-free-software.html).
Continuing basic definitions…
If x  ( xi ) is an n-vector (i.e., a n1 matrix), then
diag ( x)   x







1
x2







n 
.
x
Let ei be a n-vector with all zeroes except the ith component, which is 1. Then
I = [ e1, e2, …, en ]
is the nn identity matrix. Further, if A is nn, then IA=AI=A.
22
The nn matrix A is said to be nonsingular if ! x such that Ax=b, b.
Tests for nonsingularity:
 Let 0n be the zero vector of length n. A is nonsingular if and only if 0n is the
only solution of Ax=0n.
 A is nonsingular if and only if det(A)0.
Lemma: ! A-1 such that A-1A=AA-1=I if and only if A is nonsingular.
Proof: Suppose  C such that CA-1, but CA=AC=I. Then
C=IC=(A-1A)C=A-1(AC)=A-1I=A-1. 
23
Diagonal matrices: D = diag(a, b, c, d, …), i.e., the only nonzero entries lie on the main diagonal.
Triangular matrices (x marks a possibly nonzero entry):
• upper U: nonzeroes on and above the main diagonal,
• strictly upper: nonzeroes strictly above the main diagonal (zero diagonal),
• lower L: nonzeroes on and below the main diagonal,
• strictly lower: nonzeroes strictly below the main diagonal (zero diagonal).
3b. Gaussian elimination
Solve Ux = b, U upper triangular, real, and nonsingular:
x_n = b_n/a_nn and x_{n-1} = (b_{n-1} − a_{n-1,n}·x_n)/a_{n-1,n-1}.
If we define the empty sum
Σ_{j=n+1}^{n} a_ij·x_j ≡ 0,
then the formal algorithm is
x_i = (a_ii)^{-1}·(b_i − Σ_{j=i+1}^{n} a_ij·x_j), i = n, n-1, …, 1.
Solve Lx=b, L lower triangular, real, and nonsingular similarly.
Operation count: O(n²) multiplies
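A C++ sketch of the formal algorithm, assuming U is stored as a dense row-major array (indices shifted to start at 0):

#include <vector>

std::vector<double> backsolve(const std::vector<double>& U,
                              const std::vector<double>& b, int n) {
    std::vector<double> x(n);
    for (int i = n - 1; i >= 0; --i) {        // i = n, n-1, ..., 1 in the notes' numbering
        double sum = b[i];
        for (int j = i + 1; j < n; ++j)
            sum -= U[i*n + j] * x[j];         // subtract the already-known terms
        x[i] = sum / U[i*n + i];              // divide by the diagonal entry
    }
    return x;
}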
Tridiagonal Systems
Only the three diagonals around the main diagonal are nonzero:
a_11·x_1 + a_12·x_2 = b_1
a_21·x_1 + a_22·x_2 + a_23·x_3 = b_2
  ⋮
a_{n,n-1}·x_{n-1} + a_nn·x_n = b_n
Eliminate x_i from the (i+1)-st equation sequentially to get
x_1 = p_1·x_2 + q_1
x_2 = p_2·x_3 + q_2
  ⋮
x_n = q_n,
where
p_1 = −a_12/a_11, q_1 = b_1/a_11,
p_i = −a_{i,i+1}/(a_ii + a_{i,i-1}·p_{i-1}), q_i = (b_i − a_{i,i-1}·q_{i-1})/(a_ii + a_{i,i-1}·p_{i-1}).
Operation count: 5n-4 multiplies
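A C++ sketch of this elimination (the recursion for p_i and q_i above), assuming the three diagonals are passed as separate vectors with 0-based indices (a = subdiagonal, b = main diagonal, c = superdiagonal):

#include <vector>

std::vector<double> tridiag_solve(const std::vector<double>& a, const std::vector<double>& b,
                                  const std::vector<double>& c, const std::vector<double>& f) {
    const int n = static_cast<int>(b.size());
    std::vector<double> p(n), q(n), x(n);
    p[0] = -c[0] / b[0];
    q[0] =  f[0] / b[0];
    for (int i = 1; i < n; ++i) {                      // forward elimination
        const double denom = b[i] + a[i] * p[i-1];
        p[i] = (i < n - 1) ? -c[i] / denom : 0.0;
        q[i] = (f[i] - a[i] * q[i-1]) / denom;
    }
    x[n-1] = q[n-1];
    for (int i = n - 2; i >= 0; --i)                   // back substitution: x_i = p_i x_{i+1} + q_i
        x[i] = p[i] * x[i+1] + q[i];
    return x;
}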
Parallel tridiagonal solvers
Parallel tridiagonal solvers come in several flavors, all of which are extremely
complicated. In the past I have confused entire classes at this point with one
such definition. I refer interested readers to the textbook and its associated
software.
Cyclic reduction (parallel or simple) is my favorite way to parallelize or
vectorize tridiagonal solvers.
General Matrix A (nonsingular), solve Ax = f by Gaussian elimination
Produce A^{(k)}, f^{(k)}, k = 1, …, n, where A^{(1)} = A and f^{(1)} = f, and for k = 2, 3, …, n,

a_ij^{(k)} = a_ij^{(k-1)}                                                     if i ≤ k-1,
           = 0                                                                if i ≥ k and j ≤ k-1,
           = a_ij^{(k-1)} − (a_{i,k-1}^{(k-1)}/a_{k-1,k-1}^{(k-1)})·a_{k-1,j}^{(k-1)}   if i ≥ k and j ≥ k,

f_i^{(k)} = f_i^{(k-1)}                                                       if i ≤ k-1,
          = f_i^{(k-1)} − (a_{i,k-1}^{(k-1)}/a_{k-1,k-1}^{(k-1)})·f_{k-1}^{(k-1)}       if i ≥ k.
The 22 block form of A(k) is






(k ) A(k ) 
U
(
k
)
A 
.
(
k
)
0 A 
Theorem 3.1: Let A be such that Gaussian elimination yields nonzero diagonal
elements a_kk^{(k)}, k = 1, 2, …, n. Then A is nonsingular and
(1)   det A = a_11^{(1)}·a_22^{(2)}·…·a_nn^{(n)}.
Also, A^{(n)} ≡ U is upper triangular and A has the factorization
(2)   LU = A,
where L = (m_ik) is lower triangular with elements
m_ik = 0                        for i < k,
     = 1                        for i = k,
     = a_ik^{(k)}/a_kk^{(k)}    for i > k.
The vector
(3)   g ≡ f^{(n)} = L^{-1}·f.
Proof: Note that once (2) is proven, det(A) = det(L)·det(U) = det(U), so (1) follows.
Now we prove (2). Set LU = (c_ij). Then (since L and U are triangular and A^{(k)} is
satisfied for k = n)
c_ij = Σ_{k=1}^{n} m_ik·a_kj^{(n)} = Σ_{k=1}^{min(i,j)} m_ik·a_kj^{(k)}.
From the definitions of a_ij^{(k)} and m_ik we get
m_{i,k-1}·a_{k-1,j}^{(k-1)} = a_ij^{(k-1)} − a_ij^{(k)} for 2 ≤ k ≤ i, k ≤ j,
and recall that a_ij^{(1)} = a_ij. Thus, if i ≤ j, then
c_ij = Σ_{k=1}^{i-1} m_ik·a_kj^{(k)} + a_ij^{(i)} = Σ_{k=1}^{i-1} (a_ij^{(k)} − a_ij^{(k+1)}) + a_ij^{(i)} = a_ij.
When i > j, a_ij^{(j+1)} = 0 ⟹ (2).
Finally, we prove (3). Let h = Lg. So,
h_i = Σ_{k=1}^{i} m_ik·g_k = Σ_{k=1}^{i} m_ik·f_k^{(k)}.
From the definitions of f_i^{(k)}, m_ik, and f_i^{(1)} = f_i,
m_ik·f_k^{(k)} = f_i^{(k)} − f_i^{(k+1)} ⟹ h_i = f_i.
L nonsingular completes the proof of (3).
QED
Examples:

A = A^{(1)} = [  4   6  1 ]    A^{(2)} = [ 4   6  1 ]    A^{(3)} = [ 4   6   1 ] ≡ U    L = [  1    0  0 ]
              [  8  10  3 ]              [ 0  -2  1 ]              [ 0  -2   1 ]            [  2    1  0 ]
              [ -12 48  2 ]              [ 0  66  5 ]              [ 0   0  38 ]            [ -3  -33  1 ]

and

A = A^{(1)} = [ 4   6  1 ]     A^{(2)} = [ 4   6  1 ] ≡ U          L = [ 1  0  0 ]
              [ 8  10  3 ]               [ 0  -2  1 ]                  [ 2  1  0 ]
              [ 8  12  2 ]               [ 0   0  0 ]                  [ 2  0  1 ]
The correct way to solve Ax=f is to compute L and U first, then solve
Ly  f ,
Ux  y.
Generalized Gaussian elimination
1. Order of elimination is arbitrary.
2. Set A^{(1)} = A and f^{(1)} = f.
3. Select an arbitrary a_{i1,j1}^{(1)} ≠ 0 as the first pivot element. We can eliminate x_{j1} from all but the i1-st equation. The multipliers are m_{k,j1} = a_{k,j1}^{(1)} / a_{i1,j1}^{(1)}.
4. The reduced system is now A^{(2)}·x = f^{(2)}.
5. Select another pivot a_{i2,j2}^{(2)} ≠ 0 and repeat the elimination.
6. If a_rs^{(k)} = 0 ∀r, s, then the remaining equations are degenerate and we halt.
Theorem 3.2: Let A have rank r. Then we can find a sequence of distinct row
and column indices (i1,j1), (i2,j2), …, (ir,jr) such that the corresponding pivot
elements in A^{(1)}, A^{(2)}, …, A^{(r)} are nonzero and a_ij^{(r)} = 0 if i ≠ i1, i2, …, ir.
Define permutation matrices (whose columns are unit vectors)
P = [e^{(i1)}, e^{(i2)}, …, e^{(ir)}, …, e^{(in)}]  and  Q = [e^{(j1)}, e^{(j2)}, …, e^{(jr)}, …, e^{(jn)}],
where {ik} and {jk} are permutations of {1, 2, …, n}. Then
By = g
(where B = P^T·A·Q, y = Q^T·x, and g = P^T·f) is equivalent to Ax = f and can be
reduced to triangular form by Gaussian elimination with the natural ordering.
Proof: Generalized Gaussian elimination alters A ≡ A^{(1)} by forming linear
combinations of the rows. Thus, whenever no nonzero pivot can be found, the
remaining rows were linearly dependent on the preceding rows. Permutations P
and Q rearrange equations and unknowns such that b_vv = a_{iv,jv}, v = 1, 2, …, n. By
the first half of the theorem, the reduced B^{(r)} is triangular since all rows r+1, …,
n vanish.
QED
Operation Counts
• To compute a_ij^{(k)}: (n-k+1)² + (n-k+1) (do quotients only once).
• To compute f_i^{(k)}: (n-k+1).
• Recall that Σ_{k=1}^{n} k = n(n+1)/2 and Σ_{k=1}^{n} k² = n(n+1)(2n+1)/6. Hence, there are n(n²-1)/3 multiplies to triangularize A and n(n-1)/2 multiplies to modify f.
• Using the Ly=f and Ux=y approach, computing x_i requires (n-i) multiplies plus 1 divide. Hence, only n(n+1)/2 multiplies/divides are required to solve each triangular system.
Lemma: n³/3 + mn² − n/3 operations are required to solve m systems Ax^{(j)} = f^{(j)},
j = 1, …, m, by Gaussian elimination.
Note: To compute A^{-1} requires n³ operations. In general, n² operations are
required to compute A^{-1}f^{(j)}. Thus, to solve m systems requires mn² operations.
Hence, n³ + mn² operations are necessary to solve m systems via the inverse.
Thus, it is always more efficient to use Gaussian elimination instead of
computing the inverse!
We can always compute A^{-1} by solving Ax_i = e_i, i = 1, 2, …, n, and then the x_i's are
the columns of A^{-1}.
Theorem 3.3: If A is nonsingular, ∃P such that PA = LU is possible, where P is only
a permutation of the rows. In fact, P may be found such that |l_kk| ≥ |l_ik| for i > k,
k = 1, 2, …, n-1.

Theorem 3.4: Suppose A is symmetric. If A = LU is possible, then the choice of
l_kk = u_kk ⟹ l_ik = u_ki. Hence, U = L^T.
Variants of Gaussian elimination
LDU factorization: L and U are unit lower and upper triangular and D is diagonal.
Cholesky: A = A^T, so factor A = LL^T.
Fun example: A = [ 0 1 ]
                 [ 1 0 ]
is symmetric, but cannot be factored into LU form.
Definition: A is positive definite if x^T·A·x > 0 whenever x^T·x > 0.
Theorem 3.5 (Cholesky Method): Let A be symmetric, positive definite. Then A
can be factored in the form A = LL^T.
Operation counts:
• To find L and g = L^{-1}f takes n³/6 + n² − n/6 operations + n square roots.
• To then solve L^T·x = g takes n²/2 + n/2 operations.
• Total is n³/6 + (3/2)n² + n/3 operations + n square roots.
Parallel LU Decomposition
There are 6 convenient ways of writing the factorization step of the nn A in LU
decomposition (see textbook). The two most common ways are as follows:
kij loop: A by row (daxpy)
for k = 1, n-1
  for i = k+1, n
    lik = aik/akk
    for j = k+1, n
      aij = aij - lik*akj
    endfor
  endfor
endfor

kji loop: A by column (daxpy)
for k = 1, n-1
  for p = k+1, n
    lpk = apk/akk
  endfor
  for j = k+1, n
    for i = k+1, n
      aij = aij - lik*akj
    endfor
  endfor
endfor
Definition: A daxpy is a double precision vector update of the form
x ← x + αy, where x, y ∈ ℝⁿ and α ∈ ℝ.
saxpy’s are single precision vector updates defined similarly.
Four styles of axpy’s (real and complex, single and double precision) are
included in the BLAS (basic linear algebra subroutines) that are the basis for
most high performance computing linear algebra and partial differential
equation libraries.
It is frequently convenient to store A by rows in the computer.
Suppose there are n processors Pi, with one row of A stored on each Pi. Using
the kji access method, the factorization algorithm is
for i = 1, n-1
  Send a_ii to processors P_k, k = i+1, …, n
  In parallel on each processor P_k, k = i+1, …, n, do the daxpy update to row k
endfor
Note that in step i, after Pi sends aii to other processors that the first i processors
are idle for the rest of the calculation. This is highly inefficient if this is the only
thing the parallel computer is doing.
A column oriented version is very similar.
We can overlap communication with computing to hide some of the expenses of
communication. This still does not address the processor dropout issue. We can
do a lot better yet.
Improvements to aid parallel efficiency:
1. Store multiple rows (columns) on a processor. This assumes that there are p
processors and that p ≪ n. While helpful to have mod(n,p)=0, it is
unnecessary (it just complicates the implementation slightly).
2. Store multiple blocks of rows (columns) on a processor.
3. Store either 1 or 2 using a cyclic scheme (e.g., store rows 1 and 3 on P1 and
rows 2 and 4 on P2 when p=2 and n=4).
Improvement 3, while extremely nasty to program (and already has been as part
of Scalapack so you do not have to reinvent the wheel if you choose not to)
leads to the best use of all of the processors. No processor drops out. Figuring
out how to get the right part of A to the right processors is lots of fun, too, but is
also provided in the BLACS, which are required by Scalapack.
Now that we know how to factor A = LU in parallel, we need to know how to do
back substitution in parallel. This is a classic divide and conquer algorithm
leading to an operation count that cannot be realized on a known computer
(why?).
We can write the lower triangular matrix L in block form as
L = [ L1   0  ]
    [ L2   L3 ],
where L1 and L3 are also lower triangular. If L is of order 2^k, some k > 0, then no
special cases arise in continuing to factor the L_i's. In fact, we can prove that
L^{-1} = [        L1^{-1}           0      ]
         [ -L3^{-1}·L2·L1^{-1}   L3^{-1}   ],
which is also known as a Schur complement. Recursion solves the problem.
Norms
Definition: A vector norm ‖·‖ : ℝⁿ → ℝ satisfies, for any x = (x_i) and y ∈ ℝⁿ,
1. ‖x‖ ≥ 0, ∀x ∈ ℝⁿ, and ‖x‖ = 0 if and only if x_1 = x_2 = … = x_n = 0,
2. ‖αx‖ = |α|·‖x‖, ∀α ∈ ℝ, x ∈ ℝⁿ,
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖, ∀x, y ∈ ℝⁿ.
In particular,
‖x‖_1 = Σ_{i=1}^n |x_i|,
‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}, p ≥ 1,
‖x‖_∞ = max{ |x_1|, |x_2|, …, |x_n| }.
Example: x = (-4, 2, .5)^T: ‖x‖_1 = 6.5, ‖x‖_2 = 4.5, ‖x‖_∞ = 4.
Definition: A matrix norm ‖·‖ : ℝ^{n×n} → ℝ satisfies, for any A = (a_ij) ∈ ℝ^{n×n} and any B ∈ ℝ^{n×n},
1. ‖A‖ ≥ 0, and ‖A‖ = 0 if and only if a_ij = 0 ∀i, j,
2. ‖αA‖ = |α|·‖A‖,
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖,
4. ‖AB‖ ≤ ‖A‖·‖B‖.
In particular,
‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_ij|, which is the maximum absolute column sum,
‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_ij|, which is the maximum absolute row sum,
‖A‖_E = (Σ_{i=1}^n Σ_{j=1}^n a_ij²)^{1/2}, which is the Euclidean matrix norm,
‖A‖ = max_{‖u‖=1} ‖Au‖.
Examples:
1. A = [ 1 2 3 ]
       [ 9 1 2 ]
       [ 1 2 4 ],  ‖A‖_1 = 11, ‖A‖_∞ = 12, ‖A‖_E = 11.
2. Let I_n ∈ ℝ^{n×n}. Then ‖I_n‖_1 = ‖I_n‖_∞ = 1, but ‖I_n‖_E = √n.

Condition number of a matrix
Definition: cond(A) = ‖A‖·‖A^{-1}‖.
Facts (compatible norms): ‖Ax‖_1 ≤ ‖A‖_1·‖x‖_1, ‖Ax‖_∞ ≤ ‖A‖_∞·‖x‖_∞, ‖Ax‖_2 ≤ ‖A‖_E·‖x‖_2.
Theorem 3.6: Suppose we have an approximate solution x̃ of Ax = b, where b ≠ 0
and A ∈ ℝ^{n×n} is nonsingular. Then for any compatible matrix and vector norms,
(1/κ)·‖Ax̃ − b‖/‖b‖ ≤ ‖x − x̃‖/‖x‖ ≤ κ·‖Ax̃ − b‖/‖b‖, where κ = cond(A).
Proof: (rhs) x̃ − x = A^{-1}·r, where r = Ax̃ − b is the residual. Thus,
‖x − x̃‖ ≤ ‖A^{-1}‖·‖r‖ = ‖A^{-1}‖·‖Ax̃ − b‖.
Since Ax = b, ‖A‖·‖x‖ ≥ ‖b‖, so ‖A‖/‖b‖ ≥ 1/‖x‖. Thus,
‖x − x̃‖/‖x‖ ≤ ‖A^{-1}‖·‖Ax̃ − b‖·‖A‖/‖b‖ = κ·‖Ax̃ − b‖/‖b‖.
(lhs) Note that since Ax − b = 0,
‖Ax̃ − b‖ = ‖r‖ = ‖A·(x̃ − x)‖ ≤ ‖A‖·‖x̃ − x‖, or ‖x − x̃‖ ≥ ‖Ax̃ − b‖/‖A‖.
Further,
x = A^{-1}·b ⟹ ‖x‖ ≤ ‖A^{-1}‖·‖b‖, or 1/‖x‖ ≥ (‖A^{-1}‖·‖b‖)^{-1}.
Combining the two inequalities gives us the lhs.
QED
Theorem 3.7: Suppose x and x satisfy Ax  f and ( A  A)( x  x)  f  f ,
where x and x are perturbations. Let A be nonsingular and A be so small that
1
 A  A1 . Then for   cond(A) we have
x
x


1  A / A







f
f


 A 
A




.
Note: Theorem 3.7 implies that when x is small, small relative changes in f and
A cause small changes in x.
48
Iterative Improvement
1. Solve Ax = f to an approximation x̃ (all single precision).
2. Calculate r = Ax̃ − f using double the precision of the data.
3. Solve Ae = r to an approximation ẽ (single precision).
4. Set x' = x̃ − ẽ (single precision) and repeat steps 2-4 with x̃ ← x'.
Normally the solution method is a variant of Gaussian elimination.
Note that r = Ax̃ − f = A(x̃ − x) = Ae. Since we cannot solve Ax = f exactly, we
probably cannot solve Ae = r exactly, either.
Fact: If the 1st x' has q digits correct, then the 2nd x' will have 2q digits correct
(assuming that 2q is less than the number of digits representable on your
computer), and the nth x' will have nq digits correct (under a similar assumption
as before).
Parallelization is straightforward: Use a parallel Gaussian elimination code and
parallelize the residual calculation based on where the data resides.
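A toy C++ sketch of steps 1-4 on a 2×2 system; the matrix and the Cramer's-rule "solver" are illustrative assumptions, with the solve done in float and the residual accumulated in double:

#include <cstdio>

int main() {
    const double A[2][2] = {{4.0, 3.0}, {6.0, 3.0}};
    const double f[2]    = {10.0, 12.0};               // exact solution x = (1, 2)

    // single precision "solver" for A e = rhs (Cramer's rule, rounded to float)
    auto solve = [&](const double rhs[2], float e[2]) {
        const float a = (float)A[0][0], b = (float)A[0][1];
        const float c = (float)A[1][0], d = (float)A[1][1];
        const float r0 = (float)rhs[0], r1 = (float)rhs[1];
        const float det = a*d - b*c;
        e[0] = (r0*d - r1*b) / det;
        e[1] = (a*r1 - c*r0) / det;
    };

    float x[2];
    solve(f, x);                                        // step 1: approximate solution
    for (int it = 0; it < 3; ++it) {
        double r[2];
        for (int i = 0; i < 2; ++i)                     // step 2: residual in double precision
            r[i] = A[i][0]*(double)x[0] + A[i][1]*(double)x[1] - f[i];
        float e[2];
        solve(r, e);                                    // step 3: solve A e = r
        x[0] -= e[0];                                   // step 4: x' = x - e
        x[1] -= e[1];
        std::printf("iteration %d: x = (%.8f, %.8f)\n", it + 1, x[0], x[1]);
    }
    return 0;
}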
3c. Iterative Methods
3c (i) Splitting or Relaxation Methods
Let A=ST, where S is nonsingular. Then Ax  b  Sx Tx  b . Then the
iterative procedure is defined by






x0
given
Sxk 1  Txk  b, k 1
To be useful requires that
1. xk 1 is easy to compute.
2. xk  x in a reasonable amount of time.
Example: Let A=D-L-U, where D is diagonal and L and U are strictly lower and
upper triangular, respectively. Then
a. S=D and T=L+U: both are easy to compute, but many iterations are
required in practice.
b. S=A and T=0: S is hard to compute, but only 1 iteration is required.
Let e_k = x − x_k. Then
S·e_{k+1} = T·e_k, or e_k = (S^{-1}T)^k·e_0,
which proves the following:
Theorem 3.8: The iterative procedure converges or diverges at the rate of ‖S^{-1}T‖.
Named relaxation (or splitting) methods:
1. S  D, T  L U (Jacobi): requires 2 vectors for xk and xk+1, which is
somewhat unnatural, but parallelizes trivially and scales well.
2. S  D  L, T U (Gauss-Seidel or Gau-Seidel in German): requires only 1
vector for xk. The method was unknown to Gauss, but known to Seidel.
3. S   1D  L, T  1 D U :
a.  (1,2) (Successive Over Relaxation, or SOR)
b.  (0,1) (Successive Under Relaxation, or SUR)
c.  1 is just Gauss-Seidel
Example: A = [  2 -1 ],  S_J = [ 2 0 ],  T_J = [ 0 1 ],  and ‖S_J^{-1}T_J‖ = 1/2, whereas
             [ -1  2 ]         [ 0 2 ]         [ 1 0 ]
S_GS = [  2 0 ],  T_GS = [ 0 1 ],  and ‖S_GS^{-1}T_GS‖ = 1/4,
       [ -1 2 ]          [ 0 0 ]
which implies that 1 Gauss-Seidel iteration equals 2 Jacobi iterations.
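A small C++ sketch that runs both methods on this 2×2 example (with b chosen so the exact solution is (1,1)); counting iterations to a fixed tolerance shows the roughly 2-to-1 behavior:

#include <cstdio>
#include <cmath>

int main() {
    const double b0 = 1.0, b1 = 1.0;              // A*(1,1)^T = (1,1)^T for A = [2 -1; -1 2]

    double xj0 = 0.0, xj1 = 0.0;                  // Jacobi iterate
    int jacobi = 0;
    while (std::fabs(xj0 - 1.0) + std::fabs(xj1 - 1.0) > 1e-10) {
        const double y0 = (b0 + xj1) / 2.0;       // uses only old values
        const double y1 = (b1 + xj0) / 2.0;
        xj0 = y0; xj1 = y1; ++jacobi;
    }

    double xg0 = 0.0, xg1 = 0.0;                  // Gauss-Seidel iterate
    int gs = 0;
    while (std::fabs(xg0 - 1.0) + std::fabs(xg1 - 1.0) > 1e-10) {
        xg0 = (b0 + xg1) / 2.0;                   // uses the newest value immediately
        xg1 = (b1 + xg0) / 2.0;
        ++gs;
    }
    std::printf("Jacobi: %d iterations, Gauss-Seidel: %d iterations\n", jacobi, gs);
    return 0;
}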
Special Matrix Example
Let A_fd be the n×n tridiagonal matrix tridiag(-1, 2, -1), i.e., 2 on the main
diagonal and -1 on the sub- and superdiagonals.
For this matrix, let μ = ‖S_J^{-1}T_J‖ and λ = ‖S_SOR,ω^{-1}T_SOR,ω‖. The optimal ω is such that
(λ + ω − 1)² = λ·ω²·μ²,
which is part of Young's thesis (1950), but was correctly proven by Varga later. We
can show that
ω = 2/(1 + √(1 − μ²))
makes λ as small as possible.
Aside: If ω = 1, then λ = μ² (i.e., √λ = μ). Hence, Gauss-Seidel is twice as fast as
Jacobi (in either convergence or divergence).
If A_fd ∈ ℝ^{n×n}, let h = 1/(n+1). Facts:
μ = cos(πh)                          Jacobi
μ² = cos²(πh)                        Gauss-Seidel
λ = (1 − sin(πh))/(1 + sin(πh))      SOR with optimal ω
Example: n = 21 and h = 1/22. Then μ ≈ 0.99, μ² ≈ 0.98, λ ≈ 0.75 ⟹ 30 Jacobi
iterations equal 1 SOR iteration with the optimal ω! Take n = 1000 and h = 1/1001.
Then ~1275 Jacobis equal 1 SOR with the optimal ω!!
There are many other splitting methods, including Alternating Direction Implicit
(ADI) methods (1950’s) and a cottage industry of splitting methods developed
in the U.S.S.R. (1960’s). There are some interesting parallelization methods
based on ADI and properties of tridiagonal matrices to make ADI-like methods
have similar convergence properties of ADI.
Parallelization of the Iterative Procedure
For Jacobi, parallelization is utterly trivial:
1. Split up the unknowns onto processors.
2. Each processor updates all of its unknowns.
3. Each processor sends its unknowns to processors that need the updated
information.
4. Continue iterating until done.
Common fallacies:
• When an element of the solution vector x_k has a small enough element-wise
residual, stop updating the element. This leads to utterly wrong solutions
since the residuals are affected by updates of neighbors after the element
stops being updated.
• Keep computing and use the last known update from neighboring processors.
This leads to chattering and no element-wise convergence.
• Asynchronous algorithms exist, but eliminate the chattering through extra
calculations.
Parallel Gauss-Seidel and SOR are much, much harder. In fact, by and large,
they do not exist. Googling efforts leads to an interesting set of papers that
approximately parallelize Gauss-Seidel for a set of matrices with a very well
known structures only. Even then, the algorithms are extremely complex.
Parallel Block-Jacobi is commonly used instead as an approximation. The
matrix A is divided up into a number of blocks. Each block is assigned to a
processor. Inside of each block, Jacobi is performed some number of iterations.
Data is exchanged between processors and the iteration continues.
See the book (absolutely shameless plug),
C. C. Douglas, G. Haase, and U. Langer, A Tutorial on Elliptic PDE
Solvers and Their Parallelization, SIAM Books, Philadelphia, 2003
for how to do parallelization of iterative methods for matrices that commonly
occur when solving partial differential equations (what else would you ever want
to solve anyway???).
3c (ii) Krylov Space Methods
Conjugate Gradients
Let A be symmetric, positive definite, i.e.,
A = A^T and (Ax, x) ≥ σ‖x‖², where σ > 0.
The conjugate gradient iteration for the solution of Ax + b = 0 is defined as
follows with r = r(x) = Ax + b:
x_0 arbitrary     (approximate solution)
r_0 = Ax_0 + b    (approximate residual)
w_0 = r_0         (search direction)
For k = 0, 1, …:
x_{k+1} = x_k + α_k·w_k,        α_k = −(r_k, w_k)/(w_k, A·w_k)
r_{k+1} = r_k + α_k·A·w_k
w_{k+1} = r_{k+1} + β_k·w_k,     β_k = −(r_{k+1}, A·w_k)/(w_k, A·w_k)

Lemma CG1: If Q(x(t)) = ½(x(t), A·x(t)) + (b, x(t)) and x(t) = x_k + t·w_k, then α_k is
chosen to minimize Q(x(t)) as a function of t.
Proof: Expand x(t) and use inner product linearity:
Q(x(t)) = ½(x_k + t·w_k, A·x_k + t·A·w_k) + (b, x_k + t·w_k)
        = ½[(x_k, A·x_k) + 2t(x_k, A·w_k) + t²(w_k, A·w_k)] + (b, x_k) + t(b, w_k)
d/dt Q(x(t)) = (x_k, A·w_k) + t(w_k, A·w_k) + (b, w_k)
d/dt Q(x(α_k)) = (x_k, A·w_k) + α_k(w_k, A·w_k) + (b, w_k)
              = (x_k, A·w_k) − (r_k, w_k) + (b, w_k)
              = (A·x_k + b − r_k, w_k)
              = 0
since
A·x_k − r_k = A(x_{k-1} + α_{k-1}·w_{k-1}) − (r_{k-1} + α_{k-1}·A·w_{k-1})
            = A·x_{k-1} − r_{k-1}
            = ⋯
            = A·x_0 − r_0
            = −b.
Note that d²/dt² Q(x(α_k)) = (w_k, A·w_k) > 0 if ‖w_k‖ > 0.
Lemma CG2: The parameter β_k is chosen so that (w_{k+1}, A·w_k) = 0.
Lemma CG3: For 0 ≤ q ≤ k,
1. (r_{k+1}, w_q) = 0
2. (r_{k+1}, r_q) = 0
3. (w_{k+1}, A·w_q) = 0
Lemma CG4: α_k = −(r_k, r_k)/(w_k, A·w_k).
Lemma CG5: β_k = (r_{k+1}, r_{k+1})/(r_k, r_k).
Theorem 3.9 (CG): Let A ∈ ℝ^{N×N} be symmetric, positive definite. Then the CG
iteration converges to the exact solution of Ax + b = 0 in not more than N
iterations.

Preconditioning
We seek a matrix M (or a set of matrices) to use in solving M^{-1}Ax + M^{-1}b = 0
such that
• κ(M^{-1}A) ≪ κ(A),
• M is easy to use when solving My = z,
• M and A have similar properties (e.g., symmetry and positive definiteness).
Reducing the condition number reduces the number of iterations necessary to
achieve an adequate convergence factor.
Theorem 3.10: In finite arithmetic, the preconditioned conjugate gradient
method converges at a rate based on the largest and smallest eigenvalues of
M^{-1}A,
‖x − x_k‖_2 / ‖x − x_0‖_2 ≤ [ (√κ_2(M^{-1}A) − 1) / (√κ_2(M^{-1}A) + 1) ]^k,
where κ_2(M^{-1}A) = λ_max/λ_min.
Proof: See Golub and Van Loan or many other numerical linear algebra books.
What are some common preconditioners?
• Identity!!!
• Main diagonal (the easiest to implement in parallel and very hard to beat)
• Jacobi
• Gauss-Seidel
• Tchebyshev
• Incomplete LU, known as ILU (or modified ILU)
Most of these do not work straight out of the box since symmetry may be
required. How do we symmetrize Jacobi or a SOR-like iteration?
• Do two iterations: once in the order specified and once in the opposite
order. So, if the order is natural, i.e., 1 → N, then the opposite is N → 1.
• There are a few papers that show how to do two way iterations for less than
the cost of two matrix-vector multiplies (which is the effective cost of the
solves).
Preconditioned conjugate gradients
x_0 arbitrary       (approximate solution)
r_0 = Ax_0 + b      (approximate residual)
M·r̃_0 = r_0         (preconditioned residual)
w_0 = r̃_0           (search direction)
followed by, for k = 0, 1, … until (r̃_{k+1}, r_{k+1}) ≤ ε·(r̃_0, r_0) and (r_{k+1}, r_{k+1}) ≤ ε·(r_0, r_0) for
a given ε:
x_{k+1} = x_k + α_k·w_k,        α_k = −(r̃_k, r_k)/(w_k, A·w_k)
r_{k+1} = r_k + α_k·A·w_k
M·r̃_{k+1} = r_{k+1}
w_{k+1} = r̃_{k+1} + β_k·w_k,     β_k = (r̃_{k+1}, r_{k+1})/(r̃_k, r_k)
3d. Sparse Matrix Methods
We want to solve Ax = b, where A is large, sparse, and N×N. By sparse, A is
nearly all zeroes. Consider the tridiagonal matrix A = tridiag[-1, 2, -1]. If N = 10,000,
then A is sparse, but if N=4 it is not sparse. Typical sparse matrices are not just
banded or diagonal matrices. The nonzero pattern may appear to be random at
first glance.
There are a small number of common storage schemes so that (almost) no zeroes
are stored for A, ideally storing only NZ(A) = number of nonzeroes in A:
• Diagonal (or band)
• Profile
• Row or column (and several variants)
• Any of the above for blocks
The schemes all work in parallel, too, for the local parts of A. Sparse matrices
arise in a very large percentage of problems on large parallel computers.
Compressed row storage scheme (Yale Sparse Matrix Package format)
• 3 vectors: IA, JA, and AM.

  Vector   Length   Description
  IA       N+1      IA(j) = index in AM of the 1st nonzero in row j
  JA       NZ(A)    JA(j) = column of the jth element in AM
  AM       NZ(A)    AM(j) = a_ik, for some row i and k = JA(j)

Row j is stored in AM(IA(j) : IA(j+1)-1). The order within the row may be arbitrary
or ordered such that JA(j) < JA(j+1) within a row. Sometimes the diagonal entry
for a row comes first, then the rest of the row is ordered.
The compressed column storage scheme is defined similarly.
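A C++ sketch of y = Ax with the compressed row scheme above (0-based indices here, unlike the 1-based description):

#include <vector>

std::vector<double> crs_matvec(const std::vector<int>& IA, const std::vector<int>& JA,
                               const std::vector<double>& AM, const std::vector<double>& x) {
    const int N = static_cast<int>(IA.size()) - 1;
    std::vector<double> y(N, 0.0);
    for (int i = 0; i < N; ++i)                    // row i is AM[IA[i] .. IA[i+1]-1]
        for (int j = IA[i]; j < IA[i + 1]; ++j)
            y[i] += AM[j] * x[JA[j]];              // JA[j] is the column index
    return y;
}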
Modified compressed row storage scheme (new Yale Sparse Matrix Package
format)
• 2 vectors: IJA, AM, each of length NZ(A)+N+1.
• Assume A = D + L + U, where D is diagonal and L and U are strictly lower
and upper triangular, respectively. Let η_i = NZ(row i of L+U).
Then
IJA(1) = N + 2
IJA(i) = IJA(i-1) + η_{i-1}, i = 2, 3, …, N+1
IJA(j) = column index of the jth element in AM
AM(i) = a_ii, 1 ≤ i ≤ N
AM(N+1) is arbitrary
AM(j) = a_ik, IJA(i) ≤ j < IJA(i+1) and k = IJA(j)
The modified compressed column storage scheme is defined similarly.
Very modified compressed column storage scheme (Bank-Smith format)
• Assumes that A is either symmetric or nearly symmetric.
• Assume A = D + L + U, where D is diagonal and L and U are strictly lower
and upper triangular, respectively. Let η_i = NZ(column i of U) that will be
stored. Let η = Σ_{i=1}^N η_i.
• 2 vectors: IJA, AM, with both a_ij and a_ji stored if either is nonzero.
IJA(1) = N + 2
IJA(i) = IJA(i-1) + η_{i-1}, i = 2, 3, …, N+1
IJA(j) = row index of the jth element in AM
AM(i) = a_ii, 1 ≤ i ≤ N
AM(N+1) is arbitrary
AM(j) = a_ki, IJA(i) ≤ j < IJA(i+1) and k = IJA(j)
If A ≠ A^T, then AM(j + η) = a_ik.
AM contains first D, an arbitrary element, U^T (stored by columns), and then
possibly L.
Example:
A = [ a11  0    a13  0   ]
    [ 0    a22  0    a24 ]
    [ a31  0    a33  0   ]
    [ 0    a42  0    a44 ]
Then

index   1    2    3    4    5    6    7    8    9
IJA     6    6    6    7    8    1    2
AM      a11  a22  a33  a44  *    a13  a24  a31  a42

(entries 1-4 of AM hold D and entries 1-5 of IJA hold the column "pointers";
entry 5 of AM is arbitrary; entries 6-7 hold U^T with IJA giving the row index;
entries 8-9 hold the optional L.)
Compute Ax or A^T x
Procedure MULT( N, A, x, y )
  do i = 1:N
    y(i) = A(i)*x(i)
  enddo
  Lshift = 0 if L is not stored and IJA(N+1)-IJA(1) otherwise
  Ushift = 0 if y=Ax or L is not stored and IJA(N+1)-IJA(1) if y=A^T x
  do i = 1:N
    do k = IJA(i):IJA(i+1)-1
      j = IJA(k)
      y(i) += A(k+Lshift)*x(j)   // So-so caching properties
      y(j) += A(k+Ushift)*x(i)   // Cache misses galore
    enddo
  enddo
end MULT
In the double loop, the first y update has so-so cache properties, but the second
update is really problematic. It is almost guaranteed to cause at least one cache
miss every time through the loop. Storing small blocks of size pq (instead of
11) is frequently helpful.
Note that when solving Ax=b by iterative methods like Gauss-Seidel or SOR,
independent access to D, L, and U is required. These algorithms can be
implemented fairly easily on a single processor core.
Sparse Gaussian elimination
We want to factor A=LDU. Without loss of generality, we assume that A is
already reordered so that this is accomplished without pivoting. The solution is
computed using forward, diagonal, and backward substitutions:
L·w = b
D·z = w
U·x = z
There are 3 phases:
1. symbolic factorization (determine the nonzero structure of U and possibly L),
2. numeric factorization (compute LDU), and
3. forward/backward substitution (compute x).
Let G = (V, E) denote the ordered, undirected graph corresponding to the matrix
A. V = {v_i}_{i=1}^N is the vertex set, E = {e_ij = e_ji | a_ij ≠ 0 or a_ji ≠ 0} is the edge set, and
the vertex adjacency set is adj_G(v_i) = {k | e_ik ∈ E}.
Gaussian elimination corresponds to a sequence of elimination graphs G_i,
0 ≤ i < N. Let G_0 = G. Define G_i from G_{i-1}, i > 0, by removing v_i from G_{i-1} along
with all of its incident edges, and adding new edges as required to pairwise
connect all vertices in adj_{G_{i-1}}(v_i).
Let F denote the set of edges added during the elimination process. Let
G' = (V, E ∪ F). Gaussian elimination applied to G' produces no new fill-in edges.
Symbolic factorization computes E ∪ F. Define
m(i) = min{ k > i | k ∈ adj_{G'}(v_i) }, 1 ≤ i ≤ N
     = min{ k > i | k ∈ adj_{G_{i-1}}(v_i) }.
Theorem 3.11: e_ij ∈ E ∪ F, i > j, if and only if
1. e_ij ∈ E, or
2. ∃ a sequence {k_1, k_2, …, k_p} such that
   a. k_1 = l, k_p = j, e_{lj} ∈ E,
   b. i = k_q for some 2 ≤ q ≤ p-1, and
   c. k_q = m(k_{q-1}), 2 ≤ q ≤ p.
Computing the fillin
The cost in time will be O(N + |E ∪ F|). We need 3 vectors:
• M of length N-1
• LIST of length N
• JU of length N+1 (not technically necessary for fillin)
The fillin procedure has three major sections: initialization, computing row
indices of U, and cleanup.
Procedure FILLIN( N, IJA, JU, M, LIST )
  // Initialization of vectors
  M(i)=0, 1 ≤ i ≤ N
  LIST(i)=0, 1 ≤ i ≤ N
  JU(1)=N+1
  do i=1:N
    Length=0
    LIST(i)=i
    // Compute row indices of U
    do j=IJA(i):IJA(i+1)-1
      k=IJA(j)
      while LIST(k)=0
        LIST(k)=LIST(i)
        LIST(i)=k
        Length++
        if M(k)=0, then M(k)=i
        k=M(k)
      endwhile
    enddo // j
    JU(i+1)=JU(i)+Length
    // Cleanup loop: we will modify this loop when computing either
    // Ly=b or Ux=z (computing Dz=y is a separate simple scaling loop)
    k=i
    do j=1:Length+1
      ksave=k
      k=LIST(k)
      LIST(ksave)=0
    enddo // j
  enddo // i
end FILLIN
Numerical factorization (A=LDU) is derived by embedding matrix operations
involving U, L, and D into a FILLIN-like procedure.
The solution step replaces the Cleanup loop in FILLIN with
  k=i
  Sum=0
  do j=JU(i):JU(i+1)-1
    ksave=k
    k=LIST(k)
    LIST(ksave)=0
    Sum += U(j)*y(k)
  enddo // j
  y(i) = b(i) - Sum
  LIST(k)=0
The i loop ends after this substitution.
Solving Ux = z follows the same pattern, but columns are processed in the reverse
order. Adding Lshift and Ushift parameters allows the same code to handle both
cases A = A^T and A ≠ A^T equally easily.
R.E. Bank and R.K. Smith, General sparse elimination requires no permanent
integer storage, SIAM J. Sci. Stat. Comp., 8 (1987), pp. 574-584 and the SMMP
and Madpack2 packages in the Free Software section of my home web.
4. Solution of Nonlinear Equations
Intermediate Value Theorem: A continuous function on a closed interval takes on
all values between and including its maximum and minimum on that interval.
(First) Mean Value Theorem: If f is continuous on [a,b] and is differentiable on
(a,b), then there exists at least one ξ ∈ (a,b) such that f(b) − f(a) = f'(ξ)(b − a).
Taylor's Theorem: Let f be a function such that f^{(n+1)} is continuous on (a,b). If
x, y ∈ (a,b), then
f(x) = f(y) + Σ_{i=1}^{n} [(x − y)^i / i!]·f^{(i)}(y) + R_{n+1}(y,x),
where ∃ ξ between x and y such that
R_{n+1}(y,x) = f^{(n+1)}(ξ)·(x − y)^{n+1}/(n+1)!.
Given y=f(x), find all s such that f(s)=0.
Backtracking Schemes
Suppose a  b such that f (a)  f (b) and f is continuous on [a,b].
Bisection method: Let m  a  b . Then either
2
1. f (a) f (m)  0 : replace b by m.
2. f (a) f (m)  0 : replace a by m.
3. f (a) f (m)  0 : stop since m is a root.
80
Features include
• Will always converge (usually quite slowly) to some root if one exists.
• We can obtain error estimates.
• 1 function evaluation per step.
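A C++ sketch of the method, assuming f is continuous on [a,b] with f(a)·f(b) < 0:

#include <cmath>
#include <functional>

double bisection(const std::function<double(double)>& f, double a, double b, double tol) {
    double fa = f(a);
    while (b - a > tol) {
        const double m = 0.5 * (a + b);
        const double fm = f(m);
        if (fm == 0.0) return m;                 // case 3: m is a root
        if (fa * fm < 0.0) { b = m; }            // case 1: root in [a, m]
        else               { a = m; fa = fm; }   // case 2: root in [m, b]
    }
    return 0.5 * (a + b);
}
// e.g., bisection([](double x){ return std::cos(x) - x; }, 0.0, 1.0, 1e-6);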
False position method: Derived from geometry.
First we determine the secant line from (a, f(a)) to (b, f(b)):
(y − f(b))/(x − b) = (f(a) − f(b))/(a − b).
The secant line crosses the x-axis when x = x_1, where
f(b) + [(f(b) − f(a))/(b − a)]·(x_1 − b) = 0, or x_1 = [a·f(b) − b·f(a)]/[f(b) − f(a)].
Then a root lies in either [a, x_1] or [x_1, b] depending on the sign of f(a)·f(x_1), as
before. We replace a or b with x_1 depending on which interval contains the root,
and repeat until we get (close enough) to the root.
Features include
• Usually converges faster than the Bisection method, but is quite slow.
• Very old method: the first reference is in the Indian mathematics text Vaishali
Ganit (circa 3rd century BC). It was known in China by the 2nd century AD
and to Fibonacci in 1202. Middle Eastern mathematicians kept the method
alive during the European Dark Ages.
The method can get stuck, however. In this case, it can be unstuck (and sped up)
by choosing either
x_1 = [a·f(b)/2 − b·f(a)] / [f(b)/2 − f(a)]   or   x_1 = [a·f(b) − b·f(a)/2] / [f(b) − f(a)/2].
Modifications like this are called the Illinois method and date to the 1970's.
Fixed point methods: Construct a function g such that g(s) = s ⟺ f(s) = 0.
Example: g(s) = s − f(s).
Constructing a good fixed point method is easy. The motivation is to look at a
function y=g(x) and see when g(x) intersects y=x.
Let I [a,b] and assume that g is defined on I. Then g has either zero or possibly
many fixed points on I.
Theorem 4.1: If g(I) ⊆ I and g is continuous, then g has at least one fixed point
in I.
Proof: g(I) ⊆ I means that a ≤ g(a) ≤ b and a ≤ g(b) ≤ b. If either a = g(a) or
b = g(b), then we are done. Assume that is not the case: hence, g(a) − a > 0 and
g(b) − b < 0. For F(x) = g(x) − x, F is continuous and F(a) > 0 and F(b) < 0. Thus,
by the intermediate value theorem, there exists at least one s ∈ I such that
0 = F(s) = g(s) − s. QED
Why are Theorem 4.1's requirements reasonable?
• g(I) ⊆ I: s cannot equal g(s) if g(s) ∉ I.
• Continuity: if g is discontinuous, the graph of g may lie partly above and
partly below y = x without an intersection.
Theorem 4.2: If g(I) ⊆ I and |g'(x)| ≤ L < 1, ∀x ∈ I, then ∃! s ∈ I such that g(s) = s.
Proof: Suppose s_1 ≠ s_2 ∈ I are both fixed points of g. The mean value theorem
with ξ ∈ (s_1, s_2) has the property that
|s_2 − s_1| = |g(s_2) − g(s_1)| = |g'(ξ)|·|s_2 − s_1| ≤ L·|s_2 − s_1| < |s_2 − s_1|,
which is a contradiction.
QED
Note that the condition on g' ⟹ g must be continuous.
Algorithm: Let x_0 ∈ I be arbitrary and set x_{n+1} = g(x_n), n = 0, 1, …
Note that after n steps, g(x_n) = x_n ⟹ x_m = x_n, ∀m ≥ n.
Theorem 4.3: Let g(I) ⊆ I and |g'(x)| ≤ L < 1, ∀x ∈ I. For x_0 ∈ I, the sequence
x_n = g(x_{n-1}), n = 1, 2, …, converges to the fixed point s, and the nth error
e_n = |x_n − s| satisfies
e_n ≤ [L^n/(1 − L)]·|x_1 − x_0|.
Note that Theorem 4.3 is a nonlocal convergence theorem because s is fixed, a
known interval I is assumed, and convergence is for any x_0 ∈ I.
Proof: (convergence) Recall that s is unique. For any n, ∃ξ_n between x_{n-1} and s
such that
|x_n − s| = |g(x_{n-1}) − g(s)| = |g'(ξ_n)|·|x_{n-1} − s| ≤ L·|x_{n-1} − s|.
Repeating this gives
|x_n − s| ≤ L^n·|x_0 − s|.
Since 0 ≤ L < 1, lim_{n→∞} L^n = 0 ⟹ lim_{n→∞} x_n = s.
Error bound: Note that
|x_0 − s| ≤ |x_0 − x_1| + |x_1 − s|
         ≤ |x_0 − x_1| + L·|x_0 − s|,
so (1 − L)·|x_0 − s| ≤ |x_1 − x_0|.
Since |x_n − s| ≤ L^n·|x_0 − s|, we get e_n = |x_n − s| ≤ [L^n/(1 − L)]·|x_1 − x_0|.
QED
Theorem 4.4: Let g'(x) be continuous on some open interval containing s, where
g(s) = s. If |g'(s)| < 1, then ∃δ > 0 such that the fixed point iteration is convergent
whenever |x_0 − s| ≤ δ.
Note that Theorem 4.4 is a local convergence theorem since x0 must be
sufficiently close to s.
Proof: Since g' is continuous in an open interval containing s and |g'(s)| < 1, then
for any constant K satisfying |g'(s)| < K < 1, ∃δ > 0 such that if x ∈ [s−δ, s+δ] ≡ I,
then |g'(x)| ≤ K. By the mean value theorem, given any x ∈ I, ∃ξ between x and
s such that |g(x) − s| = |g'(ξ)|·|x − s| ≤ K·δ < δ, and thus g(I) ⊆ I. Using I in
Theorem 4.3 completes the proof. QED
Notes:
• There is no hint what δ is.
• If |g'(s)| > 1, then ∃I such that |g'(x)| > 1, ∀x ∈ I. So if x_0 ∈ I, x_0 ≠ s, then
|g(x_0) − g(s)| = |g'(ξ)|·|x_0 − s|, i.e., |x_1 − s| = |g'(ξ)|·|x_0 − s| > |x_0 − s|. Hence, only x_0 = s
implies convergence while all other starting points imply divergence.
Error Analysis
Let e_k = x_k − s, let I be a closed interval, and suppose g satisfies a local theorem's
requirements on I. The Taylor series of g about x = s is
e_{n+1} = g(x_n) − g(s)
        = g'(s)·e_n + [g''(s)/2]·e_n² + ⋯ + [g^{(k)}(s)/k!]·e_n^k + E_{k,n},
where E_{k,n} = [g^{(k+1)}(ξ_n)/(k+1)!]·e_n^{k+1}.
If x_0 ≠ s, g'(s) = g''(s) = ⋯ = g^{(k-1)}(s) = 0, and 0 ∉ g^{(k)}(I), then
lim_{n→∞} e_{n+1}/e_n^k = g^{(k)}(s)/k!  ⟹  kth order convergence.
The important k's are 1 and 2.
If we only have 1st order convergence, we can speed it up using quadratic
interpolation: given {(x_i, f(x_i))}_{i=1}^3, fit a 2nd order polynomial p to the data such
that p(x_i) = f(x_i), i = 1, 2, 3. Use p to get the next guess. Let
Δx_n = x_{n+1} − x_n,  Δ²x_n = x_{n+2} − 2x_{n+1} + x_n,  and  x_n' = x_n − (Δx_n)²/Δ²x_n.
If δ_n = x_n − x* satisfies δ_{n+1} = (B + ε_n)·δ_n, with ε_n → 0, |B| < 1, and lim_{n→∞} δ_n = 0,
then for n sufficiently large, x_n' is well defined and
lim_{n→∞} |x_n' − x*|/|x_n − x*| = 0, where x* = lim_{n→∞} x_n (x* is hopefully s).
We can apply the fixed point method to the zeroes of f: Choose
g(x) = x − φ(x)·f(x), where 0 < |φ(x)| < ∞. Note that f(x) and f(x)·φ(x) have the
same zeroes, which is true also for g(x) = x − F(f(x)), where F(y) ≠ 0 if y ≠ 0 and
F(0) = 0.
Chord Method
Choose φ(x) = m, m constant. So, g'(x) = 1 − m·f'(x). We want
|g'(x)| < 1 ⟺ 0 < m·f'(x) < 2 in some |x − s| ≤ δ.
Thus, m must have the same sign as f'. Let x_{n+1} = x_n − m·f(x_n). Solving for m,
m = (x_n − x_{n+1})/f(x_n).
Therefore, x_{n+1} is the x-intercept of the line through (x_n, f(x_n)) with slope 1/m.
Properties:
• 1st order convergence
• Convergence if x_{n+1} can be found (always)
• Can obtain error estimates
Newton's Method
Choose φ(x) = 1/f'(x). Let s be such that f(s) = 0. Then
g'(x) = 1 − [(f'(x))² − f(x)·f''(x)]/(f'(x))²
      = f(x)·f''(x)/(f'(x))².
If f'' exists in I = {|x − s| ≤ δ} and f'(x) ≠ 0 (∀x ∈ I), then g'(s) = 0 ⟹ 2nd order
convergence. So,
x_{n+1} = x_n − f(x_n)/f'(x_n).
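A C++ sketch of the iteration applied to f(x) = cos x − x (the 1D example used at the end of this chapter), starting from x_0 = 0:

#include <cmath>
#include <cstdio>

int main() {
    double x = 0.0;
    for (int n = 0; n < 10 && std::fabs(std::cos(x) - x) > 1e-10; ++n) {
        const double f  = std::cos(x) - x;
        const double fp = -std::sin(x) - 1.0;    // f'(x)
        x = x - f / fp;                          // x_{n+1} = x_n - f(x_n)/f'(x_n)
        std::printf("x_%d = %.12f\n", n + 1, x);
    }
    return 0;
}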
What if f'(s) = 0 and f'' exists? Then f(x) = (x − s)^p·h(x), where h(s) ≠ 0 and h''
exists. So,
g(x) = x − (x − s)^p·h(x) / [(x − s)^p·h'(x) + p(x − s)^{p-1}·h(x)]
     = x − [(x − s)/p]·[1 + (x − s)·h'(x)/(p·h(x))]^{-1},
g'(x) = [ (1 − 1/p) + (x − s)·2h'(x)/(p·h(x)) + (x − s)²·h''(x)/(p²·h(x)) ] / [ 1 + (x − s)·h'(x)/(p·h(x)) ]².
Thus, g'(s) = 1 − 1/p. Then
x_{n+1} = x_n − p·f(x_n)/f'(x_n)
makes the method 2nd order again.
Properties:
• 2nd order convergence
• Evaluation of both f(x_n) and f'(x_n)
If f'(x_n) is not known, it can be approximated using
f'(x_n) ≈ [f(x_n) − f(x_{n-1})]/(x_n − x_{n-1}).
Secant Method
x_0 is given and x_1 is given by false position. Thus,
φ(x) = (x_n − x_{n-1})/(f(x_n) − f(x_{n-1})) and g(x) = x − φ(x)·f(x).
Properties:
• Must only evaluate f(x_n)
• First step is identical to the first step of the false position method. After that
the two methods differ.
• Convergence order 1.618 (two steps ≈ 2.6)
N Equations
Let f(x) = (f_i(x))_{i=1}^N = 0. Construct a fixed point function g(x) = (g_i(x))_{i=1}^N from
f(x). Replace the condition |g'(x)| ≤ L < 1, ∀x such that |x − s| ≤ δ, by
|∂g_i(x)/∂x_j| ≤ L/N < 1, for all i, j and ‖x − s‖_∞ ≤ δ.
Equivalently, for i = 1, 2, …, N,
g_i(x) − g_i(y) = Σ_{j=1}^N [∂g_i(ξ^{(i)})/∂x_j]·(x_j − y_j).
Thus,
|g_i(x) − g_i(y)| ≤ Σ_{j=1}^N |∂g_i(ξ^{(i)})/∂x_j|·|x_j − y_j| ≤ ‖x − y‖_∞·Σ_{j=1}^N (L/N) = L·‖x − y‖_∞.
Thus, ‖g(x) − g(y)‖_∞ ≤ L·‖x − y‖_∞.
Newton's Method
Define the Jacobian by J(x) = (∂f_i(x)/∂x_j). If J(x) is nonsingular for ‖x − s‖ ≤ δ, then
we define
x_{n+1} = x_n − J^{-1}(x_n)·f(x_n), or (better)
1. Solve J(x_n)·c_n = f(x_n)
2. Set x_{n+1} = x_n − c_n
Quadratic convergence if
1. f''(x) exists for ‖x − s‖ ≤ δ,
2. J(s) is nonsingular, and
3. x_0 is sufficiently close to s.
1D Example: f(x) = cos x − x, x ∈ [0,1]. To reduce |f(x)| to 10^{-6}:
• Bisection:       x_0 = 0.5    20 steps
• False position:  x_0 = 0.5     7 steps
• Secant:          x_0 = 0       6 steps
• Newton:          x_0 = 0       5 steps
Zeroes of Polynomials
Let p(x) = a_0·x^n + a_1·x^{n-1} + ⋯ + a_{n-1}·x + a_n, a_0 ≠ 0.
Properties:
• Computing p(x) and p'(x) is easy.
• Finding zeroes when n ≤ 4 can be done with formulas, e.g., when n = 2,
  x = [−a_1 ± √(a_1² − 4a_0·a_2)] / (2a_0).
  When n ≥ 5, there are no such formulas.
Theorem 4.5 (Fundamental Theorem of Algebra): Given p(x) with n ≥ 1, there
exists at least one r (possibly complex) such that p(r) = 0.
We can uniquely factor p using Theorem 4.5:
p(x) = (x − r_1)·q_1(x),           q_1 is an (n-1)st degree polynomial
     = (x − r_1)(x − r_2)·q_2(x),  q_2 is an (n-2)nd degree polynomial
     = a_0·∏_{i=1}^n (x − r_i).
We can prove by induction that there exist no more than n roots.
Suppose that r = a + ib is such that p(r) = 0. Then p(r̄) = 0, where r̄ = a − ib.
Theorem 4.6 (Division Theorem): Let P(x) and Q(x) be polynomials of degree n
and m, where 1 ≤ m ≤ n. Then ∃! S(x) of degree n−m and a unique polynomial
R(x) of degree m−1 or less such that P(x) = Q(x)·S(x) + R(x).
Evaluating Polynomials
How do we compute p'(ξ), p''(ξ), …, p^{(m)}(ξ)? We may need to make a change
of variables: t = x − ξ, which leads to
p(x) = β_0·(x − ξ)^n + β_1·(x − ξ)^{n-1} + ⋯ + β_{n-1}·(x − ξ) + β_n.
Using Taylor's Theorem we know that
β_j = p^{(n-j)}(ξ)/(n − j)!, 0 ≤ j ≤ n.
We use nested multiplication,
p(x) = x(⋯x(x(a_0·x + a_1) + a_2)⋯ + a_{n-1}) + a_n,
where there are n−1 multiplies by x outside the inner a_0·x + a_1 expression.
The cost of evaluating p is

                          multiplies   adds
  nested multiplication   n            n
  direct evaluation       2n-1         n
Synthetic Division
To evaluate p(ξ), ξ ∈ ℝ:
b_0 = a_0
b_j = ξ·b_{j-1} + a_j, 1 ≤ j ≤ n.
Then b_n = p(ξ) for the same cost as nested multiplication. We use this method to
evaluate p^{(m)}(ξ), 0 ≤ m ≤ n. Write
p(x) = (x − ξ)·q_{n-1}(x) + r_0(x).
Note that q_{n-1}(x) has degree n−1 since a_0 ≠ 0 in the definition of p(x). Further,
its leading coefficient is also a_0. Also, r_0(x) = p(ξ) using the previous way of
writing p(x). So, we can show that
q_{n-1}(x) = b_0·x^{n-1} + b_1·x^{n-2} + ⋯ + b_{n-1}.
Further, q_{n-1}(x) = (x − ξ)·q_{n-2}(x) + r_1, where r_1 = q_{n-1}(ξ). Substituting,
p(x) = (x − ξ)²·q_{n-2}(x) + (x − ξ)·r_1 + r_0,
or
p'(ξ) = q_{n-1}(ξ) = r_1.
We can continue this to get
p(x) = r_n·(x − ξ)^n + ⋯ + r_1·(x − ξ) + r_0, where r_m = p^{(m)}(ξ)/m!, 0 ≤ m ≤ n.
Deflation
Find r_1 for p(x). Then
p(x) = p_1(x)·(x − r_1).
Now find r_2 for p_1(x). Then
p(x) = p_2(x)·(x − r_1)(x − r_2).
Continue for all r_i. A problem arises on a computer. Whatever method we use to
find the roots will not usually find the exact roots, but something close. So we
really compute r̃_1. By the Division Algorithm Theorem,
p(x) = p_1(x)·(x − r̃_1) + p_1(r̃_1), with p(r̃_1) ≠ 0 usually.
Now we compute r̃_2, which is probably wrong, too (and possibly quite wrong). A
better r̂_2 can be computed using r̃_2 as the initial guess in our zero finding
algorithm for p(x). This correction strategy should be used for all r̃_i, i ≥ 2.
Suppose |r_1| ≤ |r_2| and |r_1 − r̃_1| ≈ |r_2 − r̃_2| ≈ ε > 0. Then
|r_1 − r̃_1|/|r_1| ≥ |r_2 − r̃_2|/|r_2|,
which implies that we should find the smaller roots first.
Descartes' Rule of Signs
In order, write down the signs of the nonzero coefficients of p(x). Then count the
number of sign changes and call it ν.
Examples:
p_1(x) = x³ − 2x² − 4x + 3:  signs + − − +,  so ν = 2.
p_2(x) = x³ + 2x² + 4x − 3:  signs + + + −,  so ν = 1.
Rule: Let k be the number of positive real roots of p(x). Then k ≤ ν and ν − k is
a nonnegative even integer.
Example: For p_2(x) above, ν = 1, so ν − k = 0 and k = 1, which implies that p_2 has
one positive real root.
Fact: If p(−r) = 0, then −r is a root of p(x). Hence, we can obtain information
about the number of negative real roots by looking at p(−x).
Example: p_2(−x) = −x³ + 2x² − 4x − 3, ν = 2, ν − k = 0 or 2, which implies that p_2
has 0 or 2 negative real roots.
Localization of Polynomial Zeroes
Once again, let p(x) = a_0·x^n + a_1·x^{n-1} + ⋯ + a_{n-1}·x + a_n, a_0 ≠ 0.
Theorem 4.7: Given p(x), all of the zeroes of p(x) lie in ⋃_{i=1}^n C_i, where
C_n = {z ∈ ℂ : |z| ≤ |a_n|}, C_1 = {z ∈ ℂ : |z + a_1| ≤ 1}, and C_k = {z ∈ ℂ : |z| ≤ 1 + |a_k|}, 2 ≤ k < n.
Corollary 4.8: Given p(x) and r = 1 + max_{1≤i≤n} |a_i|, then every zero lies in
C = {z ∈ ℂ : |z| ≤ r}.
Note that the circles C_2, …, C_n have origin 0. One big root makes at least one
circle large. A change of variable (t = x − α) can help reduce the size of the
largest circle.
Example: Let p(x) = (x − 10)(x − 12) = x² − 22x + 120. Then
C_{2,x} = {z ∈ ℂ : |z| ≤ 120}.
Let α = 11 and generate p^{(m)}(11), 1 ≤ m ≤ 2. We get p'(x) = 2x − 22, p'(11) = 0, and
p''(11) = 2. So, p(x) = (x − 11)² − 1 = t² − 1 for t = x − 11. Then
C_{2,t} = {z ∈ ℂ : |z| ≤ 1}.
Theorem 4.9: Given any α such that p'(α) ≠ 0, there exists at least one zero of p(x) in
C = {z ∈ ℂ : |z − α| ≤ n·|p(α)/p'(α)|}.
Apply Theorem 4.9 to Newton's method: we already have p(x_m) and p'(x_m)
calculated for α = x_m. If p(x) = a_0·x^n + a_1·x^{n-1} + ⋯ + a_{n-1}·x + a_n = a_0(x − r_1)⋯(x − r_n)
and a_0 ≠ 0, a_n ≠ 0, then no r_i = 0, 1 ≤ i ≤ n. If p(s) = ε for some s, then
min_{1≤i≤n} |1 − s/r_i| ≤ |ε/a_n|^{1/n},
which is an upper bound on the relative error of s with respect to some zero of
p(x).
5. Interpolation and Approximation
Assume we want to approximate some function f(x) by a simpler function p(x).
Example: a Taylor expansion.
Besides approximating f(x), p(x) may be used to approximate f^{(m)}(x), m ≥ 1, or
∫f(x)dx.

Polynomial interpolation
p(x) = a_0·x^n + a_1·x^{n-1} + ⋯ + a_{n-1}·x + a_n, a_0 ≠ 0. Most of the theory relies on
• the Division Algorithm Theorem, and
• p having at most n zeroes unless it is identically zero.
Lagrange interpolation
Given {(x_i, f_i)}_{i=1}^n, find p of degree n−1 such that p(x_i) = f_i, i = 1, …, n.
Note that if we can find polynomials ℓ_i(x) of degree n−1 such that for i = 1, …, n
ℓ_i(x_j) = 1 if i = j, 0 if i ≠ j,
then p(x) = Σ_{j=1}^n f_j·ℓ_j(x) is a polynomial of degree n−1 and
p(x_i) = Σ_{j=1}^n f_j·ℓ_j(x_i) = f_i.
There are many solutions to the Lagrange interpolation problem.
The first one is
ℓ_i(x) = ∏_{j=1, j≠i}^n (x − x_j)/(x_i − x_j).
ℓ_i(x) has n−1 factors (x − x_j), so ℓ_i(x) is a polynomial of degree n−1. Further, it
satisfies the remaining requirements.
Examples:
• n = 2: p(x) = f_1·(x − x_2)/(x_1 − x_2) + f_2·(x − x_1)/(x_2 − x_1)
• n = 3: p(x) = f_1·(x − x_2)(x − x_3)/[(x_1 − x_2)(x_1 − x_3)] + ⋯
• n ≥ 3: very painful to convert p(x) into the form Σ a_i·x^i.
The second solution is an algorithm: assume that p(x) has the Newton form,
p(x) = b_0 + b_1(x − x_1) + b_2(x − x_1)(x − x_2) + ⋯ + b_{n-1}(x − x_1)⋯(x − x_{n-1}).
Note that
f_1 = p(x_1) = b_0,
f_2 = p(x_2) = b_0 + b_1(x_2 − x_1), or b_1 = (f_2 − b_0)/(x_2 − x_1),
  ⋮
b_{i-1} = [f_i − b_0 − b_1(x_i − x_1) − ⋯ − b_{i-2}(x_i − x_1)⋯(x_i − x_{i-2})] / [(x_i − x_1)⋯(x_i − x_{i-1})].
For all solutions to the Lagrange interpolation problem, we have a theorem that
describes the uniqueness, no matter how it is written.
Theorem 5.1: For fixed {(x_i, f_i)}_{i=1}^n, there exists a unique Lagrange interpolating
polynomial.
Proof: Suppose p(x) and q(x) are distinct Lagrange interpolating polynomials.
Each has degree n−1 and r(x) = p(x) − q(x) is also a polynomial of degree n−1.
However, r(x_i) = p(x_i) − q(x_i) = 0, which implies that r has n zeroes. We know it
can have at most n−1 zeroes or must be identically zero. QED
Equally spaced x_i's can be disastrous, e.g.,
f(x) = 1/(1 + 25x²), x ∈ [−1, 1].
It can be shown that
lim_{n→∞} max_{−1≤x≤1} |f(x) − p_n(x)| = ∞.
We can write the Newton form in terms of divided differences.

1st divided difference:

    f[x_i, x_{i+1}] = (f_{i+1} − f_i)/(x_{i+1} − x_i)

kth divided difference:

    f[x_i, x_{i+1}, …, x_{i+k}] = (f[x_{i+1}, …, x_{i+k}] − f[x_i, x_{i+1}, …, x_{i+k−1}])/(x_{i+k} − x_i)

We can prove that the Newton form coefficients are b_i = f[x_1, …, x_{i+1}].
117
We build a divided difference table in which the coefficients are found on the downward slanting diagonal:

    x_1      f(x_1) = b_0
                           f[x_1,x_2] = b_1
    x_2      f(x_2)                        f[x_1,x_2,x_3] = b_2
                           f[x_2,x_3]                          f[x_1,x_2,x_3,x_4] = b_3
    x_3      f(x_3)                        f[x_2,x_3,x_4]          ⋱
     ⋮          ⋮              ⋮                 ⋮                     f[x_1,…,x_n] = b_{n−1}
    x_{n−1}  f(x_{n−1})                    f[x_{n−2},x_{n−1},x_n]
                           f[x_{n−1},x_n]
    x_n      f(x_n)
118

This table contains a wealth of information about many interpolating polynomials for f(x). For example, the quadratic interpolating polynomial of f(x) at x_2, x_3, x_4 is a table lookup starting at f(x_2). A small code sketch for building the coefficients and evaluating the Newton form follows.
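As an illustration (not from the notes), here is a minimal C++ sketch that builds the divided difference coefficients b_i and evaluates the Newton form by nested multiplication; the function names are mine and the data are assumed to be stored in 0-based arrays.

#include <vector>

// Build the Newton form coefficients b[i] = f[x_1, ..., x_{i+1}]
// by overwriting a copy of the data with successive divided differences.
std::vector<double> dividedDifferences(const std::vector<double>& x,
                                       const std::vector<double>& f) {
    std::vector<double> b(f);                      // b[i] starts as f_i
    const std::size_t n = x.size();
    for (std::size_t k = 1; k < n; ++k)            // kth divided differences
        for (std::size_t i = n - 1; i >= k; --i)
            b[i] = (b[i] - b[i - 1]) / (x[i] - x[i - k]);
    return b;                                      // top diagonal of the table
}

// Evaluate p(t) = b_0 + b_1(t - x_1) + ... with nested multiplication.
double newtonEval(const std::vector<double>& x,
                  const std::vector<double>& b, double t) {
    double p = b.back();
    for (std::size_t i = b.size() - 1; i-- > 0; )
        p = p * (t - x[i]) + b[i];
    return p;
}

Adding a new data point only appends one more coefficient, which is the main practical advantage of the Newton form over the Lagrange form.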
Hermite interpolation

This is a generalization of Lagrange interpolation. We assume that {(x_i, f_i, f_i')}_{i=1}^n is available, where x_1 < x_2 < ⋯ < x_n. We seek a p(x) of degree 2n−1 such that for i = 1, …, n two conditions are met:
1. p(x_i) = f_i
2. p'(x_i) = f_i'
119
There are two solutions. The first solution is as follows. Write

    P(x) = Σ_{j=1}^n f_j h_j(x) + Σ_{j=1}^n f_j' g_j(x),

where g_j(x_i) = 0 and h_j(x_i) = 1 if i = j, 0 if i ≠ j, 1 ≤ i,j ≤ n, which satisfies condition 1. Also,

    P'(x) = Σ_{j=1}^n f_j h_j'(x) + Σ_{j=1}^n f_j' g_j'(x),

where h_j'(x_i) = 0 and g_j'(x_i) = 1 if i = j, 0 if i ≠ j, 1 ≤ i,j ≤ n, which satisfies condition 2.
120
We must find polynomials g_j and h_j of degree 2n−1 satisfying these conditions. Let

    H(x) = Π_{j=1}^n (x − x_j)

and

    Δ_i(x) = (H(x))²/(x − x_i)².

Note that Δ_i(x) and Δ_i'(x) vanish at all of the nodes except x_i and that Δ_i(x) is a polynomial of degree 2n−2. Put

    h_i(x) = Δ_i(x)[a_i(x − x_i) + b_i]

and determine a_i and b_i so that h_i(x_i) = 1 and h_i'(x_i) = 0: choose

    a_i = −Δ_i'(x_i)/(Δ_i(x_i))²   and   b_i = 1/Δ_i(x_i).
121
Similarly,

    g_i(x) = Δ_i(x)(x − x_i)/Δ_i(x_i).
122
The second solution to the Hermite interpolation problem requires us to write

    P(x) = b_0 + b_1(x − x_1) + b_2(x − x_1)² + b_3(x − x_1)²(x − x_2)
           + b_4(x − x_1)²(x − x_2)² + ⋯ + b_{2n−1}[Π_{i=1}^{n−1}(x − x_i)²](x − x_n).

Then

    f_1 = p(x_1) = b_0
    f_1' = p'(x_1) = b_1
    f_2 = p(x_2) = b_0 + b_1(x_2 − x_1) + b_2(x_2 − x_1)²   or   b_2 = [f_2 − b_0 − b_1(x_2 − x_1)]/(x_2 − x_1)²

and so on…
123




Theorem 5.2: Given {(x_i, f_i, f_i')}_{i=1}^n, the Hermite interpolant is unique.

Just as in the Lagrange interpolation case, equally spaced nodes can cause disastrous problems.

Hermite cubics

n = 2, so it is a cubic polynomial. Let h = x_2 − x_1. Then

    b_0 = f_1,  b_1 = f_1',  b_2 = h^{−2}[f_2 − f_1 − hf_1'],  and  b_3 = h^{−3}[h(f_1' + f_2') − 2(f_2 − f_1)].

Hermite cubics are by far the most common form of Hermite interpolation that you are likely to see in practice.
124
Piecewise polynomial interpolation

Piecewise linears: P(x) = f_i (x − x_{i+1})/(x_i − x_{i+1}) + f_{i+1} (x − x_i)/(x_{i+1} − x_i), x_i ≤ x ≤ x_{i+1}.

Piecewise quadratics: Use Lagrange interpolation of degree 2 over [x_1, x_2, x_3], [x_3, x_4, x_5], … This extends to Lagrange interpolation of degree k−1 over groups of k nodal points.

Piecewise Hermite: Cubics is the best known case. For x_i ≤ x ≤ x_{i+1},

    Q(x) = f_i + f_i'(x − x_i) + [f_{i+1} − f_i − f_i'(x_{i+1} − x_i)]/(x_{i+1} − x_i)² · (x − x_i)²
           + [(x_{i+1} − x_i)(f_i' + f_{i+1}') − 2(f_{i+1} − f_i)]/(x_{i+1} − x_i)³ · (x − x_i)²(x − x_{i+1}).

Facts: Q(x) and Q'(x) are continuous, but Q''(x) is not usually continuous.
125
Cubic spline: We want a piecewise cubic polynomial such that s, s', and s'' are continuous. We write

    s_i'' = s''(x_i).

Note that s''(x) must be linear on [x_i, x_{i+1}]. So

    s''(x) = s_i'' + (x − x_i)/(x_{i+1} − x_i) · (s_{i+1}'' − s_i'').

Then

    s'(x) = s_i' + ∫_{x_i}^{x} s''(t)dt
          = s_i' + s_i''(x − x_i) + (s_{i+1}'' − s_i'')/(x_{i+1} − x_i) · (x − x_i)²/2

and
126
    s(x) = s(x_i) + s_i'(x − x_i) + s_i''(x − x_i)²/2 + (s_{i+1}'' − s_i'')/(x_{i+1} − x_i) · (x − x_i)³/6.

We know s_i = f_i and s_{i+1} = f_{i+1}, so

    s_i' = (f_{i+1} − f_i)/(x_{i+1} − x_i) − s_i''(x_{i+1} − x_i)/2 − (s_{i+1}'' − s_i'')(x_{i+1} − x_i)/6.

At this point, s(x) can be written by knowing {x_i, f_i, s_i''}. The s_i'' can be eliminated by using the continuity condition on s'. Suppose that x_{i+1} − x_i = h for all i. Then s'(x_i) = s_i' from the piece on [x_i, x_{i+1}], but for x_{i−1} ≤ x ≤ x_i,

    s(x) = s(x_{i−1}) + s_{i−1}'(x − x_{i−1}) + s_{i−1}''(x − x_{i−1})²/2 + (s_i'' − s_{i−1}'')(x − x_{i−1})³/(6h)

and
127
    s'(x_i) = s_{i−1}' + (h/2)(s_i'' + s_{i−1}'').

Equating both expressions for s'(x_i) we get

    s_{i−1}'' + 4s_i'' + s_{i+1}'' = (6/h²)(f_{i+1} − 2f_i + f_{i−1}),  i = 2, …, n−1.

Imposing s_1'' = s_n'' = 0 gives us n−2 equations in the n−2 unknowns (plus 2 knowns), so the system has a unique solution. A sketch of the resulting tridiagonal solve appears below.
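To make the construction concrete, here is a hedged C++ sketch (names and array layout are my own, not the notes') that assembles and solves the tridiagonal system above for the natural cubic spline second derivatives on a uniform mesh.

#include <vector>

// Natural cubic spline on a uniform mesh: solve
//   s''_{i-1} + 4 s''_i + s''_{i+1} = (6/h^2)(f_{i+1} - 2 f_i + f_{i-1}),  i = 2..n-1,
// with s''_1 = s''_n = 0, using the Thomas algorithm.
std::vector<double> splineSecondDerivatives(const std::vector<double>& f, double h) {
    const std::size_t n = f.size();                // nodes x_1..x_n (0-based storage)
    std::vector<double> s2(n, 0.0);                // natural end conditions
    if (n < 3) return s2;

    const std::size_t m = n - 2;                   // unknowns s''_2..s''_{n-1}
    std::vector<double> diag(m, 4.0), rhs(m);
    for (std::size_t i = 0; i < m; ++i)
        rhs[i] = 6.0 / (h * h) * (f[i + 2] - 2.0 * f[i + 1] + f[i]);

    // Forward elimination (sub- and super-diagonals are all 1).
    for (std::size_t i = 1; i < m; ++i) {
        double w = 1.0 / diag[i - 1];
        diag[i] -= w;
        rhs[i]  -= w * rhs[i - 1];
    }
    // Back substitution.
    s2[n - 2] = rhs[m - 1] / diag[m - 1];
    for (std::size_t i = m - 1; i-- > 0; )
        s2[i + 1] = (rhs[i] - s2[i + 2]) / diag[i];
    return s2;
}

Because the matrix is tridiagonal with constant bands, the Thomas algorithm solves it in O(n) operations.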
Error Analysis

Consider Lagrange interpolation with {(x_i, f_i)}_{i=1}^n, x_1 < x_2 < ⋯ < x_n, f_i = f(x_i). We want to know what

    p(x) − f(x) = 0 if x ∈ {x_i}, ? otherwise.
128
We write

    f(x) − p(x) = Ψ(x)G(x), where Ψ(x) = Π_{i=1}^n (x − x_i) and G is to be determined.

Theorem 5.3: G(x) = f^(n)(ξ)/n!, where ξ depends on x.
Proof: Note that G is continuous at any x ∉ {x_i}_{i=1}^n. Using L'Hôpital's Rule,

    G(x_i) = lim_{x→x_i} [f(x) − p(x)]/Ψ(x) = [f'(x_i) − p'(x_i)]/Ψ'(x_i).

Since Ψ'(x_i) ≠ 0, G(x) is continuous at any node x_i. Let x be fixed and consider
129
    H(z) = f(z) − p(z) − Ψ(z)G(x).

Note that H(x_i) = 0 since f(x_i) = p(x_i) and Ψ(x_i) = 0. By the definition of G(x), H(x) = 0. Now suppose that x ∉ {x_i}_{i=1}^n. Then H(z) vanishes at n+1 distinct points. H'(z) must vanish at some point strictly between each adjacent pair of these points (by Rolle's Theorem), so H'(z) vanishes at n distinct points. Similarly, H''(z) vanishes at n−1 distinct points. We continue this until we have 1 point, ξ, depending on x, such that H^(n)(ξ) = 0. Since p(x) is a polynomial of degree n−1,

    0 = H^(n)(ξ) = f^(n)(ξ) − Ψ^(n)(ξ)G(x) = f^(n)(ξ) − n!G(x)

or

    G(x) = f^(n)(ξ)/n!.
130
Now suppose that x = x_i for some i. Then H(z) only vanishes at n distinct points. But

    H'(z) = f'(z) − p'(z) − Ψ'(z)G(x),

so H'(x_i) = 0 and H'(z) still vanishes at n distinct points. We use the same trick as before. QED



Consider Hermite interpolation, with {(x_i, f_i, f_i')}_{i=1}^n, x_1 < x_2 < ⋯ < x_n, f_i = f(x_i), f_i' = f'(x_i), and p(x) the Hermite interpolant. Set

    q(x) = Ψ²(x)  and  f(x) − p(x) = q(x)G(x).

Since q'(x_i) = 0 and q''(x_i) ≠ 0, we have that G(x) is continuous and
131
    G(x_i) = [f''(x_i) − p''(x_i)]/q''(x_i).

Define

    H(z) = f(z) − p(z) − q(z)G(x).

In this case,
    H' vanishes at 2n distinct points,
    H'' vanishes at 2n−1 distinct points, …

Hence, G(x) = f^(2n)(ξ)/(2n)!.
132
Note that interpolation is a linear process. Let Pf be any interpolating function (e.g., Lagrange, Hermite, or spline) using a fixed set of nodes. Then for any functions f and g,

    P(αf + βg) = αPf + βPg, any α, β.

Examples:

Lagrange: P(αf + βg) = Σ_{j=1}^n (αf_j + βg_j)ℓ_j(x) = αPf + βPg.

Hermite: Similar to Lagrange.
133
Splines: Define the Kronecker delta, δ_ij = 1 if i = j, 0 if i ≠ j. Let Φ_i(x) be the unique spline function satisfying Φ_i(x_j) = δ_ij. If Pf = Σ_{i=1}^n f_i Φ_i(x), then Pf is the interpolatory spline for f(x). Linearity follows as before.

Let Pf be any linear interpolatory process that is exact for polynomials of degree m, i.e., if q(x) is a polynomial of degree m, then Pq = q. For a given function f(x), Taylor's Theorem says that

    f(x) = f(x_1) + f'(x_1)(x − x_1) + ⋯ + f^(m)(x_1)(x − x_1)^m/m! + ∫_{x_1}^{x} f^(m+1)(t)(x − t)^m/m! dt.

134
Define

    K(x,t) = (x − t)^m/m!  for x_1 ≤ t ≤ x,   and   K(x,t) = 0  for x < t ≤ x_n,

so that

    f(x) = Σ_{j=0}^m f^(j)(x_1)(x − x_1)^j/j! + ∫_{x_1}^{x_n} K(x,t)f^(m+1)(t)dt ≡ F_1(x) + F_2(x).
Even More Error Analysis

Define C^k([a,b]) = {f : [a,b] → ℝ such that f^(k) is continuous on [a,b]}.
135
Theorem 5.4: If p(x) is a polynomial of degree n−1 that interpolates f ∈ C^n([a,b]) at {x_i}_{i=1}^n ⊂ [a,b], then

    f(x) − p(x) = [f^(n)(ξ)/n!] W(x),  where W(x) = Π_{i=1}^n (x − x_i).
Tchebyshev Polynomials of the First Kind

Define

    T_k(x) = cos(k cos^{−1}(x)), k = 0, 1, 2, …, x ∈ [−1,1],

where T_0(x) = 1 and T_1(x) = x. Choose x = cos(θ), 0 ≤ θ ≤ π. Then T_k(x) = T_k(cos(θ)) = cos(kθ) and we get a three term recurrence:
136
Tk 1( x) = cos((k 1) )  2cos( )cos(k )  cos((k 1) )
= 2xTk ( x) Tk 1( x), k 1
Hence,
T2 ( x) = 2x2 1
T3( x) = 4x3  3x
We can verify inductively that Tk ( x) is a kth degree polynomial. The leading
coefficient of Tk ( x) is 2k1 and Tk ( xi )  0, 0  i  k 1, when

(2
i

1)

.
xi  cos
2k 





137
Finally, for x ∈ [−1,1] we can show that |T_k(x)| ≤ 1. For y_i = cos(iπ/k), T_k(y_i) = cos(iπ) = (−1)^i, so, in fact, ||T_k||_∞ = 1. From Theorem 5.4, we can prove that max|W(x)| is minimized when W(x) = 2^{−n}T_{n+1}(x). A short sketch of computing Chebyshev nodes and values follows.
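A small C++ sketch of the Chebyshev machinery, assuming only the definitions above (the function names are mine): it returns the k zeroes x_i = cos((2i+1)π/(2k)) of T_k and evaluates T_k by the three term recurrence.

#include <cmath>
#include <vector>

// Zeroes of T_k on [-1,1]: x_i = cos((2i+1)pi/(2k)), i = 0..k-1.
std::vector<double> chebyshevNodes(int k) {
    const double pi = 3.14159265358979323846;
    std::vector<double> x(k);
    for (int i = 0; i < k; ++i)
        x[i] = std::cos((2 * i + 1) * pi / (2.0 * k));
    return x;
}

// Evaluate T_k(x) with the three term recurrence T_{k+1} = 2x T_k - T_{k-1}.
double chebyshevT(int k, double x) {
    if (k == 0) return 1.0;
    double tkm1 = 1.0, tk = x;                 // T_0 and T_1
    for (int j = 1; j < k; ++j) {
        double tkp1 = 2.0 * x * tk - tkm1;
        tkm1 = tk;
        tk = tkp1;
    }
    return tk;
}

Interpolating at chebyshevNodes(n) instead of equally spaced points avoids the Runge phenomenon shown earlier.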
Translating Intervals

Suppose the problem on [c,d] needs to be reformulated on [a,b].

Example: Tchebyshev only works on [a,b] = [−1,1]. We use a straight line transformation from t ∈ [c,d] to x ∈ [a,b]. Hence,

    x = mt + β, where β = (ad − bc)/(d − c) and m = (b − a)/(d − c).

Example: Tchebyshev with x ∈ [a,b], a < b arbitrary. Then
138





    x = (b − a)t/2 + (b + a)/2   or   t = 2(x − a)/(b − a) − 1.

The shifted Tchebyshev polynomials are defined by

    T̄_k(x) = T_k(t) = T_k(2(x − a)/(b − a) − 1) = cos(k cos^{−1}(2(x − a)/(b − a) − 1)).

Since T_k(t_i) = 0 for t_i = cos((2i+1)π/(2k)), 0 ≤ i ≤ k−1, then

    x_i = (b − a)t_i/2 + (b + a)/2, 0 ≤ i ≤ k−1,
139




 
are zeroes of T̄_k(x). Further,

    T̄_0(x) = 1
    T̄_1(x) = 2(x − a)/(b − a) − 1
    T̄_{k+1}(x) = T_{k+1}(t) = 2tT_k(t) − T_{k−1}(t) = 2[2(x − a)/(b − a) − 1]T̄_k(x) − T̄_{k−1}(x), k ≥ 1.

We can prove that the leading coefficient of T̄_{k+1}(x) is 2^k [2/(b − a)]^{k+1}, k ≥ 1.
140
Further, we know that W(x) = 2^{−n}[(b − a)/2]^{n+1} T̄_{n+1}(x) from Theorem 5.4.
Tensor Product Interpolation

Given {x_i}_{i=0}^n and {y_j}_{j=0}^m, interpolate f(x,y) over (x_i, y_j), giving us

    p(x,y) = Σ_{i=0}^n Σ_{j=0}^m a_ij x^i y^j.

The bi-Lagrangian is defined by

    ℓ_ij(x,y) = ℓ_i(x)ℓ_j(y),

where ℓ_k is the one dimensional Lagrangian along either the x or y axis (k = 1, 2, respectively), so that p(x,y) = Σ_{i=0}^n Σ_{j=0}^m f(x_i, y_j)ℓ_ij(x,y).

The bi-Hermite and bi-spline can be defined similarly.
141
Orthogonal Polynomials and Least Squares Approximation

We approximate f(x) on [a,b] given {(x_i, f_i)}_{i=0}^m. Define

    Π_n = {p(x) | p a polynomial of degree ≤ n}.

Problem A: Let {w_i}_{i=0}^m, w_i > 0 (weights), m > n. Find p*(x) ∈ Π_n which minimizes

    Σ_{i=0}^m w_i [p*(x_i) − f_i]².

Problem B: Let w(x) ∈ C([a,b]) be positive on (a,b). Find p*(x) ∈ Π_n which minimizes

    ∫_a^b w(x)[p*(x) − f(x)]² dx.

Properties of both: the solutions are unique and are "easy" to compute in a finite number of steps with explicit formulas (which is not true of the more general minimax problem min_{p*} max_x |p*(x) − f(x)|).
142
Define

    ⟨f, g⟩_1 = Σ_{j=0}^m w_j f(x_j)g(x_j)
    ⟨f, g⟩_2 = ∫_a^b w(x)f(x)g(x)dx
    ||f||² = ⟨f, f⟩ (either inner product)

Note that ||f|| is a real norm for ⟨·,·⟩_2, but is only a semi-norm for ⟨·,·⟩_1 (a continuous f can vanish at every node x_j without being identically zero).

Theorem 5.5 (Cauchy-Schwarz): Let f, g ∈ C([a,b]). Then

    |⟨f, g⟩| ≤ ||f|| ||g||.

Proof: If ⟨g,g⟩ = 0, then ⟨f,g⟩ = 0. If ⟨g,g⟩ ≠ 0, then
143
    0 ≤ ⟨f − λg, f − λg⟩ = ⟨f,f⟩ − 2λ⟨f,g⟩ + λ²⟨g,g⟩.

Use λ = ⟨f,g⟩/⟨g,g⟩. Then 0 ≤ ⟨f,f⟩ − ⟨f,g⟩²/⟨g,g⟩.
QED
Definitions: p and q are orthogonal if and only if ⟨p,q⟩ = 0. p and q are orthonormal if and only if ⟨p,q⟩ = 0 and ||p|| = ||q|| = 1.

Consider S = {1, x, x², …, x^n}, the set of monomials. The elements are not orthogonal to each other under either ⟨f,g⟩_1 or ⟨f,g⟩_2. Yet any p ∈ Π_n is a linear combination of the elements of S. We can transform S into a different set of orthogonal polynomials using the

Gram-Schmidt Algorithm: Given S, let
144
q0 ( x) 1
qk



n

k k 0
Then q
( x)  xk 

k 1
k, p 

x
j
j0



is orthogonal and p
Note that for ,2 , 1 





b
a12 dx
n

k k 0
p j ( x)
p0 ( x) 
q0 ( x)
pk ( x) 
qk ( x)
q0
qk
is orthonormal.
1/2





 ba .
r
Let p( x)  k 0 ak xk , r  n . Then
k 1
k 1
xk  qk ( x)   j0  xk , p j  p j ( x)  qk pk ( x)   j0  xk , p j  p j ( x) .
145
Using this expression, we can write p(x) as

    p(x) = Σ_{j=0}^r ⟨p, p_j⟩ p_j(x)

since

    ⟨p_k, Σ_j α_j p_j⟩ = Σ_j α_j ⟨p_j, p_k⟩ = α_k ⟨p_k, p_k⟩ = α_k.
Best Least Squares Approximation

Theorem 5.6: Let ⟨·,·⟩ be either ⟨·,·⟩_1 or ⟨·,·⟩_2 and ||f|| = ⟨f,f⟩^{1/2}. If f ∈ C([a,b]), then the polynomial p*(x) ∈ Π_n that minimizes ||f − p*|| over Π_n is given by
146
    p*(x) = Σ_{j=0}^n ⟨f, p_j⟩ p_j(x),

where {p_j(x)}_{j=0}^n is the orthonormal set of polynomials generated by Gram-Schmidt.
Proof: Let p(x) ∈ Π_n. Then p(x) = Σ_{j=0}^n α_j p_j(x). Further,

    0 ≤ ||f − p||² = ⟨f − Σ_{j=0}^n α_j p_j, f − Σ_{j=0}^n α_j p_j⟩
                  = ⟨f,f⟩ − 2Σ_{j=0}^n α_j⟨f,p_j⟩ + Σ_{j=0}^n α_j²
                  = ⟨f,f⟩ − Σ_{j=0}^n ⟨f,p_j⟩² + Σ_{j=0}^n [α_j − ⟨f,p_j⟩]²,
147
which is minimized when we choose α_j = ⟨f, p_j⟩.
QED
Note: The coefficients α_j = ⟨f, p_j⟩ are called the generalized Fourier coefficients.

Facts:

    ||f(x) − p_n*(x)||² = ||f||² − Σ_{j=0}^n ⟨f, p_j⟩².
Efficient Computation of p_n*(x)

We can show that we have a three term recurrence:

    q_k(x) = (x − a_k)q_{k−1}(x) − b_k q_{k−2}(x), k ≥ 2,
148
where

    a_k = ⟨xq_{k−1}, q_{k−1}⟩/⟨q_{k−1}, q_{k−1}⟩   and   b_k = ⟨xq_{k−1}, q_{k−2}⟩/⟨q_{k−2}, q_{k−2}⟩.

This gives us

    p_n*(x) = Σ_{j=0}^{n−1} ⟨f, p_j⟩ p_j(x) + ⟨f, p_n⟩ p_n(x) = p_{n−1}*(x) + ⟨f, p_n⟩ p_n(x).

So,

    p_n*(x) = Σ_{j=0}^n ⟨f, p_j⟩ p_j(x)

is equivalent and may be less sensitive to roundoff error.
149
Also,

    p_n*(x) = Σ_{j=0}^n ⟨f, p_j⟩ p_j(x) = Σ_{j=0}^n [⟨f, q_j⟩/⟨q_j, q_j⟩] q_j(x) = Σ_{j=0}^n c_j q_j(x).

If we precompute {a_j, b_j, c_j}, then evaluating p_n*(x) only costs 2n+1 multiplies.




150
6. Numerical Integration and Quadrature Rules

Assume f(x) is integrable over [a,b]. Define

    I(f) = ∫_a^b f(x)w(x)dx, where w(x) ≥ 0 is a weight function.

Frequently, w(x) ≡ 1. A formula that approximates I(f) is called numerical integration or a quadrature rule. In practice, if g(x) approximates f(x) well enough, then I(g) ≈ I(f).

Interpolatory Quadrature

Let p_n(x) be the Lagrange interpolant of f(x) at {x_i}_{i=0}^n, i.e.,

    p_n(x) = Σ_{j=0}^n f(x_j)ℓ_j(x).
151
Define

    Q_n(f) = I(p_n) = ∫_a^b p_n(x)w(x)dx
           = ∫_a^b [Σ_{j=0}^n f(x_j)ℓ_j(x)] w(x)dx
           = Σ_{j=0}^n f(x_j) ∫_a^b ℓ_j(x)w(x)dx
           = Σ_{j=0}^n A_j f(x_j),

where the A_j are quadrature weights and the x_j are the quadrature nodes.

Note that if f(x) ∈ Π_n, then f(x) = p_n(x) and Q_n(f) = I(f), i.e., the quadrature is exact. If Q_n(f) is exact for polynomials of degree ≤ m, then we say the quadrature rule has precision m. We will develop quadrature rules that have precision > n later (e.g., Gaussian quadrature).
152
Method of Undetermined Coefficients

If Q_n(f) has precision n, then it is exact for the monomials 1, x, x², …, x^n. Suppose the nodes are no longer fixed. We start with n+1 equations

    I(x^k) = ∫ x^k w(x)dx = Q_n(x^k) = Σ_{j=0}^n A_j x_j^k, 0 ≤ k ≤ n,

for our 2n+2 unknowns A_j and x_j. Let k ∈ [0, 2n+1] so we have 2n+2 (nonlinear) equations and unknowns. If this system has a solution, then the rule has precision 2n+1. This is what Gaussian quadrature is based on (which we will get to later). The Trapezoidal and Simpson's Rules are trivial examples.
Trapezoidal Rule
Let [a,b] [h, h], h  0, w( x) 1 . This is derived by direct integration rule. Take
x0  h and x1  h . Then
153
p1( x) = 1  f (h)(h  x)  f (h)(h  x)

2h 
Q1( x) = I ( p )  h p ( x)dx
1 
1
h

h
h 

1
= 4h  f (h)(h  x)2  f (h)(h  x)2 
h
h




= h  f (h)  f (h) .
Simpson’s Rule
This rule is derived using undetermined coefficients. Let [a,b] = [−h,h], h > 0, w(x) ≡ 1, x_0 = −h, x_1 = 0, and x_2 = h. We force

    Q_2(f) = Σ_{j=0}^2 A_j f(x_j) = ∫_{−h}^{h} f(x)dx for f(x) = 1, x, x².
154
Then

    I(1)  = ∫_{−h}^{h} 1 dx  = 2h      = A_0 + A_1 + A_2
    I(x)  = ∫_{−h}^{h} x dx  = 0       = −A_0h + A_2h
    I(x²) = ∫_{−h}^{h} x² dx = (2/3)h³ = A_0h² + A_2h²

Solving this 3×3 system of linear equations gives us A_0 = A_2 = h/3 and A_1 = 4h/3.

Note that I(x³) = 0 = Q_2(x³), but I(x⁴) ≠ Q_2(x⁴), so Simpson's Rule has precision 3.
155
What Does Increasing n Do for You?
Theorem 6.1: For any n and f ∈ C([a,b]), let Q_n(f) = Σ_{j=0}^n A_j^(n) f(x_j) be an interpolatory quadrature derived by direct integration. Then there exists a constant K such that

    Σ_{j=0}^n |A_j^(n)| ≤ K, ∀n   ⟺   lim_{n→∞} Q_n(f) = I(f), ∀f ∈ C([a,b]).
Justification for Positive Weights
We must have ∫_a^b w(x)dx > 0. Further,

    0 < I(1) = ∫_a^b w(x)dx = Q_n(1) = Σ_{j=0}^n A_j^(n).

If A_j^(n) > 0, 0 ≤ j ≤ n, and we can choose a set of x_j's to get this, then Theorem 6.1 guarantees convergence. Positive weights are also good because they reduce roundoff errors: we ought to have roughly as many roundoffs on the high side as on the low side, so the errors tend to cancel. Finally, we expect roundoff to be minimized when the A_j^(n)'s are (nearly) equal.
156
Translating Intervals
We will derive a formula on a specific interval, e.g., [−1,1], and then apply it to another interval [a,b]. Suppose that

    Q_n(g) = Σ_{j=0}^n A_j g(t_j)

approximates ∫_{−1}^{1} g(t)dt and we want ∫_a^b f(x)dx. Set x = αt + β, where α = (b − a)/2 and β = (b + a)/2. Then

    ∫_a^b f(x)dx = α ∫_{−1}^{1} f(αt + β)dt.

Let g(t) = f(αt + β). Then Q_n(g) = Σ_{j=0}^n A_j f(αt_j + β) approximates I(g).
157
So,

    Q_n*(f) = Σ_{j=0}^n A_j* f(x_j) = [(b − a)/2] Σ_{j=0}^n A_j f(x_j),

so

    x_j = (b − a)t_j/2 + (b + a)/2.
Newton-Cotes Formulas

Assume that the x_i's are equally spaced in [a,b] and that we define a quadrature rule by Q_n(f) = Σ A_j f(x_j).
158
The closed Newton-Cotes formulas Q_n(f) = Σ_{j=0}^n A_j f(x_j) assume that h = (b − a)/n and x_i = a + ih, 0 ≤ i ≤ n. The open Newton-Cotes formulas Q_n(f) = Σ_{j=1}^{n+1} A_j* f(y_j) assume that h = (b − a)/(n+2) and y_i = a + ih, 1 ≤ i ≤ n+1.
Examples:

    T(f) = [(b − a)/2][f(a) + f(b)]                                 2 point closed (Trapezoidal Rule)
    S(f) = [(b − a)/6][f(a) + 4f((a + b)/2) + f(b)]                 3 point closed (Simpson's Rule)
    ∫_{−1}^{1} f(x)dx ≈ (2/3)[2f(−1/2) − f(0) + 2f(1/2)]            3 point open
    ∫_{−1}^{1} f(x)dx ≈ (1/4)[f(−1) + 3f(−1/3) + 3f(1/3) + f(1)]    4 point closed
159
For n 10 , the weights are always of mixed signs. Higher order formulas are not
necessarily convergent. Lower order formulas are extremely useful.
Suppose we have p3( x) , the Hermite interpolant of f ( x) . We want I ( p3) , which
we can get by observing that
I ( p3) = S ( p3)
2

 (b  a) 

b

a
=
 f (a)  f (b) 
 f '(a)  f '(b)


2 
12 
2
(
b

a
)

T
(
f
)

=
 f '(a)  f '(b)

12 
= CT ( f )
This is known as the Trapezoidal Rule with Endpoint Correction (a real
mouthful). It has precision 3.
160
Error Analysis
Assuming that f C n1([a,b]) , the error in interpolation is given by
(n1) ( )
n
f
en ( x)  f ( x)  pn ( x) 
W ( x), where W ( x)  i0( x  xi ) .
(n 1)!
The error in integration is
en =
b
a en (x)w(x)dx
f (n1) ( ) bW ( x)w( x)dx
=
(n 1)! a
So,
b
en  f (n1)  1  W ( x) w( x)dx .
 (n 1)! a
161
We can simplify the last equation by applying the Second Mean Value Theorem (which states that for g, h ∈ C((a,b)) such that g does not change sign, ∫_a^b g(x)h(x)dx = h(η)∫_a^b g(x)dx for some η ∈ (a,b)) to the formula for e_n. Hence,

    e_T  = I(f) − T(f) = [f''(η)/2!] ∫_a^b (x − a)(x − b)dx = −f''(η)(b − a)³/12
    e_CT = [f^{(4)}(η)/4!] ∫_a^b (x − a)²(x − b)²dx = f^{(4)}(η)(b − a)⁵/720
    e_S  = −[f^{(4)}(η)/90][(b − a)/2]⁵
162

Composite Rules
What if we want a highly accurate rule on [a,b]? The best approach is to divide [a,b] into subintervals, use a low order quadrature rule on each subinterval, and add the pieces up, since high order quadrature rules tend to have problems.

Let a = x_0 < x_1 < ⋯ < x_n = b. Then

    I(f) = ∫_a^b f(x)w(x)dx = Σ_{j=0}^{n−1} ∫_{x_j}^{x_{j+1}} f(x)w(x)dx.

Consider w(x) ≡ 1 and x_{j+1} − x_j = h. Then for the Trapezoidal Rule,

    T_n(f) = h Σ_{j=0}^{n−1} [f(x_{j+1}) + f(x_j)]/2
           = h Σ_{j=1}^{n−1} f(x_j) + (h/2)[f(x_0) + f(x_n)].
163
    e_n^T = −(h³/12) Σ_{j=0}^{n−1} f''(ξ_j),  ξ_j ∈ (x_j, x_{j+1}).

Theorem 6.2: Let g ∈ C([a,b]) and {a_j}_{j=0}^{n−1} be constants of the same sign. If t_j ∈ [a,b], 0 ≤ j ≤ n, then for some ξ ∈ [a,b],

    Σ_{j=0}^{n−1} a_j g(t_j) = g(ξ) Σ_{j=0}^{n−1} a_j.

Hence,

    e_n^T = −f''(ξ) Σ_{j=0}^{n−1} h³/12 = −f''(ξ)nh³/12 = −h²(b − a)f''(ξ)/12.
164
Consider Simpson's Rule:

    ∫_{x_j}^{x_{j+1}} f(x)dx = (h/6)[f(x_j) + 4f((x_j + x_{j+1})/2) + f(x_{j+1})] − [f^{(4)}(ξ_j)/90](h/2)⁵,  x_j ≤ ξ_j ≤ x_{j+1}.

So,

    S_n(f) = (h/6)[f(x_0) + f(x_n) + 2Σ_{j=1}^{n−1} f(x_j) + 4Σ_{j=0}^{n−1} f((x_j + x_{j+1})/2)]

and

    e_n^S = −Σ_{j=0}^{n−1} [f^{(4)}(ξ_j)/90](h/2)⁵ = −[f^{(4)}(ξ)/90] n (h/2)⁵ = −[(b − a)/180](h/2)⁴ f^{(4)}(ξ)
          = −h⁴(b − a)f^{(4)}(ξ)/2880.
165
Corrected Trapezoidal Rule
    CT_n(f) = T_n(f) + (h²/12)[f'(a) − f'(b)]

and

    e_n^CT = h⁴(b − a)f^{(4)}(ξ)/720.
166
The number of function evaluations and the order of the error over n points is

    Method     Function evaluations   Derivative evaluations   Order
    T_n(f)     N                      0                        O(h²)
    S_n(f)     2N+1                   0                        O(h⁴)
    CT_n(f)    N                      2                        O(h⁴)

We can show that

    lim_{n→∞} T_n(f) = lim_{n→∞} S_n(f) = lim_{n→∞} CT_n(f) = I(f).

If the function evaluation cost is quite high, CT_n(f) becomes quite attractive computationally, particularly if the endpoint derivatives are known or quite easy to compute. While CT_n(f) and S_n(f) are both O(h⁴), there is a noticeable difference in the constants, which needs to be considered in choosing n. A small sketch of the composite rules follows.
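The composite rules are short enough to state as code. The following C++ sketch (my names; w(x) ≡ 1 and n equal subintervals assumed) implements T_n(f) and S_n(f) exactly as defined above.

#include <functional>

// Composite Trapezoidal rule on n equal subintervals of [a,b]; O(h^2) error.
double compositeTrapezoid(const std::function<double(double)>& f,
                          double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int j = 1; j < n; ++j) sum += f(a + j * h);
    return h * sum;
}

// Composite Simpson rule on n equal subintervals of [a,b]; O(h^4) error.
double compositeSimpson(const std::function<double(double)>& f,
                        double a, double b, int n) {
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int j = 1; j < n; ++j) sum += 2.0 * f(a + j * h);          // interior nodes
    for (int j = 0; j < n; ++j) sum += 4.0 * f(a + (j + 0.5) * h);  // midpoints
    return h / 6.0 * sum;
}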
167
Adaptive Quadrature
Suppose we want I(f) to within an error tolerance of ε > 0 using an automatic procedure to accomplish this feat. Consider S_n(f).

Motivation: Suppose f(x) is badly behaved only over [α,β] ⊂ [a,b], where [α,β] is a small part of [a,b]. Then S_n(f) over [a,α] and [β,b] will be accurate for small n's, but S_n(f) over [a,b] may be a very poor approximation to I(f). Doubling n will not necessarily increase the accuracy over [a,α] and [β,b], where it was already acceptable, and we still might not get an acceptable approximation over [α,β]. Instead, we want to subdivide [α,β] and work hard just there while doing minimal work in [a,α] and [β,b] … and we do not want to know where [α,β] is in advance!
168
Adaptive quadrature packages accept [a,b], f, and ε and return EST, which supposedly satisfies

    |∫_a^b f(x)dx − EST| ≤ ε.

An error sensing mechanism is used on intermediate steps to control the overall quadrature error. For instance, if [c,d] ⊆ [a,b] and H = d − c, then

    ∫_c^d f(x)dx − S(f) = −[f^{(4)}(ξ)/90](H/2)⁵

and

    ∫_c^d f(x)dx − S_2(f) = −2[f^{(4)}(η)/90](H/4)⁵,

where ξ, η ∈ [c,d] and S_2 is Simpson's Rule applied on the two halves of [c,d]. The critical (and sometimes erroneous) assumption is that f^{(4)}(x) ≈ K, K constant over [c,d]. This is true when [c,d] is small in comparison to how rapidly f^{(4)}(x) varies in [c,d].
169
Set

    I_cd(f) = ∫_c^d f(x)dx.

Then

    I_cd(f) − S_2(f) ≈ (1/16)[I_cd(f) − S(f)],

which means that S_2(f) is 16 times more accurate than S(f) when f^{(4)}(x) is well behaved on [c,d]. So,

    16[I_cd − S_2] ≈ [I_cd − S]   or   15[I_cd − S_2] ≈ S_2 − S.
170



So we compute both S(f) and S_2(f) over [c,d]. Many applications require that EST be very accurate, rather than inexpensive to compute. Hence, we can use a conservative error estimator of the form

    |I_cd(f) − S_2(f)| ≈ (1/2)|S_2(f) − S(f)|.

Algorithm apparent: Compute S(f) and S_2(f) over [c,d].
1. If the error is acceptable, then add the estimate of I_cd(f) into EST.
2. Otherwise, divide [c,d] into two equal sized intervals and try again on both intervals. The expected error on each half is reduced by a factor of 32.

The real estimator must depend on the size of [c,d], however. A good choice is

    (1/2)|S_2(f) − S(f)| ≤ ε (d − c)/(b − a).

A recursive sketch of this algorithm follows.
171
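Here is a recursive C++ sketch of the algorithm apparent with the interval proportional tolerance; it is only an illustration (names are mine), and it recomputes function values rather than caching them as a production code would.

#include <cmath>
#include <functional>

// One Simpson panel on [c,d].
static double simpson(const std::function<double(double)>& f, double c, double d) {
    return (d - c) / 6.0 * (f(c) + 4.0 * f(0.5 * (c + d)) + f(d));
}

// Accept S2 when |S2 - S|/2 <= eps*(d - c)/(b - a); otherwise bisect [c,d]
// and recurse; hmin plays the role of HMIN to guard against non-termination.
double adaptiveSimpson(const std::function<double(double)>& f,
                       double c, double d, double eps,
                       double a, double b, double hmin) {
    double S  = simpson(f, c, d);
    double m  = 0.5 * (c + d);
    double S2 = simpson(f, c, m) + simpson(f, m, d);
    if (0.5 * std::fabs(S2 - S) <= eps * (d - c) / (b - a) || (d - c) <= hmin)
        return S2;                               // add this estimate into EST
    return adaptiveSimpson(f, c, m, eps, a, b, hmin)
         + adaptiveSimpson(f, m, d, eps, a, b, hmin);
}

A typical call is adaptiveSimpson(f, a, b, eps, a, b, 1e-12).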
Theorem 6.3: This estimator will eventually produce an interval [c,d] that is acceptable.

Proof: Every time we halve the interval [c,d], the quadrature error decreases by a factor of 32. Set

    err(c,d) = |∫_c^d f(x)dx − EST(c,d)|.

If

    err(a,z) ≤ ε(z − a)/(b − a)   and   err(z,t) ≤ ε(t − z)/(b − a),

then

    err(a,t) ≤ err(a,z) + err(z,t) ≤ ε(z − a)/(b − a) + ε(t − z)/(b − a) = ε(t − a)/(b − a).

Taking t = b gives err(a,b) ≤ ε. QED
Theorem 6.4: The cost is only two extra function evaluations at each step.
172
Folk Theorem 6.5: Given any adaptive quadrature algorithm, there exist an infinite number of f(x)'s that will fool the Algorithm Apparent into failing. (Better algorithms work for the usual f(x)'s.)

Proof: Let a = r_1 < s_1 < t_1 < u_1 < v_1 = b be 5 equally spaced points used in computing S(f) and S_2(f). Test

    |S_2(f) − S(f)| ≤ 2ε(v_1 − r_1)/(b − a).

• If true, then use S_2(f) as an estimate of I_{r_1,v_1}(f).
• If false, then retreat to [r_2, v_2], where r_1 = r_2 < s_2 < t_2 < u_2 < v_2 = t_1, equally spaced. Now only evaluations at s_2 and u_2 are necessary if we saved our previous function evaluations. We test |S_2(f) − S(f)| ≤ 2ε(v_2 − r_2)/(b − a). If the test succeeds, then we pass on to the interval [t_2, v_2]; otherwise we work on a new level 3. This process is not guaranteed to terminate. Hence, we need to add an
173
extra condition that v_i − r_i ≥ HMIN always.
If this fails, then we cannot produce EST. QED
Richardson Extrapolation
This method combines two or more estimates of a quantity to get a better estimate. Suppose

    a_0 is estimated by A(h),

where A(h) is computable for any h > 0. Further, we assume that

    lim_{h→0} A(h) = a_0.
174
Finally, we assume that

    a_0 = A(h) + Σ_{i=k}^m a_i h^i + C_m(h)h^{m+1},

where the a_i's are independent of h and a_k ≠ 0. Take h_1 = h, h_2 = rh_1, and 0 < r < 1, with r = 1/2 the most common value. We want to eliminate the h^k term using a combination of A(h) and A(rh) by noting that

    a_0 = A(rh) + Σ_{i=k}^m a_i(rh)^i + C_m(rh)(rh)^{m+1}.

We have two definitions of a_0, so we can subtract r^k times the first from the second to get

    a_0 − r^k a_0 = A(rh) − r^k A(h) + Σ_{i=k}^m a_i(r^i − r^k)h^i + [C_m(rh)r^{m+1} − C_m(h)r^k]h^{m+1}.

175

Set

    B(h) = [A(rh) − r^k A(h)]/(1 − r^k),
    b_i = a_i(r^i − r^k)/(1 − r^k),
    C̄_m(h) = [C_m(rh)r^{m+1} − C_m(h)r^k]/(1 − r^k).

Then

    a_0 = B(h) + Σ_{i=k+1}^m b_i h^i + C̄_m(h)h^{m+1}.
If bk 1  0 , then we can repeat this process to eliminate the hk 1 term. Define
A0,m  A(r mh ), m  0,1,2,
176
Then
A0,m1  r k A0,m
Ai,m1  r k i Ai,m
and Ai1,m 
A1,m 
, m  0,1,
1 r k
1 r k i
Applications of Richardson Extrapolation
Differentiation is a primary application. Assume that

    f'(α) = lim_{h→0} [f(α + h) − f(α)]/h.

First, try for small h, A(h) = [f(α + h) − f(α)]/h. The Taylor expansion about x = α gives us

    a_0 = f'(α) = A(h) − Σ_{i=1}^m [f^{(i+1)}(α)/(i+1)!] h^i − [f^{(m+2)}(ξ_h)/(m+2)!] h^{m+1}
                ≡ A(h) + Σ_{i=1}^m a_i h^i + C_m(h)h^{m+1},
177
where the ai ’s are independent of h and probably unknown.
Second, try A(h) = [f(α + h) − f(α − h)]/(2h). We can prove that

    A(h) = f'(α) + [f^{(3)}(α)/3!] h² + [f^{(5)}(α)/5!] h⁴ + ⋯

We can modify the definition of A_{i+1,m} to use r², r⁴, r⁶, … Then

    B(h) = [A(rh) − r²A(h)]/(1 − r²) = f'(α) + b_4h⁴ + ⋯

The next extrapolation must be of the form [B(rh) − r⁴B(h)]/(1 − r⁴). So,

    A_{i+1,m} = [A_{i,m+1} − r^{k+2i} A_{i,m}]/(1 − r^{k+2i}).

Use this formula whenever
178
    a_0 = A(h) + a_k h^k + a_{k+2}h^{k+2} + a_{k+4}h^{k+4} + ⋯
Romberg Integration
Apply Richardson extrapolation to T_N(g) with h = b − a and r = 1/2 to approximate I(g) = ∫_a^b g(x)dx. Define

    T_{0,m} = [(b − a)/2^m][ (1/2)g_0 + g_1 + ⋯ + g_{s−1} + (1/2)g_s ],

where g_i = g(x_i), x_i = a + i(b − a)/2^m, and s = 2^m. This choice of T_{0,m} eliminates half of the g(x) function evaluations when computing T_{0,m+1}. The error only contains even powers of h. Hence,
179
    T_{1,m} = T_{0,m+1} + (T_{0,m+1} − T_{0,m})/3   or, in general,   T_{i,m} = [T_{i−1,m+1} − 4^{−i}T_{i−1,m}]/(1 − 4^{−i}).

Continue extrapolation as long as

    R_{i,m} = (T_{i,m} − T_{i,m−1})/(T_{i,m+1} − T_{i,m}) ≈ 4^{i+1}.

Roundoff error is the typical culprit for stopping Richardson extrapolation. A sketch of the tableau computation follows.
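A C++ sketch of the Romberg tableau (names are mine): T[0][m] is the composite trapezoid value on 2^m panels, reusing previous evaluations, and T[i][m] applies the 4^i extrapolation above.

#include <cmath>
#include <functional>
#include <vector>

// Romberg integration of g over [a,b] with the given number of refinement levels.
double romberg(const std::function<double(double)>& g, double a, double b, int levels) {
    std::vector<std::vector<double>> T(levels, std::vector<double>(levels, 0.0));
    double h = b - a;
    T[0][0] = 0.5 * h * (g(a) + g(b));
    for (int m = 1; m < levels; ++m) {
        h *= 0.5;                                   // halve the mesh
        double sum = 0.0;                           // only the new midpoints
        for (int j = 1; j < (1 << m); j += 2) sum += g(a + j * h);
        T[0][m] = 0.5 * T[0][m - 1] + h * sum;      // reuse the old evaluations
        for (int i = 1; i <= m; ++i) {              // eliminate the h^{2i} error term
            double p = std::pow(4.0, i);
            T[i][m] = (p * T[i - 1][m] - T[i - 1][m - 1]) / (p - 1.0);
        }
    }
    return T[levels - 1][levels - 1];
}

In practice one would also monitor the ratio R_{i,m} and stop extrapolating once roundoff dominates.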
180
7. Automatic Differentiation (AD)
This is a technique to numerically evaluate the derivative of a function using a
computer program. There have been two standard techniques in the past:
 Symbolic differentiation
 Numerical differentiation
Symbolic differentiation is slow, frequently produces many pages of expressions instead of a compact one, and has great difficulty converting a computer program. Numerical differentiation involves finite differences, which are subject to roundoff errors in the discretization and to cancellation effects. Higher order derivatives exacerbate the difficulties of both techniques.
“Automatic differentiation solves all of the mentioned problems.” Wikipedia
Throughout this section, we follow Wikipedia’s AD description and use its
figures. The most comprehensive AD book is Griewank’s SIAM 300 pager.
181
The primary tool of AD is the chain rule,

    df/dx = (dg/dh)(dh/dx)   for a function f(x) = g(h(x)).
There are two ways to traverse the chain rule:
 Right to left, known as forward accumulation.
 Left to right, known as backward accumulation.
182
Assume that any computer program that evaluates a function y = F(x) can be decomposed into a sequence of simpler, or elementary, operations, each of which is differentiated using a trivial table lookup procedure. Each
elementary partial derivative is evaluated for a particular argument using the
chain rule to provide derivative information about F (e.g., gradients, tangents,
Jacobian matrix, etc.) that is exact numerically to some level of accuracy.
Problems with symbolic mathematics are avoided by only using it for a set of
very basic expressions, not complex ones.
183
Forward accumulation
First compute dh/dx and then dg/dh in

    d/dx g(h(x)) = (dg/dh)(dh/dx).
Example: Find the derivative of f(x_1, x_2) = x_1x_2 + sin(x_1). We have to seed the expression to distinguish between the derivative with respect to x_1 and x_2.

    Original code statements      Added AD statements
    w_1 = x_1                     w_1' = 1 (seed)
    w_2 = x_2                     w_2' = 0 (seed)
    w_3 = w_1w_2                  w_3' = w_1'w_2 + w_2'w_1 = 1·x_2 + x_1·0 = x_2
    w_4 = sin(w_1)                w_4' = cos(w_1)w_1' = cos(x_1)·1
    w_5 = w_3 + w_4               w_5' = w_3' + w_4' = x_2 + cos(x_1)
184
Forward accumulation traverses the figure from bottom to top to accumulate the
result.
185
In order to compute the gradient of f, we have to evaluate both ∂f/∂x_1 and ∂f/∂x_2, which corresponds to using the seeds (x_1' = 1, x_2' = 0) and (x_1' = 0, x_2' = 1), respectively.
The computational complexity of forward accumulation is proportional to the
complexity of the original code.
Reverse accumulation
First compute dg/dh and then dh/dx in

    d/dx g(h(x)) = (dg/dh)(dh/dx).
Example: As before. We can produce a graph of the steps needed. Unlike forward accumulation, we only need one seed to walk through the graph (from top to bottom this time) to calculate the gradient in half the work of forward accumulation.
186
Superiority condition of forward versus reverse accumulation
Forward accumulation is superior to reverse accumulation for functions f : ℝ → ℝ^m, m ≫ 1. Reverse accumulation is superior to forward accumulation for functions f : ℝ^n → ℝ, n ≫ 1.
187
Jacobian computation

The Jacobian J of f : ℝ^n → ℝ^m is an m×n matrix. We can compute the Jacobian using either
• n sweeps of forward accumulation, where each sweep produces a column of J, or
• m sweeps of backward accumulation, where each sweep produces a row of J.

Computing the Jacobian with a minimum number of arithmetic operations is known as optimal Jacobian accumulation and has been proven to be an NP-complete problem.
188
Dual numbers
We define a new arithmetic in which every number x is replaced by x + x'ε, where x' ∈ ℝ and ε is nothing but a symbol such that ε² = 0. For regular arithmetic, we can show that

    (x + x'ε) + (y + y'ε) = x + y + (x' + y')ε,
    (x + x'ε)(y + y'ε) = xy + xy'ε + yx'ε + x'y'ε² = xy + (xy' + yx')ε,

and similarly for subtraction and division. Polynomials can be calculated using dual numbers:

    P(x + x'ε) = p_0 + p_1(x + x'ε) + ⋯ + p_n(x + x'ε)^n
               = [p_0 + p_1x + ⋯ + p_nx^n] + [p_1x' + 2p_2xx' + ⋯ + np_nx^{n−1}x']ε
               = P(x) + P^{(1)}(x)x'ε,
189
where P^{(1)}(x) represents the derivative of P with respect to its first argument and x' is an arbitrarily chosen seed.
The dual number based arithmetic we use consists of ordered pairs ⟨x, x'⟩ with ordinary arithmetic on the first element and first order differential arithmetic on the second element. In general, for a function f we have

    f(⟨u,u'⟩, ⟨v,v'⟩) = ⟨f(u,v), f^{(1)}(u,v)u' + f^{(2)}(u,v)v'⟩,

where f^{(1)} and f^{(2)} represent the derivatives of f with respect to the first and second arguments, respectively. Some common expressions are the following:

    ⟨u,u'⟩ + ⟨v,v'⟩ = ⟨u + v, u' + v'⟩   and   ⟨u,u'⟩ − ⟨v,v'⟩ = ⟨u − v, u' − v'⟩
    ⟨u,u'⟩ * ⟨v,v'⟩ = ⟨uv, u'v + uv'⟩   and   ⟨u,u'⟩ / ⟨v,v'⟩ = ⟨u/v, (u'v − uv')/v²⟩, v ≠ 0
190
    sin⟨u,u'⟩ = ⟨sin(u), u'cos(u)⟩   and   cos⟨u,u'⟩ = ⟨cos(u), −u'sin(u)⟩
    e^⟨u,u'⟩ = ⟨e^u, u'e^u⟩   and   log⟨u,u'⟩ = ⟨log(u), u'/u⟩, u > 0
    ⟨u,u'⟩^k = ⟨u^k, ku^{k−1}u'⟩   and   |⟨u,u'⟩| = ⟨|u|, u' sign(u)⟩, u ≠ 0
    c → ⟨c, 0⟩ for a constant c
The derivative of f : ℝ^n → ℝ^m at some point x in some direction x' is given by

    (⟨y_1, y_1'⟩, …, ⟨y_m, y_m'⟩) = f(⟨x_1, x_1'⟩, …, ⟨x_n, x_n'⟩)

using the just defined arithmetic. We can generalize this method to higher order derivatives, but the rules become quite complicated. Truncated Taylor series
191
arithmetic is typically used instead since the Taylor summands in a series are known coefficients and derivatives of the function in question. A small dual number sketch follows.
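Here is a minimal C++ sketch of dual number forward accumulation for the example f(x_1,x_2) = x_1x_2 + sin(x_1) used earlier; the Dual type and the operator set are illustrative only (a real AD tool provides far more), and the numeric inputs are arbitrary.

#include <cmath>
#include <iostream>

// Minimal dual number <value, derivative> for forward mode AD (sketch).
struct Dual {
    double v;   // value
    double d;   // derivative carried along by the chain rule
};

Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }
Dual sin(Dual u) { return {std::sin(u.v), u.d * std::cos(u.v)}; }

// f(x1,x2) = x1*x2 + sin(x1), written exactly as in the notes' example.
Dual f(Dual x1, Dual x2) { return x1 * x2 + sin(x1); }

int main() {
    double x1 = 0.5, x2 = 2.0;
    Dual dfdx1 = f({x1, 1.0}, {x2, 0.0});   // seed x1' = 1, x2' = 0
    Dual dfdx2 = f({x1, 0.0}, {x2, 1.0});   // seed x1' = 0, x2' = 1
    std::cout << "df/dx1 = " << dfdx1.d     // x2 + cos(x1)
              << ", df/dx2 = " << dfdx2.d   // x1
              << "\n";
    return 0;
}

Note that each seed requires a separate evaluation, which is why forward accumulation scales with the number of inputs.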
Implementations
Google “automatic differentiation” and just search through the interesting sites.
Oldies, but goodies:




• ADIFOR (Fortran 77)
• ADIC (C, C++)
• OpenAD (Fortran 77/95, C, C++)
• MAD (Matlab) – not recommended!
Typically, the transformation process is similar to the following:
192
8. Numerical Differentiation (Finite Differences)

Assume u ∈ C¹([a,b]). Then

    u'(x) = lim_{h→0} [u(x + h) − u(x)]/h = lim_{h→0} [u(x) − u(x − h)]/h

for all x ∈ (a,b). This suggests a finite difference approach to estimating u'(x). Let

    a = x_0 < x_1 < ⋯ < x_N < x_{N+1} = b.

For simplicity assume that
193
    x_i = a + ih, h = (b − a)/(N+1), and i = 0, 1, …, N+1,

which is known as a uniform mesh. We will use Taylor expansions about one or more points liberally, e.g.,

    u_{i+1} = u(x_i + h) = u(x_i) + hu'(x_i) + (h²/2)u''(x_i) + ⋯
There are 3 common first differences of note:

    Forward    u_i' = (u_{i+1} − u_i)/h − (h/2)u_i'' + ⋯             O(h)
    Backward   u_i' = (u_i − u_{i−1})/h + (h/2)u_i'' + ⋯             O(h)
    Central    u_i' = (u_{i+1} − u_{i−1})/(2h) − (h²/6)u_i''' + ⋯     O(h²)
194
While the forward and backward differences are 1st order with mesh spacing h, they are second order with mesh spacing h/2 about the midpoints x_{i±1/2}!
To get an approximation to the 2nd derivative, we add the two Taylor expansions about the points x_{i±1} to get

    u_i'' = (u_{i+1} − 2u_i + u_{i−1})/h² − (h²/12)u_i^{(4)} + ⋯,

which is O(h²). These formulae are frequently reduced to stencils involving only 2-3 adjacent points in the mesh:

    u_i'' ≈ h^{−2}[1, −2, 1]
    u_i' ≈ h^{−1}[0, −1, 1], h^{−1}[−1, 1, 0], or (2h)^{−1}[−1, 0, 1].
195
There are many more formulae with specific properties that can be derived by matching terms in specific Taylor expansions.

Example (upwind difference): Find a one sided, 2nd order finite difference for u'(x), i.e.,

    u_i' = (au_i + bu_{i−1} + cu_{i−2})/h + O(h²).

Expand about the points of interest to see that

    c:  u_{i−2} = u_i − 2hu_i' + [(2h)²/2]u_i'' − [(2h)³/6]u_i''' + ⋯
    b:  u_{i−1} = u_i − hu_i' + (h²/2)u_i'' − (h³/6)u_i''' + ⋯
    a:  u_i = u_i
196
So,

    hu_i' = au_i + bu_{i−1} + cu_{i−2} + O(h³) = (a + b + c)u_i − h(b + 2c)u_i' + (h²/2)(b + 4c)u_i'' + O(h³).

We are left solving

    a + b + c = 0,   b + 2c = −1,   b + 4c = 0,

or a = 3/2, b = −2, and c = 1/2. Hence,

    u_i' = (3u_i − 4u_{i−1} + u_{i−2})/(2h) + O(h²).
197
The stencil is u_i' ≈ (2h)^{−1}[1, −4, 3, 0, 0]. Note that the trailing 0's are sometimes left off if the meaning is completely clear. In practice, with a stencil based code, the 0's usually are left off since the ith location in the stencil has to be specified. A small numerical check of the orders follows.
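A quick C++ check (illustrative only; u(x) = e^x and the step sizes are arbitrary choices of mine) that the forward difference error shrinks like h while the one sided stencil above shrinks like h².

#include <cmath>
#include <cstdio>

// Compare forward differencing with the upwind stencil (3u_i - 4u_{i-1} + u_{i-2})/(2h)
// for u(x) = exp(x) at x = 1; halving h should quarter the upwind error.
int main() {
    const double x = 1.0, exact = std::exp(x);
    for (double h = 0.1; h > 1e-3; h *= 0.5) {
        double fwd = (std::exp(x + h) - std::exp(x)) / h;                   // O(h)
        double up2 = (3.0 * std::exp(x) - 4.0 * std::exp(x - h)
                      + std::exp(x - 2.0 * h)) / (2.0 * h);                 // O(h^2)
        std::printf("h=%8.5f  forward err=%10.3e  upwind err=%10.3e\n",
                    h, std::fabs(fwd - exact), std::fabs(up2 - exact));
    }
    return 0;
}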
We can apply finite differences to an elliptic differential equation with boundary
values, which is also known as an elliptic boundary-value problem (BVP). This
one is also known as Laplace’s equation in one dimension (1D):







    −u_xx = g(x) in (0,1)
    u(0) = u(1) = 0

On a uniform mesh we get the following (N+2)×(N+2) system of linear equations:
198

    [  1                        ] [ u_0     ]   [ 0   ]
    [ -1   2  -1                ] [ u_1     ]   [ g_1 ]
    [     -1   2  -1            ] [ u_2     ] = [ g_2 ]
    [          .    .    .      ] [  ⋮      ]   [  ⋮  ]
    [             -1   2  -1    ] [ u_N     ]   [ g_N ]
    [                        1  ] [ u_{N+1} ]   [ 0   ]

with the interior rows scaled by 1/h², i.e., u_0 = u_{N+1} = 0 and (−u_{i−1} + 2u_i − u_{i+1})/h² = g_i, i = 1, …, N. As written the system is nonsymmetric. We can eliminate the first and last rows and columns (since we know the boundary values of u) to get an N×N symmetric, positive definite system of linear equations instead. Any of the methods we used earlier for solving systems of linear equations (direct or iterative) work well to solve this problem.
199
Variable coefficients can be handled by taking the correct Taylor expansions and combining them. Consider the differential equation

    −(a(x)u_x)_x + s(x)u = g(x),  a(x) ≥ a_0 > 0 and s(x) ≥ 0.

Suitable Taylor expansions lead to a finite difference scheme of

    [−a_{i+1/2}u_{i+1} + (a_{i+1/2} + a_{i−1/2})u_i − a_{i−1/2}u_{i−1}]/h² + s_iu_i = g_i,

which is O(h²).

Question: What happens when a(x) is unavailable to evaluate? Then a_{i±1/2} has to be interpolated to O(h²) or better. This leads to another error term.
200
Error analysis of finite difference schemes leads to considering what is known as the Lax equivalence theorem, which can be summarized by

    Consistency + Stability = Convergence.

Consistency determines the order of accuracy of a difference scheme plus the truncation error.

Stability determines the frequency distribution of the error (usually by investigating eigenvalue type analysis).

Absolute stability is based on considering ε^n = u_exact^n − u^n. We want |ε^{n+1}| ≤ |ε^n| and prefer that |ε^{n+1}| < |ε^n|.

Conditional stability is similar, but there is at least one condition required to guarantee stability.
201
Time and Space finite differences
Consider an initial value problem (IVP)

    dU/dt = F(t,U) with U(t = 0) = U_0.

If F(t,U) = F(U), then there are very efficient special methods, which are in the textbook, but not here. Consider some typical explicit cases:

    Forward Euler   [U^{n+1} − U^n]/Δt = F(t,U^n),   or   U^{n+1} = U^n + ΔtF(t,U^n)
    Leapfrog        [U^{n+1} − U^{n−1}]/(2Δt) = F(t,U^n)
    Multistep       [U^{n+1} − U^{n−k}]/[(k+1)Δt] = F(t,U^{n+1},U^n,U^{n−1},…)
202
The general formula for a k-step scheme is

    Σ_{j=0}^k α_j U^{n+1−j} = Δt Σ_{j=0}^k β_j F^{n+1−j}

with α_0 = 1 (normalization) and either β_0 = 0 (explicit) or β_0 ≠ 0 (implicit).

Example: Adams-Bashforth family

    1st order   Forward Euler
    2nd order   [U^{n+1} − U^n]/Δt = (3/2)F^n − (1/2)F^{n−1}
    3rd order   [U^{n+1} − U^n]/Δt = (23/12)F^n − (16/12)F^{n−1} + (5/12)F^{n−2}
203
The first few steps use lower order methods, which can cause problems and
spurious errors in later time steps.
Multi-stage methods use a weighted sum of corrections ΔU_k within one time step. So,

    U^{n+1} = U^n + C_1ΔU_1 + C_2ΔU_2 + C_3ΔU_3 + ⋯

The C_k are determined by matching terms in a Taylor expansion. Typically,

    ΔU_1 = ΔtF(t^n, U^n)   (Forward Euler)
    ΔU_2 = ΔtF(t^n + αΔt, U^n + βΔU_1), …
204
Runge-Kutta methods are popular and are usually either Total Variation Diminishing (TVD) or Total Variation Bounded (TVB), which do not allow any spurious oscillations to appear in the numerical solution and ruin all further calculations.

RK2 / 2 level storage scheme:

    Set X = U^n and Y = F(t^n, X).
    Compute X ← X + αΔtY and Y ← aY + F(t^n + αΔt, X), with a = 2α − 2α² − 1.
    Update U^{n+1} = X + [Δt/(2α)]Y.

Note that α = 1/2 is the modified Euler method and α = 1 is Heun's method.
205
Classic RK4 / 4 levels of storage:

    Compute X_1 = F(t^n, U^n), X_2 = F(t^n + Δt/2, U^n + (Δt/2)X_1), X_3 = F(t^n + Δt/2, U^n + (Δt/2)X_2),
    and X_4 = F(t^n + Δt, U^n + ΔtX_3).

    Update U^{n+1} = U^n + (Δt/6)[X_1 + 2(X_2 + X_3) + X_4].

Note that there is a trick that reduces this scheme to only 3 levels of storage. A one step sketch follows.
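One RK4 step written out as a C++ sketch for a scalar IVP (names are mine; a system version would replace double by a vector type).

#include <functional>

// One classic RK4 step for dU/dt = F(t,U).
double rk4Step(const std::function<double(double, double)>& F,
               double t, double U, double dt) {
    double X1 = F(t, U);
    double X2 = F(t + 0.5 * dt, U + 0.5 * dt * X1);
    double X3 = F(t + 0.5 * dt, U + 0.5 * dt * X2);
    double X4 = F(t + dt, U + dt * X3);
    return U + dt / 6.0 * (X1 + 2.0 * (X2 + X3) + X4);
}

Repeated calls advance the solution one Δt at a time; the local error per step is O(Δt⁵), giving a fourth order method overall.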
206
Implicit Time Stepping:
Consider the IVP dU/dt = F(t,U), U(t = 0) = U_0. The θ family of methods is defined by

    [U^{n+1} − U^n]/Δt = θF(t^{n+1},U^{n+1}) + (1 − θ)F(t^n,U^n).

Note that θ = 0 is Forward Euler (explicit), θ = 1 is Backward Euler (implicit), and θ = 1/2 is Crank-Nicolson (implicit). For implicit methods, some sort of direct solver is implied (or an iterative methods approximation).

Consistency: substituting the exact solution gives the truncation error

    [dU/dt − F]^n + Δt[(1/2)d²U/dt² − θ dF/dt]^n + (Δt)²[(1/6)d³U/dt³ − (θ/2)d²F/dt²]^n + ⋯

As Δt → 0, the right hand side goes to zero, so we recover the IVP.
207
n1
n
Stability: We study dU  U , Re()  0. So, U U U n1  (1 )U n .
t
dt
Looking at the error at the nth step and doing some algebraic manipulations, we
get
 n1   n1 (1 )t .
1t
 We have absolute stability if (1 2 )t  2 .
 Conditional stability 0   1 and requires that t  2 .
(1 2 )
2
 Unconditional stability whenever   1 .
2
For Forward Euler, we must have h  1 , which is a very serious constraint as
(t)2 2
t  0 . Implicit methods are more expensive per step, but can use much, much
bigger time steps.
208
9. Monte Carlo Methods
Monte Carlo (MC) methods use repeated random sampling to solve
computational problems when there is no affordable deterministic algorithm.
Most often used in
• Physics
• Chemistry
• Finance and risk analysis
• Engineering
MC is typically used in high dimensional problems where a lower dimensional
approximation is inaccurate.
Example: n year mortgages paid once a month. The risk analysis is in 12n
dimensions. For a 30 year mortgage we have a 360 dimensional problem.
Integration (quadrature rules) above 8 dimensions is impractical.
209
Main drawback is the addition of statistical errors to the systematic errors. A
balance between the two error types has to be made intelligently, which is not
always easy nor obvious.
Short history: MC methods have been used since 1777, when the Comte de Buffon and Laplace each solved problems. In the 1930's Enrico Fermi used MC to estimate what lab experiments would show for neutron transport in fissile materials. Metropolis and Ulam first called the method MC in the 1940's. In the 1950's MC was expanded to use any probability distribution, not just Gaussian.
In the 1960’s and 1970’s, quantum MC and variational MC methods were
developed.
MC Simulations: The problem being solved is stochastic and the MC method
mimics its stochastic properties well. Example: neutron transport and decay in a nuclear reactor.
MC Calculations: The problem is not stochastic, but is solved using a stochastic
MC method. Example: high dimension integration.
210
Quick review of probability
Event B is a set of possible outcomes that has probability Pr(B). The set of all events is denoted by Ω and particular outcomes are ω. Hence, B ⊆ Ω.

Suppose B, C ⊆ Ω. Then B ∩ C represents outcomes in both B and C. Similarly, B ∪ C represents outcomes that are in B or C.

Some axioms of probability are
1. Pr(B) ∈ [0,1]
2. Pr(A ∪ B) = Pr(A) + Pr(B) if A ∩ B = ∅
3. B ⊆ C ⟹ Pr(B) ≤ Pr(C)
4. Pr(Ω) = 1

The conditional probability that a C outcome is also a B outcome is given by Bayes' formula,

    Pr(B | C) = Pr(B ∩ C)/Pr(C).
211
Frequently, we already know both Pr(B | C) and Pr(C) and use Bayes' formula to calculate Pr(B ∩ C).

Events B and C are independent if Pr(B ∩ C) = Pr(B)Pr(C), i.e., Pr(B) = Pr(B | C).

If Ω is either finite or countable, we call Ω discrete. In this case we can specify all probabilities of possible outcomes as

    f_k = Pr(ω = ω_k)

and an event B has probability

    Pr(B) = Σ_{ω_k ∈ B} Pr(ω_k) = Σ_{ω_k ∈ B} f_k.
212
A discrete random variable is a number X(ω) that depends on the random outcome ω. As an example, in coin tossing, X(ω) could represent how many heads or tails came up. For x_k = X(ω_k), define the expected value by

    E[X] = Σ_ω X(ω)Pr(ω) = Σ_k x_k f_k.

The probability distribution of a continuous random variable is described using a probability density function (PDF) f(x). If X ∈ ℝ^n and B ⊆ ℝ^n, then

    Pr(B) = ∫_{x∈B} f(x)dx and E[X] = ∫_{ℝ^n} x f(x)dx.

The variance in 1D is given by

    σ² = var(X) = E[(X − E[X])²] = ∫ (x − E[X])² f(x)dx.
213
The notation is identical for discrete and continuous random variables.
For 2 or higher dimensions, there is a symmetric n×n variance/covariance matrix given by

    C = E[(X − E[X])(X − E[X])^T],

where the matrix elements C_jk are given by

    C_jk = E[(X_j − E[X_j])(X_k − E[X_k])] = cov[X_j, X_k].
The covariance matrix is positive semidefinite.
214
Common random variables
The standard uniform random variable U has a probability density of

    f(u) = 1 if 0 ≤ u ≤ 1, 0 otherwise.

We can create a random variable in [a,b] by Y = (b − a)U + a. The PDF for Y is

    g(y) = 1/(b − a) for a ≤ y ≤ b, 0 otherwise.

The exponential random variable T with rate constant λ > 0 has a PDF

    f(t) = λe^{−λt} if 0 ≤ t, 0 otherwise.
215
The standard normal is denoted by Z and has a PDF

    f(z) = (1/√(2π)) e^{−z²/2}.

The general normal with mean μ and variance σ² is given by X = σZ + μ and has PDF

    f(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}.

We write X ~ N(μ, σ²) in this case. A standard normal has X ~ N(0,1).

If an n component random variable X is a multivariate normal with mean μ and covariance C, then it has a probability density
216
    f(x) = (1/Z) e^{−(x−μ)^T C^{−1} (x−μ)/2}, where Z = (2π)^{n/2} √(det(C)).

Multivariate normals possess a linear transformation property: suppose L is an m×n matrix with rank m, so L : ℝ^n → ℝ^m is onto. If X ∈ ℝ^n is multivariate normal and Y = LX ∈ ℝ^m, then Y is multivariate normal and the covariance matrix for Y is

    C_Y = LC_XL^T (assuming that μ = 0).
Finally, there are two probability laws/theorems that are crucial to believing that MC is relevant to any problem:
1. Law of large numbers
2. Central limit theorem

Law of large numbers: Suppose A = E[X] and {X_k | k = 1, 2, …} are independent samples of X. The approximation of A is
217
    Â_n = (1/n) Σ_{k=1}^n X_k → A as n → ∞.

All estimators satisfying the law of large numbers are called consistent.
Central limit theorem: If σ² = var[X], then R_n = Â_n − A → N(0, σ²/n). Hence, recalling that A is not random,

    Var(R_n) = Var(Â_n) = (1/n)Var(X).

The law of large numbers makes the estimator unbiased. The central limit theorem follows from the independence of the X_k. When n is large enough, R_n is approximately normal, independent of the distribution of X as long as E[X] < ∞.
218
Random number generators
Beware simple random number generators. For example, never, ever use the
UNIX/Linux function rand. It repeats much too quickly. The function random
repeats less frequently, but is not useful for parallel computing. Matlab has a
very good random number generator that is operating system independent.
Look for digital based codes developed 20 years ago by Michael Mascagni for
good parallel random number generators. These are the state of the art even
today.
However, the best ones are analog: they measure the deviations in the electrical
line over time and normalize them to the interval [0,1] . Some CPUs do this as a
hardware instruction for sampling the deviation. These are the only true random
number generators available on computers. The Itanium2 CPU line has this built
in. Some other chips have this, too, but operating systems that will sample this instruction are hard to find.
219
Sampling
A simple sampler produces an independent sample of X each time it is called.
The simple sampler turns standard uniforms into samples of some other random
variable.
MC codes spend almost all of their time in the sampler. Optimizing the sampler
code to reduce its execution time can have a profound effect on the overall run
time of the MC computation.
In the discussion below rng() is a good random number generator.
220
Bernoulli coin tossing
A Bernoulli random variable with parameter p is a random variable X with
Pr( X 1)  p and Pr( X  0) 1 p .
If U is a standard uniform, then p  Pr(U  p) . So we can sample X using the
code fragment
if ( rng() <= p ) X = 1; else X = 0;
For a random variable with a finite number of values
Pr( X  xk )  pk with  pk 1,
we sample it using the unit interval and dividing it into subintervals of length pk.
This works well with Markov chains.
221
Exponential
If U is a standard uniform, then

    T = −(1/λ) ln(U)

is an exponential with rate parameter λ with units 1/Time. Since 0 < U < 1, ln(U) < 0 and T > 0. We can sample T with the code fragment
T = -(1/lambda)*log(rng());
The PDF of the random variable T is given by f(t) = λe^{−λt} for t > 0.
222
Cumulative density function (CDF)
Suppose X is a one component random variable with PDF f(x). Then the CDF is

    F(x) = Pr(X ≤ x) = ∫_{x'≤x} f(x')dx'.

We know that 0 ≤ F(x) ≤ 1 for all x, and for any u ∈ [0,1] there is an x such that F(x) = u.

The simple sampler can be coded with
1. Choose U = rng()
2. Find X such that F(X) = U
Note that step 2 can be quite difficult and time consuming. Good programming
reduces the time.
There is no elementary formula for the cumulative normal N ( z) . However there
is software available to compute it to approximately double precision. The
inverse cumulative normal z  N 1(u) can also be approximated.
223
The Box Muller method
We can generate two independent standard normals from two independent standard uniforms using the formulas

    R = √(−2 ln(U_1))
    Θ = 2πU_2
    Z_1 = R cos(Θ)
    Z_2 = R sin(Θ)

We can make N independent standard normals by making N standard uniforms and then using them in pairs to make N/2 pairs of independent standard normals. A sketch follows.
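A C++ sketch of Box-Muller; the notes' rng() is replaced here by the standard &lt;random&gt; engine, which is an implementation choice of mine, not the notes'.

#include <cmath>
#include <random>
#include <utility>

// Two independent standard normals from two standard uniforms (Box-Muller).
std::pair<double, double> boxMuller(std::mt19937_64& gen) {
    const double pi = 3.14159265358979323846;
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double u1 = unif(gen), u2 = unif(gen);
    if (u1 <= 0.0) u1 = 1e-300;                 // guard against log(0)
    double R     = std::sqrt(-2.0 * std::log(u1));
    double theta = 2.0 * pi * u2;
    return {R * std::cos(theta), R * std::sin(theta)};
}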
224
Multivariate normals

Let X ∈ ℝ^n be a multivariate normal random variable with mean 0 and covariance matrix C. We sample X using the Cholesky factorization C = LL^T, where L is lower triangular. Let Z ∈ ℝ^n be a vector of n independent standard normals generated by the Box-Muller method (or similarly). Then cov[Z] = I. If X = LZ, then X is multivariate normal and has cov[X] = LIL^T = C.

There are many more methods that can be studied, e.g., rejection sampling.

Testing samplers

All scientific software should be presumed wrong until demonstrated to be correct. Simple 1D samplers are tested using tables and histograms.
225
Errors
Estimating the error in a MC calculation is straightforward. Normally a result
with an error estimate is given when using a MC method.
Suppose X is a scalar random variable. Approximate A = E[X] by

    Â_n = (1/n) Σ_{k=1}^n X_k.

The central limit theorem states that R_n = Â_n − A ≈ σ_n Z, where σ_n is the standard deviation of Â_n and Z ~ N(0,1). It can be shown that

    σ_n = σ/√n, where σ² = var[X] = E[(X − A)²],

which we estimate using

    σ̂² = (1/n) Σ_{k=1}^n (X_k − Â_n)², then take σ̂_n = σ̂/√n.
226
Since Z is of order 1, R_n is of order σ̂_n.

We typically report the MC data as A ≈ Â_n ± σ̂_n. We can plot circles with a line for the diameter called the (standard deviation) error bar. We can think of k standard deviation error bars [Â_n − kσ̂_n, Â_n + kσ̂_n], which are confidence intervals.

The central limit theorem can be used to show that

    Pr(A ∈ [Â_n − σ̂_n, Â_n + σ̂_n]) ≈ 66% and Pr(A ∈ [Â_n − 2σ̂_n, Â_n + 2σ̂_n]) ≈ 95%.

It is common in MC to report one standard deviation error bars. To interpret the data correctly, one must understand that the true value lies outside the one standard deviation error bar about one-third of the time.
227
Integration (quadrature)

We want to approximate a d dimensional integral to an accuracy of ε > 0. Assume we can do this using N quadrature points. Consider Simpson's rule: for a function f(x) : ℝ^d → ℝ, ε ≈ N^{−4/d}. MC integration can be done so that ε ≈ N^{−1/2} independent of d as long as the variance of the integrand is finite.

MC integration

Let V be the domain of integration. Define I(f) = ∫_V f(x)dx and, for x_i sampled uniformly in V, let

    ⟨f⟩ = (1/N) Σ_{i=1}^N f(x_i) and ⟨f²⟩ = (1/N) Σ_{i=1}^N f²(x_i).

Then

    I(f) ≈ ⟨f⟩ ± √[(⟨f²⟩ − ⟨f⟩²)/(N − 1)]

(both terms multiplied by the volume of V when |V| ≠ 1). A small sketch follows.
228
Download