Numerical Methods for Finance

Dr Robert Nürnberg
This course introduces the major numerical methods needed for quantitative work in finance. To this end, the course strikes a balance between a general survey of significant numerical methods that anyone working in a quantitative field should know, and a detailed study of some numerical methods specific to financial mathematics. The first part of the course covers, e.g.,
linear and nonlinear equations, interpolation and optimization,
while the second part introduces, e.g.,
binomial and trinomial methods, finite difference methods, Monte-Carlo simulation, random number generators, option pricing and hedging.
The assessment consists of an exam (80%) and a project (20%).
1. References
1. Burden and Faires (2004), Numerical Analysis.
2. Clewlow and Strickland (1998), Implementing Derivative Models.
3. Fletcher (2000), Practical Methods of Optimization.
4. Glasserman (2004), Monte Carlo Methods in Financial Engineering.
5. Higham (2004), An Introduction to Financial Option Valuation.
6. Hull (2005), Options, Futures, and Other Derivatives.
7. Kwok (1998), Mathematical Models of Financial Derivatives.
8. Press et al. (1992), Numerical Recipes in C. (online)
9. Press et al. (2002), Numerical Recipes in C++.
10. Seydel (2006), Tools for Computational Finance.
11. Wilmott, Dewynne, and Howison (1993), Option Pricing.
12. Winston (1994), Operations Research: Applications and Algorithms.
2. Preliminaries
1. Algorithms.
An algorithm is a set of instructions to construct an approximate solution to a mathematical problem.
A basic requirement for an algorithm is that the error can be made as small as we like.
Usually, the higher the accuracy we demand, the greater is the amount of computation
required.
An algorithm is convergent if it produces a sequence of values which converge to the
desired solution of the problem.
Example: Given c > 1 and ε > 0, use the bisection method to seek an approximation to √c with error not greater than ε.
Example
Find x = √c, c > 1 constant.
Answer
x = √c ⟺ x² = c ⟺ f(x) := x² − c = 0.
f(1) = 1 − c < 0 and f(c) = c² − c > 0 ⇒ ∃ x̄ ∈ (1, c) s.t. f(x̄) = 0.
f'(x) = 2x > 0 ⇒ f is monotonically increasing ⇒ x̄ is unique.
Denote I_n := [a_n, b_n] with I_0 = [a_0, b_0] = [1, c]. Let x_n := (a_n + b_n)/2.
(i) If f(x_n) = 0 then x̄ = x_n.
(ii) If f(a_n) f(x_n) > 0 then x̄ ∈ (x_n, b_n); let a_{n+1} := x_n, b_{n+1} := b_n.
(iii) If f(a_n) f(x_n) < 0 then x̄ ∈ (a_n, x_n); let a_{n+1} := a_n, b_{n+1} := x_n.
Length of I_n: m(I_n) = (1/2) m(I_{n−1}) = ⋯ = (1/2ⁿ) m(I_0) = (c − 1)/2ⁿ.
Algorithm
The algorithm stops if m(I_n) < ε, and we let x* := x_n.
Error as small as we like?
x̄, x* ∈ I_n ⇒ error |x* − x̄| = |x_n − x̄| ≤ m(I_n) → 0 as n → ∞. ✓
Convergence?
I_0 ⊃ I_1 ⊃ ⋯ ⊃ I_n ⊃ ⋯ ⇒ ∃! x̄ ∈ ∩_{n=0}^∞ I_n with f(x̄) = 0, i.e. x̄ = √c. ✓
Implementation:
No need to define I_n = [a_n, b_n]. It is sufficient to store only 3 points throughout.
Suppose x̄ ∈ (a, b), define x := (a + b)/2.
If x̄ ∈ (a, x) let a := a, b := x; otherwise let a := x, b := b.
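The following is a minimal C++ sketch of this bisection scheme for f(x) = x² − c on [1, c] (the function name and interface are assumptions for illustration, not part of the notes).

#include <iostream>

// Bisection for f(x) = x*x - c on [1, c], c > 1: returns an approximation
// to sqrt(c) with error not greater than eps.
double bisect_sqrt(double c, double eps) {
    double a = 1.0, b = c;                  // f(a) < 0 < f(b)
    double x = 0.5 * (a + b);
    while (b - a >= eps) {                  // stop when m(I_n) < eps
        double fx = x * x - c;
        if (fx == 0.0) break;
        if (fx < 0.0) a = x; else b = x;    // keep the subinterval containing the root
        x = 0.5 * (a + b);
    }
    return x;
}

int main() {
    std::cout << bisect_sqrt(7.0, 1e-8) << "\n";   // approximately 2.6457513
}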
2. Errors.
There are various errors in computed solutions, such as
• discretization error (discrete approximation to continuous systems),
• truncation error (termination of an infinite process), and
• rounding error (finite digit limitation in computer arithmetic).
If a is a number and ã is an approximation to a, then
the absolute error is |a − ã| and
the relative error is |a − ã| / |a|, provided a ≠ 0.
Example: Discuss sources of errors in deriving the numerical solution of the nonlinear differential equation x'(t) = f(x) on the interval [a, b] with initial condition x(a) = x_0.
Example
discretization error
x' = f(x)   [differential equation]
(x(t + h) − x(t))/h = f(x(t))   [difference equation]
DE = | (x(t + h) − x(t))/h − x'(t) |
truncation error
lim_{n→∞} x_n = x; approximate x with x_N, N a large number.
TE = |x − x_N|
rounding error
We cannot express x exactly, due to the finite digit limitation. We get x̂ instead.
RE = |x − x̂|
Total error = DE + TE + RE.
3. Well/Ill Conditioned Problems.
A problem is well-conditioned if every small perturbation of the data results in a small change in the solution, and ill-conditioned if some small perturbation of the data results in a large change in the solution.
Example: Show that the solution to the equations x + y = 2 and x + 1.01 y = 2.01 is ill-conditioned.
Exercise: Show that the following problems are ill-conditioned:
(a) the solution to the differential equation x'' − 10x' − 11x = 0 with initial conditions x(0) = 1 and x'(0) = −1,
(b) the value of q(x) = x² + x − 1150 if x is near a root.
Example
Change 2.01 to 2.02:
{ x + y = 2,  x + 1.01 y = 2.01 }  ⇒  x = 1, y = 1;
{ x + y = 2,  x + 1.01 y = 2.02 }  ⇒  x = 0, y = 2.
I.e. a 0.5% change in the data produces a 100% change in the solution: ill-conditioned!
[reason: det [1 1; 1 1.01] = 0.01 ⇒ nearly singular]
4. Taylor Polynomials.
Suppose f, f', …, f⁽ⁿ⁾ are continuous on [a, b] and f⁽ⁿ⁺¹⁾ exists on (a, b). Let x_0 ∈ [a, b]. Then for every x ∈ [a, b] there exists a ξ between x_0 and x with
f(x) = Σ_{k=0}^{n} f⁽ᵏ⁾(x_0)/k! (x − x_0)ᵏ + R_n(x),
where R_n(x) = f⁽ⁿ⁺¹⁾(ξ)/(n+1)! (x − x_0)ⁿ⁺¹ is the remainder.
[Equivalently: R_n(x) = ∫_{x_0}^{x} f⁽ⁿ⁺¹⁾(t)/n! (x − t)ⁿ dt.]
Examples:
• exp(x) = Σ_{k=0}^{∞} xᵏ/k!
• sin(x) = Σ_{k=0}^{∞} (−1)ᵏ/(2k+1)! x^{2k+1}
5. Gradient and Hessian Matrix.
Assume f : Rⁿ → R.
The gradient of f at a point x, written as ∇f(x), is a column vector in Rⁿ with ith component ∂f/∂x_i (x).
The Hessian matrix of f at x, written as ∇²f(x), is an n × n matrix with (i, j)th component ∂²f/∂x_i∂x_j (x). [As ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i, ∇²f(x) is symmetric.]
Examples:
• f(x) = aᵀx, a ∈ Rⁿ ⇒ ∇f = a, ∇²f = 0
• f(x) = ½ xᵀA x, A symmetric ⇒ ∇f(x) = A x, ∇²f = A
• f(x) = exp(½ xᵀA x), A symmetric ⇒ ∇f(x) = exp(½ xᵀA x) A x,
  ∇²f(x) = exp(½ xᵀA x) A x xᵀ A + exp(½ xᵀA x) A
6. Taylor’s Theorem.
Suppose that f : Rⁿ → R is continuously differentiable and that p ∈ Rⁿ. Then we have
f(x + p) = f(x) + ∇f(x + t p)ᵀ p
for some t ∈ (0, 1).
Moreover, if f is twice continuously differentiable, we have
∇f(x + p) = ∇f(x) + ∫_0^1 ∇²f(x + t p) p dt,
and
f(x + p) = f(x) + ∇f(x)ᵀ p + ½ pᵀ ∇²f(x + t p) p,
for some t ∈ (0, 1).
7. Positive Definite Matrices.
An n × n matrix A = (a_ij) is positive definite if it is symmetric (i.e. Aᵀ = A) and xᵀA x > 0 for all x ∈ Rⁿ \ {0}. [I.e. xᵀA x ≥ 0 with "=" only if x = 0.]
The following statements are equivalent:
(a) A is a positive definite matrix,
(b) all eigenvalues of A are positive,
(c) all leading principal minors of A are positive.
The leading principal minors of A are the determinants Δ_k, k = 1, 2, …, n, defined by
Δ_1 = det[a_11],  Δ_2 = det [a_11 a_12; a_21 a_22],  …,  Δ_n = det A.
A matrix A is symmetric and positive semi-definite if xᵀA x ≥ 0 for all x ∈ Rⁿ.
Exercise.
Show that any positive definite matrix A has only positive diagonal entries.
8. Convex Sets and Functions.
A set S ⊂ Rn is a convex set if the straight line segment connecting any two points
in S lies entirely inside S, i.e., for any two points x, y ∈ S we have
α x + (1 − α) y ∈ S
∀ α ∈ [0, 1].
A function f : D → R is a convex function if its domain D ⊂ Rn is a convex set
and if for any two points x, y ∈ D we have
f (α x + (1 − α) y) ≤ αf (x) + (1 − α) f (y)
∀ α ∈ [0, 1].
Exercise.
Let D ⊂ Rn be a convex, open set.
(a) If f : D → R is continuously differentiable, then f is convex if and only if
f (y) ≥ f (x) + ∇f (x)T (y − x)
∀ x, y ∈ D .
(b) If f is twice continuously differentiable, then f is convex if and only if ∇2f (x) is
positive semi-definite for all x in the domain.
Exercise 2.8.
(a) "⇒"
As f is convex we have for any x, y in the convex set D that
f(α y + (1 − α) x) ≤ α f(y) + (1 − α) f(x)   ∀ α ∈ [0, 1].
Hence
f(y) ≥ [f(x + α (y − x)) − f(x)]/α + f(x).
Letting α → 0 yields f(y) ≥ f(x) + ∇f(x)ᵀ(y − x).
"⇐"
For any x_1, x_2 ∈ D and λ ∈ [0, 1] let x := λ x_1 + (1 − λ) x_2 ∈ D and y := x_1.
On noting that y − x = x_1 − λ x_1 − (1 − λ) x_2 = (1 − λ)(x_1 − x_2) we have that
f(x_1) = f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) = f(x) + (1 − λ) ∇f(x)ᵀ(x_1 − x_2).   (†)
Similarly, letting x := λ x_1 + (1 − λ) x_2 and y := x_2 gives, on noting that y − x = λ (x_2 − x_1), that
f(x_2) ≥ f(x) + λ ∇f(x)ᵀ(x_2 − x_1).   (‡)
Combining λ·(†) + (1 − λ)·(‡) gives
λ f(x_1) + (1 − λ) f(x_2) ≥ f(x) = f(λ x_1 + (1 − λ) x_2)   ⇒ f is convex.
(b) "⇐"
For any x, x_0 ∈ D use Taylor's theorem at x_0:
f(x) = f(x_0) + ∇f(x_0)ᵀ(x − x_0) + ½ (x − x_0)ᵀ ∇²f(x_0 + θ (x − x_0)) (x − x_0),   θ ∈ (0, 1).
As ∇²f is positive semi-definite, this immediately gives
f(x) ≥ f(x_0) + ∇f(x_0)ᵀ(x − x_0)   ⇒ f is convex.
"⇒"
Assume ∇²f is not positive semi-definite in the domain D. Then there exist x_0 ∈ D and x̂ ∈ Rⁿ s.t. x̂ᵀ∇²f(x_0)x̂ < 0. As D is open we can find x_1 := x_0 + α x̂ ∈ D, for α > 0 sufficiently small. Hence (x_1 − x_0)ᵀ∇²f(x_0)(x_1 − x_0) < 0. For x_1 sufficiently close to x_0 the continuity of ∇²f gives (x_1 − x_0)ᵀ∇²f(x_0 + θ (x_1 − x_0))(x_1 − x_0) < 0 for all θ ∈ (0, 1). Taylor's theorem then yields f(x_1) < f(x_0) + ∇f(x_0)ᵀ(x_1 − x_0). This contradicts f being convex, see (a).
9. Vector Norms.
A vector norm on Rⁿ is a function ‖·‖ from Rⁿ into R with the following properties:
(i) ‖x‖ ≥ 0 for all x ∈ Rⁿ and ‖x‖ = 0 if and only if x = 0.
(ii) ‖α x‖ = |α| ‖x‖ for all α ∈ R and x ∈ Rⁿ.
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ Rⁿ.
Common vector norms are the l1-, l2- (Euclidean), and l∞-norms:
‖x‖_1 = Σ_{i=1}^{n} |x_i|,   ‖x‖_2 = (Σ_{i=1}^{n} x_i²)^{1/2},   ‖x‖_∞ = max_{1≤i≤n} |x_i|.
Exercise.
(a) Prove that ‖·‖_1, ‖·‖_2 and ‖·‖_∞ are norms.
(b) Given a symmetric positive definite matrix A, prove that ‖x‖_A := √(xᵀA x) is a norm.
Example.
Draw the regions defined by ‖x‖_1 ≤ 1, ‖x‖_2 ≤ 1, ‖x‖_∞ ≤ 1 when n = 2.
[Figure: the unit balls of the l1, l2 and l∞ norms in R².]
Exercise. Prove that for all x, y ∈ Rⁿ we have
(a) Σ_{i=1}^{n} |x_i y_i| ≤ ‖x‖_2 ‖y‖_2   [Schwarz inequality]
and
(b) (1/√n) ‖x‖_2 ≤ ‖x‖_∞ ≤ ‖x‖_1 ≤ √n ‖x‖_2.
10. Spectral Radius.
The spectral radius of a matrix A ∈ R^{n×n} is defined by ρ(A) = max_{1≤i≤n} |λ_i|, where λ_1, …, λ_n are all the eigenvalues of A.
11. Matrix Norms.
For an n × n matrix A, the natural matrix norm ‖A‖ for a given vector norm ‖·‖ is defined by
‖A‖ = max_{‖x‖=1} ‖A x‖.
The common matrix norms are
‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij|,   ‖A‖_2 = √(ρ(AᵀA))  (= ρ(A) if A = Aᵀ),   ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|.
Exercise: Compute ‖A‖_1, ‖A‖_∞, and ‖A‖_2 for
A = [1 1 0; 1 2 1; −1 1 2].
(Answer: ‖A‖_1 = ‖A‖_∞ = 4 and ‖A‖_2 = √(7 + √7) ≈ 3.1.)
12. Convergence.
A sequence of vectors {x^(k)} ⊂ Rⁿ is said to converge to a vector x ∈ Rⁿ if ‖x^(k) − x‖ → 0 as k → ∞ for an arbitrary vector norm ‖·‖. This is equivalent to componentwise convergence, i.e., x_i^(k) → x_i as k → ∞, i = 1, …, n.
A square matrix A ∈ R^{n×n} is said to be convergent if ‖Aᵏ‖ → 0 as k → ∞, which is equivalent to (Aᵏ)_ij → 0 as k → ∞ for all i, j.
The following statements are equivalent:
(i) A is a convergent matrix,
(ii) ρ(A) < 1,
(iii) limk→∞ Ak x = 0, for every x ∈ Rn.
Exercise. Show that A is convergent, where
A = [1/2 0; 1/4 1/2].
3. Algebraic Equations
1. Decomposition Methods for Linear Equations.
A matrix A ∈ R^{n×n} is said to have an LU decomposition if A = L U, where L ∈ R^{n×n} is a lower triangular matrix (l_ij = 0 if 1 ≤ i < j ≤ n) and U ∈ R^{n×n} is an upper triangular matrix (u_ij = 0 if 1 ≤ j < i ≤ n).
The decomposition is unique if one assumes e.g. l_ii = 1 for 1 ≤ i ≤ n.
L = [ l_11; l_21 l_22; l_31 l_32 l_33; …; l_n1 l_n2 l_n3 … l_nn ]   (lower triangular),
U = [ u_11 u_12 u_13 … u_1n; u_22 u_23 … u_2n; …; u_{n−1,n−1} u_{n−1,n}; u_nn ]   (upper triangular).
In general, the diagonal elements of either L or U are given and the remaining elements of the matrices are determined by directly comparing the two sides of the equation.
The linear system A x = b is then equivalent to L y = b and U x = y.
Exercise.
Show that the solution to L y = b is
y_1 = b_1/l_11,   y_i = (b_i − Σ_{k=1}^{i−1} l_ik y_k)/l_ii,   i = 2, …, n
(forward substitution) and the solution to U x = y is
x_n = y_n/u_nn,   x_i = (y_i − Σ_{k=i+1}^{n} u_ik x_k)/u_ii,   i = n − 1, …, 1
(backward substitution).
2. Crout Algorithm.
Exercise.
Let A be tridiagonal, i.e. a_ij = 0 if |i − j| > 1 (a_ij = 0 except perhaps a_{i,i−1}, a_ii and a_{i,i+1}), and strictly diagonally dominant (|a_ii| > Σ_{j≠i} |a_ij| for i = 1, …, n).
Show that A can be factorized as A = L U, where l_ii = 1 for i = 1, …, n, u_11 = a_11, and
u_{i,i+1} = a_{i,i+1},   l_{i+1,i} = a_{i+1,i}/u_ii,   u_{i+1,i+1} = a_{i+1,i+1} − l_{i+1,i} u_{i,i+1}
for i = 1, …, n − 1.
[Note: L and U are tridiagonal.]
C++ Exercise: Write a program to solve a tridiagonal and strictly diagonally dominant linear equation A x = b by the Crout algorithm. The input is the number of variables n, the matrix A, and the vector b. The output is the solution x.
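One possible sketch of such a program is given below (the function name and the three-diagonal storage are assumptions, not prescribed by the exercise).

#include <vector>

// Solve A x = b for a tridiagonal, strictly diagonally dominant A given by its
// diagonals: lower[i] = a_{i+2,i+1}, diag[i] = a_{i+1,i+1}, upper[i] = a_{i+1,i+2}
// (0-based storage of the 1-based entries used in the notes).
std::vector<double> crout_tridiag(const std::vector<double>& lower,
                                  const std::vector<double>& diag,
                                  const std::vector<double>& upper,
                                  const std::vector<double>& b) {
    const int n = static_cast<int>(diag.size());
    std::vector<double> u(n), l(n > 0 ? n - 1 : 0), y(n), x(n);
    // Factorization A = L U with l_ii = 1 (Crout, as in the exercise above).
    u[0] = diag[0];
    for (int i = 0; i + 1 < n; ++i) {
        l[i] = lower[i] / u[i];                    // l_{i+1,i}
        u[i + 1] = diag[i + 1] - l[i] * upper[i];  // u_{i+1,i+1}
    }
    // Forward substitution L y = b.
    y[0] = b[0];
    for (int i = 1; i < n; ++i) y[i] = b[i] - l[i - 1] * y[i - 1];
    // Backward substitution U x = y (u_{i,i+1} = a_{i,i+1}).
    x[n - 1] = y[n - 1] / u[n - 1];
    for (int i = n - 2; i >= 0; --i) x[i] = (y[i] - upper[i] * x[i + 1]) / u[i];
    return x;
}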
Exercise 3.2.
u_11 = a_11 and u_{i,i+1} = a_{i,i+1}, l_{i+1,i} = a_{i+1,i}/u_ii, u_{i+1,i+1} = a_{i+1,i+1} − l_{i+1,i} u_{i,i+1}, for i = 1, …, n − 1, can easily be shown.
It remains to show that u_ii ≠ 0 for i = 1, …, n. We proceed by induction to show that
|u_ii| > |a_{i,i+1}|,
where for convenience we define a_{n,n+1} := 0.
• i = 1: |u_11| = |a_11| > |a_12|. ✓
• i ↦ i + 1:
|u_{i+1,i+1}| = |a_{i+1,i+1} − a_{i+1,i} a_{i,i+1}/u_ii| ≥ |a_{i+1,i+1}| − |a_{i+1,i}| |a_{i,i+1}|/|u_ii| ≥ |a_{i+1,i+1}| − |a_{i+1,i}| > |a_{i+1,i+2}|. ✓
Overall we have that |u_ii| > 0 and so the Crout algorithm is well defined. Moreover,
det(A) = det(L) det(U) = det(U) = Π_{i=1}^{n} u_ii ≠ 0.
3. Choleski Algorithm.
Exercise.
Let A be a positive definite matrix. Show that A can be factorized as A = L Lᵀ, where L is a lower triangular matrix.
(i) Compute the 1st column:
l_11 = √a_11,   l_i1 = a_i1/l_11,   i = 2, …, n.
(ii) For j = 2, …, n − 1 compute the jth column:
l_jj = (a_jj − Σ_{k=1}^{j−1} l_jk²)^{1/2},
l_ij = (a_ij − Σ_{k=1}^{j−1} l_ik l_jk)/l_jj,   i = j + 1, …, n.
(iii) Compute the nth column:
l_nn = (a_nn − Σ_{k=1}^{n−1} l_nk²)^{1/2}.
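A compact C++ sketch of steps (i)-(iii) follows (row-major storage and the function name are assumptions for illustration).

#include <vector>
#include <cmath>

// Choleski factorization A = L L^T of a symmetric positive definite matrix,
// stored row-major in a; returns L (lower triangular, row-major).
std::vector<double> choleski(const std::vector<double>& a, int n) {
    std::vector<double> l(n * n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = a[j * n + j];
        for (int k = 0; k < j; ++k) s -= l[j * n + k] * l[j * n + k];
        l[j * n + j] = std::sqrt(s);                 // l_jj
        for (int i = j + 1; i < n; ++i) {
            double t = a[i * n + j];
            for (int k = 0; k < j; ++k) t -= l[i * n + k] * l[j * n + k];
            l[i * n + j] = t / l[j * n + j];         // l_ij
        }
    }
    return l;
}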
4. Iterative Methods for Linear Equations.
Split A into A = M + N with M nonsingular, and convert the equation A x = b into an equivalent equation x = C x + d with C = −M⁻¹N and d = M⁻¹b.
Choose an initial vector x^(0) and then generate a sequence of vectors by
x^(k) = C x^(k−1) + d,   k = 1, 2, …
The resulting sequence converges to the solution of A x = b, for an arbitrary initial vector x^(0), if and only if ρ(C) < 1.
The objective is to choose M such that M⁻¹ is easy to compute and ρ(C) < 1.
The iteration stops if ‖x^(k) − x^(k−1)‖ < ε.
[Note: In practice one solves M x^(k) = −N x^(k−1) + b, for k = 1, 2, ….]
Claim.
The iteration x^(k) = C x^(k−1) + d is convergent if and only if ρ(C) < 1.
Proof.
Define e^(k) := x^(k) − x, the error of the kth iterate. Then
e^(k) = C x^(k−1) + d − (C x + d) = C (x^(k−1) − x) = C e^(k−1) = C² e^(k−2) = … = Cᵏ e^(0),
where e^(0) = x^(0) − x is an arbitrary vector.
Assume C is similar to the diagonal matrix Λ = diag(λ_1, …, λ_n), where λ_i are the eigenvalues of C.
⇒ ∃ X nonsingular s.t. C = X Λ X⁻¹
⇒ e^(k) = Cᵏ e^(0) = X Λᵏ X⁻¹ e^(0) = X diag(λ_1ᵏ, …, λ_nᵏ) X⁻¹ e^(0) → 0 as k → ∞
⟺ |λ_i| < 1 ∀ i = 1, …, n
⟺ ρ(C) < 1.
5. Jacobi Algorithm.
Exercise: Let M = D and N = L + U (L strict lower triangular part of A, D diagonal, U strict upper triangular part). Show that the ith component at the kth iteration is
x_i^(k) = (1/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k−1) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ]
for i = 1, …, n.
6. Gauss–Seidel Algorithm.
Exercise: Let M = D + L and N = U. Show that the ith component at the kth iteration is
x_i^(k) = (1/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ]
for i = 1, …, n.
7. SOR Algorithm.
Exercise.
Let M = (1/ω) D + L and N = U + (1 − 1/ω) D, where 0 < ω < 2. Show that the ith component at the kth iteration is
x_i^(k) = (1 − ω) x_i^(k−1) + (ω/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ]
for i = 1, …, n.
C++ Exercise: Write a program to solve a diagonally dominant linear equation A x = b by the SOR algorithm. The input is the number of variables n, the matrix A, the vector b, the initial vector x^0, the relaxation parameter ω, and the tolerance ε. The output is the number of iterations k and the approximate solution x^k.
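One possible sketch of the SOR sweep (assumed interface; the max-norm stopping rule is one of several reasonable choices):

#include <vector>
#include <cmath>
#include <algorithm>

// SOR iteration for A x = b with A stored row-major; x holds the initial
// vector on entry and the approximate solution on exit. Returns the number
// of iterations performed.
int sor(const std::vector<double>& A, const std::vector<double>& b,
        std::vector<double>& x, int n, double omega, double eps,
        int max_iter = 10000) {
    for (int k = 1; k <= max_iter; ++k) {
        double diff = 0.0;
        for (int i = 0; i < n; ++i) {
            double s = b[i];
            for (int j = 0; j < n; ++j)
                if (j != i) s -= A[i * n + j] * x[j];   // new x_j for j < i, old for j > i
            double xi_new = (1.0 - omega) * x[i] + omega * s / A[i * n + i];
            diff = std::max(diff, std::fabs(xi_new - x[i]));
            x[i] = xi_new;
        }
        if (diff < eps) return k;    // stop when ||x^(k) - x^(k-1)||_inf < eps
    }
    return max_iter;
}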
8. Special Matrices.
If A is strictly diagonally dominant, then Jacobi and Gauss–Seidel converge for any initial vector x^(0). In addition, SOR converges for ω ∈ (0, 1].
If A is positive definite and 0 < ω < 2, then the SOR method converges for any initial vector x^(0).
If A is positive definite and tridiagonal, then ρ(C_GS) = [ρ(C_J)]² < 1 and the optimal choice of ω for the SOR method is
ω = 2 / (1 + √(1 − ρ(C_GS))) ∈ [1, 2).
With this choice of ω, ρ(C_SOR) = ω − 1 ≤ ρ(C_GS).
Exercise.
Find the optimal ω for the SOR method for the matrix
A = [4 3 0; 3 4 −1; 0 −1 4].
(Answer: ω ≈ 1.24.)
9. Condition Numbers.
The condition number of a nonsingular matrix A relative to a norm ‖·‖ is defined by
κ(A) = ‖A‖ · ‖A⁻¹‖.
Note that κ(A) ≥ ‖A A⁻¹‖ = ‖I‖ = max_{‖x‖=1} ‖x‖ = 1.
A matrix A is well-conditioned if κ(A) is close to one and is ill-conditioned if κ(A) is much larger than one.
Suppose ‖δA‖ < 1/‖A⁻¹‖. Then the solution x̃ to (A + δA) x̃ = b + δb approximates the solution x of A x = b with error estimate
‖x − x̃‖/‖x‖ ≤ κ(A)/(1 − ‖δA‖ ‖A⁻¹‖) · ( ‖δb‖/‖b‖ + ‖δA‖/‖A‖ ).
In particular, if δA = 0 (no perturbation to the matrix A) then
‖x − x̃‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.
Example.
Consider Example 1.3.
A = [1 1; 1 1.01] ⇒ A⁻¹ = (1/det A) [1.01 −1; −1 1] = (1/0.01) [1.01 −1; −1 1] = [101 −100; −100 100].
Recall ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij|.
Hence
‖A‖_1 = max(2, 2.01) = 2.01,   ‖A⁻¹‖_1 = max(201, 200) = 201.
⇒ κ_1(A) = ‖A‖_1 · ‖A⁻¹‖_1 = 404.01 ≫ 1   (ill-conditioned!)
Similarly κ_∞ = 404.01 and κ_2 = ρ(A) ρ(A⁻¹) = λ_max/λ_min = 402.0075.
10. Hilbert Matrix.
An n × n Hilbert matrix H_n = (h_ij) is defined by h_ij = 1/(i + j − 1) for i, j = 1, 2, …, n.
Hilbert matrices are notoriously ill-conditioned and κ(H_n) → ∞ very rapidly as n → ∞.
H_n = [ 1    1/2      …  1/n;
        1/2  1/3      …  1/(n+1);
        ⋮                ⋮;
        1/n  1/(n+1)  …  1/(2n−1) ].
Exercise.
Compute the condition numbers κ_1(H_3) and κ_1(H_6).
(Answer: κ_1(H_3) = 748 and κ_1(H_6) ≈ 2.9 × 10⁶.)
11. Fixed Point Method for Nonlinear Equations.
A function g : R → R has a fixed point x̄ if g(x̄) = x̄.
A function g is a contraction mapping on [a, b] if g : [a, b] → [a, b] and
|g'(x)| ≤ L < 1   ∀ x ∈ (a, b),
where L is a constant.
Exercise.
Assume g is a contraction mapping on [a, b]. Prove that g has a unique fixed point x̄ in [a, b], and for any x_0 ∈ [a, b], the sequence defined by
x_{n+1} = g(x_n),   n ≥ 0,
converges to x̄. The algorithm is called fixed point iteration.
Exercise 3.11.
Existence:
Define h(x) = x − g(x) on [a, b]. Then h(a) = a − g(a) ≤ 0 and h(b) = b − g(b) ≥ 0.
As h is continuous, ∃ c ∈ [a, b] s.t. h(c) = 0, i.e. c = g(c). ✓
Uniqueness:
Suppose p, q ∈ [a, b] are two fixed points. Then
|p − q| = |g(p) − g(q)| = |g'(α) (p − q)| ≤ L |p − q|   (MVT, α ∈ (a, b))
⇒ (1 − L) |p − q| ≤ 0 ⇒ |p − q| ≤ 0 ⇒ p = q. ✓
Convergence:
|x_n − x̄| = |g(x_{n−1}) − g(x̄)| = |g'(α) (x_{n−1} − x̄)| ≤ L |x_{n−1} − x̄| ≤ … ≤ Lⁿ |x_0 − x̄| → 0 as n → ∞.
Hence x_n → x̄ as n → ∞. ✓
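A minimal C++ sketch of fixed point iteration follows; the example map g(x) = cos(x) is an assumption for illustration (not from the notes) and is a contraction on [0, 1] since |g'(x)| = |sin(x)| ≤ sin(1) < 1.

#include <cmath>
#include <cstdio>

int main() {
    double x = 0.5;                        // any x_0 in [0, 1]
    for (int n = 0; n < 100; ++n) {
        double x_new = std::cos(x);        // x_{n+1} = g(x_n)
        if (std::fabs(x_new - x) < 1e-12) {
            std::printf("fixed point approx %.12f after %d iterations\n", x_new, n + 1);
            break;
        }
        x = x_new;
    }
    return 0;
}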
12. Newton Method for Nonlinear Equations.
Assume that f ∈ C¹([a, b]), f(x̄) = 0 (x̄ is a root or zero) and f'(x̄) ≠ 0.
The Newton method can be used to find the root x̄ by generating a sequence {x_n} satisfying
x_{n+1} = x_n − f(x_n)/f'(x_n),   n = 0, 1, …
provided f'(x_n) ≠ 0 for all n.
The sequence x_n converges to the root x̄ as long as the initial point x_0 is sufficiently close to x̄.
The algorithm stops if |x_{n+1} − x_n| < ε, a prescribed error tolerance, and x_{n+1} is taken as an approximation to x̄.
Geometric Interpretation:
The tangent line at (x_n, f(x_n)) is
Y = f(x_n) + f'(x_n) (X − x_n).
Setting Y = 0 yields x_{n+1} := X = x_n − f(x_n)/f'(x_n).
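A short C++ sketch of this iteration (function name and stopping rule are assumptions for illustration):

#include <cmath>
#include <functional>

// Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n); stops when |x_{n+1} - x_n| < eps.
double newton(const std::function<double(double)>& f,
              const std::function<double(double)>& fprime,
              double x0, double eps, int max_iter = 100) {
    double x = x0;
    for (int n = 0; n < max_iter; ++n) {
        double x_next = x - f(x) / fprime(x);   // requires f'(x) != 0
        if (std::fabs(x_next - x) < eps) return x_next;
        x = x_next;
    }
    return x;
}

// Example: sqrt(7) as the root of f(x) = x^2 - 7, starting from x_0 = 4:
// double root = newton([](double x){ return x*x - 7.0; },
//                      [](double x){ return 2.0*x; }, 4.0, 1e-10);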
13. Choice of Initial Point.
Suppose f ∈ C²([a, b]) and f(x̄) = 0 with f'(x̄) ≠ 0. Then there exists δ > 0 such that the Newton method generates a sequence x_n converging to x̄ for any initial point x_0 ∈ [x̄ − δ, x̄ + δ] (x_0 can only be chosen locally).
However, if f satisfies the following additional conditions:
1. f(a) f(b) < 0,
2. f'' does not change sign on [a, b],
3. the tangent lines to the curve y = f(x) at both a and b cut the x-axis within [a, b] (i.e. a − f(a)/f'(a), b − f(b)/f'(b) ∈ [a, b]),
then f(x) = 0 has a unique root x̄ in [a, b] and the Newton method converges to x̄ for any initial point x_0 ∈ [a, b] (x_0 can be chosen globally).
Example.
Use the Newton method to compute x = √c, c > 1, and show that the initial point can be any point in [1, c].
Example.
Find x = √c, c > 1.
Answer.
x is a root of f(x) := x² − c.
Newton: x_{n+1} = x_n − f(x_n)/f'(x_n) = x_n − (x_n² − c)/(2 x_n) = ½ (x_n + c/x_n),   n ≥ 0.
How to choose x_0? Check the 3 conditions on [1, c].
1. f(1) = 1 − c < 0, f(c) = c² − c > 0 ⇒ f(1) f(c) < 0. ✓
2. f'' = 2. ✓
3. Tangent line at 1: Y = f(1) + f'(1)(X − 1) = 1 − c + 2(X − 1).
   Let Y = 0, then X = 1 + (c − 1)/2 ∈ (1, c). ✓
   Tangent line at c: Y = f(c) + f'(c)(X − c) = c² − c + 2c(X − c).
   Let Y = 0, then X = c − (c − 1)/2 ∈ (1, c). ✓
⇒ Newton converges for any x_0 ∈ [1, c].
Numerical Example.
Find √7.   (From calculator: √7 = 2.6457513.)
Newton converges for all x_0 ∈ [1, 7]. Choose x_0 = 4.
x_1 = ½ (x_0 + 7/x_0) = 2.875
x_2 = 2.6548913
x_3 = 2.6457670
x_4 = 2.6457513
Comparison to the bisection method with I_0 = [1, 7]:
I_1 = [1, 4]
I_2 = [2.5, 4]
I_3 = [2.5, 3.25]
I_4 = [2.5, 2.875]
⋮
I_25 = [2.6457512, 2.6457513]
14. Pitfalls.
Here are some difficulties which may be encountered with the Newton method:
1. {xn} may wander around and not converge (there are only complex roots to the
equation),
2. initial approximation x0 is too far away from the desired root and {xn} converges
to some other root (this usually happens when f 0(x0) is small),
3. {xn} may diverge to +∞ (the function f is positive and monotonically decreasing
on an unbounded interval), and
4. {xn} may repeat (cycling).
15. Rate of Convergence.
Suppose {x_n} is a sequence that converges to x̄.
The convergence is said to be linear if there is a constant r ∈ (0, 1) such that
|x_{n+1} − x̄| / |x_n − x̄| ≤ r,
for all n sufficiently large.
The convergence is said to be superlinear if
lim_{n→∞} |x_{n+1} − x̄| / |x_n − x̄| = 0.
In particular, the convergence is said to be quadratic if
|x_{n+1} − x̄| / |x_n − x̄|² ≤ M,
for all n sufficiently large, where M is a positive constant, not necessarily less than 1.
Example. x_n = x̄ + 0.5ⁿ converges linearly, x_n = x̄ + 0.5^(2ⁿ) quadratically.
Example. Show that the Newton method converges quadratically.
Example.
Define g(x) = x − f(x)/f'(x). Then the Newton method is given by
x_{n+1} = g(x_n).
Moreover, f(x̄) = 0 and f'(x̄) ≠ 0 imply that
g(x̄) = x̄,
g'(x̄) = 1 − [(f')² − f f'']/(f')² (x̄) = f(x̄) f''(x̄)/(f'(x̄))² = 0,
g''(x̄) = f''(x̄)/f'(x̄).
Assuming that x_n → x̄ we have that
|x_{n+1} − x̄| / |x_n − x̄|² = |g(x_n) − g(x̄)| / |x_n − x̄|²
= |g'(x̄)(x_n − x̄) + ½ g''(η_n)(x_n − x̄)²| / |x_n − x̄|²   (Taylor)
= ½ |g''(η_n)| → ½ |g''(x̄)| =: λ   as n → ∞.
Hence |x_{n+1} − x̄| ≈ λ |x_n − x̄|² ⇒ quadratic convergence rate.
4. Interpolations
1. Polynomial Approximation.
For any continuous function f defined on an interval [a, b], there exist polynomials P
that can be as “close” to the given function as desired.
Taylor polynomials agree closely with a given function at a specific point, but they
concentrate their accuracy only near that point.
A good polynomial needs to provide a relatively accurate approximation over an entire
interval.
2. Interpolating Polynomial – Lagrange Form.
Suppose x_i ∈ [a, b], i = 0, 1, …, n, are pairwise distinct mesh points in [a, b].
The Lagrange polynomial p is a polynomial of degree ≤ n such that
p(x_i) = f(x_i),   i = 0, 1, …, n.
p can be constructed explicitly as
p(x) = Σ_{i=0}^{n} L_i(x) f(x_i),
where L_i is a polynomial of degree n satisfying
L_i(x_j) = 0, j ≠ i,   L_i(x_i) = 1.
This results in
L_i(x) = Π_{j≠i} (x − x_j)/(x_i − x_j),   i = 0, 1, …, n.
p is called linear interpolation if n = 1 and quadratic interpolation if n = 2.
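A minimal C++ sketch for evaluating p(x) = Σ_i L_i(x) f(x_i) (assumed interface, for illustration):

#include <vector>

// Evaluate the Lagrange interpolating polynomial through (xs[i], fs[i]) at x.
double lagrange_eval(const std::vector<double>& xs,
                     const std::vector<double>& fs, double x) {
    const int n = static_cast<int>(xs.size());
    double p = 0.0;
    for (int i = 0; i < n; ++i) {
        double Li = 1.0;            // L_i(x) = prod_{j != i} (x - x_j)/(x_i - x_j)
        for (int j = 0; j < n; ++j)
            if (j != i) Li *= (x - xs[j]) / (xs[i] - xs[j]);
        p += Li * fs[i];
    }
    return p;
}

// Example with the points (1,0), (-1,-3), (2,-4) of the exercise below:
// double value = lagrange_eval({1.0, -1.0, 2.0}, {0.0, -3.0, -4.0}, 0.0);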
Exercise.
Find the Lagrange polynomial p for the following points (x, f (x)): (1, 0), (−1, −3),
and (2, −4). Assume that a new point (0, 2) is observed, and construct a Lagrange
polynomial to incorporate this new information in it.
Error formula.
Suppose f is n + 1 times differentiable on [a, b]. Then it holds that
f(x) = p(x) + f⁽ⁿ⁺¹⁾(ξ)/(n+1)! (x − x_0) ⋯ (x − x_n),
where ξ = ξ(x) lies in (a, b).
Proof.
Define g(x) = f(x) − p(x) + λ Π_{j=0}^{n} (x − x_j), where λ ∈ R is a constant. Clearly, g(x_j) = 0 for j = 0, …, n. To estimate the error at x = α ∉ {x_0, …, x_n}, choose λ such that g(α) = 0.
Hence
g(x) = f(x) − p(x) − (f(α) − p(α)) Π_{j=0}^{n} (x − x_j)/(α − x_j).
⇒ g has at least n + 2 zeros: x_0, …, x_n, α.
The Mean Value Theorem yields that
g' has at least n + 1 zeros
⋮
g⁽ⁿ⁺¹⁾ has at least 1 zero, say ξ.
Moreover, as p is a polynomial of degree ≤ n, it holds that p⁽ⁿ⁺¹⁾ = 0.
Hence
0 = g⁽ⁿ⁺¹⁾(ξ) = f⁽ⁿ⁺¹⁾(ξ) − (f(α) − p(α)) (n+1)! / Π_{j=0}^{n} (α − x_j)
⇒ Error = f(α) − p(α) = f⁽ⁿ⁺¹⁾(ξ)/(n+1)! Π_{j=0}^{n} (α − x_j).
3. Trapezoid Rule.
We can use linear interpolation (n = 1, x_0 = a, x_1 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get the trapezoid rule:
∫_a^b f(x) dx ≈ ½ (b − a) [f(a) + f(b)].
If we partition [a, b] into n equal subintervals with mesh points x_i = a + ih, i = 0, …, n, and step size h = (b − a)/n, we can derive the composite trapezoid rule:
∫_a^b f(x) dx ≈ (h/2) [f(x_0) + 2 Σ_{i=1}^{n−1} f(x_i) + f(x_n)].
The approximation error is of order O(h²) if |f''| is bounded.
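A minimal C++ sketch of the composite trapezoid rule (assumed interface, for illustration):

#include <cmath>
#include <functional>

// Composite trapezoid rule with n equal subintervals:
// int_a^b f(x) dx ~ h/2 [f(x_0) + 2 sum_{i=1}^{n-1} f(x_i) + f(x_n)].
double trapezoid(const std::function<double(double)>& f,
                 double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return h * sum;
}

// Example: int_0^1 exp(x) dx = e - 1; the error behaves like O(h^2).
// double approx = trapezoid([](double x){ return std::exp(x); }, 0.0, 1.0, 100);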
Use linear interpolation (n = 1, x_0 = a, x_1 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx.
Answer.
The linear interpolating polynomial is p(x) = f(a) L_0(x) + f(b) L_1(x), where
L_0(x) = (x − b)/(a − b),   L_1(x) = (x − a)/(b − a).
⇒ ∫_a^b f(x) dx ≈ ∫_a^b p(x) dx = f(a) ∫_a^b (x − b)/(a − b) dx + f(b) ∫_a^b (x − a)/(b − a) dx
= f(a) · 1/(a − b) · ½ (x − b)² |_a^b + f(b) · 1/(b − a) · ½ (x − a)² |_a^b
= f(a) · 1/(a − b) · ½ (−(a − b)²) + f(b) · 1/(b − a) · ½ (b − a)²
= (b − a)/2 (f(a) + f(b))   ← Trapezoid Rule
[Of course, one could have arrived at this formula with a simple geometric argument.]
Error Analysis.
Let f(x) = p(x) + E(x), where E(x) = f''(ξ)/2 (x − a)(x − b) with ξ ∈ (a, b). Assume that |f''| ≤ M is bounded. Then
|∫_a^b E(x) dx| ≤ ∫_a^b |E(x)| dx ≤ (M/2) ∫_a^b (x − a)(b − x) dx
= (M/2) ∫_a^b (x − a)[(b − a) − (x − a)] dx
= (M/2) ∫_a^b [−(x − a)² + (b − a)(x − a)] dx
= (M/2) [−⅓ (b − a)³ + ½ (b − a)³]
= (M/12) (b − a)³.
The composite formula can be obtained by considering the partitioning of [a, b] into a = x_0 < x_1 < … < x_{n−1} < x_n = b, where x_i = a + ih with h := (b − a)/n.
∫_a^b f(x) dx = Σ_{i=0}^{n−1} ∫_{x_i}^{x_{i+1}} f(x) dx ≈ Σ_{i=0}^{n−1} (x_{i+1} − x_i)/2 (f(x_i) + f(x_{i+1}))
= Σ_{i=0}^{n−1} (h/2) (f(x_i) + f(x_{i+1}))
= h [½ f(a) + f(x_1) + … + f(x_{n−1}) + ½ f(b)].
Error analysis then yields that
Error ≤ (M/12) h³ n = (M (b − a)/12) h² = O(h²).
4. Simpson’s Rule.
Exercise.
Use quadratic interpolation (n = 2, x_0 = a, x_1 = (a+b)/2, x_2 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get Simpson’s rule:
∫_a^b f(x) dx ≈ (b − a)/6 [f(a) + 4 f((a+b)/2) + f(b)].
Derive the composite Simpson’s rule:
∫_a^b f(x) dx ≈ (h/3) [f(x_0) + 2 Σ_{i=2}^{n/2} f(x_{2i−2}) + 4 Σ_{i=1}^{n/2} f(x_{2i−1}) + f(x_n)],
where n is an even number and x_i and h are chosen as in the composite trapezoid rule.
[One can show that the approximation error is of order O(h⁴) if |f⁽⁴⁾| is bounded.]
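A minimal C++ sketch of the composite Simpson rule (assumed interface; n must be even):

#include <cmath>
#include <functional>

// Composite Simpson rule with n (even) equal subintervals.
double simpson(const std::function<double(double)>& f,
               double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int i = 1; i < n; ++i)
        sum += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);   // weight 4 at odd, 2 at even nodes
    return h / 3.0 * sum;
}

// Example: int_0^1 exp(x) dx = e - 1; the error behaves like O(h^4).
// double approx = simpson([](double x){ return std::exp(x); }, 0.0, 1.0, 100);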
5. Newton–Cotes Formula.
Suppose x_0, …, x_n are mesh points in [a, b] (usually the mesh points are equally spaced and x_0 = a, x_n = b). Then the integral can be approximated by the Newton–Cotes formula:
∫_a^b f(x) dx ≈ Σ_{i=0}^{n} A_i f(x_i),
where the parameters A_i are determined in such a way that the integral is exact for all polynomials of degree ≤ n.
[Note: n + 1 unknowns A_i and n + 1 coefficients for a polynomial of degree n.]
Exercise.
Use the Newton–Cotes formula to derive the trapezoid rule and Simpson’s rule. Prove that if f is n + 1 times differentiable and |f⁽ⁿ⁺¹⁾| ≤ M on [a, b] then
| ∫_a^b f(x) dx − Σ_{i=0}^{n} A_i f(x_i) | ≤ M/(n+1)! ∫_a^b Π_{i=0}^{n} |x − x_i| dx.
Exercise 4.5.
We have that
∫_a^b q(x) dx = Σ_{i=0}^{n} A_i q(x_i)
for all polynomials q of degree ≤ n, for the data points x_0, x_1, …, x_n.
Let q(x) = L_j(x), where L_j is the jth Lagrange polynomial, i.e. L_j is of degree n and satisfies L_j(x_i) = δ_ij (= 1 if i = j, 0 if i ≠ j). Now
∫_a^b L_j dx = Σ_{i=0}^{n} A_i L_j(x_i) = A_j
⇒ ∫_a^b f(x) dx ≈ Σ_{i=0}^{n} A_i f(x_i) = Σ_{i=0}^{n} f(x_i) ∫_a^b L_i(x) dx = ∫_a^b Σ_{i=0}^{n} f(x_i) L_i(x) dx = ∫_a^b p(x) dx,
where p(x) is the interpolating Lagrange polynomial to f. Hence we find the trapezoid rule (n = 1) and Simpson’s rule (n = 2, with x_1 = (a+b)/2).
The Lagrange polynomial has the error term
f(x) = p(x) + E(x),   E(x) := f⁽ⁿ⁺¹⁾(ξ)/(n+1)! (x − x_0) ⋯ (x − x_n),
where ξ = ξ(x) lies in (a, b). Hence
| ∫_a^b f(x) dx − ∫_a^b p(x) dx | = | ∫_a^b E(x) dx | ≤ ∫_a^b |E(x)| dx ≤ M/(n+1)! ∫_a^b Π_{i=0}^{n} |x − x_i| dx.
6. Ordinary Differential Equations.
An initial value problem for an ODE has the form
x'(t) = f(t, x(t)),   a ≤ t ≤ b,   x(a) = x_0.   (1)
(1) is equivalent to the integral equation
x(t) = x_0 + ∫_a^t f(s, x(s)) ds,   a ≤ t ≤ b.   (2)
To solve (2) numerically we divide [a, b] into subintervals with mesh points t_i = a + ih, i = 0, …, n, and step size h = (b − a)/n. (2) implies
x(t_{i+1}) = x(t_i) + ∫_{t_i}^{t_{i+1}} f(s, x(s)) ds,   i = 0, …, n − 1.
(a) If we approximate f(s, x(s)) on [t_i, t_{i+1}] by f(t_i, x(t_i)), we get the Euler (explicit) method for equation (1):
w_{i+1} = w_i + h f(t_i, w_i),   w_0 = x_0.
We have x(t_{i+1}) ≈ w_{i+1} if h is sufficiently small.
[Taylor: x(t_{i+1}) = x(t_i) + x'(t_i) h + O(h²) = x(t_i) + f(t_i, x(t_i)) h + O(h²).]
(b) If we approximate f(s, x(s)) on [t_i, t_{i+1}] by linear interpolation with points (t_i, f(t_i, x(t_i))) and (t_{i+1}, f(t_{i+1}, x(t_{i+1}))), we get the trapezoidal (implicit) method for equation (1):
w_{i+1} = w_i + (h/2) [f(t_i, w_i) + f(t_{i+1}, w_{i+1})],   w_0 = x_0.
(c) If we combine the Euler method with the trapezoidal method, we get the modified Euler (explicit) method (or Runge–Kutta 2nd order method):
w_{i+1} = w_i + (h/2) [f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))],   w_0 = x_0,
where w_i + h f(t_i, w_i) ≈ w_{i+1} in the second argument.
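A minimal C++ sketch of method (c), the modified Euler / RK2 scheme (assumed interface, for illustration):

#include <functional>
#include <vector>

// Modified Euler method for x'(t) = f(t, x), x(a) = x0, using n steps of size h.
std::vector<double> modified_euler(const std::function<double(double, double)>& f,
                                   double a, double b, double x0, int n) {
    const double h = (b - a) / n;
    std::vector<double> w(n + 1);
    w[0] = x0;
    for (int i = 0; i < n; ++i) {
        double t = a + i * h;
        double predictor = w[i] + h * f(t, w[i]);                  // Euler predictor
        w[i + 1] = w[i] + 0.5 * h * (f(t, w[i]) + f(t + h, predictor));
    }
    return w;
}

// Example: x' = x, x(0) = 1 on [0, 1]; w[n] approximates e with error O(h^2).
// auto w = modified_euler([](double, double x){ return x; }, 0.0, 1.0, 1.0, 100);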
7. Divided Differences.
Suppose a function f and (n + 1) distinct points x_0, x_1, …, x_n are given. Divided differences of f can be expressed in a table format as follows:

x_k | 0DD    | 1DD        | 2DD             | 3DD                  | …
x_0 | f[x_0]
x_1 | f[x_1] | f[x_0,x_1]
x_2 | f[x_2] | f[x_1,x_2] | f[x_0,x_1,x_2]
x_3 | f[x_3] | f[x_2,x_3] | f[x_1,x_2,x_3]  | f[x_0,x_1,x_2,x_3]
⋮

where
f[x_i] = f(x_i),
f[x_i, x_{i+1}] = (f[x_{i+1}] − f[x_i]) / (x_{i+1} − x_i),
f[x_i, x_{i+1}, …, x_{i+k}] = (f[x_{i+1}, …, x_{i+k}] − f[x_i, …, x_{i+k−1}]) / (x_{i+k} − x_i),
f[x_1, …, x_n] = (f[x_2, …, x_n] − f[x_1, …, x_{n−1}]) / (x_n − x_1).
8. Interpolating Polynomial – Newton Form.
One drawback of Lagrange polynomials is that there is no recursive relationship between P_{n−1} and P_n, which implies that each polynomial has to be constructed individually. Hence, in practice one uses the Newton polynomials.
The Newton interpolating polynomial P_n of degree n that agrees with f at the points x_0, x_1, …, x_n is given by
P_n(x) = f[x_0] + Σ_{k=1}^{n} f[x_0, x_1, …, x_k] Π_{i=0}^{k−1} (x − x_i).
Note that P_n can be computed recursively using the relation
P_n(x) = P_{n−1}(x) + f[x_0, x_1, …, x_n] (x − x_0)(x − x_1) ⋯ (x − x_{n−1}).
[Note that f[x_0, x_1, …, x_k] can be found on the diagonal of the DD table.]
Exercise.
Suppose values (x, y) are given as (1, −2), (−2, −56), (0, −2), (3, 4), (−1, −16), and (7, 376). Is there a cubic polynomial that takes these values? (Answer: 2x³ − 7x² + 5x − 2.)
Exercise 4.8.
Data points: (1, −2), (−2, −56), (0, −2), (3, 4), (−1, −16), (7, 376).

x_k | 0DD | 1DD | 2DD | 3DD | 4DD | 5DD
 1  | −2
−2  | −56 | 18
 0  | −2  | 27  | −9
 3  |  4  |  2  | −5  | 2
−1  | −16 |  5  | −3  | 2  | 0
 7  | 376 | 49  | 11  | 2  | 0  | 0

Newton polynomial:
p(x) = −2 + 18 (x − 1) − 9 (x − 1)(x + 2) + 2 (x − 1)(x + 2) x
     = 2x³ − 7x² + 5x − 2.
[Note: We could have stopped at the row for x_3 = 3 and checked whether p_3 goes through the remaining data points.]
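A minimal C++ sketch of the divided difference table and the Newton form evaluation (assumed interface, for illustration):

#include <vector>

// Build the coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]
// (the diagonal of the DD table), computed in place.
std::vector<double> divided_differences(const std::vector<double>& xs,
                                        std::vector<double> fs) {
    const int n = static_cast<int>(xs.size());
    for (int k = 1; k < n; ++k)                  // column k of the table
        for (int i = n - 1; i >= k; --i)
            fs[i] = (fs[i] - fs[i - 1]) / (xs[i] - xs[i - k]);
    return fs;                                   // fs[k] = f[x_0,...,x_k]
}

// Evaluate the Newton form by nested multiplication.
double newton_eval(const std::vector<double>& xs,
                   const std::vector<double>& coeff, double x) {
    const int n = static_cast<int>(xs.size());
    double p = coeff[n - 1];
    for (int k = n - 2; k >= 0; --k) p = coeff[k] + (x - xs[k]) * p;
    return p;
}

// Example with the data above:
// auto c = divided_differences({1,-2,0,3,-1,7}, {-2,-56,-2,4,-16,376});
// double value = newton_eval({1,-2,0,3,-1,7}, c, 2.0);   // p(2) = -4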
9. Piecewise Polynomial Approximations.
Another drawback of interpolating polynomials is that Pn tends to oscillate widely
when n is large, which implies that Pn(x) may be a poor approximation to f (x) if x
is not close to the interpolating points.
If an interval [a, b] is divided into a set of subintervals [xi, xi+1], i = 0, 1, . . . , n−1, and
on each subinterval a different polynomial is constructed to approximate a function
f , such an approximation is called spline.
The simplest spline is the linear spline P which approximates the function f on the
interval [xi, xi+1], i = 0, 1, . . . , n − 1, with P agreeing with f at xi and xi+1.
Linear splines are easy to construct but are not smooth at points x1, x2, . . . , xn−1.
10. Natural Cubic Splines.
Given a function f defined on [a, b] and a set of points a = x0 < x1 < ∙ ∙ ∙ < xn = b,
a function S is called a natural cubic spline if there exist n cubic polynomials Si
such that:
(a) S(x) = S_i(x) for x in [x_i, x_{i+1}] and i = 0, 1, …, n − 1;
(b) S_i(x_i) = f(x_i) and S_i(x_{i+1}) = f(x_{i+1}) for i = 0, 1, …, n − 1;
(c) S'_{i+1}(x_{i+1}) = S'_i(x_{i+1}) for i = 0, 1, …, n − 2;
(d) S''_{i+1}(x_{i+1}) = S''_i(x_{i+1}) for i = 0, 1, …, n − 2;
(e) S''(x_0) = S''(x_n) = 0.
Natural Cubic Splines.
[Figure: neighbouring spline pieces S_i on [x_i, x_{i+1}] and S_{i+1} on [x_{i+1}, x_{i+2}].]
(a) 4n parameters
(b) 2n equations
(c) n − 1 equations
(d) n − 1 equations
(e) 2 equations
⇒ 4n equations in total.
Example. Assume S is a natural cubic spline that interpolates f ∈ C²([a, b]) at the nodes a = x_0 < x_1 < ⋯ < x_n = b. We have the following smoothness property of cubic splines:
∫_a^b [S''(x)]² dx ≤ ∫_a^b [f''(x)]² dx.
In fact, it even holds that
∫_a^b [S''(x)]² dx = min_{g∈G} ∫_a^b [g''(x)]² dx,
where G := {g ∈ C²([a, b]) : g(x_i) = f(x_i), i = 0, 1, …, n}.
Exercise: Determine the parameters a to h so that S(x) is a natural cubic spline, where
S(x) = ax³ + bx² + cx + d for x ∈ [−1, 0] and
S(x) = ex³ + fx² + gx + h for x ∈ [0, 1],
with interpolation conditions S(−1) = 1, S(0) = 2, and S(1) = −1.
(Answer: a = −1, b = −3, c = −1, d = 2, e = 1, f = −3, g = −1, h = 2.)
11. Computation of Natural Cubic Splines.
Denote c_i = S''(x_i), i = 0, 1, …, n. Then c_0 = c_n = 0.
Since S_i is a cubic function on [x_i, x_{i+1}], we know that S''_i is a linear function on [x_i, x_{i+1}]. Hence it can be written as
S''_i(x) = c_i (x_{i+1} − x)/h_i + c_{i+1} (x − x_i)/h_i,
where h_i := x_{i+1} − x_i.
Exercise.
Show that S_i is given by
S_i(x) = c_i/(6 h_i) (x_{i+1} − x)³ + c_{i+1}/(6 h_i) (x − x_i)³ + p_i (x_{i+1} − x) + q_i (x − x_i),
where
p_i = f(x_i)/h_i − c_i h_i/6,   q_i = f(x_{i+1})/h_i − c_{i+1} h_i/6,
and c_1, …, c_{n−1} satisfy the linear equations
h_{i−1} c_{i−1} + 2 (h_{i−1} + h_i) c_i + h_i c_{i+1} = u_i,
where
u_i = 6 (d_i − d_{i−1}),   d_i = (f(x_{i+1}) − f(x_i))/h_i,
for i = 1, 2, …, n − 1.
C++ Exercise: Write a program to construct a natural cubic spline. The inputs are the number of points n + 1 and all the points (x_i, y_i), i = 0, 1, …, n. The output is a natural cubic spline expressed in terms of the functions S_i defined on the intervals [x_i, x_{i+1}] for i = 0, 1, …, n − 1.
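A possible sketch of the core of such a program: it computes the second derivatives c_i by solving the tridiagonal system above (the function name and the simple elimination are assumptions, not a prescribed solution).

#include <vector>

// Second derivatives c_0, ..., c_n of the natural cubic spline through (x_i, y_i),
// with c_0 = c_n = 0; S_i is then assembled from c_i, p_i, q_i as above.
std::vector<double> spline_second_derivatives(const std::vector<double>& x,
                                              const std::vector<double>& y) {
    const int n = static_cast<int>(x.size()) - 1;
    std::vector<double> h(n), d(n), c(n + 1, 0.0), diag(n + 1), rhs(n + 1);
    for (int i = 0; i < n; ++i) {
        h[i] = x[i + 1] - x[i];
        d[i] = (y[i + 1] - y[i]) / h[i];
    }
    // Forward elimination on the tridiagonal system for c_1, ..., c_{n-1}.
    for (int i = 1; i < n; ++i) {
        diag[i] = 2.0 * (h[i - 1] + h[i]);
        rhs[i]  = 6.0 * (d[i] - d[i - 1]);
        if (i > 1) {                                  // eliminate the entry h[i-1]
            double m = h[i - 1] / diag[i - 1];
            diag[i] -= m * h[i - 1];
            rhs[i]  -= m * rhs[i - 1];
        }
    }
    for (int i = n - 1; i >= 1; --i)                  // back substitution (c[n] = 0)
        c[i] = (rhs[i] - h[i] * c[i + 1]) / diag[i];
    return c;
}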
5. Basic Probability Theory
1. CDF and PDF.
Let (Ω, F , P ) be a probability space, X be a random variable.
The cumulative distribution function (cdf) F of X is defined by
F (x) = P (X ≤ x),
x ∈ R.
F is an increasing right-continuous function satisfying
F (−∞) = 0,
F (+∞) = 1.
If F is absolutely continuous then X has a probability density function (pdf) f defined
by
f (x) = F 0(x), x ∈ R.
F can be recovered from f by the relation
F(x) = ∫_{−∞}^{x} f(u) du.
2. Normal Distribution.
A random variable X has a normal distribution with parameters μ and σ², written X ∼ N(μ, σ²), if X has the pdf
φ(x) = 1/(σ√(2π)) e^{−(x−μ)²/(2σ²)}   for x ∈ R.
If μ = 0 and σ² = 1 then X is called a standard normal random variable and its cdf is usually written as
Φ(x) = ∫_{−∞}^{x} 1/√(2π) e^{−u²/2} du.
If X ∼ N(μ, σ²) then the characteristic function (Fourier transform) of X is given by
c(s) = E(e^{isX}) = e^{iμs − σ²s²/2},
where i = √(−1). The moment generating function (mgf) is given by
m(s) = E(e^{sX}) = e^{μs + σ²s²/2}.
3. Approximation of Normal CDF.
The standard normal cdf Φ(x) can be approximated by a “polynomial” Φ̃(x) as follows:
Φ̃(x) := 1 − Φ'(x) (a_1 k + a_2 k² + a_3 k³ + a_4 k⁴ + a_5 k⁵)   (3)
when x ≥ 0, and Φ̃(x) := 1 − Φ̃(−x) when x < 0.
The parameters are given by k = 1/(1 + γx), γ = 0.2316419, a_1 = 0.319381530, a_2 = −0.356563782, a_3 = 1.781477937, a_4 = −1.821255978, and a_5 = 1.330274429. This approximation has a maximum absolute error less than 7.5 × 10⁻⁸ for all x.
C++ Exercise: Write a program to compute Φ(x) with (3) and compare the result with that derived with the composite Simpson rule (see Exercise 4.4). The input is a variable x and the number of partitions n over the interval [0, x]. The output is the results from the two methods and their error.
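A minimal C++ sketch of approximation (3) with the constants above (the function name is an assumption; Φ'(x) is the standard normal pdf):

#include <cmath>

double norm_cdf_approx(double x) {
    if (x < 0.0) return 1.0 - norm_cdf_approx(-x);
    const double pi = 3.14159265358979323846;
    const double gamma = 0.2316419;
    const double a1 = 0.319381530, a2 = -0.356563782, a3 = 1.781477937,
                 a4 = -1.821255978, a5 = 1.330274429;
    const double k = 1.0 / (1.0 + gamma * x);
    const double pdf = std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);   // Phi'(x)
    const double poly = k * (a1 + k * (a2 + k * (a3 + k * (a4 + k * a5))));
    return 1.0 - pdf * poly;
}

// Example: norm_cdf_approx(1.96) is approximately 0.9750.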
4. Lognormal Random Variable.
Let Y = e^X with X a N(μ, σ²) random variable. Then Y is a lognormal random variable.
Exercise: Show that
E(Y) = e^{μ + σ²/2},   E(Y²) = e^{2μ + 2σ²}.
5. An Important Formula in Pricing European Options.
If V is lognormally distributed and the standard deviation of ln V is s, then
E(max(V − K, 0)) = E(V) Φ(d_1) − K Φ(d_2),
where
d_1 = (1/s) ln(E(V)/K) + s/2   and   d_2 = d_1 − s.
[K will later be denoted by X. We use a different notation here to avoid an abuse of notation, as K is not a random variable.]
E(V − K)⁺ = E(V) Φ(d_1) − K Φ(d_2),   d_1 = (1/s) ln(E(V)/K) + s/2,   d_2 = d_1 − s.
Proof.
Let g be the pdf of V. Then
E(V − K)⁺ = ∫_{−∞}^{∞} (v − K)⁺ g(v) dv = ∫_{K}^{∞} (v − K) g(v) dv.
As V is lognormal, ln V is normal ∼ N(m, s²), where m = ln(E(V)) − ½ s².
Let Y := (ln V − m)/s, i.e. V = e^{m + sY}. Then Y ∼ N(0, 1) with pdf φ(y) = (1/√(2π)) e^{−y²/2}.
E(V − K)⁺ = E(e^{m+sY} − K)⁺ = ∫_{(ln K − m)/s}^{∞} (e^{m+sy} − K) φ(y) dy
= ∫_{(ln K − m)/s}^{∞} e^{m+sy} φ(y) dy − K ∫_{(ln K − m)/s}^{∞} φ(y) dy
=: I_1 − K I_2.

I_1 = ∫_{(ln K − m)/s}^{∞} (1/√(2π)) e^{−y²/2 + m + sy} dy
= ∫_{(ln K − m)/s}^{∞} (1/√(2π)) e^{−(y−s)²/2 + m + s²/2} dy
= e^{m + s²/2} ∫_{(ln K − m)/s − s}^{∞} (1/√(2π)) e^{−y²/2} dy   [y − s ↦ y]
= e^{m + s²/2} [1 − Φ((ln K − m)/s − s)] = e^{m + s²/2} Φ(−(ln K − m)/s + s)
= e^{ln E(V)} Φ((−ln K + ln E(V) − s²/2)/s + s)
= E(V) Φ((1/s) ln(E(V)/K) + s/2) = E(V) Φ(d_1),
on recalling m = ln(E(V)) − ½ s².
Similarly
I_2 = 1 − Φ((ln K − m)/s) = Φ(−(ln K − m)/s) = Φ(d_1 − s) = Φ(d_2).
6. Correlated Random Variables.
Assume X = (X_1, …, X_n) is an n-vector of random variables.
The mean of X is the n-vector μ = (E(X_1), …, E(X_n)).
The covariance of X is the n × n matrix Σ with components
Σ_ij = (Cov X)_ij = E((X_i − μ_i)(X_j − μ_j)).
The variance of X_i is given by σ_i² = Σ_ii and the correlation between X_i and X_j is given by ρ_ij = Σ_ij/(σ_i σ_j).
X is called a multi-dimensional normal vector, written X ∼ N(μ, Σ), if X has pdf
f(x) = 1/((2π)^{n/2} (det Σ)^{1/2}) exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) ),
where Σ is a symmetric positive definite matrix.
7. Convergence.
Let {X_n} be a sequence of random variables. There are four types of convergence concepts associated with {X_n}:
• Almost sure convergence, written X_n →(a.s.) X, if there exists a null set N such that for all ω ∈ Ω \ N one has X_n(ω) → X(ω), n → ∞.
• Convergence in probability, written X_n →(P) X, if for every ε > 0 one has P(|X_n − X| > ε) → 0, n → ∞.
• Convergence in norm, written X_n →(Lp) X, if X_n, X ∈ Lᵖ and E|X_n − X|ᵖ → 0, n → ∞.
• Convergence in distribution, written X_n →(D) X, if for any x at which P(X ≤ x) is continuous one has P(X_n ≤ x) → P(X ≤ x), n → ∞.
8. Strong Law of Large Numbers.
Let {X_n} be independent, identically distributed (iid) random variables with finite expectation E(X_1) = μ. Then
Z_n/n →(a.s.) μ,
where Z_n = X_1 + ⋯ + X_n.
9. Central Limit Theorem. Let {X_n} be iid random variables with finite expectation μ and finite variance σ² > 0. For each n, let Z_n = X_1 + ⋯ + X_n. Then
(Z_n/n − μ)/(σ/√n) = (Z_n − nμ)/(√n σ) →(D) Z,
where Z is a N(0, 1) random variable, i.e.,
P( (Z_n − nμ)/(√n σ) ≤ z ) → (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx,   n → ∞.
10. Lindeberg–Feller Central Limit Theorem.
Suppose X is a triangular array of random variables, i.e.,
X = {X_1ⁿ, X_2ⁿ, …, X_{k(n)}ⁿ : n ∈ {1, 2, …}}, with k(n) → ∞ as n → ∞,
such that, for each n, X_1ⁿ, …, X_{k(n)}ⁿ are independently distributed and are bounded in absolute value by a constant y_n with y_n → 0. Let
Z_n = X_1ⁿ + ⋯ + X_{k(n)}ⁿ.
If E(Z_n) → μ and var(Z_n) → σ² > 0, then Z_n converges in distribution to a normally distributed random variable with mean μ and variance σ².
Note: Lindeberg–Feller implies the Central Limit Theorem (5.9).
If X_1, X_2, … are iid with expectation μ and variance σ², then define
X_iⁿ := (X_i − μ)/(√n σ),   i = 1, 2, …, k(n) := n.
For each n, X_1ⁿ, …, X_{k(n)}ⁿ are independent and
E(X_iⁿ) = (E(X_i) − μ)/(√n σ) = (μ − μ)/(√n σ) = 0,
Var(X_iⁿ) = (1/(nσ²)) Var(X_i − μ) = (1/(nσ²)) σ² = 1/n.
Let Z_n = X_1ⁿ + ⋯ + X_{k(n)}ⁿ = (Σ_{i=1}^{n} X_i − nμ)/(√n σ); then
E(Z_n) = Σ_{i=1}^{k(n)} E(X_iⁿ) = 0,   Var(Z_n) = Σ_{i=1}^{k(n)} Var(X_iⁿ) = 1.
Hence, by Lindeberg–Feller,
Z_n →(D) Z ∼ N(0, 1).
6. Optimization
1. Unconstrained Optimization.
Given f : Rⁿ → R. Minimize f(x) over x ∈ Rⁿ.
f has a local minimum at a point x̄ if f(x̄) ≤ f(x) for all x near x̄, i.e.
∃ ε > 0 s.t. f(x̄) ≤ f(x) ∀ x : ‖x − x̄‖ < ε.
f has a global minimum at x̄ if
f(x̄) ≤ f(x) ∀ x ∈ Rⁿ.
2. Optimality Conditions.
• First order necessary conditions:
Suppose that f has a local minimum at x̄ and that f is continuously differentiable in an open neighbourhood of x̄. Then ∇f(x̄) = 0. (x̄ is called a stationary point.)
• Second order sufficient conditions:
Suppose that f is twice continuously differentiable in an open neighbourhood of x̄ and that ∇f(x̄) = 0 and ∇²f(x̄) is positive definite. Then x̄ is a strict local minimizer of f.
Example: Show that f = (2x_1² − x_2)(x_1² − 2x_2) has a minimum at (0, 0) along any straight line passing through the origin, but f has no minimum at (0, 0).
Exercise: Find the minimum solution of
f(x_1, x_2) = 2x_1² + x_1x_2 + x_2² − x_1 − 3x_2.   (4)
(Answer: (1/7)(−1, 11).)
Sufficient Condition.
Taylor gives for any d ∈ Rⁿ:
f(x̄ + d) = f(x̄) + ∇f(x̄)ᵀd + ½ dᵀ∇²f(x̄ + λd) d,   λ ∈ (0, 1).
If x̄ is not a strict local minimizer, then
∃ {x_k} ⊂ Rⁿ \ {x̄} : x_k → x̄ s.t. f(x_k) ≤ f(x̄).
Define d_k := (x_k − x̄)/‖x_k − x̄‖. Then ‖d_k‖ = 1 and there exists a subsequence {d_{k_j}} such that d_{k_j} → d* as j → ∞ and ‖d*‖ = 1. W.l.o.g. we assume d_k → d* as k → ∞.
f(x̄) ≥ f(x_k) = f(x̄ + ‖x_k − x̄‖ d_k)
= f(x̄) + ‖x_k − x̄‖ ∇f(x̄)ᵀd_k + ½ ‖x_k − x̄‖² d_kᵀ∇²f(x̄ + λ_k‖x_k − x̄‖d_k) d_k
= f(x̄) + ½ ‖x_k − x̄‖² d_kᵀ∇²f(x̄ + λ_k‖x_k − x̄‖d_k) d_k.
Hence d_kᵀ∇²f(x̄ + λ_k‖x_k − x̄‖d_k) d_k ≤ 0, and on letting k → ∞,
d*ᵀ∇²f(x̄) d* ≤ 0.
As d* ≠ 0, this is a contradiction to ∇²f(x̄) being symmetric positive definite. Hence x̄ is a strict local minimizer.
Example 6.2. Show that f = (2x_1² − x_2)(x_1² − 2x_2) has a minimum at (0, 0) along any straight line passing through the origin, but f has no minimum at (0, 0).
Answer.
Straight line through (0, 0): x_2 = α x_1, α ∈ R fixed.
g(r) := f(r, αr) = (2r² − αr)(r² − 2αr)
⇒ g'(r) = 8r³ − 15αr² + 4α²r,   g'(0) = 0,
g''(r) = 24r² − 30αr + 4α²   and   g''(0) = 4α² > 0.
Hence r = 0 is a minimizer for g ⟺ (0, 0) is a minimizer for f along any straight line.
Now let (x_1ᵏ, x_2ᵏ) = (1/k, 1/k²) → (0, 0) as k → ∞. Then
f(x_1ᵏ, x_2ᵏ) = −(1/k²)(1/k²) < 0 = f(0, 0)   ∀ k.
Hence (0, 0) is not a minimizer for f.
[Note: ∇f(0, 0) = 0, but ∇²f(0, 0) = [0 0; 0 4].]
3. Convex Optimization.
Exercise. When f is convex, any local minimizer x̄ is a global minimizer of f. If in addition f is differentiable, then any stationary point x̄ is a global minimizer of f.
(Hint. Use a contradiction argument.)
Exercise 6.3.
When f is convex, any local minimizer x̄ is a global minimizer of f.
Proof.
Suppose x̄ is a local minimizer, but not a global minimizer. Then ∃ x̃ s.t. f(x̃) < f(x̄).
Since f is convex, we have that
f(λx̃ + (1 − λ)x̄) ≤ λf(x̃) + (1 − λ)f(x̄) < λf(x̄) + (1 − λ)f(x̄) = f(x̄)   ∀ λ ∈ (0, 1].
Let x_λ := λx̃ + (1 − λ)x̄. Then
x_λ → x̄ as λ → 0 and f(x_λ) < f(x̄).
This is a contradiction to x̄ being a local minimizer.
Hence x̄ is a global minimizer for f.
4. Line Search.
The basic procedure to solve numerically an unconstrained problem (minimize f (x)
over x ∈ Rn) is as follows.
(i) Choose an initial point x0 ∈ Rn and an initial search direction d0 ∈ Rn and set
k = 0.
(ii) Choose a step size α_k and define a new point x^{k+1} = x^k + α_k d^k. Check if the stopping criterion is satisfied (‖∇f(x^{k+1})‖ < ε?). If yes, x^{k+1} is the optimal solution; stop. If no, go to (iii).
(iii) Choose a new search direction dk+1 (descent direction) and set k = k + 1. Go to
(ii).
The essential and most difficult part in any search algorithm is to choose a descent
direction dk and a step size αk with good convergence and stability properties.
5. Steepest Descent Method.
f is differentiable.
Choose d^k = −g^k, where g^k = ∇f(x^k), and choose α_k s.t.
f(x^k + α_k d^k) = min_{α∈R} f(x^k + α d^k).
Note that the successive descent directions are orthogonal to each other, i.e. (g^k)ᵀg^{k+1} = 0, and the convergence for some functions may be very slow, called zigzagging.
Exercise.
Use the steepest descent (SD) method to solve (4) with the initial point x^0 = (1, 1).
(Answer: The first three iterations give x^1 = (0, 1), x^2 = (0, 3/2), and x^3 = (−1/8, 3/2).)
Steepest Descent.
Taylor gives:
f(x^k + α d^k) = f(x^k) + α ∇f(x^k)ᵀd^k + O(α²).
As
∇f(x^k)ᵀd^k = ‖∇f(x^k)‖ ‖d^k‖ cos θ_k,
with θ_k the angle between d^k and ∇f(x^k), we see that d^k is a descent direction if cos θ_k < 0. The descent is steepest when θ_k = π ⟺ cos θ_k = −1.
Zigzagging.
α_k is a minimizer of φ(α) := f(x^k + α d^k) with d^k = −g^k. Hence
0 = φ'(α_k) = ∇f(x^k + α_k d^k)ᵀd^k = ∇f(x^{k+1})ᵀ(−g^k) = −(g^{k+1})ᵀg^k.
Hence d^{k+1} ⊥ d^k, which leads to zigzagging.
Exercise 6.5.
Use the SD method to solve (4) with the initial point x^0 = (1, 1). [min: (1/7)(−1, 11).]
Answer.
∇f = (4x_1 + x_2 − 1, 2x_2 + x_1 − 3).
Iteration 0: d^0 = −∇f(x^0) = −(4, 0) ≠ (0, 0).
φ(α) = f(x^0 + α d^0) = f(1 − 4α, 1) = 2(1 − 4α)² − 2, with minimum at α_0 = 1/4
⇒ x^1 = x^0 + α_0 d^0 = (0, 1),   d^1 = −∇f(x^1) = −(0, −1) = (0, 1) ≠ (0, 0).
Iteration 1: x^2 = (0, 3/2), d^2 = (−1/2, 0).
Iteration 2: x^3 = (−1/8, 3/2), d^3 = (0, 1/8).
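A small C++ sketch of SD with exact line search on this quadratic (not from the notes): writing (4) as f(x) = ½ xᵀA x − bᵀx with A = [[4, 1], [1, 2]] and b = (1, 3), the exact step size is α = gᵀg / (gᵀA g).

#include <cstdio>

int main() {
    const double A[2][2] = {{4.0, 1.0}, {1.0, 2.0}};
    const double b[2] = {1.0, 3.0};
    double x[2] = {1.0, 1.0};                                  // x^0 = (1, 1)
    for (int k = 0; k < 50; ++k) {
        double g[2] = {A[0][0]*x[0] + A[0][1]*x[1] - b[0],     // gradient A x - b
                       A[1][0]*x[0] + A[1][1]*x[1] - b[1]};
        double gg = g[0]*g[0] + g[1]*g[1];
        if (gg < 1e-20) break;
        double Ag[2] = {A[0][0]*g[0] + A[0][1]*g[1],
                        A[1][0]*g[0] + A[1][1]*g[1]};
        double alpha = gg / (g[0]*Ag[0] + g[1]*Ag[1]);         // exact line search
        x[0] -= alpha * g[0];                                   // d^k = -g^k
        x[1] -= alpha * g[1];
    }
    std::printf("x = (%.6f, %.6f)\n", x[0], x[1]);              // approx (-1/7, 11/7)
}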
6. Newton Method.
f is twice differentiable.
Choose d^k = −[H^k]⁻¹g^k, where H^k = ∇²f(x^k). Set x^{k+1} = x^k + d^k.
If H^k is positive definite then d^k is a descent direction.
The main drawback of the Newton method is that it requires the computation of ∇²f(x^k) and its inverse, which can be difficult and time-consuming.
Exercise.
Use the Newton method to solve (4) with x^0 = (1, 1).
(Answer: The first iteration gives x^1 = (1/7)(−1, 11).)
Newton Method.
Taylor gives
f(x^k + d) ≈ f(x^k) + dᵀ∇f(x^k) + ½ dᵀ∇²f(x^k) d =: m(d).
min_d m(d) ⇒ ∇m(d) = 0 ⇒ ∇f(x^k) + ∇²f(x^k) d = 0.
Hence choose d^k = −[∇²f(x^k)]⁻¹∇f(x^k) = −[H^k]⁻¹g^k.
If H^k is positive definite, then so is (H^k)⁻¹, and we get
(d^k)ᵀg^k = −(g^k)ᵀ(H^k)⁻¹g^k ≤ −σ_k ‖g^k‖² < 0
for some σ_k > 0. Hence d^k is a descent direction.
[Aside: The Newton method for min_x f(x) is equivalent to the Newton method for finding a root of the system of nonlinear equations ∇f(x) = 0.]
Exercise 6.6.
Use the Newton method to minimize
f(x_1, x_2) = 2x_1² + x_1x_2 + x_2² − x_1 − 3x_2
with x^0 = (1, 1)ᵀ.
Answer.
∇f = (4x_1 + x_2 − 1, 2x_2 + x_1 − 3)ᵀ,   H := ∇²f = [4 1; 1 2].
H⁻¹ = (1/det H) [2 −1; −1 4] = (1/7) [2 −1; −1 4].
Iteration 0: x^0 = (1, 1)ᵀ, ∇f(x^0) = (4, 0)ᵀ.
x^1 = x^0 − [H^0]⁻¹∇f(x^0) = (1, 1)ᵀ − (1/7) [2 −1; −1 4] (4, 0)ᵀ = (1/7)(−1, 11)ᵀ.
⇒ ∇f(x^1) = (0, 0)ᵀ and H positive definite ⇒ x^1 is the minimum point.
7. Choice of Stepsize.
In computing the step size α_k we face a tradeoff. We would like to choose α_k to give a substantial reduction of f, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function φ : R → R defined by
φ(α) = f(x^k + α d^k),   α > 0,
but in general it is too expensive to identify this value.
A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in f with minimum cost.
α is normally chosen to satisfy the Wolfe conditions:
f(x^k + α_k d^k) ≤ f(x^k) + c_1 α_k (g^k)ᵀd^k   (5)
∇f(x^k + α_k d^k)ᵀd^k ≥ c_2 (g^k)ᵀd^k,   (6)
with 0 < c_1 < c_2 < 1. (5) is called the sufficient decrease condition, and (6) is the curvature condition.
Choice of Stepsize.
The simple condition
f(x^k + α_k d^k) < f(x^k)   (†)
is not appropriate, as it may not lead to a sufficient reduction.
Example: f(x) = (x − 1)² − 1. So min f(x) = −1, but we can choose x_k satisfying (†) such that f(x_k) = 1/k → 0.
Note that the sufficient decrease condition (5),
φ(α) = f(x^k + α d^k) ≤ ℓ(α) := f(x^k) + c_1 α (g^k)ᵀd^k,
yields acceptable regions for α. Here φ(α) < ℓ(α) for small α > 0, as (g^k)ᵀd^k < 0 for descent directions.
The curvature condition (6) is equivalent to
φ'(α) ≥ c_2 φ'(0)   [ > φ'(0) ],
i.e. a condition on the desired slope, and so rules out unacceptably short steps α. In practice c_1 = 10⁻⁴ and c_2 = 0.9.
8. Convergence of Line Search Methods.
An algorithm is said to be globally convergent if
lim_{k→∞} ‖g^k‖ = 0.
It can be shown that if the step sizes satisfy the Wolfe conditions
• then the steepest descent method is globally convergent,
• so is the Newton method provided the Hessian matrices ∇²f(x^k) have a bounded condition number and are positive definite.
Exercise. Show that the steepest descent method is globally convergent if the following conditions hold:
(a) α_k satisfies the Wolfe conditions,
(b) f(x) ≥ M ∀ x ∈ Rⁿ,
(c) f ∈ C¹ and ∇f is Lipschitz: ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖ ∀ x, y ∈ Rⁿ.
[Hint: Show that Σ_{k=0}^{∞} ‖g^k‖² < ∞.]
Exercise 6.8.
Assume that d^k is a descent direction, i.e. (g^k)ᵀd^k < 0, where g^k := ∇f(x^k). Then if
1. α_k satisfies the Wolfe conditions,
2. f(x) ≥ M ∀ x ∈ Rⁿ,
3. f ∈ C¹ and ∇f is Lipschitz, i.e. ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖ ∀ x, y ∈ Rⁿ,
it holds that Σ_{k=0}^{∞} cos²θ_k ‖g^k‖² < ∞, where cos θ_k := (g^k)ᵀd^k / (‖g^k‖ ‖d^k‖).
[Note: The SD method is the special case with cos²θ_k = 1 ⇒ lim_{k→∞} ‖g^k‖ = 0.]
Proof.
Wolfe condition (6) ⇒ (g^{k+1})ᵀd^k ≥ c_2 (g^k)ᵀd^k
⇒ (g^{k+1} − g^k)ᵀd^k ≥ (c_2 − 1) (g^k)ᵀd^k.   (†)
The Lipschitz condition yields that
(g^{k+1} − g^k)ᵀd^k ≤ ‖g^{k+1} − g^k‖ ‖d^k‖ = ‖∇f(x^{k+1}) − ∇f(x^k)‖ ‖d^k‖ ≤ L ‖x^{k+1} − x^k‖ ‖d^k‖ = α_k L ‖d^k‖².   (‡)
Combining (†) and (‡) gives α_k ≥ (c_2 − 1)/L · (g^k)ᵀd^k / ‖d^k‖², and hence
α_k (g^k)ᵀd^k ≤ (c_2 − 1)/L · [(g^k)ᵀd^k]² / ‖d^k‖².
Together with Wolfe condition (5) we get
f(x^{k+1}) ≤ f(x^k) + c_1 (c_2 − 1)/L · [(g^k)ᵀd^k]² / ‖d^k‖² = f(x^k) − c cos²θ_k ‖g^k‖²,
where c := c_1 (1 − c_2)/L > 0.
⇒ f(x^{k+1}) ≤ f(x^k) − c cos²θ_k ‖g^k‖² ≤ f(x^0) − c Σ_{j=0}^{k} cos²θ_j ‖g^j‖²
⇒ Σ_{j=0}^{k} cos²θ_j ‖g^j‖² ≤ (1/c) (f(x^0) − M) ∀ k
⇒ Σ_{j=0}^{∞} cos²θ_j ‖g^j‖² < ∞.
9. Popular Search Methods.
In practice the steepest descent method and the Newton method are rarely used
due to the slow convergence rate and the difficulty in computing Hessian matrices,
respectively.
The popular search methods are
• the conjugate gradient method (variation of SD method with superlinear convergence) and
• the quasi-Newton method (variation of Newton method without computation of
Hessian matrices).
There are some efficient algorithms based on the trust-region approach. See Fletcher
(2000) for details.
10. Constrained Optimization.
Minimize f (x) over x ∈ Rn subject to
the equality constraints
hi(x) = 0,
i = 1, . . . , l ,
and the inequality constraints
gj (x) ≤ 0,
j = 1, . . . , m .
Assume that all functions involved are differentiable.
11. Linear Programming.
The problem is to minimize
z = c1 x1 + ∙ ∙ ∙ + cn xn
subject to
ai1 x1 + ∙ ∙ ∙ + ain xn ≥ bi,
i = 1, . . . , m ,
and
x1, . . . , xn ≥ 0.
LPs can be easily and efficiently solved with the simplex algorithm or the interior
point method.
MS-Excel has a good in-built LP solver capable of solving problems up to 200 variables. MATLAB with optimization toolbox also provides a good LP solver.
12. Graphic Method.
If an LP problem has only two decision variables (x1, x2), then it can be solved by
the graphic method as follows:
• First draw the feasible region from the given constraints and a contour line of the
objective function,
• then, on establishing the increasing direction perpendicular to the contour line,
find the optimal point on the boundary of the feasible region,
• then find two linear equations which define that point,
• and finally solve the two equations to obtain the optimal point.
Exercise. Use the graphic method to solve the LP:
minimize z = −3 x1 − 2 x2
subject to x1 + x2 ≤ 80, 2x1 + x2 ≤ 100, x1 ≤ 40, and x1, x2 ≥ 0.
(Answer. x1 = 20, x2 = 60.)
13. Quadratic Programming.
Minimize
xT Q x + cT x
subject to
Ax ≤ b
and
x ≥ 0,
where Q is an n × n symmetric positive definite matrix, A is an m × n matrix, x, c ∈ Rⁿ, b ∈ Rᵐ.
To solve a QP problem, one
• first derives a set of equations from the Kuhn–Tucker conditions, and
• then applies the Wolfe algorithm or the Lemke algorithm to find the optimal
solution.
The MS-Excel solver is capable of solving reasonably sized QP problems, similarly
for MATLAB.
14. Kuhn–Tucker Conditions.
min f(x) over x ∈ Rⁿ s.t. h_i(x) = 0, i = 1, …, l; g_j(x) ≤ 0, j = 1, …, m.
Assume that x̄ is an optimal solution.
Under some regularity conditions, called the constraint qualifications, there exist two vectors ū = (ū_1, …, ū_l) and v̄ = (v̄_1, …, v̄_m), called the Lagrange multipliers, such that the following set of conditions is satisfied:
L_{x_k}(x̄, ū, v̄) = 0,   k = 1, …, n,
h_i(x̄) = 0,   i = 1, …, l,
g_j(x̄) ≤ 0,   j = 1, …, m,
v̄_j g_j(x̄) = 0,   v̄_j ≥ 0,   j = 1, …, m,
where
L(x, u, v) = f(x) + Σ_{i=1}^{l} u_i h_i(x) + Σ_{j=1}^{m} v_j g_j(x)
is called the Lagrange function or Lagrangian.
Furthermore, if f : Rⁿ → R and h_i, g_j : Rⁿ → R are convex, then x̄ is an optimal solution if and only if (x̄, ū, v̄) satisfies the Kuhn–Tucker conditions.
This holds in particular when f is convex and h_i, g_j are linear.
Example.
Find the minimum solution to the function x_2 − x_1 subject to x_1² + x_2² ≤ 1.
Exercise.
Find the minimum solution to the function x_1² + x_2² − 2x_1 − 4x_2 subject to x_1 + 2x_2 ≤ 2 and x_2 ≥ 0.
(Answer: (x_1, x_2) = (1/5)(2, 4).)
Interpretation of Kuhn–Tucker conditions
Assume that no equality constraints are present.
If x̄ is an interior point, i.e. no constraints are active, then we recover the usual optimality condition: ∇f(x̄) = 0.
Now assume that x̄ lies on the boundary of the feasible set and let g_{j_k} be the active constraints at x̄. Then a necessary condition for optimality is that we cannot find a descent direction for f at x̄ that is also a feasible direction. Such a vector cannot exist if
−∇f(x̄) = Σ_k v̄_{j_k} ∇g_{j_k}(x̄)   with v̄_{j_k} ≥ 0.   (†)
This is because, if d ∈ Rⁿ is a descent direction, then ∇f(x̄)ᵀd < 0 and Σ_k v̄_{j_k} ∇g_{j_k}(x̄)ᵀd > 0. So there must exist a j_k such that ∇g_{j_k}(x̄)ᵀd > 0. But that means that d is an ascent direction for g_{j_k}, and as g_{j_k} is active at x̄, it is not a feasible direction.
If we require v̄_j g_j(x̄) = 0 for all inactive constraints, then we can re-write (†) as 0 = ∇f(x̄) + Σ_{j=1}^{m} v̄_j ∇g_j(x̄) with v̄_j ≥ 0. These are the KT conditions.
Application of Kuhn–Tucker: LP Duality
Let b ∈ Rᵐ, c ∈ Rⁿ and A ∈ R^{m×n}.
min cᵀx s.t. A x ≥ b, x ≥ 0.   (P)
Equivalent to
min cᵀx s.t. b − A x ≤ 0, −x ≤ 0.
Lagrangian: L = cᵀx + vᵀ(b − A x) + yᵀ(−x).
Hence x̄ is the solution if there exist v̄ and ȳ such that the
KT conditions hold:
∇L = c − Aᵀv̄ − ȳ = 0   ⇒   ȳ = c − Aᵀv̄,
v̄, ȳ ≥ 0,
v̄ᵀ(b − A x̄) = 0,   ȳᵀ(−x̄) = 0,
b − A x̄ ≤ 0,   −x̄ ≤ 0.
Eliminate ȳ to find v̄, x̄:
A x̄ ≥ b,  x̄ ≥ 0   (feasible region: primal)
Aᵀv̄ ≤ c,  v̄ ≥ 0   (feasible region: dual)
v̄ᵀ(b − A x̄) = 0,   x̄ᵀ(c − Aᵀv̄) = 0
⇒ x̄ᵀc = x̄ᵀAᵀv̄ = v̄ᵀb.
Hence v̄ ∈ Rᵐ solves the dual:
max bᵀv s.t. Aᵀv ≤ c, v ≥ 0.   (D)
Here we have used that
cᵀx̄ = min_{x≥0, Ax≥b} cᵀx
≥ min_{x≥0} max_{v≥0} [cᵀx + vᵀ(b − A x)]
= max_{v≥0} min_{x≥0} [cᵀx + vᵀ(b − A x)]
= max_{v≥0} min_{x≥0} [vᵀb + xᵀ(c − Aᵀv)]
≥ max_{v≥0, Aᵀv≤c} vᵀb
≥ v̄ᵀb = cᵀx̄.
Example 6.14.
Find the minimum solution to the function x_2 − x_1 subject to x_1² + x_2² ≤ 1.
Answer.
L = x_2 − x_1 + v (x_1² + x_2² − 1), so the KT conditions become
L_{x_1} = −1 + 2 v x_1 = 0   (1)
L_{x_2} = 1 + 2 v x_2 = 0   (2)
x_1² + x_2² ≤ 1   (3)
v (x_1² + x_2² − 1) = 0,   v ≥ 0.   (4)
(1) ⇒ v > 0 and hence x_1 = 1/(2v), x_2 = −1/(2v).
Plugging this into (4) yields 2/(4v²) = 1 and hence
v̄ = 1/√2  ⇒  x̄_1 = 1/√2, x̄_2 = −1/√2,
with the optimal value being z̄ = −√2.
Example.
min x_1 s.t. x_2 − x_1³ ≤ 0, x_1 ≤ 1, x_2 ≥ 0.
Since x_1³ ≥ x_2 ≥ 0 we have x_1 ≥ 0, and hence x̄ = (0, 0) is the unique minimizer.
Lagrangian: L = x_1 + v_1 (x_2 − x_1³) + v_2 (x_1 − 1) + v_3 (−x_2).
KT conditions for a feasible point x:
∇L = (1 − 3 v_1 x_1² + v_2, v_1 − v_3) = 0   (1)
v_1 (x_2 − x_1³) = 0, v_2 (x_1 − 1) = 0, v_3 (−x_2) = 0   (2)
v_1, v_2, v_3 ≥ 0   (3)
Check the KT conditions at x̄ = (0, 0):
(1) ⇒ v_2 = −1 < 0, impossible!
The KT conditions are not satisfied, since the constraint qualifications do not hold.
Here g_1 = x_2 − x_1³ and g_3 = −x_2 are active at (0, 0), and ∇g_1 = (0, 1)ᵀ, ∇g_3 = (0, −1)ᵀ. Hence ∇g_1 and ∇g_3 are not linearly independent!
Exercise 6.14
Find the minimum solution to the function x_1^2 + x_2^2 − 2 x_1 − 4 x_2 subject to x_1 + 2 x_2 ≤ 2
and x_2 ≥ 0.
Answer.
Lagrangian: L = x_1^2 + x_2^2 − 2 x_1 − 4 x_2 + v_1 (x_1 + 2 x_2 − 2) + v_2 (−x_2)
KT conditions:
∇L = ( 2 x_1 − 2 + v_1 ,  2 x_2 − 4 + 2 v_1 − v_2 )^T = 0   (1)
v_1 (x_1 + 2 x_2 − 2) = 0,   v_2 x_2 = 0,   v_1, v_2 ≥ 0   (2)
x_1 + 2 x_2 ≤ 2,   x_2 ≥ 0   (3)
If x_1 + 2 x_2 − 2 < 0 then v_1 = 0 ⇒ x_1 = 1, x_2 = 2 + (1/2) v_2 ≥ 2. Hence from (2),
v_2 = 0, and so x_1 = 1, x_2 = 2. But that contradicts (3), and so it must hold that
x_1 + 2 x_2 − 2 = 0.
If x_2 > 0, then v_2 = 0, and solving (1) together with x_1 + 2 x_2 − 2 = 0 gives x_1 = 2/5,
x_2 = 4/5, with v_1 = 6/5, v_2 = 0. ✓
121
If x_2 = 0 then x_1 = 2 − 2 x_2 = 2. But then from (1), v_1 = −2 < 0. Impossible.
122
7. Lattice (Tree) Methods
1. Concepts in Option Pricing Theory.
A call (or put) option is a contract that gives the holder the right to buy (or sell)
a prescribed asset (underlying asset) by a certain date T (expiration date) for a
predetermined price X (exercise price).
A European option can only be exercised on the expiration date while an American
option can be exercised at any time prior to the expiration date.
The other party to the holder of the option contract is called the writer.
The holder and the writer are said to be in long and short positions of the contract,
respectively.
The terminal payoff of a European call (or put) option is (S(T) − X)^+ := max(S(T) − X, 0)
(or (X − S(T))^+), where S(T) is the underlying asset price at time T.
123
2. Random Walk Model and Assumption.
Assume S(t) and V (t) are the asset price and the option price at time t, respectively,
and the current time is t and current asset price is S, i.e. S(t) = S.
After one period of time Δt, the asset price S(t + Δt) is either u S (“up” state)
with probability q or d S (“down” state) with probability 1 − q, where q is a real
(subjective) probability.
To avoid riskless arbitrage opportunities, we must have
u>R>d
where R := er Δt and r is the riskless interest rate.
Here r is the riskless interest rate that allows for unlimited borrowing or lending at
the same rate r.
Investing $1 at time t yields a value (return) at time t + Δt of $R = $(er Δt).
(continuous compounding).
124
Continuous compounding
Saving an amount B at time t, yields with interest rate r at time t + Δt without
compounding: (1 + r Δt) B.
Compounding the interest n times yields
n
r Δt
1+
B → er ΔtB as n → ∞ .
n
Example:
Consider Δt = 1 and r = 0.05.
Then R = e^r = 1.0512711, equivalent to an AER of 5.127%.
n      (1 + r/n)^n    AER
1      1.05           5.000 %
4      1.05094534     5.095 %
12     1.05116190     5.116 %
52     1.05124584     5.125 %
365    1.05126750     5.127 %
125
No Arbitrage. u > R > d
If R ≥ u, then short sell α > 0 units of the asset, deposit α S in the bank; giving a
portfolio value at time t of:
π(t) = 0
and at time t + Δt of
π(t + Δt) = R (α S) − α S(t + Δt)
≥ R (α S) − α (u S) ≥ 0 .
Moreover, in the “down” state (with probability 1 − q > 0) the value is
π(t + Δt) = R (α S) − α (d S) > 0 .
Hence we can make a riskless profit with positive probability.
No arbitrage implies u > R.
A similar argument yields that d < R.
126
3. Replicating Portfolio.
We form a portfolio consisting of
α units of underlying asset (α > 0 buying, α < 0 short selling) and
cash amount B in riskless cash bond (B > 0 lending, B < 0 borrowing).
If we choose
α = ( V(u S, t + Δt) − V(d S, t + Δt) ) / (u S − d S) ,
B = ( u V(d S, t + Δt) − d V(u S, t + Δt) ) / (u R − d R) ,
we have replicated the payoffs of the option at time t + Δt, no matter how the asset
price changes.
127
Replicating Portfolio.
Value of portfolio at time t: π(t) = α S + B, and at time t + Δt:
π(t + Δt) = α S(t + Δt) + R B = { α u S + R B =: π_u^Δt  with prob. q ;  α d S + R B =: π_d^Δt  with prob. 1 − q }.
Call option value at time t + Δt (expiration time):
C(t + Δt) = { (u S − X)^+ =: C_u^Δt  with prob. q ;  (d S − X)^+ =: C_d^Δt  with prob. 1 − q }.
To replicate the call option, we must have π_u^Δt = C_u^Δt and π_d^Δt = C_d^Δt. Hence
α u S + R B = C_u^Δt ,   α d S + R B = C_d^Δt
⇐⇒   α = (C_u^Δt − C_d^Δt) / (u S − d S) ,   B = (u C_d^Δt − d C_u^Δt) / (u R − d R) .
This gives the fair price for the call option at time t: π(t) = α S + B.
128
Replicating Portfolio.
The value of the call option at time t is
C(t) = π(t) = α S + B = (C_u^Δt − C_d^Δt)/(u S − d S) · S + (u C_d^Δt − d C_u^Δt)/(u R − d R)
     = (1/R) [ (R − d)/(u − d) · C_u^Δt + (u − R)/(u − d) · C_d^Δt ]
     = (1/R) [ p C_u^Δt + (1 − p) C_d^Δt ] ,
where
p = (R − d)/(u − d) .
No arbitrage argument: If C(t) < π(t), then buy the call option at price C(t) and sell the
portfolio at π(t) (that is, short sell α units of the asset and lend B amount of cash to the
bank). This gives an instantaneous profit of π(t) − C(t) > 0. And we know that at
time t + Δt the payoff from our call option will exactly compensate for the value of the
portfolio, as C(t + Δt) = π(t + Δt).
A similar argument for the case C(t) > π(t) yields that C(t) = π(t).
129
4. Risk Neutral Option Price.
The current option price is given by
V(S, t) = (1/R) [ p V(u S, t + Δt) + (1 − p) V(d S, t + Δt) ] ,   (7)
where p = (R − d)/(u − d). Note that
1. the real probability q does not appear in the option pricing formula,
2. p is a probability (0 < p < 1) since d < R < u, and
3. p (u S) + (1 − p) (d S) = R S, so p is called the risk neutral probability and the
process of finding V is called risk neutral valuation.
130
5. Riskless Hedging Principle.
We can also derive the option pricing formula (7) as follows:
form a portfolio with a long position of one unit of option and a short position of α
units of underlying asset.
By choosing an appropriate α we can ensure such a portfolio is riskless.
Exercise.
Find α (called delta) and derive the pricing formula (7).
131
6. Asset Price Process.
In a risk neutral continuous time model, the asset price S is assumed to follow a
lognormal process.
(as discussed in detail in Stochastic Processes I, omitted here).
Over a period [t, t + τ] the asset price S(t + τ) can be expressed in terms of S(t) = S as
S(t + τ) = S e^{(r − σ²/2) τ + σ √τ Z} ,
where r is the riskless interest rate, σ the volatility, and Z ∼ N(0, 1) a standard
normal random variable.
S(t + τ) has first moment S e^{r τ} and second moment S² e^{(2 r + σ²) τ}.
132
7. Relations Between u, d and p.
By equating the first and second moments of the asset price S(t + Δt) in both the
continuous and the discrete time model, we obtain the relations
S [ p u + (1 − p) d ] = S e^{r Δt} ,
S² [ p u² + (1 − p) d² ] = S² e^{(2 r + σ²) Δt} ,
or equivalently,
p u + (1 − p) d = e^{r Δt} ,   (8)
p u² + (1 − p) d² = e^{(2 r + σ²) Δt} .   (9)
There are two equations and three unknowns u, d, p. An extra condition is needed to
uniquely determine a solution.
[Note that (8) implies p = (R − d)/(u − d), where R = e^{r Δt} as before.]
133
8. Cox–Ross–Rubinstein Model.
The extra condition is
u d = 1 .   (10)
The solutions to (8), (9), (10) are
u = (1/(2R)) [ σ̂² + 1 + √((σ̂² + 1)² − 4 R²) ] ,
d = (1/(2R)) [ σ̂² + 1 − √((σ̂² + 1)² − 4 R²) ] ,
where
σ̂² = e^{(2 r + σ²) Δt} .
If the higher order terms O((Δt)^{3/2}) are ignored in u, d, then
u = e^{σ √Δt} ,   d = e^{−σ √Δt} ,   p = (R − d)/(u − d) .
These are the parameters chosen by Cox–Ross–Rubinstein for their model.
Note that with this choice (9) is satisfied up to O((Δt)²).
134
Proof.
σ̂ 2 = e(2 r+σ
⇒
2 ) Δt
= p (u2 − d2) + d2
R−d
(u + d) (u − d) + d2
=
u−d
= Ru − du + Rd
R
= Ru−1+
u
R u2 − (1 + σ̂ 2) u + R = 0 .
As u > d, we get the unique solutions
s
p
2
2
2
2
2
2
2
1 + σ̂ + (1 + σ̂ ) − 4 R
1 + σ̂
1 + σ̂
utrue =
−1
=
+
2R
2R
2R
and
s
p
2
2
2
2
2
2
2
1 1 + σ̂ − (1 + σ̂ ) − 4 R
1 + σ̂
1 + σ̂
− 1.
dtrue = =
=
−
u
2R
2R
2R
135
Moreover
and
1 + σ̂ 2 1 (2 r+σ2) Δt
e
+ 1 e−r Δt
=
2
2R
1
2
2
2
= 2 + (2 r + σ ) Δt + O((Δt) ) 1 − r Δt + O((Δt) )
2
1
2
2
= 2 + σ Δt + O((Δt) )
2
= 1 + 12 σ 2 Δt + O((Δt)2),
s
σ̂ 2
1+
2R
⇒
2
−1=
utrue
q
1+
p
1
2
σ 2 Δt
+
2
2
O((Δt) )
−1
1 + σ 2 Δt + O((Δt)2) − 1
√ p
= σ Δt 1 + O(Δt)
√
= σ Δt (1 + O(Δt))
√
3
= σ Δt + O((Δt) 2 ) .
√
3
1 2
= 1 + 2 σ Δt + σ Δt + O((Δt) 2 ) .
=
136
The Cox–Ross–Rubinstein (CRR) model uses
√
√
3
σ Δt
1 2
= 1 + σ Δt + 2 σ Δt + O((Δt) 2 ) .
u=e
Hence the first three terms in the Taylor series of the CRR value u and the true value
utrue match. So
3
u = utrue + O((Δt) 2 ) .
Estimating the error in (9) by this choice of u gives
2
2
Error = p u + (1 − p) d − e
(2 r+σ 2 ) Δt
2
= R (u + d) − 1 − e(2 r+σ ) Δt
√
√ r Δt
σ Δt
−σ Δt
(2 r+σ 2 ) Δt
=e
e
+e
−1−e
2
2
2
= 1 + r Δt + O((Δt) ) 2 + σ Δt + O((Δt) ) − 1
2
2
− 1 + (2 r + σ ) Δt + O((Δt) )
= O((Δt)2) .
137
9. Jarrow–Rudd Model.
The extra condition is
p=
1
.
2
(11)
Exercise.
Show that the solutions to (8), (9), (11) are
p
2
u = R 1 + eσ Δt − 1 ,
p
d = R 1 − eσ2Δt − 1 .
3
Show that if O((Δt) 2 ) is ignored then
2
√
2
√
u = e
(r− σ2 ) Δt+σ
d = e
(r− σ2 ) Δt−σ
Δt
Δt
,
.
(These are the parameters chosen by Jarrow–Rudd for their model.)
Also show that (8) and (9) are satisfied up to O((Δt)2).
138
10. Tian Model.
The extra condition is
p u3 + (1 − p) d3 = e3 r Δt+3 σ
2 Δt
.
Exercise.
Show that the solutions to (8), (9), (12) are
p
RQ
Q + 1 + Q2 + 2 Q − 3 ,
u =
2 p
RQ
Q + 1 − Q2 + 2 Q − 3 ,
d =
2
R−d
,
p =
u−d
where R = er Δt and Q = eσ
2 Δt
.
3
Also show that if O((Δt) 2 ) is ignored, then
1 3 √
p = − σ Δt.
2 4
139
(12)
(Note that u d = R2 Q2 instead of u d = 1 and that the binomial tree loses its
symmetry about S whenever u d 6= 1.)
140
11. Black–Scholes Equation (BSE).
As the time interval Δt tends to zero, the one period option pricing formula (7), with
(8) and (9), tends to the Black–Scholes equation
∂V/∂t + (1/2) σ² S² ∂²V/∂S² + r S ∂V/∂S − r V = 0 .   (13)
Proof.
Two variable Taylor expansion at (S, t) gives
V (u S, t + Δt) = V + VS (u S − S) + Vt Δt
+
1
2
2
2
VSS (u S − S) + 2 VSt (u S − S) Δt + Vtt (Δt)
+ higher order terms .
Similarly,
V (d S, t + Δt) = V + VS (d S − S) + Vt Δt
+
1
2
2
2
VSS (d S − S) + 2 VSt (d S − S) Δt + Vtt (Δt)
+ higher order terms .
141
Substituting into the right hand side of (7) gives
V = {V + S VS [p (u − 1) + (1 − p) (d − 1)] + Vt Δt
2
2
2
1
+ 2 S VSS p (u − 1) + (1 − p) (d − 1)
+2 S VSt (p (u − 1) + (1 − p) (d − 1)) Δt
−r Δt
2
+Vtt (Δt) + higher order terms e
.
Now it follows from (8) that
p (u − 1) + (1 − p) (d − 1) = p u + (1 − p) d − 1 = er Δt − 1 = r Δt + O((Δt)2) .
Similarly, (8) and (9) give that
p (u − 1)2 + (1 − p) (d − 1)2 = p u2 + (1 − p) d2 − 2 (p u + (1 − p) d) + 1
=e
(2 r+σ 2 ) Δt
142
− 2 er Δt + 1 = σ 2 Δt + O((Δt)2) .
So, for the RHS of (7) we get
2
V = V + S VS r Δt + O((Δt) ) + Vt Δt
2
2
2
2
1
+ 2 S VSS σ Δt + O((Δt) ) + 2 VSt r Δt + O((Δt) ) Δt
−r Δt
2
+O((Δt) ) e
2
2
1 2 2
= V + r S VS + Vt + 2 σ S VSS Δt + O((Δt) ) 1 − r Δt + O((Δt) )
1 2 2
= V + −r V + r S VS + Vt + 2 σ S VSS Δt + O((Δt)2) .
Cancelling V and dividing by Δt yields
−r V + r S VS + Vt + 12 σ 2 S 2 VSS + O(Δt) = 0 .
Ignoring the O(Δt) term, we get the BSE.
⇒
The binomial model approximates BSE to 1st order accuracy.
143
12. n-Period Option Price Formula.
For a multiplicative n-period binomial process, the call value is
C = R^{−n} Σ_{j=0}^{n} (n choose j) p^j (1 − p)^{n−j} max(u^j d^{n−j} S − X, 0) ,   (14)
where (n choose j) = n! / (j! (n − j)!) is the binomial coefficient.
Define k to be the smallest non-negative integer such that u^k d^{n−k} S ≥ X.
The call pricing formula can then be simplified to
C = S Ψ(n, k, p′) − X R^{−n} Ψ(n, k, p) ,   (15)
where p′ = u p / R and Ψ is the complementary binomial distribution function defined by
Ψ(n, k, p) = Σ_{j=k}^{n} (n choose j) p^j (1 − p)^{n−j} .
144
Asset (recombining tree):  S → {u S, d S} → {u² S, u d S, d² S}  at times t, t + Δt, t + 2Δt.
Call:  C → {C_u^Δt, C_d^Δt} → {C_uu^2Δt = (u² S − X)^+ ,  C_ud^2Δt = (u d S − X)^+ ,  C_dd^2Δt = (d² S − X)^+}.
145
Call price at time t + Δt:
1
1
Δt
2Δt
2Δt
Δt
2Δt
2Δt
Cu =
p Cuu + (1 − p) Cud ,
p Cud + (1 − p) Cdd .
Cd =
R
R
Call price at time t:
1
1
Δt
Δt
2 2Δt
2Δt
2 2Δt
p Cu + (1 − p) Cd = 2 p Cuu + 2 p (1 − p) Cud + (1 − p) Cdd .
C=
R
R
Proof of (14) by induction:
Assume (14) holds for n ≤ k − 1, then for n = k, there are k − 1 periods between time
t + Δt and t + k Δt. So
k−1 X
+
k−1 j
1
Δt
k−1−j
j k−1−j
p (1 − p)
(u S) − X
u d
Cu = k−1
j
R
j=0
k X
+
k − 1 j−1
1
k−j
j k−j
= k−1
p (1 − p)
u d S−X .
R
j−1
j=1
146
⇒
p CuΔt
=
1
Rk−1
and similarly
(1 −
p) CdΔt
=
1
Rk−1
k X
k−1
j=1
j−1
k−1 X
k−1
j=0
j
j
k−j
j
u d
k−j
j
k−j
j
k−j
p (1 − p)
p (1 − p)
u d
S−X
+
S−X
+
,
.
Combining gives
1
Δt
Δt
C=
p Cu + (1 − p) Cd
R
1 k k
= k p (u S − X)+ + (1 − p)k (dk S − X)+
R
k−1 X
k−1
k−1
j
k−j
j k−j
+
+
+
p (1 − p) (u d S − X)
j−1
j
j=1 |
{z }
k
=
j
This proves (14).
147
Let k be the smallest non-negative integer such that
u k
X
X
u
k n−k
u d
S ≥ X ⇐⇒
≥
⇐⇒ k ≥ ln
/ ln .
d
S dn
S dn
d
Then

0
j<k
j n−j
+
(u d S − X) =
.
uj dn−j S − X j ≥ k
Hence
n X
1
n j
C= n
p (1 − p)n−j (uj dn−j S − X)
R
j
j=k
j n−j
n n X
X
n pu
(1 − p) d
n j
−n
−XR
p (1 − p)n−j .
=S
j
j
R
R
j=k
j=k
pu
On setting p0 =
, it holds that
R
R − d u −R d + u d u − R d
(1 − p) d
0
1−p =1−
=
=
=
.
u−d R
R (u − d)
u−d R
R
This proves (15).
148
13. Black–Scholes Call Price Formula.
As the time interval Δt tends to zero, the n-period call price formula (15) tends to
the Black–Scholes call option pricing formula
c = S Φ(d1) − X e−r (T −t) Φ(d2)
where
d1 =
and
Proof.
ln
S
X
σ2
2 ) (T
+ (r +
√
σ T −t
(16)
− t)
2
√
ln XS + (r − σ2 ) (T − t)
√
d2 =
≡ d1 − σ T − t.
σ T −t
The n-period call price (15) is given by
C = S Ψ(n, k, p0) − X R−n Ψ(n, k, p) = S Ψ(n, k, p0) − X e|−r{zn Δt} Ψ(n, k, p) .
e−r (T −t)
149
Hence we need to prove
Ψ(n, k, p0) → Φ(d1) and Ψ(n, k, p) → Φ(d2)
150
as Δt → 0.
Let j be the binomial RV of the number of upward moves in n periods. Then
u j
j n−j
Sn = S u d
=S
dn ,
d
where Sn is the asset price at time T = t + n Δt.
Sn
u
= j ln + n ln d = j ln u + (n − j) ln d ,
⇒
ln
S
d
E(j) = n p ,
Var(j) = n p (1 − p) ,
u 2
u
Sn
Sn
= n p ln + n ln d ,
= n p (1 − p) ln
.
Var ln
E ln
S
S
d
d
Moreover,
1 − Ψ(n, k, p) = P (j ≤ k − 1) = P
Sn
u
ln
≤ (k − 1) ln + n ln d
d
S
.
As k is the smallest integer such that k ≥ ln SXdn / ln ud , there exists an α ∈ (0, 1] such
that
X
u
−α,
k − 1 = ln
/
ln
n
Sd
d
151
and so
1 − Ψ(n, k, p) = P
Sn
X
u
ln
≤ ln − α ln
S
d
S
152
.
Define

ln u with prob. p
Xin :=
ln d with prob 1 − p
i = 1, . . . , n ,
and Zn := X1n + . . . + Xnn.
Then, for each n, {X1n, . . . , Xnn} are independent and Zn ∼ ln SSn . Hence
1 − Ψ(n, k, p) = P (Zn ≤ βn) ,
X
u
βn := ln − α ln .
S
d
Assuming e.g. the CRR model, we get
σ2
where μ := r − ,
2
E(Zn) = n p ln u + n (1 − p) ln d → μ (T − t),
2u
→ σ 2 (T − t) ,
Var(Zn) = n p (1 − p) ln
d
X
u
X
βn = ln − α ln
→ ln ,
S
d√
S
√
σ T −t
n
√
as n → ∞ .
≡ yn → 0
and |Xi | ≤ |σ Δt| =
n
153
Hence Theorem 5.10. implies that
D
Zn −→ Z ∼ N (μ (T − t), σ 2 (T − t)) .
154
So, as n → ∞
1
1 − Ψ(n, k, p) = P (Zn ≤ βn) → p
2 π σ 2 (T
On letting
y :=
we have
1
1 − Ψ(n, k, p) → √
2π
Hence
=Φ
ln
S
X
ln X
S
e
−
(s−μ (T −t))2
2 σ 2 (T −t)
ds .
−∞
s − μ (T − t)
√
,
σ T −t
ln
X
S
!
!
ln
X
S
!
+ (r −
− t)
√
σ T −t
= Φ(d2) .
Z
(T −t)
ln X
S −μ
√
σ T −t
2
e
− y2
dy = Φ
−∞
ln
Ψ(n, k, p) → 1 − Φ
− t)
Z
X
S
− μ (T − t)
√
σ T −t
σ2
2 ) (T
Similarly one can show Ψ(n, k, p0) → Φ(d1).
155
=Φ −
!
− μ (T − t)
√
σ T −t
− μ (T − t)
√
σ T −t
.
14. Implementation.
Choose the number of time steps that is required to march down until the expiration
time T to be n, so that Δt = (T − t)/n.
At time ti = t + i Δt there are i + 1 nodes.
Denote these nodes as (i, j) with i = 0, 1, . . . , n and j = 0, 1, . . . , i, where the first
component i refers to time ti and the second component j refers to the asset price
S uj di−j .
Compute the option prices Vjn = V (S uj dn−j , T ), j = 0, 1, . . . , n, at expiration time
tn = T over n + 1 nodes.
Then go backwards one step and compute option prices at time tn−1 = T − Δt using
the one period binomial option pricing formula over n nodes.
Continue until reaching the current time t0 = t.
156
The recursive formula is
V_j^i = (1/R) [ p V_{j+1}^{i+1} + (1 − p) V_j^{i+1} ] ,   j = 0, . . . , i   and   i = n − 1, . . . , 0 ,
where
p = (R − d)/(u − d) .
The value V_0^0 = V(S, t) is the option price at the current time t.
C++ Exercise: Write a program to implement the Cox–Ross–Rubinstein model to
price options with a non-dividend paying underlying asset. The inputs are the asset
price S, the strike price X, the riskless interest rate r, the current time t and the
maturity time T , the volatility σ, and the number of steps n. The outputs are the
European call and put option prices.
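A minimal sketch of one possible answer is given below, assuming the CRR parameters u = e^{σ √Δt}, d = 1/u, p = (R − d)/(u − d) and the backward recursion above; the function name crr_price and the driver values in main are illustrative choices, not part of the exercise.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// One possible CRR pricer for a European option on a non-dividend paying asset.
// Backward induction V_j^i = (p V_{j+1}^{i+1} + (1-p) V_j^{i+1}) / R as in the notes.
double crr_price(double S, double X, double r, double t, double T,
                 double sigma, int n, bool is_call) {
    const double dt = (T - t) / n;
    const double u  = std::exp(sigma * std::sqrt(dt));
    const double d  = 1.0 / u;
    const double R  = std::exp(r * dt);
    const double p  = (R - d) / (u - d);

    // option values at expiry; node j corresponds to price S u^j d^(n-j)
    std::vector<double> V(n + 1);
    for (int j = 0; j <= n; ++j) {
        double ST = S * std::pow(u, j) * std::pow(d, n - j);
        V[j] = is_call ? std::max(ST - X, 0.0) : std::max(X - ST, 0.0);
    }
    // roll back through the tree
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            V[j] = (p * V[j + 1] + (1.0 - p) * V[j]) / R;
    return V[0];
}

int main() {
    double S = 100, X = 100, r = 0.05, t = 0, T = 1, sigma = 0.2;
    int n = 200;
    std::cout << "call: " << crr_price(S, X, r, t, T, sigma, n, true)  << "\n"
              << "put:  " << crr_price(S, X, r, t, T, sigma, n, false) << "\n";
}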
157
15. Greeks.
Assume the current time is t = 0. Some common sensitivity measures of a European
call option price c (the rates of change of the call price with respect to the underlying
factors) are delta, gamma, vega, rho and theta, defined by
δ = ∂c/∂S ,   γ = ∂²c/∂S² ,   v = ∂c/∂σ ,   ρ = ∂c/∂r ,   θ = −∂c/∂T .
Exercise.
Derive the following relations from the call price (16).
δ = Φ(d_1) ,
γ = Φ′(d_1) / (S σ √T) ,
v = S √T Φ′(d_1) ,
ρ = X T e^{−r T} Φ(d_2) ,
θ = − S Φ′(d_1) σ / (2 √T) − r X e^{−r T} Φ(d_2) .
158
16. Dynamic Hedging.
Greeks can be used to hedge an option position against changes of underlying factors.
For example, to hedge a short position of a European call against changes of the asset
price, one should keep a long position of the underlying asset with δ units.
This strategy requires continuous trading and is prohibitively expensive if there are
any transaction costs.
159
17. Dividends.
(a) If the asset pays a continuous dividend yield at a rate q, then use r − q instead of
r in building the asset price tree, but still use r in discounting the option values
in the backward calculation.
In the risk-neutral model, the asset has a rate of return of r − q.
So
(8) becomes
(9) becomes
p u + (1 − p) d = e(r−q) Δt ,
p u2 + (1 − p) d2 = e(2 r−2 q+σ
1
However, the discount factor remains the same:
R
1
= e−r Δt .
R
160
2 ) Δt
.
(b) If the asset pays a discrete proportional dividend β S between ti−1 and ti, then the
asset price at node (i, j) drops to (1 − β) uj di−j S instead of the usual uj di−j S.
The tree is still recombined.
161
S•
◦
•
•
•
(1 − β) u2 S
•
•
(1 − β) u d S
◦
•
◦
•
(1 − β) d2 S
•
•
•
(1 − β) u4 S
•
•
•
•
(1 − β) d4 S
(c) If the asset pays a discrete cash dividend D between ti−1 and ti, then the asset
price at node (i, j) drops to uj di−j S −D instead of the usual uj di−j S. There are
i + 1 new recombined trees emanating from the nodes at time ti, but the original
tree is no longer recombined.
162
Suppose a cash dividend D is paid between ti−1 and ti, then at time ti+m there
are (i + 1) (m + 1) nodes, instead of i + m + 1 in a recombined tree.
If there are several cash dividends, then the number of nodes is much larger than
in a normal tree.
Example. Assume there is a cash dividend D between periods 1 and 2. Then
the possible asset prices at time n = 2 are {u2 S − D, u d S − D, d2 S − D}. At
n = 3 there are 6 nodes: u ∙ {u2 S − D, u d S − D, d2 S − D} and d ∙ {u2 S −
D, u d S − D, d2 S − D}, instead of 4.
[Reason: d (u2 S − D) 6= u (u d S − D)].
163
• u2 (u2 S − D)
◦
•
S•
•
◦
•
•
u2 S − D
•
•
•
udS − D
◦
•
•
•
d2 S − D
•
•
•
•
•
•
•
•
•
164
d2 (d2 S − D)
18. Trinomial Model.
Suppose the asset price is S at time t. At time t + Δt the asset price can be
u S (up-state) with probability p_u, m S (middle-state) with probability p_m, and d S
(down-state) with probability p_d. Then the risk neutral one period option pricing
formula is
V(S, t) = (1/R) [ p_u V(u S, t + Δt) + p_m V(m S, t + Δt) + p_d V(d S, t + Δt) ] .
The trinomial tree approach is equivalent to the explicit finite difference method. By
equating the first and second moments of the asset price S(t + Δt) in both the continuous
and the discrete model, we obtain the following equations:
p_u + p_m + p_d = 1 ,
p_u u + p_m m + p_d d = e^{r Δt} ,
p_u u² + p_m m² + p_d d² = e^{(2 r + σ²) Δt} .
There are three equations and six variables u, m, d and p_u, p_m, p_d. We need three
more equations to uniquely determine these variables.
165
19. Boyle Model.
The three additional conditions are
m = 1,
u d = 1,
u=e
λσ
√
Δt
,
where λ is a free parameter.
Exercise.
Show that the risk-neutral probabilities are given by
(W − R) u − (R − 1)
,
pu =
(u − 1) (u2 − 1)
(W − R) u2 − (R − 1) u3
pd =
,
2
(u − 1) (u − 1)
where R = er Δt and W = e(2 r+σ
2 ) Δt
p m = 1 − pu − pd ,
.
Note that if λ = 1, which corresponds to the choice of u as in the CRR model, certain
sets of parameters can lead to pm < 0. To rectify this, choose λ > 1.
166
Boyle claimed that if pu ≈ pm ≈ pd ≈ 13 , then the trinomial scheme with 5 steps is
comparable to the CRR binomial scheme with 20 steps.
167
20. Hull–White Model.
This is the same as the Boyle model with λ =
√
3.
Exercise.
Show that the risk-neutral probabilities are
r
Δt
1
1 2
r
−
pd = −
σ ,
2
6
12 σ
2
2
pm = ,
3 r
1
1 2
Δt
σ ,
r
−
pu = +
2
6
12 σ
2
if terms of order O(Δt) are ignored.
168
21. Kamrad–Ritchken Model.
If S follows a lognormal process, then we can write ln S(t + Δt) = ln S(t) + Z, where
Z is a normal random variable with mean (r − 12 σ 2) Δt and variance σ 2 Δt.
KR suggested to approximate Z by a discrete random variable Z a as follows: Z a =
Δx with probability pu, 0 with probability pm, and −Δx with probability pd, where
√
Δx = λ σ Δt and λ ≥ 1.
The corresponding u, m, d in the trinomial tree are u = eΔx, m = 1, d = e−Δx.
By omitting the higher order term O((Δt)2), they showed that the risk-neutral probabilities are
1
1
1 2 √
+
Δt ,
r− σ
pu =
2
2λ
2λσ
2
1
pm = 1 − 2 ,
λ
1
1
1 2 √
pd =
−
r− σ
Δt .
2
2λ
2λσ
2
Note that if λ = 1 then pm = 0 and the trinomial scheme is reduced to the binomial
169
scheme.
KR claimed that if pu = pm = pd = 13 , then the trinomial scheme with 15 steps is
comparable to the binomial scheme (pm = 0) with 55 steps. They also discussed the
trinomial scheme with two correlated state variables.
170
and the equations become



with prob. pu

Δx
Za = 0
with prob. pm



−Δx with prob. pd
pu + pm + pd = 1 ,
E(Z a) = Δx (pu − pd) = (r − 12 σ 2) Δt ,
(1)
(2)
Var(Z a) = (Δx)2 (pu + pd) − (Δx)2 (pu − pd)2 = σ 2 Δt .
|
{z
}
O((Δt)2 )
Dropping the O((Δt)2) term leads to
(Δx)2 (pu + pd) = σ 2 Δt .
171
(3)
Hence
⇒
1 Δt 2 Δt
1 2
1 Δt 2 Δt
1 2
pu =
σ +
σ −
(r − σ ) , pd =
(r − σ ) .
2 (Δx)2
Δx
2 2 (Δx)2
Δx
2 1 1
1
1 2 √
1 1
1
1 2 √
(r − σ ) Δt , pd =
(r − σ ) Δt .
pu =
+
−
2
2
2 λ
λσ
2
2 λ
λσ
2
172
8. Finite Difference Methods
1. Diffusion Equations of One State Variable.
∂u/∂t = c² ∂²u/∂x² ,   (x, t) ∈ D ,   (17)
where t is a time variable, x is a state variable, and u(x, t) is an unknown function
satisfying the equation.
To find a well-defined solution, we need to impose the initial condition
u(x, 0) = u_0(x)   (18)
and, if D = [a, b] × [0, ∞), the boundary conditions
u(a, t) = g_a(t)   and   u(b, t) = g_b(t) ,   (19)
where u_0, g_a, g_b are continuous functions.
173
If D = (−∞, ∞) × (0, ∞), we instead impose the growth condition
lim_{|x|→∞} u(x, t) e^{−a x²} = 0   for any a > 0.   (20)
(20) implies that u(x, t) does not grow too fast as |x| → ∞.
The diffusion equation (17) with the initial condition (18) and the boundary conditions
(19) is well-posed, i.e. there exists a unique solution that depends continuously on u_0,
g_a and g_b.
174
2. Grid Points.
To find a numerical solution to equation (17) with finite difference methods, we first
need to define a set of grid points in the domain D as follows:
Choose a state step size Δx = b−a
N (N is an integer) and a time step size Δt, draw
a set of horizontal and vertical lines across D, and get all intersection points (xj , tn),
or simply (j, n),
where xj = a + j Δx, j = 0, . . . , N , and tn = n Δt, n = 0, 1, . . ..
If D = [a, b] × [0, T ] then choose Δt =
0, . . . , M .
175
T
M
(M is an integer) and tn = n Δt, n =
tM = T
..
.
t3
t2
t1
t0 = 0
x0 = a x 1
x2
...
176
xN = b
3. Finite Differences.
The partial derivatives
u_x := ∂u/∂x   and   u_xx := ∂²u/∂x²
are always approximated by central difference quotients, i.e.
u_x ≈ (u_{j+1}^n − u_{j−1}^n) / (2 Δx)   and   u_xx ≈ (u_{j+1}^n − 2 u_j^n + u_{j−1}^n) / (Δx)²   (21)
at a grid point (j, n). Here u_j^n = u(x_j, t_n).
Depending on how u_t is approximated, we have three basic schemes: the explicit, implicit,
and Crank–Nicolson schemes.
177
4. Explicit Scheme.
If u_t is approximated by a forward difference quotient
u_t ≈ (u_j^{n+1} − u_j^n) / Δt   at (j, n),
then the corresponding difference equation to (17) at grid point (j, n) is
w_j^{n+1} = λ w_{j+1}^n + (1 − 2 λ) w_j^n + λ w_{j−1}^n ,   (22)
where
λ = c² Δt / (Δx)² .
The initial condition is w_j^0 = u_0(x_j), j = 0, . . . , N, and
the boundary conditions are w_0^n = g_a(t_n) and w_N^n = g_b(t_n), n = 0, 1, . . . .
The difference equations (22), j = 1, . . . , N − 1, can be solved explicitly.
178
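As an illustration (not part of the notes), a single time step of (22) might be coded as follows; the function name explicit_step and its interface are assumptions.

#include <functional>
#include <vector>

// One forward step of the explicit scheme (22):
// w_j^{n+1} = lambda w_{j+1}^n + (1 - 2 lambda) w_j^n + lambda w_{j-1}^n,
// with boundary values taken from ga and gb at the new time level.
std::vector<double> explicit_step(const std::vector<double>& w, double lambda,
                                  double t_next,
                                  const std::function<double(double)>& ga,
                                  const std::function<double(double)>& gb) {
    const std::size_t N = w.size() - 1;
    std::vector<double> wnew(N + 1);
    for (std::size_t j = 1; j < N; ++j)
        wnew[j] = lambda * w[j + 1] + (1.0 - 2.0 * lambda) * w[j] + lambda * w[j - 1];
    wnew[0] = ga(t_next);   // w_0^{n+1} = ga(t_{n+1})
    wnew[N] = gb(t_next);   // w_N^{n+1} = gb(t_{n+1})
    return wnew;
}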
5. Implicit Scheme.
If u_t is approximated by a backward difference quotient
u_t ≈ (u_j^{n+1} − u_j^n) / Δt   at (j, n + 1),
then the corresponding difference equation to (17) at grid point (j, n + 1) is
−λ w_{j+1}^{n+1} + (1 + 2 λ) w_j^{n+1} − λ w_{j−1}^{n+1} = w_j^n .   (23)
The difference equations (23), j = 1, . . . , N − 1, together with the initial and boundary
conditions as before, can be solved using the Crout algorithm or the SOR algorithm.
179
Explicit Method.
n
n
n
wjn+1 − wjn
w
−
2
w
+
w
j−1
j
j+1
= c2
Δt
(Δx)2
(†)
Δt
Letting λ := c2 (Δx)
2 gives (22).
Implicit Method.
n+1
n+1
n+1
wjn+1 − wjn
w
−
2
w
+
w
j−1
j
j+1
= c2
Δt
(Δx)2
Δt
Letting λ := c2 (Δx)
2 gives (23).
In matrix form


1 + 2 λ −λ


 −λ 1 + 2 λ −λ


w = b.


...
...


−λ 1 + 2 λ
180
(‡)
The matrix is tridiagonal and diagonally dominant.
181
⇒ Crout / SOR.
6. Crank–Nicolson Scheme.
The Crank–Nicolson scheme is the average of the explicit scheme at (j, n) and the
implicit scheme at (j, n + 1).
The resulting difference equation is
−(λ/2) w_{j−1}^{n+1} + (1 + λ) w_j^{n+1} − (λ/2) w_{j+1}^{n+1} = (λ/2) w_{j−1}^n + (1 − λ) w_j^n + (λ/2) w_{j+1}^n .   (24)
The difference equations (24), j = 1, . . . , N − 1, together with the initial and boundary
conditions as before, can be solved using the Crout algorithm or the SOR algorithm.
182
Crank–Nicolson.
1
[(†) + (‡)] gives
2
n+1
n+1
n+1
n
n
wjn+1 − wjn 1 2 wj−1
− 2 wjn + wj+1
1 2 wj−1 − 2 wj + wj+1
+ c
= c
Δt
(Δx)2
(Δx)2
2
2
Δt
Letting μ := 12 c2 (Δx)
2 =
λ
2
gives
n+1
n+1
− μ wj+1
+ (1 + 2 μ) wjn+1 − μ wj−1
=w
bjn+1
n
n
and w
bjn+1 = μ wj+1
+ (1 − 2 μ) wjn + μ wj−1
.
This can be interpreted as
w
bjn+1 —
wjn+1 —
predictor
(explicit method)
corrector
(implicit method)
183
7. Local Truncation Errors.
These are measures of the error by which the exact solution of a differential equation does not satisfy the difference equation at the grid points and are obtained by
substituting the exact solution of the continuous problem into the numerical scheme.
A necessary condition for the convergence of the numerical solutions to the continuous
solution is that the local truncation error tends to zero as the step size goes to zero.
In this case the method is said to be consistent.
It can be shown that all three methods are consistent.
The explicit and implicit schemes have local truncation errors O(Δt, (Δx)2), while
that of the Crank–Nicolson scheme is O((Δt)2, (Δx)2).
184
Local Truncation Error.
For the explicit scheme we get for the LTE at (j, n)
Enj =
u(xj , tn+1) − u(xj , tn)
u(xj−1, tn) − 2 u(xj , tn) + u(xj+1, tn)
.
− c2
2
Δt
(Δx)
With the help of a Taylor expansion at (xj , tn) we find that
u(xj , tn+1) − u(xj , tn)
= ut(xj , tn) + O(Δt) ,
Δt
u(xj−1, tn) − 2 u(xj , tn) + u(xj+1, tn)
2
=
u
(x
,
t
)
+
O((Δx)
).
xx
j
n
2
(Δx)
Hence
Enj = ut(xj , tn) − c2 uxx(xj , tn) +O(Δt) + O((Δx)2) .
|
{z
}
= 0
185
8. Numerical Stability.
Consistency is only a necessary but not a sufficient condition for convergence.
Roundoff errors incurred during calculations may lead to a blow up of the solution or
erode the whole computation.
A scheme is stable if roundoff errors are not amplified in the calculations.
The Fourier method can be used to check if a scheme is stable.
Assume that a numerical scheme admits a solution of the form
vjn = a(n)(ω) ei j ω Δx ,
√
where ω is the wave number and i = −1.
186
(25)
Define
a(n+1)(ω)
G(ω) = (n)
,
a (ω)
where G(ω) is an amplification factor, which governs the growth of the Fourier component a(ω).
The von Neumann stability condition is given by
|G(ω)| ≤ 1
for 0 ≤ ωΔx ≤ π.
It can be shown that the explicit scheme is stable if and only if λ ≤ 12 , called conditionally stable, and the implicit and Crank–Nicolson schemes are stable for any
values of λ, called unconditionally stable.
187
Stability Analysis.
For the explicit scheme we get on substituting (25) into (22) that
a(n+1)(ω) ei j ω Δx = λ a(n)(ω) ei (j+1) ω Δx + (1 − 2 λ) a(n)(ω) ei j ω Δx + λ a(n)(ω) ei (j−1) ω Δx
⇒
a(n+1)(ω)
G(ω) = (n)
= λ ei ω Δx + (1 − 2 λ) + λ e−i ω Δx .
a (ω)
The von Neumann stability condition then is
|G(ω)| ≤ 1
⇐⇒
⇐⇒
⇐⇒
⇐⇒
⇐⇒
for all 0 ≤ ω Δx ≤ π.
|λ ei ω Δx + (1 − 2 λ) + λ e−i ω Δx| ≤ 1
|(1 − 2 λ) + 2λ cos(ω
Δx)| ≤ 1
2 ω Δx
|≤1
|1 − 4 λ sin
2
ω
Δx
0 ≤ 4 λ sin2
≤2
2
1
0≤λ≤
2 ω Δx
2 sin
2
188
[cos 2 α = 1 − 2 sin2 α]
This is equivalent to 0 ≤ λ ≤ 12 .
189
Remark.
The explicit method is stable if and only if
Δt ≤ (Δx)² / (2 c²) .   (†)
(†) is a strong restriction on the time step size Δt. If Δx is reduced to (1/2) Δx, then Δt
must be reduced to (1/4) Δt.
So the total computational work increases by a factor of 8.
Example.
u_t = u_xx ,   (x, t) ∈ [0, 1] × [0, 1] .
Take Δx = 0.01. Then λ ≤ 1/2 ⇒ Δt ≤ 0.00005.
I.e. the number of grid points is equal to
(1/Δx) (1/Δt) = 100 × 20,000 = 2 × 10^6 .
190
Remark.
In vector notation, the explicit scheme can be written as
wn+1 = A wn + bn ,
n
T
N −1
where wn = (w1n, . . . , wN
)
∈
R
and
−1


1 − 2λ
λ


 λ

1 − 2λ λ

 ∈ R(N −1)×(N −1),
A=

... ...


λ 1 − 2λ
For the implicit method we get
B wn+1 = wn + bn+1 ,


λ w0n


 0 


n
N −1
.

b = . 
.
∈
R



 0 
n
λ wN


1 + 2 λ −λ


 −λ 1 + 2 λ −λ


.
where B = 

...
...


−λ 1 + 2 λ
191
Remark.
Forward diffusion equation:
Backward diffusion equation:
ut − c2 uxx = 0
ut + c2 uxx = 0
u(x, T ) = uT (x)
u(a, t) = ga(t),
u(b, t) = gb(t)
t ≥ 0.
t≤T
∀x
∀t
[as before]
[Note: We could use the transformation v(x, t) := u(x, T − t) in order to transform this
into a standard forward diffusion problem.]
We can solve the backward diffusion equation directly by starting at t = T and solving
“backwards”, i.e. given wn+1, find wn.
e wn + b̃n
Implicit: wn+1 = A
e wn+1 = wn + b̃n
Explicit: B
The von Neumann stability condition for the backward problem then becomes
n
a (ω) e
|G(ω)| = n+1 ≤ 1 .
a (ω)
192
Stability of the Binomial Model.
The binomial model is an explicit method for a backward equation.
1
1
n+1
n+1
n+1
n+1
Vjn = (p Vj+1
+ (1 − p) Vj−1
) = (p Vj+1
+ 0 Vjn+1 + (1 − p) Vj−1
)
R
R
for j = −n, −n + 2, . . . , n − 2, n and n = N − 1, . . . , 1, 0.
N
N
N
N
, V−N
,
.
.
.
,
V
,
V
Here the initial values V−N
+2
N −2 N are given.
Now let Vjn = a(n)(ω) ei j ω Δx, then
1
a(n)(ω) ei j ω Δx =
p a(n+1)(ω) ei (j+1) ω Δx + (1 − p) a(n+1)(ω) ei (j−1) ω Δx
R
−r Δt
i ω Δx
−i ω Δx
e
⇒ G(ω) = p e
+ (1 − p) e
e
−r Δt
= cos(ω Δx) + (2 p − 1) i sin(ω Δx) e
| {z }
= q
⇒
2
e
|G(ω)|
= (cos2(ω Δx) + q 2 sin2(ω Δx)) e−2 r Δt
= (1 + (q 2 − 1) sin2(ω Δx)) e−2 r Δt
≤ e−2 r Δt ≤ 1
if q 2 ≤ 1 ⇐⇒ −1 ≤ q ≤ 1 ⇐⇒ p ∈ [0, 1]. Hence the binomial model is stable.
193
Stability of the CRR Model.
We know that the binomial model is stable if p ∈ (0, 1).
For the CRR model we have that
u=e
σ
√
Δt
,
d=e
−σ
√
Δt
,
so p ∈ (0, 1) is equivalent to u > R > d.
Clearly, for Δt small, we can ensure that
σ
e
√
Δt
p=
R−d
,
u−d
> er Δt .
Hence the CRR model is stable, if Δt is sufficiently small, i.e. if Δt <
σ2
.
r2
Alternatively,
one can argue (less rigorously) as follows. Since Δx = u S − S =
√
√
σ Δt
S (e
− 1) ≈ S σ Δt and as the BSE can be written as
1
ut + σ 2 S 2 uSS + . . . = 0 ,
|2 {z }
c2
194
it follows that
Δt
1 2 2 Δt
1
=
λ=c
= σ S 2 2
(Δx)2 2
S σ Δt 2
2
195
⇒
CRR is stable.
9. Simplification of the BSE.
Assume V (S, t) is the price of a European option at time t.
Then V satisfies the Black–Scholes equation (13) with appropriate initial and boundary conditions.
Define
τ = T − t,
w(τ, x) = eα x+β τ V (t, S) ,
x = ln S,
where α and β are parameters.
Then the Black–Scholes equation can be transformed into a basic diffusion equation:
∂w 1 2 ∂ 2w
= σ
∂τ
2 ∂x2
with a new set of initial and boundary conditions.
Finite difference methods can be used to solve the corresponding difference equations
and hence to derive option values at grid points.
196
Transformation of the BSE.
Consider a call option.
∂V
Let τ = T −t be the remaining time to maturity. Set u(x, τ ) = V (x, t). Then ∂u
=
−
∂τ
∂t
and the BSE (13) is equivalent to
1
uτ = σ 2 S 2 uSS + r S uS − r u ,
(†)
2
(IC)
u(S, 0) = V (S, T ) = (S − X)+ ,
u(0, τ ) = V (0, t) = 0 ,
u(S, τ ) = V (S, t) ≈ S as S → ∞.
(BC)
Let x = ln S ( ⇐⇒ S = ex). Set ũ(x, τ ) = u(S, τ ). Then
ũx = uS ex = S uS ,
ũxx = S uS + S 2 uSS
and (†) becomes
1 2
1 2
ũτ = σ ũxx + (r − σ ) ũx − r ũ ,
2
2
ũ(x, 0) = u(S, 0) = (ex − X)+ ,
ũ(x, τ ) = u(0, τ ) = 0 as x → −∞ ,
(‡)
(IC)
ũ(x, τ ) = u(ex, τ ) ≈ ex as x → ∞.
197
(BC)
−a x2
Note that the growth condition (20), lim|x|→∞ ũ(x, τ ) e
= 0 for any a > 0, is satisfied.
Hence (‡) is well defined.
Let w(x, τ ) = eα x+β τ ũ(x, τ ) ⇐⇒ ũ(x, τ ) = e−α x−β τ w(x, τ ) =: C w(x, τ ). Then
ũτ = C (−β w + wτ )
ũx = C (−α w + wx)
ũxx = C (−α (−α w + wx) + (−α wx + wxx)) = C (α2 w − 2 α wx + wxx) .
So (‡) is equivalent to
1 2
1
σ C (α2 w − 2 α wx + wxx) + (r − σ 2) C (−α w + wx) − r C w .
2
2
In order to cancel the w and wx terms we need to have


−β = 1 σ 2 α2 − (r − 1 σ 2) α − r ,
α = 1 (r − 1 σ 2) ,
2
2
2
σ2
⇐⇒
 0 = 1 σ 2 (−2 α) + r − 1 σ 2 .
β = 1 2 (r − 1 σ 2)2 + r .
C (−β w + wτ ) =
2
2
2σ
198
2
With this choice of α and β the equation (‡) is equivalent to
1 2
wτ = σ wxx ,
2
w(x, 0) = eα x ũ(x, 0) = eα x (ex − X)+ ,
w(x, τ ) = 0 as x → −∞ ,
(])
(IC)
w(x, τ ) ≈ eα x+β τ ex as x → ∞.
(BC)
Note that the growth condition (20) is satisfied. Hence (]) is well defined.
Implementation.
1. Choose a truncated interval [a, b] to approximate (−∞, ∞).
e−8 = 0.0003, e8 = 2981
⇒
[a, b] = [−8, 8] serves all practical purposes.
2. Choose integers N , M to get the step sizes Δx =
b−a
N
and Δτ =
T −t
M .
Grid points (xj , τn):
xj = a + j Δx, j = 0, 1, . . . , N and τn = n Δτ , n = 0, 1, . . . , M .
Note: x0, xN and τ0 represent the boundary of the grid with known values.
199
3. Solve (]) with
w(x, 0) = eα x (ex − X)+ ,

e(α+1) b+β τ
w(a, τ ) = 0 , w(b, τ ) =
eα b (eb − X) eβ τ
(IC)
or
(a better choice)
.
(BC)
Note: If the explicit method is used, N and M need to be chosen such that
1 2 Δτ
1
σ
≤
2 (Δx)2 2
σ 2 (T − t) 2
M≥
N .
(b − a)2
⇐⇒
If the implicit or Crank–Nicolson scheme is used, there are no restrictions on N , M .
Use Crout or SOR to solve.
4. Assume w(xj , τM ), j = 0, 1, . . . , N are the solutions from step 3, then the call option
price at time t is
V (Sj , t) = e−α xj −β (T −t) w(xj , T
− }t)
| {z
τM
where Sj = exj and T − t ≡ τM .
200
j = 0, 1, . . . , N ,
Note: The Sj are not equally spaced.
201
10. Direct Discretization of the BSE.
Exercise: Apply the Crank–Nicolson scheme directly to the BSE (13), i.e. there is
no transformation of variables, and write out the resulting difference equations and
do a stability analysis.
C++ Exercise: Write a program to solve the BSE (13) using the result of the
previous exercise and the Crout algorithm. The inputs are the interest rate r, the
volatility σ, the current time t, the expiry time T , the strike price X, the maximum
price Smax, the number of intervals N in [0, Smax], and the number of subintervals
M in [t, T ]. The output are the asset prices Sj , j = 0, 1, . . . , N , at time t, and their
corresponding European call and put prices (with the same strike price X).
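A building block for this exercise is a solver for the tridiagonal systems that arise from the Crank–Nicolson discretization. The sketch below shows one possible LU-style elimination (the Thomas form of the Crout factorization); the function name and interface are illustrative assumptions.

#include <vector>

// Solve a tridiagonal system with sub-diagonal a, diagonal b, super-diagonal c
// and right hand side d (all of length n; a[0] and c[n-1] are unused).
// Forward elimination followed by back substitution; b should be non-degenerate,
// e.g. diagonally dominant as for the implicit and Crank-Nicolson matrices.
std::vector<double> solve_tridiagonal(std::vector<double> a, std::vector<double> b,
                                      std::vector<double> c, std::vector<double> d) {
    const std::size_t n = b.size();
    for (std::size_t i = 1; i < n; ++i) {
        double m = a[i] / b[i - 1];
        b[i] -= m * c[i - 1];
        d[i] -= m * d[i - 1];
    }
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}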
202
11. Greeks.
Assume that the asset prices Sj and option values Vj , j = 0, 1, . . . , N , are known at
time t.
The sensitivities of V at Sj , j = 1, . . . , N − 1, are computed as follows:
∂V
Vj+1 − Vj−1
|S=Sj ≈
δj =
,
∂S
Sj+1 − Sj−1
which is
Vj+1 − Vj−1
, if S is equally spaced.
2 ΔS
2
γj =
∂ V
|S=Sj ≈
2
∂S
Vj+1 −Vj
Sj+1 −Sj
Sj − Sj−1
Vj+1 − 2 Vj + Vj−1
, if S is equally spaced.
which is
(ΔS)2
203
V −V
− Sjj −Sj−1
j−1
,
12. Diffusion Equations of Two State Variables.
2
2
∂u
∂ u ∂ u
= α2
+ 2 , (x, y, t) ∈ [a, b] × [c, d] × [0, ∞).
2
∂t
∂x
∂y
(26)
The initial conditions are
∀ (x, y) ∈ [a, b] × [c, d] ,
u(x, y, 0) = u0(x, y)
and the boundary conditions are
u(a, y, t) = ga(y, t),
u(b, y, t) = gb(y, t)
∀ y ∈ [c, d], t ≥ 0 ,
u(x, c, t) = gc(x, t),
u(x, d, t) = gd(x, t)
∀ x ∈ [a, b], t ≥ 0 .
and
Here we assume that all the functions involved are consistent, in the sense that they
have the same value at common points, e.g. ga(c, t) = gc(a, t) for all t ≥ 0.
204
13. Grid Points.
(xi, yj , tn), where
b−a
, i = 0, . . . , I ,
xi = a + i Δx, Δx =
I
d−c
yj = c + j Δy, Δy =
, j = 0, . . . , J ,
J
T
tn = n Δt, Δt = , n = 0, . . . , N
N
and I, J, N are integers.
Recalling the finite differences (21), we have
uxx
uni+1,j − 2 uni,j + uni−1,j
≈
(Δx)2
and uyy
at a grid point (i, j, n).
205
uni,j+1 − 2 uni,j + uni,j−1
≈
.
(Δy)2
Depending on how ut is approximated, we have three basic schemes: explicit, implicit,
and Crank–Nicolson schemes.
206
14. Explicit Scheme.
If ut is approximated by a forward difference quotient
n
un+1
i,j − ui,j
ut ≈
Δt
at (i, j, n),
then the corresponding difference equation at grid point (i, j, n) is
n+1
n
n
n
n
n
wi,j
= (1 − 2 λ − 2 μ) wi,j
+ λ wi+1,j
+ λ wi−1,j
+ μ wi,j+1
+ μ wi,j−1
(27)
for i = 1, . . . , I − 1 and j = 1, . . . , J − 1, where
Δt
λ=α
(Δx)2
2
and
Δt
μ=α
.
(Δy)2
2
(27) can be solved explicitly. It has local truncation error O(Δt, (Δx)2, (Δy)2), but
is only conditionally stable.
207
15. Implicit Scheme.
If ut is approximated by a backward difference quotient ut ≈
then the difference equation at grid point (i, j, n + 1) is
n
un+1
i,j −ui,j
Δt
at (i, j, n + 1),
n+1
n+1
n+1
n+1
n+1
n
− λ wi+1,j
− λ wi−1,j
− μ wi,j+1
− μ wi,j−1
= wi,j
(1 + 2 λ + 2 μ) wi,j
(28)
for i = 1, . . . , I − 1 and j = 1, . . . , J − 1.
For fixed n, there are (I − 1) (J − 1) unknowns and equations. (28) can be solved by
relabeling the grid points and using the SOR algorithm.
(28) is unconditionally stable with local truncation error O(Δt, (Δx)2, (Δy)2), but is
more difficult to solve, as it is no longer tridiagonal, so the Crout algorithm cannot
be applied.
16. Crank–Nicolson Scheme.
It is the average of the explicit scheme at (i, j, n) and the implicit scheme at (i, j, n +
1). It is similar to the implicit scheme but with the improved local truncation error
O((Δt)2, (Δx)2, (Δy)2).
208
Solving the Implicit Scheme.
n+1
n+1
n+1
n+1
n+1
n
(1 + 2 λ + 2 μ) wi,j
− λ wi+1,j
− λ wi−1,j
− μ wi,j+1
− μ wi,j−1
= wi,j
With SOR for ω ∈ (0, 2).
For each n = 0, 1, . . . , N
1. Set wn+1,0 := wn and
n+1,0
n+1,0
n+1,0
n+1,0
fill in the boundary conditions w0,j
, wI,j
, wi,0
, wi,J
for all i, j.
2. For k = 0, 1, . . .
For i = 1, . . . I − 1, j = 1, . . . , J − 1
n+1,k
n+1,k+1
n+1,k
n+1,k+1
k+1
1
n
= 1+2 λ+2
+
λ
w
+
λ
w
+
μ
w
+
μ
w
w
w
bi,j
i,j
i+1,j
i−1,j
i,j+1
i,j−1
μ
n+1,k+1
n+1,k
k+1
= (1 − ω) wi,j
+ωw
bi,j
wi,j
until kwn+1,k+1 − wn+1,k k < ε.
3. Set wn+1 = wn+1,k+1.
209
With Block Jacobi/Gauss–Seidel.


w1,j


 w2,j 
 ∈ RI−1, j = 1, . . . , J − 1,
Denote w
ej = 
 .. 


wI−1,j
210


w
e1


 w
e2 
(I−1) (J−1)

w= . 
.
∈
R

 . 
w
eJ−1
On setting c = 1 + 2 λ + 2 μ, we have from (28) for j fixed









 w1,j+1
 w1,j−1
c −λ
w1,j







−μ
−μ
−λ c −λ  w2,j  



w2,j+1  
w2,j−1 

... 
.

+




.. 
+










..
..
..
...





−μ 
−μ 
−λ c wI−1,j
wI−1,j+1
wI−1,j−1


n+1
n
w1,j + λ w0,j


n


w2,j


n+1
n+1
n+1
n
.

+
B
w
e
+
B
w
e
=
d
⇐⇒
A
w
e
=
.
j
j
j+1
j−1




n
wI−2,j


n+1
n
wI−1,j
+ λ wI,j
211
Rewriting
as

n+1
n+1
ej+1
+Bw
ej−1
= dnj
Aw
ejn+1 + B w



j = 1, . . . , J − 1



den1
dn1 − B
A B


 n 



 d2 
B A B
dn2





..
 =:  ..  ,
 . . . . . . . . .   ..  = 
 









 .  
B A B  .  
dnJ−2

dnJ−2

n+1
B A
w
eJ−1
denJ−1
dnJ−1 − B w
eJn+1
w
e1n+1
  n+1 
 w
e 
 2 
w
e0n+1
where w
e0n+1 and w
eJn+1 represent boundary points, leads to the following Block Jacobi
212
iteration: For k = 0, 1, . . .
Aw
e1n+1,k+1 = −B w
e2n+1,k + den1
Aw
e2n+1,k+1 = −B w
e1n+1,k − B w
e3n+1,k + dn2
..
n+1,k+1
n+1,k
n+1,k
Aw
eJ−2
= −B w
eJ−3
−Bw
eJ−1
+ dnJ−2
Aw
en+1,k+1 = −B w
en+1,k + den
J−1
J−1
J−2
Similarly, the Block Gauss–Seidel iteration is given by:
For k = 0, 1, . . .
Aw
e1n+1,k+1 = −B w
e2n+1,k + den1
Aw
e2n+1,k+1 = −B w
e1n+1,k+1 − B w
e3n+1,k + dn2
..
n+1,k+1
n+1,k+1
n+1,k
Aw
eJ−2
= −B w
eJ−3
−Bw
eJ−1
+ dnJ−2
Aw
en+1,k+1 = −B w
en+1,k + den
J−1
J−2
213
J−1
In each case, use the Crout algorithm to solve for w
ejn+1,k+1, j = 1, . . . , J − 1.
Note on Stability.
(n+1) (ω) Recall that in 1d a scheme was stable if |G(ω)| = aa(n)(ω)
≤ 1, where vjn = a(n)(ω) ei j ω Δx.
In 2d, this is adapted to
n
vi,j
(n)
√
= a (ω) e
√
−1 i ω Δx+ −1 j ω Δy
214
.
17. Alternating Direction Implicit (ADI) Method.
An alternative finite difference method is the ADI scheme, which is unconditionally
stable while the difference equations are still tridiagonal and diagonally dominant.
The ADI algorithm can be used to efficiently solve the Black–Scholes two asset pricing
equation:
Vt + 12 σ12 S12 VS1S1 + 12 σ22 S22 VS2S2 + ρ σ1 σ2 S1 S2VS1S2 + r S1VS1 + r S2VS2 − r V = 0.
(29)
See Clewlow and Strickland (1998) for details on how to transform the Black–Scholes
equation (29) into the basic diffusion equation (26) and then to solve it with the ADI
scheme.
215
ADI scheme
Implicit method at (i, j, n + 1):
n+1
n+1
n+1
n+1
n
n
n
n
wi,j
− wi,j
w
−
2
w
+
w
w
−
2
w
+
w
i,j
i,j−1
i+1,j
i,j
i−1,j
2 i,j+1
= α2
+α
Δt
(Δx)2
(Δy)2
|
{z
}
approx. uxx using (i, j, n) data
Implicit method at (i, j, n + 2):
n+2
n+1
n+2
n+2
n+2
n+1
n+1
n+1
wi,j
− wi,j
w
−
2
w
+
w
w
−
2
w
+
w
i+1,j
i,j
i−1,j
i,j+1
i,j
i,j−1
2
+
α
= α2
Δt
(Δx)2
(Δy)2
|
{z
}
approx. uyy using (i, j, n + 1) data
We can write the two equations as follows:
n+1
n+1
n+1
n
n
n
−μ wi,j+1
+ (1 + 2 μ) wi,j
− μ wi,j−1
= λ wi+1,j
+ (1 − 2 λ) wi,j
+ λ wi−1,j
n+2
n+2
n+2
n+1
n+1
n+1
−λ wi+1,j
+ (1 + 2 λ) wi,j
− λ wi−1,j
= μ wi,j+1
+ (1 − 2 μ) wi,j
+ μ wi,j−1
216
(†)
(‡)
n+1
To solve (†), fix i = 1, . . . , I − 1 and solve a tridiagonal system to get wi,j
for j =
1, . . . , J − 1.
This can be done with e.g. the Crout algorithm.
n+2
for i =
To solve (‡), fix j = 1, . . . , J − 1 and solve a tridiagonal system to get wi,j
1, . . . , I − 1.
Currently the method works on the interval [tn, tn+2] and has features of an explicit
method. In order to obtain an (unconditionally stable) implicit method, we need to
n
adapt the method so that it works on the interval [tn, tn+1] and hence gives values wi,j
for all n = 1, . . . , N .
n+ 12
1
Introduce the intermediate time point n + 2 . Then (†) generates wi,j (not used) and
n+1
.
(‡) generates wi,j
μ n+ 12
μ n+ 12
λ n
λ n
n+ 12
n
− wi,j+1 + (1 + μ) wi,j − wi,j−1 = wi+1,j + (1 − λ) wi,j + wi−1,j
2
2
2
2
1
1
λ n+1
λ n+1
μ n+ 2
μ n+ 12
n+ 2
n+1
− wi+1,j + (1 + λ) wi,j − wi−1,j = wi,j+1 + (1 − μ) wi,j + wi,j−1
2
2
2
2
217
(†)
(‡)
9. Simulation
1. Uniform Random Number Generators.
In order to use simulation techniques, we first need to generate independent samples
from some given distribution functions.
The simplest and the most important distribution in simulation is the uniform distribution U [0, 1].
Note that if X is a U [0, 1] random variable, then Y = a + (b − a)X is a U [a, b]
random variable.
We focus on how to generate U [0, 1] random variables.
Examples.
• 2 outcomes, p1 = p2 =
1
2
(coin)
• 6 outcomes, p1 = . . . = p6 =
1
6
(dice)
• 3 outcomes, p1 = p2 = 0.49, p3 = 0.05
218
(roulette wheel)
2. Linear Congruential Generator.
A common technique is to generate a sequence of integers n_i, defined recursively by
n_i = (a n_{i−1}) mod m   (30)
for i = 1, 2, . . . , N, where n_0 (≠ 0) is called the seed, and a > 0 and m > 0 are integers
such that a and m have no common factors.
(30) generates a sequence of numbers in the set {1, 2, . . . , m − 1}.
Note that the n_i are periodic with period ≤ m − 1: there are at most m − 1 different
values, so two of n_0, . . . , n_{m−1} must be equal, giving n_i = n_{i+p} with p ≤ m − 1.
If the period is m − 1 then (30) is said to have full period.
The condition for full period is that m is a prime, a^{m−1} − 1 is divisible by m, and
a^j − 1 is not divisible by m for j = 1, . . . , m − 2.
Example. n_0 = 35, a = 13, m = 100.
Then the sequence is {35, 55, 15, 95, 35, . . .}. So p = 4.
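A minimal sketch of (30) in code; the struct name LCG and its interface are illustrative, and the parameters used in main are the common choice a = 16807, m = 2^31 − 1 discussed below.

#include <cstdint>
#include <iostream>

// Linear congruential generator n_i = (a n_{i-1}) mod m, cf. (30).
struct LCG {
    std::uint64_t a, m, n;              // multiplier, modulus, current state (seed)
    double next() {                     // returns x_i = n_i / m in (0, 1)
        n = (a * n) % m;
        return static_cast<double>(n) / static_cast<double>(m);
    }
};

int main() {
    LCG gen{16807, 2147483647ULL, 1};   // a = 16807, m = 2^31 - 1, seed n_0 = 1
    for (int i = 0; i < 5; ++i) std::cout << gen.next() << "\n";
}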
219
3. Pseudo–Uniform Random Numbers.
If we define
x_i = n_i / m ,
then x_i is a sequence of numbers in the interval (0, 1).
If (30) has full period then these x_i are called pseudo-U[0, 1] random numbers.
In view of the periodicity, the number m should be as large as possible, because a small
set of numbers makes the outcome easier to predict, in contrast to randomness.
The main drawback of linear congruential generators is that consecutive points lie on
parallel hyperplanes, which implies that the unit cube cannot be uniformly filled with
these points.
For example, if a = 6 and m = 11, then the ten distinct points (x_i, x_{i+1}) generated lie
on just two parallel lines in the unit square.
220
4. Choice of Parameters.
A good choice of a and m is given by a = 16807 and m = 2147483647 = 231 − 1.
The seed n0 can be any positive integer and can be chosen manually.
This allows us to repeatedly generate the same set of numbers, which may be useful
when we want to compare different simulation techniques.
In general, we let the computer choose n0 for us, a common choice is the computer’s
internal clock.
For details on the implementation of LCG, see Press et al. (1992).
For state of the art random number generators (linear and nonlinear), see the pLab
website at http://random.mat.sbg.ac.at/. It includes extensive information on
random number generation, as well as links to free software in a variety of computing
languages.
221
5. Normal Random Number Generators
Once we have generated a set of pseudo-U [0, 1] random numbers, we can generate
pseudo-N (0, 1) random numbers.
Again there are several methods to generate a sequence of independent N (0, 1) random numbers.
Note that if X is a N (0, 1) random variable, then Y = μ + σ X is a N (μ, σ 2) random
variable.
222
6. Convolution Method.
This method is based on the Central Limit Theorem.
We know that if the X_i are independent identically distributed random variables with
finite mean μ and finite variance σ², then
Z_n = ( Σ_{i=1}^{n} X_i − n μ ) / (σ √n)   (31)
converges in distribution to a N(0, 1) random variable, i.e.
P(Z_n ≤ x) → Φ(x),   n → ∞.
If X is U[0, 1] distributed then its mean is 1/2 and its variance is 1/12.
If we choose n = 12 then (31) simplifies to
Z_n = Σ_{i=1}^{12} X_i − 6 .   (32)
223
Note that Zn generated in this way is only an approximate normal random number.
This is due to the fact that the Central Limit Theorem only gives convergence in
distribution, not almost surely, and n should tend to infinity.
In practice there are hardly any differences between pseudo-N (0, 1) random numbers
generated by (32) and those generated by other techniques.
The disadvantage is that we need to generate 12 uniform random numbers to generate
1 normal random number, and this does not seem efficient.
224
7. Box–Muller Method.
This is a direct method: generate two independent U[0, 1] random numbers X_1 and X_2
and define
Z = h(X) ,
where X = (X_1, X_2) and Z = (Z_1, Z_2) are R² vectors and h : [0, 1]² → R² is the vector
function defined by
(z_1, z_2) = h(x_1, x_2) = ( √(−2 ln x_1) cos(2 π x_2) ,  √(−2 ln x_1) sin(2 π x_2) ) .
Then Z_1 and Z_2 are two independent N(0, 1) random numbers.
225
This result can be easily proved with the transformation method. Specifically, we can
find the inverse function h^{−1} by
(x_1, x_2) = h^{−1}(z_1, z_2) = ( e^{−(z_1² + z_2²)/2} ,  (1/(2π)) arctan(z_2/z_1) ) .
The absolute value of the determinant of the Jacobian matrix is
| ∂(x_1, x_2) / ∂(z_1, z_2) | = (1/√(2π)) e^{−z_1²/2} · (1/√(2π)) e^{−z_2²/2} .   (33)
From the transformation theorem for random variables and (33) we know that if X_1
and X_2 are independent U[0, 1] random variables, then Z_1 and Z_2 are independent
N(0, 1) random variables.
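A minimal sketch of the Box–Muller transform in code; the function name box_muller is an illustrative choice, and std::mt19937 merely stands in for any U[0, 1] source such as the LCG above.

#include <cmath>
#include <utility>

// Box-Muller: two independent U[0,1] samples x1, x2 (x1 > 0 assumed) give two
// independent N(0,1) samples  z1 = sqrt(-2 ln x1) cos(2 pi x2),
//                             z2 = sqrt(-2 ln x1) sin(2 pi x2).
std::pair<double, double> box_muller(double x1, double x2) {
    const double pi = 3.14159265358979323846;
    const double r  = std::sqrt(-2.0 * std::log(x1));
    return std::make_pair(r * std::cos(2.0 * pi * x2), r * std::sin(2.0 * pi * x2));
}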
226
8. Correlated Normal Distributions.
Assume X ∼ N(μ, Σ), where Σ ∈ R^{n×n} is symmetric positive definite.
To generate such a normal vector X do the following:
(a) Calculate the Cholesky decomposition Σ = L L^T.
(b) Generate n independent N(0, 1) random numbers Z_i and let Z = (Z_1, . . . , Z_n).
(c) Set X = μ + L Z.
Example.
To generate 2 correlated N(μ, σ²) RVs with correlation coefficient ρ, let
(Y_1, Y_2) = ( Z_1 ,  ρ Z_1 + √(1 − ρ²) Z_2 )
and then set
(X_1, X_2) = (μ + σ Y_1, μ + σ Y_2) .
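For the 2 × 2 case above, the Cholesky step reduces to the explicit formula for (Y_1, Y_2); a short sketch (the function name is an illustrative choice):

#include <cmath>
#include <utility>

// Two correlated N(mu, sigma^2) samples from two independent N(0,1) samples z1, z2,
// using the 2x2 Cholesky factor of the correlation matrix, cf. steps (a)-(c) above.
std::pair<double, double> correlated_pair(double z1, double z2,
                                          double mu, double sigma, double rho) {
    double y1 = z1;
    double y2 = rho * z1 + std::sqrt(1.0 - rho * rho) * z2;
    return std::make_pair(mu + sigma * y1, mu + sigma * y2);
}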
227
9. Inverse Transformation Method.
To generate a random variable X with known cdf F(x), the inverse transform method sets
X = F^{−1}(U) ,
where U is a U[0, 1] random variable (i.e. X satisfies F(X) = U ∼ U[0, 1]).
The inverse of F is well defined if F is strictly increasing and continuous; otherwise
we set
F^{−1}(u) = inf{x : F(x) ≥ u} .
Exercise.
Use the inversion method to generate X from a Rayleigh distribution, i.e.
F(x) = 1 − exp( −x² / (2 σ²) ) .
228
Inverse Transformation Method.
X = F^{−1}(U)   ⇒   P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = ∫_0^{F(x)} 1 du = F(x)   ⇒   X ∼ F.
Examples.
• Exponential with parameter a > 0: F(x) = 1 − e^{−a x}, x ≥ 0.
Let U ∼ U[0, 1] and
F(X) = U ⇐⇒ 1 − e^{−a X} = U ⇐⇒ X = −(1/a) ln(1 − U) ,
and since 1 − U ∼ U[0, 1] we may equally use X = −(1/a) ln U.
Hence X ∼ Exp(a).
229
• Arcsin law: Let B_t be a Brownian motion on the time interval [0, 1]. Let T =
arg max_{0≤t≤1} B_t. Then, for any t ∈ [0, 1] we have
P(T ≤ t) = (2/π) arcsin √t =: F(t) .
Similarly, let L = sup{t ≤ 1 : B_t = 0}. Then, for any s ∈ [0, 1] we have
P(L ≤ s) = (2/π) arcsin √s = F(s).
How do we generate from this distribution?
Let U ∼ U[0, 1] and
F(X) = U ⇐⇒ (2/π) arcsin √X = U ⇐⇒ X = sin²(π U / 2) = 1/2 − (1/2) cos(π U) .
Hence X ∼ F.
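In code, the exponential example reads, for instance (the function name is an illustrative choice):

#include <cmath>

// Inverse transform for Exp(a): X = -(1/a) ln U with U ~ U[0,1], U > 0,
// cf. the exponential example above.
double sample_exponential(double a, double u) {
    return -std::log(u) / a;
}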
230
10. Acceptance-Rejection Method.
Assume X has pdf f (x) on a set S, g is another pdf on S from which we know how
to generate samples, and f is dominated by g on S, i.e. there exists a constant c such
that
f (x) ≤ c g(x) ∀ x ∈ S .
The acceptance-rejection method generates a sample X from g and a sample U from
U[0, 1], accepts X as a sample from f if U ≤ f(X) / (c g(X)), and rejects X otherwise
and repeats the process.
Exercise.
Suppose the pdf f is defined over a finite interval [a, b] and is bounded by M . Use
the acceptance-rejection method to generate X from f .
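A sketch of one possible answer to this exercise, taking g to be the U[a, b] density so that c = M (b − a) and c g(x) = M; the function name and the use of std::mt19937 are illustrative assumptions.

#include <functional>
#include <random>

// Acceptance-rejection for a pdf f bounded by M on [a,b], using g = U[a,b]
// as the proposal, so c g(x) = M >= f(x). Repeats until a proposal is accepted.
double sample_by_rejection(const std::function<double(double)>& f,
                           double a, double b, double M, std::mt19937& gen) {
    std::uniform_real_distribution<double> Uab(a, b), U01(0.0, 1.0);
    while (true) {
        double x = Uab(gen);                 // proposal X ~ g
        if (U01(gen) <= f(x) / M) return x;  // accept if U <= f(X) / (c g(X))
    }
}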
231
Acceptance-Rejection Method.
Let Y be a sample returned by the algorithm.
Then Y has the same distribution of X conditional on U ≤
So for any measurable subset A ⊂ S, we have
P (Y ∈ A) = P (X ∈ A | U ≤
f (X)
)=
c g(X)
f (X)
c g(X) .
P (X ∈ A & U ≤
P (U ≤
f (X)
c g(X) )
f (X)
c g(X) )
.
Now, as U ∼ U [0, 1],
Z
Z
Z
f (X)
f (x)
f (x)
1
1
P (U ≤
f (x) dx = ,
) = P (U ≤
) g(x) dx =
g(x) dx =
c g(X)
c
g(x)
c
g(x)
c
c
S
S
S
which yields
Z
Z
f (x)
f (X)
f (x) dx .
)=c
g(x) dx =
P (Y ∈ A) = c P (X ∈ A & U ≤
c g(X)
A c g(x)
A
As A was chosen arbitrarily, we have that
Y ∼f.
232
11. Monte Carlo Method for Option Pricing.
A wide class of derivative pricing problems come down to the evaluation of the following expectation
E [f (Z(T ; t, z))] ,
where Z denotes the stochastic process that describes the price evolution of one or
more underlying financial variables such as asset prices and interest rates, under the
respective risk neutral probability distributions.
The process Z has the initial value z at time t.
The function f specifies the value of the derivative at the expiration time T .
Monte Carlo simulation is a powerful and versatile technique for estimating the expected value of a random variable.
Examples.
• E[ e^{−r (T−t)} (S_T − X)^+ ], where S_T = S_t e^{(r − σ²/2)(T−t) + σ √(T−t) Z}, Z ∼ N(0, 1).
• E[ e^{−r (T−t)} (max_{t ≤ τ ≤ T} S_τ − X)^+ ], where S_τ is as before.
233
12. Simulation Procedure.
(a) Generate sample paths of the underlying state variables (asset prices, interest
rates, etc.) according to risk neutral probability distributions.
(b) For each simulated sample path, evaluate discounted cash flows of the derivative.
(c) Take the sample average of the discounted cash flows over all sample paths.
C++ Exercise: Write a program to simulate paths of stock prices S satisfying the
discretized SDE
ΔS = r S Δt + σ S √Δt Z ,
where Z ∼ N (0, 1). The inputs are the initial asset price S, the time horizon T , the
number of partitions n (time-step Δt = Tn ), the interest rate r, the volatility σ, and
the number of simulations M . The output is a graph of paths of the stock price (or
the data needed to generate the graph).
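A minimal sketch of the path-generation part of this exercise; the function name simulate_path is illustrative, and writing the output for plotting is left to a driver.

#include <cmath>
#include <random>
#include <vector>

// Simulate one path of the discretized SDE  Delta S = r S Delta t + sigma S sqrt(Delta t) Z.
std::vector<double> simulate_path(double S0, double T, int n, double r,
                                  double sigma, std::mt19937& gen) {
    std::normal_distribution<double> Z(0.0, 1.0);
    const double dt = T / n;
    std::vector<double> S(n + 1);
    S[0] = S0;
    for (int i = 0; i < n; ++i)
        S[i + 1] = S[i] + r * S[i] * dt + sigma * S[i] * std::sqrt(dt) * Z(gen);
    return S;
}
// A driver would call simulate_path M times and write the paths to a file for plotting.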
234
13. Main Advantage and Drawback.
With the Monte Carlo approach it is easy to price complicated terminal payoff function such as path-dependent options.
Practitioners often use the brute force Monte Carlo simulation to study newly invented
options.
However, the method requires a large number of simulation trials to achieve a high
level of accuracy, which makes it less competitive compared to the binomial and finite
difference methods, when analytic properties of an option pricing model are better
known and formulated.
[Note: The CLT tells us that the Monte Carlo estimate is correct to order O(1/√M). So
to increase the accuracy by a factor of 10, we need to compute 100 times more paths.]
235
14. Computation Efficiency.
Suppose Wtotal is the total amount of computational work units available to generate
an estimate of the value of the option V . Assume there are two methods for generating MC estimates, requiring W1 and W2 units of computational work for each run.
Assume Wtotal is divisible by both W1 and W2. Denote by V1i and V2i the samples for
the estimator of V using method 1 and 2, and σ1 and σ2 their standard deviation.
The sample means for estimating V using Wtotal amount of work are, respectively,
N1
N2
X
X
1
1
V1i and
V2i
N1 i=1
N2 i=1
where Ni = WWtotal
, i = 1, 2. The law of large numbers tells us that the above two
i
estimators are approximately normal with mean V and variances
σ22
σ12
and
.
N1
N2
Hence, method 1 is preferred over method 2 provided that σ12 W1 ≤ σ22 W2.
236
15. Antithetic Variate Method.
Suppose {ε_i} are independent samples from N(0, 1) for the simulation of asset prices,
so that
S_T^i = S_t e^{(r − σ²/2)(T−t) + σ √(T−t) ε_i}
for i = 1, . . . , M, where M is the total number of simulation runs.
An unbiased estimator of the European call option price is given by
ĉ = (1/M) Σ_{i=1}^{M} c^i = (1/M) Σ_{i=1}^{M} e^{−r (T−t)} max(S_T^i − X, 0) .
237
We observe that {−ε_i} are also independent samples from N(0, 1), and therefore the
simulated prices
S̃_T^i = S_t e^{(r − σ²/2)(T−t) − σ √(T−t) ε_i}
for i = 1, . . . , M, are valid samples of the terminal asset price.
A new unbiased estimator is given by
c̃ = (1/M) Σ_{i=1}^{M} c̃^i = (1/M) Σ_{i=1}^{M} e^{−r (T−t)} max(S̃_T^i − X, 0) .
Normally we expect c^i and c̃^i to be negatively correlated, i.e. if one estimate overshoots
the true value, the other estimate undershoots it.
The antithetic variate estimate is defined to be
c̄ = (ĉ + c̃) / 2 .
The antithetic variate method improves the computational efficiency provided that
cov(c^i, c̃^i) ≤ 0.
[Reason: Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y)
⇒ Var(c̄) = (1/2) Var(ĉ) + (1/2) cov(ĉ, c̃).]
238
16. Control Variate Method.
Suppose A and B are two similar options and an analytic price formula for option B
is available.
Let V_A and V̂_A be the true and estimated values of option A, and let V_B and V̂_B be
the corresponding quantities for option B.
The control variate estimate of option A is defined to be
Ṽ_A = V̂_A + (V_B − V̂_B) ,
where the error V_B − V̂_B is used as an adjustment in the estimation of V_A.
The control variate method reduces the variance of the estimator of V_A when the
options A and B are strongly (positively) correlated.
239
The general control variate estimate of option A is defined to be
Ṽ_A^β = V̂_A + β (V_B − V̂_B) ,
where β ∈ R is a parameter.
The minimum variance of Ṽ_A^β is achieved when
β* = cov(V̂_A, V̂_B) / Var(V̂_B) .
Unfortunately, cov(V̂_A, V̂_B) is in general not available.
One may estimate β* using the regression technique from the simulated option values
V_A^i and V_B^i, i = 1, . . . , M.
Note: E(X + β (E(Y) − Y)) = E(X) + β (E(Y) − E(Y)) = E(X), and
Var(X + β (E(Y) − Y)) = Var(X − β Y) = Var(X) + β² Var(Y) − 2 β cov(X, Y) .
240
17. Other Variance Reduction Procedures.
Other methods include
• importance sampling that modifies distribution functions to make sampling more
efficient,
• stratified sampling that divides the distribution regions and takes samples from
these regions according to their probabilities,
• moment matching that adjusts the samples such that their moments are the same
as those of distribution functions,
• low-discrepancy sequences that lead to an estimation error proportional to 1/M rather
than 1/√M, where M is the sample size.
For details of these methods see Hull (2005) or Glasserman (2004).
241
18. Pricing European Options.
If the payoff is a function of the underlying asset at one specific time, then we can
simplify the above Monte Carlo procedure.
For example, a European call option has a payoff max(S(T ) − X, 0) at expiry.
If we assume that S follows a lognormal process, then
(r− 12 σ 2 ) T +σ
S(T ) = S0 e
√
TZ
(34)
where Z is a standard N (0, 1) random variable.
To value the call option, we only need to simulate Z to get samples of S(T ).
There is no need to simulate the whole path of S.
The computational work load will be considerably reduced.
Of course, if we want to price path-dependent European options, we will have to
simulate the whole path of the asset process S.
242
C++ Exercise: Write a program to price European call and put options using the
Monte Carlo simulation with the antithetic variate method. The standard normal
random variables can be generated by first generating U [0, 1] random variables and
then using the Box–Muller method. The terminal asset prices can be generated by
(34). The inputs are the current asset price S0, the exercise price X, the volatility
σ, the interest rate r, the exercise time T , and the number of simulations M . The
outputs are European call and put prices at time t = 0 (same strike price).
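For reference, a minimal Box–Muller sketch in C++ (the basic, non-polar form); std::rand() is used only to keep the sketch short, and a better uniform generator should be substituted in practice:

#include <cmath>
#include <cstdlib>
#include <utility>

// Turns two independent U(0,1) draws into two independent N(0,1) draws.
std::pair<double, double> boxMuller() {
    const double PI = 3.14159265358979323846;
    double u1 = (std::rand() + 1.0) / (RAND_MAX + 2.0);   // shifted to avoid u1 = 0
    double u2 = (std::rand() + 1.0) / (RAND_MAX + 2.0);
    double rho = std::sqrt(-2.0 * std::log(u1));
    return { rho * std::cos(2.0 * PI * u2), rho * std::sin(2.0 * PI * u2) };
}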
243
10. American Option Pricing
1. Formulation.
A general class of American option pricing problems can be formulated by specifying
a Markov process {X(t), 0 ≤ t ≤ T } representing relevant financial variables such as
underlying asset prices, an option payoff h(X(t)) at time t, an instantaneous short
rate process {r(t), 0 ≤ t ≤ T }, and a class of admissible stopping times T with
values in [0, T ].
The American option pricing problem is to find the optimal expected discounted
payoff
sup_{τ ∈ T} E[ e^{−∫_0^τ r(u) du} h(X(τ)) ] .
It is implicit that the expectation is taken with respect to the risk-neutral measure.
In this course we assume that the short rate r(t) = r, a non-negative constant for
0 ≤ t ≤ T.
244
For example, if the option can only be exercised at times 0 < t1 < t2 < ∙ ∙ ∙ < tm =
T (this type of option is often called the Bermudan option), then the value of an
American put can be written as
sup_{τ ∈ T} E[ e^{−r τ} (K − S(τ))^+ ] ,
where the supremum is over stopping times τ taking values in {t_1, . . . , t_m}, K is the exercise price, S(t_i) the underlying asset price at time t_i, and r the risk-free interest rate.
245
American Option
In general, the value of an American option is higher than that of the corresponding
European option.
However, an American call for a non-dividend paying asset has the same value as the
European call.
This follows from the fact that in this case the optimal exercise time is always the
expiration time T , as can be seen from the following argument.
Suppose the holder wants to exercise the call at time t < T , when S(t) > X. Exercising
would give a profit of S(t) − X. Instead, one could keep the option and sell the asset
short at time t, then purchase the asset at time T by either
(a) exercising the option at time t = T or
(b) buying at the market price at time T .
Hence the holder gained an amount S(t) > X at time t and paid out min(S(T ), X) ≤ X
at time T . This is better than just S(t) − X at time t.
246
2. Dynamic Programming Formulation.
Let hi denote the payoff function for exercise at time ti, Vi(x) the value of the option
at ti given Xi = x, assuming the option has not previously been exercised.
We are interested in V0(X0).
This value is determined recursively as follows:
Vm(x) = hm(x) ,
Vi−1(x) = max{hi−1(x), Ci−1(x)},
i = m, . . . , 1 ,
where hi(x) is the immediate exercise value at time ti,
Ci−1(x) = E[Di Vi(Xi) | Xi−1 = x]
is the expected discounted continuation value at time ti−1, and
Di = e−r (ti−ti−1)
is the discount factor over [ti−1, ti].
247
3. Stopping Rule.
The optimal stopping time τ is determined by
τ = min{ti : hi(Xi) ≥ Ci(Xi), i = 1, . . . , m}.
If Ĉ_i(x) is an approximation to C_i(x), then the sub-optimal stopping time τ̂ is determined by
τ̂ = min{t_i : h_i(X_i) ≥ Ĉ_i(X_i), i = 1, . . . , m}.
248
4. Binomial Method.
At the terminal exercise time t_m = T, the American option value at node j is given by V_j^m = h_j^m, j = 0, 1, . . . , m, where h_j^i = h(S_j^i) is the intrinsic value at node (i, j), e.g. h(s) = (K − s)^+.
At time t_i, i = m − 1, . . . , 0, the option value at node j is given by
V_j^i = max{ h_j^i , (1/R) [ p V_{j+1}^{i+1} + (1 − p) V_j^{i+1} ] } .        (35)
Dynamic programming is used to find the option value at time t_0.
In (35) we have p = (R − d)/(u − d), and the values of u, d and R are determined as usual.
249
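A minimal C++ sketch of this recursion for an American put is given below; the CRR choice u = e^{σ√Δt}, d = 1/u, R = e^{r Δt} is an assumption of the sketch, standing in for "as usual":

#include <algorithm>
#include <cmath>
#include <vector>

double americanPutBinomial(double S0, double K, double r, double sigma,
                           double T, int m) {
    double dt = T / m;
    double u  = std::exp(sigma * std::sqrt(dt));   // assumed CRR parametrisation
    double d  = 1.0 / u;
    double R  = std::exp(r * dt);                  // one-period growth factor
    double p  = (R - d) / (u - d);

    // terminal values V_j^m = (K - S_j^m)^+
    std::vector<double> V(m + 1);
    for (int j = 0; j <= m; ++j)
        V[j] = std::max(K - S0 * std::pow(u, j) * std::pow(d, m - j), 0.0);

    // backward recursion (35): compare intrinsic value with discounted continuation
    for (int i = m - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j) {
            double cont = (p * V[j + 1] + (1.0 - p) * V[j]) / R;
            double S_ij = S0 * std::pow(u, j) * std::pow(d, i - j);
            V[j] = std::max(K - S_ij, cont);
        }
    return V[0];
}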
5. Finite Difference Method.
The dynamic programming approach cannot be applied with the implicit or Crank–
Nicolson scheme.
Suppose the difference equation with an implicit scheme has the form
a_{j−1} V_{j−1} + a_j V_j + a_{j+1} V_{j+1} = d_j        (36)
for j = 1, . . . , N − 1, where the superscript “n + 1” is omitted for brevity and dj
represents the known quantities.
Recall that the SOR algorithm for solving (36) is given by
V_j^(k) = V_j^(k−1) + (ω/a_j) [ d_j − a_{j−1} V_{j−1}^(k) − a_j V_j^(k−1) − a_{j+1} V_{j+1}^(k−1) ]
for j = 1, . . . , N − 1 and k = 1, 2, . . ..
250
Let e.g. h_j = (S_j^{n+1} − K)^+ be the intrinsic value of the American option at node (j, n + 1).
The solution procedure to find V_j^{n+1} is then
V_j^(k) = max{ h_j , V_j^(k−1) + (ω/a_j) ( d_j − a_{j−1} V_{j−1}^(k) − a_j V_j^(k−1) − a_{j+1} V_{j+1}^(k−1) ) }        (37)
for j = 1, . . . , N − 1 and k = 1, 2, . . ..
The procedure (37) is called the projected SOR scheme.
251
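A minimal C++ sketch of one projected SOR sweep is given below; the vector names and the in-place update of V are assumptions of the sketch:

#include <algorithm>
#include <vector>

void projectedSorSweep(std::vector<double>& V,           // current iterate, updated in place
                       const std::vector<double>& aSub,  // coefficients a_{j-1}
                       const std::vector<double>& aDiag, // coefficients a_j
                       const std::vector<double>& aSup,  // coefficients a_{j+1}
                       const std::vector<double>& d,     // right-hand sides d_j
                       const std::vector<double>& h,     // intrinsic values h_j
                       double omega) {
    // interior nodes j = 1, ..., N-1; V[0] and V[N] hold the boundary values
    for (std::size_t j = 1; j + 1 < V.size(); ++j) {
        double residual = d[j] - aSub[j] * V[j - 1] - aDiag[j] * V[j] - aSup[j] * V[j + 1];
        double gaussSeidel = V[j] + omega * residual / aDiag[j];
        V[j] = std::max(h[j], gaussSeidel);   // projection onto the constraint V >= h
    }
}

The sweep is repeated for k = 1, 2, . . . until the change between iterates is below a tolerance.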
6. Random Tree Method.
This is a Monte Carlo method based on simulating a tree of paths of the underlying
Markov chain X0, X1, . . . , Xm.
Fix a branching parameter b ≥ 2.
From the initial state X_0 simulate b independent successor states X_1^1, . . . , X_1^b, all having the law of X_1.
From each X_1^i simulate b independent successors X_2^{i1}, . . . , X_2^{ib} from the conditional law of X_2 given X_1 = X_1^i.
From each X_2^{i_1 i_2} generate b successors X_3^{i_1 i_2 1}, . . . , X_3^{i_1 i_2 b}, and so on.
A generic node in the tree at time step i is denoted by X_i^{j_1 j_2 ··· j_i}.
At the m-th time step there are b^m nodes, so the computational cost grows exponentially.
(E.g. if b = 5 and m = 5, there are 3125 nodes; if b = 5 and m = 10, there are about 10 million nodes.)
252
7. High Estimator.
Write V̂_i^{j_1 j_2 ··· j_i} for the value of the high estimator at node X_i^{j_1 j_2 ··· j_i}.
At the terminal nodes we set
V̂_m^{j_1 j_2 ··· j_m} = h_m(X_m^{j_1 j_2 ··· j_m}) .
Working backwards we get
V̂_i^{j_1 j_2 ··· j_i} = max{ h_i(X_i^{j_1 j_2 ··· j_i}) , (1/b) Σ_{j=1}^{b} D_{i+1} V̂_{i+1}^{j_1 j_2 ··· j_i j} } .
V̂_0 is biased high in the sense that
E[V̂_0] ≥ V_0(X_0) ,
and it converges in probability and in norm to the true value V_0(X_0) as b → ∞.
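A minimal recursive C++ sketch of the high estimator is given below; the one-step sampler step, the payoff h and the constant discount factor D (= e^{−r Δt}, assuming equally spaced exercise dates and a time-homogeneous chain) are placeholders supplied by the caller:

#include <algorithm>
#include <functional>

double highEstimator(double x, int i, int m, int b, double D,
                     const std::function<double(double)>& h,     // payoff h_i (same at all dates here)
                     const std::function<double(double)>& step) { // step(x) samples the next state given x
    if (i == m) return h(x);                 // terminal nodes: V^_m = h_m(X_m)
    double cont = 0.0;
    for (int j = 0; j < b; ++j)              // average over b independent successors
        cont += D * highEstimator(step(x), i + 1, m, b, D, h, step);
    cont /= b;
    return std::max(h(x), cont);             // compare exercise with estimated continuation
}

The recursion visits all b^m tree nodes, which illustrates the exponential cost noted above.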
253
8. Low Estimator.
Write v̂_i^{j_1 j_2 ··· j_i} for the value of the low estimator at node X_i^{j_1 j_2 ··· j_i}.
At the terminal nodes we set
v̂_m^{j_1 j_2 ··· j_m} = h_m(X_m^{j_1 j_2 ··· j_m}) .
Working backwards, for k = 1, . . . , b, we set
v̂_{i,k}^{j_1 j_2 ··· j_i} = h_i(X_i^{j_1 j_2 ··· j_i})        if (1/(b−1)) Σ_{j=1, j≠k}^{b} D_{i+1} v̂_{i+1}^{j_1 j_2 ··· j_i j} ≤ h_i(X_i^{j_1 j_2 ··· j_i}) ,
v̂_{i,k}^{j_1 j_2 ··· j_i} = D_{i+1} v̂_{i+1}^{j_1 j_2 ··· j_i k}        otherwise;
we then set
v̂_i^{j_1 j_2 ··· j_i} = (1/b) Σ_{k=1}^{b} v̂_{i,k}^{j_1 j_2 ··· j_i} .
v̂0 is biased low and converges in probability and in norm to V0(X0) as b → ∞.
254
Example: High Estimator and Low Estimator.
[Figure: a random tree of simulated asset prices, with the call payoff in parentheses at each node and the resulting high and low estimator values backed up through the tree. Exercise price K = 100, interest rate r = 0. Legend: asset price (call price).]
255
9. Confidence Interval.
Let V̄_0(M) denote the sample mean of the M replications of V̂_0, s_V(M) the sample standard deviation, and z_{δ/2} the 1 − δ/2 quantile of the standard normal distribution. Then
V̄_0(M) ± z_{δ/2} s_V(M)/√M
provides an asymptotically valid (for large M) 1 − δ confidence interval for E[V̂_0].
Similarly, we can get a confidence interval for E[v̂_0].
The interval
[ v̄_0(M) − z_{δ/2} s_v(M)/√M , V̄_0(M) + z_{δ/2} s_V(M)/√M ]
contains the unknown value V_0(X_0) with probability of at least 1 − δ.
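A minimal C++ sketch of the interval computation (the quantile z_{δ/2} is passed in, e.g. z ≈ 1.96 for δ = 0.05; names are illustrative):

#include <cmath>
#include <vector>

struct ConfidenceInterval { double mean, halfWidth; };   // interval is mean +/- halfWidth

ConfidenceInterval confidenceInterval(const std::vector<double>& samples, double z) {
    double M = static_cast<double>(samples.size());
    double mean = 0.0;
    for (double x : samples) mean += x;
    mean /= M;
    double var = 0.0;
    for (double x : samples) var += (x - mean) * (x - mean);
    var /= (M - 1.0);                               // sample variance
    return { mean, z * std::sqrt(var / M) };
}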
256
10. Regression Method.
Assume that the continuation value can be expressed as
E(V_{i+1}(X_{i+1}) | X_i = x) = Σ_{r=1}^{n} β_{ir} ψ_r(x) = β_i^T ψ(x) ,        (38)
where β_i = (β_{i1}, . . . , β_{in})^T, ψ(x) = (ψ_1(x), . . . , ψ_n(x))^T, and ψ_r, r = 1, . . . , n, are basis functions (e.g. the polynomials 1, x, x², . . .). Then the vector β_i is given by
β_i = B_ψ^{−1} b_{ψV} ,
where B_ψ = E[ψ(X_i) ψ(X_i)^T] is an n × n matrix (assumed to be non-singular), b_{ψV} = E[ψ(X_i) V_{i+1}(X_{i+1})] is an n-dimensional column vector, and the expectation is over the joint distribution of (X_i, X_{i+1}).
257
The regression method can be used to estimate β_i as follows:
• generate b independent paths (X_{1j}, . . . , X_{mj}), j = 1, . . . , b, and
• suppose that the values V_{i+1}(X_{i+1,j}) are known; then
β̂_i = B̂_ψ^{−1} b̂_{ψV} ,
where B̂_ψ is an n × n matrix with (q, r) entry
(1/b) Σ_{j=1}^{b} ψ_q(X_{ij}) ψ_r(X_{ij})
and b̂_{ψV} is an n-vector with r-th entry
(1/b) Σ_{j=1}^{b} ψ_r(X_{ij}) V_{i+1}(X_{i+1,j}) .
However, V_{i+1} is unknown and must be replaced by some estimated value V̂_{i+1}.
258
11. Pricing Algorithm.
(a) Generate b independent paths {X_{0j}, X_{1j}, . . . , X_{mj}}, j = 1, . . . , b, of the Markov chain, where X_{0j} = X_0, j = 1, . . . , b.
(b) Set V̂_{mj} = h_m(X_{mj}), j = 1, . . . , b.
(c) For i = m − 1, . . . , 0, and given estimated values V̂_{i+1,j}, j = 1, . . . , b, use the regression method to calculate β̂_i and set V̂_{ij} = max{h_i(X_{ij}), D_{i+1} β̂_i^T ψ(X_{ij})}.
(d) Set V̂_0 = (1/b) Σ_{j=1}^{b} V̂_{0j}.
The regression algorithm is biased high, and V̂_0 converges to V_0(X_0) as b → ∞ if the relation (38) holds for all i = 0, . . . , m − 1.
259
12. Longstaff–Schwartz Algorithm.
Same as 10.11, except in computing V̂_{ij}:
V̂_{ij} = h_i(X_{ij})        if D_{i+1} β̂_i^T ψ(X_{ij}) ≤ h_i(X_{ij}) ,
V̂_{ij} = D_{i+1} V̂_{i+1,j}        otherwise.
The LS algorithm is biased low, and V̂_0 converges to V_0(X_0) as b → ∞ if the relation (38) holds for all i = 0, . . . , m − 1.
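A minimal C++ sketch of the regression step used in 10.10 and 10.12 with the two basis functions ψ = (1, x), i.e. an ordinary least-squares fit; with a larger basis one would instead solve the n × n normal equations numerically. Names are illustrative:

#include <vector>

struct Beta2 { double b0, b1; };   // continuation value approximated by b0 + b1 * x

Beta2 regressContinuation(const std::vector<double>& X,   // X_{ij}, j = 1..b
                          const std::vector<double>& Y) { // V^_{i+1,j}, j = 1..b
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    double b = static_cast<double>(X.size());
    for (std::size_t j = 0; j < X.size(); ++j) {
        sx += X[j]; sy += Y[j]; sxx += X[j] * X[j]; sxy += X[j] * Y[j];
    }
    double b1 = (b * sxy - sx * sy) / (b * sxx - sx * sx);
    double b0 = (sy - b1 * sx) / b;
    return { b0, b1 };
}

The estimated continuation value at X_{ij} = x is then D_{i+1} (b0 + b1 x), as in step (c) above.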
13. Other Methods. Parametric approximation, state-space partitioning, stochastic
mesh method, duality algorithm, and obstacle (free boundary-value) problem formulation.
(See Glasserman (2004) and Seydel (2006) for details.)
260
11. Exotic Option Pricing
1. Introduction.
Derivatives with more complicated payoffs than the standard European or American calls and puts are called exotic options; many of them are so-called path-dependent options.
Some common exotic options are
• Asian option: Average price call and put, average strike call and put. Arithmetic
or geometric average can be taken.
• Barrier option: Up/down-and-in/out call/put. Double barrier, partial barrier
and other time dependent effects possible.
• Range forward contract: It is a portfolio of a long call with a higher strike price
and a short put with a lower strike price.
• Bermudan option: It is a nonstandard American option in which early exercise
is restricted to certain fixed times.
261
• Compound option: It is an option on an option. There are four basic forms:
call-on-call, call-on-put, put-on-call, put-on-put.
• Chooser option: The option holders have the right to choose whether the option
is a call or a put.
• Binary (digital) option: The terminal payoff is a fixed amount of cash if the underlying asset price is above the strike price, and nothing otherwise.
• Lookback option: The terminal payoff depends on the maximum or minimum
realized asset price over the life of the option.
262
Examples.
Range forward contract: Long call and short put
Assume you have a long position in a call with strike price X_2 and a short position in a put with strike price X_1 < X_2. Then the terminal payoff is given by
(S − X_2)^+ − (X_1 − S)^+ =  S − X_1  if S ≤ X_1 ,   0  if X_1 ≤ S ≤ X_2 ,   S − X_2  if S ≥ X_2 .
[Figure: payoff diagram of the range forward contract as a function of S; the payoff is negative below X_1, zero between X_1 and X_2, and positive above X_2.]
Buy this contract if you do not think that the asset price will go down:
⇒ you can generate cash from a price increase.
263
Option price = Call price - Put price
264
Compound option: call on a call option.
[Timeline: valuation date t, first exercise date T_1, second exercise date T_2.]
C̃(T_2) = (S(T_2) − X_2)^+ ,
C̃(T_1) ≡ C̃(S(T_1), X_2, T_1, . . .)   (the value at T_1 of the underlying call) ,
C(T_1) = (C̃(T_1) − X_1)^+ ,
C(t) = ?
Binary (digital) option (paying a fixed cash amount Q if S_T ≥ X):
Price = E[e^{−r T} Q 1_{S_T ≥ X}] = e^{−r T} Q E[1_{S_T ≥ X}] = e^{−r T} Q P(S_T ≥ X) .
The price can easily be computed, since S_T is lognormally distributed.
265
Lookback Option
Let
m = min_{t ≤ u ≤ T} S(u)   and   M = max_{t ≤ u ≤ T} S(u) .
• (Floating strike) Call: (S_T − m)^+.
• (Floating strike) Put: (M − S_T)^+.
• Fixed strike Call: (M − X)^+.
• Fixed strike Put: (X − m)^+.
266
2. Barrier Options.
The existence of options at the expiration date depends on whether the underlying
asset prices have crossed certain values (barriers).
There are four basic forms: down-and-out, up-and-out, down-and-in, up-and-in.
If there is only one barrier then we can derive analytic formulas for standard European
barrier options.
However, there are many situations (two barriers, American options, etc.), in which
we have to rely on numerical procedures to find the option values.
The standard binomial and trinomial schemes can be used for this purpose, but their convergence is very slow, because the barriers assumed by the tree differ from the true barriers.
267
[Figure: barriers assumed by binomial trees; the tree nodes define an inner and an outer barrier, with the true barrier lying between them.]
The usual tree calculations implicitly assume that the outer barrier is the true barrier.
268
[Figure: barriers assumed by trinomial trees; again the true barrier lies between the inner and outer barriers defined by the tree nodes.]
269
3. Positioning Nodes on Barriers.
Assume that there are two barriers H1 and H2 with H1 < H2.
Assume that the trinomial scheme is used for the option pricing with m = 1 and d = 1/u.
Choose u such that nodes lie on both barriers:
u must satisfy H_2 = H_1 u^N (and hence H_1 = H_2 d^N), or equivalently
ln H_2 = ln H_1 + N ln u ,
with N an integer.
It is known that u = e^{σ √(3 Δt)} is a good choice in the standard trinomial tree; recall the Hull–White model (7.20).
270
A good choice of N is therefore
N = int[ (ln H_2 − ln H_1) / (σ √(3 Δt)) + 0.5 ]
and u is then determined by
u = exp( (1/N) ln(H_2/H_1) ) .
Normally, the trinomial tree is constructed so that the central node is the initial asset price S. In this case, the asset price at the first node is the initial asset price. After that, we choose the central node of the tree (beginning with the first period) to be H_1 u^M, where M is an integer that makes this quantity as close as possible to S, i.e.
M = int[ (ln S − ln H_1) / ln u + 0.5 ] .
271
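A minimal C++ sketch of these three formulas (names are illustrative):

#include <cmath>

struct BarrierGrid { int N; double u; int M; };

BarrierGrid positionNodes(double H1, double H2, double sigma, double dt, double S) {
    int    N = static_cast<int>(std::log(H2 / H1) / (sigma * std::sqrt(3.0 * dt)) + 0.5);
    double u = std::exp(std::log(H2 / H1) / N);                         // nodes H1 * u^k hit both barriers
    int    M = static_cast<int>(std::log(S / H1) / std::log(u) + 0.5);  // central level closest to S
    return { N, u, M };
}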
[Figure: trinomial tree with node levels H_1 u^k (k = 0, 1, 2, . . .), so that rows of nodes lie exactly on each of the two barriers H_1 and H_2 and the central level H_1 u^M is close to the spot price S. Caption: Tree with nodes lying on each of two barriers.]
272
4. Adaptive Mesh Model.
Computational efficiency can be greatly improved if one projects a high resolution
tree onto a low resolution tree in order to achieve a more detailed modelling of the
asset price in some regions of the tree.
E.g. to price a standard American option, it is useful to have a high resolution tree
near its maturity around the strike price.
To price a barrier option it is useful to have a high resolution tree near its barriers.
See Hull (2005) for details.
273
5. Asian Options.
The terminal payoff depends on some form of averaging of the underlying asset prices
over a part or the whole of the life of the option.
There are two types of terminal payoffs:
• max(A − X, 0) (average price call) and
• max(ST − A, 0) (average strike call),
where A is the average of asset prices, and there are many different ways of computing
A.
From now on we focus on pricing average price calls; the results for average strike calls can be obtained similarly.
We also assume that the asset price S follows a lognormal process in a risk-neutral
world.
274
6. Continuous Arithmetic Average.
The average A is computed as
A = (1/T) I(T) ,
where the function I(t) is defined by
I(t) = ∫_0^t S(u) du .
Assume that V(S, I, t) is the option value at time t. Then V satisfies the following diffusion equation:
∂V/∂t + (1/2) σ² S² ∂²V/∂S² + r S ∂V/∂S + S ∂V/∂I − r V = 0        (39)
with suitable boundary and terminal conditions.
If finite difference methods are used to solve (39), the schemes are prone to serious oscillations and implicit schemes perform poorly, because a diffusion term is missing in one direction of this two-dimensional PDE.
275
7. Continuous Geometric Average.
The average A is computed as
A = exp( (1/T) I(T) ) ,
where the function I(t) is defined by
I(t) = ∫_0^t ln S(u) du .
The option value V satisfies the following diffusion equation:
∂V/∂t + (1/2) σ² S² ∂²V/∂S² + r S ∂V/∂S + ln S ∂V/∂I − r V = 0 .        (40)
Exercise. Show that if we define a new variable
y = (I + (T − t) ln S) / T
and seek a solution of the form V(S, I, t) = F(y, t), then the equation (40) can be reduced to a one-state-variable PDE:
∂F/∂t + (1/2) σ² ((T − t)/T)² ∂²F/∂y² + (r − σ²/2) ((T − t)/T) ∂F/∂y − r F = 0 .
276
8. Discrete Geometric Average: Case t ≥ T0.
Assume that the averaging period is [T_0, T] and that the sampling points are t_i = T_0 + i Δt, i = 1, 2, . . . , n, with Δt = (T − T_0)/n.
The average A is computed as
A = ( ∏_{i=1}^{n} S(t_i) )^{1/n} .
Assume t ≥ T_0, i.e. the current time is already within the averaging period, and that
t = t_k + α Δt
for some integer k with 0 ≤ k ≤ n − 1 and some 0 ≤ α < 1.
277
Then ln A is a normal random variable with mean ln S̃(t) + μ_A and variance σ_A², where
S̃(t) = [S(t_1) ··· S(t_k)]^{1/n} S(t)^{(n−k)/n} ,
μ_A = (r − σ²/2) (T − T_0) [ (n − k)(1 − α)/n² + (n − k − 1)(n − k)/(2 n²) ] ,
σ_A² = σ² (T − T_0) [ (n − k)²(1 − α)/n³ + (n − k − 1)(n − k)(2n − 2k − 1)/(6 n³) ] .
The European call price at time t is
c(S, t) = e^{−r (T − t)} [ S̃(t) e^{μ_A + σ_A²/2} Φ(d_1) − X Φ(d_2) ]
where
d_1 = (1/σ_A) [ ln(S̃(t)/X) + μ_A ] + σ_A   and   d_2 = d_1 − σ_A .
278
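A minimal C++ sketch of this pricing formula; the parameter names and the way the observed history enters (via the geometric mean of S(t_1), . . . , S(t_k), passed as pastGeoMean) are assumptions of the sketch:

#include <cmath>

double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

double discreteGeoAvgCall(double S, double pastGeoMean, int n, int k, double alpha,
                          double X, double r, double sigma, double T0, double T,
                          double t) {
    double nn = n, kk = k;
    // S~(t) = [S(t_1)...S(t_k)]^{1/n} S(t)^{(n-k)/n}
    double Stilde = std::pow(pastGeoMean, kk / nn) * std::pow(S, (nn - kk) / nn);
    double muA = (r - 0.5 * sigma * sigma) * (T - T0)
                 * ((nn - kk) * (1.0 - alpha) / (nn * nn)
                    + (nn - kk - 1.0) * (nn - kk) / (2.0 * nn * nn));
    double varA = sigma * sigma * (T - T0)
                  * ((nn - kk) * (nn - kk) * (1.0 - alpha) / (nn * nn * nn)
                     + (nn - kk - 1.0) * (nn - kk) * (2.0 * nn - 2.0 * kk - 1.0)
                       / (6.0 * nn * nn * nn));
    double sigA = std::sqrt(varA);
    double d1 = (std::log(Stilde / X) + muA) / sigA + sigA;
    double d2 = d1 - sigA;
    return std::exp(-r * (T - t))
           * (Stilde * std::exp(muA + 0.5 * varA) * normCdf(d1) - X * normCdf(d2));
}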
Proof.
Recall t = t_k + α Δt, Δt = (T − T_0)/n, 0 ≤ k ≤ n − 1, 0 ≤ α < 1. Writing R_i = S(t_i)/S(t_{i−1}) for i = k + 2, . . . , n and R_t = S(t_{k+1})/S(t), we have
A = [S(t_1) ··· S(t_n)]^{1/n}
  = [ (S(t_n)/S(t_{n−1})) (S(t_{n−1})/S(t_{n−2}))² ··· (S(t_{k+1})/S(t))^{n−k} S(t)^{n−k} S(t_k) ··· S(t_1) ]^{1/n}
  = [ R_n R_{n−1}² ··· R_{k+2}^{n−k−1} R_t^{n−k} ]^{1/n} S̃(t) ,
where the factor S̃(t) = [S(t_1) ··· S(t_k) S(t)^{n−k}]^{1/n} is known at time t. Hence
ln A = ln S̃(t) + (1/n) [ ln R_n + 2 ln R_{n−1} + . . . + (n − k − 1) ln R_{k+2} + (n − k) ln R_t ] .
Since S is a lognormal process (dS = r S dt + σ S dW), the increments ln R_n, . . . , ln R_{k+2}, ln R_t are independent and normally distributed with
ln R_n, . . . , ln R_{k+2} ∼ N(μ Δt, σ² Δt) ,
ln R_t ∼ N(μ (t_{k+1} − t), σ² (t_{k+1} − t)) = N(μ (1 − α) Δt, σ² (1 − α) Δt) ,
where μ = r − σ²/2.
279
Hence ln A is normal with mean
E[ln A] = E[ln S̃(t)] + (1/n) E[ ln R_n + 2 ln R_{n−1} + . . . + (n − k − 1) ln R_{k+2} + (n − k) ln R_t ]
        = ln S̃(t) + (1/n) [ μ Δt + 2 μ Δt + . . . + (n − k − 1) μ Δt + (n − k) μ (1 − α) Δt ]
        = ln S̃(t) + μ Δt (1/n) [ 1 + 2 + . . . + (n − k − 1) + (n − k)(1 − α) ]
        = ln S̃(t) + μ Δt [ (n − k − 1)(n − k)/(2n) + (n − k)(1 − α)/n ] = ln S̃(t) + μ_A .
Moreover, the variance of ln A is given by
Var(ln A) = (1/n²) [ Var(ln R_n) + 2² Var(ln R_{n−1}) + . . . + (n − k − 1)² Var(ln R_{k+2}) + (n − k)² Var(ln R_t) ]
          = (1/n²) σ² Δt [ 1² + 2² + . . . + (n − k − 1)² + (n − k)² (1 − α) ]
          = σ² Δt [ (n − k − 1)(n − k)(2n − 2k − 1)/(6 n²) + (n − k)² (1 − α)/n² ] = σ_A² .
[Used: Σ_{i=1}^{n} i = n (n + 1)/2 and Σ_{i=1}^{n} i² = n (n + 1)(2n + 1)/6.]
280
As A = e^{ln A}, A is lognormally distributed, and the important formula in 5.5 tells us that
E[(A − X)^+] = E(A) Φ(d_1) − X Φ(d_2) ,
where
d_1 = (1/s) ln(E(A)/X) + s/2   and   d_2 = d_1 − s ,   with   s² = Var(ln A).
Now
ln A ∼ N(ln S̃(t) + μ_A, σ_A²)
and so, on recalling 5.4,
E(A) = S̃(t) e^{μ_A + σ_A²/2} .
Combining gives
E[(A − X)^+] = S̃(t) e^{μ_A + σ_A²/2} Φ(d_1) − X Φ(d_2) ,
where
d_1 = (1/σ_A) [ ln(S̃(t)/X) + μ_A + σ_A²/2 ] + σ_A/2 = (1/σ_A) [ ln(S̃(t)/X) + μ_A ] + σ_A
281
and d2 = d1 − σA.
Finally, the Asian (European) call price is e−r (T −t) E[(A − X)+].
282
Aside: derivation of Σ_{i=1}^{n} i = n (n + 1)/2 and Σ_{i=1}^{n} i² = n (n + 1)(2n + 1)/6.
Let f(x) = x. Then
1/2 = ∫_0^1 f(x) dx = Σ_{i=1}^{n} ∫_{(i−1)/n}^{i/n} x dx = Σ_{i=1}^{n} (1/2) [ (i/n)² − ((i−1)/n)² ]
    = Σ_{i=1}^{n} (1/n²) (i − 1/2) = (1/n²) Σ_{i=1}^{n} i − 1/(2n) .
Hence
(1/n²) Σ_{i=1}^{n} i = 1/2 + 1/(2n) = (n + 1)/(2n)   ⇒   Σ_{i=1}^{n} i = n (n + 1)/2 .
Similarly, for f(x) = x² we obtain
1/3 = ∫_0^1 f(x) dx = Σ_{i=1}^{n} ∫_{(i−1)/n}^{i/n} x² dx = . . .
283
9. Discrete Geometric Average: Case t < T0.
Exercise.
Assume
t < T0 ,
i.e. the current time is before the averaging period.
Show that ln A is a normal random variable with mean ln S(t) + μ_A and variance σ_A², where
μ_A = (r − σ²/2) [ (T_0 − t) + ((n + 1)/(2n)) (T − T_0) ] ,
σ_A² = σ² [ (T_0 − t) + ((n + 1)(2n + 1)/(6 n²)) (T − T_0) ] .
The European call price at time t is
c(S, t) = e^{−r (T − t)} [ S e^{μ_A + σ_A²/2} Φ(d_1) − X Φ(d_2) ] ,
where d_1 = (1/σ_A) [ ln(S/X) + μ_A ] + σ_A and d_2 = d_1 − σ_A.
284
10. Limiting Case n = 1.
If n = 1 then A = S(T ).
Furthermore,
μ_A = (r − σ²/2) (T − t) ,
σ_A² = σ² (T − t) .
The call price reduces to the Black–Scholes call price.
285
A = ( ∏_{i=1}^{1} S(t_i) )^{1/1} = S(t_1) = S(T), independently of the choice of T_0.
Hence choose e.g. T_0 > t.
⇒ μ_A = (r − σ²/2) [(T_0 − t) + (T − T_0)] = (r − σ²/2) (T − t) ,
   σ_A² = σ² [(T_0 − t) + (T − T_0)] = σ² (T − t) .
Call price:
c = e^{−r (T − t)} [ S e^{(r − σ²/2)(T − t) + σ²(T − t)/2} Φ(d_1) − X Φ(d_2) ]
  = S Φ(d_1) − X e^{−r (T − t)} Φ(d_2) ,
where
d_1 = (1/σ_A) [ ln(S/X) + μ_A ] + σ_A
    = (1/(σ √(T − t))) [ ln(S/X) + (r − σ²/2)(T − t) ] + σ √(T − t)
    = (1/(σ √(T − t))) [ ln(S/X) + r (T − t) ] + (1/2) σ √(T − t)
and d_2 = d_1 − σ_A = (1/(σ √(T − t))) [ ln(S/X) + r (T − t) ] − (1/2) σ √(T − t) .
286
This is just the Black–Scholes pricing formula (16), see 7.13.
287
11. Limiting Case n = ∞.
If n → ∞ then
A → exp( (1/(T − T_0)) ∫_{T_0}^{T} ln S(u) du ) .
Furthermore, if t < T_0 then
μ_A → (r − σ²/2) ( T/2 + T_0/2 − t ) ,
σ_A² → σ² ( T/3 + 2 T_0/3 − t ) .
If t ≥ T_0 then
S̃(t) → [S(t)]^{(T − t)/(T − T_0)} exp( (1/(T − T_0)) ∫_{T_0}^{t} ln S(u) du ) ,
μ_A → (r − σ²/2) (T − t)² / (2 (T − T_0)) ,
σ_A² → σ² (T − t)³ / (3 (T − T_0)²) .
The above values, together with the discrete average price formulas, then yield pricing formulas for the continuous geometric mean.
288
289
Proof.
There are two cases, t < T_0 and t ≥ T_0; here we look at the latter. Rewriting μ_A and σ_A² from 11.8,
μ_A = (r − σ²/2) (T − T_0) [ (1/n)(1 − k/n)(1 − α) + (1/2)(1 − k/n − 1/n)(1 − k/n) ] ,
σ_A² = σ² (T − T_0) [ (1/n)(1 − k/n)²(1 − α) + (1/6)(1 − k/n − 1/n)(1 − k/n)(2 − 2 k/n − 1/n) ] .
Since t = t_k + α Δt = T_0 + k Δt + α Δt = T_0 + k (T − T_0)/n + α (T − T_0)/n, we have that
(t − T_0)/(T − T_0) = k/n + α/n   ⇒   k/n → (t − T_0)/(T − T_0)   ⇐⇒   1 − k/n → (T − t)/(T − T_0) .
Hence, as n → ∞, we have that
μ_A → (r − σ²/2) (T − T_0) (1/2) (T − t)²/(T − T_0)² = (r − σ²/2) (T − t)²/(2 (T − T_0)) ,
σ_A² → σ² (T − T_0) (1/3) (T − t)³/(T − T_0)³ = σ² (T − t)³/(3 (T − T_0)²) .
290
Moreover,
S̃(t) = [S(t_1) ··· S(t_k)]^{1/n} S(t)^{(n−k)/n}
⇒ ln S̃(t) = (1/n) [ln S(t_1) + . . . + ln S(t_k)] + ((n − k)/n) ln S(t)
           = (1/(T − T_0)) Δt [ln S(t_1) + . . . + ln S(t_k)] + (1 − k/n) ln S(t) .
Since t_k = t − α Δt → t as n → ∞, we have
ln S̃(t) → (1/(T − T_0)) ∫_{T_0}^{t} ln S(u) du + ((T − t)/(T − T_0)) ln S(t)
         = (1/(T − T_0)) ∫_{T_0}^{t} ln S(u) du + ln S(t)^{(T − t)/(T − T_0)}
⇒ S̃(t) → S(t)^{(T − t)/(T − T_0)} exp( (1/(T − T_0)) ∫_{T_0}^{t} ln S(u) du ) .
Note that the average A = [S(t_1) ··· S(t_n)]^{1/n} converges to
exp( (1/(T − T_0)) ∫_{T_0}^{T} ln S(u) du )   as n → ∞ ,
which is just the continuous geometric average.
291
292
12. Discrete Arithmetic Average.
The average A is computed as
n
1X
A=
S(ti).
n i=1
The option value is difficult to compute because A is not lognormal.
The binomial method can be used but great care must be taken so that the number
of nodes does not increase exponentially.
Monte Carlo simulation is possibly the most efficient method.
Since an option with discrete arithmetic average is very similar to that with discrete
geometric average, and since the latter has an analytic pricing formula, the control
variate method (with the geometric call price as control variate) can be used to reduce
the variance of the Monte Carlo simulation.
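A minimal C++ sketch of this control variate combination; the path container, the assumption t = 0 and the helper names are illustrative, and geoExact denotes the analytic discrete geometric average price from 11.8/11.9:

#include <algorithm>
#include <cmath>
#include <vector>

double asianArithCallCV(const std::vector<std::vector<double>>& paths, // paths[j][i] = S(t_i) on path j
                        double X, double r, double Tmat, double geoExact) {
    double disc = std::exp(-r * Tmat);
    double sumArith = 0.0, sumGeo = 0.0;
    for (const auto& path : paths) {
        double a = 0.0, g = 0.0;
        for (double s : path) { a += s; g += std::log(s); }   // same prices feed both averages
        a /= path.size();                       // arithmetic average
        g  = std::exp(g / path.size());         // geometric average
        sumArith += disc * std::max(a - X, 0.0);
        sumGeo   += disc * std::max(g - X, 0.0);
    }
    double arithHat = sumArith / paths.size();
    double geoHat   = sumGeo   / paths.size();
    return arithHat + (geoExact - geoHat);      // control variate adjustment
}

Using the same simulated paths for both payoffs is what makes the two estimators strongly correlated and the variance reduction effective.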
293