Implementing Finite Difference Solvers for the Black-Scholes PDE Objective We seek the function V (S,t) satisfying the PDE V V 2V rS rV 0 t S 2 S 2 2 …subject to the constraints S 0, t 0,T V (S,T) g(S) …where g, the terminal boundary condition, is the payoff at maturity of the option whose value will be given by V. We require that the underlying follow geometric Brownian motion. For the purposes of this development, we will assume that it does not pay dividends, and that both the risk-free rate and the underlying’s volatility are constant over the life of the option, although in practice finite-difference methods are capable of handling all of these elaborations of the GBM model. The general strategy is to use a change of variables to transform the PDE into one that’s more tractable. Analytic solutions to the PDE depend upon the observation that it can be transformed into the heat equation in a function u(x, ) u 2 u x 2 …subject to the constraints x , 0, f u(x,0) f (x) ..where f, the initial boundary condition, is g from the original formulation of the problem suitably transformed. Although the heat equation does have an analytic solution for a large class of boundary conditions f, most such solutions require at least some degree of numerical estimation in order to be evaluated. Finite difference methods of solving the equation are reasonably fast and easily extensible, particularly to the free-boundary problems encountered with American options, where closed-form solutions are virtually never available. Change of Variables There are several ways to transform the BS-PDE into the heat equation, or a relative thereof. In practice there are a variety of considerations that inform what change of variables is appropriate for a given application. For the sake of illustration, this document uses a change of variables that completely transforms the BSPDE into the heat equation. Although it may not seem particularly clean or intuitively appealing, this choice turns out to have a number of desirable properties, not the least of which is that it results in the simplest possible PDE for our numerical solver to work on. S x ln K 2 2 (T t) u(x, ) V (S,t) axb e K where: a b 2 r 2r 2 2 2 a2 and K is an arbitrary positive constant, usually chosen to be the strike price of the option. The Finite Difference Method in a Nutshell Once the problem has been transformed into the heat equation, we use algebraic methods to solve it numerically. This involves splitting the finite time interval 0, into M equal subintervals of length , f resulting in a discretized time domain with M+1 nodes. We also split the spatial dimension into equal subintervals—N intervals of length x, giving rise to a spatial discretization with N+1 nodes. Since the spatial domain is actually infinite, we must choose endpoints xleft and xright at which V(S,t) is known to an acceptable degree of accuracy in advance. These values become a priori side boundary conditions. The result is a rectangular domain of (N+1)(M+1) nodes, of which the nodes at =0 and those at x=xleft and x=xright have known u-values. The algorithm will step through this domain in , solving for the unknown u-values at the interior nodes at each time step, and after M iterations will produce the u-values at f, corresponding to the time at which we wish to price the option. Along the way, it will produce estimates of the option’s value at many different choices of S and t, a fact that can be made use of in calculating the delta, gamma, and theta of the option as well. Recall from Taylor polynomials that the first and second derivatives of a locally well-behaved function f can be expressed at x using the following finite differences, with h some positive number: First derivative, forward: f (x) f (x h) f (x) O(h) h First derivative, backward: f (x) f (x) f (x h) O(h) h First derivative, central: f (x) f (x h) f (x h) O(h 2 ) 2h Second derivative, central: f (x) f (x h) 2 f (x) f (x h) O(h 2 ) h2 The O-terms express the order of the error in the approximation used—that is, the smallest polynomial power in the term capturing the difference between the actual derivative and the numerical approximation suggested by the first term. As h becomes small, it is the lowest-degree term that dictates how rapidly the error approaches zero, and thus higher-order approximations are preferred. An Explicit Method: Forward Euler Explicit finite-difference methods calculate unknown u-values one at a time, using only known u-values. As a result, iterations can be done without solving a system of equations, making these methods simpler to implement and computationally less intensive than implicit methods. The drawbacks of explicit methods are mainly that they may take many time steps to achieve acceptable accuracy, and that they are often not unconditionally stable. To find the unknown value u(x,), the forward Euler method uses the known values at three other nodes: u(x-x,-), u(x,-), and u(x+x,-). That is, the values required are those from the previous time step at the corresponding node in space, and both of the adjacent spatial nodes. With this choice, we arrive at the following equations: u 2 u x 2 u(x, ) u(x, ) u(x x, ) 2u(x, ) u(x x, ) x 2 u(x, ) u(x x, ) (1 2 )u(x, ) u(x x, ) where x 2 This value is sometimes called the Courant constant. It can be shown that the forward Euler method of pricing an option is equivalent to using a trinomial tree to price it. That is, the method above is quite similar to taking risk-neutral expectations of three possible outcomes to price the option at each node. This provides an intuitive sense—although by no means a rigorous explanation—of the stability requirement for the forward Euler method when it is applied to the heat equation: must be less than ½. Clearly, any value greater than this assigns a negative risk-neutral probability to the middle transition, a condition that one might anticipate causing problems. Implementing Forward Euler For the purposes of this illustration, we will use forward Euler to price a European call option on an equity. The inputs for our model are as follows: (1) Black-Scholes inputs: S0, the spot price of the underlying K, the strike price of the call , the volatility of the underlying r, the risk-free rate T, the time to expiry of the option (2) Discretization inputs M, the number of time intervals , the Courant constant D, the pricing radius—more on this below Step 1: Computational domain Given T and , we determine f for the domain under our change of variables: f 2T 2 With M, this determines : f M With , this determines x: x It is desirable to have the x-value corresponding to our spot price on the mesh; otherwise, we will be forced to interpolate at the end to find our option price. So we calculate x0 under our change of variables: S x 0 ln 0 K We now need to set the extreme ends of the spatial domain. No choice here is ideal: A pricing radius that’s too small will result in errors due to poor approximations at the side boundaries; a pricing radius that’s too large will result in a pricer that does far more computation than is actually required. What constitutes “too small” or “too large” is a function primarily of the volatility of the underlying and the maturity of the option. To be conservative, we will insist that the extreme ends of the spatial domain be no less than D away from the closer of x0 or the origin (the x-value associated with our strike). Thus, our spatial domain will be asymmetric for anything but at-the-money options. A reasonable, safe choice of D is ln(4), although more tailoring can be done, if desired, to increase execution speed. Under our choice of D, we first calculate initial estimates of the endpoints: x˜ left min( x 0,0) D x˜ right max( x 0 ,0) D These, however, may not lie precisely on the mesh. We also wish to keep track of the number of intervals to each side of our x0—Nleft and Nright, so we determine the endpoints of our spatial domain by: x x˜ left N left 0 1 x x left x 0 N leftx x˜ x 0 N right right 1 x x right x 0 N rightx …where [y] indicates the Heaviside step function, which returns the greatest integer less than or equal to y. The number of intervals in space is thus N=Nleft+Nright. Step 2: Boundary Conditions Of course, the terminal boundary condition for a call option is V(S,T) max( S K,0) …under the change of variables, this becomes the initial boundary condition max( Kex K,0) ax e K u(x,0) max( e x 1,0)e ax u(x,0) We make the assumption that the S-value corresponding to xleft is sufficiently far out of the money that the probability that it expires in the money is 0. Thus, the left-side boundary condition is simply u(x left , ) 0 We make the assumption that the S-value corresponding to xright is sufficiently far in the money that the probability that it expires out of the money is 0. It is easy to see that a put under this assumption struck at the same K has value 0, and thus from put-call parity the value of the call is simply that of a forward contract struck at K V (Sright,t) Sright Ker(T t ) u(x right, ) Ke x rig h t r Ke K 2 2 e axrig h t b 2r x 2 ax b rig h t u(x right, ) e e e rig h t Step 3: Iteration and Solution Solution proceeds as outlined above; pseudocode is provided below to illustrate the method in more detail. This pseudocode imagines that u-values are stored in a matrix U with N+1 rows, indexed from i=0,…,N, corresponding to the spatial nodes, and M+1 columns, indexed from j=0,…,M, corresponding to the time nodes. Under this arrangement, we use the boundary conditions above to populate the column vector at i=0 (initial boundary condition) and the row vectors at j=0 (boundary condition at xleft) and j=N (boundary condition at xright). From a speed standpoint, this method of storing information is not optimal; for large M, the memory required to store the matrix U is prohibitive, leading to performance lags. A more efficient method is to store the current iteration as a vector which is overwritten at each time step, but for the sake of clarity, the pseudocode below retains the pricing information at every node in the domain. Pseudocode for forward Euler implementation: ----------------------------------------------Function signature: inputs (S0, K, T, , r, , M, D) // Calculate relevant constants: a = (r - b = 2r /+ a2 f = 2T / 2 = f / M x = sqrt( /) x0 = ln(S0 / K) Nleft = int((x0 + D – min(x0, 0)) /x) + 1 xleft = x0 – Nleftx Nright = int((D – x0 + max(x0, 0)) / x) + 1 xright = x0 + Nright x N = Nleft + Nright // Declare matrix U’s dimensions and populate its edges with boundary condition information dim U(N+1, M+1) // Initial boundary condition for i: 0 to N xi = xleft + i x U(i, 0) = max(exp(xi) – 1, 0) exp(axi) (increment i) // Side boundary conditions for j: 1 to M U(0, j) = 0 j = j U(N, j) = (exp(xright) – exp(-2r j / )) exp(a xright + b j) (increment j) // Forward Euler solution loop: = 1 – 2 for j:1 to M for i: 1 to N-1 U(i, j) = U(i-1, j-1) + U(i, j-1) + U(i+1, j-1) (increment i) (increment j) // calculate V, the value of the option at time 0 and the current spot output: V = K U(Nleft, M) exp(-ax0 - bf) ----------------------------------------------An Implicit Method: Backward Euler Implicit finite-difference methods calculate the vector of unknown u-values wholesale at each time step, by solving a system of equations. One advantage of implicit methods is that they are unconditionally stable, meaning that the choice of is not restricted from above as it is for most explicit methods. Being able to price using large —that is, a high density of nodes in the spatial coordinate—most often means that implicit methods converge in fewer time steps than explicit methods do, although this may not necessarily mean that they are overall faster. Solving the resulting system efficiently is the main goal for any practical implementation of an implicit method. The backward Euler method uses the known value at u(x,-) to set up an equation involving the three uvalues u(x-x,), u(x,), and u(x+x,). Usually all three of these values will be unknown; the exception occurs at the first interior node along the spatial coordinate on each side, where one of these values is known from a side boundary condition. The values are related by the central finite-difference approximation of the second derivative at , rather than - as in forward Euler. With this choice, we arrive at the following equations: u 2 u x 2 u(x, ) u(x, ) u(x x, ) 2u(x, ) u(x x, ) x 2 u(x x, ) (1 2 )u(x, ) u(x x, ) u(x, ) This equation can be formulated for each of N-1 interior spatial nodes at every time step—i.e., all the spatial nodes except those at the side boundaries. If we define u to be the column vector of unknown uvalues at the interior spatial nodes at time , we have a vector of length N-1. Analogously, define u(-) to be the vector of known u-values at the interior spatial nodes at time -; this vector is also of length N-1. Then we may write the system implied by the N-1 equations of the form above as Au( ) u( ) c( ) A is the N-1 x N-1 matrix of coefficients, and c() is u(x left , ) 0 0 u(x , ) right That is, its only nonzero entries are at the top and bottom, arising from the known contributions of the two side boundary conditions being moved to the other side of the equation. The coefficient matrix A has the convenient form: 1 2 1 2 1 2 0 0 It is tridiagonal, symmetric, and strictly diagonally dominated. Symmetry and diagonal domination guarantee that A is a real positive definite matrix: All its eigenvalues are real and positive, and thus of course its determinant is positive, so A is invertible, meaning that our equation will have a unique solution. Even more important from the standpoint of speed, A can be decomposed into factors easily. As we will see, the decomposition can be accomplished in O(N) operations, and since A remains the same for each time step, the decomposition need only be done once. In turn, the fact that the matrix can be decomposed this way means that our solution of this equation at each time step can be accomplished in O(N) operations, making it at least comparable to the explicit method above in terms of solution time. Solving the Linear System For a matrix as friendly as A, a variety of methods are available to solve the linear system at each step. If simplicity of implementation is more important than speed of execution, then the matrix inversion routines available in many software packages are a suitable choice. It turns out, however, that factoring the matrix A is more time-efficient than inverting it, and solving the system of equations using the factorization is no more expensive than a matrix multiplication involving the inverse. The method described here uses LU decomposition to accomplish the iteration. The idea is to write A as the product of a lower-triangular matrix Lt and an upper-triangular matrix Ut. This decomposition is unique under certain constraints; to make our factorization easier to use, we will require that the diagonal elements of Lt all equal 1. We will omit any discussion of how LU decomposition is done for general matrices; in the case of a tridiagonal matrix which is diagonally dominated, each of the factors is banded in the same fashion as our original matrix A, and no elaborations such as pivoting are required to produce the factorization. Thus, the factorization can be represented in three vectors of values: Lt_offd: the N-2 off-diagonal elements of the lower-triangular factor Ut_d: the N-1 diagonal elements of the upper-triangular factor Ut_offd: the N-2 off-diagonal elements of the upper-triangular factor In the pseudocode below, these vectors are calculated from A; each is indexed beginning at 0: Pseudocode for LU decomposition of a tridiagonal, diagonally dominated matrix A: ----------------------------------------------Function signature: input (A) // For the sake of consistency with the above, A is assumed to be N-1 x N-1 in size. // Declare dimensions of the result vectors dim Lt_offd(N-2) dim Ut_d(N-1) dim Ut_offd(N-2) // Begin the decomposition by populating the first diagonal entry of the upper-triangular factor Ut_d(0) = A(0,0) // Step through the remaining entries for i: 0 to N-3 Ut_offd(i) = A(i,i+1) Lt_offd(i) = A(i+1,i) / Ut_d(i) Ut_d(i+1) = A(i+1,i+1) – Lt_offd(i)*Ut_offd(i) (increment i) // return the results output: Lt_offd, Ut_d, Ut_offd ----------------------------------------------Note that in the example above, since the matrix is symmetric with the same coefficients in each row, the result could be achieved by passing the function two constant values: 1+2 and –, corresponding to the diagonal and off-diagonal elements of A. In this case, the vector U_offd need not be calculated at all, since every entry will equal –. The pseudocode above, however, will work with a greater range of tridiagonal matrices, such as those found under other changes of variables. Once the factorization is done, the factors will be used in every iteration. The convenience of having A factored as LtUt is that the system Ax = b can be solved instead as LtUtx = b in two passes: First, by finding the vector y such that Lty = b, then by solving Utx = y. The structure of the factors is such that the whole process can be accomplished in the same number of operations as a single matrix multiplication, by using forward substitution to solve the first equation, then backward substitution to solve the second. Implementing Backward Euler For the purposes of this illustration, we will use backward Euler to price a European put option on an equity. The inputs for our model are the same as those in our example above of pricing the call option: (1) Black-Scholes inputs: S0, the spot price of the underlying K, the strike price of the put , the volatility of the underlying r, the risk-free rate T, the time to expiry of the option (2) Discretization inputs M, the number of time intervals , the Courant constant D, the pricing radius Step 1: Computational domain The domain is discretized in exactly same manner used above in pricing the call using forward Euler. Step 2: Boundary Conditions In the case of a put, the terminal boundary condition is V(S,T) max( K S,0) …under the change of variables, this becomes the initial boundary condition max( K Kex ,0) ax e K u(x,0) max(1 e x ,0)e ax u(x,0) The side boundary conditions for the European put are analogous those for the call. At xright, the option is considered to be certain to expire out of the money; at xleft, it is considered certain to expire in the money. This leads to the equations: u(x right, ) 0 2r2 ax b x u(x left , ) e e left e left Step 3: Iteration and Solution Solution proceeds in the manner already described. The pseudocode below once again stores all the nodal values of u in a matrix U with N+1 and M+1 columns, indexed starting at 0, with the initial column and the top and bottom rows populated using the boundary conditions. This pseudocode also includes the slight speed improvement to LU decomposition suggested above, where all the entries in each diagonal of the coefficient matrix are known to be constant. Pseudocode for forward Euler implementation: ----------------------------------------------Function signature: inputs (S0, K, T, , r, , M, D) // Calculate relevant constants: a = (r - b = 2r /+ a2 f = 2T / 2 = f / M x = sqrt( /) x0 = ln(S0 / K) Nleft = int((x0 + D – min(x0, 0)) /x) + 1 xleft = x0 – Nleftx Nright = int((D – x0 + max(x0, 0)) / x) + 1 xright = x0 + Nright x N = Nleft + Nright // Declare matrix U’s dimensions and populate its edges with boundary condition information dim U(N+1, M+1) // Initial boundary condition for i: 0 to N xi = xleft + i x U(i, 0) = max(1-exp(xi), 0) exp(axi) (increment i) // Side boundary conditions for j: 1 to M U(N, j) = 0 j = j U(0, j) = (exp(-2r j / )-exp(xleft)) exp(a xleft + b j) (increment j) // LU decomposition dim Lt_offd(N-2) dim Ut_d(N-1) dim Ut_offd(N-2) // All of the upper diagonal elements of Ut will be equal to – for i: 0 to N-3 Ut_offd = – (increment i) // Populate the first diagonal entry of Ut Ut_d(0) = 1+2 // Populate the remainder of Ut and Lt for i: 1 to N-2 Lt_offd(i-1) = - / Ut_d(i) Ut_d(i) = 1 + (2 + Lt_offd(i-1)) (increment i) // Backward Euler solution loop: for j:1 to M // Use forward substitution to solve Lt y = u(-1) + c() // Forward sub at first interior node: U(1,j) = U(1, j-1) + U(0, j) // Forward sub at interior nodes from index 2 to index N-2: for i: 2 to N-2 U(i, j) = U(i, j-1) – U(i-1, j) * Lt_offd(i-2) (increment i) // Forward sub at last interior node: U(N-1, j) = U(N-1, j-1) + U(N, j) – U(N-2, j) * Lt_offd(N-3) // Use backward substitution to solve Ut u() = y // Backward sub at last interior node: U(N-1, j) = U(N-1, j) / Ut_d(N-2) // Backward sub at remaining nodes—indices N-2 to 1 for i: N-2 to 1 U(i, j) = (U(i, j) – U(i+1, j) * Ut_offd(i-1)) / Ut_d(i-1) (decrement i) (increment j) // calculate V, the value of the option at time 0 and the current spot output: V = K U(Nleft, M) exp(-ax0 - bf) ----------------------------------------------A Centered Method: Crank Nicolson In the methods we have seen above, finite difference approximations of the spatial derivatives are calculated either exclusively at - (forward Euler) or at (backward Euler) to find approximations of u at these times. Each method leads to systematic discretization errors, and although they do approach zero as the number of time steps is increased (as long as the scheme used is stable), an examination of these errors suggests a possible improvement: Since the implicit and explicit methods tend to err in opposite directions, a weighted average of the approximations seems likely to produce faster convergence. One such method—Crank Nicolson—assigns equal weight to the two approximations, and is sometimes referred to as a “fully centered” method. It writes the PDE at the point u(x, t-/2), using linear interpolations in of all derivatives, and thus involves the six points u(x-x,-), u(x,-), and u(x+x, (all known), along with u(x-x,), u(x,), and u(x+x,) (all unknown). In effect, this leads to a method that is an average of the forward and backward Euler methods, with the advantage that the Crank Nicolson method is, unlike forward Euler, unconditionally stable. Nevertheless, the contribution of backward Euler does show up in spurious oscillations when the Courant constant is chosen to be greater than ½, so in practice Crank Nicolson is usually used only after an initial smoothing step. Since it involves the solution of a linear system at each time step, the iteration method used with Crank Nicolson is essentially the same as that used with backward Euler, with slightly different coefficient and constant matrices: u 2 u x 2 u(x, ) u(x, ) 1 u(x x, ) 2u(x, ) u(x x, ) 2 x 2 1 u(x x, ) 2u(x, ) u(x x, ) 2 2 2 x 2 u(x x, ) (1 )u(x, ) 2 u(x x, ) u(x x, ) (1 )u(x, ) 2 u(x x, ) If we express this system as in the form Au() = b(), then the matrices are as follows: A is: 1 2 1 2 2 1 2 0 0 b() is: 2 u(x left , ) (1 )u(x left x, ) 2 u(x left 2x, ) 2 u(x left , ) u(x left x, ) (1 )u(x left 2x, ) u(x left 3x, ) 2 2 u(x right 3x, ) (1 )u(x right 2x, ) u(x right x, ) 2 2 u(x right 2x, ) (1 )u(x right x, ) u(x right, ) u(x right, ) 2 2 2 …and u() is, as before, the vector of function values we seek at the interior nodes at time . Since the matrix A is also symmetric positive definite, the methods outlined above for backward Euler will also serve, with very slight adjustments, for the Crank Nicolson method as well. The main change is that the vector on the right-hand side of the equation must be recalculated from u(-) at each time step, before forward and backward substitution are used to solve the system. Because this is only a small extension of the backward Euler method outlined in detail above, pseudocode for the Crank Nicolson method is not included. Applications of Finite Difference Methods Clearly there is no need for such computation-intensive methods to price European options under the usual Black-Scholes assumptions, since closed-form formulas exist for a wide variety of payoffs. These formulas can be extended easily to situations where analytic functions of time for interest rate, dividend rate, and volatility are used, or when their integrals can be approximated over the life of the option. In practice, however—even when the assumption of a lognormal evolution is considered reasonable—there are many situations in which closed-form formulas are not available. Two major ones are cases when the underlying pays a discrete fixed dividend at one or more times during the life of the option; and when a socalled “local volatility” model is used for the underlying—that is, when the volatility is chosen as a function depending upon both S and t. Both of these complications are relatively straightforward to implement using finite difference methods, although we will not attempt it here. The most common—and easiest to illustrate—case in which numerical methods are necessary is when the option can be exercised during its life—either at any time, for an American option; or else at a preset selection of times, for a Bermudan option. Aside from the need to make sure the exercise times fall on the mesh in , both types of options can be priced by making only minor changes to the methods already outlined. Pricing American Options The basic specification of the problem for an American option changes only slightly from the initial one above. Although there are several ways of posing the problem formally, in essence the idea is simple: As above we seek V(S,t) and impose the same constraints, with one added: S 0, t 0,T V (S,T) g(S) V (S,t) g(S) …where g(S) is the intrinsic value of the option—the value if it is exercised immediately. The value of our American option must satisfy one of the following relations; either: V V 2 2V rS rV 0 t S 2 S 2 or else: V (S,t) g(S) Thus, there are two distinct regions we will encounter in pricing the option. In one, the continuation region, the option obeys the Black-Scholes PDE, and its value is greater than the intrinsic value of the option. In the other, the early exercise region, it does not obey the Black-Scholes PDE, but its value is simply equal to the intrinsic value. This suggests a straightforward way of adapting a finite-difference pricer to American options. By whatever method, at each time step we arrive at a vector of values of the option using the Black-Scholes PDE and the values from the prior time step. By checking each of these values against the intrinsic value of the option and taking the larger value at each node, we can model optimal early-exercise behavior and arrive at a good approximation of the option’s value. This method does converge to the solution of the free- boundary problem above as our mesh grows finer, and has the additional benefit of discovering where the boundary between the early exercise and continuation regions lies. If we consider only vanilla options—calls and puts—then the matter becomes even simpler, since it can be shown that the value of an American call and a European call are the same, since early exercise of a call is never optimal. Thus, the only case we need to consider here is an American put. Using our change of variables and the payoff of a put, we can arrive at the following description of our option value Vee within the early exercise region: Vee (S,t) g(S) K S K Ke x axb e K uee (x, ) (1 e x )e axb uee (x, ) Clearly, the check for early exercise need only be performed for x < 0 in the case of a put. Note that, when pricing American puts, this function also becomes the boundary condition at xleft, since it is strictly greater than the left-hand boundary condition used to price the European put above. As always, it is important to choose a pricing radius large enough to be sure that this serves as a good approximation of the left-hand values—in this case, that xleft lies well within the early exercise region for all .