PAPER: Maximum likelihood estimation of the Heston stochastic volatility model using asset and option prices: an application of nonlinear filtering theory

Francesca Mariani, Graziella Pacelli, Francesco Zirilli

1. Abstract

Let us suppose that the dynamics of the stock prices and of their stochastic variance are described by the Heston model, that is, by a system of two stochastic differential equations with a suitable initial condition. Our aim is to estimate the parameters of the Heston model and one component of the initial condition, namely the initial stochastic variance, from the stock and option prices observed at discrete times. The option prices considered are those of a European call on the stock whose prices are described by the Heston model. The method proposed to solve this problem uses a filtering technique to construct a likelihood function and then maximizes the likelihood function obtained. The estimated parameters and initial value component are characterized as a maximizer of the likelihood function subject to some constraints. The solution of the filtering problem, used to construct the likelihood function, is based on four ingredients: an integral representation of the fundamental solution of the Fokker-Planck equation associated with the Heston model; the wavelet expansions presented in [1], [2], [3], used to approximate the integral kernel appearing in the representation formula of the fundamental solution; a simple truncation procedure that exploits the sparsifying properties of the wavelet expansions; and the Fast Fourier Transform (FFT).
The use of these techniques generates a very efficient and fully parallelizable numerical procedure to solve the filtering problem, which makes it possible to evaluate the likelihood function and its gradient very efficiently. As a byproduct of the solution of the filtering problem we have developed a stochastic variance tracking technique that gives very good results in numerical experiments. The maximum likelihood problem used in the estimation procedure is a low dimensional constrained optimization problem; its solution with ad hoc techniques is justified by the computational cost of evaluating the likelihood function and its gradient. We use parallel computing and a variable metric steepest ascent method to solve the maximum likelihood problem. Some numerical examples of the estimation problem using synthetic data, obtained with a parallel implementation of the previous numerical method, are presented. Speed up factors close to the number of processors employed are obtained in the numerical examples using the parallel implementation of the numerical method proposed (see Table 2). This website contains two animations and some auxiliary material that help the understanding of the paper [7], and makes available to interested users the computer programs used to produce the numerical experiments presented here and in [7].

2. Introduction

Taking the stock prices and their stochastic volatilities or variances as state variables, and the stock prices and option prices observed at discrete times as observations, we study two problems. The first is the problem of forecasting (tracking) the unobserved, time-varying stochastic volatility or variance from the observations when the model parameters and the initial condition are known (filtering problem). The second is the problem of estimating the model parameters and the unknown initial volatility or variance from the observations (estimation problem). We consider these problems in the context of the Heston stochastic volatility model.
Let $S_t$ and $v_t$ be the state variables at time $t>0$, that is, respectively, the stock price and the stochastic variance at time $t>0$. The Heston model assumes that the dynamics of the state variables $(S_t, v_t)$, $t>0$, is described by the following system of stochastic differential equations [4]:

$$dS_t = \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^1, \quad t>0, \tag{1}$$

$$dv_t = \gamma(\theta - v_t)\,dt + \varepsilon\sqrt{v_t}\,dW_t^2, \quad t>0, \tag{2}$$

where $W_t^1$, $W_t^2$, $t>0$, are standard Wiener processes such that $W_0^1=W_0^2=0$, $dW_t^1$, $dW_t^2$ are their stochastic differentials and $\langle dW_t^1 dW_t^2\rangle = \rho\,dt$, where $\langle\cdot\rangle$ denotes the mean of $\cdot$ and $\rho\in[-1,1]$ is a constant known as correlation parameter; $\mu$ is a real constant related to the behaviour of the mean of the stock log returns associated with the stochastic process $S_t$, $t>0$. Note that the stochastic variance $v_t$, $t>0$, is assumed to be described by a mean-reverting process with speed $\gamma\ge 0$ and parameters $\theta\ge 0$ and $\varepsilon\ge 0$, that is by equation (2). The equations (1), (2) must be equipped with an initial condition, that is:

$$S_0 = \tilde S_0, \tag{3}$$

$$v_0 = \tilde v_0. \tag{4}$$

Here $\tilde S_0$ and $\tilde v_0$ denote random variables concentrated with probability one in a point; for simplicity, we continue to denote these points with $\tilde S_0$ and $\tilde v_0$ respectively. We note that $\tilde S_0$, the stock price at $t=0$, is observable while $\tilde v_0$, the stochastic variance at $t=0$, cannot be observed. The quantities $\mu$, $\gamma$, $\varepsilon$, $\theta$ and $\rho$ are real constants and are the model parameters that we want to estimate. Hence, the parameters and the unknown initial condition component of the Heston model that we want to estimate are $\mu$, $\gamma$, $\varepsilon$, $\theta$, $\rho$ and $\tilde v_0$. We assume that $2\gamma\theta/\varepsilon^2\ge 1$; this assumption guarantees that if $\tilde v_0$ is positive with probability one, then $v_t$, solution of (1), (2), remains positive with probability one for $t>0$.
We suppose that we observe, at prescribed discrete times $0<t_1<t_2<\dots<t_n<t_{n+1}=+\infty$, the prices of the stock and the prices of a European call option of known maturity and strike price on the underlying whose price $S_t$, $t>0$, satisfies (1), (2), (3), (4). Let $F_t=\{(S_i,C_i): t_i\le t\}$, $t>0$, be the set of the observations up to time $t>0$ of the stock prices $S_t$ and of the option prices $C_t$; that is, for $i=1,2,\dots,n$, $S_i$ is the observation of the stock price and $C_i$ is the observation of the option price at time $t_i$. Let $\Theta=(\mu,\gamma,\varepsilon,\theta,\rho,\tilde v_0)^T$ be the vector of the model parameters and of the initial stochastic variance, where the superscript $T$ denotes the transpose operator. We use the notation $t_0=0$ and $F_0=\{(\tilde S_0,\tilde v_0)\}$; this notation simplifies some of the formulae that follow. The option prices considered are the prices of a European call option on the underlying whose prices are described by $S_t$, $t>0$, with known strike price $E$ and maturity time $T$ such that $T>t_n$. We denote with $C(S,v,t;E,T,\Theta)$ the price of this option in the Heston stochastic volatility model when $0\le t<T$, $S_t=S$ and $v_t=v$. We assume that the option prices observed are given by the theoretical option prices in the Heston model affected by a Gaussian error, that is: for $i=1,2,\dots,n$, $C_i=C(S_i,v_i,t_i;E,T,\Theta)+u_i$, where $u_i$ is sampled from a Gaussian variable with mean zero and known variance $\phi_i$, and $C(S_i,v_i,t_i;E,T,\Theta)$ is the option price in the Heston model assuming known the (unobserved) variance $v_i$ at time $t_i$.

3. Filtering problem

Let us assume that the vector $\Theta$ is known; we focus on the problem of forecasting the stock price and the stochastic variance from the observed stock and option prices. This problem is a filtering problem, where the stochastic differential equations (1), (2) represent the “state equations”, $\tilde S_0$ and $\tilde v_0$ at time $t_0$ are the initial condition and, for $i=1,2,\dots,n$, $(S_i,C_i)$ at time $t_i$ represent the observations.
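Note that in the filtering problem $\tilde v_0$ is known through the knowledge of $\Theta$. The solution of the previous filtering problem is based on the knowledge of the joint probability density function $p(S,v,t|F_t,\Theta)$, $(S,v)\in(0,+\infty)\times(0,+\infty)$, $t>0$, of the state variables, that is of the stock price and of the stochastic variance, conditioned to the observations $F_t$, $t>0$, and to the initial condition (3), (4). We introduce the stock log returns:

$$x_t=\log(S_t/\tilde S_0), \quad t>0; \tag{5}$$

without loss of generality, we can assume $S_0=\tilde S_0=1$. From (1), (2), (3), (4) and (5), using Itô's formula, it is easy to see that $(x_t,v_t)$, $t>0$, satisfy the following equations:

$$dx_t=\left(\mu-\tfrac{1}{2}v_t\right)dt+\sqrt{v_t}\,dW_t^1, \quad t>0, \tag{6}$$

$$dv_t=\gamma(\theta-v_t)\,dt+\varepsilon\sqrt{v_t}\,dW_t^2, \quad t>0, \tag{7}$$

with initial condition:

$$x_0=\tilde x_0=0, \tag{8}$$

$$v_0=\tilde v_0. \tag{9}$$

The joint probability density function $\tilde p(x,v,t|F_t,\Theta)$, $(x,v)\in(-\infty,+\infty)\times(0,+\infty)$, $t>0$, of the state variables $x_t$, $v_t$, $t>0$, conditioned to the observations $F_t$, $t>0$, is the solution of the following equations [5]: for $i=0,1,\dots,n$:

$$\frac{\partial\tilde p}{\partial t}=\frac{1}{2}\frac{\partial^2(v\tilde p)}{\partial x^2}+\frac{1}{2}\frac{\partial^2(\varepsilon^2 v\tilde p)}{\partial v^2}+\frac{\partial^2(\varepsilon\rho v\tilde p)}{\partial x\,\partial v}-\frac{\partial((\mu-v/2)\tilde p)}{\partial x}-\frac{\partial(\gamma(\theta-v)\tilde p)}{\partial v}, \quad (x,v)\in(-\infty,+\infty)\times(0,+\infty),\ t_i<t<t_{i+1}, \tag{10}$$

with initial condition: for $i=0$:

$$\tilde p(x,v,t_0|F_{t_0},\Theta)=\delta(x-\tilde x_0)\,\delta(v-\tilde v_0), \quad (x,v)\in(-\infty,+\infty)\times(0,+\infty), \tag{11}$$

and for $i=1,2,\dots,n$:

$$\tilde p(x,v,t_i|F_{t_i},\Theta)=\frac{\delta(x-x_i)\,\dfrac{1}{\sqrt{2\pi\phi_i}}\exp\!\left(-\dfrac{(\tilde C(x,v,t_i;E,T,\Theta)-C_i)^2}{2\phi_i}\right)\tilde p(x,v,t_i|F_{t_{i-1}},\Theta)}{\displaystyle\int_0^{+\infty}\dfrac{1}{\sqrt{2\pi\phi_i}}\exp\!\left(-\dfrac{(\tilde C(x_i,v',t_i;E,T,\Theta)-C_i)^2}{2\phi_i}\right)\tilde p(x_i,v',t_i|F_{t_{i-1}},\Theta)\,dv'}, \quad (x,v)\in(-\infty,+\infty)\times(0,+\infty), \tag{12}$$

where for $i=1,2,\dots,n$: $x_i=\log(S_i)$ and $\tilde C(x,v,t_i;E,T,\Theta)$ is the value in the Heston model at time $t=t_i$, when $x_{t_i}=x$ and $v_{t_i}=v$, of a call option on the stock whose price is $S_t$, $t>0$, with strike price $E$ and maturity time $T$.

Once we know the joint probability density $\tilde p(x,v,t|F_t,\Theta)$, $(x,v)\in(-\infty,+\infty)\times(0,+\infty)$, $t>0$, we can forecast the values of the random variables $(x_t,v_t)$, $t>0$, as follows:

$$\hat x_{t|\Theta}=\int_{-\infty}^{+\infty}dx\int_0^{+\infty}x\,\tilde p(x,v,t|F_t,\Theta)\,dv, \quad t>0, \tag{13}$$

$$\hat v_{t|\Theta}=\int_{-\infty}^{+\infty}dx\int_0^{+\infty}v\,\tilde p(x,v,t|F_t,\Theta)\,dv, \quad t>0. \tag{14}$$

The quantities $\hat x_{t|\Theta}$, $\hat v_{t|\Theta}$, $t>0$, can be regarded as the solution of the filtering problem.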
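Formulae (13), (14) can be illustrated with a minimal Python sketch, assuming that the conditional density has already been sampled on a uniform grid in the $(x,v)$ variables. The function name `filtered_means`, the grid sizes and the toy density (a product of two Gaussian bumps, not a Heston density) are hypothetical choices made only for this example; the division by the discrete mass is a design choice that removes the need for a normalized input, not something taken from [7].

```python
import math

def filtered_means(x_grid, v_grid, density):
    """Approximate (13), (14) with a Riemann sum on a uniform grid.

    density[i][j] approximates the conditional density at (x_grid[i], v_grid[j]).
    """
    dx = x_grid[1] - x_grid[0]
    dv = v_grid[1] - v_grid[0]
    mass = x_hat = v_hat = 0.0
    for i, x in enumerate(x_grid):
        for j, v in enumerate(v_grid):
            w = density[i][j] * dx * dv
            mass += w
            x_hat += x * w
            v_hat += v * w
    # Dividing by the discrete mass cancels the normalization constant
    return x_hat / mass, v_hat / mass

# Toy density (assumption): Gaussian bumps centred at x = 0.1 and v = 0.5
x_grid = [-0.4 + 0.01 * i for i in range(101)]
v_grid = [0.01 * j for j in range(1, 100)]
density = [[math.exp(-0.5 * ((x - 0.1) / 0.05) ** 2 - 0.5 * ((v - 0.5) / 0.05) ** 2)
            for v in v_grid] for x in x_grid]
x_hat, v_hat = filtered_means(x_grid, v_grid, density)
```

In this toy case the grids are symmetric around the centres of the bumps, so the computed means coincide with the centres up to rounding errors.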
Let $G(x,v,t,x',v',t'|\Theta)$, $(x,v),(x',v')\in(-\infty,+\infty)\times(0,+\infty)$, $t>t'>0$, be the fundamental solution of (10); we have (see [6], p. 605):

$$G(x,v,t,x',v',t'|\Theta)=\frac{1}{(2\pi)^2}\,e^{-\frac{1}{2}(x-x'-\mu\tau)}\int_{-\infty}^{+\infty}dk\,e^{\imath k(x-x'-\mu\tau)}\int_{-\infty}^{+\infty}dl\,e^{-\frac{2}{\varepsilon^2}\left(A(\tau,k,l;\Theta)+B(\tau,k,l;\Theta)v'\right)}e^{-\imath l v}, \quad (x,v),(x',v')\in(-\infty,+\infty)\times(0,+\infty),\ t>t'>0,\ \tau=t-t', \tag{15}$$

where $\imath$ denotes the imaginary unit and $A$ and $B$ are explicitly known elementary functions (see [6], p. 605). Using the fundamental solution of (10) given in (15), the solution of (10), (11), (12) can be written as follows:

$$\tilde p(x,v,t|F_{t_0},\Theta)=G(x,v,t,\tilde x_0,\tilde v_0,t_0|\Theta), \quad (x,v)\in(-\infty,+\infty)\times(0,+\infty),\ t_0<t<t_1, \tag{16}$$

and for $i=1,2,\dots,n$:

$$\tilde p(x,v,t|F_{t_i},\Theta)=\int_0^{+\infty}G(x,v,t,x_i,v',t_i|\Theta)\,f_i(v')\,dv', \quad (x,v)\in(-\infty,+\infty)\times(0,+\infty),\ t_i<t<t_{i+1}, \tag{17}$$

where

$$f_i(v')=\frac{\tilde p(x_i,v',t_i|F_{t_{i-1}},\Theta)\,\pi_1(x_i,v',t_i|F_{t_i},\Theta)}{\displaystyle\int_0^{+\infty}\tilde p(x_i,v,t_i|F_{t_{i-1}},\Theta)\,\pi_1(x_i,v,t_i|F_{t_i},\Theta)\,dv}, \quad v'>0, \tag{18}$$

and

$$\pi_1(x,v',t_i|F_{t_i},\Theta)=\frac{1}{\sqrt{2\pi\phi_i}}\exp\!\left(-\frac{(\tilde C(x,v',t_i;E,T,\Theta)-C_i)^2}{2\phi_i}\right), \quad (x,v')\in(-\infty,+\infty)\times(0,+\infty). \tag{19}$$

The evaluation of $\tilde p(x,v,t|F_{t_i},\Theta)$, $(x,v)\in(-\infty,+\infty)\times(0,+\infty)$, $t>0$, for $i=1,2,\dots,n$, on a grid of $N^2$ points in the $(x,v)$ variables with a naive quadrature procedure applied to formula (17), using $N$ quadrature nodes in each variable, requires $O(N^5)$ complex multiplications as $N$ goes to infinity. In fact, due to (15), formula (17) must be regarded as a three dimensional integral (i.e. an integral in the $v'$, $k$, $l$ variables), so that each of the $N^2$ grid points costs $N^3$ evaluations of the integrand. Using a wavelet expansion [1], [2], [3] of the integral kernel of (17) and of the function (18), a truncation procedure that exploits the sparsifying properties of the wavelet expansion, and the Fast Fourier Transform (FFT), we reduce the computational cost of the previous computation from $O(N^5)$ to $N_1 O(N^2\log N)$ complex multiplications as $N$ goes to infinity, where $N_1$ is independent of $N$. Note that $N_1$ is the number of wavelet functions remaining in the expansion after the truncation procedure.
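The reweighting step expressed by (18) and (19) can be sketched on a grid of $v'$ values as follows. This is only an illustration under stated assumptions: `toy_price` is a hypothetical stand-in for the Heston call price (it is not the Heston pricing formula), the prior density is taken flat, and the names `gaussian_weight`, `reweight`, `noise_var` are introduced here for the example. The weight is read as the Gaussian density of the pricing error with the known error variance.

```python
import math

def gaussian_weight(model_price, observed_price, noise_var):
    """Weight (19): Gaussian density of the pricing error, with variance noise_var."""
    err = model_price - observed_price
    return math.exp(-err * err / (2.0 * noise_var)) / math.sqrt(2.0 * math.pi * noise_var)

def reweight(v_grid, prior, price_fn, observed_price, noise_var):
    """Discrete analogue of (18): prior times weight, normalized on the uniform grid."""
    dv = v_grid[1] - v_grid[0]
    unnorm = [p * gaussian_weight(price_fn(v), observed_price, noise_var)
              for v, p in zip(v_grid, prior)]
    total = sum(unnorm) * dv
    return [u / total for u in unnorm]

def toy_price(v):
    # Hypothetical stand-in for the option price as a function of v: NOT the Heston price
    return 0.1 + 0.2 * v

v_grid = [0.01 * j for j in range(1, 100)]
prior = [1.0 / 0.99] * len(v_grid)  # flat prior density on the grid (assumption)
f = reweight(v_grid, prior, toy_price, observed_price=0.2, noise_var=1e-4)
v_mode = v_grid[max(range(len(f)), key=f.__getitem__)]
```

With these toy inputs the reweighted density concentrates around the value of $v'$ at which the stand-in price matches the observed price, which is the mechanism by which the observed option prices inform the unobserved variance.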
Using the estimates (13), (14), we can forecast the values of the stock log return $x_t$, $t>0$, and of the stochastic variance $v_t$, $t>0$, coherently with the observations $F_t$, $t\ge 0$. The continuous update in time of the value of the couple $(\hat x_{t|\Theta},\hat v_{t|\Theta})$, $t>0$, is the “trajectory tracking” of (6), (7), (8), (9). The continuous update in time of $\hat v_{t|\Theta}$, $t>0$, is the “variance tracking”.

The quality of the estimated values $(\hat x_{t|\Theta},\hat v_{t|\Theta})$, $t>0$, depends on the variance of the random variables $x_t$, $v_t$, $t>0$, conditioned to the observations; that is, we can assess the quality of the estimates (13), (14) by computing respectively the quantities:

$$\Sigma(x_t|F_t,\Theta)=\int_{-\infty}^{+\infty}dx\int_0^{+\infty}(x-\hat x_{t|\Theta})^2\,\tilde p(x,v,t|F_t,\Theta)\,dv, \quad t>0, \tag{20}$$

$$\Sigma(v_t|F_t,\Theta)=\int_{-\infty}^{+\infty}dx\int_0^{+\infty}(v-\hat v_{t|\Theta})^2\,\tilde p(x,v,t|F_t,\Theta)\,dv, \quad t>0. \tag{21}$$

The estimates (13), (14) are good when the variances (20), (21) are small.

A digital movie, available for download from this website, shows the trajectory tracking (top right corner), the evolution in time of the variance of the estimated variance, that is the evolution in time of (21) (bottom right corner), and the evolution in time of the joint probability density function conditioned to the observations (left side) obtained solving the filtering problem. In the trajectory tracking shown in the digital movie and in Figure 1 the solid line represents the true trajectory made of the values $(x_t,v_t)$, $t\in[0,24/252.5]$, obtained computing with the Euler method one trajectory of the stochastic differential equations (6), (7), (8), (9) with $\tilde x_0=0$ and $\tilde v_0=0.5$; the dashed line represents the trajectory followed by the forecasted values $(\hat x_{t|\Theta},\hat v_{t|\Theta})$, $t\in[0,24/252.5]$, obtained solving the filtering problem, that is computing the integrals (13), (14), using the data shown in Table 1. The asterisks and the circles indicate, respectively, the values of $(x_t,v_t)$ and of $(\hat x_{t|\Theta},\hat v_{t|\Theta})$ at the observation times $t=t_i$, $i=0,1,\dots,5$. We have chosen $\Theta=(0.026, 5.94, 0.306, 0.01159, -0.576, 0.5)^T$.
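The computation of one trajectory of (6), (7), (8), (9) with the Euler method can be sketched in Python as follows. This is a minimal sketch and not the code used to produce the figures: the number of time steps, the random seed, the function name `euler_heston` and the clipping of the variance at zero inside the coefficients are assumptions made for the example. The parameter vector is read in the order $(\mu,\gamma,\varepsilon,\theta,\rho,\tilde v_0)$.

```python
import math
import random

def euler_heston(theta, n_steps, t_end, seed=0):
    """Euler scheme for the log-return/variance system (6), (7) started from (8), (9).

    theta = (mu, gamma, eps, theta_bar, rho, v0).
    Returns the lists of x_t and v_t at the n_steps + 1 grid times.
    """
    mu, gamma, eps, theta_bar, rho, v0 = theta
    rng = random.Random(seed)
    dt = t_end / n_steps
    x, v = 0.0, v0
    xs, vs = [x], [v]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dw1 = math.sqrt(dt) * z1
        # Correlate the second increment so that the mean of dW1*dW2 is rho*dt
        dw2 = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        vp = max(v, 0.0)  # assumption: clip at zero inside the coefficients
        x += (mu - 0.5 * vp) * dt + math.sqrt(vp) * dw1
        v += gamma * (theta_bar - vp) * dt + eps * math.sqrt(vp) * dw2
        xs.append(x)
        vs.append(v)
    return xs, vs

theta_true = (0.026, 5.94, 0.306, 0.01159, -0.576, 0.5)
xs, vs = euler_heston(theta_true, n_steps=2400, t_end=24 / 252.5)
```

The clipping only matters if the discretized variance becomes negative, which the continuous model excludes under the assumption made on the parameters; it is included so that the square roots are always defined.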
We have considered the option with strike price $E=0.5$ and maturity time $T=126/252.5$. We assume that the dimensional components of the parameter vector $\Theta$ are expressed in annualized units, that is in the unit 1/year. We use a “year” made of the average number of trading days per year, that is 252.5 days, so that for example $t=24/252.5$ corresponds to the end of trading day number 24 of the regular year.

i    t_i         x_i            C~(x_i,v_i,t_i;E,T,Θ)    C_i
0    0           0
1    4/252.5     -0.00371625    0.13279039               0.13440918
2    8/252.5     -0.08059790    0.06952851               0.06835849
3    12/252.5     0.00498269    0.12195051               0.12349829
4    16/252.5    -0.00590680    0.10456021               0.10599176
5    20/252.5     0.08869340    0.18950671               0.18758168

Table 1. The data of the filtering problem

Below we show two figures (Figure 1 and Figure 2) that help the understanding of the quality of the estimates obtained solving the filtering problem. Figure 1 shows the trajectory tracking obtained as solution of the filtering problem. In particular the solid line represents the true trajectory made of the values $(x_t,v_t)$, $t\in[0,24/252.5]$, obtained computing with the Euler method one trajectory of the stochastic differential equations (6), (7), (8), (9); the dashed line represents the trajectory followed by the forecasted values $(\hat x_{t|\Theta},\hat v_{t|\Theta})$, $t\in[0,24/252.5]$. Figure 2 shows $\Sigma(v_t|F_t,\Theta)$ versus time when $t\in[0,24/252.5]$.

Figure 1. Trajectory tracking

Figure 2. $\Sigma(v_t|F_t,\Theta)$ versus time $t$

4. Estimation problem

In Section 3 we have assumed $\Theta$ known in order to solve the filtering problem, but in real situations when the Heston model is used $\Theta$ is not known and must be estimated from the knowledge of the observations. The dynamics of the stochastic processes $(x_t,v_t)$, $t>0$, and the estimates $(\hat x_{t|\Theta},\hat v_{t|\Theta})$, $t>0$, obtained from the solution of the filtering problem, depend on the vector $\Theta=(\mu,\gamma,\varepsilon,\theta,\rho,\tilde v_0)^T$. Let $M=\{\Theta : \varepsilon\ge 0,\ \gamma\ge 0,\ \theta\ge 0,\ 2\gamma\theta/\varepsilon^2\ge 1,\ -1\le\rho\le 1\}$ be the set of the admissible vectors $\Theta$.
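As a small illustration, membership in the set of admissible vectors can be checked as in the following Python sketch. It assumes that the set $M$ is defined by the constraints $\varepsilon\ge 0$, $\gamma\ge 0$, $\theta\ge 0$, $2\gamma\theta/\varepsilon^2\ge 1$, $-1\le\rho\le 1$ and that the parameter vector is ordered as $(\mu,\gamma,\varepsilon,\theta,\rho,\tilde v_0)$; the function name `is_admissible` and the rejection of $\varepsilon=0$ (for which the ratio is undefined) are choices made for this example.

```python
def is_admissible(theta):
    """Check the constraints defining M for theta = (mu, gamma, eps, theta_bar, rho, v0)."""
    mu, gamma, eps, theta_bar, rho, v0 = theta
    if eps <= 0.0:
        # eps = 0 makes the ratio 2*gamma*theta_bar/eps**2 undefined, so it is rejected here
        return False
    return (gamma >= 0.0 and theta_bar >= 0.0
            and 2.0 * gamma * theta_bar / eps ** 2 >= 1.0
            and -1.0 <= rho <= 1.0)

theta_true = (0.026, 5.94, 0.306, 0.01159, -0.576, 0.5)
ratio = 2.0 * theta_true[1] * theta_true[3] / theta_true[2] ** 2
admissible = is_admissible(theta_true)
```

For the parameter vector used in the numerical experiments the ratio $2\gamma\theta/\varepsilon^2$ is approximately 1.47, so the vector is admissible.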
The observations $(x_i,C_i)$, $i=1,2,\dots,n$, and the initial condition $\tilde x_0$ are realizations (samples) of the random variable $x_t$, $t\ge 0$, at the observation times and at $t=t_0=0$, and of a random variable, the option price $C_t$, $t>0$, derived from $(x_t,v_t)$, $t>0$, at the observation times, so that we can hope that the observations can be used to characterize the vector $\Theta$. Therefore, in order to derive forecasts of the stock log returns and of the stochastic variance, and in order to obtain derivative prices that are coherent with the observed derivative prices, it is essential to have a reliable estimate of the vector $\Theta$. That is, we search for an admissible vector $\Theta$ that makes most likely the observations $\tilde x_0$ and $(x_i,C_i)$, $i=1,2,\dots,n$.

We solve the problem of estimating the vector $\Theta$ from the observations using the maximum likelihood method. The maximum likelihood method consists in searching for the vector $\Theta\in M$ that maximizes the following (log-)likelihood function:

$$F(\Theta)=\sum_{i=0}^{n-1}\log\int_0^{+\infty}\tilde p(x_{i+1},v,t_{i+1}|F_{t_i},\Theta)\,\pi_1(x_{i+1},v,t_{i+1}|F_{t_{i+1}},\Theta)\,dv, \quad \Theta\in M. \tag{22}$$

Note that (22) is the sum of the logarithms of the integrals of the product between the transition probability densities $\tilde p(x_{i+1},v,t_{i+1}|F_{t_i},\Theta)$ and the weight functions $\pi_1(x_{i+1},v,t_{i+1}|F_{t_{i+1}},\Theta)$, $i=0,1,\dots,n-1$. These weight functions are added to the expression of the likelihood function considered in (22) to account for the fact that the option prices $C_i$, $i=1,2,\dots,n$, are also observed: they assign bigger weights to the values of the $v$ variable that make more likely the observation of $C_i$ at time $t=t_i$, $i=1,2,\dots,n$. The function $F(\Theta)$ represents the (log-)likelihood of the parameter vector $\Theta$ given the observations $F_t$, $t>0$, and $\tilde x_0$ at time $t=t_0$.

In order to maximize the (log-)likelihood function shown in (22), we use an iterative optimization procedure that searches for the maximum likelihood estimate $\Theta^*$, solution of the maximization problem:

$$\max_{\Theta\in M}F(\Theta), \tag{23}$$

starting from an initial guess $\Theta^0\in M$.
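The structure of the (log-)likelihood function (22), a sum over the observation times of the logarithm of an integral in the $v$ variable, can be sketched in Python as follows. The sketch assumes that the values of the transition densities and of the weight functions on a grid of $v$ values have already been computed by the filtering procedure; the function names, the use of the trapezoidal rule and the toy input arrays are assumptions made for this example and are not taken from [7].

```python
import math

def trapezoid(values, grid):
    """Composite trapezoidal rule for samples `values` on the increasing grid `grid`."""
    return sum(0.5 * (values[j] + values[j + 1]) * (grid[j + 1] - grid[j])
               for j in range(len(grid) - 1))

def log_likelihood(v_grid, predicted, weights):
    """Discrete analogue of (22).

    predicted[i][j]: transition density at (x_{i+1}, v_grid[j], t_{i+1}) given F_{t_i}
    weights[i][j]:   weight function at (x_{i+1}, v_grid[j]) for the observation C_{i+1}
    """
    total = 0.0
    for p_row, w_row in zip(predicted, weights):
        integrand = [p * w for p, w in zip(p_row, w_row)]
        total += math.log(trapezoid(integrand, v_grid))
    return total

# Toy inputs (assumption): two observation times, constant integrands on [0, 1]
v_grid = [0.0, 0.5, 1.0]
predicted = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
value = log_likelihood(v_grid, predicted, weights)
```

In the toy case the two integrals equal 1 and 2, so the value returned is $\log 1+\log 2=\log 2$.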
The iterative procedure that solves the optimization problem (23) using $\Theta^0$ as initial guess finds during the iterations the admissible vectors $\Theta^k$, $k=1,2,\dots,iter$, where $iter$ is the maximum number of iterations allowed. The vectors $\Theta^k$, $k=0,1,\dots,iter$, are such that $F(\Theta^{k+1})\ge F(\Theta^k)$, $k=0,1,\dots,iter-1$.

We use as data of the estimation problem the same data used for the filtering problem, that is the data shown in Table 1. These are synthetic data that were generated using $\Theta=\Theta_{true}=(0.026, 5.94, 0.306, 0.01159, -0.576, 0.5)^T$. We choose as initial guess of the optimization procedure the vector $\Theta^0=\Theta_{true}(1+0.4\,\xi)$, where $\xi$ is a 6-dimensional standard Gaussian variable with independent components. We fix the maximum number of iterations allowed $iter=40$, we use the variable metric steepest ascent method described in [7], and we use one of the following two stopping criteria. The first criterion consists in checking at every iteration the condition:

$$\left[\frac{\sum_{j=1}^{6}(\Theta_j^{k}-\Theta_j^{k-1})^2}{\sum_{j=1}^{6}(\Theta_j^{k-1})^2}\right]^{1/2}\le\delta_1, \tag{24}$$

where $\Theta_j^k$ is the $j$-th component of the vector $\Theta^k$, $j=1,2,\dots,6$, $k=0,1,\dots,iter$, and $\delta_1$ is a given tolerance, and in stopping the iteration when (24) is satisfied for the first time. The second criterion consists in checking at every iteration the condition:

$$\frac{|F(\Theta^{k})-F(\Theta^{k-1})|}{|F(\Theta^{k-1})|}\le\delta_2, \tag{25}$$

where $\delta_2$ is a given tolerance, and in stopping the iteration when (25) is satisfied for the first time. Choosing $\delta_1=\delta_2=0.001$, if we use the stopping criterion (24) the variable metric steepest ascent procedure stops at the iteration $k=20$, while if we use the stopping criterion (25) it stops at the iteration $k=10$, where $\Theta^{10}=(0.02, 4.74, 0.24, 0.02, -0.63, 0.4)$ and $\Theta^{20}=(0.02, 4.748, 0.24, 0.02, -0.615, 0.422)$, and we have $F(\Theta^{10})=19.1014$, $F(\Theta^{20})=19.7356$, $F(\Theta^0)=11.6194$ and $F(\Theta_{true})=21.7477$.
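The two stopping criteria can be written as short Python functions. The sketch below assumes that (24) and (25) compare the current iterate with the previous one and that the tests are satisfied when the left-hand sides do not exceed the tolerances; the function names and the numbers used in the usage lines are hypothetical and are not output of the estimation procedure.

```python
import math

def relative_step(theta_new, theta_old):
    """Left-hand side of (24): Euclidean norm of the step relative to the previous iterate."""
    num = sum((a - b) ** 2 for a, b in zip(theta_new, theta_old))
    den = sum(b ** 2 for b in theta_old)
    return math.sqrt(num / den)

def relative_gain(f_new, f_old):
    """Left-hand side of (25): relative change of the (log-)likelihood value."""
    return abs(f_new - f_old) / abs(f_old)

# Hypothetical consecutive iterates and likelihood values, for illustration only
delta1 = delta2 = 0.001
stop_24 = relative_step((3.0, 4.0), (3.0, 4.001)) <= delta1
stop_25 = relative_gain(100.05, 100.0) <= delta2
```

Criterion (25) looks only at the value of the likelihood, so it can be satisfied while the parameter vector is still moving appreciably; this is consistent with the fact that in the experiment reported above it stops the iteration earlier than criterion (24).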
A digital movie, available for download from this website, shows the variance tracking for several values of the parameter vector $\Theta^k\in M$ obtained during the iterative optimization procedure, and the variance tracking corresponding to the parameter vector $\Theta=\Theta_{true}$.

The evaluation of the (log-)likelihood function $F(\Theta)$ is computationally very expensive. We reduce the computational cost of solving the filtering and estimation problems using parallel computing. Using $n_p$ processors, the wall clock time required to execute the program that finds the solution of the filtering problem is reduced approximately by a factor $n_p$. In particular, if we denote with $T_{n_p}$ the wall clock time needed to evaluate the (log-)likelihood function $F(\Theta)$, for a given choice of $\Theta\in M$, using $n_p$ processors, we can measure the efficiency of the parallel implementation considered by computing the speed up factor $s(n_p)$, where

$$s(n_p)=\frac{T_1}{T_{n_p}}, \quad n_p=1,2,\dots. \tag{26}$$

In Table 2 we show, as a function of the number of processors $n_p$, the speed up factor and the wall clock time $T_{n_p}$ needed to evaluate the (log-)likelihood function associated with the data shown in Table 1 on an IBM SP4 machine with 32 processors.

n_p    s(n_p)     T_{n_p} (seconds)
1      1          4200
2      1.9927     2107
4      3.9424     1065
8      7.6549     548.6
16     14.8287    283.23

Table 2. Speed up factors $s(n_p)$ and wall clock time $T_{n_p}$ needed to evaluate the likelihood function associated with the data of Table 1 versus the number $n_p$ of processors employed

5. References

[1] L. Fatone, M.C. Recchioni, F. Zirilli, New Scattering Problems and Numerical Methods in Acoustics, in Recent Research Developments in Acoustics, S.G. Pandalai Managing Editor, Transworld Research Network, Kerala, India, 2 (2005), 39-69.
[2] L. Fatone, M.C. Recchioni, F. Zirilli, New wavelet bases made of piecewise polynomial functions: approximation theory, quadrature rules and applications to kernel sparsification and image compression, in preparation.
[3] L. Fatone, G. Rao, M.C. Recchioni, F. Zirilli, High Performance Algorithms Based on a New Wavelet Expansion for Time Dependent Acoustic Obstacle Scattering, submitted to Journal of Computational Physics.
[4] S.L. Heston, A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options, Review of Financial Studies, 6 (1993), 327-343.
[5] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, (1970).
[6] A. Lipton, Mathematical Methods for Foreign Exchange, World Scientific Publishing Co. Pte. Ltd, Singapore, (2001).
[7] F. Mariani, G. Pacelli, F. Zirilli, Maximum Likelihood Estimation of the Heston Stochastic Volatility Model Using Asset and Option Prices: an Application of Nonlinear Filtering Theory, submitted to Optimization Letters.

6. FORTRAN Code

The FORTRAN code distributed on this website computes the solution of the estimation problem (23) using the formulae outlined in Sections 3 and 4 and derived in [7]. The data of the estimation problem are those described in Section 4 and must be provided by the user. The FORTRAN code consists of two programs, both available for download from this website.

The first program, preproc.f, reads the data inserted by the user and writes them on files, initializes the parameter vector $\Theta^0$ and computes the number of wavelet coefficients needed after the truncation procedure in order to approximate efficiently the integral kernel contained in (17) associated with the initial guess $\Theta^0$.

The second program, estimate.f, starting from the initial guess $\Theta^0$ generates a sequence of admissible vectors $\Theta^k$, $k=1,2,\dots,iter$, using a variable metric steepest ascent procedure. At every iteration the program solves the filtering problem associated with the current parameter vector $\Theta^k$.

Both programs use parallel computing with MPI as message passing library, and can therefore run on parallel machines.