Variance Risk Premium Dynamics

Job Market Paper

Viktor Todorov
Duke University

Current Draft: January 3, 2007 ∗†

Abstract

This paper uses high-frequency S&P 500 index futures data and data on the VIX index to provide an arbitrage-free explanation of the variance risk premium and its dynamics. Using the high-frequency data only, I select a semiparametric two-factor stochastic volatility model, containing jumps in the price and the stochastic variance. For this model I derive prices of diffusive and jump risk that determine the variance risk premium. Unlike other studies of the variance risk premium, this study allows compensation for both stochastic volatility and jumps to be reflected in the variance risk premium. The price of jump risk considered here is novel and allows the jump risk premium to depend on the level of past price jumps. Using the selected stochastic volatility model and the prices of risk, I conduct a joint inference and detect a non-trivial variance risk premium. The estimation results show that the variance risk premium varies significantly over time. It increases in periods of high volatility and straight after big jumps. The empirical findings of this paper suggest habit persistence in investors’ fear of jumps, i.e., after a market crash investors are willing to pay more to protect themselves from future market drops.

Key words: Change of measure, continuous-time stochastic volatility model, diffusive risk, jump risk, Lévy process, quadratic variation, realized multipower variation, variance risk premium, variance swap rate.

JEL classification: G12, C51, C52.

∗ Author’s Contact: Viktor Todorov: viktor.todorov@duke.edu. Department of Economics, Duke University, Box 90097, Durham NC 27708.

† I would like to thank the members of my committee George Tauchen (chair), Tim Bollerslev, Ron Gallant and Han Hong for many discussions and encouragement along the way. I thank also seminar participants in the Duke Economics and Finance seminars, Javier Cicco, Pedro Duarte, Paul Dudenhefer, Silvana Krasteva, Jonathan Mattingly and Barbara Rossi for helpful comments. I benefited from discussions with Jean Jacod, Albert Shiryaev, Ernesto Mordecki, Mark Podolskij and other seminar participants at the Conference on Stochastics in Science in Honor of Ole Barndorff-Nielsen, Guanajuato, Mexico, March 2006.

1 Introduction

A central topic in finance concerns the risk premium that investors require for bearing different risks. Much of the work so far has been centered on explaining the equity risk premium, i.e. the compensation for the variation in asset prices (the price risk). However, the price risk is not the only risk from holding assets that investors face. Over the last few decades the financial econometrics literature has provided strong and unambiguous evidence that the variances of financial assets exhibit significant variation over time (e.g. Bollerslev, Engle, and Nelson (1994) and more recently Andersen, Bollerslev, and Diebold (2005a)). This variation introduces an additional source of risk from holding assets, referred to as variance risk. The importance of the variance risk for the investors is directly underlined by the development and trading of variance swap contracts, i.e., forward contracts on future variance1. Investors generally dislike the randomness of the future variance and in equilibrium require a premium for accepting it. This is known as the variance risk premium.

The main goal of this paper is to analyze the dynamics of the variance risk premium. Studying the dynamics of the variance risk premium is important for at least two reasons. First, given the increased interest in the direct trading of variance contracts, we need to be able to price these products. Moreover, in many cases the variance products are part of a portfolio containing the underlying asset.
Thus, the pricing of the variance should not be done in isolation, but rather in a way that is consistent with the pricing of the underlying asset. If the variance risk premium is constant, the substantial literature on modeling and forecasting the returns variance (e.g. Andersen, Bollerslev, and Diebold (2005a)) is directly applicable for pricing the variance products. Things are different, however, if the variance risk premium has time-variation. Second, the analysis of the dynamics of the variance risk premium has implications for the existence and properties of the pricing kernel, also known as the stochastic discount factor. In that sense the importance of the analysis in the paper goes well beyond the pricing of variance products. In this paper I provide an arbitrage-free explanation of the dynamics of the variance risk premium. The analysis is based on high-frequency data on the S&P 500 index and data on the variance swap rate (the VIX index). Using a general semiparametric stochastic volatility model and flexible prices of risk in this model, I am able to account for the dynamics of the variance risk premium implied by the data. Previous studies of the dynamics of the variance risk premium include Bollerslev, Gibson, and Zhou (2005) and Wu (2005)2 3 . There are two substantial differences between these two papers and the current work, which also synthesize the major contributions of this paper to the existing literature. The first difference is the separation of the price jumps from the continuous price component. Bollerslev, Gibson, and Zhou (2005) do not allow for price jumps in their model, while Wu (2005) allows them but he does not consider their separation from the continuous price component in the estimation. In contrast, in this paper I allow for jumps in the model and use the high-frequency data on the index to separate the continuous from the discontinuous component of the price. 
The advantage of this separation is that it makes it possible to isolate the effect of the price jumps on the variance risk premium, and hence it allows for a deeper analysis of the determinants of the variance risk premium.

1 These contracts give exposure only to variance risk and hence provide an instrument to hedge against it. The recent theoretical results in Carr and Wu (2004) and Britten-Jones and Neuberger (2000) imply that the variance swap can be replicated with a static portfolio of option contracts written on the underlying asset. In 2003 CBOE adopted these theoretical results in calculating its volatility index, the VIX index, and as a result the new VIX index reflects the theoretical price of a variance swap contract. The ability to replicate the VIX index (i.e. the variance swap) directly with standard options further increased the interest in trading future variance. As a result, in 2004 CBOE launched trading of futures contracts on the VIX index, and at the beginning of 2006 it started trading option contracts written on the VIX index as well.

2 Other studies of the variance risk premium include Bakshi and Kapadia (2003), Carr and Wu (2004) and Bakshi and Madan (2006). These papers, however, do not consider modeling the dynamics of the variance risk premium.

3 Of course, since the variance risk premium (and its dynamics) can be determined from the pricing kernel, all papers which consider estimation of the pricing kernel are also indirectly related to the current study of the dynamics of the variance risk premium. An incomplete list includes Bates (2000), Chernov and Ghysels (2000), Ait-Sahalia, Wang, and Yared (2001), Pan (2002), Rosenberg and Engle (2002), Eraker (2004), Santa-Clara and Shu (2005), Broadie, Chernov, and Johannes (2006). A major difference between the current study and these papers is the data. The current paper uses high-frequency data on the underlying index and data on a variance swap (a portfolio of options), while the above-cited papers use low-frequency data on the underlying asset and data on a set of options. As discussed later, the use of high-frequency data is crucial for the analysis here.

The second difference between Bollerslev, Gibson, and Zhou (2005) and Wu (2005) and the current paper is the source of the variance risk premium. The variance risk premium in Bollerslev, Gibson, and Zhou (2005) and Wu (2005) is associated with the compensation for the time-variation in the conditional variance. However, when the model contains price jumps, the variance of the asset will vary over time even if there is no time-variation in the conditional variance of the returns. The variance risk premium in general, therefore, also reflects compensation demanded by investors for the presence of price jumps. In contrast to the above-cited papers, I allow compensation for both the presence of price jumps and the time-variation in the conditional variance to determine the variance risk premium. In fact, it is the flexible specification of the compensation for jump risk, considered in the present paper, that allows me to explain the dynamics of the variance risk premium implied by the data.

The analysis of the dynamics of the variance risk premium has two major building blocks. The first is the specification of a model for the dynamics of the underlying index, while the second is the specification of prices of risk in the model (i.e. specification of a valid pricing kernel).

The paper starts with a selection of a stochastic volatility model. I work with a very general semiparametric model which nests the affine-jump diffusion models of Duffie, Pan, and Singleton (2000) (with constant jump intensity) and the jump-diffusion jump-driven stochastic volatility models of Todorov (2006a) (which include the non-Gaussian OU model of Barndorff-Nielsen and Shephard (2001)).
Empirically relevant features of the model include the presence of jumps both in the price and the stochastic variance (the spot variance of the continuous price component) as well as the multifactor-type structure of the stochastic variance. The modeling of the jumps in the price and the variance is quite flexible and allows for all possible dependencies between them. Using only high-frequency data on the underlying index, I estimate different specifications of the general stochastic volatility model and select one of them for the analysis of the variance risk premium. The selected model has two variance factors. The one is diffusive (modelled as a square-root process) and very persistent. The other variance factor is driven by jumps and has a very short memory. In the selected model, the variation both in the price and in the stochastic variance of the underlying asset is driven by diffusive shocks (modelled with a Brownian motion) and jumps (modelled with a general pure-jump Lévy process). Therefore, to determine the variance risk premium, we need prices of diffusive and jump risks in the model. The compensation for diffusive risk, considered here, is the generalized affine price of risk, as recently defined in Cheridito, Filipović, and Kimmel (2005) in the context of affine diffusion models. The price of jump risk that is used is novel and quite flexible. It allows jumps to have very different behavior under the physical and the risk-neutral measure. For example, the compensation for the jumps allows for a situation where the jumps are time-homogenous under the physical measure and yet exhibit significant persistence under the risk-neutral measure. This flexibility turns out to be empirically relevant. Following Todorov (2006b), the estimation in the present paper is based on matching moments of realized multipower variation. 
Realized multipower variation statistics aggregate high-frequency data on a daily level and provide a good approximation of latent quantities of the model (see Barndorff-Nielsen, Graversen, Jacod, Podolskij, and Shephard (2005)4). In the estimation I treat the realized multipower variation statistics as their (unobservable) asymptotic limits. This introduces error in the parameter estimation. The error converges in probability to zero for the general stochastic volatility model used in the paper under the condition that the number of intraday observations goes to infinity. Further, under the additional condition that the number of intraday observations increases slightly faster than the number of days in the sample, $T$, this approximation error is asymptotically negligible, i.e. it is of order $o_p(1/\sqrt{T})$.

4 Their asymptotic behavior in the case of no price jumps, as the number of intraday observations goes to infinity, is derived in Barndorff-Nielsen, Graversen, Jacod, Podolskij, and Shephard (2005). These results are partially extended to the case when the price process contains jumps, which is the case of interest in this paper, by Barndorff-Nielsen, Shephard, and Winkel (2006) and Jacod (2006a,b).

A final remark regarding the estimation is related to the jump specification. In the estimation, the distribution of the jumps in the price and the variance is left unspecified. Instead, only certain moments of the jumps are estimated. The advantage of this approach is that the results of the paper are immune to misspecification of the distribution of the jumps. This is particularly relevant for the dependence between the jumps in the price and the variance. The estimation results indicate that this dependence is statistically significant. The estimated dependence between the jumps, however, is different from that implied by most parametric specifications for the jumps used in the financial literature.
This finding underscores the advantage of the estimation approach adopted here of not modeling parametrically the jumps. My main empirical findings can be summarized as follows. I find a non-trivial variance risk premium. Its estimated mean is 0.6827, while the sample mean of the variance swap rate is 1.6542 (both estimates are in daily variance units). Further, the variance risk premium shows significant variation over time. An estimated lower bound for its variance is 0.3401, while the sample variance of the variance swap rate is 1.2775. I find that both price jumps and stochastic variance are important determinants of the variance risk premium. The dependence of the variance risk premium on the price jumps, to the best of my knowledge, is a new finding. The empirical evidence indicates that after a big jump in the price, the variance risk premium increases and takes a while to revert to its mean. This is explained with a compensation for jumps that depends on a very persistent state variable, which, in turn, is related with the price jumps. At the same time, the price jumps in the model are time-homogeneous (under the physical measure), since they are modelled as a Lévy process, and further the estimation results show that their effect on the stochastic variance disappears quickly. Thus, the empirical finding of a persistent jump risk premium suggests a habit persistence in investors’ fear of jumps: immediately after a market crash investors are willing to pay more to protect themselves against future market drops. Finally, my findings for the importance of the time-varying jump risk premium are consistent with the results of Bates (2000), Pan (2002) and Santa-Clara and Shu (2005), among others. However, there are two major differences between this study and the above-cited papers in the modeling of the time-varying jump risk premium. 
First, in this study the compensation for jump risk depends on the past jumps in the stock market index and this accounts for the observed dependence of the variance risk premium on past price jumps. Second, the jump risk premium here is not directly linked with a state variable in the model such as the variance jump factor. This is important since the estimation results show that the jump risk premium, although related to past jumps, has much longer memory than does the variance jump factor.

The remainder of the paper is organized as follows. Section 2 introduces the general stochastic volatility model for the dynamics of the underlying asset under the physical measure. I discuss how the model can capture key empirical features of asset prices and derive the moments to be used later in the estimation. Section 3 describes the estimation technique based on the realized multipower variation statistics constructed from the high-frequency data. This Section also contains an asymptotic result for realized multipower variation based inference in the context of the stochastic volatility model used here. In Section 4 I estimate different specifications of the general stochastic volatility model, introduced in Section 2, and select one of them to be used for the analysis of the variance risk premium. The estimation is done using only high-frequency data on the underlying asset. In Section 5 I construct a measure for the variance risk premium and report significant empirical evidence for time-variation in this measure. Section 6 derives general prices of diffusive and jump risk within the selected stochastic volatility model and discusses their implication for the variance risk premium. In this Section I also test the different specifications of diffusive and jump risk using high-frequency data on the underlying asset and data on the variance swap rate. Section 7 concludes. All the proofs are given in the Appendices at the end of the paper.
2 Dynamics under the Physical Measure

In this Section I specify the general stochastic volatility model and define key quantities associated with it that are used for the definition and estimation of the variance risk premium. Later in Section 4 I estimate different specifications of the general stochastic volatility model, introduced in this Section, and select one of them for the analysis of the variance risk premium. The current Section also discusses the main characteristics of the model and argues for its flexibility. Finally, moments of the return process, to be used in the estimation, are also derived.

2.1 The Stochastic Volatility Model

I fix a filtered probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with $\mathbb{F} = (\mathcal{F}_t)_{t\in\mathbb{R}}$ its filtration. On this space I denote by $F(t)$ the price at time $t$ of a futures contract on the stock market index expiring at some future date. I assume for $f(t) = \log(F(t))$ the following dynamics under the physical measure $\mathbb{P}$

$$f(t) = f(0) + \int_0^t \alpha(s)\,ds + \int_0^t \sigma(s-)\,dW(s) + \int_0^t\!\int_{\mathbb{R}^n_0} h(x)\,\tilde{\mu}(ds,dx), \tag{1}$$

$$\sigma^2(t) = V^c(t) + V^j(t), \qquad V^c(t) = \sum_{i=1}^{p} V_i^c(t), \tag{2}$$

$$dV_i^c(t) = \kappa_i\left(\theta_i - V_i^c(t)\right)dt + \sigma_{iv}\sqrt{V_i^c(t)}\,dB_i(t), \qquad i = 1,\dots,p, \tag{3}$$

$$V^j(t) = \int_{-\infty}^{t}\!\int_{\mathbb{R}^n_0} g(t-s)\,k(x)\,\mu(ds,dx), \tag{4}$$

where $(W(t), B_1(t), \dots, B_p(t))$ is a $(p+1)$-dimensional Brownian motion with $B_1(t), \dots, B_p(t)$ independent of each other and having correlation coefficients $\rho_1, \rho_2, \dots, \rho_p$ respectively with $W(t)$; $x$ is an $n$-dimensional vector in $\mathbb{R}^n_0$; $\mu$ is a time-homogeneous Poisson random measure with compensator $\nu$ such that $\nu(dt,dx) = dt\,G(dx)$ for some $G : \mathbb{R}^n_0 \to \mathbb{R}_+$; $g : \mathbb{R}_+ \to \mathbb{R}_+$, $h : \mathbb{R}^n_0 \to \mathbb{R}$ and $k : \mathbb{R}^n_0 \to \mathbb{R}_+$; and $\tilde{\mu} := \mu - \nu$ is the compensated measure. Sufficient conditions for the existence of all processes in the model (1)-(4) are given in Section 3 (Assumption 4).

The futures price in (1) has three components. The first is the drift term which is absolutely continuous. In this paper it is left unspecified. The second component of the price is a continuous local martingale.
Its time-variation is determined by the process $\sigma^2(t)$. I refer to $\sigma^2(t)$ as the stochastic variance, since it determines the time-variation in the conditional variance of the returns5. $\sigma^2(t)$ is a sum of two factors. The first factor, $V^c(t)$, is the continuous component of the stochastic variance. I model it as a sum of square-root processes as in the standard affine stochastic volatility models (Duffie, Pan, and Singleton (2000) and Duffie, Filipović, and Schachermayer (2003)). The second component of the stochastic variance, $V^j(t)$, is its discontinuous part6. I model $V^j(t)$ as a moving average of a pure jump Lévy process. To guarantee nonnegativity of $V^j(t)$ I define it as an integral with respect to the random measure $\mu$ and not with respect to its compensated version $\tilde{\mu}$. Further, I restrict $k(\cdot) > 0$ and $g(\cdot) > 0$ as already specified in the definition of the stochastic volatility model. A more familiar representation for $V^j(t)$ is (with the normalization $g(0) = 1$)

$$V^j(t) = \sum_{s \le t} g(t-s)\,\Delta V^j(s).$$

This shows that $V^j(t)$ is a weighted sum of past variance jumps. The impact of the past jumps on the current level of $V^j(t)$ is determined by the function $g(\cdot)$. In other words $g(\cdot)$ controls the persistence in the process $V^j(t)$. This modeling of the discontinuous component of the stochastic variance follows the general dynamics of the jump-driven stochastic volatility models introduced in Todorov (2006a) (which include also the non-Gaussian OU model of Barndorff-Nielsen and Shephard (2001) and its extensions in Brockwell (2001a) and Brockwell and Marquardt (2005)). In these models the stochastic variance is driven solely by nonnegative jumps.

5 This is because the jump martingale is time-homogeneous.

6 $V^j(t)$ is discontinuous provided $g(0) \neq 0$, which will be assumed.

The last component of the price in equation (1) is a jump martingale and as a result is defined as an integral with respect to the compensated martingale measure $\tilde{\mu}$.
This notation is less familiar in the empirical finance literature. If the price jumps are of finite variation, e.g. all compound Poisson processes, we have

$$\int_0^t\!\int_{\mathbb{R}^n_0} h(x)\,\tilde{\mu}(ds,dx) = \int_0^t\!\int_{\mathbb{R}^n_0} h(x)\,\mu(ds,dx) - t\int_{\mathbb{R}^n_0} h(x)\,G(dx). \tag{5}$$

The second term on the right-hand side of (5) is deterministic and linear in $t$, so it can be absorbed into the drift term. For the first term in (5) we have

$$\int_0^t\!\int_{\mathbb{R}^n_0} h(x)\,\mu(ds,dx) = \sum_{0 < s \le t} \Delta f(s),$$

which is more familiar and shows that this integral is simply a sum of all price jumps up to time $t$. The reason the price jumps are written as in equation (1) is that this allows considering more general cases in which the decomposition in (5) does not work, i.e. the case of infinite variation price jumps7. Therefore, the jumps in the price are allowed to be completely general as far as their activity is concerned8.

I proceed with defining key variables, associated with the stochastic volatility model (1)-(4), to be used throughout the paper. The return from holding the futures contract over the period $(t, t+a]$ is denoted by $r_a(t) = f(t+a) - f(t)$. The quadratic variation (hereafter QV) of the futures price $f(t)$ over the period $(t, t+a]$ is given by

$$[f,f]_{(t,t+a]} = \int_t^{t+a} \sigma^2(s)\,ds + \int_t^{t+a}\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx). \tag{6}$$

The first term in the quadratic variation is due to the continuous martingale in the futures price. This is the continuous part of the quadratic variation. I refer to it as Integrated Variance (hereafter IV)

$$IV_a(t) = \int_t^{t+a} \sigma^2(s)\,ds. \tag{7}$$

The second component of the quadratic variation is due to the discontinuous martingale in the price. It can be written as

$$\int_t^{t+a}\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx) = a\int_{\mathbb{R}^n_0} h^2(x)\,G(dx) + \int_t^{t+a}\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde{\mu}(ds,dx). \tag{8}$$

The first term on the right-hand side of (8) is a constant due to the time-homogeneity property of the Lévy processes. The second term in (8) is a jump martingale with jumps equal to $h^2(x)$ (i.e. the squares of the price jumps).
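The decomposition of quadratic variation in (6) into integrated variance and the sum of squared price jumps can be checked numerically. The following is only a minimal sketch under simplifying assumptions that are not part of the paper: the stochastic variance is held constant, the price jumps are compound Poisson with zero-mean Gaussian sizes (so the compensator term in (5) vanishes), at most one jump can arrive per time step, and all function and parameter names are hypothetical.

```python
import math
import random


def simulate_log_price(sigma2, jump_intensity, jump_std, n_steps, a=1.0, seed=0):
    """Simulate increments of f over (0, a] on an equally spaced grid.

    Assumptions (illustrative only): constant variance sigma2 and compound
    Poisson jumps with N(0, jump_std^2) sizes.
    Returns the increments, the integrated variance, and the sum of squared jumps.
    """
    rng = random.Random(seed)
    dt = a / n_steps
    increments = []
    sum_sq_jumps = 0.0
    for _ in range(n_steps):
        # Continuous martingale part: sigma * dW over one step.
        incr = math.sqrt(sigma2 * dt) * rng.gauss(0.0, 1.0)
        # Jump part: Bernoulli approximation of a Poisson arrival in the step.
        if rng.random() < jump_intensity * dt:
            jump = rng.gauss(0.0, jump_std)
            incr += jump
            sum_sq_jumps += jump * jump
        increments.append(incr)
    integrated_variance = sigma2 * a  # IV_a(t) when sigma^2 is constant
    return increments, integrated_variance, sum_sq_jumps


def realized_variance(increments):
    """Sum of squared increments, a discrete approximation of [f, f]."""
    return sum(r * r for r in increments)


increments, iv, sum_sq_jumps = simulate_log_price(
    sigma2=1.0, jump_intensity=2.0, jump_std=1.0, n_steps=100_000)
rv = realized_variance(increments)
gap = rv - (iv + sum_sq_jumps)  # should be close to zero on a fine grid
```

On a fine grid the sum of squared increments should be close to `iv + sum_sq_jumps`, which is the discrete counterpart of the two terms on the right-hand side of (6).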
Further, it is convenient to decompose IV into two components corresponding to the continuous and jump components of $\sigma^2(t)$

$$IV_a(t) = IV_a^c(t) + IV_a^j(t), \tag{9}$$

where

$$IV_a^c(t) = \int_t^{t+a} V^c(s)\,ds \quad \text{and} \quad IV_a^j(t) = \int_t^{t+a} V^j(s)\,ds. \tag{10}$$

7 In intuitive terms a function is of finite variation if the total distance travelled by its trajectory over a finite interval is finite. If this is not the case the function is of infinite variation. If the jumps are of infinite variation we need to compensate them in order to be able to define the last integral in (1). In this case the integral is defined as a stochastic integral, see e.g. Jacod and Shiryaev (2003).

8 In the empirical part the activity of the price jumps is restricted and the infinite variation case is excluded. However, for this Section I keep the model as general as possible and allow for infinite variation price jumps since the analysis in this Section covers this case as well and nothing is gained from excluding it.

The second integral in (10) can be expressed as another integral with respect to the random measure $\mu$. This can easily be done using Fubini’s theorem9

$$IV_a^j(t) = \int_{-\infty}^{t+a}\!\int_{\mathbb{R}^n_0} H_a(t,s)\,k(x)\,\mu(ds,dx), \tag{11}$$

where

$$H_a(t,s) = \begin{cases} \int_t^{t+a} g(z-s)\,dz & \text{if } s < t, \\ \int_s^{t+a} g(z-s)\,dz & \text{if } t \le s < t+a. \end{cases} \tag{12}$$

Note that the quadratic variation of the price varies over time. There are two reasons for this. The first is that $\sigma^2(t)$ has time-variation. The second reason for the randomness of the quadratic variation is the presence of jumps in the price process. This observation is important for the analysis of the variance risk premium.

Finally, the unit of measurement in this paper is a trading day and if $a = 1$, in order to simplify notation, I will omit the dependence on $a$ in the notation of all the quantities defined above.

2.2 Model Characteristics

I continue with a short discussion of the empirically relevant features of the stochastic volatility model (1)-(4).

Price Jumps. The presence of jumps in the price process has two implications.
The first consequence of price jumps is that the generated distributions (both conditional and unconditional) of the returns are much more general. Thus, for example, price jumps (together with the time-varying stochastic variance) can easily account for the observed fat-tailedness in the unconditional return distribution. This implication of the price jumps could be detected even with the use of low-frequency data. Indeed, the studies of Andersen, Benzoni, and Lund (2002), Chernov, Gallant, Ghysels, and Tauchen (2003) and Eraker, Johannes, and Polson (2003) provide empirical evidence, based on daily financial returns, in favor of parametric models containing price jumps. The second implication of price jumps is the discontinuity of the price trajectory. This is a pathwise implication of the presence of jumps in the price. Naturally, if we want to separate the price jumps from the continuous martingale component of the price, based on their difference in pathwise behavior, we need high-frequency observations. Recently, Barndorff-Nielsen and Shephard (2004, 2006) developed nonparametric tests for the presence of price jumps based on realized multipower variation statistics. These statistics are constructed from high-frequency data and behave differently depending on whether the price contains jumps10 . Using the tests of Barndorff-Nielsen and Shephard (2004, 2006) 11 , Barndorff-Nielsen and Shephard (2006), Andersen, Bollerslev, and Diebold (2005b) and Huang and Tauchen (2005) find strong empirical evidence for a non-trivial jump component in the price. Further, Ait-Sahalia (2004) and Ait-Sahalia and Jacod (2005), working in a time-homogenous setting, show theoretically that jumps could be disentangled in a parametric estimation with the use of high-frequency data. Bollerslev and Zhou (2002), Jiang and Oomen (2006), and Todorov (2006a) use high-frequency data to estimate different parametric models with and without price jumps. 
These studies find strong support for models containing price jumps. Thus, overall, there is overwhelming evidence for the presence of price jumps and their inclusion is necessary.

9 The integral in the definition of the second variance factor $V^j(t)$ is with respect to $\mu$ and thus could be defined pathwise.

10 In the estimation I use realized multipower variation statistics and Section 3 contains the definitions and properties of the ones used in this paper.

11 These tests are valid asymptotically, as the intraday sampling interval goes to zero. However, the Monte Carlo analysis in Huang and Tauchen (2005) suggests that they are good jump detectors for the frequencies at which the high-frequency data is recorded.

12 This notation is a bit loose, but underlines the fact that the jumps are time-homogeneous.

In the model here the jumps in the price are12

$$\Delta f(t) = h(x). \tag{13}$$

As seen from equation (13), the jumps in the price are time-homogeneous, i.e., they are modelled as a Lévy process. In the empirical part in Section 4 I find that for the data used in this study there is no significant time-variation in the price jumps, at least when looking at their quadratic variation only. Therefore, in this study I restrict the price jumps to be time-homogeneous. Indeed, the empirical evidence (e.g. Andersen et al. (2005b)) suggests that the continuous and discontinuous martingale components of the futures price process differ substantially in their persistence. This is why in this paper these two components of the price are modelled separately, instead of working with a more parsimonious model where $\sigma^2(t)$ determines the time-variation both of the continuous and discontinuous martingales or even working with a model where the price is a pure jump process. An extension of the current work is to allow for time-variation in the jumps, which can be relevant especially if the jumps are identified not only through their quadratic variation13.

Stochastic Variance.
Another important feature of the financial data is the persistence in the returns variance. Since in the model here the price jumps are time-homogeneous, their conditional variance is constant. Therefore, persistence in the returns variance can be generated only through time-variation in $\sigma^2(t)$. This, in turn, can be done in the following way. First, the continuous component of the stochastic variance, $V^c(t)$, is a sum of independent square-root processes. Secondly, the jump variance component, $V^j(t)$, is modelled as a moving average of past jumps. A typical choice for the function $g(\cdot)$ in equation (4) is a CARMA (continuous-time autoregressive moving average) kernel (see Brockwell (2001a,b)), but other choices like the fractionally-integrated CARMA kernels introduced in Brockwell and Marquardt (2005) are also possible. The choice of the function $g(\cdot)$ and the number of factors in the continuous variance part determine the persistence of the stochastic variance. In Section 4 I provide further details on the particular choice used in the empirical implementation.

An important feature of the stochastic variance $\sigma^2(t)$ (e.g. Eraker, Johannes, and Polson (2003) among others) is its ability to increase rather quickly. Such sudden changes in the stochastic variance are hard to generate with a continuous path process such as the square-root process. However, they are naturally generated by allowing for jumps in the variance. In the model here the jumps in the variance are (assuming $g(\cdot)$ is a continuous function)

$$\Delta\sigma^2(t) = g(0)\,k(x). \tag{14}$$

Jump Dependence. The modeling of the jumps in the price and the variance is quite flexible. In equations (1) and (4) the jump sizes in the price and the variance are expressed as functions of jumps in an $n$-dimensional space. This way all possible dependencies between the jumps can be captured in a practical and intuitive way.
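As a rough illustration of this construction, the sketch below draws one-dimensional marks $x$ from a stand-in for $G$ and maps each mark to a price jump $h(x)$ and a variance jump $k(x)$. The choices $h(x) = x$ and $k(x) = x^2$, the Gaussian marks, and every function name are assumptions made for the example only; the paper leaves $h(\cdot)$, $k(\cdot)$ and $G(\cdot)$ unspecified.

```python
import math
import random


def correlation(u, v):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def jump_sizes(marks, h, k):
    """Map each Poisson mark x to a price jump h(x) and a variance jump k(x)."""
    return [h(x) for x in marks], [k(x) for x in marks]


rng = random.Random(1)
# Hypothetical one-dimensional marks: symmetric Gaussian draws stand in for G.
marks = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Illustrative choice: h(x) = x and k(x) = x^2, so variance jumps are nonnegative.
price_jumps, var_jumps = jump_sizes(marks, lambda x: x, lambda x: x * x)
squared_price_jumps = [j * j for j in price_jumps]

# Linear dependence in levels versus linear dependence in squares.
corr_levels = correlation(price_jumps, var_jumps)
corr_squares = correlation(squared_price_jumps, var_jumps)
```

With symmetric marks, the price and variance jumps in this example are nearly uncorrelated in levels even though each variance jump is a deterministic function of the price jump, while the squared price jumps and the variance jumps are perfectly correlated. Other dependence structures follow from swapping the two functions passed to `jump_sizes`.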
I demonstrate with several examples, which have been used in the financial literature, the flexibility of the jump modeling used here. I start with two examples which use a one-dimensional Poisson measure, i.e. in which the vector $x$ reduces to a scalar. The first of these two examples is of a perfect linear dependence between the jumps in the price and the variance. Such modeling is used in the non-Gaussian OU model of Barndorff-Nielsen and Shephard (2001) (which is nested in the general stochastic volatility model here). In the setting of the model (1)-(4) this type of dependence is generated with

$$h(x) \propto x, \qquad k(x) \propto x.$$

Note that, since the jumps in the variance are restricted to be positive, this dependence has the potentially limiting feature that the price jumps are of the same sign14.

13 Time-variation in the price jumps can be generated either by introducing time-variation in the compensator $\nu$ (e.g. the time-changed Lévy processes considered in Carr, Geman, Madan, and Yor (2003)) or by time-changing the jump size (e.g. the COGARCH model of Klüppelberg, Lindner, and Maller (2004) and the pure-jump jump-driven stochastic volatility models in Todorov (2006a)).

14 Another restrictive feature of this modeling of the jumps is that the price jumps are constrained to be of finite variation. However, in the empirical part I exclude infinite variation price jumps from the analysis.

The second example, where the jumps in the price and the variance are modelled using a one-dimensional measure, is when the variance jumps are proportional to the squared price jumps, i.e.

$$h(x) \propto x, \qquad k(x) \propto x^2.$$

This dependence structure is used in Todorov (2006a). It induces a non-linear relationship between the price and the variance jumps. However, note that it implies perfect linear dependence between the squared price jumps and the jumps in the variance. This modeling of the jumps resembles the modeling of the conditional variance in the GARCH models.
It is potentially more flexible than the previous case since the price jumps can be of either sign and are not restricted to those of finite variation.

The above two examples consider the use of one-dimensional Poisson measures in modeling the price and variance jumps. However, the analysis here encompasses more general cases, where $\mathbf{x}$ can be multidimensional. For example, independent jumps in the setting here can be modelled as follows:

$$\mathbf{x} = (x_1, x_2), \qquad h(\mathbf{x}) = h(x_1), \qquad k(\mathbf{x}) = k(x_2),$$

and the compensator $G(\cdot)$ is such that

$$\int_{\mathbb{R}^2_0} 1_{(x_1 x_2 \neq 0)}\, G(d\mathbf{x}) = 0,$$

i.e., the measure $G(\cdot)$ is concentrated on the two axes in $\mathbb{R}^2_0$. Finally, the Lévy copula approach of Tankov (2003), which describes the dependence of a two-dimensional Lévy process in its full generality, could also be analyzed in the setting here.

For the purposes of the analysis in this paper I leave $h(\cdot)$, $k(\cdot)$ and $G(\cdot)$ unspecified for two reasons. First, their parametric (or semiparametric) modeling does not simplify the analysis here. Second, since we do not have a clear idea of the dependence structure of the jumps, it is better to leave this structure unspecified and let the data “choose” the right one. This way potential misspecification problems can be avoided.

Leverage Effect. Another empirically relevant feature of the model is the “leverage effect”, i.e. the (negative) linear relationship between the price and variance innovations. In this study I am not interested in measuring the “leverage effect”. However, in order for the model to be empirically realistic it should allow for a flexible way of generating a “leverage effect”. Therefore, I explain briefly how the model can account for this feature of the data. Since the price and the variance are both driven by Brownian motions as well as Poisson jumps, the “leverage effect” could be generated in two different ways in this model.
One way, which has been used predominantly in the financial literature, is to correlate the Brownian motions in the price and in the variance. It is interesting to note that if this correlation is zero the Brownian innovations in the price and in the variance will be independent. However, this could be restrictive as we could have no “leverage effect” while the innovations in the price and in the variance are still dependent (e.g. the standard GARCH model in discrete time or the jump-driven stochastic volatility models in Todorov (2006a)). The second way of generating “leverage effect” is through a dependence between the jumps in the price and the variance. In the financial literature the jumps are usually associated with very big changes. However, here the jumps could be specified as infinitely active (having an infinite number of jumps in any finite interval). Thus, a link between the jumps in the price and in the variance is not necessarily capturing only the link between the excessive changes in price and variance. 2.3 Moments of the Return Process I end this Section with a Theorem regarding moments of the return process which will be used later in the estimation. The theorem provides further insight into the mechanism through which the stochastic volatility model could account for the stylized features of the financial data. 9 Theorem 1 (Moments of the return process) In the stochastic R R volatility model R(1)-(4) assume that 2 ≤ 2κ θ for i = 1, ..., p; ∞ g(s)ds < ∞ and ∞ g 2 (s)ds < ∞; κi > 0, θi > 0 and σiv k(x)G(dx) < ∞ i i 0 0 Rn 0 R R R 2 2 4 and Rn k (x)G(dx) < ∞; Rn h (x)G(dx) < ∞ and Rn h (x)G(dx) < ∞. Then if α(t) = 0 we have 0 0 0 Var(ra (t)) = a p X Z θi + a g(s)ds ÃZ Z 4 Rn 0 +6a2 +3a2 Z n X h2 (x)G(dx) θi2 + 6a2 i=1 i=1 aZ s +6 0 à p X R0n h (x)G(dx) −∞ Z θi +6 0 Z θi + i=1 n X Z 2 h (x)G(dx) + 3a Rn 0 k(x)G(dx) + a !2 2 Z Z Rn 0 0 i=1 ¡ ¢ E ra4 (t) = a Z ∞ g(u)du 0 Rn 0 Ha (0, u)du ! 
g(u)du 0 Z Hs (0, u)g(s − u)duds Rn 0 ÃZ k(x)G(dx) + 3a2 2 Rn 0 Rn 0 h2 (x)k(x)G(dx) k(x)G(dx) Z ∞ (15) Z a Z ∞ h2 (x)G(dx), Rn 0 k (x)G(dx) + 6 Z ∞ g(u)du 0 p X σ 2 θi iv i=1 2κ3i Rn 0 !2 k(x)G(dx) (aκi − 1 + e−aκi ) + O(a5/2 ), (16) If in addition b ≥ a, we have ¶ µ Z a Z p −κi a 2 σ 2 θ X ¡ 2 ¢ 2 −κi (b−a) 1 − e iv i Cov ra (0), ra (b) = + Ha (0, u)Ha (b, u)du k 2 (x)G(dx) e n κi 2κi −∞ R 0 i=1 Z a Z + Ha (h, u)du h2 (x)k(x)G(dx) + O(a5/2 ). (17) 0 Rn 0 The assumption of a zero drift term is not crucial for the results in the Theorem. It is assumed in order to avoid trivial complications. In the case when the Brownian motions in the price and σ 2 (t) are independent, the error terms in equations (16) and (17) are exactly zero. In this case the proof of the Theorem follows essentially from the results in Todorov (2006a). When the Brownian motions in the price and σ 2 (t) are correlated, the proof of (16) and (17) involves bounding terms coming from this correlation. The proofs are easy, but somewhat tedious; therefore, they are given in a separate Appendix available upon request15 . 3 Realized Power Variation Based Inference This Section introduces the estimation technique used in the paper and states an asymptotic result for the resulting estimator. All results in this Section follow from Todorov (2006b), where the estimation of much more general stochastic volatility models is considered. The discussion here is in the context of GMM, as this is the estimation method used in the paper16 . The analysis is general in the sense that I do not specify the moment conditions in the GMM estimator. The theoretical result here provides justification for the estimation in Section 4 and Section 6. These two Sections contain the particular moment conditions and other details on the estimation. Estimation of stochastic volatility models has a long history in the empirical financial literature. 
One difficulty in the estimation comes from the fact that $\sigma(t)$ is stochastic and unobservable. The presence of price jumps additionally complicates the estimation problem. The availability of reliable high-frequency data over the last two decades provides a way of making some of the unobservable quantities practically observable. This can significantly simplify the estimation and, in addition, can provide large efficiency gains. In this paper I aggregate the high-frequency data into realized multipower variation statistics, which are proxies for certain latent processes associated with the stochastic volatility model (e.g. QV and IV). The inference is based on these realized multipower variation statistics.

$^{15}$ The Appendix with the proof of Theorem 1 can be downloaded from www.duke.edu/~vst2.

$^{16}$ Exactly the same analysis applies to M-type estimators.

I start with defining the realized multipower variation statistics that are used in this paper; see Barndorff-Nielsen, Graversen, Jacod, Podolskij, and Shephard (2005) for a general definition. The first statistic is the daily Realized Variance (hereafter abbreviated as RV). It is defined over a day $t$ as

$$RV_\delta(t) = \sum_{i=1}^{M} r_\delta^2\bigl(t + (i-1)\delta\bigr), \tag{18}$$

where $M = \lfloor 1/\delta \rfloor$. This statistic has been used extensively in finance (see Andersen, Bollerslev, and Diebold (2005a) and references therein). Its usefulness for the inference comes from the fact that for $\delta$ close to zero, i.e. with high-frequency observations, it is close to QV defined in equation (6)$^{17}$. The second realized multipower variation statistic used in this paper is the Realized Tripower Variation (hereafter abbreviated as TV). It is defined over a day $t$ as

$$TV_\delta(t) = \mu_{2/3}^{-3} \sum_{i=3}^{M} \bigl|r_\delta(t + (i-3)\delta)\bigr|^{2/3}\, \bigl|r_\delta(t + (i-2)\delta)\bigr|^{2/3}\, \bigl|r_\delta(t + (i-1)\delta)\bigr|^{2/3}, \tag{19}$$

where $\mu_a = E(|u|^a)$ and $u \sim N(0, 1)$. Its usefulness comes from the fact that for $\delta$ close to zero TV is close to IV, defined in equation (7)$^{18}$.
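The two statistics in (18) and (19) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample returns are hypothetical, and the closed form used for $\mu_a$ is the standard absolute-moment formula for a standard normal variable, $\mu_a = 2^{a/2}\,\Gamma((a+1)/2)/\sqrt{\pi}$, which the text does not spell out.

```python
import math


def mu(a):
    """E|u|^a for u ~ N(0, 1): 2^(a/2) * Gamma((a + 1) / 2) / sqrt(pi)."""
    return 2 ** (a / 2) * math.gamma((a + 1) / 2) / math.sqrt(math.pi)


def realized_variance(returns):
    """Equation (18): sum of squared intraday returns over one day."""
    return sum(r * r for r in returns)


def tripower_variation(returns):
    """Equation (19): scaled sum of products of three adjacent |r|^(2/3)."""
    p = [abs(r) ** (2 / 3) for r in returns]
    total = sum(p[i - 2] * p[i - 1] * p[i] for i in range(2, len(p)))
    return total / mu(2 / 3) ** 3


# Hypothetical day with M = 6 intraday returns, one of them large
# (illustrative numbers only, not taken from the paper's data).
day = [0.001, -0.002, 0.0015, 0.03, -0.001, 0.002]
rv = realized_variance(day)
tv = tripower_variation(day)
print("RV:", rv, "TV:", tv, "RV - TV:", rv - tv)
```

Because each large return enters a tripower product together with two neighbouring returns, a single large move inflates RV much more than TV, which is the intuition behind using TV as a proxy for IV.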
It should be mentioned that it is more common to use the Realized Bipower Variation for testing for jumps and estimation of IV (e.g. Barndorff-Nielsen and Shephard (2004, 2006))$^{19}$. However, the Monte Carlo analysis in Todorov (2006b) shows that for estimating moments of IV, TV performs better for values of $\delta$ comparable with those of the available high-frequency data (see also the Monte Carlo evidence in Barndorff-Nielsen et al. (2006)).

For high-frequency data $\delta$ is close to zero and consequently, as argued above, RV and TV are close to QV and IV respectively. Therefore, estimation that is based on RV and TV is approximately the same as inference that is based on the unobservable QV and IV (this statement is made precise in Theorem 2). At the same time, estimation based on QV and IV is easy to carry out since in this case we have observations for the latent integrated variance and the squared jumps over the days in the sample.

A problem with implementing the inference based on realized multipower variation statistics is that in many cases closed-form expressions for their moments are not available. However, given the fact that for $\delta$ close to zero RV and TV are close to QV and IV respectively, the moments of RV and TV can be approximated with the corresponding ones of QV and IV. Of course, this approximation introduces error in the estimation, the magnitude of which is controlled by $\delta$. In Theorem 2 I provide conditions under which the consistency and the efficiency of the estimation are not affected by the approximation error. I proceed with stating the assumptions needed for Theorem 2.

$^{17}$ This statement can be made more formal. As $\delta \to 0$ RV converges in probability to QV. Under certain integrability conditions, Todorov (2006b) shows that the convergence holds also in moments. Finally, a CLT result is also available; see Jacod (2006a,b).

$^{18}$ As for RV this statement can be made formal. For $\delta \to 0$ TV converges in probability to IV. Also, provided certain integrability conditions are satisfied, the convergence holds in moments. A CLT result also holds, provided the activity of the price jumps is restricted; see Barndorff-Nielsen et al. (2006).

$^{19}$ The Realized Bipower Variation over a day $t$ is formally defined as
$$BV_\delta(t) = \mu_1^{-2} \sum_{i=2}^{M} \bigl|r_\delta(t + (i-2)\delta)\bigr|\, \bigl|r_\delta(t + (i-1)\delta)\bigr|.$$

Assumption 1. $\theta$ is a vector of parameters of the stochastic volatility model (1)-(4), $\{z(t)\}$ is a data vector consisting of daily statistics associated with the price process $f(t)$, whose only “infeasible” elements are IV and QV (and possibly lags of them). The infeasible estimator is defined as

$$\hat{\theta}_{nf} = \operatorname*{argmin}_{\theta \in \Theta}\; m_T(\theta)'\, \hat{W}\, m_T(\theta), \tag{20}$$

where $m_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} m(z(t), \theta)$; $\hat{W} \overset{p}{\to} W$ and $W$ is a positive definite matrix. Assume that $\hat{\theta}_{nf}$ in (20) is consistent and asymptotically normal.

Assumption 2. $\hat{z}(t)$ is constructed from $z(t)$ by replacing QV with RV and IV with TV. The feasible estimator $\hat{\theta}_f$ is defined as

$$\hat{\theta}_{f} = \operatorname*{argmin}_{\theta \in \Theta}\; \hat{m}_T(\theta)'\, \hat{W}\, \hat{m}_T(\theta), \tag{21}$$

where $\hat{m}_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} m(\hat{z}(t), \theta)$.

Assumption 3. $\alpha < 4/5$, where $\alpha$ is the Blumenthal-Getoor index (Blumenthal and Getoor (1961)) of the price jumps defined as

$$\alpha = \inf\left\{\gamma \geq 0 : \int_{\mathbb{R}^n_0} 1_{|h(\mathbf{x})| \leq 1}\, |h(\mathbf{x})|^{\gamma}\, G(d\mathbf{x}) < \infty \right\}. \tag{22}$$

Assumption 4. $\alpha(t)$ is stationary and $E|\alpha(t)|^p < \infty$; $\kappa_i > 0$, $\theta_i > 0$ and $\sigma_{iv}^2 \leq 2\kappa_i\theta_i$ for $i = 1, ..., p$; $g(\cdot)$ is bounded around zero and $\int_0^\infty |g(s)|^p\, ds < \infty$ for every $p > 0$; $\int_{\mathbb{R}^n_0} 1_{|h(\mathbf{x})| > \epsilon}\, e^{\lambda |h(\mathbf{x})|}\, G(d\mathbf{x}) < \infty$ and $\int_{\mathbb{R}^n_0} 1_{k(\mathbf{x}) > \epsilon}\, e^{\lambda k(\mathbf{x})}\, G(d\mathbf{x}) < \infty$ for some $\epsilon > 0$ and $\lambda > 0$.

For the next assumptions we need the following notation. For an arbitrary matrix $A = [a_{ij}]$ I denote with $\|A\| = \sqrt{\sum_{ij} a_{ij}^2}$ its Euclidean norm.

Assumption 5. $\|m(z + y, \theta) - m(z, \theta)\| \leq \|C(\theta)\|\, \|P(z + y) - P(z)\|$ for every $z$ and $y$ and some matrix-valued functions $C(\cdot)$ and $P(\cdot)$ such that $P(z)$ has at most polynomial growth.

Assumption 6. $\|\nabla_\theta m(z + y, \theta) - \nabla_\theta m(z, \theta)\| \leq \|C(\theta)\|\, \|P(z + y) - P(z)\|$ for every $z$ and $y$ and some matrix-valued functions $C(\cdot)$ and $P(\cdot)$ such that $P(z)$ has at most polynomial growth.

Assumption 7. $\nabla_z m(z, \theta_0)$ exists, is continuous in $z$ and has at most polynomial growth in $z$.

Before stating the asymptotic result for $\hat{\theta}_f$, I make a few remarks regarding the assumptions.

Remark 1. $\hat{\theta}_{nf}$ is an infeasible estimator which is used as a benchmark for the feasible one, $\hat{\theta}_f$. For computing $\hat{\theta}_{nf}$ the econometrician has access to the unobservable QV and IV. Note that $\{z(t)\}$ can contain other variables besides QV and IV. However, if this is the case, then these variables are observable (e.g. daily returns). In other words, the focus here is the error in the estimation coming from the substitution of QV with RV and IV with TV.

Remark 2. Depending on what enters in the data vector $\{z(t)\}$, $\theta$ can include the whole parameter vector of the stochastic volatility model (1)-(4) or only part of it. For example, if $\{z(t)\}$ contains only QV and IV (and possibly lags of them) then we will not be able to estimate the parameters controlling the drift $\alpha(t)$. This is because the estimation in this case is based only on the continuous and discontinuous components of the quadratic variation QV, and the drift term does not participate in the latter.

Remark 3. The conditions for the consistency and asymptotic normality of $\hat{\theta}_{nf}$ are well known (see e.g. Newey and McFadden (1994) and Wooldridge (1994)) and therefore are omitted here.

Remark 4. In the case when TV participates in the (feasible) data vector, the quality of $\hat{z}(t)$ as a proxy for $z(t)$ depends on how good TV is in disentangling price jumps. This, in turn, depends crucially on the activity of the price jumps. Intuitively, if the price contains many small jumps, then it will be harder to separate these small jumps from the continuous price movements. The activity of the price jumps is indexed with the Blumenthal-Getoor index given in (22).
The index is in the interval [0, 2]. In intuitive terms, the index measures the smallest power for which the sum of the absolute jumps raised to it is still finite. As seen from the definition of the index, it concerns the behavior of the very small jumps. For finite activity jump processes this index is 0. Another example is the $\alpha$-(tempered) stable process, where the Blumenthal-Getoor index coincides with the $\alpha$ parameter of the process. In Figure 1 I plot simulated trajectories of jump processes with different values of the Blumenthal-Getoor index and a simulated trajectory of Brownian motion. As seen from the figure, for higher values of the Blumenthal-Getoor index the trajectories of the jump processes look very similar to the trajectory of the Brownian motion. Therefore, intuitively, the index measures how close the jumps are to a continuous process.

Remark 5. Assumption 3 is needed only when TV is included in the data vector $\hat{z}(t)$. Further, it could be weakened for the consistency result in Theorem 2. Assumption 3 coincides with the condition in Barndorff-Nielsen et al. (2006) under which the asymptotic distribution of TV (as $\delta \to 0$) is unaffected by the presence of price jumps. This is not a coincidence, of course. Under the condition $T\delta \to 0$, which is imposed in part (b) of Theorem 2, we need $\sup_\delta \delta^{-1} E\bigl(TV_\delta(t) - IV(t)\bigr)^2 < K$ for some constant $K$ so that the error $\hat{\theta}_f - \hat{\theta}_{nf}$ is $o_p(1/\sqrt{T})$. Note also that Assumption 3 rules out infinite variation price jumps. The Monte Carlo analysis in Todorov (2006b) shows that for practical applications this assumption is relevant.

Remark 6. Assumption 4 puts integrability conditions on the components of the stochastic volatility model. It is used for proving uniform integrability of powers of RV and TV. The conditions involving the continuous variance factors, $V^c(t)$, are well known (see e.g., Feller (1951)). Under these conditions $V^c(t)$ is strictly positive and stationary.
The conditions in Assumption 4 involving the jumps in the price and the variance guarantee that all moments of the price and variance jumps are finite. These conditions involve only the behavior of the big (price or variance) jumps (unlike the Blumenthal-Getoor index, which measures the behavior of the Lévy measure for the small jumps). The integrability condition for the powers of the function $g(\cdot)$ is necessary for the discontinuous variance factor $V^j(t)$ to have all its moments finite (see Rajput and Rosiński (1989)). This condition is automatically satisfied when $g(\cdot)$ is a sum of exponentials (with negative exponents), which is the case for the CARMA kernels with distinct negative autoregressive roots that are used in the empirical part.

Remark 7. Assumptions 5-7 are related to the moment conditions $m(\cdot, \cdot)$ used in the GMM estimation. They are satisfied if, for example, $m(\cdot, \cdot)$ is polynomial in $z$, which is the case for the estimators in Sections 4 and 6.

The asymptotic properties of the feasible estimator $\hat{\theta}_f$ are analyzed in the next Theorem.

Theorem 2 (Consistency and asymptotic normality of $\hat{\theta}_f$)
(a) Suppose Assumptions 1-5 hold. Then for $T \to \infty$ and $\delta \to 0$ we have

$$\hat{\theta}_f \overset{p}{\to} \theta_0. \tag{23}$$

(b) Suppose Assumptions 1-7 hold. Then if $T \to \infty$, $\delta \to 0$ and $T\delta \to 0$ we have

$$\sqrt{T}\left(\hat{\theta}_f - \theta_0\right) \overset{d}{\to} N\bigl(0, \operatorname{Avar}(\hat{\theta}_{nf})\bigr). \tag{24}$$

Note that for the consistency of $\hat{\theta}_f$ we do not need a condition on the relative speed at which $T \to \infty$ and $\delta \to 0$. For the asymptotic normality result, however, we need such a condition. This is so because in this case the error $\hat{\theta}_f - \hat{\theta}_{nf}$ has to satisfy a stronger condition, i.e. to be $o_p(1/\sqrt{T})$. Theorem 2 shows that for this it suffices to have $T\delta \to 0$, i.e. the number of intraday observations should increase faster than the number of days in the sample. The proof of Theorem 2 can be found in Todorov (2006b).
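The relation between the infeasible objective in (20) and the feasible one in (21) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration only: the moment function (matching the mean and second moment of a daily variance measure), the identity weighting matrix, and the toy data are hypothetical and are not the moment conditions or estimates used in the paper.

```python
def sample_moments(moment_fn, data, theta):
    """m_T(theta) = (1/T) * sum_t m(z(t), theta), returned as a list."""
    T = len(data)
    rows = [moment_fn(z, theta) for z in data]
    return [sum(col) / T for col in zip(*rows)]


def gmm_objective(moment_fn, data, theta, W):
    """Quadratic form m_T(theta)' W m_T(theta), as in (20) and (21)."""
    m = sample_moments(moment_fn, data, theta)
    n = len(m)
    return sum(m[i] * W[i][j] * m[j] for i in range(n) for j in range(n))


def moments(z, theta):
    """Hypothetical moment function: theta = (mean, second moment) of z."""
    return [z - theta[0], z * z - theta[1]]


identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed weighting matrix
iv_days = [1.0, 2.0, 3.0]            # stands in for the unobservable IV
tv_days = [1.1, 1.9, 3.2]            # stands in for its feasible proxy TV
theta = (2.0, 14.0 / 3.0)            # exact moments of the toy IV series

print(gmm_objective(moments, iv_days, theta, identity))  # infeasible objective
print(gmm_objective(moments, tv_days, theta, identity))  # feasible objective
```

The only difference between the two calls is the data vector, which mirrors how $\hat{z}(t)$ replaces $z(t)$ in Assumption 2; the gap between the two objective values is the kind of measurement error that Theorem 2 shows to be asymptotically negligible when $T\delta \to 0$.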
4 Selection of the Model under the Physical Measure

4.1 High-Frequency Data and Initial Data Analysis

I estimate different model specifications (all falling in the general stochastic volatility model (1)-(4)) using high-frequency data on the S&P 500 index futures contract. The data cover the period January 2, 1990, to November 29, 2002. There are 80 five-minute return observations in each day, covering the day trading session from 9:30am till 4:15pm. For each of the days in the sample I calculate RV and TV, using the high-frequency returns over that day and equations (18) and (19)$^{20}$. Figure 2 plots the returns over the day as well as TV and JV = RV - TV. The last variable is a measure of the sum of squared jumps over the day. As seen from the TV series, integrated variance has spikes, which is suggestive of the presence of jumps in the stochastic variance $\sigma^2(t)$ (the spot variance of the continuous price component). Another interesting observation from Figure 2 is that most of the days in which TV is high are days in which JV is high as well. This suggests that jumps in $\sigma^2(t)$ are linked with the price jumps.

I continue the initial data analysis by investigating the persistence in IV and the squared daily price jumps. The two panels of Figure 3 show the first 100 autocorrelations of TV and JV respectively. As seen from the Figure, IV and the sum of squared price jumps differ significantly in their persistence. On the one hand, IV is a very persistent process. On the other hand, the squared jumps over the day show almost no persistence. Therefore, even if there is time-variation in the price jumps, it should be such that it yields almost no persistence when looking at the squared jumps over the days.

4.2 Model Selection

In this Subsection I estimate several specifications of the stochastic volatility (hereafter abbreviated as SV) model (1)-(4) and select one of them to be used in the subsequent analysis of the variance risk premium.
Below I specify each of the candidate models. In each of them I keep the price jumps since we saw in the previous Subsection that the data suggest a nontrivial price jump contribution. To save space I do not report estimation results for SV models without a price jump component since they are overwhelmingly rejected. I classify the models according to the driving factor of the stochastic variance $\sigma^2(t)$, that is, whether the stochastic variance is determined by diffusion processes only, by jumps only, or by both types of processes. For convenience and to avoid unnecessary repetition I state in each of the cases only the stochastic variance specification (the equation for the evolution of the price is always the same and is given by (1)).

$^{20}$ On the days on which $TV_\delta(t) > RV_\delta(t)$, I replace the value of $TV_\delta(t)$ with that of $RV_\delta(t)$, i.e. I compute $TV_\delta(t) \wedge RV_\delta(t)$. This is a finite sample correction, since $QV(t) \geq IV(t)$ always, and guarantees that the estimate for the squared price jumps is always nonnegative.

4.2.1 Model Specifications

Diffusive SV Model

In this model the stochastic volatility has only a continuous component (i.e. $V^j(t) = 0$). This model is nested in the affine jump-diffusion models of Duffie, Pan, and Singleton (2000). Here I look at up to two factors, i.e., the stochastic variance specification is

$$V(t) = V^c(t) = V_1^c(t) + V_2^c(t), \tag{25}$$

$$dV_i^c(t) = \kappa_i\bigl(\theta_i - V_i^c(t)\bigr)dt + \sigma_{iv}\sqrt{V_i^c(t)}\, dB_i(t), \qquad i = 1, 2. \tag{26}$$

To avoid identification problems in the estimation I make the following re-parametrization. I set $\theta = \theta_1 + \theta_2$ and $\sigma_i = \sigma_{iv}\sqrt{\frac{\theta_i}{2\kappa_i}}$ for $i = 1, 2$. $\sigma_i^2$ is the variance of $V_i^c(t)$. Therefore, the stochastic variance parameters I estimate are $\theta$ and $\kappa_i$, $\sigma_i$ for $i = 1, 2$. The nonnegativity and stationarity restrictions in Assumption 4 imply

$$\theta > 0, \qquad \kappa_1 > 0, \qquad \kappa_2 > 0, \qquad \sigma_1 + \sigma_2 < \theta.$$

I estimate both a one-factor and a two-factor model.
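The re-parametrization above can be checked numerically with a short Python sketch. The function names and the parameter values are hypothetical and chosen only for illustration; the one substantive observation is that $\sigma_i^2 = \sigma_{iv}^2\theta_i/(2\kappa_i) \leq \theta_i^2$ holds exactly when $\sigma_{iv}^2 \leq 2\kappa_i\theta_i$, so the condition from Assumption 4 on each factor bounds $\sigma_i$ by $\theta_i$.

```python
import math


def stationary_std(theta_i, kappa_i, sigma_iv):
    """sigma_i = sigma_iv * sqrt(theta_i / (2 * kappa_i)): the stationary
    standard deviation of a square-root factor with mean theta_i."""
    return sigma_iv * math.sqrt(theta_i / (2.0 * kappa_i))


def satisfies_assumption4(theta_i, kappa_i, sigma_iv):
    """kappa_i > 0, theta_i > 0 and sigma_iv^2 <= 2 * kappa_i * theta_i."""
    return kappa_i > 0 and theta_i > 0 and sigma_iv ** 2 <= 2.0 * kappa_i * theta_i


# Hypothetical (theta_i, kappa_i, sigma_iv) for a slow and a fast factor.
factors = [(0.6, 0.05, 0.2), (0.4, 1.5, 1.0)]

theta = sum(f[0] for f in factors)               # theta = theta_1 + theta_2
sigmas = [stationary_std(*f) for f in factors]   # sigma_1, sigma_2

for (theta_i, kappa_i, sigma_iv), sigma_i in zip(factors, sigmas):
    # The two conditions are equivalent factor by factor.
    print(satisfies_assumption4(theta_i, kappa_i, sigma_iv), sigma_i <= theta_i)

print("sigma_1 + sigma_2 =", sum(sigmas), "theta =", theta)
```

Working with $(\theta, \kappa_i, \sigma_i)$ rather than $(\theta_i, \kappa_i, \sigma_{iv})$ means the restriction is imposed directly on quantities that the matched moments (mean and variance of IV) can pin down.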
Jump-Driven SV Model

In this model the stochastic variance is driven only by positive jumps, i.e. $V^c(t) = 0$. This model falls into the class of the jump-driven stochastic volatility models of Todorov (2006a). Here I use the following specification for the moving average function $g(\cdot)$ in equation (4):

$$V(t) = V^j(t) = \int_{-\infty}^{t}\int_{\mathbb{R}^n_0} g(t - s)\,k(\mathbf{x})\,\mu(ds, d\mathbf{x}), \tag{27}$$

$$g(u) = \frac{b_0 + \rho_1}{\rho_1 - \rho_2}\, e^{\rho_1 u} + \frac{b_0 + \rho_2}{\rho_2 - \rho_1}\, e^{\rho_2 u}, \qquad u \geq 0. \tag{28}$$

The expression in (28) is that of a (normalized) CARMA(2,1) kernel (see Brockwell (2001b) for details). The reason I work with it here is that it induces the same type of autocorrelation function for the stochastic variance $\sigma^2(t)$ as that implied by a two-factor affine jump-diffusion model. Thus, the models in the different classes here are given a fair comparison. To ensure nonnegativity of the kernel as well as to guarantee (weak) stationarity of the stochastic variance I impose the following parameter restrictions (see Todorov and Tauchen (2006)):

$$b_0 \geq -\max\{\rho_1, \rho_2\} > 0.$$

I look also at a CARMA(1,0) kernel, which is the analogue of the one-factor Diffusive SV model. The CARMA(1,0) kernel is a restriction of the CARMA(2,1) kernel. The restriction is $b_0 = -\min\{\rho_1, \rho_2\}$.

Jump-Diffusive SV Model

This model has both a diffusive and a jump component in the variance:

$$V(t) = V^c(t) + V^j(t), \qquad V^c(t) = V_1^c(t), \tag{29}$$

$$dV_1^c(t) = \kappa_1\bigl(\theta_1 - V_1^c(t)\bigr)dt + \sigma_{1v}\sqrt{V_1^c(t)}\, dB_1(t), \tag{30}$$

$$V^j(t) = \int_{-\infty}^{t}\int_{\mathbb{R}^n_0} g(t - s)\,k(\mathbf{x})\,\mu(ds, d\mathbf{x}), \tag{31}$$

$$g(u) = e^{\rho_1 u}, \qquad u \geq 0. \tag{32}$$

This variance specification generates the same autocorrelation structure for $\sigma^2(t)$ as that implied by the CARMA(2,1) jump-driven SV model and the two-factor affine jump-diffusion model. To avoid identification problems in the estimation, the model is re-parameterized as follows. I set $\theta = \theta_1 - \frac{1}{\rho_1}\int_{\mathbb{R}^2_0} k(\mathbf{x})\,G(d\mathbf{x})$ and $\sigma_1 = \sigma_{1v}\sqrt{\frac{\theta_1}{2\kappa_1}}$. $\theta$ is the mean of the stochastic variance and $\sigma_1^2$ is the variance of the diffusive variance component $V^c(t)$.
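A small numerical sketch of the kernel in (28) may help to see the normalization and the CARMA(1,0) restriction. The function name and the parameter values below are hypothetical, chosen only so that the restriction $b_0 \geq -\max\{\rho_1, \rho_2\} > 0$ holds; they are not estimates from the paper.

```python
import math


def carma21_kernel(u, b0, rho1, rho2):
    """Normalized CARMA(2,1) kernel from equation (28), defined for u >= 0."""
    if u < 0:
        raise ValueError("the kernel is defined for u >= 0 only")
    c1 = (b0 + rho1) / (rho1 - rho2)
    c2 = (b0 + rho2) / (rho2 - rho1)
    return c1 * math.exp(rho1 * u) + c2 * math.exp(rho2 * u)


# Hypothetical autoregressive roots (both negative) and moving-average parameter.
rho1, rho2, b0 = -0.05, -1.5, 0.5

# Normalization: the two weights sum to one, so g(0) = 1.
print(carma21_kernel(0.0, b0, rho1, rho2))

# CARMA(1,0) restriction b0 = -min(rho1, rho2): the weight on the faster
# decaying exponential vanishes and the kernel reduces to exp(max(rho) * u).
b0_restricted = -min(rho1, rho2)
for u in (0.0, 1.0, 10.0):
    print(u, carma21_kernel(u, b0_restricted, rho1, rho2), math.exp(max(rho1, rho2) * u))
```

Under the restriction the kernel has the same single-exponential form as (32), which is why the CARMA(1,0) case serves as the one-factor counterpart.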
In other words, as for the Diffusive SV model, I do not estimate separately the means of the different variance components (note that in the Jump-Driven SV model this problem is automatically avoided since in it we have a single factor). The nonnegativity and stationarity of $\sigma^2(t)$ imply the following conditions on the parameters, which are imposed in the estimation:

$$\kappa_1 > 0, \qquad \theta > 0, \qquad \rho_1 < 0, \qquad \sigma_1 < \theta.$$

4.2.2 Details on the estimation

Turning to the estimation of the different model specifications, I use a GMM-type estimator and apply the general result in Theorem 2. In all estimated models I do not specify the Lévy processes in the price and the variance. Instead, I treat as parameters only the cumulants which are needed for calculating the moment conditions in the GMM. In particular, I estimate

$$\int_{\mathbb{R}^n_0} h^2(\mathbf{x})\,G(d\mathbf{x}), \quad \int_{\mathbb{R}^n_0} h^4(\mathbf{x})\,G(d\mathbf{x}), \quad \int_{\mathbb{R}^n_0} k(\mathbf{x})\,G(d\mathbf{x}), \quad \int_{\mathbb{R}^n_0} k^2(\mathbf{x})\,G(d\mathbf{x}) \quad \text{and} \quad \int_{\mathbb{R}^n_0} h^2(\mathbf{x})\,k(\mathbf{x})\,G(d\mathbf{x}).$$

As mentioned above, in the case of the Jump-Diffusive SV model I do not estimate $\int_{\mathbb{R}^n_0} k(\mathbf{x})\,G(d\mathbf{x})$ separately to avoid identification problems. Also, in the case of the Diffusive SV model, since the variance does not contain jumps, the last three quantities above are obviously not estimated. Further, in the estimation of the models I impose the following constraint on the cumulants:

$$0 \leq \int_{\mathbb{R}^n_0} h^2(\mathbf{x})\,k(\mathbf{x})\,G(d\mathbf{x}) \leq \sqrt{\int_{\mathbb{R}^n_0} h^4(\mathbf{x})\,G(d\mathbf{x}) \int_{\mathbb{R}^n_0} k^2(\mathbf{x})\,G(d\mathbf{x})}.$$

This constraint guarantees that there exists a two-dimensional Lévy process (for the jumps in the price and the variance) with cumulants equal to the estimated ones.

Turning to the moment conditions in the GMM, for the estimation of all the models specified above, I match the following statistics:

1. Mean, Variance and Autocorrelation of IV
2. Mean and Variance of QV
3. Mean of Realized Fourth Variation (hereafter FV), which is defined below

For the autocorrelation of IV I use lags 1, 3 and 6 as well as the average autocorrelation for lags 11-20, 21-30 and 31-40.
The averaging of the higher order autocorrelations is done since these autocorrelations are estimated with less precision. Altogether I end up with 11 moment conditions. I make the following additional observations regarding the estimation.

• I use Theorem 2 and substitute in the estimation the unobservable IV with TV. In addition, I make use of the following CLT result. Under the assumptions of Theorem 2, as shown in Barndorff-Nielsen et al. (2005) and Barndorff-Nielsen et al. (2006), we have

$$\delta^{-1/2}\left(TV_\delta(t) - \int_t^{t+1}\sigma^2(u)\,du\right) \overset{\text{law}}{\longrightarrow} \frac{\sqrt{A}}{\mu_{2/3}^{3}} \int_t^{t+1}\sigma^2(u)\,dW(u), \tag{33}$$

where $W$ is a Wiener process defined on an extension of the probability space and is independent of the futures log-price process $f$. The constant $A$ is given by

$$A = \mu_{4/3}^{3} - 5\mu_{2/3}^{6} + 2\mu_{2/3}^{2}\mu_{4/3}^{2} + 2\mu_{2/3}^{4}\mu_{4/3},$$

where $\mu_a$ is defined after equation (19). Based on this result we have the following approximation:

$$TV_\delta(t) \approx IV(t) + \sqrt{\frac{K}{M}\int_t^{t+1}\sigma^4(u)\,du}\;\epsilon_t, \tag{34}$$

where $K = A/\mu_{2/3}^{6} \approx 3.0613$ and $(\epsilon_t)$ is an i.i.d. sequence with $\epsilon_t \sim N(0, 1)$. A similar approximation for RV, for the case of no price jumps, is used in Andersen, Bollerslev, and Meddahi (2005) for the purposes of constructing volatility forecasts. Note that $\epsilon_t$ is independent of $IV(t)$ and is i.i.d. (with mean 0). This means that, using the asymptotic refinement in equation (34), the mean and autocovariance of TV are approximated by the mean and the autocovariance of IV. For the approximation of the variance of TV, in addition to the variance of IV, we have a term of order $O(\delta)$ reflecting the effect of $\epsilon_t$:

$$\operatorname{Var}(TV_\delta) \approx \operatorname{Var}(IV(t)) + \frac{K}{M}\, E\left(\int_t^{t+1}\sigma^4(u)\,du\right). \tag{35}$$

I use the approximation in equation (35) in the estimation. That is, the sample variance of TV is matched to the expression in (35). Under the conditions of Theorem 2, there is no asymptotic effect from adding the term $\frac{K}{M}E\left(\int_t^{t+1}\sigma^4(u)\,du\right)$ to the variance of IV, i.e. Theorem 2 continues to hold. This approximation can be viewed as a small sample correction (i.e. for a finite number of intraday observations)$^{21}$ (see Todorov (2006b)). Finally, for the moments of IV I use Theorem 1 in Todorov (2006a).

• In the estimation the unobservable QV is replaced by RV, using Theorem 2. Similar to IV, I make a small sample correction to the variance of QV. I match the sample variance of RV to an approximation of the variance of RV, which is derived using Theorem 1$^{22}$. This approximation can be written as a sum of the variance of QV and an additional $O(\delta)$ term. Under the conditions of Theorem 2 the correction has no asymptotic effect.

• FV is a particular case of realized power variation. It is defined for a day $t$ as

$$FV_\delta(t) = \sum_{i=1}^{M} r_\delta^4\bigl(t + (i-1)\delta\bigr). \tag{36}$$

It can be shown that for every $t$, $FV_\delta(t) \overset{p}{\to} \int_t^{t+1}\int_{\mathbb{R}^n_0} h^4(\mathbf{x})\,\mu(ds, d\mathbf{x})$ as $\delta \to 0$ (see Woerner (2006) and Jacod (2006a)). Thus, FV is a measure of the sum of the price jumps raised to the power four over the day. I use FV here to completely identify the second order moments of the jumps in the price and the variance. To calculate the mean of FV I use equation (16) in Theorem 1. This is a high order approximation. The error of the approximation is of magnitude $O(\delta^{3/2})$ and comes from the “leverage effect” associated with the link between the diffusive variance factor and the diffusive price innovation. This error will be zero for the Jump-Driven SV model and the Jump-Diffusive SV model, provided that in the latter model the continuous innovations in the price and the variance are independent. I neglect this error in the estimation$^{23}$.

• For calculating the optimal weighting matrix for the GMM-type estimator I use a Parzen kernel with a lag-length of 80.

$^{21}$ In the estimation it does not have a very significant effect. For example, for the parameter estimates of the Jump-Diffusive model $\operatorname{Var}(IV(t)) \approx 1.14$ and $\frac{K}{M}E\left(\int_t^{t+1}\sigma^4(u)\,du\right) \approx 0.08$.
$^{22}$ The error of approximating the true variance of RV is of magnitude $O(\delta^{1/2})$ and is induced by the “leverage effect” coming from the link between the diffusive component of the stochastic variance and the diffusive price innovations. In the case of the Jump-Driven SV models this error is zero.

$^{23}$ Note that the effect of this error is not covered by Theorem 2. However, numerical experiments suggest that for practical purposes the error is negligible.

• The estimation is performed using the MCMC approach of Chernozhukov and Hong (2003) of treating the Laplace transform of the objective function as an unnormalized likelihood function and applying MCMC to the pseudo posterior. The point estimates are the resulting mode of the pseudo posterior.

4.2.3 Estimation results

The estimation results are reported in Tables 1-3. Below I summarize the key findings from the estimation.

• One-factor type stochastic volatility models cannot match the autocorrelation in IV (respectively TV). This claim holds true regardless of whether the stochastic variance $\sigma^2(t)$ is specified as a sum of diffusions or as purely jump-driven. Both one-factor type models estimated here produce a very bad fit, as can be seen from their corresponding J-statistics$^{24}$ reported in the first columns of Tables 1 and 2 respectively. On the other hand, inclusion of an additional variance factor significantly improves the fit. A two-factor type SV model can match the autocorrelation in IV (respectively TV). Figure 4 plots the fit to the autocorrelation of TV implied by the parameter estimates for the Jump-Diffusive SV Model (29)-(32) (an almost identical autocorrelation structure is implied by the model estimates for the CARMA(2,1) jump-driven SV model). As seen from the Figure, the autocorrelation of TV is well matched for lags up to forty (in the estimation I matched the autocorrelation of TV up to lag forty).
After lag forty the model-implied autocorrelation slightly underestimates the empirically observed one. However, it is still well within the 95% confidence interval.

• The Diffusive SV Models estimated here produce a very bad fit to the data, as seen from the results in Table 1. The reason for this is that the square-root processes cannot generate enough variance in IV to match the empirically observed one. On the other hand, models which contain jumps in the stochastic variance $\sigma^2(t)$ can naturally generate enough variance in IV. These models with jumps in $\sigma^2(t)$ provide a good fit to the data, as seen from the estimation results in Tables 2 and 3. This observation is in line with the findings in Eraker, Johannes, and Polson (2003), where lower frequency stock market data is used, and in Broadie, Chernov, and Johannes (2006), where options data is used.

• The estimation results for the models containing jumps in the variance (given in Tables 2 and 3) show that there is a relationship between the jumps in the variance and the jumps in the price. That is, $\int_{\mathbb{R}^n_0} h^2(\mathbf{x})\,k(\mathbf{x})\,G(d\mathbf{x})$ is statistically different from zero$^{25}$. This finding rejects independence between the jumps in the price and the variance. It is also in line with the observation made at the beginning of the current Section regarding the positive link between the JV and TV series. It should be noted that perfect linear dependence between the price jumps and the variance jumps, or perfect linear dependence between the squared price jumps and the variance jumps, can be shown to be rejected; see the analysis in Todorov (2006b). Another dependence structure popular in the literature, where the jumps in the price and the variance are compound Poisson, always arrive together and have independent jump sizes that are normally and exponentially distributed respectively, can also be rejected.
As already discussed in Section 2, here I do not model parametrically the link between the jumps in the variance and the jumps in the price. Therefore, the results in the paper are not driven by a (possibly misspecified) parametric model for the jumps.

The estimated two-factor type models are nonnested. To compare them formally we can use a model selection criterion (MSC) as proposed in Andrews (1999), Andrews and Lu (2001) and Hong et al. (2003) among others. The MSC can be written as

$$MSC = J - s(\#\text{moments} - \#\text{parameters}) \times k_T,$$

where $J$ is the test statistic for the overidentifying restrictions of the model, $s(\cdot)$ is an increasing function and $k_T$ is a sequence satisfying $k_T \to \infty$ and $k_T = o(T)$. This model selection criterion is consistent, i.e. asymptotically it selects the most parsimonious model among those with the best fit to the moment conditions^26. In the case here, all models are estimated with the same number of moment conditions. The Jump-Driven and the Jump-Diffusive SV models have the same number of parameters, which exceeds the number of parameters of the Diffusive SV model by one. Therefore, the difference in MSC between the Jump-Driven and Jump-Diffusive SV models comes only from the difference in their J-statistics. When comparing these two models with the Diffusive SV model, a small correction to the difference of the corresponding J-statistics should be made to account for the fact that the Diffusive SV model is more parsimonious. However, given the high value of the J-statistic for the Diffusive SV model, this correction for parsimony cannot change our conclusions about the inferior performance of this model.

Footnote 24: The J-statistic is the GMM test for overidentifying restrictions.

Footnote 25: Note that under the null hypothesis the parameter $\int_{\mathbb{R}^n_0} h^2(x)k(x)G(dx)$ is on the boundary of the parameter space. In this case the asymptotic distribution (under the null) is truncated normal (see Andrews (2002)); as a result the 5% significance critical value is 1.65.
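The comparison logic behind the MSC can be sketched in a few lines. This is only an illustration: the choices $s(x) = x$ and $k_T = \ln T$ (a BIC-type penalty) are assumptions made here, and the J-statistics, moment count, parameter counts and sample size are hypothetical placeholders, not the values from Tables 1-3.

```python
import math

def msc(j_stat, n_moments, n_params, n_obs, s=lambda d: d, k=math.log):
    """MSC = J - s(#moments - #parameters) * k_T, with k_T = k(n_obs).

    s(x) = x and k_T = ln(T) are illustrative assumptions (BIC-type penalty).
    Lower values are preferred.
    """
    return j_stat - s(n_moments - n_params) * k(n_obs)

# Hypothetical inputs, chosen only to mimic the qualitative situation in the text:
# same moments for all models, one model with one parameter fewer but a large J.
N_OBS = 3000
N_MOMENTS = 15
candidates = {
    "diffusive": {"j": 60.0, "params": 7},
    "jump_driven": {"j": 12.0, "params": 8},
    "jump_diffusive": {"j": 11.5, "params": 8},
}

scores = {
    name: msc(m["j"], N_MOMENTS, m["params"], N_OBS)
    for name, m in candidates.items()
}
best = min(scores, key=scores.get)

for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} MSC = {value:8.3f}")
print("selected:", best)
```

With equal numbers of moments and parameters the penalty terms cancel, so the two jump models are ranked by their J-statistics alone, while the more parsimonious model receives a bonus of one extra unit of $k_T$, which is small relative to a large gap in J.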
Thus, the two best performing models are the CARMA(2,1) jump-driven SV model (parameter estimates reported in the second column of Table 2) and the Jump-Diffusive SV Model (parameter estimates reported in Table 3). The two models produce an almost identical fit to the moments used in the estimation. In the subsequent analysis I decide to work with the Jump-Diffusive SV Model for the following reason. As already discussed in Section 2, the “leverage effect” in the models analyzed here can be captured by dependence in the diffusive and/or jump innovations in the price and the variance. The CARMA(2,1) jump-driven SV model can put too much “burden” on the jump specification, since it is only through the link between the jumps in the price and the variance that this effect can be generated in this model. In contrast, the Jump-Diffusive SV Model is more flexible in that regard as it can allow for “leverage” coming from dependence between the diffusive innovations in the price and the variance. Thus, for the subsequent analysis of the variance risk premium, I work with the Jump-Diffusive SV Model. Following Tauchen (1985), in Table 4 I report the t-statistics associated with each of the moments used in the estimation. The results in Table 4 suggest that the Jump-Diffusive Volatility Model has no problem with fitting any of the moments used in the estimation27 . I finish the present Section with a short comment on the parameter estimates of the Jump-Diffusive Model. In line with many other studies in the literature (Andersen, Benzoni, and Lund (2002), Alizadeh, Brandt, and Diebold (2002), Chernov, Gallant, Ghysels, and Tauchen (2003)) here I find one of the variance factors to be slowly mean reverting, having a half-life of approximately twenty (business) days, while the other one to be quickly mean reverting with a half-life of approximately half a day. 
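The half-lives quoted above translate directly into mean-reversion speeds. The sketch below assumes that a shock to an AR(1)-type (OU-type) factor decays exponentially, so that half-life $= \ln 2 / \text{speed}$; the function names are illustrative and the inputs are the approximate half-lives stated in the text, not the parameter estimates themselves.

```python
import math

def mean_reversion_speed(half_life_days):
    """Speed (per trading day) of an exponentially decaying factor with the given half-life."""
    return math.log(2.0) / half_life_days

def shock_remaining(speed, days):
    """Fraction of an initial shock that is still present after `days` trading days."""
    return math.exp(-speed * days)

kappa_slow = mean_reversion_speed(20.0)  # slowly mean-reverting factor
kappa_fast = mean_reversion_speed(0.5)   # quickly mean-reverting factor

print(f"slow factor speed: {kappa_slow:.4f} per day")
print(f"fast factor speed: {kappa_fast:.4f} per day")
print(f"slow shock left after 22 days: {shock_remaining(kappa_slow, 22.0):.3f}")
print(f"fast shock left after 22 days: {shock_remaining(kappa_fast, 22.0):.3e}")
```

Over a one-month horizon a shock to the fast factor has essentially died out, while a substantial part of a shock to the slow factor remains.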
Perhaps not surprisingly the quickly mean-reverting factor is the jump component of the variance, while the slowly mean-reverting variance factor is the continuous component of the variance.

Footnote 26: If none of the compared models can fit asymptotically the moment conditions, i.e. for none of the compared models $m_0(\theta_0) = 0$ (where $m_0(\theta) = E(m(z(t), \theta))$), then for the consistency of the MSC we need also $k_T/\sqrt{T} \to \infty$.

Footnote 27: However, these t-statistics should be interpreted carefully. In particular, if a moment condition fails, then in general this affects the consistency of the whole parameter vector. This, in turn, leads to inconsistency even of correct moments. The diagnostic tests are used here just as one more device for checking whether the model has difficulty in matching the moments used in the estimation.

5 Initial Analysis of the Variance Risk Premium

In this Section I start the analysis of the variance risk premium, using the selected model for the futures price. The variance risk premium is formally defined as the wedge between the (conditional) expectation of the future quadratic variation under the risk-neutral and the physical measure. Thus, the daily-standardized risk premium for the time-variation in the quadratic variation over the next $a$ days is

$$VR_a(t) = \frac{1}{a} E^Q\left([f,f]_{(t,t+a]} \,\middle|\, \mathcal{F}_t\right) - \frac{1}{a} E^P\left([f,f]_{(t,t+a]} \,\middle|\, \mathcal{F}_t\right), \tag{37}$$

where $E^Q(\cdot)$ denotes expectation under the risk-neutral measure, known also as the equivalent martingale measure. In case a superscript is not put on the expectation operator, the expectation is always assumed to be under the physical measure. The variance risk premium reflects the compensation demanded by investors for two features of the price process. The first is the time-variation in the variance of the continuous price component $\sigma^2(t)$. The second feature, compensation for which is reflected in the variance risk premium, is the presence of price jumps.
In this Section I construct a measure for the variance risk premium, using VIX index data and the selected Jump-Diffusive SV model, and analyze the dynamics of the variance risk premium. The empirical findings in this Section are used as important guidance for constructing prices of diffusive and jump risk, which is done in the next Section.

5.1 Variance Risk Premium Measure and Its Properties

I start with constructing a measure for the variance risk premium. In addition to the high-frequency futures data I also use data on the variance swap rate. The variance swap is a forward contract on the future quadratic variation^28. At expiration it pays the difference between the quadratic variation over the horizon of the contract and the fixed variance swap rate (for further details see e.g. Demeterfi, Derman, Kamal, and Zou (1999)). The variance swap contract involves no initial payment. Therefore, the variance swap rate is equal to the expected value under the risk-neutral measure of the future quadratic variation over the contract’s horizon. Thus, for a contract with a length of $a$ days (recall that our unit of time is one trading day) the daily-standardized variance swap rate is

$$SW_a(t) = \frac{1}{a} E^Q\left([f,f]_{(t,t+a]} \,\middle|\, \mathcal{F}_t\right) = \frac{1}{a} E^Q\left(\int_t^{t+a} \sigma^2(s)\,ds + \int_t^{t+a}\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx) \,\middle|\, \mathcal{F}_t\right). \tag{38}$$

For now assume that we have daily data on variance swap rates with a contract horizon of $a$ days. Later in this Section I provide details on the variance swap data. With the realized power variation statistics TV and RV, calculated from the high-frequency data, we have a very good approximation for the integrated variance and the quadratic variation respectively. On the other hand, the variance swap rate gives the conditional expected value of the quadratic variation under the risk-neutral measure. Therefore, studying the joint behavior of the variance swap rate and TV and RV can allow us to construct a good proxy for the variance risk premium.
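The payoff of the variance swap described above can be illustrated with a minimal sketch. It assumes, as is done in practice, that realized variance (the sum of squared intraday returns) stands in for the unobservable quadratic variation; the function names, the notional and the numbers are hypothetical.

```python
def realized_variance(intraday_returns):
    """Daily realized variance: sum of squared high-frequency returns."""
    return sum(r * r for r in intraday_returns)

def variance_swap_payoff(daily_rv, swap_rate_daily, notional=1.0):
    """Payoff at expiration to the long side, in daily-standardized units.

    daily_rv        : realized variances, one per trading day of the contract
    swap_rate_daily : fixed daily-standardized swap rate agreed at initiation
    """
    horizon = len(daily_rv)
    realized_daily = sum(daily_rv) / horizon
    return notional * (realized_daily - swap_rate_daily)

# Hypothetical three-day contract with returns in percent.
returns_by_day = [
    [0.10, -0.20, 0.05],
    [0.30, -0.10, 0.00],
    [-0.15, 0.25, 0.10],
]
daily_rv = [realized_variance(day) for day in returns_by_day]
payoff = variance_swap_payoff(daily_rv, swap_rate_daily=0.10)
print("daily RV:", [round(v, 4) for v in daily_rv])
print("payoff to the long side:", round(payoff, 4))
```

A negative average payoff to the long side over many contracts is what a positive variance risk premium looks like ex post: the buyer pays more on average than the variance that is subsequently realized.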
To construct this proxy I use the Jump-Diffusive SV model (29)-(32) (recall from Section 4 that this is our final model choice for the dynamics under the physical measure). The first term in the variance risk premium formula (37) is the theoretical value of the variance swap rate and we have data on it. Thus, in order to construct a proxy for the variance risk premium, we need to construct an estimate for the conditional expectation of the future quadratic variation under the physical measure. For the discontinuous component of the quadratic variation this is easy. Its conditional expectation is equal to the unconditional one since the price jumps are a Lévy process:

$$E^P\left(\int_t^{t+a}\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx) \,\middle|\, \mathcal{F}_t\right) = a\int_{\mathbb{R}^n_0} h^2(x)\,G(dx). \tag{39}$$

Turning to the continuous part of the quadratic variation, its conditional expectation is different from the unconditional one since $\sigma^2(t)$ is time-varying. The conditional expectation of IV is a linear function of its two variance factors (since both of them are AR(1)-type processes). However, this conditional expectation is not available to the econometrician. At the same time, the Jump-Diffusive SV model (and in fact all SV models estimated in Section 4) implies an ARMA(2,2) process for the daily IV with coefficients determined from the structural parameters of the model. This fact can be used to calculate the linear projection of the integrated variance on the past values of TV and JV (which in turn are used as proxies for IV and QV-IV respectively). The details are provided in Appendix A. I denote this linear projection as

$$P(IV_a(t) \mid \mathcal{G}_t) = \beta + \beta_1 TV_\delta(t-1) + \gamma_1 JV_\delta(t-1) + \dots + \beta_t TV_\delta(0) + \gamma_t JV_\delta(0), \tag{40}$$

where $\mathcal{G}_t = \sigma(TV_\delta(t-1), \dots, TV_\delta(1), JV_\delta(t-1), \dots, JV_\delta(1))$, i.e. $\mathcal{G}_t$ is the information created from the past realizations of TV and JV, and we have $\mathcal{G}_t \subset \mathcal{F}_t$. $\beta, \beta_1, \gamma_1, \dots$ are the linear projection coefficients.

Footnote 28: In practice the unobservable quadratic variation is substituted with the realized variance.
Thus, a feasible measure for the premium at date $t$ for the variance risk over the next $a$ days is

$$RP_a(t) = SW_a(t) - \int_{\mathbb{R}^n_0} h^2(x)\,G(dx) - \frac{1}{a} P(IV_a(t) \mid \mathcal{G}_t). \tag{41}$$

$RP_a(t)$ is a proxy for $VR_a(t)$. The approximation comes from substituting the conditional expectation of the future integrated variance with its linear projection on the past values of TV and JV. This introduces error for two reasons. First, the information set of the investor might be much larger. Second, $P(IV_a(t) \mid \mathcal{G}_t)$ is just a linear projection and since the distribution of IV is non-Gaussian (as suggested by the empirical evidence in Section 4), the linear projection does not coincide with the conditional expectation. Note that I work in a semiparametric setting since the jumps are not modelled parametrically. This means that, in general, it will not be possible in this setting to determine the coefficients from projecting also on nonlinear functions of TV and JV.

How useful is the proposed measure for the variance risk premium? Our interest is in answering the question whether there is time-variation in the variance risk premium and if so what determines it. It is easy to show that, if the asymptotic approximation for TV and JV given in Appendix A holds exactly, we have the following

$$\mathrm{Cov}(RP_a(t), TV_\delta(t-j)) = \mathrm{Cov}(VR_a(t), IV(t-j)), \tag{42}$$

$$\mathrm{Cov}(RP_a(t), JV_\delta(t-j)) = \mathrm{Cov}(VR_a(t), QV(t-j) - IV(t-j)). \tag{43}$$

Below I illustrate how we can make use of these covariances to identify the dynamics of the variance risk premium. Let’s look at the case of a constant variance risk premium first. In this case the conditional expectation of the future quadratic variation under the risk-neutral measure differs from the one under the physical measure only by a constant. That is, in the case of a constant variance risk premium, the variance swap rate is

$$SW^c_a(t) := \frac{1}{a} E\left(IV_a(t) + \int_t^{t+a}\int_{\mathbb{R}^n_0} h^2(x)\,G(dx)\,ds \,\middle|\, \mathcal{F}_t\right) + K, \tag{44}$$

where $K$ is some constant.
Therefore, using (30), (31) and (32) it is easy to derive

$$SW^c_a(t) = K_0 + \frac{1 - e^{-\kappa_1 a}}{a\kappa_1} V^c(t) + \frac{e^{\rho_1 a} - 1}{a\rho_1} V^j(t), \tag{45}$$

for some constant $K_0$. That is, in the case of a constant variance risk premium, the variance swap rate is a linear combination of the variance factors. Note that the coefficients in front of the variance factors in $SW^c_a(t)$ are not free and are determined by the persistence in these factors under the physical measure. A natural generalization is to consider a variance risk premium specification under which the variance swap rate is a linear combination of the variance factors, but the coefficients in front of them are not restricted. This corresponds to a variance risk premium which is linear in the variance factors^29. Virtually all measure changes considered in the finance literature imply a variance risk premium that is linear in the variance factors. In this case we have time-variation in the variance risk premium and this time-variation is determined solely by the two variance factors. Thus, under such a specification of the variance risk premium, the variance swap rate is

$$SW^v_a(t) := K_0 + K_c V^c(t) + K_j V^j(t), \tag{46}$$

where $K_0$, $K_c$ and $K_j$ are some constants. Importantly, the coefficients $K_c$ and $K_j$ are left unrestricted^30. The variance swap rate corresponding to the constant variance risk premium scenario can be recovered by constraining $K_c = \frac{1 - e^{-\kappa_1 a}}{a\kappa_1}$ and $K_j = \frac{e^{\rho_1 a} - 1}{a\rho_1}$ in equation (46).

Footnote 29: In the next Section I derive prices of diffusive and jump risk which support such a variance risk premium specification.

Can we distinguish these two scenarios for the variance risk premium using our measure $RP_a(t)$? I work with the variance swap specification $SW^v_a(t)$ in (46), since $SW^c_a(t)$ is a constrained version of it. Then, it is easy to show that

$$\mathrm{Cov}(RP_a(t), TV_\delta(t-i)) = 0 \ \text{ for } i = 1, 2, \dots \iff K_c = \frac{1 - e^{-\kappa_1 a}}{a\kappa_1} \ \text{ and } \ K_j = \frac{e^{\rho_1 a} - 1}{a\rho_1}.$$

In other words, provided there is a time-varying risk premium with time-variation determined by the variance factors, our risk premium measure RP should be correlated with the past values of TV. Further, with the measure RP we can investigate whether both the jump and diffusion parts of the stochastic variance $\sigma^2(t)$ determine the time-variation in the variance risk premium. It is easy to derive

$$\mathrm{Cov}(RP_a(t), JV_\delta(t-i)) = 0 \iff K_j = \frac{e^{\rho_1 a} - 1}{a\rho_1} \ \text{ or } \ \int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx) = 0.$$

That is, the measure RP will be correlated with the past squared price jumps, provided the jumps in the price and the variance are dependent and in addition the variance risk premium depends on the variance jump factor $V^j(t)$. The empirical evidence in Section 4 indicates $\int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx) \neq 0$. Thus, with the covariance between RP and TV and between RP and JV, we can differentiate a constant variance risk premium from a variance risk premium that is linear in the variance factors. Further, because of the link between the price and variance jumps, using these covariances, we can also determine if both variance factors determine the variation in the variance risk premium.

5.2 VIX Index

In this Subsection I provide details on the construction of the variance swap rate. In the empirical study I use a one-month variance swap rate. The variance swap is an over-the-counter derivative product, but its theoretical value can be replicated by a portfolio of standard European-style options. The theoretical results in Carr and Wu (2004), Britten-Jones and Neuberger (2000) (see also Bakshi and Madan (2000) and Carr and Madan (2001)) show that we have the following for the variance swap rate

$$SW_a(t) = \frac{e^{r_{t,a}}}{a} \int_0^\infty \frac{2Q(K,t,a)}{K^2}\,dK + \epsilon_{t,a}, \tag{47}$$

where $r_{t,a}$ denotes a risk-free interest rate at time $t$ for the period $(t, t+a)$ and $Q(K,t,a)$ is the time $t$ price of an out-of-the-money European-style option with strike price $K$ and time to expiration $a$.
$\epsilon_{t,a}$ is an approximation error, which will be zero if the price does not contain jumps^31. The numerical experiments in Carr and Wu (2004) suggest that this error is not significant for practical purposes. In fact, since in this paper I am interested in the time-variation of the variance risk premium, taking into account this error will not change any of the conclusions in the paper.

The result in (47) means that the option data on each of the days can be used to construct the one-month variance swap rate. This theoretical result is used by the CBOE in the calculation of the new VIX index, which is therefore a proxy for the one-month variance swap rate^32. The index is based on European options written on the S&P 500 index. The formula for the implied variance, used in the calculation of the new VIX index, is a discretization of the portfolio of a continuum of options in (47), i.e.

Footnote 30: The constant $K_0$ in (46) and (44) might differ of course. Our interest here is in the coefficients $K_c$ and $K_j$.

Footnote 31: As shown in Carr and Wu (2004), for the general SV model in (1)-(4), the error is $\epsilon_{t,a} = -\frac{2}{a} E^Q\left(\int_t^{t+a}\int_{\mathbb{R}^n_0} \left(e^{h(x)} - 1 - h(x) - \frac{h^2(x)}{2}\right)\nu^Q(ds,dx) \,\middle|\, \mathcal{F}_t\right)$, where $\nu^Q(\cdot,\cdot)$ is the compensator of the jump measure $\mu$ under the risk-neutral measure.

Footnote 32: The VIX index was introduced by CBOE in 1993. The old VIX index (also called VXO) computes the average Black and Scholes (1973) implied volatility with strikes close to the current index level and the two nearest maturities, so that a one-month implied volatility is interpolated. The old VIX index was based on options written on the S&P 100 index. In 2003 CBOE changed the calculation of the VIX index and also calculated the new VIX index back to 1990. The new VIX index uses market prices instead of implied volatilities (unlike VXO) and approximates the variance swap rate.
$$\sigma^2(t,h) = \frac{2}{h}\sum_i \frac{\Delta K_i}{K_i^2}\, e^{r_{t,h}}\, Q(K_i,t,h) - \frac{1}{h}\left(\frac{F_{t,h}}{K_0} - 1\right)^2, \tag{48}$$

where $h$ is the time to expiration of the option contracts used in the calculation (measured in calendar years, thus differing from our convention of trading time); $r_{t,h}$ is the risk-free interest rate at time $t$ for the period till expiration $(t, t+h)$; $F_{t,h}$ is the forward index level derived from the index option prices; $K_0$ is the first strike below the forward index level $F_{t,h}$; $Q(K_i,t,h)$ is the midpoint of the bid-ask spread for each option with strike $K_i$; $K_i$ is the strike of the out-of-the-money option, which is a call if $K_i > F_{t,h}$ and a put if $K_i < F_{t,h}$; $\Delta K_i = \frac{K_{i+1} - K_{i-1}}{2}$, while for the lowest strike it is just the difference between the lowest and the next higher strike and similarly for the highest strike it is just the difference between the highest strike and the next lower strike. Since on each day we do not have data on options with time to expiration of exactly one calendar month, the CBOE calculates the new VIX index by using the following linear interpolation

$$VIX(t) = 100\sqrt{\left\{h_1\sigma^2(t,h_1)\,\frac{N_{h_2} - N_{30}}{N_{h_2} - N_{h_1}} + h_2\sigma^2(t,h_2)\,\frac{N_{30} - N_{h_1}}{N_{h_2} - N_{h_1}}\right\}\frac{365}{30}}, \tag{49}$$

where $h_1$ and $h_2$ are the two nearest maturities from the available options, and $N_{h_1}$ and $N_{h_2}$ are the number of calendar days to expiration of the options.

In the calculation of the VIX index a calendar-counting convention is used. That is, the year consists of 365 days and in computing the time to expiration for the options, the actual number of days is used. However, in this paper I adopt trading-time counting. That is, a unit of time here is one trading day. I do not consider the overnight returns in the analysis of the S&P 500 futures. I continue to use the trading-time convention and assume that each month consists of 22 trading days.
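The discretization (48) and the interpolation (49) can be sketched as follows. This is an illustrative implementation of the formulas as written in the text, not the CBOE production methodology: the option quotes are hypothetical, the discount factor is entered as $e^{r_{t,h}}$ exactly as in (48), and $N_{30} = 30$ calendar days is an assumption made here.

```python
import math

def strike_spacings(strikes):
    """Delta K_i: half the distance between neighbouring strikes, one-sided at both ends."""
    n = len(strikes)
    spacings = []
    for i in range(n):
        if i == 0:
            spacings.append(strikes[1] - strikes[0])
        elif i == n - 1:
            spacings.append(strikes[-1] - strikes[-2])
        else:
            spacings.append((strikes[i + 1] - strikes[i - 1]) / 2.0)
    return spacings

def implied_variance(strikes, otm_prices, forward, k0, r, h):
    """Formula (48): strikes sorted ascending, otm_prices are bid-ask midpoints,
    r is the rate r_{t,h} as it enters (48), h is time to expiration in calendar years."""
    spacings = strike_spacings(strikes)
    option_sum = sum(
        dk / k ** 2 * math.exp(r) * q
        for dk, k, q in zip(spacings, strikes, otm_prices)
    )
    return (2.0 / h) * option_sum - (1.0 / h) * (forward / k0 - 1.0) ** 2

def vix_level(var1, h1, n1, var2, h2, n2, n30=30.0):
    """Formula (49): interpolate the two nearest maturities to 30 calendar days."""
    w1 = (n2 - n30) / (n2 - n1)
    w2 = (n30 - n1) / (n2 - n1)
    return 100.0 * math.sqrt((h1 * var1 * w1 + h2 * var2 * w2) * 365.0 / 30.0)

# Hypothetical quotes for one maturity (three strikes only, for illustration).
strikes = [90.0, 100.0, 110.0]
prices = [1.0, 2.0, 1.0]
sigma2 = implied_variance(strikes, prices, forward=101.0, k0=100.0, r=0.0, h=1.0 / 12.0)
print("implied variance (annualized):", round(sigma2, 5))
print("VIX when the near maturity is exactly 30 days:",
      round(vix_level(0.04, 30.0 / 365.0, 30.0, 0.05, 60.0 / 365.0, 60.0), 4))
```

When the near contract has exactly 30 calendar days to expiration, the second weight is zero and the index reduces to 100 times the square root of the near-maturity implied variance.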
I use the VIX index to calculate a daily-standardized variance swap rate with a one-month horizon (corresponding to 22 trading days) according to the following formula

$$SW_{22}(t) = \frac{30}{365}\,\frac{1}{22}\,VIX^2(t). \tag{50}$$

The above is just a transformation, so that the variance swap rate is reported in daily variance units and thus is directly comparable with RV and TV over the days in the sample. The data for the VIX index consists of closing prices for every date for which we have high-frequency data on the S&P 500 futures contract. Thus, we have data covering the period January 2, 1990, to November 29, 2002, for a total of 3256 daily observations. For each of the days in the sample I calculate from the VIX index a one-month variance swap rate according to formula (50).

5.3 Initial Analysis of the Variance Risk Premium

The variance swap rate can be decomposed as

$$SW_a(t) = VR_a(t) + E^P\left(\frac{1}{a}[f,f]_{(t,t+a]} \,\middle|\, \mathcal{F}_t\right).$$

Using this formula we have

$$E(VR_a(t)) = E(SW_a(t)) - \frac{1}{a} E\left([f,f]_{(t,t+a]}\right),$$

$$\mathrm{Var}(VR_a(t)) \geq \left(\sqrt{\mathrm{Var}(SW_a(t))} - \sqrt{\mathrm{Var}\left(\frac{1}{a} E\left([f,f]_{(t,t+a]} \mid \mathcal{F}_t\right)\right)}\right)^2.$$

Thus, using the parameter estimates of the Jump-Diffusive SV model as well as the variance swap rate data, we have an estimate for the first two moments of the variance risk premium. The estimated mean of the variance risk premium is 0.6827, while the mean of the variance swap rate is 1.6542, so on average the premium is roughly 41% of the variance swap rate. This shows that the variance risk premium is economically significant. An estimate for the lower bound of the variance of the variance risk premium is 0.3401, while the estimated variance of the variance swap rate is 1.2775. Thus, the variance risk premium shows significant variation over time and this accounts for a large part of the variation of the variance swap rate. These results underline the importance of studying the variance risk premium and in particular its dynamics.

I finish this Section with an analysis of the measure RP (evaluated at the parameter estimates reported in Table 3).
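Before turning to RP, the unit conversion in (50) and the moment calculations above can be reproduced with a short sketch. The two means are the values reported in the text; the VIX level and the inputs to the variance bound are hypothetical, and the function names are illustrative. The bound itself follows from the Cauchy-Schwarz inequality applied to the difference of two random variables.

```python
import math

def daily_swap_rate(vix, calendar_days=30.0, days_per_year=365.0, trading_days=22.0):
    """Formula (50): VIX (annualized, in percent) to a daily variance swap rate."""
    return calendar_days / days_per_year / trading_days * vix ** 2

def variance_lower_bound(var_swap, var_cond_expectation):
    """Lower bound for Var(VR): (sd(SW) - sd(E[QV/a | F_t]))^2."""
    return (math.sqrt(var_swap) - math.sqrt(var_cond_expectation)) ** 2

# Hypothetical VIX level of 20.
print("SW_22 for VIX = 20:", round(daily_swap_rate(20.0), 4))

# Means reported in the text.
mean_vr, mean_sw = 0.6827, 1.6542
premium_share = mean_vr / mean_sw
print("mean premium as a share of the mean swap rate:", round(premium_share, 3))

# Hypothetical variances, only to show how the bound is evaluated.
print("bound for Var(SW) = 4, Var(E[QV/a|F_t]) = 1:", variance_lower_bound(4.0, 1.0))
```

The conversion makes clear why the sample mean of the swap rate is of order one: a VIX level around 20 corresponds to a daily variance of roughly 1.5 in squared percentage units.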
I use the high-frequency data and the parameter estimates of the Jump-Diffusive SV model, reported in Table 3, to estimate the linear projection in (40). Appendix A contains details on the Kalman filter used in the construction of the linear projection. Using the estimated linear projection and the data on the variance swap rate I construct the measure for the variance risk premium. Figure 5 plots the RP series together with TV and JV. As seen from the Figure, the RP series is quite persistent and is generally above zero. We can distinguish three periods in the sample. The first is the beginning of the sample and it covers roughly all of 1990 and half of 1991. This period is characterized with average values of the variance of the continuous component and high price jump activity. As seen from the RP series, this period is associated with a high variance risk premium as well. The second period in the sample lasts from the middle of 1991 till approximately the beginning of 1996. This period is relatively tranquil and is associated with very low levels of the variance (both of the continuous and the jump component). Our measure for the variance risk premium RP shows very little variation and is close to zero in this period. The last part of the sample covers the period from 1996 till 2002. This period has a very high variance from the continuous component. We can detect many days in this period in which there are big changes in TV. The same holds true for JV, i.e. an indication for many days with nontrivial price jumps. The RP measure for this period is relatively high. Also, it shows significant variation. It peaks in the days of high variance and after this shows slow decay. Thus, Figure 5 suggests that the jumps and the level of IV are important factors determining the variance risk premium. To investigate further this conjecture I compute the covariance between RP and past values of TV and JV. In Figure 6 I plot these covariances. 
I summarize the findings from the analysis of the Figure as follows.

• The top panel of Figure 6 shows that RP covaries positively with past values of TV. The 95% lower bound for these covariances is well above zero. This indicates that these covariances are statistically different from zero. In Table 5 I report the results from Wald tests for zero covariance. The null hypothesis of these tests is that all covariances between RP and past values of TV up to a certain lag are equal to zero. The results in Table 5 confirm the evidence for the strong dependence between RP and past values of TV. As seen from equation (42) this means that the variance risk premium has time-variation, which depends on the level of the variance of the continuous price component $\sigma^2(t)$.

• The bottom panel of Figure 6 shows that RP covaries positively with past values of JV. The 95% lower bound for these covariances is above zero. As for the covariance between RP and past values of TV, in Table 5 I report Wald tests for zero covariances between RP and past values of JV up to a certain lag. These tests indicate that there is a statistically significant relation between RP and past values of JV. This, in turn, implies that the past values of the squared price jumps are also a determinant of the time-variation in the variance risk premium (in addition to the variance of the continuous price component).

6 Modeling and Inference for Time-Varying Variance Risk Premium

The main question I try to answer in this Section is whether we can “rationalize” the empirical evidence of Section 5 for the time-variation in the variance risk premium. Can we find prices for the different risks in the Jump-Diffusive SV model, which are consistent with no arbitrage and support the empirical findings in Section 5? I start by deriving very general prices of risks in the Jump-Diffusive SV model, which are consistent with no arbitrage.
Following that, I conduct a joint inference using the high-frequency data and the variance swap data and test the various specifications for the prices of risk in the model. In order to avoid confusion I restate our final choice for the evolution of the futures price process

$$df(t) = \alpha(t)\,dt + \sigma(t)\left(\rho\,dB_1(t) + \sqrt{1-\rho^2}\,dB_2(t)\right) + \int_{\mathbb{R}^n_0} h(x)\,\tilde{\mu}(dt,dx), \tag{51}$$

$$\sigma^2(t) = V^c(t) + V^j(t), \tag{52}$$

$$dV^c(t) = \kappa_1(\theta_1 - V^c(t))\,dt + \sigma_{1v}\sqrt{V^c(t)}\,dB_1(t), \tag{53}$$

$$V^j(t) = e^{\rho_1 t}V^j(0) + \int_0^t\int_{\mathbb{R}^n_0} e^{\rho_1(t-s)}k(x)\,\mu(ds,dx), \tag{54}$$

where $(B_1, B_2)$ is a standard Brownian motion^33.

6.1 Prices of Risk and Change of Measure

The fundamental theorem of asset pricing implies, under some technical conditions, that no arbitrage is equivalent to the existence of an Equivalent Martingale Measure (hereafter abbreviated as EMM) under which the discounted gain process associated with an asset is a local martingale^34. The futures contract involves no initial payment and as a result, assuming that the contract is continuously marked to market, the futures price $F(t)$ is a local martingale under the EMM (see e.g. Duffie (2001)^35). Turning to the specification of the EMM, the presence of jumps in the futures price renders the market essentially incomplete. That is, we cannot complete it by including in the investor’s portfolio a finite number of securities^36. This means that in general we have infinitely many EMMs which are consistent with no arbitrage. Recall from the previous Section that P denotes the physical measure and Q the risk-neutral measure, i.e. the EMM. The change of measure is specified with the following density process

$$\left.\frac{dQ}{dP}\right|_{\mathcal{F}_T} = Z(T) = \mathcal{E}\left(\int_0^T \psi_1(s)\,dB_1(s) + \int_0^T \psi_2(s)\,dB_2(s) + \int_0^T\int_{\mathbb{R}^n_0} (Y(s,x) - 1)\,\tilde{\mu}(ds,dx)\right), \tag{55}$$

where $\mathcal{E}(\cdot)$ is the stochastic exponential^37. The stochastic processes $\psi_1(t)$ and $\psi_2(t)$ are predictable and the stochastic function $Y(t,x)$ is nonnegative and predictable.
The technical conditions for $Z(T)$ to define an EMM are discussed in Appendix B. Here I assume that these conditions are satisfied and focus on the analysis of their implications for the variance risk premium.

Footnote 33: Note that I made a slight change in notation here. I decomposed the Brownian motion in the price process into two orthogonal components. The reason for this is that the pricing kernel is easier to write with respect to a standard Brownian motion.

Footnote 34: More formally, as shown in Delbaen and Schachermayer (1998), the condition “No free lunch with vanishing risk” is equivalent to the existence of an equivalent measure under which the discounted gain process is a σ-martingale. The notion of “No free lunch with vanishing risk” can be viewed as a slight modification of no arbitrage opportunities. σ-martingales include local martingales, but the opposite does not hold. For more see Harrison and Kreps (1979), Harrison and Pliska (1981) and the more recent work of Delbaen and Schachermayer (1994, 1998). Here I will avoid these complications and I will assume that an equivalent martingale measure exists, i.e. the discounted gain process will be a local martingale under it.

Footnote 35: This is subject to a boundedness condition on the interest rate process, but this assumption can be relaxed; see Pozdnyakov and Steele (2004).

Footnote 36: Except trivial cases, for example when the jumps are a standard Poisson process or more generally when the set of possible jump sizes is finite. See the discussion in Cont and Tankov (2004) and Cont, Tankov, and Voltchkova (2005).

Footnote 37: Recall that the stochastic (Doléans-Dade) exponential of a given semimartingale $X$ is defined as the solution of $dY = Y_{-}\,dX$ and $Y_0 = 1$. It has the property that if $X$ is a local martingale, so is $Y$ (see e.g. Jacod and Shiryaev (2003) for further properties).
Under the measure Q, $B_1(t) - \int_0^t \psi_1(s)\,ds$ and $B_2(t) - \int_0^t \psi_2(s)\,ds$ are Brownian motions and the jump measure $\mu$ has compensator $Y(t,x)\,dt\,G(dx)$ (recall, under the measure P, $\mu$ has compensator $dt\,G(dx)$). The stochastic processes $\psi_1(t)$, $\psi_2(t)$ and the (stochastic) function $Y(t,x)$ determine the prices of the different risks in the stochastic volatility model (51)-(54). $\psi_1(t)$ and $\psi_2(t)$ are the prices for the diffusion type risk in the price and the variance, while $Y(t,x)$ determines the compensation for the presence of jumps in the price and the variance. $\psi_1(t)$ and $Y(t,x)$ determine together the variance risk premium, which we are after. $\psi_1(t)$ and $\psi_2(t)$ determine the compensation for the diffusive price risk. A typical assumption is that this compensation is proportional to the stochastic variance; see Pan (2002) for example. In this paper I am not interested in it and as a result I will leave $\psi_2(t)$ unspecified. The drift term $\alpha(t)$ contains the compensation for the diffusive and jump price risks, i.e. (see Theorem 3 in Appendix B)

$$\alpha(t) = -\frac{1}{2}\sigma^2(t) - \rho\,\sigma(t)\psi_1(t) - \sqrt{1-\rho^2}\,\sigma(t)\psi_2(t) - \int_{\mathbb{R}^n_0}\left(Y(t,x)\left(e^{h(x)} - 1\right) - h(x)\right)G(dx). \tag{56}$$

In general $\rho \neq 0$ (because of the “leverage effect”) and therefore the drift term contains information for the variance risk premium. However, since the estimation in this paper is based on realized power variation statistics constructed from high-frequency data, the effect of the time-varying drift term $\alpha(t)$ is negligible. Therefore, I do not make use of $\alpha(t)$ for the identification of the variance risk premium in this paper.

The variance risk premium contains compensation for diffusive and jump type risk, since the model has price jumps and in addition $\sigma^2(t)$ contains both a diffusive and a jump factor. The pricing of the diffusive risk is well studied, while pricing of jump risk is less so.
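To make the drift restriction (56) concrete, the sketch below evaluates it under a deliberately simple parametric stand-in for the jumps. The paper leaves the jump measure nonparametric, so everything about the jump specification here is an assumption for illustration only: scalar jumps with $h(x) = x$, a compound Poisson measure $G$ with normally distributed jump sizes, and a constant $Y(t,x) = Y$ that simply rescales the jump intensity under Q. All numerical inputs are hypothetical.

```python
import math

def jump_compensation(intensity, mean, std, y=1.0):
    """Integral of (Y*(e^x - 1) - x) G(dx) for G = intensity * N(mean, std^2).

    Uses E[e^X] = exp(mean + std^2 / 2) for a normal jump size X.
    Assumes h(x) = x and a constant Y (illustrative, not the paper's specification).
    """
    return intensity * (y * (math.exp(mean + 0.5 * std ** 2) - 1.0) - mean)

def drift(sigma, rho, psi1, psi2, intensity, mean, std, y=1.0):
    """Formula (56) under the assumptions stated in jump_compensation."""
    diffusive = (
        -0.5 * sigma ** 2
        - rho * sigma * psi1
        - math.sqrt(1.0 - rho ** 2) * sigma * psi2
    )
    return diffusive - jump_compensation(intensity, mean, std, y)

# Hypothetical inputs: negative average jumps, Y > 1 makes jumps more frequent under Q.
base = drift(sigma=1.0, rho=-0.5, psi1=0.0, psi2=0.0,
             intensity=0.1, mean=-0.5, std=0.3, y=1.0)
fearful = drift(sigma=1.0, rho=-0.5, psi1=0.0, psi2=0.0,
                intensity=0.1, mean=-0.5, std=0.3, y=2.0)
print("drift with Y = 1:", round(base, 5))
print("drift with Y = 2:", round(fearful, 5))
```

In this stylized setting, raising $Y$ above one for predominantly negative jumps increases the drift, i.e. investors require a higher expected return as compensation for price jump risk.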
Before analyzing different specifications for ψ1 (t) and Y (t, x), I briefly discuss the pricing of jump risk and how it differs from pricing diffusive risk. The stochastic function Y (t, x) specifies compensation for each possible jump size x at each point of time t. This is fundamentally different from the pricing of diffusive risk, where at each point of time we have a single price (e.g. ψ1 (t) for the Brownian motion B1 (t)). This explains why the market is incomplete in the presence of jumps. When there are only diffusive risks we need to include in the portfolio a finite number of instruments (i.e. assets and different derivatives on them) which have sensitivity with respect to those diffusive risks and this completes the market. Intuitively, the diffusive risks have a local Gaussian behavior and appropriately weighted set of instruments sensitive to those risks could completely eliminate (hedge) them. The situation is very different in the presence of jumps. In this case, it is not enough to include instruments which are sensitive towards jumps. In the presence of jumps we need a hedging instrument for each possible jump size. Thus, provided the jumps have an infinite number of possible jump sizes, the market cannot be completed by a finite number of instruments. Therefore, for example, the compensation for price jump risk reflected in the drift term α(t) in equation (56) is an “average” over the compensation for all possible jump sizes at time t. I turn now to modeling the variance (and jump) risk premium, i.e. modeling ψ1 (t) and Y (t, x) in the stochastic volatility model (51)-(54). The typical way of specifying the change of measure (i.e. the prices of risk) is such that the model is of the same class under both measures (physical and risk-neutral). The main reason for analyzing such measure changes is analytical tractability. 
For the SV model (51)-(54), used here, this means that under the risk-neutral measure the jumps in the price and the variance are again Lévy processes, and the stochastic variance is a sum of a square-root process and a jump-driven OU process, possibly with different parameters. This, however, might be too restrictive, particularly for the jumps. Therefore, I consider also measure changes for jumps which go beyond the “structure-preserving” ones. In the next two subsections I analyze the pricing of the diffusive and the jump risks determining the variance risk premium, i.e. $\psi_1(t)$ and $Y(t,x)$. It is convenient for the subsequent analysis to split the variance risk premium as follows
\[
VR_a(t) = VR^c_a(t) + VR^j_a(t),
\]
where
\[
\begin{aligned}
VR^c_a(t) &= \frac{1}{a}E^{\mathbb{Q}}\left(\int_t^{t+a} V^c(s)\,ds \,\Big|\, \mathcal{F}_t\right) - \frac{1}{a}E^{\mathbb{P}}\left(\int_t^{t+a} V^c(s)\,ds \,\Big|\, \mathcal{F}_t\right),\\
VR^j_a(t) &= \frac{1}{a}E^{\mathbb{Q}}\left(\int_t^{t+a}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx) + \int_t^{t+a} V^j(s)\,ds \,\Big|\, \mathcal{F}_t\right)\\
&\quad - \frac{1}{a}E^{\mathbb{P}}\left(\int_t^{t+a}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx) + \int_t^{t+a} V^j(s)\,ds \,\Big|\, \mathcal{F}_t\right).
\end{aligned} \tag{57}
\]
Thus, $VR^c_a(t)$ is the part of the variance risk premium which is due to the compensation for the time-variation in the continuous variance factor $V^c(t)$. It is determined by $\psi_1(t)$. The other component of the variance risk premium, $VR^j_a(t)$, is determined only by the price of jump risk $Y(t,x)$. It consists of compensation for the time-variation in $V^j(t)$ as well as compensation for the presence of jumps in the price. I refer to $VR^c_a(t)$ as the diffusive variance risk premium and to $VR^j_a(t)$ as the jump variance risk premium.

6.1.1 Specification of $\psi_1(t)$

All the proofs for the measure changes considered in this subsection are given in Appendix B. Let $\phi_1(t) = \psi_1(t)\,\sigma_{1v}\sqrt{V^c(t)}$.
Then, for any specification of $\psi_1(t)$, $V^c(t)$ has the following dynamics under the measure $\mathbb{Q}$,
\[
dV^c(t) = \left(\kappa_1\theta_1 - \kappa_1 V^c(t) + \phi_1(t)\right)dt + \sigma_{1v}\sqrt{V^c(t)}\,dB_1^{\mathbb{Q}}(t), \tag{58}
\]
and this implies
\[
\frac{1}{a}E^{\mathbb{Q}}\left(\int_t^{t+a} V^c(s)\,ds \,\Big|\, \mathcal{F}_t\right) = \frac{1-e^{-\kappa_1 a}}{a\kappa_1}V^c(t) + \frac{a\kappa_1 + e^{-a\kappa_1} - 1}{a\kappa_1}\theta_1 + \int_t^{t+a}\frac{e^{\kappa_1(u-t)}-1}{a\kappa_1}E^{\mathbb{Q}}\left(\phi_1(u)\,|\,\mathcal{F}_t\right)du,
\]
therefore
\[
VR^c_a(t) = \int_t^{t+a}\frac{e^{\kappa_1(u-t)}-1}{a\kappa_1}E^{\mathbb{Q}}\left(\phi_1(u)\,|\,\mathcal{F}_t\right)du.
\]
Thus, the time-variation in $\phi_1(t)$ determines the time-variation in the variance risk premium coming from the compensation for the diffusive variance risk. I analyze two different specifications for $\psi_1(t)$. The first corresponds to a constant diffusive variance risk premium. It is given by

D1. $\psi_1(t) = \dfrac{\lambda}{\sigma_{1v}\sqrt{V^c(t)}}$, where $\lambda$ is a constant such that $\lambda \geq \dfrac{\sigma_{1v}^2}{2} - \kappa_1\theta_1 \geq 0$.

Under this specification $\psi_1(t)$ is inversely proportional to the square root of the diffusive variance factor. An implication of this specification for $\psi_1(t)$ is that $\phi_1(t)$ is constant. Therefore, under specification D1, $VR^c_a(t) = \text{const}$. I further generalize this specification to allow for time-variation in $VR^c_a(t)$. This is done in D2.

D2. $\psi_1(t) = \dfrac{\lambda_0 + \lambda_1 V^c(t)}{\sigma_{1v}\sqrt{V^c(t)}}$, where $\lambda_0$ and $\lambda_1$ are constants such that $\lambda_0 \geq \dfrac{\sigma_{1v}^2}{2} - \kappa_1\theta_1 \geq 0$.

First, we are trivially back to the constant diffusive variance risk premium case if $\lambda_1 = 0$. This specification for $\psi_1(t)$ is called the extended affine price of risk in Cheridito, Filipović, and Kimmel (2005). It is easy to derive that in this case $VR^c_a(t) = \text{const}_0 + \text{const}_1 \times V^c(t)$, i.e. the diffusive variance risk premium is an affine function of the diffusive variance factor. This specification implies that the only relevant information at a given time for the diffusive variance risk premium is the level of the diffusive variance factor itself. This change of measure, with the restriction $\lambda_0 = 0$ imposed, is one of the most frequently used in empirical finance applications. The fully general case, i.e.
when $\lambda_0 \neq 0$, has only recently been used by Wu (2005) and Cheridito, Filipović, and Kimmel (2005) and formally justified in Cheridito, Filipović, and Kimmel (2005) (see also Cheridito, Filipović, and Yor (2005)).

Further generalizations of the specification could also be considered, in two directions. The first is to include additional information besides the one already contained in the process $V^c(t)$. Examples are the past levels of jumps in the price or the variance. The second generalization relates to the persistence in $VR^c_a(t)$. Under D2 the memory of the diffusive variance risk premium is tied to the memory of the diffusive variance factor. However, the information contained in $V^c(t)$ could be “summarized” in a different state variable (still adapted to the filtration generated by $V^c(t)$), which differs in its degree of persistence. I do not consider these alternatives here for two reasons. The first is that it seems relatively harder to do this in an analytically tractable way. The second reason is that I make such a generalization for the price of jump risk, and we can only observe the sum of the jump and diffusive variance risk premia.

6.1.2 Specification of $Y(t,x)$

In this subsection I model the price of jump risk, i.e. $Y(t,x)$. All the proofs for the measure changes considered in this subsection can be found in Appendix B. Each of the cases for $Y(t,x)$ generalizes the previous one. The simplest specification for $Y(t,x)$ is when it is only a function of $x$ (i.e. of the jump size). This case is considered in specification J1.

J1. $Y(t,x) = \vartheta(x)$, where $\vartheta(x) > 0$,
\[
\int_{\mathbb{R}^n_0}\left(\sqrt{\vartheta(x)} - 1\right)^2 G(dx) < \infty \quad\text{and}\quad \int_{\mathbb{R}^n_0}\left|\left(\vartheta(x) - 1\right)h(x)\right| G(dx) < \infty.
\]
Note that, since under this specification $Y(t,x)$ is non-stochastic and does not depend on $t$, the jumps under the measure $\mathbb{Q}$ are a Lévy process. Such a measure change for jump-driven OU processes (i.e.
the SV model (51)-(54) with $V^c(t) = 0$) is analyzed in Nicolato and Venardos (2003). Under such a measure change $VR^j_a(t) = \text{const}$, that is, there is no time-variation in the jump variance risk premium. This specification might be too restrictive in view of the empirical evidence in Section 5. Also, Pan (2002) finds that (for the particular model she is using) time-varying jump risk premia are very important in reconciling the spot and option price data as well as explaining volatility “smirks” in the cross section of options. A natural way to introduce time-variation in the jump risk premium is by specifying it as an affine function of the jump variance component. This is done with the following specification.

J2. $Y(t,x) = \vartheta_0(x) + \vartheta_1(x)V^j(t-)$, where $\vartheta_0(x) \geq 0$ and $\vartheta_1(x) \geq 0$ and in addition
\[
\int_{\mathbb{R}^n_0}\left(\vartheta_0(x) - 1\right)^2 G(dx) < \infty, \quad \int_{\mathbb{R}^n_0}\vartheta_1^2(x)\,G(dx) < \infty \quad\text{and}\quad \int_{\mathbb{R}^n_0} k(x)\vartheta_1(x)\,G(dx) < -\rho_1.
\]
First, for $\vartheta_1(x) = 0$ we are back to specification J1. Under specification J2, it is easy to show that the jump variance risk premium is linear in the jump variance factor, i.e. $VR^j_a(t) = \text{const}_0 + \text{const}_1 \times V^j(t)$. Under J2, the jumps (in the price and the variance) have time-varying intensity under the risk-neutral measure, while they are time-homogeneous (i.e. have no time-variation) under the physical measure. However, it should be noted that little is lost in terms of tractability. The model still falls in the class of generalized affine models as defined in Duffie, Filipović, and Schachermayer (2003). I further generalize this jump risk premium specification in J3.

J3. $Y(t,x) = \vartheta_0(x) + \vartheta_1(x)\tau(t-)$, where $\vartheta_0(x) \geq 0$ and $\vartheta_1(x) \geq 0$ and in addition
\[
\tau(t) = e^{\rho_\tau t}\tau(0) + \int_0^t\!\!\int_{\mathbb{R}^n_0} e^{\rho_\tau(t-s)}\zeta(x)\,\mu(ds,dx), \qquad \rho_\tau < 0, \quad \zeta(x) \geq 0,
\]
\[
\int_{\mathbb{R}^n_0}\zeta(x)\,G(dx) < \infty \quad\text{and}\quad \int_{\mathbb{R}^n_0}\zeta^4(x)\,G(dx) < \infty,
\]
\[
\int_{\mathbb{R}^n_0}\left(\vartheta_0(x) - 1\right)^2 G(dx) < \infty, \quad \int_{\mathbb{R}^n_0}\vartheta_1^2(x)\,G(dx) < \infty \quad\text{and}\quad \int_{\mathbb{R}^n_0}\zeta(x)\vartheta_1(x)\,G(dx) < -\rho_\tau.
\]
Under specification J3, the jump variance risk premium can be written as $VR^j_a(t) = \text{const}_0 + \text{const}_1 \times \tau(t)$, i.e.
it is linear in $\tau(t)$. The state variable $\tau(t)$ is a Lévy-driven OU process. For $\rho_\tau = \rho_1$ and $\zeta(x) = k(x)$, we have $\tau(t) = V^j(t)$ and thus we recover specification J2. Specification J3 generalizes J2 in two directions. First, by allowing $\zeta(\cdot)$ to differ from $k(\cdot)$, I allow information other than that contained in the jump variance factor $V^j(t)$ to enter the state variable $\tau(t)$. This can be done by letting $\zeta(x)$ depend on elements of the vector $x$ on which the function $k(x)$, determining the jumps in the variance, does not depend. The second generalization is to allow the persistence in the state variable $\tau(t)$ to differ from that of the jump variance component $V^j(t)$. That is, $\tau(t)$ and $V^j(t)$ might contain the same information (i.e. the jumps in the variance), but this information could be synthesized in a different way. This difference in persistence can be achieved by letting $\rho_\tau \neq \rho_1$. It should be pointed out that specification J3 is still analytically tractable and under the risk-neutral measure the model is again in the generalized affine class.

Finally, J3 can be generalized even further. For example, $\tau(t)$ can have a kernel which is a mixture of exponentials (e.g. a CARMA(2,1) kernel). This would offer more flexibility in modeling the persistence in the jump variance risk premium. I did not make this generalization in order to keep the modeling parsimonious. Also, the time-variation in jumps of different size and/or sign might be modeled differently. Examples are big jumps being more persistent than small ones and/or negative jumps being more persistent than positive ones (under the risk-neutral measure, of course). All these scenarios are plausible, but I do not explore them here for reasons of identification. The analysis of the jump risk premium here is based on using the conditional second moment of the risk-neutral distribution of the return process (i.e. the variance swap contract).
Other conditional moments of the risk-neutral distribution of the return process are needed in order to identify different time-variation in the different jumps. Such analysis is beyond the scope of the paper. Overall, I conclude that for the purposes of exploring the variance risk premium the measure change for the jumps given in J3 is general enough and at the same time very easy to work with.

6.2 Inference for Time-Varying Variance Risk Premium

In this Subsection I conduct a joint inference using high-frequency futures data and data on the variance swap. I work with specification D2 for $\psi_1(t)$ and specification J3 for $Y(t,x)$. As argued above, we have D1 ⊂ D2 and J1 ⊂ J2 ⊂ J3. The different cases for the diffusive and jump risk analyzed here have testable implications. Mainly, the covariances between the variance risk premium and past values of the daily IV and squared price jumps are different. Therefore, these moments could be used to discriminate between the different cases. In the previous Subsection I showed that $VR^c_a(t) = \text{const}_0 + \text{const}_1 \times V^c(t)$ under specification D2 and that $VR^j_a(t) = \text{const}_0 + \text{const}_1 \times \tau(t)$ under specification J3. Therefore, under D2 and J3 we have
\[
\mathrm{Cov}\left(VR_a(t), \int_{t-i-a}^{t-i}\sigma^2(s)\,ds\right) = K^c_\phi e^{-\kappa_1 i} + K^c_\tau e^{\rho_\tau i}, \tag{59}
\]
and
\[
\mathrm{Cov}\left(VR_a(t), \int_{t-i-a}^{t-i}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx)\right) = K^j_\tau e^{\rho_\tau i}, \tag{60}
\]
where $K^c_\phi$, $K^c_\tau$, $K^j_\tau$ and $\rho_\tau < 0$ are some constants. Further, the constant $K^c_\tau$ is proportional to $\int_{\mathbb{R}^n_0} k(x)\zeta(x)\,G(dx) \geq 0$ and $K^j_\tau$ is proportional to $\int_{\mathbb{R}^n_0} h^2(x)\zeta(x)\,G(dx) \geq 0$. Also, the results in Section 4 indicate that there is a highly statistically significant link between the price and variance jumps, i.e. $\int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx) > 0$. However, note that even in this case we might have $\int_{\mathbb{R}^n_0} h^2(x)\zeta(x)\,G(dx) > 0$ and still $\int_{\mathbb{R}^n_0} k(x)\zeta(x)\,G(dx) = 0$.³⁸ In such a scenario $K^c_\tau = 0$ and $K^j_\tau \neq 0$. Similarly, we might have $K^c_\tau \neq 0$ with $K^j_\tau = 0$.
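The lag structure in (59) and (60) can be sketched numerically as follows. This is only an illustration of the two formulas: the function names and all parameter values are hypothetical and are not estimates from the paper, and the sketch assumes $\rho_\tau < 0$, as required by specification J3, so that both covariances decay with the lag $i$.

```python
import math


def cov_vr_past_iv(i, K_phi_c, K_tau_c, kappa1, rho_tau):
    """Right-hand side of (59): covariance of VR_a(t) with integrated variance lagged i periods."""
    return K_phi_c * math.exp(-kappa1 * i) + K_tau_c * math.exp(rho_tau * i)


def cov_vr_past_jumps(i, K_tau_j, rho_tau):
    """Right-hand side of (60): covariance of VR_a(t) with squared price jumps lagged i periods."""
    return K_tau_j * math.exp(rho_tau * i)


# Hypothetical constants, chosen only to show the shape of the decay.
params = dict(K_phi_c=0.2, K_tau_c=0.5, kappa1=0.1, rho_tau=-0.02)
for lag in (0, 10, 50):
    print(lag, cov_vr_past_iv(lag, **params), cov_vr_past_jumps(lag, 1.5, -0.02))
```

Setting `K_tau_c` to zero leaves only the term decaying at rate `kappa1` in the first covariance, while setting `K_tau_j` to zero makes the second covariance vanish at all lags, which is how the different scenarios for the price of jump risk separate in these moments.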
Before discussing how to use the data to estimate the moments in (59) and (60), I briefly elaborate on the implications which different scenarios for the variance risk premium have for the values of these covariances.

• Constant variance risk premium (D1 and J1). In this case $K^c_\tau = K^j_\tau = K^c_\phi = 0$. Note that this scenario is observationally equivalent³⁹ to the case when $\phi_1(t)$ and $\tau(t)$ are time-varying, but are both (at least linearly) independent from $\sigma^2(t)$ and the squared price jumps.⁴⁰

• Affine-in-variance factors variance risk premium (D2 and J2). In this case $\rho_\tau = \rho_1$. Note that in this case, if the variance jump factor $V^j(t)$ has short memory, this will imply that the impact of the jumps on the variance risk premium dies out quickly as well. Also, since $\int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx) > 0$, under J2 we can either have $K^c_\tau = K^j_\tau = 0$ or $K^c_\tau \neq 0$ and $K^j_\tau \neq 0$.

• Jump variance risk premium depends only on a component of price jumps orthogonal to the variance jumps. This is possible only under specification J3. The implication of this scenario is $K^c_\tau = 0$ and $K^j_\tau \neq 0$.

• Jump variance risk premium depends only on a component of variance jumps that is orthogonal to price jumps. Like the previous scenario, this case is possible under the general specification for the jump risk J3. Its implication is $K^j_\tau = 0$ and $K^c_\tau \neq 0$.

Turning to the estimation of the covariances, I make use of the RP measure constructed in Section 5. To underline its dependence on the parameters I denote it here as $RP_a(t,\theta)$, where $\theta$ denotes the parameter vector. Recall from Section 5 that we have⁴¹
\[
\mathrm{Cov}\left(RP_a(t,\theta), TV_\delta(t-i)\right) = \mathrm{Cov}\left(VR_a(t), IV(t-i)\right),
\]
\[
\mathrm{Cov}\left(RP_a(t,\theta), JV_\delta(t-i)\right) = \mathrm{Cov}\left(VR_a(t), \int_{t-i-a}^{t-i}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx)\right).
\]

³⁸ An example of such a case is the following. Suppose $x = (x_1, x_2) > 0$ (that is, all of the components are nonnegative with at least one being strictly positive) and $x_1$ and $x_2$ are independent of each other, i.e. $\int_{\mathbb{R}^n_0} 1_{(x_1 x_2 \neq 0)}\,G(dx) = 0$.
Then set $h(x) = \sqrt{x_1 + x_2}$, $k(x) = x_1$ and $\zeta(x) = x_2$.

³⁹ When the covariances in (59) and (60) are used for identification of the time-variation in the variance risk premium.

⁴⁰ In the specifications D1 and D2 such a scenario is not possible. For the jump variance risk premium this could happen only under specification J3. An example is the case where $x = (x_1, x_2, x_3) > 0$ with $\int_{\mathbb{R}^n_0} 1_{(x_1 x_2 x_3 \neq 0)}\,G(dx) = 0$ and $h(x) = x_1$, $k(x) = x_2$ and $\zeta(x) = x_3$.

⁴¹ As already mentioned in Section 5, in the calculation of the linear projection $P(IV_a(t)|\mathcal{G}_t)$ given in Appendix A it is assumed that an asymptotic approximation for RV and TV stated in this Appendix holds exactly. If this is not the case we will not have the exact equalities above. However, under the conditions of Theorem 2, we know that asymptotically our estimation results will be unchanged from substituting IV with TV and QV with RV.

Using this, the estimation is done as follows. For each value of the parameters I calculate the linear projection of $IV_a(t)$ on past values of TV and JV (the details of the calculation are given in Appendix A). Using this linear projection and the data on the VIX index, for each value of the parameter vector I construct the $RP_a(t,\theta)$ series. In the estimation I match the following set of moments

• Mean, Variance and Autocovariance of IV
• Mean and Variance of QV
• Mean of FV
• Mean of VR
• Covariance between VR and past IV
• Covariance between VR and past QV-IV

The autocovariances of IV used in the estimation are for lags 1, 3 and 6, as well as the average autocovariance for lags 11-20 and 21-30. In comparison with the estimation in Section 4, I dropped one of the moment conditions. For the covariance between VR and past IV I use the average one for lags of TV 1-10, 11-20, 21-30 and 41-50. The same is done for the covariance between VR and past values of QV-IV. Thus, overall, I match 19 moments.
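As a bookkeeping check, the list above can be tallied to confirm the stated total. The grouping below is one reading of the text (each averaged block of lags is counted as a single moment), not a statement taken from the paper, and the dictionary keys are purely descriptive labels.

```python
# One reading of the moment list: each lag, and each averaged block of lags,
# contributes a single moment condition.
moment_counts = {
    "mean of IV": 1,
    "variance of IV": 1,
    "autocovariance of IV (lags 1, 3, 6; averages over 11-20 and 21-30)": 5,
    "mean of QV": 1,
    "variance of QV": 1,
    "mean of FV": 1,
    "mean of VR": 1,
    "cov(VR, past IV), averaged over lags 1-10, 11-20, 21-30, 41-50": 4,
    "cov(VR, past QV-IV), averaged over the same lag blocks": 4,
}

total_moments = sum(moment_counts.values())
print(total_moments)
```

Under this reading the tally agrees with the 19 moments stated in the text.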
Following Theorem 2, I substitute the unobservable quantities IV, QV and VR with TV, RV and RP respectively. All additional details of the estimation are as in Section 4.

The estimation results are reported in Table 6. The test for overidentifying restrictions shows that the specification of the prices of risk, together with the stochastic volatility model, provides a relatively good fit to the moments used in the estimation. Comparing the estimation results in Table 3 and Table 6, we can see that the parameters determining the evolution of the futures price under the physical measure do not change substantially when the additional moments identifying the time-variation in the variance risk premium are included in the estimation. Also, note that $K_0$ is very high and statistically significant, confirming the observation made already in Section 5 of a non-trivial variance risk premium. I continue with the analysis of the parameters controlling the time-variation in the variance risk premium. I summarize the findings in the following points

1. The parameters determining the covariance between RP and past values of TV are not very accurately estimated, as indicated by their relatively big standard errors.

2. The coefficient $K^c_\phi$ is negative (although statistically insignificant). This implies that an increase in the diffusive variance factor affects the variance risk premium inversely. This is contrary to what we would expect. In the model the negative premium for the diffusive variance factor can be explained as offsetting a rather big premium for the jumps.

3. I test the hypothesis that the jumps in the price and/or variance do not determine the time-variation in the variance risk premium. This is equivalent to testing $K^c_\tau = K^j_\tau = 0$. Note that $\rho_\tau$ is present only under the alternative hypothesis and is not identified under the null hypothesis.
I use a criterion difference test, i.e. $J(\hat\theta_r) - J(\hat\theta)$, where $J(\theta) = T\,\hat m_T(\theta)'\,\hat W\,\hat m_T(\theta)$ is the GMM objective function⁴² and $\hat\theta_r$ and $\hat\theta$ are the restricted and unrestricted estimates respectively. From the results in Andrews (2001), it follows that the criterion difference test has a $\chi^2$ distribution with 2 degrees of freedom under the null hypothesis.⁴³ The value of the test is 16.00 with a corresponding p-value of 0.0003.

⁴² Note that I use the optimal weighting matrix for the GMM estimator, as this is crucial for the asymptotic distribution of the test.

⁴³ Assumption 3 in Andrews (2001) is easy to verify in our setting. This assumption is needed for proving the asymptotic distribution of the criterion difference test. It concerns the convergence of the first derivative of the GMM objective function, as a function of the nuisance parameter $\rho_\tau$ present only under the alternative (the convergence is on the space of continuous functions of $\rho_\tau$ equipped with the uniform metric). The first derivative of the vector of moment conditions is a continuous

This indicates that the jumps are an important state variable determining the time-variation in the variance risk premium. In addition, the test also provides very strong evidence against a “structure-preserving” type measure change for the jumps, i.e. specification J1.

4. Given the strong evidence for the importance of the jumps in determining the time-variation in the variance risk premium, it is interesting to investigate what component of the jumps determines this time-variation. If this is a component in the jumps common to both price and variance, we must have $K^c_\tau \neq 0$ and $K^j_\tau \neq 0$. If this is a component contained only in the price jumps (i.e. orthogonal to the variance jumps), we should have $K^c_\tau = 0$ and $K^j_\tau \neq 0$. Similarly, if it is a component present only in the variance jumps, we should have $K^c_\tau \neq 0$ and $K^j_\tau = 0$. The coefficient $K^j_\tau$ is statistically different from zero, while $K^c_\tau$ is not.
Therefore, this could be interpreted as evidence that the time-variation in the variance risk premium is determined by a component in the price jumps which is orthogonal to (i.e. independent from) the variance jumps. However, this hypothesis can be true only if the diffusive variance risk is also priced.⁴⁴ Thus, overall, I conclude that there are two possible scenarios. The first is that the diffusive variance factor does not determine the time-variation in the variance risk premium. In this case the time-variation in the variance risk premium is determined by components of both price and variance jumps. The second possible scenario is that the diffusive variance factor determines the variation in the variance risk premium and in addition a component present only in the price jumps determines the variation in the variance risk premium (when the restriction $K^c_\tau = 0$ is imposed, $K^c_\phi$ becomes positive).

5. The autoregressive coefficients $\rho_1$ and $\rho_\tau$ differ significantly. I calculate a Wald test for the hypothesis $\rho_1 = \rho_\tau$. The value of the test is 36.39 with a corresponding p-value of 0.0000. In other words, the difference between $\rho_1$ and $\rho_\tau$ is highly statistically significant. Note that, under the specification J2 for the jump risk, $\rho_1$ and $\rho_\tau$ should be equal. Therefore, the estimation results provide strong evidence against such a specification.

Overall, the empirical results uncover a non-trivial variance risk premium, whose time-variation depends strongly on the price jumps and the stochastic variance. However, the results here are not conclusive as to which component of $\sigma^2(t)$, the diffusive or the jump one, drives the variation in $VR_a(t)$. Further, we can reconcile the data on the variance swap rate and the underlying stock market index, but only with the use of a general (and quite flexible) specification for the price of jump risk. The general specification of the price of jump risk is needed for two reasons.
First, it allows the jumps to behave quite differently under the two measures. On one hand, under the physical measure, the jumps generate little time-variation. For the price jumps this follows from the fact that they are of Lévy type and therefore time-homogeneous. For the variance jumps this holds since the estimated value of $\rho_1$ indicates very little persistence: the implied half-life is approximately half a day. Under the risk-neutral measure the behavior of the jumps is very different. Their compensator depends on a state variable which is very persistent (under $\mathbb{P}$). The half-life implied by $\rho_\tau$ is over 30 business days. Thus, when a jump occurs (in the price and/or the variance), its effect on the evolution of the futures price under the physical measure disappears rather quickly. However, the effect of the jumps on the variance risk premium persists for a very long period of time. The second flexibility which J3 offers is to allow different components of the price and variance jumps to drive the time-variation in the variance risk premium.

Finally, the empirical findings regarding the jump risk premium suggest that investors do fear jumps and that their fear of jumps increases after big market drops.

⁴³ (continued) function of $\rho_\tau$ and the moment vector does not depend on $\rho_\tau$ under the null hypothesis. Functional convergence of the first derivative of the GMM objective function, as a function of $\rho_\tau$, then follows by establishing finite-dimensional convergence (which in turn follows essentially from a standard CLT for the moment vector) and C-tightness of the sequence, which can be verified using Theorem 8.3 in Billingsley (1968).

⁴⁴ A Wald test for the hypothesis $K^c_\phi = K^c_\tau = 0$ has a value of 11.39 with a corresponding p-value of 0.0034. Thus, this hypothesis is strongly rejected.

This finding can be potentially explained with a habit persistence type equilibrium model⁴⁵ in which habits are affected by jumps.
In such a model the representative investor will treat the diffusive and jump risk differently, as in Liu, Pan, and Wang (2005) and Bates (2006). When a jump occurs, the investor's willingness to protect her portfolio against future big (negative) jumps increases. This could be done, for example, by buying deep out-of-the-money put options, which pushes up their price. This, in turn, results in an increase in the variance risk premium.

7 Conclusion

This paper provides an arbitrage-free explanation of the dynamics of the variance risk premium, observed in the data, in the framework of a general stochastic volatility model. The study underlines the importance of general jump specifications for the modeling of financial time series. On the one hand, jumps are important for explaining the observed high-frequency data on the stock market index. On the other hand, this paper finds that a flexible jump risk premium is needed for explaining the dynamics of the variance risk premium uncovered from the data.

I find that the stochastic volatility model needs jumps in both the price and the variance. Furthermore, the estimation results indicate that the jumps in the price and the variance are dependent, but this dependence cannot be generated by the most commonly used parametric stochastic volatility models. The jumps in the model are Lévy and therefore have no time-variation. This assumption is reasonable at least for the analysis of the quadratic variation of the price process (and its components), which is conducted here. The reason is that the daily squared price jumps, estimated from the high-frequency data, show statistically insignificant autocorrelation. Extensions to allow for time-dependence in the jump process could be considered, but they will not change the findings qualitatively.

When the measure is changed from the physical to the risk-neutral, the time-dependence in the jumps changes significantly.
In particular, the estimation results in the paper show that the jumps can be no longer Lévy under the risk-neutral measure. The reason for this finding is that such a scenario will result in a constant jump risk premium and this will fail to match the observed dependence of the variance risk premium on past price jumps. Indeed, I find strong evidence that supports jump risk premium depending on a highly-persistent state variable. Moreover, in the model used here this state variable cannot be the jump variance factor since the latter has a relatively short memory. This means that we need far more general prices of jump risk, like the ones derived in the paper, in order to reconcile the joint dynamics of the underlying asset and the variance swap rate. The estimation results suggest that after a market crash the investors are willing to pay more in order to hedge against future big (negative) jumps. This is also indicative of a special attitude towards extreme negative events. Thus, preferences with an external habit influenced by jumps on the markets appear to be a plausible explanation for this phenomenon. Finally, there are several important directions in which the current work can be extended. First, the analysis of the dynamics of the variance risk premium can be done in a completely parametric framework. The advantage of this is that an efficient estimator can be used. The parametric modeling should take into account the key findings of this paper. Mainly, the jumps in the price and the variance should be modelled in a flexible way as dependent and the price of jump risk should be specified as a very persistent process depending on the price jumps. A second extension of the current work is to generalize further the jump risk premium to account for differential pricing of small versus big and positive versus negative jumps. 
An appealing conjecture is that the pricing of the big negative jumps is much more persistent, as these are the jumps the investors fear most. Such asymmetric modeling of the price of jump risk is impossible to identify in the setting here. The reason is that in this paper I use only the second moment of the risk-neutral distribution, i.e. the variance swap rate. However, other moments of the risk-neutral distribution could also be constructed, using the theoretical results of Bakshi and Madan (2000) and Carr and Madan (2001). These moments can potentially allow estimating the more general asymmetric prices of jump risk suggested above.

⁴⁵ As in Campbell and Cochrane (1999).

A final extension of the current work is to incorporate the equity risk premium in the analysis. Indeed, the price of jump risk affects both the equity and the variance risk premium. Therefore, a joint study of equity and variance risk premium could help better identify the jump risk premium and at the same time provide an additional test for the pricing of jump risk introduced in this paper.

Appendices

A Calculation of $P(IV_a(t)|\mathcal{G}_t)$ for the Jump-Diffusive Volatility SV model (29)-(32)

I start with stating an asymptotic result about the joint behavior of Realized Variance and Realized Tripower Variation in the presence of price jumps. I assume that Assumption 3 and Assumption 4 hold and denote
\[
\{f_\delta\}^{[2]}_t = \sum_{j=1}^{\lfloor t/\delta\rfloor}\left(f_{j\delta} - f_{(j-1)\delta}\right)^2,
\]
\[
\{f_\delta\}^{[2/3,2/3,2/3]}_t = \sum_{j=3}^{\lfloor t/\delta\rfloor}\left|f_{(j-2)\delta} - f_{(j-3)\delta}\right|^{2/3}\left|f_{(j-1)\delta} - f_{(j-2)\delta}\right|^{2/3}\left|f_{j\delta} - f_{(j-1)\delta}\right|^{2/3}.
\]
This notation is adopted from Barndorff-Nielsen, Graversen, Jacod, Podolskij, and Shephard (2005). Then, combining the results in Barndorff-Nielsen, Shephard, and Winkel (2006) and Jacod (2006a,b) we have
\[
\delta^{-1/2}\begin{pmatrix}
\{f_\delta\}^{[2]}_t - \int_0^t\sigma^2(s)\,ds - \int_0^t\!\int_{\mathbb{R}^n_0} h^2(x)\,\mu(ds,dx)\\[4pt]
\{f_\delta\}^{[2/3,2/3,2/3]}_t - \mu_{2/3}^3\int_0^t\sigma^2(s)\,ds
\end{pmatrix}
\xrightarrow{\ \text{law}\ } L_1(t) + L_2(t),
\]
where
\[
L_1(t) = \int_0^t \sigma^2(u)\,A\,dW(u), \qquad
L_2(t) = \begin{pmatrix}\sum_{s\leq t} 2\Delta f_s\left(\sqrt{\kappa_s}\,u_s\,\sigma_{s-} + \sqrt{1-\kappa_s}\,u'_s\,\sigma_s\right)\\ 0\end{pmatrix},
\]
and $W(u)$ is a Brownian motion, $\kappa_s \sim U[0,1]$, $u_s \sim N(0,1)$, $u'_s \sim N(0,1)$, and furthermore $W$ and the sequences $(\kappa_s)$, $(u_s)$, $(u'_s)$ are independent of each other, are defined on an extension of the original probability space, and are independent from the process $f$. $A$ is a matrix of constants such that
\[
AA' = \begin{pmatrix}
2 & 3\mu_{2/3}^2\mu_{8/3} - 3\mu_{2/3}^3\\
3\mu_{2/3}^2\mu_{8/3} - 3\mu_{2/3}^3 & \mu_{4/3}^3 - 5\mu_{2/3}^6 + 2\mu_{2/3}^2\mu_{4/3}^2 + 2\mu_{2/3}^4\mu_{4/3}
\end{pmatrix}.
\]
Using this asymptotic result we can write the following approximation for $TV_\delta(t)$ and $JV_\delta(t)$
\[
TV_\delta(t) = \theta + (IV(t) - \theta) + \nu_{1t}, \qquad
JV_\delta(t) = \int_{\mathbb{R}^n_0} h^2(x)\,G(dx) + \int_t^{t+1}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde\mu(ds,dx) + \nu_{2t},
\]
where $(\nu_{1t}, \nu_{2t})$ is a martingale difference sequence with
\[
E\left(\nu_{1t}^2\right) = \frac{1}{M}E\left(\int_t^{t+1}\sigma^4(u)\,du\right)\left(\frac{\mu_{4/3}^3 - 5\mu_{2/3}^6 + 2\mu_{2/3}^2\mu_{4/3}^2 + 2\mu_{2/3}^4\mu_{4/3}}{\mu_{2/3}^6}\right),
\]
\[
\begin{aligned}
E\left(\nu_{2t}^2\right) &= \frac{1}{M}E\left(\int_t^{t+1}\sigma^4(u)\,du\right)\left(\frac{\mu_{4/3}^3 + 3\mu_{2/3}^6 + 2\mu_{2/3}^2\mu_{4/3}^2 + 2\mu_{2/3}^4\mu_{4/3} - 6\mu_{2/3}^5\mu_{8/3}}{\mu_{2/3}^6}\right)\\
&\quad + \frac{1}{M}\left(4E(IV(t))\int_{\mathbb{R}^n_0} h^2(x)\,G(dx) + 2\int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx)\right),
\end{aligned}
\]
\[
E\left(\nu_{1t}\nu_{2t}\right) = \frac{1}{M}E\left(\int_t^{t+1}\sigma^4(u)\,du\right)\left(\frac{3\mu_{2/3}^5\mu_{8/3} + 2\mu_{2/3}^6 - \mu_{4/3}^3 - 2\mu_{2/3}^2\mu_{4/3}^2 - 2\mu_{2/3}^4\mu_{4/3}}{\mu_{2/3}^6}\right).
\]
For the Jump-Diffusive Volatility SV Model, given in equations (29)-(32), we can decompose the demeaned IV into two independent parts
\[
IV(t) - \theta = \widetilde{IV}^c(t) + \widetilde{IV}^j(t),
\]
\[
\widetilde{IV}^c(t) = \int_t^{t+1} V_1^c(s)\,ds - E\left(V_1^c(s)\right) \quad\text{and}\quad \widetilde{IV}^j(t) = \int_t^{t+1} V^j(s)\,ds - E\left(V^j(s)\right),
\]
with $V_1^c(t)$ and $V^j(t)$ specified in equations (30) and (31) respectively. Both $\widetilde{IV}^c(t)$ and $\widetilde{IV}^j(t)$ have the autocorrelations of an ARMA(1,1) process and are independent of each other. Also, $\int_t^{t+1}\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde\mu(ds,dx)$ is an i.i.d. sequence connected with $\widetilde{IV}^j(t)$. Therefore, $\left(\widetilde{IV}^c(t), \widetilde{IV}^j(t), \int_t^{t+1}\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde\mu(ds,dx)\right)$ have the same autocorrelation structure as the following process $(y^c_t, y^j_t, y^h_t)'$
\[
\begin{aligned}
y^c_t &= \phi_c y^c_{t-1} + e^c_t + \theta_c e^c_{t-1},\\
y^j_t &= \phi_j y^j_{t-1} + \phi_h y^h_{t-1} + e^j_t + \theta_j e^j_{t-1},\\
y^h_t &= e^h_t,
\end{aligned} \tag{A.1}
\]
where $(e_t) = (e^c_t, e^j_t, e^h_t)$ is a discrete time white noise process, i.e. $E(e_t e_s') = 0$ for $t \neq s$. Next, I derive the parameters of the above multivariate ARMA process as functions of the parameters of the underlying stochastic volatility model (29)-(32).

First, it is easy to see that $\phi_c = e^{-\kappa_1}$ and $\phi_j = e^{\rho_1}$. To determine the moving average coefficient $\theta_c$ and the variance of the error term in the first equation, $\mathrm{Var}(e^c_t)$, I solve the following system of equations
\[
\begin{aligned}
\mathrm{Var}(y^c_t) - \phi_c\,\mathrm{Cov}(y^c_t, y^c_{t-1}) &= \mathrm{Var}(e^c_t)\left(1 + \theta_c\phi_c + \theta_c^2\right),\\
\mathrm{Cov}(y^c_t, y^c_{t-1}) - \phi_c\,\mathrm{Var}(y^c_t) &= \mathrm{Var}(e^c_t)\,\theta_c.
\end{aligned}
\]
First, it is easy to derive
\[
\mathrm{Var}(y^c_t) = \mathrm{Var}(IV^c(t)) = 2\sigma_1^2\,\frac{e^{-\kappa_1} + \kappa_1 - 1}{\kappa_1^2}, \qquad
\mathrm{Cov}(y^c_t, y^c_{t-1}) = \mathrm{Cov}(IV^c(t), IV^c(t-1)) = \sigma_1^2\,\frac{(1 - e^{-\kappa_1})^2}{\kappa_1^2}.
\]
If we set $\rho_c = \dfrac{\mathrm{Cov}(IV^c(t), IV^c(t-1))}{\mathrm{Var}(IV^c(t))}$, then it is easy to verify that
\[
\rho_c > \phi_c \quad\text{and}\quad \frac{1 + \phi_c^2 - 2\phi_c\rho_c}{\rho_c - \phi_c} \geq 2.
\]
Therefore, the invertible solution for the moving average coefficient (i.e. $|\theta_c| < 1$) is given by
\[
\theta_c = \frac{1 + \phi_c^2 - 2\phi_c\rho_c - \sqrt{\left(1 + \phi_c^2 - 2\phi_c\rho_c\right)^2 - 4(\rho_c - \phi_c)^2}}{2(\rho_c - \phi_c)},
\]
and from here the expression for $\mathrm{Var}(e^c_t)$ follows.

Because of the independence of $IV^c$ from the jumps, it is easy to see that we must have $E(e^c_t e^j_t) = E(e^c_t e^h_t) = 0$. For the correlation between the error terms in the last two equations of the system (A.1) we have
\[
\begin{aligned}
E(e^j_t e^h_t) &= \mathrm{Cov}\left(\widetilde{IV}^j(t), \int_t^{t+1}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde\mu(ds,dx)\right)\\
&= \mathrm{Cov}\left(\int_{-\infty}^{t+1}\!\!\int_{\mathbb{R}^n_0} H_1(t,s)k(x)\,\tilde\mu(ds,dx), \int_t^{t+1}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde\mu(ds,dx)\right)\\
&= \int_t^{t+1} H_1(t,s)\,ds\int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx)\\
&= \frac{e^{\rho_1} - 1 - \rho_1}{\rho_1^2}\int_{\mathbb{R}^n_0} h^2(x)k(x)\,G(dx),
\end{aligned} \tag{A.2}
\]
and for $\mathrm{Var}(e^h_t)$
\[
\mathrm{Var}(e^h_t) = \mathrm{Var}\left(\int_t^{t+1}\!\!\int_{\mathbb{R}^n_0} h^2(x)\,\tilde\mu(ds,dx)\right) = \int_{\mathbb{R}^n_0} h^4(x)\,G(dx).
\]
Finally, we need to determine $\phi_h$, $\theta_j$ and $\mathrm{Var}(e^j_t)$.
I solve the following system of equations ³ ´ ³ ´ ³ ´ ³ ´ ¡ ¢ j h Var ytj − φj Cov ytj , yt−1 = φh Cov ytj , yt−1 + Var(ejt ) 1 + θj φj + θj2 + φh θj Cov eht , ejt , ³ ´ ³ ´ ³ ´ j Cov ytj , yt−1 − φj Var ytj = φh Cov eht , ejt + θj Var(ejt ), ³ ´ ³ ´ ³ ´ ³ ´ h Cov ytj , yt−1 − φj Cov eht , ejt = φh Var eht + θj Cov eht , ejt , ³ ´ ³ ´ ³ ´ j h where I use the following expressions for Var ytj , Cov ytj , yt−1 and Cov ytj , yt−1 ,which are easy to derive ³ ´ ³ j ´ 1 − eρ1 + ρ Z 1 j ˜ (t) = Var yt = Var IV k 2 (x)G(dx), 2 ρ31 R0 ³ ´ ´ ³ j ρ1 2 Z j ˜ (t), IV ˜ j (t − 1) = − (1 − e ) k 2 (x)G(dx), Cov ytj , yt−1 = Cov IV n 2ρ31 R à ! µ0 ¶ Z t Z ³ ´ ρ1 2 Z 1 − e j j h 2 ˜ (t), h (x)µ̃(ds, dx) = Cov yt , yt−1 = Cov IV h2 (x)k(x)G(dx). n n ρ 1 t−1 R0 R0 Since (ν1t ) is a martingale difference sequence, it is clear that the linear projection of IV on the past values of TV and JV coincides with the linear projection of TV on the past values of TV and JV. To calculate the last quantity I use the following state-space representation for TV where H= 1 θc 1 θj 0 φh θj φj +θj 0 0 0 0 1 0 , F = φc 1 0 0 0 0 T Vδ (t) = θ + e01 H0 ξt + e01 νt (A.3) ξt+1 = Fξt + vt+1 , (A.4) 0 0 0 0 0 φj 0 1 0 0 0 0 0 0 0 0 0 0 0 0 φh φj φj +θj 0 0 1 0 0 0 0 0 0 37 , vt = ect 0 ejt 0 eht 0 µ ¶ µ ¶ , νt = ν1t , e1 = 1 . ν2t 0 ³ c ´ R R ˜ (t), IV ˜ j (t), t+1 2 h2 (x)µ̃(ds, dx) have the same autocorSince IV (t) = IV c (t) + IV j (t) and IV t R 0 (ytc , ytj , yth ), specified in (A.1), to show that this is a valid state-space relation structure as the process representation for TV it suffices to establish that (y1t , y2t , y3t ) has the same autocorrelation structure as (ytc , ytj , yth ), where y1t := ξ1t + θc ξ2t , y2t := ξ3t + θj ξ4t + θj φh ξ6t , φj + θj y3t := ξ5t . That y1t follows a univariate ARMA (independent from y2t and y3t ) is easy to show. 
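Next, from the equation for the evolution of the state vector we can write
$$\xi_{3t} = \phi_j \xi_{3,t-1} + \frac{\phi_h \phi_j}{\phi_j + \theta_j}\,\xi_{5,t-1} + e^j_t, \qquad \xi_{4t} = \xi_{3,t-1}, \qquad \xi_{5t} = e^h_t, \qquad \xi_{6t} = \xi_{5,t-1}.$$
Using this we can write
$$y_{2t} = (1 + \theta_j L)\,\xi_{3t} + \frac{\theta_j \phi_h}{\phi_j + \theta_j} L e^h_t
= (1 + \theta_j L)\left(\frac{\phi_h \phi_j}{\phi_j + \theta_j}\,\frac{L e^h_t}{1 - \phi_j L} + \frac{e^j_t}{1 - \phi_j L}\right) + \frac{\theta_j \phi_h}{\phi_j + \theta_j} L e^h_t
= \frac{\phi_h L}{1 - \phi_j L}\, e^h_t + \frac{1 + \theta_j L}{1 - \phi_j L}\, e^j_t,$$
where $L$ is the lag operator. Therefore
$$(1 - \phi_j L)\, y_{2t} = (1 + \theta_j L)\, e^j_t + \phi_h L e^h_t,$$
which verifies the claim. From here generating the linear projection $P(IV_a(t) \mid \mathcal{G}_t)$, given the parameter estimates in Table 3, follows easily (see e.g. Hamilton (1994), Chapter 13).

B Equivalent Martingale Measures

I start with a general theorem, which gives the prices of the different risks in the stochastic volatility model (51)-(54) and sufficient conditions under which they correspond to EMMs. The theorem is general in the sense that no assumption is made about the information entering in the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$.

Theorem 3 Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with filtration $\mathbf{F} = (\mathcal{F}_t)_{t\in\mathbb{R}_+}$ on which the stochastic volatility model (51)-(54) is defined. Fix a terminal date $T$. Define a probability measure $\mathbb{Q}$ on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0\le t\le T})$ which has a density with respect to the restriction of $\mathbb{P}$ to $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0\le t\le T})$ given by
$$\left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}_T} = Z(T) = \mathcal{E}\left(\int_0^T \psi_1(s)\,dB_1(s) + \int_0^T \psi_2(s)\,dB_2(s) + \int_0^T\!\!\int_{\mathbb{R}^n_0} (Y(\omega,s,x) - 1)\,\tilde\mu(ds,dx)\right) \qquad \text{(B.1)}$$
where $\psi_1 = (\psi_1(t))$, $\psi_2 = (\psi_2(t))$ are predictable processes and $Y = Y(\omega,t,x)$ is a strictly positive and predictable function such that the following two conditions are satisfied
$$E^{\mathbb{P}}\left(\exp\left(\frac{1}{2}\int_0^T \psi_1^2(s)\,ds + \frac{1}{2}\int_0^T \psi_2^2(s)\,ds + \int_0^T ds \int_{\mathbb{R}^n_0} \big(Y\log(Y) - Y + 1\big)\,G(dx)\right)\right) < \infty, \qquad \text{(B.2)}$$
$$\int_0^T\!\!\int_{\mathbb{R}^n_0} \big|(Y(\omega,t,x) - 1)\,h(x)\big|\,dt\,G(dx) < \infty, \quad \mathbb{P}\text{-a.s.} \qquad \text{(B.3)}$$
Then the measure $\mathbb{Q}$ belongs to the set of equivalent martingale measures iff the following condition holds
$$\alpha(t) = -\frac{1}{2}\sigma^2(t) - \rho\,\sigma(t)\psi_1(t) - \sqrt{1-\rho^2}\,\sigma(t)\psi_2(t) - \int_{\mathbb{R}^n_0} \Big(Y(\omega,t,x)\big(e^{h(x)} - 1\big) - h(x)\Big)\,G(dx), \qquad \text{(B.4)}$$
$d\mathbb{P}\otimes dt$-a.s.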
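If $\mathbb{Q}$ is an EMM, then under $\mathbb{Q}$ the logarithmic futures price $f(t)$ follows
$$df(t) = -\frac{1}{2}\sigma^2(t)\,dt + \int_{\mathbb{R}^n_0} \big(h(x) + 1 - e^{h(x)}\big)\,\nu^{\mathbb{Q}}(\omega,dt,dx) + \sigma(t-)\,dW^{\mathbb{Q}}(t) + \int_{\mathbb{R}^n_0} h(x)\,\tilde\mu^{\mathbb{Q}}(dt,dx), \qquad \text{(B.5)}$$
where
$$B_1^{\mathbb{Q}}(t) := B_1(t) - \int_0^t \psi_1(s)\,ds, \qquad \text{(B.6)}$$
$$B_2^{\mathbb{Q}}(t) := B_2(t) - \int_0^t \psi_2(s)\,ds, \qquad \text{(B.7)}$$
$$\nu^{\mathbb{Q}}(\omega,dt,dx) := Y(\omega,t,x)\,\nu(\omega,dt,dx) = Y(\omega,t,x)\,dt\,G(dx), \qquad \text{(B.8)}$$
$$\tilde\mu^{\mathbb{Q}}(\omega,dt,dx) := \mu(\omega,dt,dx) - \nu^{\mathbb{Q}}(\omega,dt,dx). \qquad \text{(B.9)}$$

Proof. From condition (B.2) it follows that
$$E^{\mathbb{P}}\left(\int_0^T \psi_1^2(s)\,ds\right) < \infty \quad\text{and}\quad E^{\mathbb{P}}\left(\int_0^T \psi_2^2(s)\,ds\right) < \infty,$$
therefore $\int_0^t \psi_1(s)\,dB_1(s)$ and $\int_0^t \psi_2(s)\,dB_2(s)$ are $\mathbb{P}$-local martingales for $t \le T$. Further, using condition (B.2) again we have
$$E^{\mathbb{P}}\left(\int_0^T\!\!\int_{\mathbb{R}^n_0} \big(Y(\omega,t,x)\log(Y(\omega,t,x)) - Y(\omega,t,x) + 1\big)\,G(dx)\,dt\right) < \infty.$$
To continue further I make use of the following inequality
$$y\log(y) - y + 1 \ge \frac{1}{3}\big[\,|y-1|^2 \wedge |y-1|\,\big], \quad\text{for } y \ge 0,$$
and therefore
$$E^{\mathbb{P}}\left(\int_0^T\!\!\int_{\mathbb{R}^n_0} \big(|Y(\omega,t,x) - 1|^2 \wedge |Y(\omega,t,x) - 1|\big)\,G(dx)\,dt\right) < \infty.$$
As a consequence (see Jacod and Shiryaev (2003), Theorem II.1.33), $\int_0^t\!\int_{\mathbb{R}^n_0} (Y(\omega,s,x) - 1)\,\tilde\mu(ds,dx)$ for $t \le T$ is a $\mathbb{P}$-local martingale. Combining everything we have that
$$N(t) = \int_0^t \psi_1(s)\,dB_1(s) + \int_0^t \psi_2(s)\,dB_2(s) + \int_0^t\!\!\int_{\mathbb{R}^n_0} (Y(\omega,s,x) - 1)\,\tilde\mu(ds,dx), \quad\text{for } t \le T,$$
is a $\mathbb{P}$-local martingale starting from 0. Then, using the property of the Doléans-Dade exponential (see Jacod and Shiryaev (2003), Theorem I.4.61), we have that $Z = \mathcal{E}(N)$ is a $\mathbb{P}$-local martingale. Moreover $\Delta N(t) = Y(\omega,t,x) - 1 > -1$. Then, condition (B.2) is sufficient for $Z$ to be a uniformly integrable martingale for $t \le T$ (see Jacod (1979), Theorem 8.45) and we have $E^{\mathbb{P}}(Z(T)) = 1$. Therefore, the measure $\mathbb{Q}$, defined by $\mathbb{Q}(d\omega) = Z(\omega)\,\mathbb{P}(d\omega)$, is a probability measure and moreover we have $\mathbb{Q} \sim \mathbb{P}$.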
Further, for the density process $Z$ in (B.1), application of Girsanov's theorem for local martingales (see Jacod and Shiryaev (2003), Theorem III.3.8) yields that the processes $B_1^{\mathbb{Q}}$ and $B_2^{\mathbb{Q}}$, given in equations (B.6) and (B.7), are independent standard Brownian motions under the measure $\mathbb{Q}$. Also, application of Girsanov's theorem for random measures (see Jacod and Shiryaev (2003), Theorem III.3.17) yields that $\nu^{\mathbb{Q}}(\omega,dt,dx)$ in equation (B.8) is the compensator for the random measure $\mu$ under the probability measure $\mathbb{Q}$. Finally, the condition in (B.3) guarantees that the quadratic covariation process $\left[\int_0^t\!\int_{\mathbb{R}^n_0} h(x)\,\tilde\mu(ds,dx),\, Z(t)\right]$ has locally integrable variation under the measure $\mathbb{P}$, and therefore the following is a well defined local martingale under the probability measure $\mathbb{Q}$
$$\int_0^t\!\!\int_{\mathbb{R}^n_0} h(x)\,\tilde\mu^{\mathbb{Q}}(ds,dx) = \int_0^t\!\!\int_{\mathbb{R}^n_0} h(x)\,\tilde\mu(ds,dx) - \int_0^t\!\!\int_{\mathbb{R}^n_0} h(x)(Y - 1)\,\nu(ds,dx), \quad\text{for } t \le T.$$
Based on that, $f(t)$ satisfies the following SDE under the measure $\mathbb{Q}$
$$df(t) = \Big(\alpha(t) + \rho\,\psi_1(t)\sigma(t) + \sqrt{1-\rho^2}\,\psi_2(t)\sigma(t)\Big)dt + \int_{\mathbb{R}^n_0} (Y(\omega,t,x) - 1)\,h(x)\,\nu(dt,dx) + \sigma(t)\Big(\rho\,dB_1^{\mathbb{Q}}(t) + \sqrt{1-\rho^2}\,dB_2^{\mathbb{Q}}(t)\Big) + \int_{\mathbb{R}^n_0} h(x)\,\tilde\mu^{\mathbb{Q}}(dt,dx). \qquad \text{(B.10)}$$
Using Ito's lemma we have for $F(t)$ under the measure $\mathbb{Q}$
$$\frac{dF(t)}{F(t-)} = \Big(\alpha(t) + \rho\,\psi_1(t)\sigma(t) + \sqrt{1-\rho^2}\,\psi_2(t)\sigma(t) + \frac{1}{2}\sigma^2(t)\Big)dt + \int_{\mathbb{R}^n_0} \Big(Y(\omega,t,x)\big(e^{h(x)} - 1\big) - h(x)\Big)\,\nu(dt,dx) + \sigma(t)\Big(\rho\,dB_1^{\mathbb{Q}}(t) + \sqrt{1-\rho^2}\,dB_2^{\mathbb{Q}}(t)\Big) + \int_{\mathbb{R}^n_0} \big(e^{h(x)} - 1\big)\,\tilde\mu^{\mathbb{Q}}(dt,dx). \qquad \text{(B.11)}$$
Since $F(t)$ must be a local martingale under the measure $\mathbb{Q}$ we need to set the drift term in the SDE above to zero. From here we get the result in (B.4) and hence also (B.5). □

In Theorem 3 I did not specify exactly what information is contained in the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$. If we assume that the filtration is generated only by the Brownian motion $(B_1, B_2)$ and the Poisson random measure $\mu$ (footnote 46), then a representation theorem holds (see Jacod and Shiryaev (2003), chapter III).
This means that in this case all equivalent martingale measures are of the form specified in the above Theorem (this will be formally derived in Theorem 4 below). This is a very strong result. On the other hand, if the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ contains additional information (besides that contained in $(B_1, B_2)$ and $\mu$), then in general we cannot make such a conclusion. In this case, the above theorem gives those changes of measure which price only the risks associated with $(B_1, B_2)$ and $\mu$. Thus, any other risk orthogonal to $(B_1, B_2)$ and $\mu$ will not be priced. This, however, does not restrict the analysis here, since we are interested only in the risks coming from $(B_1, B_2)$ and $\mu$.

Footnote 46: Note that (53) has a strong solution.

The condition in equation (B.2) is a sufficient condition guaranteeing that the density process $Z$ is a uniformly integrable martingale, but in practice it is very hard to verify. In the case of no jumps it reduces to the familiar Novikov condition (for other conditions for uniform integrability of exponential local martingales see Kallsen and Shiryaev (2002) and references therein). Given the generality of the setting in Theorem 3, and mainly the fact that we do not know what kind of information enters in the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$, this condition is hard to relax further.

The condition in equation (B.3) can be removed. However, in this case the dynamics of $f(t)$ (under $\mathbb{Q}$) needs to be changed slightly. Condition (B.3) guarantees that $h(x)$ can be integrated with respect to the compensated jump measure under $\mathbb{Q}$ without the need to truncate the big jumps. In that regard this assumption is not restrictive and it is easy to verify.

In the next Theorem I make a stronger assumption on the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$, which allows much weaker conditions for the equivalence of two measures (as compared with condition (B.2)) that are easier to work with.

Theorem 4 Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with filtration $\mathbf{F} = (\mathcal{F}_t)_{t\in\mathbb{R}_+}$.
Suppose that the filtration $\mathbf{F}$ is generated by a $d$-dimensional standard Brownian motion $W$ and an $n$-dimensional homogeneous Poisson measure $\mu$ with compensator $\nu^{\mathbb{P}}$ (under the measure $\mathbb{P}$). Let $\psi$ be a $d \times 1$ predictable process and $Y(\omega,t,x)$ a strictly positive and predictable function. Denote with $\mathbb{Q}$ a probability measure under which $W(t) - \int_0^t \psi(s)\,ds$ is a standard Brownian motion and the random measure $\mu$ has compensator $\nu^{\mathbb{Q}}(\omega,dt,dx) = Y(\omega,t,x)\,\nu^{\mathbb{P}}(\omega,dt,dx)$ (assuming that such a measure exists!). Assume that $\mathbb{P}_0 \sim \mathbb{Q}_0$ and in addition the following conditions are satisfied
$$\int_0^t \psi'(s)\psi(s)\,ds < \infty \quad d\mathbb{P}\otimes dt\text{-a.s.}, \qquad \text{(B.12)}$$
$$\int_0^t\!\!\int_{\mathbb{R}^n_0} \big(\sqrt{Y} - 1\big)^2\,\nu(ds,dx) < \infty \quad d\mathbb{P}\otimes dt\text{-a.s.}, \qquad \text{(B.13)}$$
$$\int_0^t \psi'(s)\psi(s)\,ds < \infty \quad d\mathbb{Q}\otimes dt\text{-a.s.}, \qquad \text{(B.14)}$$
$$\int_0^t\!\!\int_{\mathbb{R}^n_0} \big(\sqrt{Y} - 1\big)^2\,\nu(ds,dx) < \infty \quad d\mathbb{Q}\otimes dt\text{-a.s.} \qquad \text{(B.15)}$$
Then we have $\mathbb{P} \overset{loc}{\sim} \mathbb{Q}$, that is for every $t > 0$ we have $\mathbb{P}_t \sim \mathbb{Q}_t$.

Proof. Under the measure $\mathbb{P}$ a representation theorem holds and therefore this measure is unique (note we are in the canonical setting because of the assumption for the filtration). Furthermore, since the characteristics of $(W, \mu)$ are constant under $\mathbb{P}$, we have even local uniqueness (footnote 47) for this probability measure (this follows from Theorem III.2.4 in Jacod and Shiryaev (2003)). Define $T = \inf(t : H(t) = \infty)$, where
$$H(t) = \int_0^t \psi'(s)\psi(s)\,ds + \int_0^t\!\!\int_{\mathbb{R}^n_0} \frac{\big(\sqrt{Y} - 1\big)^2}{Y}\,\nu^{\mathbb{Q}}(ds,dx).$$

Footnote 47: Local uniqueness of a probability measure on a filtered probability space is a stronger property than uniqueness. It requires that the measure is unique for the martingale problem associated with the stopped canonical process for every "strict" stopping time; see Jacod and Shiryaev (2003).

Consider the process
$$Z'(t) = \begin{cases} \mathcal{E}\left(-\int_0^t \psi(s)\,dW^{\mathbb{Q}}(s) + \int_0^t\!\int_{\mathbb{R}^n_0} \left(\frac{1}{Y} - 1\right)\tilde\mu^{\mathbb{Q}}(ds,dx)\right) & \text{for } t < T, \\ 0 & \text{for } t \ge T. \end{cases}$$
Taking into account the relationship $\nu^{\mathbb{Q}}(dt,dx) = Y(\omega,t,x)\,\nu(dt,dx)$ and using the conditions (B.12) and (B.13) we have $H(t) < \infty$ $d\mathbb{P}\otimes dt$-a.s.
Combining everything we can apply Theorem III.5.34 in Jacod and Shiryaev (2003) (adapted to the case when the filtration is generated by a $d$-dimensional Brownian motion and an $n$-dimensional homogeneous Poisson measure) to conclude $\mathbb{P} \overset{loc}{\ll} \mathbb{Q}$ with density process $Z'$. Therefore, to prove the (local) equivalence of the two measures we need only show that $\mathbb{Q}(Z'(t) = 0) = 0$. This is an easy consequence of the conditions (B.14) and (B.15). As a result we have $\mathbb{Q} \overset{loc}{\ll} \mathbb{P}$. Using Theorem III.5.19 in Jacod and Shiryaev (2003) (adapted to the case when the filtration is generated by a $d$-dimensional Brownian motion and an $n$-dimensional homogeneous Poisson measure), the density process of $\mathbb{Q}$ with respect to $\mathbb{P}$ is
$$Z(t) = \begin{cases} \mathcal{E}\left(\int_0^t \psi(s)\,dW(s) + \int_0^t\!\int_{\mathbb{R}^n_0} (Y - 1)\,\tilde\mu(ds,dx)\right) & \text{for } t < T, \\ 0 & \text{for } t \ge T. \end{cases}$$
□

Theorem 4 above is very general. In the setting of the stochastic volatility model we price only the Brownian motions and the jumps entering the variance and the price of the futures. There might be other sources of risk (modelled with Brownian motions and Poisson measures), which will not be priced. The conditions (B.12)-(B.15) replace the condition (B.2) in Theorem 3 (for the particular specification of the filtration, of course) which, as argued above, is hard to verify.

B.1 Proof for the Diffusive Risk Price $\psi_1(t)$

The specification D1 is a particular case of the specification D2. Therefore I show only that the price of risk D2 specifies an equivalent change of measure (i.e. that conditions (B.12) and (B.14) are satisfied). The process $V^c$ is a square-root process under both measures. The dynamics of $V^c(t)$ under the measure $\mathbb{Q}$ is given by
$$dV^c(t) = \big(\lambda_0 + \kappa_1\theta_1 + (\lambda_1 - \kappa_1)V^c(t)\big)\,dt + \sigma_{1v}\sqrt{V^c(t)}\,dB_1^{\mathbb{Q}}(t).$$
If $\kappa_1\theta_1 \ge 0$ and $\lambda_0 \ge 0$ the square-root process (under both measures) satisfies the Yamada-Watanabe condition and therefore has a unique non-explosive solution under both measures.
This implies that for the equivalence of the measures $\mathbb{P}$ and $\mathbb{Q}$ we need only verify that the following holds
$$V^c(t) > 0 \quad d\mathbb{P}\otimes dt\text{-a.s. and } d\mathbb{Q}\otimes dt\text{-a.s.}$$
To check this condition we need to analyze the behavior of $V^c$ at the boundary 0. The necessary and sufficient conditions for non-attainment of the boundary under the measures $\mathbb{P}$ and $\mathbb{Q}$ respectively, starting from a strictly positive value, are
$$\sigma_{1v}^2 \le 2\kappa_1\theta_1 \quad\text{and}\quad \sigma_{1v}^2 \le 2\kappa_1\theta_1 + 2\lambda_0$$
(see, for example, Ikeda and Watanabe (1981); these conditions guarantee that the boundary is an entrance boundary under both measures, i.e. starting from a positive value it is never reached in finite time, and if the process starts from zero it always leaves it). Therefore, for those values of the parameters the conditions (B.12) and (B.14) are satisfied and hence we have equivalence of the two measures.

B.2 Proof for the Jump Price of Risk $Y(\omega,t,x)$

For the change of measure in J1, conditions (B.13) and (B.15) coincide and are automatically satisfied. The specification J2 is a particular case of J3. Therefore I prove only that the price of risk J3 specifies an equivalent martingale measure. We have
$$\int_0^t\!\!\int_{\mathbb{R}^n_0} \big(\sqrt{Y} - 1\big)^2\,\nu(ds,dx) = \int_0^t\!\!\int_{\mathbb{R}^n_0} \big(\sqrt{Y} - 1\big)^2\,ds\,G(dx) \le \int_0^t\!\!\int_{\mathbb{R}^n_0} (Y - 1)^2\,ds\,G(dx) \le 2t\int_{\mathbb{R}^n_0} (\vartheta_0(x) - 1)^2\,G(dx) + 2\int_0^t \tau(s)\,ds \int_{\mathbb{R}^n_0} \vartheta_1^2(x)\,G(dx),$$
therefore it is sufficient for conditions (B.13) and (B.15) to hold that the following is true
$$E^{\mathbb{P}}\left(\int_0^t \tau(s)\,ds\right) < \infty \quad\text{and}\quad E^{\mathbb{Q}}\left(\int_0^t \tau(s)\,ds\right) < \infty \quad\text{for all } t > 0.$$
The condition is trivially satisfied under the measure $\mathbb{P}$. I will show that it holds under the measure $\mathbb{Q}$ as well. First note that
$$\int_{\mathbb{R}^n_0} \zeta(x)\vartheta_0(x)\,G(dx) = \int_{\mathbb{R}^n_0} \zeta(x)(\vartheta_0(x) - 1)\,G(dx) + \int_{\mathbb{R}^n_0} \zeta(x)\,G(dx) \le \sqrt{\int_{\mathbb{R}^n_0} \zeta^2(x)\,G(dx) \int_{\mathbb{R}^n_0} (\vartheta_0(x) - 1)^2\,G(dx)} + \int_{\mathbb{R}^n_0} \zeta(x)\,G(dx) < \infty,$$
$$\int_{\mathbb{R}^n_0} \zeta(x)\vartheta_1(x)\,G(dx) \le \sqrt{\int_{\mathbb{R}^n_0} \zeta^2(x)\,G(dx) \int_{\mathbb{R}^n_0} \vartheta_1^2(x)\,G(dx)} < \infty.$$
Similar inequalities hold true when $\zeta(x)$ is replaced with $\zeta^2(x)$. Therefore, the claim follows from the following general result.
Lemma 1 (a) There exists probability measure on the canonical probability space such that the canonical process V is a semimartingale with initial distribution L(V (0)) = η (with positive support) and satisfies the following equation Z tZ ρt V (t) = e V (0) + eρ(t−s) k(x)µ(ds, dx), (B.16) 0 Rn 0 where k : Rn0 → R+ , µ is integer-valued measure on R+ × Rn0 with compensator ν(ds, dx) = ds (m(V (s−))G1 (dx) + G2 (dx)) where m : R+ → R+ is a continuous function, G1 : Rn0 → R+ , G2 : Rn0 → R+ and m(x) ≤ C ∨ x, where C is some constant, Z Z K1 := k(x)G1 (dx) < ∞, K2 := k(x)G2 (dx) < ∞, Rn 0 Rn 0 Z K10 := Z 2 Rn 0 k (x)G1 (dx) < ∞, 43 K20 := Rn 0 k 2 (x)G2 (dx) < ∞. (b) In addition to the conditions in part (a) assume that −ρ > K1 ≥ 0 and m(x) = x. Then V (t) is asymptotically covariance stationary. Proof. Part (a). It is convenient to rewrite equation (B.16) in a differential form (note that the measure µ and the process V are defined jointly) Z dV (t) = ρV (t)dt + k(x)µ(dt, dx). Rn 0 The characteristics of the process V with truncation function h(x) = x (that is without truncation) are given by48 Z t B(t) = (K1 m(V (s−)) + ρV (s−) + K2 ) ds, 0 Z C̃(t) = 0 Z ν(ds, A) = ds t¡ Rn 0 ¢ K10 m(V (s−)) + K20 ds, 1[k(x)∈A)] (m(V (s−))G1 (dx) + G2 (dx)) , A ∈ R0 . I define a sequence of semimartingales (ṼK ) with initial distribution η and the following characteristic triplet (the truncation function is again h(x) = x) Z t³ ³ ´ ³ ´ ´ BK (t) = K1 m(ṼK (s−)) ∧ K + ρ ṼK (s−) ∧ K + K2 ds, 0 Z t³ C̃K (t) = Z νK (ds, A) = ds Rn 0 0 1[k(x)∈A)] K10 ³ ´ ´ 0 m(ṼK (s−)) ∧ K + K2 ds, ³³ ´ ´ m(ṼK (s−)) ∧ K G1 (dx) + G2 (dx) , A ∈ R0 . I will show that such processes exist. First, for each K > 0 the characteristics of the semimartingale ṼK are majorized, i.e. ³ ´ ³ ´ sup |K1 m(ṼK (s−)) ∧ K + ρ ṼK (t−) ∧ K + K2 | < ∞, ω,t ³ ´ sup |K10 m(ṼK (t−)) ∧ K + K20 | < ∞. ω,t ³ ³ ´ ³ ´ ´ ³ ³ ´ ´ Also, K10 m(ṼK (s)) ∧ K + K20 and K1 m(ṼK (s−)) ∧ K + ρ ṼK (s) ∧ K + K2 are continu³³ ´R ´ R ous in ṼK (s). 
This holds true also for m(ṼK (s)) ∧ K Rn g(x)G1 (dx) + Rn g(x)G2 (dx) for all contin0 0 uous and bounded functions g(x) vanishing around zero. Finally, since K1 < ∞ and K2 < ∞, we trivially have à ! Z ³ ´Z lim sup m(ṼK (t−)) ∧ K 1|k(x)|>a G1 (dx) + 1|k(x)|>a G2 (dx) = 0, for ∀t ≥ 0. a↑∞ ω Rn 0 Rn 0 Therefore, the conditions of Theorem IX.2.31 in Jacod and Shiryaev (2003) are satisfied. This implies that there exists probability measure (denoted hereafter with PK ) supporting ṼK (the canonical process) 48 See Jacod and Shiryaev (2003) for a definition of the characteristics of a general semimartingale. C̃(t) stands for the second modified characteristic. 44 as a semimartingale with characteristics (BK , C̃K , νK ) and initial distribution η. I will now show that the sequence of processes (ṼK ) converges weakly (upon taking a subsequence if necessary) to a limiting process and will identify the limit with the process V . To establish weak convergence I prove that the sequence (ṼK ) is tight. For this I use Theorem VI.4.18 in Jacod and Shiryaev (2003). It is sufficient to show that 1. For all T > 0 and ² > 0 lim lim sup PK [νK ([0, T ] × {x : |k(x)| > a}) > ²] = 0. a↑∞ (B.17) K 2. The following sequence of processes is C-tight (i.e. the sequence of processes is tight and all its limit points are continuous processes) FK (t) = Z t³ ³ Z t³ ³ ´ ³ ´ ´ ´ ´ K1 m(ṼK (s−)) ∧ K + |ρ| ṼK (s−) ∧ K + K2 ds+ K10 m(ṼK (s−)) ∧ K + K20 ds. 0 0 First I establish that (FK ) is C-tight. For this I make use of Theorem VI.3.26 in Jacod and Shiryaev (2003). Note that FK (t) is absolutely continuous. Therefore for the C-tightness of FK (t) it suffices to show that the process ṼK (t) satisfies the following boundedness in probability condition · ¸ lim sup PK sup ṼK (s) > a = 0, for ∀t ≥ 0. 
a↑∞ K We have 0≤s≤t Z tZ Z t³ ´ ṼK (s−) ∧ K ds + ṼK (t) = ṼK (0) + ρ therefore Rn 0 0 0 Z t³ Z tZ ´ ṼK (t) ≤ ṼK (0) + |ρ| ṼK (s−) ∧ K ds + 0 k(x)µ(ds, dx), Rn 0 0 k(x)µ(ds, dx), RtR and since 0 Rn k(x)µ(ds, dx) ≥ 0 as k(x) > 0, and ṼK (0) ≥ as η has a positive support, using Gronwall’s 0 inequality (see Revuz and Yor (1994) for example) we have ! à Z Z t ṼK (s) ≤ ṼK (0) + 0 Rn 0 k(x)µ(ds, dx) exp(|ρ|t), for 0 ≤ s ≤ t. Therefore the C-tightness of (FK ) will be established if we can show that ³ ´ EK ṼK (t) < C, where C is a constant that does not depend on K. We have Z t Z t ³ ´ ³ ´ ³ ´ ³ ´ EK ṼK (t) = E ṼK (0) + ρ EK ṼK (s−) ∧ K ds + K1 EK m(ṼK (s−)) ∧ K ds + tK2 , 0 0 and since m(x) ≤ x ∨ C for some constant C, for K > C we have K E ³ Z t Z t ´ ³ ´ ³ ´ K ṼK (t) ≤ ṼK (0) + ρ E ṼK (s−) ∧ K ds + K1 EK ṼK (s−) ∧ K ds + t(K2 + C). 0 If K1 + ρ ≤ 0 we have 0 ³ ´ EK ṼK (t) ≤ ṼK (0) + t(K2 + C), 45 and note that the right hand side of the above inequality does not depend on K. If K1 + ρ > 0 we have Z t ³ ´ ´ ³ K EK ṼK (s) ds + t(K2 + C), E ṼK (t) ≤ ṼK (0) + (K1 + ρ) 0 therefore using Gronwall’s inequality we have ³ ´ ³ ´ EK ṼK (t) ≤ ṼK (0) + T (K2 + C) exp ((K1 + ρ) t) , 0 ≤ t ≤ T, and note again that the right hand side of the above inequality does not depend on K. This proves C-tightness of the sequence of processes (FK ). To establish tightness of the sequence (ṼK ) we need only verify that condition (B.17) holds. We have EK (νK ([0, T ] × {x : |k(x)| > a})) ² ´ R RT K ³ R E m( Ṽ (s)) ∧ K ds Rn 1|k(x)|>a G1 (dx) + Rn 1|k(x)|>a G2 (dx) K 0 PK [νK ([0, T ] × {x : |k(x)| > a}) > ²] ≤ 0 ≤ 0 ² R R C Rn k(x)G1 (dx) + Rn k(x)G2 (dx) 0 ≤ 0 a² , where C is a constant that does not depend on K. For the last inequality I made use of the result derived above that EK (ṼK (t)) is bounded by a constant, which does not depend on K. This proves that the sequence (ṼK ) is tight. Now we are left with identifying the limiting process with the process V . 
For this I use Theorem IX.2.22 in Jacod and Shiryaev (2003). It suffices to establish the following Z t p |ṼK (s) ∧ K − ṼK (s)|ds → 0, for every ∀t > 0, as K ↑ ∞, Z 0 0 t p |m(ṼK (s)) ∧ K − m(ṼK (s))|ds → 0, for every ∀t > 0, as K ↑ ∞. The first result follows since for arbitrary s > 0 and ² > 0 we have ³ ´ ´ EK ṼK (s) ³ ´ ³ PK |ṼK (s) ∧ K − ṼK (s)| > ² ≤ PK ṼK (s) > K ≤ , K and as shown above EK (ṼK (s)) can be bounded by a constant, which does not depend on K. The second result follows analogously since ³ ´ ³ ´ ³ ´ ³ ´ EK m(ṼK (s)) EK ṼK (s) + C PK |m(ṼK (s)) ∧ K − m(ṼK (s))| > ² ≤ PK m(ṼK (s)) > K ≤ ≤ . K K Part (b). We can write Z E (V (t)|Fs ) = e ρ(t−s) V (s) + K1 t Z ρ(t−u) e s E (V (u)|Fs ))du + K2 t eρ(t−u) du, t ≥ s. u If we denote with E (V (t)|Fs ) = x(t) for t ≥ s, then x(t) solves the following differential equation dx(t) = (K1 + ρ)x(t)dt + K2 dt, 46 t ≥ s, which implies Z x(t) = e(K1 +ρ)(t−s) x(s) + K2 t e(K1 +ρ)(t−u) du, t ≥ s. s Therefore, we have Z E (V (t)|F0 ) = e(K1 +ρ)t V (0) + K2 = e (K1 +ρ)t t e(K1 +ρ)(t−u) du 0 ´ K2 ³ (K1 +ρ)t V (0) − 1−e , ρ + K1 and since −ρ − K1 > 0 we have lim E (V (t)|F0 ) = − t→∞ K2 , ρ + K1 that is we have asymptotic stationarity in the mean. For the second moment we have Z t Z t ¡ 2 ¢ 2 0 2ρ(t−u) 0 E V (t)|F0 = (E (V (t)|F0 )) + K1 e E (V (u)|F0 ) du + K2 e2ρ(t−u) du, 0 0 and further µ 2 lim (E (V (t)|F0 )) = t→∞ Z lim t→∞ 0 t K2 ρ + K1 K2 lim ρ + K1 t→∞ 1 K2 . 2ρ ρ + K1 e−2ρ(t−u) E (V (u)|F0 ) du = − = Z lim t→∞ 0 t e2ρ(t−u) du = − Z ¶2 , t ³ ´ e2ρ(t−u) 1 − e(K1 +ρ)u du 0 1 . 2ρ Combining everything we can write ¢ ¡ lim E V 2 (t)|F0 = t→∞ µ K2 ρ + K1 ¶2 + K0 K10 K2 − 2, 2ρ ρ + K1 2ρ and therefore we have asymptotic stationarity in the second moment. 47 ¤ Table 1: Estimation results for the Diffusive Volatility SV model: Z h(x)µ̃(dt, dx), df (t) = α(t)dt + σ(t)dW (t) + Rn 0 σ 2 (t) = V1c (t) + V2c (t), q dVic (t) = κi (θi − Vic (t))dt + σiv Vic (t)dBi (t), Parameter θ i = 1, 2. 
one-factor 0.5273 two-factor 0.6001 κ1 0.3935 1.5945 σ1 0.1283 0.3305 (0.0364) (0.0569) (0.0098) (0.0507) (0.5136) (0.2996) κ2 0.0022 σ2 R 0.2575 (0.0094) (0.2090) h2 (x)G(dx) Rn 0 R 4 Rn h (x)G(dx) 0.0981 0.1136 0.00003 0.0003 91.3540 (6) 0.0000 40.6320 (4) 0.0000 (0.0086) (0.0416) 0 (0.0085) (0.0534) GMM test of overidentifying restrictions χ2 d.o.f p-value q θi Note: In the estimation I set θ = θ1 + θ2 and σi = σiv 2κ for i = 1, 2 and imposed the stationarity i conditions σ1 + σ2 < θ and κi > 0 for i = 1, 2. The data used in the estimation spans the period from January 1 1990 till November 29 2002 for a total of 3256 daily observations on the S&P 500 futures contract. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample. The model is estimated using GMM-type estimator with moment conditions specified in Section 4. The asymptotic variance-covariance matrix, used for calculating the optimal weighting matrix, is computed using Parzen weights with a lag length of 80. Standard errors for the parameter estimates are reported in parentheses. 48 Table 2: Estimation results for the CARMA Jump-Driven SV model: Z df (t) = α(t)dt + σ(t)dW (t) + h(x)µ̃(dt, dx), Z σ 2 (t) = t −∞ g(u) = Rn 0 Z Rn 0 g(t − s)k(x)µ(ds, dx), b0 + ρ1 ρ1 u b0 + ρ2 ρ2 u e + e , ρ1 − ρ2 ρ2 − ρ1 Parameter θ R 2 Rn k (x)G(dx) u ≥ 0. CARMA(1,0) 0.7425 CARMA(2,1) 0.7969 0.1672 2.3934 (0.0525) (0.0222) 0 b0 (0.0511) (0.4741) 0.2313 (0.0455) −ρ1 0.0604 (0.0067) −ρ2 R h2 (x)G(dx) Rn 0 R h4 (x)G(dx) Rn 0 R 2 Rn h (x)k(x)G(dx) 0.0390 (0.0097) 1.5574 (0.2283) 0.1119 0.1491 0.1060 0.2555 0.1331 0.5257 75.2200 (5) 0.0000 0.9879 (3) 0.8042 (0.0160) (0.1293) (0.0975) 0 (0.0159) (0.1296) (0.1397) GMM test of overidentifying restrictions χ2 d.o.f p-value R Note: In the estimation I set θ = ρ1b0ρ2 Rn k(x)G(dx) and imposed the stationarity conditions b0 > 0 − max{ρ1 , ρ2 } and ρi < 0 for i = 1, 2. 
The data used in the estimation spans the period from January 1 1990 till November 29 2002 for a total of 3256 daily observations on the S&P 500 futures contract. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample. The model is estimated using a GMM-type estimator with moment conditions specified in Section 4. The asymptotic variance-covariance matrix, used for calculating the optimal weighting matrix, is computed using Parzen weights with a lag length of 80. Standard errors for the parameter estimates are reported in parentheses.

Table 3: Estimation results for the Jump-Diffusive Volatility SV model:
$$df(t) = \alpha(t)\,dt + \sigma(t)\,dW(t) + \int_{\mathbb{R}^n_0} h(x)\,\tilde\mu(dt,dx),$$
$$\sigma^2(t) = V_1^c(t) + V^j(t),$$
$$dV_1^c(t) = \kappa_1(\theta_1 - V_1^c(t))\,dt + \sigma_{1v}\sqrt{V_1^c(t)}\,dB_1(t),$$
$$V^j(t) = \int_{-\infty}^t\!\int_{\mathbb{R}^n_0} e^{\rho_1(t-s)}\,k(x)\,\mu(ds,dx).$$

| Parameter | Estimate | Standard Error |
|---|---|---|
| $\theta$ | 0.8153 | 0.0511 |
| $\kappa_1$ | 0.0390 | 0.0098 |
| $\sigma_1$ | 0.8126 | 0.0711 |
| $-\rho_1$ | 1.5399 | 0.2245 |
| $\int_{\mathbb{R}^n_0} k^2(x)G(dx)$ | 2.3466 | 0.4692 |
| $\int_{\mathbb{R}^n_0} h^2(x)G(dx)$ | 0.1527 | 0.0159 |
| $\int_{\mathbb{R}^n_0} h^4(x)G(dx)$ | 0.2648 | 0.1293 |
| $\int_{\mathbb{R}^n_0} h^2(x)k(x)G(dx)$ | 0.5768 | 0.1483 |

GMM test of overidentifying restrictions

| $\chi^2$ | d.o.f. | p-value |
|---|---|---|
| 1.0770 | (3) | 0.7826 |

Note: In the estimation I set $\theta = \theta_1 - \frac{1}{\rho_1}\int_{\mathbb{R}^n_0} k(x)G(dx)$ and $\sigma_1 = \sigma_{1v}\sqrt{\frac{\theta_1}{2\kappa_1}}$ and imposed the stationarity conditions $\sigma_1 < \theta$, $\kappa_1 > 0$ and $\rho_1 < 0$. The data used in the estimation spans the period from January 1 1990 till November 29 2002 for a total of 3256 daily observations on the S&P 500 futures contract. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample. The model is estimated using a GMM-type estimator with moment conditions specified in Section 4. The asymptotic variance-covariance matrix, used for calculating the optimal weighting matrix, is computed using Parzen weights with a lag length of 80.
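The table notes state that the asymptotic variance-covariance matrix is computed with Parzen weights and a lag length of 80. As a rough illustration only, the sketch below shows a scalar Parzen-kernel long-run variance estimate. The function names `parzen_weight` and `parzen_long_run_variance` are hypothetical, the bandwidth convention `j / (lag_length + 1)` is an assumption, and the paper's actual estimator operates on the full vector of moment conditions rather than on a single series.

```python
def parzen_weight(x):
    """Parzen kernel evaluated at x (standard textbook definition)."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def parzen_long_run_variance(series, lag_length):
    """Kernel-weighted long-run variance of a scalar series.

    Assumption: lag j receives weight parzen_weight(j / (lag_length + 1)).
    Other bandwidth conventions exist; this one is illustrative.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]

    def autocov(j):
        # Sample autocovariance at lag j, normalized by n.
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    lrv = autocov(0)
    for j in range(1, lag_length + 1):
        lrv += 2.0 * parzen_weight(j / (lag_length + 1)) * autocov(j)
    return lrv
```

With a lag length of 80, as in the notes, the call would be `parzen_long_run_variance(moment_series, 80)`, where `moment_series` stands for one demeaned moment condition evaluated at the parameter estimates (a hypothetical input).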
Table 4: Moment Condition Tests for the Jump-Diffusive Volatility SV model

| Moment condition | t-statistic |
|---|---|
| autocorrelation in IV for lag 1 | −0.2760 |
| autocorrelation in IV for lag 3 | 0.2252 |
| autocorrelation in IV for lag 6 | 0.0875 |
| aver. autocorrelation in IV for lags 10−20 | 0.5844 |
| aver. autocorrelation in IV for lags 20−30 | 0.6461 |
| aver. autocorrelation in IV for lags 30−40 | 0.2418 |
| $E(IV(t))$ | −0.3190 |
| $E(IV^2(t))$ | −0.0965 |
| $E(QV(t) - IV(t))$ | 0.1062 |
| $E(QV^2(t))$ | 0.2371 |
| $E(FV_\delta^2(t))$ | 0.7163 |

Note: The table reports the diagnostic t-statistics for each of the moment conditions underlying the estimation results for the Jump-Diffusive SV Model reported in Table 3.

Table 5: Wald tests for zero covariances between RP and lags of TV and RP and lags of JV

| Lags | RP and lags of TV: Wald test | RP and lags of TV: P-value | RP and lags of JV: Wald test | RP and lags of JV: P-value |
|---|---|---|---|---|
| 1 | 16.9828 | 0.0000 | 9.1121 | 0.0025 |
| 5 | 19.2828 | 0.0017 | 21.1619 | 0.0008 |
| 10 | 28.1547 | 0.0017 | 27.0614 | 0.0025 |
| 15 | 51.4367 | 0.0000 | 33.5223 | 0.0040 |
| 20 | 73.3441 | 0.0000 | 49.3173 | 0.0003 |
| 25 | 83.2144 | 0.0000 | 56.6326 | 0.0003 |
| 30 | 105.4132 | 0.0000 | 73.8088 | 0.0000 |

Note: The Wald statistic tests the null hypothesis of zero covariances between RP and lags of TV (respectively JV) up to the corresponding lag. The data used in the estimation spans the period from January 1 1990 till November 29 2002 for a total of 3256 daily observations on the S&P 500 futures contract and daily closing prices of the VIX index. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample. The RP measure was constructed using the Jump-Diffusive SV model with parameter values set to the estimates reported in Table 3. For the calculation of the asymptotic variance of the covariances, the error from the estimation of the parameters of the Jump-Diffusive SV model was taken into account.
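The p-values in Table 5 can be cross-checked from the Wald statistics under the assumption, not stated explicitly in the note, that each statistic is asymptotically chi-square with degrees of freedom equal to the number of lags tested. The sketch below uses only the standard library; `chi2_sf` is a hypothetical helper name, and it relies on the closed-form chi-square tail for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


# Wald statistics taken from Table 5; df assumed equal to the lag count.
for wald, lags in [(9.1121, 1), (19.2828, 5), (28.1547, 10)]:
    print(lags, round(chi2_sf(wald, lags), 4))
```

Under this assumption the three statistics shown give tail probabilities that agree with the tabulated p-values to four decimals, which supports reading the lag count as the degrees of freedom.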
Table 6: Estimation results for the Time-Variation in the Variance Risk Premium:
$$VR_a(t) = \alpha_0 + \alpha_1\phi(t) + \alpha_2\tau(t),$$
$$df(t) = \alpha(t)\,dt + \sigma(t)\,dW(t) + \int_{\mathbb{R}^n_0} h(x)\,\tilde\mu(dt,dx),$$
$$\sigma^2(t) = V^c(t) + V^j(t),$$
$$dV^c(t) = \kappa_1(\theta_1 - V^c(t))\,dt + \sigma_{1v}\sqrt{V^c(t)}\,dB_1(t),$$
$$V^j(t) = \int_{-\infty}^t\!\int_{\mathbb{R}^n_0} e^{\rho_1(t-s)}\,k(x)\,\mu(ds,dx).$$

| Parameter | Estimate | Standard Error |
|---|---|---|
| $\theta$ | 0.7955 | 0.0807 |
| $\kappa_1$ | 0.0312 | 0.0107 |
| $\sigma_1$ | 0.7517 | 0.1193 |
| $-\rho_1$ | 1.6203 | 0.2722 |
| $\int_{\mathbb{R}^n_0} k^2(x)G(dx)$ | 2.2623 | 0.6566 |
| $\int_{\mathbb{R}^n_0} h^2(x)G(dx)$ | 0.1522 | 0.0169 |
| $\int_{\mathbb{R}^n_0} h^4(x)G(dx)$ | 0.2355 | 0.1243 |
| $\int_{\mathbb{R}^n_0} h^2(x)k(x)G(dx)$ | 0.5419 | 0.1537 |
| $K_0$ | 0.6948 | 0.0533 |
| $K_\phi^c$ | −1.0797 | 1.3404 |
| $K_\tau^c$ | 1.4288 | 1.3151 |
| $K_\tau^j$ | 0.1182 | 0.0252 |
| $-\rho_\tau$ | 0.0189 | 0.0092 |

GMM test of overidentifying restrictions

| $\chi^2$ | d.o.f. | p-value |
|---|---|---|
| 5.1952 | (6) | 0.5190 |

Note: In the estimation I set $\theta = \theta_1 - \frac{1}{\rho_1}\int_{\mathbb{R}^n_0} k(x)G(dx)$ and $\sigma_1 = \sigma_{1v}\sqrt{\frac{\theta_1}{2\kappa_1}}$ and imposed the stationarity conditions $\sigma_1 < \theta$, $\kappa_1 > 0$ and $\rho_1 < 0$. The data used in the estimation spans the period from January 1 1990 till November 29 2002 for a total of 3256 daily observations on the S&P 500 futures contract and daily closing prices of the VIX index. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample. The model is estimated using a GMM-type estimator with moment conditions specified in Section 6. The asymptotic variance-covariance matrix, used for calculating the optimal weighting matrix, is computed using Parzen weights with a lag length of 80.

[Figure 1 panels, top to bottom: Jumps, α = 0.0; Jumps, α = 1.0; Jumps, α = 1.5; Brownian motion. Horizontal axis: time on [0, 1].]

Figure 1: Simulated trajectories of tempered stable jump processes and Brownian motion on the interval [0, 1]. The parameters of the processes were chosen such that in all cases these processes have variance of 1 on the unit interval.
[Figure 2 panels, top to bottom: Daily Returns; TV; JV. Horizontal axis: years 1990 to 2002.]

Figure 2: S&P 500 daily measures. The top panel shows daily returns; the middle panel shows the daily TV and the bottom panel shows the daily difference between RV and TV. The sample period is from January 2 1990 till November 29 2002 and includes 3256 daily high-frequency observations on the S&P 500 futures contract. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample using the formulas in equations (18) and (19).

[Figure 3 panels, top to bottom: corr(TV_t, TV_{t−i}); corr(JV_t, JV_{t−i}). Horizontal axis: lags in days, 0 to 100.]

Figure 3: S&P 500 sample correlations. The top panel shows the sample autocorrelation in TV and the second panel shows the autocorrelation in JV = RV − TV. The sample period used in the calculations is from January 2 1990 till November 29 2002 and includes 3256 daily high-frequency observations on the S&P 500 futures contract. The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample using the formulas in equations (18) and (19).

[Figure 4: corr(TV_t, TV_{t−i}) plotted against lags in days, 0 to 100.]

Figure 4: The figure shows the empirical and the fitted autocorrelation for TV. The empirical autocorrelation of the TV is marked with +. The dashed lines are the 95% confidence interval for the autocorrelation with GMM-type standard errors. The solid line is the autocorrelation implied from the Jump-Diffusive Volatility SV model given in (29)-(32). The parameters were set at the estimated values reported in Table 3.
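For readers who want to reproduce the empirical curves in Figures 3 and 4 from their own daily TV or JV series, a minimal sketch of a sample autocorrelation function follows. The estimator shown (demeaned cross-products divided by the lag-zero sum of squares) is a common convention and is assumed here; the paper does not spell out which normalization it uses, and `sample_autocorrelation` is a hypothetical name.

```python
def sample_autocorrelation(series, max_lag):
    """Return [r_0, r_1, ..., r_max_lag], where r_i estimates
    corr(x_t, x_{t-i}) from a single observed series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # lag-zero sum of squares
    return [
        sum(dev[t] * dev[t - i] for t in range(i, n)) / denom
        for i in range(max_lag + 1)
    ]
```

For the figures above the call would look like `sample_autocorrelation(tv_series, 100)`, with `tv_series` a list of the 3256 daily TV values (a hypothetical variable; the data are not included here).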
[Figure 5 panels, top to bottom: RP_t; TV_t; JV_t. Horizontal axis: year, 1990 to 2002.]

Figure 5: Estimated “Variance Risk Premium”. The top panel shows the RP measure calculated using equation (41). The middle panel shows the daily TV and the bottom panel shows the daily difference between RV and TV. The sample period is from January 2 1990 till November 29 2002 and includes 3256 daily observations on the VIX index as well as 3256 daily high-frequency observations on the S&P 500 futures contract. The variance swap rate was calculated using equation (50). The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample using the formulas in equations (18) and (19).

[Figure 6 panels, top to bottom: cov(RP_t, TV_{t−i}); cov(RP_t, JV_{t−i}). Horizontal axis: lags in days, 0 to 50.]

Figure 6: “Variance Risk Premium” sample covariances. The top panel shows the covariance between RP and past values of TV; the bottom panel shows the covariance between RP and past values of JV. On both panels the solid lines are the estimated covariances, while the dotted lines are the corresponding 95% confidence bounds. The RP measure was calculated using equation (41). The sample period is from January 2 1990 till November 29 2002 and includes 3256 daily observations on the VIX index as well as 3256 daily high-frequency observations on the S&P 500 futures contract. The variance swap rate was calculated using equation (50). The daily realized multipower variation statistics were computed using 80 intraday five-minute returns over each of the days in the sample using the formulas in equations (18) and (19).

References

Ait-Sahalia, Y. (2004). Disentangling Diffusion from Jumps. Journal of Financial Economics 74, 487–528.
Ait-Sahalia, Y. and J. Jacod (2005).
Volatility Estimators for Discretely Sampled Lévy Processes. Working paper, Princeton University and Université de Paris-6.
Ait-Sahalia, Y., Y. Wang, and F. Yared (2001). Do Option Markets Correctly Price the Probabilities of Movement of the Underlying Asset? Journal of Econometrics 102, 67–110.
Alizadeh, S., M. W. Brandt, and F. Diebold (2002). Range-Based Estimation of Stochastic Volatility Models. Journal of Finance 57, 1047–1091.
Andersen, T., L. Benzoni, and J. Lund (2002). An Empirical Investigation of Continuous-Time Equity Return Models. Journal of Finance 57, 1239–1284.
Andersen, T., T. Bollerslev, and F. Diebold (2005a). Parametric and Nonparametric Measurement of Volatility. In Y. Ait-Sahalia and L. Hansen (Eds.), Handbook of Financial Econometrics. North-Holland.
Andersen, T., T. Bollerslev, and F. Diebold (2005b). Some Like it Smooth, and Some Like it Rough: Disentangling Continuous and Jump Components in Measuring, Modeling and Forecasting Asset Return Volatility. Working paper, Duke University.
Andersen, T., T. Bollerslev, and N. Meddahi (2005). Correcting the Errors: Volatility Forecast Evaluation Using High-Frequency Data and Realized Volatilities. Econometrica 73, 279–296.
Andrews, D. (1999). Consistent Moment Selection Procedures for Generalized Method of Moments Estimation. Econometrica 67, 543–564.
Andrews, D. (2001). Testing when a Parameter is on the Boundary of the Maintained Hypothesis. Econometrica 69, 683–734.
Andrews, D. (2002). Generalized Method of Moments when the Parameter is on the Boundary. Journal of Business and Economic Statistics 20, 530–544.
Andrews, D. and B. Lu (2001). Consistent Model and Moment Selection Procedures for GMM Estimation with Application to Dynamic Panel Data Models. Journal of Econometrics 101, 123–164.
Bakshi, G. and N. Kapadia (2003). Delta-Hedged Gains and the Negative Market Volatility Risk Premium. Review of Financial Studies 16, 527–566.
Bakshi, G. and D. Madan (2000).
Spanning and Derivative-Security Valuation. Journal of Financial Economics 55, 205–238.
Bakshi, G. and D. Madan (2006). A Theory of Volatility Spreads. Working paper, University of Maryland.
Barndorff-Nielsen, O. E., S. Graversen, J. Jacod, M. Podolskij, and N. Shephard (2005). A Central Limit Theorem for Realised Power and Bipower Variations of Continuous Semimartingales. In Y. Kabanov and R. Liptser (Eds.), From Stochastic Analysis to Mathematical Finance, Festschrift for Albert Shiryaev. Springer.
Barndorff-Nielsen, O. E. and N. Shephard (2001). Non-Gaussian Ornstein-Uhlenbeck-based Models and Some of Their Applications in Financial Economics. Journal of the Royal Statistical Society: Series B 63, 167–241.
Barndorff-Nielsen, O. E. and N. Shephard (2004). Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics 2, 1–37.
Barndorff-Nielsen, O. E. and N. Shephard (2006). Econometrics of Testing for Jumps in Financial Economics using Bipower Variation. Journal of Financial Econometrics 4, 1–30.
Barndorff-Nielsen, O. E., N. Shephard, and M. Winkel (2006). Limit Theorems for Multipower Variation in the Presence of Jumps in Financial Econometrics. Stochastic Processes and Their Applications 116, 796–806.
Bates, D. (2000). Post-’87 Crash Fears in S&P 500 Future Options. Journal of Econometrics 94, 181–238.
Bates, D. (2006). The Market for Crash Risk. Working paper, University of Iowa.
Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
Black, F. and M. Scholes (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy 81, 637–654.
Blumenthal, R. and R. Getoor (1961). Sample Functions of Stochastic Processes with Independent Increments. Journal of Math. Mech. 10, 493–516.
Bollerslev, T., R. Engle, and D. Nelson (1994). ARCH Models. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume 4. Amsterdam: North-Holland.
Bollerslev, T., M. Gibson, and H. Zhou (2005).
Dynamic Estimation of Volatility Risk Premia and Investor Risk Aversion from Option-Implied and Realized Volatilities. Working paper, Duke University.
Bollerslev, T. and H. Zhou (2002). Estimating Stochastic Volatility Diffusion using Conditional Moments of Integrated Volatility. Journal of Econometrics 109, 33–65.
Britten-Jones, M. and A. Neuberger (2000). Option Prices, Implied Price Processes, and Stochastic Volatility. Journal of Finance 55, 839–866.
Broadie, M., M. Chernov, and M. Johannes (2006). Specification and Risk Premiums: The Information in S&P 500 Futures Options. Journal of Finance, forthcoming.
Brockwell, P. (2001a). Lévy-Driven CARMA Processes. Ann. Inst. Statist. Math. 53, 113–124.
Brockwell, P. (2001b). Continuous-Time ARMA Processes. In D. Shanbhag and C. Rao (Eds.), Handbook of Statistics, Volume 19. North-Holland.
Brockwell, P. and T. Marquardt (2005). Lévy-Driven and Fractionally Integrated ARMA Processes with Continuous Time Parameter. Statistica Sinica 15, 477–494.
Campbell, J. and J. Cochrane (1999). By Force of Habit: A Consumption Based Explanation of Aggregate Stock Market Behavior. Journal of Political Economy 107, 205–251.
Carr, P., H. Geman, D. Madan, and M. Yor (2003). Stochastic Volatility for Lévy Processes. Mathematical Finance 13, 345–382.
Carr, P. and D. Madan (2001). Optimal Positioning in Derivative Securities. Quantitative Finance 1, 19–37.
Carr, P. and L. Wu (2004). Variance Risk Premia. Working paper, Bloomberg and Baruch College.
Cheridito, P., D. Filipović, and R. Kimmel (2005). Market Price of Risk Specifications for Affine Models: Theory and Evidence. Journal of Financial Economics, forthcoming.
Cheridito, P., D. Filipović, and M. Yor (2005). Equivalent and Absolutely Continuous Measure Changes for Jump-Diffusion Processes. The Annals of Applied Probability 15, 1713–1732.
Chernov, M., R. Gallant, E. Ghysels, and G. Tauchen (2003). Alternative Models for Stock Price Dynamics. Journal of Econometrics 116, 225–257.
Chernov, M. and E. Ghysels (2000). A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk-Neutral Measures for the Purpose of Options Valuation. Journal of Financial Economics 56, 407–458.
Chernozhukov, V. and H. Hong (2003). An MCMC Approach to Classical Estimation. Journal of Econometrics 115, 293–346.
Cont, R. and P. Tankov (2004). Financial Modelling With Jump Processes. London: Chapman & Hall.
Cont, R., P. Tankov, and E. Voltchkova (2005). Hedging with Options in Presence of Jumps. Stochastic analysis and applications: Abel Symposium 2005 in honor of Kiyosi Ito’s 90th birthday.
Delbaen, F. and W. Schachermayer (1994). A General Version of the Fundamental Theorem of Asset Pricing. Mathematische Annalen 300, 520–563.
Delbaen, F. and W. Schachermayer (1998). The Fundamental Theorem of Asset Pricing for Unbounded Stochastic Processes. Mathematische Annalen 312, 215–250.
Demeterfi, K., E. Derman, M. Kamal, and J. Zou (1999). A Guide to Volatility and Variance Swaps. Journal of Derivatives 6, 9–32.
Duffie, D. (2001). Dynamic Asset Pricing Theory. Princeton: Princeton University Press.
Duffie, D., D. Filipović, and W. Schachermayer (2003). Affine Processes and Applications in Finance. Annals of Applied Probability 13(3), 984–1053.
Duffie, D., J. Pan, and K. Singleton (2000). Transform Analysis and Asset Pricing for Affine Jump-Diffusions. Econometrica 68, 1343–1376.
Eraker, B. (2004). Do Stock Prices and Volatility Jump? Reconciling Evidence from Spot and Option Prices. Journal of Finance 59, 1367–1403.
Eraker, B., M. Johannes, and N. Polson (2003). The Impact of Jumps in Volatility and Returns. Journal of Finance 58, 1269–1300.
Feller, W. (1951). Two Singular Diffusion Problems. Annals of Mathematics 54, 173–182.
Hamilton, J. (1994). Time Series Analysis. New Jersey: Princeton University Press.
Harrison, J. and D. Kreps (1979). Martingales and Arbitrage in Multiperiod Security Markets. Journal of Economic Theory 20, 381–408.
Harrison, J.
and S. Pliska (1981). Martingales and Stochastic Integrals in the Theory of Continuous Trading. Stochastic Processes and their Applications 11, 215–260.
Hong, H., B. Preston, and M. Shum (2003). Generalized Empirical Likelihood-Based Model Selection Criteria for Moment Condition Models. Econometric Theory 19, 923–943.
Huang, X. and G. Tauchen (2005). The Relative Contributions of Jumps to Total Variance. Journal of Financial Econometrics 3, 456–499.
Ikeda, N. and S. Watanabe (1981). Stochastic Differential Equations and Diffusion Processes. Tokyo: North-Holland.
Jacod, J. (1979). Calcul Stochastique et Problèmes de Martingales. Lecture Notes in Mathematics 714. Berlin Heidelberg New York: Springer-Verlag.
Jacod, J. (2006a). Asymptotic Properties of Power Variations and Associated Functionals of Semimartingales. Working paper, Université de Paris-6.
Jacod, J. (2006b). Asymptotic Properties of Power Variations of Lévy Processes. Working paper, Université de Paris-6.
Jacod, J. and A. N. Shiryaev (2003). Limit Theorems For Stochastic Processes (2nd ed.). Berlin: Springer-Verlag.
Jiang, G. and R. Oomen (2006). Estimating Latent Variables and Jump Diffusion Models using High Frequency Data. Working paper, University of Arizona and University of Warwick.
Kallsen, J. and A. Shiryaev (2002). The Cumulant Process and Esscher’s Change of Measure. Finance and Stochastics 6, 397–428.
Klüppelberg, C., A. Lindner, and R. Maller (2004). A Continuous Time GARCH Process Driven by a Lévy Process: Stationarity and Second Order Behavior. Journal of Applied Probability 41, 601–622.
Liu, J., J. Pan, and T. Wang (2005). An Equilibrium Model of Rare-Event Premia and Its Implications for Option Smirks. Review of Financial Studies 18, 131–164.
Newey, W. and D. McFadden (1994). Large Sample Estimation and Hypothesis Testing. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume 4, pp. 2113–2241. Amsterdam: North-Holland.
Nicolato, E. and E. Venardos (2003).
Option Pricing in Stochastic Volatility Models of the Ornstein-Uhlenbeck Type. Mathematical Finance 13, 445–466.
Pan, J. (2002). The Jump-Risk Premia Implicit in Options: Evidence from an Integrated Time-Series Study. Journal of Financial Economics 63, 3–50.
Pozdnyakov, V. and J. Steele (2004). On the Martingale Framework for Futures Prices. Stochastic Processes and their Applications 109, 69–77.
Rajput, B. and J. Rosiński (1989). Spectral Representation of Infinitely Divisible Processes. Probability Theory and Related Fields 82, 451–487.
Revuz, D. and M. Yor (1994). Continuous Martingales and Brownian Motion (2nd ed.). New York: Springer-Verlag.
Rosenberg, J. and R. Engle (2002). Empirical Pricing Kernels. Journal of Financial Economics 64, 341–372.
Santa-Clara, P. and Y. Shu (2005). Crashes, Volatility, and the Equity Premium: Lessons from S&P 500 Options. Working paper, UCLA.
Tankov, P. (2003). Dependence Structure of Lévy Processes with Applications in Risk Management. Rapport Interne 502, CMAP, Ecole Polytechnique.
Tauchen, G. (1985). Diagnostic Testing and Evaluation of Maximum Likelihood Models. Journal of Econometrics 30, 415–443.
Todorov, V. (2006a). Econometric Analysis of Jump-Driven Stochastic Volatility Models. Working paper, Duke University.
Todorov, V. (2006b). Estimation of Continuous-time Stochastic Volatility Models with Jumps. Working paper, Duke University.
Todorov, V. and G. Tauchen (2006). Simulation Methods for Lévy-Driven CARMA Stochastic Volatility Models. Journal of Business and Economic Statistics 24, 455–469.
Woerner, J. (2006). Power and Multipower Variation: Inference for High Frequency Data. In A. Shiryaev, M. do Rosario Grossinho, P. Oliveira, and M. Esquivel (Eds.), Proceedings of the International Conference on Stochastic Finance 2004. Berlin: Springer Verlag.
Wooldridge, J. (1994). Estimation and Inference for Dependent Processes. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume 4, pp. 2639–2738.
Amsterdam: North-Holland.
Wu, L. (2005). Variance Dynamics: Joint Evidence from Options and High-Frequency Returns. Working paper, Baruch College.