Vrije Universiteit Amsterdam
School of Business and Economics

Financial Econometrics Case Study
Modeling and Forecasting Volatility

Supervisors: Lennart Hoogerheide, Siem Jan Koopman

Authors:
Jorik van der Oord - 2646814
Jan Daenen - 2653703
Kwame Bonsu - 2576417
Stijn Donckers - 2680580
Bas Boekhout - 2655526

Abstract

This paper examines how 16 different volatility models, applied to high frequency data on stock prices of BP (British Petroleum), perform in approximating the volatility of returns. The forecasts of the models are compared to each other using the Diebold-Mariano test based on the forecast mean squared error (FMSE), the forecast mean absolute error (FMAE) and the likelihoods. The volatility models are tested on two test sets, set A and set B, where set A starts on January 1st 2007 and ends on January 31st 2012 and set B starts on January 1st 2007 and ends on April 19th 2010. Set B is created to investigate how the different volatility models forecast volatility around the Deepwater Horizon oil disaster that occurred on April 20th 2010. Results show that in set A, the Realized GARCH-T and Realized GARCH-GED perform well in approximating the volatility of returns. For set B, the Realized GARCH models perform best when using the FMAE. When using the FMSE, the GAS-GED outperforms all other models, whereas there is no clear 'best' model when using the Diebold-Mariano test based on the likelihood.

January 31, 2020

Contents

1 Introduction
2 Data Cleaning
3 Descriptive Statistics
  3.1 Full dataset
  3.2 The year 2010
4 Realized Measures
  4.1 Different Measures
  4.2 Empirical Comparison
5 Models
  5.1 GARCH
  5.2 Robust GARCH
  5.3 Robust GARCH with leverage
  5.4 GJR-GARCH
  5.5 NAGARCH
  5.6 GAS
  5.7 Realized GARCH Log-Linear
  5.8 Residual error testing
6 Forecasting
  6.1 Forecasting methodology
  6.2 Forecasting accuracy
    6.2.1 Forecast Mean Squared Error, Forecast Mean Absolute Error
    6.2.2 Diebold-Mariano Test
7 Results
  7.1 Forecasting results Set A
  7.2 Forecasting results Set B
  7.3 Visualisation of result of Set A and B
8 Conclusion
Appendices
A Data Cleaning
B Descriptive Statistics
  B.1 Returns
  B.2 2009-2010
C Kernel Models
D Realized Garch
E GAS
  E.1 Introduction
  E.2 General framework GAS student t
F Error distributions
  F.1 Gaussian
  F.2 Student-t
  F.3 Generalized error distribution
  F.4 Derivative
    F.4.1 Student-t
    F.4.2 GED
G Results
  G.1 Set A
    G.1.1 Estimates
    G.1.2 DM-test results
    G.1.3 Error tests
  G.2 Set B
    G.2.1 Estimates
  G.3 DM-test results
    G.3.1 Error tests
  G.4 Plots
    G.4.1 Set A
    G.4.2 Set B
1 Introduction

In this paper, several volatility models are studied and applied to high frequency data on stock prices of British Petroleum, or BP, traded on the NASDAQ exchange within the period 2007-2014. BP is a British oil and gas company that delivers energy products and services to customers worldwide. It is one of the six 'Supermajor' oil companies: large multinationals that are independent, hence not state-owned. This makes BP an international player of importance. Although the oil and gas market could be considered an oligopoly, the influence of individual companies on prices is very limited.

It is useful to study volatility in the oil and gas market, as a worldwide transition towards new ways of energy supply is currently under way. For investors, knowledge about the behaviour of prices and returns is important during these more uncertain times.

BP specifically is an interesting company, since it was the central party in the infamous Deepwater Horizon accident in the Gulf of Mexico in 2010. This led to an enormous oil spill, killing hundreds of animals in the years after and destroying biodiversity in the whole region. It is known as the largest marine oil spill in history, as Pallardy (2016) states among others. According to Lee et al. (2018), the total costs for BP are estimated to be at least $144.89 billion, while BP claims they are only $62.59 billion. Either way, the costs were enormous, which likely led to significant reactions in the market. Hence, this event is incorporated in the volatility analysis performed in this paper.

For this reason, two different test and validation sets are used in this paper. The first test and validation set, constructed from the full dataset, run from 01-01-2007 to 31-12-2011 and from 01-01-2012 to 31-12-2014 respectively.
The second test and validation set, used for forecasting around the oil disaster in 2010, run from 01-01-2007 to 18-04-2010 and from 19-04-2010 to 15-04-2011 respectively.

The structure of the paper is as follows. Section 2 explains the data-cleaning process. In Section 3 a general statistical analysis of the obtained data is performed. Section 4 compares the realized measures applied to the data. A range of existing volatility models is applied to the data and evaluated in Section 5. In Section 6 the forecasting methodology used to assess the predictive power of the models is explained. This is followed by the obtained results in Section 7 and a conclusion in Section 8.

2 Data Cleaning

Before the econometric analysis could be conducted, the data from Wharton Research Data Services were 'cleaned'. This is important, as one wants to make optimal use of the dataset to generate the best possible volatility estimators. Hence, as much noise as possible should be removed. Data cleaning is delicate: a small mistake can make the data unrepresentative. In this paper, the cleaning method of Barndorff-Nielsen et al. (2009) has been used. This section explains the steps taken in the data cleaning process.

Entries with a bid, ask or transaction price of 0 were not present, so step (P2) from Barndorff-Nielsen et al. (2009) was skipped. Apart from this, the following steps were taken to obtain the final dataset.

1. Entries that contain corrected trades were filtered out. In the dataset these are the entries that have CORR ≠ 0. (T1)
2. Entries with an abnormal Sale Condition were deleted, i.e. entries where COND has a letter code that is not equal to E or F. These are transactions that deviate from what can be expected under the market conditions at that moment. Trades with no sale condition were kept as well. (T2)
3. Entries from exchanges other than NASDAQ were removed. NASDAQ was preferred over other exchanges as it contains the most data points. Therefore a subset was created from the dataset that only contains NASDAQ data. While, in theory, different markets should show no significant differences in prices, this cannot be assumed in practice. The estimates are therefore more reliable when the models are applied to prices of one exchange only [1]. (P3)
4. Entries outside of the regular trading day (9:30-16:00) were taken out, so that separate days can be compared. (P1)
5. Lastly, every time point should appear at most once in the dataset. Hence, the median price was taken for trades occurring in the same second. (T3)

[1] Excluding exchanges (especially exchanges with different opening hours) does lead to some problems, however, which are addressed in Section 5.7.

Step T4, which ensures the data are smooth, was not included. Barndorff-Nielsen et al. (2009) use daily spreads for this step, which is not possible in our case as our dataset does not contain quote prices. An alternative smoothing method in the absence of quote data could be to remove prices that lie too far from the median price calculated over their neighbourhood (using a rolling window of, for example, 50 data points). However, this would be very time consuming and is not crucial for a good analysis of this type of data, as Barndorff-Nielsen et al. (2009) show that the number of entries adjusted in step T4 is negligible. For this reason, smoothing is not included in the data cleaning process.

Ultimately, data cleaning reduced the number of data points from about 70 million to slightly more than 4 million. The exact numbers of entries removed in each cleaning step are given in Table 8 in the Appendix.

The cleaned data are separated into a test sample and a validation sample. This is important for the remainder of the paper. The test sample comprises the first 5 years and is used to estimate the parameter values of the models. The last three years belong to the validation sample.
The validation sample is used to test the performance of the models that have been built on the test sample.

3 Descriptive Statistics

Any econometric analysis should start with a general statistical analysis of the data. This gives insight into the properties of the data and helps in understanding how to model and interpret the volatility estimates. In this section, some of these descriptive statistics are reported.

3.1 Full dataset

Looking at the plot of the price development between 2007 and 2015, two large price drops stand out. The shock in 2008 can be explained by the worldwide decrease of oil prices that affected all oil companies. The drop in 2010, however, is specific to BP and is therefore an instance of idiosyncratic risk. A straightforward hypothesis is that this shock was caused by the disaster on the 20th of April of that year on the oil platform 'Deepwater Horizon', leased by BP. Within a couple of weeks after the accident, BP lost half of its market value.

Figure 1: The price development of BP shares between 2007 and 2015 shows two large shocks.

The plot of returns over the same time period in Figure 5 of the Appendix shows clusters of consecutive peaks during the same periods. Hence, returns do not seem to consist entirely of random noise, which gives reason to study models that allow for volatility clustering.

Table 1 reports descriptive statistics of the data. The results of the Jarque-Bera test can be found in Table 11 in Appendix B.2. From these results, one clearly concludes that the data are not normally distributed. The same holds for returns: the tails are fat and the distribution is skewed.

Statistic    Value
nr. obs.     4,104,486
mean         48.4485
std. dev.    11.0029
var.         121.0631
min.         26.75
1st Q        41.15
med.         45.34
3rd Q        54.32
max.         79.77
kurt.        -0.1014
skew.        0.8013

Table 1: Summary of descriptive statistics of BP prices between 2007 and 2015.
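As an illustration of how statistics like those in Table 1 and a Jarque-Bera test can be computed, the following minimal sketch uses only the Python standard library. It is not the code used for this paper: the function names, the toy price series and the use of population moments (dividing by n) are assumptions made for the example. The p-value relies on the asymptotic chi-squared distribution with two degrees of freedom, whose survival function is exp(-x/2).

```python
import math


def describe(values):
    """Return sample size, mean, standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments; population versions (dividing by n) are an assumption here
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "std": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,
        "excess_kurt": m4 / m2 ** 2 - 3.0,
    }


def jarque_bera(values):
    """Jarque-Bera statistic and asymptotic p-value (chi-squared, 2 d.o.f.)."""
    s = describe(values)
    jb = s["n"] / 6.0 * (s["skew"] ** 2 + s["excess_kurt"] ** 2 / 4.0)
    # The chi-squared(2) survival function has the closed form exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical toy price series, not the BP data
prices = [41.2, 41.5, 40.9, 45.3, 54.1, 79.8, 48.0, 44.7, 43.9, 42.6]
print(describe(prices))
print(jarque_bera(prices))
```

A small p-value leads to rejection of the null hypothesis of normality. With only a handful of observations, as in the toy series, the asymptotic approximation is rough.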
3.2 The year 2010

Because the disaster had such an impact on prices and returns, it is worthwhile to look more closely at the year 2010. Descriptive statistics are supplied in Table 10. Compared to other years, returns in 2010 had higher kurtosis and were more negatively skewed. This indicates that 2010 was indeed an unstable year. For this reason it is interesting to investigate how well the different models are able to forecast future volatility in this period.

A striking fact is that it took a while before the market started to react to the oil spill. During the first ten days after the accident, the price and returns remained fairly stable, as shown in Figures 6a and 6b of Appendix B.2. This price behaviour is peculiar, as there does not seem to have been a lack of information about the disaster; hence, it is an interesting topic for future research.

4 Realized Measures

4.1 Different Measures

With the rise of high frequency financial data, it became possible to accurately estimate the realized volatility within a day using that day's trading prices. Estimators built for this task are called Realized Measures of volatility (RMs), and exist in many different forms. The simplest one is the Realized Volatility (RV), while the Realized Kernel (RK) is arguably the most sophisticated RM. Every RM is an approximation of the Integrated Variance (IV) of the return process. This IV can be seen as the sum of infinitesimal time-varying variances over a given time period. Appendix C gives an overview of the formulae each RM uses, as well as of the biases associated with each RM. Since the Two Scale Realized Volatility (TSRV) and the Realized Kernel correct for microstructure noise the most, they likely produce less biased approximations of the IV.

4.2 Empirical Comparison

We apply each realized measure to our data, obtaining a path for the RV, the Bipower Variation (BPV), the TSRV, and the RK. We compute the RV and BPV using 5 minute returns within each day.
For the TSRV, we choose the number of grids such that, on average, the returns calculated over each grid are 5 minute returns: K, the number of grids, is calculated as the number of transactions in the full dataset divided by the number of 5 minute intervals in the dataset, rounded up to the nearest integer. Finally, the RK is calculated using q = 25 in the equation for ω, 20 minute returns in the $RV_{sparse}$ equation, and the Parzen kernel.

Since these are measures of the Integrated Variance, it is natural to compare them graphically by plotting the square root of each RM against the open-to-close returns on BP shares. In the figure below this is done for the year 2010. Figure 7, at the end of Appendix C, shows the full sample period.

Figure 2: Realized Measure paths over 2009-2010.

From this figure one can observe that the RMs all approximate the stock's volatility reasonably well. The measures are fairly close to each other, with only the BPV being slightly lower than the other three across this time period. Looking at the full sample, however, the RV seems much more sensitive to price jumps than the other measures, and hence shows some strong jumps in volatility as well.

Besides this, we can compare these RMs based on their Mean Squared Difference (MSD), which is defined by equation (1).

$$ MSD_{i,j} = \sum_{t=0}^{T} \left( RM_{t,i} - RM_{t,j} \right)^2 \tag{1} $$

Here, i and j denote the individual RMs, and t is an index for each day. Table 2 shows the MSD between each pair of Realized Measures.

        RV       BPV      TSRV     RK
RV      -        0.5097   1.0779   0.9206
BPV     0.5097   -        1.1728   1.0201
TSRV    1.0779   1.1728   -        0.0370
RK      0.9206   1.0201   0.0370   -

Table 2: Mean Squared Difference between each pair of Realized Measures.

From Table 2 it can be seen that the MSD between the TSRV and RK is very small, indicating that these RMs largely agree on the Integrated Variance (IV) realizations. Between the RV and BPV, the MSD can also be considered small, albeit less convincingly so than for the TSRV and RK.
This indicates that the two least sophisticated measures also agree on the IV to some extent. The MSD between a less sophisticated and a more sophisticated measure lies around 1.0, indicating that the less sophisticated realized measures likely produce biased estimates of the IV in our dataset. Hence, one can conclude that the TSRV and RK produce the best approximations of the IV, but it is difficult to say how accurate these approximations are, as the Integrated Variance is not observed.

5 Models

According to Cox et al. (1981), there are two approaches to modeling time series with time-varying parameters: parameter driven models (PDMs) and observation driven models (ODMs). ODMs have become popular in the applied statistics and econometrics literature because the time-varying parameters are perfectly predictable given past information. In this paper various ODMs are investigated. One advantage of ODMs over PDMs is that the likelihood function is available in closed form. For PDMs this property does not necessarily hold: due to the stochastic nature of the time-varying parameters, one needs to apply simulation or some other method to evaluate the likelihood. This makes empirical work with PDMs more difficult according to Creal et al. (2013).

Three families of models are investigated: the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, the Generalized Autoregressive Score (GAS) models, and the Realized GARCH models. To calculate the log-likelihood functions for these models, three different distributions are used: the normal distribution, the Student-t distribution, and the Generalized Error Distribution (GED). Details on these distributions and their derivations are provided in Appendix F.

In order to prevent the estimates from converging to local maxima instead of a global maximum, the initial values were randomised before estimating.
To ensure that the maximum likelihood estimation procedure functions properly, these random initial values are subject to certain constraints, which differ for each model. The initial values are drawn from uniform distributions, whose bounds may differ per parameter. This procedure is repeated one hundred times for each GARCH and GAS model and only 10 times for the Realized GARCH models, due to their longer computing times. For each model, the estimate with the highest log-likelihood value is retained; this estimate is taken to be the global maximum and is used for further forecasting. The optimization minimizes the negative average log-likelihood (since Python has powerful minimization algorithms). The likelihoods are also scaled for easier optimization. In the remainder of this section, a brief explanation of all the models used is given.

5.1 GARCH

In this paper, the application of the GARCH model is restricted to the GARCH(1,1) model. Models with higher lag orders are not considered, both for simplicity and because a general GARCH(p,q) model has many parameters that need to be estimated. GARCH models are often used to capture clusters of volatility, which are a common occurrence in financial data, as stated by Blasques (2019). This is an important feature in this study, as the plots in Section 3 clearly show volatility clustering. The model is used to filter a time-varying volatility $\sigma_t$ from a sequence of, in our case, daily returns. The model is an extension of the ARCH model of Engle (1982). A GARCH(1,1) is equivalent to an ARCH(∞), as Bollerslev (1986) explains. Therefore, no further emphasis is placed on ARCH models in this paper.
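The ARCH(∞) equivalence can be made explicit by back-substitution. As a sketch, take the GARCH(1,1) updating equation stated in equation (3), $\sigma_{t+1}^2 = \omega + \alpha y_t^2 + \beta\sigma_t^2$, and assume $0 \le \beta < 1$ and a bounded variance process (both are assumptions made for this illustration). Substituting the recursion into itself $k$ times and letting $k \to \infty$ gives:

$$
\begin{aligned}
\sigma_{t+1}^2 &= \omega + \alpha y_t^2 + \beta\left(\omega + \alpha y_{t-1}^2 + \beta\sigma_{t-1}^2\right) \\
&= \omega \sum_{j=0}^{k-1} \beta^j + \alpha \sum_{j=0}^{k-1} \beta^j y_{t-j}^2 + \beta^k \sigma_{t+1-k}^2 \\
&\longrightarrow \frac{\omega}{1-\beta} + \alpha \sum_{j=0}^{\infty} \beta^j y_{t-j}^2 \qquad (k \to \infty).
\end{aligned}
$$

The conditional variance is thus a constant plus a weighted sum of all past squared observations with geometrically decaying weights $\alpha\beta^j$, which is an ARCH(∞) with restricted coefficients.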
The observation equation of the GARCH model is as follows:

$$ y_t = \sqrt{\sigma_t^2}\,\varepsilon_t, \qquad \forall t \in \mathbb{Z} \tag{2} $$

Here $y_t = r_t - \mu$ and $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ is IID with either a Gaussian, a Student-t or a GED distribution. $\sigma_t^2$ is the conditional time-varying variance at time t. For all of the GARCH extensions below, the observation equation follows this same framework. The time-varying volatility is filtered using the following updating equation of the GARCH(1,1):

$$ \sigma_{t+1}^2 = \omega + \alpha y_t^2 + \beta \sigma_t^2, \tag{3} $$

where ω can be seen as the constant, α as the news parameter and β as the memory parameter. To ensure that the process remains stationary, the restriction α + β < 1 is imposed.

5.2 Robust GARCH

When the innovations ($\varepsilon_t$) of a model are fat-tailed, it is useful to bound the updating equation of a GARCH model, which makes the model robust to extreme observations. In this paper, the updating equation is bounded through the news term that is multiplied by α:

$$ \sigma_{t+1}^2 = \omega + \alpha \frac{y_t^2}{1 + y_t^2} + \beta \sigma_t^2. \tag{4} $$

This term cannot explode when $y_t$ becomes very large, since $y_t^2/(1 + y_t^2) < 1$. As stated by Blasques (2019), recognizing that the innovations are fat-tailed may be important for robustness, since the maximum likelihood estimator converges to a pseudo-true parameter that renders the volatility filter more robust.

5.3 Robust GARCH with leverage

Several studies, such as that of Nelson (1991), have shown that negative returns have more impact on volatility than positive returns. To account for this phenomenon, the robust GARCH with leverage model (RGARCH-LEV) can be used. This model both robustifies the updating equation by means of ρ and λ and accounts for the leverage effect through the parameter δ. The observation equation is given by $y_t = \sigma_t \varepsilon_t$, with $\{\varepsilon_t\}_{t \in \mathbb{Z}} \sim TID(\lambda)$, and the updating equation is given in equation (5).
$$ \sigma_t^2 = \omega + \alpha \frac{y_{t-1}^2 + \delta y_{t-1}}{1 + (\rho/\lambda)\, y_{t-1}^2} + \beta \sigma_{t-1}^2 \tag{5} $$

5.4 GJR-GARCH

To deal with asymmetry, the GJR-GARCH introduces a dummy variable $1_{\varepsilon_t < 0}$ in the updating equation. If $\varepsilon_t < 0$, the dummy is equal to one; otherwise it is equal to zero. This dummy accounts for the fact that positive and negative innovations have different impacts on conditional volatility, as Glosten et al. (1993) found. In this model the updating equation is:

$$ \sigma_{t+1}^2 = \omega + \alpha \varepsilon_t^2 + \gamma \varepsilon_t^2 \cdot 1_{\varepsilon_t < 0} + \beta \sigma_t^2 \tag{6} $$

For the GJR-GARCH model to be stationary, the constraint $\alpha + \beta + \frac{\gamma}{2} < 1$ needs to hold. The observation equation is given by $y_t = \sigma_t \varepsilon_t$. In this paper multiple distributions are investigated for $\{\varepsilon_t\}_{t \in \mathbb{Z}}$.

5.5 NAGARCH

In the Nonlinear Asymmetric Generalized Autoregressive Conditional Heteroskedasticity (NAGARCH) model, the response to extreme news is reduced according to Engle & Ng (1993). This is a useful property for dealing with outliers. The updating equation of this model is as follows:

$$ \sigma_{t+1}^2 = \omega + \beta \sigma_t^2 + \alpha \left( \varepsilon_t + \gamma \sqrt{\sigma_t^2} \right)^2, \qquad \forall t \in \mathbb{Z} \tag{7} $$

The news impact curve of the model described above is symmetric and centered at $\varepsilon_t = (-\gamma)\sqrt{\sigma_{t-1}^2}$. For this model to be stationary, $\alpha(1 + \gamma^2) + \beta$ needs to be smaller than 1. Once again, the observation equation remains the same as above and the model is investigated using multiple error distributions.

5.6 GAS

The next model group investigated in this paper is the group of GAS models. Once again, the observation equation remains the same as for the models described above. The updating equation is given by:

$$ \begin{aligned} \sigma_{t+1}^2 &= \omega + B \sigma_t^2 + A s_t \\ s_t &= S_t \Delta_t \end{aligned} \tag{8} $$

Here $S_t$ is the scaling matrix and $\Delta_t$ is the score. A more detailed description of the workings of the GAS model is given in Appendix E. For the GAS model only two error distributions are investigated instead of three: the Student-t and the GED.
The reason for this different approach is that a GAS(1,1) in which $\varepsilon_t$ follows a normal distribution with mean zero and variance one reduces to a GARCH(1,1), as is shown in Creal et al. (2013). For this reason no further attention is given to a Gaussian GAS model.

5.7 Realized GARCH Log-Linear

The realized GARCH model of Hansen et al. (2012) is an extension of standard GARCH and/or GAS models that makes use of one or more Realized Measures as an extra variable. These models hence model the joint density of returns and the Realized Measure, driven by the latent volatility process and a set of parameters. The idea behind using RMs as additional data is that these measures add information based on high frequency returns to the GARCH model. This will likely improve volatility forecasts. As discussed before, the Realized Kernel is the most sophisticated RM, and hence we use this measure in our Realized GARCH models. As we found, the TSRV was very close to the RK and could therefore also have been used; the other two RMs likely suffer from bias.

One problem with using an RM as an additional variable in modeling close-to-close returns is that the RMs discussed in this paper are all measures of open-to-close volatility. Hence, we must apply a correction term to our Realized Kernel: this term is given as $(\sigma_{O2C}^2 + \sigma_{C2O}^2)/\sigma_{O2C}^2$, with O2C denoting open-to-close and C2O denoting close-to-open returns. Our correction term has a value of 2.274, which is relatively high for such a correction. An explanation could be that BP's stock is traded worldwide, and that close-to-open returns (and their volatility) can hence be substantial. This is a large drawback of using data from one exchange only (although using all available data may present even more problems, such as exchange-specific noise).
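The correction term described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for this paper: it assumes that both variances are estimated by sample variances of daily open-to-close and close-to-open returns, and that the correction is applied multiplicatively to each daily Realized Kernel value. The function names and toy numbers are hypothetical.

```python
from statistics import variance


def close_to_close_correction(open_to_close, close_to_open):
    """Return (var_O2C + var_C2O) / var_O2C from two daily return series."""
    var_o2c = variance(open_to_close)  # intraday (open-to-close) sample variance
    var_c2o = variance(close_to_open)  # overnight (close-to-open) sample variance
    return (var_o2c + var_c2o) / var_o2c


def correct_realized_kernel(rk_values, factor):
    """Scale open-to-close Realized Kernel values to a close-to-close level."""
    # Assumption: the correction term multiplies each daily RK value
    return [factor * rk for rk in rk_values]


# Hypothetical toy returns (in percent), not the BP data
r_o2c = [0.5, -0.3, 0.8, -1.0, 0.2]
r_c2o = [0.4, -0.6, 0.9, -0.7, 0.1]
factor = close_to_close_correction(r_o2c, r_c2o)
rk_close_to_close = correct_realized_kernel([1.2, 0.8, 2.5], factor)
print(factor, rk_close_to_close)
```

The factor is always at least one and grows with the share of total variance that arises overnight, which is consistent with the explanation given for the relatively high value found for BP.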
The realized GARCH model admits the following representation:

$$ \begin{aligned} r_t &= \sqrt{h_t}\,\varepsilon_t \\ \log h_t &= \omega + \beta \log h_{t-1} + \alpha \log x_{t-1} \\ \log x_t &= \xi + \varphi \log h_t + \tau(\varepsilon_t) + u_t \end{aligned} \tag{9} $$

where $\varepsilon_t$ and $u_t$ are white noise with some distribution (again, we consider the Gaussian, Student-t, and GED distributions), $x_t$ is the RM for time t, and $\tau(\varepsilon_t)$ is the function $\tau_1 \varepsilon_t + \tau_2 (\varepsilon_t^2 - 1)$. A more thorough description of the realized GARCH model can be found in Appendix D.

Since the realized GARCH models have a large number of parameters, optimization in this paper is done over an unrestricted parameter set (θ), which is a transformation of the original parameter set. The transformations and their inverses can be found in the Appendix. Unfortunately, this choice makes it impossible to derive standard errors and p-values for the original parameters. Hence, we provide estimates, standard errors, and p-values for the θ-set, and estimates only for the original parameter set.

5.8 Residual error testing

Finally, after all the models are estimated, the residuals of these models are investigated further. For the results to be valid, the residuals should be independently and identically distributed (IID). The results of the tests on the error terms of Set A are reported in Section G.1.3 of the Appendix; those for Set B can be found in Section G.3.1 of the Appendix. In addition, the moments of the residuals are obtained: the mean, the standard deviation, and the shape of the residuals are given. The shape is only reported if it exists for the given distribution.

Two tests are performed on the residuals. Firstly, the Kolmogorov-Smirnov (KS) test is performed. This is a nonparametric test of whether there is a difference between the cumulative distribution function of the reference distribution (Gaussian, Student-t or GED in this paper) and the empirical distribution function of the residuals.
In the KS test the null hypothesis states that the empirical distribution of the sample is equal to the reference distribution, Smirnov & Smirnov (1939). Secondly, the error terms undergo a Ljung-Box (LB) test. This test examines whether serial correlation is absent at the first lag: the null hypothesis is zero autocorrelation at the first lag. Absence of serial correlation is a desirable property for the residuals, because it is evidence in favour of independently distributed errors. The test is performed on both the residuals and the squared residuals. For these tests a 5% significance level is adopted.

6 Forecasting

This section explains the forecasting method used in this paper. In econometrics there are two distinct reasons to create a model. One can focus on the structure of the model, which is often done in policy analysis; here the effect of certain parameters in the model can be analysed. Another goal is to forecast with the obtained estimates. Forecasting models are not necessarily good for structural analysis and vice versa, Blasques (2019). In this paper, the focus is on forecasting, not on structural analysis. This section also explains the different measures used to compare the forecast accuracy of the different models.

6.1 Forecasting methodology

Maximum likelihood estimation is used to estimate the parameters of each model; the different error distributions are explained in Appendix F. The parameter estimates are used to forecast the volatility of the validation set. They can be found in Appendix G in Table 12 and Table 34. The data sample is split up into two sets, namely a test set and a validation set. The parameter estimates are used for the one-step-ahead forecasts, which are compared to the realized kernel volatility values.
The one-step-ahead forecast is as follows:

$$ \hat{\sigma}_{t+h+1}^2 = E\left( \sigma_{t+h+1}^2 \mid \sigma_{t+h}^2, \mathcal{F}_{t+h} \right) \tag{10} $$

where $\mathcal{F}_{t+h}$ denotes all past available information at time t + h.

In this paper the models are not re-estimated for each one-step-ahead forecast, as this is computationally very time-consuming. Re-estimation would, however, be more appropriate for comparing the accuracy of the models, as it would also take into account the flexibility of the models with respect to parameter changes in the underlying return process.

6.2 Forecasting accuracy

6.2.1 Forecast Mean Squared Error and Forecast Mean Absolute Error

The Forecast Mean Squared Error (FMSE) is a forecast accuracy measure that assesses the quality of the estimator $\hat{\sigma}_t^2$. It is calculated from the forecast errors $\hat{e}_t$ as follows:

$$ \widehat{FMSE}(\hat{\sigma}_t^2) = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_t^2 \tag{11} $$

The Forecast Mean Absolute Error (FMAE) is another forecast accuracy measure. It punishes large deviations less than the FMSE. The formula for the FMAE is as follows:

$$ \widehat{FMAE}(\hat{\sigma}_t^2) = \frac{1}{T} \sum_{t=1}^{T} |\hat{e}_t| \tag{12} $$

The forecasts with the lowest FMSE and FMAE are the most accurate with respect to some observed variable. The problem encountered here is that volatility is a latent variable, and hence it is difficult to calculate any error with respect to the unobserved time-varying volatility. Hence, we use the corrected Realized Kernel (see Section 5.7) as an approximation of the close-to-close Integrated Variance across each day, and calculate the FMSE and FMAE using the corrected RK as the "observed" variance.

6.2.2 Diebold-Mariano Test

The forecasting performance of all competing models is compared using the method described by Diebold & Mariano (2002), the Diebold-Mariano (DM) test. This test examines the null hypothesis of no difference between the forecast errors produced by two models.
One can obtain the DM statistic by taking the standardized difference between the forecast errors under some loss function as described above. The DM statistic is given in equation (13), where $d_t$ is the difference at time $t$ between the forecast errors of the two models under the chosen loss function (see section 6.2.1):

\[ DM = \frac{\sqrt{T}\,\bar{d}}{\hat{\sigma}_d}, \qquad \bar{d} = \frac{1}{T} \sum_{t=1}^{T} d_t \tag{13} \]

As stated in Blasques (2019), $\hat{\sigma}_d$ is a consistent estimator of the standard deviation of $d_t$. This version of the DM test uses the FMAE or FMSE, which is based on the corrected Realized Kernel as an approximation of the true, latent volatility. Note, however, that we do not know how good an approximation of open-to-close IV the RK produces, and that the correction term we used is a rather crude (yet practical) way to obtain a close-to-close measure. It is thus necessary to also consider forecasting accuracy measures other than the FMAE and FMSE.

Hence, our second version of the Diebold-Mariano test uses a log-likelihood based scoring rule: $d_t = LL_{t,i} - LL_{t,j}$, with $LL_{t,i}$ being the log-likelihood contribution of the data at time $t$ in model $i$. The main advantage of this approach is that it uses the log-likelihood instead of an arbitrary target variable like the corrected RK. Since maximizing the likelihood function is equivalent to minimizing the Kullback-Leibler divergence between the model's distribution and the true distribution of the data (Akaike (1998)), the log-likelihood contributions can be seen as a measure of the distance between each data point and the true distribution. Hence, comparing models based on a log-likelihood scoring rule is a natural way to compare the accuracy of forecasted distributions of returns. Note that we can only compare likelihoods that are based on the same data.
Hence, to be able to compare the GARCH and GAS models with the realized GARCH models, we have to use the partial log-likelihood of the realized GARCH models, which gives the log-likelihood of observing a certain return (not taking the likelihood of the Realized Measure into account). The results from both of these methods are discussed in the next section.

7 Results

This part of the paper describes the results for two different scenarios, which differ in how the full dataset is split into a test and a validation part. In scenario A, the data during the oil spill disaster are part of the test sample. In fact, all data up to and including 2011 are used as test sample, consisting of 1260 trading days. In this way, the parameter estimates are based on many data points, which should improve the quality of the forecasts. The validation sample in this scenario is all the data from 2012 up to and including 2014 and includes 754 trading days.

For scenario B, the test set is shortened. It includes all the data from 2007 up to and including 2010/04/19, the day before the oil spill happened. The entire test sample then includes 801 trading days. The validation set in scenario B includes 252 trading days, which is one year of trading data. These 252 data points are highly volatile: they include the initial price shock caused by the Deep Water Horizon disaster and the short term (1 year) aftermath.

These different test sets are used to estimate the parameters of the different models examined in this paper. The estimated parameter values can be found in Appendix G in Tables 12, 14, 34, and 36. These estimates are then used to forecast the future volatility of the close-to-close returns. The forecast errors of the models are discussed in the section below.
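The accuracy measures of Section 6.2 can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: the function names are hypothetical, the estimator of the standard deviation of d_t is taken to be the plain sample standard deviation (the paper only states that it is consistent), and the p-value relies on the asymptotic standard normal distribution of the DM statistic.

```python
import math


def fmse(forecasts, proxy):
    """Forecast mean squared error against the volatility proxy (eq. 11)."""
    errors = [p - f for f, p in zip(forecasts, proxy)]
    return sum(e * e for e in errors) / len(errors)


def fmae(forecasts, proxy):
    """Forecast mean absolute error against the volatility proxy (eq. 12)."""
    errors = [p - f for f, p in zip(forecasts, proxy)]
    return sum(abs(e) for e in errors) / len(errors)


def dm_statistic(loss_i, loss_j):
    """DM = sqrt(T) * mean(d) / sd(d), with d_t = loss_i[t] - loss_j[t] (eq. 13).

    Assumption: sd(d) is the ordinary sample standard deviation.
    """
    d = [a - b for a, b in zip(loss_i, loss_j)]
    T = len(d)
    d_bar = sum(d) / T
    var_d = sum((x - d_bar) ** 2 for x in d) / (T - 1)
    return math.sqrt(T) * d_bar / math.sqrt(var_d)


def two_sided_p_value(dm):
    """Two-sided p-value under the asymptotic N(0, 1) null distribution."""
    return math.erfc(abs(dm) / math.sqrt(2.0))


# Toy numbers, purely illustrative (not taken from the paper's data).
proxy = [1.0, 2.0, 1.5, 3.0]      # e.g. corrected realized kernel
model_a = [1.1, 1.8, 1.6, 2.5]    # variance forecasts of model A
model_b = [0.5, 2.9, 1.0, 4.2]    # variance forecasts of model B

sq_loss_a = [(p - f) ** 2 for f, p in zip(model_a, proxy)]
sq_loss_b = [(p - f) ** 2 for f, p in zip(model_b, proxy)]
dm = dm_statistic(sq_loss_a, sq_loss_b)
print(fmse(model_a, proxy), fmae(model_a, proxy), dm, two_sided_p_value(dm))
```

A negative DM statistic means that the first model has the lower average loss. For the likelihood-based version, the same `dm_statistic` function can be fed the per-observation log-likelihood contributions, in which case a positive value favours the first model.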
7.1 Forecasting results Set A

As explained in section 6, this paper discusses three forecast accuracy measures, namely the Forecast Mean Squared Error, the Forecast Mean Absolute Error and the log-likelihoods of the validation set. Table 3b gives an overview of the FMSE, FMAE and likelihoods of the different models. In Table 3b, likelihood values marked with an * are partial log-likelihoods instead of regular log-likelihoods; the same holds for the values marked with an * in Table 5. Partial log-likelihood means that only the non-kernel part of the likelihood is used.

Table 3b shows that, based on the FMSE and the FMAE, all the Realized models (i.e. Realized GARCH, Realized GARCH-T, and Realized GARCH-GED) perform far better than all the other models. Of these three models, the Realized GARCH-GED has the smallest error in terms of FMSE and FMAE. Furthermore, it can be observed that the RGARCH-LEV has the highest FMSE and FMAE.

From Table 3a it can be observed that the Realized GARCH-T has the highest partial log-likelihood and lowest AIC. Here, the RGARCH-LEV model actually performs well, beating all other GARCH/GAS models in terms of AIC. As the RGARCH-LEV model performs very well in-sample but poorly out-of-sample, this could indicate overfitting.

Table 4 shows the Diebold-Mariano test scores. The Diebold-Mariano test score is the number of times a certain model outperforms another model, given that the difference in forecast errors is significant at 5%. Tables 15, 16 and 17 show the p-values of all Diebold-Mariano tests. Whenever the p-value is smaller than the 5% significance level, the two models are said to be significantly different from each other. For the Diebold-Mariano test based on the FMAE, Table 4 shows that the Realized GARCH-GED model outperforms most other models.
Looking at the Diebold-Mariano test based on the FMSE, it can be found that the different Realized models (i.e. Realized GARCH, Realized GARCH-T and Realized GARCH-GED) outperform most of the other models. Looking at Table ??, it can also be observed that the Diebold-Mariano test mostly finds no significant difference between the different Realized models. Therefore one can conclude more generally that the different Realized models outperform the other models.

Finally, a Diebold-Mariano test was also conducted based on the likelihood. Table 4 shows that the Realized GARCH-T outperforms all other models. Interestingly enough, the Realized GARCH only outperforms the RGARCH model, indicating that a fat-tailed distribution is necessary for modelling the returns of BP's stock.

Section G.1.3 in Appendix G gives the results of the different error tests run on the residuals of each model for data set A. The Ljung-Box test on the residuals shows that none of the models suffer from autocorrelation. For the squared residuals, it can be observed that apart from the GJR-GARCH-GED, the GARCH-GED and the GARCH models, none of the models have autocorrelation in the squared residuals. The tables in section G.1.3 in Appendix G also give the results of the Kolmogorov-Smirnov test (KS-test). The null hypothesis of the KS-test is rejected for all models in this paper. This indicates that the forecasting residuals are not distributed in the way one expected them to be, which suggests that the model forecasts might not have the distributions that are assumed in this paper.
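The residual diagnostics discussed above can be sketched as follows. This is a hedged illustration using only the Python standard library: the function names are hypothetical, the KS reference distribution is taken to be the standard normal purely as an example (the paper tests against each model's own assumed error distribution), and the KS p-value uses the asymptotic Kolmogorov series, which is only adequate for reasonably large samples.

```python
import math
import random


def ljung_box_lag1(x):
    """Ljung-Box Q statistic and p-value using the first lag only."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    rho1 = c1 / c0
    q = n * (n + 2) * rho1 ** 2 / (n - 1)
    # Under H0, Q is chi-square with 1 degree of freedom, whose
    # survival function equals erfc(sqrt(q / 2)).
    return q, math.erfc(math.sqrt(q / 2.0))


def normal_cdf(z):
    """Standard normal CDF (example reference distribution)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ks_statistic(x, cdf=normal_cdf):
    """Largest distance between the empirical CDF and the reference CDF."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = cdf(v)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_p_value(d, n, terms=100):
    """Asymptotic Kolmogorov p-value for KS statistic d and sample size n."""
    lam = math.sqrt(n) * d
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))


# Simulated standardized residuals, for illustration only.
rng = random.Random(0)
resid = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(ljung_box_lag1(resid))                      # residuals
print(ljung_box_lag1([e * e for e in resid]))     # squared residuals
d = ks_statistic(resid)
print(d, ks_p_value(d, len(resid)))
```

With a 5% significance level, a p-value below 0.05 rejects the corresponding null hypothesis (no first-lag autocorrelation for the LB test, correct distribution for the KS test).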
Model                Likelihood    AIC
GARCH                -2561.646     5129.291
GARCH-T              -2540.090     5088.181
GARCH-GED            -2544.254     5096.508
RGARCH               -2718.208     5440.416
RGARCH-LEV           -2532.571     5077.141
GJR-GARCH            -2553.087     5114.174
GJR-GARCH-T          -2534.381     5078.761
GJR-GARCH-GED        -2538.021     5086.043
NAGARCH              -2561.646     5131.292
NAGARCH-T            -2540.090     5090.181
NAGARCH-GED          -2544.254     5098.508
GAS-T                -2563.700     5135.391
GAS-GED              -2546.030     5100.053
Realized GARCH       -2521.525     5059.050
Realized GARCH-T     -2508.555     5037.111
Realized GARCH-GED   -2512.642     5045.284

(a) In sample results, Likelihood, AIC

Model                FMSE     FMAE     Likelihood
GARCH                5.032    1.267    -1211.196
GARCH-T              4.772    1.245    -1178.383
GARCH-GED            4.871    1.252    -1181.057
RGARCH               9.130    2.323    -1289.287
RGARCH-LEV           5.089    1.364    -1186.480
GJR-GARCH            5.170    1.308    -1210.783
GJR-GARCH-T          4.950    1.292    -1180.560
GJR-GARCH-GED        5.044    1.299    -1182.678
NAGARCH              5.033    1.267    -1211.199
NAGARCH-T            4.772    1.245    -1178.424
NAGARCH-GED          4.871    1.252    -1181.055
GAS-T                5.271    1.310    -1167.652
GAS-GED              5.049    1.238    -1176.430
Realized GARCH       4.527    1.030    -1182.124*
Realized GARCH-T     4.495    1.025    -1141.504*
Realized GARCH-GED   4.391    1.007    -1154.145*

(b) Out of sample results, FMSE, FMAE, Likelihood. * indicates partial log-likelihood

Table 3: Set A, Results.

Model                FMAE   FMSE   Likelihood
GARCH                6      3      2
GARCH-T              7      7      6
GARCH-GED            5      3      4
RGARCH               0      0      0
RGARCH-LEV           1      1      4
GJR-GARCH            2      1      1
GJR-GARCH-T          4      6      5
GJR-GARCH-GED        3      2      4
NAGARCH              5      2      1
NAGARCH-T            6      6      5
NAGARCH-GED          6      4      5
GAS-T                12     10     12
GAS-GED              11     1      4
Realized GARCH       13     13     1
Realized GARCH-T     13     13     15
Realized GARCH-GED   14     13     12

Table 4: Diebold-Mariano test scores, set A.

7.2 Forecasting results Set B

We used the same three forecast accuracy measures for scenario B as we used for scenario A. Table 5b shows the FMSE, the FMAE and the likelihood of encountering the returns of the validation set under our estimated models. In Table 6 the Diebold-Mariano scores can be found.
The Diebold-Mariano score is the number of times a model outperforms another model, conditional on the forecast errors or (partial) likelihoods being significantly different at the 5% level. Tables 37, 38, and 39 give the Diebold-Mariano test p-values for set B.

In Table 6 it can be seen that, when the FMAE is used as the forecast error measure, the Realized GARCH models outperform most of the other models, followed by the GAS-T and the GAS-GED models. A closer look at Table 37 and Table 5b shows that the realized GARCH models outperform all others, but that there is no significant difference in FMAE among the three realized GARCH models.

If one looks at the Diebold-Mariano test score for the FMSE instead of the FMAE in Table 6, the GAS-GED model is the clear winner. However, Table 39 shows that the FMSE of the GAS-GED model is not significantly different from that of any of the GAS, GJR-GARCH and Realized GARCH models. Furthermore, it can be seen that the FMSEs of most models are only significantly different from the FMSEs of the RGARCH models. Hence, it might be concluded that the GAS-GED model is the only model that, in terms of the FMSE, is outperforming all the other models and that the RGARCH models are worse than all other models.

In Table 6 it can also be seen that, when the log-likelihood is used as the measure of forecasting performance, there is no model that is a clear winner. The models that have a GED or a Student's-t distribution perform slightly better. A closer look at Table 37 shows that there is no significant difference between the log-likelihoods of any of the models with a GED or a Student's-t distribution. Furthermore, in Table 34, it can be seen that all the models with a GED distribution have a kurtosis between 0 and 2.
We can therefore conclude that models that allow for fat tails perform best in terms of the log-likelihood of forecasting the returns in the validation set.

Table 5a shows the (partial) likelihood of observing the returns of the test set under our estimated models, as well as the AIC of our models. Note that the Realized GARCH-T model outperforms all the other models in terms of likelihood as well as AIC. However, interestingly, all other realized models are outperformed by the other models. Hence, just as for set A, in general, the realized GARCH models do not describe the test set returns correctly. Furthermore, as in set A, the RGARCH is performing well on the in-sample data, but very poorly on the out-of-sample data. Again, this could indicate overfitting.

Section G.3.1 of the appendix gives the results of the error tests on the residuals of each model for data set B. We find that for all models, except for the RGARCH, the residuals are uncorrelated. Furthermore, all models, except the realized GARCH models, reject that the squared residuals are independent. More importantly, however, all models reject the KS test. That means the residuals of every model are not empirically distributed in the way that we expected them to be distributed. Therefore, our model forecasts might not have the distributions we are supposing they have.
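The Diebold-Mariano score used in Tables 4 and 6 can be sketched as a simple count over all model pairs. This is an illustrative reading of the definition given above, not the authors' code: the function name, the data layout and the numbers in the example are hypothetical. For the likelihood-based comparison one could pass the negative average log-likelihood as the loss, so that lower is still better.

```python
def dm_scores(mean_loss, p_values, alpha=0.05):
    """Count, for every model, how many rivals it beats significantly.

    mean_loss: dict mapping model name -> average loss (lower is better).
    p_values:  dict mapping (model_i, model_j) -> DM test p-value,
               with each unordered pair stored once.
    """
    scores = {model: 0 for model in mean_loss}
    for (i, j), p in p_values.items():
        if p >= alpha:
            continue  # no significant difference: neither model scores
        winner = i if mean_loss[i] < mean_loss[j] else j
        scores[winner] += 1
    return scores


# Hypothetical numbers, for illustration only.
example_loss = {"A": 1.0, "B": 2.0, "C": 1.1}
example_p = {("A", "B"): 0.01, ("A", "C"): 0.40, ("B", "C"): 0.03}
print(dm_scores(example_loss, example_p))
```

In this toy example A and C each beat B significantly, while the difference between A and C is not significant, so neither of them scores on that pair.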
Model                Likelihood     AIC
GARCH                -1623.655      3253.309
GARCH-T              -1613.484      3234.967
GARCH-GED            -1615.421      3236.841
RGARCH               -1732.855      3469.710
RGARCH-LEV           -1607.950      3227.89
GJR-GARCH            -1619.776      3247.552
GJR-GARCH-T          -1610.807      3231.616
GJR-GARCH-GED        -1612.539      3235.077
NAGARCH              -1623.655      3255.309
NAGARCH-T            -1610.808      3231.616
NAGARCH-GED          -1612.539      3235.078
GAS-T                -1633.386      3274.771
GAS-GED              -1616.081      3240.162
Realized GARCH       -1593.853*     3203.705
Realized GARCH-T     -1589.544*     3199.088
Realized GARCH-GED   -1591.419*     3202.838

(a) In sample, Likelihood, AIC

Model                FMSE       FMAE     Likelihood
GARCH                104.739    4.435    -546.978
GARCH-T              105.315    4.457    -538.286
GARCH-GED            104.782    4.434    -539.656
RGARCH               170.160    5.447    -594.873
RGARCH-LEV           231.426    6.391    -539.471
GJR-GARCH            108.246    4.476    -545.003
GJR-GARCH-T          105.675    4.394    -537.278
GJR-GARCH-GED        106.410    4.415    -538.438
NAGARCH              104.738    4.435    -546.978
NAGARCH-T            105.317    4.457    -538.278
NAGARCH-GED          104.784    4.435    -539.656
GAS-T                100.248    3.823    -539.539
GAS-GED              99.460     4.245    -539.870
Realized GARCH       104.626    3.787    -544.702*
Realized GARCH-T     104.587    3.796    -537.016*
Realized GARCH-GED   104.605    3.790    -539.410*

(b) Out of sample, FMSE, FMAE, Likelihood. * indicates partial log-likelihood

Table 5: Set B, Results

Model                FMSE   FMAE   Likelihood
GARCH                2      2      1
GARCH-T              2      2      3
GARCH-GED            2      2      3
RGARCH               0      0      0
RGARCH-LEV           0      0      1
GJR-GARCH            1      2      1
GJR-GARCH-T          1      4      3
GJR-GARCH-GED        1      3      4
NAGARCH              2      3      1
NAGARCH-T            2      2      3
NAGARCH-GED          2      2      3
GAS-T                2      8      1
GAS-GED              8      8      3
Realized GARCH       2      13     1
Realized GARCH-T     2      13     2
Realized GARCH-GED   2      13     2

Table 6: Diebold-Mariano test scores, set B.

7.3 Visualisation of results of Set A and B

Figures 3 and 4 below depict the filtered volatility of the best three models for Set A and Set B.
The reason for having three best models per set is that three different criteria are investigated, namely the scores on the FMSE, the FMAE, and the Likelihood. Figure 3 shows the results for Set A, where the best models were found to be the GAS-T, Realized GARCH-GED, and Realized GARCH-T, as discussed in section 7.1. These models are shown in red, yellow, and green respectively. For Set B, the filtered volatilities of the GJR-GARCH-GED, the GAS-GED, and the Realized GARCH-T models are shown in Figure 4, once again in red, yellow, and green respectively. The filtered volatilities are plotted against the kernel (Figures 3a and 4a) and the returns (Figures 3b and 4b). Larger versions of these figures are given in Appendix G.4.

In Figure 3a the red line, belonging to the GAS-T, overshoots the kernel (i.e. the assumed true volatility) quite often. The yellow and green lines (i.e. the Realized GARCH-GED and Realized GARCH-T) more or less overlap. This is as expected, since these two models are quite similar to each other in contrast to the GAS-T. They also follow the kernel much more closely, which is a good property since it is related to forecasting precision. Table 4 shows that, concerning the FMAE and the FMSE scores, the GAS-T scored lower than both of the Realized models, suggesting that these Realized models are better for forecasting.

In Figure 4a one can see that all three models of Set B seem to follow the kernel quite well. As expected, the peaks are more flattened out during extreme shocks. This can be observed very clearly around the beginning of the Deep Water Horizon disaster.

To visualize the likelihood fit, one needs to turn to Figures 3b and 4b. Since the kernel is a mere approximation of the true volatility, it is not possible to look at the true volatility directly. Figures 3b and 4b are of interest given that the returns are driven by the true volatility.
Also, the likelihood is based on the true volatility. The fit to the returns is harder to judge visually and is an interesting subject for further research.

Figure 3: Set A, Visualisation of results against Kernel and Return. (a) Results vs Kernel. (b) Results vs Returns.

Figure 4: Set B, Visualisation of results against Kernel and Return. (a) Results vs Kernel. (b) Results vs Returns.

8 Conclusion

When considering the results of the 16 different GARCH, GAS, and realized GARCH models on dataset A, it is easy to conclude that the realized GARCH models are superior in forecasting the volatility of BP's stock, considering all three forecast accuracy measures. In Figure 3 we have visualized the filtered volatility of the realized GARCH-T and GED, as well as the filtered volatility of the GAS-T model, versus close-to-close returns and the corrected Realized Kernel (across our validation set). It is easy to see that both realized GARCH models perform quite well in approximating the RK and in approximating the volatility of the returns (relative to GAS-T, one of the best performing models of the GARCH/GAS family on set A). The realized GARCH-T and GED follow almost exactly the same filtered volatility path.

When dataset B is considered, however, a different conclusion should be drawn. Now the realized GARCH models reign supreme only when considering the FMAE, while the GAS-GED outperforms them massively when considering the FMSE. Based on the likelihood it is difficult to choose a "best" model, since most of them do not show significant differences in daily log-likelihood contributions. This is quite different from the results obtained using dataset A, where the realized GARCH models were without doubt the best performing ones.
There may be multiple explanations for this, but the most realistic are that (1) in the 'Deepwater Horizon' period the distribution of returns was very complex, and this made it very difficult for any model to accurately predict the volatility of returns after this event; (2) the Realized Kernel is not a good approximation of volatility in the 'Deepwater Horizon' period, and hence the extra information it provides in the realized GARCH models does not necessarily lead to a better distributional fit of returns in this period.

There is at least some evidence for (1), since in the residual tests none of the models passed the Kolmogorov-Smirnov test (neither for set A nor for set B). Hence, we can conclude that there is still some room for improvement by choosing a more flexible distribution, for example a skewed Student-t. Also, it may be interesting to investigate the possibility of using a time-varying parameter for kurtosis, as this varies heavily across years (spiking in 2010, the year of the 'Deepwater Horizon' crisis; see Table 7).

It is more difficult to determine whether the RK is a good approximation of volatility; however, there are two ways in which our RK could give a wrong estimate. The first one is the fact that we multiplied our open-to-close approximation by a fixed number to correct for close-to-open variance. However, the close-to-open variance seems to vary over time as well, and hence this fixed correction could lead to bias. A second possibility is the presence of "gradual jumps" in the 'Deepwater Horizon' period. Barndorff-Nielsen et al. (2009) show that the Realized Kernel cannot properly deal with this feature in the data, but it is nevertheless very difficult to determine the presence of gradual jumps, let alone clean them from the data. It would therefore be very interesting to look into methods for identifying the presence of gradual jumps, and to investigate the possibility of robustifying the Realized Kernel against them.

Year     2007    2008    2009    2010     2011    2012    2013    2014
Kurt.    0.724   5.222   1.310   11.585   0.775   1.415   3.120   3.902

Table 7: Overview of kurtosis across each year (C2C returns)

References

Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle, 199–213.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., & Shephard, N. (2009). Realized kernels in practice: Trades and quotes. Oxford University Press Oxford, UK.
Barndorff-Nielsen, O. E., & Shephard, N. (2004). Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics, 2 (1), 1–37.
Blasques, C. (2019). Advanced econometric measures. canvas, 31 (3), 62–64.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31 (3), 307–327.
Casella, G., & Berger, R. (2002). Statistical inference. Thomson Learning. Retrieved from https://books.google.nl/books?id=0x vAAAAMAAJ
Cox, D. R., Gudmundsson, G., Lindgren, G., Bondesson, L., Harsaae, E., Laake, P., ... Lauritzen, S. L. (1981). Statistical analysis of time series: Some recent developments [with discussion and reply]. Scandinavian Journal of Statistics, 93–115.
Creal, D., Koopman, S. J., & Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28 (5), 777–795.
Czyżycki, R. (2013). Using GED (generalized error distribution) for modeling distribution of the rates of return.
Diebold, F. X., & Mariano, R. S. (2002). Comparing predictive accuracy. Journal of Business & Economic Statistics, 20 (1), 134–144.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society, 987–1007.
Engle, R. F., & Ng, V. K. (1993). Measuring and testing the impact of news on volatility. The Journal of Finance, 48 (5), 1749–1778.
Glosten, L. R., Jagannathan, R., & Runkle, D. E. (1993).
On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48 (5), 1779–1801.
Hansen, P. R., Huang, Z., & Shek, H. H. (2012). Realized GARCH: a joint model for returns and realized measures of volatility. Journal of Applied Econometrics, 27 (6), 877–906.
Hoogerheide, L. (n.d.). Some distributions that are used in models for daily stock returns.
Lee, Y. G., Garza-Gomez, X., & Lee, R. M. (2018). Ultimate costs of the disaster: Seven years after the Deepwater Horizon oil spill. The Journal of Corporate Accounting & Finance, 29 (1), 69–79.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica: Journal of the Econometric Society, 347–370.
Pallardy, R. (2016). Deepwater Horizon oil spill environmental disaster, Gulf of Mexico [2010]. Encyclopedia Britannica.
Smirnov, N., & Smirnov, N. (1939). On the estimation of the discrepancy between empirical curves of distribution for two independent samples.
Zhang, L., Mykland, P. A., & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100 (472), 1394–1411.

Appendices

A Data Cleaning

Table 8 summarizes the process of data cleaning. For each step in the data cleaning process from Barndorff-Nielsen et al. (2009) it shows how many data points were removed. This is given for every year in separate columns. Adding the values of all years for each of the 6 steps gives the number of data points removed per step in the total data set. These totals can be found in the first column of Table 8.
                                          Test Sample (Set A)                                            Validation Sample (Set A)
Descriptive                 Total         2007        2008        2009        2010         2011          2012        2013        2014
trading days                2.014         251         253         252         252          252           250         252         252
observations                72.419.933    3.305.145   7.411.499   8.290.510   22.838.089   10.940.914    7.125.196   5.813.265   6.695.315
outside window (P1)         299.503       1.263       12.537      16.108      188.055      33.927        18.819      15.363      13.426
Incorrect Prices (P2)       0             0           0           0           0            0             0           0           0
non-NASDAQ Exchanges (P3)   55.944.226    2.277.898   5.229.284   6.458.787   17.260.936   8.594.100     5.701.799   4.880.917   5.349.262
Corrected Trades (T1)       5.416         380         914         354         1.138        496           306         1.134       694
Abnormal Trades (T2)        1.121.824     25.709      42.298      50.077      407.081      115.028       62.657      80.882      338.092
Double Exchanges (T3)       10.944.478    453.184     1.372.950   1.131.670   4.110.637    1.587.533     985.161     606.265     697.078
Retained Observations       4.104.486     355.462     753.516     633.514     870.242      609.830       356.454     228.704     296.763

Table 8: Overview of Data cleaning and Filtration process 2007-2014.

Most data are lost during the execution of P3, where data from other exchanges are removed. This is no big surprise, as BP is listed on several exchanges. Besides this, step T3, where trades that took place during the same second are summarized by their median, accounts for a large reduction. This shows that the data really are high-frequency. Very few corrected trades were found, and there were no incorrect prices at all.

B Descriptive Statistics

B.1 Returns

When one looks at the plotted 'Open to Close' and 'Close to Open' returns, there are clearly periods where the variance is larger, i.e. where the peaks are higher. The most remarkable periods of higher variance correspond to the shocks in 2008 and 2010 that were discussed earlier.

Figure 5: The development of returns on BP shares between 2007 and 2015 shows two large shocks.

Table 9 presents the descriptive statistics of the O2C and C2C returns. There are no large differences between the two classes of returns.
Descriptive Statistic     O2C         C2C
number of observations    1987.0      1986.0
mean                      0.0279      -0.0247
standard deviation        1.3657      2.0319
variance                  1.8651      4.1285
minimum                   -15.1994    -17.1814
first quartile            -0.5724     -0.8721
median                    0.0436      0.0597
third quartile            0.6462      0.8941
maximum                   10.3397     14.7646
kurtosis                  14.7165     11.7923
skewness                  -0.6964     -0.4902

Table 9: Summary of descriptive statistics of BP O2C and C2C returns.

B.2 2009-2010

Figure 6 takes a closer look at the price and return development in the year 2010. It is remarkable that it takes a long time before the prices and returns react to the big shock. Although the disaster happened in April 2010, the serious decline in prices and the increased volatility in returns start in May 2010. After the big shock, the price level never recovers to the value before the Deep Water Horizon accident, even years after it, as is shown in Figure 1.

Figure 6: BP shares in 2010, large shock around the Deep Water Horizon spill ('10/04/20). (a) Return development. (b) Price development.

Focussing on the descriptive statistics of the prices during the years 2009 up to and including 2011, some interesting differences can be seen. The number of trades executed during 2010 is very high. Also, the variance in 2010 is much higher than in the other years. These descriptives confirm that 2010 was a highly unstable year, as stated above.

Descriptive Statistic     2009       2010       2011
number of observations    633514     870242     609830
mean                      46.9046    42.1806    42.7359
standard deviation        6.5859     9.2636     3.4955
variance                  43.3743    85.8152    12.2185
minimum                   33.7       26.75      33.62
first quartile            41.24      35.49      39.79
median                    46.99      40.56      43.465
third quartile            51.88      49.26      45.44
maximum                   60.0       62.38      49.5
kurtosis                  -1.009     -0.8732    -0.7352
skewness                  0.1548     0.4396     -0.4312

Table 10: Summary of descriptive statistics of BP prices of 2009, 2010, 2011.
In order to test whether the data are normally distributed, a Jarque-Bera test was performed on the prices as well as the returns for the total dataset and for the data in the years 2009 up to and including 2011. The results of these JB tests are given in Table 11. The null hypothesis of this test assumes a normal distribution: skewness and excess kurtosis are 0. The null hypothesis of a normal distribution is clearly rejected in all cases. Returns in 2010 have a much higher JB-statistic than returns in 2009 and 2011.

                 Period   JB Test Statistic   P-Value      Conclusion: Normal Distribution
Prices           Total    441,015.5           0.0          No
                 2009     29,404.7            0.0          No
                 2010     55,678.5            0.0          No
                 2011     32,628.8            0.0          No
Returns (C2C)    Total    11,586.7            0.0          No
                 2009     23.1                9.653e-06    No
                 2010     1496.3              0.0          No
                 2011     10.2                0.006        No

Table 11: Results of the JB test for Prices and Returns of all years and 2009-2011. Significance level: 0.01.

C Kernel Models

With Kernel models we can estimate the integrated volatility over a given time interval [0,T]. The integrated volatility is defined as

\[ IV_T = \int_0^T \sigma(s)^2 \, ds \tag{14} \]

and it is the population measure of the actual return variance over a certain time interval. If we let $P_t$ denote the price process of an asset and we suppose that $X_t = \log(P_t)$ follows an Itô process, so that we have

\[ dX(t) = \mu_t \, dt + \sigma_t \, dB_t \tag{15} \]

where, according to Zhang et al. (2005), $B_t$ is a standard Brownian Motion, $\mu_t$ is the drift coefficient and $\sigma_t^2$ is the instantaneous variance of the returns process, then the quadratic variation is given by

\[ QV_T = \operatorname{plim}_{N \to \infty} \sum_{i=1}^{N} (X(t_i) - X(t_{i-1}))^2 \tag{16} \]

where $0 = t_0 < t_1 < t_2 < \ldots < t_N = T$ and $\sup_i (t_i - t_{i-1}) \to 0$ as $N \to \infty$. In Barndorff-Nielsen & Shephard (2004) it is shown that the quadratic variation is equal to the integrated volatility. Hence, it is shown that

\[ \operatorname{plim}_{N \to \infty} \sum_{i=1}^{N} (X(t_i) - X(t_{i-1}))^2 = \int_0^T \sigma(s)^2 \, ds \tag{17} \]

This knowledge brings us to the first kernel model that we have implemented, namely the realized volatility.
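The convergence in (17) can be illustrated numerically. The sketch below is not part of the original study: it simulates a driftless log-price with constant volatility (an assumption made purely for illustration, so that the integrated volatility equals sigma^2 T exactly) and compares the sum of squared increments with that value. All function names are hypothetical.

```python
import math
import random


def simulate_log_price(n, sigma, T=1.0, seed=0):
    """Driftless Brownian log-price with constant volatility on an equidistant grid."""
    rng = random.Random(seed)
    dt = T / n
    x = [0.0]
    for _ in range(n):
        x.append(x[-1] + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return x


def sum_squared_increments(x):
    """Sample analogue of the quadratic variation in (16)."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


sigma, T = 0.2, 1.0
integrated_variance = sigma ** 2 * T      # closed form for constant sigma
path = simulate_log_price(n=20000, sigma=sigma, T=T, seed=0)
realized = sum_squared_increments(path)
print(realized, integrated_variance)
```

With 20000 increments the relative sampling error of the sum of squared increments is roughly sqrt(2/n), about 1%, so the two printed numbers should be close. Note that this idealized setting contains no market microstructure noise.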
Realized Volatility

The realized volatility is the sample analogue of the quadratic variation. For a given time interval [0,T], it is defined as

\[ RV_T = \sum_{i=1}^{N} (X_{t_i} - X_{t_{i-1}})^2 \tag{18} \]

where $0 = t_0 < t_1 < t_2 < \ldots < t_N = T$. It can be shown that when $N \to \infty$, $RV_T \to QV_T$. One would therefore expect that calculating the realized volatility over all the observed values in the time interval [0,T] gives the best estimate of the integrated volatility over that interval. Let us denote this measure by $RV_T^{(all)}$. Zhang et al. (2005) show that $RV_T^{(all)}$ is actually not a good estimator. Due to the microstructure of the market we do not observe the true price process $X_{t_i}$, but the process $Y_{t_i} = X_{t_i} + \epsilon_{t_i}$ instead. We can interpret the market microstructure noise $\epsilon_{t_i}$ as observation error. Zhang et al. (2005) show that, because of the market microstructure, the expectation of the realized volatility $RV_T^{(all)}$ calculated on the observed data and conditioned on the true, latent, log price process X, is given by

\[ E([Y, Y]_T \mid X \text{ process}) = [X, X]_T + 2nE\epsilon^2 \tag{19} \]

and hence that our estimator is biased. What is more, as can be seen in (19), this bias increases linearly with the sample frequency n. Zhang et al. (2005) therefore propose to sample less frequently. The realized volatility obtained with this sparse sample is denoted by $RV_T^{(sparse)}$ and has a lower bias than $RV_T^{(all)}$.

Two Time Scales Realized Volatility

In Zhang et al. (2005) it is pointed out that $RV_T^{(sparse)}$, although it has a lower bias than $RV_T^{(all)}$, does have a bigger error caused by discretization. Remember that (17) still holds and that by sampling sparsely our estimate will be further removed from the limit. It is shown in Zhang et al. (2005) that this error has an effect on the variance of the estimator: the bigger the discretization error, the bigger the variance of the estimator. Hence, $RV_T^{(sparse)}$ has a bigger variance than $RV_T^{(all)}$.

To alleviate this problem, Zhang et al. (2005) propose to take the average of estimators created on different subgrids. To do this, the full grid, i.e. all the observations included in a time interval [0,T], is divided into K nonoverlapping subgrids. Hence, we have

\[ \mathcal{G} = \bigcup_{k=1}^{K} \mathcal{G}^{(k)} \tag{20} \]

where each subgrid $\mathcal{G}^{(k)}$ is constructed by taking the $t_{k-1}$ sample point and then taking every Kth sample point until T is reached. We then obtain the realized volatility on each of these K subgrids. The realized volatility obtained in this way on the kth subgrid is denoted by $RV_T^{(k)}$. The final estimator is then obtained by taking the average over the realized volatilities calculated on the K subgrids. Hence, we have

\[ RV_T^{(avg)} = \frac{1}{K} \sum_{k=1}^{K} RV_T^{(k)} \tag{21} \]

In this way, the benefit of not sampling too often is retained, while at the same time the variance of the estimator is decreased. However, it is pointed out in Zhang et al. (2005) that this estimator is still biased. The bias is equal to $2\bar{n}E\epsilon^2$, where $\bar{n}$ is the average size of the K subgrids.

To overcome this problem, Zhang et al. (2005) propose to correct the bias of $RV_T^{(avg)}$. Zhang et al. (2005) show that $\frac{1}{2n} RV_T^{(all)}$ is a consistent estimator for the variance of the error term. For a fixed true return process X,

\[ n^{1/2} \left( \frac{1}{2n} RV_T^{(all)} - E\epsilon^2 \right) \to N(0, E\epsilon^4) \tag{22} \]

Here, n is the number of observations in the full grid. Then, by taking

\[ RV_T^{(avg)} - \frac{\bar{n}}{n} RV_T^{(all)} \tag{23} \]

we obtain an unbiased estimator. Let us denote this measure by $TSRV_T$. In Zhang et al.
(2005) a small-sample adjustment is given by $(1 - \bar{n}/n)^{-1}\, TSRV_T$.

Bipower Variation Model

Barndorff-Nielsen & Shephard (2004) state that if the log-price of an asset, $X_t$, is a member of the Brownian semimartingale plus jump (BSMJ) class, so that

\[ X_t = \int_0^t a_s\,ds + \int_0^t \sigma_s\,dW_s + Z_t, \qquad Z_t = \sum_{j=1}^{N_t} c_j \tag{24} \]

where $Z_t$ is a jump process, with $N$ a counting process that is finite for every $t$ and the $c_j$ nonzero random variables, then the realized volatility converges in probability to the sum of the integrated volatility and the quadratic variation of the jump process. To still be able to estimate the integrated volatility, Barndorff-Nielsen & Shephard (2004) introduce the Bipower Variation Model. The Bipower Variation process of order (1,1) on the interval $[0,T]$ is given by

\[ BV_T = \sum_{j=0}^{N} |r_{j,T}|\,|r_{j-1,T}| \tag{25} \]

where $r_{j,T} = X_{j,T} - X_{j-1,T}$ is the return over the $j$th period on the time interval $[0,T]$. Barndorff-Nielsen & Shephard (2004) show that, in probability,

\[ BV_T \to \mu_1^2 \int_0^T \sigma^2(s)\,ds \tag{26} \]

where $\mu_1 = E|u| = \sqrt{2}/\sqrt{\pi}$ and $u \sim N(0,1)$. The following estimator of the integrated volatility is therefore obtained:

\[ \mu_1^{-2} BV_T = \mu_1^{-2} \sum_{j=0}^{N} |r_{j,T}|\,|r_{j-1,T}| \tag{27} \]

\[ \to \int_0^T \sigma^2(s)\,ds \tag{28} \]

Realized Kernel

Barndorff-Nielsen et al. (2009) propose another kernel-based estimator of the integrated variation, namely the realized kernel. They also assume that the price process, $X$, follows a Brownian semimartingale plus jump process. Hence, we have

\[ X_t = \int_0^t a_u\,du + \int_0^t \sigma_u\,dW_u + J_t \tag{29} \]

where

\[ J_T = \sum_{i=1}^{N_T} C_i \tag{30} \]

is a finite activity jump process. Here $N_T$ is the number of jumps that occurred in the time interval $[0,T]$, and finite activity means that $N_T < \infty$ for any $T$. Barndorff-Nielsen et al. (2009) also state that we do not observe the true price process, but that we instead observe

\[ Y_{t_i} = X_{t_i} + \epsilon_{t_i} \tag{31} \]

where $\epsilon$ again represents the market microstructure noise. In Barndorff-Nielsen et al.
(2009) they are interested in the quadratic variation of $X$, given by

\[ \int_0^T \sigma_u^2\,du + \sum_{i=1}^{N_T} C_i^2 \tag{32} \]

Here, $\int_0^T \sigma_u^2\,du$ is the integrated variance. Barndorff-Nielsen et al. (2009) propose the following estimator of the quadratic variation:

\[ K(X) = \sum_{h=-H}^{H} k\!\left(\frac{h}{H+1}\right)\gamma_h \tag{33} \]

where $\gamma_h = \sum_{j=|h|+1}^{n} x_j x_{j-|h|}$ and $k(x)$ is given by

\[ k(x) = \begin{cases} 1 - 6|x|^2 + 6|x|^3, & \text{if } 0 \le |x| \le \tfrac{1}{2} \\ 2(1 - |x|)^3, & \text{if } \tfrac{1}{2} \le |x| \le 1 \\ 0, & \text{if } |x| > 1 \end{cases} \]

Here $x_j$ is the $j$th high frequency return, calculated over the interval $[t_{j-1}, t_j]$. Barndorff-Nielsen et al. (2009) state that

\[ H = c\,\xi^{4/5} n^{3/5} \tag{34} \]

where $\xi^2 = \omega^2 / \sqrt{T \int_0^T \sigma_u^4\,du}$, $c = 3.5134$ and $n$ is the number of intra-day returns in a day. Furthermore, $H$ is rounded up to the nearest integer. Barndorff-Nielsen et al. (2009) propose a non-trivial method for estimating $\xi^2$, which is the method implemented in this paper.

Realized Measure Paths over the Full Data Set

Figure 7 plots the paths of the 4 realized measures that were produced in this study over the returns. This is done over the whole period 2007-2015.

Figure 7: Realized Measure paths over 2007-2014.

Looking at the full sample, the RV seems much more prone to price jumps than the other measures, and hence shows some strong jumps in volatility as well. The other realized measures all seem to approximate the stock's volatility decently.

D Realized Garch

Hansen et al. (2012) introduce a framework that combines a GARCH structure for returns with an integrated model for realized measures of volatility. The models within this framework are called Realized GARCH models. The general structure of the model is given by

\[ \begin{aligned} r_t &= \sqrt{h_t}\, z_t \\ h_t &= v(h_{t-1}, \dots, h_{t-p}, x_{t-1}, \dots, x_{t-q}) \\ x_t &= m(h_t, z_t, u_t) \end{aligned} \tag{35} \]

where $r_t$ is the return, $x_t$ is a realized measure of volatility, $z_t \sim i.i.d.(0,1)$, $u_t \sim i.i.d.(0,\sigma_u^2)$, and $h_t = var(r_t \mid F_{t-1})$ with $F_t = \sigma(r_t, x_t, r_{t-1}, x_{t-1}, \dots)$.
The first and second equations in (35) are called the return equation and the GARCH equation respectively. The third equation is the measurement equation and relates the observed realized measure to the latent volatility. Hence, the Realized GARCH model fully specifies the dynamic properties of both returns and the realized measure (Hansen et al., 2012). In our paper we implemented a Realized GARCH with the log-linear specification as defined in Hansen et al. (2012). In general, the model reads

\[ \begin{aligned} r_t &= \sqrt{h_t}\, z_t \\ \log(h_t) &= \omega + \sum_{i=1}^{p} \beta_i \log(h_{t-i}) + \sum_{j=1}^{q} \gamma_j \log(x_{t-j}) \\ \log(x_t) &= \xi + \phi \log(h_t) + \tau(z_t) + u_t \end{aligned} \tag{36} \]

where $z_t = r_t/\sqrt{h_t} \sim i.i.d.(0,1)$, $u_t \sim i.i.d.(0,\sigma_u^2)$ and $\tau(z_t)$ is the leverage function. In our paper both $p$ and $q$ are equal to one. Furthermore, our leverage function is, as suggested in Hansen et al. (2012), defined as

\[ \tau(z_t) = \tau_1 z_t + \tau_2 (z_t^2 - 1) \tag{37} \]

In this way it is, according to Hansen et al. (2012), possible to generate an asymmetric response in volatility to return shocks. Furthermore, in estimating the Realized GARCH models we use the following (back)transformations between the original parameters and the parameter set $\theta$, in order to use unrestricted optimization.

\[ \begin{array}{ll}
\omega = \theta_1 & \theta_1 = \omega \\[4pt]
\beta = \dfrac{\exp\theta_3}{\exp\theta_2 + \exp\theta_3 + 1} & \theta_2 = \ln\dfrac{\phi\gamma}{1 - \phi\gamma - \beta} \\[8pt]
\gamma = \exp(-\theta_5)\,\dfrac{\exp\theta_2}{\exp\theta_2 + \exp\theta_3 + 1} & \theta_3 = \ln\dfrac{\beta}{1 - \phi\gamma - \beta} \\[8pt]
\xi = \theta_4 & \theta_4 = \xi \\
\phi = \exp\theta_5 & \theta_5 = \ln\phi \\
\sigma_u = \exp\theta_6 & \theta_6 = \ln\sigma_u \\
\tau_1 = \theta_7 & \theta_7 = \tau_1 \\
\tau_2 = \theta_8 & \theta_8 = \tau_2 \\[4pt]
\text{Realized GARCH-T:} & \\
\nu_1 = \exp\theta_9 + 2 & \theta_9 = \ln(\nu_1 - 2) \\
\nu_2 = \exp\theta_{10} + 2 & \theta_{10} = \ln(\nu_2 - 2) \\[4pt]
\text{Realized GARCH-GED:} & \\
\nu_1 = \tanh\theta_9 + 1 & \theta_9 = \operatorname{arctanh}(\nu_1 - 1) \\
\nu_2 = \tanh\theta_{10} + 1 & \theta_{10} = \operatorname{arctanh}(\nu_2 - 1)
\end{array} \]

E GAS

E.1 Introduction

The generalized autoregressive score (GAS) model belongs to the class of observation driven models. Creal et al. (2013) argue that GAS models have some advantages such as, but not limited to, straightforward likelihood evaluation, since the likelihood is available in closed form, and straightforward extensions to asymmetric and long memory dynamics. Creal et al.
(2013) also state that since a GAS model is based on the score function, it exploits the complete density structure rather than only the means and higher moments. This sets the GAS model apart from other observation driven models. In the GAS model the updating equation is given by equation (38), where $\omega$ is a vector of constants, $A_i$ and $B_j$ are coefficient matrices and $s_t$ is a function of past data, $s_t = s_t(y_t, \sigma_t^2, F_t; \theta)$. Here the returns are given by $y_t$, the time-varying parameter by $\sigma_t^2$, the $\sigma$-field by $F_t$, and the vector of static parameters by $\theta$. In this paper the focus is on a GAS(1,1) model (i.e. both $p$ and $q$ are set to 1). This reduces the model to equation (39).

\[ \sigma_{t+1}^2 = \omega + \sum_{i=1}^{p} A_i s_{t-i+1} + \sum_{j=1}^{q} B_j \sigma_{t-j+1}^2 \tag{38} \]

\[ \sigma_{t+1}^2 = \omega + A s_t + B \sigma_t^2 \tag{39} \]

Equation (40) details the scaled score $s_t$. Here $S(\cdot)$ is a matrix function, the scaling matrix.

\[ s_t = S_t \cdot \nabla_t, \qquad \nabla_t = \frac{\partial \ln p(y_t \mid \sigma_t^2, F_t; \theta)}{\partial \sigma_t^2}, \qquad S_t = S(t, \sigma_t^2, F_t; \theta) \tag{40} \]

Creal et al. (2013) explain how the score defines a steepest ascent direction for improving the local fit in terms of the likelihood or density at time $t$, given the current position of the parameter $\sigma_t^2$. This gives the natural direction for updating the parameter. The exploitation of the full density structure by the GAS model introduces new transformations of the data that can be used to update the time-varying parameter $\sigma_t^2$. Furthermore, Creal et al. (2013) stress that, via the choice of the scaling matrix $S_t$, the GAS model allows for additional flexibility in how the score is used for updating $\sigma_t^2$. Therefore, different choices for the scaling matrix $S_t$ result in different GAS models.
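As a minimal sketch of the GAS(1,1) recursion in equation (39), the code below implements the Student-t update that is stated as equation (41) in Section E.2. The function names, the parametrisation through `inv_nu` (that is, 1/ν) and all numerical values are illustrative assumptions, not the implementation or the estimates behind the reported results.

```python
def gas_t_update(sigma2, y, omega, a, b, inv_nu):
    """One step of the Student-t GAS(1,1) update, equation (41).

    inv_nu is 1/nu; inv_nu = 0 gives the Gaussian case, i.e. a GARCH(1,1)-type
    recursion omega + a * (y^2 - sigma2) + b * sigma2.
    """
    weight = (1.0 + inv_nu) / (1.0 + inv_nu * y * y / sigma2)
    scaled_score = (1.0 + 3.0 * inv_nu) * (weight * y * y - sigma2)
    return omega + a * scaled_score + b * sigma2


def gas_t_filter(returns, omega, a, b, inv_nu, sigma2_init):
    """Run the update over a return series; element t is the variance for period t."""
    path = [sigma2_init]
    for y in returns:
        path.append(gas_t_update(path[-1], y, omega, a, b, inv_nu))
    return path


# Hypothetical parameter values, chosen only to show the effect of an outlier.
omega, a, b = 0.03, 0.2, 0.95
gauss_next = gas_t_update(1.0, 10.0, omega, a, b, inv_nu=0.0)
t_next = gas_t_update(1.0, 10.0, omega, a, b, inv_nu=1.0 / 8.0)
path = gas_t_filter([0.5, -1.2, 10.0, 0.3], omega, a, b, 1.0 / 8.0, 1.0)

print(gauss_next, t_next)
```

With these hypothetical values, a return of 10 pushes the next variance far higher in the Gaussian case than under the Student-t update with ν = 8, which is the down-weighting of large realisations discussed in Section E.2.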
There is no clear theory available on how to scale the score, but the literature gives three main suggestions, namely:

• Unit scaling: $S_t = I$
• Inverse Fisher information matrix scaling: $S_t = (E_{t-1}[\nabla_t \nabla_t'])^{-1}$
• Square root inverse Fisher information matrix scaling: $S_t = (E_{t-1}[\nabla_t \nabla_t'])^{-1/2}$

E.2 General framework GAS student t

When a Gaussian distribution is used to estimate a GAS(1,1) model with the basic model $y_t = \sigma_t \epsilon_t$, it can be shown that the model reduces to a GARCH(1,1). A proof can be found in Creal et al. (2013). Therefore, this paper focuses on Student-t and GED distributed $\epsilon_t$. For a GAS(1,1) model with Student-t distribution the updating equation is given in equation (41).

\[ \sigma_{t+1}^2 = \omega + A_1 \cdot (1 + 3\nu^{-1}) \cdot \left( \frac{1 + \nu^{-1}}{1 + \nu^{-1} y_t^2/\sigma_t^2}\, y_t^2 - \sigma_t^2 \right) + B_1 \sigma_t^2 \tag{41} \]

One needs to keep in mind that when $\nu^{-1}$ is equal to zero this equation reduces to the Gaussian case of equation (39), that is, a GARCH(1,1), as stated by Creal et al. (2013). An important distinction between equation (41) and a GARCH(1,1) with Student-t distribution is that the denominator in the GAS(1,1) model ensures that large realisations of $y_t$ have a smaller effect on the variance when $\nu$ is finite.

F Error distributions

When estimating the models described in section 5, one needs to note that $\epsilon_t$ can be distributed in various ways. In this paper, the following three error distributions are investigated: the Gaussian distribution, the Student-t distribution, and the Generalized error distribution (GED). The reasons for these specific distributions are as follows. The Gaussian distribution is the easiest and most standard choice for $\epsilon_t$. It has some drawbacks, however: most financial data is not normally distributed. In most, if not all, cases one encounters fat tails in financial data.
To deal with the fact that empirical distributions are commonly more leptokurtic, the Student-t distribution is investigated. This distribution is a lot better at handling such data. Finally, the GED is useful because its tail-thickness parameter governs the kurtosis, which is related to fat tails. As stated by Nelson (1991), this distribution is able to capture both fat and skinny tails, depending on the parameter value. The GED is symmetric, however, so it does not capture skewness, which accounts for asymmetric effects. Distributions such as the skewed Student-t distribution are not used in this paper since the close-to-close and open-to-close data is not skewed. As described above, in financial data these two concepts, kurtosis and skewness, are of major importance when one wants to properly estimate a model which captures all the properties of the data.

F.1 Gaussian

The Gaussian distribution is one of the most standard ways to model an error distribution. In this approach, the probability density function (pdf) is given by equation (42), where $\mu_t$ is the conditional mean of the observation equation. The log likelihood is given by equation (43).

\[ f(y_t \mid \mu_t, \sigma_t^2) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left( -\frac{(y_t - \mu_t)^2}{2\sigma_t^2} \right) \tag{42} \]

\[ L_t(Y_T, \theta) = \sum_{t=2}^{T} \left[ -\frac{1}{2}\log(2\pi\sigma_t^2) - \frac{1}{2\sigma_t^2}\,(y_t - E[y_t \mid y_{t-1}])^2 \right] \tag{43} \]

One has to keep in mind that for this distribution the kurtosis is equal to 3 and the skewness is equal to 0, as stated by Casella & Berger (2002).

F.2 Student-t

For the Student-t distribution there are some restrictions: the degrees of freedom parameter, $\nu$, must be larger than 2 for the distribution to have a variance (Casella & Berger, 2002). The PDF of a Student-t distributed error is given by equation (44). The log likelihood is given by equation (45) (Hoogerheide, n.d.). A value $\nu < 2$ implies extremely fat tails in the distribution of the data. When $2 < \nu \le 3$, the skewness is not defined. $\nu \to \infty$ implies that the Student-t distribution converges to a Gaussian distribution.
Most common for financial data is a $\nu$ between 3 and 5; however, this depends on the data, as $\nu$ depends on the stock, the period and the model for the variance.

\[ f(y_t \mid \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} \left((\nu - 2)\pi\sigma_t^2\right)^{-1/2} \left( 1 + \frac{(y_t - \mu_t)^2}{(\nu - 2)\sigma_t^2} \right)^{-\frac{\nu+1}{2}} \tag{44} \]

\[ \log p(r_t) = \log\Gamma\!\left(\frac{\nu+1}{2}\right) - \log\Gamma\!\left(\frac{\nu}{2}\right) - \frac{1}{2}\log\left((\nu - 2)\pi\sigma_t^2\right) - \frac{\nu+1}{2}\log\left( 1 + \frac{(y_t - \mu_t)^2}{(\nu - 2)\sigma_t^2} \right) \tag{45} \]

F.3 Generalized error distribution

For the GED one needs to focus on the $\nu$ parameter. This $\nu$ is a tail-thickness parameter (Nelson, 1991), and the properties of the distribution strongly depend on its value. If $\nu$ is equal to 1, the distribution is a Laplace distribution. When $\nu$ is equal to 2, the GED becomes a Gaussian distribution (Czyżycki, 2013). A $\nu$ between 0 and 2 results in a kurtosis larger than 3, whereas $\nu > 2$ results in a kurtosis smaller than 3 (Hoogerheide, n.d.). The PDF of the GED is given by equation (46), in which $\lambda$ is a function of $\nu$. The log likelihood is given by equation (47), which uses the same $\lambda(\nu)$ as equation (46).

\[ p(r_t) = \left[ 2^{(1+1/\nu)}\, \Gamma\!\left(\frac{1}{\nu}\right) \lambda \right]^{-1} (\sigma_t^2)^{-1/2}\, \nu \cdot \exp\left( -\frac{1}{2} \left| \frac{r_t - \mu_t}{\lambda\,(\sigma_t^2)^{1/2}} \right|^{\nu} \right), \qquad \lambda = \sqrt{\frac{\Gamma(1/\nu)}{2^{2/\nu}\,\Gamma(3/\nu)}} \tag{46} \]

\[ \log p(r_t) = -\log\left( 2^{(1+1/\nu)}\, \Gamma\!\left(\frac{1}{\nu}\right) \lambda \right) - \frac{1}{2}\log(\sigma_t^2) + \log(\nu) - \frac{1}{2} \left| \frac{r_t - \mu_t}{\lambda\,(\sigma_t^2)^{1/2}} \right|^{\nu} \tag{47} \]

F.4 Derivative

This part of the appendix contains the derivation of the derivative of the log density with respect to $\sigma_t^2$ for the Student-t distribution and the Generalized Error Distribution. As already explained, these derivatives are used in the GAS models within the score. A full explanation of the score can be found in Appendix E.
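The log densities in equations (45) and (47), and the closed-form derivatives that are the end results of Sections F.4.1 and F.4.2 (equations (54) and (61)), can be checked against each other numerically. The following is a minimal sketch under stated assumptions: the function names, the finite-difference step and the test point `r`, `sigma2` are hypothetical, and `r` stands for the demeaned return $r_t - \mu_t$.

```python
import math


def student_t_logpdf(r, sigma2, nu):
    """Equation (45); r is the demeaned return, nu > 2."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log((nu - 2) * math.pi * sigma2)
            - (nu + 1) / 2 * math.log(1 + r * r / ((nu - 2) * sigma2)))


def ged_lambda(nu):
    """lambda(nu) as defined next to equation (46)."""
    return math.sqrt(math.exp(math.lgamma(1 / nu) - math.lgamma(3 / nu)) / 2 ** (2 / nu))


def ged_logpdf(r, sigma2, nu):
    """Equation (47); r is the demeaned return."""
    lam = ged_lambda(nu)
    return (-math.log(2 ** (1 + 1 / nu) * math.gamma(1 / nu) * lam)
            - 0.5 * math.log(sigma2) + math.log(nu)
            - 0.5 * abs(r / (lam * math.sqrt(sigma2))) ** nu)


def student_t_score(r, sigma2, nu):
    """Equation (54): derivative of (45) with respect to sigma2."""
    return -((nu - 2) * sigma2 - nu * r * r) / (2 * sigma2 * ((nu - 2) * sigma2 + r * r))


def ged_score(r, sigma2, nu):
    """Equation (61): derivative of (47) with respect to sigma2."""
    lam = ged_lambda(nu)
    return nu * abs(r) ** nu * sigma2 ** (-nu / 2) / (4 * lam ** nu * sigma2) - 1 / (2 * sigma2)


def central_difference(f, x, h=1e-6):
    """Numerical derivative used only as an independent check."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gaussian_logpdf(r, sigma2):
    return -0.5 * math.log(2 * math.pi * sigma2) - r * r / (2 * sigma2)


r, sigma2 = 1.3, 0.8  # hypothetical test point
t_gap = abs(student_t_score(r, sigma2, 7.0)
            - central_difference(lambda s: student_t_logpdf(r, s, 7.0), sigma2))
ged_gap = abs(ged_score(r, sigma2, 1.5)
              - central_difference(lambda s: ged_logpdf(r, s, 1.5), sigma2))
gauss_gap = abs(ged_logpdf(r, sigma2, 2.0) - gaussian_logpdf(r, sigma2))

print(t_gap, ged_gap, gauss_gap)
```

All three gaps should be numerically negligible: the first two confirm that the analytic derivatives agree with finite differences of the log densities, and the third that the GED with ν = 2 coincides with the Gaussian density, as stated in Section F.3.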
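F.4.1 Student-t

In this section the derivative of the log likelihood for a Student-t distribution is derived step by step, writing $r = y_t - \mu_t$:

\[ \frac{\partial \log p(r_t)}{\partial \sigma_t^2} = \frac{\partial}{\partial \sigma_t^2}\left[ \log\Gamma\!\left(\frac{\nu+1}{2}\right) - \log\Gamma\!\left(\frac{\nu}{2}\right) - \frac{1}{2}\log\left((\nu-2)\pi\sigma_t^2\right) - \frac{\nu+1}{2}\log\left(1 + \frac{(y_t-\mu_t)^2}{(\nu-2)\sigma_t^2}\right) \right] \tag{48} \]

\[ = -\frac{1}{2}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\ln\left(\pi(\nu-2)\sigma_t^2\right)\right] - \frac{\nu+1}{2}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\ln\left(\frac{r^2}{(\nu-2)\sigma_t^2} + 1\right)\right] \tag{49} \]

\[ = -\frac{1}{2}\cdot\frac{1}{\pi(\nu-2)\sigma_t^2}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\pi(\nu-2)\sigma_t^2\right] - \frac{\nu+1}{2}\cdot\frac{1}{\frac{r^2}{(\nu-2)\sigma_t^2}+1}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\frac{r^2}{(\nu-2)\sigma_t^2} + 1\right] \tag{50} \]

\[ = -\frac{\pi(\nu-2)\cdot\frac{\partial}{\partial \sigma_t^2}[\sigma_t^2]}{2\pi(\nu-2)\sigma_t^2} - \frac{\left(\frac{r^2}{\nu-2}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\frac{1}{\sigma_t^2}\right] + \frac{\partial}{\partial \sigma_t^2}[1]\right)(\nu+1)}{2\left(\frac{r^2}{(\nu-2)\sigma_t^2}+1\right)} \tag{51} \]

\[ = -\frac{1}{2\sigma_t^2} - \frac{\left(-\frac{r^2}{(\nu-2)\sigma_t^4}\right)(\nu+1)}{2\left(\frac{r^2}{(\nu-2)\sigma_t^2}+1\right)} \tag{52} \]

\[ = \frac{r^2(\nu+1)}{2(\nu-2)\sigma_t^4\left(\frac{r^2}{(\nu-2)\sigma_t^2}+1\right)} - \frac{1}{2\sigma_t^2} \tag{53} \]

\[ = -\frac{(\nu-2)\sigma_t^2 - r^2\nu}{2\sigma_t^2\left((\nu-2)\sigma_t^2 + r^2\right)} \tag{54} \]

F.4.2 GED

For the GED the derivation proceeds in the same way, writing $r = r_t - \mu_t$:

\[ \frac{\partial \log p(r_t)}{\partial \sigma_t^2} = \frac{\partial}{\partial \sigma_t^2}\left[ -\log\left(2^{(1+1/\nu)}\,\Gamma\!\left(\frac{1}{\nu}\right)\lambda\right) - \frac{1}{2}\log(\sigma_t^2) + \log(\nu) - \frac{1}{2}\left|\frac{r_t-\mu_t}{\lambda\,(\sigma_t^2)^{1/2}}\right|^{\nu} \right] \tag{55} \]

\[ = -\frac{1}{2}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\ln(\sigma_t^2)\right] - \frac{|r|^{\nu}}{2|\lambda|^{\nu}}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\left(\frac{1}{\sqrt{\sigma_t^2}}\right)^{\nu}\right] \tag{56} \]

\[ = -\frac{1}{2\sigma_t^2} - \frac{|r|^{\nu}}{2|\lambda|^{\nu}}\cdot\nu\left(\frac{1}{\sqrt{\sigma_t^2}}\right)^{\nu-1}\cdot\frac{\partial}{\partial \sigma_t^2}\left[\frac{1}{\sqrt{\sigma_t^2}}\right] \tag{57} \]

\[ = -\frac{1}{2\sigma_t^2} - \frac{|r|^{\nu}\,\nu\left(\frac{1}{\sqrt{\sigma_t^2}}\right)^{\nu-1}}{2|\lambda|^{\nu}}\cdot\left(-\frac{1}{2}\right)(\sigma_t^2)^{-\frac{1}{2}-1} \tag{58} \]

\[ = \frac{|r|^{\nu}\,\nu\left|\frac{1}{\sqrt{\sigma_t^2}}\right|^{\nu-1}(\sigma_t^2)^{-\frac{3}{2}}}{4|\lambda|^{\nu}} - \frac{1}{2\sigma_t^2} \tag{59} \]

\[ = \frac{|r|^{\nu}\,\nu\left|\frac{1}{\sqrt{\sigma_t^2}}\right|^{\nu-2}}{4|\lambda|^{\nu}\sigma_t^4} - \frac{1}{2\sigma_t^2} \tag{60} \]

\[ = \frac{|r|^{\nu}\,\nu\left|\frac{1}{\sqrt{\sigma_t^2}}\right|^{\nu}}{4|\lambda|^{\nu}\sigma_t^2} - \frac{1}{2\sigma_t^2} \tag{61} \]

G Results

G.1 Set A

In this part of the Appendix the parameter results of data Set A are depicted. This data set uses the values from 2007 up to and including 2011 as the validation sample. The test sample runs from 2012 till 2014.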
G.1.1 Estimates

Parameters: ω α β ν γ ρ δ. Each entry is a coefficient followed by its p-value in parentheses.

| Model | Coefficient (P-value) |
|---|---|
| GARCH | 0.1102 (0.000), 0.1238 (0.000), 0.8568 (0.000) |
| GARCH-GED | 0.1050 (0.002), 0.1047 (0.000), 0.8735 (0.000), 1.45438 (0.000) |
| GARCH-T | 0.096 (0.002), 0.090 (0.000), 0.899 (0.000), 7.414 (0.000) |
| RGARCH | 2.170 (0.000), 0.777 (0.000) |
| RGARCH-LEV | 0.129 (0.000), 0.072 (0.000), 0.895 (0.000) |
| GJR-GARCH | 0.108 (0.000), 0.040 (0.023), 0.879 (0.000) |
| GJR-GARCH-T | 0.100 (0.000), 0.030 (0.055), 0.899 (0.000), 8.001 (0.000), 0.092 (0.000) |
| GJR-GARCH-GED | 0.106 (0.000), 0.034 (0.052), 0.889 (0.000), 1.485 (0.000), 0.103 (0.000) |
| NAGARCH | 0.110 (0.000), 0.124 (0.000), 0.857 (0.000) |
| NAGARCH-T | 0.096 (0.002), 0.090 (0.000), 0.889 (0.000), 7.413 (0.000), 0.000 (0.500) |
| NAGARCH-GED | 0.105 (0.002), 0.105 (0.000), 0.874 (0.000), 1.454 (0.000), 0.000 (0.500) |
| GAS-T | 0.029 (0.000), 0.234 (0.000), 0.979 (0.000), 9.244 (0.000) |
| GAS-GED | 0.029 (0.006), 0.230 (0.000), 0.978 (0.000), 1.487 (0.000), 7.875 (0.000) |

ρ δ: -0.010 (0.140), -2.120 (0.001), 0.113 (0.000), 0.000 (0.500)

Table 12: Set A: Estimates of parameters of the different models.

| | Realized GARCH coeff | Realized GARCH pval | Realized GARCH-T coeff | Realized GARCH-T pval | Realized GARCH-GED coeff | Realized GARCH-GED pval |
|---|---|---|---|---|---|---|
| θ1 | 0.055 | 0.044 | 0.056 | 0.050 | 0.054 | 0.044 |
| θ2 | 2.695 | 0.000 | 2.691 | 0.000 | 2.709 | 0.000 |
| θ3 | 2.811 | 0.000 | 2.859 | 0.000 | 2.872 | 0.000 |
| θ4 | -0.083 | 0.425 | -0.087 | 0.500 | -0.086 | 0.418 |
| θ5 | -0.022 | 0.326 | -0.013 | 0.382 | -0.018 | 0.326 |
| θ6 | -0.807 | 0.000 | -0.807 | 0.000 | -0.807 | 0.000 |
| θ7 | 0.048 | 0.463 | 0.044 | 0.500 | 0.045 | 0.455 |
| θ8 | 0.022 | 0.471 | 0.023 | 0.500 | 0.022 | 0.478 |
| θ9 | | | 2.070 | 0.000 | 0.679 | 0.000 |
| θ10 | | | 2.386 | 0.000 | 0.768 | 0.000 |

Table 13: θ estimates for set A.
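The (back)transformations listed in Appendix D link the θ estimates of Table 13 to the Realized GARCH parameters of Table 14. A minimal sketch of this mapping is given below; the function names and the dictionary layout are assumptions made for illustration, and the θ values are the rounded Realized GARCH entries of Table 13.

```python
import math


def theta_to_params(theta, dist="gaussian"):
    """Map the unrestricted theta vector to the Realized GARCH parameters.

    theta holds theta_1..theta_8, plus theta_9 and theta_10 for dist "t" or "ged".
    """
    t1, t2, t3, t4, t5, t6, t7, t8 = theta[:8]
    denom = math.exp(t2) + math.exp(t3) + 1.0
    params = {
        "omega": t1,
        "beta": math.exp(t3) / denom,
        "gamma": math.exp(-t5) * math.exp(t2) / denom,
        "xi": t4,
        "phi": math.exp(t5),
        "sigma_u": math.exp(t6),
        "tau1": t7,
        "tau2": t8,
    }
    if dist == "t":      # both shape parameters restricted to be larger than 2
        params["nu1"] = math.exp(theta[8]) + 2.0
        params["nu2"] = math.exp(theta[9]) + 2.0
    elif dist == "ged":  # both shape parameters restricted to lie between 0 and 2
        params["nu1"] = math.tanh(theta[8]) + 1.0
        params["nu2"] = math.tanh(theta[9]) + 1.0
    return params


def params_to_theta(p):
    """Inverse mapping for the first eight parameters."""
    rest = 1.0 - p["phi"] * p["gamma"] - p["beta"]
    return [
        p["omega"],
        math.log(p["phi"] * p["gamma"] / rest),
        math.log(p["beta"] / rest),
        p["xi"],
        math.log(p["phi"]),
        math.log(p["sigma_u"]),
        p["tau1"],
        p["tau2"],
    ]


# Rounded Realized GARCH estimates for Set A, taken from Table 13.
theta_set_a = [0.055, 2.695, 2.811, -0.083, -0.022, -0.807, 0.048, 0.022]
params_set_a = theta_to_params(theta_set_a)
roundtrip = params_to_theta(params_set_a)

# Realized GARCH-T estimates for Set A, including theta_9 and theta_10.
theta_t = [0.056, 2.691, 2.859, -0.087, -0.013, -0.807, 0.044, 0.023, 2.070, 2.386]
params_t = theta_to_params(theta_t, dist="t")

for name, value in params_set_a.items():
    print(f"{name:8s} {value: .3f}")
```

Because the θ values in Table 13 are rounded to three decimals, the output agrees with Table 14 only up to rounding in the last digit.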
Page 26 Modeling and forecasting volatility Coefficient Realized GARCH Realized GARCH-T Realized GARCH-GED ω 0.055 0.056 0.054 β 0.513 0.526 0.525 γ 0.466 0.450 0.454 ξ -0.083 -0.087 -0.086 φ 0.979 0.987 0.982 σ 0.446 0.446 0.446 τ1 0.048 0.044 0.045 τ2 0.022 0.023 0.022 ν1 ν2 9.924 1.591 12.87 1.646 Table 14: Set A: Estimates of parameters of the different Realized GARCH models. 0,442 0,000 0,000 0,000 0,001 0,000 0,000 0,441 0,000 0,000 0,000 0,000 0,090 0,000 0,000 0,000 0,113 0,434 0,000 0,000 0,000 0,149 0,000 0,117 0,434 0,006 0,272 0,477 0,000 0,041 0,000 0,053 0,195 0,000 0,072 0,000 0,149 0,000 0,055 0,195 0,001 0,128 0,491 0,000 0,020 0,000 0,000 0,000 0,000 0,004 0,441 0,000 0,000 0,000 0,000 0,000 0,000 0,072 0,000 0,000 0,000 0,000 0,127 0,000 0,001 0,000 0,117 0,055 0,000 0,127 0,008 0,375 0,444 0,000 0,050 0,000 0,123 0,000 0,000 0,074 0,000 0,434 0,195 0,000 0,127 0,000 0,161 0,483 0,000 0,021 0,000 0,008 0,000 0,000 0,001 0,000 0,006 0,001 0,000 0,008 0,000 0,005 0,270 0,000 0,133 -G ED R C H -T ed ed G A R C H H G A A R C G ED R ea liz 0,004 0,001 0,074 0,000 0,001 0,000 0,072 0,004 0,001 0,074 0,001 0,089 0,436 0,000 0,020 R ea liz 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 ea liz ed 0,000 0,123 0,000 0,074 0,000 0,434 0,195 0,000 0,127 0,000 0,000 0,161 0,483 0,000 0,021 R 0,000 0,123 0,000 0,001 0,000 0,113 0,053 0,000 0,000 0,123 0,008 0,377 0,443 0,000 0,050 SG R G 0,000 0,000 0,000 0,004 0,442 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,072 0,000 0,000 G A H -L G EV JR -G A R C G H JR -G A R C G H JR -T -G A R C N H A -G G A ED R C H N A G A R C H N -T A G A R C H G -G A ED ST A R C R G A R C H G A R C H -G ED -T H G GARCH GARCH-T GARCH-GED RGARCH RGARCH-LEV GJR-GARCH GJR-GARCH-T GJR-GARCH-GED NAGARCH NAGARCH-T NAGARCH-GED GAS-T GAS-GED Realized GARCH Realized GARCH-T Realized GARCH-GED G A R C H DM-test results A R C G.1.2 0,000 0,377 0,161 0,000 0,089 0,000 0,272 0,128 0,000 0,375 0,161 0,005 
0,397 0,000 0,019 0,072 0,443 0,483 0,000 0,436 0,090 0,477 0,491 0,072 0,444 0,483 0,270 0,397 0,021 0,014 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,000 0,021 0,048 0,000 0,050 0,021 0,000 0,020 0,000 0,041 0,020 0,000 0,050 0,021 0,133 0,019 0,014 0,048 - 0.001 0.000 0.000 0.000 0.000 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.026 0.000 0.000 0.000 0.000 0.001 0.003 0.027 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.007 0.000 0.000 0.000 0.000 0.001 0.003 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.027 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.010 0.000 0.000 0.000 0.000 0.000 0.000 0.010 0.000 0.246 0.000 0.000 0.000 0.000 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.010 0.000 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 G A R ea C liz H ed G A R R ea C liz H ed -T G A R C H -G ED 0.000 0.257 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.246 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 R G ED ea liz ed 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 R 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 S- 0.000 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.010 0.000 0.000 0.059 0.000 0.000 0.000 G A H -L EV JR -G A R C G H JR -G A R C G H JR -T -G A R C N H A -G G A ED R C H N A G A R C H N -T A G A R C H G -G A ED ST 0.000 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.007 0.000 0.257 0.000 0.000 0.000 G R G A R C H R C H R G A A R C G R C 0.000 0.000 0.000 0.000 0.001 0.026 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 A G GARCH GARCH-T GARCH-GED RGARCH RGARCH-LEV GJR-GARCH GJR-GARCH-T GJR-GARCH-GED NAGARCH NAGARCH-T NAGARCH-GED GAS-T GAS-GED Realized GARCH Realized GARCH-T Realized GARCH-GED G A R C H H -T -G ED Table 
15: Diebold-Mariano test with the Log likelihood , Set A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 - Table 16: Diebold-Mariano test with FMAE as forecast error, Set A. Page 27 G A R ea liz 0.416 0.117 0.245 0.000 0.414 0.005 0.086 0.415 0.120 0.245 0.002 0.284 0.000 0.000 0.000 0.021 0.311 0.437 0.000 0.284 0.263 0.301 0.475 0.020 0.317 0.438 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.143 0.179 0.024 0.415 0.191 0.000 0.005 0.004 0.009 0.024 0.408 0.191 0.010 0.301 0.000 0.000 0.000 0.083 0.356 0.419 0.000 0.086 0.002 0.009 0.082 0.361 0.420 0.005 0.475 0.000 0.000 0.000 0.000 0.002 0.001 0.000 0.415 0.308 0.024 0.082 0.003 0.001 0.000 0.020 0.000 0.000 0.000 0.003 0.000 0.014 0.000 0.120 0.160 0.408 0.361 0.003 0.014 0.000 0.317 0.000 0.000 0.000 0.001 0.012 0.000 0.000 0.245 0.277 0.191 0.420 0.001 0.014 0.000 0.438 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.002 0.010 0.005 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ed R ea liz R C H A R C A R C H 0.309 0.157 0.278 0.000 0.414 0.004 0.002 0.308 0.160 0.277 0.002 0.263 0.000 0.000 0.000 G A R C H ed G A R R ea C liz H ed -T G A R C H -G ED R G A R C H -L G EV JR -G A R C G H JR -G A R C G H JR -T -G A R C N H A -G G A ED R C H N A G A R C H N -T A G A R C H G -G A ED ST 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 SG ED R G 0.001 0.012 0.000 0.245 0.278 0.191 0.419 0.001 0.014 0.000 0.000 0.437 0.000 0.000 0.000 -G ED G A 0.002 0.012 0.000 0.117 0.157 0.415 0.356 0.002 0.000 0.012 0.000 0.311 0.000 0.000 0.000 A GARCH GARCH-T GARCH-GED RGARCH RGARCH-LEV GJR-GARCH GJR-GARCH-T GJR-GARCH-GED NAGARCH NAGARCH-T NAGARCH-GED GAS-T GAS-GED Realized GARCH Realized GARCH-T Realized GARCH-GED H -T G 0.002 0.001 0.000 0.416 0.309 0.024 0.083 0.000 0.003 0.001 0.000 0.021 0.000 
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.143 0.094 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.179 0.094 -

Table 17: Diebold-Mariano test with FMSE as forecast error, Set A.

G.1.3 Error tests

The tables below report the tests on the residuals for data Set A: the Kolmogorov-Smirnov (KS) test and the Ljung-Box test on both the residuals (LB res) and the squared residuals (LB res²). The mean (Mu), the standard deviation (Std) and the shape of the residuals are also reported; the shape is only given if it exists for the given distribution.

| Table | Set A, error test | KS Test Stat | KS P-val | LB res Test Stat | LB res P-val | LB res² Test Stat | LB res² P-val | Mu | Std | Shape |
|---|---|---|---|---|---|---|---|---|---|---|
| Table 18 | GARCH | 1.000 | 0.0 | 0.002 | 0.965 | 4.404 | 0.036 | 0.010 | 0.998 | |
| Table 19 | GARCH-T | 0.999 | 0.000 | 0.033 | 0.856 | 1.937 | 0.164 | 0.033 | 0.849 | 7.306 |
| Table 20 | GARCH-GED | 1.000 | 0.000 | 0.005 | 0.943 | 3.017 | 0.082 | 0.042 | 1.113 | 1.431 |
| Table 21 | RGARCH | 1.000 | 0.000 | 2.044 | 0.153 | 37.208 | 0.000 | 0.003 | 0.998 | |
| Table 22 | RGARCH-LE | 0.999 | 0.000 | 0.005 | 0.946 | 2.293 | 0.130 | 0.032 | 0.859 | 7.760 |
| Table 23 | GJR-GARCH | 1.000 | 0.000 | 0.005 | 0.946 | 3.019 | 0.082 | 0.011 | 0.998 | |
| Table 24 | GJR-GARCH-T | 0.999 | 0.000 | 0.014 | 0.906 | 1.770 | 0.183 | 0.031 | 0.861 | 7.885 |
| Table 25 | GJR-GARCH-GED | 1.000 | 0.000 | 0.001 | 0.978 | 2.334 | 0.127 | 0.041 | 1.135 | 1.462 |
| Table | Set A, error test | KS Test Stat | KS P-val | LB res Test Stat | LB res P-val | LB res² Test Stat | LB res² P-val | Mu | Std | Shape |
|---|---|---|---|---|---|---|---|---|---|---|
| Table 26 | NA-GARCH | 1.000 | 0.000 | 0.002 | 0.965 | 4.405 | 0.036 | 0.010 | 0.998 | |
| Table 27 | NA-GARCH-T | 0.999 | 0.000 | 0.033 | 0.856 | 1.936 | 0.164 | 0.033 | 0.849 | 7.306 |
| Table 28 | NA-GARCH-GED | 1.000 | 0.000 | 0.005 | 0.943 | 3.016 | 0.082 | 0.042 | 1.113 | 1.431 |
| Table 29 | T-GAS | 0.999 | 0.000 | 0.000 | 0.982 | 3.519 | 0.061 | 0.034 | 0.839 | 6.810 |
| Table 30 | GAS-GED | 1.000 | 0.000 | 0.000 | 0.987 | 2.738 | 0.098 | 0.043 | 1.110 | 1.425 |
| Table 31 | Realized GARCH | 1.000 | 0.000 | 0.001 | 0.974 | 2.722 | 0.099 | 0.013 | 0.990 | |
| Table 32 | Realized GARCH-T | 0.997 | 0.000 | 0.001 | 0.974 | 2.722 | 0.099 | 0.033 | 0.882 | 9.857 |
| Table 33 | Realized GARCH-GED | 0.934 | 0.000 | 0.001 | 0.980 | 2.599 | 0.107 | 0.037 | 1.198 | 1.573 |

G.2 Set B

In this part of the Appendix the parameter results of data Set B are depicted. Here the validation sample runs from 2007 up to and including 2010/04/19, one day before the Deep Water Horizon oil disaster. The test sample of Set B is one year (i.e. 252 days) from 2010/04/20 onwards.
G.2.1 Estimates

Parameters: ω α β ν γ ρ δ. Each entry is a coefficient followed by its p-value in parentheses.

| Model | Coefficient (P-value) |
|---|---|
| GARCH | 0.0586 (0.018), 0.0901 (0.000), 0.8966 (0.000) |
| GARCH-GED | 0.058 (0.031), 0.084 (0.000), 0.902 (0.000), 1.516 (0.000) |
| GARCH-T | 0.057 (0.031), 0.084 (0.000), 0.906 (0.000), 1.516 (0.000) |
| RGARCH | 1.678 (0.000), 0.817 (0.000) |
| RGARCH-LEV | 0.104 (0.003), 0.063 (0.000), 0.905 (0.000) |
| GJR-GARCH | 0.062 (0.009), 0.038 (0.030), 0.905 (0.000) |
| GJR-GARCH-T | 0.067 (0.017), 0.039 (0.034), 0.905 (0.000), 9.161 (0.000), 0.076 (0.000) |
| GJR-GARCH-GED | 0.065 (0.017), 0.038 (0.041), 0.905 (0.000), 1.540 (0.010), 0.079 (0.000) |
| NAGARCH | 0.059 (0.019), 0.090 (0.000), 0.897 (0.000) |
| NAGARCH-T | 0.057 (0.033), 0.080 (0.000), 0.906 (0.000), 8.569 (0.000), 0.000 (0.500) |
| NAGARCH-GED | 0.058 (0.032), 0.084 (0.000), 0.902 (0.000), 1.516 (0.000), 0.000 (0.500) |
| GAS-T | 0.016 (0.071), 0.199 (0.000), 0.986 (0.000), 10.561 (0.000) |
| GAS-GED | 0.014 (0.082), 0.183 (0.000), 0.987 (0.000), 1.557 (0.000), 9.135 (0.000) |

ρ δ: -0.022 (0.019), -2.271 (0.004), 0.082 (0.003), 0.000 (0.500)

Table 34: Estimates of parameters of the different models.

| | Realized GARCH coeff | Realized GARCH pval | Realized GARCH-T coeff | Realized GARCH-T pval | Realized GARCH-GED coeff | Realized GARCH-GED pval |
|---|---|---|---|---|---|---|
| θ1 | 0.037 | 0.101 | 0.025 | 0.263 | 0.030 | 0.203 |
| θ2 | 2.956 | 0.000 | 3.025 | 0.000 | 3.036 | 0.000 |
| θ3 | 3.108 | 0.000 | 3.190 | 0.000 | 3.219 | 0.000 |
| θ4 | 0.070 | 0.500 | 0.025 | 0.500 | 0.026 | 0.469 |
| θ5 | 0.005 | 0.456 | -0.004 | 0.472 | 0.002 | 0.486 |
| θ6 | -0.850 | 0.000 | -0.851 | 0.000 | -0.850 | 0.000 |
| θ7 | -0.092 | 0.500 | -0.026 | 0.500 | -0.039 | 0.453 |
| θ8 | 0.139 | 0.500 | 0.071 | 0.500 | 0.084 | 0.399 |
| θ9 | | | 2.459 | 0.000 | 0.888 | 0.000 |
| θ10 | | | 2.436 | 0.000 | 0.729 | 0.000 |

Table 35: θ estimates for set B.
Coefficient Realized GARCH Realized GARCH-T Realized GARCH-GED ω 0.037 0.025 0.030 β 0.525 0.529 0.534 γ 0.449 0.451 0.444 ξ 0.070 0.025 0.026 φ 1.005 0.996 1.002 σ 0.427 0.427 0.427 τ1 -0.092 -0.026 -0.039 τ2 0.139 0.071 0.084 ν1 ν2 13.695 1.710 13.432 1.622 Table 36: Set B: Estimates of parameters of the different Realized GARCH models. Page 31 Modeling and forecasting volatility G JR G H -G A R C G H JR -T -G A R C N H A -G G A ED R C H N A G A R C H N -T A G A R C H G -G A ED ST G A R ea liz 0.007 0.005 0.005 0.009 0.008 0.006 0.005 0.007 0.005 0.005 0.005 0.005 0.005 0.004 0.004 0.102 0.328 0.476 0.009 0.129 0.070 0.303 0.102 0.327 0.476 0.445 0.450 0.261 0.326 0.496 0.200 0.091 0.091 0.008 0.129 0.054 0.038 0.200 0.091 0.091 0.155 0.083 0.481 0.103 0.173 0.039 0.261 0.143 0.006 0.070 0.054 0.211 0.039 0.263 0.143 0.122 0.134 0.164 0.477 0.348 0.028 0.187 0.338 0.005 0.450 0.083 0.134 0.189 0.028 0.186 0.339 0.481 0.234 0.270 0.463 0.351 0.191 0.228 0.005 0.261 0.481 0.164 0.188 0.351 0.191 0.228 0.247 0.234 0.039 0.029 0.124 0.043 0.032 0.007 0.102 0.200 0.039 0.031 0.043 0.032 0.070 0.028 0.351 0.061 0.097 0.043 0.058 0.181 0.005 0.327 0.091 0.263 0.470 0.043 0.181 0.064 0.186 0.191 0.390 0.415 0.032 0.182 0.199 0.005 0.476 0.091 0.143 0.233 0.032 0.181 0.428 0.339 0.228 0.283 0.480 0.070 0.065 0.428 0.005 0.445 0.155 0.122 0.281 0.070 0.064 0.428 0.481 0.247 0.252 0.457 ed SG ED JR R C -G A H A R C H -L E -G ED R C H H -T A R C R C 0.031 0.472 0.233 0.005 0.303 0.038 0.211 0.031 0.470 0.233 0.281 0.189 0.188 0.382 0.426 G A R C H ed G A R R ea C liz H ed -T G A R C H -G ED R G A 0.032 0.182 0.005 0.476 0.091 0.143 0.233 0.032 0.181 0.199 0.428 0.338 0.228 0.283 0.480 R ea liz R G 0.043 0.182 0.005 0.328 0.091 0.261 0.472 0.043 0.058 0.182 0.065 0.187 0.191 0.390 0.416 A GARCH GARCH-T GARCH-GED RGARCH RGARCH-L GJR-GARCH GJR-GARCH-T GJR-GARCH-GED NA-GARCH NA-GARCH-T NA-GARCH-GED GAS-T GAS-GED Realized GARCH Realized GARCH-T Realized GARCH-GED V G A 0.043 
0.032 0.007 0.102 0.200 0.039 0.031 0.124 0.043 0.032 0.070 0.028 0.351 0.061 0.097
0.061 0.390 0.283 0.004 0.326 0.103 0.477 0.382 0.061 0.390 0.283 0.252 0.270 0.039 0.081
0.097 0.416 0.480 0.004 0.496 0.173 0.348 0.426 0.097 0.415 0.480 0.457 0.463 0.029 0.081 -

Table 37: Diebold-Mariano test with the Log likelihood for set B.

G.3 DM-test results

In Tables 38 and 39, columns (1)-(16) refer to the models in the same order as the rows.

                            (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)  (14)  (15)  (16)
(1)  GARCH                    - 0.315 0.489 0.008 0.000 0.389 0.365 0.439 0.002 0.316 0.491 0.035 0.003 0.006 0.006 0.006
(2)  GARCH-T              0.315     - 0.103 0.006 0.000 0.457 0.335 0.395 0.314 0.483 0.103 0.007 0.000 0.003 0.003 0.003
(3)  GARCH-GED            0.489 0.103     - 0.006 0.000 0.399 0.383 0.448 0.490 0.106 0.077 0.018 0.001 0.004 0.004 0.004
(4)  RGARCH               0.008 0.006 0.006     - 0.112 0.025 0.014 0.016 0.008 0.006 0.006 0.000 0.001 0.000 0.000 0.000
(5)  RGARCH-L             0.000 0.000 0.000 0.112     - 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
(6)  GJR-GARCH            0.389 0.457 0.399 0.025 0.000     - 0.004 0.001 0.389 0.457 0.399 0.117 0.110 0.019 0.019 0.018
(7)  GJR-GARCH-T          0.365 0.335 0.383 0.014 0.000 0.004     - 0.029 0.365 0.335 0.383 0.161 0.178 0.026 0.026 0.026
(8)  GJR-GARCH-GED        0.439 0.395 0.448 0.016 0.000 0.001 0.029     - 0.439 0.396 0.447 0.150 0.160 0.024 0.024 0.024
(9)  NA-GARCH             0.002 0.314 0.490 0.008 0.000 0.389 0.365 0.439     - 0.315 0.491 0.035 0.003 0.006 0.006 0.006
(10) NA-GARCH-T           0.316 0.483 0.106 0.006 0.000 0.457 0.335 0.396 0.315     - 0.106 0.007 0.000 0.003 0.003 0.003
(11) NA-GARCH-GED         0.491 0.103 0.077 0.006 0.000 0.399 0.383 0.447 0.491 0.106     - 0.018 0.001 0.004 0.004 0.004
(12) GAS-T                0.035 0.007 0.018 0.000 0.000 0.117 0.161 0.150 0.035 0.007 0.018     - 0.180 0.026 0.030 0.027
(13) GAS-GED              0.003 0.000 0.001 0.001 0.000 0.110 0.178 0.160 0.003 0.000 0.001 0.180     - 0.018 0.019 0.018
(14) Realized GARCH       0.006 0.003 0.004 0.000 0.000 0.019 0.026 0.024 0.006 0.003 0.004 0.026 0.018     - 0.144 0.239
(15) Realized GARCH-T     0.006 0.003 0.004 0.000 0.000 0.019 0.026 0.024 0.006 0.003 0.004 0.030 0.019 0.144     - 0.184
(16) Realized GARCH-GED   0.006 0.003 0.004 0.000 0.000 0.018 0.026 0.024 0.006 0.003 0.004 0.027 0.018 0.239 0.184     -

Table 38: Diebold-Mariano test with FMAE as forecast error for Set B.

                            (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)  (14)  (15)  (16)
(1)  GARCH                    - 0.411 0.489 0.033 0.003 0.324 0.438 0.401 0.054 0.412 0.489 0.448 0.046 0.496 0.494 0.495
(2)  GARCH-T              0.411     - 0.292 0.026 0.004 0.385 0.483 0.451 0.411 0.447 0.292 0.393 0.004 0.468 0.465 0.466
(3)  GARCH-GED            0.489 0.292     - 0.028 0.003 0.352 0.452 0.420 0.489 0.295 0.279 0.435 0.009 0.493 0.491 0.492
(4)  RGARCH               0.033 0.026 0.028     - 0.179 0.068 0.055 0.059 0.033 0.026 0.028 0.009 0.018 0.009 0.009 0.009
(5)  RGARCH-L             0.003 0.004 0.003 0.179     - 0.001 0.001 0.001 0.003 0.004 0.003 0.007 0.003 0.006 0.006 0.006
(6)  GJR-GARCH            0.324 0.385 0.352 0.068 0.001     - 0.073 0.053 0.324 0.385 0.352 0.387 0.189 0.412 0.409 0.411
(7)  GJR-GARCH-T          0.438 0.483 0.452 0.055 0.001 0.073     - 0.124 0.438 0.483 0.452 0.443 0.226 0.472 0.470 0.471
(8)  GJR-GARCH-GED        0.401 0.451 0.420 0.059 0.001 0.053 0.124     - 0.400 0.451 0.420 0.426 0.217 0.454 0.452 0.453
(9)  NA-GARCH             0.054 0.411 0.489 0.033 0.003 0.324 0.438 0.400     - 0.411 0.488 0.448 0.046 0.496 0.494 0.495
(10) NA-GARCH-T           0.412 0.447 0.295 0.026 0.004 0.385 0.483 0.451 0.411     - 0.295 0.392 0.004 0.467 0.464 0.466
(11) NA-GARCH-GED         0.489 0.292 0.279 0.028 0.003 0.352 0.452 0.420 0.488 0.295     - 0.435 0.009 0.493 0.491 0.492
(12) GAS-T                0.448 0.393 0.435 0.009 0.007 0.387 0.443 0.426 0.448 0.392 0.435     - 0.272 0.433 0.437 0.434
(13) GAS-GED              0.046 0.004 0.009 0.018 0.003 0.189 0.226 0.217 0.046 0.004 0.009 0.272     - 0.271 0.266 0.268
(14) Realized GARCH       0.496 0.468 0.493 0.009 0.006 0.412 0.472 0.454 0.496 0.467 0.493 0.433 0.271     - 0.473 0.465
(15) Realized GARCH-T     0.494 0.465 0.491 0.009 0.006 0.409 0.470 0.452 0.494 0.464 0.491 0.437 0.266 0.473     - 0.483
(16) Realized GARCH-GED   0.495 0.466 0.492 0.009 0.006 0.411 0.471 0.453 0.495 0.466 0.492 0.434 0.268 0.465 0.483     -

Table 39: Diebold-Mariano test with FMSE as forecast error, Set B.

G.3.1 Error tests

The tables below report the tests on the residuals for Set B: the Kolmogorov-Smirnov (KS) test and the Ljung-Box test on both the residuals (LB res) and the squared residuals (LB res2). The tables also report the mean (Mu), the standard deviation (Std) and the shape parameter (Shape) of the residuals; the shape is reported only if it exists for the given distribution.

            KS test   LB test res   LB test res2
Test Stat     1.000         0.910          4.361
P-val         0.000         0.340          0.037
Mu            0.018
Std           0.996
Table 40: Set B, error test, GARCH.

Table 41: Diebold Mariano tests Realized GARCH, Set B

            KS test   LB test res   LB test res2
Test Stat     0.999         1.043          3.925
P-val         0.000         0.307          0.048
Mu            0.040
Std           0.867
Shape         8.348
Table 42: Set B, error test, GARCH-T.

            KS test   LB test res   LB test res2
Test Stat     1.000         0.985          4.117
P-val         0.000         0.321          0.042
Mu            0.054
Std           1.144
Shape         1.478
Table 43: Set B, error test, GARCH GED.

            KS test   LB test res   LB test res2
Test Stat     1.000         4.814         16.600
P-val         0.000         0.028          0.000
Mu            0.013
Std           0.997
Table 44: Set B, error test, RGARCH.

            KS test   LB test res   LB test res2
Test Stat     0.999         0.966          4.431
P-val         0.000         0.326          0.035
Mu            0.038
Std           0.876
Shape         8.887
Table 45: Set B, error test, RGARCH-LE

            KS test   LB test res   LB test res2
Test Stat     1.000         0.994          4.006
P-val         0.000         0.319          0.045
Mu            0.018
Std           0.996
Table 46: Set B, error test, GJR-GARCH.

            KS test   LB test res   LB test res2
Test Stat     0.999         1.044          3.961
P-val         0.000         0.307          0.047
Mu            0.038
Std           0.876
Shape         8.906
Table 47: Set B, error test, GJR-GARCH-T.
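As a cross-check on the error-test tables of Section G.3.1, the reported Ljung-Box p-values are consistent with a chi-square distribution with one degree of freedom, which suggests the test was run with a single lag. The lag count is an inference from the reported numbers and is not stated in the text; the function names ljung_box_lag1 and chi2_sf_1df are illustrative and not taken from the authors' code. A minimal Python sketch under that assumption:

```python
import math


def ljung_box_lag1(x):
    """Ljung-Box Q statistic at lag 1: Q = n (n + 2) r1^2 / (n - 1)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    # Lag-1 sample autocorrelation of the demeaned series.
    r1 = sum(d[t] * d[t - 1] for t in range(1, n)) / denom
    return n * (n + 2) * r1 ** 2 / (n - 1)


def chi2_sf_1df(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# (test statistic, reported p-value) pairs copied from the Set B tables:
# GARCH residuals, GARCH squared residuals, Realized GARCH squared residuals.
reported = [(0.910, 0.340), (4.361, 0.037), (2.010, 0.156)]
for stat, p_reported in reported:
    print(stat, round(chi2_sf_1df(stat), 3), p_reported)
```

For the squared-residual column, the same function would be applied to the squared standardized residuals instead of the residuals themselves.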
            KS test   LB test res   LB test res2
Test Stat     1.000         1.027          3.974
P-val         0.000         0.311          0.046
Mu            0.052
Std           1.160
Shape         1.502
Table 48: Set B, error test, GJR-GARCH-GED.

            KS test   LB test res   LB test res2
Test Stat     1.000         0.910          4.361
P-val         0.000         0.340          0.037
Mu            0.018
Std           0.996
Table 49: Set B, error test, NA-GARCH.

            KS test   LB test res   LB test res2
Test Stat     0.999         1.044          3.922
P-val         0.000         0.307          0.048
Mu            0.040
Std           0.867
Shape         8.347
Table 50: Set B, error test, NA-GARCH-T.

            KS test   LB test res   LB test res2
Test Stat     1.000         0.985          4.117
P-val         0.000         0.321          0.042
Mu            0.054
Std           1.144
Shape         1.477
Table 51: Set B, error test, NA-GARCH-GED-T.

            KS test   LB test res   LB test res2
Test Stat     0.999         0.931          5.102
P-val         0.000         0.335          0.024
Mu            0.044
Std           0.844
Shape         7.253
Table 52: Set B, error test, T-GAS.

            KS test   LB test res   LB test res2
Test Stat     1.000         0.918          3.946
P-val         0.000         0.338          0.047
Mu            0.054
Std           1.144
Shape         1.476
Table 53: Set B, error test, GAS-GED.

            KS test   LB test res   LB test res2
Test Stat     1.000         0.942          2.010
P-val         0.000         0.332          0.156
Mu            0.024
Std           0.982
Table 54: Set B, error test, Realized GARCH.

            KS test   LB test res   LB test res2
Test Stat     0.983         0.906          1.964
P-val         0.000         0.341          0.161
Mu            0.039
Std           0.910
Shape        13.483
Table 55: Set B, error test, Realized GARCH-T.

            KS test   LB test res   LB test res2
Test Stat     0.980         0.828          1.969
P-val         0.000         0.363          0.161
Mu            0.044
Std           1.199
Shape         1.656
Table 56: Set B, error test, Realized GARCH-GED.

G.4 Plots

G.4.1 Set A

Figure 8: Set A, Results vs Kernel.

Figure 9: Set A, Results vs Returns.

G.4.2 Set B

Figure 10: Set B, Results vs Kernel.

Figure 11: Set B, Results vs Returns.