An Introduction to Time Series Forecasting

Scott Nelson
July 29, 2008
Outline of Presentation
 Introduction to Quantitative Finance
 Time Series Concepts
 Stationarity, Autocorrelation, Time Series Models
 Univariate Volatility Models
 Stylized facts about return series
 GARCH
 Multivariate Volatility Models
 Moving averages
 EWMA
 Dynamic Conditional Correlation (DCC)
Motivation from Quant Finance
 Most of the stuff in this talk is motivated by problems
from quantitative finance
 Financial econometrics is one part of a larger field
which goes under various names (quantitative finance,
mathematical finance, computational finance, etc)
 The field applies quantitative models and theories to
solve problems in the financial markets
 Some questions we can answer better than others
 What will be the closing price of IBM tomorrow?
 What is the fair price today of a call option on IBM,
expiring in 3 months with a strike price of $57?
Motivation From Finance
 Other examples (Alexander, 2000)
 What is the volatility forecast for asset XYZ? Need this
to price options written on the asset (option pricing)
 How can we optimally structure our positions to
minimize our risk? (portfolio optimization)
 What is the overall risk exposure of our firm, so we can
set aside adequate capital reserves? (value at risk)
 All of these questions depend on modeling and
forecasting of volatility and correlations of asset prices
Efficient Market Hypothesis
 Standard economic theory states that stock price
movements are unpredictable
 Efficient market hypothesis: prices completely reflect all
available information
 If the future price of the stock is expected to increase, the
current stock price will fully adjust to account for this
 Since future news is unpredictable (by definition), future
price movements are also unpredictable (follow a random
walk)
 According to the weakest form of this theory, it is
impossible to make consistent above-average returns by
studying only the historical price
The Statistical Approach to QF
 We observe a sequence of asset prices at discrete
points in time, $\{p_t\}$, $t = 1, \dots, T$
 They are modeled as random variables using
techniques from time series analysis
Time Series Concepts - Stationarity
 We observe a univariate time series $Y = \{y_t\}$, $t = 1, \dots, T$
 Most time series models assume Y is stationary
 A time series is covariance stationary if it has a
constant mean, a constant variance, and autocovariances
that depend only on the lag
 In other words its first two moments are “invariant to
time shift”
 If Y is nonstationary, we can often difference it to make
it stationary
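A minimal sketch of differencing, assuming the nonstationary series is a simulated random walk. The function names `difference` and `random_walk` are illustrative and not from the talk.

```python
import random


def difference(series):
    """Return the first difference z_t = y_t - y_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def random_walk(n_steps, seed=0):
    """Simulate y_t = y_{t-1} + e_t with standard normal shocks (illustration only)."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n_steps):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


walk = random_walk(1000)   # nonstationary: its variance grows with t
shocks = difference(walk)  # the differenced series recovers the i.i.d. shocks
```

Differencing a series of length T leaves T - 1 observations.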
Time Series Concepts - Autocorrelation
 We can define the correlation between the current
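value of $y_t$ and its lagged value $y_{t-i}$:
$$\rho_i = \frac{\mathrm{Cov}(y_t, y_{t-i})}{\sqrt{\mathrm{Var}(y_t)\,\mathrm{Var}(y_{t-i})}} = \frac{\mathrm{Cov}(y_t, y_{t-i})}{\mathrm{Var}(y_t)} \quad \text{if } \mathrm{Var}(y_t) = \mathrm{Var}(y_{t-i})$$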
 A consistent finite sample estimate is given by:
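$$\hat{\rho}_i = \frac{\sum_{t=i+1}^{T} (r_t - \bar{r})(r_{t-i} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2}, \quad 0 \le i \le T-1$$
where $\bar{r}$ is the sample mean of the observed series $r_t$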
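The sample estimate can be sketched in a few lines. The function name `sample_autocorrelation` is illustrative, and the list is indexed from 0, so the sum over t = i+1, ..., T becomes a loop over positions i, ..., T-1.

```python
def sample_autocorrelation(r, lag):
    """Sample autocorrelation of the series r at the given lag (0 <= lag <= T-1)."""
    T = len(r)
    if not 0 <= lag <= T - 1:
        raise ValueError("lag must satisfy 0 <= lag <= T-1")
    mean = sum(r) / T
    # Denominator: total sum of squared deviations over the full sample.
    denom = sum((x - mean) ** 2 for x in r)
    # Numerator: cross products of deviations that are `lag` periods apart.
    numer = sum((r[t] - mean) * (r[t - lag] - mean) for t in range(lag, T))
    return numer / denom
```

At lag 0 the estimate is 1 by construction.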
Time Series Concepts - Models
 Model Y as a linear combination of its lagged values (AR)
+ past errors (MA) + contemporaneous error
AR(p):
$$y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$$
MA(q):
$$y_t = \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$$
ARMA(p,q):
$$y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$$
ARIMA(p,1,q):
$$z_t = y_t - y_{t-1}, \quad z_t = \sum_{i=1}^{p} \phi_i z_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$$
 Traditionally we assume $\varepsilon_t \sim N(0, \sigma^2)$
 Parameter estimation via maximum likelihood
 Model selection can be done based on goodness of fit stats
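As a small illustration of the simplest case, the sketch below simulates an AR(1) process and estimates its coefficient by least squares. For a Gaussian AR(1) without a constant, least squares coincides with maximum likelihood conditional on the first observation. The names `simulate_ar1` and `fit_ar1` and the parameter values are hypothetical.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with e_t ~ N(0, sigma^2), starting from 0."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        y.append(prev)
    return y


def fit_ar1(y):
    """Least-squares estimate of phi: regress y_t on y_{t-1} without a constant."""
    numer = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    denom = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return numer / denom


series = simulate_ar1(phi=0.6, n=5000, seed=1)
phi_hat = fit_ar1(series)  # should land close to the true value 0.6
```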
The Statistical Approach to QF
 What to model: prices or returns?
 Prices are nonstationary
 Define the log return, $r_t = \log(p_t / p_{t-1})$
 Log returns are roughly stationary and approximately normally
distributed with a mean close to 0 and a possibly time-varying
variance
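A minimal sketch of the price-to-return transformation. The function name `log_returns` and the example prices are made up for illustration.

```python
import math


def log_returns(prices):
    """Return r_t = log(p_t / p_{t-1}) for each consecutive pair of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


example_prices = [100.0, 110.0, 99.0]  # hypothetical closing prices
example_returns = log_returns(example_prices)
```

One convenient property is that log returns add up over time: the sum of the one-period returns equals the log return over the whole period.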
Stylized Facts About Returns
 Returns difficult to predict
 Volatility is time-varying with persistent
autocorrelation
 Negative skewness in the distribution of returns (long
left tail)
 Extreme crashes
 Fat tails in the distribution of returns
 Fatter than a normal distribution would suggest
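Two of these facts can be checked on any return series with sample moments. In the sketch below (function name illustrative), a long left tail shows up as a negative skewness value and fat tails as a positive excess kurtosis; a normal distribution has 0 for both.

```python
def skewness_and_excess_kurtosis(x):
    """Return (skewness, excess kurtosis) computed from central sample moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # subtract 3, the kurtosis of a normal
    return skew, excess_kurt


# Hypothetical sample with one large loss: the left tail dominates.
crash_sample = [-8.0] + [1.0] * 8
crash_skew, crash_kurt = skewness_and_excess_kurtosis(crash_sample)
```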
What is Volatility?
 Volatility = variance
 Volatility is a measure of the variability of the returns
 Need to distinguish between unconditional volatility
and conditional volatility.
 Volatility cannot be directly observed
 As a proxy we take Squared Returns
 Engle (1982) noticed that the volatility of time series clusters,
and could be modeled using an ARMA-type process
Univariate Volatility Modeling
 Bollerslev (1986) extended Engle’s model to the now
familiar GARCH model:
$$Y_t = \mu + e_t \quad \text{(mean equation)}$$
$$e_t \sim N(0, h_t) \quad \text{(error term with conditional variance)}$$
$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i e_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j} \quad \text{(conditional variance equation)}$$
 Parameter estimation via maximum likelihood
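For the common GARCH(1,1) case (p = q = 1), the conditional variance can be filtered recursively once the parameters are known. The sketch below assumes the parameters are given rather than estimated, and starts the recursion at the unconditional variance, which is one common but arbitrary choice. The function name and parameter values are hypothetical.

```python
def garch11_variance(residuals, alpha0, alpha1, beta1):
    """Filter h_t = alpha0 + alpha1 * e_{t-1}^2 + beta1 * h_{t-1}.

    Returns a list h with h[t] the conditional variance of residuals[t].
    """
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        raise ValueError("need alpha0 > 0, alpha1, beta1 >= 0, alpha1 + beta1 < 1")
    # Assumption: initialise at the unconditional variance alpha0 / (1 - alpha1 - beta1).
    h = [alpha0 / (1.0 - alpha1 - beta1)]
    for e in residuals[:-1]:
        h.append(alpha0 + alpha1 * e * e + beta1 * h[-1])
    return h


# Hypothetical residuals: a large shock at t = 1 raises the variance at t = 2.
h_path = garch11_variance([0.0, 3.0, 0.0, 0.0], alpha0=0.1, alpha1=0.1, beta1=0.8)
```

Maximum likelihood estimation would wrap this filter in a Gaussian log-likelihood and hand it to a numerical optimizer.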
Conditional Correlation
Multivariate Models
 Why are multivariate models better than just building
a bunch of univariate models?
 Multivariate models allow the analyst to model the
important variables in the system together
 These models allow for dynamic relationships between
the variables (more realistic)
Data Used in this Section
What is Correlation?
 The unconditional correlation between 2 r.v. each with
mean 0 is:
$$\rho_{12} = \frac{\mathrm{Cov}(r_1, r_2)}{\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$$
 This is the covariance standardized to lie in [-1,1]
 Here we are assuming there exists a “true” correlation,
and the observed correlation at any time is just
random variation around this
 If instead we believe the correlation is time varying
then we would have
$$\rho_{12,t} = \frac{\mathrm{Cov}(r_{1,t}, r_{2,t})}{\sqrt{\mathrm{Var}(r_{1,t})\,\mathrm{Var}(r_{2,t})}} = \frac{\sigma_{12,t}}{\sqrt{\sigma_{1,t}^2\,\sigma_{2,t}^2}}$$
Time Varying Models of Correlation
1. Moving averages
 Advantage: simplest approach
 Problem: equal weight to all the history, need to select
window size
2. Exponentially weighted moving averages
 Advantage: uses all the history, recent history given more
weight than older history
 Disadvantage: need to select smoothing parameter, the
model yields restrictive dynamics
3. Multivariate GARCH
 Advantage: realistic dynamics informed by the data
 Disadvantage: can be difficult to ensure covariance matrix is
positive definite
Moving Average of Correlation
 Instead of averaging over the entire sample, we can use a
rolling window estimate of correlation
$$\hat{\rho}_{12,t} = \frac{\sum_{s=t-n-1}^{t-1} r_{1,s}\, r_{2,s}}{\sqrt{\left(\sum_{s=t-n-1}^{t-1} r_{1,s}^2\right)\left(\sum_{s=t-n-1}^{t-1} r_{2,s}^2\right)}}$$
 This depends on an appropriate window size (n)
 Small values of n will result in a choppy correlation
 Large values of n will smooth out the correlation
 Old observations have the same weight as recent values
 When an old observation drops out of the window, we will
see a large change in the correlation, even though nothing
has happened recently
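A minimal sketch of the rolling window estimator for two mean-zero return series. It follows the summation limits s = t-n-1, ..., t-1, so each estimate uses n+1 observations. The function name and the 0-based time index are assumptions made for the example.

```python
import math


def rolling_correlation(r1, r2, t, n):
    """Correlation estimate for time t from observations s = t-n-1, ..., t-1."""
    start = t - n - 1
    if len(r1) != len(r2) or start < 0 or t > len(r1):
        raise ValueError("window does not fit inside the sample")
    window = range(start, t)  # covers s = t-n-1, ..., t-1
    cross = sum(r1[s] * r2[s] for s in window)
    ss1 = sum(r1[s] ** 2 for s in window)
    ss2 = sum(r2[s] ** 2 for s in window)
    return cross / math.sqrt(ss1 * ss2)
```

Every observation inside the window gets the same weight, so the estimate jumps when a large observation enters or leaves the window.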
Moving Average
EWMA of Correlation
 Exponentially weighted moving average (EWMA) is
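usually written as
$$\hat{\rho}_{12,t} = \frac{\hat{\sigma}_{12,t}}{\sqrt{\hat{\sigma}_{1,t}^2\,\hat{\sigma}_{2,t}^2}}$$
where
$$\hat{\sigma}_{12,t} = (1-\lambda)\, r_{1,t-1} r_{2,t-1} + \lambda\, \hat{\sigma}_{12,t-1}$$
$$\hat{\sigma}_{1,t}^2 = (1-\lambda)\, r_{1,t-1}^2 + \lambda\, \hat{\sigma}_{1,t-1}^2$$
$$\hat{\sigma}_{2,t}^2 = (1-\lambda)\, r_{2,t-1}^2 + \lambda\, \hat{\sigma}_{2,t-1}^2$$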
 Nice thing about this is it uses the entire history, and
attaches exponentially decreasing weights to the
observations
 In other words recent history counts more than old history
 Larger lambda -> smoother estimate
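The EWMA recursions can be sketched as follows. The talk does not say how the recursion is started, so this example assumes the initial variances and covariance are the full-sample averages. The default lambda = 0.94 is only an illustrative value, and the function name is hypothetical.

```python
import math


def ewma_correlation(r1, r2, lam=0.94):
    """EWMA correlation path; entry t uses returns up to and including t-1."""
    n = len(r1)
    # Assumption: start the recursions at the full-sample second moments.
    cov = sum(a * b for a, b in zip(r1, r2)) / n
    var1 = sum(a * a for a in r1) / n
    var2 = sum(b * b for b in r2) / n
    rhos = [cov / math.sqrt(var1 * var2)]
    for a, b in zip(r1[:-1], r2[:-1]):
        cov = (1.0 - lam) * a * b + lam * cov
        var1 = (1.0 - lam) * a * a + lam * var1
        var2 = (1.0 - lam) * b * b + lam * var2
        rhos.append(cov / math.sqrt(var1 * var2))
    return rhos
```

A single parameter drives all three recursions, which is why the dynamics are described as restrictive.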
Impact of Lambda
EWMA vs. MA50
EWMA reacts
more quickly
Generalizing to n-Dimensions
 OK that’s great but most likely our portfolio has more
than 2 assets – 1000’s of assets is more realistic
 How do we generalize this to n dimensions?
 This is most easily expressed in matrix notation
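$$r_t \mid \mathcal{F}_{t-1} \sim N(0, H_t)$$
where $\mathcal{F}_{t-1}$ denotes the information available at time $t-1$ and
$H_t$ is the $k \times k$ time-varying covariance matrix
$$H_t = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1k} \\ \vdots & \ddots & \vdots \\ \sigma_{k1} & \cdots & \sigma_{kk} \end{pmatrix}$$
$H_t$ must be positive definite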
Curse of Dimensionality
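 Consider the case of k=2
$$H_t = \begin{pmatrix} h_{11,t} & h_{12,t} \\ h_{21,t} & h_{22,t} \end{pmatrix}, \quad \text{where } h_{12,t} = h_{21,t}$$
 In the most general form we need to estimate 21 parameters
$$h_{11,t} = \omega_1 + \alpha_{11}\varepsilon_{1,t-1}^2 + \alpha_{12}\varepsilon_{2,t-1}^2 + \alpha_{13}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + \beta_{11}h_{11,t-1} + \beta_{12}h_{12,t-1} + \beta_{13}h_{22,t-1}$$
$$h_{12,t} = \omega_2 + \alpha_{21}\varepsilon_{1,t-1}^2 + \alpha_{22}\varepsilon_{2,t-1}^2 + \alpha_{23}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + \beta_{21}h_{11,t-1} + \beta_{22}h_{12,t-1} + \beta_{23}h_{22,t-1}$$
$$h_{22,t} = \omega_3 + \alpha_{31}\varepsilon_{1,t-1}^2 + \alpha_{32}\varepsilon_{2,t-1}^2 + \alpha_{33}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + \beta_{31}h_{11,t-1} + \beta_{32}h_{12,t-1} + \beta_{33}h_{22,t-1}$$
 For 100 assets we need to estimate 51,010,050 parameters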
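The parameter counts can be reproduced with a short calculation, assuming the general form is the first-order model in which each of the m = k(k+1)/2 distinct elements of the covariance matrix has its own intercept, m coefficients on lagged squares and cross products, and m coefficients on lagged variances and covariances. The function name is illustrative.

```python
def general_mgarch_parameter_count(k):
    """Number of parameters in the unrestricted first-order model for k assets."""
    m = k * (k + 1) // 2   # distinct elements of a symmetric k x k matrix
    return m + 2 * m * m   # m intercepts + m*m alpha terms + m*m beta terms


count_two_assets = general_mgarch_parameter_count(2)
count_hundred_assets = general_mgarch_parameter_count(100)
```

The count grows with the fourth power of k, which is what makes the unrestricted model impractical for realistic portfolios.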
Conditional Variance and Conditional Correlation
Following Engle (2002), the conditional correlation between
two r.v. each with mean 0 is:
$$\rho_{12,t} = \frac{E_{t-1}(r_{1,t}\, r_{2,t})}{\sqrt{E_{t-1}(r_{1,t}^2)\, E_{t-1}(r_{2,t}^2)}}$$
Recall the conditional variance is defined as
$$h_{i,t} = E_{t-1}(r_{i,t}^2)$$
We can define the return as a N(0,1) variable scaled by its conditional standard deviation
$$r_{i,t} = \sqrt{h_{i,t}}\, \varepsilon_{i,t}, \quad \varepsilon_{i,t} \sim N(0,1), \quad i = 1, 2$$
Therefore we can write the conditional correlation as
$$\rho_{12,t} = \frac{E_{t-1}(\varepsilon_{1,t}\, \varepsilon_{2,t})}{\sqrt{E_{t-1}(\varepsilon_{1,t}^2)\, E_{t-1}(\varepsilon_{2,t}^2)}} = E_{t-1}(\varepsilon_{1,t}\, \varepsilon_{2,t})$$
which is the conditional covariance of the disturbances
Dynamic Conditional Correlation
The model is:
$$r_t \mid \mathcal{F}_{t-1} \sim N(0, H_t)$$
$$H_t = D_t R_t D_t$$
$$R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2}$$
$$Q_t = (1 - \alpha - \beta)\bar{Q} + \alpha(\varepsilon_{t-1}\varepsilon_{t-1}') + \beta Q_{t-1}$$
$$\varepsilon_t = D_t^{-1} r_t$$
where
$\mathcal{F}_{t-1}$ is the information available at time $t-1$
$H_t$ is the time-varying covariance matrix
$D_t$ is a diagonal matrix of time-varying
standard deviations from univariate GARCH models
$R_t$ is the time-varying correlation matrix
$Q_t$ is the time-varying covariance matrix of the
standardized residuals
$\varepsilon_t$ are the standardized returns
Dynamic Conditional Correlation
 Estimation procedure:
1. Estimate univariate GARCH models for all k assets
2. Standardize the returns by the estimated std. dev.
3. Estimate Rt from the standardized returns, using a
simple model
Example: 2 asset case
 Step 1: Construct $D_t$ from the elements of the
univariate GARCH models
$$D_t = \begin{pmatrix} \sqrt{h_{1,t}} & 0 \\ 0 & \sqrt{h_{2,t}} \end{pmatrix}$$
Example: 2 asset case
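 The covariance matrix $H_t$ can be decomposed as:
$$D_t R_t D_t = \begin{pmatrix} \sqrt{h_{1,t}} & 0 \\ 0 & \sqrt{h_{2,t}} \end{pmatrix} \begin{pmatrix} 1 & \rho_{12,t} \\ \rho_{21,t} & 1 \end{pmatrix} \begin{pmatrix} \sqrt{h_{1,t}} & 0 \\ 0 & \sqrt{h_{2,t}} \end{pmatrix}$$
$$= \begin{pmatrix} \sqrt{h_{1,t}} & 0 \\ 0 & \sqrt{h_{2,t}} \end{pmatrix} \begin{pmatrix} \sqrt{h_{1,t}} & \rho_{12,t}\sqrt{h_{2,t}} \\ \rho_{21,t}\sqrt{h_{1,t}} & \sqrt{h_{2,t}} \end{pmatrix}$$
$$= \begin{pmatrix} h_{1,t} & \sqrt{h_{1,t}}\,\rho_{12,t}\sqrt{h_{2,t}} \\ \sqrt{h_{2,t}}\,\rho_{21,t}\sqrt{h_{1,t}} & h_{2,t} \end{pmatrix}$$
$$= \begin{pmatrix} \sigma_{1,t}^2 & \rho_{12,t}\,\sigma_{1,t}\sigma_{2,t} \\ \rho_{21,t}\,\sigma_{2,t}\sigma_{1,t} & \sigma_{2,t}^2 \end{pmatrix} = \begin{pmatrix} \sigma_{1,t}^2 & \sigma_{12,t} \\ \sigma_{21,t} & \sigma_{2,t}^2 \end{pmatrix}$$
where $\sigma_{i,t} = \sqrt{h_{i,t}}$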
Example: 2 asset case
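 Step 2: construct standardized residuals matrix
$$\varepsilon_t = D_t^{-1} r_t = \begin{pmatrix} 1/\sigma_{1,t} & 0 \\ 0 & 1/\sigma_{2,t} \end{pmatrix} \begin{pmatrix} r_{1,t} \\ r_{2,t} \end{pmatrix} = \begin{pmatrix} r_{1,t}/\sigma_{1,t} \\ r_{2,t}/\sigma_{2,t} \end{pmatrix}$$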
Example: 2 asset case
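 Recall from the previous discussion that:
$$E_{t-1}(\varepsilon_t \varepsilon_t') = R_t = \begin{pmatrix} 1 & \rho_{12,t} \\ \rho_{21,t} & 1 \end{pmatrix} = \begin{pmatrix} 1 & \dfrac{q_{12,t}}{\sqrt{q_{1,t}\, q_{2,t}}} \\ \dfrac{q_{12,t}}{\sqrt{q_{1,t}\, q_{2,t}}} & 1 \end{pmatrix}$$
where $q_{i,t}$ denotes the diagonal element $q_{i,i,t}$
 Give each $q_{i,j,t}$, and hence each $\rho_{i,j,t}$, a simple GARCH(1,1) type structure:
$$q_{i,j,t} = \bar{\rho}_{i,j}(1 - \alpha - \beta) + \alpha\, \varepsilon_{i,t-1}\varepsilon_{j,t-1} + \beta\, q_{i,j,t-1}$$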
Example: 2 asset case
 Step 3: estimate R.
 In multivariate form
$$Q_t = (1 - \alpha - \beta)\bar{Q} + \alpha(\varepsilon_{t-1}\varepsilon_{t-1}') + \beta Q_{t-1}$$
 $\bar{Q}$ is the unconditional covariance matrix of the
returns/residuals
 Variance targeting:
 Pre-estimate $\bar{Q}$ and then calibrate α, β during
estimation of $R_t$
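For the 2 asset case the correlation recursion with variance targeting can be sketched as below. It assumes the standardized residuals from Step 2 are already available and that α and β are given rather than estimated. The target is taken to be the sample second-moment matrix of the standardized residuals, and the recursion is started at that target. Function names are hypothetical.

```python
import math


def dcc_correlations(eps1, eps2, alpha, beta):
    """Filter rho_{12,t} = q_{12,t} / sqrt(q_{11,t} * q_{22,t}) for two series."""
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    n = len(eps1)
    # Variance targeting: pre-estimate Q-bar from the standardized residuals.
    qbar11 = sum(a * a for a in eps1) / n
    qbar22 = sum(b * b for b in eps2) / n
    qbar12 = sum(a * b for a, b in zip(eps1, eps2)) / n
    w = 1.0 - alpha - beta
    q11, q22, q12 = qbar11, qbar22, qbar12  # assumption: start at Q-bar
    rhos = [q12 / math.sqrt(q11 * q22)]
    for a, b in zip(eps1[:-1], eps2[:-1]):
        q11 = w * qbar11 + alpha * a * a + beta * q11
        q22 = w * qbar22 + alpha * b * b + beta * q22
        q12 = w * qbar12 + alpha * a * b + beta * q12
        rhos.append(q12 / math.sqrt(q11 * q22))
    return rhos


def conditional_covariance(h1, h2, rho):
    """Off-diagonal element of H_t = D_t R_t D_t for two assets."""
    return rho * math.sqrt(h1 * h2)
```

Because only α and β remain to be estimated once the target is fixed, the correlation step has two parameters regardless of the number of assets.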
Example: 2 asset case
 Kevin Sheppard’s UCSD GARCH toolbox, available at
http://www.kevinsheppard.com/wiki/UCSD_GARCH
Example: 2 asset case
 Estimated coefficients
DCC Results: VCOV Plots
[Figure: three panels titled Nasdaq Variance, Nasdaq-DJ Covariance, and Dow Jones Variance, each plotted over observations 0 to 10000]
DCC vs. EWMA
Advantages & Disadvantages of DCC
 Advantages
 Relatively easy to estimate
 Should work for large dimensional covariance matrices
 More flexible dynamics than exponential smoothing
 Disadvantages
 Imposes the same dynamics on all the assets
Conclusions
1. Practical problems in finance require forecasts of
conditional variances and conditional
covariances/correlations
2. Univariate GARCH models can provide forecasts of
conditional variances
3. Conditional correlation forecasts are plagued by the
curse of dimensionality
4. Simple methods are widely used (rolling window,
EWMA) but they lack a firm statistical basis
5. The DCC estimator offers a practical multivariate
GARCH framework that overcomes some of these
problems
THANKS!