
Preliminary Draft July 2013
The Essentials of Time Series Modeling: An Applied Treatment with
Emphasis on Topics Relevant to Financial Analysis
Houston H. Stokes
Professor of Economics
University of Illinois at Chicago
hhstokes@uic.edu
PREFACE
Chapter 1
  1.1 OVERVIEW
    Figure 1.1-1 GAM Smoothed AGE series
    Figure 1.1-2 GAM Smoothed BDEFICIT series
  1.2 Plan of Book
    Table 1.1 Useful Concepts
PREFACE
This monograph is intended to give the reader the essentials of modern time series analysis without emphasis on proving theorems. The focus is on illustrating theorems with actual problems using the B34S® software and other programs such as SCA and RATS. Actual and generated datasets are used. The reader can cut and paste self-contained code to extend the examples. If the user does not have access to a computer, the output of the examples can be studied for insight. This monograph should be read in conjunction with Stokes (1997, 200x), Stokes-Neuburger (1998), Enders (1995, 2004, 2010) and other time series texts such as Liu (2005). While Stokes (1997, 200x) covers a wide range of topics, this monograph's goal is to expand on material in Stokes (1997, 200x), particularly chapters 7, 8, 14 and 15. A major objective is to show the setups and output of a wide range of problems and discuss the results. I learned many years ago from Henri Theil that it is only through practice, through looking at printouts and results, that one gets a real feel for how to apply econometrics to real problems. The goal is to facilitate this process by showing how time series methods can be thought of as an extension of traditional econometric analysis.
Many people have assisted me in the writing phase of this study. My wife, Diana A. Stokes, has helped in the editing phase of the document and provided continuing and invaluable support. Jin-Man Lee has helped in the testing of the matrix command and advised on content. My students in Economics 537 and 538 have helped me improve the exposition and the focus. This document, however, is still a work in progress.
Houston H. Stokes
September 2007
Chapter 1 in The Essentials of Time Series Modeling: An Applied Treatment with Emphasis on
Topics Relevant to Financial Analysis © by Houston H. Stokes 16 July 2013. All rights
reserved. Preliminary Draft
Chapter 1
1.1 OVERVIEW
This overview will give you a brief “road map” of how modern time series analysis fits into general econometric modeling. In this overview, not all details are given nor are terms completely defined. As the monograph proceeds, please refer to this section to see where you are going, where you came from and how much you now understand.1 At every step computer applications are shown to see how the theory is applied. These applications use the B34S® software. Most sample problems are variants of examples distributed with B34S so that they can be modified by users for further experimentation. If this is not the case, since the examples are complete, they can be "cut" out of this document and run. The goal of this overview is to provide a rough outline of how time series fits into applied econometric analysis.
The basic OLS model for k input series2 is

$y_t = \alpha + \sum_{i=1}^{k} \beta_i x_{t,i} + u_t$    (1.1-1)
(1.1-1) assumes no serial correlation, $E(u_t u_{t-k}) = 0$ for $k > 0$, no heteroskedasticity (constant residual variance), and that all coefficients are stable over time and independent of the level of the explanatory variable. Define B as the backshift operator such that $B^k x_t = x_{t-k}$ for $k > 0$. Many books use L in place of B. The simple GLS model for one input series (here $x_t \equiv x_{t,1}$) is

$y_t = \alpha + \beta_1 x_t + \beta_2 z_t + [u_t / (1 - \phi_1 B - \ldots - \phi_k B^k)]$

$y_t(1 - \phi_1 B - \ldots - \phi_k B^k) = \alpha(1 - \phi_1 B - \ldots - \phi_k B^k) + \beta_1 x_t(1 - \phi_1 B - \ldots - \phi_k B^k) + \beta_2 z_t(1 - \phi_1 B - \ldots - \phi_k B^k) + u_t$    (1.1-2)
1 Key references for these notes are: Enders (1995, 2004, 2010), Stokes (1997, 200x), Nelson (1973) and the major work Box-Jenkins-Reinsel (2008) that started it all. Hamilton (1994) is a comprehensive but highly technical reference. Much of modern time series has found its way into basic texts such as Greene (2000). A simplified treatment is contained in Liu (2005). Finance applications of time series include Campbell-Lo-MacKinlay (1997), Lo-MacKinlay (1999) and Stokes-Neuburger (1998). This monograph attempts to summarize key topics in these references with a major emphasis on applied examples. Until this note is removed, it should be treated as a "provisional text." The B34S matrix command and other time series commands are used to illustrate the theory.
2 It is assumed that the reader has had a basic statistics course and knows the OLS model.
Equation (1.1-2) can be viewed as a special case of the more general rational distributed lag model and the even more general transfer function model given in (1.1-8). A number of methods can be used to estimate the GLS AR parameters. ML (maximum likelihood) methods use nonlinear estimation to estimate these terms jointly with the model. Two-pass methods are simpler but implicitly assume that the covariance between the AR terms and the other coefficients in the model is zero. A subset of the transfer function model is the ARIMA(p,d,q) model, where p refers to the AR part of the model, d refers to the differencing part of the model and q refers to the MA part of the model. If we take the GLS model (1.1-2) and generalize the error process to have q moving average terms and p autoregressive terms and no differencing terms (d = 0), we have
$y_t = \alpha + \beta_1 x_t + \beta_2 z_t + u_t (1 - \theta_1 B - \ldots - \theta_q B^q) / (1 - \phi_1 B - \ldots - \phi_p B^p)$    (1.1-3)
It may be more parsimonious to express the error process as a ratio of polynomials than as many AR terms. In general, if there is invertibility, an MA(1) model can be written as an AR(∞) model and an AR(1) model can be written as an MA(∞) model.
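To make the two-pass idea concrete, the following is a minimal Python/numpy sketch (illustrative only; the worked examples in this monograph use B34S). It generates data with an AR(1) error, estimates the AR parameter from the first-pass OLS residuals and then reruns OLS on quasi-differenced data, which is exactly the transformation shown in (1.1-2).

import numpy as np

rng = np.random.default_rng(42)
T = 500

# Generate y_t = 1 + 0.5*x_t + u_t, with AR(1) errors u_t = 0.7*u_{t-1} + e_t
x = rng.normal(size=T)
u = np.zeros(T)
e = rng.normal(scale=0.5, size=T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + e[t]
y = 1.0 + 0.5 * x + u

# Pass 1: OLS, ignoring the serial correlation
X = np.column_stack([np.ones(T), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
res = y - X @ b_ols

# Estimate the AR(1) parameter phi from the OLS residuals
phi = (res[1:] @ res[:-1]) / (res[:-1] @ res[:-1])

# Pass 2: quasi-difference (multiply through by (1 - phi*B)) and rerun OLS
y_star = y[1:] - phi * y[:-1]
X_star = X[1:] - phi * X[:-1]
b_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print("OLS:", b_ols, "  phi_hat:", phi, "  two-pass GLS:", b_gls)

Note that pass 2 treats the estimated AR parameter as known, which is where the zero-covariance assumption mentioned above enters; a joint ML estimator would avoid it at the cost of nonlinear estimation.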
If we have only one series, then it may be possible to filter (remove the autocorrelation from) the series with an ARIMA model, which can be written from (1.1-3) as

$y_t = u_t (1 - \theta_1 B - \ldots - \theta_q B^q) / (1 - \phi_1 B - \ldots - \phi_p B^p)$    (1.1-4)
or, if differencing on $y_t$ is needed to obtain stationarity,

$y_t (1 - B) = u_t (1 - \theta_1 B - \ldots - \theta_q B^q) / (1 - \phi_1 B - \ldots - \phi_p B^p)$    (1.1-5)
Note that (1.1-5) is a special case of the more general multiple series model (1.1-3). It will be shown later that (1.1-3) is a special case of the more general transfer function model. The great advantage of the ARMA model (1.1-5) is that it can be applied in cases where we do not have a good theory on what series to use on the right hand side of a model such as (1.1-3), although autocorrelation analysis indicates there is structure in the $y_t$ series. An ARMA model, if correctly identified, captures the structure in the series, or filters the series to obtain white noise, and can be used to forecast ahead. Forecasts of such models can be updated in the field without having to re-estimate the model. In contrast, a model of the form $y_t = f(x_t)$ with no lags can only be used to forecast if we have future data on $x_t$. This is often hard to achieve in practice unless $x_{t+k}$ is an expectation.
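The filtering idea can be sketched in a few lines of Python/numpy (an illustration, not the B34S commands used later). An AR(1) series is simulated, the AR parameter is estimated from the lag-1 autocorrelation, and the filtered series $(1 - \hat{\phi}_1 B) y_t$ is checked for remaining structure; the fitted model also gives a one-step-ahead forecast that can be updated as new data arrive.

import numpy as np

def acf(z, nlags):
    """Sample autocorrelations r_1..r_nlags of a series."""
    z = z - z.mean()
    denom = z @ z
    return np.array([(z[k:] @ z[:-k]) / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(7)
T = 1000
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):              # AR(1): y_t = 0.8*y_{t-1} + e_t
    y[t] = 0.8 * y[t - 1] + e[t]

phi_hat = acf(y, 1)[0]             # Yule-Walker estimate for an AR(1)
resid = y[1:] - phi_hat * y[:-1]   # filtered series (1 - phi*B)y_t

print("ACF of y (lags 1-3):     ", acf(y, 3).round(3))
print("ACF of resid (lags 1-3): ", acf(resid, 3).round(3))  # ~ white noise

# One-step-ahead forecast from the end of the sample
print("forecast for period T+1 =", phi_hat * y[-1])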
In summary, the objective of ARIMA model building is to select the appropriate terms in the AR and MA parts of the model so that there is no structure left in the error process. The autocorrelation and partial autocorrelation functions help us in this task. The Dickey-Fuller and Phillips-Perron tests can be used to determine if the series are stationary. Consider (1.1-6). Unless the series $[\omega_1(B)/\delta_1(B)] x_t$ is stationary, the ACF and PACF of $y_t$ cannot be calculated. If a stationary series is being used to predict a nonstationary series, the theory of the model is not consistent with the state of the world. If a model between two nonstationary series is estimated, the error may or may not be stationary. If the errors of the model are stationary, then the series are said to be cointegrated, or a linear combination of the two series is stationary. Unit root analysis helps us in this task. A rational distributed lag model starts with (1.1-1) and adds a ratio of polynomials for the coefficients. Using mathematical shorthand,
$y_t = \alpha + [\omega_1(B)/\delta_1(B)] x_t + [\omega_2(B)/\delta_2(B)] z_t + u_t$    (1.1-6)
where $\omega_i(B)$ and $\delta_i(B)$ are polynomials in the lag operator B. If we add an ARMA process for the error from (1.1-3) we obtain
$y_t = \alpha + [\omega_1(B)/\delta_1(B)] x_t + [\omega_2(B)/\delta_2(B)] z_t + u_t [(1 - \theta_1 B - \ldots - \theta_q B^q)/(1 - \phi_1 B - \ldots - \phi_p B^p)]$    (1.1-7)
which can be written more generally and compactly as
$y_t = \alpha + [\omega_1(B)/\delta_1(B)] x_t + [\omega_2(B)/\delta_2(B)] z_t + [\theta(B)/\phi(B)] u_t$    (1.1-8)
Equation (1.1-8) is a transfer function. It assumes that the dynamic relationship moves from x
and z to y. Feedback from y to x and y to z is ruled out. The noise model is an ARIMA model
that allows us to model systematic errors and improve the forecasting. Transfer function models
can be tested to see if there is feedback. If feedback is found, VAR, VMA or VARMA models
should be considered.
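As a sketch of the unit root testing mentioned above, the following hedged Python/numpy illustration (B34S and other packages supply the proper critical values and augmented lag terms) runs the Dickey-Fuller regression $\Delta y_t = c + \rho y_{t-1} + \epsilon_t$; the t-ratio on $\rho$ is compared against nonstandard Dickey-Fuller critical values, roughly -2.86 at the 5% level for the constant-only case.

import numpy as np

def df_stat(y):
    """Dickey-Fuller regression dy_t = c + rho*y_{t-1} + e_t.
    Returns the t-ratio on rho (compare with DF critical values)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ b
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=500))       # unit root: random walk
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()   # stationary AR(1)

print("DF t-ratio, random walk:", round(df_stat(walk), 2))  # fails to reject
print("DF t-ratio, AR(0.5):   ", round(df_stat(ar), 2))     # well below -2.86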
If we drop the assumption of no feedback, then the general VARMA model (see Stokes
(1997) page 197) becomes
$G(B) Z_t = D(B) u_t$    (1.1-9)
where $Z_t'$ is the tth observation on the k series $\{x_{1t}, \ldots, x_{kt}\}$ and G(B) and D(B) are k by k polynomial matrices in which each element, $G_{ij}(B)$ and $D_{ij}(B)$, is itself a polynomial in the lag operator B. Assuming k = 3, we can write (1.1-9) as
$\begin{bmatrix} G_{11}(B) & G_{12}(B) & G_{13}(B) \\ G_{21}(B) & G_{22}(B) & G_{23}(B) \\ G_{31}(B) & G_{32}(B) & G_{33}(B) \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{bmatrix} = \begin{bmatrix} D_{11}(B) & D_{12}(B) & D_{13}(B) \\ D_{21}(B) & D_{22}(B) & D_{23}(B) \\ D_{31}(B) & D_{32}(B) & D_{33}(B) \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \\ u_{3t} \end{bmatrix}$    (1.1-10)
If $G_{ij}(B) = D_{ij}(B) = 0$ for $i \neq j$, then the above model reduces to three ARIMA models. If $G_{ij}(B) = D_{ij}(B) = 0$ for $i < j$, then series 1 is exogenous to series 2 and series 2 is exogenous to series 3. If $D(B) = I$, then we have a VAR model, while if $G(B) = I$ we have a VMA model. It is important to note that while the VARMA form of the model may be the most parsimonious form, if invertibility is possible, a VARMA model can be written in VAR or VMA form as $[D(B)]^{-1} G(B)$ or $[G(B)]^{-1} D(B)$ respectively. If such a model were to be estimated, then many parameters would turn out not to be significant due to covariance between the parameters. Note that the transfer function model (1.1-8) is a special case of the more general VARMA model (1.1-10), just as the ARIMA model is a special case of the transfer function model.
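A small simulation helps fix ideas about (1.1-10). The Python/numpy sketch below (illustrative; not a B34S setup) generates a two-series VAR(1) with a lower-triangular coefficient matrix, so series 1 is exogenous to series 2 as discussed above, and recovers the coefficients by OLS equation by equation.

import numpy as np

rng = np.random.default_rng(3)
T = 400
A = np.array([[0.6, 0.0],     # lower-triangular VAR(1) coefficient matrix:
              [0.4, 0.5]])    # x1 feeds into x2, but not vice versa
Z = np.zeros((T, 2))
for t in range(1, T):
    Z[t] = A @ Z[t - 1] + rng.normal(size=2)

# Estimate the VAR(1) by OLS equation by equation
X, Y = Z[:-1], Z[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print("A_hat:\n", A_hat.round(2))   # upper-right element ~ 0: no feedback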
The above models all assume constant variance of the error term, or homoskedasticity. Engle (1982) and others developed a class of models that drops this assumption and estimates both the first moment and the second moment of a process. These approaches, which can be applied to both univariate and multivariate models, will be sketched. Assume the variance of a model of $y_t$, conditional on all information known up to period t-1, $\Omega_{t-1}$, can be written

$V(y_t | \Omega_{t-1}) = \alpha_0 + \alpha_1 e_{t-1}^2 + \ldots + \alpha_q e_{t-q}^2$    (1.1-11)
where $e_t$ is the error of the first moment equation. Such an ARCH model attempts to explain variance clustering in the residuals. ARCH models imply nonlinear dependence among the squared errors of the first moment model. If we define $v_t = V(y_t | \Omega_{t-1})$, then the GARCH second moment equation is

$v_t = \alpha_0 + \alpha_1 e_{t-1}^2 + \ldots + \alpha_q e_{t-q}^2 + \gamma_1 v_{t-1} + \ldots + \gamma_p v_{t-p}$    (1.1-12)
which can be seen as an ARIMA(p,0,q) model on the second moment. Such a model can be generalized to a transfer function. For example, a GARCH(p,q)-M model with an MA(1) error term maximizes

$\sum_t -.5(\log(v_t) + e_t^2 / v_t)$

where

$y_t = a + \sum_{i=0}^{k} \omega_i B^i x_t + \sum_{j=1}^{m} \rho_j B^j y_t + \lambda (v_t)^{1/2} + e_t + \theta e_{t-1}$

$v_t = a_0 + \sum_{j=1}^{q} a_j e_{t-j}^2 + \sum_{i=1}^{p} g_i v_{t-i}$    (1.1-13)
If we drop $e_t$ from the first equation and assume either no input series or that the input series starts at lag t-1, we can think of it as the expected value of $y_t$ conditional on information known at period t-1. The second equation calculates the expected value of the squared error of the first equation, conditional on information known up to period t-1, or $E_{t-1} e_t^2 = v_t$. If $\lambda = 0$, $k \geq 1$, $m \geq 1$ (with $p = q = 1$), we have a GARCH(1,1) model on the error term of a transfer function. If $\lambda \neq 0$ we have a GARCH(p,q)-M transfer function model, since information from the second equation on the second moment is feeding back into the first equation. Such models cannot be estimated using the two-pass method if we use $E_{t-1} e_t^2 = v_t$. If there is no input, then we just have a GARCH(p,q) or GARCH(p,q)-M model depending on the value of $\lambda$. Here we show the case where $\lambda = 0$:
$y_t = a + \sum_{j=1}^{m} \rho_j B^j y_t + e_t$

$v_t = a_0 + \sum_{j=1}^{q} a_j e_{t-j}^2 + \sum_{i=1}^{p} g_i v_{t-i}$    (1.1-14)
If $\lambda = 0$, there is no feedback from the second moment equation to the first moment equation and the GARCH(1,1) model can be estimated using either the joint (one-pass) method or the two-pass method. The two-pass method is computationally much simpler and was originally suggested by Engle (1982). For a one-variable model (ARIMA), the two-pass approach involves estimation of two ARIMA models. The first model is on the series. The second model is on the square of the errors of the first moment model. The ARCH/GARCH class of models can be extended to multiple series, where it is called BGARCH or bivariate GARCH. This can be seen as a VARMA model of the form of (1.1-10) on the second moment.
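The following hedged Python/numpy sketch (the actual estimation in this monograph is done in B34S) simulates a GARCH(1,1) process, runs an Engle (1982)-style second-pass regression of $e_t^2$ on its lag, and evaluates the one-pass criterion $\sum -.5(\log v_t + e_t^2/v_t)$ from (1.1-13) at the true parameters; a one-pass estimator would maximize this criterion numerically.

import numpy as np

rng = np.random.default_rng(11)
T = 2000
a0, a1, g1 = 0.1, 0.15, 0.80          # GARCH(1,1) parameters
e = np.zeros(T); v = np.zeros(T)
v[0] = a0 / (1 - a1 - g1)             # unconditional variance
e[0] = np.sqrt(v[0]) * rng.normal()
for t in range(1, T):
    v[t] = a0 + a1 * e[t - 1]**2 + g1 * v[t - 1]
    e[t] = np.sqrt(v[t]) * rng.normal()

# Two-pass flavor: OLS of e_t^2 on lagged e^2 (Engle 1982 ARCH regression)
X = np.column_stack([np.ones(T - 1), e[:-1]**2])
b, *_ = np.linalg.lstsq(X, e[1:]**2, rcond=None)
print("ARCH regression coefficients:", b.round(3))  # slope > 0: ARCH effects

def neg_loglik(params, e):
    """One-pass criterion: minus the sum of -0.5*(log v_t + e_t^2/v_t)."""
    a0, a1, g1 = params
    v = np.empty(len(e)); v[0] = e.var()
    for t in range(1, len(e)):
        v[t] = a0 + a1 * e[t - 1]**2 + g1 * v[t - 1]
    return 0.5 * np.sum(np.log(v) + e**2 / v)

print("neg. log-lik at true params:", round(neg_loglik((a0, a1, g1), e), 1))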
One main assumption of the VARMA model is that the relationship is a multidimensional hyperplane. This means that no matter what the values of the variables, the same coefficients are used. This is a strong, but widely made, assumption that, if the model is nonlinear, will trigger the nonlinearity tests in Stokes (200x chapter 8). One way to proceed is to use theory, or luck, to parameterize a nonlinear model. This is very difficult unless theory provides a clear formulation. As an alternative, MARS, GAM and ACE models attempt to automatically detect any nonlinearity in the model. These approaches are covered in some detail in Stokes (200x) and will only be sketched here.
The MARS model, covered in more detail in Stokes (200x Chapter 14), drops the assumption of a hyperplane. Variables on the right no longer have to be "switched on" all the time. The VAR, AR, and OLS models become special cases of this more general nonlinear representation that includes the TAR (threshold autoregressive) model as a special case. Assume a nonlinear model of the form

$y = f(x_1, \ldots, x_m) + e$    (1.1-15)

involving N observations on m right-hand-side variables, $x_1, \ldots, x_m$. A MARS model attempts to approximate the nonlinear function f( ) by
$\hat{f}(X) = \sum_{j=1}^{s} c_j K_j(X)$    (1.1-16)
where $\hat{f}(X)$ is an additive function of the product basis functions $\{K_j(X)\}_{j=1}^{s}$ associated with the s subregions $\{R_j\}_{j=1}^{s}$ and $c_j$ is the coefficient for the jth product basis function. If all subregions include the complete range of each of the right-hand-side variables, then the coefficients $\{c_j\}_{j=1}^{s}$ can be interpreted as just OLS coefficients of variables or interactions among variables. The B34S mars procedure can identify the subregions under which the coefficients are stable and detect any possible interactions up to a maximum number of possible interactions controllable by the user. For example, assume the model
$y = \alpha + \beta_1 x + e$  for $x \geq 100$
$y = \alpha + \beta_2 x + e$  for $x < 100$    (1.1-17)
In terms of the MARS notation, this is written

$y = \alpha' + c_1 (x - \tau^*)_+ + c_2 (\tau^* - x)_+ + e$,    (1.1-18)

where $\tau^* = 100$ and $(\ )_+$ is the right (+) truncated spline function, which takes on the value 0 if the expression inside $(\ )_+$ is negative and its actual value if the expression inside $(\ )_+$ is $\geq 0$. Here $c_1 = \beta_1$ and $c_2 = -\beta_2$. In terms of equation (1.1-16), $K_1(X) = (x - \tau^*)_+$ and $K_2(X) = (\tau^* - x)_+$.
Note that the derivative of the spline function is not defined for values of x at the knot value of 100. Friedman (1991b) suggests using either a linear or cubic approximation to determine the exact y value. In the results reported later, both evaluation techniques were tested and the one with the lowest sum of squares of the residuals was selected. The MARS user selects the maximum number of knots to consider and the highest order interaction to investigate. Alternatively, the minimum number of observations between knots can be set. An example of an interaction model for $y = f(x, z)$ follows:
$y = \alpha + c_1 (x - \tau_1^*)_+ + c_2 (\tau_1^* - x)_+ + c_3 (x - \tau_1^*)_+ (z - \tau_2^*)_+ + e$    (1.1-19)

implies that

$y = \alpha + c_1 x - c_1 \tau_1^* + e$  for $x > \tau_1^*$ and $z \leq \tau_2^*$
$\;\; = \alpha - c_2 x + c_2 \tau_1^* + e$  for $x \leq \tau_1^*$    (1.1-20)
$\;\; = \alpha + c_1 x - c_1 \tau_1^* + c_3 (xz - \tau_1^* z - \tau_2^* x + \tau_1^* \tau_2^*) + e$  for $x > \tau_1^*$ and $z > \tau_2^*$.
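The spline basis in (1.1-18) is easy to demonstrate. The Python/numpy sketch below (an illustration of the basis functions only; the actual MARS search over knots and interactions is done by the B34S mars procedure) builds $(x - \tau^*)_+$ and $(\tau^* - x)_+$ at a knot of 100 and recovers the piecewise slopes by OLS.

import numpy as np

def pos(z):
    """Right-truncated spline ( )_+ : max(z, 0) elementwise."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(5)
x = rng.uniform(50, 150, size=300)
knot = 100.0
# True piecewise model: slope 0.5 below the knot, slope 2.0 above it
y = 1.0 + np.where(x > knot, 2.0 * (x - knot), 0.5 * (x - knot)) \
        + rng.normal(scale=0.5, size=300)

# MARS-style basis as in (1.1-18): intercept, (x - knot)_+, (knot - x)_+
K = np.column_stack([np.ones_like(x), pos(x - knot), pos(knot - x)])
c, *_ = np.linalg.lstsq(K, y, rcond=None)
print("c_hat:", c.round(2))   # ~ [1.0, 2.0, -0.5]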
An alternative to the MARS model is the GAM (generalized additive model) discussed by Hastie-Tibshirani (1986, 1990) and Faraway (2006, 240). Assuming $y = f(x_1, x_2, \ldots, x_k)$, where the $x_i$ and y are one-dimensional vectors, a GAM model can be written as

$E(y | x_1, x_2, \ldots, x_k) = \alpha_0 + \sum_{j=1}^{k} a_j(x_j)$    (1.1-21)
where the $a_j(\cdot)$ are smooth functions standardized (to remove free constants) so that $E\, a_j(x_j) = 0$, estimated one at a time using forward stepwise estimation with a scatterplot smoother. When (1.1-21) is estimated with OLS, the expected coefficients are all 1.0. The user sets the degree of the smoother. The B34S implementation allows the user who has set the degree of the smoother > 1 to see the "cost," in the sense of an increase in the error sum of squares, if the linearity assumption (degrees of freedom = 1) were imposed. A significance test that measures the difference of the sum of squares of the residuals for the linear restriction case and the DF > 1 case allows a relative measure of the degree of nonlinearity, by variable, that is assumed away if the model is estimated using OLS. The Hastie-Tibshirani (1990, 87) data on diabetes that models ln(level of serum C-peptide) (lpeptide) as a function of age and base deficit (bdeficit) will be used to show how the GAM model can be used to graphically display the degree of nonlinearity.
Annotated Output follows.
Ordinary Least Squares Estimation

Dependent variable                  LPEPTIDE
Centered R**2                       0.3742769804943026
Adjusted R**2                       0.3429908295190177
Residual Sum of Squares             0.6685211174195327
Residual Variance                   1.671302793548832E-02
Standard Error                      0.1292788766020510
Total Sum of Squares                1.068397831915541
Log Likelihood                      28.50922237373565
Mean of the Dependent Variable      1.545441506335143
Std. Error of Dependent Variable    0.1594930832890150
Sum Absolute Residuals              4.189827641219706
F( 2,   40)                         11.96302417609537
F Significance                      0.9999153469426932
1/Condition XPX                     5.670219014249498E-04
Maximum Absolute Residual           0.2937646199111468
Number of Observations              43

Variable    Lag   Coefficient       SE               t
AGE         0     0.15016835E-01    0.50989105E-02   2.9451067
BDEFICIT    0     0.89648947E-02    0.28794517E-02   3.1134033
CONSTANT    0     1.4828545         0.59691527E-01   24.841960
Generalized Additive Models (GAM) Analysis
Reference: Generalized Additive Models by Hastie and Tibshirani. Chapman (1990)
Model estimated with GPL code obtained from R.
Gaussian additive model assumed
Identity link - yhat = x*b + sum(splines)

Response variable ....        LPEPTIDE
Number of observations:       43
Residual Sum of Squares       0.4518579290553415
# iterations                  1
# smooths/variable            9
Mean Squared Residual         1.050832393151957E-02
df of deviance                35.99996595688997
Scale Estimate                1.255162100976546E-02
Primary tolerence             1.000000000000000E-09
Secondary tolerance           1.000000000000000E-09
R square                      0.5770695937811844
Total sum of Squares          1.068397831915541

Name       Lag   Model df   coef           st err       z score   nl pval   lin_res
intcpt           1.         1.47871        0.5173E-01   28.59
AGE        0     3.00       0.147574E-01   0.4419E-02   3.340     0.9941    0.6085
BDEFICIT   0     3.00       0.816856E-02   0.2495E-02   3.274     0.8278    0.5145
Total            7.00
The gain from going from OLS to GAM was to reduce $e'e$ from .66852 to .45186. When age is restricted to be linear, $e'e$ becomes .6085, which is a highly significant difference. When bdeficit is restricted to be linear, $e'e$ increases to .5145, which is significant at .8278. Plots of the smoothed right-hand-side variables are shown next and should be compared to the same plots in Hastie-Tibshirani (1990, 87) figure 4.3.
[Plot: "Surface AGE" — GAM smooth of AGE plotted against X_VAR with LOWER_B and UPPER_B bands]

Figure 1.1-1 GAM Smoothed AGE series
[Plot: "Surface BDEFICIT" — GAM smooth of BDEFICIT plotted against X_VAR with LOWER_B and UPPER_B bands]

Figure 1.1-2 GAM Smoothed BDEFICIT series
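Before turning to ACE, the backfitting idea behind (1.1-21) can be sketched in Python/numpy (a deliberately crude illustration: a running-mean smoother stands in for the spline smoother used by B34S and Hastie-Tibshirani, and no degrees-of-freedom accounting is attempted).

import numpy as np

def smooth(x, r, width=15):
    """Crude scatterplot smoother: running mean of r over x-neighbors."""
    order = np.argsort(x)
    s = np.empty_like(r)
    for i, idx in enumerate(order):
        lo, hi = max(0, i - width), min(len(x), i + width + 1)
        s[idx] = r[order[lo:hi]].mean()
    return s

def backfit(y, X, iters=20):
    """Gauss-Seidel backfitting for E(y|x) = a0 + sum_j a_j(x_j)."""
    n, k = X.shape
    a0 = y.mean()
    f = np.zeros((n, k))
    for _ in range(iters):
        for j in range(k):
            partial = y - a0 - f.sum(axis=1) + f[:, j]
            f[:, j] = smooth(X[:, j], partial)
            f[:, j] -= f[:, j].mean()      # standardize: E a_j(x_j) = 0
    return a0, f

rng = np.random.default_rng(9)
n = 300
X = rng.uniform(-2, 2, size=(n, 2))
y = 1.0 + np.sin(X[:, 0]) + X[:, 1]**2 + rng.normal(scale=0.2, size=n)
a0, f = backfit(y, X)
resid = y - a0 - f.sum(axis=1)
print("residual sum of squares:", round(resid @ resid, 2))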
Another alternative is the ACE model (Breiman-Friedman (1985)) that smooths both the left-hand side and the right-hand side. The ACE model is written as

$\theta(y) = \alpha_0 + \sum_{j=1}^{k} a_j(x_j)$    (1.1-22)
If $\theta$ is invertible, (1.1-22) can be written, once the model is estimated, as

$y = \hat{\theta}^{-1}[\hat{\alpha}_0 + \sum_{j=1}^{k} \hat{a}_j(x_j)]$.    (1.1-23)

The ACE algorithm minimizes the squared error $E\{\theta(y) - \alpha_0 - \sum_{j=1}^{k} a_j(x_j)\}^2$ subject to $\mathrm{var}\{\theta(y)\} = 1$. The steps of the ACE algorithm3 are:
(i) Initialize by setting $\theta(y) = \{y - E(y)\}/\{\mathrm{var}(y)\}^{.5}$.
(ii) Fit an additive model to $\theta(y)$ to obtain new functions $f_1(x_1), \ldots, f_p(x_p)$.
(iii) Compute $\tilde{\theta}(y) = E\{\sum_j f_j(x_j) | y\}$ and update the left-hand side by forming $\theta(y) = \tilde{\theta}(y)/[\mathrm{var}\{\tilde{\theta}(y)\}]^{.5}$.
(iv) Alternate steps (ii) and (iii) until $E\{\theta(y) - \sum_j f_j(x_j)\}^2$ does not change.

Step (ii) can be thought of as follows: for a fixed $\theta$, the minimizing $f_i(x_i)$ is $f(X) = E\{\theta(y) | X\}$, while step (iii) can be thought of as follows: for fixed $f(\ )$, the minimizing $\theta$ is $\theta(y) = E\{f(X) | y\}$.
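The alternation of steps (ii) and (iii) can be illustrated with a hedged Python/numpy sketch (binned conditional means stand in for the scatterplot smoother; a real ACE implementation is far more careful). With data generated as $y = \exp(x + \text{noise})$, the iteration should drive $\theta(y)$ toward a standardized $\log(y)$.

import numpy as np

def cond_mean(x, target, bins=20):
    """E(target | x) via binned means - a crude smoother for ACE."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    out = np.empty_like(target)
    for b in range(bins):
        m = idx == b
        if m.any():
            out[m] = target[m].mean()
    return out

rng = np.random.default_rng(13)
n = 2000
x = rng.uniform(0.5, 2.0, size=n)
y = np.exp(x + rng.normal(scale=0.1, size=n))   # true theta is log(y)

# (i) initialize theta(y)
theta = (y - y.mean()) / y.std()
for _ in range(10):
    # (ii) fit the additive part: f(x) = E{theta(y) | x}
    f = cond_mean(x, theta)
    # (iii) update theta: theta(y) = E{f(x) | y}, then standardize
    theta = cond_mean(y, f)
    theta = (theta - theta.mean()) / theta.std()

# theta should now approximate a standardized log(y)
ly = (np.log(y) - np.log(y).mean()) / np.log(y).std()
print("corr(theta, standardized log y):", round(np.corrcoef(theta, ly)[0, 1], 3))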
In addition to forecasting, an advantage of the ACE procedure may be as a diagnostic tool that will lead to an understanding of whether a model can be safely estimated as linear or whether some kind of transformation is needed on one or both sides of the equation. The question arises why it is often necessary to transform both sides of the equation, not just the right-hand side as is done with GAM and MARS. Hastie and Tibshirani (1990) make the point that a model of the form $y = \exp(x + z^2)e$ cannot be estimated in additive form by GAM or MARS, but a simple additive model can be found that describes $\log(y)$: for example, $\log(y) = x + z^2 + e$.
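This point is easy to verify numerically. In the Python/numpy sketch below (illustrative; the error is taken to be lognormal, so it is multiplicative in levels and additive in logs), an additive OLS fit in $x$ and $z^2$ fails for $y$ but fits $\log(y)$ almost perfectly.

import numpy as np

rng = np.random.default_rng(17)
n = 1000
x = rng.normal(size=n); z = rng.normal(size=n)
y = np.exp(x + z**2) * np.exp(rng.normal(scale=0.1, size=n))

def r2(target, cols):
    """R**2 of an OLS fit of target on a constant and the given columns."""
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    res = target - X @ b
    tss = (target - target.mean()) @ (target - target.mean())
    return 1 - (res @ res) / tss

# Additive fit in x and z**2 works for log(y), not for y itself
print("R^2 for y:     ", round(r2(y, [x, z**2]), 3))
print("R^2 for log(y):", round(r2(np.log(y), [x, z**2]), 3))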
1.2 Plan of Book
Table 1.1 lists a number of useful concepts that will be developed further in this book and in the
revision of Stokes (1997) that is referred to as Stokes (200x).
3 See Hastie-Tibshirani (1990, 176) for details. The discussion of ACE has been taken from this key reference with minor modifications.
Chapter 2 provides an overview of time series modeling objectives. White noise is defined and a number of basic filtering techniques are discussed and illustrated using the B34S software system. While these may appear naive, they provide a "base case" which must be beaten if a more complex model is to be used. Many Data Mining applications use such simple models when it is not possible, due to time constraints or data constraints, to develop a more sophisticated approach. The most basic of these methods of forecasting is the "no change extrapolation," which asserts that the expectation of a series in period t+1 formed in period t is its actual value in period t, or ${}_{t} x_{t+1}^e = x_t$.
Chapter 3 looks at stability conditions and outlines how the frequency and time series
representations of a series are related.
Chapter 4 provides a brief discussion of stationary time series models. The autocorrelation and partial autocorrelation functions are defined and shown to map to the AR (autoregressive) and MA (moving average) coefficients. Use of simulation techniques will give the reader confidence in how to use these core diagnostic tools.
Chapter 5 covers the estimation of AR(p), MA(q) and ARMA(p,q) models both with user input
and using automatic or "expert" systems.
Chapter 6 deals with filtering issues and introduces some of the concepts involved in cointegration tests of economic series. Rather than dealing solely with the theory, the main point is to show that models that do not adequately check for unit roots run serious risks of being invalid.
Chapter 7 involves relaxing the assumption of homoskedasticity. The efficient markets assumption suggests that an expected stock price series for period t+k, ${}_{t} p_{t+k}^e$, should embody all known information $\Omega_t$ up to that time period, or ${}_{t} p_{t+k}^e = E(p_{t+k} | \Omega_t)$. While the ACF of $p_t$ may show no spikes, the ACF of the squared residual $(p_t - \bar{p})^2$ may show spikes. ARCH and GARCH models model both the first and second moments of ARIMA models and have been shown to be quite useful in financial modeling. In general, the squared residuals of any time series model can and should be inspected.
Chapter 8 covers bivariate GARCH models, which are discussed in light of VAR models.
Table 1.1 Useful Concepts
______________________________________________________________________

• The VAR model can be expressed in the frequency domain.

• The assumption of coefficient stability can be tested using recursive residual analysis, which involves recursive estimation.

• If the assumption of coefficient stability is violated, the marspline, gamfit, acefit and pispline commands can be used, provided that the instability is level dependent. These techniques allow for automatic detection of parameter shifts and provide diagnostic tests for how the model is doing. If the instability is time dependent, then it may be possible to parameterize the model to take this into account.

• A dynamic system can be studied in both the time and frequency domains. The varfreq command can be used to study a VAR model in the frequency domain. The wavelet command under the matrix command can be used to remove short duration noise from data.

• The assumption of linearity can be tested using the Hinich (1982) test, the Hinich (1996) test, the BDS test, the Tsay (1986) test and other nonlinearity tests. Since not all nonlinearity tests can detect all types of models, which test shows nonlinearity may indicate how linearity is violated and provide guidance on solutions.

• The assumption of homoskedasticity can be tested using the Engle (1982) test or tests on the autocorrelations of the squared residuals. If heteroskedasticity is found, an ARCH or GARCH model can be estimated (see Stokes (1997) pages 337-342). For multiple series models a BGARCH or bivariate GARCH model can be used. Such models can jointly estimate the first moment and the second moment or employ the more econometrically tractable two-pass method of estimation.

• A bivariate GARCH model for two series can be thought of as a VARMA model on both the first and the second moments.

• Simultaneous equations models can be shown to be special cases of the more general VARMA class of models.
_____________________________________________________________________________________