Lecture Slides

Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes
Prof. Massimo Guidolin
20192– Financial Econometrics
Spring 2015
• Motivation: what is time series analysis
• The ARMA class
• In-depth analysis of the AR(1) process
• Stationarity
• ARMA processes as combinations of white noise
• Wold decomposition theorem
• In-depth analysis of the ARMA(1,1) process
• Estimation of ARMA models
• Maximum Likelihood Estimation vs. OLS
• Deterministic vs. Stochastic Trends
Lecture 2: Univariate Time Series Analysis– Prof. Guidolin
Motivation: time series analysis
• Time series analysis exploits the properties of past data to predict their future (density or moments)
• A time series is a sequence {x₁, x₂, ..., xT} or {xt}, t = 1, ..., T, where t is an index denoting the period in which x occurs
• Can we do better than using a constant as the only predictor of financial returns?
o Selecting time-varying predictors requires using the properties of observed data to predict future observations
• Time series analysis is the branch of econometrics that deals with this question
• We shall consider univariate and multivariate time series models
o Univariate models: the information set relevant to predict one variable is restricted to the past history of that variable
o Returns on financial assets observed over a given sample constitute the typical time series of interest to us
Univariate time series processes
• Under a Gaussian IID (CER) model there is no predictability, either in the mean or in the variance
• Because each xt is a random variable, a time series is a sequence of random variables ordered in time; such a sequence is also known as a stochastic process
• The probability structure of a sequence of random variables is determined by the joint distribution of the stochastic process
• The famous (Gaussian) IID/CER model is the simplest case of a probability model for such a joint distribution:
$$z_{t+1} \sim N(0, 1), \qquad x_{t+1} \equiv R_{t+1} = \mu + \sigma z_{t+1}$$
o It implies that xt+1 is normally IID over time, with constant variance σ² and mean equal to µ
o CER = Constant Expected Return: Rt is the sum of a constant and a scaled white noise process, σzt
o Under CER, forecasting is not interesting, as the best forecasts of the moments are simply their unconditional values
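A quick simulation makes the "no predictability" point concrete. The sketch below uses Python's standard `random` module and illustrative values µ = 0.005 and σ = 0.04 (assumptions, not estimates from the slides); it draws CER returns and computes the sample first-order autocorrelation, which should be close to zero.

```python
import random

def simulate_cer(mu, sigma, T, seed=0):
    """Draw T returns from the Gaussian CER model R_t = mu + sigma * z_t."""
    rng = random.Random(seed)
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(T)]

def sample_autocorr(x, lag):
    """Sample autocorrelation of the list x at the given lag."""
    T = len(x)
    m = sum(x) / T
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, T))
    return num / denom

# Illustrative parameters (assumed values, not estimates)
returns = simulate_cer(mu=0.005, sigma=0.04, T=20000)
mean_hat = sum(returns) / len(returns)
rho1_hat = sample_autocorr(returns, 1)
print(f"sample mean {mean_hat:.4f}, lag-1 autocorrelation {rho1_hat:.4f}")
```

Under CER the best forecast of Rt+1 is just µ, estimated here by the sample mean; past returns add no information.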
The autoregressive moving average (ARMA) class
• ARMA models are linear combinations of white noise
• Because we suspect that financial data may contain some predictability, we construct models that are more realistic than the Gaussian IID one
• In univariate strategies, the basic idea is to use combinations of white noise processes, εt, to generate more flexible models capable of replicating the relevant features of the data
• In particular, autoregressive moving average (ARMA) models are built by taking linear combinations of white noise processes
[Figure: panels for ρ = 0.99 and ρ = 0]
In-depth analysis of the AR(1) model properties
• Consider a Gaussian AR(1) model with drift (= intercept term):
$$x_t = \rho_0 + \rho_1 x_{t-1} + \varepsilon_t$$
o εt ∼ n.i.d.(0, σ²), equivalent to εt ∼ i.i.d. N(0, σ²) or NID(0, σ²)
o Given that each realization of the process is a random variable, the
first relevant fundamental is the density of each observation
o We distinguish between conditional and unconditional densities
• The unconditional density is obtained under the hypothesis that no observation on the time series is available
o We derive the unconditional density by ideally placing the observer at time zero, i.e., before observing any realizations
o At that moment the information set contains only the knowledge of the process generating the observations
• As observations become available, we compute conditional densities by combining the information on the process with the observed data
The AR(1) model: conditional moments
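o The moments of the density of xt conditional upon xt-1 are obtained directly from the model:
$$E(x_t \mid x_{t-1}) = \rho_0 + \rho_1 x_{t-1}, \qquad \mathrm{Var}(x_t \mid x_{t-1}) = \sigma^2$$
o This can be generalized to moments of xt conditional upon xt-2, substituting xt-1 = ρ0 + ρ1xt-2 + εt-1:
$$x_t = \rho_0(1+\rho_1) + \rho_1^2 x_{t-2} + \rho_1\varepsilon_{t-1} + \varepsilon_t$$
$$E(x_t \mid x_{t-2}) = \rho_0(1+\rho_1) + \rho_1^2 x_{t-2}, \qquad \mathrm{Var}(x_t \mid x_{t-2}) = \sigma^2(1+\rho_1^2)$$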
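Iterating the same substitution h times, for the AR(1) xt = ρ0 + ρ1xt-1 + εt, gives E(xt+h | xt) = ρ0(1 + ρ1 + ... + ρ1^(h-1)) + ρ1^h xt and Var(xt+h | xt) = σ²(1 + ρ1² + ... + ρ1^(2(h-1))). A minimal Python sketch of these two formulas (the function name is hypothetical):

```python
def ar1_conditional_moments(rho0, rho1, sigma2, x_now, h):
    """Mean and variance of x_{t+h} given x_t = x_now in a Gaussian AR(1).

    mean = rho0 * sum_{i<h} rho1**i + rho1**h * x_now
    var  = sigma2 * sum_{i<h} rho1**(2*i)
    """
    mean = rho0 * sum(rho1 ** i for i in range(h)) + rho1 ** h * x_now
    var = sigma2 * sum(rho1 ** (2 * i) for i in range(h))
    return mean, var

# h = 1 reproduces the moments given x_{t-1}; h = 2 the moments given x_{t-2}
print(ar1_conditional_moments(1.0, 0.5, 1.0, x_now=2.0, h=1))
print(ar1_conditional_moments(1.0, 0.5, 1.0, x_now=2.0, h=2))
```

The conditional variance grows with h but, for |ρ1| < 1, stays bounded.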
The AR(1) model: unconditional moments
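• At this point, to get to unconditional moments it is sufficient to substitute recursively from
$$x_t = \rho_0 + \rho_1 x_{t-1} + \varepsilon_t$$
to express xt as a function of information available at time 0, the moment before we start observing the process:
$$x_t = \rho_0 \sum_{i=0}^{t-1}\rho_1^i + \rho_1^t x_0 + \sum_{i=0}^{t-1}\rho_1^i \varepsilon_{t-i}$$
$$E(x_t) = \rho_0 \sum_{i=0}^{t-1}\rho_1^i + \rho_1^t x_0, \qquad \mathrm{Var}(x_t) = \sigma^2 \sum_{i=0}^{t-1}\rho_1^{2i}$$
Auto-covariance function:
$$\mathrm{Cov}(x_t, x_{t-j}) = \rho_1^j\,\sigma^2 \sum_{i=0}^{t-j-1}\rho_1^{2i}$$
Auto-correlation function:
$$\mathrm{Corr}(x_t, x_{t-j}) = \frac{\rho_1^j \sum_{i=0}^{t-j-1}\rho_1^{2i}}{\sqrt{\sum_{i=0}^{t-1}\rho_1^{2i}\,\sum_{i=0}^{t-j-1}\rho_1^{2i}}}$$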
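The time-0 moments E(xt) = ρ0(1 + ρ1 + ... + ρ1^(t-1)) + ρ1^t x0 and Var(xt) = σ²(1 + ρ1² + ... + ρ1^(2(t-1))) can be checked by brute force. The sketch below assumes illustrative parameters ρ0 = 1, ρ1 = 0.8, σ = 1, x0 = 0 and t = 5; it simulates many independent AR(1) paths from the same x0 and compares the cross-sectional mean and variance of xt with the formulas.

```python
import random

def ar1_moments_from_x0(rho0, rho1, sigma2, x0, t):
    """E(x_t) and Var(x_t) given only x_0 (closed-form, time-0 moments)."""
    mean = rho0 * sum(rho1 ** i for i in range(t)) + rho1 ** t * x0
    var = sigma2 * sum(rho1 ** (2 * i) for i in range(t))
    return mean, var

def simulate_xt(rho0, rho1, sigma, x0, t, n_paths, seed=1):
    """Return x_t from n_paths independent AR(1) paths started at x0."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_paths):
        x = x0
        for _ in range(t):
            x = rho0 + rho1 * x + sigma * rng.gauss(0.0, 1.0)
        draws.append(x)
    return draws

# Illustrative (assumed) parameter values
rho0, rho1, sigma, x0, t = 1.0, 0.8, 1.0, 0.0, 5
theory_mean, theory_var = ar1_moments_from_x0(rho0, rho1, sigma ** 2, x0, t)
xs = simulate_xt(rho0, rho1, sigma, x0, t, n_paths=20000)
mc_mean = sum(xs) / len(xs)
mc_var = sum((v - mc_mean) ** 2 for v in xs) / (len(xs) - 1)
print(theory_mean, mc_mean, theory_var, mc_var)
```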
• However, all these unconditional moments depend on t (and the autocovariances also on j)
• At this point, another important statistical concept comes in: stationarity
The AR(1) model: stationarity
• A stochastic process is strictly stationary iff its distribution does not change over time: for each j₁, j₂, ..., jn, the joint density
$$f(x_t, x_{t+j_1}, x_{t+j_2}, \ldots, x_{t+j_n})$$
does not depend on t
• A stochastic process is covariance stationary iff its first two unconditional moments exist and do not depend on time, i.e., if all the relations that follow are satisfied for each h, i, j:
$$E(x_t) = E(x_{t+h}) = \mu$$
$$\mathrm{Var}(x_t) = \mathrm{Var}(x_{t+h}) = \gamma_0 < \infty$$
$$\mathrm{Cov}(x_t, x_{t-j}) = \mathrm{Cov}(x_{t-i}, x_{t-i-j}) = \gamma_j$$
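• In the case of an AR(1) process, the condition for stationarity is |ρ₁| < 1
• This occurs because, when |ρ₁| < 1, the effect of x0 dies out and the geometric sums in the mean and variance converge:
$$\lim_{t\to\infty}\rho_1^t = \lim_{t\to\infty}\rho_1^{2t-2} = 0$$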
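• When such a covariance stationarity condition is satisfied, we obtain, as t → ∞:
$$E(x_t) = \frac{\rho_0}{1-\rho_1}, \quad \mathrm{Var}(x_t) = \frac{\sigma^2}{1-\rho_1^2}, \quad \mathrm{Cov}(x_t, x_{t-j}) = \frac{\rho_1^j\sigma^2}{1-\rho_1^2}, \quad \mathrm{Corr}(x_t, x_{t-j}) = \rho_1^j$$
• When ρ₁ = 1, the process is non-stationary, and both the expectation (when ρ0 ≠ 0) and the variance become explosive:
$$E(x_t) = x_0 + \rho_0 t, \qquad \mathrm{Var}(x_t) = \sigma^2 t$$
• Explosive means that the unconditional moments diverge as t → ∞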
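The contrast between the two cases is easy to see numerically. The sketch below evaluates the time-0 moments of the AR(1) xt = ρ0 + ρ1xt-1 + εt at increasing t for ρ1 = 0.5 and ρ1 = 1, using illustrative values ρ0 = 0.5, σ² = 1 and x0 = 0 (assumptions, not from the slides).

```python
def ar1_moments_from_x0(rho0, rho1, sigma2, x0, t):
    """E(x_t) and Var(x_t) given only x_0, valid for any rho1 at finite t."""
    mean = rho0 * sum(rho1 ** i for i in range(t)) + rho1 ** t * x0
    var = sigma2 * sum(rho1 ** (2 * i) for i in range(t))
    return mean, var

for rho1 in (0.5, 1.0):
    for t in (1, 10, 100):
        m, v = ar1_moments_from_x0(0.5, rho1, 1.0, 0.0, t)
        print(f"rho1={rho1}, t={t}: E={m:.4f}, Var={v:.4f}")

# For rho1 = 0.5 the moments settle at rho0/(1-rho1) = 1 and sigma2/(1-rho1**2) = 4/3;
# for rho1 = 1 they grow linearly: E = rho0 * t, Var = sigma2 * t.
```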
• Under stationarity one can predict the future using information from the past
• Strict stationarity enables density forecasts; covariance stationarity enables moment predictions
• Stationarity is important for forecasting because, under stationarity, one can legitimately learn from the past to predict the future
o If stationarity is not satisfied, the density of the observations estimated from past data is not going to be helpful to predict the future
• We have claimed that ARMA models are just clever combinations of white noise processes
o Such combinations should allow us to best fit the data
• This is guaranteed by an important statistical result, the Wold decomposition theorem: any stationary stochastic process can be expressed as the sum of a deterministic component and a stochastic moving-average component
ARMA processes: Wold decomposition theorem
• Using Wold's theorem, a time series is written as a polynomial distributed lag of white noise processes:
$$x_t = b(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}$$
o L is simply the lag operator: Lεt = εt-1, L²εt = εt-2, ..., Lⁿεt = εt-n
o b(L) is a lag polynomial
• Problem: even though Wold's theorem is a mathematical fact, describing many time series successfully requires a very high order of the polynomial b(L)
o This feature can be problematic for estimation, given the usual limitations on sample sizes
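• This potential problem is resolved if the polynomial b(L) can be represented as the ratio of two polynomials of lower order:
$$x_t = b(L)\varepsilon_t = \frac{a(L)}{c(L)}\varepsilon_t \quad\Longleftrightarrow\quad c(L)\,x_t = a(L)\,\varepsilon_t$$
$$c(L) = 1 - c_1 L - \cdots - c_p L^p, \qquad a(L) = 1 + a_1 L + \cdots + a_q L^q$$
o Under simple technical conditions this often occurs
• This is an ARMA(p, q) process. The process is stationary when the roots of c(L) lie outside the unit circle
• The MA component is invertible when the roots of a(L) lie outside the unit circle
o Invertibility of the MA component means that the process can be represented as a stationary AR(∞)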
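• Consider the simplest case, the ARMA(1,1) process:
$$x_t = c_1 x_{t-1} + \varepsilon_t + a_1 \varepsilon_{t-1}$$
The ARMA(1,1) model: unconditional moments
• An ARMA(1,1) can be re-written as:
$$x_t = \frac{1 + a_1 L}{1 - c_1 L}\,\varepsilon_t = \varepsilon_t + (c_1 + a_1)\sum_{i=1}^{\infty} c_1^{i-1}\varepsilon_{t-i}$$
• Using this representation, one can see that:
$$E(x_t) = 0, \qquad \mathrm{Var}(x_t) = \sigma^2\left[1 + \frac{(c_1 + a_1)^2}{1 - c_1^2}\right] = \sigma^2\,\frac{1 + a_1^2 + 2a_1c_1}{1 - c_1^2}$$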
• The first-order autocorrelation is instead:
$$\rho(1) = \frac{(c_1 + a_1)(1 + a_1c_1)}{1 + a_1^2 + 2a_1c_1}$$
• Successive values for ρ(j) are obtained from the recurrent relation:
$$\rho(j) = c_1\,\rho(j-1) \quad \text{for } j \ge 2$$
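As a numerical cross-check for the zero-mean ARMA(1,1) xt = c1xt-1 + εt + a1εt-1, the closed-form ρ(1) and the recursion can be compared with autocorrelations computed directly from truncated MA(∞) weights, using γ(j) = σ²(b0bj + b1bj+1 + ...). The values c1 = 0.5 and a1 = 0.3 and the truncation at 200 weights are illustrative assumptions.

```python
def arma11_acf(c1, a1, max_lag):
    """rho(1..max_lag) for x_t = c1 x_{t-1} + e_t + a1 e_{t-1}, closed form."""
    rho = [(c1 + a1) * (1 + a1 * c1) / (1 + a1 ** 2 + 2 * a1 * c1)]
    for _ in range(2, max_lag + 1):
        rho.append(c1 * rho[-1])  # rho(j) = c1 * rho(j-1) for j >= 2
    return rho

def acf_from_weights(b, max_lag):
    """Autocorrelations implied by (truncated) MA weights b; sigma^2 cancels."""
    gamma0 = sum(w * w for w in b)
    return [sum(b[i] * b[i + j] for i in range(len(b) - j)) / gamma0
            for j in range(1, max_lag + 1)]

c1, a1 = 0.5, 0.3  # illustrative values
# MA(infinity) weights of the ARMA(1,1): b_0 = 1, b_i = (c1 + a1) * c1**(i-1)
weights = [1.0] + [(c1 + a1) * c1 ** (i - 1) for i in range(1, 200)]
closed = arma11_acf(c1, a1, 4)
numeric = acf_from_weights(weights, 4)
print(closed)
print(numeric)
```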
• Because ARMA processes were defined from xt = [a(L)/c(L)]εt, they clearly all possess an infinite moving average representation
• How do you estimate an ARMA model?
o In the AR case, there is no problem: just apply standard regression
analysis to linear specifications where the regressors are lags of the
dependent variable
o However, standard regression methods are no longer applicable when
MA terms appear
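For the AR case the regression is literally OLS of xt on a constant and xt-1. A minimal sketch on simulated data, assuming illustrative true values ρ0 = 0.2, ρ1 = 0.5 and σ = 1 for xt = ρ0 + ρ1xt-1 + εt (the helper names are hypothetical):

```python
import random

def simulate_ar1(rho0, rho1, sigma, T, seed=2):
    """Simulate T observations of x_t = rho0 + rho1 x_{t-1} + sigma z_t."""
    rng = random.Random(seed)
    x = [rho0 / (1 - rho1)]  # start at the stationary mean
    for _ in range(T - 1):
        x.append(rho0 + rho1 * x[-1] + sigma * rng.gauss(0.0, 1.0))
    return x

def ols_ar1(x):
    """OLS of x_t on (1, x_{t-1}); returns (rho0_hat, rho1_hat)."""
    y, z = x[1:], x[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szz = sum((v - mz) ** 2 for v in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    rho1_hat = szy / szz
    return my - rho1_hat * mz, rho1_hat

x = simulate_ar1(rho0=0.2, rho1=0.5, sigma=1.0, T=20000)
rho0_hat, rho1_hat = ols_ar1(x)
print(rho0_hat, rho1_hat)
```

Nothing comparable is available for MA terms, because the regressors εt-1 are not observed.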
ML estimation of ARMA models
• ML estimation is based on the maximization of the likelihood function, the joint density of all available data in the sample
• Standard regression methods fail for MA terms because some key conditions of classical regression analysis are violated (see the lecture notes for details)
o In a nutshell, the orthogonality condition between xt and εt+1 required by OLS in xt+1 = θ₀ + θ₁εt + εt+1 fails
• A more general method, capable of dealing with these issues, is Maximum Likelihood Estimation (MLE)
• Estimates of the parameters of interest are obtained by maximizing the likelihood function
o The likelihood function is the joint probability distribution of the data, which depends on the observations on the time series of interest and on the unknown parameters
o It is defined on the parameter space Θ, given the observed sample {xt}, t = 1, …, T, and a set of initial conditions X0
One can interpret such initial conditions as the pre-sample observations on the relevant variables
• Although the likelihood represents the joint density of the data, in MLE the log-likelihood is maximized by choosing the parameter estimates for fixed data
• Once a sample of observations is fed to the likelihood, the latter can be treated as a function of the unknown coefficients
• The MLE is then obtained by choosing the values of the unknown parameters that maximize the likelihood function
• In practice, the MLE selects the parameter values that maximize the probability of drawing the data that are actually observed
• We now provide the example of the MLE of an MA(1) process:
$$x_{t+1} = \theta_0 + \theta_1\varepsilon_t + \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim \mathrm{NID}(0, \sigma_\varepsilon^2)$$
o In this case the unknown parameters to be estimated are θ₀, θ₁, and σ²ε
• To derive MLEs, first define the time series of residuals recursively, given an initial value ε0 (commonly set to 0):
$$\varepsilon_{t+1} = x_{t+1} - \theta_0 - \theta_1\varepsilon_t$$
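The residual recursion εt = xt - θ₀ - θ₁εt-1 is the only model-specific ingredient of the MA(1) likelihood. A sketch, assuming the conditioning choice ε0 = 0 (a common convention, not necessarily the one used in the lecture notes) and a hypothetical function name:

```python
def ma1_residuals(x, theta0, theta1, eps0=0.0):
    """MA(1) residuals eps_t = x_t - theta0 - theta1 * eps_{t-1}, starting from eps0."""
    eps, prev = [], eps0
    for xt in x:
        prev = xt - theta0 - theta1 * prev
        eps.append(prev)
    return eps

# Hand check with x = [1, 2, 3], theta0 = 1, theta1 = 0.5, eps0 = 0:
# eps_1 = 1 - 1 - 0.5*0 = 0; eps_2 = 2 - 1 - 0.5*0 = 1; eps_3 = 3 - 1 - 0.5*1 = 1.5
print(ma1_residuals([1.0, 2.0, 3.0], 1.0, 0.5))
```

Because each εt depends on all earlier residuals, the residuals (and hence the likelihood) are a nonlinear function of θ₁.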
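• Under IID shocks, the likelihood is obtained as the product of the density functions of the individual observations
• The likelihood may often be maximized only numerically
• Given the distributional assumption on εt+1, we have:
$$f(x_{t+1} \mid \varepsilon_t; \theta_0, \theta_1, \sigma_\varepsilon^2) = \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}}\exp\left(-\frac{(x_{t+1} - \theta_0 - \theta_1\varepsilon_t)^2}{2\sigma_\varepsilon^2}\right)$$
o This expression is the distribution of a single observation, while the likelihood function is the joint distribution of the entire sample
• If the εt+1 are independent over time, then the likelihood function can be written as follows:
$$L(\theta_0, \theta_1, \sigma_\varepsilon^2) = \prod_{t=1}^{T} f(x_t \mid \varepsilon_{t-1}; \theta_0, \theta_1, \sigma_\varepsilon^2)$$
$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma_\varepsilon^2 - \frac{1}{2\sigma_\varepsilon^2}\sum_{t=1}^{T}\varepsilon_t^2$$
• The MLE chooses θ₀, θ₁, σ²ε to maximize the probability that the estimated model has generated the observed data
o The optimum cannot always be found analytically; iterative numerical search is the standard method, easily implemented in EViews or even Excel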
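A minimal numerical sketch of this procedure for the MA(1) xt = θ₀ + θ₁εt-1 + εt follows. It is not how EViews does it: to stay dependency-free it concentrates σ²ε out of the Gaussian log-likelihood (σ̂²ε = Σεt²/T for given θ₀, θ₁), conditions on ε0 = 0, and replaces a proper iterative optimizer with a coarse grid search. The data are simulated with illustrative values θ₀ = 0.1 and θ₁ = 0.4; all function names are hypothetical.

```python
import math
import random

def ma1_residuals(x, theta0, theta1, eps0=0.0):
    """eps_t = x_t - theta0 - theta1 * eps_{t-1}, starting from eps0."""
    eps, prev = [], eps0
    for xt in x:
        prev = xt - theta0 - theta1 * prev
        eps.append(prev)
    return eps

def ma1_loglik(x, theta0, theta1):
    """Conditional Gaussian log-likelihood with sigma^2 concentrated out."""
    eps = ma1_residuals(x, theta0, theta1)
    T = len(x)
    s2 = sum(e * e for e in eps) / T  # sigma^2 MLE for given theta0, theta1
    return -0.5 * T * (math.log(2 * math.pi) + math.log(s2) + 1.0)

def ma1_grid_mle(x):
    """Coarse grid search: theta0 near the sample mean, |theta1| <= 0.9."""
    m = sum(x) / len(x)
    best = None
    for i in range(-10, 11):
        theta0 = m + i * 0.03
        for j in range(-18, 19):
            theta1 = j * 0.05  # stays inside the invertible region
            ll = ma1_loglik(x, theta0, theta1)
            if best is None or ll > best[0]:
                best = (ll, theta0, theta1)
    return best

rng = random.Random(3)
e = [rng.gauss(0.0, 1.0) for _ in range(1001)]
x = [0.1 + 0.4 * e[t - 1] + e[t] for t in range(1, 1001)]  # illustrative true values
ll_hat, theta0_hat, theta1_hat = ma1_grid_mle(x)
print(theta0_hat, theta1_hat, ll_hat)
```

In practice the grid would be replaced by a gradient-based or quasi-Newton search started from this kind of rough estimate.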
Hints to the Box-Jenkins approach
• The Box-Jenkins approach is a structured sequence of steps aimed at best specifying and estimating models in the ARIMA class
• What is the I in ARIMA? It stands for integrated, meaning that xt contains a stochastic trend, e.g., xt+1 = xt + εt+1 with εt+1 white noise
o White noise = εt+1 is IID, with no serial correlation and constant variance
• The approach is structured in FIVE STEPS:
❶ PRE-WHITENING: make sure that the time series is stationary
• Make sure that the model at hand is ARMA and not ARIMA: for simple univariate time series this is commonly achieved via differencing, i.e., by considering Δxt = (1 - L)xt instead of xt
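Differencing d times applies (1 - L)^d to the series, shortening it by one observation per pass. A small sketch with a hypothetical helper name:

```python
def difference(x, d=1):
    """Apply (1 - L)^d to the list x; each pass drops the first observation."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

# A series with a quadratic trend needs two differences to become constant
print(difference([1, 3, 6, 10, 15], d=1))
print(difference([1, 3, 6, 10, 15], d=2))
```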
❷ MODEL SELECTION: look for the best ARMA specification
• Information criteria are a useful tool to this end
• They are model selection criteria based on penalized versions of the maximized log-likelihood function
• They are used to select p and q in an ARMA(p, q) model; Akaike's (AIC) and the Schwarz Bayesian (SBIC) criteria are the most commonly used
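The exact formulas from the slide did not survive, so the sketch below assumes the common textbook forms AIC = -2 ln L + 2k and SBIC = -2 ln L + k ln T, with k the number of estimated parameters and T the number of observations; some texts divide both by T, which leaves the ranking of models unchanged. Lower values are preferred. The log-likelihood values are made up for illustration.

```python
import math

def aic(loglik, k):
    """Akaike criterion, assumed unscaled form: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k

def sbic(loglik, k, T):
    """Schwarz Bayesian criterion, assumed unscaled form: -2 ln L + k ln T."""
    return -2.0 * loglik + k * math.log(T)

# Hypothetical maximized log-likelihoods (ln L, k) for two candidates, T = 500
T = 500
candidates = {"ARMA(1,0)": (-710.0, 3), "ARMA(1,1)": (-707.5, 4)}
for name, (ll, k) in candidates.items():
    print(name, round(aic(ll, k), 2), round(sbic(ll, k, T), 2))
```

With these made-up numbers AIC prefers ARMA(1,1), while SBIC, whose penalty grows with ln T, prefers the more parsimonious ARMA(1,0).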
❸ ESTIMATION: see above
❹ MODEL CHECKING/DIAGNOSTICS: make sure the residuals reflect the assumptions that were made, e.g., that they behave like white noise (no serial correlation, constant variance)
❺ FORECASTING: the selected and estimated model is typically simulated forward to produce forecasts for the variable of interest at one or more relevant horizons
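For a linear model the forward iteration can be done analytically. As a sketch for a zero-mean ARMA(1,1), xt = c1xt-1 + εt + a1εt-1, future shocks have zero conditional mean, so the one-step forecast is c1xT + a1εT and each further step multiplies the previous forecast by c1. The helper name and the numbers below are illustrative.

```python
def arma11_forecasts(c1, a1, x_last, eps_last, horizon):
    """Point forecasts E(x_{T+h} | info at T), h = 1..horizon, zero-mean ARMA(1,1)."""
    forecasts = [c1 * x_last + a1 * eps_last]  # h = 1 still uses the last shock
    for _ in range(2, horizon + 1):
        forecasts.append(c1 * forecasts[-1])  # later shocks have mean zero
    return forecasts

print(arma11_forecasts(c1=0.5, a1=0.3, x_last=2.0, eps_last=1.0, horizon=3))
```

When |c1| < 1 the forecasts decay geometrically toward the unconditional mean (zero here).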
• Time series of long-horizon returns (computed as sums of higher-frequency returns), besides being persistent, often feature trends
Deterministic vs. stochastic trends
• There are two types of trends, stochastic and deterministic
• A stochastic trend has a random walk as its baseline, e.g., xt+1 = a0 + xt + εt+1, which can be decomposed into a deterministic component plus a stochastic trend
• Stochastic integrated series are made stationary by differencing them
• Stochastic trends characterize random walk processes (below, with drift a0):
$$x_t = a_0 + x_{t-1} + \varepsilon_t$$
o Recursive substitution yields
$$x_t = x_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i$$
• This shows the structure: a deterministic component (a0t) plus a stochastic trend, here the sum of past shocks ε1 + ... + εt
o The series is non-stationary in that the unconditional mean, E(xt) = x0 + a0t, is a function of time
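The decomposition of a random walk with drift, xt = a0 + xt-1 + εt, into x0 + a0t plus the sum of past shocks is an exact identity, which a short simulation confirms up to floating-point rounding. Sketch, with illustrative x0 = 100, a0 = 0.05 and unit-variance Gaussian shocks (all assumptions):

```python
import random

rng = random.Random(4)
x0, a0, T = 100.0, 0.05, 250  # illustrative values
shocks = [rng.gauss(0.0, 1.0) for _ in range(T)]

# Random walk with drift: x_t = a0 + x_{t-1} + eps_t
path = [x0]
for eps in shocks:
    path.append(a0 + path[-1] + eps)

# Decomposition: x_t = x0 + a0 * t (deterministic) + running sum of shocks (stochastic trend)
stochastic_trend = 0.0
max_gap = 0.0
for t in range(1, T + 1):
    stochastic_trend += shocks[t - 1]
    max_gap = max(max_gap, abs(path[t] - (x0 + a0 * t + stochastic_trend)))
print(f"largest gap between path and decomposition: {max_gap:.2e}")
```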
• An immediate way to make a non-stationary series stationary is by differencing it:
$$\Delta x_t = x_t - x_{t-1} = a_0 + \varepsilon_t$$
• If {xt} needs to be differenced d times before becoming stationary, it is integrated of order d, I(d)
o A random walk with drift is clearly I(1)
• Assuming a0 = 0, a random walk may be re-written as:
$$x_t = x_{t-1} + \varepsilon_t = x_{t-2} + \varepsilon_{t-1} + \varepsilon_t = x_{t-3} + \varepsilon_{t-2} + \varepsilon_{t-1} + \varepsilon_t = \cdots$$
this means that all past shocks matter equally and have permanent effects in the infinitely distant future
• In this sense, I(1) processes display maximum persistence
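One way to quantify "maximum persistence" is the effect of a unit shock at time t on xt+h: in the AR(1) xt = ρ1xt-1 + εt it is ρ1^h, while in the random walk (ρ1 = 1) it stays equal to 1 at every horizon. A sketch that feeds a single unit shock through the recursion (the helper name is hypothetical):

```python
def impulse_response(rho1, horizon):
    """Response of x_{t+h}, h = 0..horizon, to a unit shock at t in x_t = rho1 x_{t-1} + eps_t."""
    x, responses = 1.0, [1.0]  # the shock itself at h = 0
    for _ in range(horizon):
        x = rho1 * x  # no further shocks
        responses.append(x)
    return responses

print(impulse_response(0.5, 4))  # stationary AR(1): the effect dies out
print(impulse_response(1.0, 4))  # random walk: the effect is permanent
```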
• A deterministic trend is a process whose value depends directly on time (t) as a variable
• The alternative is represented by deterministic trends, e.g., a linear trend:
$$z_t = a_0 + a_1 t + \varepsilon_t$$
• These processes are also called trend-stationary
• The process for zt is non-stationary, but non-stationarity is removed simply by regressing zt on the deterministic trend
• Unlike the trend-stationary case, for integrated processes the removal of a deterministic trend does not deliver a stationary time series
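The difference shows up when the same detrending regression is run on both kinds of series. The sketch below simulates, with illustrative parameters (T = 2000, a0 = 1, slope 0.05, unit-variance shocks; all assumptions), a trend-stationary series zt = a0 + a1t + εt and a random walk with the same drift, regresses each on (1, t) by OLS, and compares residual variances: close to 1 for the first, much larger for the second because its stochastic trend is still there.

```python
import random

def detrend_residuals(y):
    """OLS residuals from regressing y_t on a constant and t = 1..T."""
    T = len(y)
    ts = list(range(1, T + 1))
    mt, my = sum(ts) / T, sum(y) / T
    b = sum((t - mt) * (v - my) for t, v in zip(ts, y)) / sum((t - mt) ** 2 for t in ts)
    a = my - b * mt
    return [v - a - b * t for t, v in zip(ts, y)]

def variance(v):
    """Population variance of the list v."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

rng = random.Random(5)
T, a0, a1 = 2000, 1.0, 0.05  # illustrative values
trend_stationary = [a0 + a1 * t + rng.gauss(0.0, 1.0) for t in range(1, T + 1)]
random_walk, level = [], 0.0
for _ in range(T):
    level += a1 + rng.gauss(0.0, 1.0)  # drift a1 plus a permanent shock
    random_walk.append(level)

var_ts = variance(detrend_residuals(trend_stationary))
var_rw = variance(detrend_residuals(random_walk))
print(f"trend-stationary: {var_ts:.2f}, random walk: {var_rw:.2f}")
```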
Reading List/How to prepare the exam
• Carefully read these Lecture Slides + class notes
• Possibly read BROOKS, chapter 6
• You may want to take a look at CHRISTOFFERSEN, chapter 3
• Lecture Notes are available on Prof. Guidolin's personal web page
