Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes
Prof. Massimo Guidolin
20192 – Financial Econometrics, Spring 2015

Overview
Motivation: what is time series analysis
The ARMA class
In-depth analysis of the AR(1) process
Stationarity
ARMA processes as combinations of white noise processes
Wold decomposition theorem
In-depth analysis of the ARMA(1,1) process
Estimation of ARMA models
Maximum Likelihood Estimation vs. OLS
Deterministic vs. Stochastic Trends

Motivation: time series analysis
Time series analysis exploits the properties of past data to predict their future (density or moments)
A time series is a sequence {x₁, x₂, ..., xT}, or {xt}, t = 1, ..., T, where t is an index denoting the period in time in which x occurs
Can we do better than using a constant as the only predictor of financial returns?
o Selecting time-varying predictors requires using the properties of observed data to predict future observations
Time series analysis is the branch of econometrics that deals with this question
We shall consider both univariate and multivariate time series models
o Univariate models = the relevant information set used to predict one variable is restricted to the past history of that variable
o Returns on financial assets observed over a given sample constitute the typical time series of interest here

Univariate time series processes
Under a Gaussian IID (CER) model there is no predictability, either in the mean or in the variance
xt is a random variable, so a time series is a sequence of random variables ordered in time; such a sequence is also known as a stochastic process
The probability structure of a sequence of random variables is determined by the joint distribution of the stochastic process
The familiar (Gaussian) IID/CER model is the simplest case of a probability model for such a joint distribution:
   zt+1 ∼ N(0, 1)      xt+1 ≡ Rt+1 = µ + σzt+1
o It implies that xt+1 is normally IID over time, with constant variance and mean equal to µ
o CER = Constant Expected Return: Rt+1 is the sum of a constant and a white noise process, σzt+1
o Under the CER model forecasting is not interesting, as the best forecasts of the moments are simply the unconditional moments

The autoregressive moving average (ARMA) class
ARMA models are linear combinations of white noise processes
Because we suspect that financial data may contain some predictability, we construct models that are more realistic than the Gaussian IID one
In univariate applications the basic idea is to use combinations of white noise processes, εt, to generate more flexible models capable of replicating the relevant features of the data
In particular, autoregressive moving average (ARMA) models are built by taking linear combinations of white noise processes

The autoregressive moving average (ARMA) class
[Figure: time series plots labelled "data", "ρ = 0.99", and "ρ = 0"]

In-depth analysis of the AR(1) model: properties
Consider a Gaussian AR(1) model with drift (= intercept term):
   xt = ρ₀ + ρ₁xt-1 + εt
o εt ∼ n.i.d.(0, σ²), which is equivalent to εt ∼ i.i.d. N(0, σ²), also written NID(0, σ²)
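Not part of the original slides: a minimal Python sketch of the AR(1) process just defined, under assumed parameter values (ρ₀ = 0.1, σ = 1), simulating paths for ρ₁ = 0, 0.5 and 0.99 to show how the persistence of the series changes with ρ₁ (echoing the "ρ = 0.99" vs. "ρ = 0" comparison in the figure above).

```python
import numpy as np

def simulate_ar1(rho0, rho1, sigma, T, x0=0.0, seed=0):
    """Simulate x_t = rho0 + rho1 * x_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, T)
    x = np.empty(T)
    prev = x0
    for t in range(T):
        prev = rho0 + rho1 * prev + eps[t]
        x[t] = prev
    return x

# Illustrative (assumed) parameter values
T = 500
for rho1 in (0.0, 0.5, 0.99):
    x = simulate_ar1(rho0=0.1, rho1=rho1, sigma=1.0, T=T)
    ac1 = np.corrcoef(x[1:], x[:-1])[0, 1]   # first-order sample autocorrelation
    print(f"rho1 = {rho1:4.2f}   sample mean = {x.mean():7.2f}   sample AC(1) = {ac1:5.2f}")
```

With ρ₁ close to 1 the simulated path wanders away from its mean for long stretches and the sample autocorrelation is close to 1; with ρ₁ = 0 the series is just white noise around a constant.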
o Given that each realization of the process is a random variable, the first relevant object is the density of each observation
o We distinguish between conditional and unconditional densities
The unconditional density is obtained under the hypothesis that no observation on the time series is available
o We derive the unconditional density by ideally placing the observer at time zero, i.e., before any realization is observed
o At that moment the information set contains only the knowledge of the process generating the observations
As observations become available, we compute conditional densities by combining the information on the process with the observed data

The AR(1) model: conditional moments
o The moments of the density of xt conditional upon xt-1 are obtained from xt = ρ₀ + ρ₁xt-1 + εt:
   E[xt | xt-1] = ρ₀ + ρ₁xt-1        Var[xt | xt-1] = σ²
o This can be generalized to the moments of xt conditional upon xt-2:
   E[xt | xt-2] = ρ₀(1 + ρ₁) + ρ₁²xt-2        Var[xt | xt-2] = σ²(1 + ρ₁²)

The AR(1) model: unconditional moments
At this point, to get to the unconditional moments it is sufficient to substitute recursively from xt = ρ₀ + ρ₁xt-1 + εt so as to express xt as a function of the information available at time 0, the moment before we start observing the process:
   xt = ρ₀(1 + ρ₁ + ρ₁² + ... + ρ₁^(t-1)) + ρ₁^t x0 + Σ(i=0,...,t-1) ρ₁^i εt-i
   E[xt | x0] = ρ₀(1 - ρ₁^t)/(1 - ρ₁) + ρ₁^t x0
   Var[xt | x0] = σ²(1 - ρ₁^(2t))/(1 - ρ₁²)
   Auto-covariance fnct.: Cov(xt, xt-j | x0) = ρ₁^j Var(xt-j | x0)
   Auto-correlation fnct.: Corr(xt, xt-j | x0) = ρ₁^j [Var(xt-j | x0)/Var(xt | x0)]^(1/2)
However, all these unconditional moments depend on t or on j
At this point another important statistical concept comes in: STATIONARITY

The AR(1) model: stationarity
A stochastic process is strictly stationary if its distribution does not change over time, i.e., if, for any choice of j₁, j₂, ..., jn, the joint density f(xt, xt+j₁, xt+j₂, ..., xt+jn) does not depend on t
A stochastic process is covariance stationary if its first two unconditional moments exist and do not depend on time, i.e., if the following relations are satisfied for each h and i:
   E[xt] = E[xt-h] = µ
   Var[xt] = Var[xt-h] = σx² < ∞
   Cov(xt, xt-i) = Cov(xt-h, xt-h-i) = γ(i)
In the case of an AR(1) process, the condition for covariance stationarity is |ρ₁| < 1
This occurs because, when |ρ₁| < 1, limt→∞ ρ₁^t = limt→∞ ρ₁^(2t) = 0, so the dependence on t of the moments above vanishes as t grows

The AR(1) model: stationarity
When such a covariance stationarity condition is satisfied, we have:
   E[xt] = ρ₀/(1 - ρ₁)
   Var[xt] = σ²/(1 - ρ₁²)
   γ(j) = ρ₁^j σ²/(1 - ρ₁²)        ρ(j) = ρ₁^j
When ρ₁ = 1 the process is non-stationary and both the expectation and the variance become explosive:
   E[xt | x0] = x0 + ρ₀t        Var[xt | x0] = tσ²
Explosive means that these unconditional moments diverge as t → ∞

The AR(1) model: stationarity
Under stationarity one can predict the future using information from the past: strict stationarity enables density forecasts, while covariance stationarity enables forecasts of the first two moments
Stationarity is important for forecasting because under stationarity one can legitimately learn from the past to predict the future
o If stationarity is not satisfied, the density of the observations estimated from past data will not be helpful to predict future observations
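As a quick check of the stationary unconditional moments derived above, the following sketch (illustrative parameter values, not from the slides) simulates a long AR(1) path with |ρ₁| < 1 and compares sample moments with ρ₀/(1 - ρ₁), σ²/(1 - ρ₁²), and ρ₁^j.

```python
import numpy as np

# Illustrative (assumed) values with |rho1| < 1, so the process is covariance stationary
rho0, rho1, sigma, T = 0.1, 0.8, 1.0, 200_000
rng = np.random.default_rng(1)
eps = rng.normal(0.0, sigma, T)

x = np.empty(T)
x[0] = rho0 / (1 - rho1)                  # start at the unconditional mean
for t in range(1, T):
    x[t] = rho0 + rho1 * x[t - 1] + eps[t]

mean_theory = rho0 / (1 - rho1)           # E[x] = rho0 / (1 - rho1)
var_theory = sigma**2 / (1 - rho1**2)     # Var[x] = sigma^2 / (1 - rho1^2)
print(f"mean: sample = {x.mean():.3f}   theory = {mean_theory:.3f}")
print(f"var : sample = {x.var():.3f}    theory = {var_theory:.3f}")
for j in (1, 2, 3):
    acf_j = np.corrcoef(x[j:], x[:-j])[0, 1]
    print(f"rho({j}): sample = {acf_j:.3f}   theory = {rho1**j:.3f}")
```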
We have claimed that ARMA models just represent clever combinations of white noise processes
o Such combinations should allow us to fit the data well
o This is guaranteed by an important statistical result, the Wold decomposition theorem

ARMA processes: Wold decomposition theorem
Wold's theorem states that any stationary stochastic process can be expressed as the sum of a deterministic component and a stochastic moving-average component
Using Wold's theorem, a time series can be written as a polynomial distributed lag of white noise processes:
   xt = µ + b(L)εt = µ + εt + b₁εt-1 + b₂εt-2 + ...
o L is simply the lag operator: Lεt = εt-1, L²εt = εt-2, ..., L^n εt = εt-n
o b(L) = 1 + b₁L + b₂L² + ... is a lag polynomial
Problem: even though Wold's theorem is a mathematical fact, describing many time series successfully requires a very high order of the polynomial b(L)
o This feature can be problematic for estimation, given the usual limitations on sample sizes

ARMA processes: Wold decomposition theorem
An ARMA(p, q) is stationary when the roots of its AR polynomial all lie outside the unit circle, and its MA component is invertible when the roots of the MA polynomial all lie outside the unit circle
The potential problem of a high-order b(L) is resolved if the polynomial b(L) can be represented as the ratio of two polynomials of lower order:
   xt = µ + [a(L)/c(L)]εt,   i.e.,   c(L)xt = c(L)µ + a(L)εt
o Under simple technical conditions this is often possible
This is an ARMA(p, q) process, where p is the order of c(L) and q is the order of a(L); the process is stationary when the roots of c(L) lie outside the unit circle
The MA component is invertible when the roots of a(L) lie outside the unit circle
o Invertibility of the MA component means that it can be represented as a stationary AR(∞) process
Consider the simplest case, the ARMA(1,1) process:
   xt = c₁xt-1 + εt + a₁εt-1

The ARMA(1,1) model: unconditional moments
An ARMA(1,1) can be re-written in terms of current and past white noise only, i.e., in its MA(∞) representation:
   xt = [(1 + a₁L)/(1 - c₁L)]εt = εt + (c₁ + a₁)εt-1 + c₁(c₁ + a₁)εt-2 + c₁²(c₁ + a₁)εt-3 + ...
Using this representation, one can see that:
   E[xt] = 0        Var[xt] = γ(0) = σ²(1 + a₁² + 2a₁c₁)/(1 - c₁²)

The ARMA(1,1) model: unconditional moments
While AR(p) models may be estimated by OLS, MA and ARMA models cannot: estimation by maximum likelihood (ML) is called for
The first-order autocorrelation is instead:
   ρ(1) = (1 + a₁c₁)(a₁ + c₁)/(1 + a₁² + 2a₁c₁)
Successive values of ρ(j) are obtained from the recurrence ρ(j) = c₁ · ρ(j-1) for j ≥ 2
Because ARMA processes were defined as combinations of white noise, they clearly all possess an infinite moving average representation
How do you estimate an ARMA model?
o In the AR case there is no problem: just apply standard regression analysis to a linear specification in which the regressors are lags of the dependent variable
o However, standard regression methods are no longer applicable when MA terms appear

ML estimation of ARMA models
ML estimation is based on the maximization of the likelihood function, the joint density of all the data available in the sample
The need for ML derives from the fact that, with MA terms, some key conditions of classical regression analysis are violated (see the lecture notes for details)
o In a nutshell, the orthogonality condition between xt and εt+1 that OLS requires in xt+1 = θ₀ + θ₁εt + εt+1 fails
A more general method, capable of dealing with these issues, is Maximum Likelihood Estimation (MLE)
Estimates of the parameters of interest are obtained by maximizing the likelihood function
o The likelihood function is the joint probability distribution of the data, which depends on the observed time series of interest and on the unknown parameters
o It is defined on the parameter space Θ, given the observed sample Xt, t = 1, ..., T, and a set of initial conditions X0
o One can interpret such initial conditions as the pre-sample observations on the relevant variables

ML estimation of ARMA models
Although the likelihood represents the joint density of the data, in MLE the log-likelihood is maximized by choosing the parameter estimates while keeping the data fixed
Once a sample of observations is fed into the likelihood, the latter can be treated as a function of the unknown coefficients
The MLE is then obtained by choosing the values of the unknown parameters that maximize the likelihood function
In practice, the MLE selects the parameter values that maximize the probability of drawing the data that have effectively been observed
We now provide the example of the MLE of an MA(1) process:
   xt+1 = θ₀ + θ₁εt + εt+1
o In this case the unknown parameters to be estimated are θ₀, θ₁, and σε²
To derive the MLEs, first define the time series of residuals, computed recursively given a starting value (e.g., ε₀ = 0):
   εt+1 = xt+1 - θ₀ - θ₁εt

ML estimation of ARMA models
Under IID shocks the likelihood is obtained as the product of the density functions of the individual observations; it may often be maximized only numerically
Given the distributional assumption on εt+1, we have:
   f(εt+1) = (2πσε²)^(-1/2) exp(-εt+1²/(2σε²))
o This expression is the density of a single observation, while the likelihood function is the joint distribution of the entire sample
If the εt+1 are independent over time, then the likelihood function can be written as:
   L(θ₀, θ₁, σε²) = Π(t=1,...,T) (2πσε²)^(-1/2) exp(-εt²/(2σε²))
The MLE chooses θ₀, θ₁, and σε² to maximize the probability that the estimated model has generated the observed data
o The optimum cannot always be found analytically; iterative numerical search is the standard method, easily implemented in EViews or even Excel
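The MA(1) likelihood above can be maximized numerically in a few lines. The sketch below is an illustration under assumed true parameters, using the conditional likelihood with ε₀ = 0 and scipy's Nelder-Mead optimizer; it shows the iterative-search idea mentioned in the slides, not the exact routine used by EViews or Excel.

```python
import numpy as np
from scipy.optimize import minimize

# Simulate an MA(1): x_{t+1} = theta0 + theta1 * eps_t + eps_{t+1} (assumed true values)
rng = np.random.default_rng(42)
theta0_true, theta1_true, sigma_true, T = 0.2, 0.5, 1.0, 2000
eps = rng.normal(0.0, sigma_true, T + 1)
x = theta0_true + theta1_true * eps[:-1] + eps[1:]

def neg_log_likelihood(params, data):
    """Gaussian conditional log-likelihood of an MA(1), with eps_0 set to 0."""
    theta0, theta1, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)            # reparameterize so that sigma > 0
    eps_hat = np.empty_like(data)
    prev_eps = 0.0
    for t in range(len(data)):
        prev_eps = data[t] - theta0 - theta1 * prev_eps   # recursive residuals
        eps_hat[t] = prev_eps
    ll = -0.5 * len(data) * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(eps_hat**2) / sigma2
    return -ll

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0, 0.0]),
               args=(x,), method="Nelder-Mead")
theta0_hat, theta1_hat, log_sigma_hat = res.x
print("theta0:", theta0_hat, " theta1:", theta1_hat, " sigma:", np.exp(log_sigma_hat))
```

Reparameterizing σ through its logarithm keeps the variance positive during the search; in practice one would also verify convergence and compute standard errors, e.g., from the Hessian of the log-likelihood.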
Hints at the Box-Jenkins approach
The Box-Jenkins approach is a structured sequence of steps aimed at specifying and estimating, as well as possible, models in the ARIMA class
The key steps are: pre-whitening; model selection (pick p and q); estimation (often by MLE); model checking/diagnostics; forecasting/use in decisions
What is the I in ARIMA? It stands for "integrated", meaning that xt contains a stochastic trend, i.e., xt+1 = xt + εt+1 with εt+1 white noise
o White noise = εt+1 is IID, with no serial correlation and constant variance
The approach is structured in FIVE STEPS:
❶ PRE-WHITENING: make sure that the time series is stationary
   Make sure that the model at hand is ARMA and not ARIMA: for simple univariate time series this is commonly achieved via differencing, i.e., by considering Δxt = (1 - L)xt instead of xt
❷ MODEL SELECTION: look for the best ARMA specification
   Information criteria are a useful tool to this end: they are model selection criteria based on penalized versions of the maximized log-likelihood function
   They are used to select p and q in an ARMA(p, q) model; Akaike's criterion (AIC) and the Schwarz Bayesian criterion (SBIC) are the most commonly used:
      AIC = ln(σ̂²) + 2k/T        SBIC = ln(σ̂²) + (k/T)·ln(T)
   where σ̂² is the estimated residual variance, k the number of estimated parameters, and T the number of observations (n = T)
❸ ESTIMATION: typically by MLE, as described above
❹ MODEL CHECKING/DIAGNOSTICS: make sure the residuals satisfy the assumptions that were made, e.g., that they display no remaining serial correlation
❺ FORECASTING: the selected and estimated model is typically simulated forward to produce forecasts of the variable of interest at one or more relevant horizons
Time series of long-horizon returns (computed as sums of higher-frequency returns), besides being persistent, often feature trends

Deterministic vs. stochastic trends
There are two basic kinds of trends: stochastic and deterministic
In the stochastic case, the baseline process is a random walk (below, with drift), which can be decomposed into a deterministic component plus a stochastic trend:
   xt+1 = a₀ + xt + εt+1
Stochastic (integrated) series are made stationary by differencing them
o Recursive substitution yields
   xt = x0 + a₀t + Σ(i=1,...,t) εi
   This shows the structure: deterministic component (x0 + a₀t) plus stochastic trend, here Σ(i=1,...,t) εi
o The series is non-stationary in that the unconditional mean, E[xt] = x0 + a₀t, is a function of time

Deterministic vs. stochastic trends
A time series that needs to be differenced d times before becoming stationary is said to be integrated of order d, written I(d)
o A random walk with drift is clearly I(1)
An immediate way to make a non-stationary integrated series stationary is to difference it:
   Δxt = xt - xt-1 = a₀ + εt
Assuming a₀ = 0, a random walk may be re-written as:
   xt = xt-1 + εt = xt-2 + εt-1 + εt = xt-3 + εt-2 + εt-1 + εt = ... = x0 + Σ(i=1,...,t) εi
o This means that all past shocks matter equally and have permanent effects into the infinitely distant future
In this sense, I(1) processes display maximum persistence
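A small simulation sketch (assumed drift a₀ = 0.05 and σ = 1, not from the slides) of the random walk with drift discussed above: across many simulated paths the variance of xt grows roughly linearly in t, while the first difference Δxt = a₀ + εt behaves like a stationary series.

```python
import numpy as np

a0, sigma, T, n_paths = 0.05, 1.0, 1000, 500   # illustrative (assumed) values
rng = np.random.default_rng(7)
eps = rng.normal(0.0, sigma, (n_paths, T))

# x_t = x_0 + a0*t + sum of past shocks: deterministic component + stochastic trend
x = np.cumsum(a0 + eps, axis=1)

# Across simulated paths, Var(x_t) grows roughly linearly in t: the level is non-stationary
print("Var(x_t) at t = 100 :", round(x[:, 99].var(), 1), "  (theory: t*sigma^2 =", 100 * sigma**2, ")")
print("Var(x_t) at t = 1000:", round(x[:, 999].var(), 1), "  (theory: t*sigma^2 =", 1000 * sigma**2, ")")

# First differences Delta x_t = a0 + eps_t are stationary: constant mean and variance
dx = np.diff(x, axis=1)
print("first differences: mean =", round(dx.mean(), 3), "  var =", round(dx.var(), 3))
```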
Deterministic vs. stochastic trends
The alternative is represented by deterministic trends: a deterministic trend is a process whose value depends directly on time (t) as a variable, e.g.,
   zt = a₀ + a₁t + εt
These processes are also called trend-stationary
The process for zt is non-stationary, but this type of non-stationarity is removed simply by regressing zt on the deterministic trend
Unlike in the trend-stationary case, for integrated processes the removal of a deterministic trend does not deliver a stationary time series (see the closing sketch at the end of these notes)

Reading List / How to prepare for the exam
Carefully read these lecture slides + class notes
Possibly read BROOKS, chapter 6
You may want to take a look at CHRISTOFFERSEN, chapter 3
Lecture Notes are available on Prof. Guidolin's personal web page
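To close, a short simulation sketch of the point made in the last slide before the reading list (illustrative parameters, not from the original deck): regressing out a linear time trend leaves a trend-stationary series with little remaining persistence, but leaves a random walk highly persistent; for the random walk it is differencing that delivers stationarity.

```python
import numpy as np

T, sigma = 2000, 1.0
rng = np.random.default_rng(3)
t = np.arange(T)

# Trend-stationary series: z_t = a0 + a1*t + eps_t
z = 0.5 + 0.02 * t + rng.normal(0.0, sigma, T)
# Integrated series (random walk with drift): x_t = x_{t-1} + a0 + eps_t
x = np.cumsum(0.02 + rng.normal(0.0, sigma, T))

def detrend(y, t):
    """Residuals of an OLS regression of y on a constant and a linear time trend."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (intercept + slope * t)

def ac1(y):
    """First-order sample autocorrelation."""
    return np.corrcoef(y[1:], y[:-1])[0, 1]

# Detrending works for the trend-stationary series (little persistence left over)
# but not for the integrated one (the detrended residuals remain extremely persistent)
print("AC(1) of detrended z   :", round(ac1(detrend(z, t)), 3))
print("AC(1) of detrended x   :", round(ac1(detrend(x, t)), 3))
# Differencing, instead, makes the random walk stationary
print("AC(1) of differenced x :", round(ac1(np.diff(x)), 3))
```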