White Noise Processes (Section 6.2)

Recall that covariance stationary processes are time series, yt, such that:
1. E(yt) = μ for all t
2. Var(yt) = σ² for all t, σ² < ∞
3. Cov(yt, yt-τ) = γ(τ) for all t and τ

An example of a covariance stationary process is an i.i.d. sequence. The "identical distribution" property means that the series has a constant mean and a constant variance. The "independence" property means that γ(1) = γ(2) = … = 0. [γ(0) = σ².]

Another example of a covariance stationary process is a white noise process. A time series yt is a white noise process if:
E(yt) = 0 for all t
Var(yt) = σ² for all t, σ² < ∞
Cov(yt, ys) = 0 if t ≠ s

That is, a white noise process is a serially uncorrelated, zero-mean, constant and finite variance process. In this case we often write yt ~ WN(0, σ²).

If yt ~ WN(0, σ²), then
γ(τ) = σ² if τ = 0, and γ(τ) = 0 if τ ≠ 0
ρ(τ) = 1 if τ = 0, and ρ(τ) = 0 if τ ≠ 0

Note: Technically, an i.i.d. process (with a finite variance) is a white noise process, but a white noise process is not necessarily an i.i.d. process (since the y's are not necessarily identically distributed or independent). However, for most of our purposes we will use these two interchangeably.

Note: In regression models we often assume that the regression errors are zero-mean, homoskedastic, and serially uncorrelated random variables, i.e., they are white noise errors.

In our study of the trend and seasonal components, we assumed that the cyclical component of the series was a white noise process and, therefore, was unpredictable. That was a convenient assumption at the time. We now want to allow the cyclical component to be a serially correlated process, since our graphs of the deviations of our seasonally adjusted series from their estimated trends indicate that these deviations do not behave like white noise. However, the white noise process will still be very important to us: it forms the basic building block for the construction of more complicated time series. More on this later.

Estimating the Autocorrelation Function (Section 6.5)

Soon we will specify the class of models that we will use for covariance stationary processes. These models (ARMA and ARIMA models) are built up from the white noise process. We will use the estimated autocorrelation and partial autocorrelation functions of the series to help us select the particular model that we will estimate to help us forecast the series.

How do we estimate the autocorrelation function? The principle we use is referred to in your textbook as the analog principle. The analog principle, which turns out to have a sound basis in statistical theory, is to estimate population moments by the analogous sample moments, i.e., to replace expected values with the analogous sample averages.

For instance, since the yt's are assumed to be drawn from distributions with the same mean, μ, the analog principle directs us to use the sample mean to estimate the population mean:

$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} y_t$

Similarly, to estimate σ² = Var(yt) = E[(yt − μ)²], the analog principle directs us to replace the expected value with the sample average, i.e.,

$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{\mu})^2$

The autocorrelation function at displacement τ is

$\rho(\tau) = \frac{E[(y_t - \mu)(y_{t-\tau} - \mu)]}{E[(y_t - \mu)^2]}$

The analog principle directs us to estimate ρ(τ) by using its sample analog:

$\hat{\rho}(\tau) = \frac{\frac{1}{T}\sum_{t=\tau+1}^{T} (y_t - \hat{\mu})(y_{t-\tau} - \hat{\mu})}{\frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{\mu})^2}$

or, since the 1/T's cancel,

$\hat{\rho}(\tau) = \frac{\sum_{t=\tau+1}^{T} (y_t - \hat{\mu})(y_{t-\tau} - \hat{\mu})}{\sum_{t=1}^{T} (y_t - \hat{\mu})^2}$

ρ̂(τ), τ = 0, 1, 2, …, is called the sample autocorrelation function or the correlogram of the series.
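To make the formulas concrete, here is a minimal sketch of how the sample autocorrelation function could be computed, assuming Python with NumPy; the function name sample_acf and the simulated data are illustrative and not part of the notes.

import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation rho-hat(tau) for tau = 0, 1, ..., max_lag.

    The numerator sums T - tau cross products starting at t = tau + 1; both
    numerator and denominator are divided by T, so the 1/T's cancel and are
    omitted, exactly as in the formulas above."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu_hat = y.mean()                      # analog of the population mean
    denom = np.sum((y - mu_hat) ** 2)      # same denominator for every tau
    acf = []
    for tau in range(max_lag + 1):
        num = np.sum((y[tau:] - mu_hat) * (y[:T - tau] - mu_hat))
        acf.append(num / denom)
    return np.array(acf)

# For simulated white noise, rho-hat(0) = 1 and the remaining values should be near zero.
rng = np.random.default_rng(0)
print(sample_acf(rng.normal(size=200), max_lag=5))

Dividing every numerator by the same full-sample sum of squared deviations mirrors the convention, discussed in the notes below, of dividing by T rather than by T − τ.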
Notes –

1. The autocorrelation function and the sample autocorrelation function will always be equal to one for τ = 0. For τ ≠ 0, the absolute values of the autocorrelations will be less than one. (The first statement is obvious from the formulas; the second is true, but not obvious.)

2. The summation in the numerator of the sample autocorrelation function begins with t = τ + 1 (rather than t = 1). {Why? Consider, e.g., the sample autocorrelation at displacement 1. If the sum started at t = 1, what would we use for y0, since our sample begins at y1?}

3. The summation in the numerator of the τ-th sample autocorrelation coefficient is the sum of T − τ terms, but we divide the sum by T to compute the "average." This is partly justified by statistical theory and partly a matter of convenience. For large values of T − τ, whether we divide by T or by T − τ will have no practical effect.

The Sample Autocorrelation Function for a White Noise Process –

Suppose that yt is a white noise process (i.e., yt is a zero-mean, constant variance, and serially uncorrelated process). We know that the population autocorrelation function, ρ(τ), will be zero for all nonzero τ. What will the sample autocorrelation function look like? For large samples,

$\hat{\rho}(\tau) \sim N(0, 1/T)$

or, equivalently,

$\sqrt{T}\,\hat{\rho}(\tau) \sim N(0, 1)$

This result means that if yt is a white noise process, then for 95% of the realizations of this time series ρ̂(τ) should lie in the interval $[-2/\sqrt{T}, 2/\sqrt{T}]$ for any given τ. That is, for a white noise process, 95% of the time ρ̂(τ) will lie within the two-standard-error band around 0, $[-2/\sqrt{T}, 2/\sqrt{T}]$. {The "2" comes in because it is approximately the 97.5th percentile of the N(0,1) distribution; $1/\sqrt{T}$ comes in because it is the (approximate) standard deviation of ρ̂(τ).}

This result allows us to check whether a particular displacement has a statistically significant sample autocorrelation. For example, if $|\hat{\rho}(1)| > 2/\sqrt{T}$, then we would likely conclude that the evidence of first-order autocorrelation appears to be too strong for the series to be a white noise series. However, it is not reasonable to say that you will reject the white noise hypothesis if any of the ρ̂'s falls outside of the two-standard-error band around zero. Why not? Because even if the series is a white noise series, we expect some of the sample autocorrelations to fall outside that band – the band was constructed so that most of the sample autocorrelations would typically fall within the band if the time series is a realization of a white noise process.

A better way to conduct a general test of the null hypothesis of a zero autocorrelation function (i.e., white noise) against the alternative of a nonzero autocorrelation function is to conduct a Q-test. Under the null hypothesis that yt is a white noise process, the Box-Pierce Q-statistic

$Q_{BP} = T \sum_{\tau=1}^{m} \hat{\rho}^2(\tau) \sim \chi^2(m)$

for large T. So, we reject the joint hypothesis H0: ρ(1) = 0, …, ρ(m) = 0 against the alternative that at least one of ρ(1), …, ρ(m) is nonzero at the 5% (10%, 1%) test size if QBP is greater than the 95th percentile (90th percentile, 99th percentile) of the χ²(m) distribution, as sketched below.
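As a rough illustration of the Q-test just described (a sketch, not the textbook's or EViews' implementation), the following computes the Box-Pierce statistic and compares it with the relevant χ²(m) percentile. Python with NumPy and SciPy is assumed; the function name box_pierce, the simulated series, and the choice m = 10 are illustrative.

import numpy as np
from scipy.stats import chi2

def box_pierce(y, m, alpha=0.05):
    """Q_BP = T * sum over tau = 1..m of rho-hat(tau)^2, with its chi2(m) critical value."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu_hat = y.mean()
    denom = np.sum((y - mu_hat) ** 2)
    rho_hat = np.array([np.sum((y[tau:] - mu_hat) * (y[:T - tau] - mu_hat)) / denom
                        for tau in range(1, m + 1)])
    q_bp = T * np.sum(rho_hat ** 2)
    crit = chi2.ppf(1 - alpha, df=m)       # e.g. the 95th percentile of chi2(m) for a 5% test
    return q_bp, crit

# Illustration: for simulated white noise the test should only rarely reject.
rng = np.random.default_rng(1)
y = rng.normal(size=400)
q_bp, crit = box_pierce(y, m=10)
print(f"Q_BP = {q_bp:.2f}, chi2(10) 95th percentile = {crit:.2f}, reject H0: {q_bp > crit}")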
How do we choose m? Suppose, for example, we set m = 10 and it turns out that there is autocorrelation in the series, but it is primarily due to autocorrelation at displacements larger than 10. Then we are likely to incorrectly conclude that our series is the realization of a white noise process, since the test never examines the displacements where the autocorrelation appears. On the other hand, if, for example, we set m = 50 and there is autocorrelation in the series but only because of autocorrelation at displacements 1, 2, and 3, we are again likely to incorrectly conclude that our series is a white noise process, since the many near-zero sample autocorrelations at the higher displacements dilute the contribution of the few large ones. So, selecting m is a balance of two competing concerns. Practice has suggested that a reasonable rule of thumb is to select m = T^(1/2).

The Ljung-Box Q-Statistic

The Box-Pierce Q-statistic has a χ²(m) distribution provided that the sample is sufficiently large. It turns out that this approximation does not work very well for "moderate" sample sizes. The Ljung-Box Q-statistic makes an adjustment to the B-P statistic to make it work better in finite-sample settings without affecting its performance in large samples:

$Q_{LB} = T(T+2) \sum_{\tau=1}^{m} \frac{\hat{\rho}^2(\tau)}{T - \tau} \sim \chi^2(m)$

The Q-statistic reported in EViews is the Ljung-Box statistic.

Estimating the Partial Autocorrelation Function

The partial autocorrelation function p(1), p(2), … is estimated through a sequence of "autoregressions":

p(1): Regress yt on 1, yt-1 to obtain $\hat{\beta}_0, \hat{\beta}_1$; then $\hat{p}(1) = \hat{\beta}_1$.

p(2): Regress yt on 1, yt-1, yt-2 to obtain $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2$; then $\hat{p}(2) = \hat{\beta}_2$.

and so on, as sketched below.
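The following is a minimal sketch of this sequence of autoregressions, again assuming Python with NumPy; the name sample_pacf is illustrative. Each regression is run by ordinary least squares via numpy.linalg.lstsq, and the estimated coefficient on the longest lag is reported as the partial autocorrelation at that displacement.

import numpy as np

def sample_pacf(y, max_lag):
    """p-hat(k): coefficient on y_{t-k} from regressing y_t on a constant and
    y_{t-1}, ..., y_{t-k}, for k = 1, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    pacf = []
    for k in range(1, max_lag + 1):
        # Regressor matrix [1, y_{t-1}, ..., y_{t-k}] for t = k+1, ..., T.
        X = np.column_stack([np.ones(T - k)] +
                            [y[k - j:T - j] for j in range(1, k + 1)])
        beta_hat = np.linalg.lstsq(X, y[k:], rcond=None)[0]
        pacf.append(beta_hat[-1])          # coefficient on the longest lag, y_{t-k}
    return np.array(pacf)

# For simulated white noise, all partial autocorrelations should be near zero.
rng = np.random.default_rng(2)
print(sample_pacf(rng.normal(size=300), max_lag=4))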