Economics 300 – Econometrics: Time Series

Dennis C. Plott
University of Illinois at Chicago
Department of Economics
www.dennisplott.com
Fall 2014
OLS (Gauss-Markov) Assumptions

• One of the assumptions on the error term (u) in a regression analysis is that the errors must be independent and identically distributed (i.i.d.):
  – The error variance is the same for all values; i.e., homoskedasticity.
    · violation consequences: biased standard errors (invalid inference); i.e., t- and F-statistics are useless
    · solutions: a simple and reliable fix – use robust standard errors or cluster the standard errors, then inference is valid
  – The error is not related to other error values; i.e., no serial correlation.
    · covered in this section
    · similar consequences as heteroskedasticity
    · can use a similar solution, but is typically dealt with by incorporating the correlation into the model
  – The error is normally distributed (optional).
Serial Correlation – Introduction

• If a systematic correlation exists between one observation of the error term and another, then it is more difficult for OLS to obtain accurate estimates of the standard errors of the coefficients.
• This assumption is most likely to be violated in time-series models:
  – An increase in the error term in one time period (a random shock, for example) is likely to be followed by an increase in the next period.
  – Example: Hurricane Katrina
• If, over all the observations of the sample, u_{t+1} (i.e., next period's error) is correlated with u_t (i.e., this period's error), then the error term is said to be serially correlated (or autocorrelated), and the OLS (Gauss-Markov) assumptions are violated.
(Pure) Serial Correlation

• Pure serial correlation occurs when the OLS (Gauss-Markov) assumption of uncorrelated observations of the error term is violated in a correctly specified equation.
• The most commonly assumed kind of serial correlation is first-order serial correlation, in which the current value of the error term is a function of the previous value of the error term:

      u_t = ρ u_{t−1} + v_t

  where
  – u = the error term of the equation in question
  – ρ = the first-order autocorrelation coefficient
  – v = an i.i.d. (not serially correlated) error term; i.e., the OLS (Gauss-Markov) assumptions are not violated
• Impure serial correlation is serial correlation caused by a specification error such as:
  – omitted variable(s) and/or
  – incorrect functional form
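The AR(1) error process above is easy to simulate. A minimal sketch in Python (the value ρ = 0.8 and the sample size are illustrative choices, not from the slides):

```python
import random

def simulate_ar1_errors(rho, T, seed=0):
    """Simulate u_t = rho * u_{t-1} + v_t, where v_t is i.i.d. N(0, 1) white noise."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1)]
    for _ in range(T - 1):
        u.append(rho * u[-1] + rng.gauss(0, 1))
    return u

u = simulate_ar1_errors(rho=0.8, T=500)

# Sample lag-1 autocorrelation: with rho = 0.8 it should be strongly positive.
lag1 = sum(a * b for a, b in zip(u[:-1], u[1:])) / sum(a * a for a in u)
```

With ρ near one the simulated errors wander in long runs of the same sign, which is exactly the positive-serial-correlation pattern discussed on the next slide.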
(Pure) Serial Correlation (Continued)

• The magnitude of ρ indicates the strength of the serial correlation:
  – If ρ is zero, there is no serial correlation.
  – As ρ approaches one in absolute value, the previous observation of the error term becomes more important in determining the current value of u_t, and a high degree of serial correlation exists.
  – For |ρ| to exceed one is unreasonable (i.e., unlikely), since the error term effectively would “explode”.
  – As a result, we can state in general that −1 < ρ < +1.
• The sign of ρ indicates the nature of the serial correlation in an equation:
  – Positive: implies that the error term tends to have the same sign from one time period to the next.
  – Negative: implies that the error term has a tendency to switch signs from negative to positive and back again in consecutive observations.
Serial Correlation – Graphs

(Figures not reproduced.)
(Pure) Serial Correlation – Consequences

• The existence of serial correlation in the error term of an equation violates the Gauss-Markov assumptions, and estimating the equation with OLS has at least three consequences:
  1. Pure serial correlation does NOT cause bias in the coefficient estimates.
  2. Serial correlation causes OLS to no longer be the minimum-variance estimator (of all the linear unbiased estimators); i.e., OLS is no longer BLUE.
  3. Serial correlation causes the OLS estimates of the standard errors to be biased, leading to unreliable hypothesis testing.
     – Typically the bias in the SE estimate is negative (downward), meaning that OLS underestimates the standard errors of the coefficients (and thus overestimates the t-scores).
(Pure) Serial Correlation – One Solution: Newey-West SE

• Newey-West standard errors account for serial correlation by correcting the standard errors without changing the estimated coefficients.
• The logic behind Newey-West standard errors:
  – If serial correlation does not cause bias in the estimated coefficients but does impact the standard errors, then it makes sense to adjust the estimated equation in a way that changes the standard errors but not the coefficients.
  – The Newey-West standard errors are biased but generally more accurate than uncorrected standard errors in large samples in the face of serial correlation.
  – As a result, Newey-West standard errors can be used for t-tests and other hypothesis tests in most samples without the errors of inference potentially caused by serial correlation.
  – Typically, Newey-West standard errors are larger than OLS standard errors, thus producing lower t-scores.
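The slides do not spell out the formula, so as a sketch of the idea only: the Bartlett-kernel (Newey-West) estimator applied to the simplest possible case, the long-run variance relevant for a sample mean. The lag truncation max_lag = 20 and the AR(1) example series are illustrative assumptions.

```python
import random

def newey_west_variance(x, max_lag):
    """Bartlett-kernel (Newey-West) long-run variance estimate of x:
    gamma_0 + 2 * sum_l (1 - l/(max_lag+1)) * gamma_l, with gamma_l the
    sample autocovariance at lag l.  This is the variance relevant for
    the sample mean when the series is serially correlated."""
    T = len(x)
    xbar = sum(x) / T
    d = [xi - xbar for xi in x]
    def gamma(l):  # sample autocovariance at lag l
        return sum(d[t] * d[t - l] for t in range(l, T)) / T
    omega = gamma(0)
    for l in range(1, max_lag + 1):
        omega += 2 * (1 - l / (max_lag + 1)) * gamma(l)
    return omega

# Positively autocorrelated series: the naive variance understates the
# uncertainty, and the Newey-West estimate is correspondingly larger.
rng = random.Random(1)
u = [rng.gauss(0, 1)]
for _ in range(999):
    u.append(0.8 * u[-1] + rng.gauss(0, 1))
naive = newey_west_variance(u, max_lag=0)   # max_lag = 0 gives the plain variance
hac = newey_west_variance(u, max_lag=20)
```

The same weighting scheme, applied term by term inside the OLS "sandwich" variance, is what produces Newey-West standard errors in a regression package.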
Time Series Models

• Time-series examples:
  – Modeling relationships using data collected over time – prices, quantities, GDP, etc.
  – Forecasting – e.g., predicting economic growth.
  – Time-series analysis involves decomposition into trend, seasonal, cyclical, and irregular components.
• Problems from ignoring lags:
  – Values of y_t are affected by the values of y in the past.
    · For example, the amount of money in your bank account in one month is related to the amount in your account in a previous month.
  – Regression without lags fails to account for the relationships through time and overestimates the relationship between the dependent and independent variables.
White Noise

• White noise describes the assumption that each element in a series is a random draw from a population with zero mean and constant variance.
• Autoregressive (AR) and moving average (MA) models correct for violations of this white noise assumption.
Autoregressive (AR) Models

• Autoregressive (AR) models are models in which the value of a variable in one period is related to its values in previous periods.
• AR(p) is an autoregressive model with p lags:

      y_t = µ + Σ_{i=1}^{p} γ_i y_{t−i} + u_t

  where µ is a constant and γ_p is the coefficient for the lagged variable at time t − p.
• AR(1) is expressed as:

      y_t = µ + γ_1 y_{t−1} + u_t
          = µ + γ(L y_t) + u_t
      (1 − γL) y_t = µ + u_t

  where L is the lag operator (L y_t = y_{t−1}).
Autoregressive (AR) Models (Continued)

(Figure not reproduced.)
Moving Average (MA) Models

• Moving average (MA) models account for the possibility of a relationship between a variable and the residuals from previous periods.
• MA(q) is a moving average model with q lags:

      y_t = µ + ε_t + Σ_{i=1}^{q} θ_i ε_{t−i}

  where θ_q is the coefficient for the lagged error term at time t − q.
• The MA(1) model is expressed as:

      y_t = µ + ε_t + θ_1 ε_{t−1}

• Note: some programs, like SAS (unlike Stata), model θ with a reversed sign.
Moving Average (MA) Models (Continued)

(Figure not reproduced.)
Autoregressive Moving Average (ARMA) Models

• Autoregressive moving average (ARMA) models combine p autoregressive terms and q moving average terms; the result is called ARMA(p, q):

      y_t = µ + Σ_{i=1}^{p} γ_i y_{t−i} + u_t + Σ_{i=1}^{q} θ_i u_{t−i}
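The ARMA equation above can be simulated directly by combining the AR and MA pieces. A sketch for the ARMA(1,1) case, with shocks drawn as standard-normal white noise; the coefficient values are arbitrary choices satisfying stationarity (|γ| < 1):

```python
import random

def simulate_arma11(mu, gamma, theta, T, seed=0):
    """Simulate y_t = mu + gamma*y_{t-1} + u_t + theta*u_{t-1}, an ARMA(1,1)."""
    rng = random.Random(seed)
    u_prev = rng.gauss(0, 1)
    y = [mu + u_prev]                 # initialize the recursion from a single shock
    for _ in range(T - 1):
        u = rng.gauss(0, 1)
        y.append(mu + gamma * y[-1] + u + theta * u_prev)
        u_prev = u
    return y

y = simulate_arma11(mu=0.0, gamma=0.5, theta=0.3, T=2000)
# With |gamma| < 1 the series is stationary around mu / (1 - gamma).
mean_y = sum(y) / len(y)
```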
Stationarity

• Modeling an ARMA(p, q) process requires stationarity.
• A stationary process has a mean and variance that do not change over time, and the process does not have trends.
• An AR(1) disturbance process

      u_t = ρ u_{t−1} + v_t

  is stationary if |ρ| < 1 and v_t is white noise.
• Example of a time-series variable (figure not reproduced): is it stationary?
Differencing

• When a variable y_t is not stationary, a common solution is to use the differenced variable: ∆y_t = y_t − y_{t−1} for first-order differences.
• The variable y_t is integrated of order one, denoted I(1), if taking a first difference produces a stationary process.
• ARIMA(p, d, q) denotes an ARMA model with p autoregressive lags, q moving average lags, and differencing of order d.
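Differencing is a one-line transformation. The sketch below (the random-walk example is an illustrative assumption) shows that first-differencing an I(1) series recovers its white-noise increments:

```python
import random

def difference(y, d=1):
    """Apply first differences d times: each pass maps y_t to y_t - y_{t-1}."""
    for _ in range(d):
        y = [b - a for a, b in zip(y[:-1], y[1:])]
    return y

# A random walk (cumulative sum of white noise) is I(1): non-stationary
# in levels, but its first difference is white noise again.
rng = random.Random(2)
shocks = [rng.gauss(0, 1) for _ in range(500)]
walk = []
level = 0.0
for s in shocks:
    level += s
    walk.append(level)
dy = difference(walk)   # dy[i] equals shocks[i + 1]: the increments are recovered
```

Each pass shortens the series by one observation, which is why an ARIMA(p, d, q) fit on d-times-differenced data loses the first d observations.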
Seasonality

• Seasonality is a particular type of autocorrelation pattern in which patterns recur every “season” – monthly, quarterly, etc.
• For example, quarterly data may have the same pattern in the same quarter from one year to the next.
• Seasonality must also be corrected for before a time-series model can be fitted.
Dickey-Fuller Test for Stationarity

• Dickey-Fuller test:
  – Assume an AR(1) model. The model is non-stationary, or a unit root is present, if |ρ| = 1.

        y_t = ρ y_{t−1} + u_t
        y_t − y_{t−1} = ρ y_{t−1} − y_{t−1} + u_t
        ∆y_t = (ρ − 1) y_{t−1} + u_t
        ∆y_t = γ y_{t−1} + u_t

  – We can estimate the above model and test for the significance of the γ coefficient.
  – If the null hypothesis γ = 0 is not rejected, then y_t is NOT stationary. Difference the variable and repeat the Dickey-Fuller test to see if the differenced variable is stationary.
  – If the null hypothesis is rejected in favor of γ < 0, then y_t is stationary. Use the variable in levels.
  – Note: non-significance means non-stationarity.
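The test regression ∆y_t = γ y_{t−1} + u_t is just a no-constant OLS fit, so the statistic itself is easy to compute by hand. A sketch, with the caveat that the resulting t-statistic must be compared against Dickey-Fuller critical values (roughly −1.95 at the 5% level for this no-constant variant; check a DF table), not the usual t-table:

```python
import math
import random

def dickey_fuller_t(y):
    """OLS t-statistic for gamma in  dy_t = gamma * y_{t-1} + u_t
    (no drift, no augmentation lags)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    sxx = sum(x * x for x in ylag)
    gamma_hat = sum(x * d for x, d in zip(ylag, dy)) / sxx
    resid = [d - gamma_hat * x for x, d in zip(ylag, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)   # residual variance
    return gamma_hat / math.sqrt(s2 / sxx)

# Stationary AR(1) data (rho = 0.5): gamma = rho - 1 is well below zero,
# so the t-statistic should be large and negative (reject the unit root).
rng = random.Random(3)
y = [rng.gauss(0, 1)]
for _ in range(499):
    y.append(0.5 * y[-1] + rng.gauss(0, 1))
t_stat = dickey_fuller_t(y)
```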
Dickey-Fuller Test for Stationarity (Continued)

• Augmented Dickey-Fuller test:
  – In addition to the model above, a drift µ and additional lags of the dependent variable can be added:

        ∆y_t = µ + γ y_{t−1} + Σ_{j=1}^{p−1} φ_j ∆y_{t−j} + u_t

  – The augmented Dickey-Fuller test evaluates the null hypothesis that γ = 0. The model is non-stationary if γ = 0.
• Dickey-Fuller test with a time trend:
  – The model with a time trend:

        ∆y_t = µ + βt + γ y_{t−1} + Σ_{j=1}^{p−1} φ_j ∆y_{t−j} + u_t

  – Test the null hypothesis that β = 0 and γ = 0. Again, the series is non-stationary, or has a unit root, if γ = 0.
Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

• Autocorrelation function (ACF):
  – The ACF is the ratio of the autocovariance of y_t and y_{t−k} to the variance of the dependent variable y_t:

        ACF(k) = ρ_k = Cov(y_t, y_{t−k}) / Var(y_t)

  – The autocorrelation function ACF(k) gives the gross (total) correlation between y_t and y_{t−k}.
  – For an AR(1) model, the ACF is ACF(k) = ρ_k = γ^k. We say the function tails off.
• Partial autocorrelation function (PACF):
  – The PACF is the simple correlation between y_t and y_{t−k} minus the part explained by the intervening lags.
  – For an AR(1) model, the PACF is γ for the first lag and then cuts off.
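The sample analogue of ACF(k) replaces Cov and Var with their sample counterparts. A sketch (the AR(1) example with γ = 0.7 is an illustrative assumption); for this process the estimates should tail off roughly like 0.7^k:

```python
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_k = Cov(y_t, y_{t-k}) / Var(y_t), k = 1..max_lag."""
    T = len(y)
    mu = sum(y) / T
    d = [yi - mu for yi in y]
    var = sum(e * e for e in d)          # T * sample variance; the factor cancels
    return [sum(d[t] * d[t - k] for t in range(k, T)) / var
            for k in range(1, max_lag + 1)]

rng = random.Random(4)
y = [rng.gauss(0, 1)]
for _ in range(1999):
    y.append(0.7 * y[-1] + rng.gauss(0, 1))
acf = sample_acf(y, max_lag=5)   # theory: ACF(k) = 0.7**k, tailing off geometrically
```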
Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) (Continued)

            AR(p)               MA(q)               ARMA(p, q)
  ACF       tails off           cuts off at lag q   tails off
  PACF      cuts off at lag p   tails off           tails off
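The PACF can be obtained from the ACF by the Durbin-Levinson recursion (a standard algorithm, not covered on the slides). As a sketch: fed the exact AR(1) autocorrelations γ^k, it reproduces the cut-off after lag 1 shown in the table, while the exact MA(1) autocorrelations give a PACF that tails off.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion: turn autocorrelations [rho_1, rho_2, ...]
    into partial autocorrelations [phi_11, phi_22, ...]."""
    phi = []             # phi[j] holds phi_{k-1, j+1} from the previous step
    pacf = []
    for k in range(1, len(rho) + 1):
        num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Exact AR(1) autocorrelations with gamma = 0.6: PACF cuts off after lag 1.
ar1_pacf = pacf_from_acf([0.6, 0.36, 0.216])
# Exact MA(1) autocorrelations (theta = 0.5 gives rho_1 = 0.4): PACF tails off.
ma1_pacf = pacf_from_acf([0.4, 0.0, 0.0])
```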
Diagnostics for ARIMA Models

• Testing for white noise:
  – The Ljung-Box statistic:

        Q = T(T + 2) Σ_{k=1}^{p} ρ̂_k² / (T − k)

    where ρ̂_k is the sample autocorrelation at lag k.
  – Under the null hypothesis that the series is white noise (the data are independently distributed), Q has a limiting χ² distribution with p degrees of freedom.
• Goodness-of-fit:
  – The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are two measures of goodness of fit. They measure the trade-off between model fit and model complexity.
  – A lower AIC or BIC value indicates a better fit (a more parsimonious model).
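The Q statistic is straightforward to compute from the formula above. A sketch (the example series and the choice p = 10 are illustrative assumptions):

```python
import random

def ljung_box_q(y, p):
    """Ljung-Box statistic Q = T(T+2) * sum_{k=1..p} rho_k^2 / (T - k);
    approximately chi-squared with p d.o.f. under the white-noise null."""
    T = len(y)
    mu = sum(y) / T
    d = [yi - mu for yi in y]
    ss = sum(e * e for e in d)
    q = 0.0
    for k in range(1, p + 1):
        rho_k = sum(d[t] * d[t - k] for t in range(k, T)) / ss
        q += rho_k ** 2 / (T - k)
    return T * (T + 2) * q

rng = random.Random(5)
noise = [rng.gauss(0, 1) for _ in range(500)]
ar = [noise[0]]
for e in noise[1:]:
    ar.append(0.8 * ar[-1] + e)

q_noise = ljung_box_q(noise, p=10)   # should sit well within the chi2(10) range
q_ar = ljung_box_q(ar, p=10)         # should land far out in the right tail
```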
The Box-Jenkins Methodology for ARIMA Model Selection: Identification

• Examine the time plot of the series:
  – Identify outliers, missing values, and structural breaks in the data.
  – Non-stationary variables may have a pronounced trend or changing variance.
• Transform the data if needed, using logs, differencing, or detrending:
  – Using logs works if the variability of the data increases over time.
  – Differencing the data can remove trends, but over-differencing may introduce dependence when none exists.
• Examine the autocorrelation function (ACF) and partial autocorrelation function (PACF):
  – Compare the sample ACF and PACF to those of various theoretical ARMA models.
  – Use the properties of the ACF and PACF as a guide to estimate plausible models and to select appropriate p, d, and q.
  – Differencing may be needed if there is a slow decay in the ACF.
  – With empirical data, several models may need to be estimated.
The Box-Jenkins Methodology for ARIMA Model Selection: Estimation & Diagnostics

• Estimation step:
  – Estimate the ARIMA models and examine the various coefficients.
  – The goal is to select a stationary and parsimonious model that has significant coefficients and a good fit.
• Diagnostic checking step:
  – If the model fits well, then the residuals from the model should resemble a white noise process:
    · Check for normality by looking at a histogram of the residuals.
    · Check for independence by examining the ACF and PACF of the residuals, which should look like white noise.
    · The Ljung-Box statistic tests the magnitude of the residual autocorrelations as a group.
  – Examine goodness of fit using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Use the most parsimonious model with the lowest AIC and/or BIC.