Statistical Anomalies: Problem, Detection, and Correction

Distributions, Anomalies, and Econometric Definitions
Normal Distribution: $X \sim N(\mu, \sigma^2)$  (μ = population mean, σ² = population variance)

Probability density function for a normally distributed random variable x with population mean μ and population variance σ²:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
Central Limit Theorem: Regardless of the distribution from which a random variable, X, originates,

$$\lim_{n \to \infty} \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
Standard Normal Distribution: $Z \sim N(0, 1)$
Let X be a normally distributed random variable with population mean μ and population variance σ². Then

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
Chi-Square Distribution: $Q \sim \chi^2_k$  (k = degrees of freedom)
Let $Z_1, Z_2, \dots, Z_k$ be independent standard normal random variables. Let Q be defined as

$$Q = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$$
The population mean of Q is k.
The population variance of Q is 2k.
The chi-square distribution is skewed to the right. As k approaches infinity, the distribution approaches symmetry.
For k > 100, the transformation $\sqrt{2Q} - \sqrt{2k - 1}$ is approximately distributed standard normal.
t-Distribution: $T \sim t_k$
Let Z be a standard normal random variable. Let Q be a chi-square distributed variable with k degrees of freedom, independent of Z. Then

$$T = \frac{Z}{\sqrt{Q/k}} \sim t_k$$
The population mean of T is zero.
The population variance of T is $\dfrac{k}{k - 2}$.
As k approaches infinity, the t-distribution approaches the standard normal distribution.
F Distribution: $F \sim F_{k_1, k_2}$
Let Q1 and Q2 be independently distributed chi-square variables with k1 and k2 degrees of freedom, respectively.
$$F = \frac{Q_1 / k_1}{Q_2 / k_2} \sim F_{k_1, k_2}$$

The population mean of F is $\dfrac{k_2}{k_2 - 2}$.

The population variance of F is $\dfrac{2k_2^2\,(k_1 + k_2 - 2)}{k_1\,(k_2 - 4)(k_2 - 2)^2}$.
The F distribution is skewed to the right. As k1 and k2 approach infinity, the distribution approaches the normal distribution.
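These constructions are easy to check numerically. Below is a minimal NumPy simulation sketch; the degrees of freedom, sample size, and seed are arbitrary choices for illustration, not values from this sheet.

```python
# Verify the chi-square, t, and F constructions above by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)
k1, k2, n = 5, 10, 200_000

Z = rng.standard_normal((n, k1))
Q1 = (Z ** 2).sum(axis=1)                        # chi-square with k1 df
print(Q1.mean(), Q1.var())                       # ~ k1 and ~ 2*k1

T = rng.standard_normal(n) / np.sqrt(Q1 / k1)    # t with k1 df (independent Z)
print(T.mean())                                  # ~ 0

Q2 = (rng.standard_normal((n, k2)) ** 2).sum(axis=1)
F = (Q1 / k1) / (Q2 / k2)                        # F with (k1, k2) df
print(F.mean(), k2 / (k2 - 2))                   # both ~ 1.25
```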
| Statistical Anomaly | Problem | Detection | Correction |
| --- | --- | --- | --- |
| Omitted Variable: A significant exogenous regressor has been omitted. | Parameter estimates are biased and inconsistent. | A new exogenous regressor can be found which is statistically significant. | Include the missing regressor. |
| Extraneous Variable*: An insignificant exogenous regressor has been included. | Parameter estimates are inefficient. | One or more of the exogenous regressors are statistically insignificant. | Exclude the extraneous regressor. |
| Regime Shift: The values of the parameters change at some point(s) in the data set. | Parameter estimates are biased and possibly inconsistent.§ | Parameter estimates change significantly when the sample is split. | Determine the type of shift (slope or intercept) and its location. Include a dummy variable to account for the shift. |
| Serial Correlation†: Current values of the error term are correlated with past values. | Standard errors of parameter estimates are biased** (and thus t-statistics are invalid). | Durbin-Watson statistic is significantly different from 2.†† | Perform AR and/or MA correction. |
| Non-Zero Errors: The expected value of the error term is not zero. | Standard errors are biased. Correlation coefficient is biased. | Mean of the estimated residuals is not equal to zero. | Include a constant term in the regression. |
| Non-Linearity: The model is of a different functional form than the equation that generated the data.‡ | Parameter estimates are biased and inconsistent. | A different functional form can be found which yields a greater adjusted R² while using the same exogenous regressors.‡‡ | Alter the functional form of the regression equation. |
| Non-Stationarity: The dependent variable is a random walk. | Parameter estimates are biased and inconsistent. Standard errors are biased. Correlation coefficient is biased. | Regress the dependent variable on a constant and itself lagged; the slope coefficient will be equal to 1. Also, D.W. is usually less than R². | Difference the dependent variable until it becomes stationary. |
| Multicollinearity: One or more of the exogenous regressors are significantly correlated. | When combined with the extraneous variable anomaly, parameter estimates are biased but consistent; parameter estimates are inefficient. | Correlation between exogenous regressors is significant. Parameter estimates are highly sensitive to changes in observations. | Remove one of the multicollinear regressors from the regression. Not a problem unless the correlation of the exogenous variables exceeds the regression correlation. |
| Heteroskedasticity: The variance of the error term is not constant over the data set. | Standard errors of parameter estimates are biased. | Regress the squared residuals on a time dummy and the exogenous regressors. The presence of significant coefficients indicates heteroskedasticity. | Divide the regression by a regressor that is correlated with the squared error term. Iterated WLS: obtain ŷ, divide the equation by ŷ, re-estimate, etc.; the resulting standard errors are unbiased and asymptotically efficient. |
| Measurement Error: There is a random component attached to one or more of the exogenous regressors.§§ | Parameter estimates are biased downward. | Examination of the construction of the exogenous variable. | Two-stage least squares procedure. |
| Truncated Regression: No data are drawn when values of y are beyond certain limits. | Parameter estimates are biased. | Examination of the criteria for data selection. | Include truncated observations. |
| Suppressor Variable: The independent variable is uncorrelated with the dependent variable, but appears significantly in a multiple regression model. | Parameter estimates are biased and inconsistent. | The variable is significant when it appears in a multiple regression, but insignificant when it appears in a single regression. | Eliminate the variable from the model. |
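The heteroskedasticity detection described in the table can be run with two ordinary regressions. Here is a minimal NumPy sketch; the data-generating process (a variance that grows with a trend term standing in for the "time dummy") is invented for illustration.

```python
# Detect heteroskedasticity: regress squared OLS residuals on a constant,
# a trend, and the exogenous regressor, then inspect the coefficients.
import numpy as np

rng = np.random.default_rng(1)
N = 500
t = np.arange(N)
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.01 * t)   # variance grows over time

X = np.column_stack([np.ones(N), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid2 = (y - X @ beta) ** 2

# Auxiliary regression of the squared residuals.
A = np.column_stack([np.ones(N), t, x])
gamma = np.linalg.lstsq(A, resid2, rcond=None)[0]
print(gamma)   # a clearly nonzero trend coefficient signals heteroskedasticity
```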
Unbiasedness: $E[\hat{\beta}] = \beta$
The estimate of β is equal to β (on average).

Efficiency: $s_{\hat{\beta}} \le s_{\tilde{\beta}}$ for all standard error estimators $s_{\tilde{\beta}}$ associated with linear unbiased coefficient estimates.
The standard error of the estimate of β is the smallest attainable standard error among the class of standard errors associated with linear unbiased estimators.

Consistency: $\lim_{N \to \infty} \Pr\left(\left|\hat{\beta} - \beta\right| < \delta\right) = 1$ for arbitrarily small δ.
The estimate of β approaches β as the number of observations approaches infinity.
Parts of the Regression Equation: Y = α + βX + u
In the above model, Y is the dependent variable (also called the endogenous variable); X and a constant term are independent variables (also called exogenous variables, explanatory variables, or exogenous regressors); u is the error term (also called the stochastic term); and α and β are unknown parameters (or regression coefficients) that can be estimated with some known error. The estimates of the parameters are called α̂ (alpha hat) and β̂ (beta hat) and are, collectively, called parameter estimates or regression coefficient estimates.
The ordinary least squares (OLS) method of estimation calculates the parameters of the model (and the standard errors of the parameters) as follows (where N = # of observations, df = degrees of freedom):

$$\hat{\beta} = \frac{\operatorname{cov}(Y, X)}{\operatorname{var}(X)} \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \qquad \hat{u} = Y - \hat{\alpha} - \hat{\beta}X$$

u is called the regression error; û is called the regression residual.

$$\operatorname{var}(\hat{\beta}) = \frac{\operatorname{var}(\hat{u})}{(N-1)\operatorname{var}(X)} \qquad \operatorname{var}(\hat{\alpha}) = \left[\frac{1}{N} + \frac{\bar{X}^2}{(N-1)\operatorname{var}(X)}\right]\operatorname{var}(\hat{u}) \qquad \operatorname{cov}(\hat{\alpha}, \hat{\beta}) = -\frac{\bar{X}\,\operatorname{var}(\hat{u})}{(N-1)\operatorname{var}(X)}$$

In matrix notation:

$$\hat{\beta} = (X'X)^{-1}X'Y \qquad \operatorname{var}(\hat{\beta}) = \operatorname{var}(u)\,(X'X)^{-1} \qquad \operatorname{var}(u) = \frac{1}{N-1}\,Y'\left[I - X(X'X)^{-1}X'\right]Y$$

$$\operatorname{var}(\hat{Y}) = \sigma^2\left[1 + X_f (X'X)^{-1} X_f'\right], \text{ where } X_f = \text{future value of } X$$

For the multiple regression model Y = α + β₁X₁ + β₂X₂ + u:

$$\hat{\beta}_1 = \frac{\operatorname{var}(X_2)\operatorname{cov}(X_1, Y) - \operatorname{cov}(X_1, X_2)\operatorname{cov}(X_2, Y)}{\operatorname{var}(X_1)\operatorname{var}(X_2) - \operatorname{cov}^2(X_1, X_2)}$$
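The scalar and matrix formulas above agree, which a minimal NumPy sketch can confirm (the data are simulated purely for illustration; ddof=1 matches the (N − 1) convention used in this sheet):

```python
# OLS two ways: scalar covariance formulas vs. the matrix normal equations.
import numpy as np

rng = np.random.default_rng(2)
N = 100
x = rng.normal(size=N)
y = 1.5 + 0.8 * x + rng.normal(size=N)

# Scalar formulas
beta_hat = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
u_hat = y - alpha_hat - beta_hat * x
var_beta = np.var(u_hat, ddof=1) / ((N - 1) * np.var(x, ddof=1))

# Matrix formulas: beta = (X'X)^{-1} X'Y
X = np.column_stack([np.ones(N), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(alpha_hat, beta_hat)   # matches b[0], b[1] up to float error
print(var_beta)
```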
Salkever’s Method of Computing Forecasts and Forecast Variances
Regress $\begin{pmatrix} Y \\ 0 \end{pmatrix}$ on $\begin{pmatrix} X & 0 \\ X_f & I \end{pmatrix}$. This generates the LS coefficient vector followed by the predictions. Residuals are 0 for the predictions, so the error covariance matrix is the covariance matrix for the coefficient estimates and the variances of the forecasts.
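A minimal NumPy sketch of the augmented regression, with simulated data. Note that the sign convention on the dummy block determines whether the dummy coefficients come out as the predictions or their negatives; this sketch uses −I, under which they equal the predictions directly.

```python
# Salkever's method: stack forecast-period rows with identity dummies so one
# regression yields both the OLS coefficients and the forecasts.
import numpy as np

rng = np.random.default_rng(3)
N, m = 50, 3                                 # sample size, forecast periods
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)
Xf = np.column_stack([np.ones(m), np.array([0.5, 1.0, 1.5])])

# Regress [y; 0] on [[X, 0], [Xf, -I]].
A = np.vstack([np.hstack([X, np.zeros((N, m))]),
               np.hstack([Xf, -np.eye(m)])])
b = np.concatenate([y, np.zeros(m)])

coef = np.linalg.lstsq(A, b, rcond=None)[0]
print(coef[:2])   # OLS coefficients from the original regression
print(coef[2:])   # forecasts: identical to Xf @ coef[:2]
```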
Degrees of Freedom: df = # of observations - # of parameters in the model
t-Stat: $t = \dfrac{\text{parameter estimate}}{\text{standard error}}$

The probability of a parameter estimate being significantly different from zero is at least x% when the absolute value of the t-Stat associated with that parameter is greater than or equal to the critical value associated with the probability x%.
For example, the critical value associated with a probability of 95% (when there are 30 degrees of freedom) is 2.042. Thus, there is a
95% chance that any parameter estimate with a t-Stat greater than 2.042 (in absolute value) is not equal to zero.
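The quoted critical value can be reproduced with SciPy; the sketch below assumes the sheet's 2.042 is the two-tailed 95% point of a t-distribution with 30 degrees of freedom.

```python
# Reproduce the two-tailed 95% critical value for 30 degrees of freedom.
from scipy.stats import t
print(t.ppf(0.975, 30))   # ~ 2.042
```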
Durbin-Watson Statistic: $D.W. = 2\left(1 - \dfrac{\operatorname{cov}(\hat{u}_t, \hat{u}_{t-1})}{\operatorname{var}(\hat{u}_t)}\right)\left(\dfrac{df}{N}\right)$
To use the D.W. statistic, find the values of du and dl associated with the number of observations and degrees of freedom in the regression. The following are true:
$D.W. < d_l$ ⇒ positive serial correlation
$D.W. > 4 - d_l$ ⇒ negative serial correlation
$d_u < D.W. < 4 - d_u$ ⇒ no serial correlation
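A minimal NumPy sketch computing the statistic in its usual sum form, $\sum_t (\hat{u}_t - \hat{u}_{t-1})^2 / \sum_t \hat{u}_t^2$, which the covariance expression above approximates; the AR(1) errors are simulated for illustration.

```python
# Durbin-Watson statistic from OLS residuals on data with AR(1) errors.
import numpy as np

rng = np.random.default_rng(4)
N = 200
x = rng.normal(size=N)
e = np.zeros(N)
for i in range(1, N):                    # positively autocorrelated errors
    e[i] = 0.7 * e[i - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(N), x])
u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
dw = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)
print(dw)   # well below 2, consistent with positive serial correlation
```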
Correlation and Adjusted Correlation:

$$R^2 = \frac{\operatorname{cov}^2(Y, X)}{\operatorname{var}(Y)\operatorname{var}(X)} = 1 - \frac{\operatorname{var}(\hat{u})}{\operatorname{var}(Y)} \qquad \bar{R}^2 = 1 - \frac{N - 1}{df}\left(1 - R^2\right)$$

Note: R² uses var(resid) where df = N − 1; R̄² uses var(resid) where df = N − k, and is proportional to the mean squared error.
F-statistic:

$$F = \left(\frac{R_u^2 - R_r^2}{1 - R_u^2}\right)\left(\frac{df_u}{k}\right)$$

where $R_u^2$ = R² in the unrestricted model, $R_r^2$ = R² in the restricted model, $df_u$ = df in the unrestricted model, and k = # of non-constant regressors in the unrestricted model.
Cross-Validity Correlation Coefficient:

$$R_{cv}^2 = 1 - \frac{(N-1)(N+1)(N-2)}{N \cdot df \cdot (df - 1)}\left(1 - R^2\right)$$
Correlation indicates the percentage of variation in the dependent variable that is explained by variations in all of the independent variables combined. Because correlation always increases when the number of independent variables increases, the adjusted correlation
provides a measure of correlation that accounts for the number of independent regressors. Criticisms of the adjusted correlation measure claim that the measure may not penalize a model enough for decreases in degrees of freedom. Because this issue is more acute in
out-of-sample analysis, the cross-validity correlation coefficient has been proposed as an alternate measure of adjusted correlation.
Correlations (and adjusted correlations) from different models can be compared only if the dependent variables in the models are the
same.
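The three measures can be computed side by side; below is a minimal NumPy sketch on simulated data (sample size and coefficients are arbitrary), using the sheet's df = N − # of parameters.

```python
# Compare R-squared, adjusted R-squared, and the cross-validity coefficient.
import numpy as np

rng = np.random.default_rng(5)
N, k = 60, 3                                  # k non-constant regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, k))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=N)

u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
df = N - (k + 1)                              # observations minus parameters
r2 = 1 - u.var(ddof=1) / y.var(ddof=1)
r2_adj = 1 - (N - 1) / df * (1 - r2)
r2_cv = 1 - (N - 1) * (N + 1) * (N - 2) / (N * df * (df - 1)) * (1 - r2)
print(r2, r2_adj, r2_cv)                      # typically r2 > r2_adj > r2_cv
```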
White Noise Error: An error term that is uncorrelated with all explanatory variables, is uncorrelated with past values of itself, and has
a constant variance over time.
Autocorrelation of Order k: One of two types of serial correlation.
When the error term is autocorrelated of order k, the error term is not white noise, but is generated by the process:
$$u_t = \rho\, u_{t-k} + \varepsilon_t$$ where ρ is a constant less than one in absolute value and ε is a white noise error.
Moving-Average of Order k: One of two types of serial correlation.
When the error term is moving average of order k, the error term is not white noise, but is generated by the process:
$$u_t = \lambda\, \varepsilon_{t-k} + \varepsilon_t$$ where λ is a constant less than one in absolute value and ε is a white noise error.
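The two processes behave differently at longer lags, which a minimal simulation sketch makes visible (k = 1 and the values of ρ and λ are arbitrary illustrations):

```python
# Simulate AR(1) and MA(1) error processes and compare their lag-2 correlations.
import numpy as np

rng = np.random.default_rng(6)
N, rho, lam = 10_000, 0.6, 0.6
eps = rng.standard_normal(N)                 # white noise

u_ar = np.zeros(N)
for t in range(1, N):
    u_ar[t] = rho * u_ar[t - 1] + eps[t]     # autocorrelated of order 1

u_ma = np.zeros(N)
u_ma[1:] = lam * eps[:-1] + eps[1:]          # moving average of order 1

# AR errors correlate with many past lags; MA errors only with lag k.
print(np.corrcoef(u_ar[2:], u_ar[:-2])[0, 1])   # noticeably nonzero
print(np.corrcoef(u_ma[2:], u_ma[:-2])[0, 1])   # approximately zero
```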
Geometric Lag Model: A model that allows us to express an infinite series of lags as a function of a few parameters. The model achieves this by assigning geometrically declining weights to past lags.
$$y_t = \alpha + \beta(1-\lambda)x_t + \beta(1-\lambda)\lambda x_{t-1} + \beta(1-\lambda)\lambda^2 x_{t-2} + \dots + u_t$$

The implied coefficients from this model are $\beta_i = \beta(1-\lambda)\lambda^i$, $0 < \lambda < 1$. The model is estimated by regressing y on x and y lagged:

$$y_t = \alpha(1-\lambda) + \beta(1-\lambda)x_t + \lambda y_{t-1} + u_t - \lambda u_{t-1}$$
The mean lag of the model is λ / (1 – λ)
The median lag of the model is ln(0.5) / ln λ
The short-run multiplier is β (1 – λ)
The long-run multiplier is β
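A minimal NumPy sketch of the estimation step: regress y on x and lagged y, then recover λ and the multipliers from the coefficients. To keep OLS consistent, the data are simulated directly from the transformed equation with a white-noise error (an assumption that sidesteps the MA error term of the transformation); the true parameter values are made up.

```python
# Estimate a geometric lag model via the y-on-(x, lagged y) regression.
import numpy as np

rng = np.random.default_rng(7)
N, alpha, beta, lam = 5_000, 1.0, 2.0, 0.5
x = rng.normal(size=N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = (alpha * (1 - lam) + beta * (1 - lam) * x[t]
            + lam * y[t - 1] + rng.normal(scale=0.1))

X = np.column_stack([np.ones(N - 1), x[1:], y[:-1]])
c0, c1, c2 = np.linalg.lstsq(X, y[1:], rcond=None)[0]
lam_hat = c2                                   # coefficient on lagged y
beta_hat = c1 / (1 - lam_hat)                  # long-run multiplier
print(c1, beta_hat, lam_hat / (1 - lam_hat))   # short-run, long-run, mean lag
```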
Two-Stage Least Squares Procedure: A procedure that corrects the errors-in-variables (measurement error) anomaly.
Suppose we are attempting to estimate the regression equation Y = α + βX + u, where there is a random component embedded in the exogenous variable X.
(1) Find another variable, Z, which is correlated with X and which does not contain a random component.
(2) Regress X on a constant and Z and compute X̂ .
(3) Regress Y on a constant and X̂ .
The resulting standard errors will be more efficient, although still less efficient than the straight OLS standard errors.
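A minimal NumPy sketch of the procedure on simulated data: x_obs is x measured with error, and z is an instrument correlated with the true x but free of the measurement error (all values invented for illustration).

```python
# Two-stage least squares vs. OLS under measurement error in the regressor.
import numpy as np

rng = np.random.default_rng(8)
N = 20_000
x_true = rng.normal(size=N)
z = x_true + rng.normal(scale=0.5, size=N)    # instrument: correlated with x
x_obs = x_true + rng.normal(size=N)           # regressor with measurement error
y = 1.0 + 2.0 * x_true + rng.normal(size=N)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Straight OLS on the mismeasured regressor: slope biased toward zero.
print(ols(np.column_stack([np.ones(N), x_obs]), y)[1])   # ~ 1.0, not 2.0

# Stage 1: regress x_obs on a constant and z; form fitted values.
Z = np.column_stack([np.ones(N), z])
x_hat = Z @ ols(Z, x_obs)
# Stage 2: regress y on a constant and the fitted values.
print(ols(np.column_stack([np.ones(N), x_hat]), y)[1])   # ~ 2.0
```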
Test for Parameter Restrictions: A method for testing the hypothesis that a set of parameter restrictions are valid.
1. Run the regression without the restriction. Capture the sum of squared residuals, $R_U$.
2. Run the regression with the restriction. Capture the sum of squared residuals, $R_R$.
3. The test statistic
$$F = \frac{(R_R - R_U)/(K_U - K_R)}{R_U/(N - K_U)} \sim F_{K_U - K_R,\; N - K_U}$$
under the null hypothesis that the restriction(s) are valid (where $K_U$ is the number of parameters in the unrestricted model, $K_R$ is the number of parameters in the restricted model, and N is the number of observations).
4. Note: If the errors are not normally distributed, then via the central limit theorem,
$$\frac{R_R - R_U}{R_U/(N - K_U)} \sim \chi^2_{K_U - K_R}$$
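A minimal NumPy sketch of the test; the restriction imposed below is β₂ = 0 (dropping x₂), chosen only for illustration.

```python
# Restriction F-test from the restricted and unrestricted sums of squares.
import numpy as np

rng = np.random.default_rng(9)
N = 120
x1, x2 = rng.normal(size=N), rng.normal(size=N)
y = 1.0 + 0.5 * x1 + 0.4 * x2 + rng.normal(size=N)

def ssr(X, y):
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return u @ u

Xu = np.column_stack([np.ones(N), x1, x2])   # unrestricted: KU = 3
Xr = np.column_stack([np.ones(N), x1])       # restricted:   KR = 2
RU, RR = ssr(Xu, y), ssr(Xr, y)
KU, KR = 3, 2
F = ((RR - RU) / (KU - KR)) / (RU / (N - KU))
print(F)   # compare against the F(KU-KR, N-KU) critical value
```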
Seemingly Unrelated Regression (SUR)
Forecast Errors:
Let the estimated relationship between X and Y be $Y = \hat{\alpha} + \hat{\beta}X$.
The forecasted value of Y when X is equal to some pre-specified value, $X_0$, is $Y_f$, given by $Y_f = \hat{\alpha} + \hat{\beta}X_0$.

The variance of the mean value of the Y's when $X = X_0$ is given by:

$$\operatorname{var}(Y_f) = \operatorname{var}(u)\left[\frac{1}{N} + \frac{(X_0 - \bar{X})^2}{(N-1)\operatorname{var}(X)}\right]$$

The variance of one observation of Y when $X = X_0$ is given by:

$$\operatorname{var}(Y_f) = \operatorname{var}(u)\left[1 + \frac{1}{N} + \frac{(X_0 - \bar{X})^2}{(N-1)\operatorname{var}(X)}\right]$$

In matrix notation, the variance of the mean prediction is given by: $\operatorname{var}(\hat{Y}_0 \mid X_0) = \operatorname{var}(u)\; x_0'(X'X)^{-1}x_0$

In matrix notation, the variance of the individual prediction is given by: $\operatorname{var}(\hat{Y}_0 \mid X_0) = \operatorname{var}(u)\left[1 + x_0'(X'X)^{-1}x_0\right]$
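A minimal NumPy sketch of the matrix forecast-variance formulas (simulated data; x₀ is an arbitrary evaluation point):

```python
# Forecast variances for the mean and individual prediction at a point x0.
import numpy as np

rng = np.random.default_rng(10)
N = 80
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

b = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b
var_u = u @ u / (N - 2)                   # residual variance, df = N - 2
x0 = np.array([1.0, 1.5])                 # constant term and X0 = 1.5

h = x0 @ np.linalg.solve(X.T @ X, x0)     # x0'(X'X)^{-1} x0
print(var_u * h)                          # variance of the mean prediction
print(var_u * (1 + h))                    # variance of an individual prediction
```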
Transformed Dependent Variables
R²'s from models with different dependent variables cannot be directly compared. Suppose Z = f(Y) is a transformation of Y. Using Z as the dependent variable, obtain fitted values $\hat{Z}$. Apply the inverse function to the fitted values to obtain $f^{-1}(\hat{Z})$. Find the squared correlation between $f^{-1}(\hat{Z})$ and Y. This is the transformed R².
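A minimal NumPy sketch of the recipe with Z = ln(Y); the data are simulated so that Y is positive.

```python
# Transformed R-squared: fit on ln(Y), invert with exp, correlate with Y.
import numpy as np

rng = np.random.default_rng(11)
N = 200
x = rng.normal(size=N)
y = np.exp(0.5 + 0.8 * x + rng.normal(scale=0.3, size=N))   # log-linear truth

X = np.column_stack([np.ones(N), x])
z_hat = X @ np.linalg.lstsq(X, np.log(y), rcond=None)[0]    # fit Z = ln(Y)
y_back = np.exp(z_hat)                                      # inverse transform

transformed_r2 = np.corrcoef(y_back, y)[0, 1] ** 2
print(transformed_r2)   # comparable with the R2 of a model fit to Y itself
```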
OLS Assumptions:
The error term has a mean of zero.
There is no serial correlation in the error term.
The error term is homoskedastic.
There are no extraneous regressors.
There are no omitted regressors.
The exogenous regressors are not systematically correlated.
The exogenous regressors have no random component.
The true relationship between the dependent and independent variables is linear.
The regression coefficients are constant over the sample.
Test for significance of a correlation (or multiple correlation coefficient)
$$t_{N-2} = r\sqrt{\frac{N-2}{1 - r^2}}$$

for correlation r and N observations.
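A minimal sketch of this test, with a SciPy p-value for reference (r and N are arbitrary example values):

```python
# Significance test for a correlation coefficient.
import numpy as np
from scipy.stats import t

r, N = 0.35, 50
t_stat = r * np.sqrt((N - 2) / (1 - r ** 2))
p = 2 * t.sf(abs(t_stat), N - 2)          # two-tailed p-value
print(t_stat, p)
```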
Three Possible Goals of Regression Analysis
| Goal: Find determination equation for Y | Goal: Measure the effect of X on Y | Goal: Forecast Y* |
| --- | --- | --- |
| Seek: All regressors that jointly yield statistically significant regression coefficients. | Seek: Coefficient for X (regardless of its significance) and all other regressors that jointly yield statistically significant regression coefficients. | Seek: High adjusted R², all t-stats greater than or equal to 1, statistically significant F-statistic. |

\* Research indicates that exogenous regressors which produce t-stats that are insignificant though greater than 1 should be included in the regression when the purpose of the regression is to produce a forecast.
† Serial correlation can sometimes be caused by an omitted variable and is corrected by including the omitted variable in the regression.
‡ In the model y = α + βX + u, β is interpreted as the change in y given a change in X. In the model ln(y) = α + βX + u, β is interpreted as the relative change in y given a change in X. In the model y = α + β ln(X) + u, β is interpreted as the change in y given a relative change in X. Multiplying a relative change by 100 yields a percentage change.
§ Whether or not the parameter estimates are to be considered "consistent" depends on where in the data set the regime shift occurs. If, for example, the regime shift occurs close to the beginning of the data set, then as more observations are included after the regime shift, the parameter estimates become (on average) less biased estimates of the true parameters that exist after the regime shift, but no less biased (and possibly more biased) estimates of the true parameters that exist before the regime shift.
** When the standard errors of a parameter estimate are inefficient, they are larger than the OLS standard errors in the absence of any statistical anomalies. Thus, a parameter estimate that is significant even in the presence of inefficient standard errors would be even more significant in the presence of efficient standard errors. By contrast, biased standard errors could be larger or smaller (we usually don't know which is the case) than the OLS standard errors in the absence of statistical anomalies. Thus, we can conclude nothing about the significance of a parameter estimate in the presence of biased standard errors.
†† Note the following: (1) failure to use a constant term in a regression makes the Durbin-Watson statistic biased; (2) the D-W statistic tests only for serial correlation of order 1; (3) the D-W statistic is unreliable in the presence of regressors with stochastic components (in fact, the combination of a lagged dependent variable as a regressor and a positively autocorrelated error term will bias the D-W statistic upward); (4) it has been shown that when the regressors are slowly changing series (as is the case with many economic series), the true critical value of the D-W statistic will be close to the Durbin-Watson upper bound.
‡‡ The correlation coefficients from different regressions can only be compared if the dependent variable is the same in both regressions.
§§ This problem also exists when the lagged dependent variable is used as an explanatory regressor and when the error term from that regression exhibits a degree of serial correlation greater than or equal to the lag.