ESE 502
Tony E. Smith
_______________________________________________________________________
GEODA DIAGNOSTICS FOR
SPATIAL REGRESSION
The following notes are based on the example of a spatial lag model for Eire using the
(modified) rook matrix discussed above.
1. R-Square
Recall that the spatial lag model can be written as
(1)    $Y = \rho WY + X\beta + \varepsilon \;\Rightarrow\; Y - \rho WY = X\beta + \varepsilon$
            $\Rightarrow\; BY = X\beta + \varepsilon$
            $\Rightarrow\; \tilde{Y} = X\beta + \varepsilon$
where $B = I_n - \rho W$ and $\tilde{Y} = BY$. This is the reduced form of the model, which for
any fixed value of $\rho$ is necessarily an OLS model. Hence if for any given observations,
$y$, and corresponding maximum-likelihood estimates $(\hat{\rho}, \hat{\beta})$, we denote the estimated
residuals by

$\hat{\varepsilon} = \tilde{y} - X\hat{\beta}$
then the pseudo R-square value for this model is simply
(2)    $R^2_{pseudo} = 1 - \dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{\tilde{y}'D\,\tilde{y}}$    ($= 0.47756$ in MATLAB)
where $D = I_n - (\tfrac{1}{n})\mathbf{1}_n\mathbf{1}_n'$ is the deviation matrix (that subtracts sample means). But while
this yields a well-defined R-square value, its interpretation is complicated by the fact that
it represents the fraction of explained variation in $\tilde{Y}$ rather than $Y$. As an alternative
approach, the value computed in GEODA replaces $\tilde{y}$ in (2) with $y$. So the GEODA
value computed is
(3)    $R^2_{geoda} = 1 - \dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{y'D\,y}$    ($= 0.740039$)
which seems to show a “better fit” than the value in (2). But while the second term in (3)
“appears” to be the fraction of unexplained variation in $Y$, this interpretation is only
meaningful when sample variance is decomposable into “explained” and “unexplained”
variance. If we denote the predicted value, $X\hat{\beta}$, of $\tilde{y}$ in model (1) by

(4)    $\hat{\tilde{y}} = X\hat{\beta}$
then this decomposition is automatically true in (2), where the equivalent form of
explained variation is given by
(5)    $R^2_{pseudo} = \dfrac{\hat{\tilde{y}}'D\,\hat{\tilde{y}}}{\tilde{y}'D\,\tilde{y}}$    ($= 0.47756$ in MATLAB)
with $\hat{\tilde{y}} = X\hat{\beta}$. But in the GEODA case, the corresponding “explained variation” is
(6)    $R^2_{geoda} = \dfrac{\hat{\tilde{y}}'D\,\hat{\tilde{y}}}{y'D\,y}$    ($= 0.23763$ in MATLAB)
which is not only different from (3), but in this case is actually close to $1 - R^2_{geoda}$ in (3).
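To make the contrast between (2), (3), (5) and (6) concrete, the following numpy sketch computes each quantity on synthetic data; the weight matrix `W`, the data `X`, `y`, and the value `rho_hat` below are hypothetical stand-ins, not the Eire values. It checks that the explained/unexplained decomposition holds for the pseudo version but fails for the GEODA version.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 26                                   # Eire has 26 counties

# Synthetic stand-ins for the Eire data (illustration only)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)     # row-standardized weight matrix
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

# Suppose rho_hat is the ML estimate of the lag parameter; at that value
# the reduced form (1) is an OLS model, so beta_hat is the OLS coefficient
rho_hat = 0.5
B = np.eye(n) - rho_hat * W
y_tilde = B @ y                          # reduced-form dependent variable
beta_hat, *_ = np.linalg.lstsq(X, y_tilde, rcond=None)
e_hat = y_tilde - X @ beta_hat           # residuals in (2)

D = np.eye(n) - np.ones((n, n)) / n      # deviation matrix D

r2_pseudo = 1 - (e_hat @ e_hat) / (y_tilde @ D @ y_tilde)   # eq (2)
r2_geoda  = 1 - (e_hat @ e_hat) / (y @ D @ y)               # eq (3)

# "Explained variation" versions (5) and (6)
yt_hat = X @ beta_hat
r2_pseudo_ev = (yt_hat @ D @ yt_hat) / (y_tilde @ D @ y_tilde)
r2_geoda_ev  = (yt_hat @ D @ yt_hat) / (y @ D @ y)

print(np.isclose(r2_pseudo, r2_pseudo_ev))   # True: decomposition holds
print(np.isclose(r2_geoda,  r2_geoda_ev))    # generally False
```

The first check succeeds because `beta_hat` is the OLS estimate in the reduced form, so the usual TSS = MSS + RSS identity applies to $\tilde{y}$; no such identity links (3) and (6), since their denominator involves $y$ rather than $\tilde{y}$.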
In fact, the situation is even worse, since the actual meaning of this denominator is not
clear. To see this, note that model (1) can be written equivalently as
(7)    $BY = X\beta + \varepsilon \;\Rightarrow\; Y = B^{-1}X\beta + B^{-1}\varepsilon$
which in turn implies that,
(8)    $\mathrm{cov}(Y) = B^{-1}\,\mathrm{cov}(\varepsilon)\,(B')^{-1} = \sigma^2 (B'B)^{-1}$
But since the diagonal of $(B'B)^{-1}$ is not constant, this means that each component of
$Y = (Y_1, \dots, Y_n)'$ has a different variance. Hence the classical “variance” estimator,
(9)    $\frac{1}{n}\,y'D\,y = \frac{1}{n}\,(y - \bar{y}\mathbf{1}_n)'(y - \bar{y}\mathbf{1}_n) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$
used for the denominators of (3) and (6), has no real meaning in this case, since there is
no “common variance” to be estimated.
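The claim that the diagonal of $(B'B)^{-1}$ is not constant is easy to verify numerically. The sketch below uses a hypothetical random row-standardized weight matrix (not the Eire rook matrix) and inspects the implied variances from (8).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 26

# Hypothetical row-standardized spatial weight matrix
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

rho = 0.5
B = np.eye(n) - rho * W

# Per eq (8), cov(Y) = sigma^2 (B'B)^{-1}; its diagonal gives the
# individual variances of Y_1, .., Y_n (up to the factor sigma^2)
V = np.linalg.inv(B.T @ B)
d = np.diag(V)
print(d.min(), d.max())   # the diagonal entries differ across observations
```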
2. Akaike and Schwarz Information Criteria
If the log likelihood of the model estimates is denoted by
(10)    $L = L(\hat{\theta} \mid y, X)$    ($= -47.9943$ for the Eire example)
where $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_k)'$ is the vector of parameter estimates, then the Akaike Information
Criterion (AIC) is given by
(11)    $AIC = -2(L - k)$    ($= -2\{-47.9943 - 3\} = 101.99$ for the Eire example in GEODA)
[Note: In the present case, $\hat{\theta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\sigma}^2, \hat{\rho})'$, so this should be $k = 4$.] Intuitively, AIC is
an “adjusted log likelihood” that is penalized for the number of parameters fitted, in a
manner analogous to adjusted R-square.
A second criterion that incorporates sample size (n ) into the penalty term is the Schwarz
Information Criterion [more commonly known as the Bayesian Information Criterion
(BIC)] which is given by
(12)    $BIC = -2\{L - (k/2)\log(n)\}$
        ($= -2\{-47.9943 - (3/2)\log(26)\} = 105.76$ for the Eire example in GEODA)
3. Breusch-Pagan Test of Heteroscedasticity
The Breusch-Pagan Test considers heteroscedastic variance models of the form
(13)
   ( 0   xi ) 
2
i
2
 i2
 0   xi
2
where $x_i = (1, x_{i1}, \dots, x_{ik})'$ is the vector of explanatory variables (plus intercept) for
observation $i$ and $\sigma_i^2 = \mathrm{var}(\varepsilon_i)$. The appropriate null hypothesis is then $H_0: \alpha = 0$. To
test this hypothesis, observe that since $\mathrm{var}(\varepsilon_i) = E(\varepsilon_i^2)$, the squared residual, $\hat{\varepsilon}_i^2$, constitutes a
one-sample estimate of $\sigma_i^2$. If the mean-square error is denoted by
(14)    $s^2 = \frac{1}{n}\,\hat{\varepsilon}'\hat{\varepsilon} = \frac{1}{n}\,ESS$
then under H 0 , the sample vector
(15)    $r = \left(\dfrac{\hat{\varepsilon}_1^2}{s^2},\, \dots,\, \dfrac{\hat{\varepsilon}_n^2}{s^2}\right)'$

is a natural estimate of $(\sigma_1^2/\sigma^2, \dots, \sigma_n^2/\sigma^2)'$. Hence if one regresses $r$ on the set of
explanatory variables, $X = [\mathbf{1}_n, x_1, \dots, x_k]$, then “significantly large” values for the model
sum of squares (MSS) of this regression (under the null hypothesis $H_0$) indicate that
model (13) fits better than would be expected under $H_0$. The appropriate Breusch-Pagan
statistic, $S_{BP}$, is thus taken to be half this MSS,
(16)    $S_{BP} = \tfrac{1}{2}MSS = \tfrac{1}{2}\{r'X(X'X)^{-1}X'r - n\}$    ($= 0.0664$ for the Eire example)
which can be shown to be asymptotically distributed $\chi_k^2$ under $H_0$. In the Eire case, this
value is not sufficiently high to suggest the presence of significant heteroscedasticity
(P-value $= 0.797$).
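The computation in (14)–(16) can be sketched in numpy as follows; the residuals and design matrix below are synthetic stand-ins, not the Eire data. For a single explanatory variable the relevant $\chi^2(1)$ upper-tail probability can be written with `erfc`, which also reproduces the reported P-value of 0.797 from $S_{BP} = 0.0664$.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 26

# Synthetic stand-ins: one explanatory variable plus an intercept, and
# residuals playing the role of the lag-model residuals
X = np.column_stack([np.ones(n), rng.normal(size=n)])
e_hat = rng.normal(size=n)

s2 = (e_hat @ e_hat) / n                # mean-square error, eq (14)
r = e_hat**2 / s2                       # sample vector, eq (15)

# Breusch-Pagan statistic, eq (16): half the model sum of squares
P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection onto the columns of X
S_bp = 0.5 * (r @ P @ r - n)

# With one explanatory variable the reference distribution is chi^2(1),
# whose upper tail at s is erfc(sqrt(s/2))
p_value = math.erfc(math.sqrt(S_bp / 2))
print(S_bp, p_value)

p_eire = math.erfc(math.sqrt(0.0664 / 2))   # reproduces the note's 0.797
```

Note that $S_{BP} \ge 0$ automatically, since the columns of $X$ include the constant vector and the mean of $r$ is exactly 1 by construction of $s^2$.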