ESE 502 Tony E. Smith
_______________________________________________________________________

GEODA DIAGNOSTICS FOR SPATIAL REGRESSION

The following notes are based on the example of a spatial lag model for Eire using the (modified) rook matrix discussed above.

1. R-Square

Recall that the spatial lag model can be written as

(1)  $Y = \rho WY + X\beta + \varepsilon \;\Leftrightarrow\; Y - \rho WY = X\beta + \varepsilon \;\Leftrightarrow\; BY = X\beta + \varepsilon \;\Leftrightarrow\; \tilde{Y} = X\beta + \varepsilon$

where $B = I_n - \rho W$ and $\tilde{Y} = BY$. This is the reduced form of the model, which for any fixed value of $\rho$ is necessarily an OLS model. Hence if, for any given observations $y$ and corresponding maximum-likelihood estimates $(\hat{\rho}, \hat{\beta})$, we denote the estimated residuals by $\hat{\varepsilon} = \tilde{y} - X\hat{\beta}$, with $\tilde{y} = \hat{B}y = (I_n - \hat{\rho}W)y$, then the pseudo R-square value for this model is simply

(2)  $R^2_{pseudo} = 1 - \dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{\tilde{y}' D\, \tilde{y}}$   ($= 0.47756$ in MATLAB)

where $D = I_n - (\tfrac{1}{n})1_n 1_n'$ is the deviation matrix (that subtracts sample means). But while this yields a well-defined R-square value, its interpretation is complicated by the fact that it represents the fraction of explained variation in $\tilde{Y}$ rather than $Y$. As an alternative approach, the value computed in GEODA replaces $\tilde{y}$ in (2) with $y$. So the GEODA value computed is

(3)  $R^2_{geoda} = 1 - \dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{y' D\, y}$   ($= 0.740039$)

which seems to show a "better fit" than the value in (2). But while the second term in (3) "appears" to be the fraction of unexplained variation in $Y$, this interpretation is only meaningful when sample variance is decomposable into "explained" and "unexplained" variance. If we denote the predicted value, $X\hat{\beta}$, of $\tilde{y}$ in model (1) by

(4)  $\hat{\tilde{y}} = X\hat{\beta}$

then this decomposition is automatically true in (2), where the equivalent form of explained variation is given by

(5)  $R^2_{pseudo} = \dfrac{\hat{\tilde{y}}' D\, \hat{\tilde{y}}}{\tilde{y}' D\, \tilde{y}}$   ($= 0.47756$ in MATLAB)

But in the GEODA case, the corresponding "explained variation" is

(6)  $R^2_{geoda} = \dfrac{\hat{\tilde{y}}' D\, \hat{\tilde{y}}}{y' D\, y}$   ($= 0.23763$ in MATLAB)

which is not only different from (3), but in this case is actually close to $1 - R^2_{geoda}$ in (3).
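The two R-square computations can be sketched numerically. The following is a minimal illustration only, not the Eire data: it uses a hypothetical ring-contiguity weight matrix and assumed parameter values, and simply plugs the true $(\rho, \beta)$ in place of maximum-likelihood estimates so that the pseudo and GEODA formulas can be compared side by side.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example (NOT the Eire data): a row-standardized ring
# contiguity matrix W (each region has two neighbors) and assumed
# parameter values, purely for illustration.
n = 26
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
rho_true = 0.4
B = np.eye(n) - rho_true * W
y = np.linalg.solve(B, X @ beta_true + rng.normal(size=n))  # reduced form Y = B^{-1}(Xb + e)

# Pretend (rho_hat, beta_hat) are the ML estimates; here we plug in the
# true values just to illustrate the two R-square formulas.
rho_hat, beta_hat = rho_true, beta_true
y_tilde = (np.eye(n) - rho_hat * W) @ y   # transformed observations, y-tilde = B-hat * y
resid = y_tilde - X @ beta_hat            # residuals eps-hat

D = np.eye(n) - np.ones((n, n)) / n       # deviation matrix (subtracts sample means)

R2_pseudo = 1 - (resid @ resid) / (y_tilde @ D @ y_tilde)   # pseudo R-square
R2_geoda  = 1 - (resid @ resid) / (y @ D @ y)               # GEODA's R-square
print(R2_pseudo, R2_geoda)
```

The two values generally differ, since the denominators measure variation in the transformed variable $\tilde{y}$ versus the raw variable $y$.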
In fact, the situation is even worse, since the actual meaning of this denominator is not clear. To see this, note that model (1) can be written equivalently as

(7)  $BY = X\beta + \varepsilon \;\Leftrightarrow\; Y = B^{-1}X\beta + B^{-1}\varepsilon$

which in turn implies that

(8)  $\mathrm{cov}(Y) = B^{-1}\,\mathrm{cov}(\varepsilon)\,(B')^{-1} = \sigma^2 (B'B)^{-1}$

But since the diagonal of $(B'B)^{-1}$ is not constant, this means that each component of $Y = (Y_1,\ldots,Y_n)'$ has a different variance. Hence the classical "variance" estimator,

(9)  $\frac{1}{n}\, y'Dy = \frac{1}{n}(y - \bar{y}1_n)'(y - \bar{y}1_n) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$

used for the denominators of (3) and (6), has no real meaning in this case, since there is no "common variance" to be estimated.

2. Akaike and Schwarz Information Criteria

If the log likelihood of the model estimates is denoted by

(10)  $L = L(\hat{\theta} \mid y, X)$   ($= -47.9943$ for Eire example)

where $\hat{\theta} = (\hat{\theta}_1,\ldots,\hat{\theta}_k)$ is the vector of parameter estimates, then the Akaike Information Criterion (AIC) is given by

(11)  $AIC = -2(L - k)$   [$= 2\{47.9943 + 3\} = 101.99$ for Eire example in GEODA]

[Note: In the present case, $\hat{\theta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\rho})$, so this should be $k = 4$.]

Intuitively, AIC is an "adjusted log likelihood" that is penalized for the number of parameters fitted, in a manner analogous to adjusted R-square. A second criterion that incorporates sample size ($n$) into the penalty term is the Schwarz Information Criterion [more commonly known as the Bayesian Information Criterion (BIC)], which is given by

(12)  $BIC = -2\{L - (k/2)\log(n)\}$   [$= 2\{47.9943 + (3/2)\log(26)\} = 105.76$ for Eire example in GEODA]

3. Breusch-Pagan Test of Heteroscedasticity

The Breusch-Pagan Test considers heteroscedastic variance models of the form

(13)  $\sigma_i^2 = (\alpha_0 + \alpha' x_i)\,\sigma^2 \;\Leftrightarrow\; \sigma_i^2/\sigma^2 = \alpha_0 + \alpha' x_i$

where $x_i = (x_{i1},\ldots,x_{ik})'$ is the vector of explanatory variables for observation $i$ and $\sigma_i^2 = \mathrm{var}(\varepsilon_i)$. The appropriate null hypothesis is then $H_0 : \alpha = 0$, under which the common variance $\sigma_i^2 \equiv \alpha_0\sigma^2$ is constant across observations. To test this hypothesis, observe that since $\mathrm{var}(\varepsilon_i) = E(\varepsilon_i^2)$, the squared residual, $\hat{\varepsilon}_i^2$, constitutes a one-sample estimate of $\sigma_i^2$.
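As a quick check of the information-criterion arithmetic, the sketch below recomputes GEODA's reported AIC and BIC directly from the reported log likelihood for the Eire example, and then illustrates the variance argument with a small hypothetical weight matrix; the path-graph matrix and the value of $\rho$ in the second part are assumptions chosen purely for illustration.

```python
import numpy as np

# Recompute GEODA's information criteria from the reported log likelihood
# for the Eire example: L = -47.9943, with GEODA's parameter count k = 3
# and sample size n = 26 (the notes argue that k should really be 4).
L, k, n = -47.9943, 3, 26

AIC = -2 * (L - k)                         # AIC = -2L + 2k
BIC = -2 * (L - (k / 2) * np.log(n))       # BIC = -2L + k*log(n)
print(round(AIC, 2), round(BIC, 2))        # -> 101.99 105.76

# Side check of the covariance argument: for a non-circular (path-graph)
# row-standardized W -- an assumed toy matrix -- the diagonal of
# (B'B)^{-1} is not constant, so the components of Y have different
# variances and no "common variance" exists to be estimated.
m, rho = 5, 0.4
W = np.zeros((m, m))
W[0, 1] = W[m - 1, m - 2] = 1.0            # end regions: one neighbor each
for i in range(1, m - 1):
    W[i, i - 1] = W[i, i + 1] = 0.5        # interior regions: two neighbors
B = np.eye(m) - rho * W
d = np.diag(np.linalg.inv(B.T @ B))        # proportional to var(Y_i)
print(np.ptp(d) > 0)                       # True: the variances differ across i
```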
If the mean-square error is denoted by

(14)  $s^2 = \frac{1}{n}\,\hat{\varepsilon}'\hat{\varepsilon} = \frac{1}{n}\,ESS$

then under $H_0$ the sample vector

(15)  $r = \left( \dfrac{\hat{\varepsilon}_1^2}{s^2}, \ldots, \dfrac{\hat{\varepsilon}_n^2}{s^2} \right)'$

is a natural estimate of $(\sigma_1^2/\sigma^2, \ldots, \sigma_n^2/\sigma^2)'$. Hence if one regresses $r$ on the set of explanatory variables, $X = [1_n, x_1, \ldots, x_k]$, then "significantly large" values of the model sum of squares (MSS) of this regression indicate that model (13) fits better than would be expected under the null hypothesis $H_0$. The appropriate Breusch-Pagan statistic, $S_{BP}$, is thus taken to be half this MSS,

(16)  $S_{BP} = \tfrac{1}{2}\,MSS = \tfrac{1}{2}\left[\, r'X(X'X)^{-1}X'r - n \,\right]$   ($= 0.0664$ for Eire example)

where, since $\bar{r} = 1$ by construction, the centered model sum of squares reduces to $r'X(X'X)^{-1}X'r - n$. This statistic can be shown to be asymptotically distributed $\chi_k^2$ under $H_0$. In the Eire case, this value is not sufficiently high to suggest the presence of significant heteroscedasticity (P-value $= 0.797$).
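The Breusch-Pagan computation can be sketched as follows. This is a hypothetical one-regressor example, not the Eire data, and it uses ordinary OLS residuals in place of the spatial-lag residuals; with $k = 1$ the statistic is referred to a chi-square distribution with one degree of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data (NOT the Eire example): one explanatory variable,
# homoscedastic errors, so the null hypothesis H0 is actually true.
n, k = 26, 1
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # X = [1_n, x_1]
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# OLS residuals stand in for the model residuals eps-hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

s2 = (e @ e) / n                              # mean-square error, eq. (14)
r = e**2 / s2                                 # eq. (15); note r.mean() == 1

# S_BP = (1/2) * MSS of the regression of r on X, eq. (16)
H = X @ np.linalg.solve(X.T @ X, X.T)         # projection ("hat") matrix
S_BP = 0.5 * (r @ H @ r - n)                  # MSS = r'X(X'X)^{-1}X'r - n
p_value = stats.chi2.sf(S_BP, df=k)
print(S_BP, p_value)
```

Since the errors here are simulated as homoscedastic, a large P-value is expected, mirroring the non-significant result (P-value = 0.797) reported for the Eire case.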