Statistical and Econometric Issues Associated with Community Economic Modeling

Steven Deller
Department of Agricultural and Applied Economics
University of Wisconsin-Madison

In an idealized world we hope for
- Theoretical rigor
- Simulation ease / reasonableness

In the real world we get
[Figure: theoretical rigor versus simulation ease / reasonableness]

Purpose Drives the Econometric Approach

The ultimate purpose for building and using the model will directly affect your choice of econometric approaches.
- If theoretical insights and rigor are paramount, then one will tend towards the construction and estimation of a system of simultaneous equations.
- If simulation ease and prediction accuracy are paramount, then one will tend towards the construction and estimation of a set of reduced form equations.
There is no "right" answer. This is the art versus the science of community economic modeling.

Overview of a BLUE Model

Basic Assumptions:
1. The model is correctly specified.
2. The matrix of exogenous variables (X) is non-stochastic: $E(X'e) = 0$, or $\operatorname{cov}(X, e) = 0$
3. Normality: $E(e_i) = 0$
4. Independence: $E(e_i e_j) = 0$ for $i \neq j$
5. Constant variance: $E(e_i e_i) = \sigma^2$ for all $i$, so that together with (4) $E(ee') = \sigma^2 I_n$

Key Assumptions for Our Problem

The model is correctly specified
- Specification error
- Functional form
- Multi-parameter testing
- Incorporating information

The matrix of exogenous variables (X) is non-stochastic: $E(X'e) = 0$, or $\operatorname{cov}(X, e) = 0$
- Endogenous variables
- Errors in variables
- System of structural equations

The model is correctly specified

Relevant for both the structural equations and the reduced form approach.
- Inclusion of irrelevant variables: OLS yields unbiased, but inefficient estimates
- Omitted relevant variables: OLS yields biased, and weakly inefficient estimates
The trade-off between too many variables and not enough is one of efficiency (minimum variance) versus biased parameter estimates.
- If unbiased parameters are the goal, err on the side of too many variables
- If hypothesis testing is the goal, err on the side of keeping the model simple

The model is correctly specified: Specification Rules

- Overfit the Model: Let the sample evidence drive the specification of the model (e.g., maximize $R^2$)
- Encompassing Principle: The chosen model should account for the results of competing models and explain something new itself
- Fragility Analysis: Choose the model which is least sensitive to minor changes in specification
- Appeal to Economic Theory: Let theory dictate the specification and the imposition of additional information
- Ad Hoc Statistical Criterion:

1. Change in $R^2$ (stepwise regression)
   - Oversimplification of the problem
   - $\Delta R^2 \to 0$ (when do you stop?)
   - Very easy to manipulate the $R^2$

The model is correctly specified: Statistical Criterion

2. $C_p$ Criterion (mean squared prediction error)
   Full model: $y = X\beta + e$
   Subset model: $y = X_1\beta_1 + e_1$
   $$C_p = \frac{(n - k_1)\,\hat\sigma_1^2}{\hat\sigma^2} + (2k_1 - n) \sim F_{(n-k_1),\,(2k_1-n)}$$
   Rule: Pick $X_1 \subset X$ such that $C_p < F_{crit}$

3. Amemiya Prediction Criterion (PC)
   $$PC = \hat\sigma_1^2\,(1 + k_1/n)$$
   Rule: Pick $X_1 \subset X$ such that PC is minimized

4. Akaike Final Prediction Error (FPE)
   $$FPE = \hat\sigma_1^2\,\frac{n + k_1}{n - k_1}$$
   Rule: Pick $X_1 \subset X$ such that FPE is minimized

5. Akaike Information Criterion (AIC)
   $$AIC = \ln \hat\sigma_1^2 + \frac{2k_1}{n}$$
   Rule: Pick $X_1 \subset X$ such that AIC is minimized

6. Standard F-test (test of linear restrictions)
   $$F = \frac{(SSE_{restricted} - SSE_{unrestricted})/J}{SSE_{unrestricted}/(n - k)}$$
   where $X_{restricted} \subset X$ and $J$ is the number of restrictions

Is the functional form correct?

Theory seldom lends insight into what the correct functional form of the statistical relationship should be.

Option #1: PUNT: use linear / Cobb-Douglas

Option #2: Flexible Functional Forms
   Quadratic: $y = \alpha + \sum_i \beta_i x_i + \sum_i \sum_j \beta_{ij} x_i x_j$
   Translog: $\ln y = \alpha + \sum_i \beta_i \ln x_i + \sum_i \sum_j \beta_{ij} (\ln x_i)(\ln x_j)$
   (see Griffin, Montgomery and Rister, WJAE, Dec 1987)

Option #3: Box-Cox (or Box-Tidwell)
   $$\frac{y^{\lambda} - 1}{\lambda} = \alpha + \beta\,\frac{x^{\lambda} - 1}{\lambda}$$
   As $\lambda \to 0$: $(y^{\lambda} - 1)/\lambda \to \ln y$ and $(x^{\lambda} - 1)/\lambda \to \ln x$ (Cobb-Douglas)
   As $\lambda \to 1$: $(y^{\lambda} - 1)/\lambda \to y - 1$ and $(x^{\lambda} - 1)/\lambda \to x - 1$ (linear)
   Box-Tidwell allows $\lambda$ to vary for each variable.

Endogenous Variables or Errors in Variables

When OLS is used in this case, the parameter estimates are biased and inconsistent.
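The inconsistency of OLS under endogeneity can be illustrated with a small simulation. The sketch below is not part of the original slides. It assumes a single regressor x that shares an unobserved shock u with the error e, and an instrument w that is correlated with x but not with e. The function name, the true coefficient of 2.0, and the data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_ols_vs_iv(n, beta=2.0, seed=0):
    """Return (OLS slope, IV slope) for one simulated sample of size n.

    Hypothetical data-generating process (an assumption, not from the slides):
        x = w + u + noise,  e = u + noise,  y = beta * x + e
    so cov(x, e) = var(u) = 1 and var(x) = 3.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)            # instrument: drives x, unrelated to e
    u = rng.normal(size=n)            # unobserved shock shared by x and e
    e = u + rng.normal(size=n)        # structural error
    x = w + u + rng.normal(size=n)    # endogenous regressor
    y = beta * x + e
    # All variables have mean zero, so the intercept is omitted.
    b_ols = (x @ y) / (x @ x)         # OLS: (x'x)^{-1} x'y
    b_iv = (w @ y) / (w @ x)          # simple IV: (w'x)^{-1} w'y
    return b_ols, b_iv


# Compare the two estimators as the sample grows.
for n in (100, 10_000, 1_000_000):
    b_ols, b_iv = simulate_ols_vs_iv(n)
    print(f"n={n:>9,d}  OLS={b_ols:.3f}  IV={b_iv:.3f}")
```

Under this assumed process the OLS slope should settle near 2 + 1/3 no matter how large n becomes, because the asymptotic bias is cov(x, e)/var(x) = 1/3, while the IV slope should approach the true value of 2.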
Hausman Test

An explicit test of whether $\operatorname{plim} \frac{1}{n}(X'e) = 0$, where $H_0$ is that X and e are orthogonal in large samples.

Let $q = (\hat\beta_{IV} - \hat\beta_{OLS})$
Let $B = \hat\sigma_{IV}^2 \left( (\hat X'\hat X)^{-1} - (X'X)^{-1} \right)$
where $\hat X = W(W'W)^{-1}W'X$,
and $(W'W)$ is the design matrix for the IV model and $(X'X)$ is the design matrix for the OLS model.

Test statistic: $TS = q'B^{-1}q \sim \chi^2$
If $TS > \chi^2_{crit}$, a problem exists; that is, $(\hat\beta_{IV} - \hat\beta_{OLS})$ is statistically "big".

Endogenous Variables or Errors in Variables: Potential Solutions

Option #1: Direct use of instrumental variables, via proxy or 2SLS
Option #2: Estimation of a simultaneous set of equations, via 2SLS or 3SLS

Issues to consider: Is the system identified?

Order Condition (necessary)
"The number of exogenous variables excluded from the equation must be greater than or equal to the number of endogenous variables included in the equation."

Rank Condition (sufficient)
"The rank of the matrix of rhs endogenous variables must be equal to M-1 where M is the number of equations."

Some general rules:
- An equation that contains one endogenous variable and all exogenous variables in the system is just identified.
- An equation that contains all of the variables in the system is not identified.
- If none of the excluded variables of the ith equation appears in the jth equation, the ith equation is not identified.
- If two equations contain the same set of variables, neither equation is identified.
- If the same excluded variables of the ith equation are also excluded from the jth equation, the ith equation is not identified.

Endogenous Variables or Errors in Variables: Issues to Consider

Simulation of a set of simultaneous equations, or simulation of the reduced form equations derived from the simultaneous equations?
Structural: $Y\Gamma + XB = E$
Reduced: $Y = X\Pi + V$, with $\Pi = -B\Gamma^{-1}$ and $V = E\Gamma^{-1}$

Simultaneous estimation of each module independently or as a group?
- Labor
- Housing
- Demographic
- Fiscal
- Retail

Concluding Thoughts

- The art of model building versus the science of economics.
- What is the intent of the undertaking?
- Start simple; future generations can add theoretical and empirical rigor.
- Discourage a community from investing millions of dollars because your model predicts one number to be bigger or smaller than another number.