PPA 207: Quantitative Methods
Meeting 11, Spring 2004

1. Homework: Studenmund, Chapter 9, Number 11

a. The calculated DW statistic is 0.85. To find the critical lower and upper DW statistics at a 5% significance level, we need to know that N = 40 and K = 3 (three explanatory variables). Table B-5 in Studenmund gives dL = 1.25 and dU = 1.57. Since 0.85 < 1.25, we can reject the null hypothesis of no positive serial correlation.

b. Using a two-tailed test at the 1% level of significance with 36 degrees of freedom (using the table row for 30), the critical t from Table B-1 in Studenmund is 2.750. The actual t-score for L is 0.04, for P is 2.6, and for W is 3.0. Therefore the only statistically significant explanatory variable is W (a dummy variable equal to one if the game was played on a Friday, Saturday, or Sunday).

c. The winning percentage of the Lakers' opponent (P) is very close to being statistically significant, while the winning percentage of the Lakers (L) is not. I would not consider L an irrelevant variable. The Lakers may have a core fan base that attends all games and whose attendance changes only with the quality of the opposing team and the day of the game. This may not be the case for other NBA teams.

d. I would expect the serial correlation to be impure because explanatory variables are omitted from this specification. The introductory paragraph mentioned that late-season games are more likely to be attended, yet the specification contains no dummy variable to account for this. Even after correcting for this, however, pure serial correlation could still exist.

e. The decision to omit the first game is debatable. The variables representing winning percentages could simply be set to zero for that game.

2. Studenmund, Chapter 10: Heteroskedasticity

With the exception of simultaneous equations (extra credit), this is the last topic needed for the final paper.

Violation of Classical Assumption V
- Observations of the error term are not drawn from a distribution with constant variance.
- Example: explaining the waist size of a sample of people that contains both marathon runners and others.
  - Based upon other observable characteristics, the variance of the error term surrounding the others is greater than the variance surrounding the marathon runners.
- Example: explaining the amount spent on an automobile from a sample that contains all income groups.
  - Based upon taste and price factors, the variance of the error term surrounding high-income people is greater than that surrounding low-income people.
- Heteroskedasticity is more likely in cross-sectional models.

Pure heteroskedasticity
- See Figure 10.1: the error term's variance can change depending on the observation (a narrow or wide distribution).
- Usually due to differences in the "size" of observations in a sample.
  - Example: data sets from states, counties, or cities with population-size differences.
- A common form of pure heteroskedasticity: VAR(e_i) = σ²Z_i².
  - The variance of the error term is proportional to a factor Z.
  - Z may or may not be an explanatory variable in the regression.
  - See Figures 10.2 and 10.3.
- Can occur in non-cross-sectional models.
  - Time-series model: sales of VCRs over the past 30 years.
  - Data collection methods are better for some entities (cross section) or improve over time (time series).

Impure heteroskedasticity
- Due to an omitted variable; a portion of the omitted effect is absorbed by the error term.
- Example: omitting a dummy variable for marathon runners in the earlier waist-size example.

Consequences of pure heteroskedasticity (illustrated in the sketch below)
(1) Does not bias the regression coefficient estimates.
(2) Does increase the variance of the regression coefficient estimates.
(3) Does cause OLS to underestimate the standard errors of the regression coefficients.
(4) t and F statistics cannot be relied upon: t-scores are higher than they would be if heteroskedasticity were not present.
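These consequences can be seen in a small simulation. The following is a minimal sketch, not from the textbook or the course's SPSS materials: it generates errors with VAR(e_i) = σ²Z_i² and compares conventional OLS standard errors with White's heteroskedasticity-corrected standard errors. The sample size, coefficients, and use of Python/statsmodels are all illustrative assumptions.

```python
# Minimal simulation sketch (illustrative only, not from the handout):
# errors are generated with VAR(e_i) = sigma^2 * Z_i^2, with Z also the
# regressor, so conventional OLS standard errors tend to be understated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
Z = rng.uniform(1, 10, size=n)               # "size" factor, used as the regressor
X = sm.add_constant(Z)                       # constant plus Z
e = rng.normal(scale=2.0 * Z)                # error std. deviation proportional to Z
y = 1.0 + 0.5 * Z + e

ols = sm.OLS(y, X).fit()
print(ols.bse)                               # conventional standard errors
print(ols.get_robustcov_results("HC1").bse)  # White-corrected standard errors
```

In repeated runs the corrected standard error on the slope tends to exceed the conventional one, which is consequence (3) above; the coefficient estimates themselves remain unbiased, which is consequence (1).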
Simple detection of heteroskedasticity
- Correct obvious omitted variables.
- Ask whether it is a cross-sectional study, where heteroskedasticity is more likely to occur.
- Run the standard regression, recover the residuals, square them, and plot the squared residuals against Z to look for a relationship.

Park test
- Take the log of the squared residuals and regress it against the log of the chosen Z.
- Check the statistical significance of the regression coefficient on the log of Z.
- See the Woody's restaurant example.

White test
- Useful if an obvious Z proportionality factor is not present.
- Original regression: Y = f(X1, X2, X3).
- Run the regression, retrieve the residuals, square them, and run
  e² = f(X1, X2, X3, X1², X2², X3², X1X2, X2X3, X1X3).
  - This becomes a problem if there are many explanatory variables and few observations.
- Test statistic = N × R² from this second regression.
- Look up the critical chi-square value in Table B-8 with degrees of freedom equal to the number of explanatory variables in the second regression (9 in this example).
  - At the 10% significance level the critical value is 14.68.
  - If NR² > 14.68, then reject the null hypothesis of no heteroskedasticity.

Remedies for heteroskedasticity

Weighted least squares (WLS)
- Use if the Park test reveals that Z is statistically significant.
- Divide your dependent variable and all explanatory variables by Z and re-estimate the regression.
- If Z is not an explanatory variable, this is straightforward.
- If Z is an explanatory variable:
  - Z/Z = 1, which is the same as a constant term.
  - The old constant term divided by Z becomes 1/Z.
  - So divide all non-Z variables by Z for the new estimation, create the new variable 1/Z (this is your new constant), and the reported constant is the new effect of Z.

White's heteroskedasticity-corrected standard errors
- Not available in SPSS.

Rethink the regression
- A log-log functional form has inherently less variation.
- Discount for the scale factor in your theory: use per-capita income in a city, crimes per 1,000 people, etc., instead of trying to explain total income and total crimes.
- Heteroskedasticity could still be present, but it is less likely.

Example 10.5 in Studenmund
- Regression example: LUALAND = f(LUAPOP, LAGPRICE, LPCAPINC, LAUTODEP, LCPPOVER).
- Calculate the regression.
- Save the residuals and square them.
- Run the Park test.
- Run WLS.
- (A hedged code sketch of this sequence appears at the end of this handout.)

3. Student-based discussion of "Does Sprawl Reduce the Black/White Housing Consumption Gap?"

4. Homework Due at the Start of Meeting 12

(1) Would anyone like to prepare a 10-minute presentation on the Ziliak and McCloskey article for April 20 that highlights the main points they make about how policymakers should interpret regression results? You will have access to a projection of the article in the classroom; just prepare some overheads or notes to lead a discussion on this topic. The grade granted will substitute for one homework grade. More than one person can do this.
(2) Read all of the material under Meeting 12 in the syllabus and come prepared to discuss it.
(3) A typed and well-developed question from the reading assignment for week twelve.
(4) Answer question 11 in Studenmund, Chapter 10.
(5) Run the base log-log regression that you will use in your paper and retrieve the residuals from it. Perform a Park test to see if heteroskedasticity is present. Then perform a new weighted regression that corrects for heteroskedasticity (do this regardless of whether your Park test revealed heteroskedasticity). Submit all of your SPSS output and a one- to two-page typed description of the entire process you undertook and the results. In doing this, make sure you describe each step.
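The Example 10.5 steps and homework item (5) follow the same recipe: base regression, save and square the residuals, Park test, then WLS. The class carries this out in SPSS; purely as an illustration of the sequence, here is a hedged Python/statsmodels sketch. The file name, the choice of Z, and the assumption that an unlogged UAPOP population column is present in the data are all hypothetical.

```python
# Illustrative sketch only (the course itself uses SPSS): Park test followed
# by weighted least squares on the Example 10.5 specification.  The file name
# and the untransformed population column UAPOP used as Z are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("example_10_5.csv")                     # hypothetical file
y = df["LUALAND"]
X = sm.add_constant(df[["LUAPOP", "LAGPRICE", "LPCAPINC",
                        "LAUTODEP", "LCPPOVER"]])

# 1. Base regression; save and square the residuals.
base = sm.OLS(y, X).fit()
e2 = base.resid ** 2

# 2. Park test: regress ln(e^2) on ln(Z).  A significant slope on ln(Z)
#    suggests heteroskedasticity of the form VAR(e_i) = sigma^2 * Z_i^2.
Z = df["UAPOP"]                                          # assumed unlogged Z
park = sm.OLS(np.log(e2), sm.add_constant(np.log(Z))).fit()
print(park.summary())

# 3. WLS as described above: divide the dependent variable and every
#    explanatory variable (including the old constant, which becomes a 1/Z
#    term) by Z, then re-estimate without adding another intercept.
wls = sm.OLS(y / Z, X.div(Z, axis=0)).fit()
print(wls.summary())
```

Because Z in this sketch is not itself one of the explanatory variables, the transformed regression has no ordinary intercept; the coefficient on the divided constant column (the 1/Z variable) recovers the original intercept, mirroring the WLS notes above.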