Meeting Eleven

PPA 207: Quantitative Methods
Meeting 11, Spring 2004
1. Homework
Studenmund, Chapter 9, Number 11
a. The calculated DW statistic is 0.85. To find the critical lower and upper DW
statistics at the 5% significance level, we need N = 40 and K = 3 (three
explanatory variables). Table B-5 in Studenmund gives dL = 1.25 and dU =
1.57. Since 0.85 < 1.25, we can reject the null hypothesis of no
positive serial correlation.
b. Using a two-tailed test at the 1% level of significance with 36 degrees of
freedom (N - K - 1 = 40 - 3 - 1; the 30-dof row of the table is used), the
critical t from Table B-1 in Studenmund is 2.750. The actual t for L is 0.04, for P
is 2.6, and for W is 3.0. Therefore the only explanatory variable that is
statistically significant at the 1% level is W (a dummy equal to one if the game
was played on Friday, Saturday, or Sunday).
c. The winning percentage of the Lakers' opponent (P) is very close to being
statistically significant, while the winning percentage of the Lakers (L) is not. I
would not consider L an irrelevant variable. The Lakers may have a core fan
base that attends all games, so attendance changes only with the quality of the
opposing team and the day of the game. This may not be the case for other NBA
teams.
d. I would expect the serial correlation to be impure because explanatory
variables are omitted from this specification. The introductory paragraph
mentioned that late-season games are more likely to be attended, and there is
no explanatory dummy variable in the specification to account for this. But
even after correcting for this, pure serial correlation could still exist.
e. The decision to omit the first game is debatable. The variables representing
winning percentages could simply be set to zero for that game.
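The decision rules used in parts a and b can be written out as a minimal Python sketch. The function names are hypothetical, and the critical values (dL = 1.25, dU = 1.57, t = 2.750) are the tabled values quoted in the answers, not computed by the code. The durbin_watson function is included only to show how d is built from residuals; the homework supplies d directly.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_positive_test(d, d_lower, d_upper):
    """One-sided DW test of H0: no positive serial correlation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"  # dL <= d <= dU


# Part a: values from the answer above (N = 40, K = 3, 5% level, Table B-5).
print(dw_positive_test(0.85, d_lower=1.25, d_upper=1.57))  # reject H0

# Part b: two-tailed t-tests at the 1% level.
N, K = 40, 3
dof = N - K - 1     # 36; the answer reads the 30-dof row of Table B-1
t_critical = 2.750  # critical t taken from that row
t_stats = {"L": 0.04, "P": 2.6, "W": 3.0}
significant = [name for name, t in t_stats.items() if abs(t) > t_critical]
print(dof, significant)  # 36 ['W']
```

Using |t| keeps the comparison valid for negative coefficients, as a two-tailed test requires.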
2. Studenmund, Chapter 10, Heteroskedasticity

With the exception of simultaneous equations (extra credit), this is the last topic
needed for the final paper

Violation of Classical Assumption V
Error term observations are drawn from a distribution that does not have a
constant variance
Example: explaining the waist size of a sample of people that contains
marathon runners and others
Based upon other observable characteristics
Variance of the error term around the others is greater than
around the marathon runners
Example: explaining the amount spent on automobiles from a sample that
contains all income groups
Based upon taste and price factors
Variance of the error term around high-income
people is greater than around low-income people

More likely in cross-sectional models

Pure heteroskedasticity
See Figure 10.1
Error term’s variance can change depending on the observation
Narrow or wide distribution
Usually due to differences in the “size” of observations in a sample
Example: data sets from states, counties, or cities
Population size differences
A common form of pure heteroskedasticity
$\mathrm{VAR}(\epsilon_i) = \sigma^2 Z_i^2$
Variance of the error term is proportional to the square of a factor Z
Z may or may not be one of the explanatory variables in the
regression
See Figures 10.2 and 10.3
Can also occur in non-cross-sectional models
Time-series model: sales of VCRs over the past 30 years
Data collection methods are better for some entities (cross section)
Or improve over time (time series)
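To make the proportionality concrete, here is a small simulation sketch. The values σ = 2 and Z in {1, 5, 10} are hypothetical; the code draws errors whose standard deviation is σZ, so their variance should be close to σ²Z² and grow with the square of Z.

```python
import random
import statistics


def simulate_errors(z_values, sigma, draws_per_z, seed=0):
    """Draw errors with s.d. sigma * Z, so that VAR(e) = sigma^2 * Z^2."""
    rng = random.Random(seed)
    return {z: [rng.gauss(0.0, sigma * z) for _ in range(draws_per_z)] for z in z_values}


errors = simulate_errors([1, 5, 10], sigma=2.0, draws_per_z=5000)
for z, draws in errors.items():
    # Sample variance should be close to sigma^2 * Z^2 = 4 * Z^2.
    print(z, round(statistics.variance(draws), 1), 4 * z ** 2)
```

A tenfold increase in Z raises the error variance roughly a hundredfold, which is the wide-versus-narrow distribution pattern described above.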

Impure heteroskedasticity
Due to an omitted variable
Portion of omitted effect is absorbed by the error term
Example: omitting a dummy variable for marathon runners in the
waist-size example above


Consequences of pure heteroskedasticity
(1) Does not bias the regression coefficient estimates
(2) Increases the variance of the coefficient estimates (OLS is no longer minimum variance)
(3) Typically causes OLS to underestimate the standard errors of the coefficients
(4) t and F statistics cannot be relied upon
t-scores tend to be higher than they would be without heteroskedasticity

Simple detection of heteroskedasticity
Correct obvious omitted variables first
Is it a cross-sectional study, where heteroskedasticity is more likely to occur?
Run the standard regression, recover the residuals, square them, and plot
them against Z to look for a relationship
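Without plotting software, a rough numeric stand-in for the plot is to sort observations by Z and compare the average squared residual in the lower and upper halves. This is an assumption of the sketch, not a formal test; the data are hypothetical, with Z also serving as the regressor and the error standard deviation set to 0.5Z.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) for y = b0 + b1*x by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: error s.d. grows with Z.
rng = random.Random(1)
z = [rng.uniform(1, 10) for _ in range(400)]
y = [3 + 2 * zi + rng.gauss(0, 0.5 * zi) for zi in z]

b0, b1 = simple_ols(z, y)
sq_resid = [(yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, y)]

# Compare average squared residuals for small versus large Z.
pairs = sorted(zip(z, sq_resid))
half = len(pairs) // 2
low = sum(e2 for _, e2 in pairs[:half]) / half
high = sum(e2 for _, e2 in pairs[half:]) / (len(pairs) - half)
print(round(low, 2), round(high, 2))
```

A clearly larger average in the high-Z half points to the same relationship a scatter plot of squared residuals against Z would show.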

Park test
Take the log of the squared residuals and regress it against the log of the chosen Z:
$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$
Check the statistical significance of $\alpha_1$ with a t-test
See Woody’s restaurant example
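The Park regression is a simple (one-regressor) OLS, so it can be sketched in plain Python. In practice the residuals come from the original regression; here they are simulated (hypothetical data, error s.d. equal to Z) so the sketch runs on its own. The returned t-statistic would be compared with a critical t from Table B-1.

```python
import math
import random


def park_test(residuals, z):
    """Park test: regress ln(e_i^2) on ln(Z_i) by OLS.

    Returns (slope, t-statistic for the slope). Residuals must be nonzero
    and Z must be positive, since both are logged.
    """
    y = [math.log(e ** 2) for e in residuals]
    x = [math.log(zi) for zi in z]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical residuals whose spread is proportional to Z.
rng = random.Random(42)
z = [rng.uniform(1, 20) for _ in range(500)]
resid = [rng.gauss(0, zi) for zi in z]
slope, t = park_test(resid, z)
print(round(slope, 2), round(t, 1))
```

When the error s.d. is proportional to Z, ln(e²) rises about two units per unit of ln(Z), so the slope should land near 2 with a large t-score.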

White test
If an obvious Z proportionality factor is not present
Original regression: Y = f(X1, X2, X3)
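Run the regression, retrieve the residuals, square them, and run
$e^2 = f(X_1, X_2, X_3, X_1^2, X_2^2, X_3^2, X_1X_2, X_2X_3, X_1X_3)$
Problem if there are many explanatory variables and few
observations (the number of terms grows quickly)
Test statistic = $N \times R^2$ (sample size times the $R^2$ of the squared-residual equation)
Look up the critical chi-square value in Table B-8 with dof equal to
the number of explanatory variables in the squared-residual equation (9 in this example)
10% significance: 14.68
If $NR^2 > 14.68$, reject the null hypothesis of no
heteroskedasticity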
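The White test steps can be sketched with numpy, assuming hypothetical data with three regressors and an error s.d. proportional to X1. The auxiliary regression uses the levels, squares, and cross products listed above; the 14.68 cutoff is the Table B-8 value quoted in the notes, not computed here. White's original test statistic is the same N·R², but this helper is illustrative, not SPSS output.

```python
import itertools

import numpy as np


def white_test(resid, X):
    """Return (N * R^2, dof) for the White test.

    resid: OLS residuals from the original regression, shape (N,).
    X: original explanatory variables without a constant, shape (N, K).
    The auxiliary regression of resid^2 uses a constant plus the levels,
    squares, and all pairwise cross products of the columns of X.
    """
    n, k = X.shape
    squares = [X[:, j] ** 2 for j in range(k)]
    crosses = [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    aux = np.column_stack([np.ones(n), X, *squares, *crosses])
    e2 = resid ** 2
    coef, *_ = np.linalg.lstsq(aux, e2, rcond=None)
    fitted = aux @ coef
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    dof = aux.shape[1] - 1  # slope terms only: 9 when K = 3
    return n * r2, dof


# Hypothetical data: three regressors, error s.d. proportional to X1.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1, 10, size=(n, 3))
y = 1.0 + X.sum(axis=1) + rng.normal(0.0, X[:, 0])

# Original regression Y = f(X1, X2, X3) and its residuals.
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ beta

stat, dof = white_test(resid, X)
chi2_crit_10pct = 14.68  # Table B-8, 9 dof, as quoted in the notes
print(dof, round(stat, 1), stat > chi2_crit_10pct)
```

The auxiliary design has 1 + 3 + 3 + 3 = 10 columns, which is why the test uses 9 degrees of freedom and why few observations become a problem as K grows.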

Remedies for heteroskedasticity
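Weighted least squares
If the Park test reveals that Z is statistically significant
Divide the dependent variable and all explanatory variables by Z
Re-estimate the regression
If Z is not an explanatory variable, this is straightforward
If Z is an explanatory variable
Z/Z = 1, which plays the role of the constant term
The original constant divided by Z becomes a 1/Z term
So divide all non-Z variables by Z for the new
estimation
Create a new variable 1/Z (it takes the place of the old constant)
The reported constant is the estimated effect of Z
With $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 Z_i + \epsilon_i$, the estimated equation is
$Y_i/Z_i = \beta_2 + \beta_0 (1/Z_i) + \beta_1 (X_{1i}/Z_i) + u_i$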
White’s heteroskedasticity-corrected standard errors
Not available in SPSS
Rethink the regression
Log-log functional form
Inherently less variation
Account for the scale factor in your theory
Use per capita income in a city, crimes per 1,000 people, etc.
Instead of trying to explain total income and total crimes
Could still have heteroskedasticity, but it is less likely
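The weighted least squares transformation for the case where Z is an explanatory variable can be sketched with numpy. This assumes a hypothetical equation with one other regressor X and an error variance of σ²Z²; it is an illustration of the divide-by-Z steps, not SPSS's WLS procedure.

```python
import numpy as np


def wls_divide_by_z(y, x, z):
    """WLS for Y = b0 + b1*X + b2*Z + e when VAR(e) = sigma^2 * Z^2.

    Dividing through by Z gives Y/Z = b2 + b0*(1/Z) + b1*(X/Z) + e/Z, whose
    error e/Z has constant variance sigma^2. OLS on the transformed data
    returns a new constant (the estimate of b2, the effect of Z) and a
    coefficient on 1/Z (the estimate of the original constant b0).
    x is a single explanatory variable here, shape (N,).
    """
    design = np.column_stack([np.ones_like(z, dtype=float), 1.0 / z, x / z])
    coef, *_ = np.linalg.lstsq(design, y / z, rcond=None)
    new_constant, coef_inv_z, coef_x = coef
    return coef_inv_z, coef_x, new_constant  # (b0, b1, b2) in original terms


# Hypothetical data with error s.d. proportional to Z.
rng = np.random.default_rng(3)
n = 300
z = rng.uniform(1, 10, n)
x = rng.uniform(0, 5, n)
y = 4.0 + 1.5 * x + 0.8 * z + rng.normal(0.0, 0.3 * z)
b0, b1, b2 = wls_divide_by_z(y, x, z)
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

Returning the estimates in the original equation's order makes the relabeling explicit: the reported constant of the transformed regression is the effect of Z, and the 1/Z coefficient is the old constant.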

Example 10.5 in Studenmund

Regression example: LUALAND = f (LUAPOP, LAGPRICE, LPCAPINC,
LAUTODEP, LCPPOVER)
Estimate the regression
Save the residuals and square them
Run the Park test
Run WLS
3. Student-led discussion of “Does Sprawl Reduce the Black/White Housing
Consumption Gap”
4. Homework Due at the Start of Meeting 12
(1) Would anyone like to prepare a 10-minute presentation on the Ziliak and
McCloskey article for April 20 that highlights the main points they make about
how policymakers should interpret regression results? You will have access to
a projection of the article in the classroom. Just prepare some overheads or
notes to lead a discussion on this topic. The grade will substitute for one
homework grade. More than one person can do this.
(2) Read all of the material under Meeting Twelve in the syllabus and come
prepared to discuss it.
(3) A typed, well-developed question from the reading assignment for week
twelve.
(4) Answer question 11 in Studenmund, Chapter 10.
(5) Run the base log-log regression that you will use in your paper and retrieve
the residuals from it. Perform a Park test to see if heteroskedasticity is present.
Then perform a new weighted regression that corrects for heteroskedasticity (do
this regardless of whether your Park test revealed heteroskedasticity). Submit
all of your SPSS output and a one- to two-page typed description of the entire
process you undertook and the results. In doing this, make sure you describe
each step.