Econ 420

Welcome to Econ 420
Applied Regression Analysis
Study Guide
Week Six
The F-Test of Overall Significance
of the Equation
Testing whether, in general, our equation is any
good at all
Step 1: State the null and
alternative hypotheses.
Step 2: Choose the level of significance; find the
critical F (pages 316-319; d.f. of
numerator = k and d.f. of denominator =
n-k-1); state the decision rule.
Step 3: Estimate the regression; find the F-Stat
(formula on page 56; EViews calculates the
F-Stat automatically).
Step 4: Apply the decision rule.
• If F-Stat > critical F → reject the null hypothesis
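A minimal sketch of Steps 3 and 4, assuming the usual textbook form of the statistic, F = (ESS/k) / (RSS/(n-k-1)), and the usual null hypothesis that all k slope coefficients are zero. The function names and all numbers are hypothetical; the critical F is passed in by hand, as if read from an F table.

```python
def f_statistic(ess, rss, n, k):
    """Overall-significance F: (ESS / k) / (RSS / (n - k - 1))."""
    return (ess / k) / (rss / (n - k - 1))


def reject_null(f_stat, critical_f):
    """Step 4 decision rule: reject H0 when the F-Stat exceeds the critical F."""
    return f_stat > critical_f


# Hypothetical numbers, for illustration only.
n, k = 23, 2            # 23 observations, 2 independent variables
ess, rss = 70.0, 30.0   # explained and residual sums of squares
critical_f = 3.49       # 5% critical value for 2 and 20 d.f., read from an F table

f_stat = f_statistic(ess, rss, n, k)
print(round(f_stat, 2), reject_null(f_stat, critical_f))
```

With these made-up numbers the F-Stat is far above the critical value, so the null hypothesis would be rejected.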
The overall fit of the estimated
model
Graph of total, explained, and residual sums of
squares
TSS = RSS + ESS
(TSS = total, ESS = explained, RSS = residual sum of squares)
Divide both sides by TSS:
1 = RSS/TSS + ESS/TSS
The coefficient of determination (R²)
R² = ESS/TSS, or equivalently R² = 1 – RSS/TSS
Definition: Percentage of total variation of the
dependent variable around its mean that is
explained by the independent variables
R²
• R² = 1 – RSS/TSS
• The smaller the sum of squared residuals,
the _______ the R²
• Under what condition is R² = 1?
• Under what condition is R² = 0?
• In the presence of an intercept → 0 ≤ R² ≤ 1
• Suppose we got R² = 0.7. What does
this number mean?
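The decomposition can be checked by hand on a tiny data set. The sketch below uses made-up data and a one-variable regression with an intercept; it takes ESS to be the explained and RSS the residual sum of squares, and shows that ESS/TSS and 1 – RSS/TSS give the same number.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a one-variable regression with an intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual

r2 = ess / tss
print(round(tss, 4), round(ess + rss, 4))     # both sides of TSS = RSS + ESS
print(round(r2, 4), round(1 - rss / tss, 4))  # the two ways of computing R²
```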
Problem of R²
• Remember our height–weight
example
• Suppose R² = 0.7
• Now suppose we add another
independent variable to our model:
pairs of shoes each individual owns
• Does R² go up?
– Maybe
• Should it go up?
– No
Problem: The addition of an
irrelevant variable never
decreases R²
• Why?
1. If there is no correlation between the
added variable and the dependent
variable, then the estimated coefficient
will be zero and RSS does not change
2. Sometimes the addition of an irrelevant
independent variable to the model
increases R²
• Why?
• There may (accidentally) be a correlation
between weight and pairs of shoes in the sample.
This reduces the sum of squared
residuals
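A rough illustration of this point, using invented height, weight, and shoe data. The helper `ols_rss` is a hypothetical name; it solves the OLS normal equations in plain Python so the example needs no libraries. Whatever numbers are put in the shoes column, the second R² cannot be smaller than the first.

```python
def ols_rss(columns, y):
    """Residual sum of squares from OLS of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data: weight is the dependent variable.
height = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0, 74.0, 76.0]
weight = [115.0, 130.0, 140.0, 150.0, 172.0, 180.0, 190.0, 210.0]
shoes = [5.0, 2.0, 8.0, 3.0, 6.0, 1.0, 7.0, 4.0]   # irrelevant variable

w_bar = sum(weight) / len(weight)
tss = sum((w - w_bar) ** 2 for w in weight)

r2_one = 1 - ols_rss([height], weight) / tss          # weight on height
r2_two = 1 - ols_rss([height, shoes], weight) / tss   # weight on height and shoes
print(r2_one, r2_two)
```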
R Bar Squared
(Adjusts R squared for degrees of
freedom.)
Adjusted R Squared
R bar squared = 1 – [RSS/(n-k-1)] / [TSS/(n-1)]
• As k goes up, what happens to R bar squared?
1. The sum of squared residuals may go down.
– What does this do to R bar squared?
– R bar squared may go up
2. (n-k-1) goes down → the term RSS/(n-k-1) in the bracket goes up
– R bar squared goes down
• R bar squared goes up if the first effect is stronger than the
second effect.
• This is more likely to happen if the added independent variable
is a relevant variable
Note: A high R² or R bar squared is not the only sign of a
good fit.
EViews reports both R² and R bar squared
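The two opposing effects can be seen with a small calculation. This sketch assumes the standard formula written in terms of R², R bar squared = 1 – (1 – R²)(n-1)/(n-k-1); the function name and the numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 20  # hypothetical sample size

base = adjusted_r2(0.70, n, k=1)

# Adding an irrelevant variable: R2 barely moves, so the lost degree of
# freedom dominates and the adjusted R squared falls.
irrelevant = adjusted_r2(0.705, n, k=2)

# Adding a relevant variable: R2 rises enough to outweigh the lost degree
# of freedom, so the adjusted R squared rises.
relevant = adjusted_r2(0.80, n, k=2)

print(round(base, 4), round(irrelevant, 4), round(relevant, 4))
```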
Steps in Applied Regression
Analysis (Chapter 4)
1. Identify the question
2. Review the literature
a) Theoretical literature will help you to
• Specify the model
• Dependent and Independent Variables
• Real/nominal variables
• Omitted variables
• Extra variables
• Functional form
• Hypothesize the expected signs of coefficients
• A perfect but useless regression (cause and
effect rather than equality)
Effects of Omitted Variables
• Example
• True equation is Y = f(X1, X2)
– Where
– Y = GPA
– X1 = hours of study
– X2 = IQ score
• We fail to include X2 in our model
• Does this violate any assumptions?
– Go back and study the assumptions to
answer this question
• Violates assumption 1. Why?
• May violate assumption 3. Why?
Effects of Omitted Variables
• What if X1 and X2 are correlated?
– Does this violate any assumptions?
• OLS is not BLUE
• The estimated coefficient of X1 (that is,
B^1) is biased
• The bias depends on the correlation between
X1 and X2 and on the coefficient of X2 in the true
regression equation.
Direction of Bias
The sign (direction) of Bias
• Bias is zero either
1. if X2 does not affect Y (Bomitted is zero), or
2. if X2 is not correlated with X1
• How do you expect IQ (X2) to affect GPA (Y)?
• How are IQ (X2) and hours of study (X1)
correlated?
• What is the direction of bias in our example?
– Will B^1 be bigger or smaller than it actually should
be?
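A sketch of where the bias comes from, assuming the standard textbook expression Bias = Bomitted × a1, where a1 is the slope from regressing X2 on X1. The data and the "true" coefficients below are invented and are not the GPA example; the error term is left out so the identity holds exactly.

```python
def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data and true coefficients, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]           # positively correlated with x1
b0, b1, b2 = 1.0, 2.0, 3.0                # b2 plays the role of Bomitted
y = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]   # no error term

a1 = slope(x1, x2)             # slope of X2 on X1
slope_short = slope(x1, y)     # coefficient on X1 when X2 is omitted
bias = slope_short - b1

print(round(slope_short, 4), round(bias, 4), round(b2 * a1, 4))
```

Here both Bomitted and the correlation are positive, so the estimated coefficient on X1 is too large; changing the sign of either one flips the direction of the bias, and setting either to zero removes it.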
The Variance of the Estimated
Coefficient
• Fact:
– When we omit a relevant independent variable that
is correlated with other independent variables, the
variance of the estimated coefficients of the included
independent variables goes down → the t statistic goes up →
the t-test may yield a significant coefficient when it should
not
When should we suspect the
omitted variable problem?
1. The adjusted R squared is low
2. The magnitude or the sign of the
estimated coefficients is not as expected
3. The unimportant variables end up being
highly significant
Correction for Omitted Variables
• Study the theoretical literature again
• Include the omitted variable based on the
expected bias analysis
Irrelevant Variable Problem
• Suppose the true regression model is GPA = f
(hours of study), but
• Our version of the model is GPA = f (hours of
study, weight of the person)
• Does our model violate assumption 1?
• Are any other assumptions violated?
• Is our estimator biased?
– Not necessarily: if the expected value of the error
term is zero, the expected value of Bhat on hours of
study = B
• Does our estimator have the minimum variance?
• No, our estimator does not have the smallest
variance (it is not the most efficient)
• How does this affect the t-test?
– The variance of the estimated coefficient on hours of
study goes up → the t statistic goes down → the t-test may
not yield a significant coefficient on hours of study
when it should.
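A back-of-the-envelope sketch of the variance effect. It assumes the standard two-variable result Var(Bhat1) = sigma² / [Sum(X1 - X1bar)² × (1 - r12²)], where r12 is the sample correlation between the two independent variables, and it treats sigma² as known. All numbers and names are hypothetical.

```python
import math


def coefficient_variance(sigma2, sxx, r12=0.0):
    """Var(Bhat1) = sigma2 / (sxx * (1 - r12**2)); r12 = 0 when X1 is the only regressor."""
    return sigma2 / (sxx * (1.0 - r12 ** 2))


# Hypothetical values, for illustration only.
sigma2 = 4.0     # variance of the error term
sxx = 100.0      # sum of squared deviations of hours of study from its mean
r12 = 0.6        # sample correlation between hours of study and weight
b_hat = 0.5      # estimated coefficient on hours of study

var_true_model = coefficient_variance(sigma2, sxx)        # GPA on hours of study
var_with_extra = coefficient_variance(sigma2, sxx, r12)   # weight added to the model

t_true_model = b_hat / math.sqrt(var_true_model)
t_with_extra = b_hat / math.sqrt(var_with_extra)

print(round(var_true_model, 4), round(var_with_extra, 4))
print(round(t_true_model, 2), round(t_with_extra, 2))
```

The same estimated coefficient gives a smaller t statistic once the irrelevant variable is included, provided it is correlated with hours of study in the sample.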
Should we include X in the set of
our independent variables?
• Yes, if
1. Theory calls for its inclusion (the most important
criterion)
2. t-test: the estimated coefficient of X is significant in
the expected direction (Note: this does not mean that if
the estimated coefficient is insignificant you have to
drop the variable from your model.)
3. As you include X, the adjusted R squared goes up.
4. As you include X, the other variables' coefficients
change significantly.
b) Empirical literature will help you to
• See what others have done
• Their variables
• Their functional forms
• Their data sets
• Their findings
3. Choose a sample & collect data
• Cross Sectional/ Time Series
• Degrees of freedom
4. Estimate and evaluate the
equation
a) Overall Quality of estimation
• Adjusted R squared
• F- test
b) Test your hypotheses
5. Document the results
• Predictions
• Policy recommendations
Assignment 5 (5 questions for 10 points each, total = 50 points)
(Due: before 10 PM on Friday, October 5)
1. Use the data set in the dvd4 file to
• run an F test of the overall significance of the
equation.
• test the significance of all of the estimated
coefficients at the 1% level. Make sure not to skip any
of the 4 steps in hypothesis testing. Attach your
EViews output.
• construct a 95% confidence interval for the
coefficient on income.
Assignment 5 (continued)
2. #17, Page 63
3. #4, PP 81-82
4. #5, Page 82
5. #6, Page 83
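For the confidence interval part of question 1, the arithmetic has the form Bhat ± critical t × SE(Bhat). The sketch below uses made-up numbers, not the dvd4 results; the function name is hypothetical, and the critical t is entered by hand as if read from a t table for the appropriate degrees of freedom.

```python
def confidence_interval(b_hat, se, critical_t):
    """Two-sided interval: b_hat plus or minus critical_t times the standard error."""
    margin = critical_t * se
    return b_hat - margin, b_hat + margin


# Hypothetical numbers, for illustration only.
b_hat = 0.5          # estimated coefficient
se = 0.1             # its standard error, as reported in regression output
critical_t = 2.086   # two-sided 5% critical t for 20 d.f., read from a t table

lower, upper = confidence_interval(b_hat, se, critical_t)
print(round(lower, 4), round(upper, 4))
```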