Welcome to Econ 420
Applied Regression Analysis
Study Guide
Week Six

The F-Test of Overall Significance of the Equation
Testing to see if, in general, our equation is any good at all
Step 1: State the null and alternative hypotheses (null: all slope coefficients are zero; alternative: at least one is not).
Step 2: Choose the level of significance; find critical F (pages 316-319, d.f. of numerator = k and d.f. of denominator = n-k-1); state the decision rule.
Step 3: Estimate the regression; find F-Stat (formula on page 56; EViews calculates F-Stat automatically).
Step 4: Apply the decision rule
• If F-Stat > critical F, reject the null hypothesis

The overall fit of the estimated model
[Graph of total, explained, and residual sums of squares]
• TSS = RSS + ESS
• Divide both sides by TSS
• 1 = RSS/TSS + ESS/TSS
• The coefficient of determination (R²)
• R² = ESS/TSS
• Definition: Percentage of total variation of the dependent variable around its mean that is explained by the independent variables

R²
• R² = 1 - RSS/TSS
• The smaller the sum of squared residuals, the _______ the R²
• Under what condition is R² = 1?
• Under what condition is R² = 0?
• In the presence of an intercept, 0 ≤ R² ≤ 1
• Suppose we got an R² = 0.7. What does this number mean?

Problem of R²
• Remember our height-weight example
• Suppose R² = 0.7
• Now suppose we add another independent variable to our model: pairs of shoes each individual owns
• Does R² go up?
  – Maybe
• Should it go up?
  – No

Problem: The addition of an irrelevant variable never decreases R²
• Why?
1. If there is no correlation between the added variable and the dependent variable, then the estimated coefficient will be zero and RSS does not change
2. Sometimes the addition of an irrelevant independent variable to the model increases R²
• Why?
• There may (accidentally) be a correlation between weight and pairs of shoes. This diminishes the sum of squared residuals

R Bar Squared (adjusts R squared for degrees of freedom)
R bar squared = 1 - [ (RSS/(n-k-1)) / (TSS/(n-1)) ]

Adjusted R Squared
• As k goes up, what happens to R bar squared?
1. The sum of squared residuals may go down.
  – What does this do to R bar squared?
  – R bar squared may go up
2. (n-k-1) goes down, so the term in the bracket goes up
  – R bar squared goes down
• R bar squared goes up if the first effect is stronger than the second effect. This is more likely to happen if the added independent variable is a relevant variable
• Note: A high R² or R bar squared is not the only sign of a good fit.
• EViews reports both R² and R bar squared

Steps in Applied Regression Analysis (Chapter 4)
1. Identify the question
2. Review the literature
a) Theoretical literature will help you to
• Specify the model
  – Dependent and independent variables
  – Real/nominal variables
  – Omitted variables
  – Extra variables
  – Functional form
• Hypothesize the expected signs of coefficients
• Avoid a perfect but useless regression (model cause and effect rather than an equality)

Effects of Omitted Variables
• Example
• True equation is Y = f(X1, X2)
  – Where
  – Y = GPA
  – X1 = hours of study
  – X2 = IQ score
• We fail to include X2 in our model
• Does this violate any assumptions?
  – Go back and study the assumptions to answer this question
• Violates assumption 1. Why?
• May violate assumption 3. Why?

Effects of Omitted Variables (continued)
• What if X1 and X2 are correlated?
  – Does this violate any assumptions?
• OLS is not BLUE
• The estimated coefficient of X1 (that is, B^1) is biased
• Bias depends on the correlation between X1 and X2 and on the coefficient of X2 in the true regression line.

The Sign (Direction) of Bias
• Bias is zero if either
1. X2 does not affect Y (B_omitted is zero), or
2. X2 is not correlated with X1
• How do you expect IQ (X2) to affect GPA (Y)?
• How are IQ (X2) and hours of study (X1) correlated?
• What is the direction of bias in our example?
  – Will B^1 be bigger or smaller than it actually should be?
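
The omitted-variable argument can be checked with a small simulation. The sketch below is only an illustration: the "true" coefficients, the sample size, and especially the negative link between IQ and hours of study are assumptions made up for the example, not answers to the questions on the slide. It estimates GPA on hours and IQ, then on hours alone, and compares the two coefficients on hours.

```python
import random

def slope_one(x, y):
    """Slope from regressing y on x alone (intercept included)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slopes_two(x1, x2, y):
    """Slopes on x1 and x2 from regressing y on both (intercept included)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    # Solve the two normal equations in deviation form.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(420)  # fixed seed so the run is repeatable
n = 500
hours = [rng.uniform(5, 40) for _ in range(n)]
# Assumption for illustration only: IQ and hours are negatively related.
iq = [110 - 0.5 * h + rng.gauss(0, 10) for h in hours]
# Assumed "true" model: GPA depends on both hours of study and IQ.
gpa = [0.5 + 0.03 * h + 0.02 * q + rng.gauss(0, 0.3)
       for h, q in zip(hours, iq)]

b1_full, b2_full = slopes_two(hours, iq, gpa)  # correctly specified model
b1_short = slope_one(hours, gpa)               # IQ omitted
link = slope_one(hours, iq)                    # how IQ moves with hours

print("B^1 with IQ included:", round(b1_full, 4))
print("B^1 with IQ omitted: ", round(b1_short, 4))
print("B^2 times link:      ", round(b2_full * link, 4))
```

In the sample, the gap between the two estimates of B^1 equals the estimated coefficient on IQ times the slope of IQ on hours, which is why the bias vanishes when either of those two quantities is zero. Flipping the sign of the assumed link (for example, `110 + 0.5 * h`) flips the direction of the bias.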
The Variance of the Estimated Coefficient
• Fact:
  – When we omit a relevant independent variable that is correlated with other independent variables, the variance of the estimated coefficients of the included independent variables goes down
  – The t statistic goes up
  – The t-test may yield a significant coefficient when it should not

When should we suspect the omitted variable problem?
1. The adjusted R squared is low
2. The magnitude or the sign of the estimated coefficients is not as expected
3. The unimportant variables end up being highly significant

Correction for Omitted Variables
• Study the theoretical literature again
• Include the omitted variable based on the expected bias analysis

Irrelevant Variable Problem
• Suppose the true regression model is GPA = f(hours of study), but
• Our version of the model is GPA = f(hours of study, weight of the person)
• Does our model violate assumption 1?
• Are any other assumptions violated?
• Is our estimator biased?
  – Not necessarily: if the expected value of the error term is zero, the expected value of B^ on hours of study = B
• Does our estimator have the minimum variance?
  – No, our estimator does not have the smallest variance (it is not the most efficient)
• How does this affect the t-test?
  – The variance of the estimated coefficient on hours of study goes up
  – The t statistic goes down
  – The t-test may not yield a significant coefficient on hours of study when it should

Should we include X in the set of our independent variables?
• Yes, if
1. Theory calls for its inclusion (the most important criterion)
2. T-test: the estimated coefficient of X is significant in the right direction (Note: this does not mean that if the estimated coefficient is insignificant you have to drop the variable from your model.)
3. As you include X, the adjusted R squared goes up.
4. As you include X, the other variables' coefficients change significantly.
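
Criterion 3 can be tried out on simulated data. The sketch below is a minimal illustration, not the course's method (the course uses EViews): the helper `ols_fit`, the data-generating numbers, and the sample size are all assumptions invented for the example. It fits GPA on hours of study, then adds an irrelevant variable (weight) and reports R squared and adjusted R squared for both models.

```python
import random

def ols_fit(regressors, y):
    """OLS with an intercept; returns (coefficients, R2, adjusted R2).

    regressors is a list of equal-length columns, one per independent variable.
    """
    n, k = len(y), len(regressors)
    rows = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    p = k + 1
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    return b, r2, adj_r2

rng = random.Random(420)  # fixed seed so the run is repeatable
n = 200
hours = [rng.uniform(5, 40) for _ in range(n)]
weight = [rng.gauss(70, 12) for _ in range(n)]  # irrelevant by construction
# Assumed "true" model: GPA depends on hours of study only.
gpa = [1.5 + 0.05 * h + rng.gauss(0, 0.4) for h in hours]

_, r2_short, adj_short = ols_fit([hours], gpa)
_, r2_long, adj_long = ols_fit([hours, weight], gpa)

print("hours only:     R2 =", round(r2_short, 4), " adj R2 =", round(adj_short, 4))
print("hours + weight: R2 =", round(r2_long, 4), " adj R2 =", round(adj_long, 4))
```

R squared cannot fall when weight is added, so it says nothing about whether weight belongs in the model. Adjusted R squared can move either way, which is why it is the measure used in criterion 3, and why theory (criterion 1) still comes first.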
b) Empirical literature will help you to
• See what others have done
• Their variables
• Their functional forms
• Their data sets
• Their findings
3. Choose a sample & collect data
• Cross-sectional / time series
• Degrees of freedom
4. Estimate and evaluate the equation
a) Overall quality of estimation
• Adjusted R squared
• F-test
b) Test your hypotheses
5. Document the results
• Predictions
• Policy recommendations

Assignment 5 (5 questions for 10 points each, total = 50 points)
Due: before 10 PM on Friday, October 5
1. Use the data set in dvd4 file to
• run an F test of the overall significance of the equation.
• test the significance of all of the estimated coefficients at 1% level. Make sure not to skip any of the 4 steps in hypothesis testing. Attach your EViews output.
• construct a 95% confidence interval for the coefficient on income.

Assignment 5 (continued)
2. #17, Page 63
3. #4, PP 81-82
4. #5, Page 82
5. #6, Page 83