4.3 Confidence Intervals
- Using our CLM assumptions, we can construct CONFIDENCE INTERVALS or CONFIDENCE INTERVAL ESTIMATES of the form:
  CI = \hat{\beta}_j \pm t^* \cdot se(\hat{\beta}_j)
- Given a significance level α (which is used to determine t*), we construct 100(1-α)% confidence intervals
- Across repeated random samples, 100(1-α)% of such confidence intervals contain the true value βj
- we never know whether an individual confidence interval contains the true value

4.3 Confidence Intervals
- Confidence intervals are similar to two-tailed tests in that α/2 lies in each tail when finding t*
- if our hypothesis test and confidence interval use the same α:
  1) we cannot reject the null hypothesis (at the given significance level) that βj = aj if aj is within the confidence interval
  2) we can reject the null hypothesis (at the given significance level) that βj = aj if aj is not within the confidence interval

4.3 Confidence Example
- Going back to our Pepsi example, we now look at geekiness:
  \widehat{Cool} = 4.3 + 0.3\,Geek + 0.5\,Pepsi
                  (2.1)  (0.25)      (0.21)
  R^2 = 0.62, N = 43
- From before, our two-sided t* with α = 0.01 was t* = 2.704, so the 99% CI for the Geek coefficient is:
  CI = \hat{\beta}_j \pm t^* \cdot se(\hat{\beta}_j)
  CI = 0.3 \pm 2.704(0.25)
  CI = [-0.376, 0.976]

4.3 Confidence Intervals
- Remember that a CI is only as good as the 6 CLM assumptions:
  1) Omitted variables cause the estimates (the β̂j's) to be unreliable; the CI is not valid
  2) If heteroskedasticity is present, the standard error is not a valid estimate of the standard deviation; the CI is not valid
  3) If normality fails, the CI may not be valid if our sample size is too small

4.4 Complicated Single Tests
- In this section we will see how to test a single hypothesis involving more than one βj
- Take again our coolness regression:
  \widehat{Cool} = 4.3 + 0.3\,Geek + 0.5\,Pepsi
                  (2.1)  (0.25)      (0.21)
  R^2 = 0.62, N = 43
- If we wonder whether geekiness has more impact on coolness than Pepsi consumption:
  H_0: \beta_1 = \beta_2
  H_a: \beta_1 > \beta_2

4.4 Complicated Single Tests
- This test is similar to our one-coefficient tests, but our standard error will be different
- We can rewrite our hypotheses for clarity:
  H_0: \beta_1 - \beta_2 = 0
  H_a: \beta_1 - \beta_2 > 0
- We reject the null hypothesis if the estimated difference between β̂1 and β̂2 is positive enough

4.4 Complicated Single Tests
- Our new t statistic becomes:
  t = (\hat{\beta}_1 - \hat{\beta}_2) / se(\hat{\beta}_1 - \hat{\beta}_2)
- And our test continues as before:
  1) Calculate t
  2) Pick α and find t*
  3) Reject H0 if t > t*

4.4 Complicated Standard Errors
- The standard error in this test is more complicated than before
- If we simply subtracted standard errors, we could end up with a negative value; this is theoretically impossible, since a standard error estimates a standard deviation and must always be positive

4.4 Complicated Standard Errors
- Using the properties of variances, we know that:
  Var(\hat{\beta}_1 - \hat{\beta}_2) = Var(\hat{\beta}_1) + Var(\hat{\beta}_2) - 2\,Cov(\hat{\beta}_1, \hat{\beta}_2)
- the variances are always added and the covariance always subtracted
- moving to standard errors, this becomes:
  se(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{[se(\hat{\beta}_1)]^2 + [se(\hat{\beta}_2)]^2 - 2 s_{12}}
- Where s12 is an estimate of the covariance between the two coefficient estimates
- s12 can either be calculated using matrix algebra or be supplied by econometrics programs

4.4 Complicated Standard Errors
- To see how to find this standard error, take our typical regression:
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u
- and consider the related equation where θ = β1 - β2, or β1 = θ + β2:
  y = \beta_0 + (\theta + \beta_2) x_1 + \beta_2 x_2 + \beta_3 x_3 + u
  y = \beta_0 + \theta x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + u
- where x1 and x2 could be related concepts (e.g. sleep time and naps) and x3 could be relatively unrelated (e.g. study time)

4.4 Complicated Standard Errors
- By running this new regression of y on x1, (x1 + x2), and x3, we can read off the standard error for our hypothesis test: the coefficient on x1 is θ̂ = β̂1 - β̂2 and its reported standard error is se(β̂1 - β̂2)
- using an econometrics program is easier than the matrix algebra
- Empirically:
  1) β̂0 and se(β̂0) are the same in both regressions
  2) β̂2 and β̂3 are the same in both regressions
  3) only the first slope changes: the coefficient on x1 is now θ̂ rather than β̂1
- given this new standard error, CIs for θ = β1 - β2 are constructed as usual (a sketch of this reparameterization appears below)
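The reparameterization above can be checked numerically. The following is a minimal sketch, not part of the original slides: it uses simulated data with made-up coefficients, sample size, and variable roles (x1 and x2 correlated, loosely in the spirit of sleep time and naps), and plain numpy in place of an econometrics package. It verifies that the coefficient on x1 in the regression of y on x1, (x1 + x2), x3 equals β̂1 - β̂2, with a standard error matching the variance-covariance formula.

```python
import numpy as np

# Simulated data standing in for the slide's example; all numbers are made up.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # x1 and x2 deliberately correlated
x3 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.5 * x2 + 0.3 * x3 + rng.normal(size=n)

def ols(y, X):
    """Return OLS coefficients and their variance-covariance matrix."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # SSR / (n - k - 1)
    vcov = sigma2 * np.linalg.inv(X.T @ X)
    return b, vcov

# Original regression: se(b1hat - b2hat) directly from the covariance matrix
X = np.column_stack([np.ones(n), x1, x2, x3])
b, V = ols(y, X)
se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])

# Reparameterized regression: y on x1, (x1 + x2), x3
Xr = np.column_stack([np.ones(n), x1, x1 + x2, x3])
br, Vr = ols(y, Xr)

print(b[1] - b[2], br[1])            # theta-hat computed two ways; they agree
print(se_diff, np.sqrt(Vr[1, 1]))    # se(b1hat - b2hat) computed two ways; they agree
```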
4.5 Testing Multiple Restrictions
- Thus far we have tested whether a SINGLE variable is significant, or how the impacts of two different variables compare
- In this section we test whether a SET of variables is jointly significant, i.e. whether they have a partial effect on the dependent variable as a group
- Even though a group of variables may be individually insignificant, they may be significant as a group due to multicollinearity

4.5 Testing Multiple Restrictions
- Consider our general true model and an example measuring reading week utility (rwu):
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u
  rwu = \beta_0 + \beta_1 ski + \beta_2 trips + \beta_3 homework + u
- we want to test the hypothesis that β1 and β2 equal zero at the same time, i.e. that x1 and x2 simultaneously have no partial effect:
  H_0: \beta_1 = 0, \beta_2 = 0
- in our example, we are testing whether positive activities have no effect on reading week utility

4.5 Testing Multiple Restrictions
- our null hypothesis contains two EXCLUSION RESTRICTIONS
- this set of MULTIPLE RESTRICTIONS is tested using a MULTIPLE HYPOTHESIS TEST or JOINT HYPOTHESIS TEST
- the alternative hypothesis is simply:
  H_a: H_0 is not true
- note that we CANNOT use individual t tests to test this multiple restriction; we need to test the restrictions jointly

4.5 Testing Multiple Restrictions
- to test joint significance, we use SSR and R-squared values obtained from two different regressions
- we know that SSR increases and R² decreases when variables are dropped from the model
- in order to conduct our test, we need to run two regressions (a sketch follows this list):
  1) an UNRESTRICTED MODEL with all of the variables
  2) a RESTRICTED MODEL that excludes the variables in the test
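As a rough illustration of the two regressions involved, here is a minimal numpy sketch with simulated data; the variable names follow the reading week example, but the sample, coefficients, and seed are invented and are not the slides' data. It simply fits the unrestricted and restricted models and collects their SSRs, which feed into the F statistic introduced next.

```python
import numpy as np

# Simulated stand-in for the reading-week example (all coefficients made up).
rng = np.random.default_rng(1)
n = 572
ski = rng.normal(size=n)
trips = rng.normal(size=n)
homework = rng.normal(size=n)
rwu = 16 + 2 * ski + 3 * trips - 0.5 * homework + rng.normal(size=n)

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

ones = np.ones(n)
ssr_ur = ssr(rwu, np.column_stack([ones, ski, trips, homework]))  # unrestricted model
ssr_r = ssr(rwu, np.column_stack([ones, homework]))               # restricted: drops ski, trips
print(ssr_r >= ssr_ur)   # always True: dropping variables can only raise SSR
```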
4.5 Testing Multiple Restrictions
- Given a hypothesis test with q restrictions, we have the following null hypothesis and models:
  H_0: \beta_{k-q+1} = 0, \ldots, \beta_k = 0   (4.35)
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u   (4.34)
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_{k-q} x_{k-q} + u   (4.36)
- Where (4.34) is the UNRESTRICTED MODEL, giving us SSR_ur, and (4.36) is the RESTRICTED MODEL, giving us SSR_r

4.5 Testing Multiple Restrictions
- These SSR values combine to give us our F STATISTIC or TEST F STATISTIC:
  F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}   (4.37)
- Where q is the number of restrictions in the null hypothesis; q = numerator degrees of freedom
- n - k - 1 = denominator degrees of freedom (the denominator of F is the unbiased estimator of σ²)
- since SSR_r ≥ SSR_ur, F is always positive

4.5 Testing Multiple Restrictions
- One can think of the test F stat as measuring the relative increase in SSR when moving from the unrestricted model to the restricted model
- a large F indicates that the excluded variables have a lot of explanatory power
- under H0 and our CLM assumptions, F has an F distribution with (q, n-k-1) degrees of freedom: F ~ F_{q, n-k-1}
- we obtain F* from F tables and reject H0 if F > F*

4.5 Multiple Example
- Given our previous example of reading week utility, the unrestricted and restricted models give us:
  \widehat{rwu} = 15.9 + 2.0\,ski + 3.0\,trips + 0.5\,homework
                 (4.3)   (0.9)      (1.3)        (0.12)
  N = 572, SSR_ur = 141
  \widehat{rwu} = 17.6 + 0.6\,homework
                 (6.3)   (0.17)
  N = 572, SSR_r = 175
- Which correspond to the hypotheses:
  H_0: \beta_1 = 0, \beta_2 = 0
  H_a: H_0 is not true

4.5 Multiple Example
- We use these SSRs to construct the test statistic:
  F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(175 - 141)/2}{141/(572-3-1)} ≈ 68.5
- given α = 0.05, F*_{2, 568} ≈ 3.00
- since F > F*, we reject H0 at the 5% significance level: positive activities do have an impact on reading week utility (a worked calculation follows below)

4.5 Multiple Notes
- Once the degrees of freedom in F's denominator reach about 120, the F distribution is no longer sensitive to them; hence the infinity entry in the F table
- if H0 is rejected, the variables in question are JOINTLY (STATISTICALLY) SIGNIFICANT at the given α level
- if H0 is not rejected, the variables in question are JOINTLY INSIGNIFICANT at that α level
- because of multicollinearity, an F test can fail to reject even when individual t tests reject (and vice versa)
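Plugging the example's numbers into (4.37) takes only a few lines. The sketch below is an illustration rather than part of the slides; it assumes scipy is available for the critical value and p-value, and uses the SSRs, sample size, and restriction count reported above.

```python
from scipy import stats

# F test for the reading-week example, using the numbers reported above.
ssr_r, ssr_ur = 175.0, 141.0      # restricted vs. unrestricted SSR
n, k, q = 572, 3, 2               # observations, regressors in (4.34), restrictions

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
F_star = stats.f.ppf(0.95, q, n - k - 1)      # critical value at alpha = 0.05
p_value = stats.f.sf(F, q, n - k - 1)         # P(F random variable > observed F)

print(F, F_star, p_value)   # F is about 68.5, far above F* of about 3.0
print(F > F_star)           # True: reject H0, the group is jointly significant
```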
4.5 F, t's secret identity?
- the F statistic can also be used to test the significance of a single variable; in this case q = 1
- it can be shown that F = t² in this case, i.e. t²_{n-k-1} ~ F_{1, n-k-1}
- this equivalence only applies to two-sided tests, so the t statistic is more flexible, since it also allows one-sided tests
- the t statistic is therefore best suited for testing a single hypothesis

4.5 F tests and abuse
- we have already seen how individually insignificant variables may be jointly significant due to multicollinearity
- a significant variable can also prove jointly insignificant if grouped with enough insignificant variables
- an insignificant variable can likewise appear jointly significant if grouped with significant variables
- therefore t tests are much better than F tests at determining individual significance

4.5 R² and F
- While SSR can be large, R² is bounded between 0 and 1, often making it an easier way to calculate F:
  F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}   (4.41)
- this is called the R-SQUARED FORM OF THE F STATISTIC
- since R²_ur ≥ R²_r, F is still always positive
- this form is NOT valid for testing all linear restrictions (as seen later)

4.5 F and p-values
- similar to t tests, F tests produce p-values, defined as:
  p-value = P(\mathcal{F} > F)   (4.43)
  where \mathcal{F} is an F_{q, n-k-1} random variable and F is the observed value of the statistic
- the p-value is the "probability of observing a value of F at least as large as we did, given that the null hypothesis is true"
- a small p-value is therefore evidence against H0
- as before, reject H0 if p < α
- p-values give us a more complete view of significance

4.5 Overall significance
- Often it is useful to test whether the model is significant overall
- the hypothesis that NONE of the explanatory variables has an effect on y is:
  H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0   (4.44)
- as before with multiple restrictions, we compare against the restricted model:
  y = \beta_0 + u   (4.45)

4.5 Overall significance
- Since our restricted model has no independent variables, its R² is zero and the R-squared form of F simplifies to:
  F = \frac{R^2/k}{(1 - R^2)/(n-k-1)}   (4.46)
- which is only valid for this special test
- this test determines the OVERALL SIGNIFICANCE OF THE REGRESSION
- if this test fails to reject, we need to find other explanatory variables

4.5 Testing General Linear Restrictions
- Sometimes economic theory (often involving elasticities) requires us to test more complicated joint restrictions, such as:
  H_0: \beta_1 = 0, \beta_2 = 1, \beta_3 = 2
- Which expects our model:
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u
- to be of the form:
  y = \beta_0 + 0 \cdot x_1 + 1 \cdot x_2 + 2 \cdot x_3 + u

4.5 Testing General Linear Restrictions
- We rewrite this expected model to obtain a restricted model:
  y - x_2 - 2 x_3 = \beta_0 + u
- We then calculate the F statistic using the SSR formula (4.37) (a sketch follows below)
- note that since the dependent variable changes between the two models, the R² form of the F statistic is not valid in this case
- note that the number of restrictions (q) is simply equal to the number of equal signs in the null hypothesis
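Here is a minimal sketch of this procedure, again with simulated data that is not from the slides; the data-generating coefficients are made up and chosen to satisfy the null. The transformed dependent variable y - x2 - 2x3 is regressed on a constant to obtain SSR_r, the full model gives SSR_ur, and the SSR form (4.37) of the F statistic is used, since the R² form is invalid here.

```python
import numpy as np
from scipy import stats

# Sketch of testing H0: beta1 = 0, beta2 = 1, beta3 = 2 with simulated data
# (the data-generating process below is made up and satisfies H0).
rng = np.random.default_rng(2)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 5 + 0 * x1 + 1 * x2 + 2 * x3 + rng.normal(size=n)

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

ones = np.ones(n)
ssr_ur = ssr(y, np.column_stack([ones, x1, x2, x3]))   # unrestricted model
ssr_r = ssr(y - x2 - 2 * x3, ones[:, None])            # restricted: y - x2 - 2*x3 on a constant

q, k = 3, 3
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(F, stats.f.sf(F, q, n - k - 1))   # F is small and the p-value large: H0 holds here
```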
4.6 Reporting Regression Results
- When reporting a single regression, the proper reporting method is:
  \widehat{\ln(Taste)}_i = 3.7 + 0.2\,\ln(Time)_i + 1.4\,\ln(Skill)_i
                          (0.9)  (0.15)             (0.78)
  R^2 = 0.41, N = 143
- R², the estimated coefficients, and N MUST be reported (note also the hat and the i subscripts)
- either standard errors or t values must also be reported (standard errors are more useful for tests other than H0: βk = 0)
- SSR and the standard error of the regression can also be reported

4.6 Reporting Regression Results
- When multiple related regressions are run (often to test for joint significance), the results can be expressed in table format, as seen below
- whether the simple or the table reporting method is used, the meanings and scaling of all included variables must always be explained in a proper project, e.g.:
  price: average price, measured weekly, in American dollars
  College: dummy variable; 0 if no college education, 1 if college education

4.6 Reporting Regression Results

  Dependent variable: Midterm readiness
  Ind. variables    (1)            (2)
  Study Time        0.47 (0.12)    -
  Intellect         1.89 (1.7)     2.36 (1.4)
  Intercept         2.5 (0.03)     2.8 (0.02)
  Observations      33             33
  R²                0.48           0.34
  (standard errors in parentheses)
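Purely as a convenience, here is a tiny, hypothetical helper that prints estimates in the "coefficient over (standard error)" layout recommended above; the function itself is not from the slides, and the numbers are the ln(Taste) example from this section.

```python
# Hypothetical helper that prints results in the "coefficient (standard error)"
# style used above; the names and numbers are the slide's ln(Taste) example.
def report(dep_name, names, coefs, ses, r2, n):
    terms = " + ".join(f"{c:.2f} {x}" if x else f"{c:.2f}" for c, x in zip(coefs, names))
    print(f"{dep_name}_hat = {terms}")
    print("                " + "  ".join(f"({s:.2f})" for s in ses))
    print(f"R^2 = {r2:.2f}   N = {n}")

report("ln(Taste)", ["", "ln(Time)", "ln(Skill)"], [3.7, 0.2, 1.4],
       [0.9, 0.15, 0.78], 0.41, 143)
```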