Econ 388 R. Butler 2014 revisions Lecture 7

I. One more assumption about the error term (Part III: normality)

Assumption 5: the errors are normally distributed. The regression model can be conveniently summarized as follows (using matrix algebra):

   Y = Xβ + u,   u ~ N(0, σ²I)

where the last notation stands for "u is normally distributed with mean vector 0 and variance-covariance matrix σ²I."

Why do we make this final assumption? There is a very useful rule that states the following (Wooldridge, appendix B): the sum of independent, identically distributed normal random variables is also a normal random variable. This rule leads to the genealogy of normality in regression:

1. It starts with the error term, uᵢ.
2. Then, by the useful summation rule, normality spreads to Yᵢ because of the regression specification Yᵢ = β₀ + β₁X₁ᵢ + uᵢ; since we condition on the values of X, the only random part of Yᵢ is the normal error uᵢ.
3. And because Yᵢ is normal, the estimated slope coefficients will also be normally distributed:

   β̂₁ = Σᵢ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ(Xᵢ − X̄)²  =  Σᵢ(Xᵢ − X̄)Yᵢ / Σᵢ(Xᵢ − X̄)²

Since the estimated slope coefficient β̂₁ is a linear sum of the normal random variables Yᵢ,

   β̂₁ = Σᵢ aᵢYᵢ,  where aᵢ = (Xᵢ − X̄) / Σⱼ(Xⱼ − X̄)²,

β̂₁ will also be normally distributed. (We show this here for the case of simple regression, but from the matrix specification above it is also clearly the case in the multiple regression model.)

In summary: normality of uᵢ → normality of Yᵢ → normality of β̂₁.

Result:  β̂ₖ ~ N(βₖ, σ²_β̂ₖ)

We expect, because of natural sampling variability, that β̂ₖ will vary from sample to sample. So if we want to test a theory about βₖ, we need to take this natural sampling variability into account and ask what the distribution of β̂ₖ would look like if the null hypothesis were true. Then we ask what the probability is that we would obtain a given sample outcome, assuming the null hypothesis is true.
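The genealogy above can be checked by simulation. This is a minimal sketch, not from the lecture: all parameter values (β₀ = 2, β₁ = 0.5, σ = 1, X = 1,…,20) are made up for illustration. Across many samples with normal errors, the OLS slope β̂₁ = Σ aᵢYᵢ should center on the true β₁ with standard deviation σ/√Σ(Xᵢ − X̄)².

```python
import random
import statistics

# Hypothetical population parameters (not from the lecture).
random.seed(388)
beta0, beta1, sigma = 2.0, 0.5, 1.0
X = [float(i) for i in range(1, 21)]   # fixed regressors: we condition on X
xbar = statistics.mean(X)
sxx = sum((x - xbar) ** 2 for x in X)  # sum of squared deviations of X

def ols_slope(y):
    """beta1_hat = sum (X_i - Xbar) * Y_i / sum (X_i - Xbar)^2."""
    return sum((x - xbar) * yi for x, yi in zip(X, y)) / sxx

# Draw many samples; each sample's Y_i is normal because u_i is normal.
slopes = []
for _ in range(5000):
    y = [beta0 + beta1 * x + random.gauss(0, sigma) for x in X]
    slopes.append(ols_slope(y))

# Theory: beta1_hat ~ N(beta1, sigma^2 / sxx)
print(statistics.mean(slopes))    # close to the true slope 0.5
print(statistics.stdev(slopes))   # close to (sigma**2 / sxx) ** 0.5
```

The simulated mean and standard deviation of the 5,000 slopes line up with the normal-theory values, which is what step 3 of the genealogy claims.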
Our testing procedure does NOT test whether the null hypothesis is true (such a test would be "unconditional" in the sense that we would have to know something about the "objective" state of the hypothesis, presumably because we could examine the whole population and not just a sample). Our tests are based on how likely our sample result was IF the null hypothesis were true (hence, it's a "conditional" test: if the null hypothesis were true, is this an outcome that we would expect to observe?).

II. Three tests on individual slope coefficients (β̂ₖ)

A. One sided: we think that the slope is negative (in the alternative hypothesis)
   null: Ho: βₖ ≥ 0    alternative: Ha: βₖ < 0
   What is the critical value, C, for this test?

B. One sided: we think that the slope is positive (in the alternative hypothesis)
   null: Ho: βₖ ≤ 0    alternative: Ha: βₖ > 0
   What is the critical value, C, for this test?

C. Two sided: we think that the slope is not zero (the alternative hypothesis is that the effect is important, but we are not sure whether it is positive or negative)
   null: Ho: βₖ = 0    alternative: Ha: βₖ ≠ 0
   What are the critical values, C, for this test?

D. Type I and type II errors with tests on the value of slope coefficients

Suppose Ho: βⱼ = 1 vs. Ha: βⱼ = 2, and we know (the standard error angel has visited us) that S.E.(β̂ⱼ) = .3 in either case.

1) We employ the usual 95% protection against a type I error, constructing the cutoff value for accepting the null hypothesis (Ho). So far, we have supposed Ho is true.
2) Next, to get to the type II error, assume the alternative hypothesis is true, and see, given the critical cutoff for the type I error that we computed supposing Ho were true, what the size of the type II error would be.

1) Find the critical value under Ho:

   Critical = Hypothesized value + Z₉₅ × S.E.
   Critical = 1 + 1.645(.3) = 1.4935

2) Find the likelihood of a type II error (given Ha is true).
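Step 1 can be reproduced in a few lines. This is a sketch using the lecture's numbers (Ho: βⱼ = 1, S.E. = 0.3); the 95th percentile of the standard normal is computed rather than looked up in a table.

```python
from statistics import NormalDist

# One-sided 5% test of Ho: beta_j = 1 against Ha: beta_j = 2 (lecture example).
# Reject Ho only for estimates above Critical = 1 + z_95 * S.E.
z95 = NormalDist().inv_cdf(0.95)   # 95th percentile of N(0,1), about 1.645
critical = 1 + z95 * 0.3
print(round(critical, 4))          # about 1.4935, matching the hand calculation
```

Any estimate β̂ⱼ below this cutoff leads us to retain Ho, which sets up the type II error computation that follows.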
So given our critical cutoff (1.4935) computed from our test for type I error, the type II error corresponds to

   Z = (1.4935 − 2.00) / .3 = −1.6883

From the standard normal table this is a probability of about .0454.

E. Zero is not sacred: we can generalize tests about estimated coefficients to hypothesized "cutoff" values other than 0.

Example:
   null: Ho: β₁ = 1    alternative: Ha: β₁ < 1
for the permanent-income coefficient in a consumption regression:

   Consumption = β₀ + β₁(Permanent Income) + u

III. t-tests: some examples

A. The empirical rule (an "approximation" for normal and t-distributions)

For bell-shaped (mound-shaped or normal) distributions, knowing the mean and standard deviation tells you quite a bit about the distribution. Other common shapes are skewed-to-the-right and skewed-to-the-left distributions. Occasionally, when living near campus, you also see bimodal age distributions in the wards (the newly wed and nearly dead wards).

The empirical rule (appendix C on the standard normal distribution "rule of thumb") for mound-shaped (normal or bell-shaped) distributions:
1. Approx. 68% of the observations will be within 1 standard deviation of the mean.
2. Approx. 95% of the observations will be within 2 standard deviations of the mean.
3. Approx. 99% of the observations will be within 3 standard deviations of the mean.

The ESSENCE OF MOST STATISTICAL TESTING is to find out whether the difference between a statistic and its hypothesized value is smaller than two standard deviations; if so, it is not statistically significant (that is, you can't reject the null hypothesis). An equivalent way of expressing this is to say that the probability significance level is greater than .05 (using the 5 percent level as the cutoff value in evaluating significance). That is, roughly,

   |statistic − hypothesized value| < 2 × (standard deviation)

when a statistic is not significant at the 5 percent level.
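The type II error probability can be computed directly instead of read from a table. A sketch with the lecture's numbers (cutoff 1.4935, Ha: βⱼ = 2, S.E. = 0.3):

```python
from statistics import NormalDist

# If Ha: beta_j = 2 is true, a type II error occurs when the estimate
# still falls below the cutoff 1.4935, so
# P(type II) = Phi((1.4935 - 2.00) / 0.3).
z = (1.4935 - 2.00) / 0.3
p_type2 = NormalDist().cdf(z)
print(round(z, 4))         # about -1.6883
print(round(p_type2, 3))   # roughly .046, in line with the table value
```

So with these hypotheses there is under a 5 percent chance of retaining Ho when Ha is actually true.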
Suppose our statistic of interest is the ith regression coefficient, βᵢ, and we hypothesize that its associated variable (xᵢ) has no effect on the outcome, which is to say that the null hypothesis is βᵢ = 0. Letting β̂ᵢ be the statistic, the estimated value of βᵢ, and letting the standard deviation of β̂ᵢ be denoted σ_β̂ᵢ (the standard deviation of the estimated coefficient is called the "standard error"), the test statistic under the null hypothesis is

   (β̂ᵢ − 0) / σ_β̂ᵢ

If β̂ᵢ is more than two standard deviations away from zero, we reject the null hypothesis (in this case, we would reject the hypothesis that the coefficient is zero, i.e., we reject the implicit idea in this null hypothesis that xᵢ can be dropped from the right-hand-side regressors).

****MATH MOMENT**********************************************
When we actually go to test a hypothesis, we don't know σ (the true standard deviation), and we have to estimate its value by s (the standard deviation we estimate from our sample). But estimating the standard deviation (instead of simply knowing it) introduces some additional uncertainty about the size of the "empirical rule" intervals. The t-distribution captures this additional uncertainty: it is bell shaped like the standard normal distribution with its mean value at zero, but it has thicker tails than the standard normal. So, for example, for a small sample of n = 20, where the degrees of freedom = 19, the 95 percent confidence interval would be plus or minus 2.093 (see the t-distribution table in Wooldridge) rather than 1.96 for the normal (which is approximately 2), where the variance is known and doesn't have to be estimated. As n gets large, the t-distribution approaches the normal distribution in shape (see the distributions in the back of the Wooldridge book).
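The two-standard-deviation rule is easy to put in code. A sketch, with made-up estimate and standard error values; 2.093 is the t(19) critical value quoted in the MATH MOMENT (hard-coded from a t-table, since the Python standard library has no t-distribution).

```python
from statistics import NormalDist

def reject_null(estimate, hypothesized, std_error, critical=1.96):
    """Reject Ho when |estimate - hypothesized| / std_error exceeds critical."""
    t_stat = (estimate - hypothesized) / std_error
    return abs(t_stat) > critical

# The exact normal critical value behind the "two standard deviations" rule:
z_crit = NormalDist().inv_cdf(0.975)
print(round(z_crit, 2))                             # 1.96

# Hypothetical coefficient estimate 0.8 with standard error 0.3:
print(reject_null(0.8, 0.0, 0.3))                   # |t| = 2.67 > 1.96, reject
print(reject_null(0.8, 0.0, 0.3, critical=2.093))   # also rejects with t(19)
```

Note that a small sample only changes the critical value, not the form of the test; with n = 20 the bar is 2.093 instead of 1.96.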
For large n (30 or more), the computer will still compute a confidence interval based on the t-distribution, but it will be very close to the standard normal, or Z-score, distribution. You can think of them as roughly equivalent: the t-distribution is slightly "fatter" than the normal distribution since it accounts for the estimation of the standard deviation. Often we will ignore this difference in the course, since our sample sizes make the Z-score (standard normal) approximation good enough.

B. One sided, MEAP93.raw data (chapter 4 in Wooldridge)
Case A-type example: the impact of school size on student performance

IV. p-values: the likelihood of getting the sample outcome you obtained IF the null hypothesis were true (see the glossary in Wooldridge for an alternative definition). That is what is reported in STATA and SAS (that is, the null hypothesis that these programs automatically test is that the respective coefficient βᵢ = 0, assuming a two-tailed test).

V. Practical significance does not equal statistical significance

VI. Confidence intervals (random intervals around β̂ₖ constructed to capture the true βₖ 95 percent of the time, or 90 percent, or 99 percent, depending on the application)

A. Crude intervals from the empirical rule (an OK guide in many cases, particularly with large samples; for small samples, we probably need to construct the confidence interval using the t-table):
   1. 68% within one standard deviation
   2. 95% within two standard deviations
   3. 99% within three standard deviations

B. More precise confidence intervals, for small samples and for obsessive people, use the t-tables:

   95% confidence interval for βₖ = β̂ₖ ± c · se(β̂ₖ)

where c is the value corresponding to the 97.5th percentile of the t-distribution (p. 849, the "1-tailed, .025 significance level" column).
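The interval formula above can be sketched in a couple of lines. The estimate and standard error below are invented for illustration; c defaults to the large-sample normal value 1.96, and the t(19) value 2.093 from the earlier MATH MOMENT shows how the small-sample interval widens.

```python
def conf_interval(bhat, se, c=1.96):
    """95% confidence interval: bhat +/- c * se(bhat)."""
    return (bhat - c * se, bhat + c * se)

# Hypothetical estimate beta_k_hat = 0.5 with se = 0.1:
lo, hi = conf_interval(0.5, 0.1)
print(round(lo, 3), round(hi, 3))              # 0.304 0.696

# Small-sample version using the t(19) critical value 2.093: wider interval.
lo_t, hi_t = conf_interval(0.5, 0.1, c=2.093)
print(round(lo_t, 3), round(hi_t, 3))          # 0.291 0.709
```

If the interval excludes the hypothesized value (here, if 0 lies outside it), the two-sided test at the corresponding significance level rejects the null, so confidence intervals and two-sided tests are two views of the same calculation.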
VII. TESTING A SINGLE LINEAR COMBINATION OF THE PARAMETERS

Utah data: testing whether an additional year of schooling at high school and at college have an equivalent effect on the log of a worker's wage (following the example in Wooldridge, chapter four, part one). Recall the example in "C. Transformations," lecture 4.

   ln(wage) = β₀ + β₁H.S. + β₂coll + u

Let θ = β₁ − β₂, so Ho: θ = 0. Substituting β₁ = θ + β₂:

   ln(wage) = β₀ + β₁H.S. + β₂coll + u
            = β₀ + (θ + β₂)H.S. + β₂coll + u
            = β₀ + θ·H.S. + β₂(H.S. + coll) + u

So if the coefficient on H.S. (which is now θ) is statistically indistinguishable from zero in this reparameterized regression, we fail to reject the null hypothesis that β₁ = β₂.

VIII. TESTING MULTIPLE LINEAR COMBINATIONS OF PARAMETERS: F-tests or Chow tests
A. testing subsets of regressors within a regime
B. testing parameters between regimes
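The algebra of the substitution can be verified numerically. The coefficient values below are made up; the point is only that with θ = β₁ − β₂, the original and reparameterized equations give identical fitted values for any H.S. and coll, so the coefficient on H.S. in the second form is exactly θ.

```python
import math

# Hypothetical returns to schooling (invented for illustration):
beta0, beta1, beta2 = 1.0, 0.09, 0.07
theta = beta1 - beta2   # the single linear combination being tested

# Check the identity  beta0 + beta1*HS + beta2*coll
#                  == beta0 + theta*HS + beta2*(HS + coll)
for hs, coll in [(12, 0), (12, 4), (10, 2)]:
    original = beta0 + beta1 * hs + beta2 * coll
    reparam = beta0 + theta * hs + beta2 * (hs + coll)
    print(math.isclose(original, reparam))   # True for every case
```

This is why running the regression of ln(wage) on H.S. and (H.S. + coll) and reading the usual t-statistic on H.S. is a valid test of β₁ = β₂: the reparameterization changes nothing about the model, only which combination of parameters gets its own coefficient.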