Chapter Five
TWO-VARIABLE REGRESSION: INTERVAL ESTIMATION AND HYPOTHESIS TESTING

5.2 INTERVAL ESTIMATION: SOME BASIC IDEAS

In statistics the reliability of a point estimator is measured by its standard error. Therefore, instead of relying on the point estimate alone, we may construct an interval around the point estimator, say within two or three standard errors on either side of the point estimator, such that this interval has, say, 95 percent probability of including the true parameter value. This is roughly the idea behind interval estimation. More formally, we seek to construct an interval such that

Pr(β̂2 − δ ≤ β2 ≤ β̂2 + δ) = 1 − α     (5.2.1)

where 0 < α < 1; 1 − α is known as the confidence coefficient and α as the level of significance.

It is very important to know the following aspects of interval estimation:

1. Equation (5.2.1) does not say that the probability of β2 lying between the given limits is 1 − α. Since β2, although an unknown, is assumed to be some fixed number, either it lies in the interval or it does not. What (5.2.1) states is that, for the method described in this chapter, the probability of constructing an interval that contains β2 is 1 − α.

2. The interval in (5.2.1) is a random interval; it will vary from one sample to the next because it is based on β̂2, which is itself random.

3. Since the confidence interval is random, the probability statements attached to it should be understood in the long-run sense, that is, repeated sampling.

5.3 CONFIDENCE INTERVALS FOR REGRESSION COEFFICIENTS β1 AND β2

Under the normality assumption, the standardized variable Z = (β̂2 − β2)/se(β̂2), as noted in (4.3.6), is a standardized normal variable. It therefore seems that we can use the normal distribution to make probabilistic statements about β2 provided the true population variance σ² is known. In practice, however, σ² is rarely known, and we replace it by its unbiased estimator σ̂². The resulting variable, t = (β̂2 − β2)/se(β̂2), where se(β̂2) now refers to the estimated standard error, can be shown to follow the t distribution with n − 2 df. Therefore, instead of using the normal distribution, we can use the t distribution to establish a confidence interval for β2 as follows:

Pr[β̂2 − tα/2 se(β̂2) ≤ β2 ≤ β̂2 + tα/2 se(β̂2)] = 1 − α     (5.3.5)

Equation (5.3.5) provides a 100(1 − α) percent confidence interval for β2, which can be written more compactly as

100(1 − α)% confidence interval for β2: β̂2 ± tα/2 se(β̂2)

Arguing analogously, and using (4.3.1) and (4.3.2), we can then write

Pr[β̂1 − tα/2 se(β̂1) ≤ β1 ≤ β̂1 + tα/2 se(β̂1)] = 1 − α     (5.3.7)

or, more compactly,

100(1 − α)% confidence interval for β1: β̂1 ± tα/2 se(β̂1)     (5.3.8)

Notice an important feature of the confidence intervals given above and in (5.3.8): in both cases the width of the confidence interval is proportional to the standard error of the estimator. That is, the larger the standard error, the larger is the width of the confidence interval. Put differently, the larger the standard error of the estimator, the greater is the uncertainty of estimating the true value of the unknown parameter. Thus, the standard error of an estimator is often described as a measure of the precision of the estimator, i.e., how precisely the estimator measures the true population value.

Confidence Interval for β2. For our consumption–income example, the 95% confidence interval for β2 works out to (0.4268, 0.5914). The interpretation of this confidence interval is: Given the confidence coefficient of 95%, in the long run, in 95 out of 100 cases intervals like (0.4268, 0.5914) will contain the true β2. But, as warned earlier, we cannot say that the probability is 95 percent that the specific interval (0.4268 to 0.5914) contains the true β2, because this interval is now fixed and no longer random; therefore, β2 either lies in it or does not.

Confidence Interval for β1. Following (5.3.7), the reader can easily verify that the 95% confidence interval for β1 of our consumption–income example is the interval given in (5.3.11). Again you should be careful in interpreting this confidence interval; a worked construction of both intervals is sketched below.
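For concreteness, here is a sketch of how the two intervals are obtained from the compact forms above. The point estimates β̂1 = 24.4545, β̂2 = 0.5091 and the standard error se(β̂1) = 6.4138 are the values quoted for this example later in the chapter; se(β̂2) = 0.0357 is inferred from the quoted t value 14.2605, and 2.306 is the two-tail 5% critical t value for 8 df.

```latex
% 95% confidence intervals for the consumption--income example (n = 10, df = 8)
\begin{align*}
\hat{\beta}_2 \pm t_{\alpha/2}\,\mathrm{se}(\hat{\beta}_2)
  &= 0.5091 \pm 2.306 \times 0.0357
  \;\Longrightarrow\; 0.4268 \le \beta_2 \le 0.5914 \\[4pt]
\hat{\beta}_1 \pm t_{\alpha/2}\,\mathrm{se}(\hat{\beta}_1)
  &= 24.4545 \pm 2.306 \times 6.4138
  \;\Longrightarrow\; 9.664 \le \beta_1 \le 39.245 \quad \text{(approximately)}
\end{align*}
```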
In the long run, in 95 out of 100 cases intervals like (5.3.11) will contain the true β1.

5.5 HYPOTHESIS TESTING: GENERAL COMMENTS

The problem of statistical hypothesis testing may be stated simply as follows: Is a given observation or finding compatible with some stated hypothesis or not? The word "compatible," as used here, means "sufficiently" close to the hypothesized value so that we do not reject the stated hypothesis. In the language of statistics, the stated hypothesis is known as the null hypothesis and is denoted by the symbol H0. The null hypothesis is usually tested against an alternative hypothesis (also known as the maintained hypothesis), denoted by H1.

5.6 HYPOTHESIS TESTING: THE CONFIDENCE-INTERVAL APPROACH

Two-Sided or Two-Tail Test. Suppose we postulate, for our consumption–income example, that H0: β2 = 0.3 and H1: β2 ≠ 0.3; that is, the true MPC is 0.3 under the null hypothesis but it is less than or greater than 0.3 under the alternative hypothesis. The null hypothesis is a simple hypothesis, whereas the alternative hypothesis is composite. The decision rule is straightforward: construct a 100(1 − α)% confidence interval for β2; if β2 under H0 falls within this confidence interval, do not reject H0, but if it falls outside the interval, reject H0. In statistics, when we reject the null hypothesis, we say that our finding is statistically significant. On the other hand, when we do not reject the null hypothesis, we say that our finding is not statistically significant.

One-Sided or One-Tail Test. Sometimes we have a strong a priori or theoretical expectation (or expectations based on some previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather than two-sided, as just discussed. Thus, for our consumption–income example, one could postulate that H0: β2 ≤ 0.3 and H1: β2 > 0.3. Perhaps economic theory or prior empirical work suggests that the marginal propensity to consume is greater than 0.3.

5.7 HYPOTHESIS TESTING: THE TEST-OF-SIGNIFICANCE APPROACH

Testing the Significance of Regression Coefficients: The t Test. The key idea behind tests of significance is that of a test statistic (estimator) and the sampling distribution of such a statistic under the null hypothesis. The decision to accept or reject H0 is made on the basis of the value of the test statistic obtained from the data at hand. As an illustration, recall that under the normality assumption the variable t = (β̂2 − β2)/se(β̂2) of (5.3.2) follows the t distribution with n − 2 df. If the value of true β2 is specified under the null hypothesis, this t value can readily be computed from the available sample, and therefore it can serve as a test statistic. And since this test statistic follows the t distribution, confidence-interval statements such as the following can be made:

Pr[−tα/2 ≤ (β̂2 − β2*)/se(β̂2) ≤ tα/2] = 1 − α     (5.7.1)

where β2* is the value of β2 under H0 and where −tα/2 and tα/2 are the values of t (the critical t values) obtained from the t table for (α/2) level of significance and n − 2 df [cf. (5.3.4)]. The t table is given in Appendix D. Rearranging (5.7.1), we obtain

Pr[β2* − tα/2 se(β̂2) ≤ β̂2 ≤ β2* + tα/2 se(β̂2)] = 1 − α     (5.7.2)

In practice, there is no need to estimate (5.7.2) explicitly. One can compute the t value in the middle of the double inequality given by (5.7.1) and see whether it lies between the critical t values or outside them. For our example, with β̂2 = 0.5091, se(β̂2) = 0.0357, and β2* = 0.3, the computed t value is (0.5091 − 0.3)/0.0357 ≈ 5.86, which lies well outside the critical t values for 8 df at conventional significance levels. In general, the farther the estimated value is from the value hypothesized under H0, the larger |t| will be. Therefore, a "large" |t| value will be evidence against the null hypothesis. Of course, we can always use the t table to determine whether a particular t value is large or small; the answer, as we know, depends on the degrees of freedom as well as on the probability of a Type I error that we are willing to accept. If you take a look at the t table given in Appendix D, you will observe that, for any given value of df, the probability of obtaining an increasingly large |t| value becomes progressively smaller.
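To see how such tail probabilities can be checked numerically, here is a minimal Python sketch (assuming SciPy is installed; the function names are SciPy's, not the text's) that computes Pr(|t| ≥ c) for a given number of degrees of freedom. The two 20-df figures it prints are the ones discussed next.

```python
# Two-tail probabilities Pr(|t| >= c) for the t distribution, as read off a t table.
# Requires SciPy; stats.t.sf gives the upper-tail (survival) probability.
from scipy import stats

def two_tail_prob(c, df):
    """Probability of obtaining a |t| value of c or greater with df degrees of freedom."""
    return 2 * stats.t.sf(c, df)

for c in (1.725, 3.552):
    print(f"df=20, |t| >= {c}: {two_tail_prob(c, 20):.3f}")
# Expected output (matching the t table): about 0.100 and 0.002, respectively.
```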
Thus, for 20 df the probability of obtaining a |t| value of 1.725 or greater is 0.10, or 10 percent, but for the same df the probability of obtaining a |t| value of 3.552 or greater is only 0.002, or 0.2 percent. Since we use the t distribution, the preceding testing procedure is called, appropriately, the t test. In the language of significance tests, a statistic is said to be statistically significant if the value of the test statistic lies in the critical region. In this case the null hypothesis is rejected. By the same token, a test is said to be statistically insignificant if the value of the test statistic lies in the acceptance region. In this situation, the null hypothesis is not rejected. In our example, the t test is significant and hence we reject the null hypothesis.

5.8 HYPOTHESIS TESTING: SOME PRACTICAL ASPECTS

The Meaning of "Accepting" or "Rejecting" a Hypothesis. If on the basis of a test of significance, say, the t test, we decide to "accept" the null hypothesis, all we are saying is that on the basis of the sample evidence we have no reason to reject it; we are not saying that the null hypothesis is true beyond any doubt.

The "Zero" Null Hypothesis and the "2-t" Rule of Thumb. A null hypothesis that is commonly tested in empirical work is H0: β2 = 0, that is, the slope coefficient is zero. This "zero" null hypothesis is a kind of straw man, the objective being to find out whether Y is related at all to X, the explanatory variable. If there is no relationship between Y and X to begin with, then testing a hypothesis such as β2 = 0.3 or any other value is meaningless. This null hypothesis can be tested by the confidence-interval or t-test approach just discussed, but very often such formal testing can be shortcut by the "2-t" rule of thumb: if the number of degrees of freedom is 20 or more and if α is set at 0.05, the null hypothesis β2 = 0 can be rejected when the computed t value, β̂2/se(β̂2), exceeds 2 in absolute value, or, more generally, when it exceeds the critical t value for the appropriate degrees of freedom.

Choosing α, the Level of Significance. It should be clear from the discussion so far that whether we reject or do not reject the null hypothesis depends critically on α, the level of significance, that is, the probability of committing a Type I error (the probability of rejecting the true hypothesis). A Type II error is the probability of accepting the false hypothesis. For a given sample size, if we try to reduce a Type I error, a Type II error increases, and vice versa. That is, given the sample size, if we try to reduce the probability of rejecting the true hypothesis, we at the same time increase the probability of accepting the false hypothesis. So there is a tradeoff involved between these two types of errors, given the sample size.

The Exact Level of Significance: The p Value. The p value (i.e., probability value) is also known as the observed or exact level of significance, or the exact probability of committing a Type I error. More technically, the p value is defined as the lowest significance level at which a null hypothesis can be rejected. If the data do not support the null hypothesis, the |t| obtained under the null hypothesis will be "large" and therefore the p value of obtaining such a |t| value will be "small." In other words, for a given sample size, as |t| increases, the p value decreases, and one can therefore reject the null hypothesis with increasing confidence.

5.9 REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE

In this section we study regression analysis from the point of view of the analysis of variance and introduce the reader to an illuminating and complementary way of looking at the statistical inference problem. Recall from Chapter 3 that the total sum of squares can be decomposed as TSS = ESS + RSS, that is, into the explained sum of squares (ESS) and the residual sum of squares (RSS).
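As a sketch (using the standard two-variable OLS results, with lowercase x_i denoting deviations of X_i from its sample mean and û_i the OLS residuals), the decomposition and the degrees of freedom associated with each component can be written as follows:

```latex
% Decomposition of the total sum of squares and the associated degrees of freedom
\[
\underbrace{\textstyle\sum (Y_i - \bar{Y})^2}_{\text{TSS},\; n-1\ \text{df}}
  \;=\;
\underbrace{\hat{\beta}_2^{\,2} \textstyle\sum x_i^2}_{\text{ESS},\; 1\ \text{df}}
  \;+\;
\underbrace{\textstyle\sum \hat{u}_i^2}_{\text{RSS},\; n-2\ \text{df}}
\]
```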
A study of these components of TSS is known as the analysis of variance (ANOVA) from the regression viewpoint. Let us arrange the various sums of squares and their associated df in Table 5.3, which is the standard form of the AOV table, sometimes called the ANOVA table. Given the entries of Table 5.3, we now consider the following variable:

F = (ESS/1)/[RSS/(n − 2)] = β̂2²Σx_i²/σ̂²     (5.9.1)

that is, the ratio of the mean sum of squares of ESS to the mean sum of squares of RSS, where σ̂² = Σû_i²/(n − 2). If we assume that the disturbances u_i are normally distributed, which we do under the CNLRM, and if the null hypothesis (H0) is that β2 = 0, then it can be shown that the F variable of (5.9.1) follows the F distribution with 1 df in the numerator and (n − 2) df in the denominator. Therefore, the F ratio of (5.9.1) provides a test of the null hypothesis H0: β2 = 0. Since all the quantities entering into this ratio can be obtained from the available sample, it provides a test statistic to test the null hypothesis that the true β2 is zero. All that needs to be done is to compute the F ratio and compare it with the critical F value obtained from the F tables at the chosen level of significance. Thus, the t and the F tests provide us with two alternative but complementary ways of testing the null hypothesis that β2 = 0. If this is the case, why not just rely on the t test and not worry about the F test and the accompanying analysis of variance? For the two-variable model there really is no need to resort to the F test. But when we consider the topic of multiple regression we will see that the F test has several interesting applications that make it a very useful and powerful method of testing statistical hypotheses.

5.10 APPLICATION OF REGRESSION ANALYSIS: THE PROBLEM OF PREDICTION

On the basis of the sample data of Table 3.2 we obtained the sample regression (3.6.2), Ŷ_i = 24.4545 + 0.5091X_i.

Mean Prediction. To fix the ideas, assume that X0 = 100 and we want to predict E(Y | X0 = 100). Now it can be shown that the historical regression (3.6.2) provides the point estimate of this mean prediction: Ŷ0 = β̂1 + β̂2X0 = 24.4545 + 0.5091(100) = 75.3645, where Ŷ0 is the estimator of E(Y | X0). It can further be shown that Ŷ0 is normally distributed with mean β1 + β2X0 and variance σ²[1/n + (X0 − X̄)²/Σx_i²]; replacing σ² by its estimator σ̂² and denoting the resulting standard error by se(Ŷ0), the variable t = [Ŷ0 − (β1 + β2X0)]/se(Ŷ0) follows the t distribution with n − 2 df. The t distribution can therefore be used to derive confidence intervals for the true E(Y0 | X0) and test hypotheses about it in the usual manner, namely, by forming the interval Ŷ0 ± tα/2 se(Ŷ0). For our data (see Table 3.3), this yields the 95% confidence interval for the true E(Y | X0) = β1 + β2X0 given in (5.10.5). Thus, given X0 = 100, in repeated sampling, 95 out of 100 intervals like (5.10.5) will include the true mean value; the single best estimate of the true mean value is of course the point estimate 75.3645.

5.11 REPORTING THE RESULTS OF REGRESSION ANALYSIS

There are various ways of reporting the results of regression analysis, but in this text we shall use the following format, employing the consumption–income example of Chapter 3 as an illustration (a sketch of the reported equation, (5.11.1), is given below). In Eq. (5.11.1) the figures in the first set of parentheses are the estimated standard errors of the regression coefficients, the figures in the second set are estimated t values computed from (5.3.2) under the null hypothesis that the true population value of each regression coefficient individually is zero (e.g., 3.8128 = 24.4545 ÷ 6.4138), and the figures in the third set are the estimated p values. Thus, for 8 df the probability of obtaining a t value of 3.8128 or greater is 0.0026, and the probability of obtaining a t value of 14.2605 or larger is about 0.0000003. By presenting the p values of the estimated t coefficients, we can see at once the exact level of significance of each estimated t value.
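The reporting format described above would look roughly as follows for our example. This is a sketch assembled from the coefficient estimates, standard errors, t values, and p values quoted in this chapter; the slope standard error 0.0357 is inferred from the quoted t value 14.2605, and the r² figure is the approximate value cited in Section 5.12.

```latex
% Sketch of the reporting format of Eq. (5.11.1) for the consumption--income example
\begin{align*}
\widehat{Y}_i &= 24.4545 + 0.5091\,X_i \\
\text{se}     &= (6.4138)\;\;(0.0357) \\
t             &= (3.8128)\;\;(14.2605) \\
p             &= (0.0026)\;\;(0.0000003) \qquad r^2 \approx 0.96, \quad \text{df} = 8
\end{align*}
```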
Thus, under the null hypothesis that the true population intercept value is zero, the exact probability (i.e., the p value) of obtaining a t value of 3.8128 or greater is only about 0.0026. Therefore, if we reject this null hypothesis, the probability of our committing a Type I error is about 26 in 10,000, a very small probability indeed. For all practical purposes we can say that the true population intercept is different from zero. Likewise, the p value of the estimated slope coefficient is zero for all practical purposes. If the true MPC were in fact zero, our chances of obtaining an MPC as large as 0.5091 would be practically nil; hence we can reject the null hypothesis that the true MPC is zero.

5.12 EVALUATING THE RESULTS OF REGRESSION ANALYSIS

Having presented the results of the regression analysis of our consumption–income example, we would now like to question the adequacy of the fitted model. How "good" is the fitted model? We need some criteria with which to answer this question.

First, are the signs of the estimated coefficients in accordance with theoretical or prior expectations? A priori, β2, the marginal propensity to consume (MPC) in the consumption function, should be positive. In the present example it is. Second, if theory says that the relationship should be not only positive but also statistically significant, is this the case in the present application? As we discussed in Section 5.11, the MPC is not only positive but also statistically significantly different from zero; the p value of the estimated t value is extremely small. The same comments apply to the intercept coefficient. Third, how well does the regression model explain variation in consumption expenditure? One can use r² to answer this question. In the present example r² is about 0.96, which is a very high value considering that r² can be at most 1. Thus, the model we have chosen for explaining consumption expenditure behavior seems quite good. But before we sign off, we would like to find out whether our model satisfies the assumptions of the CNLRM. We will not look at the various assumptions now because the model is patently so simple. But there is one assumption that we would like to check, namely, the normality of the disturbance term, u_i. Recall that the t and F tests used before require that the error term follow the normal distribution. Otherwise, the testing procedure will not be valid in small, or finite, samples.

Normality Tests. Although several tests of normality are discussed in the literature, we will consider just three: (1) the histogram of residuals; (2) the normal probability plot (NPP), a graphical device; and (3) the Jarque–Bera test.

Histogram of Residuals. A histogram of residuals is a simple graphic device that is used to learn something about the shape of the PDF of a random variable. On the horizontal axis, we divide the values of the variable of interest (e.g., OLS residuals) into suitable intervals, and in each class interval we erect rectangles equal in height to the number of observations (i.e., frequency) in that class interval. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether the normal (PDF) approximation may be appropriate. A concrete example is given in Section 5.13 (see Figure 5.8). It is always good practice to plot the histogram of the residuals as a rough-and-ready method of testing for the normality assumption.

Normal Probability Plot. A comparatively simple graphical device for studying the shape of the PDF of a random variable is the normal probability plot (NPP): the ordered values of the variable of interest (here the OLS residuals) are plotted against the values expected from the normal distribution. Thus, if the fitted line in the NPP is approximately a straight line, one can conclude that the variable of interest is normally distributed. A sketch of how both plots can be produced is given below.
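As an illustration, here is a minimal Python sketch of both graphical checks. It is not the text's own code: the variable names are mine, the data arrays are the consumption (Y) and income (X) figures of Table 3.2 as I recall them (they reproduce the estimates 24.4545 and 0.5091), and NumPy, SciPy, Matplotlib, and statsmodels are assumed to be available.

```python
# Rough sketch: fit the two-variable consumption-income regression by OLS and
# examine the residuals with a histogram and a normal probability plot (NPP).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

# Consumption (Y) and income (X) data of Table 3.2 (as recalled; substitute the
# actual table values if they differ).
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

results = sm.OLS(y, sm.add_constant(x)).fit()
resid = results.resid  # OLS residuals

# Histogram of residuals: a rough check on the shape of their distribution.
plt.hist(resid, bins=5, edgecolor="black")
plt.title("Histogram of OLS residuals")
plt.show()

# Normal probability plot: points lying near a straight line suggest normality.
stats.probplot(resid, dist="norm", plot=plt)
plt.show()
```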
In Figure 5.7, we see that the residuals from our illustrative example are approximately normally distributed, because a straight line seems to fit the data reasonably well. MINITAB also produces the Anderson–Darling normality test, known as the A2 statistic. The underlying null hypothesis is that the variable under consideration is normally distributed. For our example, as Figure 5.7 shows, the computed A2 statistic is 0.394, and the p value of obtaining such a value of A2 is 0.305, which is reasonably high. Therefore, we do not reject the hypothesis that the residuals from our consumption–income example are normally distributed. Incidentally, Figure 5.7 also shows the parameters of the (normal) distribution: the mean is approximately 0 and the standard deviation is about 6.12.

Jarque–Bera (JB) Test of Normality. The JB test of normality is an asymptotic, or large-sample, test. It is also based on the OLS residuals. This test first computes the skewness and kurtosis measures of the OLS residuals and uses the following test statistic:

JB = n[S²/6 + (K − 3)²/24]     (5.12.1)

where n = sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed variable, S = 0 and K = 3. Therefore, the JB test of normality is a test of the joint hypothesis that S and K are 0 and 3, respectively. In that case the value of the JB statistic is expected to be 0. Under the null hypothesis that the residuals are normally distributed, Jarque and Bera showed that asymptotically (i.e., in large samples) the JB statistic given in (5.12.1) follows the chi-square distribution with 2 df. If the computed p value of the JB statistic in an application is sufficiently low, which will happen if the value of the statistic is very different from 0, one can reject the hypothesis that the residuals are normally distributed. But if the p value is reasonably high, which will happen if the value of the statistic is close to zero, we do not reject the normality assumption. The sample size in our consumption–income example is rather small; hence, strictly speaking, one should not use the JB statistic. If we mechanically apply the JB formula to our example, the JB statistic turns out to be 0.7769. The p value of obtaining such a value from the chi-square distribution with 2 df is about 0.68, which is quite high. In other words, we may not reject the normality assumption for our example. Of course, bear in mind the warning about the sample size.
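To make the mechanics of (5.12.1) concrete, here is a minimal Python sketch that computes the JB statistic and its chi-square p value from the OLS residuals. It reuses the Table 3.2 data arrays from the previous sketch (again as recalled, not quoted from the text) and assumes NumPy, SciPy, and statsmodels are available.

```python
# Jarque-Bera statistic computed directly from formula (5.12.1), using the OLS
# residuals from the two-variable consumption-income regression.
import numpy as np
from scipy import stats
import statsmodels.api as sm

y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

n = len(resid)
S = stats.skew(resid)                      # skewness coefficient
K = stats.kurtosis(resid, fisher=False)    # kurtosis coefficient (normal => 3)
jb = n * (S**2 / 6.0 + (K - 3.0)**2 / 24.0)
p_value = stats.chi2.sf(jb, df=2)          # upper-tail probability, chi-square with 2 df

print(f"JB = {jb:.4f}, p value = {p_value:.2f}")
# The text reports a JB value of about 0.7769 with a p value of about 0.68 for this example.
```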