
Chapter Five
TWO-VARIABLE REGRESSION: INTERVAL ESTIMATION AND HYPOTHESIS TESTING
5.2 INTERVAL ESTIMATION: SOME BASIC IDEAS
Now in statistics the reliability of a point estimator is measured by its standard error.
Therefore, instead of relying on the point estimate alone, we may construct an interval
around the point estimator, say within two or three standard errors on either side of the
point estimator, such that this interval has, say, 95 percent probability of including the
true parameter value. This is roughly the idea behind interval estimation. More formally, we choose α (0 < α < 1) and a δ > 0 such that
Pr(β̂2 − δ ≤ β2 ≤ β̂2 + δ) = 1 − α   (5.2.1)
It is very important to know the following aspects of interval estimation:
1. Equation (5.2.1) does not say that the probability of β2 lying between the given limits
is 1 − α. Since β2, although an unknown, is assumed to be some fixed number, either it
lies in the interval or it does not. What (5.2.1) states is that, for the method described in
this chapter, the probability of constructing an interval that contains β2 is 1 − α.
2. The interval (5.2.1) is a random interval; that is, it will vary from one sample to the next because it is based on β̂2, which is itself random.
3. Since the confidence interval is random, the probability statements attached to it
should be understood in the long-run sense, that is, repeated sampling.
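To make this long-run interpretation concrete, the following small simulation is a sketch only; the model, parameter values, and sample size are illustrative assumptions, not taken from the text. It draws many samples from a known two-variable model and records how often the 95% interval for β2 actually covers the true value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta1, beta2, sigma, n, trials = 10.0, 0.5, 2.0, 20, 10_000   # assumed values
x = np.linspace(1, 20, n)
t_crit = stats.t.ppf(0.975, df=n - 2)

covered = 0
for _ in range(trials):
    y = beta1 + beta2 * x + rng.normal(0, sigma, n)
    b2, b1 = np.polyfit(x, y, 1)                 # OLS slope and intercept
    resid = y - (b1 + b2 * x)
    sigma2_hat = resid @ resid / (n - 2)         # unbiased variance estimate
    se_b2 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
    lo, hi = b2 - t_crit * se_b2, b2 + t_crit * se_b2
    covered += lo <= beta2 <= hi

print(f"coverage: {covered / trials:.3f}")       # close to 0.95
```

Each individual interval either covers the true β2 or it does not; only the long-run frequency across repeated samples is pinned down at about 95 percent.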
5.3 CONFIDENCE INTERVALS FOR REGRESSION COEFFICIENTS β1 AND β2
Under the normality assumption, the variable
Z = (β̂2 − β2)/se(β̂2) = (β̂2 − β2)·√(Σxi²)/σ
as noted in (4.3.6), is a standardized normal variable. It therefore seems that we can use the normal distribution to make probabilistic statements about β2, provided the true population variance σ² is known. In practice, however, σ² is rarely known and is replaced by its unbiased estimator σ̂².
Therefore, instead of using the normal distribution, we can use the t distribution to establish
a confidence interval for β2 as follows:
t = (β̂2 − β2)/se(β̂2) = (β̂2 − β2)·√(Σxi²)/σ̂   (5.3.2)
It can be shown that the t variable thus defined follows the t distribution with
n − 2 df.
The probability statement
Pr[β̂2 − tα/2 se(β̂2) ≤ β2 ≤ β̂2 + tα/2 se(β̂2)] = 1 − α   (5.3.5)
provides a 100(1 − α) percent confidence interval for β2, which can be written more compactly as the 100(1 − α)% confidence interval for β2:
β̂2 ± tα/2 se(β̂2)   (5.3.6)
Arguing analogously, and using (4.3.1) and (4.3.2), we can then write
Pr[β̂1 − tα/2 se(β̂1) ≤ β1 ≤ β̂1 + tα/2 se(β̂1)] = 1 − α   (5.3.7)
or, more compactly, the 100(1 − α)% confidence interval for β1:
β̂1 ± tα/2 se(β̂1)   (5.3.8)
Notice an important feature of the confidence intervals given in (5.3.6) and (5.3.8): in both cases the width of the confidence interval is proportional to the
standard error of the estimator. That is, the larger the standard error, the larger is the
width of the confidence interval. Put differently, the larger the standard error of the
estimator, the greater is the uncertainty of estimating the true value of the unknown
parameter. Thus, the standard error of an estimator is often described as a measure
of the precision of the estimator, i.e., how precisely the estimator measures the true
population value.
For our consumption–income example, β̂2 = 0.5091, se(β̂2) = 0.0357, and df = 8, so that the 95% confidence interval for β2 is
0.4268 ≤ β2 ≤ 0.5914   (5.3.9)
The interpretation of this confidence interval is: Given the confidence coefficient of 95%,
in the long run, in 95 out of 100 cases intervals like (0.4268, 0.5914) will contain the true
β2. But, as warned earlier, we cannot say that the probability is 95 percent that the specific
interval (0.4268 to 0.5914) contains the true β2 because this interval is now fixed and no
longer random; therefore, β2 either lies in it or does not: the probability that this particular fixed interval includes the true β2 is either 1 or 0.
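As a quick check, the interval can be reproduced in a few lines of Python with scipy, a minimal sketch using the example values quoted above:

```python
from scipy import stats

b2_hat, se_b2, df = 0.5091, 0.0357, 8        # consumption–income example values
t_crit = stats.t.ppf(0.975, df)              # two-tailed 5% critical value, ~2.306
lo, hi = b2_hat - t_crit * se_b2, b2_hat + t_crit * se_b2
print(f"95% CI for beta2: ({lo:.4f}, {hi:.4f})")   # (0.4268, 0.5914)
```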
Confidence Interval for β1
Following (5.3.7), and using β̂1 = 24.4545 and se(β̂1) = 6.4138 (reported in Section 5.11), the reader can easily verify that the 95% confidence interval for β1 of our consumption–income example is
9.6643 ≤ β1 ≤ 39.2448   (5.3.11)
Again you should be careful in interpreting this confidence interval. In the long run, in 95 out of 100 cases intervals like (5.3.11) will contain the true β1; the probability that this particular fixed interval includes the true β1 is either 1 or 0.
5.5 HYPOTHESIS TESTING: GENERAL COMMENTS
The problem of statistical hypothesis testing may be stated simply as follows:
Is a given observation or finding compatible with some stated hypothesis or not? The word
“compatible,” as used here, means “sufficiently” close to the hypothesized value so that
we do not reject the stated hypothesis.
In the language of statistics, the stated hypothesis is known as the null hypothesis and
is denoted by the symbol H0. The null hypothesis is usually tested against an alternative hypothesis (also known as the maintained hypothesis), denoted by H1, which may be simple or composite.
5.6 HYPOTHESIS TESTING:
THE CONFIDENCE-INTERVAL APPROACH
Two-Sided or Two-Tail Test
Suppose we postulate H0: β2 = 0.3 and H1: β2 ≠ 0.3; that is, the true MPC is 0.3 under the null hypothesis, but it is less than or greater than 0.3 under the alternative hypothesis. The null hypothesis is a simple hypothesis, whereas the alternative hypothesis is composite. The decision rule under the confidence-interval approach is to construct a 100(1 − α)% confidence interval for β2; if β2 under H0 falls within this interval, do not reject H0, but if it falls outside the interval, reject H0.
In statistics, when we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say that our
finding is not statistically significant.
One-Sided or One-Tail Test
Sometimes we have a strong a priori or theoretical expectation (or expectations based
on some previous empirical work) that the alternative hypothesis is one-sided or
unidirectional rather than two-sided, as just discussed.
Thus, for our consumption–income example, one could postulate that
H0: β2 ≤ 0.3
and
H1: β2 > 0.3
Perhaps economic theory or prior empirical work suggests that the marginal
propensity to consume is greater than 0.3.
5.7 HYPOTHESIS TESTING:
THE TEST-OF-SIGNIFICANCE APPROACH
Testing the Significance of Regression
The key idea behind tests of significance is that of a test statistic (estimator) and the
sampling distribution of such a statistic under the null hypothesis. The decision to
accept or reject H0 is made on the basis of the value of the test statistic obtained from
the data at hand.
As an illustration, recall that under the normality assumption the variable
t = (β̂2 − β2)/se(β̂2)   (5.3.2)
follows the t distribution with n − 2 df. If the value of true β2 is specified under the null
hypothesis, the t value of (5.3.2) can readily be computed from the available sample, and
therefore it can serve as a test statistic. And since this test statistic follows the t
distribution, confidence-interval statements such as the following can be made:
Pr[−tα/2 ≤ (β̂2 − β*2)/se(β̂2) ≤ tα/2] = 1 − α   (5.7.1)
where β*2 is the value of β2 under H0 and where −tα/2 and tα/2 are the values of t (the
critical t values) obtained from the t table for (α/2) level of significance and n − 2 df
[cf. (5.3.4)]. The t table is given in Appendix D.
Rearranging (5.7.1), we obtain
Pr[β*2 − tα/2 se(β̂2) ≤ β̂2 ≤ β*2 + tα/2 se(β̂2)] = 1 − α   (5.7.2)
In practice, there is no need to estimate (5.7.2) explicitly. One can compute the t value in
the middle of the double inequality given by (5.7.1) and see whether it lies between the
critical t values or outside them. For our consumption–income example, with H0: β2 = 0.3, the computed t value is (0.5091 − 0.3)/0.0357 ≈ 5.86, which lies well outside the critical t values of ±2.306 for 8 df.
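A minimal sketch of this test-of-significance computation, using the example values above:

```python
from scipy import stats

b2_hat, se_b2, df = 0.5091, 0.0357, 8         # example values as above
b2_star = 0.3                                 # value of beta2 under H0
t_stat = (b2_hat - b2_star) / se_b2           # ~5.86
t_crit = stats.t.ppf(0.975, df)               # ~2.306
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```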
Therefore, a “large” |t| value will be evidence against the null hypothesis. Of course, we
can always use the t table to determine whether a particular t value is large or small; the
answer, as we know, depends on the degrees of freedom as well as on the probability that
we are willing to accept. If you take a look at the t table given in Appendix D, you will
observe that for any given value of df the probability of obtaining an increasingly large |t|
value becomes progressively smaller. Thus, for 20 df the probability of obtaining
a |t| value of 1.725 or greater is 0.10 or 10 percent, but for the same df the probability of
obtaining a |t| value of 3.552 or greater is only 0.002 or 0.2 percent.
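These tail areas are easy to confirm without the printed table; for example, with scipy:

```python
from scipy import stats

# two-tailed probabilities P(|t| >= c) for 20 df
print(2 * stats.t.sf(1.725, df=20))   # ~0.10
print(2 * stats.t.sf(3.552, df=20))   # ~0.002
```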
Since we use the t distribution, the preceding testing procedure is called appropriately
the t test. In the language of significance tests, a statistic is said to be statistically
significant if the value of the test statistic lies in the critical region. In this case the null
hypothesis is rejected. By the same token, a test is said to be statistically insignificant if the
value of the test statistic lies in the acceptance region. In this situation, the null hypothesis
is not rejected. In our example, the t test is significant and hence we reject the null
hypothesis.
5.8 HYPOTHESIS TESTING: SOME PRACTICAL ASPECTS
The Meaning of “Accepting” or “Rejecting” a Hypothesis
If on the basis of a test of significance, say, the t test, we decide to “accept” the null
hypothesis, all we are saying is that on the basis of the sample evidence we have no
reason to reject it; we are not saying that the null hypothesis is true beyond any doubt.
The “Zero” Null Hypothesis and the “2-t” Rule of Thumb
A null hypothesis that is commonly tested in empirical work is H0: β2 = 0, that is, the slope
coefficient is zero. This “zero” null hypothesis is a kind of straw man, the objective being to
find out whether Y is related at all to X, the explanatory variable. If there is no relationship
between Y and X to begin with, then testing a hypothesis such as β2 = 0.3 or any other
value is meaningless.
The "2-t" rule of thumb may be stated as follows: if the number of degrees of freedom is 20 or more and if α is set at 0.05, then the null hypothesis β2 = 0 can be rejected when the computed t value, β̂2/se(β̂2), exceeds 2 in absolute value, that is, when |t| > 2 for the appropriate degrees of freedom.
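The rule works because the two-tailed 5 percent critical value of t settles near 2 once the degrees of freedom reach about 20, as a quick check shows:

```python
from scipy import stats

for df in (10, 20, 60, 120):
    print(df, round(stats.t.ppf(0.975, df), 3))   # 2.228, 2.086, 2.0, 1.98
```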
Choosing α, the Level of Significance
It should be clear from the discussion so far that whether we reject or do not reject
the null hypothesis depends critically on α, the level of significance or the probability
of committing a Type I error, that is, the probability of rejecting a true hypothesis. A Type II error is the probability of accepting a false hypothesis. For a given sample size, if we try to reduce a Type I error, a Type II error increases, and vice versa: reducing the probability of rejecting a true hypothesis raises, at the same time, the probability of accepting a false hypothesis. There is thus a tradeoff between the two types of errors for a given sample size.
The Exact Level of Significance: The p Value
The p value (i.e., probability value) is also known as the observed or exact level of significance, or the exact probability of committing a Type I error. More technically, the p value is defined as the lowest significance level at which a null hypothesis can be rejected.
If the data do not support the null hypothesis, |t| obtained under the null hypothesis will
be “large” and therefore the p value of obtaining such a |t| value will be “small.” In other
words, for a given sample size, as |t| increases, the p value decreases, and one can
therefore reject the null hypothesis with increasing confidence.
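Computationally, a two-tailed p value is just twice the upper-tail area beyond the observed |t|; a sketch with the example t value from Section 5.7:

```python
from scipy import stats

t_stat, df = 5.86, 8                        # example values from Section 5.7
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p value
print(f"p = {p_value:.6f}")                 # very small, so H0 is rejected
```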
5.9 REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE
In this section we study regression analysis from the point of view of the analysis of
variance and introduce the reader to an illuminating and complementary way of looking at
the statistical inference problem.
Recall the identity TSS = ESS + RSS developed in Chapter 3, which decomposes the total sum of squares (TSS) into two components: the explained sum of squares (ESS) and the residual sum of squares (RSS). A study of these components of TSS is known as analysis of variance (ANOVA) from the regression viewpoint.
Let us arrange the various sums of squares and their associated df in Table 5.3, which is the
standard form of the AOV table, sometimes called the ANOVA table. Given the entries of
Table 5.3, we now consider the following variable:
F = (ESS/df of ESS)/(RSS/df of RSS) = β̂2² Σxi²/σ̂²   (5.9.1)
If we assume that the disturbances ui are normally distributed, which we do under the
CNLRM, and if the null hypothesis (H0) is that β2 = 0, then it can be shown that the F
variable of (5.9.1) follows the F distribution with 1 df in the numerator and (n − 2) df in the
denominator.
Therefore, the F ratio of (5.9.1) provides a test of the null hypothesis H0: β2 = 0. Since all
the quantities entering into this equation can be obtained from the available sample, this F
ratio provides a test statistic to test the null hypothesis that true β2 is zero. All that needs
to be done is to compute the F ratio and compare it with the critical F value obtained from
the F tables at the chosen level of significance, or to obtain the p value of the computed F statistic.
Thus, the t and the F tests provide us with two alternative but complementary ways of
testing the null hypothesis that β2 = 0. If this is the case, why not just rely on the t test and
not worry about the F test and the accompanying analysis of variance? For the two-variable model there really is no need to resort to the F test. But when we consider the
topic of multiple regression we will see that the F test has several interesting applications
that make it a very useful and powerful method of testing statistical hypotheses.
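For the two-variable model the equivalence is exact: the F statistic of (5.9.1) for H0: β2 = 0 is the square of the corresponding t statistic, and the two p values coincide. A sketch using the slope t value and the 8 df reported in Section 5.11 (so n = 10):

```python
from scipy import stats

t_stat, n = 14.2605, 10            # slope t value under H0: beta2 = 0 (Section 5.11)
F = t_stat ** 2                    # in the two-variable model, F = t^2 (~203.4)
p_f = stats.f.sf(F, 1, n - 2)      # upper-tail area of F(1, n - 2)
p_t = 2 * stats.t.sf(t_stat, n - 2)
print(F, p_f, p_t)                 # p_f equals the two-tailed t-test p value
```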
5.10 APPLICATION OF REGRESSION ANALYSIS:
THE PROBLEM OF PREDICTION
On the basis of the sample data of Table 3.2 we obtained the following sample regression:
Ŷi = 24.4545 + 0.5091Xi   (3.6.2)
Mean Prediction
To fix the ideas, assume that X0 = 100 and we want to predict E(Y | X0 = 100).
Now it can be shown that the historical regression (3.6.2) provides the point estimate
of this mean prediction as follows:
Ŷ0 = β̂1 + β̂2X0 = 24.4545 + 0.5091(100) = 75.3645
It can be shown that the variable t = [Ŷ0 − (β1 + β2X0)]/se(Ŷ0) follows the t distribution with n − 2 df. The t distribution can therefore be used to derive confidence intervals for the true E(Y0 | X0) and to test hypotheses about it in the usual manner.
For our data (see Table 3.3), the variance of Ŷ0 is obtained from var(Ŷ0) = σ̂²[1/n + (X0 − X̄)²/Σxi²]. Therefore, the 95% confidence interval for the true E(Y | X0) = β1 + β2X0 is given by
Ŷ0 ± tα/2 se(Ŷ0)   (5.10.5)
Thus, given X0 = 100, in repeated sampling, 95 out of 100 intervals like (5.10.5) will include
the true mean value; the single best estimate of the true mean value is of course the point
estimate 75.3645.
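A sketch of the whole computation follows; the data moments σ̂², X̄, and Σxi² below are illustrative assumptions, not figures quoted in this section:

```python
import numpy as np
from scipy import stats

y0_hat, X0, n = 75.3645, 100.0, 10                  # point estimate from (3.6.2)
sigma2_hat, x_bar, sum_x2 = 42.16, 170.0, 33000.0   # assumed data moments

# var(Y0_hat) = sigma2_hat * (1/n + (X0 - x_bar)^2 / sum of x_i^2)
se_y0 = np.sqrt(sigma2_hat * (1 / n + (X0 - x_bar) ** 2 / sum_x2))
t_crit = stats.t.ppf(0.975, n - 2)
print(y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0)
```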
5.11 REPORTING THE RESULTS OF REGRESSION ANALYSIS
There are various ways of reporting the results of regression analysis, but in this text we
shall use the following format, employing the consumption–income example of Chapter 3 as
an illustration:

Ŷi = 24.4545 + 0.5091Xi
      se = (6.4138)  (0.0357)        r² = 0.9621
      t  = (3.8128)  (14.2605)       df = 8
      p  = (0.0026)  (0.0000003)                  (5.11.1)
In Eq. (5.11.1) the figures in the first set of parentheses are the estimated standard
errors of the regression coefficients, the figures in the second set are estimated t values
computed from (5.3.2) under the null hypothesis that the true population value of each
regression coefficient individually is zero (e.g., 3.8128 = 24.4545 ÷ 6.4138), and the figures
in the third set are the estimated p values. Thus, for 8 df the probability of obtaining a t
value of 3.8128 or greater is 0.0026 and the probability of obtaining a t value of 14.2605 or
larger is about 0.0000003.
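Both tail probabilities can be reproduced directly:

```python
from scipy import stats

print(stats.t.sf(3.8128, df=8))    # ~0.0026   (intercept t value)
print(stats.t.sf(14.2605, df=8))   # ~3e-07    (slope t value)
```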
By presenting the p values of the estimated t coefficients, we can see at once the exact
level of significance of each estimated t value. Thus, under the null hypothesis that the
true population intercept value is zero, the exact probability (i.e., the p value) of
obtaining a t value of 3.8128 or greater is only about 0.0026. Therefore, if we reject this
null hypothesis, the probability of our committing a Type I error is about 26 in 10,000, a
very small probability indeed. For all practical purposes we can say that the true
population intercept is different from zero. Likewise, the p value of the estimated slope
coefficient is zero for all practical purposes. If the true MPC were in fact zero, our chances of obtaining a value of 0.5091 in a sample of this size would be practically nil; hence we can confidently reject the null hypothesis that the true MPC is zero.
5.12 EVALUATING THE RESULTS OF REGRESSION ANALYSIS
Having presented the results of our consumption–income regression in (5.11.1), we would now like to question the adequacy of the fitted model. How "good" is the fitted model? We need some criteria with which to answer this question.
First, are the signs of the estimated coefficients in accordance with theoretical or prior
expectations? A priori, β2, the marginal propensity to consume (MPC) in the consumption
function, should be positive. In the present example it is. Second, if theory says that the
relationship should be not only positive but also statistically significant, is this the case in
the present application?
As we discussed in Section 5.11, the MPC is not only positive but also statistically
significantly different from zero; the p value of the estimated t value is extremely small.
The same comments apply about the intercept coefficient. Third, how well does the
regression model explain variation in the consumption expenditure? One can use r² to answer this question. In the present example r² is about 0.96, which is a very high value considering that r² can be at most 1.
Thus, the model we have chosen for explaining consumption expenditure behavior seems
quite good. But before we sign off, we would like to find out whether our model satisfies
the assumptions of CNLRM. We will not look at the various assumptions now because the
model is patently so simple. But there is one assumption that we would like to check,
namely, the normality of the disturbance term, ui. Recall that the t and F tests used before
require that the error term follow the normal distribution. Otherwise, the testing
procedure will not be valid in small, or finite, samples.
Normality Tests
Although several tests of normality are discussed in the literature, we will consider
just three: (1) histogram of residuals; (2) normal probability plot (NPP), a graphical
device; and (3) the Jarque–Bera test.
Histogram of Residuals. A histogram of residuals is a simple graphic device that is used to
learn something about the shape of the PDF of a random variable. On the horizontal axis,
we divide the values of the variable of interest (e.g., OLS residuals) into suitable intervals,
and in each class interval we erect rectangles equal in height to the number of
observations (i.e., frequency) in that class interval. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether a normal (PDF) approximation may be appropriate. A concrete example is given in Section
5.13 (see Figure 5.8). It is always a good practice to plot the
histogram of the residuals as a rough and ready method of testing for the normality
assumption.
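A minimal sketch of such a residual histogram; the residuals here are placeholder normal draws so the snippet runs on its own, not the actual OLS residuals of the example:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
resid = rng.normal(0, 6.12, size=200)   # stand-in for OLS residuals

plt.hist(resid, bins=15, edgecolor="black")
plt.xlabel("OLS residual")
plt.ylabel("Frequency")
plt.title("Histogram of residuals")
plt.show()
```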
Normal Probability Plot. Another comparatively simple graphical device is the normal probability plot (NPP), in which the ordered values of the variable of interest are plotted against the quantiles expected under normality; if the variable is in fact normally distributed, the plotted points lie approximately on a straight line.
As noted earlier, if the fitted line in the NPP is approximately a straight line, one can
conclude that the variable of interest is normally distributed.
In Figure 5.7, we see that residuals from our illustrative example are approximately
normally distributed, because a straight line seems to fit the data reasonably well.
MINITAB also produces the Anderson–Darling normality test, known as the A² statistic. The underlying null hypothesis is that the variable under consideration is normally distributed. As Figure 5.7 shows, for our example the computed A² statistic is 0.394. The p value of obtaining such a value of A² is 0.305, which is reasonably high. Therefore, we
do not reject the hypothesis that the residuals from our consumption–income
example are normally distributed. Incidentally, Figure 5.7 shows the parameters of
the (normal) distribution, the mean is approximately 0 and the standard deviation
is about 6.12.
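An NPP and an Anderson–Darling statistic can be produced with scipy as well, again with placeholder residuals; note that scipy reports critical values for A² rather than a p value:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
resid = rng.normal(0, 6.12, size=200)          # stand-in for OLS residuals

stats.probplot(resid, dist="norm", plot=plt)   # normal probability plot
plt.show()

ad = stats.anderson(resid, dist="norm")        # Anderson–Darling test
print(ad.statistic, ad.critical_values)        # compare A^2 with critical values
```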
Jarque–Bera (JB) Test of Normality. The JB test of normality is an asymptotic, or large-sample, test. It is also based on the OLS residuals. This test first computes the skewness and kurtosis measures of the OLS residuals and uses the following test statistic:
JB = n[S²/6 + (K − 3)²/24]   (5.12.1)
where n = sample size, S = skewness coefficient, and K = kurtosis coefficient.
For a normally distributed variable, S = 0 and K = 3. Therefore, the JB test of normality is a
test of the joint hypothesis that S and K are 0 and 3, respectively. In that case the value of
the JB statistic is expected to be 0.
Under the null hypothesis that the residuals are normally distributed, Jarque and Bera
showed that asymptotically (i.e., in large samples) the JB statistic given in (5.12.1) follows
the chi-square distribution with 2 df. If the computed p value of the JB statistic in an
application is sufficiently low, which will happen if the value of the statistic is very different
from 0, one can reject the hypothesis that the residuals are normally distributed. But if the p
value is reasonably high, which will happen if the value of the statistic is close to zero, we do
not reject the normality assumption.
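A sketch of the JB computation on placeholder residuals, both by the formula (5.12.1) and with scipy's built-in version:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
resid = rng.normal(size=500)              # stand-in for OLS residuals

n = len(resid)
S = stats.skew(resid)                     # skewness coefficient
K = stats.kurtosis(resid, fisher=False)   # raw kurtosis (3 for a normal)
JB = n * (S**2 / 6 + (K - 3) ** 2 / 24)   # Eq. (5.12.1)
p = stats.chi2.sf(JB, df=2)               # chi-square tail with 2 df

jb_stat, jb_p = stats.jarque_bera(resid)  # scipy's built-in JB test
print(JB, p, jb_stat, jb_p)               # the two versions agree
```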
The sample size in our consumption–income example is rather small.
Hence, strictly speaking one should not use the JB statistic. If we mechanically apply the JB
formula to our example, the JB statistic turns out to be 0.7769. The p value of obtaining such
a value from the chi-square distribution with 2 df is about 0.68, which is quite high. In other
words, we may not reject the normality assumption for our example. Of course, bear in
mind the warning about the sample size.