3. Inference in the simple regression
This section contains the following subsections:
(3.1) Simple t-tests
(3.2) Confidence intervals for the regression coefficient $\beta_1$
(3.3) The P-value
(3.4) Least squares prediction
Sometimes economic theory asserts that the regression coefficients should take specific values. For example, Friedman's "Permanent Income Hypothesis" puts forward the hypothesis that the intercept term $\beta_0 = 0$. If our model is a macro consumption function, the slope parameter $\beta_1$ is the marginal propensity to consume out of income. Thus, macro economists might be interested in specific hypotheses regarding $\beta_1$. But how should we proceed to test specific hypotheses on $\beta_0$ and $\beta_1$?
3.1 Simple t-tests
In this subsection we shall describe procedures for testing hypotheses on $\beta_1$. Once we have learned these procedures, we will understand that a similar approach can be used to test hypotheses on the intercept parameter.
A test of $\beta_1$ will naturally take the sampling distribution of the estimator $\hat{\beta}_1$ as a starting point. We know already that in this distribution the expectation $E(\hat{\beta}_1) = \beta_1$, and that the variance $Var(\hat{\beta}_1)$ is given by (2.4.4). From the expression (2.2.6) for the estimator $\hat{\beta}_1$ we understand that the distribution of $\hat{\beta}_1$ is, for given $X_1, X_2, \ldots, X_N$, determined by the distribution of the dependent variables $Y_1, Y_2, \ldots, Y_N$. Hence the distribution of $\hat{\beta}_1$ will be unknown until we specify a particular distribution of $Y_1, Y_2, \ldots, Y_N$.
Everything will be put in order when we specify the distribution of the random disturbances $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$. Hence, we supplement the assumptions (2.1.5)-(2.1.7) with a fourth one, namely:

(3.1.1)  $\varepsilon_i$ is $N(0, \sigma^2)$ and independently, identically distributed for all $i$.
Since $Y_i$ is a linear function of $\varepsilon_i$, it follows that:

(3.1.2)  $Y_i$ is $N(\beta_0 + \beta_1 X_i, \sigma^2)$
The independence of the $\varepsilon$'s implies the independence of the $Y$'s. Since the $X$'s are deterministic, all random variation of $Y_i$ is determined by the randomness of $\varepsilon_i$. We also observe that the OLS estimator $\hat{\beta}_1$ is a linear function of $Y_1, Y_2, \ldots, Y_N$.
We can write:

(3.1.3)  $\hat{\beta}_1 = \dfrac{1}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\left\{(X_1 - \bar{X})Y_1 + (X_2 - \bar{X})Y_2 + \ldots + (X_N - \bar{X})Y_N\right\} = \sum_{i=1}^{N} c_i Y_i$
where

(3.1.4)  $c_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{N}(X_i - \bar{X})^2}$
Since ˆ1 is a linear function of Y1 , Y2 ,...., YN , and all the Y ' s are normally
distributed, it follows from standard results in statistics that the estimator ˆ1 is
normally distributed. We write:
(3.1.5)
ˆ1 is N ( 1 ,  2
N
(X
i 1
i
 X )2 )
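The result (3.1.5) is easy to check by simulation. The following sketch is not from the text; the parameter values (the "true" $\beta_0$, $\beta_1$, $\sigma$ and the grid of $X$ values) are arbitrary choices for illustration only:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 25
    beta0, beta1, sigma = 2.0, 0.5, 1.0       # illustrative "true" parameter values
    X = np.linspace(1, 10, N)                 # fixed (deterministic) regressor values

    # Draw many samples and compute the OLS slope in each, as in (3.1.3)-(3.1.4)
    reps = 10_000
    c = (X - X.mean()) / np.sum((X - X.mean()) ** 2)   # the weights c_i of (3.1.4)
    slopes = np.empty(reps)
    for r in range(reps):
        Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=N)   # model (3.1.2)
        slopes[r] = np.sum(c * Y)                                # beta1_hat = sum c_i Y_i

    # Compare with the theoretical distribution (3.1.5)
    theo_sd = sigma / np.sqrt(np.sum((X - X.mean()) ** 2))
    print(f"simulated mean {slopes.mean():.4f}  (theory: {beta1})")
    print(f"simulated sd   {slopes.std():.4f}  (theory: {theo_sd:.4f})")

The simulated mean and standard deviation of the slope estimates should agree closely with the values predicted by (3.1.5).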
In order to get a better "feeling for the problem", let us assume that we wish to test a null hypothesis $H_0: \beta_1 = 0$, given a certain data set. Suppose that if the calculated estimate of $\beta_1$ happens to be situated far from 0, then we reject $H_0$. However, we realize that we cannot be certain that this is the correct decision, since $\hat{\beta}_1$ is a random variable. How should we approach this problem so that the decision to reject or not reject the hypothesis can be given a sound footing? It follows from (3.1.5) that when $H_0$ is correct the distribution of the estimator $\hat{\beta}_1$ is given by:
(3.1.6)  $\hat{\beta}_1$ is $N\!\left(0, \dfrac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
Hence, even if our estimate is situated far from 0, it is not fully contradicted by the distribution of $\hat{\beta}_1$ under $H_0$. We understand that, whatever decision we take regarding rejecting or not rejecting, we run a certain risk that our decision will be wrong: we might reject the null hypothesis when it is in fact true, or we might not reject it when it is in fact false. In the literature one talks about making errors of Type I (the first situation) or of Type II (the second situation). To solve this dilemma the following proposal is reasonable. Since we specify a given value of $\beta_1$ under $H_0$, we know the distribution of $\hat{\beta}_1$ when $H_0$ is true. Utilizing this distribution we can compute the exact probability of committing a Type I error. In the literature this probability is called the level of significance of the test. It is so important that it deserves a separate definition.
Definition 3.1. The level of significance of a test is the probability we accept for rejecting $H_0$ when $H_0$ is true.

Usually the level of significance ($\alpha$) is chosen to be a small number, typically $\alpha = 0.05$, $\alpha = 0.025$, $\alpha = 0.01$, etc.
What remains before we can actually carry out a test of a particular hypothesis about $\beta_1$ is a few more technical calculations. Suppose we wish to test the null hypothesis $H_0: \beta_1 = \beta_1^0$, where $\beta_1^0$ is a specified number. Assuming that $H_0$ is true, we know that:

(3.1.7)  $\hat{\beta}_1$ is $N\!\left(\beta_1^0, \dfrac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
From this fact it follows immediately that:

(3.1.8)  $Z = \dfrac{\hat{\beta}_1 - \beta_1^0}{\sqrt{\sigma^2 \big/ \sum_{i=1}^{N}(X_i - \bar{X})^2}} = \dfrac{\hat{\beta}_1 - \beta_1^0}{Std(\hat{\beta}_1)}$ is $N(0,1)$
Hence, the normalized variable $Z$ has a standard normal distribution with mean 0 and standard deviation 1. But to be used as a test statistic, $Z$ has a serious drawback, since it depends on the variance $\sigma^2$, which is unknown to us. But we are almost home. We know that the estimator $\hat{\sigma}^2$ given by (2.4.7) is an unbiased estimator of $\sigma^2$. It is very tempting to substitute this estimator for $\sigma^2$ in (3.1.8). Hence, we have:
(3.1.9)  $T = \dfrac{\hat{\beta}_1 - \beta_1^0}{\sqrt{\hat{\sigma}^2 \big/ \sum_{i=1}^{N}(X_i - \bar{X})^2}} = \dfrac{\hat{\beta}_1 - \beta_1^0}{\widehat{Std}(\hat{\beta}_1)}$
where $\widehat{Std}(\hat{\beta}_1)$ denotes the estimated standard deviation of $\hat{\beta}_1$. Luckily, we also know the distribution of the random variable $T$: it is Student's t-distribution with $N-2$ degrees of freedom, a distribution closely related to the $N(0,1)$ distribution. It is uni-modal and symmetric about 0. The mean is $E(T) = 0$ and the variance is $Var(T) = \nu/(\nu - 2)$, where $\nu = N - 2$ is the number of degrees of freedom, so that $Var(T) \to 1$ when $N$ increases.
Figure (3.1.1)
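To make the construction concrete, here is a minimal sketch of how $\hat{\sigma}^2$, $\widehat{Std}(\hat{\beta}_1)$ and the statistic $T$ of (3.1.9) might be computed. The data are simulated (all numbers are illustrative, not from the text), and we assume, as is standard, that the estimator (2.4.7) is the residual sum of squares divided by $N - 2$:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 25
    X = np.linspace(1, 10, N)
    Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)   # one simulated sample

    # OLS estimates of slope and intercept
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * Y) / Sxx
    b0 = Y.mean() - b1 * X.mean()

    # Unbiased variance estimator (assumed form of (2.4.7)): residual SS over N - 2
    resid = Y - (b0 + b1 * X)
    sigma2_hat = np.sum(resid ** 2) / (N - 2)

    # Estimated std. deviation of b1 and the t-statistic (3.1.9) for H0: beta1 = beta1_0
    se_b1 = np.sqrt(sigma2_hat / Sxx)
    beta1_0 = 0.5
    T = (b1 - beta1_0) / se_b1
    print(f"b1 = {b1:.4f}, se = {se_b1:.4f}, T = {T:.4f}")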
When we test the null hypothesis $H_0: \beta_1 = \beta_1^0$, it is reasonable to test $H_0$ against one of the three alternative hypotheses specified below, that is to say:

(3.1.10)
$H_0:\; \beta_1 = \beta_1^0$
$H_A^1:\; \beta_1 > \beta_1^0$
$H_A^2:\; \beta_1 < \beta_1^0$
$H_A^3:\; \beta_1 \neq \beta_1^0$
Let us describe the procedure to follow when we wish to test $H_0$ against $H_A^1$ and we have chosen a level of significance $\alpha$. In this case we doubt the truth of $H_0$ when the estimate $\hat{\beta}_1$ is considerably larger than $\beta_1^0$. The larger the difference between $\hat{\beta}_1$ and $\beta_1^0$, the more we doubt the truth of $H_0$. The chosen level of significance of the test, $\alpha$, helps us to determine a threshold or critical value of the test statistic $T$. If the observed value of the test statistic, $T = (\hat{\beta}_1 - \beta_1^0)/\widehat{Std}(\hat{\beta}_1)$, is larger than the threshold value, the null hypothesis is rejected.
Since the test statistic $T$ is t-distributed under $H_0$, the critical value $t_c$ is determined from the equation:

(3.1.11)  $P(T > t_c) = \alpha$

where $\alpha$ is the chosen level of significance of the test.
Decision rule: Reject $H_0$ when the estimated value $\hat{T}$ of the test statistic (3.1.9) falls in the interval to the right of the critical value $t_c$, that is, if $\hat{T} > t_c$; if $\hat{T} \leq t_c$, do not reject $H_0$.

Figure (3.1.2)

Put in formal terms: if $\hat{T} \in (t_c, \infty)$ reject $H_0$; if $\hat{T} \in (-\infty, t_c)$ do not reject $H_0$.
If we wish to test our null hypothesis against $H_A^2$ with a level of significance $\alpha$, the procedure is similar. In this case we doubt the truth of $H_0$ when the estimate $\hat{\beta}_1$ is considerably smaller than $\beta_1^0$.

Decision rule: Reject $H_0$ when the estimate $\hat{T} \in (-\infty, -t_c)$. Do not reject when $\hat{T} \in (-t_c, \infty)$.
Figure (3.1.3)
The procedure for testing $H_0: \beta_1 = \beta_1^0$ against $H_A^3: \beta_1 \neq \beta_1^0$ with level of significance $\alpha$ follows the same guidelines. In this case we are doubtful about the null hypothesis when the estimate $\hat{\beta}_1$ is either much larger or much smaller than $\beta_1^0$. This indicates that one should place the rejection regions in both tails of the t-distribution. The critical value $t_{c/2}$ is determined from the equation:

(3.1.12)  $P(|T| > t_{c/2}) = \alpha$
Decision rule: Reject $H_0$ when the estimate $\hat{T}$ satisfies $|\hat{T}| > t_{c/2}$, i.e. when $\hat{T} \in (-\infty, -t_{c/2})$ or $\hat{T} \in (t_{c/2}, \infty)$. Do not reject $H_0$ when $\hat{T} \in (-t_{c/2}, t_{c/2})$.

The decision rule is illustrated below:

Figure (3.1.4)
When we test a null hypothesis against a one-sided alternative, we talk about a one-sided test. Figures (3.1.2)-(3.1.3) illustrate such tests. In the same way, Figure (3.1.4) illustrates a two-sided test. When we test the null hypothesis $H_0: \beta_1 = \beta_1^0$ against the two-sided alternative hypothesis $H_A^3: \beta_1 \neq \beta_1^0$, we have seen that the decision rule is to reject $H_0$ when the test statistic takes a value in either of the two tails of the t-distribution.
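These decision rules are easy to mechanize. The sketch below is illustrative only: the value of $\hat{T}$ and the sample size are made up, and the critical values are obtained from scipy, which is assumed to be available:

    from scipy import stats

    N = 25
    T_hat = 2.31          # illustrative computed value of the test statistic (3.1.9)
    alpha = 0.05
    df = N - 2

    t_c = stats.t.ppf(1 - alpha, df)        # one-sided critical value: P(T > t_c) = alpha
    t_c2 = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value: P(|T| > t_c2) = alpha

    for name, reject in [("H_A^1 (beta1 > beta1_0)", T_hat > t_c),
                         ("H_A^2 (beta1 < beta1_0)", T_hat < -t_c),
                         ("H_A^3 (beta1 != beta1_0)", abs(T_hat) > t_c2)]:
        print(f"against {name}: {'reject H0' if reject else 'do not reject H0'}")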
3.2 Confidence intervals for the regression coefficient $\beta_1$
Closely related to a two-sided test with level of significance $\alpha$ is the construction of a confidence interval for $\beta_1$ with confidence coefficient $(1 - \alpha)$. Many textbooks seem to prefer the idea of constructing a confidence interval. The procedure is simple.
We know that:
(3.2.1)  $T = \dfrac{\hat{\beta}_1 - \beta_1}{\widehat{Std}(\hat{\beta}_1)}$ is t-distributed with $N - 2$ degrees of freedom.
When we construct a symmetric interval with confidence coefficient $(1 - \alpha)$, we first determine the critical value $t_{c/2}$ in the way we have explained above. The probability of the event $\{-t_{c/2} \leq T \leq t_{c/2}\}$ is obviously $(1 - \alpha)$.
Formally, we put:

(3.2.2)  $P\!\left(-t_{c/2} \leq \dfrac{\hat{\beta}_1 - \beta_1}{\widehat{Std}(\hat{\beta}_1)} \leq t_{c/2}\right) = 1 - \alpha$

Then we can solve the two inequalities with respect to $\beta_1$, attaining:

(3.2.3)  $P\!\left[\hat{\beta}_1 - t_{c/2}\,\widehat{Std}(\hat{\beta}_1) \leq \beta_1 \leq \hat{\beta}_1 + t_{c/2}\,\widehat{Std}(\hat{\beta}_1)\right] = 1 - \alpha$
The interval (3.2.3) is equivalent to the interval appearing in (3.2.2). The upper and lower limits are random, since the OLS estimator $\hat{\beta}_1$ and $\widehat{Std}(\hat{\beta}_1)$ are random variables. Since (3.2.3) follows directly from (3.2.2), one can interpret it by saying that the interval has a $(1 - \alpha)$ probability of covering the true value $\beta_1$.

When the estimates $\hat{\beta}_1$ and $\widehat{Std}(\hat{\beta}_1)$ have been obtained, the two limits appearing in (3.2.3) are simply given numbers. Although it may be tempting, it would be incorrect to say that the unknown parameter $\beta_1$ is situated between these two limits with probability $(1 - \alpha)$. The point is that there are no random variables involved when we consider the estimated limits. The statement "the unknown slope parameter $\beta_1$ is contained in the estimated interval" is either true (i.e. $\beta_1$ is contained in the interval) or not true (i.e. $\beta_1$ is not contained in the interval). So when the sample of observations is given, we calculate the two bounds of the interval and present the result in the form:

"A $100(1 - \alpha)\%$ confidence interval for $\beta_1$ computed from the observed sample is $(\hat{L}, \hat{U})$, where $\hat{L}$ and $\hat{U}$ denote the two bounds of the interval."
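As an illustration, an interval of the form (3.2.3) might be computed as follows. This is a minimal sketch; the estimates b1 and se_b1 are assumed to come from an OLS fit such as the one in the earlier snippet, and the numbers are made up:

    from scipy import stats

    N = 25
    b1, se_b1 = 0.47, 0.06      # illustrative OLS estimate and estimated std. deviation
    alpha = 0.05

    t_c2 = stats.t.ppf(1 - alpha / 2, df=N - 2)   # the critical value t_{c/2}
    L_hat = b1 - t_c2 * se_b1
    U_hat = b1 + t_c2 * se_b1
    print(f"{100 * (1 - alpha):.0f}% confidence interval for beta1: "
          f"({L_hat:.4f}, {U_hat:.4f})")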
Above we indicated that there is a close connection between constructing a two-sided test for $\beta_1$ and calculating a confidence interval for $\beta_1$. The two-sided test approach leads to the rule: Reject $H_0$ when the absolute value of the test statistic exceeds the threshold $t_{c/2}$, i.e. reject when $|\hat{T}| > t_{c/2}$. Thus, we shall not reject $H_0$ when $|\hat{T}| \leq t_{c/2}$, i.e. when $-t_{c/2} \leq \hat{T} \leq t_{c/2}$, which is equivalent to the conclusion we deduced by constructing the confidence interval for $\beta_1$.
Above we have described in detail how to proceed when testing a null hypothesis $H_0: \beta_1 = \beta_1^0$ against the three standard alternatives $H_A^1$, $H_A^2$ and $H_A^3$. We understand that the same approach applies directly when we wish to test a null hypothesis on the intercept $\beta_0$.
Now we use the test statistic:

(3.2.4)  $G = \dfrac{\hat{\beta}_0 - \beta_0}{\widehat{Std}(\hat{\beta}_0)}$

By standard arguments $G$ is t-distributed with $(N - 2)$ degrees of freedom. From here on we proceed as above.
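In practice these t-statistics need not be computed by hand. As a sketch (assuming the statsmodels package is available; the data are again simulated for illustration), an OLS fit reports $G$ and $T$ together with their P-values, where the hypothesized values $\beta_0$ and $\beta_1^0$ are taken to be 0 by default:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    N = 25
    X = np.linspace(1, 10, N)
    Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)

    res = sm.OLS(Y, sm.add_constant(X)).fit()
    print(res.summary())                 # "t" column: G for the intercept, T for the slope
    print(res.tvalues, res.pvalues)      # the same statistics and their P-values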
3.3 The P-value
In constructing the simple t-tests, a notion that naturally emerges is the so-called P-value. The P-value, also called the significance probability, is the probability of drawing a value of the test statistic at least as adverse to the null hypothesis as the one we actually computed in our sample, assuming the null hypothesis is correct.
Let us assume that we wish to test:

(3.3.1)  $H_0: \beta_1 = 0$ against $H_A^1: \beta_1 > 0$

with level of significance $\alpha$. From what has been said above we know the decision rule: Reject $H_0$ when the calculated value of the test statistic, $\hat{T}$, is larger than the critical value $t_c$.
When $H_0$ is true, we know that the test statistic

(3.3.2)  $T = \dfrac{\hat{\beta}_1}{\widehat{Std}(\hat{\beta}_1)}$ is t-distributed with $N - 2$ degrees of freedom.
In order to illustrate a dilemma that easily arises, we use the following figure.
Figure (3.3.1)
In this figure the critical values $t_c^{0.05}$ and $t_c^{0.15}$ correspond to the significance levels $\alpha = 0.05$ and $\alpha = 0.15$. When the estimates $\hat{\beta}_1$ and $\widehat{Std}(\hat{\beta}_1)$ are available, we compute the test statistic $\hat{T}$ using (3.3.2). Our decision rule says that if $\hat{T}$ is situated to the right of the relevant critical value $t_c$, we shall reject $H_0$. We notice that the decision to reject $H_0$ depends on the chosen level of significance $\alpha$. Let us now have a closer look at the computed value, $\hat{T}$, of the test statistic $T$. When $H_0$ is true we know that the test statistic $T$ has a t-distribution with $N - 2$ degrees of freedom. Therefore, it is a simple task to compute the probability of the event $\{T > \hat{T}\}$ under $H_0$. It is precisely this probability which is called the P-value. That is:
(3.3.3)  $P\text{-value} = P(T > \hat{T})$, given that $H_0$ is true.
A specific illustration is shown in Figure (3.3.1). Since $\hat{T}$ is situated to the left of $t_c^{0.05}$, the P-value will be larger than $\alpha = 0.05$. But it will be smaller than $\alpha = 0.15$, since $\hat{T}$ is situated to the right of $t_c^{0.15}$.
Above we have computed the P-value when we tested $H_0$ against the one-sided alternative $H_A^1$. If we faced the test problem:

(3.3.4)  $H_0: \beta_1 = 0$ against $H_A^3: \beta_1 \neq 0$

(i.e. the two-sided alternative), the situation is a bit different. Let $\hat{T}$ denote the estimated value of $T$. If $\hat{T} > 0$ we compute the probability $P(T > \hat{T})$; this probability is multiplied by 2, since we test against a two-sided alternative. If $\hat{T} < 0$ we compute the probability $P(T < \hat{T})$ and multiply by 2.
When we test a null hypothesis against a one-sided alternative, we could call the calculated P-value a one-sided P-value. Similarly, when we test against a two-sided alternative, we could call the P-value a two-sided P-value. Owing to what we have deduced above, we have the following conclusion: Reject $H_0$ whenever the computed P-value is less than the level of significance $\alpha$. If the P-value is larger than the significance level $\alpha$, do not reject $H_0$.

Calculating P-values is informative, but our decision dilemma is unchanged: shall we reject or not reject the null hypothesis? Any interest in the P-value cannot conceal this fact.
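The one- and two-sided P-values are readily computed from the t-distribution. A minimal sketch, with an illustrative value of the test statistic that is not taken from the text:

    from scipy import stats

    N = 25
    T_hat = 2.31                # illustrative computed value of the test statistic
    df = N - 2

    p_one_sided = stats.t.sf(T_hat, df)              # P(T > T_hat)
    p_two_sided = 2 * stats.t.sf(abs(T_hat), df)     # 2 * P(T > |T_hat|)
    print(f"one-sided P-value: {p_one_sided:.4f}")
    print(f"two-sided P-value: {p_two_sided:.4f}")

    # Decision at level alpha: reject H0 iff the P-value is less than alpha
    alpha = 0.05
    print("reject H0" if p_two_sided < alpha else "do not reject H0")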
3.4 Least squares prediction
Sometimes the ultimate aim of our econometric modelling is to predict the value of the dependent variable (for example a future value of $Y$), given the values of the explanatory variables. If $X^P$ denotes the value of the explanatory variable $X$, the value of the dependent variable is, according to our model, given by:

(3.4.1)  $Y = \beta_0 + \beta_1 X^P + \varepsilon$
Since the random disturbance $\varepsilon$ fluctuates in a purely random fashion, our best prediction of $\varepsilon$ is given by its mean $E(\varepsilon) = 0$. If we knew the structural parameters $\beta_0$ and $\beta_1$, a natural predictor would be given by:

(3.4.2)  $Y^P = \beta_0 + \beta_1 X^P$

since the disturbance terms are uncorrelated and the best predictor of $\varepsilon$ is, of course, its mean 0.
But $\beta_0$ and $\beta_1$ are unknown and have to be estimated. It is very reasonable to use the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$. Hence, an estimator of the predictor is given by:

(3.4.3)  $\hat{Y}^P = \hat{\beta}_0 + \hat{\beta}_1 X^P$
As a measure of the prediction uncertainty it is natural to use the prediction error $F = Y - \hat{Y}^P$. Since both $Y$ and the predictor $\hat{Y}^P$ are random variables, $F$ is certainly also random. However, we observe directly:

(3.4.4)  $E(F) = E(\beta_0 + \beta_1 X^P + \varepsilon - \hat{\beta}_0 - \hat{\beta}_1 X^P) = 0$

(3.4.4) follows from the unbiasedness of the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$.
As to the variance of the prediction error we calculate:

(3.4.5)  $Var(F) = Var(Y - \hat{Y}^P) = Var(Y) + Var(\hat{Y}^P)$

where the variances add because $Y$ refers to a new observation and is therefore independent of the predictor $\hat{Y}^P$, which is based on the sample. The variance of $Y$ is simply $\sigma^2$; the variance of $\hat{Y}^P$ is calculated using (2.4.3)-(2.4.5), giving:

(3.4.6)  $Var(\hat{Y}^P) = \sigma^2\left(\dfrac{1}{N} + \dfrac{(X^P - \bar{X})^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
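For readers who want to verify (3.4.6), here is a sketch of one possible derivation (the text obtains the same result from (2.4.3)-(2.4.5)). It uses the standard OLS relation $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ and the fact that $\sum_{i=1}^{N} c_i = 0$, which follows from (3.1.4) and implies $Cov(\bar{Y}, \hat{\beta}_1) = 0$:

\begin{align*}
\hat{Y}^P &= \hat{\beta}_0 + \hat{\beta}_1 X^P = \bar{Y} + \hat{\beta}_1 (X^P - \bar{X}) \\
Var(\hat{Y}^P) &= Var(\bar{Y}) + (X^P - \bar{X})^2 \, Var(\hat{\beta}_1)
  && \text{since } Cov(\bar{Y}, \hat{\beta}_1) = 0 \\
&= \frac{\sigma^2}{N} + (X^P - \bar{X})^2 \, \frac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}
\end{align*}

which is exactly (3.4.6).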
Hence, we have:

(3.4.7)  $Var(F) = \sigma^2\left(1 + \dfrac{1}{N} + \dfrac{(X^P - \bar{X})^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
Note that the variance of $F$ has a minimum when $X^P = \bar{X}$. In this formula everything is known except for the variance $\sigma^2$. However, we have a natural estimator of $\sigma^2$ (given by (2.4.7) above). Using standard arguments we deduce:
Y  Yˆ 
P
(3.4.8)
Std (Y  Yˆ P )
is t - distributed with N  2 degrees of freedom.
where:
(3.4.9)
1
( X P  X )2
Std (Y  Yˆ P )  ˆ 2 (1   N
)
N
2
(X i  X )
i 1
Constructing a confidence interval for $Y$ will now follow the familiar guidelines we have shown in detail above. Note that many econometric textbooks prefer to call this a prediction interval for $Y$. Let us assume that we wish to construct a prediction interval with confidence coefficient $(1 - \alpha)$. Then we start from the relation:
(3.4.10)  $P\!\left(-t_{c/2} \leq \dfrac{Y - \hat{Y}^P}{\widehat{Std}(Y - \hat{Y}^P)} \leq t_{c/2}\right) = 1 - \alpha$
Following the recipe described in section 3.2, we find the following interval limits for $Y$:

(3.4.11)  $\left(\hat{Y}^P - t_{c/2}\,\widehat{Std}(Y - \hat{Y}^P),\;\; \hat{Y}^P + t_{c/2}\,\widehat{Std}(Y - \hat{Y}^P)\right)$
Interpreting this interval we could say something like what we said above: "A $100(1 - \alpha)\%$ prediction interval for $Y$ computed from an observed sample is given by the lower and upper bounds $\hat{L}$ and $\hat{U}$ appearing in (3.4.11)." Observe that there is an important difference between the interval constructed in section (3.2) and the one we have produced here. In section (3.2) the interval is meant to cover the unknown non-random slope parameter $\beta_1$. In this section the interval is meant to cover the unknown but random variable $Y$.
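Putting the pieces together, a prediction interval of the form (3.4.11) might be computed as in the sketch below. The data are simulated, and the prediction point X_P and all parameter values are illustrative choices, not from the text:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    N = 25
    X = np.linspace(1, 10, N)
    Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)

    # OLS fit and variance estimate
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * Y) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    sigma2_hat = np.sum((Y - b0 - b1 * X) ** 2) / (N - 2)

    # Point prediction (3.4.3) and estimated std. of the prediction error (3.4.9)
    X_P = 7.5
    Y_hat_P = b0 + b1 * X_P
    se_F = np.sqrt(sigma2_hat * (1 + 1 / N + (X_P - X.mean()) ** 2 / Sxx))

    # Prediction interval (3.4.11)
    alpha = 0.05
    t_c2 = stats.t.ppf(1 - alpha / 2, df=N - 2)
    print(f"{100 * (1 - alpha):.0f}% prediction interval for Y at X_P = {X_P}: "
          f"({Y_hat_P - t_c2 * se_F:.3f}, {Y_hat_P + t_c2 * se_F:.3f})")

Note that, in line with the remark after (3.4.7), the interval is narrowest when $X^P = \bar{X}$ and widens as $X^P$ moves away from the sample mean of the regressor.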