3. Inference in the simple regression
This section contains the following subsections:
(3.1) Simple t-tests
(3.2) Confidence intervals for the regression coefficient $\beta_1$
(3.3) The P-value
(3.4) Least squares prediction
Sometimes economic theory asserts that the regression coefficients should take specific values. For example, Friedman's "Permanent Income Hypothesis" puts forward the hypothesis that the intercept term $\beta_0 = 0$. If our model is a macro consumption function, the slope parameter $\beta_1$ is the marginal propensity to consume out of income. Thus, macro economists might be interested in specific hypotheses regarding $\beta_1$. But how should we proceed to test specific hypotheses on $\beta_0$ and $\beta_1$?
3.1 Simple t-tests
In this subsection we shall describe procedures for testing hypotheses on $\beta_1$. Once we have learned these procedures, we will understand that a similar approach can be used to test hypotheses on the intercept parameter.
A test of $\beta_1$ will naturally take the sampling distribution of the estimator $\hat{\beta}_1$ as a starting point. We know already that in this distribution the expectation $E(\hat{\beta}_1) = \beta_1$, and that the variance $Var(\hat{\beta}_1)$ is given by (2.4.4). From the expression (2.2.6) for the estimator $\hat{\beta}_1$ we understand that the distribution of $\hat{\beta}_1$ is, for given $X_1, X_2, \ldots, X_N$, determined by the distribution of the dependent variables $Y_1, Y_2, \ldots, Y_N$. Hence the distribution of $\hat{\beta}_1$ will be unknown until we specify a particular distribution of $Y_1, Y_2, \ldots, Y_N$.
Everything will be put in order when we specify the distribution of the random disturbances $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$. Hence, we supplement the assumptions (2.1.5)-(2.1.7) with a fourth one, namely:

(3.1.1)  $\varepsilon_i$ is $N(0, \sigma^2)$ and independently, identically distributed for all $i$.
Since $Y_i$ is a linear function of $\varepsilon_i$, it follows that:

(3.1.2)  $Y_i$ is $N(\beta_0 + \beta_1 X_i, \sigma^2)$
The independence of the $\varepsilon$'s implies the independence of the $Y$'s. Since the $X$'s are deterministic, all random variation of $Y_i$ is determined by the randomness of $\varepsilon_i$. We also observe that the OLS estimator $\hat{\beta}_1$ is a linear function of $Y_1, Y_2, \ldots, Y_N$.
We can write:

(3.1.3)  $\hat{\beta}_1 = \dfrac{1}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\left\{(X_1 - \bar{X})Y_1 + (X_2 - \bar{X})Y_2 + \ldots + (X_N - \bar{X})Y_N\right\} = \sum_{i=1}^{N} c_i Y_i$
where

(3.1.4)  $c_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{N}(X_i - \bar{X})^2}$
Since ˆ1 is a linear function of Y1 , Y2 ,...., YN , and all the Y ' s are normally
distributed, it follows from standard results in statistics that the estimator ˆ1 is
normally distributed. We write:
(3.1.5)
ˆ1 is N ( 1 ,  2
N
(X
i 1
i
 X )2 )
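The result (3.1.5) is easy to check by simulation. The following sketch is not from the text; the parameter values (the "true" $\beta_0$, $\beta_1$, $\sigma$ and the grid of $X$ values) are arbitrary choices for illustration only:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 25
    beta0, beta1, sigma = 2.0, 0.5, 1.0       # illustrative "true" parameter values
    X = np.linspace(1, 10, N)                 # fixed (deterministic) regressor values

    # Draw many samples and compute the OLS slope in each, as in (3.1.3)-(3.1.4)
    reps = 10_000
    c = (X - X.mean()) / np.sum((X - X.mean()) ** 2)   # the weights c_i of (3.1.4)
    slopes = np.empty(reps)
    for r in range(reps):
        Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=N)   # model (3.1.2)
        slopes[r] = np.sum(c * Y)                                # beta1_hat = sum c_i Y_i

    # Compare with the theoretical distribution (3.1.5)
    theo_sd = sigma / np.sqrt(np.sum((X - X.mean()) ** 2))
    print(f"simulated mean {slopes.mean():.4f}  (theory: {beta1})")
    print(f"simulated sd   {slopes.std():.4f}  (theory: {theo_sd:.4f})")

The simulated mean and standard deviation of the slope estimates should agree closely with the values predicted by (3.1.5).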
In order to get a better "feeling for the problem", let us assume that we wish to test a null hypothesis $H_0: \beta_1 = 0$, given a certain data set. Suppose that if the calculated estimate of $\beta_1$ happens to be situated far from 0, then we reject $H_0$. However, we realize that we cannot be certain that this is the correct decision, since $\hat{\beta}_1$ is a random variable. How should we approach this problem so that the decision to reject or not reject the hypothesis can be given a sound footing? It follows from (3.1.5) that when $H_0$ is correct the distribution of the estimator $\hat{\beta}_1$ is given by:
(3.1.6)  $\hat{\beta}_1$ is $N\!\left(0, \dfrac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
Hence, even if our estimate is situated far from 0, it is not fully contradicted by the distribution of $\hat{\beta}_1$ under $H_0$. We understand that, whatever decision we take regarding rejecting or not rejecting, we run a certain risk that our decision will be wrong: we might reject the null hypothesis when it is in fact true, or we might not reject it when it is in fact false. In the literature one talks about making errors of Type I (the first situation) or of Type II (the second situation). To solve this dilemma the following proposal is reasonable. Since we specify a given value of $\beta_1$ under $H_0$, we know the distribution of $\hat{\beta}_1$ when $H_0$ is true. Utilizing this distribution we can compute the exact probability of committing a Type I error. In the literature this probability is called the level of significance of the test. It is so important that it deserves a separate definition.
Definition 3.1. The level of significance of a test is the probability we accept for rejecting $H_0$ when $H_0$ is true.

Usually the level of significance ($\alpha$) is chosen to be a small number, typically $\alpha = 0.05$, $\alpha = 0.025$, $\alpha = 0.01$, etc.
What remains before we can actually carry out a test of a particular hypothesis about $\beta_1$ is a few more technical calculations. Suppose we wish to test the null hypothesis $H_0: \beta_1 = \beta_1^0$, where $\beta_1^0$ is a specified number. Assuming that $H_0$ is true, we know that:

(3.1.7)  $\hat{\beta}_1$ is $N\!\left(\beta_1^0, \dfrac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
From this fact it follows immediately that:

(3.1.8)  $Z = \dfrac{\hat{\beta}_1 - \beta_1^0}{\sqrt{\sigma^2 \big/ \sum_{i=1}^{N}(X_i - \bar{X})^2}} = \dfrac{\hat{\beta}_1 - \beta_1^0}{Std(\hat{\beta}_1)}$ is $N(0,1)$
Hence, the normalized variable $Z$ has a standard normal distribution with mean 0 and standard deviation 1. But to be used as a test statistic, $Z$ has a serious drawback, since it depends on the variance $\sigma^2$, which is unknown to us. But we are almost home. We know that the estimator $\hat{\sigma}^2$ given by (2.4.7) is an unbiased estimator of $\sigma^2$. It is very tempting to substitute this estimator for $\sigma^2$ in (3.1.8). Hence, we have:
(3.1.9)  $T = \dfrac{\hat{\beta}_1 - \beta_1^0}{\sqrt{\hat{\sigma}^2 \big/ \sum_{i=1}^{N}(X_i - \bar{X})^2}} = \dfrac{\hat{\beta}_1 - \beta_1^0}{\widehat{Std}(\hat{\beta}_1)}$
where $\widehat{Std}(\hat{\beta}_1)$ denotes the estimated standard deviation of $\hat{\beta}_1$. Luckily, we also know the distribution of the random variable $T$: it is Student's t-distribution with $N-2$ degrees of freedom, a distribution closely related to the $N(0,1)$ distribution. It is uni-modal and symmetric about 0. The mean is $E(T) = 0$ and the variance is $Var(T) = \nu/(\nu - 2)$, where $\nu = N - 2$ is the number of degrees of freedom, so that $Var(T) \to 1$ when $N$ increases.
Figure (3.1.1)
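To make the construction concrete, here is a minimal sketch of how $\hat{\sigma}^2$, $\widehat{Std}(\hat{\beta}_1)$ and the statistic $T$ of (3.1.9) might be computed. The data are simulated (all numbers are illustrative, not from the text), and we assume, as is standard, that the estimator (2.4.7) is the residual sum of squares divided by $N - 2$:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 25
    X = np.linspace(1, 10, N)
    Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)   # one simulated sample

    # OLS estimates of slope and intercept
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * Y) / Sxx
    b0 = Y.mean() - b1 * X.mean()

    # Unbiased variance estimator (assumed form of (2.4.7)): residual SS over N - 2
    resid = Y - (b0 + b1 * X)
    sigma2_hat = np.sum(resid ** 2) / (N - 2)

    # Estimated std. deviation of b1 and the t-statistic (3.1.9) for H0: beta1 = beta1_0
    se_b1 = np.sqrt(sigma2_hat / Sxx)
    beta1_0 = 0.5
    T = (b1 - beta1_0) / se_b1
    print(f"b1 = {b1:.4f}, se = {se_b1:.4f}, T = {T:.4f}")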
When we test the null hypothesis $H_0: \beta_1 = \beta_1^0$, it is reasonable to test $H_0$ against one of the three alternative hypotheses specified below, that is to say:

(3.1.10)
$H_0:\; \beta_1 = \beta_1^0$
$H_A^1:\; \beta_1 > \beta_1^0$
$H_A^2:\; \beta_1 < \beta_1^0$
$H_A^3:\; \beta_1 \neq \beta_1^0$
Let us describe the procedure to follow when we wish to test $H_0$ against $H_A^1$ and we have chosen a level of significance $\alpha$. In this case we doubt the truth of $H_0$ when the estimate $\hat{\beta}_1$ is considerably larger than $\beta_1^0$. The larger the difference between $\hat{\beta}_1$ and $\beta_1^0$, the more we doubt the truth of $H_0$. The chosen level of significance of the test, $\alpha$, helps us to determine a threshold or critical value of the test statistic $T$. If the observed value of the test statistic, $T = (\hat{\beta}_1 - \beta_1^0)/\widehat{Std}(\hat{\beta}_1)$, is larger than the threshold value, the null hypothesis is rejected.
Since the test statistic $T$ is t-distributed under $H_0$, the critical value $t_c$ is determined from the equation:

(3.1.11)  $P(T > t_c) = \alpha$

where $\alpha$ is the chosen level of significance of the test.
Decision rule: Reject $H_0$ when the estimated value $\hat{T}$ of the test statistic (3.1.9) falls in the interval to the right of the critical value $t_c$, that is, if $\hat{T} > t_c$; if $\hat{T} \leq t_c$, do not reject $H_0$.

Figure (3.1.2)

Put in formal terms: if $\hat{T} \in (t_c, \infty)$ reject $H_0$; if $\hat{T} \in (-\infty, t_c)$ do not reject $H_0$.
If we wish to test our null hypothesis against $H_A^2$ with a level of significance $\alpha$, the procedure is similar. In this case we doubt the truth of $H_0$ when the estimate $\hat{\beta}_1$ is considerably smaller than $\beta_1^0$.

Decision rule: Reject $H_0$ when the estimate $\hat{T} \in (-\infty, -t_c)$. Do not reject when $\hat{T} \in (-t_c, \infty)$.
Figure (3.1.3)
The procedure for testing $H_0: \beta_1 = \beta_1^0$ against $H_A^3: \beta_1 \neq \beta_1^0$ with level of significance $\alpha$ follows the same guidelines. In this case we are doubtful about the null hypothesis when the estimate $\hat{\beta}_1$ is either much larger or much smaller than $\beta_1^0$. This indicates that one should place the rejection regions in both tails of the t-distribution. The critical value $t_{c/2}$ is determined from the equation:

(3.1.12)  $P(|T| > t_{c/2}) = \alpha$
Decision rule: Reject $H_0$ when the estimate $\hat{T}$ satisfies $|\hat{T}| > t_{c/2}$, i.e. when $\hat{T} \in (-\infty, -t_{c/2})$ or $\hat{T} \in (t_{c/2}, \infty)$. Do not reject $H_0$ when $\hat{T} \in (-t_{c/2}, t_{c/2})$.

The decision rule is illustrated below:

Figure (3.1.4)
When we test a null hypothesis against a one-sided alternative, we talk about a one-sided test. Figures (3.1.2)-(3.1.3) illustrate such tests. In the same way, Figure (3.1.4) illustrates a two-sided test. When we test the null hypothesis $H_0: \beta_1 = \beta_1^0$ against the two-sided alternative hypothesis $H_A^3: \beta_1 \neq \beta_1^0$, we have seen that the decision rule is to reject $H_0$ when the test statistic takes a value in either of the two tails of the t-distribution.
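These decision rules are easy to mechanize. The sketch below is illustrative only: the value of $\hat{T}$ and the sample size are made up, and the critical values are obtained from scipy, which is assumed to be available:

    from scipy import stats

    N = 25
    T_hat = 2.31          # illustrative computed value of the test statistic (3.1.9)
    alpha = 0.05
    df = N - 2

    t_c = stats.t.ppf(1 - alpha, df)        # one-sided critical value: P(T > t_c) = alpha
    t_c2 = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value: P(|T| > t_c2) = alpha

    for name, reject in [("H_A^1 (beta1 > beta1_0)", T_hat > t_c),
                         ("H_A^2 (beta1 < beta1_0)", T_hat < -t_c),
                         ("H_A^3 (beta1 != beta1_0)", abs(T_hat) > t_c2)]:
        print(f"against {name}: {'reject H0' if reject else 'do not reject H0'}")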
3.2 Confidence intervals for the regression coefficient $\beta_1$
Closely related to a two-sided test with level of significance $\alpha$ is the construction of a confidence interval for $\beta_1$ with confidence coefficient $(1 - \alpha)$. Many textbooks seem to prefer the idea of constructing a confidence interval. The procedure is simple.
We know that:
(3.2.1)  $T = \dfrac{\hat{\beta}_1 - \beta_1}{\widehat{Std}(\hat{\beta}_1)}$ is t-distributed with $N - 2$ degrees of freedom.
When we construct a symmetric interval with confidence coefficient $(1 - \alpha)$, we first determine the critical value $t_{c/2}$ in the way we have explained above. The probability of the event $\{-t_{c/2} \leq T \leq t_{c/2}\}$ is obviously $(1 - \alpha)$.
Formally, we put:

(3.2.2)  $P\!\left(-t_{c/2} \leq \dfrac{\hat{\beta}_1 - \beta_1}{\widehat{Std}(\hat{\beta}_1)} \leq t_{c/2}\right) = 1 - \alpha$

Then we can solve the two inequalities with respect to $\beta_1$, attaining:

(3.2.3)  $P\!\left[\hat{\beta}_1 - t_{c/2}\,\widehat{Std}(\hat{\beta}_1) \leq \beta_1 \leq \hat{\beta}_1 + t_{c/2}\,\widehat{Std}(\hat{\beta}_1)\right] = 1 - \alpha$
The interval (3.2.3) is equivalent to the interval appearing in (3.2.2). The upper and lower limits are random, since the OLS estimator $\hat{\beta}_1$ and $\widehat{Std}(\hat{\beta}_1)$ are random variables. Since (3.2.3) follows directly from (3.2.2), one can interpret it by saying that the interval has a $(1 - \alpha)$ probability of covering the true value $\beta_1$.

When the estimates $\hat{\beta}_1$ and $\widehat{Std}(\hat{\beta}_1)$ have been obtained, the two limits appearing in (3.2.3) are simply given numbers. Although it may be tempting, it would be incorrect to say that the unknown parameter $\beta_1$ is situated between these two limits with probability $(1 - \alpha)$. The point is that there are no random variables involved when we consider the estimated limits. The statement "the unknown slope parameter $\beta_1$ is contained in the estimated interval" is either true (i.e. $\beta_1$ is contained in the interval) or not true (i.e. $\beta_1$ is not contained in the interval). So when the sample of observations is given, we calculate the two bounds of the interval and present the result in the form:

"A $100(1 - \alpha)\%$ confidence interval for $\beta_1$ computed from the observed sample is $(\hat{L}, \hat{U})$, where $\hat{L}$ and $\hat{U}$ denote the two bounds of the interval."
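As an illustration, an interval of the form (3.2.3) might be computed as follows. This is a minimal sketch; the estimates b1 and se_b1 are assumed to come from an OLS fit such as the one in the earlier snippet, and the numbers are made up:

    from scipy import stats

    N = 25
    b1, se_b1 = 0.47, 0.06      # illustrative OLS estimate and estimated std. deviation
    alpha = 0.05

    t_c2 = stats.t.ppf(1 - alpha / 2, df=N - 2)   # the critical value t_{c/2}
    L_hat = b1 - t_c2 * se_b1
    U_hat = b1 + t_c2 * se_b1
    print(f"{100 * (1 - alpha):.0f}% confidence interval for beta1: "
          f"({L_hat:.4f}, {U_hat:.4f})")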
Above we indicated that there is a close connection between constructing a two-sided test for $\beta_1$ and calculating a confidence interval for $\beta_1$. The two-sided test approach leads to the rule: Reject $H_0$ when the absolute value of the test statistic exceeds the threshold $t_{c/2}$, i.e. reject when $|\hat{T}| > t_{c/2}$. Thus, we shall not reject $H_0$ when $|\hat{T}| \leq t_{c/2}$, i.e. when $-t_{c/2} \leq \hat{T} \leq t_{c/2}$, which is equivalent to the conclusion we deduced by constructing the confidence interval for $\beta_1$.
Above we have described in detail how to proceed when testing a null hypothesis $H_0: \beta_1 = \beta_1^0$ against the three standard alternatives $H_A^1$, $H_A^2$ and $H_A^3$. We understand that the same approach applies directly when we wish to test a null hypothesis on the intercept $\beta_0$.
Now we use the test statistic:

(3.2.4)  $G = \dfrac{\hat{\beta}_0 - \beta_0}{\widehat{Std}(\hat{\beta}_0)}$

By standard arguments $G$ is t-distributed with $(N - 2)$ degrees of freedom. From here on we proceed as above.
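In practice these t-statistics need not be computed by hand. As a sketch (assuming the statsmodels package is available; the data are again simulated for illustration), an OLS fit reports $G$ and $T$ together with their P-values, where the hypothesized values $\beta_0$ and $\beta_1^0$ are taken to be 0 by default:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    N = 25
    X = np.linspace(1, 10, N)
    Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)

    res = sm.OLS(Y, sm.add_constant(X)).fit()
    print(res.summary())                 # "t" column: G for the intercept, T for the slope
    print(res.tvalues, res.pvalues)      # the same statistics and their P-values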
3.3 The P-value
In constructing the simple t-tests, a notion that naturally emerges is the so-called P-value. The P-value, also called the significance probability, is the probability of drawing a value of the test statistic at least as adverse to the null hypothesis as the one we actually computed in our sample, assuming the null hypothesis is correct.
Let us assume that we wish to test:

(3.3.1)  $H_0: \beta_1 = 0$ against $H_A^1: \beta_1 > 0$

with level of significance $\alpha$. From what has been said above we know the decision rule: Reject $H_0$ when the calculated value of the test statistic, $\hat{T}$, is larger than the critical value $t_c$.
When $H_0$ is true, we know that the test statistic

(3.3.2)  $T = \dfrac{\hat{\beta}_1}{\widehat{Std}(\hat{\beta}_1)}$ is t-distributed with $N - 2$ degrees of freedom.
In order to illustrate a dilemma that easily arises, we use the following figure.
Figure (3.3.1)
In this figure the critical values $t_c^{0.05}$ and $t_c^{0.15}$ correspond to the significance levels $\alpha = 0.05$ and $\alpha = 0.15$. When the estimates $\hat{\beta}_1$ and $\widehat{Std}(\hat{\beta}_1)$ are available, we compute the test statistic $\hat{T}$ using (3.3.2). Our decision rule says that if $\hat{T}$ is situated to the right of the relevant critical value $t_c$, we shall reject $H_0$. We notice that the decision to reject $H_0$ depends on the chosen level of significance $\alpha$. Let us now have a closer look at the computed value, $\hat{T}$, of the test statistic $T$. When $H_0$ is true we know that the test statistic $T$ has a t-distribution with $N - 2$ degrees of freedom. Therefore, it is a simple task to compute the probability of the event $\{T > \hat{T}\}$ under $H_0$. It is precisely this probability which is called the P-value. That is:
(3.3.3)  $P\text{-value} = P(T > \hat{T})$, given that $H_0$ is true.
A specific illustration is shown in Figure (3.3.1). Since $\hat{T}$ is situated to the left of $t_c^{0.05}$, the P-value will be larger than $\alpha = 0.05$. But it will be smaller than $\alpha = 0.15$, since $\hat{T}$ is situated to the right of $t_c^{0.15}$.
Above we have computed the P-value when we tested $H_0$ against the one-sided alternative $H_A^1$. If we faced the test problem:

(3.3.4)  $H_0: \beta_1 = 0$ against $H_A^3: \beta_1 \neq 0$

(i.e. the two-sided alternative), the situation is a bit different. Let $\hat{T}$ denote the estimated value of $T$. If $\hat{T} > 0$ we compute the probability $P(T > \hat{T})$; this probability is multiplied by 2, since we test against a two-sided alternative. If $\hat{T} < 0$ we compute the probability $P(T < \hat{T})$ and multiply by 2.
When we test a null hypothesis against a one-sided alternative, we could call the calculated P-value a one-sided P-value. Similarly, when we test against a two-sided alternative, we could call the P-value a two-sided P-value. Owing to what we have deduced above, we have the following conclusion: Reject $H_0$ whenever the computed P-value is less than the level of significance $\alpha$. If the P-value is larger than the significance level $\alpha$, do not reject $H_0$.

Calculating P-values is informative, but our decision dilemma is unchanged: shall we reject or not reject the null hypothesis? Any interest in the P-value cannot conceal this fact.
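The one- and two-sided P-values are readily computed from the t-distribution. A minimal sketch, with an illustrative value of the test statistic that is not taken from the text:

    from scipy import stats

    N = 25
    T_hat = 2.31                # illustrative computed value of the test statistic
    df = N - 2

    p_one_sided = stats.t.sf(T_hat, df)              # P(T > T_hat)
    p_two_sided = 2 * stats.t.sf(abs(T_hat), df)     # 2 * P(T > |T_hat|)
    print(f"one-sided P-value: {p_one_sided:.4f}")
    print(f"two-sided P-value: {p_two_sided:.4f}")

    # Decision at level alpha: reject H0 iff the P-value is less than alpha
    alpha = 0.05
    print("reject H0" if p_two_sided < alpha else "do not reject H0")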
3.4 Least squares prediction
Sometimes the ultimate aim of our econometric modelling is to predict the value of the dependent variable (for example a future value of $Y$), given the values of the explanatory variables. If $X^P$ denotes the value of the explanatory variable $X$, the value of the dependent variable is, according to our model, given by:

(3.4.1)  $Y = \beta_0 + \beta_1 X^P + \varepsilon$
Since the random disturbance $\varepsilon$ fluctuates in a purely random fashion, our best prediction of $\varepsilon$ is given by its mean $E(\varepsilon) = 0$. If we knew the structural parameters $\beta_0$ and $\beta_1$, a natural predictor would be given by:

(3.4.2)  $Y^P = \beta_0 + \beta_1 X^P$

since the disturbance terms are uncorrelated and the best predictor of $\varepsilon$ is, of course, its mean 0.
But $\beta_0$ and $\beta_1$ are unknown and have to be estimated. It is very reasonable to use the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$. Hence, an estimator of the predictor is given by:

(3.4.3)  $\hat{Y}^P = \hat{\beta}_0 + \hat{\beta}_1 X^P$
As a measure of the prediction uncertainty it is natural to use the prediction error $F = Y - \hat{Y}^P$. Since both $Y$ and the predictor $\hat{Y}^P$ are random variables, $F$ is certainly also random. However, we observe directly:

(3.4.4)  $E(F) = E(\beta_0 + \beta_1 X^P + \varepsilon - \hat{\beta}_0 - \hat{\beta}_1 X^P) = 0$

(3.4.4) follows from the unbiasedness of the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$.
As to the variance of the prediction error we calculate:

(3.4.5)  $Var(F) = Var(Y - \hat{Y}^P) = Var(Y) + Var(\hat{Y}^P)$

where the variances add because $Y$ refers to a new observation and is therefore independent of the predictor $\hat{Y}^P$, which is based on the sample. The variance of $Y$ is simply $\sigma^2$; the variance of $\hat{Y}^P$ is calculated using (2.4.3)-(2.4.5), giving:

(3.4.6)  $Var(\hat{Y}^P) = \sigma^2\left(\dfrac{1}{N} + \dfrac{(X^P - \bar{X})^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
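For readers who want to verify (3.4.6), here is a sketch of one possible derivation (the text obtains the same result from (2.4.3)-(2.4.5)). It uses the standard OLS relation $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ and the fact that $\sum_{i=1}^{N} c_i = 0$, which follows from (3.1.4) and implies $Cov(\bar{Y}, \hat{\beta}_1) = 0$:

\begin{align*}
\hat{Y}^P &= \hat{\beta}_0 + \hat{\beta}_1 X^P = \bar{Y} + \hat{\beta}_1 (X^P - \bar{X}) \\
Var(\hat{Y}^P) &= Var(\bar{Y}) + (X^P - \bar{X})^2 \, Var(\hat{\beta}_1)
  && \text{since } Cov(\bar{Y}, \hat{\beta}_1) = 0 \\
&= \frac{\sigma^2}{N} + (X^P - \bar{X})^2 \, \frac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}
\end{align*}

which is exactly (3.4.6).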
Hence, we have:

(3.4.7)  $Var(F) = \sigma^2\left(1 + \dfrac{1}{N} + \dfrac{(X^P - \bar{X})^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}\right)$
Note that the variance of $F$ has a minimum when $X^P = \bar{X}$. In this formula everything is known except for the variance $\sigma^2$. However, we have a natural estimator of $\sigma^2$ (given by (2.4.7) above). Using standard arguments we deduce:
Y  Yˆ 
P
(3.4.8)
Std (Y  Yˆ P )
is t - distributed with N  2 degrees of freedom.
where:
(3.4.9)
1
( X P  X )2
Std (Y  Yˆ P )  ˆ 2 (1   N
)
N
2
(X i  X )
i 1
Constructing a confidence interval for $Y$ will now follow the familiar guidelines we have shown in detail above. Note that many econometric textbooks prefer to call this a prediction interval for $Y$. Let us assume that we wish to construct a prediction interval with confidence coefficient $(1 - \alpha)$. Then we start from the relation:
(3.4.10)  $P\!\left(-t_{c/2} \leq \dfrac{Y - \hat{Y}^P}{\widehat{Std}(Y - \hat{Y}^P)} \leq t_{c/2}\right) = 1 - \alpha$
Following the recipe described in section 3.2, we find the following interval limits for $Y$:

(3.4.11)  $\left(\hat{Y}^P - t_{c/2}\,\widehat{Std}(Y - \hat{Y}^P),\;\; \hat{Y}^P + t_{c/2}\,\widehat{Std}(Y - \hat{Y}^P)\right)$
Interpreting this interval we could say something like what we said above: "A $100(1 - \alpha)\%$ prediction interval for $Y$ computed from an observed sample is given by the lower and upper bounds $\hat{L}$ and $\hat{U}$ appearing in (3.4.11)." Observe that there is an important difference between the interval constructed in section (3.2) and the one we have produced here. In section (3.2) the interval is meant to cover the unknown non-random slope parameter $\beta_1$. In this section the interval is meant to cover the unknown but random variable $Y$.
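Putting the pieces together, a prediction interval of the form (3.4.11) might be computed as in the sketch below. The data are simulated, and the prediction point X_P and all parameter values are illustrative choices, not from the text:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    N = 25
    X = np.linspace(1, 10, N)
    Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=N)

    # OLS fit and variance estimate
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * Y) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    sigma2_hat = np.sum((Y - b0 - b1 * X) ** 2) / (N - 2)

    # Point prediction (3.4.3) and estimated std. of the prediction error (3.4.9)
    X_P = 7.5
    Y_hat_P = b0 + b1 * X_P
    se_F = np.sqrt(sigma2_hat * (1 + 1 / N + (X_P - X.mean()) ** 2 / Sxx))

    # Prediction interval (3.4.11)
    alpha = 0.05
    t_c2 = stats.t.ppf(1 - alpha / 2, df=N - 2)
    print(f"{100 * (1 - alpha):.0f}% prediction interval for Y at X_P = {X_P}: "
          f"({Y_hat_P - t_c2 * se_F:.3f}, {Y_hat_P + t_c2 * se_F:.3f})")

Note that, in line with the remark after (3.4.7), the interval is narrowest when $X^P = \bar{X}$ and widens as $X^P$ moves away from the sample mean of the regressor.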