Statistical Inference from the Ordinary Least Squares Model

The reading assignments have derived the OLS estimator. Using Assumptions A – E given in the Assumptions Reading Assignment, the mean and the variance of the OLS estimator were derived. The distribution of the estimator, however, has not been defined. The mean and variance do not generally define a statistical distribution, and to conduct statistical tests, the distribution (normal, t, F, or Chi-squared) must be known. The subject of this reading assignment is inference, statistical and economic, from the OLS regression. Statistical inference involves several tests on the estimated parameters; these tests concern either a single parameter or a group of parameters. Economic inference includes these statistical tests, but also considers each estimated parameter's magnitude and sign. Inference concerning the previously discussed goodness-of-fit measure, R2, will also be expanded in this reading assignment.

Additional Assumptions Necessary for Statistical Inference

Because the distribution of the OLS estimator is not given by Assumptions A – E, we need to rely on one of two additional assumptions to perform statistical tests. Under either of these two assumptions, the distributions for the statistical tests can be derived, and in both cases the distribution is the same. Deriving the distributions is beyond the scope of this class.

Assume Normality

The most restrictive assumption is that the error terms in the model are normally distributed. We have already assumed they have a mean of zero and constant variance. This assumption is written as ui ~ N(0, σ²).

Central Limit Theorem

Instead of assuming the error terms are normally distributed, we can rely on the Central Limit Theorem. In either case, the same distributions are obtained. The Central Limit Theorem (CLT) states: given any distribution that has a mean, μ, and a variance, σ², the distribution of sample means drawn at random from the original distribution approaches the normal distribution with mean μ and variance σ²/n as the sample size increases.

This theorem can also be stated in the following form. Let q1, q2, q3, . . ., qn be independent draws from any distribution that has a mean of μ and a variance σ². The mean of the qi is

q̄ = (1/n) Σ qi = (1/n)(q1 + q2 + q3 + . . . + qn),

and the standardized mean is

z = (q̄ − E[q̄]) / √(var(q̄)) = (q̄ − μ) / √(σ²/n).

In this form, the CLT states that the average value of n independent random variables from any probability distribution (as long as it has a mean and a variance) will have approximately a standard normal distribution after subtracting its mean and dividing by its standard deviation, if the sample size, n, is large enough. The standard normal distribution is N(0, 1), that is, a mean of zero and a variance of one.

Why n and not n − 1? We are interested in the standard error of a series of means and not the standard error of a sample of random numbers. Second, in practice n is large; therefore, n and n − 1 are not very different. The sample means of the random variables approximate a normal distribution.

Why is the CLT important and how is it used? To answer these questions, we need to examine the error term, u. What is u? Given the model set-up, the error term includes all factors affecting the dependent variable that are not included as independent variables. That is, the error term includes everything affecting y except the x's. Therefore, u is a sum of many different factors affecting y, and because u is a sum of many different factors, we can invoke the CLT to conclude u approximates the normal distribution.
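The short simulation below is not part of the original reading; it is a sketch, assuming Python with the numpy library, of what the CLT says. Sample means are drawn from a clearly non-normal (exponential) population, standardized by subtracting μ and dividing by √(σ²/n), and the standardized means behave approximately like a standard normal variable.

    import numpy as np

    rng = np.random.default_rng(0)

    # A non-normal population: exponential with mean 2 and variance 4
    mu, sigma2 = 2.0, 4.0
    n = 100          # sample size
    reps = 10_000    # number of samples drawn

    # Draw 'reps' samples of size n and compute each sample mean
    sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

    # Standardize: z = (q_bar - mu) / sqrt(sigma^2 / n)
    z = (sample_means - mu) / np.sqrt(sigma2 / n)

    # If the CLT is at work, z should have mean near 0, variance near 1,
    # and roughly 95% of its values should fall inside (-1.96, 1.96)
    print(z.mean(), z.var())
    print(np.mean(np.abs(z) < 1.96))

Making n larger improves the normal approximation; making it very small lets the skewness of the original distribution show through.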
Why is the CLT important? If we consider the error term as a sum of many different factors, the CLT states the error term will approximately follow the normal distribution. Invoking the CLT therefore serves the same purpose as assuming the error terms are normally distributed: either way, the statistical distributions needed for testing can be derived. These distributions allow for statistical tests on the estimated OLS parameters. Although the assumption of normality (or invoking the CLT) is the most restrictive assumption made in using OLS, it is what allows statistical tests to be performed. These statistical tests are perhaps the most important component of running an OLS regression. Statistical inference is a powerful aspect of OLS. You must understand the following statistical tests.

Inference Concerning Parameters

As noted earlier, inference involves statistical tests and examining the estimated coefficients' magnitude and sign. In this section, aspects concerning individual parameters are discussed.

t-tests

Individual t-tests can be done for each estimated coefficient. Under Assumptions A – E and either assuming normality of the error terms or invoking the CLT, the following can be shown:

(1)    (β̂j − βj) / √(var(β̂j)) ~ t(n−k).

This equation states that the quantity obtained by subtracting any value, βj, from the estimated coefficient, β̂j, and dividing by the standard error of the estimate β̂j is distributed as a Student t-distribution (t-distribution) with n − k degrees of freedom. Note, before a particular sample is taken, the estimate β̂j is a random variable; that is, the estimator has a distribution associated with it. This is why the above equation describes a distribution. After the OLS estimates are obtained for a particular sample, β̂j becomes a fixed number. The statistics and mathematics necessary to derive this result are beyond this class. We will take this result as a given.

t-distribution. Before applying the above result to your OLS estimates, it is informative to review hypothesis formulation and testing, the Student t-distribution, and the use of the t-distribution in statistical testing. As noted in the statistics reading assignment, the t-distribution is one of the most important statistical distributions. See the statistics reading assignment for the general t-test. You are responsible for knowing this test upside down and backwards.

The t-distribution is a symmetric bell-shaped distribution, but the shape (probabilities) depends on the degrees of freedom of the distribution. For different degrees of freedom, the t-distribution has different critical values. As the degrees of freedom increase, the t-distribution approaches the normal distribution. On the web page is a file containing a table associated with the t-distribution. At this point, you should download this file and confirm the distribution varies by degrees of freedom.

Given a distribution, statistical tests associated with various hypotheses can be conducted. To conduct hypothesis testing, a null hypothesis and an alternative hypothesis are necessary. It is important that the null and alternative hypotheses cover all possible outcomes. The null hypothesis is commonly denoted as H0, whereas the alternative is commonly denoted as HA. Several null and alternative hypotheses are:

Null           H0:   βj = 0      βj ≥ 0      βj ≤ 0
Alternative    HA:   βj ≠ 0      βj < 0      βj > 0
In this list, three different null hypotheses are given in the top row and the associated alternatives in the second row. For each null, the alternative hypothesis covers all possible outcomes not given by the null hypothesis. For example, consider the first null hypothesis, H0: βj = 0. Given this null, two different alternatives are possible: βj could be less than zero or βj could be greater than zero. Both alternatives are covered by the alternative hypothesis βj ≠ 0. An alternative hypothesis of βj > 0 would be inappropriate for the null hypothesis H0: βj = 0. It is inappropriate because it does not cover the possibility that βj < 0; if your test statistic implied βj < 0, you would not be able to make any inference from your two hypotheses. It is important to set up your null and alternative hypotheses such that together they cover all possible outcomes.

Given the different null and alternative hypotheses, the t-test can be either a two-tailed test or a one-tailed test. Knowing whether the test is one- or two-tailed is important in conducting the test, because the critical value depends on the tails, and in interpreting the test inference. A two-tailed test is given by

H0: βj = d
HA: βj ≠ d.

An example of a one-tailed test is

H0: βj ≥ d
HA: βj < d.

In general, these examples show that when using a t-test, any value can be tested, as indicated by the general notation d. It is not necessary that d = 0, as in the previous examples; the number d can be any value. The "fail to reject" and "rejection" regions are different between one- and two-tailed tests.

To conduct a test, a level of significance must be chosen. The level of significance is given by α. The probability of a "Type I" error is given by the level of significance. A Type I error occurs when the null hypothesis is rejected, but the hypothesis is true. Associated with any level of significance is a critical value. The critical value is the point of demarcation between the fail-to-reject (acceptance) and rejection regions. Before proceeding, a few words about Type I and Type II errors are appropriate. The two types of errors are defined in table 1.

Table 1. Type I and Type II Errors Defined
Decision Regarding          States of the World
Statistical Test            Null hypothesis true      Null hypothesis false
Reject null                 Type I error              Correct decision
Do not reject null          Correct decision          Type II error

We can fix the probability of a Type I error by picking a value for α. Unfortunately, the same control over a Type II error is not possible. The probability of a Type II error can only be calculated if we know the true value of the parameters, and if we knew the true values there would be no reason to perform statistical tests. We can, however, state three important aspects concerning Type I and Type II errors.

1) The probabilities of Type I and Type II errors are inversely related. This means as you decrease the probability of a Type I error, you increase the probability of a Type II error, and vice versa.
2) The closer the true value is to the hypothesized value, the greater the chance of a Type II error.
3) The t-test may be the best test because, for a given probability of a Type I error, the test minimizes the probability of a Type II error.

For a two-tailed test, the fail-to-reject and rejection regions are defined by placing probability α/2 in each tail. This is shown in figure 1.

Figure 1. Two-Tailed Test (rejection regions of area α/2 in each tail, with the fail-to-reject region in the middle).
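As a quick way to see how critical values depend on α, on the number of tails, and on the degrees of freedom, the sketch below uses Python's scipy library (an assumption; the course itself works from the t-table and Excel) to print one- and two-tailed critical values. The α and degrees-of-freedom values are chosen only for illustration; the numbers should match the t-table and approach the normal critical values as the degrees of freedom grow.

    from scipy import stats

    alpha = 0.05
    for df in (5, 10, 28, 120, 1000):
        # Two-tailed test: alpha/2 in each tail
        two_tail = stats.t.ppf(1 - alpha / 2, df)
        # One-tailed test: all of alpha in a single tail
        one_tail = stats.t.ppf(1 - alpha, df)
        print(f"df={df:4d}  two-tailed ±{two_tail:.3f}  one-tailed {one_tail:.3f}")

    # For comparison, the standard normal critical values
    print("normal    two-tailed ±%.3f  one-tailed %.3f"
          % (stats.norm.ppf(1 - alpha / 2), stats.norm.ppf(1 - alpha)))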
For a one-tailed test, the fail-to-reject and rejection regions are defined by placing probability α in one of the tails, as shown in figure 2.

Figure 2. Two Cases Associated with a One-Tailed Test (rejection region of area α in either the left-hand or the right-hand tail, with the fail-to-reject region covering the remainder).

For either a two-tailed or a one-tailed test, you calculate a value based on equation (1). The null hypothesis is then either rejected or not rejected based on where the calculated t-statistic falls. Key Point: the test is testing hypotheses concerning the population parameters. The test is not testing hypotheses about the estimated parameters from a particular sample. Once a sample is taken, the estimated values are fixed numbers. It makes no sense to test whether a given number is equal to some other number; we know the value of a given number.

Application to OLS. At the beginning of this inference section, we stated we could use the estimated parameters and their estimated variances to obtain a statistic that is distributed as a t-distribution. Combining our knowledge of hypothesis testing with this result, it is clear we can conduct tests using the estimated parameters. These tests concern hypotheses about the true parameters; they are not testing hypotheses about the sample. Recall, for a given sample, your OLS estimates are a unique set of fixed numbers.

As an application to OLS, let's assume you have estimated the following equation, with estimated standard errors beneath each estimated parameter:

(2)    ŷt = 1.5 + 5.2 xt
             (0.25)  (5.2).

It is not uncommon to see estimated equations written in this form. Here, the estimated slope is 5.2 and the estimated intercept is 1.5. These estimated values come from the OLS estimator given by the equation β̂ = (X′X)⁻¹X′Y. The standard errors (the square root of the variance) of the estimated parameters are 0.25 for the intercept and 5.2 for the slope parameter. These standard errors are the square roots of the diagonal elements of the estimator of the variance of the estimated parameters, given by var(β̂) = σ̂²(X′X)⁻¹.

Let's test the following hypothesis:

H0: β1 ≥ 2
HA: β1 < 2.

Inserting the values from equation (2) into equation (1), the t-statistic becomes:

(3)    t = (β̂1 − 2) / √(var(β̂1)) = (1.5 − 2) / 0.25 = −2.0.

The next step is to use a t-table to obtain the critical value for your assumed level of significance. Assuming 28 degrees of freedom (n − k, with n = 30 and k = 2) and α = 0.05, the tabled value is 1.701; because this is a one-tailed test and we are interested in the left-hand tail, the relevant critical value is −1.701. At this point, you should look at the t-table given on the class web site and convince yourself you know how the critical value was determined. Notice the hypotheses are concerned with the true parameter value, β, and not the estimated value, β̂. Graphically, the problem is stated in figure 3.

Figure 3. One-Tailed t-Test Example (rejection region of area α = 0.05 to the left of the critical value −1.701; the calculated value of −2.0 falls in the rejection region).

In this example, the calculated value falls into the rejection region. Therefore, we would reject the null hypothesis that β1 ≥ 2. If we chose a level of significance equal to 0.025, the critical value would be −2.048, and at that level of significance we would fail to reject the null hypothesis. This example illustrates that different statistical conclusions (inferences) can be reached depending on the level of significance chosen. It is important for you to think about the statistical and economic implications of choosing different levels of significance.
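A minimal sketch of the one-tailed test just worked, assuming Python with scipy rather than the t-table; the estimate, standard error, hypothesized value, and degrees of freedom are the ones from the example.

    from scipy import stats

    beta_hat, se, d = 1.5, 0.25, 2.0   # estimate, standard error, hypothesized value
    df = 28                            # n - k = 30 - 2
    alpha = 0.05

    t_calc = (beta_hat - d) / se       # (1.5 - 2) / 0.25 = -2.0

    # Left-tailed test (HA: beta_1 < 2): reject H0 if t_calc falls below the critical value
    t_crit = stats.t.ppf(alpha, df)    # approximately -1.701
    print(t_calc, t_crit, t_calc < t_crit)   # True -> reject H0 at the 5% level

    # At alpha = 0.025 the critical value is about -2.048, so we would fail to reject
    print(stats.t.ppf(0.025, df))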
As a second example, let's test the following null hypothesis:

H0: β2 = 0
HA: β2 ≠ 0.

Inserting the values from equation (2) into equation (1), the t-statistic becomes:

t = (β̂2 − 0) / √(var(β̂2)) = (5.2 − 0) / 5.2 = 1.0.

As before, the next step is to use a t-table to obtain the critical values for your assumed level of significance. Assuming 28 degrees of freedom and α = 0.05, the critical values are −2.048 and 2.048; this is a two-tailed test. At this point, you should look at the t-table given on the class web site and convince yourself you know how the critical values were determined. Another point is that even though the significance level is the same in the two examples, the critical values differ. This is caused by the one- versus two-tailed aspect. Convince yourself why this occurs. Graphically, the problem is stated in figure 4.

Figure 4. Two-Tailed Example (rejection regions of area α/2 = 0.025 beyond −2.048 and 2.048; the calculated value of 1 falls in the fail-to-reject region).

In this test, the calculated value falls into the fail-to-reject region. We would state that we fail to reject the null hypothesis β2 = 0.

Significance of a Variable. Most regression packages, including Excel, print out a specific t-test for every estimated parameter. This test is

H0: βj = 0
HA: βj ≠ 0.

This test is often referred to as testing whether the variable is significant. If the true parameter value is equal to zero, the independent variable, xj, has no effect on the dependent variable. This is what is meant by significance of the variable. You will need to know and understand this test.

p-values. In addition to printing out the specific t-test associated with the significance of a variable, most regression packages also print out the probability value, known as the p-value. It is increasingly common to state the p-value associated with the test statistic rather than choosing a level of significance. It is, therefore, important you understand the meaning of a p-value. The probability value is the probability that the test statistic, here the t-statistic, takes a value more extreme than the calculated value. In other words, the p-value for a given t-statistic is the smallest significance level at which the null hypothesis would be rejected. Because the p-value represents an area under a probability density function, p-values range from 0 to 1 and are reported as decimals.

An illustration will help in understanding p-values. From the hypothesis-testing example associated with β2, we obtained a calculated t-value equal to 1. The test was a two-tailed test. The p-value is given graphically in figure 5.

Figure 5. p-value Areas Defined for a Two-Tailed Test (areas of p/2 = 0.16 in each tail beyond −1 and 1, with the fail-to-reject region in the middle).

As illustrated in figure 5, the calculated t-statistic is placed on the Student t-distribution graph. For a two-tailed test, the p-value is the sum of the areas in the two tails, using the calculated t-statistic as the demarcation point between the reject and fail-to-reject regions. Computer programs automatically compute these areas by integration, a concept that is beyond this class. P-values can also be associated with one-tailed tests; for a one-tailed test, we are interested in the area in only one of the tails. In the two-tailed example illustrated, the p-value equals 0.32 (0.16 in each tail).
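The two-tailed test for β2 and its p-value can be reproduced with the sketch below (again assuming Python with scipy; the numbers are the ones from equation (2)).

    from scipy import stats

    beta_hat, se, d = 5.2, 5.2, 0.0    # estimate, standard error, hypothesized value
    df = 28

    t_calc = (beta_hat - d) / se       # = 1.0

    # Two-tailed p-value: the area in both tails beyond |t_calc|
    p_value = 2 * stats.t.sf(abs(t_calc), df)
    print(t_calc, p_value)             # roughly 0.33, about 0.16 in each tail

    # Decision rule: reject H0 if the p-value is <= alpha
    alpha = 0.05
    print(p_value <= alpha)            # False -> fail to reject H0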
P-values can be used in several ways. First, the p-value gives the exact level of significance associated with the calculated t-statistic. In the above example, if you were to choose a level of significance equal to 0.32, your two-tailed critical values would equal −1 and 1. At this level of significance, your critical value and test statistic are equal. In other words, you can report the exact level of significance that provides the cut-off value between rejecting and failing to reject the null hypothesis. Second, the p-value can be used as a decision rule: the null hypothesis is rejected if the p-value is less than or equal to your chosen level of significance, α. In the above example, a level of 0.05 was chosen and a p-value of 0.32 was obtained. At this level of significance, we would fail to reject the null hypothesis; the p-value is larger than the level of significance.

Graphically, we can show the use of p-values for the two-tailed test by comparing the tail areas defined by the calculated t-statistic (p/2 = 0.16 beyond −1 and 1) with two levels of significance, α = 0.05 and α = 0.50. At the 5% significance level, the critical values are −2.048 and 2.048, whereas at the 50% level the critical values are −0.67 and 0.67. By design in this example, the calculated t-statistic falls between these two sets of critical values. At the 5% level, we fail to reject the null hypothesis; at the 50% level, we would reject the null hypothesis. The decision rule is that the null hypothesis is rejected if the p-value is less than or equal to the level of significance, α, which is the level of Type I error you are willing to accept. You fail to reject the null hypothesis if the p-value is greater than the chosen level of significance.

For the one-tailed test given above (H0: β1 ≥ 2), the p-value is 0.0276. This is the area or probability in the left-hand tail beyond the calculated t-statistic of −2.0. The p-value also shows why we rejected the null hypothesis at the 0.05 level but failed to reject it at the 0.025 level: the calculated t-statistic falls between the critical values associated with these two levels of significance (−1.701 at α = 0.05 and −2.048 at α = 0.025).

Confidence Intervals

The estimated parameters, the β̂'s, are single numbers. Such estimates are known as point estimates. A point estimate by definition provides no indication of the reliability of the number; that is, it gives no sense of the range within which we would reasonably expect the parameter to fall. Using the t-distribution, a confidence interval can be obtained for each estimated parameter. Confidence intervals are interpreted as follows: if random samples were obtained over and over, with the upper and lower confidence limits computed each time, then the unknown true value, β, would lie in the calculated intervals in (1 − α) percent of the samples. For a single sample, we do not know whether the true parameter lies inside or outside of the confidence interval.

To calculate a confidence interval, the starting point is the following equation:

Pr(−tc ≤ (β̂j − βj) / √(var(β̂j)) ≤ tc) = 1 − α,

where Pr denotes probability, tc is the critical value from the t-distribution, and all other variables are as previously defined. The critical value, tc, is the value from the t-table using the appropriate α, a two-tailed test, and the degrees of freedom. β̂j and var(β̂j) are estimated values from your OLS regression; βj is the true value. This equation gives the probability of the standardized estimate being between the two critical values, −tc and tc.
Rearranging the equation by multiplying through by the standard error of β̂j (noting this value is positive, because it is the square root of a positive number) gives:

Pr(−tc √(var(β̂j)) ≤ β̂j − βj ≤ tc √(var(β̂j))) = 1 − α.

Subtracting β̂j from each part gives:

Pr(−β̂j − tc √(var(β̂j)) ≤ −βj ≤ −β̂j + tc √(var(β̂j))) = 1 − α.

Multiplying by −1, which reverses the inequalities, gives:

Pr(β̂j + tc √(var(β̂j)) ≥ βj ≥ β̂j − tc √(var(β̂j))) = 1 − α.

Rearranging terms gives:

(4)    Pr(β̂j − tc √(var(β̂j)) ≤ βj ≤ β̂j + tc √(var(β̂j))) = 1 − α.

This gives the 1 − α confidence interval. The confidence interval is known as an interval estimate, in contrast to the point estimate. Continuing with the above example, the (1 − α) percent interval estimates for the βj's are obtained by applying equation (4) to the estimates β̂1 and β̂2. The interval estimates are:

Pr(1.5 − 2.048(0.25) ≤ β1 ≤ 1.5 + 2.048(0.25)) = 1 − 0.05
Pr(0.988 ≤ β1 ≤ 2.012) = 95%

Pr(5.2 − 2.048(5.2) ≤ β2 ≤ 5.2 + 2.048(5.2)) = 1 − 0.05
Pr(−5.45 ≤ β2 ≤ 15.85) = 95%

In this example, the confidence interval for the intercept is much smaller than the confidence interval for the slope parameter. The difference lies in the estimated standard errors of the parameters: the estimated standard error for the slope parameter is over 20 times larger than the estimated standard error for the intercept.
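A sketch of the interval estimates above, assuming Python with scipy; it reproduces both confidence intervals from the point estimates and standard errors in equation (2).

    from scipy import stats

    df, alpha = 28, 0.05
    t_c = stats.t.ppf(1 - alpha / 2, df)          # about 2.048

    for name, b_hat, se in [("intercept", 1.5, 0.25), ("slope", 5.2, 5.2)]:
        lower = b_hat - t_c * se                  # equation (4), lower limit
        upper = b_hat + t_c * se                  # equation (4), upper limit
        print(f"{name}: ({lower:.3f}, {upper:.3f})")
    # Expected: roughly (0.988, 2.012) for the intercept and (-5.45, 15.85) for the slope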
Economic Interpretation of the Parameters

Up to this point we have been concerned with obtaining either a point or an interval estimate for the parameters and then conducting a t-test for significance. Economic interpretation of the estimated coefficients involves more than these mechanical issues. Economic interpretation of an estimated equation and the individual parameters involves combining economic theory with the estimated parameters and statistics. Interpretation is a science as well as an art and comes with experience. We will spend a large part of the class on interpretation. For now, a short discussion is appropriate.

Basically, economic interpretation involves the question, "Do the estimated parameters make sense from a theoretical standpoint?" To answer this question, you must look at several issues:

1) significance of the variable - the specific t-test discussed earlier,
2) sign of the estimated parameter (positive or negative), and
3) magnitude or size of the estimated parameter.

The first issue was previously discussed. Here, we are considering whether the parameter is different from zero. Given the t-test value, if the null hypothesis of equal to zero is not rejected, the associated independent variable does not affect the dependent variable. You should ask yourself what theory states about the relationship. If you are estimating a demand equation, own price should be significantly different from zero; that is, own price is expected to affect the quantity demanded of a product. If own price is not significantly different from zero, you need to ask yourself, why? Did you do something wrong? The price of a substitute or complement should also be significantly different from zero. You may have included the price of a product to test whether it is a substitute or complement; insignificance may indicate the product is neither a substitute nor a complement.

The sign of the estimated parameter is very important. Continuing with the demand example, the parameter associated with own price should be negative: as own price increases, the quantity demanded should decrease. Estimated parameters associated with the prices of substitutes (complements) should be positive (negative); as the price of a substitute (complement) increases, the quantity of the good in question should increase (decrease). What would you expect for the sign of the estimated parameter for income in a demand equation?

The magnitude of the estimated parameters must also be examined. For example, if you estimate a demand equation for Pepsi, what would you expect the magnitude of the parameter on own price to be? This is a difficult question to answer, but nonsensical parameter estimates can be weeded out. For example, if the estimated coefficient indicated that a $1 increase in the price of a 20-ounce bottle of Pepsi would decrease the number of 20-ounce bottles sold per day in the U.S. by only one bottle, would you believe your results? Do you think Pepsi could more than double its price and have little impact on the demand for its product? Is the following reasonable? Your estimated coefficient indicates a $1 increase would cause the number of 20-ounce Pepsi bottles sold in the U.S. to decrease by one-half. This is where experience, economic theory, and prior studies become important.

Inference from Multiple Parameters

To this point, the discussion of inference has concerned each estimated parameter separately. Two measures that are concerned with making inference from more than a single parameter are discussed in this section: the adjusted R2 and F-tests.

Adjusted Coefficient of Determination

In a previous reading assignment, we defined the coefficient of determination, R2, as the proportion of the sample variation in the y's that is explained by the x's. R2 ranges between zero and one. The equation for calculating R2 is:

R² = SSR/SST = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 1 − SSE/SST = 1 − Σûi² / Σ(yi − ȳ)².

This statistic is a measure of the goodness-of-fit of the equation. The measure looks at how all the independent variables together explain the dependent variable; it is no longer looking at an individual estimated parameter.

There is a problem with using R2: R2 is sensitive to the number of independent variables. Adding another independent variable increases R2. Therefore, to maximize R2 all you have to do is add additional variables. One can obtain an R2 of one by having the number of independent variables equal the number of observations, n = k. This problem can be seen by examining the equation for R2:

R² = SSR/SST = Σ(ŷi − ȳ)² / Σ(yi − ȳ)².

An additional independent variable will not affect SST; SST is strictly the variation in the observed y's around their mean, and the independent variables have no impact on this variation. Adding independent variables will, however, affect the estimated y's. Adding additional x's will increase the amount of variation explained, increasing Σ(ŷi − ȳ)². Increasing the numerator without changing the denominator causes R2 to increase.

R2 is concerned with variation. The solution to the R2 problem is to concern ourselves with the variance instead of the variation. The adjusted coefficient of determination, R̄², is defined as:

R̄² = 1 − [SSE/(n − k)] / [SST/(n − 1)] = 1 − σ̂²/var(y) = 1 − [Σ(yi − ŷi)²/(n − k)] / [Σ(yi − ȳ)²/(n − 1)],

where σ̂² is the variance of the error terms, or the variance of the y's net of the influence of the x's, and var(y) is the variance of the y's. As in the equation for R2, increasing the number of independent variables has no impact on the denominator in the equation for R̄². Increasing the number of independent variables affects both components in the numerator: it will decrease the SSE.
This is the same effect as for R2: increasing SSR causes SSE to decrease. At the same time, the degrees of freedom (n − k) also decrease; with n constant and k increasing, n − k falls.

Adding additional independent variables always increases R2. For the adjusted coefficient of determination, adding additional independent variables will not necessarily increase R̄². If the additional variable(s) help to explain the dependent variable, R̄² will increase. If the additional independent variable(s) do not help explain the dependent variable, R̄² will decrease. This occurs because R̄² takes into account the degrees of freedom.

Relationship Between R2 and R̄². A relationship between R2 and R̄² can be shown. The difference between the two measures is that R2 measures variation and R̄² measures variance; taking into account the degrees of freedom changes variation into variance. Rearranging the equation for R2, one obtains:

R² = 1 − SSE/SST, so (1 − R²) = SSE/SST.

Multiplying both sides of the above equation by −(n − 1)/(n − k) and rearranging, the following equation is obtained:

−[(n − 1)/(n − k)](1 − R²) = −[(n − 1)/(n − k)](SSE/SST) = −[SSE/(n − k)] / [SST/(n − 1)].

Adding one to both sides, the right-hand side becomes the definition of R̄². Rearranging, one obtains:

R̄² = 1 − [SSE/(n − k)] / [SST/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k),

or

(5)    R̄² = 1 − (1 − R²)(n − 1)/(n − k).

From equation (5), several aspects of the relationship between R2 and R̄² can be illustrated. These aspects are summarized as follows.

1) If the number of independent variables equals one, k = 1, then R2 = R̄². This is true because the last term in equation (5) reduces to (n − 1)/(n − 1), which equals one; equation (5) then reduces to one minus one plus R2, which equals R2.

2) If the number of independent variables is greater than one, k > 1, then R2 > R̄². Recall that R2 is one minus the percent of variation unexplained, so (1 − R2) is the percent of variation unexplained. In words, equation (5) becomes 1 − (% unexplained)(a number > 1), since (n − 1)/(n − k) is greater than one because its numerator is greater than its denominator. R̄² therefore subtracts from one something larger than the percent unexplained, whereas R2 subtracts exactly the percent unexplained. Therefore, R2 must be greater than R̄².

3) R̄² can be a negative number. A negative R̄² indicates a very poor fit of your equation to the data. Recall the lower bound for R2 is zero. Both R2 and R̄² have an upper bound of one, indicating a perfect fit.

4) R2 increases as k increases for a given n, whereas R̄² may increase or decrease as k increases for a given n.

5) R̄² eliminates some of the problems associated with R2 by taking into account the degrees of freedom. However, R̄² does not eliminate all of the problems.

Use of R2 and R̄². R2 and R̄² are used to compare different regression equations. For example, suppose you have estimated the following two equations, but are not sure which equation is "better":

ŷt = β̂1 + β̂2 xt,2, and
ŷt = β̂1 + β̂2 xt,2 + β̂3 xt,3.

Theory does not provide enough guidance to determine whether x3 should be in the equation or not. We know the R2 for the second equation will be larger than the R2 for the first equation, regardless of whether x3 helps explain y. Informally (there is no real statistical test here), we use R̄² to compare the two estimated equations. The rule is to choose the equation with the highest R̄² as the "best" equation. To compare R̄²'s, the dependent variable must be the same variable and must be in the same units.
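The sketch below, assuming Python with numpy and using simulated data (so the particular numbers are only illustrative), computes R2 and R̄² directly from OLS residuals and then verifies the relationship in equation (5).

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 50, 3                                  # observations; parameters incl. intercept
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS: (X'X)^(-1) X'y
    resid = y - X @ beta_hat

    SSE = np.sum(resid ** 2)
    SST = np.sum((y - y.mean()) ** 2)

    R2 = 1 - SSE / SST
    R2_adj = 1 - (SSE / (n - k)) / (SST / (n - 1))

    # Equation (5): adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k); the two should match
    print(R2, R2_adj, 1 - (1 - R2) * (n - 1) / (n - k))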
F-Test

The t-test is used to test individual parameters, whereas the F-test is used to test several parameters at the same time. Such hypotheses are called multiple restrictions. Consider the following equation:

(6)    yt = β1 + β2 xt,2 + β3 xt,3 + β4 xt,4 + ut.

Examples of several different null hypotheses that involve multiple restrictions are:

(7)    H0: β3 = β4 = 0
       H0: β2 = β3 = β4 = 0
       H0: β2 = β4 = 0
       HA: H0 is not true.

In each case, the alternative hypothesis holds if at least one of the βi's in the null hypothesis is not equal to zero. Key Point: the null hypothesis is concerned with jointly testing whether several true parameters are equal to zero. This is in contrast to the t-test, in which only one parameter is tested at a time. The alternative hypothesis is usually stated in this generic form to cover all possible alternatives.

F-test. To conduct an F-test, an unrestricted and a restricted model must be defined. The general unrestricted model, which contains all k parameters, is:

(8)    yt = β1 + β2 xt,2 + β3 xt,3 + β4 xt,4 + . . . + βk xt,k + ut.

The restrictions given by the null hypothesis define the restricted model. In general, a null hypothesis will have q restrictions; therefore, the restricted model will have q fewer estimated parameters. Note the null hypothesis states that each restricted parameter is equal to zero, and a zero parameter value is the same as leaving the variable out of the estimated equation. The general restricted model is given by:

yt = β1 + β2 xt,2 + β3 xt,3 + β4 xt,4 + . . . + βk−q xt,k−q + ut.

The restricted model has q fewer parameters to be estimated. This is accomplished by leaving the q variables out of the estimated equation, which effectively forces the q parameters to equal zero. The general null hypothesis to be tested is:

H0: βk−q+1 = βk−q+2 = . . . = βk = 0
HA: H0 is not true.

The key point is that the null hypothesis places q restrictions on the restricted model, forcing q coefficients to equal zero. Under the assumptions of the OLS model and either assuming normality of the error terms or invoking the Central Limit Theorem, it can be shown the following ratio has an F distribution with q and n − k degrees of freedom:

(9)    F-statistic = [(SSEr − SSEur)/q] / [SSEur/(n − k)] ~ F(q, n−k),

where SSEr is the sum of squared residuals from the restricted model and SSEur is the sum of squared residuals from the unrestricted model. The above statistic will be positive, because all components of the equation are positive: the sums of squares are positive, and n, k, and q are positive by definition. Further, SSEr ≥ SSEur, because the restricted model has fewer independent variables and so cannot explain more (have a smaller SSE) than the unrestricted model.

F-Distribution. Before conducting an F-test associated with an OLS regression, it is useful to briefly review the F-distribution. The F-distribution is always positive and has two degrees-of-freedom parameters, one for the numerator and one for the denominator. The F-test is a one-tailed test, concerned with the area in the right-hand tail. Graphically, an area of 1 − α lies to the left of the F critical value (with q and n − k degrees of freedom) and the rejection region of area α lies in the right-hand tail. If the calculated F-statistic falls in the rejection region of the right-hand tail, the null hypothesis is rejected. If the calculated F-statistic falls to the left of the critical value, we fail to reject the null hypothesis.

Conducting an F-test. To calculate the F-statistic, two regression equations must be estimated. Let's consider the model given by equation (6).
In this equation there are three independent variables plus the intercept. Equation (6) is the unrestricted model. Consider testing the first null hypothesis given in equation (7), H0: β3 = β4 = 0. First, we would estimate the unrestricted model given by equation (6) and obtain SSEur. Next, we would estimate the restricted model and obtain SSEr. The restricted model in this case is:

(10)    yt = β1 + β2 xt,2 + ut.

In the restricted model, the variables associated with β3 and β4 are not included in the estimation. To calculate the F-statistic, we substitute the calculated SSEs into equation (9) with q = 2; we are placing two restrictions on the model, β3 = β4 = 0. The calculated value is then compared to the critical value for a given level of significance.

As an example, let's assume you have 124 observations and estimate equation (6), obtaining SSEur = 84.02. Next, you estimate equation (10) and obtain SSEr = 149.64. The F-statistic from equation (9) is:

F-statistic = [(SSEr − SSEur)/q] / [SSEur/(n − k)] = [(149.64 − 84.02)/2] / [84.02/(124 − 4)] = 32.81 / 0.700 = 46.86.

Assuming a level of significance of 0.01, the critical value for the F-distribution with 2 and 120 degrees of freedom is 4.79. The calculated F-statistic lies to the right of the critical value; therefore, the null hypothesis that the parameters β3 and β4 are jointly equal to zero is rejected. The test does not say anything about the individual parameters; only the null hypothesis of jointly equal to zero is rejected. Similar to the t-test, the F-test is testing hypotheses concerning the true parameters and not the estimated parameters. The estimated parameters are fixed numbers for a given sample. This procedure can be used to jointly test any set of multiple restrictions; the two equations must be separately estimated and their SSE's obtained.

Linear Relationship. Similar to the t-test, most regression packages, including Excel, automatically calculate and print out a specific F-test. This F-test jointly tests whether there is a linear relationship between the dependent variable and all of the independent variables together, excluding the intercept. This is a special case of the general F-test previously discussed. For the general model given in equation (8), the null and alternative hypotheses are:

H0: β2 = β3 = . . . = βk = 0
HA: H0 is not true.

This test jointly tests whether all slope parameters are equal to zero; it gives an indication of whether the independent variables are jointly, linearly related to the dependent variable. "Linearly related" arises because of the assumption that the equation is linear in parameters. The interpretation of this test is the same as previously discussed; it is just a special case of the F-test. When interpreting estimated equations, the F-test must be discussed. That is, do you reject or fail to reject the null hypothesis of no linear relationship between the dependent variable and the independent variables jointly? What level of significance is assumed? The section on p-values will help in the interpretation.

Continuing with the above example, the SSEur remains the same, but the restricted SSEr is obtained by estimating the model with only an intercept; that is, no independent variables are included in the model. This gives SSEr = 257.81. Using these SSE values, the F-statistic associated with the above hypothesis is:

F-statistic = [(SSEr − SSEur)/q] / [SSEur/(n − k)] = [(257.81 − 84.02)/3] / [84.02/(124 − 4)] = 57.93 / 0.700 = 82.74.

With 3 and 120 degrees of freedom, the critical value associated with α = 0.05 (0.01) is 2.68 (3.95). At either of these levels of significance, we would reject the null hypothesis that there is no linear relationship between the dependent and independent variables.
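The F-test arithmetic from both examples can be reproduced with the sketch below, assuming Python with scipy; the SSE values, sample size, and number of parameters are the ones given above.

    from scipy import stats

    n, k = 124, 4            # observations; parameters in the unrestricted model
    SSE_ur = 84.02           # unrestricted model, equation (6)

    # Test 1: H0: beta_3 = beta_4 = 0  (q = 2 restrictions, restricted model (10))
    SSE_r, q = 149.64, 2
    F = ((SSE_r - SSE_ur) / q) / (SSE_ur / (n - k))
    print(F, stats.f.ppf(0.99, q, n - k))     # about 46.9 versus a 1% critical value near 4.79

    # Test 2: H0: beta_2 = beta_3 = beta_4 = 0  (q = 3, intercept-only restricted model)
    SSE_r, q = 257.81, 3
    F = ((SSE_r - SSE_ur) / q) / (SSE_ur / (n - k))
    print(F, stats.f.ppf(0.95, q, n - k), stats.f.ppf(0.99, q, n - k))
    # about 82.7 versus critical values near 2.68 (5%) and 3.95 (1%)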
Relationship Between the F-Statistic and R2

Because the F-test and R2 are both based on the SSE and SST, it is intuitive that a relationship exists between the two measures. To show this relationship, recall the equation for calculating R2 is R² = 1 − SSE/SST. This equation can be rearranged to obtain SSE = SST(1 − R²). We can then substitute this result into the F-statistic equation given in equation (9) to obtain:

(11)    F-statistic = [(SSEr − SSEur)/q] / [SSEur/(n − k)] = {[SST(1 − R²r) − SST(1 − R²ur)]/q} / {SST(1 − R²ur)/(n − k)},

where, as before, r denotes the restricted model and ur the unrestricted model. Note, SST is the same in both models, because it is only a function of the observed dependent variable and does not depend on the independent variables. Because SST appears multiplicatively in each term of equation (11), it can be canceled from the equation, and by rearranging we obtain the relationship between the F-statistic and R2. Mathematically, this is shown as follows:

F-statistic = {SST[(1 − R²r) − (1 − R²ur)]/q} / {SST(1 − R²ur)/(n − k)}
            = {[(1 − R²r) − (1 − R²ur)]/q} / {(1 − R²ur)/(n − k)}
            = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k)].

This relationship shows you can calculate the F-statistic and conduct the F-test if you know the R2 for both equations. More importantly, the relationship shows that the F-statistic and R2 are related. For the special null hypothesis that all slope β's are jointly equal to zero, the relationship reduces to:

F-statistic = (R²ur/q) / [(1 − R²ur)/(n − k)].

The relationship reduces to this form because R²r = 0. The coefficient of determination is zero in the restricted model, because there are no independent variables, x's, in the model. Recall, R2 is the amount of variation in y explained by the x's; if there are no x's in the model, they obviously explain none of the variation.

p-value for the F-Test

The p-value for the F-test associated with the null hypothesis that all slope parameters are jointly equal to zero is printed out by most regression software. The interpretation of the p-value is the same as it was for the t-test: the p-value is the probability of observing a value of the F distribution at least as large as the calculated F-statistic. Graphically, the p-value is the area under the F distribution (with q and n − k degrees of freedom) to the right of the calculated F-statistic.

As with the p-values associated with the t-statistic, p-values for the F-statistic can be used in several ways. First, the p-value gives the exact level of significance associated with the calculated F-statistic. In the above example, if you were to choose a level of significance equal to 0.00044, your F critical value would equal 82.74. At this level of significance, your critical value and test statistic are equal. In other words, you can report the exact level of significance that provides the cut-off value between rejecting and failing to reject the null hypothesis. Second, the p-value can be used as a decision rule: the null hypothesis is rejected if the p-value is less than or equal to your chosen level of significance, α. In the above example, a level of 0.05 was chosen and a p-value of 0.00044 is obtained. At this level of significance, we would reject the null hypothesis.
If we chose an α smaller than 0.00044, we would fail to reject the null hypothesis. The decision rule is that the null hypothesis is rejected if the p-value is less than or equal to the level of significance, α, which is the level of Type I error you are willing to accept. You fail to reject the null hypothesis if the p-value is greater than the chosen level of significance.
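Finally, a minimal sketch of the p-value decision rule for an F-test, assuming Python with scipy; the calculated F-statistic, degrees of freedom, and α below are hypothetical values chosen only to illustrate the rule, not the numbers from the example above.

    from scipy import stats

    F_calc, q, df2 = 3.20, 3, 120      # hypothetical calculated F and degrees of freedom
    alpha = 0.05

    p_value = stats.f.sf(F_calc, q, df2)   # area in the right-hand tail beyond F_calc
    print(p_value)

    # Decision rule: reject H0 if the p-value is <= the chosen level of significance
    print("reject H0" if p_value <= alpha else "fail to reject H0")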