CHAPTER 6
STATISTICAL INFERENCE: HYPOTHESIS TESTS

1. The Concept of Hypothesis Testing
2. The General Methodology of Hypothesis Testing
   2.1. The Procedure
      2.1.1. The Null Hypothesis versus the Alternative Hypothesis
      2.1.2. The Type I Error versus Type II Error
      2.1.3. Two-Tailed Hypothesis Tests versus One-Tailed Hypothesis Tests
         2.1.3.1. Two-Tailed Tests
            2.1.3.1.1. Decision Rules
            2.1.3.1.2. The Relationship Between the Confidence Interval and the Acceptance Region for the TOH
            2.1.3.1.3. Type I and Type II Errors Revisited
         2.1.3.2. One-Tailed Tests
            2.1.3.2.1. Lower Tail Test
            2.1.3.2.2. Upper Tail Test
      2.1.4. How to Set Up the Null and Alternative Hypotheses
3. Hypothesis Test for μ: Small Samples From Normal Populations
4. Test of Hypothesis on Population Proportion π

1. The Concept of Hypothesis Testing

In addition to the confidence interval, the hypothesis test is another approach to making inferences about a population parameter using a sample statistic. To compare the two approaches: in the confidence interval we have no prior knowledge, judgment or claim about the population parameter. We take a sample and build an appropriate interval around the sample statistic, for a given level of confidence, to estimate the range of values within which the population parameter may fall. In the hypothesis test, in contrast, we have some prior knowledge, judgment or claim about the population parameter. In other words, we have a hypothesis about the population parameter, which may be rejected or not rejected depending on the analysis of the sample data.

2. The General Methodology of Hypothesis Testing

To explain the theoretical foundation of hypothesis tests we will start with the formula for the confidence interval for μ with a large sample size n:

   L, U = x̄ ± z_{α/2} se(x̄)

where se(x̄) = s/√n. Note that the interval is built around x̄, where the expression z_{α/2} se(x̄) is the familiar margin of error (MOE).
The MOE plays a similar role in the hypothesis test, as will be seen. Now, to explain the methodology for the hypothesis test, consider the following example:

Chapter 6—Hypothesis Tests Page 1 of 22

Example 1

Casual observation of vehicle speed on a freeway indicates that most vehicles exceed the speed limit of 70 mph. Suppose we want to test the hypothesis that the mean speed is 75 mph. Accordingly, a random sample of n = 110 vehicles is secretly clocked, yielding the following data:

   65 74 82 73 80 86 80 69 87 83
   78 83 84 66 81 80 90 77 84 80
   67 62 91 69 92 84 69 64 65 84
   82 68 75 65 88 86 89 85 66 76
   92 76 66 85 88 78 84 83 83 81
   64 91 76 88 89 69 64 79 66 78
   90 81 72 66 77 84 64 65 65 87
   62 83 75 78 74 92 84 87 86 89
   82 87 78 72 73 68 91 76 90 87
   76 72 85 71 67 86 62 89 70 73
   68 83 65 89 72 73 70 62 70 72

The mean speed obtained from this data is x̄ = 77.5 mph. Does this sample provide significant evidence that the mean speed of all vehicles is different from 75 mph? Should we reject the hypothesis that the population mean speed is 75 mph?

Note that x̄ = 77.5 is obtained from a single sample. Now that we have learned about the sampling distribution of x̄, we know that there is an infinite number of possible samples of size n = 110, each yielding its own x̄ value. These values are normally distributed with the population mean μ as their center of gravity. Therefore, it is inevitable that the mean obtained from a single sample of size n will deviate from the population mean. The question is then: is the deviation of the sample mean from the hypothesized population mean significant? To answer this question, we must determine whether the deviation is due to sampling error. That is, does x̄ = 77.5 mph fall within the margin of sampling error (MOE) from the hypothesized population mean?
If the deviation is within the MOE, then we can conclude that this is a "natural" deviation, that 77.5 is one of the x̄ values that fall within the 95% interval in the sampling distribution. Therefore, this x̄ belongs to the sampling distribution which has the center of gravity μ = 75 mph, the hypothesized population mean.

If the sample x̄ value falls within ±MOE from μ0 = 75 (the symbol μ0 indicates that this is the hypothesized, rather than the actual, population mean), then the deviation of the sample value from the hypothesized population mean is said to be not significant: the deviation is due to sampling error. If, however, the x̄ value falls outside the interval μ0 ± MOE, then the deviation is said to be significant: the deviation is not due to sampling error. This x̄ belongs to a different sampling distribution with a center of gravity other than the hypothesized μ0. Then it can be argued that the hypothesized μ0 is not the true population mean. We should reject the hypothesis that the population mean is equal to 75 mph.

What constitutes a significant deviation? How do you determine the acceptable MOE for a hypothesis test? This is the question that the test of hypothesis attempts to answer.

2.1. The Procedure

The main task in performing a test of hypothesis is to find the margin of sampling error, MOE. This provides the decision rule, the criterion for determining whether to reject the hypothesis.

2.1.1. The Null Hypothesis versus the Alternative Hypothesis

To obtain the decision rule, first you must state the claim (the hypothesis) about the population mean in a prescribed way. The claim consists of two components, a null hypothesis, denoted by H0, and an alternative hypothesis, H1. For the vehicle speed example, we state the null and alternative hypotheses as follows.
   The null hypothesis:          H0: μ = 75
   The alternative hypothesis:   H1: μ ≠ 75

The null hypothesis states that the population mean is equal to 75 mph; the alternative hypothesis states that the population mean is not equal to, or is different from, 75 mph.

2.1.2. The Type I Error versus Type II Error

Once you state your hypothesis, you must deal with the following dilemma involving hypothesis tests. Because the conclusion of the test rests on a random sampling process, there is always a chance that you may arrive at a wrong conclusion about the hypothesis. A wrong conclusion can happen in two ways.

   1) Reject a true null hypothesis. This is called a Type I Error.
   2) Not reject a false null hypothesis. This is called a Type II Error.

There is always a chance or probability that you may commit either one of the two errors. The probability of committing a Type I error is denoted by α and that of committing a Type II error is denoted by β. Reducing α, for a given sample size, comes only at the cost of increasing β.

Performing a test of hypothesis is very similar to conducting a trial in a criminal court. Given the evidence that a crime has been committed, the defendant is charged with committing the crime. The purpose of the trial is to establish the defendant's guilt or innocence. The null hypothesis is that the defendant is innocent (the accused is presumed innocent) and the alternative is that he is guilty. If the jury finds an innocent person guilty, it has rejected a true null hypothesis; it has, therefore, made a Type I error. On the other hand, if the jury finds a guilty person not guilty, it has not rejected a false null hypothesis; it has, therefore, made a Type II error.

The following table shows the four possible situations resulting from a test of hypothesis (or a court trial).
The null hypothesis H0 (presumed innocent)

                | H0 is rejected                         | H0 is not rejected
   H0 is True   | Incorrect decision (Type I Error).     | Correct decision (no error).
                | The accused is innocent and he is      | The accused is innocent and he is
                | found guilty.                          | found not guilty.
                | Probability = α                        | Probability = 1 − α
   H0 is False  | Correct decision (no error).           | Incorrect decision (Type II Error).
                | The accused is guilty and he is        | The accused is guilty and he is
                | found guilty.                          | found not guilty.
                | Probability = 1 − β                    | Probability = β

In the hypothesis test, the burden of proof is always on the alternative hypothesis. In a criminal court, the burden of proof is on the prosecutor. The prosecutor must convince the jury, show beyond a reasonable doubt, that the defendant is guilty. Therefore, we want to make it unlikely to reject the null hypothesis unless the evidence is "very strong" or "significant". In a criminal court, "significant" means "beyond a reasonable doubt". We want to make it unlikely to find the defendant guilty unless guilt is established beyond a reasonable doubt. For this reason α, the probability of rejecting a true null hypothesis, is always assigned a small value, typically 5 percent in statistical hypothesis tests. The α value is also called the level of significance of the test.

Note that in a confidence interval, α is the proportion of all possible intervals built around sample means that do not capture the population mean. That is because a proportion α of sample means fall farther from the population mean than the margin of error MOE = z_{α/2} se(x̄). In a test of hypothesis α plays a similar role. If the null hypothesis is true and the randomly selected sample yields an x̄ value which falls outside the prescribed margin of error, we would wrongly reject the null hypothesis. And there is always a probability α of doing that.
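The claim that a true null hypothesis is wrongly rejected with probability α can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it treats the population as normal with mean 75 and a known standard deviation of 8.94 (borrowing the sample value from the vehicle speed example), and the function name, number of trials and seed are arbitrary choices, not part of the chapter.

```python
import random
from math import sqrt


def type_one_error_rate(mu0=75.0, sigma=8.94, n=110, z_crit=1.96,
                        trials=2000, seed=1):
    """Estimate how often a TRUE null hypothesis H0: mu = mu0 is rejected.

    Assumption: the population is normal with mean mu0 and known
    standard deviation sigma, so H0 really is true in every trial.
    """
    rng = random.Random(seed)
    moe = z_crit * sigma / sqrt(n)          # margin of error around mu0
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if abs(xbar - mu0) > moe:           # x-bar falls outside mu0 +/- MOE
            rejections += 1                 # a Type I error
    return rejections / trials


rate = type_one_error_rate()
print(f"Share of samples that wrongly reject H0: {rate:.3f}")
```

With z_crit = 1.96 the printed share should come out close to 0.05, varying a little with the seed and the number of trials.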
Since committing a Type I error is considered the more serious of the two errors (finding an innocent person guilty), the threshold probability (the level of significance α) is set in advance. The probability of a Type II error (β), however, varies with several factors, one of them being α.

2.1.3. Two-Tailed Hypothesis Tests

To determine the acceptance region, as with the confidence interval, we need a margin of (sampling) error. The form of the MOE in the hypothesis test depends on the null hypothesis to be rejected. If the null hypothesis is that μ = 75 (the population mean is equal to 75), then using the margin of error MOE = z_{α/2} se(x̄), the interval which would contain 100(1 − α) percent of all sample means is

   L, U = μ0 ± z_{α/2} se(x̄)

Here the hypothesis test is said to be a two-tailed test. It is called a two-tailed test because, whether the sample statistic x̄ is greater than or less than the hypothesized mean, the difference between x̄ and the value stated in the null hypothesis is some evidence against the null hypothesis. The purpose of the test (the trial) is to gauge the significance of the difference in either direction from the null mean. The significance of the difference is measured relative to the margin of error on either side. In the MOE formula, therefore, you must use z_{α/2}.

The null and alternative hypotheses for a TWO-TAIL TEST

   The null hypothesis:          H0: μ = μ0
   The alternative hypothesis:   H1: μ ≠ μ0

The vehicle speed example is a two-tail test. Test the null hypothesis that the population mean vehicle speed is equal to 75.

   H0: μ = 75 mph
   H1: μ ≠ 75 mph

We select α = 0.05 (allowing for a 5% chance of committing a Type I error, that is, rejecting a true null hypothesis). Going back to the sample data shown above, the sample mean and standard deviation are obtained as: x̄ = 77.5 and s = 8.94.
To determine the margin of error, first compute the standard error of x̄.

   se(x̄) = 8.94/√110 = 0.852

Given α = 0.05, the other component of the MOE, z_{α/2}, is z_{0.025} = 1.96. The margin of error is then

   MOE = (1.96)(0.852) = 1.67

The interval is then

   L, U = 75 ± 1.67 = [73.33, 76.67]

Since x̄ = 77.5 falls outside this interval, we conclude that the deviation is significant, and reject H0: μ = 75. In the diagram below, the interval [73.33, 76.67] is labeled as the "acceptance region". The sample mean falls outside this region.

[Figure: sampling distribution of x̄ under H0, centered at μ0 = 75, with the acceptance region 73.33 ≤ x̄ ≤ 76.67 and the sample mean x̄ = 77.5 in the upper tail.]

2.1.3.1. The Relationship Between the Confidence Interval and the Acceptance Region for the TOH

We can use the above diagram to observe how the confidence interval for μ and the acceptance region for a two-tail test of hypothesis are related. The margin of error for a 95% confidence interval for the vehicle speed example is:

   MOE = z_{α/2} se(x̄) = 1.96 × 0.852 = 1.67

The lower and upper boundaries of the confidence interval are:

   L, U = x̄ ± MOE = 77.5 ± 1.67 = (75.83, 79.17)

[Figure: the acceptance region from 73.33 to 76.67 around μ0 = 75, shown together with the confidence interval L = 75.83, x̄ = 77.5, U = 79.17.]

Note that this interval does not capture the null mean μ0 = 75. You can thus use a confidence interval to observe whether the null mean μ0 falls within the interval. If it does not, then you reject the null hypothesis.

2.1.3.2. Type I and Type II Errors Revisited

In this example, we rejected H0: μ = 75 because the sample mean x̄ = 77.5 happened to fall more than the MOE away from 75. That is, this sample mean was not one of the 95% of x̄ values that would fall within the interval 75 ± 1.67. Now we can ask the question, "what if the population mean were in fact 75 mph?" If that were the case, then 5% of x̄ values would fall outside the interval 75 ± 1.67. Therefore, if x̄ = 77.5 belonged to this 5%, then we have rejected a true H0; we have made a Type I error.
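The acceptance region and the confidence interval for the vehicle speed example can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted in the text (x̄ = 77.5, s = 8.94, n = 110, α = 0.05); the helper name margin_of_error is illustrative and not from the chapter.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(s, n, alpha=0.05):
    """Two-tail margin of error: z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return z * s / sqrt(n)


mu0, xbar, s, n = 75.0, 77.5, 8.94, 110
moe = margin_of_error(s, n)

# Acceptance region is built around the hypothesized mean mu0 ...
acceptance_region = (mu0 - moe, mu0 + moe)
# ... while the confidence interval is built around the sample mean.
confidence_interval = (xbar - moe, xbar + moe)

xbar_outside_region = not (acceptance_region[0] <= xbar <= acceptance_region[1])
mu0_outside_interval = not (confidence_interval[0] <= mu0 <= confidence_interval[1])

print("MOE:", round(moe, 2))
print("Acceptance region:", [round(v, 2) for v in acceptance_region])
print("Confidence interval:", [round(v, 2) for v in confidence_interval])
print("Reject H0:", xbar_outside_region and mu0_outside_interval)
```

Because both intervals use the same MOE, x̄ lies outside the acceptance region exactly when μ0 lies outside the confidence interval, which is why the two checks always agree.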
Suppose now we take another sample of n = 110 and obtain x̄ = 76.2 mph and s = 8.66. Note that with

   MOE = 1.96(8.66/√110) = 1.6

the boundaries of the acceptance region are:

   x̄_L, x̄_U = 75 ± 1.6 = (73.4, 76.6)

Then x̄ = 76.2 falls inside the "acceptance region" under the H0 distribution. Therefore, we conclude that this sample mean belongs to the H0 distribution and do not reject the null hypothesis H0: μ = 75.

But what if the population mean is some number other than 75? Suppose the true population mean speed is μ1. This is the center of gravity of the alternative sampling distribution, represented in the following diagram by H1, and x̄ = 76.2 belongs to that distribution. Thus, by wrongly concluding that x̄ = 76.2 belongs to the H0 distribution, we have not rejected a false null hypothesis. We have, therefore, committed a Type II error.

[Figure: the H0 sampling distribution centered at μ0 = 75 with acceptance region 73.4 to 76.6, overlapped by the H1 sampling distribution centered at μ1.]

The following is a graphic representation of the four scenarios involving a hypothesis test:

   o H0 is true and is not rejected: No error.
   o H0 is true and is rejected: Type I error.
   o H0 is false and is not rejected: Type II error.
   o H0 is false and is rejected: No error.

[Figure: four panels, one per scenario, each showing the sampling distribution centered at μ0 = 75.]

2.1.3.3. Decision Rules for Rejecting H0

The decision rule is always set up to reject the null hypothesis. There are two ways to set up the decision rule. Both are derived from the MOE formula. The role of the MOE here is that, if the null hypothesis H0 were true, then 100(1 − α) percent of the sample means must fall within the MOE of μ0. Thus the MOE becomes the criterion for rejecting H0. We reject H0, that is, we conclude the deviation x̄ − μ0 is significant, only when the absolute value of the deviation of x̄ from μ0 exceeds the MOE.
We reject H0 if

   |x̄ − μ0| > MOE

Substituting for MOE, we have

   |x̄ − μ0| > z_{α/2} se(x̄)

Dividing both sides of the inequality by se(x̄) gives us

   |x̄ − μ0| / se(x̄) > z_{α/2}

The term on the left-hand side above is called the test statistic (TS) and z_{α/2} is the critical value (CV).

Decision Rule (a): Reject H0 if

   |x̄ − μ0| / se(x̄) > z_{α/2},   that is,   |TS| ≡ |z| > CV ≡ z_{α/2}

Note that when you compute the test statistic (x̄ − μ0)/se(x̄), the result is the z score. Also note the absolute value bars around the test statistic. When the test statistic is negative, use its absolute value to avoid confusion about the direction of the inequality.

Now back to the vehicle speed example. The mean obtained from the sample is x̄ = 77.5. The objective of this exercise is to see if the deviation of the sample mean from the hypothesized mean is significant. If this difference exceeds the MOE, then the difference is significant and it will lead us to reject the null hypothesis. The difference is:

   x̄ − μ0 = 77.5 − 75 = 2.5

Using se(x̄) = 0.852, the test statistic is

   z = (x̄ − μ0) / se(x̄) = 2.5 / 0.852 = 2.93

and the critical value is

   z_{α/2} = z_{0.025} = 1.96

Since TS = 2.93 > CV = 1.96, the deviation is significant. Therefore, we reject the null hypothesis that μ = 75. We conclude that the population mean vehicle speed is different from 75 mph.

The alternative approach for determining if the difference x̄ − μ0 is significant is to find the tail area associated with the value of the test statistic. That is, find P(z > TS). Using the z table, this tail area is:

   P(z ≥ TS) = P(z ≥ 2.93) = 0.0017

When the test is a two-tail test, double the computed probability (2 × 0.0017 = 0.0034) and compare it to α = 0.05. Note that 0.0034 is the probability of seeing a deviation at least this large when the null hypothesis is true; loosely, it is the risk of a Type I error we would run by rejecting H0 on this evidence.
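Both decision rules can be sketched in a few lines. The function below is an illustrative helper (its name and the dictionary it returns are assumptions, not part of the chapter); it uses the standard normal distribution from Python's standard library in place of the z table and applies the two-tail rules to the vehicle speed figures.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(xbar, mu0, s, n, alpha=0.05):
    """Two-tail large-sample test of H0: mu = mu0 against H1: mu != mu0."""
    std_normal = NormalDist()
    se = s / sqrt(n)                              # standard error of x-bar
    ts = (xbar - mu0) / se                        # test statistic (z score)
    cv = std_normal.inv_cdf(1 - alpha / 2)        # critical value z_{alpha/2}
    p_value = 2 * (1 - std_normal.cdf(abs(ts)))   # both tail areas
    return {
        "ts": ts,
        "cv": cv,
        "p_value": p_value,
        "reject_by_ts": abs(ts) > cv,             # Decision Rule (a)
        "reject_by_p": p_value < alpha,           # Decision Rule (b)
    }


result = two_tail_z_test(xbar=77.5, mu0=75.0, s=8.94, n=110)
for name, value in result.items():
    print(name, value)
```

The two rules are equivalent, so reject_by_ts and reject_by_p always agree; small differences from the hand calculation come only from rounding se(x̄) and the table values.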
With a p-value of 0.0034, if we reject the null hypothesis there is only about a 0.34% chance that we are rejecting a true null, that is, committing a Type I error. Since we are allowing 5% as the "comfort zone" or threshold probability for rejecting a true null, the computed probability 0.0034 is clearly within this comfort zone. This approach to the hypothesis test is the probability (prob) value approach.

Decision Rule (b): Reject H0 if

   2 × P(z > |TS|) < α,   that is,   p-value < level of significance

For a two-tail test, in Decision Rule (b) the prob value is twice the tail area corresponding to TS. If the p-value < α, then reject H0.

Summary of Steps for a Two-Tail Test

a. State the null and alternative hypotheses.
      H0: μ = μ0
      H1: μ ≠ μ0
b. Specify the level of significance α.
c. Use either of the two methods to reject or not reject the null hypothesis.

   Decision Rule (a): Test Statistic
      i.   Compute the test statistic TS = (x̄ − μ0) / se(x̄)
      ii.  Determine the critical value CV = z_{α/2}
      iii. Reject H0 if |TS| > CV

   Decision Rule (b): The prob value
      i.   Compute the test statistic TS = (x̄ − μ0) / se(x̄)
      ii.  Find the probability value 2 × P(z > |TS|) (the two tail areas for TS)
      iii. Reject H0 if p-value < α

2.1.4. One-Tailed Tests

In many cases the null hypothesis is that the population mean is either at least (greater than or equal to) or at most (less than or equal to) some value. In these cases the sample statistic x̄ may contradict the null hypothesis in only one direction. For example, if H0 is μ ≥ 75, the test is of interest only if x̄ is less than 75. Only then does the sample statistic contradict the null hypothesis, and we want to test whether this is a significant contradiction.
If the sample mean turns out to be greater than 75, then it agrees with the null and, therefore, there is no need for the test.¹ This is why the significance of the deviation is measured relative to the margin of error in only one direction. Regarding the level of significance α, to maintain the same probability of rejecting a true null hypothesis as in a two-tailed test, the whole α must be placed in one tail. Thus, in the MOE formula we use z_α instead of z_{α/2}.

   MOE = z_α se(x̄)

Here the hypothesis test is said to be a one-tail test.

In the above example, we conducted a two-tail test, testing the null hypothesis H0: μ = 75 mph against the alternative H1: μ ≠ 75 mph. What if the claim were that the mean vehicle speed is 75 mph or more (at least 75 mph)? In this case we would be conducting a one-tail test, testing the null hypothesis H0: μ ≥ 75 mph against the alternative H1: μ < 75 mph.

2.1.4.1. Lower Tail Test

If the sample of 110 vehicles yields a mean which is less than 75 mph, this may be evidence that the population mean speed is less than 75 mph. The question is, however, is the evidence conclusive? Is the sample mean significantly less than 75? How far should the sample mean fall below the null mean, μ0 = 75, before we conclude that the evidence is significant?

To set up the test, the null hypothesis H0 must be that the mean is equal to or greater than 75 mph. To reject this hypothesis, the sample evidence must be significant. That is, the sample mean must be significantly less than 75 mph. Generally, for a lower-tail test, the null and alternative hypotheses are written as

The null and alternative hypotheses for a LOWER-TAIL TEST

   The null hypothesis:          H0: μ ≥ μ0
   The alternative hypothesis:   H1: μ < μ0

¹ If there is no evidence the "defendant" has committed the crime, then there would be no trial.
For this example, the null and alternative hypotheses are written as:

   H0: μ ≥ 75
   H1: μ < 75

This is a lower-tail test, as indicated by "<" (a strict inequality) in the alternative hypothesis.

Example 2

To perform a test, let us continue with the example of clocking a random sample of n = 110 vehicles. Suppose the sample yields x̄ = 73.9 mph and s = 8.82. Note that x̄ = 73.9 < μ0 = 75 implies that the evidence from the sample contradicts H0. The question is, is this significant evidence against the null hypothesis? Let us compute the MOE, determine the acceptance region for the test, and see where x̄ falls relative to the acceptance region.

   se(x̄) = 8.82/√110 = 0.841
   MOE = z_α se(x̄) = 1.64 × 0.841 = 1.38

The following diagram shows the acceptance region and where the x̄ value falls. Note that the acceptance region is now bounded only on the left tail.

   x̄_L = μ0 − MOE = 75 − 1.38 = 73.62

[Figure: sampling distribution under H0 centered at μ0 = 75, with the acceptance region bounded on the left at 73.62 and the sample mean x̄ = 73.90 inside it.]

The sample statistic x̄ = 73.90 falls inside the acceptance region bounded on the left by x̄_L = 73.62 mph. We do not reject the null hypothesis: the sample does not provide significant evidence that the population mean is less than 75 mph.

The following are the decision rules for a lower tail test.

Decision Rule (a) for a Lower Tail Test: Reject H0 if

   |x̄ − μ0| / se(x̄) > z_α,   that is,   |TS| ≡ |z| > CV ≡ z_α

For our example,

   TS = (x̄ − μ0) / se(x̄) = (73.9 − 75) / 0.841 = −1.31

To avoid the confusion arising from the negative sign of the test statistic, use the absolute value of TS to compare to the CV. Thus,

   |TS| = 1.31 < CV = z_{0.05} = 1.64

We do not reject the null hypothesis: the sample does not provide significant evidence that the population mean speed is less than 75 mph.

Decision Rule (b): Reject H0 if

   P(z < TS) < α,   that is,   p-value < level of significance

To find the p-value,

   P(z < −1.31) = 0.0951

Since this is a one-tail test, we do not double the tail area.
Thus,

   p-value = 0.0951 > α = 0.05

We do not reject the null hypothesis. In summary:

   i.   Find the critical value: CV = z_α = z_{0.05} = 1.64
   ii.  Find the test statistic: TS = (x̄ − μ0)/se(x̄) = −1.10/0.841 = −1.31
   iii. Reject H0 if TS < −CV, or |TS| > CV. Do not reject H0 since TS = −1.31 > −CV = −1.64, or |TS| = 1.31 < CV = 1.64.

2.1.4.2. Upper Tail Test

The upper-tail test applies when we want to test if the sample evidence is significantly greater than the value stated in the null hypothesis for the population mean.

Example 3

A random sample of n = 115 reimbursements for office visits to physicians paid by Medicare yielded a sample mean of x̄ = $104.9 and a standard deviation of s = $25.30. Does the sample provide significant evidence that the mean reimbursement is greater than $100? Perform the test of hypothesis at a 5% level of significance.

Since we want to determine if the sample mean x̄ = $104.9 is significantly greater than (>) $100, the null hypothesis should be μ ≤ $100. Therefore, we must write the null and alternative hypotheses as:

   H0: μ ≤ $100
   H1: μ > $100

Let us compute the MOE, determine the acceptance region for the test, and see where x̄ falls relative to the acceptance region.

   se(x̄) = 25.3/√115 = 2.36
   MOE = z_α se(x̄) = 1.64 × 2.36 = 3.87

The following diagram shows the acceptance region and where the x̄ value falls. Note that the acceptance region is now bounded only on the right tail.

   x̄_U = μ0 + MOE = 100 + 3.87 = 103.87

[Figure: sampling distribution under H0 centered at μ0 = 100, with the acceptance region bounded on the right at 103.87 and the sample mean x̄ = 104.90 beyond it.]

The sample mean x̄ = $104.9 falls outside the acceptance region. Hence, we reject the null hypothesis, H0: μ ≤ $100, and conclude that the mean reimbursement is greater than $100.

Now let's use the test statistic and p-value decision rules.
Decision Rule: Reject H0 if TS > CV

   TS = z = (x̄ − μ0) / se(x̄) = (104.9 − 100) / 2.36 = 2.08
   CV = z_α = z_{0.05} = 1.64

Since TS = 2.08 > CV = 1.64, reject H0.

Decision Rule: Reject H0 if p-value < α

   p-value = P(z > TS) = P(z > 2.08) = 0.0188

Reject H0 since p-value = 0.0188 < α = 0.05.

Remark: What would the conclusion be if α = 0.01?

   Since TS = 2.08 < CV = z_{0.01} = 2.33, do not reject H0.
   Since p-value = 0.0188 > α = 0.01, do not reject H0.
   MOE = z_α se(x̄) = 2.33(25.30/√115) = 5.50, so x̄ = 104.9 falls inside the acceptance region bounded on the right by 105.50.

Example 4

A light bulb manufacturer claims the mean life of its light bulbs is at least 1,000 hours. To perform a test of hypothesis at the 5 percent level of significance, a sample of n = 105 light bulbs yields an average life of 989.2 hours and a standard deviation of 56 hours. Should the manufacturer's claim be rejected? Use α = 0.05.

The problem is asking if we should reject the manufacturer's claim that the mean life is at least 1,000. "At least" means "no less than" or "greater than or equal to", the symbol for which is "≥". This is the null hypothesis symbol. The alternative is "<".

   H0: μ ≥ 1,000
   H1: μ < 1,000

   n = 105        s = 56        se(x̄) = 56/√105 = 5.465
   α = 0.05       z_α = z_{0.05} = 1.64

Decision Rule (a): Reject H0 if |TS| > CV

   i.   Find the critical value: CV = z_α = z_{0.05} = 1.64
   ii.  Find the test statistic: |TS| = |x̄ − μ0|/se(x̄) = |−10.8|/5.465 = 1.98
   iii. Reject H0 if |TS| > CV. Reject H0 since |TS| = 1.98 > CV = 1.64.

Decision Rule (b): Reject H0 if p-value < α

   i.   Find the test statistic: TS = (x̄ − μ0)/se(x̄) = −10.8/5.465 = −1.98
   ii.  Find the prob value P(z < TS): P(z < −1.98) = 0.0239
   iii. Reject H0 if prob value < α. Reject H0 since p-value = 0.0239 < α = 0.05.

Reject the manufacturer's claim that the mean life is at least 1,000 hours and conclude that it is less than 1,000 hours.

2.1.5. How to Set Up the Null and Alternative Hypotheses

The most important part of performing a hypothesis test is stating the correct null and alternative hypotheses. An incorrect statement of the hypotheses will invariably lead you to a wrong conclusion about the test. If you are confused about setting up the hypotheses, the following guidelines should help.

• Never put the equal sign in the alternative hypothesis. The following symbols should not appear in the alternative hypothesis: "=", "≤", "≥". These symbols belong to the null hypothesis. Depending on the nature of the test, the alternative hypothesis may contain any of the following: "≠", ">", "<".

• Following the above directions, after you state your null and alternative hypotheses, make certain that the sample evidence contradicts the null hypothesis (and agrees with the alternative). Remember, the reason we conduct a hypothesis test is to determine if the sample evidence is significant enough to reject the null. In a two tail test, the sample evidence will always differ from, or contradict, the null. So, there is no confusion. However, in a one tail test the reason we conduct the test is that there is evidence against the null, and we want to determine if the evidence is significant. For example, if in a problem you set your null and alternative hypotheses as, say,

      H0: μ ≥ $100
      H1: μ < $100

  and the sample evidence is x̄ = $110, then the sample evidence does not contradict the null. There is no evidence that the population mean is less than $100; there is no evidence to reject the null. This should be a warning that your statement of the hypotheses is incorrect. The correct statement should be

      H0: μ ≤ $100
      H1: μ > $100

  Now the sample evidence, x̄ = $110, contradicts the null. There is evidence the mean is greater than $100, but you want to determine if x̄ is significantly greater than $100 in order for you to reject the null.
• Generally, in any hypothesis test which involves challenging the status quo, the prevailing practice or belief, the challenger's viewpoint should be the alternative hypothesis. If you want to prove the prevailing practice or belief wrong, you have to provide significant proof, a proof which is "beyond a reasonable doubt". Consider the following examples.

   o The production team of a manufacturing company has designed a new production process which is supposed to lower the average production cost. To implement the new process, the production team must convince the management that the average cost is lower with the proposed process than with the current process. Suppose the current average cost is $10. The production team must provide significant evidence that the average cost under their proposed process is lower. Therefore, the null and alternative hypotheses must be stated as:

         H0: μ ≥ $10
         H1: μ < $10

     Note that the null hypothesis states that the average cost is "no less than" $10. The task of the production team is to show significant evidence to reject the null.

   o A pharmaceutical company has developed a new drug to treat a certain type of cancer. Suppose 60% of patients who take the existing drug experience remission. To prove that the new drug is more effective than the current treatment, the company must provide significant evidence to the medical community that the new drug is better, that the remission rate is higher. The null hypothesis, to be rejected, then must be that the new drug is no better:

         H0: π ≤ 60%
         H1: π > 60%

     An interesting point to keep in mind in this example is that the medical community would require a smaller level of significance for the test, say, α = 0.01, compared to the typical α = 0.05. This is to reduce the probability of a Type I error, to lower the likelihood of rejecting the "no better" hypothesis when it may be true.
• Another issue you should keep in mind in choosing the null and alternative hypotheses is: choose H0 such that, if it is true, wrongly rejecting it would be costly or dire.

In the problems that you deal with in this course, you should mainly be concerned about how the problem is stated. The problem may be stated as the null hypothesis or the alternative hypothesis. For example, if you are asked to test the hypothesis that the mean is "at least", say, $50, then you should recognize this as a null statement: H0: μ ≥ $50. The same problem may be stated as: test the hypothesis that the mean is "less than" $50. This is an alternative hypothesis statement: H1: μ < $50. Just be careful to use the appropriate symbol corresponding to the statement of the hypothesis. Then make sure that the equality sign in any form, "=", "≥", or "≤", does not appear in the alternative hypothesis statement.

3. Hypothesis Test for μ: Small Samples From Normal Populations

As with confidence intervals, when the standard deviation of the population is unknown and the sample size is small, the test of hypothesis uses the t distribution.

Example 5

A filling machine fills bottles with a target mean of 12 ounces of beer. To test whether the target mean is being achieved, a random sample of 20 bottles is selected with the following results (in ounces):

   12.06 11.86 11.84 11.98 12.00 11.96 11.83 11.95 12.03 11.82
   11.91 11.75 11.96 11.95 11.86 11.97 11.85 11.92 11.89 12.02

Perform the test at the 5% level of significance.

Achieving the target mean implies that our null hypothesis should be μ = 12, and we want to find out if the sample mean deviates significantly from the target mean.
   H0: μ = 12
   H1: μ ≠ 12

First we must compute the sample mean x̄ and the sample standard deviation s from the sample:

   x̄ = Σx / n = 12.032
   s = √( Σ(x − x̄)² / (n − 1) ) = 0.100

Find the standard error:   se(x̄) = 0.100/√20 = 0.022
Find the t score:          t_{α/2, n−1} = t_{0.025, 19} = 2.093

Note that the margin of error is now computed using the t distribution.

   MOE = t_{α/2, n−1} se(x̄) = 2.093 × 0.022 = 0.05

The acceptance region is

   L, U = μ0 ± MOE
   L = 12 − 0.05 = 11.95
   U = 12 + 0.05 = 12.05

[Figure: sampling distribution under H0 centered at μ0 = 12, with the acceptance region 11.95 ≤ x̄ ≤ 12.05 and the sample mean x̄ = 12.032 inside it.]

Decision Rule (a): Reject H0 if |TS| > CV

   i.   Find the critical value in terms of t: CV = t_{α/2, n−1} = t_{0.025, 19} = 2.093
   ii.  Find the test statistic: TS = (x̄ − μ0)/se(x̄) = 1.431
   iii. Reject H0 if |TS| > CV. Do not reject H0 since 1.431 < 2.093.

Decision Rule (b): Reject H0 if probability value < α

   i.   Find the test statistic: TS = (x̄ − μ0)/se(x̄) = 1.431
   ii.  Find 2 × P(t > TS): 2 × P(t > 1.431) = 0.1687 ² (see the footnote and the note below)
   iii. Reject H0 if p-value < α. Do not reject H0 since 0.1687 > 0.05.

IMPORTANT NOTE

Note that to compute P(t > 1.43) you must use a computer program that finds the tail area under the t curve for a given t score and degrees of freedom. There are no tables to determine such areas (probabilities). However, you can bracket this probability approximately using the t table as shown below (column headings are upper tail areas):

   df     0.100   0.050   0.025   0.010   0.005
   16     1.337   1.746   2.120   2.583   2.921
   17     1.333   1.740   2.110   2.567   2.898
   18     1.330   1.734   2.101   2.552   2.878
   19     1.328   1.729   2.093   2.539   2.861
   20     1.325   1.725   2.086   2.528   2.845
   21     1.323   1.721   2.080   2.518   2.831
   22     1.321   1.717   2.074   2.508   2.819
   23     1.319   1.714   2.069   2.500   2.807

You can still correctly judge, from a given t value, whether its tail area is greater or less than the level of significance α (for a one-tail test) or α/2 (for a two-tail test). In the last example t = 1.431.
Given df = 19, the t score increases as the tail area in the top row decreases (as we move to the right in the table). In the above example, t = 1.431 lies between 1.328 and 1.729, the t scores for df = 19 associated with tail areas of 0.100 and 0.050. This means that the tail area associated with a t score of 1.431 must be between 0.050 and 0.100, which is greater than α/2 = 0.025. So the combined area of the two tails is definitely greater than the level of significance α = 0.05. Therefore, we do not reject the null hypothesis.

² Here the Excel command =T.DIST.2T(x, deg_freedom) is used, where =T.DIST.2T(1.431,19) = 0.1687.

Example 6
A light bulb manufacturer claims the average life of its light bulbs is at least 1,000 hours. To perform a test of hypothesis at the 5 percent level of significance, a sample of 25 light bulbs yields an average life of 992.6 hours with a sample standard deviation s = 49.3 hours. Should the manufacturer's claim be rejected?

H₀: μ ≥ 1,000
H₁: μ < 1,000

Note that this is a lower tail test: the alternative hypothesis is μ < 1,000, and accordingly x̄ − μ₀ = 992.6 − 1000 = −7.40 < 0.

n = 25    x̄ = 992.6    s = 49.3    α = 0.05
se(x̄) = 49.3 / √25 = 9.86

Find the t score: CV = t_{α, n−1} = t_{0.05, 24} = 1.711. [Here you must use t_{α, n−1}, rather than t_{α/2, n−1}, because you are performing a one-tail test.]

Decision Rule (b): Reject H₀ if |TS| > CV
  i.   Find the critical value in terms of t:  CV = t_{α, n−1} = t_{0.05, 24} = 1.711
  ii.  Find the test statistic:  |TS| = |x̄ − μ₀| / se(x̄) = 7.40 / 9.86 = 0.751
  iii. Reject H₀ if |TS| > CV:  Do not reject H₀ since 0.751 < 1.711.

Decision Rule (c): Reject H₀ if probability value < α
  i.   Find the test statistic:  TS = (x̄ − μ₀) / se(x̄) = −0.751
  ii.  Find P(t < TS):  P(t < −0.751) = 0.2300 (see the note below)
  iii. Reject H₀ if p-value < α:  Do not reject H₀ since 0.2300 > 0.05.

NOTE: By the symmetry of the t curve, P(t < −0.751) = P(t > 0.751). Using Excel, =T.DIST.RT(0.751,24) = 0.2300.
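The tail areas quoted from Excel above can also be checked without a spreadsheet. The following is a minimal Python sketch and is not part of the original notes: the helper names t_pdf and t_upper_tail are illustrative, and the tail area is approximated by Simpson's rule applied to the t density rather than taken from a statistics library. It reruns the lower tail test of Example 6 and the two-tail p-value of Example 5 from the summary statistics given in the text.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """Approximate P(t > x) with Simpson's rule (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3      # area under the curve between 0 and |x|
    upper = 0.5 - area_0_to_a        # area to the right of |x|
    return upper if x >= 0 else 1.0 - upper

# Example 6: H0: mu >= 1000 versus H1: mu < 1000 (lower tail test)
n, xbar, s, mu0, alpha = 25, 992.6, 49.3, 1000.0, 0.05
se = s / math.sqrt(n)                # standard error of the sample mean
ts = (xbar - mu0) / se               # test statistic (negative here)
p_lower = t_upper_tail(-ts, n - 1)   # P(t < TS) = P(t > -TS) by symmetry
reject_ex6 = p_lower < alpha

# Example 5: two-tail p-value for TS = 1.431 with 19 degrees of freedom
p_two_tail = 2 * t_upper_tail(1.431, 19)

print(f"Example 6: se = {se:.2f}, TS = {ts:.3f}, p-value = {p_lower:.4f}, reject H0: {reject_ex6}")
print(f"Example 5: two-tail p-value = {p_two_tail:.4f}")
```

Both p-values should agree with the Excel results quoted in the text to about three decimal places, and in neither case is the p-value below α = 0.05.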
If a computer is not available, you can use the t table to determine whether the p-value is greater than or less than the level of significance:

         Tail area
df     0.100   0.050   0.025   0.010   0.005
23     1.319   1.714   2.069   2.500   2.807
24     1.318   1.711   2.064   2.492   2.797
25     1.316   1.708   2.060   2.485   2.787

Note that |t| = 0.751 is less than 1.318, the smallest of the shown t scores for df = 24, which is associated with a tail area of 0.10, the largest of the shown tail areas. Thus, |t| = 0.751 must be associated with a tail area larger than 0.10, which in turn exceeds α = 0.05.

4. Test of Hypothesis on Population Proportion π

The hypothesis test about the population proportion follows a pattern similar to that for the population mean. You compare the sample proportion p̂ to the value π₀ stated in the null hypothesis. If the difference between p̂ and π₀ exceeds MOE, then this difference is statistically significant and you reject the null hypothesis.

Example 7
To test the hypothesis that the proportion of all Hoosier adults in the labor force with a 4-year college degree is 26 percent, a sample of 600 Hoosier adults in the labor force is selected. The sample proportion is 27.3 percent. Test the hypothesis at the 5 percent level of significance.

H₀: π = 0.26
H₁: π ≠ 0.26

This is a two-tail test, because we are testing the hypothesis that the population proportion equals 26 percent against the alternative that it differs from 26 percent.

n = 600    p̂ = 0.273    α = 0.05    z_{α/2} = 1.96

To perform the test you need to determine the standard error of p̂. Use the following formula:

se(p̂) = √( π₀(1 − π₀) / n )

Note that to find se(p̂), unlike the standard error in the confidence interval problems, you use π₀ in the formula instead of the sample proportion p̂. This is logical because we are presuming the population proportion is the value specified in the null hypothesis.
se(p̂) = √( 0.26(1 − 0.26) / 600 ) = 0.0179

The margin of error and the acceptance region for the test are determined as follows:

MOE = z_{α/2} × se(p̂) = 1.96 × 0.0179 = 0.035
L, U = π₀ ± MOE
L = 0.26 − 0.035 = 0.225
U = 0.26 + 0.035 = 0.295

[Figure: acceptance region 0.225 ≤ p̂ ≤ 0.295 centered at π₀ = 0.26, with the sample proportion p̂ = 0.273 marked.]

The sample proportion p̂ = 0.273 falls within the acceptance region.

Decision Rule (a): Reject H₀ if TS > CV
  i.   Find the critical value:  CV = z_{α/2} = z_{0.025} = 1.96
  ii.  Find the test statistic TS:  TS = (p̂ − π₀) / se(p̂) = (0.273 − 0.26) / 0.0179 = 0.73
  iii. Reject H₀ if TS > CV:  Do not reject H₀ since 0.73 < 1.96.

Decision Rule (b): Reject H₀ if probability value < α
  i.   Find TS:  TS = 0.73
  ii.  Find the p-value, 2 × P(z > TS):  2 × P(z > 0.73) = 2 × 0.2327 = 0.4654
  iii. Reject H₀ if p-value < α:  Do not reject H₀ since 0.4654 > 0.05.

The test indicates that we should not reject the null hypothesis H₀: π = 0.26. The sample does not provide significant evidence that the proportion of all Hoosier adults in the labor force with a 4-year college degree differs from 26 percent.

Example 8
A pest control company claims that no more than 15% of its customers need repeated treatment after a 90-day warranty period. To test the validity of this claim, a consumer organization selected a sample of 300 customers and found that 57 needed repeated treatment after the 90-day warranty period. Is there evidence, at the 5% level of significance, that the claim is not valid?

Here the claim is "no more than" 15%. The symbol for "no more than" or "at most" is ≤. This symbol must be stated in the null hypothesis. The alternative is then "greater than" 15%, which is shown as > 15%. This makes the test an upper tail test.

H₀: π ≤ 0.15
H₁: π > 0.15

n = 300    p̂ = 57 / 300 = 0.19    α = 0.05    z_α = 1.64

se(p̂) = √( 0.15(1 − 0.15) / 300 ) = 0.0206

Compute MOE. Since this is a one-tail test, you must use z_α, rather than z_{α/2}, to obtain MOE.
MOE = z_α × se(p̂) = 1.64 × 0.0206 = 0.034

For the acceptance region:

U = π₀ + MOE = 0.15 + 0.034 = 0.184

[Figure: upper tail acceptance region, p̂ ≤ 0.184, with π₀ = 0.15 and the sample proportion p̂ = 0.19 marked.]

The sample statistic p̂ = 0.19 exceeds 0.184 and therefore falls outside the acceptance region. Reject H₀.

Decision Rule (a): Reject H₀ if TS > CV
  i.   Find the critical value:  CV = z_α = z_{0.05} = 1.64
  ii.  Find the test statistic:  TS = (p̂ − π₀) / se(p̂) = (0.19 − 0.15) / 0.0206 = 1.94
  iii. Reject H₀ if TS > CV:  Reject H₀ since TS = 1.94 > CV = 1.64.

Decision Rule (b): Reject H₀ if probability value < α
  i.   Find the test statistic:  TS = (p̂ − π₀) / se(p̂) = 1.94
  ii.  Find the p-value:  P(z > 1.94) = 0.0262
  iii. Reject H₀ if p-value < α:  Reject H₀ since p-value = 0.0262 < α = 0.05.

Both methods indicate that the null hypothesis H₀: π ≤ 0.15 should be rejected. The test does not support the company's claim that no more than 15% of its customers need repeated treatment after a 90-day warranty period.

Example 9
To test the hypothesis that less than 40% of drivers on a certain highway obey the legal speed limit, a random sample of 700 vehicles is clocked secretly, and 252 of them observed the legal speed limit. Is there significant evidence that less than 40% of drivers observe the legal speed limit? Perform the test at the 5 percent level of significance.

Since the hypothesis to be tested is "less than" 40 percent (π < 0.40), this is a lower tail test:

H₀: π ≥ 0.40
H₁: π < 0.40

Compute the sample proportion:

p̂ = x / n = 252 / 700 = 0.36

Since this is a lower tail test, the deviation of the sample proportion from the null value for the proportion should be a negative value.
p̂ − π₀ = 0.36 − 0.40 = −0.04

se(p̂) = √( π₀(1 − π₀) / n ) = √( 0.40(1 − 0.40) / 700 ) = 0.0185

MOE = 1.64 × 0.0185 = 0.03

For the acceptance region:

L = π₀ − MOE = 0.40 − 0.03 = 0.37

[Figure: lower tail acceptance region with lower bound 0.37 and π₀ = 0.40; the sample proportion p̂ = 0.36 is marked.]

The test statistic is

TS = (p̂ − π₀) / se(p̂)
|TS| = |0.36 − 0.40| / 0.0185 = 2.16

At α = 0.05, the critical value is CV = z_{0.05} = 1.64.

Decision rule: reject H₀ if |TS| > CV. Since |TS| = 2.16 > CV = 1.64, reject H₀. Conclude that less than 40 percent of drivers observe the legal speed limit.

The p-value for the test is P(z < −2.16) = 0.0154. Since p-value = 0.0154 < α = 0.05, reject H₀.
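The three proportion tests in this section follow the same arithmetic, so they can be collected in one short routine. The sketch below is an illustration in Python and is not part of the original notes: the function name proportion_z_test and its tail argument are hypothetical, and the standard normal areas come from NormalDist in Python's standard-library statistics module. Because it carries full precision instead of rounding TS to two decimals first, its p-values may differ from those in the text in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(p_hat, n, pi0, tail):
    """z test for a population proportion.

    tail is "two", "upper" or "lower", matching the alternative hypothesis.
    Returns (standard error, test statistic, p-value).
    """
    se = sqrt(pi0 * (1 - pi0) / n)   # standard error uses pi0, not p_hat
    ts = (p_hat - pi0) / se
    z = NormalDist()                 # standard normal distribution
    if tail == "two":
        p_value = 2 * (1 - z.cdf(abs(ts)))
    elif tail == "upper":
        p_value = 1 - z.cdf(ts)
    elif tail == "lower":
        p_value = z.cdf(ts)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return se, ts, p_value

alpha = 0.05

# Example 7: H0: pi = 0.26, two-tail test
se7, ts7, p7 = proportion_z_test(0.273, 600, 0.26, "two")

# Example 8: H0: pi <= 0.15, upper tail test
se8, ts8, p8 = proportion_z_test(57 / 300, 300, 0.15, "upper")

# Example 9: H0: pi >= 0.40, lower tail test
se9, ts9, p9 = proportion_z_test(252 / 700, 700, 0.40, "lower")

for label, ts, p in [("Example 7", ts7, p7), ("Example 8", ts8, p8), ("Example 9", ts9, p9)]:
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{label}: TS = {ts:.2f}, p-value = {p:.4f}, {decision}")
```

The decisions should match the worked examples: do not reject H₀ in Example 7, and reject H₀ in Examples 8 and 9.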