Hypotheses and Test Procedures

A statistical hypothesis, or just hypothesis, is a claim or assertion either about the value of a single parameter (population characteristic or characteristic of a probability distribution), about the values of several parameters, or about the form of an entire probability distribution. One example of a hypothesis is the claim μ = .75, where μ is the true average inside diameter of a certain type of PVC pipe. Another example is the statement p < .10, where p is the proportion of defective circuit boards among all circuit boards produced by a certain manufacturer.

If μ1 and μ2 denote the true average breaking strengths of two different types of twine, one hypothesis is the assertion that μ1 – μ2 = 0, and another is the statement μ1 – μ2 > 5. Yet another example of a hypothesis is the assertion that the stopping distance under particular conditions has a normal distribution.

In any hypothesis-testing problem, there are two contradictory hypotheses under consideration. One hypothesis might be the claim μ = .75 and the other μ ≠ .75, or the two contradictory statements might be p ≥ .10 and p < .10.

In a criminal trial, the claim of innocence is the favored or protected hypothesis, and the burden of proof is placed on those who believe in the alternative claim. Similarly, in testing statistical hypotheses, the problem will be formulated so that one of the claims is initially favored. This initially favored claim will not be rejected in favor of the alternative claim unless sample evidence contradicts it and provides strong support for the alternative assertion.

Definition
The null hypothesis, denoted by H0, is the claim that is initially assumed to be true (the “prior belief” claim).
The alternative hypothesis, denoted by Ha, is the assertion that is contradictory to H0.

The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence suggests that H0 is false. If the sample does not strongly contradict H0, we will continue to believe in the plausibility of the null hypothesis. The two possible conclusions from a hypothesis-testing analysis are then reject H0 or fail to reject H0.

A test of hypotheses is a method for using sample data to decide whether the null hypothesis should be rejected. Thus we might test H0: μ = .75 against the alternative Ha: μ ≠ .75. Only if sample data strongly suggest that μ is something other than .75 should the null hypothesis be rejected. In the absence of such evidence, H0 should not be rejected, since it is still quite plausible.

The objective is to decide, based on sample information, which of the two hypotheses is correct. There is a familiar analogy to this in a criminal trial. One claim is the assertion that the accused individual is innocent. In the U.S. judicial system, this is the claim that is initially believed to be true. Only in the face of strong evidence to the contrary should the jury reject this claim in favor of the alternative assertion that the accused is guilty.

Sometimes an investigator does not want to accept a particular assertion unless and until data can provide strong support for the assertion. As an example, suppose a company is considering putting a new type of coating on bearings that it produces. The true average wear life with the current coating is known to be 1000 hours. With μ denoting the true average life for the new coating, the company would not want to make a change unless evidence strongly suggested that μ exceeds 1000.

An appropriate problem formulation would involve testing H0: μ = 1000 against Ha: μ > 1000.
The conclusion that a change is justified is identified with Ha, and it would take conclusive evidence to justify rejecting H0 and switching to the new coating.

Scientific research often involves trying to decide whether a current theory should be replaced by a more plausible and satisfactory explanation of the phenomenon under investigation. A conservative approach is to identify the current theory with H0 and the researcher’s alternative explanation with Ha. Rejection of the current theory will then occur only when evidence is much more consistent with the new theory. In many situations, Ha is referred to as the “researcher’s hypothesis,” since it is the claim that the researcher would really like to validate.

The word null means “of no value, effect, or consequence,” which suggests that H0 should be identified with the hypothesis of no change (from current opinion), no difference, no improvement, and so on. Suppose, for example, that 10% of all circuit boards produced by a certain manufacturer during a recent period were defective. An engineer has suggested a change in the production process in the belief that it will result in a reduced defective rate. Let p denote the true proportion of defective boards resulting from the changed process. Then the research hypothesis, on which the burden of proof is placed, is the assertion that p < .10. Thus the alternative hypothesis is Ha: p < .10.

In our treatment of hypothesis testing, H0 will generally be stated as an equality claim. If θ denotes the parameter of interest, the null hypothesis will have the form H0: θ = θ0, where θ0 is a specified number called the null value of the parameter (the value claimed for θ by the null hypothesis).

The alternative to the null hypothesis H0: θ = θ0 will look like one of the following three assertions: 1.
Ha: θ > θ0 (in which case the implicit null hypothesis is θ ≤ θ0), 2. Ha: θ < θ0 (in which case the implicit null hypothesis is θ ≥ θ0), or 3. Ha: θ ≠ θ0.

Test Procedures

A test procedure is specified by the following:
1. A test statistic, a function of the sample data on which the decision (reject H0 or do not reject H0) is to be based
2. A rejection region, the set of all test statistic values for which H0 will be rejected
The null hypothesis will then be rejected if and only if the observed or computed test statistic value falls in the rejection region.

Errors in Hypothesis Testing

Definition
A type I error consists of rejecting the null hypothesis H0 when it is true. A type II error involves not rejecting H0 when H0 is false.

In the nicotine scenario, a type I error consists of rejecting the manufacturer’s claim that μ = 1.5 when it is actually true. If the rejection region x̄ ≥ 1.6 is employed, it might happen that x̄ = 1.63 even when μ = 1.5, resulting in a type I error. Alternatively, it may be that H0 is false and yet x̄ = 1.52 is observed, leading to H0 not being rejected (a type II error).

In the best of all possible worlds, test procedures for which neither type of error is possible could be developed. However, this ideal can be achieved only by basing a decision on an examination of the entire population. The difficulty with using a procedure based on sample data is that, because of sampling variability, an unrepresentative sample may result, e.g., a value of X̄ that is far from μ or a value of p̂ that differs considerably from p.

Instead of demanding error-free procedures, we must seek procedures for which either type of error is unlikely to occur. That is, a good procedure is one for which the probability of making either type of error is small. The choice of a particular rejection region cutoff value fixes the probabilities of type I and type II errors.
These error probabilities are traditionally denoted by α and β, respectively. Because H0 specifies a unique value of the parameter, there is a single value of α. However, there is a different value of β for each value of the parameter consistent with Ha.

Proposition
Suppose an experiment and a sample size are fixed and a test statistic is chosen. Then decreasing the size of the rejection region to obtain a smaller value of α results in a larger value of β for any particular parameter value consistent with Ha.

This proposition says that once the test statistic and n are fixed, there is no rejection region that will simultaneously make both α and all β’s small. A region must be chosen to effect a compromise between α and β. Because of the suggested guidelines for specifying H0 and Ha, a type I error is usually more serious than a type II error (this can always be achieved by proper choice of the hypotheses).

The approach adhered to by most statistical practitioners is then to specify the largest value of α that can be tolerated and find a rejection region having that value of α rather than anything smaller. This makes β as small as possible subject to the bound on α. The resulting value of α is often referred to as the significance level of the test. Traditional levels of significance are .10, .05, and .01, though the level in any particular problem will depend on the seriousness of a type I error: the more serious this error, the smaller should be the significance level.

The corresponding test procedure is called a level α test (e.g., a level .05 test or a level .01 test). A test with significance level α is one for which the type I error probability is controlled at the specified level.
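The trade-off described in the proposition can be checked numerically. The sketch below assumes a normally distributed sample mean with known standard deviation and the rule "reject H0 when x̄ is at least some cutoff". The function name `error_probabilities` and all numerical values (null value, alternative value, σ, n, cutoffs) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(cutoff, mu0, mu_alt, sigma, n):
    """Return (alpha, beta) for the rule 'reject H0 when the sample mean >= cutoff'.

    Assumes the sample mean is normal with standard deviation sigma / sqrt(n).
    """
    se = sigma / sqrt(n)
    # alpha: chance the sample mean lands in the rejection region when mu = mu0
    alpha = 1 - NormalDist(mu0, se).cdf(cutoff)
    # beta: chance the sample mean misses the rejection region when mu = mu_alt
    beta = NormalDist(mu_alt, se).cdf(cutoff)
    return alpha, beta


# Hypothetical setting: H0: mu = 1.5, alternative value 1.7, sigma = 0.2, n = 16
for cutoff in (1.55, 1.60, 1.65):
    alpha, beta = error_probabilities(cutoff, mu0=1.5, mu_alt=1.7, sigma=0.2, n=16)
    print(f"cutoff={cutoff:.2f}  alpha={alpha:.4f}  beta={beta:.4f}")
```

Raising the cutoff shrinks the rejection region, so the printed α values fall while the β values rise, which is the compromise the proposition describes.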
Case I: A Normal Population with Known σ

Although the assumption that the value of σ is known is rarely met in practice, this case provides a good starting point because of the ease with which general procedures and their properties can be developed. The null hypothesis in all three cases will state that μ has a particular numerical value, the null value, which we will denote by μ0. Let X1,…, Xn represent a random sample of size n from the normal population.

Then the sample mean X̄ has a normal distribution with expected value μ and standard deviation σ/√n. When H0 is true, the expected value of X̄ is μ0. Consider now the statistic Z obtained by standardizing X̄ under the assumption that H0 is true:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

Null hypothesis: H0: μ = μ0
Test statistic value:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

Alternative Hypothesis    Rejection Region for Level α Test
Ha: μ > μ0                z ≥ z_α (upper-tailed)
Ha: μ < μ0                z ≤ –z_α (lower-tailed)
Ha: μ ≠ μ0                either z ≥ z_{α/2} or z ≤ –z_{α/2} (two-tailed)

Use of the following sequence of steps is recommended when testing hypotheses about a parameter.
1. Identify the parameter of interest and describe it in the context of the problem situation.
2. Determine the null value and state the null hypothesis.
3. State the appropriate alternative hypothesis.
4. Give the formula for the computed value of the test statistic (substituting the null value and the known values of any other parameters, but not those of any sample-based quantities).
5. State the rejection region for the selected significance level α.
6. Compute any necessary sample quantities, substitute into the formula for the test statistic value, and compute that value.
7. Decide whether H0 should be rejected, and state this conclusion in the problem context.
The formulation of hypotheses (Steps 2 and 3) should be done before examining the data.
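Steps 4 through 7 can be sketched in a few lines of Python. This is only an illustration of the case I z test under its stated assumptions (normal population, known σ). The function name `z_test`, the labels used for the alternative, and the sample figures are hypothetical and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha, alternative):
    """One-sample z test with known sigma; returns (z, reject_H0)."""
    # Step 4/6: test statistic value z = (xbar - mu0) / (sigma / sqrt(n))
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 5: rejection region for the chosen alternative
    if alternative == "greater":        # Ha: mu > mu0, reject if z >= z_alpha
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # Ha: mu < mu0, reject if z <= -z_alpha
        reject = z <= -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":    # Ha: mu != mu0, reject if |z| >= z_{alpha/2}
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = z >= z_crit or z <= -z_crit
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    # Step 7: the caller states the conclusion in the problem context
    return z, reject


# Hypothetical sample summary: n = 9, xbar = 131.08, known sigma = 1.5, H0: mu = 130
z, reject = z_test(xbar=131.08, mu0=130, sigma=1.5, n=9, alpha=0.01, alternative="two-sided")
print(f"z = {z:.2f}, reject H0: {reject}")
```

The alternative is passed in explicitly because, as noted above, the hypotheses should be fixed before the data are examined.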
β and Sample Size Determination

The z tests for case I are among the few in statistics for which there are simple formulas available for β, the probability of a type II error. Consider first the upper-tailed test with rejection region z ≥ z_α. This is equivalent to x̄ ≥ μ0 + z_α · σ/√n, so H0 will not be rejected if x̄ < μ0 + z_α · σ/√n.

Now let μ′ denote a particular value of μ that exceeds the null value μ0. Then,

$$
\begin{aligned}
\beta(\mu') &= P(H_0 \text{ is not rejected when } \mu = \mu') \\
&= P\left(\bar{X} < \mu_0 + z_\alpha \cdot \sigma/\sqrt{n} \text{ when } \mu = \mu'\right) \\
&= P\left(\frac{\bar{X} - \mu'}{\sigma/\sqrt{n}} < z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}} \text{ when } \mu = \mu'\right) \\
&= \Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
\end{aligned}
$$

As μ′ increases, μ0 – μ′ becomes more negative, so β(μ′) will be small when μ′ greatly exceeds μ0 (because the value at which Φ is evaluated will then be quite negative). Error probabilities for the lower-tailed and two-tailed tests are derived in an analogous manner.

If σ is large, the probability of a type II error can be large at an alternative value μ′ that is of particular concern to an investigator. Suppose we fix α and also specify β for such an alternative value. In the sprinkler example, company officials might view μ′ = 132 as a very substantial departure from H0: μ = 130 and therefore wish β(132) = .10 in addition to α = .01. More generally, consider the two restrictions P(type I error) = α and β(μ′) = β for specified α, μ′, and β.

Then for an upper-tailed test, the sample size n should be chosen to satisfy

$$\Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) = \beta$$

This implies that

$$-z_\beta = z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}$$

where –z_β is the z critical value that captures lower-tail area β. It is easy to solve this equation for the desired n. A parallel argument yields the necessary sample size for lower- and two-tailed tests as summarized in the next box.

Alternative Hypothesis    Type II Error Probability β(μ′) for a Level α Test
Ha: μ > μ0                Φ(z_α + (μ0 – μ′)/(σ/√n))
Ha: μ < μ0                1 – Φ(–z_α + (μ0 – μ′)/(σ/√n))
Ha: μ ≠ μ0                Φ(z_{α/2} + (μ0 – μ′)/(σ/√n)) – Φ(–z_{α/2} + (μ0 – μ′)/(σ/√n))

where Φ(z) = the standard normal cdf.
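As a rough numerical companion to the upper-tailed derivation, the sketch below evaluates β(μ′) = Φ(z_α + (μ0 – μ′)/(σ/√n)) and solves –z_β = z_α + (μ0 – μ′)/(σ/√n) for n, rounding up to a whole number. The function names are hypothetical. The values μ0 = 130, μ′ = 132, α = .01, and β = .10 echo the sprinkler example, but σ = 1.5 is an assumed value used only for illustration, since σ is not stated in this section.

```python
from math import ceil, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def beta_upper_tailed(mu_alt, mu0, sigma, n, alpha):
    """Type II error probability of the level-alpha upper-tailed z test at mu = mu_alt."""
    z_alpha = PHI.inv_cdf(1 - alpha)
    return PHI.cdf(z_alpha + (mu0 - mu_alt) / (sigma / sqrt(n)))


def sample_size_one_tailed(mu_alt, mu0, sigma, alpha, beta):
    """Smallest integer n satisfying -z_beta >= z_alpha + (mu0 - mu_alt)/(sigma/sqrt(n))."""
    z_alpha = PHI.inv_cdf(1 - alpha)
    z_beta = PHI.inv_cdf(1 - beta)
    n_exact = (sigma * (z_alpha + z_beta) / (mu0 - mu_alt)) ** 2
    return ceil(n_exact)


# sigma = 1.5 is an assumption for illustration only
n = sample_size_one_tailed(mu_alt=132, mu0=130, sigma=1.5, alpha=0.01, beta=0.10)
print("required n:", n)
print("beta at that n:", round(beta_upper_tailed(132, 130, 1.5, n, 0.01), 4))
```

Because n must be an integer, the achieved β is usually a little below the requested value rather than exactly equal to it.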
The sample size n for which a level α test also has β(μ′) = β at the alternative value μ′ is

$$
n = \begin{cases}
\left[\dfrac{\sigma(z_\alpha + z_\beta)}{\mu_0 - \mu'}\right]^2 & \text{for a one-tailed (upper or lower) test} \\[2ex]
\left[\dfrac{\sigma(z_{\alpha/2} + z_\beta)}{\mu_0 - \mu'}\right]^2 & \text{for a two-tailed test (an approximate solution)}
\end{cases}
$$

Case II: Large-Sample Tests

When the sample size is large, the z tests for case I are easily modified to yield valid test procedures without requiring either a normal population distribution or known σ. Earlier we used the key result to justify large-sample confidence intervals: A large n implies that the standardized variable

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has approximately a standard normal distribution. Substituting the null value μ0 for μ gives the test statistic Z = (X̄ – μ0)/(S/√n), which has approximately a standard normal distribution when H0 is true.

The use of rejection regions given previously for case I (e.g., z ≥ z_α when the alternative hypothesis is Ha: μ > μ0) then results in test procedures for which the significance level is approximately (rather than exactly) α. The rule of thumb n > 40 will again be used to characterize a large sample size.

Large-Sample Tests

Large-sample tests concerning p are a special case of the more general large-sample procedures for a parameter θ. Let θ̂ be an estimator of θ that is (at least approximately) unbiased and has approximately a normal distribution. The null hypothesis has the form H0: θ = θ0, where θ0 denotes a number (the null value) appropriate to the problem context.

The estimator p̂ = X/n is unbiased (E(p̂) = p), has approximately a normal distribution, and its standard deviation is σ_p̂ = √(p(1 – p)/n). When H0 is true, E(p̂) = p0 and σ_p̂ = √(p0(1 – p0)/n), so σ_p̂ does not involve any unknown parameters. It then follows that when n is large and H0 is true, the test statistic

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

has approximately a standard normal distribution.

If the alternative hypothesis is Ha: p > p0 and the upper-tailed rejection region z ≥ z_α is used, then
P(type I error) = P(H0 is rejected when it is true)
= P(Z ≥ z_α when Z has approximately a standard normal distribution) ≈ α
Thus the desired level of significance α is attained by using the critical value that captures area α in the upper tail of the z curve.
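A minimal sketch of the upper-tailed large-sample test for a proportion follows, assuming X successes in n trials and the statistic z = (p̂ – p0)/√(p0(1 – p0)/n). The function name `proportion_z_upper` and the counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_upper(x, n, p0, alpha):
    """Upper-tailed large-sample z test of H0: p = p0 against Ha: p > p0.

    Returns (z, reject_H0). The null standard deviation uses p0, not p_hat.
    """
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # captures upper-tail area alpha
    return z, z >= z_alpha


# Hypothetical data: 16 successes in 100 trials, H0: p = .10, alpha = .05
z, reject = proportion_z_upper(x=16, n=100, p0=0.10, alpha=0.05)
print(f"z = {z:.2f}, reject H0: {reject}")
```

The denominator is computed from p0 because the statistic is standardized under the assumption that H0 is true, which is why it involves no unknown parameters.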
Rejection regions for the other two alternative hypotheses, lower-tailed for Ha: p < p0 and two-tailed for Ha: p ≠ p0, are justified in an analogous manner.

Null hypothesis: H0: p = p0
Test statistic value:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

Alternative Hypothesis    Rejection Region
Ha: p > p0                z ≥ z_α (upper-tailed)
Ha: p < p0                z ≤ –z_α (lower-tailed)
Ha: p ≠ p0                either z ≥ z_{α/2} or z ≤ –z_{α/2} (two-tailed)

These test procedures are valid provided that np0 ≥ 10 and n(1 – p0) ≥ 10.

Small-Sample Tests

Test procedures when the sample size n is small are based directly on the binomial distribution rather than the normal approximation. Consider the alternative hypothesis Ha: p > p0 and again let X be the number of successes in the sample. Then X is the test statistic, and the upper-tailed rejection region has the form x ≥ c. When H0 is true, X has a binomial distribution with parameters n and p0, so
P(type I error) = P(H0 is rejected when it is true)
= P(X ≥ c when X ~ Bin(n, p0))
= 1 – P(X ≤ c – 1 when X ~ Bin(n, p0))
= 1 – B(c – 1; n, p0)

As the critical value c decreases, more x values are included in the rejection region and P(type I error) increases. Because X has a discrete probability distribution, it is usually not possible to find a value of c for which P(type I error) is exactly the desired significance level α (e.g., .05 or .01). Instead, the largest rejection region of the form {c, c + 1, … , n} satisfying 1 – B(c – 1; n, p0) ≤ α is used.

Let p′ denote an alternative value of p (p′ > p0). When p = p′, X ~ Bin(n, p′), so
β(p′) = P(type II error when p = p′)
= P(X < c when X ~ Bin(n, p′))
= B(c – 1; n, p′)

That is, β(p′) is the result of a straightforward binomial probability calculation. The sample size n necessary to ensure that a level α test also has specified β at a particular alternative value p′ must be determined by trial and error using the binomial cdf.
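The upper-tailed binomial calculations above can be sketched with exact arithmetic on the binomial pmf. The helper names (`binom_cdf`, `upper_tailed_critical_value`, `beta_upper_tailed`) and the values n = 20, p0 = .5, p′ = .8, α = .05 are hypothetical, chosen only to show the mechanics.

```python
from math import comb


def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(min(k, n) + 1))


def upper_tailed_critical_value(n, p0, alpha):
    """Smallest c with 1 - B(c - 1; n, p0) <= alpha, i.e. the largest region {c, ..., n}."""
    for c in range(n + 1):
        if 1 - binom_cdf(c - 1, n, p0) <= alpha:
            return c
    return n + 1  # no nonempty region meets the bound; H0 is never rejected


def beta_upper_tailed(c, n, p_alt):
    """beta(p') = P(X < c when X ~ Bin(n, p')) = B(c - 1; n, p')."""
    return binom_cdf(c - 1, n, p_alt)


# Hypothetical setting: n = 20, H0: p = .5, Ha: p > .5, alpha = .05
c = upper_tailed_critical_value(20, 0.5, 0.05)
print("critical value c:", c)
print("actual type I error probability:", round(1 - binom_cdf(c - 1, 20, 0.5), 4))
print("beta at p' = .8:", round(beta_upper_tailed(c, 20, 0.8), 4))
```

The trial-and-error search for n mentioned above amounts to repeating these two calculations for increasing n until β(p′) drops to the specified value.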
Test procedures for Ha: p < p0 and for Ha: p ≠ p0 are constructed in a similar manner. In the former case, the appropriate rejection region has the form x ≤ c (a lower-tailed test). The critical value c is the largest number satisfying B(c; n, p0) ≤ α. The rejection region when the alternative hypothesis is Ha: p ≠ p0 consists of both large and small x values.

Case III: A Normal Population Distribution

When n is small, the Central Limit Theorem (CLT) can no longer be invoked to justify the use of a large-sample test. Our approach here will be the same one used for small-sample confidence intervals: We will assume that the population distribution is at least approximately normal and describe test procedures whose validity rests on this assumption.

The key result on which tests for a normal population mean are based was used to derive the one-sample t CI: If X1, X2,…, Xn is a random sample from a normal distribution, the standardized variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with n – 1 degrees of freedom (df).

Consider testing H0: μ = μ0 against Ha: μ > μ0 by using the test statistic

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

That is, the test statistic results from standardizing X̄ under the assumption that H0 is true (using S/√n, the estimated standard deviation of X̄, rather than σ/√n). When H0 is true, the test statistic has a t distribution with n – 1 df.

Knowledge of the test statistic’s distribution when H0 is true (the “null distribution”) allows us to construct a rejection region for which the type I error probability is controlled at the desired level.
In particular, use of the upper-tail t critical value t_{α,n–1} to specify the rejection region t ≥ t_{α,n–1} implies that
P(type I error) = P(H0 is rejected when it is true)
= P(T ≥ t_{α,n–1} when T has a t distribution with n – 1 df)
= α

The One-Sample t Test

Null hypothesis: H0: μ = μ0
Test statistic value:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Alternative Hypothesis    Rejection Region for a Level α Test
Ha: μ > μ0                t ≥ t_{α,n–1} (upper-tailed)
Ha: μ < μ0                t ≤ –t_{α,n–1} (lower-tailed)
Ha: μ ≠ μ0                either t ≥ t_{α/2,n–1} or t ≤ –t_{α/2,n–1} (two-tailed)

β and Sample Size Determination

The calculation of β at the alternative value μ′ in case I was carried out by expressing the rejection region in terms of x̄ (e.g., x̄ ≥ μ0 + z_α · σ/√n) and then subtracting μ′ to standardize correctly. An equivalent approach involves noting that when μ = μ′, the test statistic Z = (X̄ – μ0)/(σ/√n) still has a normal distribution with variance 1, but now the mean value of Z is given by (μ′ – μ0)/(σ/√n). That is, when μ = μ′, the test statistic still has a normal distribution, though not the standard normal distribution.

Because of this, β(μ′) is an area under the normal curve corresponding to mean value (μ′ – μ0)/(σ/√n) and variance 1. Both α and β involve working with normally distributed variables.

The calculation of β(μ′) for the t test is much less straightforward. This is because the distribution of the test statistic T = (X̄ – μ0)/(S/√n) is quite complicated when H0 is false and Ha is true. Thus, for an upper-tailed test, determining
β(μ′) = P(T < t_{α,n–1} when μ = μ′ rather than μ0)
involves integrating a very unpleasant density function. This must be done numerically.

On a set of β curves for the t test, the value of β is the height of the n – 1 df curve above the value of d = |μ0 – μ′|/σ (visual interpolation is necessary if n – 1 is not a value for which the corresponding curve appears), as illustrated in Figure 8.5.
Figure 8.5: A typical β curve for the t test

Rather than fixing n (i.e., n – 1 and thus the particular curve from which β is read), one might prescribe both α (.05 or .01 here) and a value of β for the chosen μ′ and σ. After computing d, the point (d, β) is located on the relevant set of graphs. The curve below and closest to this point gives n – 1 and thus n (again, interpolation is often necessary).

Most of the widely used statistical software packages are capable of calculating type II error probabilities. They generally work in terms of power, which is simply 1 – β. A small value of β (close to 0) is equivalent to large power (near 1). A powerful test is one that has high power and therefore good ability to detect when the null hypothesis is false.

Finally, Minitab now also provides power curves for the specified sample sizes, as shown in Figure 8.6. Such curves show how the power increases for each sample size as the actual value of μ moves further and further away from the null value.

Figure 8.6: Power curves from Minitab for the t test of Example 10

P-Values

One advantage of the P-value approach is that the P-value provides an intuitive measure of the strength of evidence in the data against H0.

Definition
The P-value is the probability, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as contradictory to H0 as the value calculated from the available sample.

This definition is quite a mouthful. Here are some key points:
• The P-value is a probability.
• This probability is calculated assuming that the null hypothesis is true.
• Beware: The P-value is not the probability that H0 is true, nor is it an error probability!
• To determine the P-value, we must first decide which values of the test statistic are at least as contradictory to H0 as the value obtained from our sample.
We will shortly illustrate how to determine the P-value for any z or t test, i.e., any test where the reference distribution is the standard normal distribution (and z curve) or some t distribution (and corresponding t curve). For the moment, though, let’s focus on reaching a conclusion once the P-value is available. Because it is a probability, the P-value must be between 0 and 1.

What kinds of P-values provide evidence against the null hypothesis? Consider two specific instances:
• P-value = .250: In this case, fully 25% of all possible test statistic values are at least as contradictory to H0 as the one that came out of our sample. So our data is not all that contradictory to the null hypothesis.
• P-value = .0018: Here, only .18% (much less than 1%) of all possible test statistic values are at least as contradictory to H0 as what we obtained. Thus the sample appears to be highly contradictory to the null hypothesis.

More generally, the smaller the P-value, the more evidence there is in the sample data against the null hypothesis and for the alternative hypothesis. That is, H0 should be rejected in favor of Ha when the P-value is sufficiently small. So what constitutes “sufficiently small”?

Decision rule based on the P-value
Select a significance level α (as before, the desired type I error probability). Then
reject H0 if P-value ≤ α
do not reject H0 if P-value > α
Thus if the P-value exceeds the chosen significance level, the null hypothesis cannot be rejected at that level.

P-Values for z Tests

The P-value for a z test is a z curve area: 1 – Φ(z) for an upper-tailed test, Φ(z) for a lower-tailed test, and twice the tail area beyond the computed z for a two-tailed test. Since –z = |z| when z is negative, the two-tailed P-value is 2[1 – Φ(|z|)] for either positive or negative z. Each of these is the probability of getting a value at least as extreme as what was obtained (assuming H0 true).

The three cases are illustrated in Figure 8.9.
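The P-value calculation for a z test and the decision rule can be sketched as follows. The two-tailed formula 2[1 – Φ(|z|)] is the one given above; the one-tailed cases use the single tail area in the direction of the alternative, 1 – Φ(z) for an upper-tailed test and Φ(z) for a lower-tailed test. The function name `z_p_value`, the labels for the alternative, and the value z = 2.16 are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z, alternative):
    """P-value for a z test, given the computed test statistic value z."""
    phi = NormalDist().cdf  # standard normal cdf
    if alternative == "greater":      # upper-tailed: area to the right of z
        return 1 - phi(z)
    if alternative == "less":         # lower-tailed: area to the left of z
        return phi(z)
    if alternative == "two-sided":    # two-tailed: 2[1 - Phi(|z|)]
        return 2 * (1 - phi(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical computed statistic and significance level
alpha = 0.05
p = z_p_value(2.16, "two-sided")
print(f"P-value = {p:.4f}")
print("reject H0" if p <= alpha else "do not reject H0")
```

Reporting the P-value rather than only the decision lets a reader apply the rule at whatever significance level they consider appropriate.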
Figure 8.9: Determination of the P-value for a z test

P-Values for t Tests

Just as the P-value for a z test is a z curve area, the P-value for a t test will be a t-curve area. Figure 8.10 illustrates the three different cases. The number of df for the one-sample t test is n – 1.

Figure 8.10: P-values for t tests