Statistics 305

Small Sample Confidence Intervals and Significance Tests for Population Median

When the sample size is small and the sampled population cannot be assumed to be normal, a so-called nonparametric method can be employed. There are many such methods. The one described herein is based on the sign statistic. It is used to make inferences about the population median M.

To indicate the nature of the underlying theory, consider the operation of sampling from a population having median M. Each sample value has probability ½ of exceeding M (a success). The sample selections are independent. Thus we can view the process as a Binomial Experiment having n trials and probability of success ½ on each trial. The number of sample values larger than M is modeled as a random variable B having the Binomial B(n, ½) distribution for sample size n.

Let us use the notation b(α, n, ½) for the integer satisfying P(B ≥ b(α, n, ½)) = α for selected α. Due to the symmetry of this distribution we also have P(B ≤ n − b(α, n, ½)) = α. Judicious choice of α is necessary: because B takes only the integer values 0, 1, …, n, only a few values of α satisfy these equations exactly.

Significance Test for Median M

Let B0 be the number of sample values that are greater than or equal to the hypothesized value M0. The test is then defined by one of three cases:

1. H0: M = M0 vs. Ha: M > M0; reject H0 if B0 ≥ b(α, n, ½).
2. H0: M = M0 vs. Ha: M < M0; reject H0 if B0 ≤ n − b(α, n, ½).
3. H0: M = M0 vs. Ha: M ≠ M0; reject H0 if B0 ≤ n − b(α/2, n, ½) or B0 ≥ b(α/2, n, ½).

Practically, we look in the body of the table of Binomial Cumulative Probabilities and select α so that P(B ≥ b(α, n, ½)) = α is satisfied and α is small enough to be acceptable to us. Then the integer b(α, n, ½) is extracted from the margin of the table and compared to B0.
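The table lookup for b(α, n, ½) can also be done numerically. The following is a minimal Python sketch, not part of the original handout. The function names upper_tail and critical_value are hypothetical, as is the choice n = 10. The sketch assumes we want the smallest integer b whose upper-tail probability does not exceed the requested α, and it reports the level actually attained, since exact equality holds for only a few values of α.

```python
from math import comb

def upper_tail(b, n):
    """P(B >= b) for B ~ Binomial(n, 1/2), from exact binomial counts."""
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n

def critical_value(alpha, n):
    """Smallest integer b with P(B >= b) <= alpha, and the attained level.

    Assumption: B is discrete, so the attained level is usually
    smaller than the requested alpha.
    """
    for b in range(n + 1):
        level = upper_tail(b, n)
        if level <= alpha:
            return b, level
    # Even P(B >= n) exceeds alpha: no rejection region of this size exists.
    return n + 1, 0.0

# Hypothetical illustration: n = 10 observations, requested alpha = 0.05.
b, attained = critical_value(0.05, 10)
print(b, attained)
```

For a one-sided test of Ha: M > M0 the observed count B0 would then be compared with b, and the attained level, rather than the requested α, is the actual significance level of the test.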
The test procedures described above are one way to implement the basic ideas that constitute a significance test in this situation. Take the first case for example. We first assume that H0 is true, i.e. M = M0. The random sample yields a number of values B0 which are greater than or equal to M0. If B0 is “too large” we will take that as evidence to reject H0 and say the evidence favors Ha: M > M0. We judge how large is “too large” by looking at the p-value defined as P(B ≥ B0). If it is “sufficiently small” we reject H0. The test statistic in this case is B ~ Binomial(n, ½). The finite number of possible values for B0 limits the possible p-values to a finite set so we don’t have the continuum of possible p-values that exists when the test statistic is continuous.

Approximate 100(1 − α)% Confidence Interval for M

Select a desired α value (any one you wish). Let X1, X2, …, Xn denote the sample values. Sort these smallest to largest and denote the result as X(1) ≤ X(2) ≤ … ≤ X(n). Now proceed as follows:

1. Find, from the Table of Binomial Cumulative Probabilities, the integer (l − 1) such that P(B ≤ l − 1) ≤ α/2 and P(B ≤ l − 1) is as close as possible to α/2 in the table without exceeding it. Add one to l − 1 to obtain l.
2. Compute u ≡ n − l + 1.
3. The approximate 100(1 − α)% confidence interval for M is then (X(l), X(u)). The exact level of confidence is 1 − 2 P(B ≤ l − 1).

A large sample approximation to l is found by computing

    l ≈ (1 + n)/2 − z_{α/2} √(n/4)

and rounding down if necessary to get an integer l.

Why not use this in all cases and forget about large sample approximate normal and small sample Student’s t based methods? The answer is that those methods are more powerful when their required conditions are satisfied.

Example

Consider the data given in the JMP output, and the JMP analysis of their distribution. Certainly a symmetric population distribution is not suggested by this sample.
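The confidence-interval steps given earlier in this section can be carried out without a printed table. The following is a minimal Python sketch, not part of the original handout. The names lower_cdf, sign_interval and large_sample_l are hypothetical, and the data used in the illustration are made up (the integers 1 to 25) so that the endpoints coincide with the indices l and u. The large-sample function assumes the approximation l ≈ (1 + n)/2 − z_{α/2} √(n/4), rounded down.

```python
from math import comb, floor, sqrt
from statistics import NormalDist

def lower_cdf(k, n):
    """P(B <= k) for B ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

def sign_interval(values, alpha):
    """Return (X_(l), X_(u), exact confidence level) for the median."""
    n = len(values)
    ordered = sorted(values)
    # Step 1: largest l - 1 with P(B <= l - 1) <= alpha / 2.
    l_minus_1 = -1
    while lower_cdf(l_minus_1 + 1, n) <= alpha / 2:
        l_minus_1 += 1
    if l_minus_1 < 0:
        raise ValueError("sample too small for the requested confidence level")
    l = l_minus_1 + 1
    # Step 2: upper index.
    u = n - l + 1
    # Step 3: order statistics are 1-based, Python lists are 0-based.
    exact = 1 - 2 * lower_cdf(l_minus_1, n)
    return ordered[l - 1], ordered[u - 1], exact

def large_sample_l(n, alpha):
    """Normal approximation to l, rounded down to an integer."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return floor((1 + n) / 2 - z * sqrt(n / 4))

# Made-up data, for illustration only.
low, high, exact = sign_interval(list(range(1, 26)), 0.05)
print(low, high, exact, large_sample_l(25, 0.05))
```

Returning the exact level alongside the endpoints makes the "approximate" in the interval's name concrete: the achieved confidence is whatever the discrete binomial distribution allows, not the nominal 100(1 − α)%.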
Let us first find an approximate 95% confidence interval for M. From the table of CDF values for B(25, ½), read l − 1 = 7, corresponding to the probability P(B ≤ 7) = 0.02164263 with n = 25. Thus l = 8 and u = n − l + 1 = 18. The eighth and eighteenth values in the ordered list of sample values yield the interval (3.9, 6.7).

To perform the significance test of H0: M = 5 versus Ha: M > 5, we proceed as follows. From the table of upper quantiles read b(0.02164263, 25, ½) = 18. There are 13 sample values greater than or equal to 5, so B0 = 13. Since B0 < 18 we do not reject H0. The data fail to suggest that M > 5.
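The table values quoted in this example can be checked numerically. The following is a minimal Python sketch, not part of the original handout. It uses only the quantities stated in the example (n = 25, l − 1 = 7, critical value 18, B0 = 13); the helper name cdf and the variable names are hypothetical. It also computes the p-value P(B ≥ B0), which the example itself does not report.

```python
from math import comb

n = 25  # sample size in the example

def cdf(k):
    """P(B <= k) for B ~ Binomial(25, 1/2)."""
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

p_lower = cdf(7)                    # table entry used for l - 1 = 7
exact_confidence = 1 - 2 * p_lower  # exact level of the interval (X(8), X(18))
upper_tail_18 = 1 - cdf(17)         # P(B >= 18); equals P(B <= 7) by symmetry
p_value = 1 - cdf(12)               # P(B >= 13) for the observed B0 = 13

print(p_lower, exact_confidence, upper_tail_18, p_value)
```

Under these assumptions the exact confidence level is about 0.9567 rather than 0.95, and the p-value for B0 = 13 is 0.5, which agrees with the decision not to reject H0.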