Statistics 305 Small Sample Confidence Intervals and Significance Tests for Population Median

When the sample size is small and the sampled population cannot be assumed to be normal, a
so-called nonparametric method can be employed. There are many such methods. The one
described here is based on the sign statistic. It is used to make inferences about the population
median M.
To indicate the nature of the underlying theory, consider the operation of sampling from a
population having median M. If the population is continuous, each sample value has probability ½
of exceeding M (a success). The sample selections are independent. Thus we can view the process
as a Binomial Experiment having n trials and probability of success ½ on each trial. The number of
sample values larger than M is modeled as a random variable B having the Binomial B(n, ½)
distribution for sample size n. Let b(α, n, ½) denote the integer satisfying
P(B ≥ b(α, n, ½)) = α for a selected α. By the symmetry of this distribution we also have
P(B ≤ n − b(α, n, ½)) = α. Judicious choice of α is necessary: because B is discrete, only a few
values of α satisfy these equations exactly.
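To see why only a few α values are available, the short Python sketch below tabulates the exact upper-tail probabilities P(B ≥ b) for a Binomial(n, ½) variable. The function names upper_tail and attainable_levels and the cutoff max_alpha are illustrative choices of ours, not part of any table or package; the sample size n = 10 is likewise just an example.

```python
from math import comb


def upper_tail(n, k):
    """Exact P(B >= k) for B ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(max(k, 0), n + 1)) / 2 ** n


def attainable_levels(n, max_alpha=0.10):
    """List the pairs (b, alpha) with alpha = P(B >= b) <= max_alpha.

    These are the only significance levels (up to max_alpha, an
    illustrative cutoff) for which P(B >= b(alpha, n, 1/2)) = alpha
    holds exactly.
    """
    pairs = []
    for b in range(n + 1):
        alpha = upper_tail(n, b)
        if alpha <= max_alpha:
            pairs.append((b, alpha))
    return pairs


# Illustrative sample size; any small n shows the same sparseness.
for b, alpha in attainable_levels(10):
    print(f"b = {b:2d}   alpha = {alpha:.6f}")
```

For n = 10 only three levels at or below 0.10 exist, which is why α must be chosen from the table rather than fixed in advance.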
Significance Test for Median M
Let B0 be the number of sample values that are greater than or equal to the hypothesized value
M0. The test is then defined as one of the three cases:
1. H0: M = M0 vs. Ha: M > M0; reject H0 if B0 ≥ b(α, n, ½).
2. H0: M = M0 vs. Ha: M < M0; reject H0 if B0 ≤ n − b(α, n, ½).
3. H0: M = M0 vs. Ha: M ≠ M0; reject H0 if B0 ≤ n − b(α/2, n, ½) or
B0 ≥ b(α/2, n, ½).
Practically, we look in the body of the table of Binomial Cumulative Probabilities and select α so
that P(B ≥ b(α, n, ½)) = α is satisfied and α is small enough to be acceptable to us. Then the
integer b(α, n, ½) is extracted from the margin of the table and compared to B0.
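The table lookup and the three rejection rules can be sketched in Python as follows. This is a minimal illustration under one assumption that goes slightly beyond the text: instead of picking α by eye from the table, the hypothetical helper critical_value takes a ceiling alpha_max and returns the largest attainable level not exceeding it. All function and parameter names are ours.

```python
from math import comb


def upper_tail(n, k):
    """Exact P(B >= k) for B ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(max(k, 0), n + 1)) / 2 ** n


def critical_value(n, alpha_max):
    """Smallest integer b with P(B >= b) <= alpha_max, and that exact level.

    Assumption: alpha_max is a ceiling on the level, so the returned
    level is the attainable alpha closest to alpha_max from below.
    """
    for b in range(n + 2):          # b = n + 1 gives tail probability 0
        level = upper_tail(n, b)
        if level <= alpha_max:
            return b, level


def sign_test_rejects(b0, n, alpha_max, alternative="greater"):
    """Apply rejection rules 1-3 to the observed count b0."""
    if alternative == "two-sided":
        b, _ = critical_value(n, alpha_max / 2)
        return b0 <= n - b or b0 >= b
    b, _ = critical_value(n, alpha_max)
    if alternative == "greater":
        return b0 >= b
    if alternative == "less":
        return b0 <= n - b
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


b, level = critical_value(25, 0.025)
print(b, level)
print(sign_test_rejects(13, 25, 0.025, "greater"))
```

Returning the exact level alongside b keeps the attained α visible, since it will usually differ from the requested ceiling.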
The test procedures described above are one way to implement the basic ideas that constitute a
significance test in this situation. Take the first case for example. We first assume that H0 is
true, i.e. M = M0. The random sample yields a number of values B0 which are greater than or
equal to M0. If B0 is “too large” we will take that as evidence to reject H0 and say the evidence
favors Ha: M > M0. We judge how large is “too large” by looking at the p-value, defined as
P(B ≥ B0). If it is “sufficiently small” we reject H0. The test statistic in this case is B ~
Binomial (n, ½). The finite number of possible values for B0 limits the possible p-values to a
finite set so we don’t have the continuum of possible p-values that exists when the test statistic is
continuous.
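The p-value approach can be sketched in Python as below. The data list is hypothetical and serves only to make the example runnable. The one-sided p-values follow the definition above; doubling the smaller tail for the two-sided case is a common convention that we assume here, since the text does not define a two-sided p-value.

```python
from math import comb


def sign_test_p_value(data, m0, alternative="greater"):
    """Return (B0, p-value) for the sign test of H0: M = m0.

    B0 counts the sample values greater than or equal to m0.
    """
    n = len(data)
    b0 = sum(1 for x in data if x >= m0)
    upper = sum(comb(n, j) for j in range(b0, n + 1)) / 2 ** n  # P(B >= B0)
    lower = sum(comb(n, j) for j in range(0, b0 + 1)) / 2 ** n  # P(B <= B0)
    if alternative == "greater":
        return b0, upper
    if alternative == "less":
        return b0, lower
    if alternative == "two-sided":
        # Assumed convention: twice the smaller tail, capped at 1.
        return b0, min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical sample, for illustration only.
sample = [2.1, 3.4, 5.6, 6.0, 7.2, 8.1, 9.3, 4.4]
b0, p = sign_test_p_value(sample, 4.0, "greater")
print(b0, p)
```

With n = 8 there are only nine possible values of B0, hence at most nine distinct one-sided p-values, which illustrates the discreteness noted above.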
Approximate 100(1 − α)% Confidence Interval for M
Select a desired α value (any one you wish). Let X1, X2, …, Xn denote the sample values. Sort
these smallest to largest and denote the result as X(1) ≤ X(2) ≤ … ≤ X(n). Now proceed as
follows:
1. Find, from the Table of Binomial Cumulative Probabilities, the integer l − 1 such that
P(B ≤ l − 1) ≤ α/2 and P(B ≤ l − 1) is as close as possible to α/2 in the table without
exceeding it. Add one to l − 1 to obtain l.
2. Compute u = n − l + 1.
3. The approximate 100(1 − α)% confidence interval for M is then (X(l), X(u)).
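The three steps above can be sketched in Python, with the table lookup replaced by a direct computation of the Binomial(n, ½) cumulative probabilities. The function names and the toy data (the integers 1 to 10) are our own illustrative assumptions.

```python
from math import comb


def lower_cdf(n, k):
    """Exact P(B <= k) for B ~ Binomial(n, 1/2); zero when k < 0."""
    return sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n


def median_confidence_interval(data, alpha=0.05):
    """Return (X(l), X(u), l, u, exact confidence) for the median M."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    # Step 1: largest integer k = l - 1 with P(B <= k) <= alpha / 2.
    k = -1
    while lower_cdf(n, k + 1) <= alpha / 2:
        k += 1
    if k < 0:
        raise ValueError("sample too small for the requested alpha")
    l = k + 1
    # Step 2: upper index by symmetry.
    u = n - l + 1
    # Step 3: order statistics (1-based in the text, 0-based in Python).
    confidence = 1 - 2 * lower_cdf(n, k)
    return xs[l - 1], xs[u - 1], l, u, confidence


# Toy data for illustration only.
print(median_confidence_interval(range(1, 11), alpha=0.05))
```

The function reports the exact confidence level together with the interval, because that level generally exceeds the nominal 100(1 − α)%.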
The exact level of confidence is 1 − 2P(B ≤ l − 1). A large sample approximation to l is found
by computing

$$1 + \frac{n}{2} - z_{\alpha/2}\sqrt{\frac{n}{4}}$$

and rounding down if necessary to get an integer l.
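As a rough check on the large sample formula, the Python sketch below compares it with the exact table-based value of l. It reads the formula as 1 + n/2 − z(α/2)·√(n/4), where z(α/2) is the upper α/2 standard normal quantile; the function names and the sample sizes tried are our own choices.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def exact_l(n, alpha):
    """l such that P(B <= l - 1) <= alpha/2 is as large as possible.

    Returns 0 when even P(B <= 0) exceeds alpha/2 (no valid l).
    """
    total, cumulative, l = 2 ** n, 0, 0
    for k in range(n + 1):
        cumulative += comb(n, k)
        if cumulative / total > alpha / 2:
            break
        l = k + 1
    return l


def approx_l(n, alpha):
    """Large sample approximation, rounded down to an integer."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile
    return floor(1 + n / 2 - z * sqrt(n / 4))


# Illustrative sample sizes.
for n in (25, 50, 100):
    print(n, exact_l(n, 0.05), approx_l(n, 0.05))
```

Where the two values disagree, the exact value is the one that guarantees P(B ≤ l − 1) ≤ α/2.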
Why not use this in all cases and forget about the large-sample approximate normal and
small-sample Student’s t based methods? The answer is that those methods are more powerful when
their required conditions are satisfied.
Example
Consider the data given in the JMP output, and the JMP analysis of their distribution. Certainly
a symmetric population distribution is not suggested by this sample.
Let us first find an approximate 95% confidence interval for M. From the table of CDF values for
B(25, ½) read l − 1 = 7, corresponding to the probability P(B ≤ 7) = 0.02164263 with n = 25.
Thus l = 8 and u = n − l + 1 = 18. The eighth and eighteenth values in the ordered list of sample
values give the interval (3.9, 6.7). Its exact confidence level is 1 − 2(0.02164263) ≈ 0.957.
To perform the significance test of
H0: M = 5 versus Ha: M > 5
we proceed as follows. From the table of upper quantiles read b(0.02164263, 25, ½) = 18.
There are 13 sample values greater than or equal to 5, so B0 = 13. Since B0 < 18 we do not reject
H0 at level α = 0.02164263. The data fail to suggest that M > 5.
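The table lookups used in this example can be checked with a few lines of Python. The sample values themselves appear only in the JMP output and are not reproduced here, so the sketch verifies only the binomial quantities for n = 25 and B0 = 13; the variable names are our own.

```python
from math import comb

n = 25


def cdf(k):
    """Exact P(B <= k) for B ~ Binomial(25, 1/2)."""
    return sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n


# Confidence interval: l - 1 = 7 is the largest k with P(B <= k) <= 0.025.
p_le_7 = cdf(7)
p_le_8 = cdf(8)                 # exceeds 0.025, so k = 8 is not allowed
l = 7 + 1
u = n - l + 1
confidence = 1 - 2 * p_le_7     # exact confidence level of (X(8), X(18))

# Test of H0: M = 5 vs. Ha: M > 5 with B0 = 13.
b_crit = n - 7                  # by symmetry, P(B >= 18) = P(B <= 7)
p_value = 1 - cdf(12)           # P(B >= 13)

print(f"P(B <= 7) = {p_le_7:.8f}, P(B <= 8) = {p_le_8:.8f}")
print(f"l = {l}, u = {u}, exact confidence = {confidence:.4f}")
print(f"critical value = {b_crit}, p-value for B0 = 13: {p_value}")
```

The p-value P(B ≥ 13) is exactly ½ because n is odd, which agrees with the decision not to reject H0.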