Hypotheses and Test Procedures
A statistical hypothesis, or just hypothesis, is a claim or
assertion either about the value of a single parameter
(population characteristic or characteristic of a probability
distribution), about the values of several parameters, or
about the form of an entire probability distribution.
One example of a hypothesis is the claim μ = .75, where
μ is the true average inside diameter of a certain type of
PVC pipe.
Another example is the statement p < .10, where p is the
proportion of defective circuit boards among all circuit
boards produced by a certain manufacturer.
If μ1 and μ2 denote the true average breaking strengths of
two different types of twine, one hypothesis is the assertion
that μ1 – μ2 = 0, and another is the statement μ1 – μ2 > 5.
Yet another example of a hypothesis is the assertion that
the stopping distance under particular conditions has a
normal distribution.
In any hypothesis-testing problem, there are two
contradictory hypotheses under consideration. One
hypothesis might be the claim μ = .75 and the other μ ≠ .75,
or the two contradictory statements might be p ≥ .10 and
p < .10.
In this sense, the claim of innocence is the favored or
protected hypothesis, and the burden of proof is placed on
those who believe in the alternative claim.
Similarly, in testing statistical hypotheses, the problem will
be formulated so that one of the claims is initially favored.
This initially favored claim will not be rejected in favor of the
alternative claim unless sample evidence contradicts it and
provides strong support for the alternative assertion.
Definition
The null hypothesis, denoted by H0, is the claim that is
initially assumed to be true (the “prior belief” claim).
The alternative hypothesis, denoted by Ha, is the
assertion that is contradictory to H0. The null hypothesis will
be rejected in favor of the alternative hypothesis only if
sample evidence suggests that H0 is false.
If the sample does not strongly contradict H0, we will
continue to believe in the plausibility of the null hypothesis.
The two possible conclusions from a hypothesis-testing
analysis are then reject H0 or fail to reject H0.
A test of hypotheses is a method for using sample data to
decide whether the null hypothesis should be rejected.
Thus we might test H0: μ = .75 against the alternative
Ha: μ ≠ .75. Only if sample data strongly suggests that μ is
something other than .75 should the null hypothesis be
rejected.
In the absence of such evidence, H0 should not be rejected,
since it is still quite plausible.
The objective is to decide, based on sample information,
which of the two hypotheses is correct.
There is a familiar analogy to this in a criminal trial. One
claim is the assertion that the accused individual is
innocent.
In the U.S. judicial system, this is the claim that is initially
believed to be true. Only in the face of strong evidence to
the contrary should the jury reject this claim in favor of the
alternative assertion that the accused is guilty.
Sometimes an investigator does not want to accept a
particular assertion unless and until data can provide
strong support for the assertion.
As an example, suppose a company is considering putting
a new type of coating on bearings that it produces.
The true average wear life with the current coating is
known to be 1000 hours. With μ denoting the true average
life for the new coating, the company would not want to
make a change unless evidence strongly suggested that μ
exceeds 1000.
An appropriate problem formulation would involve testing
H0: μ = 1000 against Ha: μ > 1000.
The conclusion that a change is justified is identified with
Ha, and it would take conclusive evidence to justify rejecting
H0 and switching to the new coating.
Scientific research often involves trying to decide whether a
current theory should be replaced by a more plausible and
satisfactory explanation of the phenomenon under
investigation.
A conservative approach is to identify the current theory with
H0 and the researcher’s alternative explanation with Ha.
Rejection of the current theory will then occur only when
evidence is much more consistent with the new theory.
In many situations, Ha is referred to as the “researcher’s
hypothesis,” since it is the claim that the researcher would
really like to validate.
The word null means “of no value, effect, or consequence,”
which suggests that H0 should be identified with the
hypothesis of no change (from current opinion), no
difference, no improvement, and so on.
Suppose, for example, that 10% of all circuit boards
produced by a certain manufacturer during a recent period
were defective.
An engineer has suggested a change in the production
process in the belief that it will result in a reduced defective
rate.
Let p denote the true proportion of defective boards
resulting from the changed process.
Then the research hypothesis, on which the burden of
proof is placed, is the assertion that p < .10. Thus the
alternative hypothesis is Ha: p < .10.
In our treatment of hypothesis testing, H0 will generally be
stated as an equality claim. If θ denotes the parameter of
interest, the null hypothesis will have the form H0: θ = θ0,
where θ0 is a specified number called the null value of the
parameter (value claimed for θ by the null hypothesis).
The alternative to the null hypothesis H0: θ = θ0 will look like
one of the following three assertions:
1. Ha: θ > θ0 (in which case the implicit null hypothesis is
θ ≤ θ0),
2. Ha: θ < θ0 (in which case the implicit null hypothesis is
θ ≥ θ0), or
3. Ha: θ ≠ θ0
Test Procedures
A test procedure is specified by the following:
1. A test statistic, a function of the sample data on which
the decision (reject H0 or do not reject H0) is to be based
2. A rejection region, the set of all test statistic values for
which H0 will be rejected
The null hypothesis will then be rejected if and only if the
observed or computed test statistic value falls in the
rejection region.
Errors in Hypothesis Testing
Definition
A type I error consists of rejecting the null hypothesis H0
when it is true.
A type II error involves not rejecting H0 when H0 is false.
In the nicotine scenario, a type I error consists of rejecting
the manufacturer’s claim that μ = 1.5 when it is actually
true.
If the rejection region x̄ ≥ 1.6 is employed, it might happen
that x̄ = 1.63 even when μ = 1.5, resulting in a type I error.
Alternatively, it may be that H0 is false and yet x̄ = 1.52 is
observed, leading to H0 not being rejected (a type II error).
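As a rough numerical illustration, both error probabilities can be computed for a rejection region of the form "sample mean at least 1.6" when the sample mean is normally distributed. The population standard deviation (0.20), the sample size (32), and the alternative value 1.6 are assumptions chosen for the sketch; the text above does not state them.

```python
from statistics import NormalDist

# Assumed for illustration only: sigma and n are not given in the text.
sigma, n = 0.20, 32
mu0 = 1.5        # null value claimed by the manufacturer
cutoff = 1.6     # reject H0 when the sample mean is >= 1.6

se = sigma / n ** 0.5   # standard deviation of the sample mean

def prob_reject(mu):
    """P(sample mean >= cutoff) when the true mean is mu."""
    return 1 - NormalDist(mu, se).cdf(cutoff)

alpha = prob_reject(mu0)             # P(type I error)
beta_at_1_6 = 1 - prob_reject(1.6)   # P(type II error) if the true mean is 1.6

print(f"alpha = {alpha:.4f}, beta(1.6) = {beta_at_1_6:.4f}")
```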
In the best of all possible worlds, test procedures for which
neither type of error is possible could be developed.
However, this ideal can be achieved only by basing a
decision on an examination of the entire population. The
difficulty with using a procedure based on sample data is
that because of sampling variability, an unrepresentative
sample may result, e.g., a value of X̄ that is far from μ or a
value of p̂ that differs considerably from p.
Instead of demanding error-free procedures, we must seek
procedures for which either type of error is unlikely to
occur.
That is, a good procedure is one for which the probability of
making either type of error is small.
The choice of a particular rejection region cutoff value fixes
the probabilities of type I and type II errors.
These error probabilities are traditionally denoted by α and
β, respectively.
Because H0 specifies a unique value of the parameter,
there is a single value of α.
However, there is a different value of β for each value of the
parameter consistent with Ha.
Proposition
Suppose an experiment and a sample size are fixed and a
test statistic is chosen.
Then decreasing the size of the rejection region to obtain a
smaller value of α results in a larger value of β for any
particular parameter value consistent with Ha.
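The trade-off in the proposition can be seen numerically by moving the cutoff of an upper-tailed rejection region. The sketch below reuses the nicotine null value 1.5; the standard deviation, sample size, alternative value, and cutoffs are all assumed for illustration.

```python
from statistics import NormalDist

mu0, mu_alt = 1.5, 1.6   # null value from the text; alternative value assumed
sigma, n = 0.20, 32      # assumed for illustration
se = sigma / n ** 0.5

def error_probs(cutoff):
    """Return (alpha, beta) for the region 'sample mean >= cutoff'."""
    alpha = 1 - NormalDist(mu0, se).cdf(cutoff)   # reject although H0 is true
    beta = NormalDist(mu_alt, se).cdf(cutoff)     # fail to reject although mu = mu_alt
    return alpha, beta

# A larger cutoff means a smaller rejection region.
rows = [(c, *error_probs(c)) for c in (1.55, 1.58, 1.60, 1.62)]
for c, a, b in rows:
    print(f"cutoff {c:.2f}: alpha = {a:.4f}, beta = {b:.4f}")
```

As the cutoff grows, the printed alpha column shrinks while the beta column grows, which is the behavior the proposition describes.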
This proposition says that once the test statistic and n are
fixed, there is no rejection region that will simultaneously
make both α and all β’s small.
A region must be chosen to effect a compromise between α
and β.
Because of the suggested guidelines for specifying H0 and
Ha, a type I error is usually more serious than a type II error
(this can always be achieved by proper choice of the
hypotheses).
The approach adhered to by most statistical practitioners is
then to specify the largest value of α that can be tolerated
and find a rejection region having that value of α rather
than anything smaller.
This makes β as small as possible subject to the bound on
α. The resulting value of α is often referred to as the
significance level of the test.
Traditional levels of significance are .10, .05, and .01,
though the level in any particular problem will depend on
the seriousness of a type I error—the more serious this
error, the smaller should be the significance level.
The corresponding test procedure is called a level α test
(e.g., a level .05 test or a level .01 test).
A test with significance level α is one for which the type I
error probability is controlled at the specified level.
Case I: A Normal Population with Known σ
Although the assumption that the value of σ is known is
rarely met in practice, this case provides a good starting
point because of the ease with which general procedures
and their properties can be developed.
The null hypothesis in all three cases will state that μ has a
particular numerical value, the null value, which we will
denote by μ0. Let X1,…, Xn represent a random sample of
size n from the normal population.
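Then the sample mean X̄ has a normal distribution with
expected value
$$\mu_{\bar{X}} = \mu$$
and standard deviation
$$\sigma_{\bar{X}} = \sigma/\sqrt{n}.$$
When H0 is true, $\mu_{\bar{X}} = \mu_0$.
Consider now the statistic Z
obtained by standardizing X̄ under the assumption that H0
is true:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$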
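Null hypothesis: H0: μ = μ0
Test statistic value: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Alternative Hypothesis     Rejection Region for Level α Test
Ha: μ > μ0                 z ≥ zα (upper-tailed test)
Ha: μ < μ0                 z ≤ –zα (lower-tailed test)
Ha: μ ≠ μ0                 either z ≥ zα/2 or z ≤ –zα/2 (two-tailed test)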
Use of the following sequence of steps is recommended
when testing hypotheses about a parameter.
1. Identify the parameter of interest and describe it in the
context of the problem situation.
2. Determine the null value and state the null hypothesis.
3. State the appropriate alternative hypothesis.
4. Give the formula for the computed value of the test
statistic (substituting the null value and the known values
of any other parameters, but not those of any
sample-based quantities).
5. State the rejection region for the selected significance
level α.
6. Compute any necessary sample quantities, substitute
into the formula for the test statistic value, and compute
that value.
7. Decide whether H0 should be rejected, and state this
conclusion in the problem context.
The formulation of hypotheses (Steps 2 and 3) should be
done before examining the data.
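The recommended steps can be sketched as a small Python function for the case I z test. The function name, the `alternative` labels, and the data are made up for illustration; the statistic is the standardized sample mean with known standard deviation, and the critical values come from the standard normal distribution.

```python
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha, alternative):
    """Case I z test. alternative: 'greater', 'less', or 'two-sided'.

    Returns (z, reject).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)   # steps 4 and 6
    inv = NormalDist().inv_cdf                      # z critical values
    if alternative == "greater":                    # step 5: rejection region
        reject = z >= inv(1 - alpha)
    elif alternative == "less":
        reject = z <= -inv(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z) >= inv(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, reject                                # step 7

# Hypothetical data; H0: mu = 130 versus Ha: mu > 130, sigma assumed known.
sample = [131.2, 130.8, 132.1, 129.9, 131.5, 130.7, 131.9, 131.0, 130.4]
z, reject = z_test(sample, mu0=130, sigma=1.5, alpha=0.01, alternative="greater")
print(f"z = {z:.3f}, reject H0: {reject}")
```

Note that the hypotheses and the significance level are passed in as arguments, which mirrors the advice to fix them before looking at the data.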
β and Sample Size Determination
The z tests for case I are among the few in statistics for
which there are simple formulas available for β, the
probability of a type II error.
Consider first the upper-tailed test with rejection region
z ≥ zα.
This is equivalent to
$$\bar{x} \ge \mu_0 + z_\alpha \cdot \sigma/\sqrt{n},$$
so H0 will not be rejected if
$$\bar{x} < \mu_0 + z_\alpha \cdot \sigma/\sqrt{n}.$$
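Now let μ′ denote a particular value of μ that exceeds the
null value μ0. Then,
$$\begin{aligned}
\beta(\mu') &= P(H_0 \text{ is not rejected when } \mu = \mu') \\
&= P\left(\bar{X} < \mu_0 + z_\alpha \cdot \sigma/\sqrt{n} \text{ when } \mu = \mu'\right) \\
&= P\left(\frac{\bar{X} - \mu'}{\sigma/\sqrt{n}} < z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}} \text{ when } \mu = \mu'\right) \\
&= \Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
\end{aligned}$$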
As μ′ increases, μ0 – μ′ becomes more negative, so
β(μ′) will be small when μ′ greatly exceeds μ0 (because
the value at which Φ is evaluated will then be quite
negative).
Error probabilities for the lower-tailed and two-tailed tests
are derived in an analogous manner.
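A minimal Python sketch of the type II error probability for all three alternatives follows. The upper-tailed branch is the standard normal cdf evaluated at the critical value plus the standardized distance between the null and alternative means; the other two branches are the analogous standard results. The function name and the numbers in the usage lines (standard deviation 1.5, n = 9) are assumptions for illustration.

```python
from statistics import NormalDist

def beta_z_test(mu0, mu_alt, sigma, n, alpha, alternative):
    """Type II error probability of a level-alpha z test at mu = mu_alt."""
    std = NormalDist()
    shift = (mu0 - mu_alt) / (sigma / n ** 0.5)
    if alternative == "greater":
        return std.cdf(std.inv_cdf(1 - alpha) + shift)
    if alternative == "less":
        return 1 - std.cdf(-std.inv_cdf(1 - alpha) + shift)
    if alternative == "two-sided":
        z_half = std.inv_cdf(1 - alpha / 2)
        return std.cdf(z_half + shift) - std.cdf(-z_half + shift)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Assumed values: sigma = 1.5 and n = 9 are illustrative only.
b_upper = beta_z_test(130, 132, 1.5, 9, 0.01, "greater")
b_lower = beta_z_test(130, 128, 1.5, 9, 0.01, "less")
print(f"beta(132), upper-tailed: {b_upper:.4f}")
print(f"beta(128), lower-tailed: {b_lower:.4f}")
```

By symmetry the two printed values agree, since 132 and 128 are equally far from the null value.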
If σ is large, the probability of a type II error can be large at
an alternative value μ′ that is of particular concern to an
investigator.
Suppose we fix α and also specify β for such an alternative
value. In the sprinkler example, company officials might
view μ′ = 132 as a very substantial departure from
H0: μ = 130 and therefore wish β(132) = .10
in addition to α = .01.
More generally, consider the two restrictions
P(type I error) = α and β(μ′) = β for specified α, μ′, and β.
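Then for an upper-tailed test, the sample size n should be
chosen to satisfy
$$\Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) = \beta.$$
This implies that
$$-z_\beta = z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}},$$
where –zβ is the z critical value that captures lower-tail area β.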
It is easy to solve this equation for the desired n. A parallel
argument yields the necessary sample size for lower- and
two-tailed tests as summarized in the next box.
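Alternative Hypothesis     Type II Error Probability β(μ′) for a Level α Test
Ha: μ > μ0                 $\Phi\left(z_\alpha + \dfrac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$
Ha: μ < μ0                 $1 - \Phi\left(-z_\alpha + \dfrac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$
Ha: μ ≠ μ0                 $\Phi\left(z_{\alpha/2} + \dfrac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) - \Phi\left(-z_{\alpha/2} + \dfrac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$
where Φ(z) = the standard normal cdf.
The sample size n for which a level α test also has
β(μ′) = β at the alternative value μ′ is
$$n = \left[\frac{\sigma(z_\alpha + z_\beta)}{\mu_0 - \mu'}\right]^2$$
for a one-tailed (upper or lower) test, and
$$n = \left[\frac{\sigma(z_{\alpha/2} + z_\beta)}{\mu_0 - \mu'}\right]^2$$
for a two-tailed test (an approximate solution).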
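The sample size calculation can be sketched in Python as follows. The formula squares the population standard deviation times the sum of two z critical values, divided by the distance between the null and alternative means; the result is rounded up to the next integer. The function name is hypothetical, and the standard deviation 1.5 in the usage line is an assumed value paired with the sprinkler numbers quoted earlier (null value 130, alternative 132, type I probability .01, type II probability .10).

```python
from math import ceil
from statistics import NormalDist

def sample_size(mu0, mu_alt, sigma, alpha, beta, two_tailed=False):
    """Smallest integer n meeting the alpha and beta requirements.

    For a two-tailed test the result is an approximate solution.
    """
    inv = NormalDist().inv_cdf
    z_alpha = inv(1 - (alpha / 2 if two_tailed else alpha))
    z_beta = inv(1 - beta)
    return ceil((sigma * (z_alpha + z_beta) / (mu0 - mu_alt)) ** 2)

# sigma = 1.5 is an assumption made for this illustration.
n_needed = sample_size(mu0=130, mu_alt=132, sigma=1.5, alpha=0.01, beta=0.10)
print(n_needed)
```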
Case II: Large-Sample Tests
When the sample size is large, the z tests for case I are
easily modified to yield valid test procedures without
requiring either a normal population distribution or
known σ.
Earlier we used the key result to justify large-sample
confidence intervals:
A large n implies that the standardized variable
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has approximately a standard normal distribution.
The use of rejection regions given previously for case I
(e.g., z ≥ zα when the alternative hypothesis is Ha: μ > μ0)
then results in test procedures for which the significance
level is approximately (rather than exactly) α.
The rule of thumb n > 40 will again be used to characterize
a large sample size.
Large-Sample Tests
Large-sample tests concerning p are a special case of the
more general large-sample procedures for a parameter θ.
Let θ̂ be an estimator of θ that is (at least approximately)
unbiased and has approximately a normal distribution.
The null hypothesis has the form H0: θ = θ0 where θ0
denotes a number (the null value) appropriate to the
problem context.
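The estimator p̂ = X/n is unbiased (E(p̂) = p), has
approximately a normal distribution, and its standard
deviation is
$$\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}.$$
When H0 is true, E(p̂) = p0 and
$$\sigma_{\hat{p}} = \sqrt{p_0(1 - p_0)/n},$$
so σp̂ does not involve any unknown parameters. It then follows
that when n is large and H0 is true, the test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
has approximately a standard normal distribution.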
If the alternative hypothesis is Ha: p > p0 and the upper-tailed
rejection region z ≥ zα is used, then
P(type I error) = P(H0 is rejected when it is true)
= P(Z ≥ zα when Z has approximately a
standard normal distribution) ≈ α
Thus the desired level of significance α is attained by using
the critical value that captures area α in the upper tail of the
z curve.
Rejection regions for the other two alternative hypotheses,
lower-tailed for Ha: p < p0 and two-tailed for Ha: p ≠ p0, are
justified in an analogous manner.
Null hypothesis: H0: p = p0
Test statistic value: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Alternative Hypothesis     Rejection Region
Ha: p > p0                 z ≥ zα (upper-tailed)
Ha: p < p0                 z ≤ –zα (lower-tailed)
Ha: p ≠ p0                 either z ≥ zα/2 or z ≤ –zα/2 (two-tailed)
These test procedures are valid provided that np0 ≥ 10 and
n(1 – p0) ≥ 10.
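A short Python sketch of the large-sample test for a proportion, including the validity check on the two expected counts, might look like this. The function name and the data (14 defectives among 200 boards) are invented for illustration; only the null value .10 comes from the circuit-board discussion earlier.

```python
from statistics import NormalDist

def proportion_z_test(x, n, p0, alpha, alternative):
    """Large-sample z test of H0: p = p0. Returns (z, reject)."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    p_hat = x / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    inv = NormalDist().inv_cdf
    if alternative == "greater":
        reject = z >= inv(1 - alpha)
    elif alternative == "less":
        reject = z <= -inv(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z) >= inv(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, reject

# Made-up data: 14 defective boards in a sample of 200, testing Ha: p < .10.
z, reject = proportion_z_test(x=14, n=200, p0=0.10, alpha=0.05, alternative="less")
print(f"z = {z:.3f}, reject H0: {reject}")
```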
Small-Sample Tests
Test procedures when the sample size n is small are based
directly on the binomial distribution rather than the normal
approximation. Consider the alternative hypothesis
Ha: p > p0 and again let X be the number of successes in
the sample.
Then X is the test statistic, and the upper-tailed rejection
region has the form x ≥ c. When H0 is true, X has a
binomial distribution with parameters n and p0, so
P(type I error) = P(H0 is rejected when it is true)
= P(X ≥ c when X ~ Bin(n, p0))
= 1 – P(X ≤ c – 1 when X ~ Bin(n, p0))
= 1 – B(c – 1; n, p0)
As the critical value c decreases, more x values are
included in the rejection region and P(type I error)
increases. Because X has a discrete probability
distribution, it is usually not possible to find a value of c for
which P(type I error) is exactly the desired significance
level α (e.g., .05 or .01).
Instead, the largest rejection region of the form
{c, c + 1, … , n} satisfying 1 – B(c – 1; n, p0) ≤ α is used.
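The search for the critical value can be sketched in Python by stepping c upward until the type I error probability first drops to the target level or below. The function names and the example values (n = 20, null value .5, level .05) are assumptions for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p); equals 0 when k < 0."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def upper_critical_value(n, p0, alpha):
    """Smallest c with 1 - B(c - 1; n, p0) <= alpha.

    The smallest such c gives the largest rejection region {c, ..., n}.
    Returns (c, actual type I error probability).
    """
    for c in range(n + 2):
        actual = 1 - binom_cdf(c - 1, n, p0)
        if actual <= alpha:
            return c, actual

c, actual_alpha = upper_critical_value(n=20, p0=0.5, alpha=0.05)
print(f"c = {c}, actual level = {actual_alpha:.4f}")
```

The actual level is typically below the nominal one, reflecting the discreteness of the binomial distribution.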
Let p′ denote an alternative value of p (p′ > p0).
When p = p′, X ~ Bin(n, p′),
so
β(p′) = P(type II error when p = p′)
= P(X < c when X ~ Bin(n, p′))
= B(c – 1; n, p′)
That is, β(p′) is the result of a straightforward binomial
probability calculation.
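For instance, the type II error probability of an upper-tailed binomial test is just the binomial cdf evaluated at c – 1 under the alternative value. In the sketch below, n = 20, c = 15, and the alternative value .8 are assumed for illustration, and the function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def beta_upper_tailed(c, n, p_alt):
    """P(X < c) when X ~ Bin(n, p_alt): H0 is not rejected although p = p_alt."""
    return binom_cdf(c - 1, n, p_alt)

beta = beta_upper_tailed(c=15, n=20, p_alt=0.8)
print(f"beta(0.8) = {beta:.4f}")
```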
The sample size n necessary to ensure that a level α test
also has specified β at a particular alternative value p′ must
be determined by trial and error using the binomial cdf.
Test procedures for Ha: p < p0 and for Ha: p ≠ p0 are
constructed in a similar manner.
In the former case, the appropriate rejection region has the
form x ≤ c (a lower-tailed test).
The critical value c is the largest number satisfying
B(c; n, p0) ≤ α.
The rejection region when the alternative hypothesis is
Ha: p ≠ p0 consists of both large and small x values.
Case III: A Normal Population Distribution
When n is small, the Central Limit Theorem (CLT) can no
longer be invoked to justify the use of a large-sample test.
Our approach here will be the same one used for small-sample
confidence intervals: We
will assume that the population distribution is at least
approximately normal and describe test procedures whose
validity rests on this assumption.
The key result on which tests for a normal population mean
are based was used to derive the one-sample t CI:
If X1, X2,…, Xn is a random sample from a normal
distribution, the standardized variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with n – 1 degrees of freedom (df).
Consider testing H0: μ = μ0 against Ha: μ > μ0
by using the test statistic
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}.$$
That is, the test statistic results from standardizing X̄
under the assumption that H0 is true (using S/√n, the
estimated standard deviation of X̄, rather than σ/√n).
When H0 is true, the test statistic has a t distribution with
n – 1 df.
Knowledge of the test statistic’s distribution when H0 is true
(the “null distribution”) allows us to construct a rejection
region for which the type I error probability is controlled at
the desired level.
In particular, use of the upper-tail t critical value tα,n – 1
to specify the rejection region t ≥ tα,n – 1
implies that
P(type I error) = P(H0 is rejected when it is true)
= P(T ≥ tα,n – 1 when T has a t distribution
with n – 1 df)
= α
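The One-Sample t Test
Null hypothesis: H0: μ = μ0
Test statistic value: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Alternative Hypothesis     Rejection Region for a Level α Test
Ha: μ > μ0                 t ≥ tα,n – 1 (upper-tailed)
Ha: μ < μ0                 t ≤ –tα,n – 1 (lower-tailed)
Ha: μ ≠ μ0                 either t ≥ tα/2,n – 1 or t ≤ –tα/2,n – 1 (two-tailed)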
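As a minimal sketch, the one-sample t statistic can be computed with the Python standard library and compared with a critical value read from a t table. The data are made up, and the critical value 2.132 (upper-tail area .05 with 4 df) is a table lookup, since the standard library has no t distribution.

```python
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / n ** 0.5)

sample = [1, 2, 3, 4, 5]     # made-up data, n = 5
T_CRIT = 2.132               # t critical value for area .05, 4 df (from a table)

t = t_statistic(sample, mu0=2)
reject = t >= T_CRIT         # upper-tailed test of H0: mu = 2
print(f"t = {t:.4f}, reject H0: {reject}")
```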
β and Sample Size Determination
The calculation of β at the alternative value μ′ in case I was
carried out by expressing the rejection region in terms of x̄
(e.g., x̄ ≥ μ0 + zα · σ/√n) and then subtracting μ′ to
standardize correctly.
An equivalent approach involves noting that when μ = μ′,
the test statistic Z = (X̄ – μ0)/(σ/√n) still has a normal
distribution with variance 1, but now the mean value of Z is
given by (μ′ – μ0)/(σ/√n). That is, when μ = μ′, the test
statistic still has a normal distribution though not the
standard normal distribution.
Because of this, β(μ′) is an area under the normal curve
corresponding to mean value (μ′ – μ0)/(σ/√n) and variance
1. Both α and β involve working with normally distributed
variables.
The calculation of β(μ′) for the t test is much less
straightforward. This is because the distribution of the test
statistic T = (X̄ – μ0)/(S/√n) is quite complicated when H0 is
false and Ha is true. Thus, for an upper-tailed test,
determining
β(μ′) = P(T < tα,n – 1 when μ = μ′ rather than μ0)
involves integrating a very unpleasant density function. This
must be done numerically.
The value of β is the height of the n – 1 df curve above the
value of d = |μ0 – μ′|/σ (visual interpolation is necessary if n – 1 is not a
value for which the corresponding curve appears), as
illustrated in Figure 8.5.
A typical β curve for the t test
Figure 8.5
Rather than fixing n (i.e., n – 1 and thus the particular curve
from which β is read), one might prescribe both α (.05 or
.01 here) and a value of β for the chosen μ′ and σ.
After computing d, the point (d, β) is located on the relevant
set of graphs.
The curve below and closest to this point gives n – 1 and
thus n (again, interpolation is often necessary).
Most of the widely used statistical software packages are
capable of calculating type II error probabilities.
They generally work in terms of power, which is simply
1 – β. A small value of β (close to 0) is equivalent to large
power (near 1).
A powerful test is one that has high power and therefore
good ability to detect when the null hypothesis is false.
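Where no power routine is at hand, the power of an upper-tailed t test can be approximated by simulation. The sketch below is a Monte Carlo estimate, not the exact calculation a statistics package performs; the sample size, means, standard deviation, repetition count, and seed are arbitrary, and 1.833 is the tabled t critical value for upper-tail area .05 with 9 df.

```python
import random

def simulated_power(mu0, mu_true, sigma, n, t_crit, reps=5000, seed=1):
    """Fraction of simulated normal samples for which t >= t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = (sum((x - xbar) ** 2 for x in sample) / (n - 1)) ** 0.5
        t = (xbar - mu0) / (s / n ** 0.5)
        if t >= t_crit:
            rejections += 1
    return rejections / reps

T_CRIT = 1.833   # table value: upper-tail area .05, 9 df
power_null = simulated_power(0, 0, 1, 10, T_CRIT)   # close to alpha when H0 is true
power_alt = simulated_power(0, 1, 1, 10, T_CRIT)    # power when the mean is 1 sd above
print(f"rejection rate under H0: {power_null:.3f}, power at mu = 1: {power_alt:.3f}")
```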
Finally, Minitab now also provides power curves for the
specified sample sizes, as shown in Figure 8.6. Such curves
show how the power increases for each sample size as the
actual value of μ moves further and further away from the
null value.
Power curves from Minitab for the t test of Example 10
Figure 8.6
P-Values
One advantage is that the P-value provides an intuitive
measure of the strength of evidence in the data against H0.
Definition
The P-value is the probability, calculated assuming that the
null hypothesis is true, of obtaining a value of the test
statistic at least as contradictory to H0 as the value
calculated from the available sample.
This definition is quite a mouthful. Here are some key
points:
• The P-value is a probability.
• This probability is calculated assuming that the null
hypothesis is true.
• Beware: The P-value is not the probability that H0
is true, nor is it an error probability!
• To determine the P-value, we must first decide which
values of the test statistic are at least as contradictory to
H0 as the value obtained from our sample.
We will shortly illustrate how to determine the P-value for
any z or t test—i.e., any test where the reference
distribution is the standard normal distribution (and z curve)
or some t distribution (and corresponding t curve).
For the moment, though, let’s focus on reaching a
conclusion once the P-value is available.
Because it is a probability, the P-value must be between 0
and 1.
What kinds of P-values provide evidence against the null
hypothesis?
Consider two specific instances:
• P-value = .250: In this case, fully 25% of all possible test
statistic values are at least as contradictory to H0 as the
one that came out of our sample. So our data is not all
that contradictory to the null hypothesis.
• P-value = .0018: Here, only .18% (much less than 1%) of
all possible test statistic values are at least as
contradictory to H0 as what we obtained. Thus the sample
appears to be highly contradictory to the null hypothesis.
More generally, the smaller the P-value, the more evidence
there is in the sample data against the null hypothesis and
for the alternative hypothesis. That is, H0 should be rejected
in favor of Ha when the P-value is sufficiently small. So
what constitutes “sufficiently small”?
Decision rule based on the P-value
Select a significance level α (as before, the desired type I
error probability).
Then
reject H0 if P-value ≤ α
do not reject H0 if P-value > α
Thus if the P-value exceeds the chosen significance level,
the null hypothesis cannot be rejected at that level.
P-Values for z Tests
Since –z = |z| when z is negative, P-value = 2[1 – Φ(|z|)]
for either positive or negative z.
Each of these is the probability of getting a value at least as
extreme as what was obtained (assuming H0 true).
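The P-value for a z test and the resulting decision can be sketched in Python. The upper-tailed and lower-tailed branches use the usual standard normal tail areas (one minus the cdf at z, and the cdf at z), and the two-tailed branch doubles the tail area beyond |z|; the function names and the `alternative` labels are choices made for this example.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """P-value for a z test; alternative: 'greater', 'less', or 'two-sided'."""
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(z)             # area to the right of z
    if alternative == "less":
        return phi(z)                 # area to the left of z
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))  # both tails beyond |z|
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

def decide(p_value, alpha):
    """Decision rule: reject H0 exactly when P-value <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

p = z_p_value(2.0, "greater")
print(f"P-value = {p:.4f}: {decide(p, 0.05)}")
```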
The three cases are illustrated in Figure 8.9.
Determination of the P-value for a z test
Figure 8.9
P-Values for t Tests
Just as the P-value for a z test is a z curve area, the
P-value for a t test will be a t-curve area.
Figure 8.10 illustrates the three different cases.
The number of df for the one-sample t test is n – 1.
P-values for t tests
Figure 8.10
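Because the Python standard library has no t distribution, the t-curve areas can be approximated by integrating the t density numerically. The sketch below uses Simpson's rule, which is one of several possible approaches and is not how any particular package does it; the function names are hypothetical, and very large df values would overflow the gamma function.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Area to the left of t: Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_p_value(t, df, alternative):
    """P-value for a t test; alternative: 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":
        return 1 - t_cdf(t, df)
    if alternative == "less":
        return t_cdf(t, df)
    if alternative == "two-sided":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# n = 10 observations give n - 1 = 9 df.
print(f"{t_p_value(1.833, 9, 'greater'):.4f}")
```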