MATH 2441
Probability and Statistics for Biological Sciences
A Non-Statistical Example of Hypothesis Testing
The formalism and rationale behind the hypothesis testing procedure introduced in the previous document is
sometimes difficult to understand at first because it seems to be artificially complicated. Many writers have
pointed out the parallel between the statistical hypothesis testing procedure and the trial procedures
followed in our criminal justice system, and looking at the parallel features between the two may help you
understand a bit better how the pieces of the statistical hypothesis testing procedure fit together, and why
they've been set up the way they are.
(One of the most well-known descriptions of this parallel is given by James M. Kenney in an article on page
55 of the January 1988 issue of Quality Progress, a monthly publication of the American Society for Quality.
Kenney's article, called "Hypothesis Testing: Guilty or Innocent", is somewhat wider ranging than the
discussion below, raising a few statistical concepts we haven't looked into yet. However, you might find it
interesting to read through one of the paper copies of the article available in class.)
We will use the "generic" symbols introduced at the end of the previous document to denote the elements of
statistical hypothesis tests. In particular,
θ stands for a population parameter (e.g. μ, σ, π, μ1 - μ2, π1 - π2, etc.)
θ0 stands for a specific hypothesized numerical value of θ
f is the symbol for the standardized test statistic (e.g. z, t, χ², etc.)
The symbol f_α stands for the value of f which cuts off a right-hand tail of area α. Then, the symbol f_{1-α}
stands for the value of f that cuts off a left-hand tail of area α. In the table to follow, the commentary refers to
a two-tailed statistical hypothesis test, but with appropriate minor modifications, the points raised could be
illustrated with reference to single-tailed tests as well.
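As a concrete illustration of the tail-area notation, the following minimal sketch computes the two cut-off values for the special case where the standardized test statistic is z (standard normal). The function names and the value alpha = 0.05 are assumptions made for the example; they do not come from the course material.

```python
from statistics import NormalDist

def upper_critical_value(alpha):
    """Value of z that cuts off a right-hand tail of area alpha."""
    return NormalDist().inv_cdf(1 - alpha)

def lower_critical_value(alpha):
    """Value of z that cuts off a left-hand tail of area alpha."""
    return NormalDist().inv_cdf(alpha)

alpha = 0.05  # illustrative level of significance (an assumption)
z_upper = upper_critical_value(alpha)
z_lower = lower_critical_value(alpha)
print(round(z_upper, 3), round(z_lower, 3))  # prints 1.645 -1.645
```

Because the standard normal distribution is symmetric, the left-hand cut-off is just the negative of the right-hand one. That would not hold for a skewed statistic such as χ².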
Concept: null hypothesis
Statistical hypothesis testing: H0: θ = θ0
Criminal justice system: "the defendant is innocent"

Concept: alternative hypothesis
Statistical hypothesis testing: HA: θ ≠ θ0
Criminal justice system: "the defendant is guilty"

Concept: nature of experiment
Statistical hypothesis testing: A random sample of the population is collected. Various summary quantities
of that sample are used as estimators of corresponding population summaries and/or estimates of parameters
in the sampling distribution. With an adequate sampling plan, the data obtained for the random sample is
believed to reflect the properties of the entire population. A standardized test statistic, f, is computed,
a measure of the degree to which the data is inconsistent with the null hypothesis.
Criminal justice system: "Data" in the form of observations (visual, verbal, etc.) is collected, organized,
summarized. The case that the prosecution presents is intended to demonstrate that these observations are
inconsistent with the defendant's claim of innocence.

Concept: rejection region
Statistical hypothesis testing: The null hypothesis is rejected (and thereby support for the alternative
hypothesis is inferred) if either f > f_{α/2} or f < f_{1-α/2}. If this rejection criterion is not met, the
alternative hypothesis is assumed to be unsupported.
If the data from the sample is ambiguous, the result of the hypothesis test is "no conclusion" -- we cannot
say that HA is supported.
The null hypothesis is rejected and the alternative hypothesis is supported only by the presentation of
observations contradicting the null hypothesis at some pre-determined level of significance. The burden of
proof is on the experiment to provide clear evidence in support of HA before a conclusion can be declared.
Criminal justice system: The prosecution must present enough evidence in contradiction to the null
hypothesis that the judge or jury is convinced of the guilt of the defendant beyond "reasonable doubt."
The default outcome is "the defendant is innocent," the null hypothesis. Only if overwhelming evidence to the
contrary is presented will the null hypothesis be rejected by the judge or jury, and a finding of "guilty"
occur.
If the prosecution is unable to present evidence of the defendant's guilt, the defendant is not required to
present any defense -- the verdict is automatically "innocent" unless clear evidence of guilt is presented.
We say the burden of proof is on the prosecution to make a case for guilt -- the role of the defense is not
to prove innocence, but to challenge the evidence presented for guilt.

Concept: type 1 error
Statistical hypothesis testing: The null hypothesis is rejected even though the alternative hypothesis is
not correct. This means that rejection of the null hypothesis is not absolute proof that the alternative
hypothesis is correct -- there is a chance that an error has been made. Just by coincidence, the random
sample may misrepresent the population. However, the rejection criterion has been set up to control the
probability of such an error occurring.
Criminal justice system: An innocent defendant is convicted. This means that conviction of the defendant
does not establish absolutely that the defendant is truly guilty. Errors can occur if inadvertently
misleading evidence is presented and not effectively challenged. However, the principles and procedures
followed in the investigation and trial are designed to control the probability of such an error occurring.

Concept: type 2 error
Statistical hypothesis testing: The null hypothesis is not rejected even though the alternative hypothesis
is correct. Thus, failing to reject the null hypothesis does not mean that HA can be said to be absolutely
certainly untrue. It just means that there has not been adequate evidence presented to support the
conclusion HA.
Criminal justice system: A guilty defendant is acquitted. Acquittal doesn't necessarily mean that the
defendant is innocent -- it may mean that the prosecution was simply unable to find adequate evidence of
the defendant's guilt.

© David W. Sabo (2000)
The principles and procedures of our criminal justice system have evolved over centuries to better and
better ensure that verdicts are based on a logical and reliable decision process. The more abstract
formalism we've described for statistical hypothesis testing illuminates some of the less immediately obvious
motivations for certain aspects of the criminal justice system. At the same time, the more intuitive features
of the criminal justice system (at least as portrayed so familiarly by Ben Matlock or Perry Mason or their
more recent incarnations) give insight into the elements of the statistical hypothesis testing procedure. Note,
for example, the following few observations:
"Burden of Proof"
The notion of "burden of proof" is a useful one in interpreting the result of a hypothesis test. In a criminal
trial structured along the lines described above, it is the responsibility of the prosecutor to provide evidence
of the defendant's guilt. It is not necessary for defendants to provide evidence of their innocence. (Of
course, you could organize your "justice" system so that the accused is assumed guilty until they can
provide evidence of their innocence, and apparently this is the approach in some societies. You can see
that that would lead to a quite different legal system.) While it may sometimes be in the defendant's interest
to provide evidence of their innocence as a way to counter the prosecution's case, the prosecution has the
"burden" of making a case in the first place. If the prosecution cannot provide strong evidence of guilt, the
defendant is acquitted, regardless of whether they committed the crime or not and regardless of whether
they present any explicit defense or not at the trial.
Similarly in statistical hypothesis testing, it is up to the statistician to make the case for the alternative
hypothesis by providing evidence that is clearly inconsistent with the null hypothesis in the appropriate way.
If you state a conclusive result, it must be because the corresponding alternative hypothesis has been
supported through rejection of a null hypothesis.
This means that conclusive results of hypothesis tests are always supported by direct evidence. They are
not the result of lack of evidence for some other possibilities. For example, suppose the hypotheses
H0: μ = 65 ppm
vs
HA: μ < 65 ppm
are tested, and the data does not allow the rejection of H0. What you can say is that "our evidence does not
allow us to conclude that the population mean is less than 65 ppm." What you cannot say in this instance is,
"since we could not reject H0, we cannot say that the mean is less than 65 ppm [correct so far] so this must
indicate that the mean is greater than 65 ppm [not correct at all!]." The fact that you cannot prove that the
mean is less than 65 ppm is not evidence that the mean is greater than 65 ppm.
To be able to declare that the mean is greater than 65 ppm in the instance just described, you would have to
set up the hypothesis test:
H0: μ = 65 ppm
vs
HA: μ > 65 ppm
and obtain data that allowed you to reject H0. This is what we mean by saying conclusive results must be
directly supported by evidence rather than based on the lack of evidence for some contrary result.
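The point that failing to support "less than 65 ppm" is not evidence for "greater than 65 ppm" can be checked numerically. The sketch below is a simplification: it assumes a z test with a known population standard deviation, and the sample mean, standard deviation, sample size, and level of significance are all hypothetical values chosen for illustration, not data from the course.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_z_test(sample_mean, mu0, sigma, n, alpha, direction):
    """Return True if H0: mu = mu0 is rejected in favour of the one-sided HA."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # standardized test statistic
    if direction == "greater":   # HA: mu > mu0, right-hand rejection region
        return z > NormalDist().inv_cdf(1 - alpha)
    if direction == "less":      # HA: mu < mu0, left-hand rejection region
        return z < NormalDist().inv_cdf(alpha)
    raise ValueError("direction must be 'greater' or 'less'")

# Hypothetical sample: mean 63.0 ppm, sigma 8.0 ppm, n = 25, alpha = 0.05
sample = dict(sample_mean=63.0, mu0=65.0, sigma=8.0, n=25, alpha=0.05)
reject_less = one_tailed_z_test(direction="less", **sample)
reject_greater = one_tailed_z_test(direction="greater", **sample)
print(reject_less, reject_greater)  # prints False False
```

With these numbers z = -1.25, which lies in neither rejection region, so neither alternative hypothesis is supported. The same data is inconclusive for both tests.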
At the risk of belaboring the point, you need to realize that this opens the door for some flexibility in the way
a hypothesis test is set up, and the potential for stating results in a misleading manner. To be a bit more
concrete, we'll refer to the SalmonCa0 question used to illustrate the hypothesis testing procedure in the
preceding introductory document. At issue was the relationship of the mean calcium content of unsanitized
salmon fillets to the value 65 ppm. The points here can be made by considering just one-tailed tests.
Hypotheses:
(A) H0: μ = 65 ppm vs HA: μ > 65 ppm
(B) H0: μ = 65 ppm vs HA: μ < 65 ppm

1. What sort of evidence (experimental data) will result in rejection of H0?
(A) The sample mean would have to be a statistically significant amount greater than 65 ppm.
(B) The sample mean would have to be a statistically significant amount less than 65 ppm.

2. For what kind of populations will H0 most likely be rejected?
(A) The true mean value of those populations tends to be markedly larger than 65 ppm.
(B) The true mean value of those populations tends to be markedly smaller than 65 ppm.

3. How can you state the conclusion when the data allows rejection of H0?
(A) The data supports the conclusion that the population mean is greater than 65 ppm at a level of
significance of … .
(B) The data supports the conclusion that the population mean is less than 65 ppm at a level of
significance of … .

4. What sort of evidence (experimental data) will result in H0 not being rejected?
(A) The sample mean may be larger than 65 ppm, but not large enough to allow rejection of H0 at an
acceptable level of significance.
(B) The sample mean may be smaller than 65 ppm, but not small enough to allow rejection of H0 at an
acceptable level of significance.

5. For what kind of populations will H0 most likely not be rejected?
(A) The true population mean is either less than 65 ppm, or only marginally greater than 65 ppm.
(B) The true population mean is either greater than 65 ppm, or only marginally less than 65 ppm.

6. How can you state the conclusion when the data does not allow rejection of H0?
(A) "The data is not adequate to support a conclusion that the true population mean is greater than
65 ppm." This is not the same thing as saying that the data indicates the true population mean is less than
65 ppm.
(B) "The data is not adequate to support the conclusion that the true population mean is less than
65 ppm." This is not the same thing as saying that the data indicates the true population mean is greater
than 65 ppm.

7. Suppose that it was in the interests of the researcher to find that the mean calcium concentration in the
salmon fillets was greater than 65 ppm. How would the researcher state the result of the study if the null
hypothesis could be rejected?
(A) "The data definitely supports the claim that the mean calcium concentration in the salmon fillets
is greater than 65 ppm."
(B) "aw shucks!"

8. Suppose that it was in the interests of the researcher to find that the mean calcium concentration in the
salmon fillets was greater than 65 ppm. How would the researcher state the result of the study if the null
hypothesis could not be rejected?
(A) "Our data is inconclusive on the issue of whether the mean calcium concentration is greater
than 65 ppm."
(B) "The data does not indicate that the mean calcium concentration in the fillets is definitely less than 65
ppm." or "We do not have definite evidence that the mean calcium concentration is less than 65 ppm."
By the way, if for some reason a two-tailed hypothesis test was done,
H0: μ = 65 ppm
HA: μ ≠ 65 ppm
the answers to questions 1, 2, 4, and 5 would be the combination of the two answers to
each question above. This is because the two-tailed rejection region consists of two (smaller) one-tailed
regions, and the standardized test statistic must fall in one tail or the other if you are to be able to
reject H0. The only difference is that rejection of H0 for a two-tailed test requires data (and so
underlying populations) which are considerably more inconsistent with H0 than a one-tailed test requires, because
each of the two "single tails" making up the rejection region is smaller than the single tail of a one-tailed test using
the same value of α. In answer to question 3 for the two-tailed test, we could say "The data supports the
conclusion that the population mean is different from 65 ppm." In answer to question 6, we could say, "The
data is consistent with the population mean being equal to 65 ppm."
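To see why a two-tailed test demands more extreme data than a one-tailed test at the same level of significance, it helps to compare the cut-off values directly. This is a minimal sketch for a z statistic. The value alpha = 0.05 and the observed statistic z = 1.80 are hypothetical, chosen only to land between the two cut-offs.

```python
from statistics import NormalDist

alpha = 0.05  # illustrative level of significance (an assumption)
std_normal = NormalDist()

one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)      # z_alpha, whole alpha in one tail
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}, alpha split over two tails

z = 1.80  # hypothetical observed value of the standardized test statistic

rejects_one_tailed = z > one_tailed_cutoff        # HA: mu > 65 ppm
# For a symmetric statistic, |z| > z_{alpha/2} is the same as
# z > z_{alpha/2} or z < z_{1 - alpha/2}.
rejects_two_tailed = abs(z) > two_tailed_cutoff   # HA: mu != 65 ppm

print(round(one_tailed_cutoff, 3), round(two_tailed_cutoff, 3))  # prints 1.645 1.96
print(rejects_one_tailed, rejects_two_tailed)                    # prints True False
```

The same statistic that supports "greater than 65 ppm" in the one-tailed test is not extreme enough to support "different from 65 ppm" in the two-tailed test.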
Relationship Between Type 1 Error Probability and Type 2 Error Probability
Recall the observation we made previously that for a given amount of data (i.e. sample size), any
modification of the rejection region to decrease the probability of making a type 2 error inevitably resulted in
an increase in the probability of making a type 1 error. Without increasing the sample size, you cannot
adjust the rejection region to simultaneously decrease the values of both α and β.
The corresponding feature of the criminal justice system is this. If you create trial rules that require the
prosecution to present very strong evidence of guilt before a defendant can be convicted, then you will have
a system in which it is very improbable that an innocent defendant will be falsely convicted. In such a
system, the probability of a type 1 error (convicting an innocent defendant) will be very small. However, in
such a system, it would also be more difficult to convict a guilty defendant, and so it would be more
probable that a guilty defendant will be acquitted (a type 2 error) due to lack of adequate evidence. Setting
very high standards of evidence for guilt will hinder the mistaken conviction of innocent defendants, but at
the cost of increasing the frequency of acquittal of guilty defendants. A lenient system is less likely to
convict an innocent defendant, but it is also less likely to convict a guilty defendant.
Similarly, if you relax the rules of evidence to make it easier for the prosecution to obtain a verdict of guilty,
then while you'll increase the likelihood of conviction of truly guilty defendants (i.e., the probability of a type 2
error will decrease), you will also be making it easier for innocent defendants to be convicted (the probability
of making type 1 errors will increase). A severe system is more likely to convict guilty defendants, but it is
also more likely to convict innocent defendants.
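The trade-off described in this section can be made concrete with a small calculation. The sketch below assumes a one-tailed z test of H0: μ = 65 ppm against HA: μ > 65 ppm with a known population standard deviation. The true mean of 68 ppm, the standard deviation of 8 ppm, and the sample sizes are hypothetical values, and the function name is invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def type2_error_probability(alpha, mu0, mu_true, sigma, n):
    """P(fail to reject H0: mu = mu0) when the true mean is mu_true > mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # right-hand cut-off
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # true mean in standard-error units
    # H0 is not rejected when the test statistic falls below z_alpha
    return std_normal.cdf(z_alpha - shift)

alphas = [0.10, 0.05, 0.01]  # progressively stricter "standards of evidence"
betas = [type2_error_probability(a, mu0=65.0, mu_true=68.0, sigma=8.0, n=25)
         for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  beta = {b:.3f}")

# Only a larger sample lets both error probabilities be small together.
beta_large_n = type2_error_probability(0.01, mu0=65.0, mu_true=68.0, sigma=8.0, n=100)
```

With the sample size fixed at 25, each reduction in α raises β. Increasing the sample size to 100 brings β down again without giving up the small α.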
The Worst Possible Error
Recall the principle stated earlier that one way to decide on an appropriate hypothesis test is to choose
those hypotheses for which the type 1 error is the most serious error possible in the situation. We can see
how this principle is realized in the way our criminal justice system is organized.
In Canada and the United States, criminal trials amount to testing the hypotheses
H0: the defendant is innocent
HA: the defendant is guilty
(CRIM - 1)
This is the "hypothesis testing" version of the principle: the defendant is presumed innocent until "proven"
guilty. People say this system is based on "the presumption of innocence." For these hypotheses, the
type 1 error amounts to erroneously convicting an innocent defendant. The type 2 error would be
erroneously acquitting a guilty defendant.
In principle, one could create a system organized in the opposite way:
H0: the defendant is guilty
HA: the defendant is innocent
(CRIM - 2)
Now, the simple act of charging a person with a crime is taken to be sufficient to presume their guilt. It is up
to the defendant to provide adequate evidence of their innocence. We could call this fundamental principle
"the presumption of guilt." In such a system, the type 1 error would be to acquit a guilty defendant, and the
type 2 error would be to convict an innocent defendant.
You can see that the criminal justice system based on (CRIM - 1) views the conviction of an innocent
defendant to be a more serious error than the acquittal of a guilty defendant. That is the view held by most
in our society. In our society, the debate is usually not so much over the presumption of innocence (i.e., the choice of
the basic hypotheses to test) as it is over the standard of evidence required for conviction (which is the
analog of the value of α, the probability of making a type 1 error). When people say, "the courts are too
soft," they are not usually advocating a switch from (CRIM - 1) to (CRIM - 2) -- that we automatically
presume the defendants to be guilty until proven otherwise -- but that the degree of evidence required to
obtain a conviction may be too strict. They are criticizing the analog of "α" used in the trial, not the structure
of the "hypothesis test" itself.