Van-Griner Publishing Company
Diagnostic tests, such as field sobriety tests and home pregnancy tests, have to choose between two outcomes.
• A “positive” outcome means the test has uncovered adequate evidence of what it is designed to find.
• A “negative” outcome means the test has not uncovered adequate evidence of what it is designed to find.
Many experiments in medicine, social science, education, etc. are designed to make similar binary choices.
• A “positive” outcome means the experiment has uncovered adequate evidence that the treatment is effective.
• A “negative” outcome means the experiment has not found adequate evidence that the treatment is effective.
From TIME:
The Flibanserin findings are based on the study of 1,378 premenopausal women who had been in a monogamous relationship for 10 years on average. The women were randomly assigned to take 100 mg of Flibanserin or a placebo daily and to record daily whether they had sex and whether it was satisfying.
In this case, the choice is between:
• Flibanserin is no better than a placebo, or
• Flibanserin is better than a placebo.
Generally, the choice is between:
• Treatment is Not Effective, or
• Treatment is Effective.
The paradigm for deciding between “Treatment is Not Effective” and “Treatment is Effective” is an example of what statisticians call “hypothesis testing.”
This diagnostic test is not a kit or a physical examination.
Rather, it consists of a collection of mathematical steps.
This decision cannot be made risk-free. There are two potential mistakes, a false positive and a false negative:
                                   Experimental Results
Truth                              Treatment Ineffective    Treatment Effective
Treatment Really is Ineffective    True Negative            False Positive
Treatment Really is Effective      False Negative           True Positive
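The table can be read as a simple lookup from the truth and the experimental result to one of four outcomes. The Python sketch below is illustrative only; the function name `outcome` is hypothetical and not part of any library.

```python
def outcome(truly_effective: bool, declared_effective: bool) -> str:
    """Map the truth and the experimental result to one of the four cells."""
    if declared_effective:
        # The experiment says "Treatment Effective".
        return "True Positive" if truly_effective else "False Positive"
    # The experiment says "Treatment Ineffective".
    return "False Negative" if truly_effective else "True Negative"


# Walk through all four cells; the two mistakes are the off-diagonal ones.
for truly_effective in (False, True):
    for declared_effective in (False, True):
        print(truly_effective, declared_effective, "->",
              outcome(truly_effective, declared_effective))
```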
Generally, sensitivity and specificity are used to evaluate how well a screening test performs. That evaluation informs our confidence in results produced by the test.
In hypothesis testing, statistical science tends to focus on specificity, or equivalently on the false positive rate (1 - specificity), to play a similar role.
This is partly because the sensitivity of most common hypothesis testing procedures is pretty good.
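Evaluating a screening test:
• Data from test subjects: the truth comes from a gold standard, and the test's prediction is based on some rule.
• Cross-tabulate the test's prediction against the actual status:

                        Actual Status
Test Prediction    Negative    Positive
Negative           A           B
Positive           C           D

• Compute sensitivity and specificity.
• Apply the test to a real person and get a Yes/No result.
• We are more likely to believe a “Yes” if the specificity is high, and a “No” if the sensitivity is high.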
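With rows as the test's prediction and columns as the actual status, sensitivity is D / (B + D) and specificity is A / (A + C); the false positive rate is C / (A + C), which equals 1 - specificity. The sketch below computes these from cell counts. The counts are hypothetical, chosen only to illustrate the arithmetic, and `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Cells follow the table layout: rows are the test's prediction,
    columns are the actual status from the gold standard.
        a: predicted negative, actually negative (true negatives)
        b: predicted negative, actually positive (false negatives)
        c: predicted positive, actually negative (false positives)
        d: predicted positive, actually positive (true positives)
    """
    sensitivity = d / (b + d)  # share of actual positives the test calls positive
    specificity = a / (a + c)  # share of actual negatives the test calls negative
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(a=90, b=5, c=10, d=95)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"false positive rate = 1 - specificity = {1 - spec:.2f}")
```

With these counts the script reports a sensitivity of 0.95, a specificity of 0.90, and a false positive rate of 0.10.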
Hypothesis testing follows a parallel path:
• Data from experimental subjects: a truth is hypothesized (the treatment is not effective).
• Adopt an awkward rule: “Based on the data from the experiment, say the treatment is effective.” This is like an automatic “YES.”
• Compute the false positive rate (FPR) for the awkward rule.
• If the FPR is small enough, accept the “YES” and conclude that the treatment is effective.
• Else, don’t trust the recommended “YES” and conclude that the treatment is not effective.
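One common way to estimate the false positive rate of the awkward rule is a permutation test; the presentation does not say which procedure is used, so this is a swapped-in technique for illustration. If the hypothesized truth holds (the treatment does nothing), the treatment and placebo labels are interchangeable, so reshuffling them shows how often chance alone yields a difference at least as large as the observed one. The scores are invented, not data from the Flibanserin study, and `permutation_p_value` is a hypothetical helper.

```python
import random


def permutation_p_value(treatment, placebo, n_shuffles=10_000, seed=0):
    """Estimate the false positive rate of the rule "say the treatment is
    effective", assuming the hypothesized truth: the treatment does nothing.

    If the treatment does nothing, the group labels are interchangeable, so
    shuffling them shows how large a difference chance alone can produce.
    """
    rng = random.Random(seed)
    n_t = len(treatment)
    n_p = len(placebo)
    observed = sum(treatment) / n_t - sum(placebo) / n_p
    pooled = list(treatment) + list(placebo)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_p
        # One-sided: only differences favoring the treatment count.
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical outcome scores, invented for illustration (not study data).
treatment = [5, 6, 7, 6, 8, 7, 6, 9]
placebo = [4, 5, 5, 6, 4, 5, 6, 5]
p = permutation_p_value(treatment, placebo)
print(f"estimated false positive rate (p-value): {p:.4f}")
```

Fixing the seed makes reruns reproducible; more shuffles make the estimate less noisy. A two-sided version would compare absolute differences instead.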
If the estimated false positive rate (FPR) for deciding between “Treatment is Not Effective” and “Treatment is Effective” is low enough (typically less than 0.05), the results of the experiment are said to be statistically significant.
Testing a hypothesis in the present context means choosing between
• H0: Treatment is Not Effective (the “null” hypothesis) and
• HA: Treatment is Effective (the “alternative” hypothesis).
To make the choice, we have to compute an estimated false positive rate and compare it to 0.05. If the estimated FPR is smaller than 0.05, choose HA. Else, choose H0.
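The decision rule can be written as a short function. This is a minimal sketch; the name `choose_hypothesis` and its `threshold` parameter are hypothetical, with the default of 0.05 taken from the text. Because the rule says “smaller than 0.05,” an estimated FPR of exactly 0.05 leads to H0.

```python
def choose_hypothesis(estimated_fpr: float, threshold: float = 0.05) -> str:
    """Choose HA if the estimated false positive rate is below the threshold."""
    if not 0.0 <= estimated_fpr <= 1.0:
        raise ValueError("an estimated false positive rate must lie in [0, 1]")
    if estimated_fpr < threshold:
        return "HA: Treatment is Effective"
    return "H0: Treatment is Not Effective"


print(choose_hypothesis(0.01))  # HA: Treatment is Effective
print(choose_hypothesis(0.20))  # H0: Treatment is Not Effective
```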
In hypothesis testing, the estimated false positive rate is more commonly called a p-value. That stands for “probability value.”
Statistical science, particularly statistical inference, is a very complex endeavor. In this presentation we have purposely avoided discussing a few things, including:
• The distinction between two very different approaches to hypothesis testing due to Fisher and Neyman-Pearson.
• The difference between a p-value and a Type I error rate.
• The real and important distinction between “Accepting H0” and “Failing to Reject H0”.
Your instructor may want to offer more details.
Statistical hypothesis testing amounts to a screening test that chooses between a null hypothesis and an alternative hypothesis based on the size of the estimated false positive rate.