Lecture 6: Framing and testing hypotheses

Framing and testing
hypotheses
Hypotheses
• Potential explanations that can account for
our observations of the external world
• They usually describe cause and effect
relationships
Collecting observations is a means
to the understanding of a cause
Observations from
• Manipulative experiments
• Observational or correlative studies
Hypothesis
A hypothesis may be:
• Suggested by the data
• Drawn from the existing body of scientific literature
• Derived from the predictions of theoretical models
• Based on our own intuition and reasoning
A valid scientific hypothesis
• Must be testable
• Should generate novel predictions
• Should provide a unique set of predictions
that do not emerge from other
explanations
Scientific method
• Is the technique used
to decide among
hypotheses on the
basis of observations
and predictions
Deduction and induction
• Deduction proceeds from the general case
to the specific case: “certain inference”
• Induction proceeds from the specific case
to the general case: “probable inference”
• Both induction and deduction are used in all models of
scientific reasoning, but they receive different emphasis
Statistics
• It is an inductive process: we are trying to
draw general conclusions based on a
specific, limited sample
The inductive method
Initial observation → suggests → hypothesis → generates → prediction
Experiments and data → new observations
Do new observations match predictions?
• NO: modify the hypothesis
• YES: confirm the hypothesis → “Accepted truth”
Advantages of the inductive
method
• It emphasizes the link between data and
theory
• Explicitly builds and modifies the
hypothesis based on previous knowledge
• It is confirmatory (we seek data that
support the hypothesis)
Disadvantages of the inductive
method
• Considers only a single starting hypothesis
• Derives theory exclusively from empirical
observations; “some important hypotheses
have emerged well in advance of the
critical data that are needed to test them”
• Places emphasis on a single correct
hypothesis, making it difficult to evaluate
cases in which multiple factors are at
work.
The null hypothesis
• Is the starting point of a scientific
investigation
• It tries to account for patterns in the data in
the simplest way possible, which often
means initially attributing variation in the
data to randomness or measurement error
How do we generate an
appropriate null hypothesis?
• Example: the photosynthetic response of leaves to increases in light intensity
Each point represents a
different leaf for which we
record the light intensity (x
axis, predictor variable)
and the photosynthetic rate
(y axis, response variable)
Simplest null hypothesis
is that there is no
relationship between the
two variables
The Michaelis-Menten equation
Y = kX / (D + X)
• If X is large compared to D, X/(D + X) approaches 1, so the rate of product formation (Y) approaches its maximum, k.
• When X equals D, X/(D + X) equals 0.5, so the rate of product formation is half of the maximum rate (k/2).
• By plotting Y against X, one can estimate Ymax (k) and D.
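The two limiting cases of the Michaelis-Menten equation, Y = kX/(D + X), can be checked numerically. The sketch below is only an illustration: the function name and the values of k and D are hypothetical, chosen for convenience and not taken from any data set.

```python
def michaelis_menten(x, k, d):
    """Return Y = k*X / (D + X)."""
    return k * x / (d + x)

# Hypothetical parameter values, for illustration only.
k = 10.0  # asymptotic (maximum) rate
d = 2.0   # half-saturation constant

# When X equals D, the rate should be half of the maximum.
half = michaelis_menten(d, k, d)

# When X is much larger than D, the rate should approach k from below.
near_max = michaelis_menten(1000 * d, k, d)

print("Y at X = D:", half)
print("Y at X = 1000 D:", near_max)
```

Whatever positive values are chosen for k and D, the first result equals k/2 and the second stays just below k.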
Using our knowledge about
plant physiology, we can
formulate a more realistic initial
hypothesis
The Michaelis-Menten equation [Y = kX/(D + X)],
where k = asymptotic assimilation rate and
D = half-saturation constant
Real data could be used to
test the degree of support
for this more realistic
hypothesis against other
alternatives
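One way to picture the comparison between the "no relationship" null hypothesis and the Michaelis-Menten hypothesis is to compare how well each describes the same data. The sketch below uses synthetic data generated inside the script (not real leaf measurements) and a coarse grid search in place of a proper curve-fitting routine. All names and parameter values are assumptions made for illustration.

```python
import random

def michaelis_menten(x, k, d):
    """Return Y = k*X / (D + X)."""
    return k * x / (d + x)

def rss(observed, predicted):
    """Residual sum of squares between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Synthetic data: a Michaelis-Menten curve plus Gaussian noise.
random.seed(1)
TRUE_K, TRUE_D = 20.0, 300.0                 # hypothetical "true" values
light = [50 * i for i in range(1, 41)]       # light intensities 50..2000
rate = [michaelis_menten(x, TRUE_K, TRUE_D) + random.gauss(0, 1.0)
        for x in light]

# Null hypothesis: rate does not depend on light, so predict the mean.
mean_rate = sum(rate) / len(rate)
rss_null = rss(rate, [mean_rate] * len(rate))

# Michaelis-Menten hypothesis: coarse grid search over k and D.
best = None
for k in [10 + 0.5 * i for i in range(41)]:        # k from 10 to 30
    for d in [50 + 25 * j for j in range(39)]:     # D from 50 to 1000
        fit = rss(rate, [michaelis_menten(x, k, d) for x in light])
        if best is None or fit < best[0]:
            best = (fit, k, d)
rss_mm, k_hat, d_hat = best

print("RSS under the null model:", round(rss_null, 1))
print("RSS under Michaelis-Menten:", round(rss_mm, 1))
print("Estimated k and D:", k_hat, d_hat)
```

A lower residual sum of squares alone is not a formal test, because the more flexible model will usually fit better. A fair comparison would also account for the extra parameters.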
The Hypothetico-Deductive Method
• Championed by the
philosopher of science Karl
Popper (1902-1994)
• The goal of testing is not
to confirm the hypothesis,
but to falsify it
• The accepted scientific
explanation is the hypothesis
that successfully withstands
repeated attempts to falsify it
The Hypothetico-Deductive Method
Initial observation → suggests → multiple working hypotheses
Each hypothesis generates its own prediction (Predictions A, B, C, D)
New observations
Do new observations match predictions?
• NO: falsify the hypothesis
• YES: repeat attempts to falsify
Multiple failed falsifications → “Accepted truth”
Advantages of the Hypothetico-Deductive Method
• It forces a consideration of multiple
working hypotheses right from the start
• It highlights the key predictive differences
between them
• The emphasis on falsification tends to
produce simple, testable hypotheses, so
that parsimonious explanations are
considered first and more complicated
mechanisms only later.
Disadvantages of the Hypothetico-Deductive Method
• Multiple working hypotheses may not always be
available, particularly in the early stages of
investigation
• Even if multiple hypotheses are available, the
method does not really work unless the “correct”
hypothesis is among the alternatives
• Places emphasis on a single correct hypothesis,
making it difficult to evaluate cases in which
multiple factors are at work.
Testing Statistical Hypotheses
• Statistical hypothesis versus Scientific
hypothesis
• We use statistics to describe patterns in our
data, and then we use statistical tests to
decide whether the predictions of a
hypothesis are supported or not
The Scientific Method
• Establishing hypotheses
• Articulating predictions
• Designing and executing valid
experiments
• Collecting data
• Organizing data
• Summarizing data
• Statistical tests
Statistical hypothesis versus
Scientific hypothesis
• Accepting or rejecting a statistical
hypothesis is quite distinct from accepting
or rejecting a scientific hypothesis.
• The statistical null hypothesis is usually
one of “no pattern”, such as no difference
between groups or no relationship
between two continuous variables.
Statistical hypothesis versus
Scientific hypothesis
• In contrast, the alternative hypothesis
is that a pattern exists.
• You must ask how such patterns relate to
the scientific hypothesis you are testing
• The absence of evidence is not
evidence of absence; failure to reject a
null hypothesis is not equivalent to
accepting a null hypothesis
The statistical null hypothesis
A typical statistical null hypothesis is
that “differences between groups are
no greater than we would expect due
to
random variation”
The statistical alternative
hypothesis
• Once we state the statistical null
hypothesis, we then define one or more
alternatives to the null hypothesis
• The alternative hypothesis is focused
simply on the pattern that is present in the
data
• The investigator “infers” the
mechanism from the pattern, but that
inference is a separate step
• The statistical test merely reveals whether
the pattern is likely or unlikely, given that
the null hypothesis is true.
• Our ability to assign causal mechanisms to
those statistical patterns depends on the
quality of our experimental design and our
measurements
• An important goal of a good
experimental design is to avoid
confounded designs
Statistical significance and
P-values
• In many statistical analyses, we ask
whether the null hypothesis of random
variation among individuals can be
rejected
• A statistical P-value measures the
probability of finding differences as large as,
or larger than, those observed if the null
hypothesis were true: P(data | H0)
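The definition of a P-value as P(data | H0) can be made concrete with a permutation test, which is one of several ways to obtain a P-value and is used here only because it needs no distributional formula. The sketch shuffles group labels to mimic the null hypothesis of random variation among individuals. The two groups of measurements are made-up numbers, and the function name is hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Estimate the two-sided P-value for a difference in means.

    Under H0 the group labels are arbitrary, so we reshuffle them many
    times and count how often the shuffled difference is at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never zero.
    return (count + 1) / (n_perm + 1)

# Made-up measurements for two groups (illustration only).
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [6.2, 5.8, 6.9, 6.1, 6.6, 5.9]

p = permutation_p_value(control, treated)
print("Estimated P-value:", p)
```

A small value means that differences this large would rarely arise from random relabeling alone. It does not by itself say what mechanism produced the difference.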
What determines the P-value?
• The calculated P-value depends on three things:
1. The number of observations in the samples (n)
2. The differences between the means of the samples
3. The level of variation among individuals
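The influence of these three quantities can be illustrated with a simple normal-approximation test for two groups. This is a simplified sketch: it assumes equal group sizes and a known, common standard deviation, and the input numbers are arbitrary. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_two_sample_p(mean_diff, sd, n):
    """Two-sided P-value from a normal approximation.

    Assumes two groups of equal size n with a common, known standard
    deviation sd (a simplification made for this sketch).
    """
    standard_error = sd * sqrt(2.0 / n)
    z = abs(mean_diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Arbitrary baseline, then change one quantity at a time.
baseline = approx_two_sample_p(mean_diff=1.0, sd=2.0, n=10)
more_data = approx_two_sample_p(mean_diff=1.0, sd=2.0, n=40)
bigger_diff = approx_two_sample_p(mean_diff=2.0, sd=2.0, n=10)
less_noise = approx_two_sample_p(mean_diff=1.0, sd=1.0, n=10)

print("baseline:", baseline)
print("larger n:", more_data)
print("larger difference between means:", bigger_diff)
print("less variation among individuals:", less_noise)
```

Each of the three changes lowers the P-value relative to the baseline, which matches the list above.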
When is a P-value small enough?
• This is a judgment call, as there is no
natural critical value below which we
should always reject the null hypothesis
and above which we should never reject it.
• Convention: P<0.05 (1/20)
When is a P-value small enough?
• Perhaps the strongest
argument in favor of
requiring a low critical
value is that we
humans are
psychologically
predisposed to
recognizing and
seeing patterns in our
data, even when they
don’t exist!
Decision Errors
Because we have incomplete and imperfect information,
there are four possible outcomes when testing a null hypothesis (H0):
1. We correctly reject a false H0
2. We correctly retain a true H0
3. We mistakenly reject a true H0 (Type I Error)
4. We mistakenly retain a false H0 (Type II Error)
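The four outcomes form a two-by-two table: whether H0 is actually true, crossed with whether we reject it. A minimal sketch of that table follows. The function name and labels are hypothetical, and in a real study the truth of H0 is of course unknown.

```python
def classify_decision(h0_is_true, h0_rejected):
    """Name the outcome for one combination of truth and decision."""
    if h0_rejected:
        return "Type I error" if h0_is_true else "correct rejection"
    return "correct retention" if h0_is_true else "Type II error"

# Print the full two-by-two table of outcomes.
for h0_is_true in (True, False):
    for h0_rejected in (True, False):
        outcome = classify_decision(h0_is_true, h0_rejected)
        print(f"H0 true: {h0_is_true!s:5}  rejected: {h0_rejected!s:5}  -> {outcome}")
```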
Decision Errors
Type I Error
If we reject a null hypothesis that is true, we have
made a false claim that some factor above and beyond
random variation is causing patterns in our data.
In environmental impact assessment, this would be a “false positive”.
The probability of this error is denoted by the Greek letter α (alpha).
This error can occur only when H0 is true.
Generally, this is considered the more serious error because it
misleads us into believing that our results are significant
when they are not. It is also called “producer error”.
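A small simulation can show what α means in practice. If H0 is true and we reject whenever P < α, then about a fraction α of experiments will produce a false positive. The sketch below assumes normally distributed data with a known standard deviation and uses a simple z-test. All names and settings are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def z_test_p(a, b, sd):
    """Two-sided P-value for a difference in means, known sd assumed."""
    standard_error = sd * sqrt(1 / len(a) + 1 / len(b))
    z = abs(mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

def type_i_rate(alpha=0.05, n=20, sd=1.0, n_experiments=4000, seed=42):
    """Fraction of simulated experiments that reject a TRUE H0."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution, so H0 is true.
        a = [rng.gauss(0, sd) for _ in range(n)]
        b = [rng.gauss(0, sd) for _ in range(n)]
        if z_test_p(a, b, sd) < alpha:
            false_positives += 1
    return false_positives / n_experiments

rate = type_i_rate()
print("Observed Type I error rate:", rate)
```

The observed rate should land close to the chosen α of 0.05, with some scatter because the number of simulated experiments is finite.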
Type II Error
This error occurs when there are systematic differences
between the groups being compared, but the investigator
fails to reject the null hypothesis and concludes
incorrectly that only random variation among observations
is present.
In environmental impact assessment, this would be a “false negative”.
The probability of this error is denoted by the Greek letter β (beta).
This error can occur only when H0 is false.
A Type II error misleads you into thinking that there is no
significant effect when in fact there is one.
Depending on the context, this type of error
can be just as damaging (e.g., environmental impact
surveys, medical diagnosis). It is also called “consumer error”.
Power
• Power (1 − β) is the probability of correctly
rejecting the null hypothesis when it is false
• Ideally, we would like to minimize both
Type I and Type II errors in our statistical
inference. However, for a given sample size and design,
strategies designed to reduce Type I error inevitably
increase the risk of Type II error, and vice versa.
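The trade-off between the two error types can be seen in a case where β has a closed form. The sketch below assumes a one-sided, one-sample z-test with a known standard deviation, which is a deliberately simple setting. The effect size, standard deviation, and sample size are arbitrary illustrative values, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect_size, sd, n):
    """Return beta for a one-sided one-sample z-test with known sd."""
    norm = NormalDist()
    critical = norm.inv_cdf(1 - alpha)      # rejection threshold for z
    shift = effect_size * sqrt(n) / sd      # mean of z when H0 is false
    power = 1 - norm.cdf(critical - shift)
    return 1 - power

# Hold the design fixed and tighten alpha step by step.
alphas = (0.10, 0.05, 0.01, 0.001)
betas = [type_ii_error(a, effect_size=0.5, sd=1.0, n=20) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:<6} beta = {b:.3f}  power = {1 - b:.3f}")
```

With everything else held fixed, each stricter α gives a larger β. Increasing n is the usual way to reduce both at once.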
Power
• Although Type I and Type II errors are inversely
related to one another, there is no simple
mathematical relationship between them,
because the probability of a Type II error
depends on:
1. The alternative hypothesis
2. How large an effect we hope to detect
3. Sample size
4. The wisdom of our experimental design and
sampling protocol
The relationship between
Type I and Type II errors
Estimating Power
Power ∝ (ES × α × √n) / σ
• ES is the effect size we wish to detect, n is the sample size, α
is the significance level, and σ is the standard
deviation among sampling or experimental units
• R. Lenth provides free online software to assist in a
priori power analysis for various statistical tests:
http://www.stat.uiowa.edu/~rlenth/Power/
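An a priori power analysis asks how many observations are needed to reach a target power for a given effect size, variation, and significance level. The sketch below is not the method used by the software mentioned above. It is a minimal illustration for a one-sided, one-sample z-test with known standard deviation, and every name and input value in it is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(alpha, effect_size, sd, n):
    """Power of a one-sided one-sample z-test with known sd."""
    norm = NormalDist()
    critical = norm.inv_cdf(1 - alpha)
    return 1 - norm.cdf(critical - effect_size * sqrt(n) / sd)

def min_sample_size(target_power, alpha, effect_size, sd, max_n=100_000):
    """Smallest n whose power reaches the target, or None if not found."""
    for n in range(2, max_n + 1):
        if z_test_power(alpha, effect_size, sd, n) >= target_power:
            return n
    return None

# Illustrative inputs: detect an effect of half a standard deviation.
n_needed = min_sample_size(target_power=0.80, alpha=0.05,
                           effect_size=0.5, sd=1.0)
# Halving the effect size we wish to detect raises the required n.
n_small_effect = min_sample_size(0.80, 0.05, 0.25, 1.0)

print("n for ES = 0.50:", n_needed)
print("n for ES = 0.25:", n_small_effect)
```

Power here rises with effect size, sample size, and α, and falls as σ grows, in line with the proportional relationship above.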
Parameter estimation and
prediction
• Rather than try to test multiple
hypotheses, it may be more worthwhile to
estimate the relative contributions of each
to a particular pattern.
• In such cases, rather than ask whether a
particular cause has some effect versus
no effect, we ask what is the best estimate
of the parameter that expresses the
magnitude of the effect
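Estimating the magnitude of an effect, with a measure of its uncertainty, can be sketched as follows. The example reports a difference between two group means with an approximate 95% confidence interval based on the normal distribution. For samples this small a t-based interval would be somewhat wider. The data are made-up numbers and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def effect_estimate(a, b, confidence=0.95):
    """Return (difference in means, lower bound, upper bound).

    Uses a normal approximation for the interval, which is a
    simplification for small samples.
    """
    diff = mean(b) - mean(a)
    standard_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * standard_error, diff + z * standard_error

# Made-up measurements for two groups (illustration only).
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [6.2, 5.8, 6.9, 6.1, 6.6, 5.9]

diff, low, high = effect_estimate(control, treated)
print(f"Estimated effect: {diff:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
```

The output answers "how large is the effect?" directly, where a test alone only answers "is there an effect?".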