Marshall University School of Medicine
Department of Biochemistry and Microbiology
BMS 617
Lecture 6 – Multiple comparisons,
non-normality, outliers
Marshall University Genomics Core Facility
Analyzing data without a plan
• The framework for hypothesis testing assumes
all aspects of the experimental design are
defined before the experiment and the
analysis are performed
– Not doing this can invalidate the interpretation of
a p-value
– Easy trap to fall into
• This happens a lot!
Multiple comparisons
• Previous lecture discussed multiple
comparisons and their effect on interpretation
of p-values
– In that context, the multiple comparisons were
part of the experimental design
• In that case, we can correct for them or otherwise analyze
the data appropriately
• Not having a complete plan can introduce
multiple comparison effects in an uncontrolled
manner
Examples of multiple comparisons
• Trying multiple statistical tests for the same data set
– “A t-test doesn’t give me significance… let’s separate into
three groups instead and try an ANOVA”
• Trying multiple algorithmic implementations of the
same test
• In multiple regression (we will discuss this later…),
choosing to include or exclude different independent
variables
– For complex data sets, trying enough approaches will
almost always result in a “statistically significant” result
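As a rough illustration of why trying enough approaches almost always produces a "statistically significant" result, the sketch below computes the chance of at least one false positive among several tests. It assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function name is illustrative only.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one p < alpha among independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** num_tests

# The chance of a spurious "significant" result grows quickly with the number of attempts.
for k in (1, 3, 10, 20):
    print(k, round(familywise_error_rate(k), 3))
```

Real analyses of the same data set are rarely independent, so the exact numbers will differ, but the direction of the effect is the same.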
Sequential Analyses
• Another common approach is to try an
experiment, and if the result is not statistically
significant, to then repeat it with additional
samples or experimental replicates
– Another form of multiple comparisons
– Problem with this approach is that it is biased towards
a statistically significant result
• Stop experimenting if a result is statistically significant
• Continue experimenting otherwise
• In theory you can always get a statistically significant result
with this approach
– Though it may take a very long time…
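The bias of the stop-when-significant strategy can be seen in a small simulation. The sketch below is a minimal example under stated assumptions: the data are drawn from a normal distribution with mean 0 and known standard deviation 1 (so the null hypothesis is true), a two-sided z-test is used for simplicity, and the batch size, number of looks, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist

def sequential_false_positive_rate(num_sims=2000, batch=10, max_looks=10,
                                   alpha=0.05, seed=1):
    """Fraction of null experiments that reach p < alpha at any interim look."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(num_sims):
        total, n = 0.0, 0
        for _ in range(max_looks):
            # Add another batch of samples; the true mean is 0, so the null holds.
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            # Two-sided z-test of mean = 0 with known standard deviation 1.
            z = total / n ** 0.5
            p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
            if p < alpha:
                hits += 1
                break  # stop experimenting as soon as the result is "significant"
    return hits / num_sims

single_look = sequential_false_positive_rate(max_looks=1)
ten_looks = sequential_false_positive_rate(max_looks=10)
print(single_look, ten_looks)
```

With a single planned analysis the false positive rate stays near the nominal 5%; allowing repeated looks with early stopping pushes it well above that.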
Publication Bias
• Remember, the interpretation of a p-value is the probability of
observing data at least as extreme as the data observed, assuming
the null hypothesis is true
– This is not the same as the probability the null hypothesis is true
• Most p-values we see are in journal articles
• There is a strong preference to publish results which are
“statistically significant,” i.e. which have p < 0.05
• Some of these results are “real” and some are false positives
• Because the publications are selected based on the p-value, the
interpretation of a p-value in published results is skewed
– Among published results that report p < 0.05, the proportion that
are false positives can be much higher than 5%
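A back-of-the-envelope calculation shows how selection on p < 0.05 skews the picture. The sketch below assumes only statistically significant results are published; the 10% share of tested hypotheses that are real effects and the 80% power are made-up illustrative numbers, not values from this lecture.

```python
def false_positive_share(prior_real, power, alpha=0.05):
    """Share of statistically significant results that are false positives."""
    false_pos = alpha * (1.0 - prior_real)   # null true, yet p < alpha
    true_pos = power * prior_real            # real effect, detected
    return false_pos / (false_pos + true_pos)

# Hypothetical scenario: 10% of tested effects are real, tests have 80% power.
print(round(false_positive_share(prior_real=0.10, power=0.80), 2))
```

Under these assumed numbers, roughly a third of the "significant" results would be false positives, even though each individual test used a 5% threshold.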
Normally distributed data
• Many of the statistical tests we will study rely on
the assumption that the data were sampled from
a normal distribution
• How reasonable is this assumption?
• The normal distribution is an ideal distribution
that likely never exists in reality
– Includes arbitrarily large values and arbitrarily small
(negative) values
• However, simulations show that most tests that
rely on the assumption of normality are robust to
modest deviations from the normal distribution
The ideal normal distribution
• Image shows data sampled from a theoretical normal distribution
• Uses a very large sample size
• Close approximation to theoretical distribution
Samples from a normal distribution
Tests for normality
• It is possible to perform tests to see if the
sample data are consistent with the
assumption that they were sampled from a
normal distribution
– Unfortunately, this is not what we really want to
know…
– Would really like to know if the distribution is
close enough to normal for the test we use to be
useful
Tests for normality
• A test for normality is a statistical test for
which the null hypothesis is
The data were sampled from a normal
distribution
• Common normality tests include
– D’Agostino-Pearson omnibus K² normality test
– Shapiro-Wilk test
– Kolmogorov-Smirnov test
D’Agostino-Pearson omnibus K²
normality test
• The D’Agostino-Pearson omnibus K² normality
test works by computing two values for the data
set:
– The skewness, which measures how far the data are
from being symmetric
– The kurtosis, which measures how sharply peaked and
heavy-tailed the data are
• The test then combines these to a single value
that describes how far from normal the data
appear to lie
– Computes a p-value for this combined value
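The two ingredients of the test can be sketched in plain Python. This minimal version uses the simple moment-based definitions of skewness and excess kurtosis (kurtosis minus 3, so that a normal distribution scores 0). It does not reproduce the combined K² statistic or its p-value, and statistical packages may apply small-sample corrections that give slightly different numbers. The data sets are made up.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for perfectly symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]          # made-up symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # made-up data with one large value
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetric data give a skewness of zero, while the single large value in the second data set produces a clearly positive skewness.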
Problem with normality tests
• If the p-value for a normality test is small, the interpretation is:
– If the data were sampled from an ideal normal distribution, it is
unlikely the sample would be this skewed and/or kurtotic
• If the p-value for a normality test is large, then the data are not
inconsistent with being sampled from a normal distribution
• However…
– If the sample size is large, it is possible to get a small p-value even for
small deviations from the normal distribution
• Data are likely sampled from a distribution that is close to, but not exactly,
normal
– If the sample size is small, it is possible to get a large p-value even if
the underlying distribution is far from normal
• Data do not provide sufficient evidence to reject the null hypothesis…
– Useful to examine the values for skewness and kurtosis as well as the
p-value
Skewness and kurtosis
Interpreting skewness and kurtosis
• The real question we would like to answer is
– How much skewness and kurtosis are acceptable?
– Difficult to answer…
• In general, interpret a skewness between -0.5 and 0.5
as being approximately symmetric
– Between -1.0 and -0.5, or 0.5 and 1.0 is moderately skewed
– Less than -1.0 or more than 1.0 is highly skewed
• For excess kurtosis (0 for a normal distribution), values
between -2 and 2 are generally accepted as being “within limits”
– Outside this is evidence the distribution is far from normal
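These rules of thumb are easy to encode. The sketch below is one possible reading of them: the slide does not say how to treat values exactly on a boundary, so counting the endpoints as inside the narrower category is an assumption, and the function names are illustrative.

```python
def describe_skewness(skew):
    """Label a skewness value using the rule-of-thumb cutoffs of 0.5 and 1.0."""
    size = abs(skew)
    if size <= 0.5:          # boundary handling is an assumption
        return "approximately symmetric"
    if size <= 1.0:
        return "moderately skewed"
    return "highly skewed"

def kurtosis_within_limits(kurt):
    """True when the kurtosis value lies between -2 and 2."""
    return -2.0 <= kurt <= 2.0

print(describe_skewness(0.2), describe_skewness(-0.8), describe_skewness(1.6))
print(kurtosis_within_limits(-1.3), kurtosis_within_limits(3.5))
```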
What to do if the data fail a test for
normality
• If the data fail a test for normality, the following options are
available
– Can the data be transformed to data that come from a normal
distribution?
• For example, if the data are positively skewed (a long right tail),
transforming to logs may give normally distributed data
– Are there a small number of outliers that are causing the data to
fail a normality test?
• Next section discusses outliers
– Is the departure from normality small? That is, are the skewness and
kurtosis “small”? If so, your statistical tests may still be accurate
enough
– Use a test that does not assume a normal distribution (a nonparametric test)
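The effect of a log transformation can be illustrated on a toy data set with a long right tail. This is a minimal sketch using made-up values that double at each step, chosen so that their logarithms are evenly spaced; real data will rarely transform this cleanly, and logs only apply to strictly positive values.

```python
import math

def skewness(values):
    """Moment-based skewness: 0 for perfectly symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64]          # made-up values with a long right tail
logged = [math.log2(v) for v in raw]    # logs require strictly positive data

print(round(skewness(raw), 2))
print(round(skewness(logged), 2))
```

The raw values are highly skewed by the rule of thumb above, while the log-transformed values are symmetric.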
Non-parametric tests
• The most common statistical tests assume the
data are sampled from a normal distribution
– T-tests, ANOVA, Pearson correlation, etc
• Some other tests do not make this assumption
– Mann-Whitney test, Kruskal-Wallis test, Spearman
correlation, etc
• However, these tests have lower statistical power
than their parametric equivalents when the data are
normally distributed
– The loss is small for large samples but can be
severe for small samples
Choosing non-parametric tests
• When running a series of similar experiments, all data
should be analyzed the same way
– Use normality tests to choose the statistical test for all
experiments together
– Following “common practice” is acceptable…
– Ideally, run one experiment just to determine whether the
data look like they come from a normal distribution
• For small data sets
– A test for normality does not tell you much
• Not likely to get a small p-value anyway
– Violations of the normality assumption are more egregious
– Non-parametric tests have very low statistical power
Outliers
• Outliers are values in the data that are “far” from the other values
• Occur for several reasons:
– Invalid data entry
– Experimental mistakes
– Random chance
• In any distribution, some values are far from the others
– In a normal distribution, these values are rarer, but still exist
– Biological diversity
• If your data come from patient or animal samples, the outlier may be
“correct” and due to biological diversity
– May be an interesting finding!
– Wrong assumptions
• For example, in a lognormal distribution, some values are far from the others
Why test for outliers
• Presence of erroneous outliers, or assuming
the wrong distribution, can introduce spurious
results or mask real results
• Trying to detect outliers without a test can be
problematic
– We tend to want to observe patterns in data
– Anything that appears to be counter to these
patterns seems to be an outlier
– We tend to see too many outliers
Before testing for outliers
• Before testing for outliers:
– Check the data entry
• Errors here can often be fixed
– Were there problems with the experiment?
• If errors were observed during the experiment, remove data
associated with those errors
• Many experimental protocols have quality control measures
– Is it possible your data are not normally distributed?
• Most outlier tests assume the (non-outlier) data are normally
distributed
– Was there anything different about any of the samples?
• Was one of the mice phenotypically different, etc.?
Outlier tests
• After addressing the concerns on the previous
slide, if you still suspect an outlier you can run
an outlier test
• Outlier tests answer the following question:
If the data were sampled from a normal
distribution, what is the chance of observing
a value as far from the others as the most
extreme value in the observed data?
Results of an outlier test
• If an outlier test results in a small p-value, then
the conclusion is that the outlying value is
(probably) not from the same distribution as the
other values
– Justifies excluding it from the analysis
• If the outlier test results in a high p-value, there is
no evidence the value came from a different
distribution
– Doesn’t prove it did come from the same distribution,
just that there is no strong evidence to the contrary
Guidelines on removing outliers
• If you address all the previous concerns, and
an outlier test gives strong evidence of an
outlier, then it is legitimate to remove it from
the analysis
– The rules for eliminating outliers should be
established before you generate the data
– You should report the number of outliers removed
and the rationale for doing so in any publication
using the data
How outlier tests work
• Outlier tests work by computing the
difference between the extreme value and
some measure of central tendency
• That value is typically divided by a measure of
the variability
• Resulting ratio is compared with a table or
expected distribution of those values
Grubbs’ outlier test
• Grubbs’ outlier test calculates the difference
between the extreme value and the mean of
all values (including the extreme value), and
divides by the standard deviation
• Resulting value is then compared to a table of
critical values
– Critical value depends on the sample size
– If the value is larger than the critical value, then
the extreme value can be considered an outlier
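The calculation described above can be sketched as follows. This minimal example only computes the test statistic: the distance of the most extreme value from the mean of all values, divided by the sample standard deviation. It does not look up the critical value, which still has to come from a published table for the given sample size and significance level. The function name and data are made up for illustration.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, extreme): the largest |value - mean| / sd and the value that produces it."""
    center = mean(values)   # mean of all values, including the extreme one
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    extreme = max(values, key=lambda v: abs(v - center))
    return abs(extreme - center) / spread, extreme

readings = [1, 2, 3, 4, 100]  # made-up data
g, suspect = grubbs_statistic(readings)
print(round(g, 3), suspect)
```

If the resulting ratio exceeds the tabulated critical value for a sample of this size, the suspect value can be considered an outlier, subject to the guidelines on removing outliers given earlier.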