Stats 1

BioStatistics
Why Statistics?
You want to make the strongest conclusions
based on limited data
Differences in biological systems sometimes
cannot be easily observed
Random variation?
Real difference?
Statistics are sometimes unnecessary
Large differences in observed events
And small scatter within groups
In most instances, though, the use of statistics
can provide you with mathematically-based
conclusions
Clinical research
Field research
Statistics extrapolate from
sample to population
The only way to draw absolute conclusions about
a population is to measure the trait(s) of interest
of every individual in that population
The reality is, this is almost always impossible to
do
Thus, randomly sampling some of the individuals
can provide information about the entire
population
Sometimes random sampling can be difficult to
define
If your sample is not random, then conclusions about the population may not be valid
Samples and Populations
Quality control
A company manufactures 20,000 vials
(population) of a vaccine from a single
production run
About 50 vials (samples) are taken from this
production run and analyzed for a variety of
characteristics
The results on 50 vials are then extrapolated to
the remaining vials
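The vial example can be sketched in a few lines of Python. This is only an illustration of drawing a random sample without replacement; the function name `draw_sample` and the seed value are assumptions made for the example, not part of any actual quality-control procedure.

```python
import random

def draw_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct vial numbers at random from the production run."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    # random.sample draws without replacement, so no vial is chosen twice
    return sorted(rng.sample(range(population_size), sample_size))

# 20,000 vials in the run (population), 50 vials analyzed (sample)
vials = draw_sample(20_000, 50, seed=1)
print(len(vials), "vials selected, for example:", vials[:5])
```

Every vial has the same chance of being chosen, which is what makes extrapolating from the 50 vials to the rest of the run reasonable.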
Samples and Populations
Political polls
The number of eligible U.S. voters is about
125,000,000 (population)
A few hundred or thousands (sample) are
asked to respond to political questions
Samples and Populations
Clinical studies
Patients in a clinical study (sample) have a
clinical condition (e.g., disease)
They rarely reflect the entire population
However, they often reflect the population
with the condition
Sampling humans can be particularly difficult
Samples and Populations
Field experiments
Local variations
Impact of weather
Environmental conditions/changes
Human impact
Sampling bias
Samples and Populations
Laboratory experiments
Usually not necessary
Highly-controlled experiments
Single variable
Genetically-defined organisms
Very little variation
What statistical calculations
can do
Statistical estimation
The mean calculated from a sample is a precise number
However, that number is only an estimate of the mean of the whole population
Statistical hypothesis testing
Helps determine if an observed difference is due simply to random chance
Provides a P value; if P is small, the difference is unlikely to be due to random
chance and the result is considered statistically significant
Statistical modeling
Tests how well experimental data fit a mathematical model
The most common form of statistical modeling is linear regression
LR usually determines the best straight line through a set of data points
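As a rough illustration of linear regression, the sketch below fits the least-squares straight line through a handful of points. The function name `fit_line` and the dose/response numbers are invented for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: response measured at five doses
dose = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(dose, response)
print(f"response = {slope:.2f} * dose + {intercept:.2f}")
```

The fitted line is the one that minimizes the sum of squared vertical distances between the points and the line.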
What statistical calculations
cannot do
Analysis of a simple experiment
Define a population you are interested in
Randomly select a sample of subjects to study
Randomly split the sample subjects into two
groups
One group gets one treatment
The other group gets another treatment
Measure a single variable trait in each subject
Use statistical tests to determine if there’s a
difference between the groups
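The selection and splitting steps above could look something like the following sketch. The subject IDs, group sizes, and the helper name `assign_groups` are assumptions chosen for illustration only.

```python
import random

def assign_groups(subject_ids, seed=None):
    """Randomly split subjects into two treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # random order removes any selection pattern
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical population of 1,000 subjects, identified by number
population = range(1000)
sample = random.Random(42).sample(population, 20)  # randomly select 20 subjects
group_a, group_b = assign_groups(sample, seed=7)   # one treatment per group
print("Group A:", sorted(group_a))
print("Group B:", sorted(group_b))
```

Because both the selection and the split are random, any difference later measured between the groups is less likely to reflect how the subjects were chosen.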
What statistical calculations
cannot do
The problems with real experiments
Populations can be more diverse than your samples
Samples are collected for convenience, rather than randomly
The measured value is a proxy for what you’re really
interested in
Errors in data collection
Record data incorrectly
Assays may not report what you think they report
You need to combine different types of measurements to
reach an overall conclusion (multiple variables)
Why statistics are difficult to
learn
Deceptive terminology (significant, error,
hypothesis)
Statistical conclusions are never absolute
(statistically significant)
Statistics uses abstract concepts (populations,
probabilities)
Statistics are at the interface of math and science
Many statistical calculations require complex math
Variables
Independent variable - The variable scientists
manipulate to evaluate a response
Dependent variable - The variable (i.e., trait)
resulting from a treatment with an independent
variable
Variables
Types of variables in biology
Measurement variables
Continuous
Discontinuous
Ranked variables
Attributes
Variables
Measurement variables - Those whose differing
states can be expressed in a numerically-ordered
fashion
Continuous
Can assume any value between two distinct points
For example, there are infinite numbers between 1.5 and 1.6
Include: lengths, areas, volumes, weights, angles, temperatures, periods of
time, percentages, rates
Discontinuous
Can take only certain fixed numerical values, with nothing in between
The number of segments in an insect’s appendage may be 4, 5, or 6, but
not 4.3
Variables
Ranked variables
Variables that cannot be measured, but can be
placed in order (ranked)
For example, order of emergence of pupae
without regard to time
Attribute variables
Variables that cannot be measured, but must
be expressed qualitatively
For example: black/white;
pregnant/nonpregnant; male/female; live/dead
Appropriate tests
Design: 1 variable, 1 sample
Measurement variables: Computing means; computing standard deviations
Ranked variables: Computing median and frequencies
Attribute variables: Confidence limits for percentages; runs test for randomness
Design: 1 variable, 2 samples
Measurement variables: t-tests; test of equality; paired comparisons test
Ranked variables: Mann-Whitney U-test; Kolmogorov-Smirnov two-sample test
Attribute variables: Testing differences between two percentages
Design: 1 variable, 2+ samples
Measurement variables: ANOVA; Tukey-Kramer test
Ranked variables: Kruskal-Wallis test; Friedman’s randomized block test
Attribute variables: G-test for percentages
Design: 2 variables
Measurement variables: Regression analysis; polynomial regression
Ranked variables: Olmstead and Tukey’s ordering test; Spearman’s rank
Attribute variables: Chi-square test; Fisher’s exact test
Means and Standard
Deviations
The mean is the average of a measured trait from
a population
In biology, we usually compare two or more
populations, which we call groups
The standard deviation measures the spread of values
around the mean (it is the square root of the variance)
Many statistical tests use means and standard
deviations to determine if there are significant
differences between groups
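A minimal sketch of computing these two quantities with Python's standard `statistics` module is shown below. The measurements are made up for illustration.

```python
import statistics

# Hypothetical measurements of one trait in two groups
control = [4.8, 5.1, 5.0, 4.9, 5.2]
treated = [5.9, 6.1, 6.0, 5.8, 6.2]

for name, values in (("control", control), ("treated", treated)):
    mean = statistics.mean(values)
    # stdev() is the sample standard deviation (divides by n - 1)
    sd = statistics.stdev(values)
    # The variance is the square of the standard deviation
    var = statistics.variance(values)
    print(f"{name}: mean={mean:.2f}, sd={sd:.3f}, variance={var:.3f}")
```

The sample form (dividing by n - 1) is used because the groups are samples, not entire populations.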
Null hypothesis
The default assumption that there is no real difference or effect
Statistics can be used to reject the null hypothesis, but never to prove it
Rejecting it lends support to an alternative hypothesis
Nearly every experiment that uses statistics
should define null and alternative hypotheses
Student’s T-test
Determines if there is a significant difference between the means of two
groups of measured data
Paired - compares matched values between members of a group
Unpaired - assumes values between members are not related
Assumes values fit a normal (aka Gaussian) distribution (“bell
curve”)
If they do not, then use nonparametric testing
One-tailed vs. two-tailed
One-tailed: You must specify which group will have a larger mean in advance of data collection
Two-tailed: You do not know which group will have a larger mean in advance of data collection
Student’s T-test
P value: Is there a significant difference between
the means of the two groups?
Generally, if the P value is less than or equal to
0.05, then the difference is considered significant
t-value:
Positive if the first mean is larger than the second
and negative if it is smaller
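To make the sign of the t-value concrete, here is a sketch that computes the unpaired t statistic by hand, assuming equal variances in the two groups (the classic pooled form). The function name `unpaired_t` and the data are assumptions for the example.

```python
import math
import statistics

def unpaired_t(first, second):
    """Return (t, degrees of freedom) for an unpaired, pooled-variance t-test."""
    n1, n2 = len(first), len(second)
    m1, m2 = statistics.mean(first), statistics.mean(second)
    v1, v2 = statistics.variance(first), statistics.variance(second)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_err, df

# Hypothetical measurements
treated = [5.9, 6.1, 6.0, 5.8, 6.2]
control = [4.8, 5.1, 5.0, 4.9, 5.2]

t_value, df = unpaired_t(treated, control)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

Swapping the order of the two groups flips the sign of t but not its size. Turning t into a P value requires the t distribution, which the standard library does not provide; a package such as SciPy (`scipy.stats.ttest_ind`) is one common way to get both numbers.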
Student’s T-test
Confidence interval
The calculated mean is unlikely to be exactly the same as the mean of the
entire population
Assumes your samples are randomly collected and fit a normal
distribution
If your sample is large with a small standard deviation, then your
calculated mean likely is close to the actual mean
The CI is a calculation based upon sample size and standard
deviation
A 95% CI is a range around your calculated mean (not the standard
deviation) that is constructed to include the actual mean of the population
under study in about 95% of repeated samples
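One way to see how sample size and standard deviation enter the CI is the sketch below. It uses the normal approximation, which is only reasonable for fairly large samples; for small samples a critical value from the t distribution would give a somewhat wider interval. The function name `mean_confidence_interval` and the data are assumptions for the example.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(values, level=0.95):
    """Approximate CI for the mean using the normal distribution (large samples)."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: shrinks as n grows or as scatter decreases
    sem = statistics.stdev(values) / math.sqrt(n)
    # Critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical measurements (40 values)
data = [5.0 + 0.1 * ((i * 7) % 11 - 5) for i in range(40)]
low, high = mean_confidence_interval(data)
print(f"mean = {statistics.mean(data):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

A larger sample or a smaller standard deviation both narrow the interval, which matches the point that such samples give a calculated mean closer to the actual mean.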