Biostatistics Tools
When you plan an experiment, it is important to decide in advance how you will analyze your results.
You can then plan what data to collect, and how you will collect them, so that you can use your chosen
statistical test to interpret what the data mean.
In the examination, however, you may be given information about the investigation and asked how you
would analyze the results.
The following table summarizes the data analysis calculations used in the A Level course.
Calculation: Lincoln/Petersen index (mark-release-recapture)
Criteria for using it: You want to estimate the size of a population of mobile animals. You can mark an animal without affecting its chances of survival. Marked animals mix randomly and fully into the rest of the population after release.
How to interpret the value you calculate: The value gives an estimate of the size of the population.
Notes: The value is only an estimate; the only way to find the absolute number of organisms in a population is to count every one of them (a census).

Calculation: Simpson’s index of diversity, D
Criteria for using it: You have collected quantitative data about the numbers of individuals belonging to the different species in an area.
How to interpret the value you calculate: The greater the value, the greater the species diversity (that is, the higher the number of species and the greater the evenness of their population sizes).
Notes: There are several different versions of the formula for Simpson’s index of diversity; make sure you are familiar with the one given in your syllabus.

Calculation: Hardy-Weinberg equations
Criteria for using it: In a population, a gene has two alleles. There is random mating between organisms with different phenotypes. There is no significant immigration into or emigration from the population. Each phenotype has an equal chance of survival (there is no selection against any of the phenotypes).
How to interpret the value you calculate: If you use the equations to calculate the frequency of a recessive allele in a population, you can then predict the proportion of people in the population who are likely to be carriers of that allele.

Calculation: Standard deviation, s
Criteria for using it: You have data that show a normal distribution. You want to know how much these data are spread on either side of the mean value.
How to interpret the value you calculate: The greater the standard deviation, the more widely the data are spread on either side of the mean.

Calculation: Standard error, SM
Criteria for using it: You have data that show a normal distribution. You want to know how close the mean value is likely to be to the true mean; in other words, how likely it is that, if you collected another set of data, the mean would be the same.
How to interpret the value you calculate: The greater the standard error, the less chance that a set of data would give you exactly the same mean value. You can be 95% certain that the true mean lies within 2 × the standard error either above or below your calculated mean. If this shows overlap between the possible true mean values of two sets of data, you cannot be sure that the true means are different.
Notes: Standard error can be used to draw error bars on graphs. These usually extend 2 × SM above and below the plotted data point.

Calculation: t-test
Criteria for using it: You have two sets of continuous, quantitative data. You have more than 10 but fewer than 30 readings in each set. Both sets of data come from populations with normal distributions. The standard deviations for the two sets of data are similar.
How to interpret the value you calculate: Construct a null hypothesis, predicting that there is no significant difference between the two data sets. Determine the number of degrees of freedom in your data. Look up your calculated value of t in a table of probabilities. If the probability table tells you that the probability of your null hypothesis being correct (that is, the difference between the two populations being due to chance) is 0.05 or greater, then the null hypothesis is supported.
Notes: In general, the greater the value of t, the smaller the probability that the null hypothesis is correct. The value of 0.05 means that there is a 5% chance that the differences between the two data sets are due to chance. This is the critical value that is generally used in biology. If the value is less than 0.05, then we can reasonably assume that the differences are real, and not just due to chance.

Calculation: Chi-squared test, χ²
Criteria for using it: You have two or more sets of quantitative data, which belong to two or more discontinuous categories.
How to interpret the value you calculate: Construct a null hypothesis, predicting that there is no significant difference between the two data sets. Determine the number of degrees of freedom in your data. Look up your calculated value of χ² in a table of probabilities. If the probability table tells you that the probability of your null hypothesis being correct (that is, the difference between the two populations being due to chance) is 0.05 or greater, then the null hypothesis is supported.
Notes: In general, the greater the value of χ², the smaller the probability that the null hypothesis is correct. The value of 0.05 means that there is a 5% chance that the differences between the two data sets are due to chance. This is the critical value that is generally used in biology. If the value is less than 0.05, then we can reasonably assume that the differences are real, and not just due to chance.

Calculation: Spearman’s rank correlation, rs
Criteria for using it: You have collected data for two variables, both of which can be ranked. The samples for both sets of data have been collected randomly. You have at least 5 pairs of data, but preferably between 10 and 30. You have plotted a scatter graph, and it looks as though there may be a relationship between the two variables.
How to interpret the value you calculate: Values of rs can lie anywhere between -1 and +1. A value of -1 indicates a perfect negative correlation, and +1 a perfect positive correlation. A value of 0 indicates no correlation. Use a correlation coefficient table to look up your value of rs. If your value of rs is greater than the critical value at a probability of 0.05, then you can say there is a significant correlation between the two variables.
Notes: Remember that correlation does not prove causation (cause and effect).

Calculation: Pearson’s linear correlation, r
Criteria for using it: You have two sets of interval data. You have at least 5 pairs of data, but preferably 10 or more. Both sets of data have an approximately normal distribution. You have plotted a scatter graph, and it looks as though there might be a linear correlation between the two variables.
How to interpret the value you calculate: Values of r can lie anywhere between -1 and +1. A value of -1 indicates a perfect negative correlation, and +1 a perfect positive correlation. A value of 0 indicates no correlation. Use a correlation coefficient table to look up your value of r. If your value of r is greater than the critical value at a probability of 0.05, then you can say there is a significant correlation between the two variables.
Notes: Remember that correlation does not prove causation (cause and effect).
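
Worked sketch for the Lincoln/Petersen index. The table does not state the formula, so this example assumes the commonly taught form N = (n1 × n2) / m2, where n1 is the number marked and released, n2 is the total caught in the second sample, and m2 is the number of marked animals in that second sample. The function name and the counts are hypothetical.

```python
def lincoln_index(marked_first, caught_second, marked_in_second):
    """Estimate population size from mark-release-recapture counts.

    Assumed formula (not given in the table): N = n1 * n2 / m2.
    """
    if marked_in_second == 0:
        raise ValueError("no marked animals were recaptured; estimate undefined")
    return marked_first * caught_second / marked_in_second


# Hypothetical counts: 40 animals marked and released,
# 50 caught later, of which 10 carry a mark.
estimate = lincoln_index(40, 50, 10)
print(f"Estimated population size: {estimate:.0f}")
```

The result is only an estimate, as the Notes entry for this calculation points out.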
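
Worked sketch for Simpson’s index of diversity. As the table notes, several versions of the formula exist. This example assumes the version D = 1 − Σ(n/N)², where n is the number of individuals of one species and N is the total number of individuals; check it against your own syllabus. The function name and the species counts are hypothetical.

```python
def simpsons_index(counts):
    """Simpson's index of diversity, assuming D = 1 - sum((n / N) ** 2)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no individuals recorded")
    return 1 - sum((n / total) ** 2 for n in counts)


# Hypothetical counts for four species in two areas.
even_area = [10, 10, 10, 10]   # individuals spread evenly across species
uneven_area = [37, 1, 1, 1]    # one species dominates

print(f"Even area:   D = {simpsons_index(even_area):.4f}")
print(f"Uneven area: D = {simpsons_index(uneven_area):.4f}")
```

Both areas hold the same number of species and individuals, but the evenly distributed one gives the higher value of D, which matches the interpretation given in the table.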
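
Worked sketch for the Hardy-Weinberg equations. The table does not write them out; this example assumes the standard pair p + q = 1 and p² + 2pq + q² = 1, where q is the frequency of the recessive allele and 2pq is the proportion of carriers (heterozygotes). The function name and the starting frequency are hypothetical.

```python
from math import sqrt


def hardy_weinberg(recessive_phenotype_frequency):
    """Return (p, q, carrier_frequency) from the frequency of the
    homozygous recessive phenotype (q squared).

    Assumes p + q = 1 and p**2 + 2*p*q + q**2 = 1.
    """
    if not 0 <= recessive_phenotype_frequency <= 1:
        raise ValueError("frequency must lie between 0 and 1")
    q = sqrt(recessive_phenotype_frequency)  # recessive allele frequency
    p = 1 - q                                # dominant allele frequency
    carriers = 2 * p * q                     # heterozygote frequency
    return p, q, carriers


# Hypothetical population: 4% show the recessive phenotype.
p, q, carriers = hardy_weinberg(0.04)
print(f"p = {p:.2f}, q = {q:.2f}, carriers = {carriers:.2f}")
```

The prediction only holds if the criteria listed for this calculation (random mating, no significant migration, no selection) are met.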
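
Worked sketch for standard deviation and standard error. The formulas are not given in the table; this example assumes the sample standard deviation s = √(Σ(x − mean)² / (n − 1)) and the standard error SM = s / √n, then applies the table's rule that the true mean lies within 2 × SM of the calculated mean with 95% certainty. The function name and the readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def spread_summary(readings):
    """Return (mean, s, SM) for a list of readings.

    Assumes s uses the n - 1 denominator and SM = s / sqrt(n).
    """
    s = stdev(readings)              # sample standard deviation
    sm = s / sqrt(len(readings))     # standard error of the mean
    return mean(readings), s, sm


# Hypothetical readings, e.g. leaf lengths in mm.
readings = [4, 6, 8, 10, 12]
average, s, sm = spread_summary(readings)

# Error bar limits as described in the table: 2 x SM either side of the mean.
lower, upper = average - 2 * sm, average + 2 * sm
print(f"mean = {average}, s = {s:.3f}, SM = {sm:.3f}")
print(f"95% range for the true mean: {lower:.2f} to {upper:.2f}")
```

If the ranges calculated this way for two data sets overlap, you cannot be sure that their true means differ.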
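
Worked sketch for the t-test. The table describes how to interpret t but not how to calculate it. This example assumes the unpaired form t = |mean1 − mean2| / √(s1²/n1 + s2²/n2) with n1 + n2 − 2 degrees of freedom, which is the form usually given at this level; confirm it against your syllabus. The function name and the readings are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_value(sample1, sample2):
    """Return (t, degrees_of_freedom) for two independent samples.

    Assumed formula: t = |mean1 - mean2| / sqrt(s1**2/n1 + s2**2/n2),
    with degrees of freedom = n1 + n2 - 2.
    """
    n1, n2 = len(sample1), len(sample2)
    # variance() returns s squared, using the n - 1 denominator.
    spread = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t = abs(mean(sample1) - mean(sample2)) / spread
    return t, n1 + n2 - 2


# Hypothetical readings: 11 measurements from each of two sites.
site_a = [12, 14, 13, 15, 16, 14, 13, 15, 14, 16, 12]
site_b = [15, 17, 16, 18, 17, 16, 19, 17, 18, 16, 17]

t, df = t_value(site_a, site_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
# Look up t for this number of degrees of freedom in a probability table.
# A probability below 0.05 suggests the difference is not just due to chance.
```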
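
Worked sketch for the chi-squared test. The table does not give the formula; this example assumes χ² = Σ((O − E)² / E), where O is an observed count and E the count expected under the null hypothesis, with degrees of freedom equal to the number of categories minus 1. The function names, the 9:3:3:1 ratio and the counts are hypothetical.

```python
def expected_from_ratio(ratio, total):
    """Split a total count into expected counts following a ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_squared(observed, expected):
    """Return (chi_squared, degrees_of_freedom).

    Assumed formula: sum((O - E) ** 2 / E), df = categories - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same categories")
    value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return value, len(observed) - 1


# Hypothetical cross: 160 offspring in four phenotype classes,
# tested against an expected 9:3:3:1 ratio.
observed = [90, 30, 28, 12]
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

value, df = chi_squared(observed, expected)
print(f"chi-squared = {value:.3f} with {df} degrees of freedom")
# Look up this value in a probability table. A probability of 0.05 or
# greater means the null hypothesis is supported.
```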
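
Worked sketch for Spearman’s rank correlation. The table does not give the formula; this example assumes rs = 1 − (6 × ΣD²) / (n³ − n), where D is the difference between the two ranks of each pair and n is the number of pairs. It also assumes there are no tied values, since ties need averaged ranks that this short version does not handle. The function names and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); ties are not handled."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not supported in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Assumed formula: rs = 1 - 6 * sum(D ** 2) / (n ** 3 - n)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d_squared / (n ** 3 - n)


# Hypothetical paired data: light intensity and number of plants per quadrat.
light = [10, 25, 32, 47, 58, 61]
plants = [3, 5, 4, 9, 12, 11]

print(f"rs = {spearman_rs(light, plants):.3f}")
# Compare rs with the critical value for n pairs at a probability of 0.05.
```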
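
Worked sketch for Pearson’s linear correlation. The table does not give the formula; this example assumes r = Σ((x − mean x)(y − mean y)) / √(Σ(x − mean x)² × Σ(y − mean y)²), which is algebraically equivalent to the forms written with standard deviations. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson's linear correlation coefficient for paired interval data.

    Assumed formula: sum(dx * dy) / sqrt(sum(dx ** 2) * sum(dy ** 2)),
    where dx and dy are deviations from the respective means.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    mean_x, mean_y = mean(x), mean(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Hypothetical paired measurements: body length and body mass.
length = [4.1, 5.0, 5.8, 6.9, 7.4, 8.2]
mass = [1.9, 2.6, 2.8, 3.7, 3.9, 4.6]

print(f"r = {pearson_r(length, mass):.3f}")
# Compare r with the critical value for n pairs at a probability of 0.05.
```

A value of r close to +1 or -1 still says nothing about cause and effect, as the Notes entry for this calculation stresses.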