IB STATISTICS
HANDBOOK
Mag Karl Schauer BSc
CONTENTS

Why Statistics?
What should you know?
Types of Data
    Categorical data
    Ordinal data
    Numerical data
    Frequency
    Other data
Sampling Techniques
    Random sampling
    Systematic sampling
    Stratified sampling
Descriptive Statistics
    Averages
        Mean
        Median
        Mode
    Measures of spread
        Standard deviation
        Minimum, maximum and range
        Quartiles and interquartile ranges
    Histograms
    The normal curve
Data Tables
    Title
    Labels
    Data
    Summary statistics
    Formatting
Graphical Techniques
    Formatting and labelling
    Bar graphs
    Line graphs
    Scatter plots and correlation
        Correlation
        Extrapolation and interpolation
    Other graphs
Hypothesis Testing
    Testing for differences
        The t-test
        ANOVA
        The Mann-Whitney U test
    Testing for correlation
        The Pearson correlation test
        The Spearman rank correlation test
    Other statistical tests
        The chi-squared test
        Nearest neighbour analysis
Critical Value Tables
How to Use Spreadsheet Software
More Help and Further Reading
WHY STATISTICS?
An academic investigation is a way to try to answer a question. This question must be
defined, and a method determined to collect appropriate data. Predictions are then made
based on the knowledge gained by answering previous questions in previous
investigations. So where do statistics come in? Statistics are the tool you need to boil
down all of your carefully collected data into a clear answer. Importantly, they also tell you
how sure you can be of that answer.
WHAT SHOULD YOU KNOW?
In order to complete your internally assessed work or data-based extended essays in the IB,
you will need to apply some basic statistics. You will need to summarise and describe your
data using descriptive statistics like averages and standard deviations. Then you will need
to present your data in tables and graphs. Finally, depending on the investigation, you may
need to perform a hypothesis test or other calculations to definitively answer your
question. You won’t generally be expected to do the sometimes complicated calculations
by hand. Tools like spreadsheet software (Excel, LibreOffice etc.) or your TI-Nspire
handheld make many calculations a trivial matter of entering numbers. You will likely need
to look up tutorials for your particular software online, as programs and platforms differ,
but plenty of tutorials are readily available (see this list of resources for help). What is
trickier, and mostly up to you, is deciding what statistics to
apply in what circumstances, and understanding what those calculations tell you. This
handbook should help you with those decisions and that understanding.
This handbook is intended to be used digitally, and contains some cross-referencing and
external links. Underlined text, as well as the table of contents can be clicked to take you
where you want to go.
TYPES OF DATA
You might encounter all types of data in your investigations. It is important to distinguish
between a few different types of data because not all statistical techniques work with all
types of data.
CATEGORICAL DATA
This type of data fits into defined categories. For example: red, green and blue as options
for people’s favourite colour are categories.
ORDINAL DATA
This is similar to categorical data, but there is a clear order to the groups, for example
low, medium and high income categories. These categories don't necessarily have the
same distance between them.
NUMERICAL DATA
This type of data includes measurements of all kinds. There is a clear order, as in ordinal
data, but the distances between data points are clearly defined. Length, mass and speed are
all numerical data.
FREQUENCY
When a statistician uses the word frequency, they generally mean a count of the number of
things. ‘What is the frequency of…’ can usually be translated to ‘how many…’.
OTHER DATA
This list is certainly not complete. There are other specialty types of data that you might
encounter, but these should be sufficient for most of your investigations.
SAMPLING TECHNIQUES
Since you will never have enough time or resources to measure all of the possible data
points in the population (and if you use statistics, you shouldn't need to), you will only ever
measure a small portion of all of the possible points, called a sample. But which data
points should go into the sample? In order to have a fair test, it is important that each
possible data point is equally likely to be chosen for the sample. That is to say, there
should be no sampling bias. In order to do this, you will need to use a sampling strategy
that fits your investigation.
RANDOM SAMPLING
Just to be clear, it is not sufficient to claim that a sample is random if you have simply
chosen ‘at random’ places to measure. You might have a subconscious bias for certain
measurements. To be truly random, you will need to assign a number to each possible data
point, and use a random number generator to tell you which measurements to collect.
One way to do this is with the RAND() function in spreadsheet software. Simply enter
‘=RAND()’ in a cell, and it will show you a random number between 0 and 1. You can
then multiply this number by whatever you need to in order to have a random number
between 0 and that number. For example: if you wanted a random number between 0 and
100, you could simply type ‘=RAND()*100’ into a cell.
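If you prefer, the same idea can be done in a short Python script instead of a spreadsheet. This is only an illustrative sketch: the 200 numbered sampling points and the sample size of 20 are made-up values for the example.

import random

random.seed(1)  # optional: makes the 'random' choice repeatable

# Suppose 200 possible sampling points have been numbered 1-200
population = range(1, 201)

# Choose 20 of them, each equally likely, with no repeats
sample_points = random.sample(population, k=20)
print(sorted(sample_points))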
SYSTEMATIC SAMPLING
In this technique, you simply choose to sample at regular intervals. For example, you
might choose to make a measurement every ten metres along a transect.
STRATIFIED SAMPLING
This more complicated sampling method is only used if the population is made up of
different sub-sets that make up different proportions of the whole. It might be important
to make sure that one sub-set isn’t being over- or under-represented in the data. This is
most commonly used with survey data.
DESCRIPTIVE STATISTICS
Once you have collected your data, you will need to boil it down. Descriptive statistics,
sometimes called summary statistics, do just that: they help your reader see the general
trends and patterns in your data.
AVERAGES
MEAN
This is generally what is meant when someone says ‘average’. It is the sum of the values
divided by the number of values. This is the most common way to summarise sample data.
MEDIAN
This is the ‘middle’ data point. There are just as many data points higher and lower than
this one in the sample. This measure of average is more likely to be used if the sample is
distributed in a strange way, or if outliers might strongly affect the mean. For example, in a
sample of measuring personal wealth, one or two billionaires in the sample might heavily
skew the mean to show an average income that is higher than almost every single
individual. In this case it would be appropriate to use median income to better represent
the sample.
MODE
Mode is much less commonly used. It is the ‘most common’ data point. Or in other words,
the data point with the highest frequency.
Example: given the seven values in the sample 1, 3, 3, 5, 8, 11, 12:

Mean (average): 43/7 ≈ 6.1
Median (middle value): 5
Mode (most common value): 3
MEASURES OF SPREAD
Just knowing the average of a group only tells part of the story. Another important aspect
is the spread in the data. How similar are the data points to each other? Are they all
basically the same, or are there wild differences? These measurements all describe how
spread out the data is around the mean.
STANDARD DEVIATION
Standard deviation measures the typical distance between the data points and the mean (technically, it is the square root of the average squared deviation from the mean).
A large standard deviation, relative to the size of the measurement, means that the data is
very spread out. This means that there are generally large differences between data points.
A small standard deviation, relative to the size of the measurements, indicates that the
measurements are close together, or that they ‘agree’.
Standard deviations are useful for numerical data. If the data is not numerical, or is not
normally distributed, you will need a different way to show the spread of the data.
MINIMUM, MAXIMUM AND RANGE
To give a basic idea of the spread of your data, it is often good to include the total range of
the data, that is, to report the highest and lowest measured values (maximum and
minimum) and the distance between them (range).
QUARTILES AND INTERQUARTILE RANGES
These measures of spread apply to the median in a similar way to how the standard
deviation applies to the mean. Here is how they are calculated:
1. Arrange the data in rank order and divide it into four parts, each containing an
equal number of values. Each section is called a quartile. The quartile containing
the highest values is the upper quartile, while the one with the lowest values is the
lower quartile.
2. The Lower Quartile Value (LQV) or Q1 is the mean of the highest value in the
lower quartile and the lowest value in the quartile above it.
3. The Upper Quartile Value (UQV) or Q3 is the mean of the lowest value in the
upper quartile and the highest value in the quartile below it.
4. The Inter-Quartile Range (IQR) is the difference between the values calculated in
#2 and #3 (Q3 − Q1). A high IQR means the data is very dispersed, while a low IQR
means the data is less dispersed.
For example, given the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 with Q1 = 3.5, Q2 (median) = 6 and Q3 = 8.5:

IQR = Q3 − Q1 = 8.5 − 3.5 = 5
For a more detailed explanation, visit this site:
https://stattrek.com/statistics/dictionary.aspx?definition=interquartile%20range
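If you want to check your spreadsheet results, Python's built-in statistics module can calculate all of these descriptive statistics. The numbers below are the example values from the averages section; note that statistics.quantiles() may give slightly different quartile values than the step-by-step method above, because several quartile conventions exist.

import statistics

data = [1, 3, 3, 5, 8, 11, 12]

print(statistics.mean(data))    # mean (average)
print(statistics.median(data))  # median (middle value)
print(statistics.mode(data))    # mode (most common value)
print(statistics.stdev(data))   # sample standard deviation

# Quartiles: returns [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q3, q3 - q1)          # lower quartile, upper quartile, IQR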
HISTOGRAMS

A histogram is a way to visualise data in a sample. It is essentially a bar chart with categories for the measured values (for example 1-10, 11-20, 21-30) on the x-axis and frequency (the number of data points in that category) on the y-axis.

A histogram can be useful to show if your data are normally distributed, that is, they generally look like a bell curve (see the section on the normal curve). It may be important to show this before you can use some types of hypothesis tests.

Your histogram might not show a normal curve. The data might be skewed in one direction, or even show several peaks. These might be important aspects to bring up in your evaluation, and might help you choose what hypothesis test, if any, you can use.

(Example histogram: the distribution of age in a sample, plotted as frequency against age; it appears to be slightly right-skewed.)

(Some shapes of histograms you might encounter: normal, right-skewed and left-skewed.)

Here is a deeper explanation of histograms and how they are made by hand:
https://youtu.be/4eLJGG2Ad30

Here is a longer explanation of the different shapes you might encounter in histograms:
https://youtu.be/Y53_8WRrPzg
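As a quick alternative to a spreadsheet, a histogram's bin counts can also be computed and plotted in Python. The measurements and bin edges below are made-up example values.

import numpy as np
import matplotlib.pyplot as plt

# Example measurements (made up for illustration)
data = [12, 15, 17, 18, 21, 22, 22, 24, 25, 28, 31, 35]

# Count how many values fall into each bin
bins = [10, 15, 20, 25, 30, 35, 40]
counts, edges = np.histogram(data, bins=bins)
print(counts)

# Draw the histogram
plt.hist(data, bins=bins, edgecolor="black")
plt.xlabel("Measured value")
plt.ylabel("Frequency")
plt.show()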
THE NORMAL CURVE
Many data sets happen to follow a normal distribution curve, also called a bell curve or,
more technically, a Gaussian curve. If your data fit this type of distribution, you can make
some predictions using your data and the mathematics behind this curve.
(Figure: the standard 'bell' or Gaussian curve, showing how normally distributed data spread around the mean. If your histogram looks like this, your data are probably normally distributed. The shaded areas under the curve represent the proportion of data points likely to be found in each section; the x-axis shows distances from the mean in standard deviations, with the mean at 0.)
The area under the normal curve shows how many data points are likely to be found in any
given range. 68% of data points will, on average, be within one standard deviation of the
mean, and 95% will fall within 2 standard deviations. This is helpful in predicting
probabilities, and this type of math is the basis for hypothesis tests.
The normal distribution curve is one of many distributions used in statistics, but it is the
shape you are most likely to encounter.
If your data appear to be normally distributed, that is, your histogram has roughly a
normal curve shape, then you may be able to use hypothesis tests that require
normal data as a prerequisite.
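The 68% and 95% figures above can be checked with the normal distribution functions in scipy. This is only an optional sketch; nothing in your IB work requires it.

from scipy.stats import norm

# Proportion of a normal distribution within 1 and 2 standard
# deviations of the mean (standard normal: mean 0, SD 1)
within_1_sd = norm.cdf(1) - norm.cdf(-1)
within_2_sd = norm.cdf(2) - norm.cdf(-2)

print(round(within_1_sd, 3))  # about 0.683, i.e. roughly 68%
print(round(within_2_sd, 3))  # about 0.954, i.e. roughly 95%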
DATA TABLES
Once you have boiled your data down into some tangible values, you will need to present
the raw data and your descriptive statistics in well-organised tables. Designing data tables
is an art form all its own. A few points might help you make yours beautiful.
TITLE
Be sure that each table has a meaningful and descriptive title (not just ‘table 1’). With
multiple tables, it is usually a good idea to number them (hint: check that your numbers
are right before you hand in a draft!) so that you can refer to them easily in your text.
LABELS
Your data columns need proper labels including:
• A clear descriptive title of what data is listed in the column,
• The appropriate units of those values, and
• The measurement precision of those values.
DATA
The data itself should have the correct number of significant figures to reflect the precision
of the data (see these links for help with significant figures). Be careful not to show more
precision (more significant figures) in an average than the raw data justify. You will likely
need to format the cells of your table to show the appropriate number of digits, since the
trailing zeros otherwise disappear. If you have very large or very small values, simply use
scientific notation.
SUMMARY STATISTICS
You may want to include your averages and standard deviations right in the table with your
raw data. If you have a lot of data, or it is relatively complex, you might want to create a
separate data table of your summary statistics. You should use whatever you think will
help your reader see the data best.
FORMATTING
It is usually a good idea, if possible, to present your data table on one page. Having the
first half of a table at the end of one page, with the last half continuing on the next makes
it very hard to get an overview of the data. Also, try to size your columns carefully to fit
them on the page, but not to muddle the titles.
Here is an example of a well-organised table:
Table 1: The height of 15 Z. mays plants after growing for 30 days at different fertiliser
concentrations in three different field sites. Plant heights were measured 30 days after
germination (+/- 1 mm).

Concentration of fertiliser    Field site 1   Field site 2   Field site 3   Average   Standard deviation
in soil (+/- 0.10 mg/kg)
0.10                           345            330            404            360       39
0.20                           442            410            430            427       16
0.30                           510            470            550            510       40
0.40                           580            530            603            571       37
0.50                           200            130            240            190       56
GRAPHICAL TECHNIQUES
Once you have presented your data in tables, you will need to make it more readily visible
to your reader. It is important to choose the right graph for the type of data you are
presenting. The formatting and labelling of the graph is also important. If done well,
graphs should show the reader the answer to your research question at a glance.
FORMATTING AND LABELLING
Generally speaking, the same rules for tables apply to graphs. Be sure that each graph has a
clear and descriptive title, that the axes are labelled in the same way that the columns of
the corresponding tables are labelled. Make sure that the axes are scaled so that the data
fill the graph, and that the scale numbers reflect the same level of precision as the data.
Also, make sure that the independent variable (that you changed or defined on purpose) is
on the x-axis, and that the y-axis shows your dependent variable (measured result). It is
almost always best to graph your processed data, and not the raw data, unless there is
something important you want to show the reader about your raw data.
BAR GRAPHS
Bar graphs are used to represent numerical data (y-axis) from different categories (x-axis).
Bar graphs of averages should have error bars showing standard deviation, or some other
measure of spread. Somewhere on the graph or in its caption you need to declare what the
error bars represent. Be sure that you are using the standard deviation values that you
calculated in your tables (each bar with its own value), and not the automatic values that
some software applies (incorrectly).

Figure 1: The average growth of cress seeds after growing for 4 days under different
coloured light (average growth in mm, +/-1 mm, plotted against the colour of light applied
to the growing plants: red, blue, green, yellow and orange). The error bars represent one
standard deviation.
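If you ever want to produce such a bar graph in Python rather than in a spreadsheet, here is a minimal matplotlib sketch. The means and standard deviations are invented placeholder numbers, not the actual cress data from Figure 1.

import matplotlib.pyplot as plt

colours = ["Red", "Blue", "Green", "Yellow", "Orange"]
means = [22, 30, 15, 18, 20]   # example averages (mm)
std_devs = [3, 4, 2, 3, 3]     # example standard deviations (mm)

# yerr draws error bars of +/- one standard deviation for each bar
plt.bar(colours, means, yerr=std_devs, capsize=4)
plt.xlabel("Colour of light applied to growing plants")
plt.ylabel("Average growth of cress plants after 4 days (mm)")
plt.title("Error bars represent one standard deviation")
plt.show()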
LINE GRAPHS
Line graphs and scatter plots are often confused for each other. Line graphs show straight
lines connecting the dots of the data points. This is to represent the fact that line graphs
show multiple measurements of the same thing. The straight line is an assumed linear
change of that measured value between measurements. Therefore, only use a line graph if
you are tracking the change of something.
Figure 2: Global average temperature (+/-0.01 °C) from 1880 to 2000, plotted against year.
SCATTER PLOTS AND CORRELATION
Scatter plots are used to compare numerical values on both axes. If both your independent
and dependent variables are numerical measurements, this is probably the type of graph
you should use. Each dot on the graph represents a data point, and this can show trends in
the data. Usually this type of investigation is looking for some sort of relationship between
the two variables. You will need to start with this type of graph to look for correlations, or
in order to perform interpolations or extrapolations.
If you graph average values, you will need error bars to show the spread of the data (see
bar graphs). Be sure, however, to use all data points, not just averages, to calculate an R²
value.
Figure 3: A strong positive correlation between the age of the sampled trees (x-axis, +/-1 year) and their height (y-axis, +/-2 m). Trend line: y = 0.7334x − 2.8792, R² = 0.9189.
CORRELATION
A correlation is a relationship between two numerical variables. A correlation can be
positive or negative:
• Positive correlation: As the independent variable increases, the dependent variable also
increases.
• Negative correlation: As the independent variable increases the dependent variable
decreases.
(Example scatter plots showing positive correlation, negative correlation and no correlation.)
The line of best fit or trend line is chosen by the computer to be as close as possible to all
of the data points. It is an approximation of the linear trend in the data. The closer all of
the data points are to the trend line, the stronger the correlation.
The degree of correlation is measured by the correlation coefficient, r or R, more
technically called the ‘Pearson product-moment correlation coefficient’. This value ranges
from -1 for a set of data that aligns perfectly to a line with a negative slope, to 1, for a set
of data points that align perfectly to a line with a positive slope. The closer the r value is to
either -1 or 1, the stronger the relationship between the two variables.
(Example scatter plots for r = 1, 0.8, 0.3, 0, −0.3, −0.8 and −1, ranging from a perfect positive correlation through no correlation to a perfect negative correlation.)
Alternatively, correlation can be reported as the coefficient of determination, r² or R².
This is simply the correlation coefficient squared, which therefore always has a value
between 0 and 1. It is defined as the proportion of the variance in the dependent variable
that can be predicted from the independent variable. For example, with an r² value of 1,
all of the data points align perfectly on the line, so the value of the dependent variable can
be predicted exactly for any value of the independent variable. For an r² value of 0.8, 80%
of the variance in the dependent variable is explained by the independent variable, so
predictions based on the trend line will be correspondingly less exact. Predicting the values of
variables based on a correlation is called extrapolation or interpolation.
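To see the relationship between r and r² yourself, you can calculate both with scipy. The x and y values below are arbitrary example data, not measurements from this handbook.

from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7, 8]                      # independent variable (example)
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # dependent variable (example)

r, p_value = pearsonr(x, y)
print(r)       # correlation coefficient, between -1 and 1
print(r ** 2)  # coefficient of determination, between 0 and 1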
In order to make claims about a linear correlation, it is important that the data show a
linear trend. If the data are not linear, or not expected to be linear, it is not appropriate to
compare them to a trend line! Here are some examples of when linear regression is not
appropriate:
• Enzyme activity is expected to increase roughly exponentially as temperature increases, then
peak at the optimum temperature for that enzyme, then drop sharply as the enzyme
denatures at higher temperatures. A graph comparing temperature and enzyme activity
might therefore look something like this:
(Graph of enzyme activity against temperature (+/-1 °C): activity rises to a peak around the optimum temperature and then drops sharply. This shape of graph should not be compared to a line.)
• The rate of a chemical reaction slows over time as substrate is used up. The shape of the
curve produced is predictable and depends on the type of reaction. The graph of the
concentration of the product over time might look something like this:
(Graph of the concentration of product (+/-0.1 M) against time (+/-1 s): the concentration rises quickly at first and then levels off. This shape of graph should not be compared to a line; instead, a Spearman rank test can be performed.)
When interpreting r or r² values, be sure to be realistic about the strength of the
correlation. An r² value of 0.3 may or may not indicate any kind of relevant relationship. If
you want more certainty about whether your correlation is statistically significant, you
should consider using a Pearson’s R test for correlation or a Spearman’s Rank test. You can
read more about these tests in the section on hypothesis testing.
EXTRAPOLATION AND INTERPOLATION
If you have determined a strong linear relationship in your data, you can use this data to
make predictions. You can use the equation of the line of best fit to calculate the expected
value for an unknown.
Interpolation is using the trend line to predict values within the range of your data.
Extrapolation is expanding the trend line beyond the data to make predictions outside of
the range of data. The further the predicted value is from the measured values, the less
reliable the extrapolated value will be.
An example of extrapolation is using current trends in climate change to make predictions
about how the planet’s climate will continue to change in the future. This is how
climatologists make predictions about how warm the earth might be in the coming
decades.
An example of interpolation is determining the osmolarity of a tissue. Suppose you
measured the rate of osmosis in potato tissue in various concentrations of sugar solution.
Your data were linear and looked like this:
(Scatter plot of the change in mass of potato tissue after 1 h (+/-0.01 g) against the concentration of sucrose solution (+/-0.01 M). Trend line: y = −7.1143x + 2.073, R² = 0.9659; the line crosses the x-axis at about 0.29 M. Note that you must use all of the data points, not just the averages, in order for your software to accurately calculate R².)
The high R² value of 0.97 shows that the data have a strong linear correlation. The
negative slope value of -7.1143 shows that the relationship is a negative correlation.
Because the trend is very strong, the equation for the line can be used to make predictions.
You were asked to determine what concentration would be isotonic to the potato tissue,
that is, at what concentration no net osmosis would occur. At this concentration the
change in the mass of the potato tissue would be zero. To find the corresponding
concentration, you can substitute zero for y (the change in mass) in the trend line
equation, and solve for x (the concentration):
y = −7.1143x + 2.073
0 = −7.1143x + 2.073
−2.073 = −7.1143x
x = −2.073 / −7.1143
x = 0.2914
This value of 0.29 M is where the trend line crosses the x axis, and it is the osmolarity, or
isotonic concentration of the potato tissue. It must be rounded to reflect the precision of
the measurements that were used to calculate it. This process could be repeated to predict
any given change in mass or any concentration within the range of the data.
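The same line-fitting and solving can be scripted in Python. This sketch uses numpy.polyfit on made-up (x, y) pairs standing in for the raw osmosis data; with the real raw data it would reproduce the slope and intercept shown in the trend line equation above.

import numpy as np

# All raw data points: concentration (M) and change in mass (g) - example values
x = np.array([0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5])
y = np.array([2.1, 2.0, 1.3, 1.4, 0.7, 0.6, -0.1, 0.0, -0.8, -0.7, -1.5, -1.4])

# Fit a straight line y = m*x + c through the data
m, c = np.polyfit(x, y, 1)
print(m, c)

# Interpolation: the concentration where the change in mass is zero
x_intercept = -c / m
print(round(x_intercept, 2))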
OTHER GRAPHS
Though the graphs listed above are the most likely you will need, there are of course many
other types of graphs. Here are two others that you might consider:
Pie charts show the breakdown of a group into its parts, usually percentages. The
percentages should add up to 100. Avoid too many categories, as the chart can quickly
become difficult to read.
Radar charts can be used to show many different attributes at once, and compare these
between locations or individuals.
HYPOTHESIS TESTING
The goal of an experiment or investigation is to answer a specific question. The data
should make it clear what the answer to that question is. Often, due to the uncertainty
inherent in data, the answer may not be entirely clear. It may look like there is a difference
between two groups, but the difference might only be due to chance. It may appear that
there is a correlation between two variables, but the sample may have been a fluke.
Hypothesis testing allows you to determine how sure you are of the answer, and the
likelihood of the observed pattern being due to chance.
A hypothesis test requires that you make an assumption, then calculate how likely your
data would be if that assumption were true. This assumption is called the null hypothesis,
H0. If the data turn out to be very unlikely under the null hypothesis, then you can conclude
instead that the alternative hypothesis, HA, is true. Despite the naming, these hypotheses are
different from your experimental hypothesis, that is, your reasoning about what you think
will happen in your experiment. You always need to declare and explain an experimental
hypothesis in the exploration portion of your work. You only need to declare null and
alternative hypotheses in the context of your hypothesis test, if you choose to use one.
This should be included in your explanation of the data analysis.
A hypothesis test generates a test statistic. The value of this test statistic tells you how
likely your data would be if the null hypothesis were true. That test statistic can then be
compared to a table of critical values; it must be higher or lower than the critical value in
order to conclude a statistically significant result.
Usually this process is simplified, and a p-value can be calculated from the test statistic.
The p in p-value stands for probability: it is the probability of obtaining data at least as
extreme as yours if the null hypothesis were true. It is always a value between 0 and 1
(i.e. 0 and 100%). If the p-value is low enough, then your data would be very unlikely
under the null hypothesis, and the null hypothesis can safely be rejected. When the null
hypothesis is rejected, the alternative hypothesis can be concluded, and there is a
statistically significant result.
The p-value is compared to the alpha value. For our intents and purposes you will use an
alpha value of 0.05. This is the threshold below which you decide that the null hypothesis
is not a reasonable explanation for your data. That is, if data like yours would occur less
than 5% of the time under the null hypothesis (p-value less than 0.05), then you should
reject the null hypothesis. For an example of this process, read the section on t-testing.

If the p-value is above the alpha threshold of 0.05, then you must 'fail to reject the null
hypothesis'. This is different from accepting the null hypothesis! You don't have enough
evidence to conclude that the null hypothesis is true. Instead you simply ‘fail to reject’ and
conclude that you cannot be sure whether the observed result is due to random chance or
a real effect.
For example, if a test gives a p-value of 0.2, data like yours would be expected about 20%
of the time even if the null hypothesis were true. That is not rare enough to rule the null
hypothesis out, but it is also not evidence that the null hypothesis is true. Therefore you
simply 'fail to reject the null hypothesis'.
TESTING FOR DIFFERENCES
Often an investigation aims to find differences between groups. The t-test, ANOVA, and
the Mann-Whitney U test are different ways to determine whether observed differences are
statistically significant, or just due to random chance.
The t-test is used when the data can be assumed to be normal and the sample sizes are
relatively large (more than 10 measurements). It might be a good idea to make a
histogram to see if the data appear to be normal, but at the very least you should state that
you assume the data to be normally distributed, and why you think it is.
If the assumptions for normality are met for the t-test, but you have more than two
groups, you will need to perform an ANOVA (analysis of variance) test to see if the
variability between the groups is due to chance or some real effect.
If it is not safe to assume that the data are normally distributed, you have small samples,
or your data are ordinal, but not numerical, then you can make a comparison between two
groups using the Mann-Whitney U Test instead. This test is less likely to find a difference
if there is one, but it is safer to use if the prerequisites for a t-test are unclear or not met.
THE T-TEST
The t-test assumes a null hypothesis that there is no significant difference between the
groups (any observed difference is due to chance), then calculates how likely a difference
as large as the observed one would be if that hypothesis were true.
H0: There is no significant difference between the two groups
HA: The observed difference between the groups is statistically significant, and not
likely due to chance.
Suppose you want to find out if dandelions (T. officinale) grow to different heights in two
different types of soil. In your experiment you measure the average growth of the
dandelions in each of two soil types.
Maximum achieved height of T. officinale plants (+/-0.1 cm)

                      Soil a    Soil b
                      8.0       15.5
                      6.3       14.7
                      9.1       14.5
                      13.2      12.2
                      12.0      10.1
                      6.3       15.0
                      10.0      12.1
                      11.0      13.2
                      12.1      16.0
                      9.8       14.2
                      8.5       13.1
                      12.2      9.9
                      9.7       17.8
                      10.1      10.3
                      13.2      16.4
                      10.3      19.0
Average               10.1      14.0
Standard deviation    2.1       2.7

Figure 1: The average maximum height of 16 T. officinale plants grown in two different soil types (bar chart of average maximum height, +/-0.1 cm, for soil a and soil b). Error bars represent one standard deviation.
You notice a difference between the groups. The plants in soil b have grown taller on
average than the plants in soil a. Since the error bars are overlapping, it is hard to say
whether this observed difference is due to chance, or whether the two are really different.
Therefore you decide to perform a t-test to find out.
First, you need to determine whether the data are normally distributed, so you make a
histogram to see.
Figure 2: A histogram of the data (frequency of plants against height (+/-0.1 cm), in bins from 6.0-7.9 up to 18.0-19.9, shown separately for soil a and soil b) shows that the data appear to be normally distributed.
Because the data appear to be normally distributed, and the sample sizes are sufficiently
large (n=16), you can proceed with the t-test.
Using the T.TEST() function of your spreadsheet software, you enter the following
information:

=T.TEST(dataset from soil a, dataset from soil b, 2, 2)

The first two arguments are the lists of raw values, not the averages. The two 2's in the
syntax tell the software what kind of t-test to perform: you want a 'two-tailed',
'non-paired' test. (The exact syntax, for example whether a decimal point or a particular
argument separator is required, may differ between software packages.)

The T.TEST() function of your spreadsheet software returns the p-value for the test.
You can watch a tutorial video on the t-test in Excel here:
https://youtu.be/DPNUpldVC4M
As the p-value decreases, the likelihood that the observed difference is just due to random
chance also decreases. Eventually the p-value is so small that it is no longer reasonable to
hold on to the null hypothesis, and it can be rejected. The most commonly used threshold
(also called the alpha value) is 0.05. That means that if the p-value sinks below 0.05 (5%),
the null hypothesis can be rejected and the alternative hypothesis accepted.
In the case of these data the p-value is 0.00008. This is well below the threshold of 0.05,
and therefore the null hypothesis can be rejected: the observed difference between the two
soils is very unlikely to be due to random chance; it is a statistically significant difference.
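The same t-test can be run in Python with scipy, using the raw values from the table above. With equal_var=True, scipy's ttest_ind performs a two-tailed, non-paired test assuming equal variances, which matches the spreadsheet settings used here.

from scipy.stats import ttest_ind

soil_a = [8.0, 6.3, 9.1, 13.2, 12.0, 6.3, 10.0, 11.0,
          12.1, 9.8, 8.5, 12.2, 9.7, 10.1, 13.2, 10.3]
soil_b = [15.5, 14.7, 14.5, 12.2, 10.1, 15.0, 12.1, 13.2,
          16.0, 14.2, 13.1, 9.9, 17.8, 10.3, 16.4, 19.0]

# Two-tailed, non-paired t-test assuming equal variances
t_statistic, p_value = ttest_ind(soil_a, soil_b, equal_var=True)
print(p_value)  # roughly 0.00008, well below the 0.05 threshold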
ANOVA
The ANOVA test should be used if you have more than two groups to compare. Though
you could theoretically perform many t-tests between each of the possible combinations,
this is inefficient, and mathematically risky. Every time you perform a t-test, there is a
small probability that your difference was in fact due to a random fluke, and not a real
difference. If you perform many t-tests, the likelihood of making such an error increases.
The ANOVA test assumes the following hypotheses:
H0: The groups are all the same
HA: At least one of the groups appears to be different than the rest. The variability
between groups is not likely to be due to chance.
The ANOVA test produces a p-value that can be interpreted in the same way as in the
t-test. If the p-value is below the threshold of 0.05, then the null hypothesis can be rejected
and the alternative hypothesis concluded. The ANOVA test does not tell you which groups
differ from each other, only that the variability between the groups is not due to chance.
You can learn how to perform the ANOVA test in Excel or LibreOffice here:
Excel: https://youtu.be/qQSQr_JldyY
LibreOffice: https://youtu.be/TxTKq4W8qX8
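As an alternative to the spreadsheet tools shown in the videos above, scipy can run a one-way ANOVA directly. The three groups below are invented example measurements.

from scipy.stats import f_oneway

group_1 = [5.1, 4.8, 5.4, 5.0, 4.9]   # example data
group_2 = [5.6, 5.9, 5.7, 6.1, 5.8]
group_3 = [4.4, 4.6, 4.2, 4.7, 4.5]

f_statistic, p_value = f_oneway(group_1, group_2, group_3)
print(p_value)  # below 0.05: the variability between groups is unlikely to be chance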
THE MANN-WHITNEY U TEST
The Mann-Whitney U test is mathematically very different from the t-test, but achieves a
similar goal of finding out whether an observed difference is statistically significant.
Instead of using the actual values of the measurements for the comparison, it simply
compares the rank order of the values. This is somewhat analogous to the difference
between the mean and the median, with the t-test being similar to the mean and the
Mann-Whitney U similar to the median. A hypothesis test like this, which does not rely on
the actual values of the measurements, is called a non-parametric test.
The null and alternative hypotheses are the same as for the t-test:
H0: There is no significant difference between the two groups
HA: The observed difference between the groups is statistically significant, and not
likely due to chance.
Although you can painstakingly calculate the Mann-Whitney U statistic by hand, with
some help from your spreadsheet software, this is not a requirement of the IB. Instead
simply enter your two sets of data in this online calculator:
https://www.socscistatistics.com/tests/mannwhitney/default2.aspx
The calculator gives you the p-value for the test, which you compare to the alpha threshold
of 0.05 as in the t-test. If the p-value is below 0.05, you can reject the null hypothesis and
conclude that there is a statistically significant difference.
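If you would rather not use the online calculator, scipy provides the same test. The two groups below are arbitrary ordinal-style example scores.

from scipy.stats import mannwhitneyu

group_a = [3, 4, 2, 5, 3, 4, 2, 3]   # example scores
group_b = [5, 6, 4, 7, 6, 5, 6, 7]

# Two-sided Mann-Whitney U test
u_statistic, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(u_statistic, p_value)  # compare the p-value to the 0.05 threshold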
TESTING FOR CORRELATION
If your investigation intends to look for a relationship between two variables, it might be a
good idea to test whether the correlation your data suggest is statistically significant or
likely to be due to chance. A correlation test does just that. These tests work in a similar
way to tests for differences. The two most relevant tests are the Pearson product-moment
correlation test (also called the Pearson correlation test) and the Spearman rank
correlation test.
THE PEARSON CORRELATION TEST
The Pearson correlation test, similar to the t-test, requires the data to meet some
prerequisites. In order to run this test, the following conditions must be true:
• The data are numerical for both variables.
• The data are paired, that is, there are two measurements or values for each data point,
the dependent and independent variables.
• The data follow a linear trend.
• The data can be assumed to be normally distributed for both variables.
• There are no obvious outliers in the data set.
If these conditions are met, you can continue with the test.
The null and alternative hypotheses are as follows:
H0: There is no correlation in the data. The observed trend is due to random chance.
HA: The observed trend is a statistically significant correlation.
In your spreadsheet software, enter the formula to calculate r, the Pearson correlation
coefficient:
=PEARSON(dataset of independent variable, dataset of dependent variable)
This formula calculates the r value that then needs to be compared to a critical value table
(see critical value tables here).
The critical value table shows what values of r are significant. The strength of the test
depends on the number of data points included (n), so the critical value also changes with
n. Keep in mind that one data point has two measurements. If you were comparing, for
example, height and weight of plants, and measured height and weight of 12 plants, then n
would be 12, not 24. You simply need to compare the r value that you calculated to the
critical value corresponding to the number of data points you used. If the absolute value of
r is greater than the critical value, then the correlation is statistically significant and not
likely to be due to chance.
For example, if your r value was -0.65 and you had 10 data points, you would compare that
r value to the critical value 0.521 from the critical value table and conclude that the
absolute value of r is greater than the critical value. Therefore the correlation is statistically
significant.
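If you have access to Python, scipy's pearsonr calculates the same r value and also reports a p-value directly, so you can compare that to 0.05 instead of looking r up in the critical value table. The ages and heights below are made-up example measurements.

from scipy.stats import pearsonr

age = [4, 7, 10, 14, 18, 22, 27, 31, 36, 40]                        # example ages (years)
height = [1.2, 3.0, 4.8, 7.5, 10.2, 13.0, 17.1, 20.0, 23.5, 26.8]   # example heights (m)

r, p_value = pearsonr(age, height)
print(r)        # Pearson correlation coefficient
print(p_value)  # if below 0.05, the correlation is statistically significant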
SPEARMAN RANK CORRELATION TEST
The Spearman rank correlation is denoted by the symbol ρ (rho) or rs. Analogous to how
the Mann-Whitney U test compares rank instead of the actual values of the data, the
Spearman rank test determines a correlation in the data by looking at the rank order of the
data instead of its actual values. Use this test instead of the Pearson correlation test if any
of the following are true:
• The data are not numerical but are ordinal.
• The data are not normally distributed.
• The data do not appear to have a linear trend, but do trend either positively or
negatively.
• There are apparent outliers in the data.
The null and alternative hypotheses are the same as for the Pearson correlation test:
H0: There is no correlation in the data. The observed trend is due to random chance.
HA: The observed trend is a statistically significant correlation.
To calculate the test statistic, simply enter your data in the calculator at this site, and
interpret the p-value as in the other tests:
https://www.socscistatistics.com/tests/spearman/default2.aspx
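scipy also offers this test if you prefer it to the online calculator. The values below are arbitrary example data.

from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]    # independent variable (example)
y = [2, 1, 4, 6, 5, 9, 8, 12]   # dependent variable (example)

rho, p_value = spearmanr(x, y)
print(rho)      # Spearman rank correlation coefficient
print(p_value)  # compare to the 0.05 threshold as in the other tests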
OTHER STATISTICAL TESTS
Though making comparisons between groups and determining correlations between
variables are the most common statistical tests, there are many other ways to test data for
significance. The chi-squared test for goodness of fit and the nearest neighbour analysis are
two that you might need for biology and geography respectively.
THE CHI-SQUARED TEST
This test determines whether data fit a pattern or model. The data for the test need to be
categorical. One of the simplest versions is a test for an association between species, that
is, whether the locations of one species of immobile organism are associated with the
locations of another species. A chi-squared test can also be used in genetics to determine
whether the frequencies of genotypes match the expected ratios. For more information
about this type of test and how to calculate and interpret the chi-squared value, refer to
your Oxford biology textbook, pages 215 (association between species) and 453
(genotype ratios).
Here is a demonstration of how you can use Excel to calculate chi-squared:
https://youtu.be/o0VhMWeotFg
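As a small illustration of the genetics use mentioned above, scipy's chisquare function compares observed counts to expected counts. The counts and the 9:3:3:1 expected ratio below are a made-up textbook-style example, not data from this handbook.

from scipy.stats import chisquare

# Observed counts of four phenotypes (example data)
observed = [90, 29, 31, 10]

# Expected counts for a 9:3:3:1 ratio with the same total (160)
total = sum(observed)
expected = [total * 9 / 16, total * 3 / 16, total * 3 / 16, total * 1 / 16]

chi2, p_value = chisquare(observed, f_exp=expected)
print(chi2, p_value)  # a p-value above 0.05 means the counts are consistent with the ratio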
NEAREST NEIGHBOUR ANALYSIS
In geography, the nearest neighbour analysis can be used to determine if the spacing
between points is random, clustered, or ordered. First, the data is collected by measuring
the distance between each location (e.g. each tree) and its nearest neighbour. The nearest
neighbour index (NNI or Rn) is then calculated according to the following formula:

NNI = 2 × D̄ × √(n / A)

where D̄ is the average nearest neighbour distance,
n is the number of observations, and
A is the total area studied.
The NNI value can indicate that the points are clustered, random, or ordered, depending on
its value:
NNI = 0 : The points are completely clustered
NNI = 1.0 : The points have a completely random distribution
NNI = 2.15 : The points are distributed uniformly.
Given the number of data points, you can compare the NNI value to the critical value table.
If NNI is below the number for clustered points, then there is statistically significant
clustering. If it is above the value for uniformity, then the points are statistically
significantly uniform. If it lies between the two values, the points are randomly distributed.

For example, the following data were collected by measuring the distance from each tree
to its nearest neighbour in a 36 m² park (a sketch map of the park shows the nine
numbered trees):

Tree number   Nearest tree   Distance between trees (m)
1             2              1.1
2             1              1.1
3             2              1.3
4             7              0.4
5             3              1.2
6             7              1.0
7             4              0.4
8             9              2.0
9             8              2.0

D̄ = 1.2
n = 9
A = 36 m²
Since n = 9, A = 36 m², and D̄ = 1.17 m (rounded to 1.2), the NNI value is calculated as:

NNI = 2 × 1.2 × √(9 / 36) = 1.2
For n= 9, the critical value table gives 0.713 as the limit below which the points would be
considered clustered, and 1.287 as the upper limit, above which the data would be
considered ordered. You can therefore conclude that the trees are randomly dispersed.
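The NNI calculation is easy to script as well. This sketch reuses the nine nearest-neighbour distances from the table above, taken as given rather than recomputed from coordinates.

import math

# Nearest neighbour distances from the table above (metres)
distances = [1.1, 1.1, 1.3, 0.4, 1.2, 1.0, 0.4, 2.0, 2.0]

n = len(distances)          # number of observations (9)
area = 36.0                 # total area studied (m^2)
d_bar = sum(distances) / n  # average nearest neighbour distance

nni = 2 * d_bar * math.sqrt(n / area)
print(round(d_bar, 2), round(nni, 2))  # about 1.17 and 1.17 (the hand calculation rounds to 1.2)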
CRITICAL VALUE TABLES

Pearson r

n     Critical value
2     0.988
3     0.900
4     0.805
5     0.729
6     0.669
7     0.622
8     0.582
9     0.549
10    0.521
11    0.497
12    0.476
13    0.458
14    0.441
15    0.426
16    0.412
17    0.400
18    0.389
19    0.378
20    0.369
21    0.360
26    0.323
31    0.296
36    0.275
41    0.257
46    0.243
51    0.231
61    0.211
71    0.195
81    0.183
91    0.173
101   0.164

Nearest neighbour index critical values

n     clustered   uniform
2     0.392       1.608
3     0.504       1.497
4     0.570       1.430
5     0.616       1.385
6     0.649       1.351
7     0.675       1.325
8     0.696       1.304
9     0.713       1.287
10    0.728       1.272
11    0.741       1.259
12    0.752       1.248
13    0.762       1.239
14    0.770       1.230
15    0.778       1.222
16    0.785       1.215
17    0.792       1.209
18    0.797       1.203
19    0.803       1.197
20    0.808       1.192
21    0.812       1.188
22    0.817       1.183
23    0.821       1.179
24    0.825       1.176
25    0.828       1.172
26    0.831       1.169
27    0.835       1.166
28    0.838       1.163
29    0.840       1.160
30    0.843       1.157
31    0.846       1.155
32    0.848       1.152
33    0.850       1.150
34    0.853       1.148
35    0.855       1.145
36    0.857       1.143
37    0.859       1.141
38    0.861       1.140
39    0.862       1.138
40    0.864       1.136
41    0.866       1.134
42    0.867       1.133
43    0.869       1.131
44    0.870       1.130
45    0.872       1.128
50    0.878       1.122
60    0.889       1.111
70    0.897       1.103
80    0.904       1.096
90    0.909       1.091
100   0.914       1.086
HOW TO USE SPREADSHEET SOFTWARE
You will need to spend some time learning how to use your brand of software on your
platform, as they all differ somewhat. Excel© (subscription based) and LibreOffice©
(free and open source) are both good options, but you could also use Numbers© on MacOS,
or Google© Sheets, though the latter has some significant limitations. Many of these
calculations can also be performed on a TI-Nspire© handheld. Searching the web, or using your software's
help function will usually yield quick answers to tricky problems. Here are some tips and
resources that might help you on your way:
Tip: Be sure you know whether your software expects a decimal point ( . ) or a comma ( , )
as a separator. If you use the wrong one, the computer does not recognise your data as
numbers, but instead treats it as text, which causes all calculations to fail. Use the 'search
and replace' function of your software to change all of them at once.
Tip: Use this site to find the appropriate function in your software’s language:
http://www.excelfunctions.eu
The Moodle site 7AB Tabellenkalkulation has guides for performing simple calculations
and making diagrams in Excel© and LibreOffice© here:
https://moodle.tsn.at/course/view.php?id=36089
At Mr. Schauer’s youtube channel you can find a handful of videos on data analysis:
bit.ly/mrschauersyoutube
Here are some instructions on how to make a histogram in Excel:
https://support.office.com/en-us/article/create-a-histogram-85680173-064b-4024-b39d-80f17ff2f4e8
For more information on calculating quartiles and the inter-quartile range using excel, visit
this site:
https://www.statisticshowto.com/probability-and-statistics/interquartile-range/#IQRExcel
MORE HELP AND FURTHER
READING
For help reviewing how to use significant figures appropriately, watch these videos:
An introduction to significant figures: https://youtu.be/eCJ76hz7jPM
Rules to determine significant figures: https://youtu.be/eMl2z3ezlrQ
For lots of in-depth information on the geography Internal Assessment, visit these pages:
https://www.thinkib.net/geography/page/22606/ia-student-guide
https://sites.google.com/site/geographyfais/fieldwork
For more help with biological statistics for the IA, visit this site:
https://www.biologyforlife.com/statistics.html
For a very useful handbook of basic statistics, look for a copy of this book:
St. John, P. and Richardson, D. A. (1996). Methods of Statistical Analysis of Fieldwork Data. Geographical Association.