standard deviation

Statistical Analysis
Topic 1
Statistics
1.1.1 State that error bars are a
graphical representation of the
variability of data.
 1.1.2 Calculate the mean and standard
deviation of a set of values.
 1.1.3 State that the term standard
deviation is used to summarize the
spread of values around the mean, and
that 68% of values fall within one
standard deviation of the mean.




1.1.4 Explain how the standard deviation is
useful for comparing the means and spread
of data between two or more samples.
1.1.5 Deduce the significance of the
difference between two sets of data using
calculated values for t and the appropriate
tables.
1.1.6 Explain that the existence of a
correlation does not establish that there is a
causal relationship between two variables.
What is data?
Information, in the form of facts
or figures obtained from
experiments or surveys, used
as a basis for making
calculations or drawing
conclusions
Encarta dictionary
2 types of Data
Qualitative
Quantitative
Statistics in Science
Data can be collected about a population (surveys)
Data can be collected about a process/mechanism (experimentation)
Qualitative Data



Information that relates to characteristics
or description (observable qualities)
Information is often grouped by descriptive
category
Examples




Species of plant
Type of insect
Shades of color
Rank of flavor in taste testing
Remember: qualitative data can be “scored” and
evaluated numerically
Qualitative data, manipulated
numerically


Example: survey results on teens and the need for environmental action, with data presented in proportion or % form
Quantitative data
Quantitative: measured using a naturally occurring numerical scale
Examples
Chemical concentration
Temperature
Length
Weight, etc.
Quantification

Measurements are often displayed
graphically
Quantitation = Measurement



In data collection for Biology, data must be
measured carefully, using laboratory
equipment
(e.g., timers, metre sticks, pH meters, balances, pipettes, etc.)
The limits of the equipment used add some
uncertainty to the data collected. All
equipment has a certain magnitude of
uncertainty. For example, is a ruler that is
mass-produced a good measure of 1 cm?
1mm? 0.1mm?
For quantitative testing, you must indicate
the level of uncertainty of the tool that
you are using for measurement.
Finding the level of
uncertainty

As a rule of thumb, if not specified, use ± 1/2 of the
smallest measurement unit (e.g., a metric ruler is marked
in 1 mm divisions, so the limit of uncertainty of the
ruler is ± 0.5 mm).

If the room temperature is read as 25 ºC with a
thermometer that is scored at 1-degree intervals, what is
the range of possible temperatures for the room?

Answer: 25 ± 0.5 ºC, i.e. between 24.5 and 25.5 ºC
Likewise, if you read 15 ºC, it may be between 14.5
and 15.5 ºC
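The rule of thumb above can be sketched in a few lines of Python. The function name `reading_range` and the 120 mm ruler reading are illustrative assumptions, not part of the original slides.

```python
def reading_range(reading, smallest_division):
    """Return (low, high) for a reading, assuming an uncertainty of
    plus or minus half of the smallest division on the instrument."""
    half = smallest_division / 2
    return (reading - half, reading + half)


# Thermometer scored at 1-degree intervals, read as 25 degrees C
print(reading_range(25, 1))

# Hypothetical length of 120 mm read from a ruler marked in 1 mm divisions
print(reading_range(120, 1))
```

The same function applies to any instrument, provided the smallest division is given in the same unit as the reading.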
Definition of Statistics

Branch of mathematics which allows us to
characterize large populations of data by randomly
sampling small portions of data from the whole.

Samples come from habitats, communities, biological
populations, or experimental investigations, and
enable us to draw conclusions about the larger
population.

Statistics measure the differences and relationships
between sets of data

Nothing is 100% certain in science!
Randomization

Valid conclusions about populations can
only be reached when samples are
drawn randomly.

Each member of the population must
have an equal and independent chance
of being sampled.

How might you ensure that populations
are randomly sampled?
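One possible answer, as a minimal sketch: number every member of the population and let a random number generator pick the sample. The population of 50 numbered plants is a hypothetical example, and the fixed seed is there only so the sketch gives the same result each time it is run.

```python
import random

# Hypothetical population: 50 numbered plants in a field plot
population = list(range(1, 51))

# Fixed seed only for repeatability of this sketch; omit it in real sampling
rng = random.Random(1)

# Draw 5 distinct members; every member has an equal chance of selection
sample = rng.sample(population, 5)
print(sample)
```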
Sample Size

The greater the number of samples drawn
from a population, the more
representative the sample is of that
population.

Replication refers to repeatedly
measuring a treatment in an experiment
to account for variation.
Factor: Amount of water per day
Treatments: 0.1 L, 0.5 L, 1.0 L
Number of replicates: 3 per treatment (replicates 1, 2 and 3 for each treatment)
Mean




An average of data
points
Central tendency of
the data
Find the mean of
the given data³:
Answer: 12999.4
Country               # of reported HIV cases
Argentina             27517
Bahamas               4548
Canada                19468
Dominican Republic    7167
Ecuador               6297
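As a quick check of the answer, the mean of the five values in the table can be computed with Python's standard library. The variable names are illustrative.

```python
from statistics import mean

# Reported HIV cases from the table on this slide
reported_cases = {
    "Argentina": 27517,
    "Bahamas": 4548,
    "Canada": 19468,
    "Dominican Republic": 7167,
    "Ecuador": 6297,
}

# Mean = sum of the values divided by the number of values
average = mean(list(reported_cases.values()))
print(average)  # 12999.4
```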
Range





A measure of the spread of
data
Difference between the
largest and the smallest
observed values
Find the range of the given
data:
Answer: 22969
If one data point were
unusually large or
unusually small, it would
have a great effect on the
range. Such points are
called outliers.
Country               # of reported HIV cases
Argentina             27517
Bahamas               4548
Canada                19468
Dominican Republic    7167
Ecuador               6297
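The range calculation, and the effect of an outlier on it, can be sketched as follows. The extra value of 100000 is a hypothetical outlier added only for illustration; it is not part of the data.

```python
# Reported HIV cases from the table on this slide
cases = [27517, 4548, 19468, 7167, 6297]

# Range = largest observed value minus smallest observed value
data_range = max(cases) - min(cases)
print(data_range)  # 22969

# Hypothetical outlier: one unusually large value stretches the range
with_outlier = cases + [100000]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)  # 95452
```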
Looking at Data

How accurate is the data? (How close
are the data to the “real” results?) A
systematic departure from the real
results is known as BIAS

How precise is the data? (All test
systems have some uncertainty, due to
limits of measurement) Estimation of
the limits of the experimental
uncertainty is essential.
the mean.
(=Replication!)
Comparing Averages
Now plot means together on a graph to visualize the
relationship between the two groups.
The size of our error bars depends on how spread out
the data is around the mean
Drawing error bars

The simplest way to draw an error bar is
to use the mean as the central point, and
to use the distance from the mean to the
measurement that is furthest from it as
the length of the bar on each side
(Figure labels: Average value, Value farthest from average, Calculated distance)
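A minimal sketch of this simplest error bar, assuming the ten sunlight heights from the bean plant slide later in this deck (read as the first value of each pair in that table). The function name `simple_error_bar` is illustrative.

```python
from statistics import mean


def simple_error_bar(values):
    """Return (low, centre, high), where the bar extends from the mean by
    the distance to the value farthest from the mean."""
    centre = mean(values)
    distance = max(abs(v - centre) for v in values)
    return centre - distance, centre, centre + distance


# Heights of bean plants grown in sunlight, in cm
sunlight = [124, 120, 153, 98, 123, 142, 156, 128, 139, 117]

low, centre, high = simple_error_bar(sunlight)
print(low, centre, high)  # 98 130 162
```

Here the value farthest from the mean of 130 is 98, so the calculated distance is 32 and the bar runs from 98 to 162.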
What do error bars suggest?

If the bars show extensive overlap, it is
likely that there is not a significant
difference between those values
Error bars


Graphical
representation of the
variability of data
Can be used to show
either the range of
data or the standard
deviation on a graph
Standard deviation

A measure of how the individual observations
of a data set are dispersed or spread out
around the mean.

Determined by a mathematical formula which is
programmed into your calculator.

In a normal distribution, about 68% of all
values lie within ±1 standard deviation of the
mean. This rises to about 95% for ±2 standard
deviations from the mean.
How is Standard Deviation
calculated?
With this formula!
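The formula image from this slide is not available here. As a sketch, assuming the sample ("n-1") form that the Excel instructions in this deck also use, the calculation can be written step by step in Python. The example values are hypothetical.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation using the n-1 method."""
    n = len(values)
    m = sum(values) / n                               # 1. find the mean
    squared_deviations = [(x - m) ** 2 for x in values]  # 2. square each deviation
    variance = sum(squared_deviations) / (n - 1)      # 3. divide the sum by n-1
    return sqrt(variance)                             # 4. take the square root


# Hypothetical data set, for illustration only
example = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(example), 2))  # 2.14
```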
How to calculate SD



TI-86: http://www.saintmarys.edu/~cpeltier/calcforstat/StatTI-86.html
TI-83 and 84: http://www.saintmarys.edu/~cpeltier/calcforstat/StatTI-83.html
In Microsoft Excel, type the following code into the
cell where you want the Standard Deviation result,
using the "unbiased," or "n-1" method:
=STDEV(A1:A30) (substitute the cell name of the
first value in your dataset for A1, and the cell name
of the last value for A30.)
Comparing the means and
standard deviation between two
or more samples
Height of bean plants in centimetres (±0.1 cm)
Sunlight      Shade
124           131
120           60
153           160
98            212
123           117
142           65
156           155
128           160
139           145
117           95
Total 1300    Total 1300
Mean (each group): 1300/10 = 130.0 cm
Answers

SD for sunlight data: 17.68 cm

SD for shade data: 47.02 cm


Wide variation makes us question experimental
design
Means alone are not sufficient in determining
whether two groups differ statistically from
one another.
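The two answers can be reproduced with Python's standard library, assuming the values on the bean plant slide are paired row by row as sunlight then shade (this pairing gives the stated total of 1300 for each group). `statistics.stdev` uses the n-1 method.

```python
from statistics import mean, stdev

# Heights of bean plants in cm, read as (sunlight, shade) pairs
sunlight = [124, 120, 153, 98, 123, 142, 156, 128, 139, 117]
shade = [131, 60, 160, 212, 117, 65, 155, 160, 145, 95]

for label, heights in (("sunlight", sunlight), ("shade", shade)):
    # Same mean for both groups, but very different spread
    print(label, mean(heights), round(stdev(heights), 2))
```

Both groups have a mean of 130 cm, so only the standard deviation shows how differently the two samples are spread.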
A typical normal distribution
curve
According to this curve:
One standard deviation away from the
mean in either direction on the
horizontal axis (the red area on the
preceding graph) accounts for
approximately 68 percent of the data in
this group.
Two standard deviations away from the
mean (the red and green areas)
account for roughly 95 percent of the
data.

Three Standard Deviations?

Three standard deviations (the red,
green and blue areas) account for
about 99.7 percent of the data.
(Horizontal axis of the graph: -3sd, -2sd, ±1sd, +2sd, +3sd)
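These percentages can be checked against an ideal normal distribution with Python's `statistics.NormalDist` (available from Python 3.8). The helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
```

Real samples only approximate these figures, and only when the data are roughly normally distributed.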
Significant difference between
two data sets using the t-test
The t-test compares two sets of data to see whether
the difference between their means could be due to
chance alone
Scientists like to be at least 95% certain of their
findings before drawing conclusions
Mean, SD, and sample size are used to calculate the
value of t
Degrees of freedom = sum of the sample sizes of the
two groups minus 2
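The slides do not state which form of the t-test is intended. As a sketch, assuming an unpaired t-test with pooled variance (the form whose degrees of freedom are the two sample sizes minus 2), t can be calculated from the mean, SD and sample size of each group. The function name and the input values are hypothetical.

```python
from math import sqrt


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired t-test with pooled variance, from summary statistics.
    Returns (t, degrees_of_freedom)."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by n-1
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical summary values, for illustration only
t, df = t_from_summary(15.0, 2.0, 3, 12.0, 2.0, 3)
print(round(t, 2), df)
```

A larger difference between the means, a smaller SD, or a larger sample size each give a larger value of t.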

T-test calculation
For all data values: http://www.graphpad.com/quickcalcs/ttest1.cfm
For means: http://www.dimensionresearch.com/resources/calculators/ttest.html

Worked example

Compare two groups of barnacles living
on a rocky shore. Measure the width
of their shells to see if a significant size
difference is found depending on how
close they live to the water. One group
lives between 0 and 10 metres from
the water level. The second group
lives between 10 and 20 metres above
the water level.

The width of the shells was measured in
millimetres. 15 shells were measured
from each group. The group closer to
the water has the larger mean shell
width, which suggests that barnacles
living closer to the water have larger
shells. If the value of t is 2.25, is that a
significant difference?
Steps to determining significant
difference when given value of t




Determine degrees of freedom (total number in both
sets minus 2)
Ex. 15 + 15 - 2 = 28

Use given value of t
Ex. 2.25

Use table of t values to determine probability
(p) of chance
Ex. p < 0.05 (less than 5%)

The confidence level is 95%
Ex. We are 95% confident that the difference
between barnacles is significant. Barnacles living
nearer the water have a significantly larger shell
than those living 10 metres or more away from
the water.
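The steps above can be sketched in Python. The small dictionary is an excerpt of two-tailed critical values for p = 0.05 as found in a standard t table; the function name `is_significant` is illustrative, and a full table would be needed for other degrees of freedom.

```python
# Two-tailed critical values of t at p = 0.05 (excerpt from a standard t table)
CRITICAL_T_05 = {10: 2.228, 20: 2.086, 28: 2.048, 30: 2.042}


def is_significant(t, n1, n2, table=CRITICAL_T_05):
    """Return (degrees_of_freedom, True if |t| exceeds the critical value)."""
    df = n1 + n2 - 2
    return df, abs(t) > table[df]


# Barnacle example: 15 shells per group, t = 2.25
df, significant = is_significant(2.25, 15, 15)
print(df, significant)  # 28 True
```

Because 2.25 is larger than the critical value of 2.048 for 28 degrees of freedom, the probability that the difference is due to chance is below 5%.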
T table
One-tailed t-test: if your hypothesis is that one mean
is either larger or smaller than the other
Two-tailed t-test: if your hypothesis is that the two
means are not equal (not specifying larger or smaller)

Web-based t-test

http://graphpad.com/quickcalcs/ttest1.cfm
Correlation does not mean
causation
Experiments provide a test which
shows cause
 Observations without an experiment
can only show a correlation

Correlation test
Correlation signified by value of r
+1 (perfect positive correlation)
0 (no correlation)
-1 (perfect negative correlation)
http://www.socscistatistics.com/tests/pearson/
Note that r describes linear relationships only
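A minimal sketch of how r is calculated, assuming the Pearson correlation coefficient (the test linked above). All three data sets are hypothetical and chosen to give the three reference values of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # rises with x: r = 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))   # falls with x: r = -1.0

# A curved (non-linear) relationship: y = x squared
curve_x = [-2, -1, 0, 1, 2]
curve_y = [4, 1, 0, 1, 4]
print(pearson_r(curve_x, curve_y))      # r = 0.0 despite a clear pattern
```

The last case shows why r only describes linear relationships: y depends entirely on x, yet r is zero.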

Correlation or causation?
1. Cars with low gas mileage per gallon of fuel cause global warming.
2. Drinking red wine protects against heart disease.
3. Tanning beds can cause skin cancer.
4. UV rays increase the risk of cataracts.
5. Vitamin C cures the common cold.
Resources
¹ http://www.globalissues.org/TradeRelated/Facts.asp#src1
² http://www.globalissues.org/TradeRelated/Consumption.asp
³ http://www.who.int/globalatlas/includeFiles/generalIncludeFiles/listInstances.asp
Stephe Taylor, Bandung International School
