ARE OBSERVATIONS OBTAINED
DIFFERENT?
• You use different statistical tests for different problems.
• We will examine some basic tests (χ², t-test, regression,
ANOVA, ANCOVA)
• We expect you to use these basic tests in your research.
• Your research project should not be so complicated that
more advanced tests are required.
• Always state your hypothesis – what you are testing.
BASIC PREMISE OF STATISTICAL TESTING:
Null Hypothesis: The coin is fair
Toss a coin 100 times
A fair coin: mean = 50 heads,
sd = 5 heads (√(½ × ½ × 100))
You observe 60 heads. Is the coin fair?
Distance from the mean = (60 – 50)/5 = 2 sd
A deviation of 2 sd or more has about a 5% chance,
but in one direction only, so about a 2.5% chance (5%/2)
[Figure: frequency distribution of the proportion of heads]
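The coin arithmetic above can be checked with a short sketch. It uses only Python's standard library, and the function and variable names are illustrative rather than part of the course materials. It computes the sd of the number of heads, the distance of the observation from the mean in sd units, and the one-tailed probability, both from the normal approximation and from the exact binomial distribution.

```python
import math

def coin_test(heads, tosses, p_fair=0.5):
    """Return (mean, sd, z, normal one-tailed p, exact one-tailed p)."""
    mean = tosses * p_fair
    sd = math.sqrt(p_fair * (1 - p_fair) * tosses)
    z = (heads - mean) / sd
    # Normal approximation: area of the upper tail beyond z.
    p_normal = 0.5 * math.erfc(z / math.sqrt(2))
    # Exact binomial: probability of `heads` or more heads.
    p_exact = sum(
        math.comb(tosses, k) * p_fair**k * (1 - p_fair)**(tosses - k)
        for k in range(heads, tosses + 1)
    )
    return mean, sd, z, p_normal, p_exact

mean, sd, z, p_normal, p_exact = coin_test(60, 100)
print(f"mean = {mean:.0f}, sd = {sd:.0f}, z = {z:.1f}")
print(f"one-tailed p: normal = {p_normal:.4f}, exact = {p_exact:.4f}")
```

Both one-tailed values fall between 2% and 3%, in line with the rough 2.5% figure.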
          NULL HYPOTHESIS ACCEPTED   NULL HYPOTHESIS REJECTED
TRUE      CORRECT                    TYPE I ERROR
FALSE     TYPE II ERROR              CORRECT
What if you set the probability to claim it to be unfair to be 5%?
What if you set the probability to claim it to be unfair to be 25%?
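The two questions above are about the trade-off between Type I and Type II errors. As a rough sketch (standard-library Python; the function names and the "truly unfair" coin with a 60% chance of heads are assumptions made only for illustration), the code below finds the smallest number of heads out of 100 at which you would call the coin unfair for a chosen one-tailed Type I error rate, then the resulting Type II error rate for the assumed unfair coin.

```python
import math

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def rejection_threshold(alpha, n=100, p_fair=0.5):
    """Smallest head count k with P(X >= k | fair coin) <= alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p_fair) <= alpha:
            return k
    return n + 1  # no head count is extreme enough to reject

def type_ii_error(threshold, n=100, p_true=0.6):
    """P(fail to reject) when the coin really lands heads with p_true.

    p_true = 0.6 is a hypothetical unfair coin, not a value from the slides.
    """
    return 1 - upper_tail(threshold, n, p_true)

for alpha in (0.05, 0.25):
    k = rejection_threshold(alpha)
    print(f"alpha = {alpha}: call unfair at {k}+ heads, "
          f"Type II error = {type_ii_error(k):.2f}")
```

Raising the allowed Type I error from 5% to 25% lowers the head count needed to call the coin unfair, so fair coins are wrongly rejected more often while unfair coins are missed less often.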
NONPARAMETRIC TESTS:
(data does not have to be normally distributed)
Data must be counts and you test
proportional distribution of counts.
Null hypothesis: no difference
in proportion of red among strata
χ² CONTINGENCY TABLE:
Observed counts (expected counts in parentheses)

SPECIES    STRATA #1   STRATA #2   STRATA #3   TOTAL
RED        8 (2.94)    1 (3.24)    1 (3.82)    10
NOT RED    2 (7.06)    10 (7.76)   12 (9.18)   24
TOTAL      10          11          13          34
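Expected for each cell = (R × C)/TOTAL, where R and C are the cell's row and column totals
Proportion red: strata #1 = 8/10; strata #2 = 1/11; strata #3 = 1/13
$\chi^2 = \sum \frac{(O - E)^2}{E} = 8.71 + 3.63 + 1.55 + 0.65 + 2.08 + 0.87 = 17.49$
P < 0.001; df = (r – 1)(c – 1) = 2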
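The expected counts and the χ² statistic can be reproduced with a few lines of standard-library Python. This is a minimal sketch with illustrative names. The closed-form p value used at the end holds only for df = 2, which happens to be the case here.

```python
import math

def chi_square(table):
    """Return (chi-square, df, expected counts) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell = row total x column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

observed = [
    [8, 1, 1],     # RED in strata #1, #2, #3
    [2, 10, 12],   # NOT RED in strata #1, #2, #3
]
chi2, df, expected = chi_square(observed)
# Upper-tail probability exp(-chi2 / 2): valid only because df == 2 here.
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.5f}")
```

Without intermediate rounding the statistic comes out near 17.47. The 17.49 above is the sum of terms already rounded to two decimals.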
χ² CONTINGENCY TABLE:
Make a spreadsheet with the table categories and the count in each,
then have MYSTAT use the counts as frequencies
(Data … Case weighting … By frequencies).
Depending on the table, use One-way frequency tables
(one category – e.g., tree type)
or Tables (more than one category – e.g.,
tree type and strata) in Analyze in MYSTAT.
PARAMETRIC TESTS
(data is normally distributed)
Data do not have to be counts.
Easier to see differences
(more powerful) than nonparametric
statistics.
Null hypothesis: no difference
in proportion of red between strata
#1 and #2.
[Figure: frequency distributions of proportion red for strata #1, #2, and #3]
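T-TEST:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2)\sqrt{n_1 n_2/(n_1 + n_2)}}{\sqrt{[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)}}$
t = [(0.71)(1.41)]/0.214 = 4.68
Proportion red (mean ± sd): strata #1 = 0.79 ± 0.25; strata #2 = 0.08 ± 0.17; strata #3 = 0.08 ± 0.17
P < 0.005, degrees of freedom = n1 + n2 – 2 = 6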
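The pooled-variance t statistic can be computed directly from the two means, standard deviations, and sample sizes. The sketch below is illustrative (the function name is an assumption) and takes n = 4 plots per stratum, as implied by the 6 degrees of freedom.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (mean1 - mean2) * math.sqrt(n1 * n2 / (n1 + n2)) / pooled_sd
    return t, df

# Proportion red in strata #1 and #2 (mean, sd), assuming 4 plots per stratum.
t, df = pooled_t(0.79, 0.25, 4, 0.08, 0.17, 4)
print(f"t = {t:.2f}, df = {df}")
```

The result differs from 4.68 only in the second decimal place, because the hand calculation rounds the intermediate terms.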
T-TEST:
Use Hypothesis testing in Analyze
in MYSTAT for means
PARAMETRIC TESTS
Null hypothesis: no difference
in relative abundance of red
between strata #1 and #2 for
matched plots based on similarity.
EVEN MORE POWERFUL IF A PRIORI
BASIS TO PAIR OBSERVATIONS.
PAIRED T-TEST:
Paired differences (strata #1 – strata #2): 0.5 – 0 = 0.5; 1.0 – 0 = 1.0;
1.0 – 0.33 = 0.67; 0.67 – 0 = 0.67
mean = 0.71, sd = 0.21
t = 0.71/(0.21/√4) = 6.76
P < 0.01, degrees of freedom = n – 1 = 3
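A paired t-test is a one-sample t-test on the differences within each pair. The minimal sketch below (standard-library Python, illustrative function name) recomputes the statistic from the four paired differences listed above.

```python
import math
import statistics

def paired_t(differences):
    """One-sample t statistic on paired differences, plus degrees of freedom."""
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample sd (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Differences in proportion red for the four matched pairs of plots.
differences = [0.5 - 0, 1.0 - 0, 1.0 - 0.33, 0.67 - 0]
t, df = paired_t(differences)
print(f"t = {t:.2f}, df = {df}")
```

Any small difference from the hand calculation comes from rounding the mean and sd to two decimals before dividing.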
PARAMETRIC TESTS
Null hypothesis: no difference
in absolute abundance of red
between strata #1 and #2.
Now use numbers not proportions.
T-TEST:
Strata #1: mean = 2.0, sd = 0.82, n = 4
Strata #2: mean = 0.25, sd = 0.5, n =4
t = [(2 – 0.25)(1.41)]/ 0.68 = 3.63
P < 0.01, degrees of freedom = 6
STATISTICAL TESTS
Null hypothesis: there is no relationship
between red and blue + green in plots.
REGRESSION ANALYSIS:
[Figure: scatterplot of RED (y-axis) against BLUE or GREEN (x-axis) for plots in strata #1, #2, and #3]
RED = 2.33 – 0.75(BLUE or GREEN)
r2 = 0.75, r = -0.88
Degrees of freedom = 12 – 2 = 10
P < 0.001
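The fitted line and the significance of the correlation can be explored with a short sketch. It is an illustration only: the function names are assumptions, the coefficients and r are the values reported above, and the t statistic uses the standard relation t = r√(n – 2)/√(1 – r²).

```python
import math

def predict_red(blue_or_green, intercept=2.33, slope=-0.75):
    """Predicted RED per plot from the fitted line reported above."""
    return intercept + slope * blue_or_green

def t_for_correlation(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r**2), df

t, df = t_for_correlation(-0.88, 12)
print(f"predicted RED at BLUE or GREEN = 2: {predict_red(2):.2f}")
print(f"t = {t:.2f}, df = {df}")
```

A large negative t with 10 degrees of freedom is what makes the negative slope significant.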
REGRESSION ANALYSIS:
Use Regression … Linear …
Least squares in Analyze
in MYSTAT
Select dependent (y) and
independent (x) variables
PARAMETRIC TESTS
(data is normally distributed)
WHAT IF MULTIPLE COMPARISONS
OF A CATEGORY (ANOVA)
Null hypothesis: no difference
in relative abundance of red
among all strata.
Three possible t-test comparisons:
#1 vs. #2
#1 vs. #3
#2 vs. #3
PROBLEM: As number of comparisons
increases, the likelihood of finding at
least one significant difference by chance
increases. ANOVA takes this into account
to compare differences in mean values.
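The multiple-comparison problem can be made concrete with a small calculation. This sketch assumes the comparisons are independent, which is only approximately true for pairwise t-tests on the same strata, and the function name is illustrative.

```python
def familywise_error(alpha, comparisons):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** comparisons

for k in (1, 3, 10):
    print(f"{k} comparisons at alpha = 0.05: {familywise_error(0.05, k):.3f}")
```

With three comparisons at the 5% level, the chance of at least one spurious "significant" result is already above 14%.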
1-WAY ANOVA:
F = 19.75
df = 2, 9 (strata -1, samples – strata)
p < 0.001
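A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below is a minimal standard-library illustration with assumed function names and hypothetical data, because the raw plot values are not listed here. The closed-form p value applies only when there are three groups (2 degrees of freedom between groups).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

def p_value_three_groups(f, df_within):
    """Upper-tail p for F with df_between = 2 (closed form for this case only)."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)

# Hypothetical data: three groups of three plots (not the course data).
f, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f:.2f}, df = {df1}, {df2}, p = {p_value_three_groups(f, df2):.4f}")
# The F and df reported above for the strata comparison:
print(f"p for F = 19.75, df = 2, 9: {p_value_three_groups(19.75, 9):.5f}")
```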
ANOVA:
Use Analysis of variance …
Estimate model in Analyze
in MYSTAT
Select continuous dependent (y)
variable and categorical
independent (x) variables
MULTIPLE COMPARISONS (ANOVA):
(Which specific differences are significant?)
Post-hoc analysis:
Must compensate for number of
comparisons and the fact that a
difference is already known to be
significant.
Bonferroni test:
(t-test adjusted for # of comparisons)
#1 vs. #2 – p < 0.001
#1 vs. #3 – p < 0.001
#2 vs. #3 – p < 1.0
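The idea behind a Bonferroni adjustment can be sketched in a few lines: each unadjusted p value is multiplied by the number of comparisons and capped at 1. This is the textbook form of the correction. The unadjusted p values below are hypothetical, and SYSTAT's own procedure may differ in detail (for example, in how it estimates the error variance).

```python
def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

# Hypothetical unadjusted p values for the three pairwise t-tests.
unadjusted = {"#1 vs. #2": 0.0002, "#1 vs. #3": 0.0002, "#2 vs. #3": 0.9}
adjusted = bonferroni(list(unadjusted.values()))
for label, p_adj in zip(unadjusted, adjusted):
    print(f"{label}: adjusted p = {p_adj:.4f}")
```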
ANOVA – POST HOC:
(cannot do with MYSTAT, but will with SYSTAT)
Use Analysis of variance …
Estimate model …
Hypothesis test in Analyze
in SYSTAT
MULTIPLE COMPARISONS (ANOVA):
(several independent categorical variables)
Null hypothesis: no difference
in relative abundance of red
between strata and with distance into
the woods.
TWO-WAY ANOVA:
[Figure: plots in strata #1, #2, and #3 at two distances from the edge (near and far)]
Strata:
F = 15.65; df = 2,6; p < 0.005
Distance:
F = 0.12; df = 1,6; p < 0.74
Strata X Distance Interaction:
F = 0.51; df = 2,6; p < 0.63
COULD HAVE N-WAY ANOVA,
BUT YOUR PROJECT SHOULD NOT
EXCEED A 2-WAY.
THE INTERACTION TERM’S MEANING
(no variety)
                 LOCATION
SEASON      A      B      C      mean
I           1      2      3      2
II          2      2      2      2
III         3      2      1      2
mean        2      2      2      2
NO MAIN EFFECTS (SEASON or LOCATION – no differences)
INTERACTION IS SIGNIFICANT (greatest at A:III and C:I)
THE INTERACTION TERM’S MEANING
(wider variety)
                 LOCATION
SEASON      A      B      C      mean
I           1      2      3      2
II          4      5      6      5
III         7      8      9      8
mean        4      5      6      5
MAIN EFFECTS (SEASON or LOCATION -- differences)
NO INTERACTION (highest always in C and III)
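The two examples above can be taken apart numerically. In this sketch (illustrative names; it assumes rows are seasons I to III and columns are locations A to C), each cell mean is split into a grand mean, a season effect, a location effect, and whatever is left over, which is the interaction.

```python
def decompose(table):
    """Split a two-way table of cell means into main effects and interaction.

    Rows are assumed to be seasons (I, II, III); columns locations (A, B, C).
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_effects = [sum(row) / n_cols - grand for row in table]
    col_effects = [sum(col) / n_rows - grand for col in zip(*table)]
    # Interaction = what remains after removing grand mean and main effects.
    interaction = [
        [table[i][j] - grand - row_effects[i] - col_effects[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return row_effects, col_effects, interaction

no_variety = [[1, 2, 3], [2, 2, 2], [3, 2, 1]]
wider_variety = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

for name, table in (("no variety", no_variety), ("wider variety", wider_variety)):
    rows, cols, inter = decompose(table)
    print(name, "| season effects:", rows, "| location effects:", cols)
    print("  interaction residuals:", inter)
```

In the first table every main effect is zero and all of the pattern sits in the interaction residuals. In the second table the residuals are all zero and the pattern is carried entirely by the main effects.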
MULTIPLE COMPARISONS (ANCOVA):
(several independent variables: one categorical and one continuous)
Null hypothesis: no difference
in relative abundance of red
with blue + green and distance into
the woods (assume equal slopes).
[Figure: RED (#/plot) plotted against BLUE + GREEN (#/plot), with plots grouped by DISTANCE from the edge (FAR, NEAR)]
ANCOVA:
Blue + Green:
F = 36.10; df = 1,9; p < 0.0002
Distance:
F = 0.78; df = 1,9; p < 0.40
Interaction (slope):
F = 0.08; df = 1,8; p ≈ 0.78
COULD HAVE N-WAY ANCOVA.
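What "assume equal slopes" means in ANCOVA can be illustrated with a small sketch. It does not compute the F statistics. It only fits one common slope for the covariate across groups and then compares group means after adjusting them to the same covariate value. The data are hypothetical and the function name is an assumption.

```python
def equal_slopes_ancova(groups):
    """Pooled within-group slope and covariate-adjusted group means.

    `groups` maps a group label to (x values, y values).
    """
    def mean(values):
        return sum(values) / len(values)

    all_x = [x for xs, _ in groups.values() for x in xs]
    grand_x = mean(all_x)
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx  # one common slope shared by all groups
    adjusted = {
        label: mean(ys) - slope * (mean(xs) - grand_x)
        for label, (xs, ys) in groups.items()
    }
    return slope, adjusted

# Hypothetical plots: x = BLUE + GREEN (#/plot), y = RED (#/plot).
data = {
    "NEAR": ([0, 1, 2], [3, 2, 1]),
    "FAR": ([1, 2, 3], [3, 2, 1]),
}
slope, adjusted = equal_slopes_ancova(data)
print(f"common slope = {slope:.2f}, adjusted means = {adjusted}")
```

In this made-up example both groups have the same raw mean of RED, yet the adjusted means differ once the covariate is taken into account. Separating the two effects in this way is the purpose of the analysis.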
ANCOVA:
Use Analysis of variance …
Estimate model in Analyze
in MYSTAT. In SYSTAT use
General linear model …
Estimate model in Analyze
Select continuous dependent (y)
variable and categorical
independent (x1) variable and
covariate (x2). In SYSTAT, create
interaction term to test slope.
DATA TRANSFORMATIONS
(can normalize data or make it continuous so parametric statistics can be used,
or make data linear for regression)
• Data are not always normally distributed,
but a transformation (e.g., log) may make them normal. If they cannot be
normalized, then you must use non-parametric statistics (less powerful).
• Data are not always continuous:
percentages or proportions are not continuous because they cannot
be less than 0 or greater than 100 or 1. To make them continuous
from 0 to infinity or –infinity to +infinity, you can use transforms:
arcsine transform = arcsin(√proportion);
logarithmic transform = log(proportion)*;
logit transform = log(proportion/(1 – proportion))*.
This stretches both tails and compresses the peak to approximate
a continuous normal distribution.
* If some proportions = 0 or 1, then add a small constant to all values (e.g., 0.001)
• Data for regression are not always linear:
various transformations, especially log x, log y, or both, can
transform a curve into a straight line. What do logarithmic transforms
imply about the linear function?
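The proportion transforms listed above can be sketched as follows. This is an illustration with assumed function names. It takes the arcsine transform to be the usual arcsine of the square root of the proportion, and it leaves the choice of small constant to the caller.

```python
import math

def arcsine_transform(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))

def log_transform(p, constant=0.0):
    """Log of a proportion; pass a small constant if any value is 0."""
    return math.log(p + constant)

def logit_transform(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

proportions = [0.08, 0.5, 0.79]
for p in proportions:
    print(f"p = {p}: arcsine = {arcsine_transform(p):.3f}, "
          f"log = {log_transform(p):.3f}, logit = {logit_transform(p):.3f}")
```

If a constant is needed for any value, apply the same constant to every value in the data set, for example `log_transform(p, 0.001)`.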
DATA TRANSFORMATIONS
Use Data … Transform …
Let in MYSTAT.
ARE OBSERVATIONS OBTAINED
DIFFERENT?
• Different statistical tests for different problems.
• You will use these basic tests in your research (χ², t-test,
Regression, ANOVA, ANCOVA)
• Your research project should not be so complicated that
more advanced tests are required.
• Always graph your data and state your hypothesis.
Meadow vole
(Microtus pennsylvanicus)
Yellow-bellied marmot
(Marmota flaviventris)
UNDERC-WEST
(National Bison Range)
USE MYSTAT WITH DATA FILES CREATED
LAST WEEK
(be sure to set 6 decimal places -- Edit … Options …
Output in MYSTAT so p values are exact)
WITH MYSTAT ANSWER THESE QUESTIONS:
(you will use χ², regression, t-test, 2-way ANOVA,
ANCOVA)
• Does snap-trapping lead to a sex bias in Microtus?
• What is the relationship between length and mass for Microtus?
(hint: need to use Data … Transform … Let)
• Do Microtus and Marmota exhibit similar length and mass growth relationships?
(hint: think about question above)
• Does Marmota mass vary with month? Explain ecologically what you see.
• Does reproductive status of female Microtus differ with mass? Why do you
observe this? (hint: need to use Data … Select cases)
• Does the relationship between reproductive status and mass differ between male and female Microtus?
Due in two weeks!