SP801 Session 8
1. Review: t-Tests and Basic ANOVA / Discussion of Computing Assignment ANOVA 1 (Part 2)
2. Computational Issues in Oneway ANOVA
   2.1 Computation from group means, standard deviations and n's
   2.2 Unequal n's and Multiple Comparisons
   2.3 The F-Test, Directional Hypotheses and Effect Sizes
3. Multi-Factor ANOVA
   3.1 The Basic Model
   3.2 Interpreting Interactions
   3.3 Multiple Comparisons
   3.4 Odds and Ends: Unequal n; "Types" of Sums of Squares; Fixed vs. Random Factors; Nested Designs
1. Review: t-Tests and Basic ANOVA / Discussion of Computing Assignment ANOVA 1 (Part 2)
After the first two sessions on experimental design and statistical methods of comparing means, you should be able to:

(Experiment)
• define experimental research, understand and discuss its purpose and rationale
• differentiate random assignment from random sampling
• describe within-subjects and between-subjects designs
• understand and use basic experimental terminology (random vs. systematic error, control techniques, field and quasi-experiments etc.)

(t-Test)
• understand the rationale of the two types of t-test as well as their underlying assumptions
• apply the appropriate t-test to problems of data analysis
• compute and report effect size measures related to t-tests

(ANOVA)
• understand the rationale and purpose of one-factor ANOVA for both independent groups and repeated measures, as well as their underlying assumptions
• use ANOVA terminology (effect, error, sum of squares etc.)
• apply the appropriate ANOVA procedures to problems of data analysis
• discuss the limitations of one-factor ANOVA

(Multiple Comparisons)
• discuss the problem of multiple testing
• differentiate between per-comparison and familywise error rates
• explain and apply the concept of alpha adjustment, in particular the Bonferroni approach
• differentiate between a-priori and post-hoc comparisons
• compute these comparisons using SPSS
• formulate and identify orthogonal sets of contrasts
2. Computational Issues in Oneway ANOVA

2.1 Computing a Oneway ANOVA from group means, standard deviations and n's
A Oneway ANOVA can be computed by hand from a table of
group means, standard deviations and n’s per condition, even if
the raw data are not known (e.g. from published data).
Computational example:
Group     n     Mean    Std. Dev.
_________________________________
1        10     4.50      1.08
2         8     6.50      1.20
3        12     5.00      0.85
SS_within is obtained by squaring each standard deviation, multiplying the result by (n_j − 1) and adding up these three values:

$$SS_{\text{within}} = (1.08^2 \times 9) + (1.20^2 \times 7) + (0.85^2 \times 11) = 28.525$$
SS_between can be found by (a) first computing the grand mean, (b) then the squared deviations of each group mean from the grand mean, and (c) then adding these up, each squared deviation weighted by the appropriate n_j:

(a) $\text{Grand Mean} = \dfrac{4.5 \times 10 + 6.5 \times 8 + 5 \times 12}{10 + 8 + 12} = 5.2333$

(b) $(M_1 - \text{Grand Mean})^2 = (4.5 - 5.2333)^2 = 0.5377$
    $(M_2 - \text{Grand Mean})^2 = (6.5 - 5.2333)^2 = 1.6045$
    $(M_3 - \text{Grand Mean})^2 = (5.0 - 5.2333)^2 = 0.0544$

(c) $SS_{\text{treatment}} = 0.5377 \times 10 + 1.6045 \times 8 + 0.0544 \times 12 = 18.8667$
As df_treatment = 2 and df_error = 27, we find:

$$F(2, 27) = \frac{18.8667 / 2}{28.525 / 27} = 8.929$$
The raw data on which this example is based can be found in the
file v:\courses\sp801\oneway.sav. Running a Oneway ANOVA
with this dataset will confirm that the F above is correct within
rounding precision.
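For readers who want to verify the arithmetic outside SPSS, here is a minimal Python sketch (an addition to these notes, assuming NumPy and SciPy) that reproduces the computation from the summary statistics alone:

```python
import numpy as np
from scipy import stats

n  = np.array([10, 8, 12])           # group sizes
m  = np.array([4.50, 6.50, 5.00])    # group means
sd = np.array([1.08, 1.20, 0.85])    # group standard deviations

ss_within  = np.sum(sd**2 * (n - 1))           # 28.525
grand_mean = np.sum(m * n) / np.sum(n)         # 5.2333
ss_between = np.sum(n * (m - grand_mean)**2)   # 18.8667

df_between = len(n) - 1                        # 2
df_within  = np.sum(n) - len(n)                # 27
F = (ss_between / df_between) / (ss_within / df_within)
p = stats.f.sf(F, df_between, df_within)       # upper-tail probability

print(F, p)                                    # F(2,27) = 8.93, p ≈ .001
```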
2.2 How do unequal n's affect computations in Oneway ANOVA and Multiple Comparisons?
Oneway ANOVA does not require equal sample sizes per
condition. The computation of sums of squares and degrees of
freedom is unaffected by inequality of sample sizes.
However, the computation of multiple comparisons needs to take inequality of sample sizes into account. To compute a-priori contrasts with unequal n's, use the harmonic mean across samples (n_h) to replace n in the equation for MS_contrast (where $L = \sum_j a_j M_j$ is the value of the contrast):

$$MS_{\text{contrast}} = SS_{\text{contrast}} = \frac{n_h L^2}{\sum_j a_j^2}$$
Note that n_h (for k groups, $n_h = k / \sum_j (1/n_j)$) deviates increasingly from the arithmetic mean of the n's (and thus from n in the case of equal sample sizes), the greater the differences among the sample n's are.
For example, with three groups and overall N = 60, we obtain:

n_h = 20.00  if  n1 = n2 = n3 = 20
n_h = 18.00  if  n1 = 15, n2 = 15, n3 = 30
n_h =  7.14  if  n1 = 5,  n2 = 5,  n3 = 50

⇒ Unequal n's mean a loss of power.
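A short Python sketch (an addition, not course material) reproduces these harmonic means and the MS_contrast formula; the group means and contrast weights in the last call are made up purely for illustration:

```python
import numpy as np

def harmonic_mean_n(ns):
    """Harmonic mean of k group sizes: k / sum(1/n_j)."""
    ns = np.asarray(ns, dtype=float)
    return len(ns) / np.sum(1.0 / ns)

def ms_contrast(ns, means, a):
    """MS_contrast = SS_contrast = n_h * L**2 / sum(a_j**2), with L = sum(a_j * M_j)."""
    L = np.dot(a, means)
    return harmonic_mean_n(ns) * L**2 / np.sum(np.square(a))

print(harmonic_mean_n([20, 20, 20]))   # 20.00
print(harmonic_mean_n([15, 15, 30]))   # 18.00
print(harmonic_mean_n([5, 5, 50]))     # 7.14

# hypothetical means and weights, just to show the call:
print(ms_contrast([15, 15, 30], [4.0, 5.0, 6.0], [1, 1, -2]))   # 27.0
```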
SPSS post-hoc tests use the harmonic mean as well when
sample sizes are unequal. This can lead to a disparity in
conclusions based on pairwise comparisons versus
homogeneous subsets of means.
Consider the following example using the dataset v:\courses\sp801\gssft.dat (a General Social Survey dataset containing only full-time employed respondents; see Norusis' SPSS 8.0 Guide to Data Analysis, Chapter 14).
These post-hoc tests are based on a Oneway ANOVA
examining the number of hours worked per week (HRS1) as a
function of education level (DEGREE). Table 1 shows the
descriptive statistics for five levels of DEGREE. Note the
greatly unequal sample sizes:
[Table 1: descriptive statistics (n, mean, standard deviation etc.) for HRS1 at each of the five levels of DEGREE; the SPSS table is garbled in this copy.]
Table 2 shows pairwise post-hoc tests using the Tukey procedure. This table indicates that people with a graduate degree
worked more hours than (a) people with less than a high school
degree and (b) people with a high school degree.
Table 3 shows homogeneous subsets, also computed using the
Tukey procedure. Note that only two groups are different when
the overall harmonic mean of n’s is used.
[Tables 2 and 3: Tukey pairwise comparisons (Table 2) and homogeneous subsets (Table 3) for HRS1 by DEGREE, including the SPSS note that the harmonic mean sample size is used; the tables are garbled in this copy.]
2.3 The F-Test, Directional Hypotheses and Effect Sizes

F-distributions are defined for values from 0 to positive infinity, and larger F's generally indicate a larger effect. Thus, only F values in the right-hand tail of the F-distribution lead to rejection of H0. Does this mean that F-tests are generally one-tailed?

No! In fact, directional hypotheses can only be tested via the F-distribution when numerator df = 1, i.e. when only two groups are compared. Recall that this test is equivalent to a t-test, with t² = F. Thus, the squared values of those t's found in the leftmost and rightmost 2.5% of a t-distribution, respectively, correspond to the F's found in the rightmost 5% of an F-distribution with 1 numerator df.

To do a one-tailed test based on F, one would simply look up the critical F-value in a table at p = 2α (e.g. at p = .10 for a one-tailed test at α = .05). Or, when interpreting computer output, one would divide the reported two-tailed probability by 2.

Any F-test with numerator df > 1 is necessarily non-directional, as differences among more than 2 means are involved, and a significant result simply indicates that some non-zero difference among the means exists.
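The df = 1 equivalence is easy to verify numerically; the following Python/SciPy sketch (an addition, with a made-up t value) shows that halving the F-test's two-tailed probability gives the one-tailed t probability:

```python
from scipy import stats

t, df_error = 2.5, 27        # hypothetical t value and error df
F = t**2                     # equivalent F with (1, df_error) df

p_two_tailed = stats.f.sf(F, 1, df_error)   # equals the two-tailed t probability
p_one_tailed = p_two_tailed / 2             # directional test (if the sign matches H1)

# cross-check against the t-distribution directly:
print(p_one_tailed, stats.t.sf(t, df_error))   # both ≈ .009
```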
A common effect size measure associated with F is η² (eta squared), which can be defined as:

$$\eta^2 = \frac{SS_{\text{treatment}}}{SS_{\text{treatment}} + SS_{\text{error}}} \quad\text{or}\quad \eta^2 = \frac{F \times df_{\text{treatment}}}{F \times df_{\text{treatment}} + df_{\text{error}}}.$$
When df_treatment = 1, $\eta = r$, i.e. the product-moment or point-biserial correlation coefficient between the grouping variable and the dependent variable.

Just like r², η² is a measure of DV variance accounted for by the IV; it is more general than r, however, as it can be used to express variance accounted for by effects of a nominal IV (or by interactions of nominal IV's).

SPSS optionally provides η² as part of the ANOVA table in procedure GLM ("Options" – "Effect Size") but not as part of procedure ONEWAY.
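A small Python sketch (an addition to these notes) computes η² either from the sums of squares or from F and its df, using the oneway example from section 2.1:

```python
def eta_squared_from_ss(ss_treatment, ss_error):
    # eta squared as proportion of variance accounted for
    return ss_treatment / (ss_treatment + ss_error)

def eta_squared_from_f(F, df_treatment, df_error):
    # equivalent formula when only F and the df are reported
    return F * df_treatment / (F * df_treatment + df_error)

print(eta_squared_from_ss(18.8667, 28.525))   # 0.398
print(eta_squared_from_f(8.929, 2, 27))       # 0.398
```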
3. Multi-Factor ANOVA

3.1 The Basic Model

A multi-factor ANOVA considers the effects of two or more nominal IV's and their interactions on a DV. The levels of these IV's (or "factors") can represent independent groups, repeated measurements, or a mixture of both.

As a generic case, consider a two-factor ANOVA applied to a randomised experiment with two IV's whose levels are crossed (also called an orthogonal design):
                              Argument Quality
                        strong     mixed     weak
Communicator    high
Expertise       low
Rationale:

The multi-factor approach has various advantages:

• it allows studying the generality of effects (e.g., do different types of feedback have similar effects on performance for males and females, or across various age levels?);
• it allows examination of interaction effects;
• it allows testing of complex theories that specify distinctive combinations of independent variables (e.g. in the above example, Chaiken's HSM would predict that high [versus low] expertise leads to clearly more positive attitudes in the mixed-arguments condition, does not have much of an effect in the strong-arguments condition, and leads to less positive attitudes in the weak-arguments condition);
• it increases efficiency – several factors can be examined in a single study.
Some Terminology
Main Effect: Comparison of the means from the various levels of
one particular factor ignoring (or collapsing over) the other
factors in the research design.
Main Effect Marginal Means: Means for the various levels of a
particular factor, ignoring (or collapsing over) other factors in
the design. A main effect can be thought of in terms of whether
or not the main effect marginal means differ from one another
statistically.
Interaction Effect: Pattern of cell means indicating that the
effects of one factor differ as a function of levels of another
factor (or combination of factors). Depending on how many
factors are involved, we speak of two-way, three-way etc.
interactions.
Interaction Effect Marginal Means: Means for the various
combinations of levels of the particular factors comprising an
interaction effect, ignoring (or collapsing over) other factors in
the design.
Simple Main Effects: Main effects examined within a specific
level of one of the other factors (or specific factorial combinations of levels of factors in a more complex research design).
Simple Interaction Effects: Interaction effects examined within a
specific level of one of the other factors (or within specific
factorial combinations of levels of factors in a more complex
research design).
Computational Example:
In a persuasion experiment, students read a message that
ostensibly came from a communicator of high or low expertise
(Factor 1); the message contained strong, mixed or weak
arguments (Factor 2).
Table entries are DV scores of participants on a measure of
post-message attitudes that can vary between 1 and 7. For
simplicity, only 2 observations per condition are included.
                                Argument Quality
                        strong    mixed    weak    Marginal Means
Communicator    high     7, 8     6, 7     1, 3        5.33
Expertise       low      7, 6     3, 4     2, 4        4.33
Marginal Means           7.00     5.00     2.50        4.83
The ANOVA model assumes that each score (y_ijk) is composed of the grand mean (μ), a main effect for the column factor (α_j), a main effect for the row factor (β_k), an interaction effect ((αβ)_jk) and an error term (e_ijk):

$$y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + e_{ijk}$$

Accordingly, the cell means can be thought of as representing the grand mean plus a column effect plus a row effect plus an interaction effect:

$$\mu_{jk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk}$$

The column and row effects can be estimated from the difference between the appropriate marginal mean and the sample grand mean:

$$\hat{\alpha}_j = M_j - M \qquad \hat{\beta}_k = M_k - M$$

The interaction effect can be estimated indirectly:

$$M_{jk} - M - (M_j - M) - (M_k - M) = \widehat{(\alpha\beta)}_{jk}$$
$$\widehat{(\alpha\beta)}_{jk} = M_{jk} - M_j - M_k + M$$
In our example, we find the following interaction effects (αβ)_jk:

Table of Interaction Effects

                               Argument Quality
                       strong   mixed   weak   Residual Marginal Means
Communicator    high     0       +1      -1              0
Expertise       low      0       -1      +1              0
Resid. Marg. Means       0        0       0              0
The information in this table can be read as: The mean for high
expertise / mixed arguments is 1 scale unit higher than one
would expect if only two main effects were present; the mean
for low expertise / mixed arguments is 1 scale unit lower than
one would expect if only two main effects were present ... etc.
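The estimation can be retraced numerically; this Python/NumPy sketch (an addition to the notes) recovers the interaction effects from the six cell means:

```python
import numpy as np

cell_means = np.array([[7.5, 6.5, 2.0],    # high expertise: strong, mixed, weak
                       [6.5, 3.5, 3.0]])   # low expertise

grand   = cell_means.mean()                    # 4.83 (equal n per cell)
row_eff = cell_means.mean(axis=1) - grand      # expertise effects (rows)
col_eff = cell_means.mean(axis=0) - grand      # argument-quality effects (columns)

# interaction effect = cell mean - grand mean - row effect - column effect
ia_eff = cell_means - grand - row_eff[:, None] - col_eff[None, :]
print(np.round(ia_eff, 2))   # [[ 0.  1. -1.]
                             #  [ 0. -1.  1.]]
```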
Plot of observed means based on the full model:

[Figure: "Estimated Marginal Means of ATTITUDE" – line plot of the cell means across the three ARGU levels, with separate lines for the two COMM_EXP levels; not reproduced here.]

Plot of estimated means based on the main-effects model:

[Figure: same layout; under the main-effects model the two COMM_EXP lines are parallel; not reproduced here.]
To test for significance, mean squares and F-tests are computed. The total sum of squares consists of four components:

$$SS_{\text{total}} = SS_{\text{columns}} + SS_{\text{rows}} + SS_{\text{IA}} + SS_{\text{error}}$$

Computation of these components proceeds as in Oneway ANOVA, with the exception of SS_IA, which is determined indirectly:

$$SS_{\text{IA}} = SS_{\text{total}} - SS_{\text{columns}} - SS_{\text{rows}} - SS_{\text{error}}$$

Mean squares are obtained by dividing the sums of squares for each effect by its associated df.
For C = number of levels of the column factor and R = number of levels of the row factor:

df_columns = C - 1
df_rows    = R - 1
df_IA      = (R - 1) × (C - 1)
df_error   = N - (R × C)
df_total   = N - 1
F-tests are computed as usual, dividing the MS of each effect by MS_error.
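As a cross-check on the SPSS output below, the complete two-way computation for the example data can be sketched in Python (an addition; NumPy/SciPy assumed, raw scores taken from the example table above):

```python
import numpy as np
from scipy import stats

# raw scores, indexed [row (expertise), column (argument), replication]
y = np.array([[[7, 8], [6, 7], [1, 3]],    # high expertise
              [[7, 6], [3, 4], [2, 4]]])   # low expertise
R, C, n = y.shape
N = y.size
grand = y.mean()

ss_total = np.sum((y - grand)**2)
ss_rows  = C * n * np.sum((y.mean(axis=(1, 2)) - grand)**2)
ss_cols  = R * n * np.sum((y.mean(axis=(0, 2)) - grand)**2)
ss_error = np.sum((y - y.mean(axis=2, keepdims=True))**2)
ss_ia    = ss_total - ss_rows - ss_cols - ss_error

df_rows, df_cols = R - 1, C - 1
df_ia, df_error  = df_rows * df_cols, N - R * C
ms_error = ss_error / df_error

# expertise: F(1,6) = 3.0; arguments: F(2,6) = 20.33; interaction: F(2,6) = 4.0
for name, ss, df in [("expertise", ss_rows, df_rows),
                     ("arguments", ss_cols, df_cols),
                     ("interaction", ss_ia, df_ia)]:
    F = (ss / df) / ms_error
    print(name, ss, F, stats.f.sf(F, df, df_error))
```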
An SPSS analysis (GLM General Factorial; defaults plus
descriptive statistics and estimates of effect size) of our example
data yields the following output:
[SPSS GLM output: descriptive statistics per cell and the ANOVA table (sums of squares, df, mean squares, F, significance, eta squared); the tables are garbled in this copy.]
[Discuss the meaning of each entry in the ANOVA table.]
3.2 Interpreting Interactions
The presence of an interaction effect in a two-way ANOVA means that the size of the effect of one factor on the DV varies with the levels of the other factor.
In our example, communicator expertise seems to have a
stronger positive effect on attitudes (i.e. higher expertise
produces more positive attitudes) in the mixed-arguments
condition than in the two other argument conditions. (In fact, in
the weak argument condition, the effect of expertise seems to be
negative).
In graphical representations of the condition means, where
levels of one factor define the partitioning of the x-axis and
levels of the other factor define separate lines (see example
above), non-parallel lines indicate an interaction effect. If the lines cross, the interaction is called disordinal; if they do not cross, it is called ordinal.
[NB: Of course, the lines in such a display rarely show a
perfectly parallel pattern. Deviations from parallel may be due
to chance. Therefore, the graphical display should only be
interpreted in combination with the F-test of the interaction.]
Higher-order interactions
The logic of the two-factor model can easily be generalized to
designs with more than two factors. With three factors, we can
test the three main effects, three interactions among two factors,
and one interaction involving all three factors.
[Give examples for higher-order interactions.]
In general, when the number of factors is k, the number of F-tests in the ANOVA is 2^k – 1.
The problem of multiple testing applies to the F-tests in an
ANOVA as it does to multiple t-tests!
So if you run a three-factor ANOVA, the chance probability that at least one of the seven F-tests will yield a "significant" result at .05 is 1 – (1 – .05)^7, i.e. a little over .30!
In deciding if alpha adjustment is necessary, the same
considerations as with multiple t-tests apply.
What the multi-factor ANOVA does not tell us
Even though a multi-factor ANOVA yields more information
than a oneway ANOVA performed on the same data, any F-test
with numerator df > 1 is still ambiguous as to which means
exactly differ from each other.
Even an analysis whose effects all have numerator df = 1 does not always tell us what we want to know. Consider a clinical three-factor design with the IV's medication (placebo vs. drug), psychotherapy (not given vs. given) and accommodation (in-patient vs. out-patient).

Hypothesis: The combination of drug treatment, psychotherapy and in-patient accommodation leads to a reduction in symptoms compared to the other seven conditions.
Results are shown in the Table below. Table entries are means
of a symptom change index (range -1 to +1; positive values =
symptom reduction).
Table of Means:

                       In-patient          Out-patient
                      Drug   Placebo      Drug   Placebo
No Psychotherapy        0       0           0       0
Psychotherapy        +0.8       0           0       0
So the pattern of means is exactly as predicted. However, assuming n = 10 per condition and MS_error = 0.8, a standard three-factor ANOVA yields:
Effect            SS     df    MS      F      p
Accommodation     0.8     1    0.8    1.0    .32
Psychotherapy     0.8     1    0.8    1.0    .32
Medication        0.8     1    0.8    1.0    .32
A x P             0.8     1    0.8    1.0    .32
A x M             0.8     1    0.8    1.0    .32
P x M             0.8     1    0.8    1.0    .32
A x P x M         0.8     1    0.8    1.0    .32
Error            57.6    72    0.8
None of these seven tests, although focused, really tests our hypothesis! In this case, one a-priori contrast, pitting the critical cell against the seven others, would have yielded:

MS_contrast = SS_contrast = 5.6
F_contrast(1, 72) = 7.0, p ≤ .01
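A quick numerical check of this contrast (a Python sketch added to the notes; all values taken from the table above):

```python
import numpy as np
from scipy import stats

means = np.array([0.8, 0, 0, 0, 0, 0, 0, 0])       # the critical cell, then the other seven
a     = np.array([7, -1, -1, -1, -1, -1, -1, -1])  # contrast coefficients (sum to zero)
n, ms_error, df_error = 10, 0.8, 72

L = np.dot(a, means)                    # 5.6
ss_contrast = n * L**2 / np.sum(a**2)   # 10 * 5.6**2 / 56 = 5.6 (= MS, since df = 1)
F = ss_contrast / ms_error              # 7.0
print(F, stats.f.sf(F, 1, df_error))    # F(1,72) = 7.0, p ≈ .010
```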
3.3 Multiple Comparisons

As the above example shows, multiple comparisons are as useful in multi-factor ANOVA as they are in oneway analyses. But within multi-factor designs, SPSS offers a-priori and post-hoc comparisons only with respect to one factor at a time. Thus, we might compare levels of the argument quality factor in our initial example as part of a GLM General Factorial analysis. However, a comparison of, say, the strong arguments / expert communicator condition with the remaining five cells would not be easily accomplished. (Similarly, the contrast relevant to the above clinical example would not be possible.)
To run such contrasts across factors anyway, the easiest way is to create a variable that represents all the combinations of factor levels and to run a Oneway ANOVA, using this new variable as the IV, to test the contrasts one is interested in (see computing assignment ANOVA 2 for details and an example).
3.4 Odds and Ends: Unequal n; "Types" of Sums of Squares; Fixed vs. Random Factors; Nested Designs
Unequal n
So far, we have discussed equal-n designs. In the case of unequal n's, the factors in a multi-factor design are no longer orthogonal, and there are various possibilities for computing the sums of squares (just as there are various ways in MRA of treating variability that is explained jointly by correlated predictors).
Types of sums of squares
In SPSS, three basic types of sum-of-squares computation are
available:
Type 3, the default, corresponds to a standard regression model
with effects-coding of IV’s. Each effect is adjusted for all other
effects in the model. The sums of squares of all effects plus
error do not add up to the “corrected total” sum of squares if n’s
are unequal.
Type 2 adjusts higher-order interactions for all lower-order
effects, but is invariant to ordering of effects within the same
level of interaction (comparable to a hierarchical regression with
successive entry of sets of IV’s: main effects first, then two-way
IA’s, then three-way IA’s etc.)
Type 1 adjusts each effect for all effects that were entered previously (comparable to a hierarchical regression with entry of
one IV at a time).
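Outside SPSS, the three types can be compared, e.g., in Python with statsmodels; the sketch below (an addition to these notes, with made-up data and variable names) fits an unbalanced two-factor model under effects coding and prints all three tables:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# small made-up unbalanced design: unequal n's make A and B non-orthogonal
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a1", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b2", "b1", "b2", "b2"],
    "y": [4, 5, 6, 7, 7, 5, 8, 9],
})

# Sum (effects) coding, matching the regression model described above
model = smf.ols("y ~ C(A, Sum) * C(B, Sum)", data=df).fit()
for typ in (1, 2, 3):
    print(anova_lm(model, typ=typ))   # sums of squares differ across types here
```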
With equal sample sizes per condition, the three approaches are identical. To avoid any ambiguity in interpreting ANOVA effects, try to aim for equal n's per condition when designing a study, and do not change the default sum-of-squares type in SPSS.
Fixed versus Random Factors
There are two ways to conceptualise factors in an ANOVA:
Fixed factors are IV’s whose levels have been deliberately
selected by the researcher to operationalise a hypothetical
variable, or IV’s whose levels exhaust all naturally occurring
levels (e.g. male vs. female).
Random factors are IV’s whose levels have been randomly
drawn to represent a larger population of levels (e.g. 5 randomly
drawn presentation orders of 10 items in a questionnaire, to
control effects of order in general).
Fixed factors are the default in SPSS, and again, leave it at that as long as your design is a standard experimental or quasi-experimental one. The major difference between the two approaches is that different, more conservative error terms are used for the factors designated as random. So in the above example, control of order effects (which are usually unwanted) would be even more stringent if the orders used were designated as fixed factors.
Nested Designs

In a nested design, the factors are not completely crossed with each other; i.e., not every level of factor A is paired with every level of factor B to form the experimental conditions.

An example would be a clinical study in which ten therapists (T1 to T10) treat patients using two methods (M1 and M2), each method being practiced by five of the therapists.
Design:

Methods:      M1                      M2
Therapists:   T1  T2  T3  T4  T5      T6  T7  T8  T9  T10
Patients:     (separate patients within each therapist)
In this design, the Therapist factor is nested under Methods.
This means that an interaction of Therapist x Method cannot be
tested. Effects of Therapist can only be tested within Method,
and Method effects would be tested against a “patients within
therapists” error term.
Therapists should be thought of as a random factor if method
effects are of interest.
Note that additional factors may be crossed with both Methods
and Therapists (e.g. patient’s sex: each therapist may treat equal
numbers of women and men).
SPSS offers the possibility of testing nested designs, but this
application is beyond the scope of this course.