Analysis of Variance: General Concepts

Research Skills for Psychology Majors: Everything You Need to Know to Get Started
This chapter is designed to present the most basic ideas in analysis of variance in a
non-statistical manner. Its intent is to communicate the general idea of the analysis
and provide enough information to begin to read research result sections that
report ANOVA analyses.
Analysis of Variance is a general-purpose statistical procedure used to analyze a wide range of research designs and to investigate many complex problems.
In this chapter we will only discuss the original, basic use of ANOVA: analysis of
experiments that include more than two groups. When ANOVA is used in this
simple sense, it follows directly from a still simpler procedure, the t-test.
The t-test compares two groups, either in a between-subjects design (different
subjects in the groups) or a repeated-measures design (same subjects assessed
twice). ANOVA can be thought of as an extension of the t-test to situations in
which there are more than two groups (one-way design) or where there is more
than one independent variable (factorial design). These situations are the most
common in research, so ANOVA is used far more frequently than t-tests.
Variance is Analyzed
The name “analysis of variance” is more representative of what the analysis is
about than “t-test” because we are in fact focusing on analyzing variances. The
conceptual model for ANOVA follows the familiar pattern first introduced in
the Inferential Statistics chapter: a ratio is formed between the differences in the
means of the groups and the error variance. In the same way that a variance (or
standard deviation) can be calculated from a set of data, a variance can be calculated from a set of means. So the differences among the means are thought of as
their variance: higher variance among the means indicates that there are more
differences (which is good, right?). The variance among the group means is called
the between-groups variance.
The ratio, then, is between-groups variance divided by error variance. A larger
ratio indicates that the differences between the groups are greater than the error
or “noise” going on inside the groups. If this ratio, the F statistic, is large enough
given the size of the sample, we can reject the null hypothesis. The whole story
in ANOVA is figuring out how to calculate (and understand) these two types of
variance.
©2003 W. K. Gabrenya Jr. Version: 1.0
A Deeper Truth
Actually, the t-test is a
special case of ANOVA.
ANOVA is the real thing.
A Still Deeper Truth
Actually, ANOVA is a simplification of very complex
correlations. Correlation is
the real thing.
A Visual Example
Here is an example of a one-way, between-groups design that would be analyzed
using ANOVA. Four groups of participants are randomly sampled from four majors on campus. We will not identify the majors for the sake of interdepartmental
harmony, but the identity of Group 4 is clear. Each sample includes five students.
They are each administered the Wechsler Adult Intelligence Scale (WAIS-III) to
obtain a measure of IQ. IQs have a mean of 100 in the population as a whole. Our
question: which major is smarter?
The following table presents the raw data (IQ scores), the means within each
group, the standard deviation within each group, and the variance. The variance is
simply the SD squared, a more useful number for certain aspects of the calculations.
It is normal that some of the SDs are larger than others. The gray bars below the
scale represent the range of the IQs in each major, which is one indication of the
within-group variability. (A wider range often produces a higher SD.) In the last
column, the mean of the means (grand mean), the standard deviation of the means,
and the variance of the means are presented.
Data:

             Group 1         Group 2         Group 3         Group 4
IQ scores:   80, 85, 90,     97, 100, 103,   90, 93, 96,     105, 110, 115,
             95, 100         106, 109        99, 102         120, 125
Mean:        90.0            103.0           96.0            115.0
Std. Dev.:   7.9             4.7             4.7             7.9
Variance:    62.5            22.5            22.5            62.5

Grand mean (the mean of the four group means): 101.0; SD of the means: 10.7.

[Figure: a number line from 80 to 125; gray bars show the range of IQ scores in each major (Group 1 lowest, Group 4 highest), and the five individual scores in Group 1 are drawn as circles inside its bar.]
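The descriptive statistics in the table can be reproduced from the raw scores. Here is a minimal sketch (mine, not part of the original chapter) using only the Python standard library:

```python
import statistics

# Raw IQ scores for the four majors (from the table above)
groups = {
    "Group 1": [80, 85, 90, 95, 100],
    "Group 2": [97, 100, 103, 106, 109],
    "Group 3": [90, 93, 96, 99, 102],
    "Group 4": [105, 110, 115, 120, 125],
}

means = {}
for name, scores in groups.items():
    means[name] = statistics.mean(scores)
    sd = statistics.stdev(scores)        # sample SD (n - 1 in the denominator)
    var = statistics.variance(scores)    # the variance is simply the SD squared
    print(f"{name}: mean={means[name]:.1f}, SD={sd:.1f}, variance={var:.1f}")

# The grand mean is the mean of the four group means
grand_mean = statistics.mean(means.values())
print(f"Grand mean: {grand_mean:.1f}")   # 101.0
```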
What’s the null hypothesis? The null condition is that there is no difference between the population means:
H0: µ1 = µ2 = µ3 = µ4, where µ is “mu,” the population mean.
Our task is to determine if the sample means, presented in the table above, are
sufficiently different from each other compared to the error variance within the
groups, to reject the null hypothesis. Of course the sample size will also affect the
outcome because larger samples allow for better tests of the null hypothesis. In
the language of ANOVA, we will look at the ratio of the between-group variance
to the within-group (error) variance.
F = Between-groups variance / Error variance within groups
In the example, we have included the
individual data for group 1 as circles inside
the group 1 gray bar. The SD of group
1, 7.9, is computed from these 5 values.
Recall that the SD is the variability of the
individual data based on how distant each
one is from the group mean (90.0). In
other words, it is a measure of the extent
to which the five students sampled for that
major are not exactly of the same intelligence. The students in group 2 are more
similar to each other and produce an SD
of 4.7. The overall error variance for the
sample is computed by combining these
four SDs (see sidebar).
Calculating the Variances
Within-Groups (Error) Variance:
The overall amount of error variance is the combined variances of the
four groups. Combining the variances from several groups together is
called pooling, so the resulting combined variance is termed the pooled
variance. Averaging the variances in this study produces a pooled error
variance of 42.5 (the average of 62.5, 22.5, 22.5, and 62.5), i.e., a pooled SD of 6.52.
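Because the four groups are the same size, pooling reduces to a simple average of the group variances. A quick check (a sketch, not from the chapter):

```python
import math

# Within-group variances from the data table
variances = [62.5, 22.5, 22.5, 62.5]

# Equal group sizes, so the pooled variance is just the average
pooled_variance = sum(variances) / len(variances)
pooled_sd = math.sqrt(pooled_variance)

print(pooled_variance)        # 42.5
print(round(pooled_sd, 2))    # 6.52
```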
Between-Groups Variance:
Calculation of the between-groups variance is not as intuitive as the
within-groups variance. Conceptually, it seems that the SD of the four
group means would be a good measure. (The SD of the means is 10.7.)
However, the actual between-groups SD is 24.0, so the between-groups
variance is 24.0² ≈ 577.
The between-groups variability is computed in the same way, but we look at how much the group means vary from the
grand mean (the mean of the means). The higher this variability, the more the
means differ from each other and the more the null hypothesis looks “rejectable.”
(See sidebar.)
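The gap the sidebar points out, between the SD of the means (10.7) and the between-groups SD (24.0), comes from the group size: with n = 5 scores behind each mean, the between-groups variance is n times the variance of the means. A sketch (mine, not in the original):

```python
import math
import statistics

group_means = [90.0, 103.0, 96.0, 115.0]
n_per_group = 5

sd_of_means = statistics.stdev(group_means)             # about 10.7

# Each mean summarizes n scores, so the between-groups variance
# scales the variance of the means by the group size
between_groups_variance = n_per_group * statistics.variance(group_means)
between_groups_sd = math.sqrt(between_groups_variance)  # about 24.0

print(round(sd_of_means, 1))              # 10.7
print(round(between_groups_variance, 1))  # 576.7
print(round(between_groups_sd, 1))        # 24.0
```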
Finally, the ANOVA
The ANOVA focuses on the ratio of the
between-groups variance to the within-groups variance. SPSS produces an
ANOVA source table to report the result
of the analysis. This table is called a source
table because it identifies the sources of
variability in the data. As explained above,
there are two kinds of variability, variability between group means, and variability within groups (error variance). The
source table provides information about
these two sources. The column numbers
have been added for our use.
Column 3, reflecting the number of groups
and the sample size, is discussed in the
sidebar. Column 4 presents the variance
associated with the mean differences
(between groups) and within-group error. These numbers are discussed in the
Variances sidebar. Column 5 is the ratio of
these two values, the F statistic. Column 6
presents the p-value (see Inferential Statistics chapter) of the F statistic based on the
sample size. Because our normal criterion
for rejecting the null hypothesis is p < .05,
this p value is very good (good = low), and
we can reject the null hypothesis.

ANOVA Source Table

      1                    2            3       4          5         6
      Source               Sum of       df      Mean       F         Sig.
                           Squares              Square
      Between Groups       1730.000      3      576.667    13.569    .0001
      Within Groups         680.000     16       42.500
      Total                2410.000     19

Degrees of Freedom in ANOVA
All statistics, such as F, t, and chi-square, are evaluated in the context of
the sample size: larger samples allow lower statistical values to reach the
magic .05 level of confidence. The sample size is expressed in terms of
degrees of freedom (df). Your statistics class has more to say about df. In
a t-test, the df is the sample size minus 2 (N - 2). In ANOVA, we use two
df values. The df-error is based on the sample size:

dfe = ∑(ng - 1), where ng is the size of each of the group samples
16 = (5-1) + (5-1) + (5-1) + (5-1)

ANOVA also requires a df for the number of groups:

dfbg = g - 1, where g is the number of groups
3 = 4 - 1

The F statistic is always presented along with these df values, e.g.,
F(3, 16) = 13.6, p < .0001.
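Every number in the one-way source table can be reproduced from the raw scores. A standard-library sketch (mine, not the chapter's; `scipy.stats.f_oneway` would return the same F):

```python
import statistics

groups = [
    [80, 85, 90, 95, 100],      # Group 1
    [97, 100, 103, 106, 109],   # Group 2
    [90, 93, 96, 99, 102],      # Group 3
    [105, 110, 115, 120, 125],  # Group 4
]

all_scores = [x for g in groups for x in g]
grand_mean = statistics.mean(all_scores)

# Sums of squares: how far group means sit from the grand mean (between),
# and how far individual scores sit from their own group mean (within)
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Degrees of freedom
df_between = len(groups) - 1                 # g - 1 = 3
df_within = sum(len(g) - 1 for g in groups)  # sum of (n_g - 1) = 16

# Mean squares (variances) and the F ratio
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

print(ss_between, ss_within, ss_total)   # 1730.0 680.0 2410.0
print(round(ms_between, 3), ms_within)   # 576.667 42.5
print(round(F, 3))                       # 13.569
```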
What has been rejected? By rejecting the null hypothesis, we conclude that the
four means are not equal in the population, that is, all majors are not created
equal. However, what it does not tell us is exactly which major is smarter than
which other major. Is Group 4 smarter than Group 2, or just smarter than the
hapless Group 1? Just eyeballing the means is not good enough: we need to know
whether particular pairs of means really differ significantly.
How is this done? One way is to perform t-tests between pairs of means (there
are several other ways as well).
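One common version of those follow-up pair tests (a protected t in the style of Fisher's LSD; my sketch, not a method the chapter prescribes) reuses the pooled error variance from the ANOVA rather than recomputing it for each pair:

```python
import math
from itertools import combinations

means = {"Group 1": 90.0, "Group 2": 103.0, "Group 3": 96.0, "Group 4": 115.0}
n = 5            # scores per group
ms_error = 42.5  # pooled within-groups variance from the source table

# t for each pair of means, evaluated on the error df (16)
for (name_a, m_a), (name_b, m_b) in combinations(means.items(), 2):
    se = math.sqrt(ms_error * (1 / n + 1 / n))  # standard error of the difference
    t = (m_a - m_b) / se
    print(f"{name_a} vs {name_b}: t(16) = {t:.2f}")
```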
Using SPSS to Calculate One-Way ANOVA
A one-way ANOVA is an analysis in which
there is only one independent variable, as
in the preceding example. This is the simplest kind of ANOVA, and SPSS dedicates a
procedure purely to it. (See menu screen
illustration.)
The dialog window in which the details of
the analysis are entered is quite simple.
In the dialog illustration, the dependent
variable (IQ) and the independent variable
(group) have been entered. In the Options...
dialog you can ask for descriptive statistics
and a rather sorry looking graph of the
means.
Syntax:
ONEWAY
iq BY group
/STATISTICS DESCRIPTIVES
/PLOT MEANS
/MISSING ANALYSIS .
The principal output of the procedure is the
source table shown above.
In a paper, the appropriate way to report the results
of an ANOVA is a variation of:
A one-way between-groups ANOVA revealed a significant effect of major,
F(3,16) = 13.6, p < .05.
Note that the ANOVA used two types of degrees of freedom: the between-groups
df and the error df.
Factorial ANOVA
If indeed “the truth lies in the interactions,” then we need to perform more complicated studies that include more than one IV. Factorial designs of this kind were
introduced in the research designs chapter. For example, in the study presented
above, we might want to know if gender is related to IQ. The obvious design
would be a 4x2 between-subjects factorial: four majors crossed with gender. In
the table below, the 40 students are indicated by S1...S40 in the 8 cells of the factorial design.
             Male                      Female
Group 1      S1, S2, S3, S4, S5        S6, S7, S8, S9, S10
Group 2      S11, S12, S13, S14, S15   S16, S17, S18, S19, S20
Group 3      S21, S22, S23, S24, S25   S26, S27, S28, S29, S30
Group 4      S31, S32, S33, S34, S35   S36, S37, S38, S39, S40
The mathematics of a factorial ANOVA are more complicated than those of the
one-way ANOVA, but the principles are the same. The ANOVA compares the
variability due to between-groups differences to the amount of error variance in
the sample. However, in this two-way factorial, we need to look at three types
of between-groups variability: the variability between the majors, the variability
between the genders, and the interaction effect variability. A ratio (F statistic) of
between-subjects variability to error variance is calculated for each of these three
types of between-groups variability.
How many null hypotheses are there?
SPSS and Factorial ANOVA
The simple one-way ANOVA procedure
cannot be used. Instead, factorial ANOVAs
are produced by the SPSS GLM procedure. GLM means “general linear model.”
You will study the GLM in your second
year of graduate-level statistics. GLM is a
very powerful and flexible procedure that
was only introduced to SPSS in the 1980s.
Because it is powerful and flexible, it can
be configured in many ways and has a large
number of options.
Univariate refers to the fact that you will
be analyzing one dependent variable at a
time. The IQ across majors study presented previously was enhanced by adding
gender as a second independent variable to serve as
an example of a factorial ANOVA. The analysis dialog
box shown here has been configured to run this 4x2
ANOVA.
Use the Fixed Factors box for the IVs. Ignore the boxes
below that until you get to graduate school. You can
specify in detail which means tables you would like to
see displayed in the output by clicking on Options.

What Other Goodies are in this Menu?
Multivariate ANOVA (MANOVA) allows you to analyze
several DVs simultaneously, in a single set.
Repeated Measures ANOVA analyzes the repeated-measures designs introduced in the research designs chapter.
Double-clicking on the items in the left-side box moves
them to the right side ‘Display Means for’ box. In this
case, moving ‘group’ to the Display Means box produces a means table that includes just the main effect of
group. The ‘group*gender’ item displays a 4x2 table of
means from which you can see if there is an interaction
effect.
Syntax:
UNIANOVA
iq BY group gender
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/EMMEANS = TABLES(group*gender)
/CRITERIA = ALPHA(.05)
/DESIGN = group gender group*gender .
The Output
The source table in a factorial ANOVA expands on that
of the one-way ANOVA. Two additional sources are
reported: the second IV, and the interaction effect. (See Tests of Between-Subjects
Effects table.)
The only rows of importance in this source table are those indicating the effects
in the factorial model: GROUP, GENDER, and GROUP*GENDER. The F statistics
in this type of source table are calculated by dividing a factor’s Mean Square by the
Mean Square of the Error row. “Mean Square” is another way of saying “variance.”
Hence, F for the Group factor is:
F = MSgroup / MSerror = 477.2 / 14.167 = 33.685.
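The same division reproduces all three F ratios in the factorial source table. A quick check (a sketch built from the table's sums of squares and df, to avoid rounding error in the mean squares):

```python
# Mean squares from the Tests of Between-Subjects Effects table,
# computed as SS / df so the F ratios match SPSS to three decimals
ms = {
    "GROUP": 1431.6 / 3,
    "GENDER": 480.0 / 1,
    "GROUP*GENDER": 30.0 / 3,
}
ms_error = 170.0 / 12   # 14.167 after rounding

for effect, mean_square in ms.items():
    print(f"{effect}: F = {mean_square / ms_error:.3f}")
```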
These results show that the Group
and Gender main effects are significant at a very low p value. SPSS
will not print all of the significant
digits of a very small p value.
For Group, the actual p value is
.000039, but no one cares because
it is so far below .05. The Group X
Gender interaction is not significant because the p value is so large
(p = .567).
In a paper, there are several forms
for reporting the results of a factorial ANOVA:
A 4 (major) x 2 (gender) between-groups ANOVA revealed
significant main effects of major, F(3,12) = 33.7, p < .05, and
gender, F(1, 12) = 33.9, p < .05.
The interaction effect did not
approach significance, F < 1.
or, if the interaction had been
stronger:
A 4 (major) X 2 (gender)
between-groups ANOVA revealed significant main effects
of major, F(3,12) = 33.7, p <
.05, and gender, F(1, 12) = 33.9,
p < .05. However, these main
effects must be interpreted
within the significant Major X
Gender interaction, F(3,12) =
8.5, p < .05.
Note that the ANOVA used two
types of degrees of freedom: the
between-groups df and the error
df.
Tests of Between-Subjects Effects
Dependent Variable: IQ

      Source             Type III Sum      df    Mean          F           Sig.
                         of Squares              Square
      Corrected Model      2240.000         7       320.000      22.588    .000
      Intercept          195859.200         1    195859.200   13825.355    .000
      GROUP                1431.600         3       477.200      33.685    .000
      GENDER                480.000         1       480.000      33.882    .000
      GROUP * GENDER         30.000         3        10.000        .706    .567
      Error                 170.000        12        14.167
      Total              206430.000        20
      Corrected Total      2410.000        19

      R Squared = .929 (Adjusted R Squared = .888)

The ‘Intercept’ row in the table is not usually important. It compares the grand
mean (101.0) to zero. Because 101 is so far from zero, the F is enormous. (But
see the sidebar for its deeper meaning.)

Digging Deeper
Overall, do major and gender help us know what students’ IQs are? Said another
way, do major and gender predict IQ? The ‘Corrected Model’ row in the source
table answers this general question: yes. The idea of a model was introduced in
an early chapter. Here, the model is expressed mathematically:

IQ = ƒ(major, gender)

The Corrected Model essentially combines all the predictors of IQ (Group,
Gender, and their interaction) to see if, as a whole, they predict the dependent
variable. (Hint: add the df.) Of course, we usually don’t care about the whole
model, but rather only about its component parts, the individual IVs.

Reprise of “Still Deeper Truth”
The intercept reveals a clue to the ridiculous conspiracy theory that ANOVA is
just a lot of correlations. Do you remember the equation for a line from algebra?
In statistics we call this a regression line, and write the equation as

y = a + bx + e

where
y is the dependent variable, IQ
x is sort of the independent variables, major, gender, and the interaction, all rolled into one (sort of)
a is the y-intercept of the line
b is the slope of the line
e is the error variance

In the ANOVA table, the intercept F-test is testing whether the y-intercept (a) is
different from zero. In a certain sense, the corrected model F-test is testing
whether the slope (b) is different from zero. When the slope is different from
zero, the independent variables (x) affect the dependent variable (y).

In the manner of a correlation, a slope (b) near 1.0 and low error (e) give us a
correlation scattergram with a long, skinny oval (i.e., a good correlation). Error
variance (e) is analogous to the fatness of the oval.

[Figure: scattergram of the IV (x) against the DV (y) with a regression line
crossing the y-axis at the y-intercept (a); a skinny oval and a slope near 1.0
indicate a high correlation coefficient.]
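To make the regression-line sidebar concrete, here is a tiny least-squares fit on made-up illustration data (the x and y values are hypothetical, not from the study): the fit recovers the intercept (a) and slope (b) of y = a + bx.

```python
import statistics

# Hypothetical illustration data lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Least-squares slope: covariance of x and y divided by the variance of x
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)

print(a, b)   # 1.0 2.0
```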
What’s Next?
A lot more...