Chapter 8
ANALYSIS OF VARIANCE
(ONE WAY)
Learning Objectives
In this chapter you will learn how to
analyze differences among more than two
groups of subjects
Analysis of variance (ANOVA) is a
statistical test designed to examine
means across more than two groups by
comparing variances, based on the
variability in each sample and in the
combined samples.

SOURCE OF ANOVA
With ANOVA there is no limit to the
number of groups that can be compared.
When there are three or more levels of
the nominal (grouping) variable, the
number of possible comparisons increases with the
number of groups. Using multiple t-tests
would require comparing the groups two at a time,
and the results would be very difficult to interpret.

SOURCE OF ANOVA
ANOVA reports the variance within the groups,
then calculates how that variation would
translate into differences between the groups,
taking into account how many groups
there are.
In your data set, region is the grouping
variable, and the number of prisoners
executed between 1977 and 1995 is the
dependent variable.
We see that the mean number of prisoners
executed varies across the regions over this time
period.
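A quick way to see these group means outside SPSS, assuming a hypothetical flat-file export of the state data (the file name and column names here are illustrative assumptions, not the actual StateData.sav layout):

import pandas as pd

# Hypothetical CSV export of the state data set; column names are assumed
df = pd.read_csv("state_data.csv")

# Mean number of prisoners executed, 1977-1995, within each region
print(df.groupby("region")["executions_1977_1995"].mean())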

SOURCE OF ANOVA
Under ANOVA the null hypothesis is that the
means for all the groups (regions) are equal.
In this example, the null hypothesis is that there is no difference
in the average number of inmates executed
between 1977 and 1995 across the four regions
of the country.
The research hypothesis is that at least one of the
group means is different from the others; the F test
itself does not tell us which region (which level of the
independent variable) is producing the effect.
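Stated in symbols, with μj denoting the population mean number of executions for region j:

H0: μ1 = μ2 = μ3 = μ4   (all four regional means are equal)
H1: at least one μj differs from the others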

SOURCES OF ANOVA

When scores are divided into three or
more groups, the variation can be divided
into two parts:
– 1. The variation of scores within the groups, measured by
the variance and standard deviation of each group
– 2. The variation of scores between the groups. There
is variation from one group to the next; it reflects the
variability of the means from one sample to another
SOURCES OF ANOVA




ANOVA compares these two estimates of variability.
The main question in ANOVA is whether the
between-groups variance is significantly larger than the
within-groups variance.
If we show that the between-groups variance is significantly larger
than the within-groups variance, then we reject
the null hypothesis.
If the means of the groups are equal, they will vary
little around the total (grand) mean across the groups; if the
group means are different, they will vary around the grand mean
more than they vary within their groups.
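In symbols (a standard way of writing this decomposition, with xij the score of state i in group j, x̄j the mean of group j, x̄ the grand mean, and nj the size of group j):

SSt = Σ over groups j and cases i of (xij − x̄)²    [total variation]
SSb = Σ over groups j of nj(x̄j − x̄)²               [variation of group means around the grand mean]
SSw = Σ over groups j and cases i of (xij − x̄j)²    [variation of scores around their own group mean]
and SSt = SSb + SSw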
SOURCES OF ANOVA
ANOVA tests the null hypothesis by comparing
the variation between the groups to the variation within
the groups.
To compute ANOVA you need (a short computational sketch follows this list):
– 1. The total amount of variation among all scores
combined (total sum of squares, SSt)
– 2. The amount of variation between the groups
(between-groups sum of squares, SSb)
– 3. The amount of variation within the groups (within-groups sum of squares, SSw)
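A small computational sketch of these three quantities, using made-up numbers rather than the state data, might look like this in Python:

import numpy as np

# Toy data: three groups of scores (made-up numbers, not the execution data)
groups = [np.array([2.0, 4.0, 6.0]),
          np.array([5.0, 7.0, 9.0]),
          np.array([10.0, 12.0, 14.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total = ((all_scores - grand_mean) ** 2).sum()                        # SSt
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # SSb
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)             # SSw

print(ss_total, ss_between + ss_within)   # the two totals agree: SSt = SSb + SSw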
THE F TEST (F RATIO)
The F test is a ratio of the two estimates of variability:
the between-groups mean square divided by the within-groups mean square.
If the null hypothesis is true, the F ratio will be close to one; if the null
hypothesis is false, the F ratio will be greater than
one.
If the F test value is statistically significant, then you
reject the null hypothesis and conclude that the group means are
not all equal.
You can look at the group means, but
inspection alone does not reveal where the differences leading
to the F value, and to the rejection of the null hypothesis,
originate.
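As a rough sketch of how the F ratio relates to a significance level, using made-up mean squares for four groups and fifty cases (none of these values come from the book's table):

from scipy import stats

# Made-up mean squares, for illustration only
ms_between, ms_within = 900.0, 220.0
df_between, df_within = 4 - 1, 50 - 4     # k - 1 and N - k

f_ratio = ms_between / ms_within          # close to 1 if the null hypothesis is true
p_value = stats.f.sf(f_ratio, df_between, df_within)  # upper-tail probability
print(round(f_ratio, 2), round(p_value, 3))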

The F Test (F Ratio)
To locate the source of the difference you must
use a multiple comparison procedure; the
Bonferroni procedure (discussed later) is
recommended. This test allows you to pinpoint
where the differences originate.
In our example we have four groups, and therefore six
possible pairwise comparisons. The Bonferroni procedure
pinpoints where the differences come from,
protecting you from concluding that too many of
the differences between the group means are
statistically significant.

The ‘F’ Test (F ratio)

ANOVA Requirements:
– The data must be a random sample from a population
– The single dependent variable must be measured at the
interval level (in order to compute a mean)
– The independent variable need only be measured
categorically, at either the nominal or ordinal level (to
define the groups whose means will be compared)
ANOVA and SPSS


Calculating ANOVA by hand is nearly impossible
with large groups, yet the example in the book shows
where the numbers that SPSS generates come from.
Using SPSS, we are going to compare the average
number of inmates executed across the four regions of
the United States to see if different categories of states
(region is measured at the nominal level) vary
significantly in the mean number of prisoners executed
(a ratio-level variable) over this time period (1977-1995).
The independent variable is the grouping of the regions,
and the dependent variable is the number of prisoners
executed between 1977 and 1995.
SPSS, One way ANOVA

To obtain your SPSS output for ANOVA, follow the steps outlined
here (a rough Python equivalent follows the list).
1. Open the existing file, “StateData.sav” (Figure 8.1). This is the state data set
containing crime and other data from the criminal justice system for all fifty
states.
2. Choose “Analyze,” “Compare Means,” and then “One-Way ANOVA” (Figure
8.2).
3. In the “One-Way ANOVA” window (Figure 8.3), select and enter the
dependent variable, “Prisoners Executed between 1977-1995,” in the
“Dependent List” box. The program will calculate the mean and other
statistics for this variable. Select and enter the independent variable,
“Region,” in the “Factor” box. This variable divides the sample into groups.
4. Returning to the One-Way ANOVA window (Figure 8.4), click on the
“Options” button to open the One-Way ANOVA: Options window. Under
Statistics, check “Descriptive.” Click on “Continue” to return to the One-Way
ANOVA window.
5. Click on “Post Hoc.” Select the “Bonferroni” method (Figure 8.5). Click on
“Continue” to return to the One-Way ANOVA window.
6. Click on “OK” to generate your output.
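If you want to check the SPSS output outside the program, a rough Python equivalent, assuming a hypothetical CSV export of StateData.sav with illustrative column names, would be:

import pandas as pd
from scipy import stats

# Hypothetical flat-file export of StateData.sav; column names are assumptions
df = pd.read_csv("state_data.csv")

# Descriptives, as requested under "Options" in SPSS
print(df.groupby("region")["executions_1977_1995"].describe())

# One-way ANOVA: executions (dependent variable) by region (factor)
samples = [grp["executions_1977_1995"].values for _, grp in df.groupby("region")]
f_stat, p_value = stats.f_oneway(*samples)
print(f_stat, p_value)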
SPSS & One Way ANOVA
In this example we selected “Prisoners Executed
between 1977-1995” as the dependent variable.
The null hypothesis is that there is no regional
difference in the average number of prisoners
executed between 1977 and 1995.
The research hypothesis is that the South has
the highest average number of prisoners
executed during this time period.

Results

The ANOVA table in figure 8.1 is what we
need to focus on, because it addresses
the null hypothesis (that there is no
difference in the average number of
executions across the regions during
this time period). From it we must make the
decision to accept or reject the null
hypothesis.
Results
The first column in the ANOVA table gives us the sums of
squares between the groups, within the groups, and for
the entire sample. The total sum of squares
represents the entire variance on the dependent
variable for the entire sample.
The second column gives the degrees of
freedom. The total degrees of freedom
equal n - 1 = 50 - 1 = 49; the degrees of freedom between
groups equal the number of groups minus one
(4 - 1 = 3); and the within-groups degrees of freedom
equal 49 - 3 = 46.

Results
The third (mean square) column in figure
8.1 contains the estimates of variability
between and within the groups. Each mean
square estimate is equal to the sum of
squares divided by its degrees of
freedom.
The between-groups mean square is
2469.194/3 = 823.065; the within-groups
mean square is 10260.426/46 = 223.053.

Results
The fourth column, the F ratio, is calculated by
dividing the mean square between groups by
the mean square within groups.
If the null hypothesis is true, both mean square
estimates should be about equal and the F ratio should
be close to one. The larger the F ratio, the greater the
likelihood that the difference between the means
does not result from chance. In our example the
F ratio is 3.69 (823.065/223.053).

Results

The last column in figure 8.1 is the significance
level, or Sig., and it tells us that the value of our
F ratio (3.69) is large enough to reject the null
hypothesis. The Sig. level of .018 is less than
.05. The mean numbers of executions in the
different regions of the country between 1977 and 1995
were significantly different, and this
difference is greater than we would expect by
chance. We still cannot say where the
differences lie.
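The arithmetic behind these columns can be checked directly from the sums of squares reported above; the significance level comes from the F distribution with 3 and 46 degrees of freedom:

from scipy import stats

# Sums of squares reported in the text (figure 8.1)
ss_between, ss_within = 2469.194, 10260.426
df_between, df_within = 4 - 1, 50 - 4          # 3 and 46

ms_between = ss_between / df_between           # 823.065
ms_within = ss_within / df_within              # 223.053
f_ratio = ms_between / ms_within               # about 3.69

# Upper-tail probability; should agree with the reported Sig. of .018
sig = stats.f.sf(f_ratio, df_between, df_within)
print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 2), round(sig, 3))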
Bonferroni Procedure

The Bonferroni procedure comes into play here (Post
Hoc Tests). It adjusts the observed
significance level by multiplying it by the
number of comparisons being made. Since
the four groups give six possible pairwise
comparisons of group means, the observed
sig. level must be below 0.05/6, or roughly 0.008, for a
difference between group means to be
significant at the 0.05 level.
Bonferroni Procedure


Multiple comparison procedures protect you from
calling differences significant when they are not: the
more comparisons you make, the larger the difference
between a pair of means must be for a multiple
comparison procedure to call it statistically significant.
The multiple comparisons table (table 8.1) gives us all
possible combinations. Each block represents one
region compared, row by row, to all the others. We see that
the difference in the mean number of executions between the
southern states and the western states is statistically
significant at the 0.05 level or less.
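A rough sketch of the idea behind the Bonferroni post hoc comparisons, again assuming a hypothetical CSV export with illustrative column names (SPSS's post hoc test uses the pooled within-groups error term, while this sketch simply adjusts ordinary two-sample t tests):

from itertools import combinations
import pandas as pd
from scipy import stats

# Hypothetical flat-file export of the state data; column names are assumptions
df = pd.read_csv("state_data.csv")
groups = {name: grp["executions_1977_1995"].values
          for name, grp in df.groupby("region")}

pairs = list(combinations(groups, 2))     # six unique pairs for four regions
n_comparisons = len(pairs)

for a, b in pairs:
    t_stat, p = stats.ttest_ind(groups[a], groups[b])
    p_adj = min(p * n_comparisons, 1.0)   # Bonferroni adjustment
    print(a, "vs", b, "adjusted p =", round(p_adj, 3))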
Conclusion

ANOVA is a statistical technique that compares differences
between sample means when you have more than two samples or
groups. We analyze the variance within and between the
samples to determine the significance of any differences.

The F ratio of the mean square between the groups (MSb)
to the mean square within the groups (MSw) is the heart of
ANOVA. If the null hypothesis is rejected on the basis of the F test, there is a
statistically significant difference between the means of the groups.
To determine where the significant difference lies, the
Bonferroni multiple comparison method must be used. This test tells
you which of the multiple comparisons between groups is
statistically significant.