Analysis of Variance

Introduction


The Analysis of Variance is abbreviated as ANOVA.
ANOVA is used for hypothesis testing in:
- Simple Regression
- Multiple Regression
- Comparison of Means
Sources



There is variation whenever the data values are not all identical.
This variation can come from different sources, such as the model or the factor.
There is always left-over variation that can’t be explained by any of the other sources. This source is called the error.
Variation



Variation is the sum of the squared deviations of the values from the mean of those values.
As long as the values are not all identical, there will be variation.
Variation is abbreviated as SS, for Sum of Squares.
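To make the definition concrete, here is a minimal Python sketch that computes the variation (SS) of a list of values. The function name sum_of_squares and the sample data are made up for this illustration.

```python
def sum_of_squares(values):
    """Variation (SS): sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

print(sum_of_squares([2, 4, 4, 4, 5, 5, 7, 9]))  # 32.0
print(sum_of_squares([5, 5, 5]))                 # 0.0 (identical values, no variation)
```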
Degrees of Freedom



The degrees of freedom are the number of values that are free to vary once certain parameters have been established.
Usually, this is one less than the sample size, but in general, it’s the number of values minus the number of parameters being estimated.
Degrees of freedom is abbreviated as df.
Variance





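The sample variance is the average squared deviation from the mean, where the "average" is taken by dividing by the degrees of freedom rather than by the sample size.
It is found by dividing the variation by the degrees of freedom:
Variance = Variation / df
Variance is abbreviated as MS, for Mean Square.
MS = SS / df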
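A short Python sketch of MS = SS / df for a single sample, assuming one estimated parameter (the mean), so df = n - 1. It compares the result with the standard library's statistics.variance, which also divides by n - 1. The function name mean_square and the data are hypothetical.

```python
import statistics

def mean_square(values):
    """Variance (MS) = SS / df for one sample, with df = n - 1."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)  # variation (SS)
    df = n - 1                                 # one parameter (the mean) was estimated
    return ss / df

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_square(data))          # 32 / 7, about 4.571
print(statistics.variance(data))  # same value, up to floating-point rounding
```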
F





F is the F test statistic.
There is an F test statistic for each source except the error and the total.
F is the ratio of two sample variances, and the MS column contains variances.
The F test statistic for each source is the MS for that row divided by the MS of the error row.
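F requires a pair of degrees of freedom, one for the numerator and one for the denominator.
The numerator df is the df for the source.
The denominator df is the df for the error row.
The ANOVA F test is always a right-tail test: only large values of F, where the variance for the source is large relative to the error variance, count as evidence against the null hypothesis.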
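The rule for F and its pair of degrees of freedom fits in a few lines of Python. This is a minimal sketch; the function name f_statistic is hypothetical, and the numbers passed in are the MS and df values from Example 1 in this section.

```python
def f_statistic(ms_source, df_source, ms_error, df_error):
    """Return F for one source row and its (numerator df, denominator df) pair."""
    f = ms_source / ms_error       # ratio of two variances
    return f, (df_source, df_error)

f, dfs = f_statistic(ms_source=6.30, df_source=3, ms_error=4.50, df_error=16)
print(round(f, 2), dfs)  # 1.4 (3, 16)
```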
The ANOVA Table


The ANOVA table is composed of rows; each row represents one source of variation.
For each source of variation:
- The variation is in the SS column.
- The degrees of freedom are in the df column.
- The variance is in the MS column.
- The MS value is found by dividing the SS by the df.
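One way to picture a single row of the table is as a small record whose MS is derived from its SS and df. The AnovaRow class below is a hypothetical illustration, not part of any statistics package.

```python
from dataclasses import dataclass

@dataclass
class AnovaRow:
    """One row (source of variation) of an ANOVA table."""
    source: str
    ss: float  # variation, the SS column
    df: int    # degrees of freedom, the df column

    @property
    def ms(self) -> float:
        """Variance, the MS column: MS = SS / df."""
        return self.ss / self.df

row = AnovaRow("Explained", ss=18.9, df=3)
print(f"{row.source}: SS = {row.ss}, df = {row.df}, MS = {row.ms:.2f}")
# Explained: SS = 18.9, df = 3, MS = 6.30
```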
ANOVA Table


The complete ANOVA table can be generated by most statistical packages and spreadsheets.
We’ll concentrate on understanding how the table works rather than on the formulas for the variations.
The ANOVA Table
Source      SS (variation)   df   MS (variance)   F
Explained*
Error
Total
*The explained variation has different names depending on the particular type of ANOVA problem.
Example 1
Source         SS   df      MS      F
Explained    18.9    3
Error        72.0   16
Total
The sum of squares and degrees of freedom are given. Complete the table.
Example 1 – Find Totals
Source         SS   df      MS      F
Explained    18.9    3
Error        72.0   16
Total        90.9   19
Add the SS and df columns to get the totals.
Example 1 – Find MS
Source         SS   df      MS      F
Explained    18.9    3    6.30
Error        72.0   16    4.50
Total        90.9   19    4.78
Divide SS by df to get MS: 18.9 ÷ 3 = 6.30, 72.0 ÷ 16 = 4.50, and 90.9 ÷ 19 = 4.78.
Example 1 – Find F
Source         SS   df      MS      F
Explained    18.9    3    6.30   1.40
Error        72.0   16    4.50
Total        90.9   19    4.78
F = 6.30 / 4.50 = 1.40
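The steps of Example 1 (totals, then MS, then F) can be sketched as one Python function, assuming a table with exactly one explained source and one error row. The function name complete_table and the dictionary layout are illustrative choices.

```python
def complete_table(ss_explained, df_explained, ss_error, df_error):
    """Fill in totals, MS and F for a two-source ANOVA table."""
    ss_total = ss_explained + ss_error      # add the SS column
    df_total = df_explained + df_error      # add the df column
    ms_explained = ss_explained / df_explained
    ms_error = ss_error / df_error
    ms_total = ss_total / df_total          # not formally part of the table
    f = ms_explained / ms_error             # MS(source) / MS(error)
    return {
        "Explained": (ss_explained, df_explained, ms_explained, f),
        "Error": (ss_error, df_error, ms_error, None),
        "Total": (ss_total, df_total, ms_total, None),
    }

for source, (ss, df, ms, f) in complete_table(18.9, 3, 72.0, 16).items():
    f_text = f"{f:.2f}" if f is not None else ""
    print(f"{source:<10}{ss:>7.1f}{df:>5}{ms:>8.2f}{f_text:>7}")
```

Run on the Example 1 inputs, it prints the same SS, df, MS and F values as the completed table.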
Notes about the ANOVA



The MS(Total) isn’t actually part of the ANOVA table, but it represents the sample variance of the response variable, so it’s useful to find.
The total df is one less than the sample size.
To finish the hypothesis test, you would need to find either a critical F value or the p-value.
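In practice the p-value comes from a statistical package or an F table (in SciPy, for example, scipy.stats.f.sf gives the right-tail probability). The dependency-free sketch below is only an illustration of what that computation does: it uses the identity P(F > f) = I_x(df_den/2, df_num/2) with x = df_den / (df_den + df_num * f), where I_x is the regularized incomplete beta function, and evaluates the integral numerically with Simpson's rule. The function name and the assumption df_den >= 2 are choices made for this sketch.

```python
import math

def f_right_tail_p(f_stat, df_num, df_den, steps=2000):
    """Approximate right-tail p-value P(F > f_stat) for F(df_num, df_den).

    Assumes df_den >= 2 so the integrand is finite at t = 0.
    """
    if df_den < 2:
        raise ValueError("this sketch assumes df_den >= 2")
    if f_stat <= 0:
        return 1.0
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even count
    a, b = df_den / 2, df_num / 2
    x = df_den / (df_den + df_num * f_stat)

    def integrand(t):
        # Integrand of the (unnormalized) incomplete beta function
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return integral / beta              # normalize to get I_x(a, b)

print(round(f_right_tail_p(1.40, 3, 16), 3))  # p-value for Example 1
```

Since 3.24 is the usual tabled 5% critical value for F with (3, 16) df, f_right_tail_p(3.24, 3, 16) should come out close to 0.05.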
Example 2
Source         SS   df      MS      F
Explained   106.6        21.32   2.60
Error               26
Total
Complete the table.
Example 2 – Step 1
Source         SS   df      MS      F
Explained   106.6    5   21.32   2.60
Error               26    8.20
Total
SS / df = MS, so 106.6 / df = 21.32. Solving for df gives df = 5.
F = MS(Explained) / MS(Error), so 2.60 = 21.32 / MS(Error). Solving gives MS(Error) = 8.20.
Example 2 – Step 2
Source         SS   df      MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total               31
SS / df = MS, so SS / 26 = 8.20. Solving for SS gives SS = 213.2.
The total df is the sum of the other df, so 5 + 26 = 31.
Example 2 – Step 3
Source         SS   df      MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       319.8   31
Find the total SS by adding the other SS values: 106.6 + 213.2 = 319.8.
Example 2 – Step 4
Source         SS   df      MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       319.8   31   10.32
Find the MS(Total) by dividing SS by df: 319.8 / 31 = 10.32.
Example 2 – Notes


Since there are 31 df, the sample size was 32.
Since the sample variance was 10.32 and the standard deviation is the square root of the variance, the sample standard deviation is 3.21.
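Example 2 works backwards from MS = SS / df and F = MS(Explained) / MS(Error). A minimal Python sketch of the same back-solving follows; the function name back_solve is hypothetical, and rounding the explained df to a whole number is an assumption to absorb rounding in the given table values.

```python
import math

def back_solve(ss_explained, ms_explained, f, df_error):
    """Recover the rest of a two-source ANOVA table, as in Example 2."""
    df_explained = round(ss_explained / ms_explained)  # SS / df = MS, so df = SS / MS
    ms_error = ms_explained / f                        # F = MS(Explained) / MS(Error)
    ss_error = ms_error * df_error                     # SS = MS * df
    ss_total = ss_explained + ss_error
    df_total = df_explained + df_error
    ms_total = ss_total / df_total                     # sample variance of the response
    return {
        "df_explained": df_explained,
        "ms_error": ms_error,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_total": df_total,
        "ms_total": ms_total,
        "n": df_total + 1,                             # total df is n - 1
        "sd": math.sqrt(ms_total),                     # sample standard deviation
    }

r = back_solve(ss_explained=106.6, ms_explained=21.32, f=2.60, df_error=26)
print(r["df_explained"], round(r["ms_error"], 2), round(r["ss_error"], 1))  # 5 8.2 213.2
print(round(r["ss_total"], 1), r["df_total"], round(r["ms_total"], 2))      # 319.8 31 10.32
print(r["n"], round(r["sd"], 2))                                            # 32 3.21
```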
Example 3
Source         SS   df      MS      F
Explained    56.7
Error               14   13.50
Total
The sample size is n = 20. Work this one out on your own!
Example 3 - Solution
Source         SS   df      MS      F
Explained    56.7    5   11.34   0.84
Error       189.0   14   13.50
Total       245.7   19   12.93
How did you do?
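If you want to check your work, the solution can be reproduced step by step in Python. This is a plain script using only the values given in Example 3; the variable names are illustrative.

```python
n = 20                                              # given sample size
ss_explained, df_error, ms_error = 56.7, 14, 13.50  # given table values

df_total = n - 1                    # total df is one less than the sample size
df_explained = df_total - df_error  # the df column must add up to the total
ms_explained = ss_explained / df_explained
ss_error = ms_error * df_error      # SS = MS * df
ss_total = ss_explained + ss_error
ms_total = ss_total / df_total
f = ms_explained / ms_error         # MS(Explained) / MS(Error)

print(df_explained, round(ms_explained, 2), round(ss_error, 1))  # 5 11.34 189.0
print(round(ss_total, 1), df_total, round(ms_total, 2))          # 245.7 19 12.93
print(round(f, 2))                                               # 0.84
```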