Statistics
Analysis of Variance:
Comparing More Than 2 Means

Multivariate Studies
Observational Study: conditions to which subjects are exposed are not controlled by the investigator. (No attempt is made to control or influence the variables of interest.)
Experimental Study: conditions to which subjects are exposed are controlled by the investigator. (Treatments are used in order to observe the response.)

Experiment
1. Investigator Controls One or More Independent Variables
Called Treatment Variables or Factors
Contain Two or More Levels (Subcategories)
2. Observes Effect on Dependent Variable
Response to Levels of Independent Variable
3. Experimental Design: Plan Used to Test Hypotheses

Examples of Experiments
1. Thirty Locations Are Randomly Assigned 1 of 4 (Levels) Health Promotion Banners (Independent Variable) to See the Effect on Using Stairs (Dependent Variable).
2. Two Hundred Consumers Are Randomly Assigned 1 of 3 (Levels) Brands of Juice (Independent Variable) to Study Reaction (Dependent Variable).

Experimental Designs
Completely Randomized: One-Way ANOVA
Randomized Block: Randomized Block F Test
Factorial: Two-Way ANOVA

Completely Randomized Design
1. Experimental Units (Subjects) Are Assigned Randomly to Treatments
Subjects are Assumed Homogeneous
2. One Factor or Independent Variable
2 or More Treatment Levels or Classifications
3. Analyzed by One-Way ANOVA
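
The random assignment that defines a completely randomized design can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `assign_completely_randomized`, the subject labels, and the treatment names A, B, C are hypothetical, and balanced group sizes are an assumption made for simplicity.

```python
import random


def assign_completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn.

    Hypothetical helper for illustration: every unit has the same chance of
    landing in any treatment, and group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, unit in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(unit)
    return groups


# Hypothetical example: 15 subjects, 3 treatment levels.
subjects = [f"subject_{k}" for k in range(1, 16)]
design = assign_completely_randomized(subjects, ["A", "B", "C"], seed=0)
for treatment, members in design.items():
    print(treatment, members)
```

Fixing `seed` only makes the assignment reproducible; in a real experiment the assignment would be generated once and recorded.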

Randomized Design
Example
Factor: Training Method
Factor levels (Treatments): Level 1, Level 2, Level 3
Dependent variable (Response), one value per experimental unit:

Level 1    Level 2    Level 3
21 hrs.    17 hrs.    31 hrs.
27 hrs.    25 hrs.    28 hrs.
29 hrs.    20 hrs.    22 hrs.

One-Way ANOVA F-Test
1. Tests the Equality of 2 or More (t) Population Means (µ1 = µ2 = … = µt)
2. Variables
One Nominal Scaled Independent Variable
2 or More (t) Treatment Levels or Classifications
One Interval or Ratio Scaled Dependent Variable
3. Used to Analyze Completely Randomized Experimental Designs

One-Way ANOVA F-Test
Assumptions
1. Randomness & Independence of Errors
Independent Random Samples are Drawn
2. Normality
Populations are Normally Distributed
3. Homogeneity of Variance (σ1 = σ2 = … = σt)
Populations have Equal Variances

One-Way ANOVA F-Test
Hypotheses
H0: µ1 = µ2 = µ3 = ... = µt
All Population Means are Equal
No Treatment Effect
Ha: Not All µj Are Equal
At Least 1 Pop. Mean is Different
Treatment Effect
µ1 ≠ µ2 ≠ ... ≠ µt Is Wrong
[Figure: densities f(y) against y, one panel with µ1 = µ2 = µ3 (H0) and one with µ1 = µ2 ≠ µ3 (Ha)]

Why Variances?
Example: Hourly wage for three ethnic groups

CASE I
Group 1: 5.90  5.92  5.89  5.91  5.88   Average 5.90
Group 2: 5.51  5.50  5.50  5.49  5.50   Average 5.50
Group 3: 5.01  5.00  4.99  4.98  5.02   Average 5.00

CASE II
Group 1: 5.90  4.42  7.51  7.89  3.78   Average 5.90
Group 2: 6.31  3.54  4.73  7.20  5.72   Average 5.50
Group 3: 4.52  6.93  4.48  5.55  3.52   Average 5.00

[Figures: wages plotted against GROUPID, one plot for CASE1 and one for CASE2]
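
The point of the wage example can be checked numerically. Both cases have the same three group averages, so the variation among groups is identical, but the variation within groups is very different. The sketch below compares the two with the ratio that the later slides call F = MSB / MSW. The helper name `f_ratio` is hypothetical, and the grouping of the values is an assumption: it was reconstructed from the flattened table by matching the stated averages.

```python
from statistics import mean


def f_ratio(groups):
    """Return (MSB, MSW, F) for a list of samples, one list per group."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Among-groups variation: group means around the overall mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups variation: observations around their own group mean.
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    msb = ssb / (t - 1)
    msw = ssw / (n_total - t)
    return msb, msw, msb / msw


# Grouping assumed from the averages 5.90, 5.50, 5.00 given in the example.
case_1 = [
    [5.90, 5.92, 5.89, 5.91, 5.88],
    [5.51, 5.50, 5.50, 5.49, 5.50],
    [5.01, 5.00, 4.99, 4.98, 5.02],
]
case_2 = [
    [5.90, 4.42, 7.51, 7.89, 3.78],
    [6.31, 3.54, 4.73, 7.20, 5.72],
    [4.52, 6.93, 4.48, 5.55, 3.52],
]

msb_1, msw_1, f_1 = f_ratio(case_1)
msb_2, msw_2, f_2 = f_ratio(case_2)
print(f"Case I : MSB={msb_1:.4f}  MSW={msw_1:.6f}  F={f_1:.1f}")
print(f"Case II: MSB={msb_2:.4f}  MSW={msw_2:.6f}  F={f_2:.3f}")
```

With these values the among-groups mean square is the same in both cases, while the within-groups mean square is tiny in Case I and large in Case II, so only Case I gives a large ratio. Comparing means alone cannot distinguish the two cases; comparing variances can.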

Why Variances?
[Figure: two sets of panels A and B showing the distributions of Pop 1 to Pop 6. One set: same treatment variation, different random variation (Variances WITHIN differ). Other set: different treatment variation, same random variation (Variances AMONG differ). Annotation: "Possible to conclude means are equal!"]

One-Way ANOVA
Basic Idea
1. Compares 2 Types of Variation to Test Equality of Means
2. Comparison Basis Is Ratio of Variances
3. If Treatment Variation Is Significantly Greater Than Random Variation then Means Are Not Equal
4. Variation Measures Are Obtained by 'Partitioning' Total Variation

One-Way ANOVA
Partitions Total Variation
Total variation = Variation due to treatment + Variation due to random sampling
Variation due to treatment is also called: Sum of Squares Among, Sum of Squares Between, Sum of Squares Treatment, Among Groups Variation
Variation due to random sampling is also called: Sum of Squares Within, Sum of Squares Error, Within Groups Variation

Notations
$y_{ij}$ : the j-th element from the i-th treatment
$\bar{y}_{i\cdot}$ : the i-th treatment mean
$\bar{y}_{\cdot\cdot}$ : the overall sample mean
$n_T$ : the total sample size ($n_1 + n_2 + \cdots + n_t$)
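
Total Variation
$TSS = (y_{11} - \bar{y}_{\cdot\cdot})^2 + (y_{21} - \bar{y}_{\cdot\cdot})^2 + \cdots + (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2$
[Figure: Response, y for Group 1, Group 2, Group 3, showing each observation's deviation from the overall mean]

Treatment Variation
$SSB = n_1 (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})^2 + n_2 (\bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot})^2 + \cdots + n_t (\bar{y}_{t\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{t} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$
[Figure: Response, y for Group 1, Group 2, Group 3, showing the three group means and the overall mean]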

Random (Error) Variation
$SSW = (y_{11} - \bar{y}_{1\cdot})^2 + (y_{21} - \bar{y}_{2\cdot})^2 + \cdots + (y_{tj} - \bar{y}_{t\cdot})^2 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 = \sum_{i=1}^{t} (n_i - 1) s_i^2$
[Figure: Response, y for Group 1, Group 2, Group 3, showing each observation's deviation from its own group mean]

One-Way ANOVA F-Test
Test Statistic
1. Test Statistic
F = MSB / MSW
2. Degrees of Freedom
ν1 = t - 1
ν2 = nT - t

One-Way ANOVA
Summary Table
Source of Variation            Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Treatment (Between samples)    t - 1                SSB              MSB = SSB/(t - 1)        MSB/MSW
Error (Within samples)         nT - t               SSW              MSW = SSW/(nT - t)
Total                          nT - 1               TSS = SSB+SSW
MSB Is Mean Square for Treatment
MSW Is Mean Square for Error
t = # Populations, Groups, or Levels
nT = Total Sample Size

One-Way ANOVA F-Test
Critical Value
If means are equal, F = MSB / MSW ≈ 1. Only reject large F!
Always One-Tail!
[Figure: F distribution starting at 0, with the "Do Not Reject H0" region to the left of Fα(t-1, nT - t) and the "Reject H0" region of area α to its right]
© 1984-1994 T/Maker Co.

One-Way ANOVA F-Test
Example
As production manager, you want to see if 3 filling machines have different mean filling times. You assign 15 similarly trained & experienced workers, 5 per machine, to the machines. At the .05 level, is there a difference in mean filling times?

Mach1    Mach2    Mach3
25.40    23.40    20.00
26.31    21.80    22.20
24.10    23.50    19.75
23.74    22.75    20.60
25.10    21.60    20.40

One-Way ANOVA F-Test
Solution
H0: µ1 = µ2 = µ3
Ha: Not All Equal
α = .05
ν1 = 2, ν2 = 12
Critical Value(s): F = 3.89 (α = .05)
Test Statistic: F = MSB / MSW = 23.5820 / .9211 = 25.6
Decision: Reject at α = .05
Conclusion: There Is Evidence Pop. Means Are Different

Summary Table
Solution (From Computer)
Source of Variation    Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Treatment (Machines)   3 - 1 = 2            47.1640          23.5820                  25.60
Error                  15 - 3 = 12          11.0532          .9211
Total                  15 - 1 = 14          58.2172
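
The summary table for the filling-machine example can be reproduced directly from the sums-of-squares definitions. The following is a minimal sketch in plain Python, not the computer output the slide refers to. The function name `one_way_anova` and the dictionary keys are hypothetical, and the critical value 3.89 is copied from the solution slide rather than computed.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA summary quantities for a list of samples."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # SSB: treatment (between-samples) variation.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # SSW: error (within-samples) variation.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # TSS: total variation, computed independently as a check on SSB + SSW.
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    msb = ssb / (t - 1)
    msw = ssw / (n_total - t)
    return {
        "df_treatment": t - 1,
        "df_error": n_total - t,
        "SSB": ssb,
        "SSW": ssw,
        "TSS": tss,
        "MSB": msb,
        "MSW": msw,
        "F": msb / msw,
    }


mach1 = [25.40, 26.31, 24.10, 23.74, 25.10]
mach2 = [23.40, 21.80, 23.50, 22.75, 21.60]
mach3 = [20.00, 22.20, 19.75, 20.60, 20.40]

table = one_way_anova([mach1, mach2, mach3])
critical_value = 3.89  # F at alpha = .05 with (2, 12) df, taken from the slide
for name, value in table.items():
    print(f"{name:>12}: {value:.4f}")
print("Reject H0" if table["F"] > critical_value else "Do not reject H0")
```

Computing TSS separately, instead of as SSB + SSW, gives a quick arithmetic check on the partition of total variation.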

One-Way ANOVA F-Test
Thinking Challenge
You're a trainer for Microsoft Corp. Is there a difference in mean learning times of 12 people using 4 different training methods (α = .05)?

M1    M2    M3    M4
10    11    13    18
 9    16     8    23
 5     9     9    25

Use the following table.

Summary Table
Solution*
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Treatment (Methods)   4 - 1 = 3            348              116                      11.6
Error                 12 - 4 = 8           80               10
Total                 12 - 1 = 11          428

One-Way ANOVA F-Test
Solution*
H0: µ1 = µ2 = µ3 = µ4
Ha: Not All Equal
α = .05
ν1 = 3, ν2 = 8
Critical Value(s): F = 4.07 (α = .05)
Test Statistic: F = MSB / MSW = 116 / 10 = 11.6
p-value = .003
Decision: Reject at α = .05
Conclusion: There Is Evidence Pop. Means Are Different

SPSS Error Bar Chart
[Figure: 95% CI of SCORE for each METHOD (1.00, 2.00, 3.00, 4.00), N = 3 per method]

Multiple Comparisons

Linear Model for CRD
Let yij be the j-th sample observation from population i,
yij = µ + αi + εij
µ : overall mean
αi : i-th treatment effect
εij : error term, or random variation of yij about µi, where µi = µ + αi
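
To make the terms of the linear model concrete, the sketch below estimates them for the learning-time data from the Thinking Challenge. It is an illustration under stated assumptions, not a method given in the slides: the overall mean is estimated by the grand sample mean, each treatment effect by the group mean minus the grand mean (which corresponds to the usual sum-to-zero constraint when group sizes are equal), and each error by the residual. The variable names are hypothetical.

```python
# Learning times by training method, read from the Thinking Challenge table.
methods = {
    "M1": [10, 9, 5],
    "M2": [11, 16, 9],
    "M3": [13, 8, 9],
    "M4": [18, 23, 25],
}

all_values = [y for sample in methods.values() for y in sample]

# Estimate of mu: the overall sample mean.
mu_hat = sum(all_values) / len(all_values)

# Estimates of mu_i = mu + alpha_i: the treatment means.
group_means = {name: sum(sample) / len(sample) for name, sample in methods.items()}

# Estimates of alpha_i: treatment mean minus overall mean (assumed constraint).
alpha_hat = {name: m - mu_hat for name, m in group_means.items()}

# Estimates of epsilon_ij: residuals about the treatment mean.
residuals = {
    name: [y - group_means[name] for y in sample]
    for name, sample in methods.items()
}

print("mu_hat =", mu_hat)
for name in methods:
    print(name, "alpha_hat =", alpha_hat[name], "residuals =", residuals[name])
```

Each observation is recovered exactly as `mu_hat + alpha_hat[name] + residual`, which mirrors yij = µ + αi + εij term by term.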
One-Way ANOVA F-Test
Hypotheses
H0: µ1 = µ2 = µ3 = ... = µt
All Population Means are Equal
No Treatment Effect
is equivalent to
H0: α1 = α2 = α3 = ... = αt

Error Term Assumptions
For the parametric F test, the εij's are independent and normally distributed with constant variance σε².
The normality assumption can be checked by using the estimates (residuals)
$e_{ij} = y_{ij} - \bar{y}_{i\cdot}$
The equal variances assumption can be verified by using Hartley's test (very sensitive to departures from normality) or Levene's test.
Levene's test can be done by applying ANOVA on $z_{ij} = |y_{ij} - \tilde{y}_{i\cdot}|$, where $\tilde{y}_{i\cdot}$ is the sample median of the i-th sample.
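
The median-based Levene procedure described above amounts to two steps: replace each observation by its absolute deviation from its own group median, then run the ordinary one-way F computation on those deviations. The sketch below applies it to the filling-machine data. The helper names `anova_f` and `levene_f` are hypothetical, and the critical value 3.89 for (2, 12) degrees of freedom is reused from the filling-machine solution slide.

```python
from statistics import median


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (t - 1)) / (ssw / (n_total - t))


def levene_f(groups):
    """Levene statistic: ANOVA F on absolute deviations from group medians."""
    deviations = [[abs(y - median(g)) for y in g] for g in groups]
    return anova_f(deviations)


mach1 = [25.40, 26.31, 24.10, 23.74, 25.10]
mach2 = [23.40, 21.80, 23.50, 22.75, 21.60]
mach3 = [20.00, 22.20, 19.75, 20.60, 20.40]

w = levene_f([mach1, mach2, mach3])
critical_value = 3.89  # F at alpha = .05 with (2, 12) df, from the slides
print(f"Levene F = {w:.4f}")
print("Evidence of unequal variances" if w > critical_value
      else "No evidence of unequal variances")
```

A small statistic here means the spreads of the three machines are similar, which supports using the parametric F test on the original filling times.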

What if the assumptions are not satisfied?
Try a nonparametric method: Kruskal-Wallis Test
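
The slides only name the Kruskal-Wallis test, so the following is a hedged sketch of the standard rank-based statistic rather than anything taken from the source. It pools all observations, ranks them (tied values share the average rank), and computes H = 12 / (N(N+1)) · Σ Ri² / ni − 3(N+1), where Ri is the rank sum of group i. The function names are hypothetical, and no tie correction is applied, which is a simplification. H is commonly compared with a chi-square distribution on t − 1 degrees of freedom; the value 5.991 used below is the .05 critical value for 2 degrees of freedom.

```python
def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = average_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    total = 0.0
    position = 0
    for g in groups:
        rank_sum = sum(ranks[position:position + len(g)])
        total += rank_sum ** 2 / len(g)
        position += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


mach1 = [25.40, 26.31, 24.10, 23.74, 25.10]
mach2 = [23.40, 21.80, 23.50, 22.75, 21.60]
mach3 = [20.00, 22.20, 19.75, 20.60, 20.40]

h = kruskal_wallis_h([mach1, mach2, mach3])
chi_square_critical = 5.991  # chi-square, 2 df, alpha = .05
print(f"H = {h:.2f}")
print("Reject H0" if h > chi_square_critical else "Do not reject H0")
```

For the filling-machine data the rank-based test points the same way as the F test, which is what one would hope for when the parametric assumptions are not badly violated.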