
MANOVA LECTURE

Comparison of Several Multivariate
Means
Suggested Reading : Chapter Six
Kazeem Adepoju, PhD
July 9, 2019
Outline
• Univariate Analysis of Variance
• Multivariate Analysis of Variance
One way Analysis of Variance
(ANOVA)
Comparing k Populations
The F test – for comparing k means
Situation
• We have k normal populations.
• Let $\mu_i$ and $\sigma$ denote the mean and standard deviation of population i, for i = 1, 2, 3, …, k.
• Note: we assume that the standard deviation is the same for each population:
\[ \sigma_1 = \sigma_2 = \cdots = \sigma_k = \sigma \]
We want to test
\[ H_0 : \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k \]
against
\[ H_A : \mu_i \neq \mu_j \text{ for at least one pair } i, j \]
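The F statistic
\[ F = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}{\dfrac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2} \]
where $x_{ij}$ = the jth observation in the ith sample ($i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$),
\[ \bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i} = \text{mean for the } i\text{th sample } (i = 1, 2, \dots, k) \]
\[ N = \sum_{i=1}^{k} n_i = \text{Total sample size} \]
\[ \bar{x} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}}{N} = \text{Overall mean} \]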
The ANOVA table
Source    S.S.                                                              d.f.    M.S.                   F
Between   $SS_B = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$                $k-1$   $MS_B = SS_B/(k-1)$    $F = MS_B/MS_W$
Within    $SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$     $N-k$   $MS_W = SS_W/(N-k)$
The ANOVA table is a tool for displaying the
computations for the F test. It is especially useful when
the between-sample variability is due to two or more
factors.
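The F-test computations that the ANOVA table displays can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `one_way_f` and the toy data are assumptions introduced here, not part of the lecture.

```python
from statistics import fmean


def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of samples.

    Hypothetical helper: each sample is a list of numbers drawn from one
    of the k populations.
    """
    k = len(samples)
    n_total = sum(len(s) for s in samples)                   # N
    grand_mean = sum(sum(s) for s in samples) / n_total      # overall mean
    means = [fmean(s) for s in samples]                      # sample means
    # Between-sample sum of squares: sum of n_i * (mean_i - grand_mean)^2
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    # Within-sample sum of squares: sum of (x_ij - mean_i)^2
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, k - 1, n_total - k


# Made-up toy data, only to exercise the function
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df1, df2 = one_way_f(toy)
print(f_stat, df1, df2)
```

The returned degrees of freedom are the ones needed to look up the critical value of the F distribution.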
Computing Formulae:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total
3) $N = \sum_{i=1}^{k} n_i$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
The data
• Assume we have collected data from each of
k populations
• Let $x_{i1}, x_{i2}, x_{i3}, \dots, x_{in_i}$ denote the $n_i$ observations
from population i, for i = 1, 2, 3, …, k.
Then
1) \[ SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} \]
2) \[ SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} \]
3) \[ F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} \]
Anova Table
Source     d.f.   Sum of Squares   Mean Square   F-ratio
Between    k-1    SSBetween        MSBetween     MSB/MSW
Within     N-k    SSWithin         MSWithin
Total      N-1    SSTotal
where $MS = SS/df$.
Example
In the following example we are comparing weight
gains resulting from the following six diets
1. Diet 1 - High Protein , Beef
2. Diet 2 - High Protein , Cereal
3. Diet 3 - High Protein , Pork
4. Diet 4 - Low protein , Beef
5. Diet 5 - Low protein , Cereal
6. Diet 6 - Low protein , Pork
Gains in weight (grams) for rats under six diets
differing in level of protein (High or Low)
and source of protein (Beef, Cereal, or Pork)
Diet   Weight gains (g)                               Mean    Std. Dev.   Σx     Σx²
1       73  102  118  104   81  107  100   87  117  111   100.0   15.14       1000   102062
2       98   74   56  111   95   88   82   77   86   92    85.9   15.02        859    75819
3       94   79   96   98  102  102  108   91  120  105    99.5   10.92        995   100075
4       90   76   90   64   86   51   72   90   95   78    79.2   13.89        792    64462
5      107   95   97   80   98   74   74   67   89   58    83.9   15.71        839    72613
6       49   82   73   86   81   97  106   70   61   82    78.7   16.55        787    64401
Thus
\[ SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933 \]
\[ SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 479432 - 467846 = 11586 \]
\[ F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} = \frac{4612.933/5}{11586/54} = \frac{922.6}{214.56} = 4.3 \]
$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$.
Thus, since F > 2.386, we reject $H_0$.
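The arithmetic of this example can be reproduced from the raw weight gains with the computing formulae. The following is a small sketch; the variable names are chosen here for illustration and the data are copied from the six-diet table.

```python
# Weight gains (grams) for the six diets, copied from the data table
diets = {
    1: [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    2: [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    3: [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    4: [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    5: [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    6: [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

k = len(diets)
totals = {d: sum(x) for d, x in diets.items()}           # 1) T_i
G = sum(totals.values())                                 # 2) grand total
N = sum(len(x) for x in diets.values())                  # 3) total sample size
sum_sq = sum(v * v for x in diets.values() for v in x)   # 4) sum of x_ij^2
sum_t2_n = sum(totals[d] ** 2 / len(diets[d]) for d in diets)  # 5) sum T_i^2/n_i

ss_between = sum_t2_n - G ** 2 / N
ss_within = sum_sq - sum_t2_n
F = (ss_between / (k - 1)) / (ss_within / (N - k))

print(G, N, round(ss_between, 3), round(ss_within, 3), round(F, 2))
```

The printed sums of squares can be compared directly with the values worked out by hand above.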
Anova Table
Source     d.f.   Sum of Squares   Mean Square   F-ratio
Between    5       4612.933        922.587       4.3** (p = 0.0023)
Within     54     11586.000        214.556
Total      59     16198.933
* - Significant at 0.05 (not 0.01)
** - Significant at 0.01
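Equivalence of the F-test and the t-test
when k = 2
the t-test
\[ t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}} \]
the F-test
\[ F = \frac{s^2_{Between}}{s^2_{Pooled}} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k-1)}{\sum_{i=1}^{k} (n_i - 1)s_i^2 \big/ \left(\sum_{i=1}^{k} n_i - k\right)} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{\left[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right] \big/ (n_1 + n_2 - 2)} \]
\[ \text{denominator} = s^2_{Pooled} \]
\[ \text{numerator} = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 \]
Since $\bar{x} = (n_1\bar{x}_1 + n_2\bar{x}_2)/(n_1 + n_2)$,
\[ n_1(\bar{x}_1 - \bar{x})^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 \]
\[ n_2(\bar{x}_2 - \bar{x})^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 \]
so that
\[ n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \]
Hence
\[ F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)s^2_{Pooled}} = t^2 \]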
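The identity F = t² for k = 2 can also be checked numerically. The sketch below reuses the Diet 1 and Diet 2 weight gains from the earlier example as the two samples; that choice of data is an assumption made only for illustration.

```python
from math import sqrt
from statistics import fmean, variance

# Two samples: Diet 1 and Diet 2 from the six-diet example
x = [73, 102, 118, 104, 81, 107, 100, 87, 117, 111]
y = [98, 74, 56, 111, 95, 88, 82, 77, 86, 92]
n, m = len(x), len(y)

# Pooled variance (statistics.variance is the sample variance, divisor n - 1)
s2_pooled = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)

# Two-sample t statistic
t = (fmean(x) - fmean(y)) / sqrt(s2_pooled * (1 / n + 1 / m))

# One-way ANOVA F statistic with k = 2, so k - 1 = 1
grand_mean = (sum(x) + sum(y)) / (n + m)
ss_between = n * (fmean(x) - grand_mean) ** 2 + m * (fmean(y) - grand_mean) ** 2
F = ss_between / s2_pooled

print(t ** 2, F)  # the two values should agree up to rounding error
```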
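The model
Note:
\[ y_{ij} = \mu_i + (y_{ij} - \mu_i) = \mu_i + \varepsilon_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]
where
$\varepsilon_{ij} = y_{ij} - \mu_i$ has $N(0, \sigma^2)$ distribution
$\mu = \dfrac{1}{k}\sum_{i=1}^{k} \mu_i$ (overall mean effect)
$\alpha_i = \mu_i - \mu$ (Effect of Factor A)
Note: $\sum_{i=1}^{a} \alpha_i = 0$ by their definition (here a = k is the number of levels of Factor A).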
Model 1:
$y_{ij}$ (i = 1, … , a; j = 1, …, n) are independent
Normal with mean $\mu_i$ and variance $\sigma^2$.
Model 2:
\[ y_{ij} = \mu_i + \varepsilon_{ij} \]
where $\varepsilon_{ij}$ (i = 1, … , a; j = 1, …, n) are independent
Normal with mean 0 and variance $\sigma^2$.
Model 3:
\[ y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]
where $\varepsilon_{ij}$ (i = 1, … , a; j = 1, …, n) are independent
Normal with mean 0 and variance $\sigma^2$ and
\[ \sum_{i=1}^{a} \alpha_i = 0 \]
MANOVA
Multivariate Analysis of Variance
One way Multivariate Analysis
of Variance (MANOVA)
Comparing k p-variate Normal
Populations
The F test – for comparing k means
Situation
• We have k normal populations
• Let $\mu_i$ and $\Sigma$ denote the mean vector and
covariance matrix of population i, for i = 1, 2, 3, …, k.
• Note: we assume that the covariance matrix
is the same for each population:
\[ \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = \Sigma \]
We want to test
\[ H_0 : \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k \]
against
\[ H_A : \mu_i \neq \mu_j \text{ for at least one pair } i, j \]
The data
• Assume we have collected data from each of
k populations
• Let $x_{i1}, x_{i2}, \dots, x_{in}$ denote the n observations
from population i, for i = 1, 2, 3, …, k.
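Computing Formulae:
Compute
1) \[ T_i = \sum_{j=1}^{n} x_{ij} = \begin{pmatrix} \sum_{j=1}^{n} x_{1ij} \\ \vdots \\ \sum_{j=1}^{n} x_{pij} \end{pmatrix} = \begin{pmatrix} T_{1i} \\ \vdots \\ T_{pi} \end{pmatrix} = \text{Total vector for sample } i \]
2) \[ G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} = \begin{pmatrix} G_1 \\ \vdots \\ G_p \end{pmatrix} = \text{Grand Total vector} \]
3) \[ N = kn = \text{Total sample size} \]
4) \[ \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} x_{ij}' = \begin{pmatrix} \sum_{i=1}^{k}\sum_{j=1}^{n} x_{1ij}^2 & \cdots & \sum_{i=1}^{k}\sum_{j=1}^{n} x_{1ij}x_{pij} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{k}\sum_{j=1}^{n} x_{1ij}x_{pij} & \cdots & \sum_{i=1}^{k}\sum_{j=1}^{n} x_{pij}^2 \end{pmatrix} \]
5) \[ \frac{1}{n}\sum_{i=1}^{k} T_i T_i' = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{k} T_{1i}^2 & \cdots & \frac{1}{n}\sum_{i=1}^{k} T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum_{i=1}^{k} T_{1i}T_{pi} & \cdots & \frac{1}{n}\sum_{i=1}^{k} T_{pi}^2 \end{pmatrix} \]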
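Let
\[ H = \frac{1}{n}\sum_{i=1}^{k} T_i T_i' - \frac{1}{N} G G' = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{k} T_{1i}^2 - \frac{G_1^2}{N} & \cdots & \frac{1}{n}\sum_{i=1}^{k} T_{1i}T_{pi} - \frac{G_1 G_p}{N} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum_{i=1}^{k} T_{1i}T_{pi} - \frac{G_1 G_p}{N} & \cdots & \frac{1}{n}\sum_{i=1}^{k} T_{pi}^2 - \frac{G_p^2}{N} \end{pmatrix} \]
\[ = \begin{pmatrix} n\sum_{i=1}^{k} (\bar{x}_{1i} - \bar{x}_1)^2 & \cdots & n\sum_{i=1}^{k} (\bar{x}_{1i} - \bar{x}_1)(\bar{x}_{pi} - \bar{x}_p) \\ \vdots & \ddots & \vdots \\ n\sum_{i=1}^{k} (\bar{x}_{1i} - \bar{x}_1)(\bar{x}_{pi} - \bar{x}_p) & \cdots & n\sum_{i=1}^{k} (\bar{x}_{pi} - \bar{x}_p)^2 \end{pmatrix} \]
= the Between SS and SP matrix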
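Let
\[ E = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} x_{ij}' - \frac{1}{n}\sum_{i=1}^{k} T_i T_i' = \begin{pmatrix} \sum_{i=1}^{k}\sum_{j=1}^{n} x_{1ij}^2 - \frac{1}{n}\sum_{i=1}^{k} T_{1i}^2 & \cdots & \sum_{i=1}^{k}\sum_{j=1}^{n} x_{1ij}x_{pij} - \frac{1}{n}\sum_{i=1}^{k} T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{k}\sum_{j=1}^{n} x_{1ij}x_{pij} - \frac{1}{n}\sum_{i=1}^{k} T_{1i}T_{pi} & \cdots & \sum_{i=1}^{k}\sum_{j=1}^{n} x_{pij}^2 - \frac{1}{n}\sum_{i=1}^{k} T_{pi}^2 \end{pmatrix} \]
\[ = \begin{pmatrix} \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{1ij} - \bar{x}_{1i})^2 & \cdots & \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{1ij} - \bar{x}_{1i})(x_{pij} - \bar{x}_{pi}) \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{1ij} - \bar{x}_{1i})(x_{pij} - \bar{x}_{pi}) & \cdots & \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{pij} - \bar{x}_{pi})^2 \end{pmatrix} \]
= the Within SS and SP matrix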
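The computing formulae for H and E translate almost line by line into NumPy. The following is a minimal sketch assuming equal sample sizes n in every population; the function name `manova_h_e` and the toy data are hypothetical and only serve to illustrate the formulae.

```python
import numpy as np


def manova_h_e(groups):
    """Return (H, E) for a list of (n, p) arrays, one per population.

    Hypothetical helper; assumes every population has the same n.
    """
    n = groups[0].shape[0]
    N = sum(g.shape[0] for g in groups)               # N = k * n
    T = np.array([g.sum(axis=0) for g in groups])     # k x p matrix of totals T_i
    G = T.sum(axis=0)                                 # grand total vector
    raw = sum(g.T @ g for g in groups)                # sum of x_ij x_ij'
    between_raw = T.T @ T / n                         # (1/n) sum of T_i T_i'
    H = between_raw - np.outer(G, G) / N              # Between SS and SP matrix
    E = raw - between_raw                             # Within SS and SP matrix
    return H, E


# Made-up data: k = 2 populations, n = 3 observations, p = 2 variables
groups = [
    np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]]),
    np.array([[4.0, 5.0], [6.0, 5.0], [5.0, 7.0]]),
]
H, E = manova_h_e(groups)
print(H)
print(E)
```

A useful sanity check is that H + E equals the total SS and SP matrix computed about the overall mean vector.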
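The Manova Table
Source     SS and SP matrix
Between    $H = \begin{pmatrix} h_{11} & \cdots & h_{1p} \\ \vdots & \ddots & \vdots \\ h_{1p} & \cdots & h_{pp} \end{pmatrix}$
Within     $E = \begin{pmatrix} e_{11} & \cdots & e_{1p} \\ \vdots & \ddots & \vdots \\ e_{1p} & \cdots & e_{pp} \end{pmatrix}$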
There are several test statistics for testing
\[ H_0 : \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k \]
against
\[ H_A : \mu_i \neq \mu_j \text{ for at least one pair } i, j \]
1. Roy's largest root
\[ \lambda_1 = \text{largest eigenvalue of } HE^{-1} \]
This test statistic is derived using Roy's union-intersection
principle.
2. Wilks' lambda (Λ)
\[ \Lambda = \frac{|E|}{|H + E|} = \frac{1}{|HE^{-1} + I|} \]
This test statistic is derived using the generalized
likelihood ratio principle.
3. Lawley-Hotelling trace statistic
\[ T_0^2 = \operatorname{tr}(HE^{-1}) = \text{sum of the eigenvalues of } HE^{-1} \]
4. Pillai trace statistic (V)
\[ V = \operatorname{tr}\left(H(H + E)^{-1}\right) \]
Example
In the following study, n = 15 first year
university students from three different School
regions (A, B and C) who were each taking the
following four courses (Math, biology, English
and Sociology) were observed: The marks on
these courses is tabulated on the following slide:
The data
                                               Educational Region
          A                                    B                                    C
Student   Math  Biology  English  Sociology    Math  Biology  English  Sociology    Math  Biology  English  Sociology
1         62    65       67       76           65    55       35       43           47    47       98       78
2         54    61       75       70           87    81       59       64           57    69       68       45
3         53    53       53       59           75    67       56       68           65    71       77       62
4         48    56       73       81           74    70       55       66           41    64       68       58
5         60    55       49       60           83    71       40       52           56    54       86       64
6         55    52       34       41           59    48       48       57           63    73       88       76
7         76    71       35       40           61    47       46       54           43    62       84       78
8         58    52       58       46           81    77       51       45           28    47       65       58
9         75    71       60       59           77    68       42       49           47    54       90       78
10        55    51       69       75           82    84       63       70           42    44       79       73
11        72    74       64       59           68    64       35       44           50    53       89       89
12        72    75       51       47           60    53       60       65           46    61       91       82
13        76    69       69       57           94    88       51       63           74    78       99       86
14        44    48       65       65           96    88       67       81           63    66       94       86
15        89    71       59       67           84    75       46       67           69    82       78       73
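Summary Statistics
\[ \bar{x}_A = \begin{pmatrix} 63.267 \\ 61.600 \\ 58.733 \\ 60.133 \end{pmatrix}, \qquad S_A = \begin{pmatrix} 160.638 & 104.829 & -32.638 & -47.110 \\ 104.829 & 92.543 & -4.900 & -22.229 \\ -32.638 & -4.900 & 155.638 & 128.967 \\ -47.110 & -22.229 & 128.967 & 159.552 \end{pmatrix} \]
\[ \bar{x}_B = \begin{pmatrix} 76.400 \\ 69.067 \\ 50.267 \\ 59.200 \end{pmatrix}, \qquad S_B = \begin{pmatrix} 141.257 & 155.829 & 45.100 & 60.914 \\ 155.829 & 185.924 & 61.767 & 71.057 \\ 45.100 & 61.767 & 96.495 & 93.371 \\ 60.914 & 71.057 & 93.371 & 123.600 \end{pmatrix} \]
\[ \bar{x}_C = \begin{pmatrix} 52.733 \\ 61.667 \\ 83.600 \\ 72.400 \end{pmatrix}, \qquad S_C = \begin{pmatrix} 156.067 & 116.976 & 53.814 & 35.257 \\ 116.976 & 136.381 & 3.143 & -0.429 \\ 53.814 & 3.143 & 116.543 & 114.886 \\ 35.257 & -0.429 & 114.886 & 156.400 \end{pmatrix} \]
\[ \bar{x} = \frac{15}{45}\bar{x}_A + \frac{15}{45}\bar{x}_B + \frac{15}{45}\bar{x}_C = \begin{pmatrix} 64.133 \\ 64.111 \\ 64.200 \\ 63.911 \end{pmatrix} \]
\[ S_{Pooled} = \frac{14}{42}S_A + \frac{14}{42}S_B + \frac{14}{42}S_C = \begin{pmatrix} 152.654 & 125.878 & 22.092 & 16.354 \\ 125.878 & 138.283 & 20.003 & 16.133 \\ 22.092 & 20.003 & 122.892 & 112.408 \\ 16.354 & 16.133 & 112.408 & 146.517 \end{pmatrix} \]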
Computations:
1) $T_i = \sum_{j=1}^{n} x_{ij}$ = Total vector for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} = (G_1, \dots, G_p)'$ = Grand Total vector

Totals             Math   Biology   English   Sociology
A                   949    924       881       902
B                  1146   1036       754       888
C                   791    925      1254      1086
Grand Totals (G)   2886   2885      2889      2876
3) $N = kn$ = Total sample size = 45
4) \[ \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} x_{ij}' = \begin{pmatrix} \sum\sum x_{1ij}^2 & \cdots & \sum\sum x_{1ij}x_{pij} \\ \vdots & \ddots & \vdots \\ \sum\sum x_{1ij}x_{pij} & \cdots & \sum\sum x_{pij}^2 \end{pmatrix} = \begin{pmatrix} 195718 & 191674 & 180399 & 182865 \\ 191674 & 191321 & 184516 & 184542 \\ 180399 & 184516 & 199641 & 193125 \\ 182865 & 184542 & 193125 & 191590 \end{pmatrix} \]
5) \[ \frac{1}{n}\sum_{i=1}^{k} T_i T_i' = \begin{pmatrix} \frac{1}{n}\sum T_{1i}^2 & \cdots & \frac{1}{n}\sum T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum T_{1i}T_{pi} & \cdots & \frac{1}{n}\sum T_{pi}^2 \end{pmatrix} = \begin{pmatrix} 189306.53 & 186387.13 & 179471.13 & 182178.13 \\ 186387.13 & 185513.13 & 183675.87 & 183864.40 \\ 179471.13 & 183675.87 & 194479.53 & 188403.87 \\ 182178.13 & 183864.40 & 188403.87 & 185436.27 \end{pmatrix} \]
Now
\[ H = \frac{1}{n}\sum_{i=1}^{k} T_i T_i' - \frac{1}{N} G G' = \begin{pmatrix} 4217.733 & 1362.467 & -5810.067 & -2269.333 \\ 1362.467 & 552.578 & -1541.133 & -519.156 \\ -5810.067 & -1541.133 & 9005.733 & 3764.667 \\ -2269.333 & -519.156 & 3764.667 & 1627.911 \end{pmatrix} \]
= the Between SS and SP matrix
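The matrix H for this example can be reproduced from the table of totals alone. The sketch below uses NumPy with the totals copied from the Computations step; the variable names are chosen here for illustration.

```python
import numpy as np

# Totals T_i for regions A, B, C (rows) on Math, Biology, English, Sociology (columns)
T = np.array([
    [949, 924, 881, 902],
    [1146, 1036, 754, 888],
    [791, 925, 1254, 1086],
], dtype=float)

n = 15              # observations per region
N = 45              # total sample size, N = k * n
G = T.sum(axis=0)   # grand total vector

# H = (1/n) sum of T_i T_i'  -  (1/N) G G'
H = T.T @ T / n - np.outer(G, G) / N
print(np.round(H, 3))
```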
Let
\[ E = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} x_{ij}' - \frac{1}{n}\sum_{i=1}^{k} T_i T_i' = \begin{pmatrix} 6411.467 & 5286.867 & 927.867 & 686.867 \\ 5286.867 & 5807.867 & 840.133 & 677.600 \\ 927.867 & 840.133 & 5161.467 & 4721.133 \\ 686.867 & 677.600 & 4721.133 & 6153.733 \end{pmatrix} \]
= the Within SS and SP matrix
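With H and E in hand, the four test statistics defined earlier could be evaluated for this example. The sketch below is one way to do so in NumPy: it recomputes E from the two matrices in steps 4) and 5), takes H as printed above, and leaves the numerical results to be read from the output. The variable names are assumptions made for illustration, and no critical values are supplied here.

```python
import numpy as np

# Step 4): raw sums of squares and products
raw = np.array([
    [195718, 191674, 180399, 182865],
    [191674, 191321, 184516, 184542],
    [180399, 184516, 199641, 193125],
    [182865, 184542, 193125, 191590],
], dtype=float)

# Step 5): (1/n) sum of T_i T_i'
between_raw = np.array([
    [189306.53, 186387.13, 179471.13, 182178.13],
    [186387.13, 185513.13, 183675.87, 183864.40],
    [179471.13, 183675.87, 194479.53, 188403.87],
    [182178.13, 183864.40, 188403.87, 185436.27],
])

# Between SS and SP matrix as printed in the example
H = np.array([
    [4217.733333, 1362.466667, -5810.066667, -2269.333333],
    [1362.466667, 552.5777778, -1541.133333, -519.1555556],
    [-5810.066667, -1541.133333, 9005.733333, 3764.666667],
    [-2269.333333, -519.1555556, 3764.666667, 1627.911111],
])

E = raw - between_raw   # Within SS and SP matrix

# Eigenvalues of E^{-1} H coincide with those of H E^{-1}
eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
roy = eig.max()
wilks = np.linalg.det(E) / np.linalg.det(H + E)
lawley_hotelling = eig.sum()
pillai = np.trace(H @ np.linalg.inv(H + E))

print("Roy's largest root:", roy)
print("Wilks' lambda:", wilks)
print("Lawley-Hotelling trace:", lawley_hotelling)
print("Pillai trace:", pillai)
```

Because there are k = 3 groups, H has rank at most 2, so at most two of the eigenvalues should differ noticeably from zero.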