Estimation of Variance Components in Unbalanced Nested Designs

ESTIMATION OF VARIANCE COMPONENTS IN UNBALANCED NESTED DESIGNS
Motohiro Yamasaki1, Yoshikazu Ojima2 and Seiichi Yasui3
(1), (2), (3) Department of Industrial Administration, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan,
E-mail: Yamasaki.Motohiro@md.mt-pharma.co.jp
ABSTRACT
Nested designs are often used for estimating variance components, which enable us to evaluate the precision of a measurement method.
The balanced nested design is usually used for this purpose.
However, it has the defect of having relatively few degrees of freedom for the factors at the upper parts of the hierarchy.
To eliminate this defect, several unbalanced nested designs with suitable estimation methods for the variance components have been proposed.
On the other hand, missing data or outliers are usually included in the measurement results.
Therefore, even if a balanced design is conducted, the results have to be analyzed as an unbalanced design.
The sums of squares for the unbalanced design are not unique. Unbiased estimators based on the sums of squares of the ordinary ANOVA have been proposed.
In our study, an alternative method for constructing the sums of squares is proposed.
The methods for two- or three-stage designs are examined and utilized to calculate the ANOVA estimators of variance components, including estimators of linear combinations.
Both the existing and the proposed methods are introduced.
The proposed method constructs unbiased estimators from an unweighted sum of squares taken about the unweighted mean of the level means.
The performance of the estimators obtained from the two methods is evaluated through the positive semi-definite quadratic form in the vector y.
We will show the introduced estimators and the results of the evaluation at the presentation.
Key words: unbalanced nested design, variance components, ANOVA, unbiased estimators
INTRODUCTION
In general, precision is one of the most important performance measures for evaluating measurement methods and measurement results.
Measurement methods and measurement results are an important basis for trading, manufacturing and quality control.
Nested designs are often used for estimating variance components, which enable us to evaluate the precision of the measurement method.
The balanced nested design is usually used for this purpose.
However, it has the defect of having relatively few degrees of freedom for the factors at the upper parts of the hierarchy.
To eliminate this defect, several unbalanced nested designs with suitable estimation methods for the variance components have been proposed.
The staggered nested design, which was proposed and named by Bainbridge (1965), is the most popular unbalanced nested design in practical fields, because it has a very simple open-ended structure and each sum of squares in the ANOVA has almost the same degrees of freedom.
Calvin and Miller (1961) also proposed the same design, but did not assign a specific term to it.
The same design can be seen in Davies (1957).
The staggered nested design has been studied by many authors, including Khattree and Naik (1995), Nelson (1995a, b) and Uhlig (1996).
The staggered nested design is also recommended for application to precision experiments in the International Standard ISO 5725-Part 3 (1994).
However, this is only a special situation in the nested design.
Usually, missing data or outliers are included in the measurement results. Therefore, the design can not keep its very simple open-ended structure.
That is to say, the design becomes another unbalanced nested design, which is not the staggered nested design.
Similarly, even if the balanced nested design is conducted, the results have to be analyzed as an unbalanced nested design.
In past studies of unbalanced designs, Leone et al. (1968) give the expectations of the mean squares from which the estimates of variance components are obtained.
Goldsmith and Gaylor (1970) derive the variances and covariances of the ANOVA estimators of variance components.
Ojima (1984) gives the canonical forms of the sums of squares used in the ANOVA.
In Ojima (1984), unbiased estimators based on the sums of squares of the ordinary ANOVA are proposed.
In this paper, we focus on the point that the sums of squares for the unbalanced design are not unique, and we propose an alternative method for constructing the sums of squares.
To simplify the calculation, the canonical form for the two-stage nested design, that is, the one-way ANOVA design, is derived and utilized by Ojima (1984) to calculate the expectations and variances of the sums of squares and mean squares systematically.
Extending this derivation, the canonical form for three- or more-stage nested designs can also be obtained.
Using these canonical forms, we can easily obtain the expectation and variance of the sums of squares and of any linear combination of the sums of squares, including the ANOVA estimators of variance components and estimators of linear combinations of them.
However, the proposed method can not use the canonical form in light of the structure of its sums of squares.
Therefore, the variance of the estimator obtained from the proposed method is derived from the positive semi-definite quadratic form in the vector y.
In the two-stage nested design, the performance of the estimators obtained from the ordinary and proposed methods is evaluated by their variances.
NOTATIONS AND THE MODEL
Notations used throughout this paper are as follows:
$I_a$ : identity matrix of order $(a \times a)$,
$A'$ : transpose of a matrix $A$,
$O$ : zero matrix,
$G_{m,n}$ : $(m \times n)$ matrix of units,
$j_n$ : $(n \times 1)$ vector of units,
$\mathrm{diag}(r_i)$ : diagonal matrix whose $(i, i)$th element equals $r_i$,
$\mathrm{Diag}(A_i)$ : direct sum of matrices $A_i$, i.e. $A_1 \oplus A_2 \oplus \cdots \oplus A_i \oplus \cdots$,
$(a_{ij})_{i,j}$ : matrix whose $(i, j)$th element equals $a_{ij}$,
$(r_i)_i$ : vector whose $i$-th element equals $r_i$,
combined subscripts $ij$ : two subscripts combined without a comma are used as a single subscript, which varies in lexicographical order, i.e. $ij = 11, \ldots, 1r_1, 21, \ldots, 2r_2, 31, \ldots, ar_a$.
The model of data in the two-stage unbalanced nested design, that is, the one-way ANOVA design, is usually expressed as
$$ y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, r_i, \quad \sum_i r_i = n, \tag{1} $$
where $\mu$ is an unknown constant that represents the general mean, and $\alpha_i$ and $\varepsilon_{ij}$ are mutually independent random variables with zero mean and $\mathrm{Var}(\alpha_i) = \sigma_A^2$, $\mathrm{Var}(\varepsilon_{ij}) = \sigma_E^2$.
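The usual SSk (sums of squares of factor k) in the ANOVA of the two-stage unbalanced nested design are expressed as follows.
$$ \mathrm{SSA} = \sum_i \sum_j (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_i r_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2, \qquad \bar{y}_{i\cdot} = \frac{1}{r_i} \sum_j y_{ij}, \quad \bar{y}_{\cdot\cdot} = \frac{1}{n} \sum_i \sum_j y_{ij}, \tag{2} $$
$$ \mathrm{SSE} = \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2. \tag{3} $$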
However, the sums of squares for the unbalanced design are not unique.
Therefore, we propose the alternative method for constructing
the sums of squares as follows.
$$ \mathrm{SSA} = \sum_i (\bar{y}_{i\cdot} - \tilde{y})^2, \qquad \tilde{y} = \frac{1}{a} \sum_i \bar{y}_{i\cdot}. \tag{4} $$
This method constructs unbiased estimators from an unweighted sum of squares taken about the unweighted mean of the level means.
Here, the $\bar{y}_{i\cdot}$ are independent random variables with mean $\mu$ and $\mathrm{Var}(\bar{y}_{i\cdot}) = \sigma_A^2 + \sigma_E^2 / r_i$. The character of each method is as follows.
In the ordinary method, under $H_0$: $\sigma_A^2 = 0$, the $\bar{y}_{i\cdot}$ are independent random variables with mean $\mu$ and $\mathrm{Var}(\bar{y}_{i\cdot}) = \sigma_E^2 / r_i$.
Under $H_0$: $\sigma_A^2 = 0$, the ordinary SSA of equation (2) is distributed as a $\chi^2$-distribution with $(a - 1)$ degrees of freedom multiplied by the constant $\sigma_E^2$.
If $\sigma_A^2 > \sigma_E^2$, the ordinary SSA is far from a $\chi^2$-distribution.
On the other hand, in the proposed method, under $H_0$: $\sigma_E^2 = 0$, the $\bar{y}_{i\cdot}$ are independent random variables with mean $\mu$ and $\mathrm{Var}(\bar{y}_{i\cdot}) = \sigma_A^2$.
Under $H_0$: $\sigma_E^2 = 0$, the proposed SSA of equation (4) is distributed as a $\chi^2$-distribution with $(a - 1)$ degrees of freedom multiplied by the constant $\sigma_A^2$.
If $\sigma_A^2 > \sigma_E^2$, the proposed SSA has approximately a $\chi^2$-distribution.
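As a rough illustration of equations (2) to (4), the following Python sketch computes the ordinary SSA, the SSE and the proposed (unweighted) SSA from ragged one-way data. The function name `sums_of_squares` and the small data set are hypothetical and chosen only for this example.

```python
def sums_of_squares(groups):
    """Return (SSA_ordinary, SSE, SSA_proposed) for one-way unbalanced data.

    groups[i] is the list of the r_i observations at level i.
    """
    a = len(groups)
    n = sum(len(g) for g in groups)
    level_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n   # mean of all n data (weights r_i)
    mean_of_means = sum(level_means) / a           # unweighted mean of level means
    # Equation (2): each level mean is weighted by its number of observations.
    ssa_ordinary = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, level_means))
    # Equation (3): within-level sum of squares.
    sse = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    # Equation (4): every level mean gets the same weight.
    ssa_proposed = sum((m - mean_of_means) ** 2 for m in level_means)
    return ssa_ordinary, sse, ssa_proposed


data = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0]]   # made-up data, r = (3, 2, 1)
ssa_ord, sse, ssa_prop = sums_of_squares(data)
print(ssa_ord, sse, ssa_prop)
```

For a balanced design with r observations per level the two versions differ only by the factor r, so the distinction matters only when the r_i are unequal.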
CANONICAL FORM AND SUM OF SQUARES
In general, the sums of squares appearing in the ANOVA of the nested design satisfy the following condition.
Every SSk and CT (correction term) can be expressed as a positive semi-definite quadratic form in the data vector y, and the sum of the SSk's, k = 1, 2, …, and CT coincides with the sum of the squared original data.
If z is a vector of canonical variables, it is an appropriate orthogonal transformation of the data vector y.
In the two-stage case, the vector z can be partitioned into three parts as $z = (z_0' \; z_1' \; z_2')'$.
Every $z_k$ corresponds to one SSk or to CT in the ANOVA, i.e.
$$ \mathrm{CT} = z_0' z_0, \qquad \mathrm{SS}k = z_k' z_k. \tag{5} $$
This is the definition of the canonical forms for the nested designs.
We think that the above content can be applied to our proposed method, too.
CANONICAL FORM FOR TWO-STAGE UNBALANCED NESTED DESIGNS
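Using matrix algebra, equation (1) is expressed in another form, i.e.
$$ y = X_\mu \mu + X_\alpha \alpha + \varepsilon, \tag{6} $$
where
$y = (y_{ij})_{ij}$ : $(n \times 1)$ vector,
$X_\mu = j_n$ : $(n \times 1)$ vector,
$X_\alpha = \mathrm{Diag}(j_{r_i})$ : $(n \times a)$ matrix,
$\alpha = (\alpha_i)_i$ : $(a \times 1)$ vector,
$\varepsilon = (\varepsilon_{ij})_{ij}$ : $(n \times 1)$ vector.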
Approach for the ordinary method of sums of squares in the ANOVA
Let $z$ be a vector of canonical variables, and $P$ be an orthogonal matrix transforming $y$ to $z$, i.e. $z = P y$. We construct $P$ as follows.
Let $P_1$ be an $\{(a - 1) \times a\}$ matrix satisfying the condition that
$$ \begin{pmatrix} \sqrt{r_1 / n} \;\; \sqrt{r_2 / n} \;\; \cdots \;\; \sqrt{r_a / n} \\ P_1 \end{pmatrix} \tag{7} $$
is an orthogonal matrix of order $(a \times a)$.
Then, $P_1 \, \mathrm{diag}(1 / \sqrt{r_i}) \, X_\alpha'$ is an $\{(a - 1) \times n\}$ matrix with orthonormal rows, and these rows are orthogonal to $\frac{1}{\sqrt{n}} j_n' = \frac{1}{\sqrt{n}} X_\mu'$. Therefore, $P_2$ is an $(a \times n)$ matrix with orthonormal rows as follows.
$$ P_2 = \begin{pmatrix} \dfrac{1}{\sqrt{n}} X_\mu' \\ P_1 \, \mathrm{diag}\!\left( \dfrac{1}{\sqrt{r_i}} \right) X_\alpha' \end{pmatrix} \tag{8} $$
It follows that
$$ P_1' P_1 = I_a - \left( \frac{\sqrt{r_i r_{i'}}}{n} \right)_{i, i'} \qquad (i, i' = 1, \ldots, a). \tag{9} $$
Further, let $Q$ be an $\{(n - a) \times n\}$ matrix satisfying the condition that $P$ is an orthogonal matrix of order $(n \times n)$.
$$ P = \begin{pmatrix} \dfrac{1}{\sqrt{n}} X_\mu' \\ P_1 \, \mathrm{diag}\!\left( \dfrac{1}{\sqrt{r_i}} \right) X_\alpha' \\ Q \end{pmatrix} \tag{10} $$
Using the orthogonal matrix $P$, $y$ is transformed into $z$ as follows.
$$ z = P y = \begin{pmatrix} \dfrac{1}{\sqrt{n}} X_\mu' y \\ P_1 \, \mathrm{diag}\!\left( \dfrac{1}{\sqrt{r_i}} \right) X_\alpha' y \\ Q y \end{pmatrix} \tag{11} $$
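Since the expectation and dispersion matrix of $y$ are
$$ E(y) = X_\mu \mu, \qquad V(y) = \sigma_A^2 \, \mathrm{Diag}(G_{r_i, r_i}) + \sigma_E^2 I_n, \tag{12} $$
the expectation and dispersion matrix of $z$ are given by
$$ E(z) \equiv \boldsymbol{\mu} = E \begin{pmatrix} z_0 \\ z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \sqrt{n} \, \mu \\ 0_{a-1} \\ 0_{n-a} \end{pmatrix}, \tag{13} $$
$$ V(z) \equiv \Sigma = \sigma_A^2 \begin{pmatrix} \lambda_\alpha & \boldsymbol{\lambda}_\alpha' & 0' \\ \boldsymbol{\lambda}_\alpha & \Lambda_\alpha & O \\ 0 & O & O \end{pmatrix} + \sigma_E^2 I_n, \tag{14} $$
where
$$ \lambda_\alpha = \frac{1}{n} \sum_i r_i^2 = \frac{m}{n} \quad \Big( \sum_i r_i^2 = m \Big) \ \text{: scalar}, \qquad \boldsymbol{\lambda}_\alpha = \frac{1}{\sqrt{n}} P_1 \big( r_i^{3/2} \big)_i \ \text{: } \{(a - 1) \times 1\} \text{ vector}, \qquad \Lambda_\alpha = P_1 \, \mathrm{diag}(r_i) \, P_1' \ \text{: } \{(a - 1) \times (a - 1)\} \text{ matrix}. \tag{15} $$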
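The following formulation is convenient for deriving the expectation and variance of SSk and MSk (mean square of factor k).
According to the partition of the columns and rows of $V(z)$, we set $D_0$, $D_A$ and $D_E$ such that
$$ D_0 = \begin{pmatrix} 1 & 0' & 0' \\ 0 & O & O \\ 0 & O & O \end{pmatrix}, \quad D_A = \begin{pmatrix} 0 & 0' & 0' \\ 0 & I_{a-1} & O \\ 0 & O & O \end{pmatrix}, \quad D_E = \begin{pmatrix} 0 & 0' & 0' \\ 0 & O & O \\ 0 & O & I_{n-a} \end{pmatrix}. \tag{16} $$
Then,
$$ \mathrm{CT} = z' D_0 z, \qquad \mathrm{SSA} = z' D_A z, \qquad \mathrm{SSE} = z' D_E z. \tag{17} $$
It follows that
$$ \mathrm{CT} + \mathrm{SSA} + \mathrm{SSE} = z' z = y' y. $$
The degrees of freedom for each SSk are
$$ \phi_{CT} = \mathrm{tr} D_0 = 1, \qquad \phi_A = \mathrm{tr} D_A = a - 1, \qquad \phi_E = \mathrm{tr} D_E = n - a. $$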
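Here, we need the traces of $\Lambda_\alpha$ and $\Lambda_\alpha^2$ for the expectation and variance of SSk in the ordinary method. It follows that
$$ \mathrm{tr} \Lambda_\alpha = n - \frac{1}{n} \sum_i r_i^2 = n - \frac{m}{n}, \tag{18} $$
$$ \mathrm{tr} \Lambda_\alpha^2 = \sum_i r_i^2 - \frac{2}{n} \sum_i r_i^3 + \left( \frac{\sum_i r_i^2}{n} \right)^2 = m - \frac{2l}{n} + \left( \frac{m}{n} \right)^2 \qquad \Big( \sum_i r_i^3 = l \Big). \tag{19} $$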
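The traces in equations (18) and (19) can be checked numerically without constructing $P_1$ explicitly. The sketch below relies on equation (9) and on the cyclic property of the trace, so that the traces of $\Lambda_\alpha$ and $\Lambda_\alpha^2$ equal those of $B$ and $B^2$ with $B = \mathrm{diag}(r_i)\,(I_a - u u')$ and $u_i = \sqrt{r_i / n}$. The function name `lambda_traces` and the replication numbers are hypothetical.

```python
def lambda_traces(r):
    """Return ((tr B, tr B^2), (closed form (18), closed form (19)))."""
    a, n = len(r), sum(r)
    # B[i][k] = r_i * (delta_ik - sqrt(r_i * r_k) / n), i.e. diag(r) (I - u u').
    B = [[r[i] * ((1.0 if i == k else 0.0) - (r[i] * r[k]) ** 0.5 / n)
          for k in range(a)] for i in range(a)]
    tr1 = sum(B[i][i] for i in range(a))
    tr2 = sum(B[i][k] * B[k][i] for i in range(a) for k in range(a))
    m = sum(x ** 2 for x in r)    # m = sum of r_i^2
    l = sum(x ** 3 for x in r)    # l = sum of r_i^3
    closed1 = n - m / n                          # equation (18)
    closed2 = m - 2 * l / n + (m / n) ** 2       # equation (19)
    return (tr1, tr2), (closed1, closed2)


numeric, closed = lambda_traces([2, 10, 18])   # illustrative r_i only
print(numeric, closed)
```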
Approach for the proposed method of sums of squares in the ANOVA
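The proposed method can not use the canonical form in light of the structure of its sum of squares.
Therefore, we try another approach for the proposed method.
SSA can be expressed as a positive semi-definite quadratic form in the vector $\bar{\boldsymbol{y}}$ of level means. Here, it follows that
$$ \bar{\boldsymbol{y}} = (\bar{y}_{1\cdot} \; \bar{y}_{2\cdot} \; \cdots \; \bar{y}_{a\cdot})', \qquad E(\bar{\boldsymbol{y}}) = \mu j_a, \qquad \mathrm{Var}(\bar{\boldsymbol{y}}) \equiv \Sigma_{\bar{y}} = \mathrm{diag}\!\left( \sigma_A^2 + \frac{\sigma_E^2}{r_i} \right). \tag{20} $$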
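Then, equation (4) is expressed by the next equation, where $\tilde{y} = \frac{1}{a} \sum_i \bar{y}_{i\cdot}$ is the unweighted mean of the level means.
$$ \mathrm{SSA} = \sum_i \bar{y}_{i\cdot}^2 - a \tilde{y}^2 = \bar{\boldsymbol{y}}' \bar{\boldsymbol{y}} - \frac{1}{a} \bar{\boldsymbol{y}}' G_{a,a} \bar{\boldsymbol{y}} = \bar{\boldsymbol{y}}' \left( I_a - \frac{1}{a} G_{a,a} \right) \bar{\boldsymbol{y}} \tag{21} $$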
DERIVATION OF THE VARIANCE OF ESTIMATORS
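Here, $\Sigma$ and $\Sigma_{\bar{y}}$ are nonsingular matrices. Therefore, we can use the next formulas.
$$ E(z' A z) = \mathrm{tr}(A \Sigma) + \boldsymbol{\mu}' A \boldsymbol{\mu}, \qquad V(z' A z) = 2 \, \mathrm{tr}\{ (A \Sigma)^2 \} + 4 \boldsymbol{\mu}' A \Sigma A \boldsymbol{\mu}. \tag{22} $$
Using equations (13), (14), (15), (18), (19) and the above formulas (22), we obtain the expectation and variance of the estimators.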
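The formulas (22) translate directly into a small routine. The sketch below is a minimal pure-Python version, assuming that $A$ is symmetric and that the variance formula is applied to a normally distributed vector; the helper names `matmul` and `quad_form_moments` and the numerical values are hypothetical. As an illustration it is applied to the matrix $I_a - \frac{1}{a} G_{a,a}$ with a diagonal dispersion matrix.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def quad_form_moments(A, Sigma, mu):
    """Mean and variance of z'Az by formulas (22), for symmetric A."""
    n = len(mu)
    AS = matmul(A, Sigma)
    trace_AS = sum(AS[i][i] for i in range(n))
    AS2 = matmul(AS, AS)
    trace_AS2 = sum(AS2[i][i] for i in range(n))
    A_mu = [sum(A[i][k] * mu[k] for k in range(n)) for i in range(n)]
    mu_A_mu = sum(mu[i] * A_mu[i] for i in range(n))
    # mu' A Sigma A mu = (A mu)' Sigma (A mu) because A is symmetric.
    S_A_mu = [sum(Sigma[i][k] * A_mu[k] for k in range(n)) for i in range(n)]
    mu_ASA_mu = sum(A_mu[i] * S_A_mu[i] for i in range(n))
    return trace_AS + mu_A_mu, 2 * trace_AS2 + 4 * mu_ASA_mu


# Illustration with made-up values: r_i, variance components and general mean.
r = [2, 10, 18]
sigma_a2, sigma_e2, general_mean = 0.25, 1.0, 5.0
a = len(r)
M = [[(1.0 if i == k else 0.0) - 1.0 / a for k in range(a)] for i in range(a)]
v = [sigma_a2 + sigma_e2 / ri for ri in r]          # variances of the level means
Sigma_ybar = [[v[i] if i == k else 0.0 for k in range(a)] for i in range(a)]
e_ssa, v_ssa = quad_form_moments(M, Sigma_ybar, [general_mean] * a)
print(e_ssa, v_ssa)
```

Because every row of $I_a - \frac{1}{a} G_{a,a}$ sums to zero, the terms involving the mean vector vanish up to rounding error in this illustration.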
Approach for the ordinary method
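By equations (16), (17), (18), (19) and formulas (22), we can derive the next expectations and variances.
$$ E(\mathrm{SSA}) = \mathrm{tr}(D_A \Sigma) = \sigma_E^2 \, \mathrm{tr} D_A + \sigma_A^2 \, \mathrm{tr} \Lambda_\alpha = (a - 1) \sigma_E^2 + \left( n - \frac{1}{n} \sum_i r_i^2 \right) \sigma_A^2 = (a - 1) \sigma_E^2 + \left( n - \frac{m}{n} \right) \sigma_A^2, \tag{23} $$
$$ E(\mathrm{SSE}) = \mathrm{tr}(D_E \Sigma) = \sigma_E^2 \, \mathrm{tr} D_E = (n - a) \sigma_E^2, \tag{24} $$
$$ E(\mathrm{MSA}) = \sigma_E^2 + \frac{n^2 - m}{n (a - 1)} \sigma_A^2, \tag{25} $$
$$ E(\mathrm{MSE}) = \sigma_E^2. \tag{26} $$
$$ \begin{aligned} V(\mathrm{SSA}) &= 2 \, \mathrm{tr}\{ (D_A \Sigma)^2 \} + 4 \boldsymbol{\mu}' D_A \Sigma D_A \boldsymbol{\mu} = 2 \, \mathrm{tr}\{ (D_A \Sigma)^2 \} = 2 \, \mathrm{tr}\{ (\sigma_A^2 \Lambda_\alpha + \sigma_E^2 I_{a-1})^2 \} \\ &= 2 (a - 1) \sigma_E^4 + 4 \left( n - \frac{1}{n} \sum_i r_i^2 \right) \sigma_A^2 \sigma_E^2 + 2 \left\{ \sum_i r_i^2 - \frac{2}{n} \sum_i r_i^3 + \left( \frac{\sum_i r_i^2}{n} \right)^2 \right\} \sigma_A^4 \\ &= 2 (a - 1) \sigma_E^4 + 4 \left( n - \frac{m}{n} \right) \sigma_A^2 \sigma_E^2 + 2 \left\{ m - \frac{2l}{n} + \left( \frac{m}{n} \right)^2 \right\} \sigma_A^4, \end{aligned} \tag{27} $$
$$ V(\mathrm{SSE}) = 2 \, \mathrm{tr}\{ (D_E \Sigma)^2 \} + 4 \boldsymbol{\mu}' D_E \Sigma D_E \boldsymbol{\mu} = 2 \, \mathrm{tr}\{ (D_E \Sigma)^2 \} = 2 (n - a) \sigma_E^4, \tag{28} $$
$$ V(\mathrm{MSA}) = \frac{2}{a - 1} \sigma_E^4 + \frac{4}{(a - 1)^2} \left( n - \frac{m}{n} \right) \sigma_A^2 \sigma_E^2 + \frac{2}{(a - 1)^2} \left\{ m - \frac{2l}{n} + \left( \frac{m}{n} \right)^2 \right\} \sigma_A^4, \tag{29} $$
$$ V(\mathrm{MSE}) = \frac{2}{n - a} \sigma_E^4. \tag{30} $$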
Using equations (23), (24), (25) and (26), we derive the ANOVA estimators and the variance of the estimators as follows.
$$ \hat{\sigma}_A^2 = \frac{n (a - 1)}{n^2 - m} (\mathrm{MSA} - \mathrm{MSE}), \tag{31} $$
$$ \hat{\sigma}_E^2 = \mathrm{MSE}, \tag{32} $$
$$ V(\hat{\sigma}_A^2) = \frac{n^2 (a - 1)^2}{(n^2 - m)^2} \{ V(\mathrm{MSA}) + V(\mathrm{MSE}) \}. \tag{33} $$
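Equations (29), (30) and (33) can be evaluated directly from the replication numbers $r_i$. The following sketch is one possible implementation; the function name `var_sigma_a2_ordinary` is hypothetical, and the example call assumes the scaling $\sigma_E^2 = 1$ with $\sigma_A^2$ equal to the variance ratio, which is an assumption about how Table 1 was computed.

```python
def var_sigma_a2_ordinary(r, sigma_a2, sigma_e2):
    """Variance of the ordinary ANOVA estimator of sigma_A^2, equation (33)."""
    a, n = len(r), sum(r)
    m = sum(x ** 2 for x in r)    # m = sum of r_i^2
    l = sum(x ** 3 for x in r)    # l = sum of r_i^3
    # Equation (29): variance of MSA.
    v_msa = (2 * sigma_e2 ** 2 / (a - 1)
             + 4 * (n - m / n) * sigma_a2 * sigma_e2 / (a - 1) ** 2
             + 2 * (m - 2 * l / n + (m / n) ** 2) * sigma_a2 ** 2 / (a - 1) ** 2)
    # Equation (30): variance of MSE.
    v_mse = 2 * sigma_e2 ** 2 / (n - a)
    # Coefficient of (MSA - MSE) in equation (31).
    k = n * (a - 1) / (n ** 2 - m)
    return k ** 2 * (v_msa + v_mse)


pattern5 = [2] * 4 + [10] * 4 + [18] * 4   # most unbalanced pattern of the evaluation
print(var_sigma_a2_ordinary(pattern5, 0.25, 1.0))
```

Under this assumed scaling the call gives a value that rounds to the Pattern 5 entry of Table 1 for the ordinary method at a variance ratio of 0.25.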
Approach for the proposed method
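Equation (21) is expressed as follows;
$$ \mathrm{SSA} = \bar{\boldsymbol{y}}' \left( I_a - \frac{1}{a} G_{a,a} \right) \bar{\boldsymbol{y}} = \bar{\boldsymbol{y}}' M \bar{\boldsymbol{y}}. $$
Here, we can use the formulas (22) with $\boldsymbol{\mu} = E(\bar{\boldsymbol{y}}) = \mu j_a$ and $\Sigma_{\bar{y}} = \mathrm{Var}(\bar{\boldsymbol{y}})$.
$M \boldsymbol{\mu}$ and $M \Sigma_{\bar{y}}$ are expressed as follows.
$$ M \boldsymbol{\mu} = \mu \left( I_a - \frac{1}{a} G_{a,a} \right) j_a = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, $$
$$ M \Sigma_{\bar{y}} = \begin{pmatrix} \left( 1 - \frac{1}{a} \right) \left( \sigma_A^2 + \frac{\sigma_E^2}{r_1} \right) & - \frac{1}{a} \left( \sigma_A^2 + \frac{\sigma_E^2}{r_2} \right) & \cdots & - \frac{1}{a} \left( \sigma_A^2 + \frac{\sigma_E^2}{r_a} \right) \\ - \frac{1}{a} \left( \sigma_A^2 + \frac{\sigma_E^2}{r_1} \right) & \left( 1 - \frac{1}{a} \right) \left( \sigma_A^2 + \frac{\sigma_E^2}{r_2} \right) & & \vdots \\ \vdots & & \ddots & \\ - \frac{1}{a} \left( \sigma_A^2 + \frac{\sigma_E^2}{r_1} \right) & \cdots & & \left( 1 - \frac{1}{a} \right) \left( \sigma_A^2 + \frac{\sigma_E^2}{r_a} \right) \end{pmatrix}. \tag{34} $$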
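By equation (34) and formulas (22), we can derive the next expectations and variances.
$$ E(\mathrm{SSA}) = E(\bar{\boldsymbol{y}}' M \bar{\boldsymbol{y}}) = \mathrm{tr}(M \Sigma_{\bar{y}}) + \boldsymbol{\mu}' M \boldsymbol{\mu} = \mathrm{tr}(M \Sigma_{\bar{y}}) = (a - 1) \sigma_A^2 + \left( 1 - \frac{1}{a} \right) \sum_i \frac{1}{r_i} \, \sigma_E^2, \tag{35} $$
$$ E(\mathrm{MSA}) = \sigma_A^2 + \frac{1}{a} \sum_i \frac{1}{r_i} \, \sigma_E^2, \tag{36} $$
$$ V(\mathrm{SSA}) = V(\bar{\boldsymbol{y}}' M \bar{\boldsymbol{y}}) = 2 \, \mathrm{tr}\{ (M \Sigma_{\bar{y}})^2 \} + 4 \boldsymbol{\mu}' M \Sigma_{\bar{y}} M \boldsymbol{\mu} = 2 \, \mathrm{tr}\{ (M \Sigma_{\bar{y}})^2 \} = 2 \left( 1 - \frac{2}{a} \right) \sum_i \left( \sigma_A^2 + \frac{\sigma_E^2}{r_i} \right)^2 + \frac{2}{a^2} \left\{ \sum_i \left( \sigma_A^2 + \frac{\sigma_E^2}{r_i} \right) \right\}^2, \tag{37} $$
$$ V(\mathrm{MSA}) = \frac{V(\mathrm{SSA})}{(a - 1)^2}. \tag{38} $$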
Using equations (35), (36), (37) and (38), we derive the ANOVA estimator and the variance of the estimator as follows.
$$ \hat{\sigma}_A^2 = \mathrm{MSA} - \frac{1}{a} \sum_i \frac{1}{r_i} \, \mathrm{MSE}, \tag{39} $$
$$ V(\hat{\sigma}_A^2) = V(\mathrm{MSA}) + \left( \frac{1}{a} \sum_i \frac{1}{r_i} \right)^2 V(\mathrm{MSE}). \tag{40} $$
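Equations (37), (38) and (40) can be sketched in the same way. The function name `var_sigma_a2_proposed` is hypothetical; the variance of MSE is taken as $2 \sigma_E^4 / (n - a)$, as in the ordinary method, and the example call again assumes the scaling $\sigma_E^2 = 1$ with $\sigma_A^2$ equal to the variance ratio.

```python
def var_sigma_a2_proposed(r, sigma_a2, sigma_e2):
    """Variance of the proposed (unweighted) estimator of sigma_A^2, equation (40)."""
    a, n = len(r), sum(r)
    v = [sigma_a2 + sigma_e2 / x for x in r]   # variances of the level means
    # Equation (37): variance of the unweighted SSA.
    v_ssa = 2 * (1 - 2 / a) * sum(t ** 2 for t in v) + 2 * sum(v) ** 2 / a ** 2
    # Equation (38): variance of MSA.
    v_msa = v_ssa / (a - 1) ** 2
    # Variance of MSE, the same within-level mean square as in the ordinary method.
    v_mse = 2 * sigma_e2 ** 2 / (n - a)
    # Coefficient of MSE in equation (39).
    c = sum(1 / x for x in r) / a
    return v_msa + c ** 2 * v_mse


pattern5 = [2] * 4 + [10] * 4 + [18] * 4   # most unbalanced pattern of the evaluation
print(var_sigma_a2_proposed(pattern5, 0.25, 1.0))
```

Under the assumed scaling the result agrees with the Pattern 5 entry of Table 1 for the proposed method at a variance ratio of 0.25 to the precision printed there.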
EVALUATION OF THE METHODS
The performance of the estimators obtained from the ordinary and proposed methods is evaluated by their variances under the following conditions.
The number of levels: $a = 12$
$\sigma_A^2 / \sigma_E^2$: 0.25, 0.50, 1.00, 2.00, 4.00
The number of observations at each level $(r_1, \ldots, r_{12})$:
(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10) : Pattern 1,
(8, 8, 8, 8, 10, 10, 10, 10, 12, 12, 12, 12) : Pattern 2,
(6, 6, 6, 6, 10, 10, 10, 10, 14, 14, 14, 14) : Pattern 3,
(4, 4, 4, 4, 10, 10, 10, 10, 16, 16, 16, 16) : Pattern 4,
(2, 2, 2, 2, 10, 10, 10, 10, 18, 18, 18, 18) : Pattern 5.
Pattern 1 shows the balanced design.
Now, we assume several situations from the balanced design to the extremely unbalanced design.
Then, we calculate the variance of the estimators with equations (33) and (40) under the above conditions.
Table 1 shows the variance of the estimator $\hat{\sigma}_A^2$ for the ordinary and proposed methods.
Table 1. Comparison of the variance of the estimators

Method            σ_A²/σ_E²   Pattern 1   Pattern 2   Pattern 3   Pattern 4   Pattern 5
Ordinary method   0.25        0.022460    0.022767    0.023721    0.025399    0.027945
                  0.50        0.065640    0.066804    0.070395    0.076726    0.086361
                  1.00        0.220185    0.224723    0.238733    0.263450    0.301108
                  2.00        0.802003    0.819949    0.875364    0.973169    1.122266
                  4.00        3.056549    3.127950    3.348447    3.737697    4.331233
Proposed method   0.25        0.022460    0.022870    0.024420    0.028740    0.047400
                  0.50        0.065640    0.066310    0.068750    0.075330    0.101350
                  1.00        0.220190    0.221360    0.225610    0.236690    0.277450
                  2.00        0.802000    0.804180    0.812040    0.832150    0.902360
                  4.00        3.056550    3.060750    3.075820    3.113960    3.243100
Table 1 indicates that the more unbalanced the design becomes, the larger the variance of the estimators becomes, that is, the more their performance worsens.
However, the performance of the proposed method is better than that of the ordinary method if the relationship between the variance components is $\sigma_A^2 > \sigma_E^2$.
On the other hand, if the relationship is $\sigma_E^2 > \sigma_A^2$, the ordinary method tends to perform better than the proposed method: it has the smaller variance in every unbalanced pattern at $\sigma_A^2 / \sigma_E^2 = 0.25$, and in Pattern 5 at $\sigma_A^2 / \sigma_E^2 = 0.50$.
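The comparison above can be retraced with a short script that evaluates equations (33) and (40) for the five patterns and reports which method has the smaller variance. This is a sketch under the assumption $\sigma_E^2 = 1$ with $\sigma_A^2$ equal to the variance ratio; the names `var_ordinary`, `var_proposed`, `PATTERNS`, `RATIOS` and `compare` are hypothetical.

```python
def var_ordinary(r, s_a, s_e):
    """Equation (33), using (29) and (30)."""
    a, n = len(r), sum(r)
    m, l = sum(x ** 2 for x in r), sum(x ** 3 for x in r)
    v_msa = (2 * s_e ** 2 / (a - 1)
             + 4 * (n - m / n) * s_a * s_e / (a - 1) ** 2
             + 2 * (m - 2 * l / n + (m / n) ** 2) * s_a ** 2 / (a - 1) ** 2)
    v_mse = 2 * s_e ** 2 / (n - a)
    return (n * (a - 1) / (n ** 2 - m)) ** 2 * (v_msa + v_mse)


def var_proposed(r, s_a, s_e):
    """Equation (40), using (37) and (38)."""
    a, n = len(r), sum(r)
    v = [s_a + s_e / x for x in r]
    v_ssa = 2 * (1 - 2 / a) * sum(t ** 2 for t in v) + 2 * sum(v) ** 2 / a ** 2
    c = sum(1 / x for x in r) / a
    return v_ssa / (a - 1) ** 2 + c ** 2 * 2 * s_e ** 2 / (n - a)


# Patterns 1 to 5: four levels with 10 - d, four with 10, four with 10 + d data.
PATTERNS = {p: [10 - d] * 4 + [10] * 4 + [10 + d] * 4
            for p, d in zip(range(1, 6), (0, 2, 4, 6, 8))}
RATIOS = [0.25, 0.50, 1.00, 2.00, 4.00]


def compare(sigma_e2=1.0):
    """Map (pattern, ratio) to (ordinary variance, proposed variance)."""
    return {(p, q): (var_ordinary(r, q * sigma_e2, sigma_e2),
                     var_proposed(r, q * sigma_e2, sigma_e2))
            for p, r in PATTERNS.items() for q in RATIOS}


for (p, q), (vo, vp) in sorted(compare().items()):
    better = "proposed" if vp < vo else "ordinary"
    print(f"Pattern {p}, ratio {q:.2f}: {vo:.6f} vs {vp:.6f} ({better} or tie)")
```

In the balanced Pattern 1 the two variances coincide up to rounding, so the label printed for that pattern is not meaningful.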
CONCLUSIONS
In this paper, an alternative method for constructing the sums of squares is proposed.
That is, this method constructs unbiased estimators from an unweighted sum of squares taken about the unweighted mean of the level means.
We compared the performance of the estimator of the proposed method with that of the ordinary method.
Hence, we should explore the patterns for which the performance of the proposed method is superior to that of the ordinary method.
To do so, we need to examine a wider range of conditions. Additionally, we should discuss whether the proposed method can also be expressed in the canonical form of the positive semi-definite quadratic form, because there is a possibility that it can.
We should also compare the performance of the estimators for linear combinations of variance components.
ACKNOWLEDGMENTS
The author would like to thank Professor Y. Ojima and Dr. S. Yasui of the Tokyo University of Science for valuable comments and
suggestions for this study.
REFERENCES
1. Bainbridge, T. R. (1965), Staggered nested designs for estimating variance components, Industrial Quality Control, 22, pp. 12-20.
2. Calvin, L. D. & Miller, J. D. (1961), A sampling design with incomplete dichotomy, Agronomy Journal, 53, pp. 325-328.
3. Davies, O. L. (Ed.) (1957), Statistical Methods in Research and Production with Special Reference to the Chemical Industry, 3rd Edn, p. 116 (London, Oliver & Boyd).
4. Khattree, R. and Naik, D. N. (1995), Statistical tests for random effects in staggered nested designs, Journal of Applied Statistics, Vol. 22, No. 4, pp. 495-505.
5. Nelson, L. S. (1995a), Using nested designs 1. Estimation of standard deviations, Journal of Quality Technology, 27, pp. 169-171.
6. Nelson, L. S. (1995b), Using nested designs 2. Confidence-limits for standard deviations, Journal of Quality Technology, 27, pp. 265-267.
7. Uhlig, S. (1996), Optimum two-way nested designs for estimation of variance components, Tatra Mountains Mathematical Publications, 7, pp. 105-112.
8. ISO 5725-3 (1994), Accuracy (trueness and precision) of measurement methods and results - Part 3: Intermediate measures of the precision of a standard measurement method, International Organization for Standardization, Geneva, Switzerland.
9. Leone, F. C., Nelson, L. S., Johnson, N. L. and Eisenstat, S. (1968), Sampling Distributions of Variance Components-2. Empirical Studies of Unbalanced Nested Designs, Technometrics, 10, pp. 719-737.
10. Goldsmith, C. H. and Gaylor, D. W. (1970), Three Stage Nested Designs for Estimating Variance Components, Technometrics, Vol. 12, No. 3, pp. 487-498, August.
11. Ojima, Y. (1984), The Use of Canonical Forms for Estimating Variance Components in Unbalanced Nested Designs, Report of Statistical Application Research, JUSE, 31, pp. 1-18.