ESTIMATION OF VARIANCE COMPONENTS IN UNBALANCED NESTED DESIGNS

Motohiro Yamasaki(1), Yoshikazu Ojima(2) and Seiichi Yasui(3)
(1), (2), (3) Department of Industrial Administration, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan
E-mail: Yamasaki.Motohiro@md.mt-pharma.co.jp

ABSTRACT

Nested designs are often used to estimate variance components, which let us evaluate the precision of a measurement method. The balanced nested design is usually used for this purpose. However, it has a defect: the factors at the upper levels of the hierarchy have relatively few degrees of freedom. To eliminate this defect, several unbalanced nested designs have been proposed, together with suitable methods for estimating variance components. On the other hand, measurement results usually include missing data or outliers. Therefore, even if a balanced design is conducted, the results have to be analyzed as an unbalanced design. The sums of squares for an unbalanced design are not unique, and unbiased estimators based on the sums of squares of the ordinary ANOVA have been proposed. In our study, an alternative method for constructing the sums of squares is proposed. Methods for two- or three-stage designs are examined and used to calculate the ANOVA estimators of variance components, including estimators of their linear combinations. In our study, both the existing and the proposed methods are introduced. The proposed method constructs an unbiased estimator from the unweighted sum of squares of the level means about their unweighted mean. The performance of the estimators obtained from the two methods is evaluated by their variances. For the proposed method, this variance is derived from a positive semi-definite quadratic form in the vector $\bar{\mathbf{y}}$ of level means. We will present the introduced estimators and the results of the evaluation on the day of the presentation.
Key words: unbalanced nested design, variance components, ANOVA, unbiased estimators

INTRODUCTION

In general, precision is one of the most important performance measures for evaluating measurement methods and measurement results. Measurement methods and measurement results are an important basis for trading, manufacturing and quality control. Nested designs are often used to estimate variance components, which let us evaluate the precision of a measurement method. The balanced nested design is usually used for this purpose. However, it has a defect: the factors at the upper levels of the hierarchy have relatively few degrees of freedom. To eliminate this defect, several unbalanced nested designs have been proposed, together with suitable methods for estimating variance components.

The staggered nested design, which was proposed and named by Bainbridge (1965), is the most popular unbalanced nested design in practice. It has a very simple open-ended structure, and each sum of squares in its ANOVA has almost the same degrees of freedom. Calvin and Miller (1961) also proposed the same design but did not give it a specific name, and the same design can be seen in Davies (1957). The staggered nested design has been studied by many authors, including Khattree and Naik (1995), Nelson (1995a, b) and Uhlig (1996). It is also recommended for precision experiments in the International Standard ISO 5725-3 (1994).

However, this is only a special case of the nested design. Measurement results usually include missing data or outliers, so the design cannot keep its very simple open-ended structure. That is to say, the design becomes another unbalanced nested design, which is no longer a staggered nested design. Similarly, even if a balanced nested design is conducted, the results have to be analyzed as an unbalanced nested design.

Among past studies of unbalanced designs, Leone et al. (1968) give the expectations of the mean squares from which the estimates of variance components are obtained. Goldsmith and Gaylor (1970) derive the variances and covariances of the ANOVA estimators of variance components. Ojima (1984) gives the canonical forms of the sums of squares used in the ANOVA and proposes unbiased estimators based on the sums of squares of the ordinary ANOVA. In this paper, we focus on the fact that the sums of squares for an unbalanced design are not unique, and we propose an alternative method for constructing them.

To simplify the calculation, Ojima (1984) derives the canonical form for the two-stage nested design, that is, the one-way ANOVA design. He uses it to calculate the expectations and variances of the sums of squares and mean squares systematically. Extending this derivation, the canonical form for nested designs with three or more stages can also be obtained. Using these canonical forms, we can easily obtain the expectation and variance of the sums of squares and of any linear combination of them. This includes the ANOVA estimators of variance components and estimators of their linear combinations. However, the proposed method cannot use the canonical form because of the structure of its sums of squares. Therefore, the variance of the estimator obtained from the proposed method is derived from a positive semi-definite quadratic form in the vector $\bar{\mathbf{y}}$ of level means. In the two-stage nested design, the estimators obtained from the ordinary and proposed methods are compared by their variances.
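NOTATIONS AND THE MODEL

Notations used throughout this paper are as follows:
$\mathbf{I}_a$ : identity matrix of order $(a \times a)$,
$\mathbf{A}'$ : transpose of a matrix $\mathbf{A}$,
$\mathbf{O}$ : zero matrix,
$\mathbf{G}_{m,n}$ : $(m \times n)$ matrix of units,
$\mathbf{j}_n$ : $(n \times 1)$ vector of units,
$\mathrm{diag}(r_i)$ : diagonal matrix whose $(i, i)$th element equals $r_i$,
$\mathrm{Diag}(\mathbf{A}_i)$ : direct sum of the matrices $\mathbf{A}_i$, i.e. $\mathbf{A}_1 \oplus \mathbf{A}_2 \oplus \cdots \oplus \mathbf{A}_a$,
$(a_{ij})_{i,j}$ : matrix whose $(i, j)$th element equals $a_{ij}$,
$(r_i)_i$ : vector whose $i$th element equals $r_i$,
combined subscripts $ij$ : two subscripts written together without a comma are used as a single subscript, which varies in lexicographical order, i.e. $ij = 11, \ldots, 1r_1, 21, \ldots, 2r_2, \ldots, ar_a$.

The model of the data in the two-stage unbalanced nested design, that is, the one-way ANOVA design, is usually expressed as

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, r_i, \quad n = \sum_i r_i, \tag{1}$$

where $\mu$ is an unknown constant that represents the general mean, and $\alpha_i$ and $\varepsilon_{ij}$ are mutually independent random variables with zero mean and $\mathrm{Var}(\alpha_i) = \sigma_A^2$, $\mathrm{Var}(\varepsilon_{ij}) = \sigma_E^2$.

The usual $SS_k$ (sum of squares of factor $k$) in the ANOVA of the two-stage unbalanced nested design are expressed as follows:

$$SS_A = \sum_i r_i(\bar{y}_i - \bar{y})^2, \qquad \bar{y}_i = \frac{1}{r_i}\sum_j y_{ij}, \quad \bar{y} = \frac{1}{n}\sum_i\sum_j y_{ij}, \tag{2}$$

$$SS_E = \sum_i\sum_j (y_{ij} - \bar{y}_i)^2. \tag{3}$$

However, the sums of squares for the unbalanced design are not unique. Therefore, we propose the following alternative method for constructing the sum of squares of factor A:

$$SS_A^{*} = \sum_i (\bar{y}_i - \bar{y}^{*})^2, \qquad \bar{y}^{*} = \frac{1}{a}\sum_i \bar{y}_i. \tag{4}$$

This method constructs an unbiased estimator from the unweighted sum of squares of the level means about their unweighted mean. Here, the $\bar{y}_i$ are independent random variables with mean $\mu$ and $\mathrm{Var}(\bar{y}_i) = \sigma_A^2 + \sigma_E^2/r_i$.

The character of each method is as follows. In the ordinary method, under $H_0: \sigma_A^2 = 0$, the $\bar{y}_i$ are independent random variables with mean $\mu$ and $\mathrm{Var}(\bar{y}_i) = \sigma_E^2/r_i$. Hence, under $H_0: \sigma_A^2 = 0$, $SS_A$ is distributed as $\sigma_E^2$ times a $\chi^2$ variable with $(a - 1)$ degrees of freedom. If $\sigma_A^2 > \sigma_E^2$, the distribution of $SS_A$ in an unbalanced design is far from a scaled $\chi^2$ distribution. On the other hand, in the proposed method, under $H_0: \sigma_E^2 = 0$, the $\bar{y}_i$ are independent random variables with mean $\mu$ and $\mathrm{Var}(\bar{y}_i) = \sigma_A^2$. Hence, under $H_0: \sigma_E^2 = 0$, $SS_A^{*}$ is distributed as $\sigma_A^2$ times a $\chi^2$ variable with $(a - 1)$ degrees of freedom. If $\sigma_A^2 > \sigma_E^2$, $SS_A^{*}$ approximately follows a scaled $\chi^2$ distribution.

CANONICAL FORM AND SUM OF SQUARES

In general, the sums of squares in the ANOVA of a nested design satisfy the following condition. Every $SS_k$ and CT (correction term) can be expressed as a positive semi-definite quadratic form in the data vector $\mathbf{y}$. Moreover, CT and all the $SS_k$'s add up to the sum of the squared original data, $\mathbf{y}'\mathbf{y}$. A vector $\mathbf{z}$ of canonical variables is an appropriate orthogonal transformation of the data vector $\mathbf{y}$. In the two-stage case, $\mathbf{z}$ can be partitioned into three parts as $\mathbf{z} = (z_0, \mathbf{z}_1', \mathbf{z}_2')'$, and each part corresponds to CT or to one $SS_k$ in the ANOVA, i.e.

$$CT = z_0^2, \qquad SS_A = \mathbf{z}_1'\mathbf{z}_1, \qquad SS_E = \mathbf{z}_2'\mathbf{z}_2. \tag{5}$$

This is the definition of the canonical forms for the nested designs. We think that the above framework may also be applicable to our proposed method.

CANONICAL FORM FOR TWO-STAGE UNBALANCED NESTED DESIGNS

Using matrix algebra, equation (1) is expressed in another form, i.e.

$$\mathbf{y} = \mu\mathbf{X}_0 + \mathbf{X}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}, \tag{6}$$

where
$\mathbf{y} = (y_{ij})_{ij}$ : $(n \times 1)$ vector,
$\mathbf{X}_0 = \mathbf{j}_n$ : $(n \times 1)$ vector,
$\mathbf{X} = \mathrm{Diag}(\mathbf{j}_{r_i})$ : $(n \times a)$ matrix,
$\boldsymbol{\alpha} = (\alpha_i)_i$ : $(a \times 1)$ vector,
$\boldsymbol{\varepsilon} = (\varepsilon_{ij})_{ij}$ : $(n \times 1)$ vector.

Approach for the ordinary method of sums of squares in the ANOVA

Let $\mathbf{z}$ be a vector of canonical variables, and let $\mathbf{P}$ be an orthogonal matrix transforming $\mathbf{y}$ into $\mathbf{z}$, i.e. $\mathbf{z} = \mathbf{P}\mathbf{y}$. We construct $\mathbf{P}$ as follows. Let $\mathbf{P}_1$ be an $\{(a-1) \times a\}$ matrix such that

$$\begin{pmatrix} \sqrt{r_1/n} \;\; \sqrt{r_2/n} \;\; \cdots \;\; \sqrt{r_a/n} \\ \mathbf{P}_1 \end{pmatrix} \tag{7}$$

is an orthogonal matrix of order $(a \times a)$. Then $\mathbf{P}_1\,\mathrm{diag}(1/\sqrt{r_i})\,\mathbf{X}'$ is an $\{(a-1) \times n\}$ matrix with orthonormal rows, and these rows are orthogonal to $\frac{1}{\sqrt{n}}\mathbf{j}_n' = \frac{1}{\sqrt{n}}\mathbf{X}_0'$. Therefore, the following $\mathbf{P}_2$ is an $(a \times n)$ matrix with orthonormal rows:

$$\mathbf{P}_2 = \begin{pmatrix} \frac{1}{\sqrt{n}}\mathbf{X}_0' \\ \mathbf{P}_1\,\mathrm{diag}(1/\sqrt{r_i})\,\mathbf{X}' \end{pmatrix}. \tag{8}$$

It follows from the orthogonality of (7) that

$$\mathbf{P}_1'\mathbf{P}_1 = \mathbf{I}_a - \left(\frac{\sqrt{r_i r_{i'}}}{n}\right)_{i,i'} \qquad (i, i' = 1, \ldots, a). \tag{9}$$

Further, let $\mathbf{Q}$ be an $\{(n-a) \times n\}$ matrix with orthonormal rows such that the following $\mathbf{P}$ is an orthogonal matrix of order $(n \times n)$:

$$\mathbf{P} = \begin{pmatrix} \frac{1}{\sqrt{n}}\mathbf{X}_0' \\ \mathbf{P}_1\,\mathrm{diag}(1/\sqrt{r_i})\,\mathbf{X}' \\ \mathbf{Q} \end{pmatrix}. \tag{10}$$

Using the orthogonal matrix $\mathbf{P}$, $\mathbf{y}$ is transformed into $\mathbf{z}$ as follows:

$$\mathbf{z} = \mathbf{P}\mathbf{y} = \begin{pmatrix} \frac{1}{\sqrt{n}}\mathbf{X}_0'\mathbf{y} \\ \mathbf{P}_1\,\mathrm{diag}(1/\sqrt{r_i})\,\mathbf{X}'\mathbf{y} \\ \mathbf{Q}\mathbf{y} \end{pmatrix} = \begin{pmatrix} z_0 \\ \mathbf{z}_1 \\ \mathbf{z}_2 \end{pmatrix}. \tag{11}$$

Since the expectation and dispersion matrix of $\mathbf{y}$ are

$$E(\mathbf{y}) = \mu\mathbf{X}_0, \qquad V(\mathbf{y}) = \sigma_A^2\,\mathrm{Diag}(\mathbf{G}_{r_i,r_i}) + \sigma_E^2\mathbf{I}_n, \tag{12}$$

the expectation and dispersion matrix of $\mathbf{z}$ are given by

$$E(\mathbf{z}) = \boldsymbol{\mu}_z = \begin{pmatrix} \sqrt{n}\,\mu \\ \mathbf{0}_{a-1} \\ \mathbf{0}_{n-a} \end{pmatrix}, \tag{13}$$

$$V(\mathbf{z}) = \begin{pmatrix} d & \boldsymbol{\delta}' & \mathbf{0}' \\ \boldsymbol{\delta} & \sigma_A^2\mathbf{A} + \sigma_E^2\mathbf{I}_{a-1} & \mathbf{O} \\ \mathbf{0} & \mathbf{O} & \sigma_E^2\mathbf{I}_{n-a} \end{pmatrix}, \tag{14}$$

where

$$\begin{aligned} d &= \sigma_A^2\frac{m}{n} + \sigma_E^2, \quad m = \sum_i r_i^2 && \text{(scalar)},\\ \boldsymbol{\delta} &= \frac{\sigma_A^2}{\sqrt{n}}\,\mathbf{P}_1\,(r_i^{3/2})_i && (\{(a-1) \times 1\}\ \text{vector}),\\ \mathbf{A} &= \mathbf{P}_1\,\mathrm{diag}(r_i)\,\mathbf{P}_1' && (\{(a-1) \times (a-1)\}\ \text{matrix}). \end{aligned} \tag{15}$$

The following formulation is convenient for deriving the expectation and variance of $SS_k$ and $MS_k$ (mean square of factor $k$). Following the partition of the rows and columns of $V(\mathbf{z})$, we set $\mathbf{D}_0$, $\mathbf{D}_A$ and $\mathbf{D}_E$ such that

$$\mathbf{D}_0 = \begin{pmatrix} 1 & \mathbf{0}' & \mathbf{0}' \\ \mathbf{0} & \mathbf{O} & \mathbf{O} \\ \mathbf{0} & \mathbf{O} & \mathbf{O} \end{pmatrix}, \quad \mathbf{D}_A = \begin{pmatrix} 0 & \mathbf{0}' & \mathbf{0}' \\ \mathbf{0} & \mathbf{I}_{a-1} & \mathbf{O} \\ \mathbf{0} & \mathbf{O} & \mathbf{O} \end{pmatrix}, \quad \mathbf{D}_E = \begin{pmatrix} 0 & \mathbf{0}' & \mathbf{0}' \\ \mathbf{0} & \mathbf{O} & \mathbf{O} \\ \mathbf{0} & \mathbf{O} & \mathbf{I}_{n-a} \end{pmatrix}. \tag{16}$$

Then

$$CT = \mathbf{z}'\mathbf{D}_0\mathbf{z}, \qquad SS_A = \mathbf{z}'\mathbf{D}_A\mathbf{z}, \qquad SS_E = \mathbf{z}'\mathbf{D}_E\mathbf{z}. \tag{17}$$

It follows that $CT + SS_A + SS_E = \mathbf{z}'\mathbf{z} = \mathbf{y}'\mathbf{y}$. The degrees of freedom are $\mathrm{tr}\,\mathbf{D}_0 = 1$ for CT, $\mathrm{tr}\,\mathbf{D}_A = a - 1$ for $SS_A$ and $\mathrm{tr}\,\mathbf{D}_E = n - a$ for $SS_E$.

The expectation and variance of $SS_k$ in the ordinary method require the traces of $\mathbf{A}$ and $\mathbf{A}^2$. It follows that

$$\mathrm{tr}\,\mathbf{A} = n - \frac{1}{n}\sum_i r_i^2 = n - \frac{m}{n}, \tag{18}$$

$$\mathrm{tr}\,\mathbf{A}^2 = \sum_i r_i^2 - \frac{2}{n}\sum_i r_i^3 + \frac{1}{n^2}\Big(\sum_i r_i^2\Big)^2 = m - \frac{2l}{n} + \frac{m^2}{n^2}, \qquad l = \sum_i r_i^3. \tag{19}$$

Approach for the proposed method of sums of squares in the ANOVA

The proposed method cannot use the canonical form because of the structure of its sum of squares: $SS_A^{*}$ is not, in general, one of the components of an orthogonal decomposition of $\mathbf{y}'\mathbf{y}$. Therefore, we take another approach for the proposed method. $SS_A^{*}$ can be expressed as a positive semi-definite quadratic form in the vector $\bar{\mathbf{y}}$ of level means, where

$$\bar{\mathbf{y}} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_a)', \qquad E(\bar{\mathbf{y}}) = \mu\mathbf{j}_a, \qquad V(\bar{\mathbf{y}}) = \mathrm{diag}\Big(\sigma_A^2 + \frac{\sigma_E^2}{r_i}\Big). \tag{20}$$

Then equation (4) is expressed as

$$SS_A^{*} = \sum_i \bar{y}_i^2 - a(\bar{y}^{*})^2 = \bar{\mathbf{y}}'\bar{\mathbf{y}} - \frac{1}{a}\bar{\mathbf{y}}'\mathbf{G}_{a,a}\bar{\mathbf{y}} = \bar{\mathbf{y}}'\Big(\mathbf{I}_a - \frac{1}{a}\mathbf{G}_{a,a}\Big)\bar{\mathbf{y}}. \tag{21}$$

DERIVATION OF THE VARIANCE OF ESTIMATORS

Here, the dispersion matrices $V(\mathbf{z})$ in (14) and $V(\bar{\mathbf{y}})$ in (20) are nonsingular. Assuming, in addition, that the data are normally distributed, the expectations and variances of $SS_A$, $SS_E$ and $SS_A^{*}$ can be obtained from the standard moment formulas for quadratic forms in normal random vectors.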
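The sums of squares and canonical variables defined in the preceding sections can be checked numerically. The following minimal pure-Python sketch is not the authors' implementation. It computes CT, $SS_A$ and $SS_E$ from (2) and (3), and builds one admissible $\mathbf{P}_1$ for (7) by Gram-Schmidt. Gram-Schmidt is our own choice; the paper only requires (7) to be orthogonal. The sketch then forms $\mathbf{z}_1$ from (11) and evaluates $SS_A^{*}$ both from (4) and as the quadratic form (21). All function names and the small data set are hypothetical.

```python
import math

def level_stats(data):
    """Return r_i, the level totals y_i. and the level means ybar_i."""
    r = [len(level) for level in data]
    totals = [sum(level) for level in data]
    means = [t / k for t, k in zip(totals, r)]
    return r, totals, means

def ordinary_ss(data):
    """CT, SS_A and SS_E of the ordinary one-way ANOVA, equations (2) and (3)."""
    r, totals, means = level_stats(data)
    n = sum(r)
    grand_mean = sum(totals) / n
    ct = n * grand_mean ** 2
    ss_a = sum(k * (m - grand_mean) ** 2 for k, m in zip(r, means))
    ss_e = sum((y - m) ** 2 for level, m in zip(data, means) for y in level)
    return ct, ss_a, ss_e

def proposed_ss(data):
    """SS_A^* of equation (4): level means about their unweighted mean."""
    _, _, means = level_stats(data)
    star = sum(means) / len(means)
    return sum((m - star) ** 2 for m in means)

def proposed_ss_quadratic(data):
    """SS_A^* written as ybar' (I_a - G_aa / a) ybar, equation (21)."""
    _, _, means = level_stats(data)
    a = len(means)
    return sum(
        means[i] * ((1.0 if i == j else 0.0) - 1.0 / a) * means[j]
        for i in range(a)
        for j in range(a)
    )

def complete_p1(r):
    """Hypothetical helper: rows of one P_1 making (7) orthogonal, via Gram-Schmidt."""
    n, a = sum(r), len(r)
    basis = [[math.sqrt(k / n) for k in r]]  # first row of (7)
    for e in range(a):
        v = [1.0 if i == e else 0.0 for i in range(a)]
        for b in basis:  # remove the components along the rows found so far
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-10:
            basis.append([x / norm for x in v])
        if len(basis) == a:
            break
    return basis[1:]  # drop the first row; the rest is P_1

def canonical_z1(data):
    """z_1 = P_1 diag(1/sqrt(r_i)) X'y from equation (11); X'y holds the level totals."""
    r, totals, _ = level_stats(data)
    w = [t / math.sqrt(k) for t, k in zip(totals, r)]
    return [sum(p * x for p, x in zip(row, w)) for row in complete_p1(r)]

data = [[10.1, 9.8, 10.4], [11.0, 10.7], [9.5, 9.9, 10.2, 9.7]]
ct, ss_a, ss_e = ordinary_ss(data)
yy = sum(y * y for level in data for y in level)
print(ct + ss_a + ss_e - yy)                            # ~0: CT + SS_A + SS_E = y'y
print(sum(z * z for z in canonical_z1(data)) - ss_a)    # ~0: z_1'z_1 = SS_A
print(proposed_ss(data) - proposed_ss_quadratic(data))  # ~0: (4) equals (21)
```

Every $\mathbf{P}_1$ that makes (7) orthogonal satisfies (9), so $\mathbf{z}_1'\mathbf{z}_1$ does not depend on which completion is chosen. Gram-Schmidt is just one convenient choice.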
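For a normally distributed random vector $\mathbf{z}$ with mean vector $\boldsymbol{\mu}$ and dispersion matrix $\boldsymbol{\Sigma}$, and a symmetric matrix $\mathbf{B}$,

$$E(\mathbf{z}'\mathbf{B}\mathbf{z}) = \mathrm{tr}(\mathbf{B}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu}, \qquad V(\mathbf{z}'\mathbf{B}\mathbf{z}) = 2\,\mathrm{tr}(\mathbf{B}\boldsymbol{\Sigma})^2 + 4\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}\boldsymbol{\mu}. \tag{22}$$

Using equations (13), (14), (15), (18), (19) and formulas (22), we obtain the expectations and variances of the estimators.

Approach for the ordinary method

We apply equations (16) to (19) and formulas (22) with $\boldsymbol{\mu} = \boldsymbol{\mu}_z$ and $\boldsymbol{\Sigma} = V(\mathbf{z})$. Since $\mathbf{D}_A\boldsymbol{\mu}_z = \mathbf{D}_E\boldsymbol{\mu}_z = \mathbf{0}$, the terms in $\boldsymbol{\mu}_z$ vanish, and we derive the following expectations and variances:

$$E(SS_A) = \mathrm{tr}\{\mathbf{D}_A V(\mathbf{z})\} = (a-1)\sigma_E^2 + \sigma_A^2\,\mathrm{tr}\,\mathbf{A} = (a-1)\sigma_E^2 + \Big(n - \frac{m}{n}\Big)\sigma_A^2, \tag{23}$$

$$E(SS_E) = \mathrm{tr}\{\mathbf{D}_E V(\mathbf{z})\} = (n-a)\sigma_E^2, \tag{24}$$

$$E(MS_A) = \sigma_E^2 + \frac{n^2 - m}{n(a-1)}\sigma_A^2, \tag{25}$$

$$E(MS_E) = \sigma_E^2. \tag{26}$$

$$\begin{aligned} V(SS_A) &= 2\,\mathrm{tr}\{\mathbf{D}_A V(\mathbf{z})\}^2 = 2\,\mathrm{tr}(\sigma_A^2\mathbf{A} + \sigma_E^2\mathbf{I}_{a-1})^2 \\ &= 2(a-1)\sigma_E^4 + 4\Big(n - \frac{m}{n}\Big)\sigma_A^2\sigma_E^2 + 2\Big(m - \frac{2l}{n} + \frac{m^2}{n^2}\Big)\sigma_A^4, \end{aligned} \tag{27}$$

$$V(SS_E) = 2\,\mathrm{tr}\{\mathbf{D}_E V(\mathbf{z})\}^2 = 2(n-a)\sigma_E^4, \tag{28}$$

$$V(MS_A) = \frac{2}{a-1}\sigma_E^4 + \frac{4}{(a-1)^2}\Big(n - \frac{m}{n}\Big)\sigma_A^2\sigma_E^2 + \frac{2}{(a-1)^2}\Big(m - \frac{2l}{n} + \frac{m^2}{n^2}\Big)\sigma_A^4, \tag{29}$$

$$V(MS_E) = \frac{2}{n-a}\sigma_E^4. \tag{30}$$

Using equations (23) to (26), we derive the ANOVA estimators:

$$\hat{\sigma}_A^2 = \frac{n(a-1)}{n^2 - m}(MS_A - MS_E), \tag{31}$$

$$\hat{\sigma}_E^2 = MS_E. \tag{32}$$

Since $\mathbf{z}_1$ and $\mathbf{z}_2$ are uncorrelated, $MS_A$ and $MS_E$ are independent under normality. Hence the variance of $\hat{\sigma}_A^2$ is

$$V(\hat{\sigma}_A^2) = \frac{n^2(a-1)^2}{(n^2 - m)^2}\{V(MS_A) + V(MS_E)\}. \tag{33}$$

Approach for the proposed method

Equation (21) is expressed as

$$SS_A^{*} = \bar{\mathbf{y}}'\Big(\mathbf{I}_a - \frac{1}{a}\mathbf{G}_{a,a}\Big)\bar{\mathbf{y}} = \bar{\mathbf{y}}'\mathbf{M}\bar{\mathbf{y}}.$$

Here we can use formulas (22) with $\mathbf{z}$ replaced by $\bar{\mathbf{y}}$. The quantities $\mathbf{M}E(\bar{\mathbf{y}})$ and $\mathbf{M}V(\bar{\mathbf{y}})$ are

$$\mathbf{M}E(\bar{\mathbf{y}}) = \Big(\mathbf{I}_a - \frac{1}{a}\mathbf{G}_{a,a}\Big)\mu\mathbf{j}_a = \mathbf{0}, \qquad \mathbf{M}V(\bar{\mathbf{y}}) = \begin{pmatrix} \big(1-\frac{1}{a}\big)\big(\sigma_A^2 + \frac{\sigma_E^2}{r_1}\big) & \cdots & -\frac{1}{a}\big(\sigma_A^2 + \frac{\sigma_E^2}{r_a}\big) \\ \vdots & \ddots & \vdots \\ -\frac{1}{a}\big(\sigma_A^2 + \frac{\sigma_E^2}{r_1}\big) & \cdots & \big(1-\frac{1}{a}\big)\big(\sigma_A^2 + \frac{\sigma_E^2}{r_a}\big) \end{pmatrix}, \tag{34}$$

that is, the $(i, i)$th element of $\mathbf{M}V(\bar{\mathbf{y}})$ is $(1 - 1/a)(\sigma_A^2 + \sigma_E^2/r_i)$ and its $(i, i')$th element ($i \neq i'$) is $-(1/a)(\sigma_A^2 + \sigma_E^2/r_{i'})$. By equation (34) and formulas (22), we derive the following expectations and variances, where $MS_A^{*} = SS_A^{*}/(a-1)$:

$$E(SS_A^{*}) = \mathrm{tr}\{\mathbf{M}V(\bar{\mathbf{y}})\} + E(\bar{\mathbf{y}})'\mathbf{M}E(\bar{\mathbf{y}}) = (a-1)\sigma_A^2 + \Big(1 - \frac{1}{a}\Big)\Big(\sum_i \frac{1}{r_i}\Big)\sigma_E^2, \tag{35}$$

$$E(MS_A^{*}) = \sigma_A^2 + \frac{1}{a}\Big(\sum_i \frac{1}{r_i}\Big)\sigma_E^2, \tag{36}$$

$$\begin{aligned} V(SS_A^{*}) &= 2\,\mathrm{tr}\{\mathbf{M}V(\bar{\mathbf{y}})\}^2 + 4E(\bar{\mathbf{y}})'\mathbf{M}V(\bar{\mathbf{y}})\mathbf{M}E(\bar{\mathbf{y}}) \\ &= 2\Big(1 - \frac{2}{a}\Big)\sum_i\Big(\sigma_A^2 + \frac{\sigma_E^2}{r_i}\Big)^2 + \frac{2}{a^2}\Big\{\sum_i\Big(\sigma_A^2 + \frac{\sigma_E^2}{r_i}\Big)\Big\}^2, \end{aligned} \tag{37}$$

$$V(MS_A^{*}) = \frac{V(SS_A^{*})}{(a-1)^2}. \tag{38}$$

Using equations (35) to (38), we derive the ANOVA estimator and its variance. Because $\bar{\mathbf{y}}$ and $SS_E$ are independent under normality,

$$\hat{\sigma}_A^{2*} = MS_A^{*} - \frac{1}{a}\Big(\sum_i \frac{1}{r_i}\Big)MS_E, \tag{39}$$

$$V(\hat{\sigma}_A^{2*}) = V(MS_A^{*}) + \Big(\frac{1}{a}\sum_i \frac{1}{r_i}\Big)^2 V(MS_E). \tag{40}$$

EVALUATION OF THE METHODS

The performance of the estimators obtained from the ordinary and proposed methods is evaluated by comparing their variances, given by equations (33) and (40).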
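Equations (33) and (40) depend only on the allocation $r_1, \ldots, r_a$ and on $\sigma_A^2$ and $\sigma_E^2$, so they can be evaluated directly. The following minimal Python sketch assumes normal data, as formulas (22) require. The function names `var_ordinary` and `var_proposed` are our own hypothetical names, and $\sigma_E^2$ defaults to 1.

```python
def var_ordinary(r, sigma_a2, sigma_e2=1.0):
    """V(hat sigma_A^2) of the ordinary estimator (31), following (18), (19), (27)-(30), (33)."""
    a, n = len(r), sum(r)
    m = sum(k ** 2 for k in r)        # m = sum of r_i^2
    ell = sum(k ** 3 for k in r)      # l = sum of r_i^3 in equation (19)
    tr_a = n - m / n                  # equation (18)
    tr_a2 = m - 2 * ell / n + m ** 2 / n ** 2  # equation (19)
    var_ss_a = (2 * (a - 1) * sigma_e2 ** 2
                + 4 * tr_a * sigma_a2 * sigma_e2
                + 2 * tr_a2 * sigma_a2 ** 2)    # equation (27)
    var_ms_a = var_ss_a / (a - 1) ** 2          # equation (29)
    var_ms_e = 2 * sigma_e2 ** 2 / (n - a)      # equation (30)
    coef = n * (a - 1) / (n ** 2 - m)           # coefficient in (31)
    return coef ** 2 * (var_ms_a + var_ms_e)    # equation (33)

def var_proposed(r, sigma_a2, sigma_e2=1.0):
    """V(hat sigma_A^{2*}) of the proposed estimator (39), following (20), (37), (38), (40)."""
    a, n = len(r), sum(r)
    v = [sigma_a2 + sigma_e2 / k for k in r]    # Var(ybar_i), equation (20)
    var_ss_a = (2 * (1 - 2 / a) * sum(x * x for x in v)
                + 2 * sum(v) ** 2 / a ** 2)     # equation (37)
    var_ms_a = var_ss_a / (a - 1) ** 2          # equation (38)
    var_ms_e = 2 * sigma_e2 ** 2 / (n - a)      # equation (30)
    c = sum(1 / k for k in r) / a               # coefficient of MS_E in (39)
    return var_ms_a + c ** 2 * var_ms_e         # equation (40)

allocation = [2] * 4 + [10] * 4 + [18] * 4
for ratio in (0.25, 0.50, 1.00, 2.00, 4.00):
    print(ratio,
          round(var_ordinary(allocation, ratio), 6),
          round(var_proposed(allocation, ratio), 6))
```

The example allocation is the most unbalanced one considered in the evaluation. With $\sigma_E^2 = 1$, the printed values can be compared with the corresponding entries of Table 1. For a balanced allocation the two functions return the same value, because the two estimators then coincide.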
The conditions of the evaluation are as follows.

The number of levels: $a = 12$
$\sigma_A^2/\sigma_E^2$: 0.25, 0.50, 1.00, 2.00, 4.00 (with $\sigma_E^2 = 1$)
The numbers of observations at the levels, $(r_1, \ldots, r_{12})$:
Pattern 1: (10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10)
Pattern 2: (8, 8, 8, 8, 10, 10, 10, 10, 12, 12, 12, 12)
Pattern 3: (6, 6, 6, 6, 10, 10, 10, 10, 14, 14, 14, 14)
Pattern 4: (4, 4, 4, 4, 10, 10, 10, 10, 16, 16, 16, 16)
Pattern 5: (2, 2, 2, 2, 10, 10, 10, 10, 18, 18, 18, 18)

Pattern 1 is the balanced design, and every pattern has $n = 120$ observations in total. Thus we consider several situations, ranging from the balanced design to an extremely unbalanced design. We calculate the variances of the estimators with equations (33) and (40) under the above conditions. Table 1 shows the variance of the estimator of $\sigma_A^2$ for the ordinary and proposed methods.

Table 1. Comparison of the variances of the estimators of $\sigma_A^2$ (Ratio $= \sigma_A^2/\sigma_E^2$)

Method            Ratio   Pattern 1   Pattern 2   Pattern 3   Pattern 4   Pattern 5
Ordinary method   0.25    0.022460    0.022767    0.023721    0.025399    0.027945
                  0.50    0.065640    0.066804    0.070395    0.076726    0.086361
                  1.00    0.220185    0.224723    0.238733    0.263450    0.301108
                  2.00    0.802003    0.819949    0.875364    0.973169    1.122266
                  4.00    3.056549    3.127950    3.348447    3.737697    4.331233
Proposed method   0.25    0.022460    0.022870    0.024420    0.028740    0.047400
                  0.50    0.065640    0.066310    0.068750    0.075330    0.101350
                  1.00    0.220190    0.221360    0.225610    0.236690    0.277450
                  2.00    0.802000    0.804180    0.812040    0.832150    0.902360
                  4.00    3.056550    3.060750    3.075820    3.113960    3.243100

For both methods, Table 1 indicates that the more unbalanced the design becomes, the larger the variance of the estimator, i.e. the worse its performance. In Pattern 1 the two estimators coincide, so the small differences in that column reflect rounding only. In the unbalanced Patterns 2 to 5, the proposed method has the smaller variance whenever $\sigma_A^2/\sigma_E^2 \ge 1.00$. At $\sigma_A^2/\sigma_E^2 = 0.25$, the ordinary method has the smaller variance in every unbalanced pattern. At $\sigma_A^2/\sigma_E^2 = 0.50$, the proposed method is better in Patterns 2 to 4 but worse in Pattern 5. Roughly speaking, the proposed method performs better than the ordinary method when $\sigma_A^2$ is at least comparable to $\sigma_E^2$, and the ordinary method performs better when $\sigma_E^2$ is clearly larger than $\sigma_A^2$.

CONCLUSIONS

In this paper, an alternative method for constructing the sums of squares is proposed.
That is, this method constructs an unbiased estimator from the unweighted sum of squares of the level means about their unweighted mean. We compared the performance of the estimator of the proposed method with that of the ordinary method. Neither method was uniformly better in Table 1, so we should explore the patterns in which the proposed method is superior to the ordinary method. To do so, we need to examine more conditions. Additionally, we should discuss whether a canonical form of the positive semi-definite quadratic form can be constructed for the proposed method as well, since this may be possible. We should also compare the performance of the estimators of linear combinations of variance components.

ACKNOWLEDGMENTS

The author would like to thank Professor Y. Ojima and Dr. S. Yasui of the Tokyo University of Science for valuable comments and suggestions for this study.

REFERENCES

1. Bainbridge, T. R. (1965), Staggered nested designs for estimating variance components, Industrial Quality Control, 22, pp. 12-20.
2. Calvin, L. D. and Miller, J. D. (1961), A sampling design with incomplete dichotomy, Agronomy Journal, 53, pp. 325-328.
3. Davies, O. L. (Ed.) (1957), Statistical Methods in Research and Production with Special Reference to the Chemical Industry, 3rd Edn, p. 116 (London, Oliver & Boyd).
4. Khattree, R. and Naik, D. N. (1995), Statistical tests for random effects in staggered nested designs, Journal of Applied Statistics, 22(4), pp. 495-505.
5. Nelson, L. S. (1995a), Using nested designs 1. Estimation of standard deviations, Journal of Quality Technology, 27, pp. 169-171.
6. Nelson, L. S. (1995b), Using nested designs 2. Confidence limits for standard deviations, Journal of Quality Technology, 27, pp. 265-267.
7. Uhlig, S. (1996), Optimum two-way nested designs for estimation of variance components, Tatra Mountains Mathematical Publications, 7, pp. 105-112.
8. ISO 5725-3 (1994), Accuracy (trueness and precision) of measurement methods and results – Part 3: Intermediate measures of the precision of a standard measurement method, International Organization for Standardization, Geneva, Switzerland.
9. Leone, F. C., Nelson, L. S., Johnson, N. L. and Eisenstat, S. (1968), Sampling distributions of variance components II. Empirical studies of unbalanced nested designs, Technometrics, 10, pp. 719-737.
10. Goldsmith, C. H. and Gaylor, D. W. (1970), Three stage nested designs for estimating variance components, Technometrics, 12(3), pp. 487-498.
11. Ojima, Y. (1984), The use of canonical forms for estimating variance components in unbalanced nested designs, Report of Statistical Application Research, JUSE, 31, pp. 1-18.