AN ABSTRACT OF THE DISSERTATION OF

Waseem S. Alnosaier for the degree of Doctor of Philosophy in Statistics presented on April 25, 2007.
Title: Kenward-Roger Approximate F Test for Fixed Effects in Mixed Linear Models.
Abstract approved: David S. Birkes, Clifford B. Pereira

For data following a balanced mixed Anova model, the standard Anova method typically leads to exact F tests of the usual hypotheses about fixed effects. However, for most unbalanced designs with most hypotheses, the Anova method does not produce exact tests, and approximate methods are needed. One approach to approximate inference about fixed effects in mixed linear models is the method of Kenward and Roger (1997), which is available in SAS and widely used. In this thesis, we strengthen the theoretical foundations of the method by clarifying and weakening the assumptions, and by determining the orders of the approximations used in the derivation. We present two modifications of the K-R method which are comparable in performance but simpler in derivation and computation. It is known that the K-R method reproduces exact F tests in two special cases, namely for Hotelling T² and for Anova F ratios in fixed-effects models. We show that the K-R and proposed methods reproduce exact F tests in three more general models which include the two special cases, plus tests in all fixed effects linear models, many balanced mixed Anova models, and all multivariate regression models. In addition, we present some theorems on the K-R, proposed, and Satterthwaite methods, investigating conditions under which they are identical. Also, we show the difficulties in developing a K-R type method using the conventional, rather than adjusted, estimator of the variance-covariance matrix of the fixed-effects estimator. A simulation study is conducted to assess the performance of the K-R, proposed, Satterthwaite, and Containment methods for three kinds of block designs.
The K-R and proposed methods perform quite similarly, and they outperform the other approaches in most cases.

©Copyright by Waseem S. Alnosaier
April 25, 2007
All Rights Reserved

Kenward-Roger Approximate F Test for Fixed Effects in Mixed Linear Models

by Waseem S. Alnosaier

A DISSERTATION submitted to Oregon State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Presented April 25, 2007
Commencement June 2007

Doctor of Philosophy dissertation of Waseem S. Alnosaier presented on April 25, 2007.

APPROVED:

Co-Major Professor, representing Statistics
Co-Major Professor, representing Statistics
Chair of the Department of Statistics
Dean of the Graduate School

I understand that my dissertation will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my dissertation to any reader upon request.

Waseem S. Alnosaier, Author

ACKNOWLEDGMENTS

I am deeply indebted to my advisors, Dr. David Birkes and Dr. Clifford Pereira, for giving me the opportunity to work with them. I especially thank Dr. Birkes, who patiently and tirelessly helped me throughout my doctoral program. Without his encouragement and constant guidance, this dissertation would not have been finished. I really cannot thank him enough, and it is an honor to work with him. I also greatly thank Dr. Pereira for his help and suggestions. I appreciate his attention and valuable comments that improved the work of this thesis. I would also like to thank the entire faculty and staff in the Department of Statistics. I especially thank Dr. Alix Gitelman and Dr. Lisa Madsen for serving on my committee, and for their concern and comments. I thank Professor Robert Smythe and Professor Daniel Schafer for their help and concern. I also thank Professor Philippe Rossignol from the Department of Fisheries and Wildlife for serving on my committee.
I am thankful to the faculty in the Department of Statistics at the University of Pittsburgh, especially Professor Leon Gleser, Professor Satish Iyengar, and Professor David Stoffer, who helped me during my studies there. Special thanks go to all current and former students, of whom there are too many to mention, who influenced my statistical education. In particular, I thank my office mate, Yonghai Li, for the unforgettable memories and valuable discussions we had. I would like to express my sincere appreciation to my parents for their compassion, encouragement, and support. They taught me at a young age how to respect education, and offered me love and support through the course of my life. Thanks are due foremost to my wonderful wife, Sohailah, and my lovely children, Sarah, Kasim and Yaser, for supporting me throughout the entire process. Thank you Sohailah for your patience, sacrifices and believing in me. You helped me in so many ways, and I will never forget that. Your inspiration, understanding, and love made this work possible.

TABLE OF CONTENTS

Page
1. INTRODUCTION ………………………………………………………… 1
   1.1 Previous Results ………………………………………………… 1
   1.2 Contributions and Summary of Results ………………………… 4
2. VARIANCE-COVARIANCE MATRIX OF THE FIXED EFFECTS ESTIMATOR …… 7
   2.1 The Model ………………………………………………………… 7
   2.2 Notation …………………………………………………………… 7
   2.3 Assumptions ……………………………………………………… 8
   2.4 Estimating Var(β̂) ……………………………………………… 11
3. TESTING THE FIXED EFFECTS ………………………………………… 21
   3.1 Constructing a Wald-Type Pivot ………………………………… 21
   3.2 Estimating the Denominator Degrees of Freedom and the Scale Factor … 36
4. MODIFYING THE ESTIMATES OF THE DENOMINATOR DEGREES OF FREEDOM AND THE SCALE FACTOR … 38
   4.1 Balanced One-Way Anova Model ………………………………… 38
   4.2 The Hotelling T² Model ………………………………………… 41
   4.3 Kenward and Roger's Modification ……………………………… 48
   4.4 Modifying the Approach with the Usual Variance-Covariance Matrix … 54
5. TWO PROPOSED MODIFICATIONS FOR THE KENWARD-ROGER METHOD ……… 55
   5.1 First Proposed Modification …………………………………… 55
   5.2 Second Proposed Modification …………………………………… 58
   5.3 Comparisons among the K-R and the Proposed Modifications …… 61
6. KENWARD-ROGER APPROXIMATION FOR THREE GENERAL MODELS …………… 67
   6.1 Modified Rady's Model …………………………………………… 67
   6.2 A General Multivariate Linear Model …………………………… 77
   6.3 A Balanced Multivariate Model with a Random Group Effect …… 83
7. THE SATTERTHWAITE APPROXIMATION …………………………………… 97
   7.1 The Satterthwaite Method ……………………………………… 97
   7.2 The K-R, the Satterthwaite and the Proposed Methods ………… 98
8. SIMULATION STUDY FOR BLOCK DESIGNS ……………………………… 105
   8.1 Preparing Formulas for Computations ………………………… 105
   8.2 Simulation Results ……………………………………………… 109
   8.3 Comments and Conclusion ……………………………………… 124
9. CONCLUSION …………………………………………………………… 127
   9.1 Summary …………………………………………………………… 127
   9.2 Future Research ………………………………………………… 128
Bibliography ……………………………………………………………… 130

LIST OF TABLES

Table Page
1.1 PBIB1 (t = s = 15, k = 4, and n = 60) ………………………… 110
1.2 Simulated size of nominal 5% Wald F-tests for PBIB1 ………… 111
1.3 Mean of estimated denominator degrees of freedom for PBIB1 … 111
1.4 Mean of estimated scale for PBIB1 ……………………………… 111
1.5 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for PBIB1 … 111
1.6 PBIB2 design (t = 16, s = 48, k = 2, and n = 96) …………… 112
1.7 Simulated size of nominal 5% Wald F-tests for PBIB2 ………… 112
1.8 Mean of estimated denominator degrees of freedom for PBIB2 … 112
1.9 Mean of estimated scales for PBIB2 …………………………… 113
1.10 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for PBIB2 … 113
2.1 BIB1 (t = 6, s = 15, k = 2, and n = 30) ……………………… 114
2.2 Simulated size of nominal 5% Wald F-tests for BIB1 ………… 114
2.3 Mean of estimated denominator degrees of freedom for BIB1 … 114
2.4 Mean of estimated scales for BIB1 ……………………………… 114
2.5 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for BIB1 … 115
2.6 BIB2 (t = 6, s = 10, k = 3, and n = 30) ……………………… 115
2.7 Simulated size of nominal 5% Wald F-tests for BIB2 ………… 115
2.8 Mean of estimated denominator degrees of freedom for BIB2 … 116
2.9 Mean of estimated scales for BIB2 ……………………………… 116
2.10 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency for BIB2 … 116
2.11 BIB3 (t = 7, s = 7, k = 4, and n = 28) ……………………… 117
2.12 Simulated size of nominal 5% Wald F-tests for BIB3 ………… 117
2.13 Mean of estimated denominator degrees of freedom for BIB3 … 117
2.14 Mean of estimated scales for BIB3 …………………………… 117
2.15 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency for BIB3 … 118
2.16 BIB4 (t = 9, s = 36, k = 2, and n = 72) ……………………… 118
2.17 Simulated size of nominal 5% Wald F-tests for BIB4 ………… 118
2.18 Mean of estimated denominator degrees of freedom for BIB4 … 119
2.19 Mean of estimated scales for BIB4 …………………………… 119
2.20 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for BIB4 … 119
2.21 BIB5 design (t = 9, s = 18, k = 4, and n = 72) ……………… 120
2.22 Simulated size of nominal 5% Wald F-tests for BIB5 ………… 120
2.23 Mean of estimated denominator degrees of freedom for BIB5 … 120
2.24 Mean of estimated scales for BIB5 …………………………… 120
2.25 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for BIB5 … 121
2.26 BIB6 (t = 9, s = 12, k = 6, and n = 72) ……………………… 121
2.27 Simulated size of nominal 5% Wald F-tests for BIB6 ………… 121
2.28 Mean of estimated denominator degrees of freedom for BIB6 … 121
2.29 Mean of estimated scales for BIB6 …………………………… 122
2.30 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for BIB6 … 122
3.1 Simulated size of nominal 5% Wald F-tests for RCB1 ………… 122
3.2 Mean of estimated denominator degrees of freedom for RCB1 … 123
3.3 Mean of estimated scales for RCB1 ……………………………… 123
3.4 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for RCB1 … 123
3.5 Simulated size of nominal 5% Wald F-tests for RCB2 ………… 123
3.6 Mean of estimated denominator degrees of freedom for RCB2 … 124
3.7 Mean of estimated scales for RCB2 ……………………………… 124
3.8 Percentage relative bias in the variance estimates, convergence rate (CR), and efficiency (E) for RCB2 … 124

To my wife, Sohailah, for her significant support.

KENWARD-ROGER APPROXIMATE F TEST FOR FIXED EFFECTS IN MIXED LINEAR MODELS

1. INTRODUCTION

When testing fixed effects in a mixed linear model, an exact F test may not exist. Several approaches are available to perform approximate F tests, and one of the most common is the method derived by Kenward and Roger (1997). Since the Kenward-Roger method has been implemented in the MIXED procedure of the SAS system, it has become well known and widely used. In this thesis, we investigate the Kenward-Roger approach and suggest two further modifications.

1.1 Previous Results

In testing a hypothesis about fixed effects, it is desirable to find an exact test. In balanced mixed linear models, the standard Anova method leads to optimal exact F tests for the fixed effects in most cases. In such models, Seifert (1979) showed that the standard Anova F tests are uniformly most powerful invariant unbiased (UMPIU) tests. Rady (1986) studied the assumptions needed for the mixed linear model so that the Anova method produces optimal exact F tests for the fixed effects. Seely and Rady (1988) studied conditions under which the random effects can be treated as fixed to construct an exact F test for the fixed effects. VanLeeuwen et al. (1998) introduced the concept of error orthogonal (EO) designs; in models with EO structure, there is an exact F test for certain standard hypotheses. Utlaut et al. (2001) introduced the concept of simple error orthogonal (SEO) designs, in which optimal exact F tests can be obtained. If a mixed effects Anova model has a type of "partial balance" called b&r balance (VanLeeuwen et al., 1999), then it is SEO.
A typical normal balanced Anova model is b&r balanced and therefore has an optimal exact F test, as indicated above. In particular, when the smallest random effect that contains a fixed effect is unique, there is an optimal exact F test for the fixed effect which coincides with the Anova F test (Birkes, 2004). In addition, exact F tests have been constructed for certain other models. For instance, an exact F test can be obtained for testing certain hypotheses about fixed effects in a general multivariate model (Mardia et al., 1992). Also, an exact F test can be obtained to test fixed effects in a balanced multivariate model with a random group effect (Birkes, 2006). In many cases, constructing an exact test seems hard and approximations need to be employed. When no two mean squares in the Anova table have the same expectations under the null hypothesis, the Anova method may still be used to obtain an approximate F test. If a mean square can be created as a linear combination of other mean squares such that it has the same expectation as the mean square of the fixed effects under the null hypothesis, then the created mean square is used in the denominator of an F test, and its degrees of freedom can be approximated by the method of Satterthwaite (1946). Another common approach to produce an approximate F test for the fixed effects is based on a Wald-type statistic. When using a Wald-type test, the numerator degrees of freedom equal the number of contrasts being tested, but the denominator degrees of freedom need to be estimated. In the Containment method, which is the default method in SAS's MIXED procedure, the denominator degrees of freedom are chosen to be the smallest rank contribution to the design matrix among the random effects that contain the fixed effects. If no such effects are found, the estimate is the residual degrees of freedom.
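The Satterthwaite (1946) degrees-of-freedom approximation referred to above can be sketched as follows. The function is a generic implementation of the standard formula for a linear combination of independent mean squares; the coefficients, mean squares, and degrees of freedom in the example are purely illustrative, not taken from the thesis.

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Satterthwaite approximate degrees of freedom for the linear
    combination sum_i c_i * MS_i of independent mean squares, where
    MS_i has df_i degrees of freedom:
        nu = (sum_i c_i MS_i)^2 / sum_i (c_i MS_i)^2 / df_i
    """
    num = sum(c * ms for c, ms in zip(coeffs, mean_squares)) ** 2
    den = sum((c * ms) ** 2 / df for c, ms, df in zip(coeffs, mean_squares, dfs))
    return num / den

# Illustrative synthetic denominator: 0.5*MS_1 + 0.5*MS_2 with
# MS_1 = 12.0 on 9 df and MS_2 = 4.0 on 20 df.
nu = satterthwaite_df([0.5, 0.5], [12.0, 4.0], [9, 20])
```

The approximate degrees of freedom always fall between the extremes one would get from using either component mean square alone, reflecting the uncertainty mixture in the synthetic denominator.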
Giesbrecht and Burns (1985) proposed a method based on Satterthwaite (1941) to determine the denominator degrees of freedom for an approximate Wald-type t test for a single contrast of the fixed effects. Fai and Cornelius (1996) extended the Giesbrecht and Burns method, approximating the distribution of the Wald-type statistic for a multidimensional hypothesis by an F distribution. Both the Giesbrecht and Burns approach and the Fai and Cornelius approach use the conventional estimator of the variance-covariance matrix of the fixed effects estimator, which is the variance-covariance matrix of the generalized least squares (GLS) estimator of the fixed effects with the variance components replaced by their REML estimates. The Fai and Cornelius approach is referred to as the Satterthwaite approach in some literature, and in this thesis as well. Bellavance et al. (1996) suggested a method to improve the Anova F test by using a scaled F distribution with modified degrees of freedom. It is known that the conventional estimator of the variance-covariance matrix of the fixed effects estimator underestimates the true variance (Kackar and Harville, 1984). Indeed, Kackar and Harville expressed the variance-covariance matrix of the fixed effects estimator as a sum of two components. The first component is the variance-covariance matrix of the GLS estimator of the fixed effects, and the second component represents what the first component underestimates. The second component was approximated by Kackar and Harville (1984) and Prasad and Rao (1990), and estimation of the first component was addressed by Harville and Jeske (1992) and later by Kenward and Roger (1997). In fact, Kenward and Roger (1997) combined estimators of the two components to produce an adjusted estimator for the variance-covariance matrix of the fixed effects estimator. They plug the adjusted estimator into the Wald-type statistic, a scaled form of which approximately follows an F distribution.
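In symbols, using notation that chapter 2 sets up formally (Φ for the GLS variance matrix, w_ij for the REML covariances of the variance-parameter estimates, and P_i, Q_ij, R_ij for certain derivative matrices), the decomposition and the adjusted estimator take roughly the following form; this is a sketch, with the precise derivation deferred to chapters 2 and 3.

```latex
% Kackar-Harville decomposition of the variance of the EGLS estimator
\operatorname{Var}(\hat{\beta}) \;=\; \Phi + \Lambda,
\qquad \Phi = (X'\Sigma^{-1}X)^{-1},
\qquad \Lambda \;\approx\; \sum_{i=1}^{r}\sum_{j=1}^{r} w_{ij}\,
  \Phi\,(Q_{ij} - P_i \Phi P_j)\,\Phi .

% Kenward-Roger adjusted estimator and the scaled Wald-type statistic
\hat{\Phi}_A \;=\; \hat{\Phi} + 2\,\hat{\Phi}\Bigl\{\sum_{i=1}^{r}\sum_{j=1}^{r}
  \hat{w}_{ij}\bigl(\hat{Q}_{ij} - \hat{P}_i\hat{\Phi}\hat{P}_j
  - \tfrac{1}{4}\hat{R}_{ij}\bigr)\Bigr\}\hat{\Phi},
\qquad
F \;=\; \tfrac{1}{\ell}\,\hat{\beta}'L\,(L'\hat{\Phi}_A L)^{-1}L'\hat{\beta}.
```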
Unlike the Satterthwaite-based approaches, where only the degrees of freedom need to be estimated, in Kenward and Roger's approximation two quantities need to be estimated from the data: the denominator degrees of freedom and the scale factor. After deriving approximate expressions for the expectation and the variance of the Wald-type statistic, Kenward and Roger match these with the first and second moments of the F distribution to determine the estimates of the denominator degrees of freedom and the scale. Moreover, Kenward and Roger modified the approximation in such a way that the estimates match the known values in two special cases where a scaled form of the Wald-type statistic has an exact F distribution. In fact, the idea of modifying an approach so that it reproduces exact values was adopted previously by Graybill and Wang (1980), who modified approximate confidence intervals on certain functions of the variance components to make them exact in some special cases. Lately, the performance of Kenward and Roger's approximation has been the subject of several simulation studies. For example, Schaalje et al. (2002) compared the K-R and Satterthwaite approaches for a split plot design with repeated measures over several sample sizes and covariance structures. They found that the K-R method performs as well as or better than the Satterthwaite approach in all situations. They considered three factors in the comparisons: the complexity of the covariance matrix, imbalance, and the sample size, and they found that these factors affect the Satterthwaite method more than the K-R method. The Satterthwaite method was found to work well only when the sample size was moderately large and the covariance matrix was compound symmetric. The K-R method was found to have a tendency toward inflated levels when the sample size was small, except when the covariance structure was compound symmetric.
Chen and Wei (2003) compared the Kenward-Roger approach and a modified Anova method suggested by Bellavance et al. (1996) for some crossover designs. They recommended using the K-R approach when the sample size is at least 24; for smaller sample sizes, they found the modified Anova method works better than the K-R approach. Savin et al. (2003) found the K-R approach reliable for constructing a confidence interval for the common mean in interlaboratory trials. Spike et al. (2004) investigated the K-R and Satterthwaite methods for estimating the denominator degrees of freedom for contrasts of the fixed subplot effects. Like Schaalje et al. (2002) above, they suggested that Kenward and Roger's approximation be preferred for small datasets, the two methods being comparable for large datasets. Valderas et al. (2005) studied the performance of the K-R approach when AIC and BIC are used as criteria to select the covariance structure. They found the K-R method's level to be much higher than the target values; even with the correct covariance structure, the level was higher than the target in many cases in which the covariance structure is not compound symmetric.

1.2 Contributions and Summary of Results

Since some of the detailed derivation for Kenward and Roger's approach was absent from their original work, we provide the detailed theoretical derivation of the method, including clarification of the assumptions that justify it. Also, we weaken some of the assumptions that were imposed by Kenward and Roger, and determine the orders of the approximations used in the derivation. We present two modifications of the K-R method which are comparable in performance but simpler in derivation and computation. Kenward and Roger modified their approach in such a way that their method reproduces exact F tests in two special cases, namely for Hotelling T² and for Anova F ratios.
We show that the K-R and the two proposed methods reproduce exact F tests in three more general models, two of which are generalizations of the two special cases. We explore relationships among the K-R, proposed, and Satterthwaite methods by specifying cases where the approaches produce the same estimate of the denominator degrees of freedom or are even identical. Also, we show the difficulties in developing a K-R type method using the conventional, rather than adjusted, estimator of the variance-covariance matrix of the fixed-effects estimator. In chapter 2, using Taylor series expansions, matrix derivatives, and invariance arguments, we derive the adjusted estimator for the variance-covariance matrix of the fixed effects estimator that was provided by Kenward and Roger (1997). Besides clarifying all assumptions that justify the theoretical derivation, we weaken some of the assumptions imposed by Kenward and Roger. In addition, we determine the orders of the approximations used in the derivation. In chapter 3, we derive the approximate expectation and variance of the Wald-type statistics constructed using the conventional and the adjusted estimators of the variance-covariance matrix of the fixed effects estimator. In addition, we match the first and second moments of the F distribution with those of the scaled form of the Wald-type statistic to obtain the Kenward-Roger approximation for the denominator degrees of freedom and the scale before the modification. Two special cases in which the Wald-type statistic has an exact F distribution, the balanced one-way Anova model and the Hotelling T² model, are considered in chapter 4 to establish the modification of the expectation and the variance of the Wald-type statistic proposed by Kenward and Roger, so that the approach produces the known exact values in these two special cases.
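The moment-matching step used in chapter 3 can be sketched numerically. Given the numerator degrees of freedom ℓ and approximate values E* and V* for the mean and variance of the Wald-type statistic, equating these with the first two moments of a scaled F(ℓ, m) variable yields closed-form values for the denominator degrees of freedom m and the scale λ. The function below solves those two moment equations; the function name and the sanity check are illustrative, not notation from the thesis.

```python
def match_scaled_f_moments(ell, e_star, v_star):
    """Solve for (m, lam) so that a scaled F(ell, m) variable, with
    lam * F treated as F(ell, m), has mean e_star and variance v_star.
    Uses the closed-form solution of the two moment equations:
        rho = V*/(2 E*^2),  m = 4 + (ell+2)/(ell*rho - 1),
        lam = m / (E* (m - 2)).
    """
    rho = v_star / (2.0 * e_star ** 2)
    m = 4.0 + (ell + 2.0) / (ell * rho - 1.0)
    lam = m / (e_star * (m - 2.0))
    return m, lam

# Sanity check: feed in the exact moments of an F(ell, m0) distribution;
# the matching should recover m = m0 and lam = 1.
ell, m0 = 3, 10
e = m0 / (m0 - 2)
v = 2 * m0**2 * (ell + m0 - 2) / (ell * (m0 - 2)**2 * (m0 - 4))
m, lam = match_scaled_f_moments(ell, e, v)
```

Because the ratio ρ = V*/(2E*²) does not depend on the scale, the degrees of freedom can be solved first and the scale recovered afterwards, which is why the solution separates cleanly into two steps.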
Kenward and Roger (1997) mentioned that it can be argued that the conventional estimator of the variance-covariance matrix of the fixed effects estimator can be used instead of the adjusted estimator. We discuss the difficulties in modifying the approach by using the conventional estimator instead of the adjusted estimator. The K-R approach was derived by modifying the approximate expressions for the expectation and the variance of the Wald-type statistic. This modification is not unique, and hence we introduce two other modifications of the K-R method in chapter 5. We keep the modification of the expectation of the statistic as modified by Kenward and Roger; however, instead of modifying the variance, we modify other related quantities. The proposed modifications of the K-R method are comparable in performance and simpler in derivation and computation. As mentioned above, the special cases used by Kenward and Roger are not the only cases where the K-R and proposed methods produce the exact values. Indeed, the Kenward-Roger and the proposed modifications produce the exact values for three general models in which there is an optimal exact F test. The models studied in chapter 6 are: (1) Rady's model (with a slight modification), which includes a wide class of balanced mixed classification models and is more general than the balanced one-way Anova model; (2) a general linear multivariate model, which is more general than the Hotelling T² model; and (3) a balanced multivariate model with a random group effect. We show that the estimates of the denominator degrees of freedom and the scale factor match the known values for those models. The Satterthwaite, the Kenward-Roger, and the proposed methods perform similarly in some situations. Chapter 7 is devoted to studying the cases where these methods produce the same estimate of the denominator degrees of freedom. Moreover, we study the cases where the approaches become identical to each other.
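For concreteness, the Hotelling T² special case mentioned above fixes the target values that any method claiming to "reproduce the exact F test" must return. For x₁, …, x_n iid N_p(μ, Σ) with sample mean x̄ and sample covariance S, the classical exact result is

```latex
T^2 \;=\; n\,(\bar{x}-\mu_0)'\,S^{-1}\,(\bar{x}-\mu_0),
\qquad
\frac{n-p}{p\,(n-1)}\;T^2 \;\sim\; F_{p,\;n-p}
\quad \text{under } H_0:\ \mu=\mu_0,
```

so the exact denominator degrees of freedom are n − p and the exact scale is (n − p)/(p(n − 1)); these are the values an approximation must recover in this model.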
In chapter 8, we provide a simulation study for three types of block designs: partially balanced incomplete block designs, balanced incomplete block designs, and complete block designs with some missing data. In the simulation study, the sample size, the ratio of the variance components, and the efficiency factor are considered to see how they affect the performance of the Kenward-Roger, the proposed, the Satterthwaite, and the Containment methods.

2. VARIANCE-COVARIANCE MATRIX OF THE FIXED EFFECTS ESTIMATOR

For a multivariate mixed linear model, statisticians have traditionally estimated the precision of the fixed effects estimates based on the asymptotic distribution. However, this estimate is known to be biased and to underestimate the variance of the fixed effects estimate. Kenward and Roger (1997) proposed an adjustment to the estimator of the variance of the fixed effects estimator, which is investigated in this chapter.

2.1 The Model

Consider n observations y following a multivariate normal distribution, y ∼ N(Χβ, Σ), where Χ (n × p) is a full column rank matrix of known covariates, β (p × 1) is a vector of unknown parameters, and Σ (n × n) is an unknown variance-covariance matrix whose elements are assumed to be functions of r parameters σ (r × 1) = (σ_1, …, σ_r)′. The generalized least squares (GLS) estimator of β is β̃ = (Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹y, and the matrix Φ = (Χ′Σ⁻¹Χ)⁻¹ is the variance-covariance matrix of this estimator. The REML estimator of σ is denoted by σ̂, and the REML-based estimated generalized least squares estimator (EGLSE) of β is β̂ = (Χ′Σ⁻¹(σ̂)Χ)⁻¹Χ′Σ⁻¹(σ̂)y. The center of our study is β, and Σ is a nuisance to be addressed in the analysis. We are interested in testing H_0 : L′β = 0, where L′ is a fixed ℓ × p matrix.

2.2 Notation

Throughout the thesis, we use the following notation. For a matrix Α, we use Α′, ℜ(Α), Ν(Α), r(Α), tr(Α), |Α| to denote the transpose, range, null space, rank, trace, and determinant of Α, respectively.
ℜ(Α)⊥ is used to denote the orthogonal complement of ℜ(Α). We use the abbreviation p.d. for positive definite and n.n.d. for nonnegative definite. The notation p.o. is used for projection operator and o.p.o. for orthogonal projection operator. Ρ_Α is the o.p.o. on ℜ(Α). In addition, we use the following, where β̃ = β̃(σ) = (Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹y denotes the GLS estimator:

V = Var(β̂)
W = Var(σ̂), with entries w_ij = Cov(σ̂_i, σ̂_j)
G = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹
Φ = (Χ′Σ⁻¹Χ)⁻¹
Θ = L(L′ΦL)⁻¹L′
Ρ_i = −Χ′Σ⁻¹ (∂Σ/∂σ_i) Σ⁻¹Χ
Q_ij = Χ′Σ⁻¹ (∂Σ/∂σ_i) Σ⁻¹ (∂Σ/∂σ_j) Σ⁻¹Χ
R_ij = Χ′Σ⁻¹ (∂²Σ/∂σ_i∂σ_j) Σ⁻¹Χ
Λ = Var(β̂ − β̃), approximated by Λ ≈ Σ_{i=1}^r Σ_{j=1}^r w_ij Φ(Q_ij − Ρ_i Φ Ρ_j)Φ
Φ̂_A = Φ̂ + 2Φ̂{ Σ_{i=1}^r Σ_{j=1}^r ŵ_ij (Q̂_ij − Ρ̂_i Φ̂ Ρ̂_j − (1/4)R̂_ij) }Φ̂
A_1 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr(ΘΦΡ_iΦ) tr(ΘΦΡ_jΦ)
A_2 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr(ΘΦΡ_iΦΘΦΡ_jΦ)
A_3 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr[ΘΦ(Q_ij − Ρ_iΦΡ_j − (1/4)R_ij)Φ]
F = (1/ℓ) β̂′L(L′Φ̂_A L)⁻¹L′β̂, unless otherwise mentioned.

For a matrix H(σ), we use Ĥ = H(σ̂).

2.3 Assumptions

For chapters 2 and 3, we impose the following assumptions about the model.

(A1) The expectation of β̂ exists.
(A2) Σ is a block diagonal and nonsingular matrix. Also, we assume that the elements of Σ_k, Σ_k⁻¹, ∂Σ_k/∂σ_i, ∂²Σ_k/∂σ_i∂σ_j, and Χ_k are bounded, where Σ = diag_{1≤k≤m}(Σ_k), Χ = col_{1≤k≤m}(Χ_k), and sup n_i < ∞, where the n_i are the block sizes.
(A3) E[σ̂] = σ + O(n^(-3/2)).
(A4) The possible dependence between σ̂ and β̂ is ignored.
(A5) Σ_{i=1}^r Σ_{j=1}^r Cov[(∂β̃/∂σ_i)(∂β̃/∂σ_j)′, (σ̂_i − σ_i)(σ̂_j − σ_j)] = O(n^(-5/2)).
(A6) Φ = (Χ′Σ⁻¹Χ)⁻¹ = O(n⁻¹), (L′ΦL)⁻¹ = O(n), ∂β̃/∂σ_i = O(n^(-1/2)), ∂²β̃/∂σ_i∂σ_j = O(n^(-1/2)).
(A7) If T = O_p(n^α), then E[T] = O(n^α).

Remarks

(i) Σ_k is said to be bounded when max(elements of Σ_k) ≤ b(σ) for some constant b. Since sup n_i < ∞, we have n → ∞ ⇔ m → ∞.

(ii) Even though Kenward and Roger required the bias of the REML estimator to be ignored, we only require E[σ̂] = σ + O(n^(-3/2)), as stated in assumption (A3). In fact, for the model mentioned in section 2.1, E[σ̂ − σ] = b(σ) + O(n⁻²), where b(σ) is of order O(n⁻¹) (Pace and Salvan, 1997, expression 9.62). Moreover, when the covariance structure is linear, b(σ) = 0, and hence E[σ̂ − σ] = O(n⁻²), which is stronger than what we need.

(iii) Assumption (A4) was also imposed by Kenward and Roger. In fact, we investigated some models, such as the Hotelling T² model, the fixed effects model, and the one-way Anova model with fixed group effects and unequal group variances. In these models, σ̂ and β̂ are exactly independent. Also, for those models, the sum of covariances in assumption (A5) is zero, which is stronger than the assumption. Assumption (A4) is needed to derive approximate expressions for the expectation and the variance of the statistic by conditioning on σ̂, as we will see in chapter 3. However, we should mention that the same derivation of the expectation and the variance of the statistic can be done without conditioning on σ̂, in which case assumption (A4) is no longer needed.

(iv) One circumstance in which the covariance in assumption (A5) is exactly zero is when σ̂ is obtained from a previous sample and β̃(σ̂) from current data (Kackar and Harville, 1984). Also, Kackar and Harville argued that when σ̂ is unbiased and E[(∂β̃/∂σ_i)(∂β̃/∂σ_j)′ | σ̂] can be approximated by a first order Taylor series, then the covariance in (A5) is expected to be zero. It appears that Kenward and Roger assumed one of the arguments mentioned. For us, it is enough to assume (A5).

(v) Χ′H(σ)Χ = O(n), where H(σ) = diag_{1≤k≤m}(H_k(σ)) and H_k(σ) is a product of any combination of Σ_k, Σ_k⁻¹, ∂Σ_k/∂σ_i, and ∂²Σ_k/∂σ_i∂σ_j. This is true because Χ_k′H_k(σ)Χ_k = O(1) for 1 ≤ k ≤ m
⇒ max(elements of Χ_k′H_k(σ)Χ_k) ≤ a for 1 ≤ k ≤ m, for some constant a
⇒ max(elements of Σ_{k=1}^m Χ_k′H_k(σ)Χ_k) ≤ Σ_{k=1}^m max(elements of Χ_k′H_k(σ)Χ_k) ≤ ma,
and hence Χ′H(σ)Χ = Σ_{k=1}^m Χ_k′H_k(σ)Χ_k = O(m) = O(n).

(vi) A general situation where the first two conditions in assumption (A6) hold is when all Χ_k′Σ_k⁻¹Χ_k are contained in a compact set of p.d. matrices. Then (1/m)Χ′Σ⁻¹Χ stays in a compact set of p.d. matrices, so mΦ stays in a compact set, and hence Φ = O(n⁻¹). Similarly, mL′ΦL stays in a compact set of p.d. matrices, and hence (L′ΦL)⁻¹ = O(n). This assumption also holds when we suppose Σ_k = Σ_1 for all k and (1/m) Σ_{k=1}^m Χ_k′Σ_1⁻¹Χ_k → Α (p.d.). This supposition is reasonable in two situations:
1) Χ_k = Χ_1 for all k, in which case Α = Χ_1′Σ_1⁻¹Χ_1.
2) The Χ_k are regarded as iid random covariate matrices, and by the weak law of large numbers, (1/m) Σ_{k=1}^m Χ_k′Σ_1⁻¹Χ_k converges to E[Χ_1′Σ_1⁻¹Χ_1]. Then, since inversion is a continuous operation, mΦ → Α⁻¹; that is, Φ = O(n⁻¹). Also, (1/m)(L′ΦL)⁻¹ → (L′Α⁻¹L)⁻¹; that is, (L′ΦL)⁻¹ = O(n).
For the third condition in (A6), ∂β̃/∂σ_i = −ΦΧ′Σ⁻¹(∂Σ/∂σ_i)Gy (as we will see in lemma 2.4.7). Consider Χ′B(σ)y, where B(σ) = Σ⁻¹(∂Σ/∂σ_i)G, and Χ′B(σ)y = Σ_{k=1}^m Χ_k′B_k(σ)y_k, where y = col_{1≤k≤m}(y_k) and B(σ) = diag_{1≤k≤m}(B_k(σ)). If the Χ_k′B_k(σ)y_k are iid, then by the central limit theorem, m^(1/2)[(1/m) Σ_{k=1}^m Χ_k′B_k(σ)y_k − E[Χ_k′B_k(σ)y_k]] = O_p(1). However, by lemma 2.4.7 we have E[Χ_k′B_k(σ)y_k] = 0, and hence Χ′B(σ)y = O_p(n^(1/2)). Therefore ∂β̃/∂σ_i = O(n⁻¹)O_p(n^(1/2)) = O_p(n^(-1/2)).

(vii) If H(σ) = O_p(n^α), then H(σ̂) = O_p(n^α). This result is obtained by employing a Taylor series expansion of H(σ̂) about σ.

(viii) For some random matrices T, their expectations have higher order than the random matrices themselves.
For instance, (σ̂_i − σ_i)(σ̂_j − σ_j)(σ̂_k − σ_k) = O_p(n^(-3/2)), as will be shown in theorem 2.4.9; however, E[(σ̂_i − σ_i)(σ̂_j − σ_j)(σ̂_k − σ_k)] = O(n⁻²), from expression (9.74) in Pace and Salvan. In assumption (A7), we assume that the expectation preserves the order, which does not conflict with the cases mentioned above.

2.4 Estimating Var(β̂)

There are two main sources of bias in Φ̂ when it is used as an estimator for Var(β̂): Φ̂ underestimates Φ, and Φ = Var(β̃) does not take into account the variability of σ̂ in β̂ = β̃(σ̂), where β̃(σ) denotes the GLS estimator. Kackar and Harville (1984) proposed that Var(β̂) can be partitioned as Var(β̂) = Φ + Λ, and they addressed the second source of bias by approximating Λ. The first source of bias was discussed by Harville and Jeske (1992); Kenward and Roger (1997) proposed an approximation to adjust for it, and they combined both adjustments to calculate their proposed estimator of Var(β̂), denoted by Φ̂_A. In this section, several lemmas are derived leading to the expression for Φ̂_A.

Lemma 2.4.1 With L_R(σ) being the likelihood function of z = Κ′y, where Κ′Χ = 0,
2 log L_R(σ) = 2ℓ_R(σ) = constant − log|Σ| − log|Χ′Σ⁻¹Χ| − y′[Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹]y.

Proof
y ∼ N(Χβ, Σ), with density
f(y) = (2π)^(−n/2) |Σ|^(−1/2) exp[−(1/2)(y − Χβ)′Σ⁻¹(y − Χβ)].
The REML estimator of σ is the maximum likelihood estimator from the marginal likelihood of z = Κ′y, where Κ is any n × q matrix of full column rank with Κ′Χ = 0 and q = n − p. Then z ∼ N(0, Κ′ΣΚ) (Birkes, 2004, theorem 7.2.2), and
f(z) = (2π)^(−q/2) |Κ′ΣΚ|^(−1/2) exp[−(1/2) z′(Κ′ΣΚ)⁻¹z].
Hence
2ℓ_R(σ) = −q log(2π) − log|Κ′ΣΚ| − z′(Κ′ΣΚ)⁻¹z = −q log(2π) − log|Κ′ΣΚ| − y′Κ(Κ′ΣΚ)⁻¹Κ′y.
To prove the lemma, it suffices to show that
(a) y′Κ(Κ′ΣΚ)⁻¹Κ′y = y′[Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹]y,
(b) −q log(2π) − log|Κ′ΣΚ| = constant − log|Σ| − log|Χ′Σ⁻¹Χ|.
For part (a), it suffices to show that Κ(Κ′ΣΚ)⁻¹Κ′ = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹. Indeed,
Κ(Κ′ΣΚ)⁻¹Κ′Σ = Ι − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ (Seely, 2002, problem 2.B.4)
⇒ Κ(Κ′ΣΚ)⁻¹Κ′ = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹.
Observe that Σ is nonsingular (assumption A2) and n.n.d. (Birkes, 2004); hence Σ is p.d. (Seely, 2002, corollary 1.9.3), and therefore r(Κ′ΣΚ) = r(Κ′). Also, we have ℜ(Κ) = Ν(Χ′) (Birkes, 2004), because ℜ(Κ) = ℜ(Χ)⊥. So all conditions of problem 2.B.4 from the Seely notes are satisfied.
For part (b), choose Τ = Χ(Χ′Χ)^(−1/2), so Τ′Τ = (Χ′Χ)^(−1/2)Χ′Χ(Χ′Χ)^(−1/2) = Ι. Also, notice that we can choose Κ to be orthogonal, so Κ′Κ = Ι; and Κ′Τ = 0 because Κ′Χ = 0.
Let R = [Τ Κ], so
R′R = [Τ′; Κ′][Τ Κ] = [Τ′Τ Τ′Κ; Κ′Τ Κ′Κ] = [Ι 0; 0 Ι].
Hence |R′R| = 1, and
|Σ| = |R′ΣR| = |[Τ′ΣΤ Τ′ΣΚ; Κ′ΣΤ Κ′ΣΚ]| = |Κ′ΣΚ| · |Τ′ΣΤ − Τ′ΣΚ(Κ′ΣΚ)⁻¹Κ′ΣΤ| = |Κ′ΣΚ| · |(Χ′Σ⁻¹Χ)⁻¹| · |Χ′Χ|.
So log|Σ| = log|Κ′ΣΚ| − log|Χ′Σ⁻¹Χ| + log|Χ′Χ|
⇒ −log|Κ′ΣΚ| = −log|Σ| − log|Χ′Σ⁻¹Χ| + constant.

Lemma 2.4.2 β̂ is symmetric about β.

Proof
Since σ̂ is reflection and translation invariant (Birkes, 2004), β̂ is reflection and translation equivariant (Birkes, 2004, lemma 7.1). Since y ∼ N(Χβ, Σ), the vectors y − Χβ and −(y − Χβ) have the same distribution. Hence
β̂(y) − β = β̂(y) + (−β) = β̂(y + Χ(−β)) (β̂ is translation equivariant)
= β̂(y − Χβ),
which has the same distribution as β̂(−(y − Χβ)) = −β̂(y − Χβ) (β̂ is reflection equivariant). Since β̂(y − Χβ) = β̂(y) − β, it follows that β̂(y) − β and −(β̂(y) − β) have the same distribution; that is, β̂ is symmetric about β.

Lemma 2.4.3 Given the expectation exists (assumption A1), β̂ is an unbiased estimator of β.
Proof  Applying lemma 2.4.2, $\hat\beta - \beta \overset d= -(\hat\beta - \beta)$, so $E(\hat\beta - \beta) = E[-(\hat\beta - \beta)]$, i.e. $E(\hat\beta) - \beta = -E(\hat\beta) + \beta$, hence $2E(\hat\beta) = 2\beta$ and $E(\hat\beta) = \beta$.  $\square$

Lemma 2.4.4 $\tilde\beta$ and $(I - P_X)y$ are independent.

Proof
\[ Cov\big[\tilde\beta, (I - P_X)y\big] = Cov\big[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y,\ (I - P_X)y\big] = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\Sigma(I - P_X) \]
\[ = (X'\Sigma^{-1}X)^{-1}X'(I - P_X) = (X'\Sigma^{-1}X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X'P_X = (X'\Sigma^{-1}X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X' = 0. \]
Since $\tilde\beta$ and $(I - P_X)y$ are jointly normally distributed, they are independent (Birkes, 2004, proposition 7.2.3).  $\square$

Lemma 2.4.5 $\hat\beta - \tilde\beta$ is a function of $(I - P_X)y$.

Proof  Write $(\hat\beta - \tilde\beta)(y) = \delta(y)$; $\delta(y)$ is a translation-invariant function. To show that $\delta(y)$ is a function of $(I - P_X)y$, it is equivalent to show that
\[ (I - P_X)y^{(1)} = (I - P_X)y^{(2)} \Rightarrow \delta(y^{(1)}) = \delta(y^{(2)}). \]
Indeed, $(I - P_X)y^{(1)} = (I - P_X)y^{(2)}$ implies $y^{(1)} - P_Xy^{(1)} = y^{(2)} - P_Xy^{(2)}$, so
\[ y^{(2)} = y^{(1)} - P_Xy^{(1)} + P_Xy^{(2)} = y^{(1)} + P_X(y^{(2)} - y^{(1)}) = y^{(1)} + Xb \text{ for some } b, \]
and hence $\delta(y^{(2)}) = \delta(y^{(1)} + Xb) = \delta(y^{(1)})$ ($\delta$ is translation invariant).  $\square$

Lemma 2.4.6 $Var(\hat\beta) = \Phi + \Lambda$, where $\Lambda = Var(\hat\beta - \tilde\beta)$.

Proof  From lemma 2.4.4, $\tilde\beta$ and $(I - P_X)y$ are independent, and from lemma 2.4.5, $\hat\beta - \tilde\beta$ is a function of $(I - P_X)y$. Then $\tilde\beta$ and $\hat\beta - \tilde\beta$ are independent. Writing $\hat\beta = \tilde\beta + (\hat\beta - \tilde\beta)$,
\[ Var(\hat\beta) = Var(\tilde\beta) + Var(\hat\beta - \tilde\beta) = \Phi + \Lambda. \quad\square \]

Lemma 2.4.7 $E\Big[\dfrac{\partial\tilde\beta}{\partial\sigma_i}\Big] = 0$, and $E\Big[\dfrac{\partial\tilde\beta}{\partial\sigma_i}(\hat\sigma_i - \sigma_i)\Big] = 0$ for $i = 1, \dots, r$.

Proof
\[ \frac{\partial\tilde\beta}{\partial\sigma_i} = \frac{\partial}{\partial\sigma_i}\big[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y\big] = \frac{\partial}{\partial\sigma_i}\big[(X'\Sigma^{-1}X)^{-1}\big]X'\Sigma^{-1}y + (X'\Sigma^{-1}X)^{-1}\frac{\partial}{\partial\sigma_i}\big[X'\Sigma^{-1}y\big] \]
\[ = -\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}Gy, \quad\text{where } G = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}. \tag{2.1} \]
Hence
\[ E\Big[\frac{\partial\tilde\beta}{\partial\sigma_i}\Big] = -\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}E[Gy], \]
where
\[ E[Gy] = E\big[(\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1})y\big] = \Sigma^{-1}E\big[y - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y\big] = \Sigma^{-1}(X\beta - X\beta) = 0. \]
For the second claim,
\[ E\Big[\frac{\partial\tilde\beta}{\partial\sigma_i}(\hat\sigma_i - \sigma_i)\Big] = -\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}E[g(y)], \quad\text{where } g(y) = Gy\,(\hat\sigma_i(y) - \sigma_i). \]
Because $\hat\sigma_i$ is reflection invariant,
\[ g(-y) = G(-y)[\hat\sigma_i(-y) - \sigma_i] = -Gy[\hat\sigma_i(y) - \sigma_i] = -g(y); \]
that is, $g$ is reflection equivariant. Notice also that, for any $b$,
\[ g(y + Xb) = (Gy + GXb)[\hat\sigma_i(y + Xb) - \sigma_i] = Gy[\hat\sigma_i(y) - \sigma_i] = g(y), \]
because $GX = 0$ and $\hat\sigma_i$ is translation invariant; hence $g$ is a translation-invariant function. Since $y \sim N(X\beta, \Sigma)$,
\[ y - X\beta \overset d= -(y - X\beta) \Rightarrow y \overset d= -y + 2X\beta, \]
and hence
\[ g(y) \overset d= g(-y + 2X\beta) = -g(y + X(-2\beta)) \quad(g\text{ is reflection equivariant}) = -g(y) \quad(g\text{ is translation invariant}). \]
Since $g(y) \overset d= -g(y)$, then $E[g(y)] = E[-g(y)] = -E[g(y)]$, so $2E[g(y)] = 0$ and $E[g(y)] = 0$, and hence $E\big[(\partial\tilde\beta/\partial\sigma_i)(\hat\sigma_i - \sigma_i)\big] = 0$ for $i = 1, \dots, r$.  $\square$

Lemma 2.4.8
(a) $\dfrac{\partial\Phi}{\partial\sigma_i} = -\Phi P_i\Phi$;
(b) $\dfrac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = \Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi$,
where
\[ P_i = -X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}X, \qquad Q_{ij} = X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}X, \qquad R_{ij} = X'\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\sigma_i\partial\sigma_j}\Sigma^{-1}X. \]

Proof
(a)
\[ \frac{\partial\Phi}{\partial\sigma_i} = -(X'\Sigma^{-1}X)^{-1}\frac{\partial(X'\Sigma^{-1}X)}{\partial\sigma_i}(X'\Sigma^{-1}X)^{-1} = -(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_i}X(X'\Sigma^{-1}X)^{-1} = -\Phi P_i\Phi, \]
since $X'(\partial\Sigma^{-1}/\partial\sigma_i)X = -X'\Sigma^{-1}(\partial\Sigma/\partial\sigma_i)\Sigma^{-1}X = P_i$.
(b) Differentiating again,
\begin{align*} \frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = {} & (X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_i}X(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1} \\ & {} - (X'\Sigma^{-1}X)^{-1}X'\frac{\partial^2\Sigma^{-1}}{\partial\sigma_i\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1} \\ & {} + (X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_i}X(X'\Sigma^{-1}X)^{-1} \\ = {} & \Phi\Big\{P_i\Phi P_j - X'\frac{\partial^2\Sigma^{-1}}{\partial\sigma_i\partial\sigma_j}X + P_j\Phi P_i\Big\}\Phi. \tag{2.2} \end{align*}
Observe that
\[ X'\frac{\partial^2\Sigma^{-1}}{\partial\sigma_i\partial\sigma_j}X = X'\frac{\partial}{\partial\sigma_i}\Big(-\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}\Big)X = Q_{ij} - R_{ij} + Q_{ji}. \tag{2.3} \]
Combining expressions (2.2) and (2.3), we obtain
\[ \frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = \Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi. \quad\square \]

Before proceeding, we present asymptotic orders for some terms that will be used often in chapters 2 and 3.
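As a quick numerical spot-check of lemma 2.4.8(a) before moving on, the closed form $\partial\Phi/\partial\sigma_i = -\Phi P_i\Phi$ can be compared with a central finite difference. The sketch below is illustrative only: the two-component structure $\Sigma(\sigma) = \sigma_1 I + \sigma_2 ZZ'$ and all numbers are hypothetical.

```python
import numpy as np

# Sketch (hypothetical data): check lemma 2.4.8(a), dPhi/dsigma_i = -Phi P_i Phi,
# by central finite differences, assuming Sigma(sigma) = sigma_1 * I + sigma_2 * Z Z'.
n, p = 8, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, 2))
V = [np.eye(n), Z @ Z.T]                 # dSigma/dsigma_1, dSigma/dsigma_2
sigma = np.array([1.5, 0.7])

def Sigma(s):
    return s[0] * V[0] + s[1] * V[1]

def Phi(s):
    Si = np.linalg.inv(Sigma(s))
    return np.linalg.inv(X.T @ Si @ X)   # Phi = (X' Sigma^{-1} X)^{-1}

Si = np.linalg.inv(Sigma(sigma))
Ph = Phi(sigma)
h = 1e-6
for i in range(2):
    P_i = -X.T @ Si @ V[i] @ Si @ X      # P_i as defined in lemma 2.4.8
    e = np.zeros(2)
    e[i] = h
    fd = (Phi(sigma + e) - Phi(sigma - e)) / (2 * h)   # numerical derivative
    assert np.allclose(fd, -Ph @ P_i @ Ph, atol=1e-6)  # matches -Phi P_i Phi
```

The loop succeeds for both variance components, which is consistent with the lemma; the same pattern extends to any differentiable $\Sigma(\sigma)$.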
Lemma 2.4.9
(a) $\Phi = O(n^{-1})$; (b) $P_i = O(n)$; (c) $Q_{ij} = O(n)$; (d) $R_{ij} = O(n)$; (e) $\Theta = O(n)$; (f) $w_{ij} = Cov(\hat\sigma_i, \hat\sigma_j) = O(n^{-1})$; (g) $\Lambda = O(n^{-2})$; (h) $\dfrac{\partial\Phi}{\partial\sigma_i} = O(n^{-1})$; (i) $\dfrac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = O(n^{-1})$; (j) $\dfrac{\partial w_{ij}}{\partial\sigma_i} = O(n^{-1})$; (k) $\dfrac{\partial\Lambda}{\partial\sigma_i} = O(n^{-2})$; (l) $\dfrac{\partial^2 w_{ij}}{\partial\sigma_i^2} = O(n^{-1})$; (m) $\dfrac{\partial^2\Lambda}{\partial\sigma_i^2} = O(n^{-2})$; (n) $\hat\sigma_i - \sigma_i = O_p(n^{-1/2})$.

Proof  Parts (b), (c), and (d) are direct from remark $(v)$ above. Parts (a) and (e) are direct from assumption (A6). For (f), $w_{ij} = i^{ij} + a(\sigma)$, where $i^{ij}$ is the $(i, j)$ entry of the inverse of the expected information matrix (Pace and Salvan, 1997, expression 9.73); Pace and Salvan showed that $i^{ij} = O(n^{-1})$ and $a(\sigma) = O(n^{-2})$. From parts (a), (b), (c), and (f), we have $\Lambda = O(n^{-2})$. Since
\[ \frac{\partial\Phi}{\partial\sigma_i} = -\Phi P_i\Phi, \qquad \frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = \Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi \quad(\text{lemma 2.4.8}), \]
results (h) and (i) follow from the previous parts. By using expression (9.17) in Pace and Salvan (1997), we have $\partial i^{ij}/\partial\sigma_i = O(n^{-1})$, $\partial^2 i^{ij}/\partial\sigma_i^2 = O(n^{-1})$, $\partial a(\sigma)/\partial\sigma_i = O(n^{-2})$, and $\partial^2 a(\sigma)/\partial\sigma_i^2 = O(n^{-2})$; hence results (j) and (l) are obtained. By computing the derivatives of $\Lambda$ and using the previous parts, results (k) and (m) are obtained. Finally, (n) is a direct result of the asymptotic normality of $\sqrt n(\hat\sigma_i - \sigma_i)$ (Pace and Salvan, 1997, expression 3.38).  $\square$

Lemma 2.4.10 $\Lambda = Var(\hat\beta - \tilde\beta) = \tilde\Lambda + O(n^{-5/2})$, where $\tilde\Lambda = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(Q_{ij} - P_i\Phi P_j)\Phi$.

Proof  Using a Taylor series expansion about $\sigma$, we have
\[ \hat\beta = \tilde\beta(\hat\sigma) = \tilde\beta(\sigma) + \sum_{i=1}^r\frac{\partial\tilde\beta}{\partial\sigma_i}(\hat\sigma_i - \sigma_i) + O_p(n^{-3/2}) \Rightarrow \hat\beta - \tilde\beta = \sum_{i=1}^r\frac{\partial\tilde\beta}{\partial\sigma_i}(\hat\sigma_i - \sigma_i) + O_p(n^{-3/2}), \]
and
\[ \Lambda = Var(\hat\beta - \tilde\beta) = E\Big[\Big(\sum_{i=1}^r\frac{\partial\tilde\beta}{\partial\sigma_i}(\hat\sigma_i - \sigma_i) + O_p(n^{-3/2})\Big)\Big(\sum_{i=1}^r\frac{\partial\tilde\beta}{\partial\sigma_i}(\hat\sigma_i - \sigma_i) + O_p(n^{-3/2})\Big)'\Big] - E[\,\cdot\,]E[\,\cdot\,]'. \]
Applying lemma 2.4.7, we obtain
\[ \Lambda = E\Big[\sum_{i=1}^r\sum_{j=1}^r\frac{\partial\tilde\beta}{\partial\sigma_i}\Big(\frac{\partial\tilde\beta}{\partial\sigma_j}\Big)'(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j)\Big] + O(n^{-5/2}) = \sum_{i=1}^r\sum_{j=1}^r E\Big[\frac{\partial\tilde\beta}{\partial\sigma_i}\Big(\frac{\partial\tilde\beta}{\partial\sigma_j}\Big)'\Big]\,E\big[(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j)\big] + O(n^{-5/2}) \tag{2.4} \]
(assumption A5). By assumption (A3), we have
\[ E\big[(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j)\big] = E(\hat\sigma_i\hat\sigma_j) - \sigma_i\sigma_j + O(n^{-3/2}) = w_{ij} + O(n^{-3/2}). \tag{2.5} \]
Since $E[\partial\tilde\beta/\partial\sigma_i] = 0$ (lemma 2.4.7), $E\big[(\partial\tilde\beta/\partial\sigma_i)(\partial\tilde\beta/\partial\sigma_j)'\big] = Cov\big[\partial\tilde\beta/\partial\sigma_i,\ \partial\tilde\beta/\partial\sigma_j\big]$. Noting that
\[ G\Sigma G' = \big[\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\big]\Sigma\big[\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\big] = \Sigma^{-1}\big[I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\big] = G, \]
and using expression (2.1), we obtain
\[ Cov\Big[\frac{\partial\tilde\beta}{\partial\sigma_i}, \frac{\partial\tilde\beta}{\partial\sigma_j}\Big] = \Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\,G\,\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}X\Phi = \Phi(Q_{ij} - P_i\Phi P_j)\Phi. \tag{2.6} \]
Combining expressions (2.4), (2.5), and (2.6), we have $\Lambda = \tilde\Lambda + O(n^{-5/2})$, where $\tilde\Lambda = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(Q_{ij} - P_i\Phi P_j)\Phi$.  $\square$

Lemma 2.4.11
(a) $E[\hat\Phi] = \Phi - \tilde\Lambda + R^* + O(n^{-5/2})$;
(b) $E[\hat\Lambda] = \tilde\Lambda + O(n^{-5/2})$;
(c) $E[\hat R^*] = R^* + O(n^{-5/2})$;
where $R^* = \frac12\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi R_{ij}\Phi$, $\hat\Lambda = \tilde\Lambda(\hat\sigma)$, and $\hat R^* = R^*(\hat\sigma)$.

Proof
(a) Using a Taylor series expansion about $\sigma$,
\[ \hat\Phi = \Phi + \sum_{i=1}^r\frac{\partial\Phi}{\partial\sigma_i}(\hat\sigma_i - \sigma_i) + \frac12\sum_{i=1}^r\sum_{j=1}^r\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j) + O_p(n^{-5/2}). \]
By assumption (A3), we have
\[ E[\hat\Phi] = \Phi + \frac12\sum_{i=1}^r\sum_{j=1}^r\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}w_{ij} + O(n^{-5/2}) = \Phi + \frac12\sum_{i,j}w_{ij}\,\Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi + O(n^{-5/2}) \tag{2.7} \]
(lemma 2.4.8)
\[ = \Phi + \frac12\sum_{i,j}w_{ij}\,\Phi(P_i\Phi P_j - Q_{ij})\Phi + \frac12\sum_{i,j}w_{ij}\,\Phi(P_j\Phi P_i - Q_{ji})\Phi + \frac12\sum_{i,j}w_{ij}\,\Phi R_{ij}\Phi + O(n^{-5/2}) \]
\[ = \Phi - \tfrac12\tilde\Lambda - \tfrac12\tilde\Lambda + R^* + O(n^{-5/2}) = \Phi - \tilde\Lambda + R^* + O(n^{-5/2}). \]
(b) Using a Taylor series expansion about $\sigma$, we obtain $\hat\Lambda = \tilde\Lambda + O_p(n^{-5/2})$, and then $E[\hat\Lambda] = \tilde\Lambda + O(n^{-5/2})$.
(c) Using a Taylor series expansion about $\sigma$, we obtain $\hat R^* = R^* + O_p(n^{-5/2})$, and then $E[\hat R^*] = R^* + O(n^{-5/2})$.  $\square$

Proposition 2.4.12 $E[\hat\Phi_A] = Var[\hat\beta] + O(n^{-5/2})$, where
\[ \hat\Phi_A = \hat\Phi + 2\hat\Phi\Big\{\sum_{i=1}^r\sum_{j=1}^r\hat w_{ij}\Big(\hat Q_{ij} - \hat P_i\hat\Phi\hat P_j - \tfrac14\hat R_{ij}\Big)\Big\}\hat\Phi. \]

Proof  Notice that $\hat\Phi_A = \hat\Phi + 2\hat\Lambda - \hat R^*$, and hence, by lemma 2.4.11,
\[ E[\hat\Phi_A] = \Phi - \tilde\Lambda + R^* + 2\tilde\Lambda - R^* + O(n^{-5/2}) = \Phi + \tilde\Lambda + O(n^{-5/2}) = \Phi + \Lambda + O(n^{-5/2}) \quad(\text{lemma 2.4.10}) \]
\[ = Var(\hat\beta) + O(n^{-5/2}) \quad(\text{lemma 2.4.6}). \quad\square \]

Comments  From proposition 2.4.12, we obtain $E[\hat\Phi_A] = Var(\hat\beta) + O(n^{-5/2})$, while from lemmas 2.4.10 and 2.4.11(a), we obtain $E[\hat\Phi] = Var(\hat\beta) + O(n^{-2})$. This shows that the adjusted estimator of $Var(\hat\beta)$ has less bias than the conventional estimator.

3. TESTING THE FIXED EFFECTS

Consider the model as described in section 2.1. Suppose that we are interested in making inferences about linear combinations of the elements of $\beta$; in other words, we are interested in testing $H_0: L'\beta = 0$, where $L'$ is a fixed matrix of dimension $\ell \times p$. A statistic often used is the Wald statistic
\[ F = \frac1\ell(L'\hat\beta)'\big[L'\hat VL\big]^{-1}(L'\hat\beta). \]
In fact, even though we call this statistic $F$, it does not necessarily have an F-distribution.
Kenward and Roger (1997) approximate the distribution of $F$ by choosing a scale $\lambda$ and denominator degrees of freedom $m$ such that $\lambda F \sim F(\ell, m)$ approximately.

3.1 Constructing a Wald-Type Pivot

The construction of a Wald-type pivot can be approached either through the adjusted estimator $\hat\Phi_A$ of the variance-covariance matrix of the fixed effects (this is what was done by Kenward and Roger, 1997), or through the conventional estimator $\hat\Phi$. Both approaches are considered in this section.

3.1.1 Constructing a Wald-Type Pivot Through $\hat\Phi$

The Wald-type pivot is
\[ F = \frac1\ell(\hat\beta - \beta)'L(L'\hat\Phi L)^{-1}L'(\hat\beta - \beta). \]
In this section, we derive approximate formulas for $E[F]$ and $Var[F]$.

3.1.1.1 Deriving an Approximate Expression for E[F]

$E[F] = E[E[F|\hat\sigma]]$, and
\[ E[F|\hat\sigma] = \frac1\ell E\big[(\hat\beta - \beta)'L(L'\hat\Phi L)^{-1}L'(\hat\beta - \beta)\,\big|\,\hat\sigma\big] = \frac1\ell\Big\{E[(\hat\beta - \beta)'L](L'\hat\Phi L)^{-1}E[L'(\hat\beta - \beta)] + \mathrm{tr}\big((L'\hat\Phi L)^{-1}Var[L'(\hat\beta - \beta)]\big)\Big\}, \]
using assumption (A4) and (Schott, 2005, theorem 10.18). Since $\hat\beta$ is an unbiased estimator of $\beta$ (lemma 2.4.3),
\[ E[F|\hat\sigma] = \frac1\ell\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'VL)\big], \quad\text{where } V = Var(\hat\beta) = \Phi + \Lambda, \]
\[ = \frac1\ell\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'\Phi L)\big] + \frac1\ell\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'\Lambda L)\big]. \]
Using a Taylor series expansion of $(L'\hat\Phi L)^{-1}$ about $\sigma$, we have
\[ (L'\hat\Phi L)^{-1} = (L'\Phi L)^{-1} + \sum_{i=1}^r(\hat\sigma_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i} + \frac12\sum_{i=1}^r\sum_{j=1}^r(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j} + O_p(n^{-1/2}), \]
and
\[ (L'\hat\Phi L)^{-1}(L'\Phi L) = I + \sum_{i=1}^r(\hat\sigma_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Phi L) + \frac12\sum_{i=1}^r\sum_{j=1}^r(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L) + O_p(n^{-3/2}). \]
Since
\[ \mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big] = 2\,\mathrm{tr}\Big[(L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_i}L\Big)(L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_j}L\Big)\Big] - \mathrm{tr}\Big[(L'\Phi L)^{-1}\Big(L'\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}L\Big)\Big], \tag{3.1} \]
then by assumption (A3), we have
\[ E\big[\mathrm{tr}\{(L'\hat\Phi L)^{-1}(L'\Phi L)\}\big] = \ell + \sum_{i,j}w_{ij}\,\mathrm{tr}\big[\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi\big] - \frac12\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-3/2}), \]
where $\Theta = L(L'\Phi L)^{-1}L'$, noticing that $\partial\Phi/\partial\sigma_i = -\Phi P_i\Phi$ (lemma 2.4.8). Similar steps lead to $E\big[\mathrm{tr}\{(L'\hat\Phi L)^{-1}(L'\Lambda L)\}\big] = \mathrm{tr}(\Theta\Lambda) + O(n^{-2})$. Hence
\[ E[F] = 1 + \frac1\ell\Big[\sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi) - \frac12\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + \mathrm{tr}(\Theta\Lambda)\Big] + O(n^{-3/2}). \]
Expanding $\partial^2\Phi/\partial\sigma_i\partial\sigma_j$ by lemma 2.4.8 and replacing $\mathrm{tr}(\Theta\Lambda)$ by $\mathrm{tr}(\Theta\tilde\Lambda) + O(n^{-3/2})$ (lemma 2.4.10), the bracketed quantity equals $A_2 + 2A_3 + O(n^{-3/2})$, giving:

Proposition 3.1.1.1.1
\[ E[F] = 1 + \frac{A_2 + 2A_3}{\ell} + O(n^{-3/2}), \]
where
\[ A_2 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi), \qquad A_3 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(Q_{ij} - P_i\Phi P_j - \tfrac14R_{ij}\Big)\Phi\Big]. \]

3.1.1.2 Deriving an Approximate Expression for Var[F]

$Var[F] = E[Var[F|\hat\sigma]] + Var[E[F|\hat\sigma]]$. To derive the right-hand side, we consider each term in turn: the first term in part A and the second term in part B.

A. $Var[F|\hat\sigma] = \dfrac{1}{\ell^2}Var\big[(\hat\beta - \beta)'L(L'\hat\Phi L)^{-1}L'(\hat\beta - \beta)\,\big|\,\hat\sigma\big]$. The expression above is a quadratic form, and hence by assumption (A4),
\[ Var[F|\hat\sigma] = \frac{2}{\ell^2}\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'VL)\big]^2 \]
(Schott, 2005, theorem 10.22), where $[\,\cdot\,]^2$ denotes the square of the matrix inside the trace. Using a Taylor series expansion of $(L'\hat\Phi L)^{-1}$ about $\sigma$,
\[ (L'\hat\Phi L)^{-1}(L'VL) = (L'\Phi L)^{-1}(L'VL) + \sum_{i=1}^r(\hat\sigma_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL) + \frac12\sum_{i=1}^r\sum_{j=1}^r(\hat\sigma_i - \sigma_i)(\hat\sigma_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL) + O_p(n^{-3/2}). \]
Squaring, taking the trace, taking the expectation, and using assumption (A3), we obtain
\[ E[Var[F|\hat\sigma]] = \frac{2}{\ell^2}\Big\{\mathrm{tr}\big[\big((L'\Phi L)^{-1}(L'VL)\big)^2\big] + \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)(L'\Phi L)^{-1}(L'VL)\Big] + \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big]\Big\} + O(n^{-3/2}). \]
To proceed, we need to derive each term mentioned in the expression above.

(i)
\[ \big[(L'\Phi L)^{-1}(L'VL)\big]^2 = \big[I + (L'\Phi L)^{-1}(L'\Lambda L)\big]\big[I + (L'\Phi L)^{-1}(L'\Lambda L)\big] = I + 2(L'\Phi L)^{-1}(L'\Lambda L) + O(n^{-2}). \tag{3.2} \]
(ii) $(L'VL)(L'\Phi L)^{-1}(L'VL) = (L'\Phi L) + 2(L'\Lambda L) + (L'\Lambda L)(L'\Phi L)^{-1}(L'\Lambda L)$, so
\[ \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)(L'\Phi L)^{-1}(L'VL)\Big] = \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big] + O(n^{-2}) \]
\[ = 2\sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi) - \sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-2}), \tag{3.3} \]
where the last expression is obtained by utilizing (3.1). Alternatively, we can utilize lemma 2.4.8 to rewrite expression (3.3) as
\[ 2A_2 - 2\sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(P_i\Phi P_j - Q_{ij} + \tfrac12R_{ij}\Big)\Phi\Big] + O(n^{-2}). \]
(iii) Since the terms involving $\Lambda$ are of higher order,
\[ \mathrm{tr}\Big(\sum_{i,j}w_{ij}\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big) = \sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi) + O(n^{-2}) = A_2 + O(n^{-2}). \tag{3.4} \]
From (i), (ii), and (iii), we obtain
\[ E[Var[F|\hat\sigma]] = \frac{2}{\ell^2}\Big\{\ell + 2\,\mathrm{tr}(\Theta\Lambda) + 3A_2 - 2\sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(P_i\Phi P_j - Q_{ij} + \tfrac12R_{ij}\Big)\Phi\Big]\Big\} + O(n^{-3/2}). \]

B.
\[ Var[E[F|\hat\sigma]] = \frac{1}{\ell^2}Var\big[\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big] = \frac{1}{\ell^2}\Big\{E\big(\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2 - \big(E\,\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2\Big\}. \tag{3.5} \]
For the first term, using a Taylor series expansion of $(L'\hat\Phi L)^{-1}$ about $\sigma$, taking the expectation, and using assumptions (A3) and (A4),
\[ E\big(\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2 = \big(\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\big)^2 + \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)\Big]\mathrm{tr}\big[(L'\Phi L)^{-1}(L'VL)\big] + \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\Big]\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big] + O(n^{-3/2}). \]
To proceed, we need to derive each term mentioned in the expression above.

(i) $(L'\Phi L)^{-1}(L'VL) = I + (L'\Phi L)^{-1}(L'\Lambda L)$, so $\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)] = \ell + \mathrm{tr}(\Theta\Lambda)$ and
\[ \big(\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell\,\mathrm{tr}(\Theta\Lambda) + O(n^{-2}). \tag{3.6} \]
(ii)
\[ \frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL) = \frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Phi L) + \frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Lambda L), \]
so, using $\partial(L'\Phi L)^{-1}/\partial\sigma_i = (L'\Phi L)^{-1}(L'\Phi P_i\Phi L)(L'\Phi L)^{-1}$,
\[ \mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\Big] = \mathrm{tr}(\Theta\Phi P_i\Phi) + \mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Lambda). \tag{3.7} \]
(iii)
\[ \mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)\Big] = \mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big] + \mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Lambda L)\Big] \]
\[ = 2\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi) - \mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + 2\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi\Theta\Lambda) - \mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Theta\Lambda\Big) \tag{3.8} \]
(from expression 3.1).

From (i), (ii), and (iii), retaining terms of the required order,
\[ E\big(\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell\,\mathrm{tr}(\Theta\Lambda) + 2\ell A_2 + A_1 - \ell\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-3/2}), \tag{3.9} \]
where
\[ A_1 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi)\,\mathrm{tr}(\Theta\Phi P_j\Phi). \]
Also, from proposition 3.1.1.1.1,
\[ \big(E\,\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell(A_2 + 2A_3) + O(n^{-3/2}). \tag{3.10} \]
From expressions (3.9) and (3.10), we obtain
\[ Var[E[F|\hat\sigma]] = \frac{1}{\ell^2}\Big\{A_1 + 2\ell\,\mathrm{tr}(\Theta\Lambda) - \ell\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) - 4\ell A_3\Big\} + O(n^{-3/2}). \]
From parts (A) and (B), we are able to find an expression for $Var[F]$. Writing the last term of the part A result as $-\sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\,\partial^2\Phi/\partial\sigma_i\partial\sigma_j)$ (lemma 2.4.8),
\[ Var[F] = \frac{2}{\ell^2}\Big\{\ell + 2\,\mathrm{tr}(\Theta\Lambda) + 3A_2 - \sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)\Big\} + \frac{1}{\ell^2}\Big\{A_1 + 2\ell\,\mathrm{tr}(\Theta\Lambda) - \ell\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) - 4\ell A_3\Big\} + O(n^{-3/2}) \]
\[ = \frac2\ell\Big\{1 + \frac{1}{2\ell}\Big[A_1 + 6A_2 - 4\ell A_3 + 2(\ell+2)\,\mathrm{tr}(\Theta\Lambda) - (\ell+2)\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)\Big]\Big\} + O(n^{-3/2}) \]
\[ = \frac2\ell\Big\{1 + \frac{1}{2\ell}\Big[A_1 + 6A_2 - 4\ell A_3 + (\ell+2)\Big(4\,\mathrm{tr}(\Theta\Lambda) - \sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi)\Big)\Big]\Big\} + O(n^{-3/2}) \]
\[ = \frac2\ell\Big\{1 + \frac{1}{2\ell}\big[A_1 + 6A_2 - 4\ell A_3 + 4(\ell+2)A_3\big]\Big\} + O(n^{-3/2}) = \frac2\ell\Big\{1 + \frac{A_1 + 6A_2 + 8A_3}{2\ell}\Big\} + O(n^{-3/2}), \]
by noticing that
\[ -\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) = 2\,\mathrm{tr}(\Theta\Lambda) - \sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi) + O(n^{-3/2}) \quad(\text{lemmas 2.4.8 and 2.4.10}), \]
and
\[ A_3 = \mathrm{tr}(\Theta\Lambda) - \frac14\sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi) + O(n^{-3/2}). \]

Proposition 3.1.1.2.1
\[ Var[F] = \frac2\ell(1 + B_1) + O(n^{-3/2}), \quad\text{where } B_1 = \frac{A_1 + 6A_2 + 8A_3}{2\ell}. \]

3.1.2 Constructing a Wald-Type Pivot Through $\hat\Phi_A$

The Wald-type pivot is $F = \frac1\ell(\hat\beta - \beta)'L(L'\hat\Phi_AL)^{-1}L'(\hat\beta - \beta)$. Similar to section 3.1.1, we derive approximate expressions for $E[F]$ and $Var[F]$.

3.1.2.1 Deriving an Approximate Expression for E[F]

First, we derive a useful lemma that will be needed in this section and chapter 4.

Lemma 3.1.2.1.1 For a matrix $B = O_p(n^{-1})$, $(I + B)^{-1} = I - B + O_p(n^{-2})$.

Proof  $(I + B)^{-1} = I - (I + B)^{-1}B$ (Schott, 2005, theorem 1.7) $= I - [I - (I + B)^{-1}B]B$, applying the theorem again to $(I + B)^{-1}$. Notice that since $B \overset p\to 0$, $(I + B)^{-1} \overset p\to (I + 0)^{-1} = I$, and hence $(I + B)^{-1} = O_p(1)$. Therefore,
\[ (I + B)^{-1} = I - B + (I + B)^{-1}BB = I - B + O_p(n^{-2}). \quad\square \]

By assumption (A4),
\[ E[F|\hat\sigma] = \frac1\ell\mathrm{tr}\big((L'\hat\Phi_AL)^{-1}Var[L'(\hat\beta - \beta)]\big) \Rightarrow E[F] = \frac1\ell E\big(\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big), \quad\text{where } V = Var(\hat\beta) = \Phi + \Lambda. \]
Since $\hat\Phi_A = \hat\Phi + \hat A^*$, where $\hat A^* = 2\hat\Phi\{\sum_{i,j}\hat w_{ij}(\hat Q_{ij} - \hat P_i\hat\Phi\hat P_j - \frac14\hat R_{ij})\}\hat\Phi$, then
\[ (L'\hat\Phi_AL)^{-1}(L'VL) = \big[(L'\hat\Phi L) + (L'\hat A^*L)\big]^{-1}(L'VL) = \big[I + (L'\hat\Phi L)^{-1}(L'\hat A^*L)\big]^{-1}(L'\hat\Phi L)^{-1}(L'VL). \]
Since $(L'\hat\Phi L)^{-1}(L'\hat A^*L) = O_p(n^{-1})$, employing lemma 3.1.2.1.1 gives
\[ (L'\hat\Phi_AL)^{-1}(L'VL) = \big[I - (L'\hat\Phi L)^{-1}(L'\hat A^*L) + O_p(n^{-2})\big](L'\hat\Phi L)^{-1}(L'VL) = (L'\hat\Phi L)^{-1}(L'VL) - (L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'VL) + O_p(n^{-2}). \tag{3.11} \]
Notice that
\[ E\big(\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big) = \ell + A_2 + 2A_3 + O(n^{-3/2}) \tag{3.12} \]
(proposition 3.1.1.1.1), and
\[ (L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'VL) = (L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'\Phi L) + O_p(n^{-2}). \]
Using a Taylor series expansion of $(L'\hat\Phi L)^{-1}$ about $\sigma$, the right-hand side equals
\[ (L'\Phi L)^{-1}(L'\hat A^*L) + O_p(n^{-3/2}). \tag{3.13} \]
From expressions (3.12) and (3.13), we obtain
\[ E[F] = 1 + \frac1\ell\big[A_2 + 2A_3 - E\,\mathrm{tr}(\Theta\hat A^*)\big] + O(n^{-3/2}). \tag{3.14} \]

Lemma  $E[\hat A^*] = A^* + O(n^{-5/2})$, where $A^* = 2\Phi\{\sum_{i,j}w_{ij}(Q_{ij} - P_i\Phi P_j - \frac14R_{ij})\}\Phi$.

Proof  Direct from lemmas 2.4.10 and 2.4.11.  $\square$

Applying the lemma above to expression (3.14), and noticing that $\mathrm{tr}(\Theta A^*) = 2A_3$, we have

Proposition 3.1.2.1.2
\[ E[F] = 1 + \frac{A_2}{\ell} + O(n^{-3/2}). \]

3.1.2.2 Deriving an Approximate Expression for Var[F]

$Var[F] = E[Var[F|\hat\sigma]] + Var[E[F|\hat\sigma]]$.

A.
$Var[F|\hat\sigma] = \dfrac{1}{\ell^2}Var\big[(\hat\beta - \beta)'L(L'\hat\Phi_AL)^{-1}L'(\hat\beta - \beta)\,\big|\,\hat\sigma\big]$. The expression above is a quadratic form, and hence by assumption (A4),
\[ Var[F|\hat\sigma] = \frac{2}{\ell^2}\mathrm{tr}\big[(L'\hat\Phi_AL)^{-1}(L'VL)\big]^2 \]
(Schott, 2005, theorem 10.22). Recall from expression (3.11) that
\[ (L'\hat\Phi_AL)^{-1}(L'VL) = (L'\hat\Phi L)^{-1}(L'VL) - (L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'VL) + O_p(n^{-2}), \]
and hence
\[ \mathrm{tr}\big[(L'\hat\Phi_AL)^{-1}(L'VL)\big]^2 = \mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'VL)\big]^2 - 2\,\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'VL)(L'\hat\Phi L)^{-1}(L'VL)\big] + O_p(n^{-2}). \]
Replacing $(L'VL)$ by $(L'\Phi L) + (L'\Lambda L)$ and dropping higher-order terms, then using a Taylor series expansion of $(L'\hat\Phi L)^{-1}$ about $\sigma$, taking the expectation, and using assumption (A3), we obtain
\[ E[Var[F|\hat\sigma]] = \frac{2}{\ell^2}\Big\{\ell + \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big] + \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Phi L)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'\Phi L)\Big] + 2\,\mathrm{tr}[\Theta\Lambda] - 2\,\mathrm{tr}[\Theta A^*]\Big\} + O(n^{-3/2}). \]
Notice that
\[ \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Phi L)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'\Phi L)\Big] = A_2, \]
and, from (3.1),
\[ \sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big] = 2A_2 - \sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big). \]
Hence
\[ E[Var[F|\hat\sigma]] = \frac{2}{\ell^2}\Big\{\ell + 3A_2 + 2\,\mathrm{tr}[\Theta\Lambda] - 2\,\mathrm{tr}[\Theta A^*] - \sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)\Big\} + O(n^{-3/2}). \tag{3.15} \]
In addition, by lemmas 2.4.8 and 2.4.10, we have
\[ 2\,\mathrm{tr}[\Theta\Lambda] - 2\,\mathrm{tr}[\Theta A^*] - \sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) \]
\[ = 2\sum_{i,j}w_{ij}\,\mathrm{tr}\big[\Theta\Phi(Q_{ij} - P_i\Phi P_j)\Phi\big] - 4\sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(Q_{ij} - P_i\Phi P_j - \tfrac14R_{ij}\Big)\Phi\Big] - 2\sum_{i,j}w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(P_i\Phi P_j - Q_{ij} + \tfrac12R_{ij}\Big)\Phi\Big] + O(n^{-3/2}) = 0 + O(n^{-3/2}), \]
\[ \Rightarrow E[Var[F|\hat\sigma]] = \frac{2}{\ell^2}\{\ell + 3A_2\} + O(n^{-3/2}). \tag{3.16} \]

B.
\[ Var[E[F|\hat\sigma]] = \frac{1}{\ell^2}Var\big[\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big] = \frac{1}{\ell^2}\Big\{E\big(\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big)^2 - \big(E\,\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big)^2\Big\}, \]
by assumption (A4).

(i) From (3.11),
\[ \big(\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big)^2 = \big(\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2 - 2\,\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'VL)\big]\,\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'VL)\big] + O_p(n^{-2}), \]
and, from expression (3.9),
\[ E\big(\mathrm{tr}[(L'\hat\Phi L)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell\,\mathrm{tr}[\Theta\Lambda] + 2\ell A_2 + A_1 - \ell\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-3/2}). \]
(ii) Using a Taylor series expansion of $(L'\hat\Phi L)^{-1}$ about $\sigma$ and assumption (A3), we obtain
\[ E\,\mathrm{tr}\big[(L'\hat\Phi L)^{-1}(L'\hat A^*L)(L'\hat\Phi L)^{-1}(L'\Phi L)\big] = \mathrm{tr}[\Theta A^*] + O(n^{-3/2}). \]
From (i) and (ii),
\[ E\big(\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell\,\mathrm{tr}[\Theta\Lambda] + 2\ell A_2 + A_1 - \ell\sum_{i,j}w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) - 2\ell\,\mathrm{tr}[\Theta A^*] + O(n^{-3/2}), \]
and since $2\,\mathrm{tr}[\Theta\Lambda] - \sum_{i,j}w_{ij}\,\mathrm{tr}(\Theta\,\partial^2\Phi/\partial\sigma_i\partial\sigma_j) - 2\,\mathrm{tr}[\Theta A^*] = O(n^{-3/2})$ (shown in part A),
\[ E\big(\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell A_2 + A_1 + O(n^{-3/2}). \tag{3.17} \]
Recall that from expression (3.14) we have $E\,\mathrm{tr}\big[(L'\hat\Phi_AL)^{-1}(L'VL)\big] = \ell\,E[F] = \ell + A_2 + O(n^{-3/2})$, so
\[ \big(E\,\mathrm{tr}[(L'\hat\Phi_AL)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell A_2 + O(n^{-3/2}). \tag{3.18} \]
From expressions (3.17) and (3.18), we obtain
\[ Var[E[F|\hat\sigma]] = \frac{1}{\ell^2}\{\ell^2 + 2\ell A_2 + A_1 - \ell^2 - 2\ell A_2\} + O(n^{-3/2}) = \frac{A_1}{\ell^2} + O(n^{-3/2}). \tag{3.19} \]
Therefore, from expressions (3.16) and (3.19),

Proposition 3.1.2.2.1
\[ Var[F] = \frac2\ell(1 + B) + O(n^{-3/2}), \quad\text{where } B = \frac{A_1 + 6A_2}{2\ell}. \]

Summary

(i) When the adjusted estimator of the variance-covariance matrix is used in constructing the Wald-type statistic,
\[ E[F] \approx E = 1 + \frac{A_2}{\ell}, \qquad Var[F] \approx V = \frac2\ell(1 + B), \qquad\text{where } B = \frac{A_1 + 6A_2}{2\ell}, \]
and
\[ A_1 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi)\,\mathrm{tr}(\Theta\Phi P_j\Phi), \qquad A_2 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi). \]
(ii) When the conventional estimator of the variance-covariance matrix is used,
\[ E[F] \approx E_1 = 1 + \frac{A_2 + 2A_3}{\ell}, \qquad Var[F] \approx V_1 = \frac2\ell(1 + B_1), \qquad\text{where } B_1 = \frac{A_1 + 6A_2 + 8A_3}{2\ell}, \]
and
\[ A_3 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(Q_{ij} - P_i\Phi P_j - \tfrac14R_{ij}\Big)\Phi\Big]. \]
Or, alternatively,
\[ E[F] \approx E_1 = 1 + \frac{S_2}{\ell}, \qquad Var[F] \approx V_1 = \frac2\ell(1 + B_1), \qquad\text{where } B_1 = \frac{6S_2 + S_1}{2\ell}, \]
with $S_2 = A_2 + 2A_3$ and $S_1 = A_1 - 4A_3$.

3.2 Estimating the Denominator Degrees of Freedom and the Scale Factor

In order to determine $\lambda$ and $m$ such that $F^* = \lambda F \sim F(\ell, m)$ approximately, we match the approximated first two moments of $F^*$ with those of the $F(\ell, m)$ distribution:
\[ E[F(\ell, m)] = \frac{m}{m-2}, \qquad Var[F(\ell, m)] = 2\Big(\frac{m}{m-2}\Big)^2\frac{\ell + m - 2}{\ell(m-4)}. \]
In the previous section, we found expressions $E$ and $V$ such that $E[F] \approx E$ and $Var[F] \approx V$. By matching the first moments,
\[ E[F^*] \approx \lambda E = \frac{m}{m-2} \Rightarrow \lambda = \frac{m}{E(m-2)}. \]
By matching the second moments,
\[ Var[F^*] \approx \lambda^2V = 2\Big(\frac{m}{m-2}\Big)^2\frac{\ell+m-2}{\ell(m-4)} \Leftrightarrow V = 2\Big[\frac{m}{(m-2)\lambda}\Big]^2\frac{\ell+m-2}{\ell(m-4)}\cdot\frac{(m-2)^2}{m^2} \Leftrightarrow \frac{V}{2E^2} = \frac{\ell+m-2}{\ell(m-4)}, \]
and hence
\[ m = 4 + \frac{\ell+2}{\ell\rho - 1}, \quad\text{where } \rho = \frac{V}{2E^2}. \]
The scale and denominator degrees of freedom are to be estimated by substituting $\hat\sigma$ for $\sigma$ in $A_1$, $A_2$, and $A_3$.
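The moment-matching step above is straightforward to mechanize. The sketch below implements $\rho = V/(2E^2)$, $m = 4 + (\ell+2)/(\ell\rho - 1)$, and $\lambda = m/[E(m-2)]$, and checks internal consistency: feeding in the exact first two moments of an $F(\ell, m_0)$ distribution must return $\lambda = 1$ and $m = m_0$.

```python
# Sketch: the moment-matching step. Given approximations E of E[F] and V of
# Var[F], and the numerator degrees of freedom l, return the scale lambda and
# denominator degrees of freedom m so that lambda*F ~ F(l, m) approximately.
def kr_scale_and_df(E, V, l):
    rho = V / (2.0 * E * E)               # rho = V / (2 E^2)
    m = 4.0 + (l + 2.0) / (l * rho - 1.0) # m = 4 + (l+2)/(l*rho - 1)
    lam = m / (E * (m - 2.0))             # lambda = m / (E (m-2))
    return lam, m

# Consistency check: exact moments of F(l, m0) must give lambda = 1, m = m0.
l, m0 = 2, 10
E = m0 / (m0 - 2.0)
V = 2.0 * (m0 / (m0 - 2.0)) ** 2 * (l + m0 - 2.0) / (l * (m0 - 4.0))
lam, m = kr_scale_and_df(E, V, l)
assert abs(lam - 1.0) < 1e-9 and abs(m - m0) < 1e-6
```

This is the sense in which the method "reproduces" exact F tests: whenever the approximated moments coincide with the moments of an exact F distribution, the matching recovers that distribution.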
The quantity wᵢⱼ can be estimated by using the inverse of the expected information matrix.

4. MODIFYING THE ESTIMATES OF THE DENOMINATOR DEGREES OF FREEDOM AND THE SCALE FACTOR

As we saw in chapter three, the Wald-type statistic F usually does not have an F distribution; however, a scaled form of F will have an F distribution approximately. Indeed, there are a number of special cases where a scaled form of F follows an F distribution exactly. Kenward and Roger (1997) considered two of these special cases, the Hotelling T² and the conventional Anova models, and then modified the approximate expressions for the expectation and the variance of F so that the modified expressions produce estimates of the denominator degrees of freedom and the scale that match the exact, known values in the special cases. In this chapter, we derive the Kenward and Roger (K-R) modification based on the Hotelling T² and the balanced one-way Anova models.

4.1 Balanced One-Way Anova Model

4.1.1 The Model and Assumptions

Consider the model

yᵢⱼ = μ + τᵢ + eᵢⱼ  for i = 1, ..., t and j = 1, ..., m,

where μ is the general mean, the τᵢ are the treatment effects, and the eᵢⱼ are iid N(0, σ²), so that Σ = σ²Ιₙ with n = mt. Suppose that we are interested in testing

H₀: τ₁ = ⋯ = τₜ  ⇔  L′β = 0,

where β′ = [μ τ₁ ⋯ τₜ] and L′ is the (t − 1) × (t + 1) matrix

L′ = ⎡0 1 −1 0 ⋯ 0⎤
     ⎢0 0 1 −1 ⋯ 0⎥
     ⎢⋮            ⎥
     ⎣0 0 ⋯ 0 1 −1⎦.

The design matrix Χ is not of full column rank. We could proceed with this parameterization using a g-inverse; however, to simplify our computations, we reparameterize the model so that Χ has full column rank, as follows:

β*′ = [μ* τ₁* ⋯ τ*ₜ₋₁]  with Σᵢ₌₁ᵗ τᵢ* = 0,

and the hypothesis becomes H₀: L*′β* = 0, where

L*′ = ⎡0 1 0 ⋯ 0⎤
      ⎢0 0 1 ⋯ 0⎥
      ⎢⋮          ⎥
      ⎣0 0 0 ⋯ 1⎦ = [0 Ιₜ₋₁],

so that ℓ = t − 1. To simplify the notation, from now on write Χ = Χ*, L = L*, and β = β*.

4.1.2 The REML Estimate of σ²

Since the design is balanced, the REML estimates are the same as the Anova estimates (Searle et al.
, 1992, section 3.8), and hence

σ̂²_REML = MSE = y′(Ι − Ρ_Χ)y / (n − t).

4.1.3 Computing Ρ₁, Q₁₁ and R₁₁

Ρ₁ = −Χ′Σ⁻¹ (∂Σ/∂σ²) Σ⁻¹Χ = −(1/(σ²)²) Χ′Χ,

Q₁₁ = Χ′Σ⁻¹ (∂Σ/∂σ²) Σ⁻¹ (∂Σ/∂σ²) Σ⁻¹Χ = (1/(σ²)³) Χ′Χ,

and

R₁₁ = Χ′Σ⁻¹ (∂²Σ/∂(σ²)²) Σ⁻¹Χ = 0,  because ∂²Σ/∂(σ²)² = 0.

In addition, since Φ = (Χ′Σ⁻¹Χ)⁻¹ = σ²(Χ′Χ)⁻¹, then Q₁₁ − Ρ₁ΦΡ₁ = 0, and hence Φ̂_A = Φ̂. Recall that

Φ̂_A = Φ̂ + 2Φ̂ { Σᵢ Σⱼ ŵᵢⱼ (Q̂ᵢⱼ − Ρ̂ᵢΦ̂Ρ̂ⱼ − ¼R̂ᵢⱼ) } Φ̂.

4.1.4 Computing A₁ and A₂

A₁ = w₁₁ [tr(ΘΦΡ₁Φ)]²  and  A₂ = w₁₁ tr(ΘΦΡ₁ΦΘΦΡ₁Φ).

Since σ̂²_REML = MSE and (n − t)MSE/σ² ∼ χ²ₙ₋ₜ, then

w₁₁ = Var(σ̂²_REML) = 2(σ²)²/(n − t).

Also,

ΘΦΡ₁Φ = −(1/σ²) L[L′(Χ′Χ)⁻¹L]⁻¹ L′(Χ′Χ)⁻¹.

Notice that L[L′(Χ′Χ)⁻¹L]⁻¹L′(Χ′Χ)⁻¹ is a p.o. on ℜ(L), so

tr(ΘΦΡ₁Φ) = −(1/σ²) tr( L[L′(Χ′Χ)⁻¹L]⁻¹L′(Χ′Χ)⁻¹ ) = −(1/σ²) r(L) = −ℓ/σ²  (ℓ = t − 1)  (Harville, 1997, corollary 10.2.2).

Therefore,

A₁ = (2(σ²)²/(n − t)) (ℓ²/(σ²)²) = 2ℓ²/(n − t),  and  A₂ = (2(σ²)²/(n − t)) (ℓ/(σ²)²) = 2ℓ/(n − t).

Accordingly, for this model we have A₁ = ℓA₂ and

B = (1/(2ℓ))(A₁ + 6A₂) = (ℓ + 6)/(n − t).

4.1.5 Estimating the Denominator Degrees of Freedom and the Scale Factor

Under the null hypothesis,

F = y′Χ(Χ′Χ)⁻¹L[L′(Χ′Χ)⁻¹L]⁻¹L′(Χ′Χ)⁻¹Χ′y / (ℓ MSE).

Notice that Ρ = Χ(Χ′Χ)⁻¹L[L′(Χ′Χ)⁻¹L]⁻¹L′(Χ′Χ)⁻¹Χ′ is an o.p.o., and

r(Ρ) = tr(Ρ) = tr( L[L′(Χ′Χ)⁻¹L]⁻¹L′(Χ′Χ)⁻¹ ).

In addition, L[L′(Χ′Χ)⁻¹L]⁻¹L′(Χ′Χ)⁻¹ is a p.o. on ℜ(L), and hence r(Ρ) = ℓ and

F = y′Ρy / (r(Ρ) MSE).

Since ℜ(Ρ) ⊂ ℜ(Χ), then under the null hypothesis F ∼ F(ℓ, n − t) exactly, where n − t = r(Ι − Ρ_Χ) (Birkes, 2003, lemma 4.4). In other words, the denominator degrees of freedom should equal n − t, and the scale should equal one.
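The closed forms just derived for w₁₁, A₁, A₂, and B can be checked numerically. The following sketch (not part of the original text) uses illustrative sizes t = 3, m = 4 and an arbitrary σ² = 2, builds the reparameterized design matrix of section 4.1.1, and also confirms the overestimation m > n − t + 8 shown in section 4.1.5.

```python
import numpy as np

# Balanced one-way Anova with t = 3 groups and m = 4 replicates (illustrative).
t, m_rep, sig2 = 3, 4, 2.0
n, ell = t * m_rep, t - 1

# Full-column-rank parameterization of section 4.1.1 (sum-to-zero coding):
# columns = intercept, tau*_1, ..., tau*_{t-1}; last group coded with -1's.
X = np.zeros((n, t))
X[:, 0] = 1.0
for i in range(t):
    rows = slice(i * m_rep, (i + 1) * m_rep)
    if i < t - 1:
        X[rows, 1 + i] = 1.0
    else:
        X[rows, 1:] = -1.0
L = np.vstack([np.zeros((1, ell)), np.eye(ell)])   # H0: tau*_1 = ... = tau*_{t-1} = 0

Phi = sig2 * np.linalg.inv(X.T @ X)                # Phi = sigma^2 (X'X)^{-1}
P1 = -(X.T @ X) / sig2 ** 2                        # P_1 = -(1/sigma^4) X'X
Theta = L @ np.linalg.inv(L.T @ Phi @ L) @ L.T
w11 = 2 * sig2 ** 2 / (n - t)                      # Var of the REML estimate of sigma^2

M = Theta @ Phi @ P1 @ Phi
A1 = w11 * np.trace(M) ** 2
A2 = w11 * np.trace(M @ M)
assert abs(np.trace(M) + ell / sig2) < 1e-9        # tr(Theta Phi P1 Phi) = -ell/sigma^2
assert abs(A1 - 2 * ell ** 2 / (n - t)) < 1e-9
assert abs(A2 - 2 * ell / (n - t)) < 1e-9

# Unmodified estimate of the denominator df overshoots n - t (section 4.1.5):
E = 1 + A2 / ell
V = (2 / ell) * (1 + (A1 + 6 * A2) / (2 * ell))
rho = V / (2 * E ** 2)
m_hat = 4 + (ell + 2) / (ell * rho - 1)
assert m_hat > n - t + 8
```

With these sizes the check gives m_hat larger than n − t + 8 = 17, illustrating the bound derived in section 4.1.5.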
Since A₁ = 2ℓ²/(n − t), A₂ = 2ℓ/(n − t), and B = (ℓ + 6)/(n − t), then

E[F] ≈ E = 1 + A₂/ℓ = 1 + 2/(n − t)  and  Var[F] ≈ V = (2/ℓ)(1 + B) = (2/ℓ)(1 + (ℓ + 6)/(n − t))

⇒ ρ = V/(2E²) = [(n − t)/(ℓ(n − t + 2)²)](n − t + ℓ + 6),

m = 4 + (ℓ + 2)/(ℓρ − 1) = 4 + (n − t + 2)²(ℓ + 2)/((n − t)(ℓ + 2) − 4),

and λ = m/(E(m − 2)) = ((n − t)/(n − t + 2))(m/(m − 2)).

It can be seen that

m − 4 = (n − t + 2)²(ℓ + 2)/((n − t)(ℓ + 2) − 4) > (n − t + 2)²(ℓ + 2)/((n − t)(ℓ + 2))
= (n − t + 2)²/(n − t) = (n − t) + 4 + 4/(n − t) > (n − t) + 4.

So m > n − t + 8, and this means that the approach overestimates the denominator degrees of freedom. The approach in this setting does not perform well, especially when n − t is small. Moreover, since m > n − t + 8, then 2m − 2(n − t) > 4,

⇒ 2m − 2(n − t) − 4 + m(n − t) > m(n − t)

⇒ λ = ((n − t)/(n − t + 2))(m/(m − 2)) = m(n − t)/(2m − 2(n − t) − 4 + m(n − t)) < 1,

and again the approach does not perform in the way we wish.

4.2 The Hotelling T² Model

4.2.1 The Model and Assumptions

Suppose that y₁, ..., yₙ are n independent observations from a p-variate population having a normal distribution with mean μ and variance Σₚ. Writing y for the stacked np × 1 vector, E[y] = Χμ, where Χ is an np × p design matrix and μ is a p × 1 vector:

Χ = [Ιₚ; ⋯; Ιₚ] = 1ₙ ⊗ Ιₚ  and  Σ = diag(Σₚ, ..., Σₚ) = Ιₙ ⊗ Σₚ.

Also, suppose that we are interested in testing H₀: μ = 0.

Special Notation

Since σ = {σ_if} (i ≤ f) collects the p(p + 1)/2 distinct elements of Σₚ, we will use the notation Ρ_if, Ρ_jg, Q_{if,jg}, R_{if,jg}, and w_{if,jg}.

In the Hotelling T² model, Φ̂_A = Φ̂, as the computations below show.

Lemma 4.2.2   Χ′Σ⁻¹Χ = nΣₚ⁻¹, and hence Φ = (Χ′Σ⁻¹Χ)⁻¹ = Σₚ/n.

Proof

Χ′Σ⁻¹Χ = (1ₙ ⊗ Ιₚ)′(Ιₙ ⊗ Σₚ⁻¹)(1ₙ ⊗ Ιₚ) = 1ₙ′1ₙ ⊗ Σₚ⁻¹ = nΣₚ⁻¹  (Harville, 1997, chapter 16). □

It follows that

Ρ_if = −Χ′Σ⁻¹ (∂Σ/∂σ_if) Σ⁻¹Χ = Χ′ (∂Σ⁻¹/∂σ_if) Χ = n ∂Σₚ⁻¹/∂σ_if,

Ρ_if Φ Ρ_jg = n (∂Σₚ⁻¹/∂σ_if) Σₚ (∂Σₚ⁻¹/∂σ_jg),

and

Q_{if,jg} = Χ′Σ⁻¹ (∂Σ/∂σ_if) Σ⁻¹ (∂Σ/∂σ_jg) Σ⁻¹Χ
= n (∂Σₚ⁻¹/∂σ_if) Σₚ (∂Σₚ⁻¹/∂σ_jg)

⇒ Q_{if,jg} − Ρ_if Φ Ρ_jg = 0.

Since ∂²Σ/∂σ_if ∂σ_jg = 0, then R_{if,jg} = 0. We conclude that Φ̂_A = Φ̂. Recall that

Φ̂_A = Φ̂ + 2Φ̂ { Σᵢ Σⱼ ŵᵢⱼ (Q̂ᵢⱼ − Ρ̂ᵢΦ̂Ρ̂ⱼ − ¼R̂ᵢⱼ) } Φ̂.

Theorem 4.2.3   The REML estimate of Σₚ is S = A/(n − 1), where A = Σᵢ₌₁ⁿ (yᵢ − ȳ)(yᵢ − ȳ)′ and ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ.

Proof: To derive the REML estimate, we consider the data z = Κ′y such that Κ′Χ = 0. Choose M such that M′M = Ιₙ₋₁ and MM′ = Ιₙ − Ρ_{1ₙ} (Birkes, 2004, chapter 8), and take Κ = M ⊗ Ιₚ. This choice is valid because

Κ′Χ = (M′ ⊗ Ιₚ)(1ₙ ⊗ Ιₚ) = M′1ₙ ⊗ Ιₚ = 0  (notice that ℜ(M) = ℜ(1ₙ)⊥).

Also,

Κ′Κ = (M′ ⊗ Ιₚ)(M ⊗ Ιₚ) = Ιₙ₋₁ ⊗ Ιₚ = Ι_{np−p},

and

ΚΚ′ = (M ⊗ Ιₚ)(M′ ⊗ Ιₚ) = MM′ ⊗ Ιₚ = (Ιₙ − Ρ_{1ₙ}) ⊗ Ιₚ = Ι_{np} − Ρ_{1ₙ} ⊗ Ιₚ.

Observe that since

Ρ_Χ = (1ₙ ⊗ Ιₚ)[(1ₙ′ ⊗ Ιₚ)(1ₙ ⊗ Ιₚ)]⁻¹(1ₙ′ ⊗ Ιₚ) = Ρ_{1ₙ} ⊗ Ιₚ,

then ΚΚ′ = Ι_{np} − Ρ_Χ. Since z = Κ′y = (M′ ⊗ Ιₚ)y, then

Var[z] = (M′ ⊗ Ιₚ) Σ (M ⊗ Ιₚ) = (M′ ⊗ Ιₚ)(Ιₙ ⊗ Σₚ)(M ⊗ Ιₚ) = M′M ⊗ Σₚ = Ιₙ₋₁ ⊗ Σₚ.

In fact, Var[z] has the same structure as Var[y], but with n − 1 blocks of Σₚ instead of n blocks, and z₁, ..., zₙ₋₁ are iid Nₚ(0, Σₚ).

Lemma   When z₁, ..., zₙ₋₁ are iid Nₚ(0, Σₚ), the ML estimator of Σₚ is Σᵢ₌₁ⁿ⁻¹ zᵢzᵢ′/(n − 1).

Proof   The log-likelihood is

ℒ = constant − ½(n − 1) log|Σₚ| − ½ Σᵢ₌₁ⁿ⁻¹ zᵢ′Σₚ⁻¹zᵢ.

Notice that

Σᵢ₌₁ⁿ⁻¹ zᵢ′Σₚ⁻¹zᵢ = Σᵢ tr(zᵢ′Σₚ⁻¹zᵢ) = Σᵢ tr(Σₚ⁻¹zᵢzᵢ′) = tr(Σₚ⁻¹ Σᵢ zᵢzᵢ′)

⇒ ℒ = constant − ½(n − 1) log|Σₚ| − ½ tr(Σₚ⁻¹ Σᵢ zᵢzᵢ′),

and hence the MLE of Σₚ is Σᵢ₌₁ⁿ⁻¹ zᵢzᵢ′/(n − 1) (Anderson, 2003, lemma 3.2.2; Rao, 2002, chapter 8).
Remark   Since MM′ = Ιₙ − Ρ_{1ₙ},

Σᵢ₌₁ⁿ⁻¹ m²ⱼᵢ = (n − 1)/n  for j = 1, ..., n,  and  Σᵢ₌₁ⁿ⁻¹ mⱼᵢ mₖᵢ = −1/n  for j ≠ k, j, k = 1, ..., n.

Now, since z = Κ′y = (M′ ⊗ Ιₚ)y, then zᵢ = Σⱼ₌₁ⁿ mⱼᵢ yⱼ, and

zᵢzᵢ′ = ( Σⱼ mⱼᵢ yⱼ )( Σₖ mₖᵢ yₖ′ ) = Σₖ Σⱼ mⱼᵢ mₖᵢ yⱼyₖ′.

So

Σᵢ₌₁ⁿ⁻¹ zᵢzᵢ′ = Σₖ Σⱼ yⱼyₖ′ ( Σᵢ₌₁ⁿ⁻¹ mⱼᵢ mₖᵢ )
= Σⱼ yⱼyⱼ′ ((n − 1)/n) − (1/n) Σⱼ Σ_{k≠j} yⱼyₖ′   (from the remark above)
= Σⱼ yⱼyⱼ′ − (1/n) Σⱼ Σₖ yⱼyₖ′
= Σⱼ yⱼyⱼ′ − (1/n)( Σⱼ yⱼ )( Σₖ yₖ′ ) = Σⱼ yⱼyⱼ′ − nȳȳ′,

and since Σᵢ yᵢyᵢ′ = Σᵢ (yᵢ − ȳ)(yᵢ − ȳ)′ + nȳȳ′ (Anderson, 2003, lemma 3.2.1), then

Σᵢ₌₁ⁿ⁻¹ zᵢzᵢ′ = Σᵢ₌₁ⁿ (yᵢ − ȳ)(yᵢ − ȳ)′ = A.

We conclude that the REML estimate of Σₚ is S = A/(n − 1). □

4.2.4 Computing A₁ and A₂

Since we are testing H₀: μ = 0, then L = Ιₚ, ℓ = p, and Θ = L(L′ΦL)⁻¹L′ = Φ⁻¹, so

A₁ = Σ_{i≤f} Σ_{j≤g} w_{if,jg} tr(ΘΦΡ_ifΦ) tr(ΘΦΡ_jgΦ) = Σ_{i≤f} Σ_{j≤g} w_{if,jg} tr(Ρ_ifΦ) tr(Ρ_jgΦ),   (4.1)

A₂ = Σ_{i≤f} Σ_{j≤g} w_{if,jg} tr(ΘΦΡ_ifΦΘΦΡ_jgΦ) = Σ_{i≤f} Σ_{j≤g} w_{if,jg} tr(Ρ_ifΦΡ_jgΦ).   (4.2)

Also,

tr(Ρ_ifΦ) = −2^{1−δ_if} σ^{if}  and  tr(Ρ_jgΦ) = −2^{1−δ_jg} σ^{jg}

⇒ tr(Ρ_ifΦ) tr(Ρ_jgΦ) = 2^{2−δ_if−δ_jg} σ^{if} σ^{jg},   (4.3)

where δ_if = 1 for i = f, δ_if = 0 for i ≠ f, and Σₚ⁻¹ = {σ^{if}}_{p×p}.

Finally, S ∼ Wishart[Σₚ/(n − 1), n − 1] and A ∼ Wishart[Σₚ, n − 1] (Anderson, 2003, corollary 7.2.3).
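The algebraic identity at the heart of the proof of Theorem 4.2.3, Σᵢ zᵢzᵢ′ = Σᵢ (yᵢ − ȳ)(yᵢ − ȳ)′, can be verified numerically. The sketch below (illustrative sizes n = 6, p = 3; not part of the original text) constructs a matrix M with the required properties via a QR factorization.

```python
import numpy as np

# Check of the identity used in the proof of Theorem 4.2.3:
# sum_i z_i z_i' = sum_i (y_i - ybar)(y_i - ybar)' = A,
# where z = (M' kron I_p) y and M satisfies M'M = I_{n-1}, MM' = I_n - P_{1n}.
rng = np.random.default_rng(0)
n, p = 6, 3

# Build M: an orthonormal basis of the orthogonal complement of 1_n.
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))]))
M = Q[:, 1:]
assert np.allclose(M.T @ M, np.eye(n - 1))                   # M'M = I_{n-1}
assert np.allclose(M @ M.T, np.eye(n) - np.ones((n, n)) / n) # MM' = I_n - P_{1n}

Y = rng.standard_normal((n, p))        # rows are y_1', ..., y_n'
Z = M.T @ Y                            # rows are z_1', ..., z_{n-1}'
A_scatter = (Y - Y.mean(axis=0)).T @ (Y - Y.mean(axis=0))
assert np.allclose(Z.T @ Z, A_scatter)  # hence the REML estimate S = A/(n-1)
```

Since Z′Z = Y′MM′Y = Y′(Ι − Ρ_{1ₙ})Y, the final assertion is exactly the centered-scatter identity used in the proof.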
Also, Anderson (section 7.3) gives the characteristic function of A₁₁, A₂₂, ..., A_pp, 2A₁₂, 2A₁₃, ..., 2A_{p−1,p} as

Ε[ e^{i tr(AΘ)} ] = |Ι − 2iΘΣₚ|^{−N/2} = Ψ,

where Θ = {θᵢⱼ}_{p×p} with θᵢⱼ = θⱼᵢ, A = {Aᵢⱼ}_{p×p}, and N = n − 1. (Here Θ is the symmetric matrix of arguments of the characteristic function, not the Θ = L(L′ΦL)⁻¹L′ used elsewhere.)

Finding the moments

For the first moments,

(1/i) ∂Ψ/∂θᵢⱼ |_{Θ=0} = (1/i)(−N/2) |Ι − 2iΘΣₚ|^{−N/2} tr[ (Ι − 2iΘΣₚ)⁻¹ (−2i (∂Θ/∂θᵢⱼ) Σₚ) ] |_{Θ=0}
= N tr((∂Θ/∂θᵢⱼ)Σₚ) = { (n − 1)σᵢᵢ for i = j;  2(n − 1)σᵢⱼ for i ≠ j },

and hence E[σ̂ᵢⱼ] = σᵢⱼ.

For the second moments,

2^{2−δ_if−δ_jg} Ε[A_if A_jg] = (1/i²) ∂²Ψ/∂θ_if ∂θ_jg |_{Θ=0}
= (1/i²) ∂/∂θ_jg { (−N/2) |Ι − 2iΘΣₚ|^{−N/2} tr[ (Ι − 2iΘΣₚ)⁻¹ (−2i (∂Θ/∂θ_if) Σₚ) ] } |_{Θ=0}
= N² tr((∂Θ/∂θ_if)Σₚ) tr((∂Θ/∂θ_jg)Σₚ) + 2N tr((∂Θ/∂θ_if)Σₚ(∂Θ/∂θ_jg)Σₚ),

that is, with N = n − 1,

2^{2−δ_if−δ_jg} Ε[A_if A_jg] = (n − 1)² tr((∂Θ/∂θ_if)Σₚ) tr((∂Θ/∂θ_jg)Σₚ) + 2(n − 1) tr((∂Θ/∂θ_if)Σₚ(∂Θ/∂θ_jg)Σₚ)

⇒ Cov[A_if, A_jg] = Ε[A_if A_jg] − Ε[A_if]Ε[A_jg] = (2(n − 1)/2^{2−δ_if−δ_jg}) tr((∂Θ/∂θ_if)Σₚ(∂Θ/∂θ_jg)Σₚ).   (4.4)

By noticing that

tr((∂Θ/∂θ_if)Σₚ(∂Θ/∂θ_jg)Σₚ) = 2^{1−δ_if−δ_jg} (σ_ig σ_jf + σ_fg σ_ij),   (4.5)

we obtain

Cov[σ̂_if, σ̂_jg] = Cov[A_if, A_jg]/(n − 1)² = (σ_ig σ_jf + σ_fg σ_ij)/(n − 1).   (4.6)

Combining expressions (4.1), (4.3), and (4.6), with w_{if,jg} = Cov[σ̂_if, σ̂_jg], we obtain

A₁ = Σ_{i≤f} Σ_{j≤g} [(σ_ig σ_jf + σ_fg σ_ij)/(n − 1)] 2^{2−δ_if−δ_jg} σ^{if} σ^{jg}
= Σᵢ₌₁ᵖ Σ_{f=1}ᵖ Σⱼ₌₁ᵖ Σ_{g=1}ᵖ [(σ_ig σ_jf + σ_fg σ_ij)/(n − 1)] σ^{if} σ^{jg}.

Note   In the last expression above for A₁, we
divided by 2^{2−δ_if−δ_jg} when passing to the unrestricted sums; this is valid because the unrestricted summations repeat each w_{if,jg} exactly 2^{2−δ_if−δ_jg} times.

⇒ A₁ = (1/(n − 1)) Σᵢ Σ_f Σⱼ Σ_g ( σ_ig σ^{if} σ_jf σ^{jg} + σ_fg σ^{jg} σ_ij σ^{if} )
= (1/(n − 1)) { Σ_f Σ_g ( Σᵢ σ_ig σ^{if} )( Σⱼ σ_jf σ^{jg} ) + Σⱼ Σ_f ( Σ_g σ_fg σ^{gj} )( Σᵢ σ_ij σ^{if} ) }
= (1/(n − 1)) { Σ_f Σ_g δ_gf δ_fg + Σⱼ Σ_f δ_fj δ_jf }.

Therefore,

A₁ = 2p/(n − 1).

Also,

tr(Ρ_ifΦΡ_jgΦ) = tr( (∂Σₚ⁻¹/∂σ_if) Σₚ (∂Σₚ⁻¹/∂σ_jg) Σₚ ) = 2^{1−δ_if−δ_jg} (σ^{ig} σ^{jf} + σ^{fg} σ^{ij}),   (4.7)

by utilizing expression (4.5) above (with the roles of Σₚ and Σₚ⁻¹ interchanged).

Combining expressions (4.2), (4.6), and (4.7), we obtain

A₂ = Σ_{i≤f} Σ_{j≤g} [(σ_ig σ_jf + σ_fg σ_ij)/(n − 1)] 2^{1−δ_if−δ_jg} (σ^{ig} σ^{jf} + σ^{fg} σ^{ij})
= (1/(2(n − 1))) Σᵢ Σ_f Σⱼ Σ_g (σ_ig σ_jf + σ_fg σ_ij)(σ^{ig} σ^{jf} + σ^{fg} σ^{ij})
= (1/(2(n − 1))) { Σ σ_ig σ^{ig} σ_jf σ^{jf} + Σ σ_ig σ^{fg} σ_jf σ^{ij} + Σ σ_fg σ^{ig} σ_ij σ^{jf} + Σ σ_fg σ^{fg} σ_ij σ^{ij} }
= (1/(2(n − 1))) ( p² + p + p + p² ).

Therefore,

A₂ = p(p + 1)/(n − 1).

4.2.5 Estimating the Denominator Degrees of Freedom and the Scale Factor

When testing H₀: μ = 0, a scaled Hotelling T² has an exact F test (Krzanowski, 2000, chapter 8). Indeed, the denominator degrees of freedom m should equal n − p, and the scale λ should equal (n − p)/(n − p + p − 1)
= (n − p)/(n − 1).

Since A₁ = 2ℓ/(n − 1), A₂ = ℓ(ℓ + 1)/(n − 1), and B = (3ℓ + 4)/(n − 1) (note that ℓ = p), then

E = 1 + A₂/ℓ = 1 + (ℓ + 1)/(n − 1)  and  V = (2/ℓ)(1 + B) = (2/ℓ)(1 + (3ℓ + 4)/(n − 1))

⇒ ρ = V/(2E²) = ((n + 3ℓ + 3)/(n + ℓ)²)((n − 1)/ℓ),

m = 4 + (ℓ + 2)/(ℓρ − 1) = 4 + (n + ℓ)²(ℓ + 2)/(n(ℓ + 2) − ℓ² − 3ℓ − 3),

and λ = m/(E(m − 2)) = ((n − 1)/(n + ℓ))(m/(m − 2)).

The estimates do not match the exact values for this model.

4.3 Kenward and Roger’s Modification

From section 4.1, we found that for the balanced one-way Anova model,

A₁ = 2ℓ²/(n − t),  A₂ = 2ℓ/(n − t),  Β = (ℓ + 6)/(n − t),  A₁/A₂ = ℓ,

m = 4 + (ℓ + 2)/(ℓρ − 1) ≠ n − t,  and  λ = m/(E(m − 2)) ≠ 1.

And from section 4.2, we found that for the Hotelling T² model,

A₁ = 2ℓ/(n − 1),  A₂ = ℓ(ℓ + 1)/(n − 1),  B = (3ℓ + 4)/(n − 1),  A₁/A₂ = 2/(ℓ + 1),

m = 4 + (ℓ + 2)/(ℓρ − 1) ≠ n − ℓ,  and  λ = m/(E(m − 2)) ≠ (n − ℓ)/(n − 1).

Instead of estimating E[F] and Var[F] by E and V, we will construct modified estimators E* and V* leading to

m* = 4 + (ℓ + 2)/(ℓρ* − 1),  where ρ* = V*/(2E*²),   (4.8)

and

λ* = m*/(E*(m* − 2)),   (4.9)

such that m* and λ* match the exact values of the denominator degrees of freedom and the scale for the balanced one-way Anova and the Hotelling T² models.

Applying expression (4.9) to the balanced one-way Anova model,

(n − t)/(E*(n − t − 2)) = 1.

Since n − t = 2ℓ/A₂, the expression above can be rewritten in terms of A₂ as ℓ/(E*(ℓ − A₂)) = 1, and hence

E* = 1/(1 − A₂/ℓ).   (4.10)

Similarly, applying expression (4.9) to the Hotelling T² model,

(n − ℓ)/(E*(n − ℓ − 2)) = (n − ℓ)/(n − 1).

Since n − 1 = ℓ(ℓ + 1)/A₂, we have n − ℓ − 2 = (ℓ + 1)(ℓ − A₂)/A₂, so the expression above can be rewritten in terms of A₂ as E*(ℓ + 1)(ℓ − A₂)/A₂ = ℓ(ℓ + 1)/A₂, and hence again

E* = 1/(1 − A₂/ℓ).   (4.11)

From expressions (4.10) and (4.11), we can see that the modified approximate expression for E[F] should be

E* = 1/(1 − A₂/ℓ).   (4.12)

Applying expression (4.8) to the balanced one-way Anova model,

4 + (ℓ + 2)/(ℓρ* − 1) = n − t.

The expression above can be rewritten in terms of B as

4 + (ℓ + 2)/(ℓρ* − 1) = (ℓ + 6)/B,   (4.13)

⇒ 2(ℓ + 2)E*²/(ℓV* − 2E*²) = (ℓ + 6)/B − 4,

and hence

V* = (2E*²/ℓ){ 1 + (ℓ + 2)/((ℓ + 6)/B − 4) }.
Moreover, since for the balanced one-way Anova model B = (ℓ + 6)/(n − t) and A₂ = 2ℓ/(n − t), then A₂/ℓ = 2B/(ℓ + 6) and

E* = 1/(1 − A₂/ℓ) = 1/(1 − (2/(ℓ + 6))B),

and hence

V* = (2/ℓ){ (ℓ + 6 + (ℓ − 2)B) / ((1 − (2/(ℓ + 6))B)²(ℓ + 6 − 4B)) }
= (2/ℓ){ (1 + ((ℓ − 2)/(ℓ + 6))B) / ((1 − (2/(ℓ + 6))B)²(1 − (4/(ℓ + 6))B)) },   (4.14)

by dividing the numerator and the denominator by ℓ + 6.

Similarly, applying expression (4.8) to the Hotelling T² model,

4 + (ℓ + 2)/(ℓρ* − 1) = n − ℓ = (3ℓ + 4)/B + 1 − ℓ,   (4.15)

⇒ 2(ℓ + 2)E*²/(ℓV* − 2E*²) = (3ℓ + 4)/B − 3 − ℓ,

and hence

V* = (2E*²/ℓ){ 1 + (ℓ + 2)/((3ℓ + 4)/B − 3 − ℓ) }.

Moreover, since for the Hotelling T² model B = (3ℓ + 4)/(n − 1) and A₂ = ℓ(ℓ + 1)/(n − 1), then A₂/ℓ = ((ℓ + 1)/(3ℓ + 4))B and

E* = 1/(1 − A₂/ℓ) = 1/(1 − ((ℓ + 1)/(3ℓ + 4))B),

and hence

V* = (2/ℓ){ (3ℓ + 4 − B) / ((1 − ((ℓ + 1)/(3ℓ + 4))B)²(3ℓ + 4 − (ℓ + 3)B)) }
= (2/ℓ){ (1 − (1/(3ℓ + 4))B) / ((1 − ((ℓ + 1)/(3ℓ + 4))B)²(1 − ((ℓ + 3)/(3ℓ + 4))B)) },   (4.16)

by dividing the numerator and the denominator by 3ℓ + 4.

From expressions (4.14) and (4.16), we can see that the modified approximate expression for the variance should have the form

V* = (2/ℓ){ (1 + d₁B) / ((1 − d₂B)²(1 − d₃B)) },

where

d₁ = (ℓ − 2)/(ℓ + 6), d₂ = 2/(ℓ + 6), d₃ = 4/(ℓ + 6)  for the balanced one-way Anova model;
d₁ = −1/(3ℓ + 4), d₂ = (ℓ + 1)/(3ℓ + 4), d₃ = (ℓ + 3)/(3ℓ + 4)  for the Hotelling T² model.

So far we have obtained a general expression for the variance but not for the coefficients dᵢ. First, we determine some simple linear relationships among the dᵢ that are common to the two cases:

(i) the numerator of d₂ = ℓ − the numerator of d₁;

(ii) the numerator of d₃ = the numerator of d₂ + 2 = ℓ − the numerator of d₁ + 2;

(iii) the common denominator of the dᵢ is a linear function of the numerator of d₁, say c(numerator of d₁) + d, where

c(ℓ − 2) + d = ℓ + 6  for the balanced one-way Anova model,  and  −c + d = 3ℓ + 4  for the Hotelling T² model.
Solving the two equations above for c and d, we have c = −2 and d = 3ℓ + 2, and hence

the denominator of the dᵢ = 3ℓ + 2(1 − the numerator of d₁).

The ℓ's involved in relationships (i), (ii), and (iii) are fixed for both special cases, whereas the remaining ℓ, which appears in the numerator of d₁, is accommodated as a function of the ratio of A₁ and A₂. Let

the numerator of d₁ = g(A₁, A₂) = a(A₁/A₂) + b = g.

Notice that g = ℓ − 2 for the balanced one-way Anova model and g = −1 for the Hotelling T² model. Since A₁/A₂ = ℓ and A₁/A₂ = 2/(ℓ + 1) for the balanced one-way Anova and Hotelling T² models respectively, then

aℓ + b = ℓ − 2  and  (2/(ℓ + 1))a + b = −1.   (4.17)

Solving the equations above, we have

a = (ℓ + 1)(ℓ − 1)/(ℓ(ℓ + 1) − 2) = (ℓ + 1)/(ℓ + 2),  and  b = −(ℓ + 4)/(ℓ + 2).

From the expressions we obtained for a and b, we have

g = ((ℓ + 1)/(ℓ + 2))(A₁/A₂) − (ℓ + 4)/(ℓ + 2) = ((ℓ + 1)A₁ − (ℓ + 4)A₂) / ((ℓ + 2)A₂)

for both special cases.

Summary

The Kenward and Roger approach derived in section 3.2 is modified as

m* = 4 + (ℓ + 2)/(ℓρ* − 1),  where ρ* = V*/(2E*²),  and  λ* = m*/(E*(m* − 2)),

such that

E* = (1 − A₂/ℓ)⁻¹

and

V* = (2/ℓ){ (1 + d₁B) / ((1 − d₂B)²(1 − d₃B)) },

where

d₁ = g/(3ℓ + 2(1 − g)),  d₂ = (ℓ − g)/(3ℓ + 2(1 − g)),  d₃ = (ℓ − g + 2)/(3ℓ + 2(1 − g)),

and

g = ((ℓ + 1)A₁ − (ℓ + 4)A₂)/((ℓ + 2)A₂).
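The modified recipe summarized above can be checked numerically on the two special cases from which it was built. The following sketch (not part of the original text) treats ℓ, A₁, and A₂ as known inputs; the sample sizes are illustrative only.

```python
# Sketch of the modified K-R recipe of section 4.3.

def kr_modified(ell, A1, A2):
    """Return (m*, lambda*) from the modified K-R expressions."""
    E_star = 1 / (1 - A2 / ell)
    B = (A1 + 6 * A2) / (2 * ell)
    g = ((ell + 1) * A1 - (ell + 4) * A2) / ((ell + 2) * A2)
    den = 3 * ell + 2 * (1 - g)
    d1, d2, d3 = g / den, (ell - g) / den, (ell - g + 2) / den
    V_star = (2 / ell) * (1 + d1 * B) / ((1 - d2 * B) ** 2 * (1 - d3 * B))
    rho_star = V_star / (2 * E_star ** 2)
    m_star = 4 + (ell + 2) / (ell * rho_star - 1)
    lam_star = m_star / (E_star * (m_star - 2))
    return m_star, lam_star

# Balanced one-way Anova (ell = t - 1, residual df n - t = 11):
# expect m* = n - t and lambda* = 1.
ell, ddf = 4, 11
m_star, lam_star = kr_modified(ell, 2 * ell ** 2 / ddf, 2 * ell / ddf)
assert abs(m_star - ddf) < 1e-9 and abs(lam_star - 1) < 1e-9

# Hotelling T^2 (ell = p): expect m* = n - p and lambda* = (n - p)/(n - 1).
p, n = 3, 10
m_star, lam_star = kr_modified(p, 2 * p / (n - 1), p * (p + 1) / (n - 1))
assert abs(m_star - (n - p)) < 1e-9
assert abs(lam_star - (n - p) / (n - 1)) < 1e-9
```

Both assertions pass, matching the exact values derived in sections 4.1.5 and 4.2.5.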
Recall that before the modification,

E[F] ≈ E = 1 + A₂/ℓ  and  Var[F] ≈ V = (2/ℓ)(1 + B),

where B = (1/(2ℓ))(A₁ + 6A₂),

A₁ = Σᵢ₌₁ʳ Σⱼ₌₁ʳ wᵢⱼ tr(ΘΦΡᵢΦ) tr(ΘΦΡⱼΦ),  and  A₂ = Σᵢ₌₁ʳ Σⱼ₌₁ʳ wᵢⱼ tr(ΘΦΡᵢΦΘΦΡⱼΦ).

4.4 Modifying the Approach with the Usual Variance-Covariance Matrix

Recall that when we used the usual variance-covariance matrix of the fixed effects estimates, Φ̂, in section 3.1.1, we found that

E[F] ≈ 1 + S₂/ℓ  and  Var[F] ≈ (2/ℓ)(1 + B₁),

where B₁ = (1/(2ℓ))(6S₂ + S₁), S₂ = A₂ + 2A₃, and S₁ = A₁ − 4A₃. Since A₃ = 0 for both special cases, S₂ and S₁ reduce to A₂ and A₁ respectively, and hence a modification based on the same two special cases considered for the K-R modification must be analogous to the K-R modification derived in section 4.3 (substitute S₂ and S₁ for A₂ and A₁ respectively). The problematic part of this modification is precisely that A₃ = 0 in the special cases, so they provide no clear guide for how the modification should treat A₃. Simulation studies (as we will see in chapter 8) show that this modification does not perform well in estimating the denominator degrees of freedom and the scale.

5. TWO PROPOSED MODIFICATIONS FOR THE KENWARD-ROGER METHOD

As we saw in chapter four, the modification applied by Kenward and Roger was based on modifying the approximate expressions for the expectation and the variance of the Wald-type statistic F so that the approximation produces the known values of the denominator degrees of freedom and the scale in the special cases where the distribution of F is exact. The modification of the approximate expression for the expectation was relatively simple and direct, whereas the modification applied by Kenward and Roger to the approximate expression for the variance was more complicated. In fact, the modification applied by Kenward and Roger is not unique, and it is ad hoc. In this chapter, we propose two other modifications of Kenward and Roger’s method based on the same two special cases.
The approximate expression for the expectation of F is modified as done by Kenward and Roger (1997); however, instead of modifying the approximate expression for the variance, we modify other quantities, as will be seen shortly. These two proposed modifications are simpler than the K-R modification both in computation and in derivation. Moreover, like the K-R method, our proposed modifications produce the exact values of the denominator degrees of freedom and the scale in the two special cases.

5.1 First Proposed Modification

In order to approximate the F test for the fixed effects, H₀: L′β = 0, in a model as described in section 2.1, we use the same form of the Wald-type F statistic used in the K-R method,

F = (1/ℓ) β̂′L(L′Φ̂_A L)⁻¹L′β̂.

Define

h = (ℓ + 2)/(ℓV/(2E²) − 1),

so that m = 4 + h. Recall that E = 1 + A₂/ℓ and V = (2/ℓ)(1 + B). Also, recall from chapter 4 that for the balanced one-way Anova model we had

A₁ = 2ℓ²/(n − t),  A₂ = 2ℓ/(n − t),  Β = (ℓ + 6)/(n − t),  A₁/A₂ = ℓ,

m = 4 + h ≠ n − t,  and  λ = m/(E(m − 2)) ≠ 1,

and for the Hotelling T² model we had

A₁ = 2ℓ/(n − 1),  A₂ = ℓ(ℓ + 1)/(n − 1),  B = (3ℓ + 4)/(n − 1),  A₁/A₂ = 2/(ℓ + 1),

m = 4 + h ≠ n − ℓ,  and  λ = m/(E(m − 2)) ≠ (n − ℓ)/(n − 1).

In this proposed modification, we keep the modified estimator E* of E[F] as derived in section 4.3; however, instead of constructing a modified estimator of Var[F], we construct a modified estimator of h, say h₁, leading to

m₁ = 4 + h₁,   (5.1)

such that m₁ matches the exact values of the denominator degrees of freedom for the balanced one-way Anova and the Hotelling T² models.

Applying expression (5.1) to the balanced one-way Anova model,

4 + h₁ = (ℓ + 6)/B  (from 4.13),  so  h₁ = (ℓ + 6)/B − 4.

And, applying expression (5.1) to the Hotelling T² model,

4 + h₁ = (3ℓ + 4)/B + (1 − ℓ)  (from 4.15),  so  h₁ = (3ℓ + 4)/B − 3 − ℓ.

For both models,

h₁ = d₀ + d₁/B,   (5.2)

where

d₀ = −4 and d₁ = ℓ + 6  for the balanced one-way Anova model,  and
d₀ = −(ℓ + 3) and d₁ = 3ℓ + 4  for the Hotelling T² model.
Also, notice that d₁ can be expressed as a linear function of d₀ for both models, that is, d₁ = a₁d₀ + a₂, where

ℓ + 6 = −4a₁ + a₂  for the balanced one-way Anova model,  and
3ℓ + 4 = −(ℓ + 3)a₁ + a₂  for the Hotelling T² model.

Solving the equations above, we have a₁ = −2 and a₂ = ℓ − 2. That is,

d₁ = −2d₀ + ℓ − 2.   (5.3)

As in the discussion of the K-R modification in section 4.3, we need to distinguish between the fixed ℓ's and the one to be treated as a function of the ratio of A₁ and A₂. The ℓ in expression (5.3) is fixed, and the ℓ in the expression for d₀ is considered to be a function of the ratio of A₁ and A₂. Let

d₀ = g(A₁, A₂) = a + b(A₁/A₂)

⇒ −4 = a + bℓ  for the balanced one-way Anova model,  and
−(3 + ℓ) = a + 2b/(ℓ + 1)  for the Hotelling T² model.

Solving the two equations above,

a = −ℓ(ℓ + 1)/(ℓ + 2) − 4  and  b = (ℓ + 1)/(ℓ + 2),

and hence

d₀ = −ℓ(ℓ + 1)/(ℓ + 2) − 4 + ((ℓ + 1)/(ℓ + 2))(A₁/A₂) = ((ℓ + 1)A₁ − (ℓ² + 5ℓ + 8)A₂) / ((ℓ + 2)A₂).   (5.4)

Summary

Kenward and Roger’s approach derived in section 3.1 is modified as

m₁ = 4 + h₁  and  λ₁ = m₁/(E*(m₁ − 2)),

such that E* = (1 − A₂/ℓ)⁻¹ and

h₁ = d₀ + (ℓ − 2 − 2d₀)/B,  where  d₀ = ((ℓ + 1)A₁ − (ℓ² + 5ℓ + 8)A₂)/((ℓ + 2)A₂)

(recall that B = (A₁ + 6A₂)/(2ℓ)).

5.2 Second Proposed Modification

In order to approximate the F test for the fixed effects, H₀: L′β = 0, in a model as described in section 2.1, we use the same form of the Wald-type F statistic used in the K-R method,

F = (1/ℓ) β̂′L(L′Φ̂_A L)⁻¹L′β̂.

Recall from chapter 4 that for the balanced one-way Anova model we had

A₁ = 2ℓ²/(n − t),  A₂ = 2ℓ/(n − t),  Β = (ℓ + 6)/(n − t),  A₁/A₂ = ℓ,

m = 4 + (ℓ + 2)/(ℓρ − 1) ≠ n − t, where ρ = V/(2E²),  and  λ = m/(E(m − 2)) ≠ 1,

and for the Hotelling T² model we had

A₁ = 2ℓ/(n − 1),  A₂ = ℓ(ℓ + 1)/(n − 1),  B = (3ℓ + 4)/(n − 1),  A₁/A₂ = 2/(ℓ + 1),

m = 4 + (ℓ + 2)/(ℓρ − 1) ≠ n − ℓ,  and  λ = m/(E(m − 2)) ≠ (n − ℓ)/(n − 1).

In this proposed modification, we also keep the modified estimator E* of E[F], as for the Kenward and Roger modification and the first proposed modification.
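Like the K-R modification, the first proposed modification summarized in section 5.1 above can be checked on the two special cases. The following sketch (not part of the original text) treats ℓ, A₁, and A₂ as known inputs; the sample sizes are illustrative only.

```python
# Sketch of the first proposed recipe of section 5.1.

def first_proposed(ell, A1, A2):
    """Return (m1, lambda1) from the first proposed modification."""
    B = (A1 + 6 * A2) / (2 * ell)
    d0 = ((ell + 1) * A1 - (ell ** 2 + 5 * ell + 8) * A2) / ((ell + 2) * A2)
    m1 = 4 + d0 + (ell - 2 - 2 * d0) / B          # m1 = 4 + h1
    E_star = 1 / (1 - A2 / ell)
    lam1 = m1 / (E_star * (m1 - 2))
    return m1, lam1

# Balanced one-way Anova (residual df n - t = 11): expect m1 = n - t, lambda1 = 1.
ell, ddf = 4, 11
m1, lam1 = first_proposed(ell, 2 * ell ** 2 / ddf, 2 * ell / ddf)
assert abs(m1 - ddf) < 1e-9 and abs(lam1 - 1) < 1e-9

# Hotelling T^2 (ell = p): expect m1 = n - p and lambda1 = (n - p)/(n - 1).
p, n = 3, 10
m1, lam1 = first_proposed(p, 2 * p / (n - 1), p * (p + 1) / (n - 1))
assert abs(m1 - (n - p)) < 1e-9 and abs(lam1 - (n - p) / (n - 1)) < 1e-9
```

Note how much shorter the computation is than the K-R version: no d₁, d₂, d₃, and no modified variance V* are needed.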
Since when λF ∼ F(ℓ, m) exactly we have

ρ = Var[F]/(2E[F]²) = (ℓ + m − 2)/(ℓ(m − 4)),

we construct a modified estimator of ζ = ℓρ, say ζ₂, such that

ζ₂ = (ℓ + m₂ − 2)/(m₂ − 4),   (5.5)

where m₂ is the exact value of the denominator degrees of freedom for the balanced one-way Anova and the Hotelling T² models.

For the balanced one-way Anova model, F ∼ F(ℓ, n − t) exactly, so by (5.5),

ζ₂ = (ℓ + n − t − 2)/(n − t − 4) = (1 + (ℓ − 2)/(n − t)) / (1 − 4/(n − t))   (dividing by n − t).

Similarly, since for the Hotelling T² model ((n − p)/(n − 1))F ∼ F(ℓ, n − p) exactly, then by (5.5),

ζ₂ = (ℓ + n − ℓ − 2)/(n − ℓ − 4) = (n − 2)/(n − ℓ − 4) = (1 − 1/(n − 1)) / (1 − (ℓ + 3)/(n − 1))   (dividing by n − 1).

For both models,

ζ₂ = (a₀ + a₁A₁ + a₂A₂) / (b₀ + b₁A₁ + b₂A₂).

Notice that, unlike the first proposed modification where we expressed h₁ in terms of B, in this modification we express ζ₂ in terms of A₁ and A₂.

For the balanced one-way Anova model,

ζ₂ = (a₀ + a₁(2ℓ²/(n − t)) + a₂(2ℓ/(n − t))) / (b₀ + b₁(2ℓ²/(n − t)) + b₂(2ℓ/(n − t))) = (1 + (ℓ − 2)/(n − t)) / (1 − 4/(n − t))

⇒ a₀ = c,  2ℓ²a₁ + 2ℓa₂ = c(ℓ − 2),   (5.6)
  b₀ = c,  2ℓ²b₁ + 2ℓb₂ = −4c,   (5.7)

for any c ≠ 0. And for the Hotelling T² model,

ζ₂ = (a₀ + a₁(2ℓ/(n − 1)) + a₂(ℓ(ℓ + 1)/(n − 1))) / (b₀ + b₁(2ℓ/(n − 1)) + b₂(ℓ(ℓ + 1)/(n − 1))) = (1 − 1/(n − 1)) / (1 − (ℓ + 3)/(n − 1))

⇒ a₀ = c,  2ℓa₁ + ℓ(ℓ + 1)a₂ = −c,   (5.8)
  b₀ = c,  2ℓb₁ + ℓ(ℓ + 1)b₂ = −c(ℓ + 3).   (5.9)

By solving equations (5.6) and (5.8), we obtain

a₁ = c/(2(ℓ + 2))  and  a₂ = −2c/(ℓ(ℓ + 2)).

By solving equations (5.7) and (5.9), we obtain

b₁ = −c/(ℓ(ℓ + 2))  and  b₂ = −c(ℓ + 4)/(ℓ(ℓ + 2)).
Take c = ℓ(ℓ + 2) ⇒ a₀ = b₀ = ℓ(ℓ + 2), a₁ = ℓ/2, a₂ = −2, b₁ = −1, b₂ = −(ℓ + 4).

So the modification that we apply to ζ is

ζ₂ = (ℓ(ℓ + 2) + (ℓ/2)A₁ − 2A₂) / (ℓ(ℓ + 2) − A₁ − (ℓ + 4)A₂),

and

4 + (ℓ + 2)/(ζ₂ − 1) = (2ℓ(ℓ + 2) + 2(A₁ − ℓA₂)) / (A₁ + 2A₂).

Summary

The Kenward and Roger approach derived in section 3.1 is modified as

m₂ = (2ℓ(ℓ + 2) + 2(A₁ − ℓA₂)) / (A₁ + 2A₂)  and  λ₂ = m₂/(E*(m₂ − 2)),

such that E* = (1 − A₂/ℓ)⁻¹.

5.3 Comparisons among the K-R and the Proposed Modifications

Based on the way we derived the K-R and the proposed modifications, we should expect these modifications to give identical estimates of the denominator degrees of freedom when the ratio of A₁ and A₂ is the same as one of the ratios in the two special cases. In this section, we derive some useful results about the three modifications. From now on, when we say Kenward and Roger’s approach, we mean the approach after the modification that makes it produce the exact values in the two special cases.

Theorem 5.3.1   The two proposed and K-R methods give the same estimate of the denominator degrees of freedom when A₁ = ℓA₂ or A₁ = 2A₂/(ℓ + 1), and the estimated values are 2ℓ/A₂ and 1 − ℓ + ℓ(ℓ + 1)/A₂, respectively.

Proof

(i) For the case where A₁ = ℓA₂:

K-R approach

g = ((ℓ + 1)A₁ − (ℓ + 4)A₂)/((ℓ + 2)A₂) = ((ℓ + 1)ℓA₂ − (ℓ + 4)A₂)/((ℓ + 2)A₂) = ℓ − 2.

So, since 3ℓ + 2(1 − g) = 3ℓ + 2(3 − ℓ) = ℓ + 6,

d₁ = (ℓ − 2)/(ℓ + 6),  d₂ = 2/(ℓ + 6),  d₃ = 4/(ℓ + 6).

Since E* = (1 − A₂/ℓ)⁻¹ and B = (1/(2ℓ))(A₁ + 6A₂) = ((ℓ + 6)/(2ℓ))A₂, then

V* = (2/ℓ){ (1 + d₁B)/((1 − d₂B)²(1 − d₃B)) } = (2/ℓ){ (1 + ((ℓ − 2)/(2ℓ))A₂) / ((1 − A₂/ℓ)²(1 − 2A₂/ℓ)) }

⇒ ρ* = V*/(2E*²) = (1/ℓ){ (1 + ((ℓ − 2)/(2ℓ))A₂) / (1 − 2A₂/ℓ) },

ℓρ* − 1 = ((ℓ + 2)/(2ℓ))A₂ / (1 − 2A₂/ℓ).
( + 2) A2 − 2 − 2d 0 , B −2 ⎛ 2⎞ = 4 + d0 ⎜1 − ⎟ + B ⎝ B⎠ m1 = 4 + d 0 + ⎞ 2 ( − 2) ( + 1) A2 − ( 2 + 5 + 8) A2 ⎛ 4 ⎜1 − ⎟+ ( + 2) A2 ⎝ (6 + ) A2 ⎠ (6 + ) A2 = 4+ = 4+ = 4+ = 4+ [ ( + 1) A2 − ( 2 3 + 16 2 + 5 + 8) A2 ][(6 + ) A2 − 4 ] + 2 ( (6 + )( + 2) A22 2 − 4) A2 + 24 − (4 + 8)(6 + ) A2 (6 + )( + 2) A2 2 2 (6 + )(2 + ) (4 + 8)(6 + ) A2 2 − = (6 + )( + 2) A2 (6 + )( + 2) A2 A2 Second proposed approach m2 = 2 ( + 2) + 2( A1 − A2 ) 2 = A1 + 2 A2 A2 (ii ) For the case where A1 = 2 A2 +1 K-R approach g= ( + 1) A1 − ( + 4) A2 ( + 1) A1 + 4 ( + 1) 2 +4 = − = − = −1 + 2 ( + 2) + 1 +2 ( + 2) A2 ( + 2) A2 So, d1 = −1 +1 , d2 = , 3 +4 3 +4 and since E ∗ = (1 − A2 d3 = )−1 , and B = +3 . 3 +4 1 4+3 ( A1 + 6 A2 ) = A2 , 2 ( + 1) 63 then V ∗ = ⎫ 2⎧ 1 + d1 B ⎨ ⎬ 2 ⎩ (1 − d 2 B) (1 − d3 B ) ⎭ 1 ⎧ ⎫ A2 1− ⎪ ⎪⎪ 2⎪ ( + 1) = ⎨ ⎬, ⎪ (1 − A2 ) 2 (1 − + 3 A2 ) ⎪ ( + 1) ⎩⎪ ⎭⎪ 1 ⎧ ⎫ 1− A2 ⎪ ⎪ ( + 1) − A2 ⎫ V 1⎪ ( + 1) ⎪ 1 ⎧ = ⎨ ρ∗ = ∗ = ⎨ ⎬, ⎬ E ⎩ ( + 1) − ( + 3) A2 ⎭ ⎪1 − + 3 A2 ⎪ ( + 1) ⎪⎭ ⎪⎩ ∗ ρ ∗ −1 = ( + 1) − A2 ( + 2) A2 −1 = , ( + 1) − ( + 3) A2 ( + 1) − ( + 3) A2 and therefore m∗ = 4 + ( + 1) − ( + 3) A2 ( + 1) +2 , = 4+ = 1− + A2 A2 ρ −1 First proposed approach ( + 1) A1 − ( 2 + 5 + 8) A2 d0 = , ( + 2) A2 − 2 − 2d 0 B −2 ⎛ 2⎞ = 4 + d0 ⎜1 − ⎟ + B ⎝ B⎠ m1 = 4 + d 0 + = 4+ = 4+ 2 A2 − ( 2 + 5 + 8) A2 ⎛ 2 ( + 1) ⎞ ( + 1)( − 2) ⎜1 − ⎟+ ( + 2) A2 (4 + 3 ) A2 ⎝ (4 + 3 ) A2 ⎠ −( + 5 + 6) A2 ⎛ (4 + 3 ) A2 − 2 ( + 1) ⎞ ( + 1)( − 2) ⎜ ⎟+ ( + 2) A2 (4 + 3 ) A2 (4 + 3 ) A2 ⎝ ⎠ 2 ⎛ (4 + 3 ) A2 − 2 ( + 1) ⎞ ( + 1)( − 2) = 4 − ( + 3) ⎜ ⎟+ (4 + 3 ) A2 (4 + 3 ) A2 ⎝ ⎠ = 4+ ( + 1)( − 2) − ( + 3)[(4 + 3 ) A2 − 2 ( + 1)] (4 + 3 ) A2 = 4+ ( + 1)[2( + 3) + − 2] ( + 1) − ( + 3) = 1 − + (4 + 3 ) A2 A2 64 Second proposed approach m2 = 2 ( + 2) + 2( A1 − A2 ) ( + 1) = 1− + A1 + 2 A2 A2 When A1 = A2 or A1 = Theorem 5.3.2 1− 2 A2 , then the scale estimates are 1 and +1 −1 A2 respectively for K-R and the proposed modifications as well. 
( + 1) Proof When A1 = A2 , or A1 = 2 A2 , the three modifications give the same estimates +1 for the denominator degrees of freedom which are 2 ( + 1) respectively and 1 − + A2 A2 (theorem 5.3.1). In addition, since the E[ F ] is modified in the same way for the three modifications, then the scale estimate is the same for the three modifications. In the following proof, we use the Kenward and Roger approach to find the scale estimate (i ) For the case where A1 = A2 , m∗ = and λ∗ = ∗ m = E (m∗ − 2) ⎛ 1 ⎜ ⎝ 1 − A2 ∗ (ii ) For the case A1 = and λ∗ = 2 , A2 2 A2 = 1. ⎞⎛ 2 ⎞ ⎟⎜ − 2 ⎟ ⎠⎝ A2 ⎠ 2 ( + 1) A2 , m = 1 − + , A2 +1 m = E (m − 2) ⎛ 1 ⎜ ⎝ 1 − A2 ∗ ( + 1) A2 ⎞⎛ ( + 1) ⎞ − ( + 1) ⎟ ⎟⎜ ⎠⎝ A2 ⎠ 1− + (1 − ) A2 + 1 ( + 1) , = ⎛ 1 ⎞ ⎛ A2 ⎞ ⎜ ⎟ ⎜1 − ⎟ ⎠ ⎝ 1 − A2 ⎠ ⎝ multiplying by A2 ( + 1) 65 = 1− ( − 1) A2 ( + 1) Corollary 5.3.3 If A1 = A2 or A1 = 2 A2 , then the K-R, and the proposed approaches +1 are identical. Proof From theorems (5.3.1 and 5.3.2), K-R, and the proposed approaches give the same estimate of the denominator degrees of freedom and the same estimate for the scale, and since the statistic used is the same, then all approaches are identical. When = 1, then A1 = A2 . Theorem 5.3.4 r Proof r A1 = ∑∑ wij tr(ΘΦΡi Φ)tr(ΘΦΡ j Φ) , i =1 j =1 Since L′ΦL is a scalar, then Θ = r r A2 = ∑∑ wij tr(ΘΦΡ i ΦΘΦΡ j Φ) i =1 j =1 LL′ , L′ΦL L′ΦΡ i ΦL 1 ⎛ LL′ΦΡ i Φ ⎞ tr(ΘΦΡ i Φ) = tr ⎜ tr(L′ΦΡ i ΦL) = . = ⎟ L′ΦL ⎝ L′ΦL ⎠ L′ΦL r r L′ΦΡ i ΦLL′ΦΡ j ΦL and hence, A1 = ∑∑ wij (L′ΦL) 2 i =1 j =1 (5.10) Also, tr(ΘΦΡi ΦΘΦΡ j Φ) = = and hence, 1 tr(LL′ΦΡ i ΦLL′ΦΡ j Φ) (L′ΦL) 2 tr(L′ΦΡ i ΦLL′ΦΡ j ΦL) L′ΦΡ i ΦLL′ΦΡ j ΦL = (L′ΦL) 2 (L′ΦL) 2 r r L′ΦΡ i ΦLL′ΦΡ j ΦL A2 = ∑∑ wij (L′ΦL) 2 i =1 j =1 From expressions (5.10 and 5.11), we can see that A1 = A2 . Corollary 5.3.5 When = 1, the K-R and the proposed approaches are identical. (5.11) 66 The estimates of the denominator degrees of freedom and the scale are 2 and 1 A2 respectively. 
Proof   Since A₁ = A₂ when ℓ = 1 (theorem 5.3.4), corollary 5.3.3 applies, so the K-R and the proposed approaches are identical. Applying theorems 5.3.1 and 5.3.2 with ℓ = 1 completes the proof. □

6. KENWARD AND ROGER’S APPROXIMATION FOR THREE GENERAL MODELS

In chapter 4, we saw how the K-R approach was modified so that the approximations for the denominator degrees of freedom and the scale match the known values in the two special cases where F has an exact F distribution. In this chapter, we show that the K-R approximation produces the exact values of the denominator degrees of freedom and the scale factor for more general models. Three models are considered: the modified Rady model, which is more general than the balanced one-way Anova model; a general multivariate linear model, which is more general than the Hotelling T²; and a balanced multivariate model with a random group effect.

6.1 Modified Rady’s Model

A model that satisfies assumptions (A1)-(A5) will be called a Rady model. If the hypothesis satisfies conditions (A6) and (A7) and the model is a Rady model, the testing problem will be called a Rady testing problem. Rady (1986) proved that a Rady testing problem has an exact F test for a linear hypothesis about the fixed effects. The model considered in this section is slightly different from Rady’s model in that we add two more assumptions, (A8) and (A9); the resulting model will be called a modified Rady model. When the model is a modified Rady model and the hypothesis satisfies conditions (A6) and (A7), the testing problem is called a modified Rady testing problem.

6.1.1 Model and Assumptions

Consider the mixed linear model y ∼ N(Χβ, Σ), where

Σ = Cov[y] = σ₁²Σ₁ + ⋯ + σ²ᵣ₋₁Σᵣ₋₁ + σᵣ²Ι,

the Σᵢ are n.n.d. matrices, Χ is a known matrix, and β is a vector of unknown fixed-effect parameters. Let ℑ denote the set of all possible vectors σ, and let ℵ = {Σ(σ): σ ∈ ℑ}. We are interested in testing H₀: L′β = 0, for L′ a fixed (ℓ × p) matrix.
We assume that the model and the testing problem satisfy the following assumptions, where (A1)-(A5), (A8), and (A9) concern the model and (A6) and (A7) concern the hypothesis:

(A1) ℑ contains an r-dimensional rectangle.

(A2) Σ(σ) is p.d. for all σ.

(A3) β and σ are functionally independent.

(A4) The model has orthogonal block structure (OBS); that is, ℵ is commutative (i.e., Σ(σ)Σ(σ*) = Σ(σ*)Σ(σ) for all σ, σ* ∈ ℑ). Orthogonal block structure is equivalent to the spectral decomposition Σ(σ) = Σ_{k=1}^q α_k(σ)E_k, where α_k(σ) is a linear function of σ (i.e., α_k(σ) = Σ_{i=1}^r α_ki σ_i) and the E_k's are pairwise orthogonal o.p. matrices.

(A5) Zyskind's condition: a model is said to satisfy Zyskind's condition if P_XΣ = ΣP_X for all Σ.

(A6) Zyskind's condition for the submodel under the null hypothesis: ΣP_U = P_UΣ for all Σ, where ℜ(U) = {Xβ : L′β = 0}.

(A7) ℜ(M) ⊂ ℜ(B_s) for some s, where M = P_X − P_U and each E_k can be expressed as E_k = B_k + C_k, with B_k and C_k symmetric idempotent matrices, ℜ(C_k) ⊂ N(X′), ℜ(B_k) = ℜ(E_kX), and B_kC_k = 0.

(A8) The (I − P_X)Σ_i's are linearly independent.

(A9) q = r.

Remarks

(i) B_k is an o.p.o. on ℜ(E_kX), B_k = E_kP_X, and notice that ℜ(E_kX) ⊂ ℜ(E_k). Then C_k = E_k − B_k = E_k(I − P_X) is an o.p.o. on ℜ(E_k) ∩ ℜ(E_kX)^⊥ (Seely, 2002, problem 2.F.3).

(ii) ℜ(E_k) ∩ ℜ(E_kX)^⊥ ⊂ N(X′).
Proof: Take any x ∈ ℜ(E_k) ∩ ℜ(E_kX)^⊥. Then x ∈ ℜ(E_k) and x ∈ ℜ(E_kX)^⊥ = N(X′E_k), so E_kv = x for some v and X′E_kx = 0. Hence X′E_kE_kv = 0 ⇒ X′E_kv = 0 ⇒ X′x = 0, and this means that x ∈ N(X′).

(iii) From assumption (A7), ℜ(M) ⊂ ℜ(B_s) = ℜ(E_sX), and since ℜ(E_sX) ⊂ ℜ(E_s), we have ℜ(M) ⊂ ℜ(E_s), and hence E_sM = M.

(iv) Since L′β is estimable, L = X′A for some A with ℜ(A) ⊂ ℜ(X). Now r(L) = r(A) − dim[ℜ(A) ∩ N(X′)], and ℜ(A) ∩ N(X′) = ℜ(A) ∩ ℜ(X)^⊥ = 0 because ℜ(A) ⊂ ℜ(X). Therefore r(L) = r(A) = r(M) = ℓ.

(v) C_k ≠ 0 for all k.
Proof: Suppose C_k = 0 for some k; without loss of generality, suppose C_r = 0. Then Σ = Σ_{i=1}^r σ_iΣ_i = Σ_{k=1}^r α_k(σ)E_k implies

Σ_{i=1}^r σ_i(I − P_X)Σ_i = Σ_{k=1}^{r−1} α_k(σ)C_k,

and this combined with assumption (A1) implies that the (I − P_X)Σ_i's lie in span{C₁, …, C_{r−1}}, which contradicts assumption (A8) that the (I − P_X)Σ_i's are linearly independent.

(vi) B = {α_ki}_{r×r} is invertible.
Proof: Suppose Bσ = Bσ*. Then α_k(σ) = α_k(σ*) for all k, so Σ_k α_k(σ)C_k = Σ_k α_k(σ*)C_k, that is, (I − P_X)Σ_k α_k(σ)E_k = (I − P_X)Σ_k α_k(σ*)E_k, which gives Σ_i σ_i(I − P_X)Σ_i = Σ_i σ_i*(I − P_X)Σ_i. Since the (I − P_X)Σ_i's are linearly independent (assumption (A8)), σ_i = σ_i* for all i. Hence B is injective on ℑ, and by assumption (A1) it is injective on ℝ^r; since B is square, it is invertible.

(vii) The definition of OBS in (A4) was given by Birkes and Lee (2003) and is less restrictive than the definition given by VanLeeuwen et al. (1998).

Examples
1. Fixed-effects linear models, which include the balanced one-way Anova model, are modified Rady models.
2. Balanced mixed classification nested models are modified Rady models.
3. Many balanced mixed classification models, including balanced split-plot models, are modified Rady models.

6.1.2 Useful Properties of the Modified Rady Model

A testing problem that satisfies all or some of the modified Rady testing problem assumptions has several properties, which are summarized in this section.

Lemma 6.1.2.1  In a Rady model, Φ̂_A = Φ̂.

Proof  Consider

Q_ij − P_iΦP_j = X′(∂Σ⁻¹/∂σ_i) Σ (∂Σ⁻¹/∂σ_j)X − X′(∂Σ⁻¹/∂σ_i)X(X′Σ⁻¹X)⁻¹X′(∂Σ⁻¹/∂σ_j)X.

Note that Σ⁻¹X(X′Σ⁻¹X)⁻¹X′ is a p.o. on ℜ(Σ⁻¹X) along N(X′). Since Σ⁻¹ is a nonsingular matrix (by assumption) and n.n.d. (Birkes, 2004), it is a p.d. matrix (Seely, 2002, corollary 1.9.3), so r(X′Σ⁻¹X) = r(X′) (Birkes, 2004).
Since r(X′Σ⁻¹X) = r(X′) = r(P_X), we have

Σ⁻¹X(X′Σ⁻¹X)⁻¹X′ = Σ⁻¹P_X(P_XΣ⁻¹P_X)⁺P_X  (Seely, 2002, problem 1.10.B3).

Also,

P_XΣ⁻¹P_X = Σ_{k=1}^q P_X [1/α_k(σ)] E_kP_X = Σ_{k=1}^q [1/α_k(σ)] E_kP_X = Σ⁻¹P_X

⇒ (P_XΣ⁻¹P_X)⁺ = Σ_{k=1}^q α_k(σ)E_kP_X = ΣP_X,

and hence

Σ⁻¹X(X′Σ⁻¹X)⁻¹X′ = Σ⁻¹P_X(P_XΣ⁻¹P_X)⁺P_X = Σ⁻¹P_XΣP_X = Σ⁻¹ΣP_X = P_X  (Zyskind's condition).  (6.1)

Next,

P_X(∂Σ⁻¹/∂σ_i) Σ (∂Σ⁻¹/∂σ_j)P_X = P_XΣ⁻¹(∂Σ/∂σ_i)Σ⁻¹ Σ Σ⁻¹(∂Σ/∂σ_j)Σ⁻¹P_X
= P_X [Σ_k E_k/α_k(σ)][Σ_k α_ki E_k][Σ_k E_k/α_k(σ)][Σ_k α_kj E_k][Σ_k E_k/α_k(σ)] P_X
= Σ_{k=1}^q {α_ki α_kj/[α_k(σ)]³} E_kP_X.  (6.2)

Similarly,

P_X(∂Σ⁻¹/∂σ_i)X(X′Σ⁻¹X)⁻¹X′(∂Σ⁻¹/∂σ_j)P_X
= P_XΣ⁻¹(∂Σ/∂σ_i)Σ⁻¹X(X′Σ⁻¹X)⁻¹X′Σ⁻¹(∂Σ/∂σ_j)Σ⁻¹P_X
= P_XΣ⁻¹(∂Σ/∂σ_i)P_XΣ⁻¹(∂Σ/∂σ_j)Σ⁻¹P_X  (from (6.1))
= Σ_{k=1}^q {α_ki α_kj/[α_k(σ)]³} E_kP_X.  (6.3)

From expressions (6.2) and (6.3) we obtain

P_X(∂Σ⁻¹/∂σ_i) Σ (∂Σ⁻¹/∂σ_j)P_X − P_X(∂Σ⁻¹/∂σ_i)X(X′Σ⁻¹X)⁻¹X′(∂Σ⁻¹/∂σ_j)P_X = 0

⇒ X′(∂Σ⁻¹/∂σ_i) Σ (∂Σ⁻¹/∂σ_j)X − X′(∂Σ⁻¹/∂σ_i)X(X′Σ⁻¹X)⁻¹X′(∂Σ⁻¹/∂σ_j)X = 0.

This shows that Q_ij − P_iΦP_j = 0, and since R_ij = 0, Φ̂_A = Φ̂.

Corollary 6.1.2.2  In balanced mixed classification models, Φ̂_A = Φ̂.

Proof  Balanced mixed classification models are Rady models (Rady, 1986), and hence by applying lemma 6.1.2.1 we have Φ̂_A = Φ̂.

Lemma 6.1.2.3  For a Rady testing problem, A1 = A2.

Proof  Ω = {Xβ : β ∈ ℝ^p} = ℜ(X). Since we are interested in testing H₀: L′β = 0, L′β is assumed to be estimable ⇔ ℜ(L) ⊂ ℜ(X′) = ℜ(X′X) ⇔ L = X′XB for some B ⇔ L = X′A, where A = XB and ℜ(A) ⊂ ℜ(X); hence Ω_H = {Xβ : L′β = 0} = ℜ(X) ∩ N(A′) = ℜ(U).
Also, since ΣP_U = P_UΣ for all Σ (Zyskind's condition for the submodel) and ΣP_X = P_XΣ for all Σ (Zyskind's condition), we have ΣM = MΣ for all Σ, where M = P_X − P_U. Now

tr(ΘΦP_iΦ) = −tr[L(L′ΦL)⁻¹L′(X′Σ⁻¹X)⁻¹X′Σ⁻¹(∂Σ/∂σ_i)Σ⁻¹X(X′Σ⁻¹X)⁻¹]
= −tr[X′A(A′XΦX′A)⁻¹A′X(X′Σ⁻¹X)⁻¹X′Σ⁻¹(∂Σ/∂σ_i)Σ⁻¹X(X′Σ⁻¹X)⁻¹]
= −tr[P_XA(A′XΦX′A)⁻¹A′P_X(∂Σ/∂σ_i)]  (from (6.1))
= −tr[A(A′XΦX′A)⁻¹A′(∂Σ/∂σ_i)]  (because ℜ(A) ⊂ ℜ(X))
= −tr[A(A′ΣA)⁻¹A′(∂Σ/∂σ_i)],

where for the last step notice again from (6.1) that Σ⁻¹XΦX′ = P_X, so XΦX′ = ΣP_X and A′XΦX′A = A′ΣA.

Since ℜ(U) ⊂ ℜ(X), M is an o.p.o. on Ω ∩ Ω_H^⊥ (Seely, 2002, problem 2.F.3), and

Ω ∩ Ω_H^⊥ = ℜ(X) ∩ [ℜ(X) ∩ N(A′)]^⊥ = ℜ(X) ∩ [ℜ(X)^⊥ + ℜ(A)] = ℜ(X) ∩ ℜ(X)^⊥ + ℜ(X) ∩ ℜ(A) = ℜ(A).

So we have ℜ(M) = ℜ(A). Analogous to the argument giving (6.1) in the proof of lemma 6.1.2.1, we obtain ΣA(A′ΣA)⁻¹A′ = ΣM(MΣM)⁺M = M, and hence

tr(ΘΦP_iΦ) = −tr(Σ⁻¹M ∂Σ/∂σ_i).  (6.4)

⇒ A1 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr(Σ⁻¹M ∂Σ/∂σ_i) tr(Σ⁻¹M ∂Σ/∂σ_j),
A2 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr(Σ⁻¹M (∂Σ/∂σ_i) Σ⁻¹M (∂Σ/∂σ_j)).

A1 and A2 can also be expressed in terms of the E_k's as

A1 = Σ_iΣ_j w_ij [Σ_{k=1}^q (α_ki/α_k(σ)) tr(E_kM)][Σ_{k=1}^q (α_kj/α_k(σ)) tr(E_kM)]  (6.5)

and

A2 = Σ_iΣ_j w_ij Σ_{k=1}^q {α_ki α_kj/[α_k(σ)]²} tr(E_kM).  (6.6)

Since E_kM = ME_k (Zyskind's condition for the submodel), ℜ(E_kM) = ℜ(ME_k) ⊂ ℜ(M), and hence E_kM is an o.p.o. (Seely, 2002, problem 2.F.1).

⇒ tr(E_kM) = r(E_kM)  (Harville, 1997, corollary 10.2.2; also Seely's notes)
= r(M) − dim[ℜ(M) ∩ N(E_k)]  (Seely, 2002, proposition 1.8.2).

When k = s, dim[ℜ(M) ∩ N(E_s)] = 0 (assumption (A7)), and hence r(E_sM) = r(M) = ℓ (from remark (iv) above). When k ≠ s, ℜ(M) ⊂ ℜ(E_s) ⇒ E_kℜ(M) ⊂ E_kℜ(E_s) ⇒ ℜ(E_kM) ⊂ ℜ(E_kE_s) = 0, and then we have E_kM = 0.
Hence, expressions (6.5) and (6.6) can be simplified as

A1 = Σ_{i=1}^r Σ_{j=1}^r w_ij {α_si α_sj/[α_s(σ)]²} ℓ²  and  A2 = Σ_{i=1}^r Σ_{j=1}^r w_ij {α_si α_sj/[α_s(σ)]²} ℓ.

Lemma 6.1.2.4  For a modified Rady model, the approximation approach and the exact approach to deriving W = Var[σ̂_REML] are identical.

Proof

The Exact Approach

f(z) = (2π)^{−q/2} |K′ΣK|^{−1/2} exp[−½ z′(K′ΣK)⁻¹z],  z ∼ N(0, K′ΣK)

⇒ ℓ_R(σ) = constant − ½ log|K′ΣK| − ½ z′(K′ΣK)⁻¹z = constant − ½ log|K′ΣK| − ½ y′K(K′ΣK)⁻¹K′y.

Since B = {α_ki}_{r×r} is invertible (remark (vi) above), the map σ ↦ α = {α_k(σ)}_{r×1} is one to one. We reparametrize by setting γ_k = α_k(σ), and then

∂ℓ_R(γ)/∂γ_k = −½ tr[(K′ΣK)⁻¹K′(∂Σ/∂γ_k)K] + ½ y′K(K′ΣK)⁻¹K′(∂Σ/∂γ_k)K(K′ΣK)⁻¹K′y
= −½ tr(G ∂Σ/∂γ_k) + ½ y′G(∂Σ/∂γ_k)Gy,  (6.7)

where

G = Σ⁻¹ − Σ⁻¹X(X′Σ⁻¹X)⁻¹X′Σ⁻¹ = K(K′ΣK)⁻¹K′ = Σ⁻¹(I − P_X)  (from (6.1)).

⇒ ∂ℓ_R(γ)/∂γ_k = −½ tr(GE_k) + ½ y′GE_kGy, since Σ = Σ_{k=1}^r γ_kE_k
= −½ tr(C_k)/γ_k + ½ y′C_ky/γ_k², since G = Σ⁻¹(I − P_X) = Σ_{k=1}^r (1/γ_k)C_k.

Equating ∂ℓ_R(γ)/∂γ_k to zero,

α_k(σ̂)_REML = γ̂_k = y′C_ky/t_k  for k = 1, …, r,  (6.8)

where t_k = tr(C_k). Thus Bσ̂ = [α₁(σ̂), …, α_r(σ̂)]′, so

σ̂_REML = B⁻¹[α₁(σ̂), …, α_r(σ̂)]′ = B⁻¹s,  where s = {y′C_ky/t_k}_{r×1}.

The above expression is an explicit form for σ̂_REML, and

Var[σ̂_REML] = Var[B⁻¹s] = B⁻¹Var[s](B⁻¹)′.

Observe that for any two different elements of s,

Cov[y′C_iy, y′C_jy] = 2tr(C_iΣC_jΣ) + 4(Xβ)′C_iΣC_jXβ = 0  (Schott, 2005, theorem 10.22),

and

Cov[y′C_iy, y′C_iy] = 2tr(C_iΣC_iΣ) + 4(Xβ)′C_iΣC_iXβ = 2tr(C_iΣC_iΣ),

where

C_iΣC_iΣ = (I − P_X)E_i[Σ_k α_k(σ)E_k](I − P_X)E_i[Σ_k α_k(σ)E_k] = [α_i(σ)]²(I − P_X)E_i, because the E_k's are orthogonal o.p.
matrices, and hence C_iΣC_iΣ = [α_i(σ)]²C_i. So

Var[s] = diag{2[α_i(σ)]²/t_i}_{r×r} = 2D,

and hence

Var[σ̂_REML] = 2B⁻¹D(B⁻¹)′.  (6.9)

The Approximation Approach

W = Var[σ̂_REML] ≈ 2[{tr(G (∂Σ/∂σ_i) G (∂Σ/∂σ_j))}_{i,j=1}^r]⁻¹  (Searle et al., 1992, p. 253),

where

tr(G (∂Σ/∂σ_i) G (∂Σ/∂σ_j)) = tr[Σ⁻¹(I − P_X)(∂Σ/∂σ_i)Σ⁻¹(I − P_X)(∂Σ/∂σ_j)]
= tr[Σ_{k=1}^r {α_ki α_kj/[α_k(σ)]²} E_k(I − P_X)] = Σ_{k=1}^r {α_ki α_kj/[α_k(σ)]²} t_k,

with tr(C_k) = t_k ≥ 1. Define

Ψ = {Σ_{k=1}^r α_ki α_kj t_k/[α_k(σ)]²}_{i,j=1}^r = B′D⁻¹B,

where B and D are defined as above. Then Ψ⁻¹ = B⁻¹D(B′)⁻¹, and hence

W = Var[σ̂_REML] ≈ 2Ψ⁻¹ = 2B⁻¹D(B′)⁻¹.  (6.10)

From expressions (6.9) and (6.10), we see that both approaches give the same result.

Corollary 6.1.2.5  For a modified Rady testing problem, A1 = 2ℓ²/t_s and A2 = 2ℓ/t_s.

Proof  From the simplified expressions above,

A1 = Σ_iΣ_j w_ij {α_si α_sj/[α_s(σ)]²} ℓ²  and  A2 = Σ_iΣ_j w_ij {α_si α_sj/[α_s(σ)]²} ℓ.

Notice that Σ_iΣ_j α_si w_ij α_sj is an entry of BWB′; call it (BWB′)_ss, where B is as above. From expressions (6.9) or (6.10),

BWB′ = B[2B⁻¹D(B⁻¹)′]B′ = 2D,

so (BWB′)_ss = 2[α_s(σ)]²/t_s, and hence

A1 = ℓ² × 2[α_s(σ)]²/t_s × 1/[α_s(σ)]² = 2ℓ²/t_s  and  A2 = 2ℓ/t_s.

Lemma 6.1.2.6 (Rady, 1986)  Under the null hypothesis, the Anova test statistic F_R ∼ F(ℓ, t_s), where F_R = (t_s/ℓ)(y′My)/(y′C_sy).

Lemma 6.1.2.7  The K-R test statistic F = F_R.

Proof

F = (1/ℓ) β̂′L(L′Φ̂_A L)⁻¹L′β̂,

and since Φ̂_A = Φ̂ (lemma 6.1.2.1),

F = (1/ℓ) β̂′L(L′Φ̂L)⁻¹L′β̂.

(i) (L′Φ̂L)⁻¹ = [L′(X′Σ̂⁻¹X)⁻¹L]⁻¹ = [A′X(X′Σ̂⁻¹X)⁻¹X′A]⁻¹  (notice that L = X′A)
= [A′Σ̂P_XA]⁻¹  (utilizing expression (6.1), since X(X′Σ̂⁻¹X)⁻¹X′ = Σ̂P_X)
= [A′Σ̂A]⁻¹, because ℜ(A) ⊂ ℜ(X).
(ii) β̂ = (X′Σ̂⁻¹X)⁻¹X′Σ̂⁻¹y = (X′X)⁻¹X′y  (utilizing expression (6.1)).

From (i) and (ii),

F = (1/ℓ) y′X(X′X)⁻¹X′A[A′Σ̂A]⁻¹A′X(X′X)⁻¹X′y
= (1/ℓ) y′P_XA[A′Σ̂A]⁻¹A′P_Xy
= (1/ℓ) y′A[A′Σ̂A]⁻¹A′y  (because ℜ(A) ⊂ ℜ(X))
= (1/ℓ) y′Σ̂⁻¹My  (utilizing expression (6.4)),

but

Σ̂⁻¹M = Σ_{k=1}^q [1/α_k(σ̂)] E_kM = [1/α_s(σ̂)] M  (from the remarks of this section),

where α_s(σ̂) = y′C_sy/t_s (from expression (6.8)). Therefore

F = (t_s/ℓ) y′My/(y′C_sy) = F_R.

6.1.3 Estimating the Denominator Degrees of Freedom and the Scale Factor

From corollary 6.1.2.5, A1 = 2ℓ²/t_s and A2 = 2ℓ/t_s, and then A1 = ℓA2. By utilizing theorems 5.3.1 and 5.3.2 and corollary 5.3.3, the K-R and our proposed approaches are identical, with

m* = 2ℓ/A2 = t_s  and  λ* = 1.

The estimates of the denominator degrees of freedom and the scale factor match the exact values for the Anova test in lemma 6.1.2.6. So the K-R approach produces the exact values for the modified Rady model, which is more general than the balanced one-way Anova model, one of the two special cases considered in modifying the K-R approach.

6.2 A General Multivariate Linear Model

This model was studied by Mardia et al. (1992), who show that for this model there is an exact F test of a linear hypothesis about the fixed effects.

6.2.1 The Model Setting

Consider the model defined by Y = XB + U, where Y (n × p) is an observed matrix of p response variables on each of n individuals, X (n × q) is the design matrix, B (q × p) is a matrix of unknown parameters, and U is a matrix of unobserved random disturbances whose rows, for given X, are uncorrelated. We also assume that U is a data matrix from N_p(0, Σ_p). Writing

Y = [y₁′; …; y_n′],  X = [x₁′; …; x_n′],  B = [β₍₁₎′; …; β₍q₎′],  Σ = I_n ⊗ Σ_p,

the x_i's are q × 1 vectors, and the y_i's and β₍ᵢ₎'s are p × 1 vectors.

Suppose that we are interested in testing H₀: c′BM = 0, where c (q × 1) and M (p × r) are given matrices, and M has rank r.
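The computations for this model repeatedly rearrange Y into a vector and rely on the Kronecker-product identity (vec Y′)′(dd′ ⊗ A)(vec Y′) = d′YAY′d (Harville, 1997, theorem 16.2.2), where vec Y′ stacks the rows of Y. A minimal numerical check of that identity (Python with numpy; random inputs, sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3                                   # illustrative sizes
Y = rng.standard_normal((n, p))
d = rng.standard_normal(n)
S = rng.standard_normal((p, p))
A = S @ S.T                                   # any symmetric p x p matrix

y = Y.reshape(-1)                             # row-major stack of Y = vec(Y')
lhs = y @ np.kron(np.outer(d, d), A) @ y      # y'(dd' kron A)y
rhs = d @ Y @ A @ Y.T @ d                     # d'YAY'd
assert np.isclose(lhs, rhs)
```

The row-major `reshape(-1)` is exactly the stacking of the rows of Y, i.e., vec(Y′), which is why the Kronecker factor dd′ (of order n) comes first.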
Note  A special case of this general model is the Hotelling T² one-sample test, in which X = 1_n, B = μ′, c = 1, and M = I_p, so that r = p and q = 1. The Hotelling two-sample test is also a special case of this model.

Other Setting  To compute the Wald statistic for our problem, so that we can compute the estimates of the Kenward and Roger method, we rearrange Y as a vector:

y = [y₁; …; y_n],  β = [β₍₁₎; …; β₍q₎],

where y and β are np × 1 and pq × 1 vectors respectively. Notice that the design matrix for the new setting is X* = X ⊗ I_p. The testing problem H₀: c′BM = 0, which is

[c₁ ⋯ c_q][β₍₁₎′M; …; β₍q₎′M] = 0 ⇔ Σ_{j=1}^q c_j β₍ⱼ₎′M = 0 ⇔ Σ_{j=1}^q c_j M′β₍ⱼ₎ = 0 ⇔ [c₁M′ ⋯ c_qM′][β₍₁₎; …; β₍q₎] = 0,

can be written as H₀: L′β = 0, where L = c ⊗ M. So under this setting,

Φ = [(X ⊗ I_p)′(I_n ⊗ Σ_p⁻¹)(X ⊗ I_p)]⁻¹ = (X′X ⊗ Σ_p⁻¹)⁻¹ = (X′X)⁻¹ ⊗ Σ_p,

β̂ = [(X′X)⁻¹ ⊗ Σ_p](X′ ⊗ I_p)(I_n ⊗ Σ_p⁻¹)y = [(X′X)⁻¹X′ ⊗ I_p]y,

and

(L′ΦL)⁻¹ = [(c′ ⊗ M′)((X′X)⁻¹ ⊗ Σ_p)(c ⊗ M)]⁻¹ = (M′Σ_pM)⁻¹/[c′(X′X)⁻¹c].

6.2.2 Useful Properties of the Model

In this section, we derive several lemmas summarizing the useful properties of this model under the specified testing problem.

Lemma 6.2.2.1 (Mardia et al., 1992, chapter 6)  Let

F_M = c′(X′X)⁻¹X′YM(M′Σ̂_pM)⁻¹M′Y′X(X′X)⁻¹c / [(n − q) c′(X′X)⁻¹c],

where Σ̂_p is the REML estimator of Σ_p. Under the null hypothesis, F_M ∼ [r/(n − q − r + 1)] F_{r, n−q−r+1}.

Note  Similar to lemma 4.2.3, Σ̂_p = Σ_{i=1}^n (y_i − ŷ_i)(y_i − ŷ_i)′/(n − q), where ŷ_i = B̂′x_i.

Special Notation  Since σ = {σ_if}, i ≤ f, has p(p + 1)/2 elements, we use, as in the Hotelling T² case, the notation P_if, P_jg, Q_if,jg, R_if,jg, and w_if,jg.

Lemma 6.2.2.2  Φ̂_A = Φ̂.
Proof

P_if = (X′ ⊗ I_p)(I_n ⊗ ∂Σ_p⁻¹/∂σ_if)(X ⊗ I_p) = X′X ⊗ ∂Σ_p⁻¹/∂σ_if

⇒ P_ifΦP_jg = (X′X ⊗ ∂Σ_p⁻¹/∂σ_if)[(X′X)⁻¹ ⊗ Σ_p](X′X ⊗ ∂Σ_p⁻¹/∂σ_jg) = X′X ⊗ (∂Σ_p⁻¹/∂σ_if) Σ_p (∂Σ_p⁻¹/∂σ_jg),

and

Q_if,jg = (X′ ⊗ I_p)(I_n ⊗ ∂Σ_p⁻¹/∂σ_if)(I_n ⊗ Σ_p)(I_n ⊗ ∂Σ_p⁻¹/∂σ_jg)(X ⊗ I_p) = X′X ⊗ (∂Σ_p⁻¹/∂σ_if) Σ_p (∂Σ_p⁻¹/∂σ_jg)

⇒ Q_if,jg − P_ifΦP_jg = 0, and since R_if,jg = 0, we conclude that Φ̂_A = Φ̂.

Lemma 6.2.2.3  The K-R test statistic F = [(n − q)/r] F_M.

Proof

F = (1/r) β̂′L(L′Φ̂_A L)⁻¹L′β̂ = (1/r) β̂′L(L′Φ̂L)⁻¹L′β̂  (lemma 6.2.2.2)
= [1/(r c′(X′X)⁻¹c)] ([(X′X)⁻¹X′ ⊗ I_p]y)′(c ⊗ M)(M′Σ̂_pM)⁻¹(c′ ⊗ M′)[(X′X)⁻¹X′ ⊗ I_p]y
= [1/(r c′(X′X)⁻¹c)] y′{X(X′X)⁻¹cc′(X′X)⁻¹X′ ⊗ M(M′Σ̂_pM)⁻¹M′}y.

In the expression above we used ℓ = r, because dim(L′) = r × pq. In addition,

y′{X(X′X)⁻¹cc′(X′X)⁻¹X′ ⊗ M(M′Σ̂_pM)⁻¹M′}y = y′(dd′ ⊗ A)y = (vec Y′)′(dd′ ⊗ A)(vec Y′)

and

c′(X′X)⁻¹X′YM(M′Σ̂_pM)⁻¹M′Y′X(X′X)⁻¹c = d′YAY′d,

where d = X(X′X)⁻¹c, A = M(M′Σ̂_pM)⁻¹M′, and vec Y′ is the np-dimensional column vector obtained by stacking the columns of Y′ (that is, the rows of Y), so that y = vec Y′. Since (vec Y′)′(dd′ ⊗ A)(vec Y′) = d′YAY′d (Harville, 1997, theorem 16.2.2), the proof is completed.

Corollary 6.2.2.4  Under the null hypothesis, [(n − q − r + 1)/(n − q)] F ∼ F(r, n − q − r + 1).

Proof  Direct from lemmas 6.2.2.1 and 6.2.2.3.

Lemma 6.2.2.5  A1 = 2r/(n − q) and A2 = r(r + 1)/(n − q).

Proof

A1 = Σ_{i≤f} Σ_{j≤g} w_if,jg tr(ΘΦP_ifΦ) tr(ΘΦP_jgΦ)  and  A2 = Σ_{i≤f} Σ_{j≤g} w_if,jg tr(ΘΦP_ifΦΘΦP_jgΦ).
Now

Θ = L(L′ΦL)⁻¹L′ = (c ⊗ M)[(M′Σ_pM)⁻¹/(c′(X′X)⁻¹c)](c′ ⊗ M′) = cc′ ⊗ M(M′Σ_pM)⁻¹M′ / [c′(X′X)⁻¹c],

and

tr(ΘΦP_ifΦ) = [1/(c′(X′X)⁻¹c)] tr{[cc′ ⊗ M(M′Σ_pM)⁻¹M′][(X′X)⁻¹ ⊗ Σ_p][X′X ⊗ ∂Σ_p⁻¹/∂σ_if][(X′X)⁻¹ ⊗ Σ_p]}
= [1/(c′(X′X)⁻¹c)] tr[cc′(X′X)⁻¹ ⊗ M(M′Σ_pM)⁻¹M′Σ_p(∂Σ_p⁻¹/∂σ_if)Σ_p]
= −tr[M(M′Σ_pM)⁻¹M′ ∂Σ_p/∂σ_if].

Accordingly, A1 can be expressed as

A1 = Σ_{i≤f} Σ_{j≤g} w_if,jg tr[M(M′Σ_pM)⁻¹M′ ∂Σ_p/∂σ_if] tr[M(M′Σ_pM)⁻¹M′ ∂Σ_p/∂σ_jg],

and similarly,

A2 = Σ_{i≤f} Σ_{j≤g} w_if,jg tr[M(M′Σ_pM)⁻¹M′ (∂Σ_p/∂σ_if) M(M′Σ_pM)⁻¹M′ (∂Σ_p/∂σ_jg)].

Let M(M′Σ_pM)⁻¹M′ = T. Since T = {t_ij}_{p×p} is a symmetric matrix,

tr(T ∂Σ_p/∂σ_if) = t_ii for i = f and 2t_if for i ≠ f; that is, tr(T ∂Σ_p/∂σ_if) = 2^{1−δ_if} t_if, where δ_if = 1 for i = f and 0 for i ≠ f.  (6.11)

It is easy to check that

tr(T (∂Σ_p/∂σ_if) T (∂Σ_p/∂σ_jg)) = 2^{1−δ_if−δ_jg}(t_ig t_jf + t_fg t_ij).  (6.12)

Moreover, analogous to the proof given for expression (4.6) in the Hotelling T² case,

w_if,jg = Cov[σ̂_if, σ̂_jg] = (σ_ig σ_jf + σ_fg σ_ij)/(n − q).  (6.13)

Utilizing expressions (6.11) and (6.13),

A1 = Σ_{i≤f} Σ_{j≤g} [(σ_ig σ_jf + σ_fg σ_ij)/(n − q)] 2^{2−δ_if−δ_jg} t_if t_jg
= Σ_{i=1}^p Σ_{f=1}^p Σ_{j=1}^p Σ_{g=1}^p [(σ_ig σ_jf + σ_fg σ_ij)/(n − q)] t_if t_jg.

Note  In the expression above, we divided by 2^{2−δ_if−δ_jg} when passing to unrestricted summations, and this is valid because each w_if,jg term is repeated 2^{2−δ_if−δ_jg} times.
Splitting the last expression for A1 into its two quadruple sums, each equals tr(TΣ_pTΣ_p); for instance,

Σ_fΣ_g (Σ_i σ_ig t_if)(Σ_j σ_jf t_jg) = tr(TΣ_pTΣ_p).

Notice that TΣ_p = M(M′Σ_pM)⁻¹M′Σ_p is a p.o. on ℜ(M), so TΣ_pTΣ_p = TΣ_p and

tr(TΣ_p) = r(TΣ_p) = r(M) = r  (Seely, 2002, chapter 2).

Hence

A1 = [1/(n − q)](r + r) = 2r/(n − q).  (6.14)

Similarly, utilizing expressions (6.12) and (6.13),

A2 = Σ_{i≤f} Σ_{j≤g} [(σ_ig σ_jf + σ_fg σ_ij)/(n − q)] 2^{1−δ_if−δ_jg}(t_ig t_jf + t_fg t_ij)
= Σ_{i,f,j,g} (σ_ig σ_jf + σ_fg σ_ij)(t_ig t_jf + t_fg t_ij)/[2(n − q)]
= [1/(2(n − q))] {(Σ_{i,g} σ_ig t_ig)(Σ_{j,f} σ_jf t_jf) + Σ_{i,f}(Σ_g σ_ig t_fg)(Σ_j σ_jf t_ij) + Σ_{i,f}(Σ_g σ_fg t_ig)(Σ_j σ_ij t_jf) + (Σ_{f,g} σ_fg t_fg)(Σ_{i,j} σ_ij t_ij)}.

Since Σ_{j,f} σ_jf t_jf = tr(TΣ_p) = r(TΣ_p) = r(M) = r and each of the two cross terms equals tr(TΣ_pTΣ_p) = r (from above), we therefore obtain

A2 = [1/(2(n − q))](r² + r + r + r²) = r(r + 1)/(n − q).

6.2.3 Estimating the Denominator Degrees of Freedom and the Scale

By lemma 6.2.2.5,

A1 = 2r/(n − q) and A2 = r(r + 1)/(n − q), and then A1/A2 = 2/(r + 1);  (6.15)

that is, A1 = 2ℓA2/(ℓ + 1) with ℓ = r. By utilizing theorems 5.3.1 and 5.3.2 and corollary 5.3.3, the K-R and the proposed approaches are identical, with

m* = ℓ(ℓ + 1)/A2 − (ℓ − 1) = r(r + 1)/A2 − (r − 1) = n − q − r + 1  (notice that ℓ = r)

and

λ* = 1 − (r − 1)A2/[r(r + 1)] = 1 − (r − 1)/(n − q) = (n − q − r + 1)/(n − q).

The estimates of the denominator degrees of freedom and the scale match the values for the exact multivariate test in corollary 6.2.2.4.
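Lemma 6.2.2.5 can be checked numerically from the trace and covariance identities (6.11)-(6.13): with T = M(M′Σ_pM)⁻¹M′ and w_if,jg = (σ_ig σ_jf + σ_fg σ_ij)/(n − q), the unrestricted-index sums reproduce A1 = 2r/(n − q) and A2 = r(r + 1)/(n − q). A sketch (Python with numpy; random Σ_p and M, sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p, r, n, q = 5, 3, 20, 2                      # illustrative dimensions
Z = rng.standard_normal((p, p))
S = Z @ Z.T + p * np.eye(p)                   # Sigma_p, positive definite
M = rng.standard_normal((p, r))               # has rank r with probability 1
T = M @ np.linalg.solve(M.T @ S @ M, M.T)     # T = M (M' S M)^{-1} M'

# w_{if,jg} = (s_ig s_jf + s_fg s_ij)/(n - q), over unrestricted indices
w = (np.einsum('ig,jf->ifjg', S, S) + np.einsum('fg,ij->ifjg', S, S)) / (n - q)
A1 = np.einsum('ifjg,if,jg->', w, T, T)
A2 = 0.5 * (np.einsum('ifjg,ig,jf->', w, T, T) + np.einsum('ifjg,fg,ij->', w, T, T))
assert np.isclose(A1, 2 * r / (n - q))
assert np.isclose(A2, r * (r + 1) / (n - q))
```

The check exercises the projection fact tr(TΣ_pTΣ_p) = tr(TΣ_p) = r that drives the closed forms (6.14) and the A2 result.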
So the K-R approach produces the exact values for the general multivariate linear model considered in this section, which is more general than the Hotelling T² model.

6.3 A Balanced Multivariate Model with a Random Group Effect

This model was studied by Birkes (2006), who showed that, under a specific condition given below, there exists an exact F test of a linear hypothesis about the fixed effects. In this section we prove that the K-R and the proposed approaches likewise give the exact values for the denominator degrees of freedom and the scale.

6.3.1 Model Setting

Suppose an experiment yields measurements on p characteristics for a sample of subjects, and suppose the subjects are in t groups each of size m. The measurement of the k-th characteristic of the j-th subject in the i-th group is denoted by y_ijk (i = 1, …, t; j = 1, …, m; k = 1, …, p). The experiment is designed so that the j-th subject in each group is associated with q covariates x_jl (l = 1, …, q), not depending on i. The design is balanced in so far as the groups all have the same size m and the same set of covariates. Each measurement y_ijk includes a random group effect a_ik and a random subject error e_ijk.

Notation

X = [x₁′; …; x_m′] (m × q),  x_j = [x_j1; …; x_jq],  Y = [Y₁; …; Y_t],  Y_i = [y_i1′; …; y_im′] (m × p),  y_ij = [y_ij1; …; y_ijp],
B = [β⁽¹⁾ ⋯ β⁽ᵖ⁾] (q × p),  β⁽ᵏ⁾ = [β_k1; …; β_kq],  a_i = [a_i1; …; a_ip],  e_ij = [e_ij1; …; e_ijp].

Model

y_ijk = β⁽ᵏ⁾′x_j + a_ik + e_ijk, hence y_ij = B′x_j + a_i + e_ij.

We assume the β_kl are fixed unknown parameters, the a_i are i.i.d. N_p(0, Λ), the e_ij are i.i.d. N_p(0, Σ_p), and the a_i are independent of the e_ij. We also require 1_m ∈ ℜ(X).

Hypothesis

H₀: c′BM = 0, where c is q × 1 and M is p × r. We require 1_m′X(X′X)⁻¹c = 0.
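In the vectorized form used in the next subsection, each group of observations contributes the covariance matrix Δ = I_m ⊗ Σ_p + mP_1m ⊗ Λ, which is inverted spectrally in expression (6.16) below. A numerical check of that inverse (Python with numpy; random positive definite Σ_p and Λ, sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 4, 3                                   # illustrative sizes
Zs = rng.standard_normal((p, p)); Sigma_p = Zs @ Zs.T + p * np.eye(p)
Zl = rng.standard_normal((p, p)); Lam = Zl @ Zl.T + p * np.eye(p)
P1 = np.full((m, m), 1.0 / m)                 # orthogonal projection onto span{1_m}

# Delta = I_m (x) Sigma_p + m P_1m (x) Lambda
Delta = np.kron(np.eye(m), Sigma_p) + m * np.kron(P1, Lam)
# Spectral split (6.16): (I - P_1m) (x) Sigma_p^{-1} + P_1m (x) (Sigma_p + m Lambda)^{-1}
Delta_inv = (np.kron(np.eye(m) - P1, np.linalg.inv(Sigma_p))
             + np.kron(P1, np.linalg.inv(Sigma_p + m * Lam)))
assert np.allclose(Delta @ Delta_inv, np.eye(m * p))
```

The split works because I_m − P_1m and P_1m are complementary orthogonal projections, so Δ acts as Σ_p on one subspace and as Σ_p + mΛ on the other.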
Other Setting  To compute the Wald statistic for our problem, and in order to compare the statistic mentioned above with the Wald statistic, we rearrange Y as a vector:

y = vec(Y′),  β = [τ₁; …; τ_q],  where τ_l = [β_1l; …; β_pl],

so that y and β are mtp × 1 and pq × 1 vectors respectively. Notice that

B = [β⁽¹⁾ ⋯ β⁽ᵖ⁾] = [τ₁′; …; τ_q′],

the design matrix for the new setting is X* = 1_t ⊗ (X ⊗ I_p), and Σ = I_t ⊗ Δ, where Δ = I_m ⊗ Σ_p + mP_1m ⊗ Λ.

Special Notation  As in sections 4.2 and 6.2, we use the notation P_if, Q_if,jg, R_if,jg, and w_if,jg.

6.3.2 Useful Properties of the Model

In this section, we derive several lemmas that establish useful properties of the model and the testing problem.

Lemma 6.3.2.1  Φ̂_A = Φ̂.

Proof

Δ = I_m ⊗ Σ_p + mP_1m ⊗ Λ = (I_m − P_1m) ⊗ Σ_p + P_1m ⊗ (Σ_p + mΛ),

Δ⁻¹ = (I_m − P_1m) ⊗ Σ_p⁻¹ + P_1m ⊗ (Σ_p + mΛ)⁻¹.  (6.16)

⇒ (X′ ⊗ I_p)Δ⁻¹(X ⊗ I_p) = X′(I_m − P_1m)X ⊗ Σ_p⁻¹ + X′P_1mX ⊗ (Σ_p + mΛ)⁻¹.

Hence

Φ = [(1_t ⊗ X ⊗ I_p)′(I_t ⊗ Δ⁻¹)(1_t ⊗ X ⊗ I_p)]⁻¹ = (1/t)[(X′ ⊗ I_p)Δ⁻¹(X ⊗ I_p)]⁻¹
= (1/t)[X′(I_m − P_1m)X ⊗ Σ_p⁻¹ + X′P_1mX ⊗ (Σ_p + mΛ)⁻¹]⁻¹
= (1/t)[(X′X)⁻¹X′(I_m − P_1m)X(X′X)⁻¹ ⊗ Σ_p + (X′X)⁻¹X′P_1mX(X′X)⁻¹ ⊗ (Σ_p + mΛ)].

Also,

P_if = (1_t ⊗ X ⊗ I_p)′(I_t ⊗ ∂Δ⁻¹/∂σ_if)(1_t ⊗ X ⊗ I_p) = t(X′ ⊗ I_p)(∂Δ⁻¹/∂σ_if)(X ⊗ I_p).

Hence,

P_ifΦP_jg = t²(X′ ⊗ I_p)(∂Δ⁻¹/∂σ_if)(X ⊗ I_p)Φ(X′ ⊗ I_p)(∂Δ⁻¹/∂σ_jg)(X ⊗ I_p),

and

Q_if,jg = (1_t′ ⊗ X′ ⊗ I_p)(I_t ⊗ ∂Δ⁻¹/∂σ_if)(I_t ⊗ Δ)(I_t ⊗ ∂Δ⁻¹/∂σ_jg)(1_t ⊗ X ⊗ I_p) = t(X′ ⊗ I_p)(∂Δ⁻¹/∂σ_if) Δ (∂Δ⁻¹/∂σ_jg)(X ⊗ I_p).
⇒ Q_if,jg − P_ifΦP_jg = t(X′ ⊗ I_p)(∂Δ⁻¹/∂σ_if){Δ − t(X ⊗ I_p)Φ(X′ ⊗ I_p)}(∂Δ⁻¹/∂σ_jg)(X ⊗ I_p).  (6.17)

Consider

t(X ⊗ I_p)Φ(X′ ⊗ I_p)
= (X ⊗ I_p){(X′X)⁻¹X′(I_m − P_1m)X(X′X)⁻¹ ⊗ Σ_p + (X′X)⁻¹X′P_1mX(X′X)⁻¹ ⊗ (Σ_p + mΛ)}(X′ ⊗ I_p)
= X(X′X)⁻¹X′(I_m − P_1m)X(X′X)⁻¹X′ ⊗ Σ_p + X(X′X)⁻¹X′P_1mX(X′X)⁻¹X′ ⊗ (Σ_p + mΛ)
= P_X(I_m − P_1m)P_X ⊗ Σ_p + P_XP_1mP_X ⊗ (Σ_p + mΛ)
= (P_X − P_1m) ⊗ Σ_p + P_1m ⊗ (Σ_p + mΛ)  (observe that 1_m ∈ ℜ(X)).

⇒ Δ − t(X ⊗ I_p)Φ(X′ ⊗ I_p) = (I_m − P_1m) ⊗ Σ_p − (P_X − P_1m) ⊗ Σ_p = (I_m − P_X) ⊗ Σ_p.

Moreover,

{Δ − t(X ⊗ I_p)Φ(X′ ⊗ I_p)}(∂Δ⁻¹/∂σ_jg)(X ⊗ I_p)
= {(I_m − P_X) ⊗ Σ_p}{(I_m − P_1m) ⊗ ∂Σ_p⁻¹/∂σ_jg + P_1m ⊗ ∂(Σ_p + mΛ)⁻¹/∂σ_jg}(X ⊗ I_p)
= (I_m − P_X)(I_m − P_1m)X ⊗ Σ_p(∂Σ_p⁻¹/∂σ_jg) + (I_m − P_X)P_1mX ⊗ Σ_p ∂(Σ_p + mΛ)⁻¹/∂σ_jg
= 0.

This implies Q_if,jg − P_ifΦP_jg = 0, and since R_if,jg = 0, Φ̂_A = Φ̂.

Lemma 6.3.2.2 (Birkes, 2006)  Let

F_d = [(tm − t − q − r + 2)/r] · t · c′(X′X)⁻¹X′ȲM[M′Y′QYM]⁻¹M′Ȳ′X(X′X)⁻¹c / [c′(X′X)⁻¹c],

where Ȳ = Σ_{i=1}^t Y_i/t and Q = P_1t ⊗ (I_m − P_X) + (I_t − P_1t) ⊗ (I_m − P_1m). Under H₀, F_d ∼ F_{r, tm−t−q−r+2}.

Lemma 6.3.2.3

F = (t/r) · c′(X′X)⁻¹X′ȲM(M′Σ̂_pM)⁻¹M′Ȳ′X(X′X)⁻¹c / [c′(X′X)⁻¹c],

where Σ̂_p is the REML estimate of Σ_p.
Proof  Since Φ̂_A = Φ̂ (lemma 6.3.2.1), F = (1/r) β̂′L(L′Φ̂L)⁻¹L′β̂.

(i) β̂ = Φ̂(1_t′ ⊗ X′ ⊗ I_p)(I_t ⊗ Δ̂⁻¹)y = {1_t′ ⊗ Φ̂(X′ ⊗ I_p)Δ̂⁻¹}y.

From the expression for Φ above,

Φ(X′ ⊗ I_p) = (1/t){(X′X)⁻¹X′(I_m − P_1m)X(X′X)⁻¹ ⊗ Σ_p + (X′X)⁻¹X′P_1mX(X′X)⁻¹ ⊗ (Σ_p + mΛ)}(X′ ⊗ I_p)
= (1/t){(X′X)⁻¹X′(I_m − P_1m)P_X ⊗ Σ_p + (X′X)⁻¹X′P_1mP_X ⊗ (Σ_p + mΛ)}
= (1/t){(X′X)⁻¹X′(P_X − P_1m) ⊗ Σ_p + (X′X)⁻¹X′P_1m ⊗ (Σ_p + mΛ)},

and from (6.16) we have

Φ(X′ ⊗ I_p)Δ⁻¹
= (1/t){(X′X)⁻¹X′(P_X − P_1m) ⊗ Σ_p + (X′X)⁻¹X′P_1m ⊗ (Σ_p + mΛ)}{(I_m − P_1m) ⊗ Σ_p⁻¹ + P_1m ⊗ (Σ_p + mΛ)⁻¹}
= (1/t){(X′X)⁻¹X′(P_X − P_1m)(I_m − P_1m) ⊗ I_p + (X′X)⁻¹X′(P_X − P_1m)P_1m ⊗ Σ_p(Σ_p + mΛ)⁻¹
+ (X′X)⁻¹X′P_1m(I_m − P_1m) ⊗ (Σ_p + mΛ)Σ_p⁻¹ + (X′X)⁻¹X′P_1m ⊗ I_p}
= (1/t)(X′X)⁻¹X′ ⊗ I_p.

So

β̂ = (1/t){1_t′ ⊗ (X′X)⁻¹X′ ⊗ I_p}y,

and

L′β̂ = (1/t)(c′ ⊗ M′){1_t′ ⊗ (X′X)⁻¹X′ ⊗ I_p}y = (1/t){1_t′ ⊗ c′(X′X)⁻¹X′ ⊗ M′}y.

(ii)

L′ΦL = (1/t)(c′ ⊗ M′){(X′X)⁻¹X′(I_m − P_1m)X(X′X)⁻¹ ⊗ Σ_p + (X′X)⁻¹X′P_1mX(X′X)⁻¹ ⊗ (Σ_p + mΛ)}(c ⊗ M)
= (1/t){c′(X′X)⁻¹X′(I_m − P_1m)X(X′X)⁻¹c ⊗ M′Σ_pM + c′(X′X)⁻¹X′P_1mX(X′X)⁻¹c ⊗ M′(Σ_p + mΛ)M}
= (1/t) c′(X′X)⁻¹c · M′Σ_pM  (notice that 1_m′X(X′X)⁻¹c = 0).  (6.18)

Accordingly, from (i) and (ii),

F = [1/(rt)] y′{1_t ⊗ X(X′X)⁻¹c ⊗ M}(M′Σ̂_pM)⁻¹{1_t′ ⊗ c′(X′X)⁻¹X′ ⊗ M′}y / [c′(X′X)⁻¹c]
= [1/(rt)] y′{[1_t ⊗ X(X′X)⁻¹c][1_t′ ⊗ c′(X′X)⁻¹X′] ⊗ M(M′Σ̂_pM)⁻¹M′}y / [c′(X′X)⁻¹c]
= [1/(rt)] y′{dd′ ⊗ A}y / [c′(X′X)⁻¹c],

where d = 1_t ⊗ X(X′X)⁻¹c and A = M(M′Σ̂_pM)⁻¹M′. Since

y′{dd′ ⊗ A}y = (vec Y′)′{dd′ ⊗ A}(vec Y′) = d′YAY′d  (Harville, 1997, theorem 16.2.2),

we have

F = [1/(rt)] {1_t′ ⊗ c′(X′X)⁻¹X′}YM(M′Σ̂_pM)⁻¹M′Y′{1_t ⊗ X(X′X)⁻¹c} / [c′(X′X)⁻¹c].
Moreover, since

{1_t′ ⊗ c′(X′X)⁻¹X′}Y = Σ_{i=1}^t c′(X′X)⁻¹X′Y_i = c′(X′X)⁻¹X′ Σ_{i=1}^t Y_i = t c′(X′X)⁻¹X′Ȳ,

we obtain

F = (t/r) c′(X′X)⁻¹X′ȲM(M′Σ̂_pM)⁻¹M′Ȳ′X(X′X)⁻¹c / [c′(X′X)⁻¹c],

where Σ̂_p is the REML estimate of Σ_p.

Lemma 6.3.2.4  The REML estimator of Σ_p is

Σ̂_p = Y′QY/(mt − q − t + 1),  where Q = P_1t ⊗ (I_m − P_X) + (I_t − P_1t) ⊗ (I_m − P_1m).

Proof  Choose T such that TT′ = I_mt − P_1t ⊗ P_X and T′T = I_{mt−q}. Notice that ℜ(T′) = ℜ(T′T), so r(T′) = r(T′T) = mt − q, and T′(1_t ⊗ X) = 0. Define K = T ⊗ I_p, and notice that

K′(1_t ⊗ X ⊗ I_p) = (T′ ⊗ I_p)(1_t ⊗ X ⊗ I_p) = T′(1_t ⊗ X) ⊗ I_p = 0.  (6.19)

Then

ℓ_R = constant − ½ log|K′(I_t ⊗ Δ)K| − ½ y′K[K′(I_t ⊗ Δ)K]⁻¹K′y.

Let D = K′(I_t ⊗ Δ)K = K′(I_t ⊗ Δ)KK′K. Now

(I_t ⊗ Δ)KK′ = (I_t ⊗ Δ)(I_mpt − P_1t ⊗ P_X ⊗ I_p) = I_t ⊗ Δ − P_1t ⊗ [Δ(P_X ⊗ I_p)],

where

I_t ⊗ Δ = I_t ⊗ (I_m ⊗ Σ_p + P_1m ⊗ mΛ) = I_mt ⊗ Σ_p + I_t ⊗ P_1m ⊗ mΛ

and

P_1t ⊗ [Δ(P_X ⊗ I_p)] = P_1t ⊗ (P_X ⊗ Σ_p + P_1m ⊗ mΛ) = P_1t ⊗ P_X ⊗ Σ_p + P_1t ⊗ P_1m ⊗ mΛ.

Hence

(I_t ⊗ Δ)KK′ = (I_mt − P_1t ⊗ P_X) ⊗ Σ_p + (I_t ⊗ P_1m − P_1t ⊗ P_1m) ⊗ mΛ

⇒ D = K′[(I_t ⊗ Δ)KK′]K
= (T′ ⊗ I_p){(I_mt − P_1t ⊗ P_X) ⊗ Σ_p + (I_t ⊗ P_1m − P_1t ⊗ P_1m) ⊗ mΛ}(T ⊗ I_p)
= T′T ⊗ Σ_p − [T′(P_1t ⊗ P_X)T] ⊗ Σ_p + [T′(I_t ⊗ P_1m)T] ⊗ mΛ − [T′(P_1t ⊗ P_1m)T] ⊗ mΛ.

Since from expression (6.19) we have T′(1_t ⊗ X) = 0, it follows that [T′(P_1t ⊗ P_X)T] ⊗ Σ_p = 0. Also, since 1_m ∈ ℜ(X), [T′(P_1t ⊗ P_1m)T] ⊗ mΛ = 0. Hence

ℓ_R = constant − ½ log|D| − ½ y′KD⁻¹K′y,  where D = I_{mt−q} ⊗ Σ_p + [T′(I_t ⊗ P_1m)T] ⊗ mΛ.
We may re-express D as

D = [I_{mt−q} − T′(I_t ⊗ P_1m)T] ⊗ Σ_p + [T′(I_t ⊗ P_1m)T] ⊗ (Σ_p + mΛ).  (6.20)

Notice that T′(I_t ⊗ P_1m)T is a p.o., because

T′(I_t ⊗ P_1m)TT′(I_t ⊗ P_1m)T = T′(I_t ⊗ P_1m)(I_mt − P_1t ⊗ P_X)(I_t ⊗ P_1m)T
= T′{(I_t ⊗ P_1m − P_1t ⊗ P_1m)(I_t ⊗ P_1m)}T = T′(I_t ⊗ P_1m − P_1t ⊗ P_1m)T = T′(I_t ⊗ P_1m)T

(notice that T′(P_1t ⊗ P_1m) = 0). Moreover,

r[T′(I_t ⊗ P_1m)T] = tr[T′(I_t ⊗ P_1m)T] = tr[TT′(I_t ⊗ P_1m)] = tr[(I_mt − P_1t ⊗ P_X)(I_t ⊗ P_1m)]
= tr(I_t ⊗ P_1m) − tr(P_1t ⊗ P_1m) = t − 1.  (6.21)

Next,

KD⁻¹K′ = (T ⊗ I_p){[I_{mt−q} − T′(I_t ⊗ P_1m)T] ⊗ Σ_p⁻¹ + [T′(I_t ⊗ P_1m)T] ⊗ (Σ_p + mΛ)⁻¹}(T′ ⊗ I_p)
= [TT′ − TT′(I_t ⊗ P_1m)TT′] ⊗ Σ_p⁻¹ + [TT′(I_t ⊗ P_1m)TT′] ⊗ (Σ_p + mΛ)⁻¹,

where

[TT′ − TT′(I_t ⊗ P_1m)TT′] ⊗ Σ_p⁻¹
= [I_mt − P_1t ⊗ P_X − (I_mt − P_1t ⊗ P_X)(I_t ⊗ P_1m)(I_mt − P_1t ⊗ P_X)] ⊗ Σ_p⁻¹
= [I_mt − P_1t ⊗ P_X − I_t ⊗ P_1m + P_1t ⊗ P_1m] ⊗ Σ_p⁻¹

and

[TT′(I_t ⊗ P_1m)TT′] ⊗ (Σ_p + mΛ)⁻¹ = (I_t ⊗ P_1m − P_1t ⊗ P_1m) ⊗ (Σ_p + mΛ)⁻¹.

Hence

KD⁻¹K′ = Q ⊗ Σ_p⁻¹ + (I_t ⊗ P_1m − P_1t ⊗ P_1m) ⊗ (Σ_p + mΛ)⁻¹,  (6.22)

where

I_mt − P_1t ⊗ P_X − I_t ⊗ P_1m + P_1t ⊗ P_1m
= P_1t ⊗ I_m − P_1t ⊗ P_X + I_mt − I_t ⊗ P_1m + P_1t ⊗ P_1m − P_1t ⊗ I_m
= P_1t ⊗ (I_m − P_X) + (I_t − P_1t) ⊗ (I_m − P_1m) = Q.

Since T′(I_t ⊗ P_1m)T, I_{mt−q} − T′(I_t ⊗ P_1m)T, Σ_p, and Σ_p + mΛ are n.n.d.
matrices, so are [I_{mt−q} − T′(I_t ⊗ P_1m)T] ⊗ Σ_p and [T′(I_t ⊗ P_1m)T] ⊗ (Σ_p + mΛ) (Harville, 1997, p. 369). Hence

[I_{mt−q} − T′(I_t ⊗ P_1m)T] ⊗ Σ_p = GG′  and  [T′(I_t ⊗ P_1m)T] ⊗ (Σ_p + mΛ) = HH′

for some full column rank matrices G and H (Seely, 2002, problem 2.D.2). Notice that since

{[I_{mt−q} − T′(I_t ⊗ P_1m)T] ⊗ Σ_p}{[T′(I_t ⊗ P_1m)T] ⊗ (Σ_p + mΛ)} = 0,

we have GG′HH′ = 0 ⇒ G′GG′HH′H = 0 ⇒ G′H = 0. Hence

|D| = |GG′ + HH′| = |[G, H][G, H]′| = |[G, H]′[G, H]| = |diag(G′G, H′H)| = |G′G||H′H|.

Recall that T′(I_t ⊗ P_1m)T is a projection operator with r[T′(I_t ⊗ P_1m)T] = t − 1, and hence

|G′G| = |Σ_p|^{mt−q−t+1}  and  |H′H| = |Σ_p + mΛ|^{t−1}.

In addition, by using expression (6.22), we obtain

y′KD⁻¹K′y = y′(Q ⊗ Σ_p⁻¹)y + y′[(I_t ⊗ P_1m − P_1t ⊗ P_1m) ⊗ (Σ_p + mΛ)⁻¹]y.

Accordingly,

ℓ_R = constant − [(mt − q − t + 1)/2] log|Σ_p| − ½ y′(Q ⊗ Σ_p⁻¹)y + f(Σ_p + mΛ).

Since Σ_p and Λ are unstructured, Σ_p and Σ_p + mΛ may be treated as functionally independent parameters. This means that to maximize ℓ_R with respect to Σ_p, it suffices to maximize

ℓ*_R = −[(mt − q − t + 1)/2] log|Σ_p| − ½ y′(Q ⊗ Σ_p⁻¹)y

with respect to Σ_p, where

y′(Q ⊗ Σ_p⁻¹)y = (vec Y′)′(Q ⊗ Σ_p⁻¹)(vec Y′) = tr(YΣ_p⁻¹Y′Q)  (Harville, 1997, theorem 16.2.2)
= tr(Σ_p⁻¹Y′QY).

So

ℓ*_R = −[(mt − q − t + 1)/2] log|Σ_p| − ½ tr(Σ_p⁻¹Y′QY),

and hence Σ̂_p = Y′QY/(mt − q − t + 1) (Anderson, 2003, lemma 3.2.2).

Note  We have two expressions for Q:

Q = P_1t ⊗ (I_m − P_X) + (I_t − P_1t) ⊗ (I_m − P_1m)  and  Q = TT′ − TT′(I_t ⊗ P_1m)TT′.

Q is a p.o. and r(Q) = mt − q − t + 1.

Corollary 6.3.2.5  [(mt − t − q − r + 2)/(mt − q − t + 1)] F = F_d.

Proof  By combining the results of lemmas 6.3.2.3 and 6.3.2.4, we obtain

F = [t(mt − q − t + 1)/r] c′(X′X)⁻¹X′ȲM(M′Y′QYM)⁻¹M′Ȳ′X(X′X)⁻¹c / [c′(X′X)⁻¹c],

and therefore [(mt − t − q − r + 2)/(mt − q − t + 1)] F = F_d.

Corollary 6.3.2.6  Under the null hypothesis,

[(mt − t − q − r + 2)/(mt − q − t + 1)] F ∼ F(r, tm − t − q − r + 2).

Proof  Direct from lemma 6.3.2.2 and corollary 6.3.2.5.

Lemma 6.3.2.7  For this model, the approximation approach and the exact approach to deriving W = Var[σ̂_REML] are identical.

Proof

Approximation Approach

ℓ*_R = −[(mt − q − t + 1)/2] log|Σ_p| − ½ y′(Q ⊗ Σ_p⁻¹)y,

∂ℓ*_R/∂σ_if = −[(mt − q − t + 1)/2] tr(Σ_p⁻¹ ∂Σ_p/∂σ_if) − ½ y′(Q ⊗ ∂Σ_p⁻¹/∂σ_if)y,

∂²ℓ*_R/∂σ_if∂σ_jg = −[(mt − q − t + 1)/2] tr((∂Σ_p⁻¹/∂σ_jg)(∂Σ_p/∂σ_if)) − ½ y′(Q ⊗ ∂²Σ_p⁻¹/∂σ_if∂σ_jg)y

⇒ −E[∂²ℓ*_R/∂σ_if∂σ_jg] = [(mt − q − t + 1)/2] tr((∂Σ_p⁻¹/∂σ_jg)(∂Σ_p/∂σ_if))
, c′( Χ′Χ)−1 c r mt − t − q − r + 2 F = Fd . and therefore mt − q − t + 1 −1 Corollary 6.3.2.6 Under the null hypothesis, mt − t − q − r + 2 F ∼ F(r , tm − t − q − r + 2) mt − q − t + 1 Proof Direct from lemma 6.3.2.2 and corollary 6.3.2.5. Lemma 6.3.2.7 For this model, the approximation and the exact approaches to derive W = Var[σˆ REML ] are identical. Proof ∗ R Approximation approach 1 1 = − (mt − q − t + 1) log Σ p − y ′ ( Q ⊗ Σ −p1 ) y, 2 2 ⎛ ∂Σ p 1 ∂ ∗R = − (mt − q − t + 1)tr ⎜ Σ −p1 ⎜ ∂σ if ∂σ if 2 ⎝ ⎞ 1 ⎛ ∂Σ −p1 ⎞ ′ − y Q ⊗ ⎟⎟ ⎜⎜ ⎟⎟ y, σ 2 ∂ if ⎠ ⎝ ⎠ ⎛ ∂Σ −p1 ∂Σ p 1 ∂ 2 ∗R = − (mt − q − t + 1)tr ⎜ ⎜ ∂σ ∂σ ∂σ if ∂σ jg 2 jg if ⎝ ⎞ 1 ⎛ ∂ 2 Σ −p1 ⎟⎟ − y ′ ⎜⎜ Q ⊗ ∂σ if ∂σ jg ⎠ 2 ⎝ ⎡ ∂ 2 ∗R ⎤ 1 ⎛ ∂Σ −p1 ∂Σ p mt q t ( 1)tr ⇒ −E ⎢ = − − + ⎜⎜ ⎥ ⎣⎢ ∂σ if ∂σ jg ⎦⎥ 2 ⎝ ∂σ jg ∂σ if ⎞ ⎟⎟ ⎠ ⎞ ⎟⎟ y ⎠ 93 ⎛ ∂ 2 Σ −p1 1 ⎪⎧ + ⎨E( y′) ⎜ Q ⊗ ⎜ 2⎪ ∂σ if ∂σ jg ⎝ ⎩ ⎛ ∂ 2 Σ −p1 where E( y ′) ⎜ Q ⊗ ⎜ ∂σ if ∂σ jg ⎝ ⎡⎛ ⎞ ∂ 2 Σ −p1 y Q E( ) tr + ⊗ ⎢⎜⎜ ⎟⎟ ∂σ if ∂σ jg ⎢⎣⎝ ⎠ ⎤ ⎪⎫ ⎞ ⎟⎟ ( Ι t ⊗ Δ ) ⎥ ⎬ ⎥⎦ ⎭⎪ ⎠ ⎞ ⎛ ∂ 2 Σ −p1 ⎟⎟ E( y ) = β′ (1′t ⊗ Χ′ ⊗ Ι p ) ⎜⎜ Q ⊗ ∂σ if ∂σ jg ⎠ ⎝ ⎞ ⎟⎟ (1t ⊗ Χ ⊗ Ι p ) β ⎠ ⎡ ∂ 2 Σ −p1 ∂ 2 Σ −p1 ⎤ ′ ′ ′ ′ ′ = β ⎢1 Ρ1t 1t ⊗ Χ ( Ι m − Ρ Χ ) Χ ⊗ + 1 Ι t − Ρ1t 1t ⊗ Χ ( Ι m − Ρ Χ ) Χ ⊗ ⎥β = 0 ∂σ if ∂σ jg ∂σ if ∂σ jg ⎥⎦ ⎢⎣ ( ) ⎡ ∂ 2 ∗R ⎤ 1 ⎛ ∂Σ −p1 ∂Σ p ⇒ −E ⎢ ⎥ = (mt − q − t + 1)tr ⎜⎜ ⎢⎣ ∂σ if ∂σ jg ⎥⎦ 2 ⎝ ∂σ jg ∂σ if Moreover, ⎛ ∂ 2 Σ −p1 ⎜⎜ Q ⊗ ∂σ if ∂σ jg ⎝ ⎞ 1 ⎡⎛ ∂ 2 Σ −p1 ⎟⎟ + tr ⎢⎜⎜ Q ⊗ ∂σ if ∂σ jg ⎠ 2 ⎣⎢⎝ ⎤ ⎞ ⎟⎟ ( Ι t ⊗ Δ ) ⎥ . ⎠ ⎦⎥ ⎞ ⎟⎟ ( Ι t ⊗ Δ ) = ⎠ ⎡ ∂ 2 Σ −p1 ∂ 2 Σ −p1 ⎤ + Ι t − Ρ1t ⊗ Ι m − Ρ1m ⊗ ⎢ Ρ1t ⊗ ( Ι m − Ρ Χ ) ⊗ ⎥× ∂σ if ∂σ jg ∂σ if ∂σ jg ⎥⎦ ⎢⎣ ( (Ι ⊗ Ι t = Ρ1t ⊗ ( Ι m − Ρ Χ ) ⊗ =Q⊗ ∂ 2 Σ −p1 ∂σ if ∂σ jg ∂ Σ 2 −1 p ∂σ if ∂σ jg m ) ( ) ⊗ Σ p + Ι t ⊗ mΡ1m ⊗ Λ ( ) ( ) ) Σ p + Ι t − Ρ1t ⊗ Ι m − Ρ1m ⊗ ∂ 2 Σ −p1 ∂σ if ∂σ jg Σp Σp, and hence, ⎡⎛ ⎤ ⎛ ∂ 2 Σ −p1 ⎞ ⎛ ∂ 2 Σ −p1 ⎞ ∂ 2 Σ −p1 ⎞ Ι t ⊗ Δ ) ⎥ = tr(Q)tr ⎜ Σ p ⎟ = (mt − q − t + 1)tr ⎜ Σ p ⎟. 
tr[(Q ⊗ ∂²Σ_p^{-1}/∂σ_if∂σ_jg)(Ι_t ⊗ Δ)] = tr(Q) tr[(∂²Σ_p^{-1}/∂σ_if∂σ_jg)Σ_p] = (mt − q − t + 1) tr[(∂²Σ_p^{-1}/∂σ_if∂σ_jg)Σ_p]

⇒ −E[∂²ℓ*_R/∂σ_if∂σ_jg] = (1/2)(mt − q − t + 1) tr[(∂Σ_p^{-1}/∂σ_jg)(∂Σ_p/∂σ_if)] + (1/2)(mt − q − t + 1) tr[(∂²Σ_p^{-1}/∂σ_if∂σ_jg)Σ_p].

Notice that since

(∂²Σ_p^{-1}/∂σ_if∂σ_jg)Σ_p = −[(∂Σ_p^{-1}/∂σ_jg)(∂Σ_p/∂σ_if) + Σ_p^{-1}(∂Σ_p/∂σ_jg)(∂Σ_p^{-1}/∂σ_if)Σ_p],

then

tr[(∂²Σ_p^{-1}/∂σ_if∂σ_jg)Σ_p] = −2 tr[(∂Σ_p^{-1}/∂σ_jg)(∂Σ_p/∂σ_if)]

⇒ −E[∂²ℓ*_R/∂σ_if∂σ_jg] = −(1/2)(mt − q − t + 1) tr[(∂Σ_p^{-1}/∂σ_jg)(∂Σ_p/∂σ_if)] = (1/2)(mt − q − t + 1) tr[Σ_p^{-1}(∂Σ_p/∂σ_jg)Σ_p^{-1}(∂Σ_p/∂σ_if)].

Since from expression (4.7)

tr[Σ_p^{-1}(∂Σ_p/∂σ_jg)Σ_p^{-1}(∂Σ_p/∂σ_if)] = 2^{1−δ_if−δ_jg}(σ^{ig}σ^{jf} + σ^{fg}σ^{ij}),

then

W ≈ (2/(mt − q − t + 1)) [{2^{1−δ_if−δ_jg}(σ^{ig}σ^{jf} + σ^{fg}σ^{ij})}_{i,f,j,g=1}^p]^{-1}.

Observing that Σ_{i=1}^p σ^{ij}σ_if = 1 for j = f and 0 for j ≠ f, it can be seen that

[{2^{1−δ_if−δ_jg}(σ^{ig}σ^{jf} + σ^{fg}σ^{ij})}_{i,f,j,g=1}^p]^{-1} = (1/2)[{σ_ig σ_jf + σ_fg σ_ij}_{i,f,j,g=1}^p],

and hence

W ≈ (1/(mt − q − t + 1)) [{σ_ig σ_jf + σ_fg σ_ij}_{i,f,j,g=1}^p].   (6.23)

Exact approach. Since

Y_i = [y′_{i1}; … ; y′_{im}] = [x′_1B + a′_i + e′_{i1}; … ; x′_mB + a′_i + e′_{im}],

we may express Y_i as Y_i = ΧB + 1_m a′_i + E_i, where E_i = [e_{i1}, …, e_{im}]′, and

Y = [Y_1; … ; Y_t] = 1_t ⊗ ΧB + G + E,  where E = [E_1; … ; E_t] and G = [1_m a′_1; … ; 1_m a′_t]

⇒ QY = Q(1_t ⊗ ΧB) + QG + QE.

Observe that Q(1_t ⊗ ΧB) = Q(1_t ⊗ Χ)B, but

Q(1_t ⊗ Χ) = ΤΤ′(1_t ⊗ Χ) − ΤΤ′(Ι_t ⊗ Ρ_{1m})ΤΤ′(1_t ⊗ Χ) = 0,

and hence Q(1_t ⊗ ΧB) = 0. Also, since Q(Ι_t ⊗ Ρ_{1m}) = 0,

QG = [Ρ_{1t} ⊗ (Ι_m − Ρ_Χ) + (Ι_t − Ρ_{1t}) ⊗ (Ι_m − Ρ_{1m})]G = Q[Ι_t ⊗ (Ι_m − Ρ_{1m})]G = 0,

because (Ι_m − Ρ_{1m})1_m a′_i = 0. We conclude that QG = 0, and so we have QY = QE.
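The projection facts used in this derivation are easy to confirm numerically. The sketch below is an illustration only (small hypothetical dimensions, a random Χ containing an intercept); it builds Τ as an orthonormal basis of the column space of ΤΤ′ = Ι_{mt} − Ρ_{1t} ⊗ Ρ_Χ and checks that Τ′(Ι_t ⊗ Ρ_{1m})Τ is idempotent of rank t − 1, that the two expressions for Q agree, that Q is a p.o. of rank mt − q − t + 1, and that Q(1_t ⊗ Χ) = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
m, t = 5, 4
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])  # hypothetical X with intercept
q = X.shape[1]

P_X  = X @ np.linalg.solve(X.T @ X, X.T)
P_1m = np.full((m, m), 1.0 / m)
P_1t = np.full((t, t), 1.0 / t)
I_m, I_t = np.eye(m), np.eye(t)

# TT' = I_mt - P_1t (x) P_X ; take T as an orthonormal basis of its column space
TTt = np.eye(m * t) - np.kron(P_1t, P_X)
vals, vecs = np.linalg.eigh(TTt)
T = vecs[:, vals > 0.5]                         # mt x (mt - q) columns

S = T.T @ np.kron(I_t, P_1m) @ T
print(np.allclose(S @ S, S), round(np.trace(S), 6))   # projection, rank t - 1

Q1 = np.kron(P_1t, I_m - P_X) + np.kron(I_t - P_1t, I_m - P_1m)
Q2 = TTt - TTt @ np.kron(I_t, P_1m) @ TTt
print(np.allclose(Q1, Q2), np.allclose(Q1 @ Q1, Q1))  # two expressions agree; Q is a p.o.
print(round(np.trace(Q1), 6), m * t - q - t + 1)      # r(Q) = mt - q - t + 1
print(np.allclose(Q1 @ np.kron(np.ones((t, 1)), X), 0))  # Q(1_t (x) X) = 0
```

All of the checks rely only on 1_m lying in the column space of Χ, which holds because Χ contains an intercept.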
Since Q is idempotent, Y′QY = E′QE. In addition, since E is a data matrix from N_p(0, Σ_p) and Q is idempotent with r(Q) = mt − q − t + 1, then

E′QE = Y′QY ∼ W_p(Σ_p, mt − q − t + 1)  (Mardia, 1992, theorem 3.4.4).

Analogous to the argument given for the Hotelling T² case in section 4.2.4 that leads to expression (4.6), we have

w_{if,jg} = (σ_ig σ_jf + σ_fg σ_ij)/(mt − q − t + 1).   (6.24)

From expressions (6.23) and (6.24), we can see that the approximate and the exact methods give the same estimate of w_{if,jg}.

Lemma 6.3.2.8

A1 = 2r/(mt − q − t + 1)  and  A2 = r(r + 1)/(mt − q − t + 1).

Proof. From expression (6.18), we found that

L′ΦL = [c′(Χ′Χ)^{-1}c/t] M′Σ_p M

⇒ Θ = L(L′ΦL)^{-1}L′ = [t/(c′(Χ′Χ)^{-1}c)] (c ⊗ M)(M′Σ_p M)^{-1}(c′ ⊗ M′) = [t/(c′(Χ′Χ)^{-1}c)] cc′ ⊗ M(M′Σ_p M)^{-1}M′,

and from expression (6.17), we obtain

∂Φ/∂σ_if = −ΦΡ_ifΦ = −[(Χ′Χ)^{-1}Χ′(Ι_m − Ρ_{1m})Χ(Χ′Χ)^{-1}/t] ⊗ ∂Σ_p/∂σ_if.

Hence,

tr(ΘΦΡ_ifΦ) = [−t/(c′(Χ′Χ)^{-1}c)] tr{[cc′ ⊗ M(M′Σ_p M)^{-1}M′][(Χ′Χ)^{-1}Χ′(Ι_m − Ρ_{1m})Χ(Χ′Χ)^{-1}/t ⊗ ∂Σ_p/∂σ_if]}
= [−1/(c′(Χ′Χ)^{-1}c)] tr[cc′(Χ′Χ)^{-1}Χ′(Ι_m − Ρ_{1m})Χ(Χ′Χ)^{-1} ⊗ M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_if)]
= −[c′(Χ′Χ)^{-1}Χ′(Ι_m − Ρ_{1m})Χ(Χ′Χ)^{-1}c / c′(Χ′Χ)^{-1}c] tr[M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_if)]
= −tr[M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_if)]  (notice that Ρ_{1m}Χ(Χ′Χ)^{-1}c = 0).

Accordingly, A1 and A2 can be expressed as

A1 = Σ_{i=1}^p Σ_{f=1}^p Σ_{j=1}^p Σ_{g=1}^p w_{if,jg} tr[M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_if)] tr[M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_jg)],

A2 = Σ_{i=1}^p Σ_{f=1}^p Σ_{j=1}^p Σ_{g=1}^p w_{if,jg} tr[M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_if) M(M′Σ_p M)^{-1}M′(∂Σ_p/∂σ_jg)].

Analogous to the proof of lemma 6.2.2.5, we obtain

A1 = 2r/(mt − q − t + 1),  A2 = r(r + 1)/(mt − q − t + 1).

6.3.3 Estimating the Denominator Degrees of Freedom and the Scale Factor

A1 = 2r/(mt − q − t + 1) and A2 = r(r + 1)/(mt − q − t + 1), and then A1/A2 = 2/(r + 1) (lemma 6.3.2.8). By utilizing theorems 5.3.1 and 5.3.2 and corollary 5.3.3, the K-R and the proposed approaches are identical, where

m∗ = r(r + 1)/A2 − (r − 1) = mt − q − t − r + 2,

and

λ∗ = 1 − (r − 1)A2/(r(r + 1)) = 1 − (r − 1)/(mt − q − t + 1) = (mt − q − t − r + 2)/(mt − q − t + 1).

The estimates of the denominator degrees of freedom and the scale factor match the values for the exact multivariate test in corollary 6.3.2.6.

7. THE SATTERTHWAITE APPROXIMATIONS

Another method to approximate F tests is the Satterthwaite method. Based on Satterthwaite's original approximation (1941), the method was developed by Giesbrecht and Burns (1985) and then by Fai and Cornelius (1996). In this chapter, we do not intend to investigate the theoretical derivation of the method; however, we present some useful theoretical results.

7.1 The Satterthwaite Method (SAS Institute Inc., 2002-2006)

In order to approximate the F test for the fixed effects, H_0: L′β = 0, in a model as described in section 2.1, the Satterthwaite method uses the Wald-type statistic

F_S = (1/ℓ) β̂′L(L′Φ̂L)^{-1}L′β̂.

(i) The multi-dimensional case (ℓ > 1). First, we perform the spectral decomposition L′Φ̂L = Ρ′DΡ, where Ρ is an orthogonal matrix of eigenvectors and D is a diagonal matrix of eigenvalues. Let

ν_m = 2(d_m)²/(g′_m W g_m),

where d_m is the mth diagonal element of D, and g_m is the gradient of a_m Φ a′_m with respect to σ evaluated at σ̂, where a_m is the mth row of ΡL′. Notice that

g_m = [a_m (∂Φ/∂σ_i²) a′_m]_{i=1}^r = −[a_m ΦΡ_iΦ a′_m]_{σ=σ̂}.

Then let

E = Σ_{m=1}^{ℓ} [ν_m/(ν_m − 2)] I(ν_m > 2),

so we eliminate the terms for which ν_m ≤ 2. The degrees of freedom for F are then computed as

ν = 2E/(E − ℓ)  for E > ℓ,  and  ν = 0  otherwise.

(ii) The one-dimensional case (ℓ = 1). In this case, the F statistic simplifies to

F_S = (L′β̂)²/(L′Φ̂L).

Notice that we may use the t-statistic instead, since the numerator degrees of freedom equals one. The denominator degrees of freedom are computed as

ν = 2(L′Φ̂L)²/(g′Wg),

where g is the gradient of L′Φ̂L with respect to σ.
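As a sketch, the multi-dimensional Satterthwaite computation above might be coded as follows. The function name and its inputs are hypothetical: L′Φ̂L, the derivatives ∂Φ̂/∂σ_i, and W are assumed to be available from the fitted model.

```python
import numpy as np

def satterthwaite_ddf(LPhiL, dPhi_dsigma, L, W):
    """Satterthwaite denominator df for F_S = b'L (L'PhiL)^{-1} L'b / ell.

    LPhiL       : ell x ell matrix L' Phi_hat L
    dPhi_dsigma : list of the r matrices dPhi/dsigma_i evaluated at sigma_hat
    L           : p x ell coefficient matrix
    W           : r x r covariance matrix of the variance-parameter estimates
    """
    ell = LPhiL.shape[0]
    d, P = np.linalg.eigh(LPhiL)       # spectral decomposition L'PhiL = P diag(d) P'
    A = P.T @ L.T                      # row m of A is a_m, the m-th row of (P')' L'
    E = 0.0
    for m in range(ell):
        a = A[m]
        g = np.array([a @ dP @ a for dP in dPhi_dsigma])  # gradient of a_m Phi a_m'
        nu_m = 2.0 * d[m] ** 2 / (g @ W @ g)
        if nu_m > 2.0:                 # terms with nu_m <= 2 are eliminated
            E += nu_m / (nu_m - 2.0)
    return 2.0 * E / (E - ell) if E > ell else 0.0
```

For ℓ = 1 the returned value reduces algebraically to 2(L′Φ̂L)²/(g′Wg), the one-dimensional formula above, which makes a convenient consistency check.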
(We assume that L′β is estimable.)

7.2 The K-R, Satterthwaite, and Proposed Methods

In this section, we provide some useful lemmas that show the relationship among the Satterthwaite, K-R and proposed methods.

Lemma 7.2.1 When ℓ = 1, the Satterthwaite, the K-R and the proposed methods give the same estimate of the denominator degrees of freedom.

Proof. The K-R and the proposed methods give the same estimate of the denominator degrees of freedom, which is 2/A1 (corollary 5.3.5). Now

A1 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr(ΘΦΡ_iΦ) tr(ΘΦΡ_jΦ),  Θ = LL′/(L′ΦL)
= Σ_{i=1}^r Σ_{j=1}^r w_ij tr(LL′ΦΡ_iΦ/(L′ΦL)) tr(LL′ΦΡ_jΦ/(L′ΦL))
= Σ_{i=1}^r Σ_{j=1}^r w_ij (L′ΦΡ_iΦL)(L′ΦΡ_jΦL)/(L′ΦL)²

⇒ 2/A1 = 2(L′ΦL)² / (Σ_{i=1}^r Σ_{j=1}^r w_ij (L′ΦΡ_iΦL)(L′ΦΡ_jΦL)) = 2(L′ΦL)²/(g′Wg),  where g = [L′ΦΡ_iΦL]_{r×1}.

By noticing that ∂Φ/∂σ_i = −ΦΡ_iΦ, the proof is completed.

Note. Even though the estimates of the denominator degrees of freedom are the same, the true levels do not have to be the same for the Satterthwaite method and the K-R and proposed methods, because the statistics used are not the same: the Satterthwaite statistic uses Φ̂ as the estimator of the variance-covariance matrix of the fixed-effects estimator, whereas the K-R and the proposed methods' statistic uses Φ̂_A. The following corollary clarifies this fact.

Corollary 7.2.2 When ℓ = 1 and Φ̂_A = Φ̂, the Satterthwaite, K-R and proposed methods are identical.

Proof. From lemma 7.2.1, the Satterthwaite, K-R and proposed methods give the same estimate of the denominator degrees of freedom. Since the scale is 1 (corollary 5.3.5) and Φ̂_A = Φ̂, we have the same statistic for all approaches. This shows that the three approaches are identical.

Corollary 7.2.3 In balanced mixed classification models, and when ℓ = 1, the Satterthwaite, K-R and proposed methods are identical. The estimates of the denominator degrees of freedom and the scale are 2/A2 and 1 respectively.

Proof. Direct from corollaries 5.3.5, 6.1.1.2, and 7.2.2.

Lemma 7.2.4 When ℓ = 1, the Satterthwaite statistic F_S ≥ the K-R and proposed methods' statistic F for variance components models.

Proof. It suffices to show that L′Φ̂_A L ≥ L′Φ̂L. Since

Φ̂_A = Φ̂ + 2Φ̂{Σ_{i=1}^r Σ_{j=1}^r ŵ_ij (Q̂_ij − Ρ̂_iΦ̂Ρ̂_j)}Φ̂

⇒ Φ̂_A − Φ̂ = 2Φ̂{Σ_{i=1}^r Σ_{j=1}^r ŵ_ij (Q̂_ij − Ρ̂_iΦ̂Ρ̂_j)}Φ̂.

Notice that

Q_ij − Ρ_iΦΡ_j = Χ′Σ^{-1}(∂Σ/∂σ_i)Σ^{-1}(∂Σ/∂σ_j)Σ^{-1}Χ − Χ′Σ^{-1}(∂Σ/∂σ_i)Σ^{-1}Χ(Χ′Σ^{-1}Χ)^{-1}Χ′Σ^{-1}(∂Σ/∂σ_j)Σ^{-1}Χ
= Χ′Σ^{-1}(∂Σ/∂σ_i) G (∂Σ/∂σ_j)Σ^{-1}Χ,  where G = Σ^{-1} − Σ^{-1}Χ(Χ′Σ^{-1}Χ)^{-1}Χ′Σ^{-1},

and hence

Φ_A − Φ = 2Φ{Σ_{i=1}^r Σ_{j=1}^r w_ij B_i G B′_j}Φ,  where B_i = Χ′Σ^{-1}(∂Σ/∂σ_i),
= 2ΦBMB′Φ,  where B = [B_1, …, B_r] and M = W ⊗ G.

Since W and G are both n.n.d., so is M (Harville, 1997, p. 367), and hence so is BMB′. So Φ̂_A − Φ̂ is n.n.d., and hence L′(Φ̂_A − Φ̂)L ≥ 0.

Definition The Loewner ordering of symmetric matrices (Pukelsheim, 1993, chapter 1): for A and B symmetric matrices, we say A ≥ B when A − B is n.n.d.

Lemma 7.2.5 For A and B p.d. matrices with A ≥ B, we have A^{-1} ≤ B^{-1}.

Proof. Since A ≥ B, then A = B + V = B + CC′ for some n.n.d. matrix V

⇒ A^{-1} = B^{-1} − B^{-1}C(C′B^{-1}C + Ι)^{-1}C′B^{-1}  (theorem 1.7, Schott, 2005)

⇒ B^{-1} − A^{-1} = B^{-1}C(C′B^{-1}C + Ι)^{-1}C′B^{-1}, which is n.n.d., and hence B^{-1} ≥ A^{-1}.

Note. Lemma 7.2.4 is applicable for any ℓ for variance components models as long as the scale is less than or equal to 1, and this is true because from lemma 7.2.5 we have

L′Φ̂_A L ≥ L′Φ̂L ⇒ (L′Φ̂_A L)^{-1} ≤ (L′Φ̂L)^{-1},

and hence F ≤ F_S.

Consider the partition of the design matrix Χ as Χ = [Χ_1, Χ_2], where Χ_1 and Χ_2 are matrices of dimensions n × p_1 and n × p_2 respectively, and p = p_1 + p_2. Then

Χ′Σ^{-1}Χ = [ Χ′_1Σ^{-1}Χ_1   Χ′_1Σ^{-1}Χ_2
              Χ′_2Σ^{-1}Χ_1   Χ′_2Σ^{-1}Χ_2 ].

Let Χ_2 correspond to the fixed effects that are of interest to be tested. Consider the case where

(a) Χ′_1Σ^{-1}Χ_2 = 0.
(b) Χ′_2Σ^{-1}Χ_2 = f(σ)Α, where Α is a fixed matrix. Α is invertible when Χ is a full column rank matrix.

Lemma 7.2.6 For a model that satisfies conditions (a) and (b) above, we have A1 = ℓA2.

Proof. According to the above,

Χ′Σ^{-1}Χ = [C_1 0; 0 C_2],  where C_2 = f(σ)Α
⇒ Φ = (Χ′Σ^{-1}Χ)^{-1} = [C_1^{-1} 0; 0 C_2^{-1}].

Also, since L′ = [0_{ℓ×p_1}, B′_{ℓ×p_2}], then

L′ΦL = [0 B′][C_1^{-1} 0; 0 C_2^{-1}][0; B] = B′C_2^{-1}B,

Θ = L(L′ΦL)^{-1}L′ = [0_{p_1×ℓ}; B_{p_2×ℓ}](B′C_2^{-1}B)^{-1}[0_{ℓ×p_1}, B′_{ℓ×p_2}] = [0 0; 0 B(B′C_2^{-1}B)^{-1}B′],

ΦΘΦ = [C_1^{-1} 0; 0 C_2^{-1}][0 0; 0 B(B′C_2^{-1}B)^{-1}B′][C_1^{-1} 0; 0 C_2^{-1}] = [0 0; 0 C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}].

Moreover, since

Ρ_i = Χ′(∂Σ^{-1}/∂σ_i)Χ = [∂C_1/∂σ_i 0; 0 ∂C_2/∂σ_i],

then

ΦΘΦΡ_i = [0 0; 0 C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}(∂C_2/∂σ_i)].

So,

tr(ΘΦΡ_iΦ) = tr[C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}(∂C_2/∂σ_i)].   (7.1)

Since C_2 = f(σ)Α, then

C_2^{-1}(∂C_2/∂σ_i) = [Α^{-1}/f(σ)](∂f(σ)/∂σ_i)Α = [1/f(σ)](∂f(σ)/∂σ_i)Ι.

So, expression (7.1) above can be simplified as

tr(ΘΦΡ_iΦ) = [1/f(σ)](∂f(σ)/∂σ_i) tr[C_2^{-1}B(B′C_2^{-1}B)^{-1}B′]
= [1/f(σ)](∂f(σ)/∂σ_i) tr[B′C_2^{-1}B(B′C_2^{-1}B)^{-1}] = ℓ(∂f(σ)/∂σ_i)/f(σ)   (7.2)

⇒ tr(ΘΦΡ_iΦ) tr(ΘΦΡ_jΦ) = ℓ²(∂f(σ)/∂σ_i)(∂f(σ)/∂σ_j)/[f(σ)]².   (7.3)

Also,

ΦΘΦΡ_iΦΘΦΡ_j = [0 0; 0 C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}(∂C_2/∂σ_i) C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}(∂C_2/∂σ_j)].

So,

tr(ΘΦΡ_iΦΘΦΡ_jΦ) = tr[C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}(∂C_2/∂σ_i) C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}(∂C_2/∂σ_j)]
= {(∂f(σ)/∂σ_i)(∂f(σ)/∂σ_j)/[f(σ)]²} tr[B′C_2^{-1}B(B′C_2^{-1}B)^{-1}B′C_2^{-1}B(B′C_2^{-1}B)^{-1}]
= ℓ(∂f(σ)/∂σ_i)(∂f(σ)/∂σ_j)/[f(σ)]².   (7.4)
From expressions (7.3) and (7.4), we have

tr(ΘΦΡ_iΦ) tr(ΘΦΡ_jΦ) = ℓ tr(ΘΦΡ_iΦΘΦΡ_jΦ),

and hence A1 = ℓA2.

Lemma 7.2.7 For a model that satisfies conditions (a) and (b) above:
(i) The K-R and the proposed approaches are identical.
(ii) The Satterthwaite, K-R and proposed approaches give the same estimate of the denominator degrees of freedom, given that the estimate > 2.
(iii) If Φ̂_A = Φ̂ and the estimate of the denominator degrees of freedom > 2, then the Satterthwaite, K-R and proposed methods are identical.

Proof. (i) Observe that A1 = ℓA2 (lemma 7.2.6), and hence the K-R and the proposed methods are identical (corollary 5.3.3). They all give 1 and 2ℓ/A2 as the estimates of the scale and the denominator degrees of freedom respectively (theorems 5.3.1 and 5.3.2).

(ii) Claim: 2d_m²/(g′_m W g_m) = 2ℓ/A2 for any m.

Proof of the claim:

A1 = Σ_{i=1}^r Σ_{j=1}^r w_ij tr(ΘΦΡ_iΦ) tr(ΘΦΡ_jΦ) = a′Wa,  where a = [tr(ΘΦΡ_iΦ)]_{r×1}.

Since from expression (7.2) tr(ΘΦΡ_iΦ) = ℓ(∂f(σ)/∂σ_i)/f(σ), then a = [ℓ(∂f(σ)/∂σ_i)/f(σ)]_{r×1}, and hence

the R.H.S. = 2ℓ/A2 = 2ℓ²/A1 = 2ℓ²/(a′Wa) = 2[f(σ)]²/(g′Wg),  where g = [∂f(σ)/∂σ_i]_{r×1}.

Since g_m = [a_m (∂Φ/∂σ_i) a′_m]_{r×1}, where a_m is the mth row of ΡL′, then g_m = [b_m L′(∂Φ/∂σ_i)Lb′_m]_{r×1}, where b_m is the mth row of Ρ

⇒ g_m = [b_m B′(∂C_2^{-1}/∂σ_i)Bb′_m]_{r×1} = [−(b_m B′Α^{-1}Bb′_m/[f(σ)]²)(∂f(σ)/∂σ_i)]_{r×1} = −(b_m B′C_2^{-1}Bb′_m/f(σ)) [∂f(σ)/∂σ_i]_{r×1},

and the L.H.S. = 2d_m²/(g′_m W g_m) = [f(σ)/(b_m B′C_2^{-1}Bb′_m)]² (2d_m²)/(g′Wg).

Notice that b_m B′C_2^{-1}Bb′_m = b_m L′ΦLb′_m = d_m

⇒ the L.H.S. = 2[f(σ)]²/(g′Wg) = the R.H.S.

We note that the L.H.S. does not depend on the row m, and this means that our claim is true for any row m. Since

ν_m = 2d_m²/(g′_m W g_m) = 2ℓ/A2 > 2 for every m,

all ℓ terms are retained, so

E = Σ_{m=1}^{ℓ} ν_m/(ν_m − 2) = ℓν_m/(ν_m − 2),  E − ℓ = 2ℓ/(ν_m − 2),

and hence ν = 2E/(E − ℓ) = ν_m = 2d_m²/(g′_m W g_m), the common value above.

(iii) Direct from parts (i) and (ii).

Comments

(i) When conditions (a) and (b) are satisfied, the K-R and the proposed methods are identical. However, these conditions are not enough to make the Satterthwaite approach identical to the K-R and the proposed methods.

(ii) However, when the denominator degrees of freedom estimate is greater than two, conditions (a) and (b) are enough to make the Satterthwaite method produce the same estimate of the denominator degrees of freedom as the K-R and the proposed methods.

(iii) When the K-R and the proposed methods' estimate of the denominator degrees of freedom is less than or equal to two, the Satterthwaite estimate is zero.

(iv) Even though the Satterthwaite method produces the same estimate of the denominator degrees of freedom as the K-R and the proposed methods when that estimate is greater than two and conditions (a) and (b) are satisfied, this does not mean that the Satterthwaite approach is identical to the K-R and proposed methods, because the statistics used are different: the Satterthwaite statistic uses Φ̂ as an estimator of the variance-covariance matrix of the fixed-effects estimator, whereas the K-R and the proposed methods' statistic uses Φ̂_A.

(v) Conditions (a) and (b) are not enough to make Φ̂_A = Φ̂. For example, in BIB designs, conditions (a) and (b) are satisfied; however, Φ̂_A ≠ Φ̂ (chapter 8).

8. SIMULATION STUDY FOR BLOCK DESIGNS

To compare the tests discussed in the previous chapters, we have conducted simulation studies for three different types of block designs: partially balanced incomplete block designs (PBIB), balanced incomplete block designs (BIB), and a complete block design with missing data (two observations missing: one from the first block under the first treatment and the other from the second block under the second treatment).
To test the fixed effects in these models, there are two approaches. The first is the intra-block approach, where we consider the blocks fixed, so that only the within-block information is utilized. Even though this approach leads to exact F tests, it does not lead to an optimal test; when the design is efficient and blocking is effective, however, it is close to optimal. We use the second approach, where the blocks are considered random and information from both within and between blocks is utilized. With the second approach, the model does not satisfy Zyskind's condition, so it is not a Rady model, and hence we do not have an exact F test for the fixed effects.

8.1 Preparing Formulas for Computations

Model for a Block Design

y_ij = μ + α_i + b_j + e_ij  for i = 1, …, t, j = 1, …, s,

where μ is the general mean, α_i is the ith treatment effect, b_j is the jth block effect, b_j ∼ N(0, σ_b²), e_ij ∼ N(0, σ_e²), and the b_j's and e_ij's are all independent. The model can be expressed as

y = 1_n μ + Aa + Bb + e,  where b = [b_1, …, b_s]′, a = [α_1, …, α_t]′,

E(y) = 1_n μ + Aa,  and  Var(y) = Σ = σ_e²Ι_n + σ_b²BB′ = Σ_{i=1}^2 σ_i²D_i,  where D_1 = Ι_n and D_2 = BB′.

Suppose that we are interested in testing H: α_1 = … = α_t ⇔ L′β = 0, where

L′ = [ 0 1 −1  0 ⋯  0
       0 0  1 −1 ⋯  0
       ⋮
       0 0  ⋯  0  1 −1 ],   β′ = [μ α_1 … α_t].

The design matrix Χ = [1_n, A] is not of full column rank. We can proceed with this parameterization using the g-inverse; however, to simplify our computations, we reparameterize the model so that Χ is a full column rank matrix. Reparameterize the model in such a way that β∗′ = [μ∗ α∗_1 … α∗_{t−1}] with Σ_{i=1}^t α∗_i = 0, and

L∗′ = [ 0 1 0 ⋯ 0
        0 0 1 ⋯ 0
        ⋮
        0 0 ⋯ 0 1 ].

To simplify our notation, from now on, consider Χ = Χ∗, L = L∗, and β = β∗.

The REML Estimates

The REML equations are:

2 ∂ℓ(σ)/∂σ_e² = −tr(G) + y′GGy,
2 ∂ℓ(σ)/∂σ_b² = −tr(GD_2) + y′GD_2Gy,

where G = Σ^{-1} − Σ^{-1}Χ(Χ′Σ^{-1}Χ)^{-1}Χ′Σ^{-1} = Κ(Κ′ΣΚ)^{-1}Κ′.
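As an illustrative sketch (using the BIB1 block layout of Table 2.1 and assumed variance values), the model matrices above can be built directly, and the two expressions for G can be checked to agree numerically:

```python
import numpy as np

# BIB1 layout from Table 2.1: 15 blocks of size 2, treatments 1..6
blocks = [(1,2),(3,4),(5,6),(1,3),(2,5),(4,6),(1,4),(2,6),(3,5),
          (1,5),(2,4),(3,6),(1,6),(2,3),(4,5)]
t, s = 6, len(blocks)
n = 2 * s

A = np.zeros((n, t)); B = np.zeros((n, s))   # treatment and block incidence
for j, pair in enumerate(blocks):
    for pos, trt in enumerate(pair):
        A[2*j + pos, trt - 1] = 1.0
        B[2*j + pos, j] = 1.0

sig_e2, sig_b2 = 1.0, 0.5                    # assumed variance values
Sigma = sig_e2 * np.eye(n) + sig_b2 * B @ B.T

# full-column-rank X via the sum-to-zero reparameterization
C = np.vstack([np.eye(t - 1), -np.ones((1, t - 1))])
X = np.column_stack([np.ones(n), A @ C])

Si = np.linalg.inv(Sigma)
G1 = Si - Si @ X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)

# K: any full-column-rank matrix with K'X = 0, taken from I_n - P_X
P_X = X @ np.linalg.solve(X.T @ X, X.T)
U, sv, _ = np.linalg.svd(np.eye(n) - P_X)
K = U[:, sv > 0.5]
print(K.shape[1] == n - X.shape[1])          # K has full column rank n - p
G2 = K @ np.linalg.solve(K.T @ Sigma @ K, K.T)
print(np.allclose(G1, G2), np.allclose(G1 @ X, 0))
```

The agreement of G1 and G2 holds for any Κ of full column rank with Κ′Χ = 0, which is why the particular choice of independent rows of Ι_n − Ρ_Χ does not matter.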
A useful form for finding the REML estimates by iteration is

{tr(D_i G D_j G)}_{i,j=1}^r σ = {y′G D_i G y}_{i=1}^r,  where σ_{(r×1)} = (σ_1, …, σ_r)′

(Searle et al., 1992, chapter 6). To estimate w_ij, we may use the inverse of the expected information matrix, where

−E[∂²ℓ/∂σ_i²∂σ_j²] = (1/2) tr(G D_j G D_i)  (Searle et al., 1992, chapter 6),

and hence

−E[∂²ℓ/∂(σ_e²)²] = (1/2) tr(G²),
−E[∂²ℓ/∂(σ_b²)²] = (1/2) tr(G D_2 G D_2),
−E[∂²ℓ/∂σ_e²∂σ_b²] = (1/2) tr(G D_2 G).

The entries of the inverse of the information matrix are estimated by using the REML estimates of σ_e² and σ_b².

How we get the matrix Κ

Κ′ = C′[Ι_n − Ρ_Χ] for any C′ (Searle et al., 1992, chapter 6). Notice that Κ′Χ = C′[Ι_n − Ρ_Χ]Χ = C′[Χ − Χ] = 0. However, it is desirable to have Κ′ with full row rank so that Κ′ΣΚ is invertible. For our example, we compute Ι_n − Ρ_Χ and then pick its independent rows.

Computing Ρ_i, Q_ij and R_ij

Ρ_1 = Χ′(∂Σ^{-1}/∂σ_e²)Χ = −Χ′Σ^{-1}(∂Σ/∂σ_e²)Σ^{-1}Χ = −Χ′Σ^{-1}Σ^{-1}Χ,
Ρ_2 = Χ′(∂Σ^{-1}/∂σ_b²)Χ = −Χ′Σ^{-1}(∂Σ/∂σ_b²)Σ^{-1}Χ = −Χ′Σ^{-1}D_2Σ^{-1}Χ,

Q_11 = Χ′(∂Σ^{-1}/∂σ_e²)Σ(∂Σ^{-1}/∂σ_e²)Χ = Χ′Σ^{-1}(∂Σ/∂σ_e²)Σ^{-1}(∂Σ/∂σ_e²)Σ^{-1}Χ = Χ′Σ^{-1}Σ^{-1}Σ^{-1}Χ,
Q_12 = Q_21 = Χ′(∂Σ^{-1}/∂σ_e²)Σ(∂Σ^{-1}/∂σ_b²)Χ = Χ′Σ^{-1}Σ^{-1}D_2Σ^{-1}Χ,
Q_22 = Χ′(∂Σ^{-1}/∂σ_b²)Σ(∂Σ^{-1}/∂σ_b²)Χ = Χ′Σ^{-1}D_2Σ^{-1}D_2Σ^{-1}Χ,

R_ij = 0 for all i, j = 1, 2 (clear, since Σ is linear in the variance components).

The above quantities are estimated by using the REML estimates of σ.

Block designs with equal block sizes

When the design has equal block sizes (e.g., BIBD and PBIBD), then D_2 = BB′ = Ι_s ⊗ J_k, where k is the block size. Also, Σ = Ι_s ⊗ Σ_k, where Σ_k = σ_e²Ι_k + σ_b²J_k.
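A minimal sketch of this fixed-point iteration for the two-component block model follows. The design, covariates, coefficients, and variance values below are hypothetical, chosen only for illustration; at the fixed point the REML equations tr(GD_i) = y′GD_iGy hold, since tr(GD_i) = Σ_j σ_j tr(GD_iGD_j) follows from GΣG = G.

```python
import numpy as np

rng = np.random.default_rng(3)
s, k = 8, 3                               # 8 blocks of size 3 (illustrative values)
n = s * k
B = np.kron(np.eye(s), np.ones((k, 1)))   # block incidence
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
D = [np.eye(n), B @ B.T]                  # D1 = I_n, D2 = BB'

# one simulated response with sigma_e^2 = 1 and sigma_b^2 = 2
y = X @ np.array([1.0, 0.5, -0.5]) \
    + B @ rng.normal(scale=np.sqrt(2.0), size=s) + rng.normal(size=n)

sig = np.array([1.0, 1.0])                # starting values (sigma_e^2, sigma_b^2)
for _ in range(200):
    Sigma = sig[0] * D[0] + sig[1] * D[1]
    Si = np.linalg.inv(Sigma)
    G = Si - Si @ X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)
    lhs = np.array([[np.trace(D[i] @ G @ D[j] @ G) for j in range(2)]
                    for i in range(2)])
    rhs = np.array([y @ G @ D[i] @ G @ y for i in range(2)])
    new = np.linalg.solve(lhs, rhs)       # solve {tr(D_i G D_j G)} sigma = {y'G D_i G y}
    done = np.max(np.abs(new - sig)) < 1e-12
    sig = new
    if done:
        break
print(sig)   # REML estimates (sigma_e^2_hat, sigma_b^2_hat)
```

The matrix of traces on the left-hand side, evaluated at the converged σ̂ and halved, is also the expected information matrix whose inverse supplies the ŵ_ij above.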
Then Σ^{-1} = Ι_s ⊗ Σ_k^{-1}, where

Σ_k^{-1} = (σ_e²Ι_k + σ_b²J_k)^{-1} = (1/σ_e²)[Ι_k − (σ_b²/(σ_e² + kσ_b²))J_k]  (Schott, 2005, theorem 1.7).

Notice that

D_2Σ^{-1} = (Ι_s ⊗ J_k)(Ι_s ⊗ Σ_k^{-1}) = Ι_s ⊗ J_kΣ_k^{-1} = (Ι_s ⊗ J_k)/(σ_e² + kσ_b²) = D_2/(σ_e² + kσ_b²),

and hence

Ρ_2 = −Χ′Σ^{-1}D_2Σ^{-1}Χ = −Χ′D_2Χ/(σ_e² + kσ_b²)²,
Q_12 = Q_21 = Χ′Σ^{-1}Σ^{-1}D_2Σ^{-1}Χ = Χ′D_2Χ/(σ_e² + kσ_b²)³,
Q_22 = Χ′Σ^{-1}D_2Σ^{-1}D_2Σ^{-1}Χ = Χ′D_2D_2Χ/(σ_e² + kσ_b²)³ = kΧ′D_2Χ/(σ_e² + kσ_b²)³ = kQ_12.

Lemma For the mixed linear model y = Χβ + u, u ∼ N(0, Σ), Σ = Σ_{i=1}^r σ_iV_i, and the null hypothesis H_0: L′β = 0:

(i) In simulation of the K-R and Satterthwaite statistics (F_KR and F_S) under H_0, it suffices to take the fixed effects β = 0.
(ii) F_KR(u) = F_KR(au) and F_S(u) = F_S(au) for any constant a > 0.

Proof. (i) F_KR = (1/ℓ)(L′β̂)′(L′Φ̂_AL)^{-1}(L′β̂). To simulate F_KR under H_0, we generate u ∼ N(0, Σ) and then y = Χβ + u, where L′β = 0. However, we should notice that Σ̂, Φ̂, and Φ̂_A do not depend on Χβ; indeed, they depend only on u. Moreover,

β̂ = Φ̂Χ′Σ̂^{-1}y = Φ̂Χ′Σ̂^{-1}(Χβ + u) = β + Φ̂Χ′Σ̂^{-1}u
⇒ L′β̂ = L′β + L′Φ̂Χ′Σ̂^{-1}u = L′Φ̂Χ′Σ̂^{-1}u = L′β̂_0 (under H_0),

where β̂_0 is β̂ calculated from y = u, i.e., with β = 0. Similarly, it can be shown that in simulation of F_S under H_0 it suffices to take β = 0.

(ii) Observe that u ∼ N(0, Σ) ⇔ au ∼ N(0, a²Σ), and hence Σ̂(au) = a²Σ̂(u). Then

Φ̂(au) = (Χ′Σ̂^{-1}(au)Χ)^{-1} = a²(Χ′Σ̂^{-1}(u)Χ)^{-1} = a²Φ̂(u),
β̂(au) = Φ̂(au)Χ′Σ̂^{-1}(au)(au) = aΦ̂(u)Χ′Σ̂^{-1}(u)u = aβ̂(u),
Ρ̂_i(au) = −Χ′Σ̂^{-1}(au)(∂Σ/∂σ_i)Σ̂^{-1}(au)Χ = (1/a⁴)Ρ̂_i(u),
Q̂_ij(au) = Χ′Σ̂^{-1}(au)(∂Σ/∂σ_i)Σ̂^{-1}(au)(∂Σ/∂σ_j)Σ̂^{-1}(au)Χ = (1/a⁶)Q̂_ij(u).

Also, since

G(au) = Σ̂^{-1}(au) − Σ̂^{-1}(au)Χ(Χ′Σ̂^{-1}(au)Χ)^{-1}Χ′Σ̂^{-1}(au) = (1/a²)G(u),

then

W(au) ≈ 2[{tr(G(au)(∂Σ/∂σ_i)G(au)(∂Σ/∂σ_j))}_{i,j=1}^r]^{-1}  (section 6.1.2)
= 2[{(1/a⁴) tr(G(u)(∂Σ/∂σ_i)G(u)(∂Σ/∂σ_j))}_{i,j=1}^r]^{-1} = a⁴W(u).

Since R_ij = 0, we obtain Φ̂_A(au) = a²Φ̂_A(u), and hence F_KR(u) = F_KR(au) and F_S(u) = F_S(au).

8.2 Simulation Results

Seven approaches for testing fixed effects are considered in the simulation study: Kenward-Roger (after modification), Kenward-Roger before the modification, Kenward-Roger using the conventional variance-covariance matrix of the fixed-effects estimator, Satterthwaite, Containment, and our two proposed methods. Five settings of the ratio of σ_b to σ_e, denoted by ρ, have been used: 0.25, 0.5, 1, 2, 4. For each setting of ρ, 10,000 data sets have been simulated from the corresponding multivariate distribution with zero treatment effects. The level of each test, the mean of the estimated scales, and the mean of the estimated denominator degrees of freedom were computed for the data sets at nominal level 0.05. We use †† to denote levels below 0.0400 or above 0.0600, and † for levels between 0.0400 and 0.0449 or between 0.0551 and 0.0600; all unmarked levels are between 0.0450 and 0.0550. For each data set the REML estimates of the variance components have been calculated by the iteration method mentioned above, and convergence was achieved for almost all data sets. The convergence rate is reported. Also, we present the percentage relative bias in the conventional and the adjusted estimators of the variance-covariance matrix of the fixed-effects estimator.

For each type of block design, we consider some examples with different efficiency factors. The relative efficiency of a random block model to a complete block model for a given number t of treatments and a given number n of observations can be expressed as

E = t(t − 1) / {n · trace([Τ′(A′Σ_ρ^{-1}A)Τ]^{-1})},

where Σ_ρ = ρ²BB′ + Ι_n, Τ′Τ = Ι_{t−1}, and Τ′1_t = 0. The efficiency factor, denoted by q, is defined to be the lower bound of E, and it describes the design alone. For an equireplicate block design with equal block sizes,

q = (t − 1) / (r · trace{[A′(Ι − Ρ_B)A]^+}),

where r is the replication of the treatments (Birkes, 2006). Observe that BIB and PBIB designs are equireplicate and have equal block sizes.

8.2.1 Partially Balanced Incomplete Block Design (PBIB)

I. The first PBIB design (PBIB1) is from Cochran and Cox (1957, p. 456), with fifteen blocks, fifteen treatments, four treatments per block, and q = 0.7955.

Table 1.1 PBIB1 (t = s = 15, k = 4, and n = 60)

Block  Treat.      Block  Treat.       Block  Treat.
1      15,9,1,13   6      12,4,3,1     11     9,7,10,3
2      5,7,8,1     7      12,14,15,8   12     8,6,2,9
3      10,1,14,2   8      6,3,14,5     13     5,9,11,12
4      15,11,2,3   9      5,4,2,13     14     7,13,14,11
5      6,15,4,7    10     10,12,13,6   15     10,4,8,11
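The efficiency quantities above can be checked numerically. The sketch below is an illustration under my reading of the two formulas (E and q as displayed above); it uses BIB1 (Table 2.1), whose incidence is simple, and recovers q = 0.6 together with E values that decrease toward q as ρ grows:

```python
import numpy as np

# BIB1 (Table 2.1): t = 6 treatments, s = 15 blocks of size k = 2, r = 5, q = 0.6
blocks = [(1,2),(3,4),(5,6),(1,3),(2,5),(4,6),(1,4),(2,6),(3,5),
          (1,5),(2,4),(3,6),(1,6),(2,3),(4,5)]
t, s, k = 6, 15, 2
n = s * k
A = np.zeros((n, t)); B = np.zeros((n, s))
for j, (u, v) in enumerate(blocks):
    A[2*j, u-1] = A[2*j+1, v-1] = 1.0
    B[2*j, j] = B[2*j+1, j] = 1.0

r = int(A.sum(axis=0)[0])                       # replication of each treatment
P_B = B @ np.linalg.solve(B.T @ B, B.T)
C = A.T @ (np.eye(n) - P_B) @ A                 # intra-block information matrix
q = (t - 1) / (r * np.trace(np.linalg.pinv(C)))
print(round(q, 4))                              # 0.6

# E(rho) decreases toward its lower bound q as rho = sigma_b/sigma_e grows
Tau = np.linalg.svd(np.eye(t) - np.full((t, t), 1/t))[0][:, :t-1]  # T'T = I, T'1 = 0
Es = []
for rho in [0.25, 1.0, 4.0, 100.0]:
    Sig = rho**2 * B @ B.T + np.eye(n)
    M = Tau.T @ A.T @ np.linalg.inv(Sig) @ A @ Tau
    Es.append(t * (t - 1) / (n * np.trace(np.linalg.inv(M))))
print([round(E, 4) for E in Es])   # the rho = 0.25 and 4.0 values match the E column of Table 2.5
```

For a BIB this agrees with the closed form E = [r − (ρ²/(1 + kρ²))(r − α)]/r, which evaluates to 0.9556 at ρ = 0.25 and 0.6121 at ρ = 4, the values reported for BIB1 in Table 2.5.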
of entries ∈ (0.0019,0.0025) ∗ before modification, ∗ ∗ with using Φ A Table1.3 Mean of estimated denominator degrees of freedom for PBIB1 ρ K-R Prop.1 Prop.2 Satter Contain K-R ∗ K-R ∗∗ 0.25 48.8821 31.9638 39.3355 41.7298 40.2042 38.8929 31 0.5 47.4572 31.8812 38.4884 39.6942 38.9313 38.3272 31 1.0 43.1678 30.0484 34.6682 34.8640 34.7480 34.6475 31 2.0 40.5233 29.9220 32.1060 32.1205 32.1125 32.1052 31 4.0 39.7016 30.5696 31.2880 31.2889 31.2884 31.2830 31 ˆ instead of Φ ˆ . ∗ before modification, ∗ ∗ with using Φ A Table 1.4 Mean of estimated scale for PBIB1 ρ K-R Prop.1 K-R ∗ K-R ∗∗ 0.25 0.9890 0.8787 0.9974 0.9954 0.5 0.9916 0.9459 0.9991 0.9978 1.0 0.9913 0.9829 0.9999 0.9996 2.0 0.9902 0.9970 1.0000 1.0000 4.0 0.9898 0.9996 1.0000 1.0000 ˆ instead of Φ ˆ ∗ before modification, ∗ ∗ with using Φ A Prop.2 0.9975 0.9989 0.9998 1.0000 1.0000 . Table 1.5 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency ( E ) for PBIB1 ρ CR E Φ̂ Φ̂A 0.25 -6.5 1.6 0.9993 0.9604 0.5 -4.2 1.0 0.9999 0.8999 1.0 -0.6 1.5 1.0000 0.8378 2.0 -0.4 0.0 1.0000 0.8080 4.0 -0.6 -0.5 1.0000 0.7987 112 II. The second design (PBIB2) is from Green (1974, P.65) where we have forty-eight blocks, sixteen treatments, two treatments per block and q = 0.4762 . 
Table 1.6 PBIB2 design (t = 16, s = 48, k = 2, and n = 96)

blk  trt   blk  trt    blk  trt    blk  trt    blk  trt    blk  trt
1    1,2   9    5,8    17   10,12  25   1,5    33   2,14   41   7,15
2    1,3   10   6,7    18   11,12  26   1,9    34   6,10   42   11,15
3    1,4   11   6,8    19   13,14  27   1,13   35   6,14   43   4,8
4    2,3   12   7,8    20   13,15  28   5,9    36   10,14  44   4,12
5    2,4   13   9,10   21   13,16  29   5,13   37   3,7    45   4,16
6    3,4   14   9,11   22   14,15  30   9,13   38   3,11   46   8,12
7    5,6   15   9,12   23   14,16  31   2,6    39   3,15   47   8,16
8    5,7   16   10,11  24   15,16  32   2,10   40   7,11   48   12,16

Table 1.7 Simulated size of nominal 5% Wald F-tests for PBIB2

ρ     K-R      Prop.1   Prop.2   Satter   K-R∗      K-R∗∗     Contain
0.25  0.0532   0.0568†  0.0585†  0.0573†  0.0730††  0.0593†   0.0440†
0.5   0.0585†  0.0529   0.0561†  0.0579†  0.0565†   0.0711††  0.0453
1.0   0.0570†  0.0504   0.0540   0.0549   0.0543    0.0644††  0.0468
2.0   0.0542   0.0557†  0.0473   0.0508   0.0511    0.0509    0.0480
4.0   0.0507   0.0562†  0.0470   0.0493   0.0494    0.0493    0.0480

∗ before modification, ∗∗ with using Φ̂ instead of Φ̂_A. S.E. of entries ∈ (0.0021, 0.0026).

Table 1.8 Mean of estimated denominator degrees of freedom for PBIB2

ρ     K-R      Prop.1   Prop.2   Satter   K-R∗     K-R∗∗    Contain
0.25  83.6874  65.3837  72.6495  82.2865  74.7209  68.9973  33
0.5   79.8284  61.8836  69.0306  77.4455  70.9267  65.9502  33
1.0   65.6409  49.2069  55.8740  59.9145  56.9966  54.6475  33
2.0   49.9531  36.2898  41.2927  41.9262  41.5348  41.1791  33
4.0   43.6954  32.8601  35.2892  35.3400  35.3120  35.2835  33

∗ before modification, ∗∗ with using Φ̂ instead of Φ̂_A.
∗ before modification, ∗ ∗ with using Φ Prop.2 0.9973 0.9975 0.9984 0.9996 1.0000 A Table 1.10 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency ( E ) for PBIB2 ρ CR E Φ̂ Φ̂A 0.25 0.5 1.0 2.0 4.0 -7.0 -7.0 -5.9 -3.3 -2.6 -2.5 -2.7 -2.4 -1.6 -2.2 1.0000 1.0000 1.0000 1.0000 1.0000 0.9478 0.8408 0.6705 0.5451 0.4955 8.2.2 Balanced Incomplete Block Design (BIBD) Since the design is balanced with respect to treatments, then it can be shown that the model satisfies conditions (a) and (b) for theorem lemma 7.2.6, and hence A1 = A 2 . The K-R and the proposed approaches are identical. The scale estimate is 1 and the denominator degree of freedom is estimate is 2 (theorems 5.3.1 and 5.3.2, and corollary A2 5.3.3). The Satterthwaite method gives the same estimate of the denominator degrees of freedom as the K-R and the proposed approaches when the estimate >2 (lemma 7.2.7). Six BIB designs with different efficiency factors are adopted from Cochran and Cox (1957, chapter 11). Notice that the efficiency factor simplified as q = tα 1 − 1 k where = rk 1 − 1 t r is the replication of the treatments and α is the number of blocks in which each pair of treatments occur together (Kuehl, 2000, chapter 9). 114 I. In the first design (BIB1), where we have fifteen blocks, six treatments, and two treatments per block (t = 6, s = 15, k = 2, and n = 30) , and the efficiency factor, q = 0.6 Table 2.1 BIB1 (t = 6, s = 15, k = 2, Block Treat. Block Treat. Block Treat. 1 1,2 4 1,3 7 1,4 2 3,4 5 2,5 8 2,6 3 5,6 6 4,6 9 3, 5 and n = 30) Block Treat. 10 1,5 11 2,4 12 3,6 Block 13 14 15 Table 2.2 Simulated size of nominal 5% Wald F- tests for BIB1 ρ K-R&prop. Satter Contain K-R ∗ †† †† †† 0.25 0.0786 0.0676 0.0998 0.0619†† 0.5 0.0731†† 0.0635†† 0.0930†† 0.0589† 1.0 0.0534 0.0710†† 0.0566† 0.0746†† †† † † 2.0 0.0539 0.0697 0.0551 0.0599 †† † 4.0 0.0542 0.0540 0.0727 0.0559 ∗ before modification. S.E. 
of entries ∈ (0.0023,0.003) Table 2.3 Mean of estimated denominator degrees of freedom for BIB1 ρ K-R&prop. Satter Contain K-R ∗ 0.25 29.0917 20.1712 20.1712 10 0.5 28.0031 19.0591 19.0591 10 1.0 24.6097 15.5784 15.5784 10 2.0 21.0182 11.8539 11.8539 10 4.0 19.7201 10.4807 10.4807 10 ∗ before modification. Table 2.4 Mean of estimated scales for BIB1 ρ K-R K-R ∗ 0.25 0.9753 1.0000 0.5 0.9725 1.0000 1.0 0.9619 1.0000 2.0 0.9446 1.0000 4.0 0.9344 1.0000 ∗ before modification. Treat. 1,6 2,3 4,5 115 Table 2.5 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency ( E ) for BIB1 ρ CR E Φ̂ Φ̂A 0.25 -16.0 -3.7 0.9968 0.9556 0.5 -13.5 -2.3 0.9978 0.8667 1.0 -8.9 -1.2 0.9994 0.7333 2.0 -2.0 -0.6 1.0000 0.6444 4.0 0.6 0.9 1.0000 0.6121 II. In the second design (BIB2), we have ten blocks, six treatments, and three treatments per block (t = 6, s = 10, k = 3, and n = 30) , q = 0.8 Table 2.6 BIB2 (t = 6, s = 10, k = 3, and n = 30) Block Treat. Block Treat. 1 1,2,5 6 2,3,4 2 1,2,6 7 2,3,5 3 1,3,4 8 2,4,6 4 1,3,6 9 3,5,6 5 1,4,5 10 4,5,6 Table 2.7 Simulated size of nominal 5% Wald F- tests for BIB2 ρ K-R&prop. Satter Contain K-R ∗ † †† 0.25 0.0673†† 0.0577 0.0830 0.0710†† 0.5 0.0670†† 0.0584† 0.0775†† 0.0664†† 1.0 0.0524 0.0633†† 0.0604†† 0.0552† 2.0 0.0491 0.0515 0.0497 0.0606†† † 4.0 0.0474 0.0482 0.0475 0.0579 ∗ before modification. S.E. of entries ∈ (0.0019,0.0025) 116 Table 2.8 Mean of estimated denominator degrees of freedom for BIB2 ρ K-R&prop. Satter Contain K-R ∗ 0.25 29.4561 20.5425 20.5425 15 0.5 28.5419 19.6163 19.6163 15 1.0 26.3744 17.4069 17.4069 15 2.0 24.7693 15.7620 15.7620 15 4.0 24.2247 15.2012 15.2012 15 ∗ before modification. Table 2.9 Mean of estimated scales for BIB2 ρ K-R K-R ∗ 0.25 0.9761 1.0000 0.5 0.9749 1.0000 1.0 0.9701 1.0000 2.0 0.9653 1.0000 4.0 0.9632 1.0000 ∗ before modification. 
Table 2.10 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for BIB2

 ρ      Φ̂       Φ̂A     CR       E
 0.25   -12.0   -2.1   0.9977   0.9684
 0.5    -7.4    0.2    0.9991   0.9143
 1.0    -5.3    -2.1   1.0000   0.8500
 2.0    -2.6    -2.0   1.0000   0.8154
 4.0    -0.5    -0.5   1.0000   0.8041

III. In the third design (BIB3), we have seven blocks, seven treatments, and four treatments per block (t = 7, s = 7, k = 4, and n = 28), and q = 0.8750.

Table 2.11 BIB3 (t = 7, s = 7, k = 4, and n = 28)

 Block  Treat.
 1      3,5,6,7
 2      1,4,6,7
 3      1,2,5,7
 4      1,2,3,6
 5      2,3,4,7
 6      1,3,4,5
 7      2,4,5,6

Table 2.12 Simulated size of nominal 5% Wald F-tests for BIB3

 ρ      K-R&prop.   Satter     Contain    K-R∗
 0.25   0.0569†     0.0472     0.0697††   0.0769††
 0.5    0.0508      0.0613††   0.0660††   0.0644††
 1.0    0.0494      0.0524     0.0607††   0.0553†
 2.0    0.0450      0.0470     0.0454     0.0581†
 4.0    0.0455      0.0456     0.0456     0.0570†
∗ before modification. S.E. of entries ∈ (0.0021, 0.0027)

Table 2.13 Mean of estimated denominator degrees of freedom for BIB3

 ρ      K-R&prop.   Satter    Contain   K-R∗
 0.25   26.3479     17.6322   17.6218   15
 0.5    26.2043     17.3494   17.3468   15
 1.0    25.1420     16.2466   16.2464   15
 2.0    24.3069     15.3868   15.3868   15
 4.0    24.0273     15.0992   15.0992   15
∗ before modification.

Table 2.14 Mean of estimated scales for BIB3

 ρ      K-R      K-R∗
 0.25   0.9549   1.0000
 0.5    0.9653   1.0000
 1.0    0.9667   1.0000
 2.0    0.9643   1.0000
 4.0    0.9632   1.0000
∗ before modification.

Table 2.15 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for BIB3

 ρ      Φ̂       Φ̂A     CR       E
 0.25   -10.8   3.9    1.0000   0.9750
 0.5    -8.9    -0.5   1.0000   0.9357
 1.0    -0.4    2.2    1.0000   0.9000
 2.0    -1.3    -0.9   1.0000   0.8824
 4.0    -2.0    -1.9   1.0000   0.8769

IV. In the fourth design (BIB4), we have thirty-six blocks, nine treatments, and two treatments per block (t = 9, s = 36, k = 2, and n = 72), and q = 0.5625.

Table 2.16 BIB4 (t = 9, s = 36, k = 2, and n = 72)

 Block  Treat.   Block  Treat.   Block  Treat.   Block  Treat.
 1      1,2      10     1,3      19     1,4      28     1,5
 2      2,8      11     2,5      20     2,6      29     2,4
 3      3,4      12     3,6      21     2,3      30     3,8
 4      4,7      13     4,9      22     4,5      31     4,6
 5      5,6      14     5,8      23     5,7      32     3,5
 6      1,6      15     6,7      24     6,8      33     6,9
 7      3,7      16     1,7      25     7,9      34     2,7
 8      8,9      17     4,8      26     1,8      35     7,8
 9      5,9      18     2,9      27     3,9      36     1,9

Table 2.17 Simulated size of nominal 5% Wald F-tests for BIB4

 ρ      K-R&prop.   Satter   Contain   K-R∗
 0.25   0.0578†     0.0520   0.0566†   0.0731††
 0.5    0.0587†     0.0521   0.0562†   0.0709††
 1.0    0.0555†     0.0524   0.0501    0.0617††
 2.0    0.0598†     0.0536   0.0525    0.0592†
 4.0    0.0599†     0.0547   0.0546    0.0557†
∗ before modification. S.E. of entries ∈ (0.0022, 0.0026)

Table 2.18 Mean of estimated denominator degrees of freedom for BIB4

 ρ      K-R&prop.   Satter    Contain   K-R∗
 0.25   66.8712     58.3711   58.3711   28
 0.5    63.3018     54.7945   54.7945   28
 1.0    52.5089     43.9746   43.9746   28
 2.0    41.8840     33.3083   33.3083   28
 4.0    38.0104     29.4118   29.4118   28
∗ before modification.

Table 2.19 Mean of estimated scales for BIB4

 ρ      K-R      K-R∗
 0.25   0.9966   1.0000
 0.5    0.9961   1.0000
 1.0    0.9942   1.0000
 2.0    0.9906   1.0000
 4.0    0.9883   1.0000
∗ before modification.

Table 2.20 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for BIB4

 ρ      Φ̂      Φ̂A     CR       E
 0.25   -7.5   -2.0   1.0000   0.9514
 0.5    -5.9   -0.8   1.0000   0.8542
 1.0    -4.2   -0.7   1.0000   0.7083
 2.0    -0.7   0.5    1.0000   0.6111
 4.0    0.9    1.1    1.0000   0.5758

V. In the fifth design (BIB5), we have eighteen blocks, nine treatments, and four treatments per block (t = 9, s = 18, k = 4, and n = 72), and q = 0.8438.

Table 2.21 BIB5 (t = 9, s = 18, k = 4, and n = 72)

 Block  Treat.     Block  Treat.     Block  Treat.
 1      1,4,6,7    7      2,3,6,7    13     1,2,4,9
 2      2,6,8,9    8      2,4,5,8    14     1,5,6,9
 3      1,3,8,9    9      3,5,7,9    15     1,3,6,8
 4      1,2,3,4    10     1,2,5,7    16     4,6,7,8
 5      1,5,7,8    11     2,3,5,6    17     3,4,5,8
 6      4,5,6,9    12     3,4,7,9    18     2,7,8,9

Table 2.22 Simulated size of nominal 5% Wald F-tests for BIB5

 ρ      K-R&prop.   Satter   Contain    K-R∗
 0.25   0.0546      0.0532   0.0612††   0.0585†
 0.5    0.0534      0.0523   0.0546     0.0577†
 1.0    0.0518      0.0494   0.0525     0.0514
 2.0    0.0498      0.0468   0.0476     0.0480
 4.0    0.0505      0.0472   0.0475     0.0480
∗ before modification. S.E. of entries ∈ (0.0021, 0.0024)

Table 2.23 Mean of estimated denominator degrees of freedom for BIB5

 ρ      K-R&prop.   Satter    Contain   K-R∗
 0.25   66.4586     57.9582   57.9582   46
 0.5    63.1346     54.6281   54.6281   46
 1.0    58.2671     49.7502   49.7502   46
 2.0    55.6517     47.1284   47.1284   46
 4.0    54.8206     46.2951   46.2951   46
∗ before modification.

Table 2.24 Mean of estimated scales for BIB5

 ρ      K-R      K-R∗
 0.25   0.9966   1.0000
 0.5    0.9962   1.0000
 1.0    0.9955   1.0000
 2.0    0.9950   1.0000
 4.0    0.9949   1.0000
∗ before modification.

Table 2.25 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for BIB5

 ρ      Φ̂      Φ̂A     CR       E
 0.25   -3.0   0.8    1.0000   0.9688
 0.5    -2.4   0.0    1.0000   0.9219
 1.0    -1.7   -0.8   1.0000   0.8750
 2.0    -0.9   -0.7   1.0000   0.8529
 4.0    -0.6   -0.6   1.0000   0.8462

VI. In the sixth design (BIB6), we have twelve blocks, nine treatments, and six treatments per block (t = 9, s = 12, k = 6, and n = 72), and q = 0.9375.

Table 2.26 BIB6 (t = 9, s = 12, k = 6, and n = 72)

 Block  Treat.         Block  Treat.         Block  Treat.         Block  Treat.
 1      4,5,6,7,8,9    4      2,3,4,5,7,9    7      1,3,4,5,8,9    10     1,2,4,5,7,8
 2      2,3,5,6,8,9    5      1,3,5,6,7,8    8      1,2,5,6,7,9    11     1,2,3,7,8,9
 3      2,3,4,6,7,8    6      1,3,4,6,7,9    9      1,2,4,6,8,9    12     1,2,3,4,5,6

Table 2.27 Simulated size of nominal 5% Wald F-tests for BIB6

 ρ      K-R&prop.   Satter   Contain   K-R∗
 0.25   0.0496      0.0477   0.0543    0.0519
 0.5    0.0503      0.0483   0.0518    0.0503
 1.0    0.0517      0.0501   0.0512    0.0501
 2.0    0.0527      0.0509   0.0511    0.0510
 4.0    0.0526      0.0505   0.0505    0.0505
∗ before modification. S.E. of entries ∈ (0.0021, 0.0023)

Table 2.28 Mean of estimated denominator degrees of freedom for BIB6

 ρ      K-R&prop.   Satter    Contain   K-R∗
 0.25   65.7861     57.2847   57.2847   52
 0.5    63.6891     55.1839   55.1839   52
 1.0    61.7011     53.1920   53.1920   52
 2.0    60.8480     52.3371   52.3371   52
 4.0    60.5985     52.0870   52.0870   52
∗ before modification.
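The "S.E. of entries" noted under the simulated-size tables is consistent with the binomial Monte Carlo standard error of an estimated rejection rate. A minimal sketch, assuming 10,000 simulation replicates per cell (the replicate count is an assumption, not stated in this excerpt):

```python
import math

def mc_se(p_hat, n_rep=10_000):
    """Binomial standard error sqrt(p*(1-p)/N) of a simulated test size.
    n_rep = 10,000 is an assumed replicate count."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_rep)

# For an estimated size near the nominal 5% level:
print(round(mc_se(0.05), 4))  # 0.0022, inside the reported S.E. ranges
```

Entries roughly two standard errors above 0.05 are the ones flagged with daggers in the tables.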
Table 2.29 Mean of estimated scales for BIB6

 ρ      K-R      K-R∗
 0.25   0.9965   1.0000
 0.5    0.9963   1.0000
 1.0    0.9960   1.0000
 2.0    0.9959   1.0000
 4.0    0.9959   1.0000
∗ before modification.

Table 2.30 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for BIB6

 ρ      Φ̂      Φ̂A     CR       E
 0.25   -1.5   0.7    1.0000   0.9750
 0.5    0.2    1.3    1.0000   0.9357
 1.0    -0.2   0.1    1.0000   0.9000
 2.0    -0.4   -0.4   1.0000   0.8824
 4.0    -0.8   -0.8   1.0000   0.8769

8.2.3 Complete Block Designs with Missing Data

As we mentioned in the introduction of this chapter, two observations are missing in each of the following designs: one from the first block under the first treatment, and the other from the second block under the second treatment.

I. Consider a complete block design with five blocks and four treatments (RCB1), q = 0.9477.

Table 3.1 Simulated size of nominal 5% Wald F-tests for RCB1

 ρ      K-R        Prop.1    Prop.2    Satter    Contain   K-R∗
 0.25   0.0696††   0.0557†   0.0557†   0.0557†   0.0591†   0.0562†
 0.5    0.0691††   0.0545    0.0546    0.0545    0.0575†   0.0558†
 1.0    0.0631††   0.0490    0.0490    0.0490    0.0508    0.0498
 2.0    0.0622††   0.0494    0.0494    0.0494    0.0498    0.0497
 4.0    0.0622††   0.0490    0.0490    0.0490    0.0492    0.0491
∗ before modification. S.E. of entries ∈ (0.0022, 0.0025)

Table 3.2 Mean of estimated denominator degrees of freedom for RCB1

 ρ      K-R       Prop.1    Prop.2    Satter    Contain   K-R∗
 0.25   20.3740   10.7678   10.7747   10.7699   10.7542   10
 0.5    20.1975   10.5812   10.5863   10.5828   10.5708   10
 1.0    19.9325   10.3008   10.3029   10.3014   10.2960   10
 2.0    19.7490   10.1050   10.1054   10.1051   10.1039   10
 4.0    19.6793   10.0296   10.0297   10.0297   10.0295   10
∗ before modification.

Table 3.3 Mean of estimated scales for RCB1

 ρ      K-R      Prop.1   Prop.2   K-R∗
 0.25   0.9348   0.9997   0.9996   0.9997
 0.5    0.9332   0.9998   0.9997   0.9998
 1.0    0.9307   0.9999   0.9999   0.9999
 2.0    0.9288   1.0000   1.0000   1.0000
 4.0    0.9281   1.0000   1.0000   1.0000
∗ before modification.

Table 3.4 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for RCB1

 ρ      Φ̂      Φ̂A     CR       E
 0.25   -3.0   -0.5   0.9939   0.9814
 0.5    -2.3   -0.5   0.9972   0.9706
 1.0    -1.0   -0.3   0.9992   0.9577
 2.0    -0.6   0.0    0.9998   0.9508
 4.0    -0.2   -0.2   1.0000   0.9485

II. Consider a complete block design with seven blocks and six treatments (RCB2), q = 0.9839.

Table 3.5 Simulated size of nominal 5% Wald F-tests for RCB2

 ρ      K-R       Prop.1   Prop.2   Satter   Contain   K-R∗
 0.25   0.0564†   0.0521   0.0521   0.0521   0.0531    0.0525
 0.5    0.0510    0.0466   0.0466   0.0466   0.0472    0.0472
 1.0    0.0576†   0.0524   0.0524   0.0524   0.0528    0.0526
 2.0    0.0585†   0.0525   0.0525   0.0525   0.0525    0.0525
 4.0    0.0585†   0.0527   0.0527   0.0527   0.0527    0.0527
∗ before modification. S.E. of entries ∈ (0.0021, 0.0023)

Table 3.6 Mean of estimated denominator degrees of freedom for RCB2

 ρ      K-R       Prop.1    Prop.2    Satter    Contain   K-R∗
 0.25   37.3202   28.5050   28.5173   28.5076   28.4925   28
 0.5    37.1337   28.3205   28.3266   28.3218   28.3143   28
 1.0    36.9442   28.1321   28.1335   28.1324   28.1307   28
 2.0    36.8508   28.0386   28.0387   28.0386   28.0384   28
 4.0    36.8226   28.0102   28.0102   28.0102   28.0102   28
∗ before modification.

Table 3.7 Mean of estimated scales for RCB2

 ρ      K-R      Prop.1   Prop.2   K-R∗
 0.25   0.9873   1.0000   0.9999   1.0000
 0.5    0.9872   1.0000   1.0000   1.0000
 1.0    0.9871   1.0000   1.0000   1.0000
 2.0    0.9870   1.0000   1.0000   1.0000
 4.0    0.9870   1.0000   1.0000   1.0000
∗ before modification.

Table 3.8 Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for RCB2

 ρ      Φ̂      Φ̂A     CR       E
 0.25   -0.4   0.0    0.9995   0.9922
 0.5    -0.5   -0.3   0.9998   0.9887
 1.0    -1.6   -1.6   1.0000   0.9857
 2.0    0.9    0.9    1.0000   0.9844
 4.0    0.7    0.7    1.0000   0.9840

8.3 Comments and Conclusion

The bias of the conventional estimator of the variance-covariance matrix of the fixed effects estimator is, as expected, large and negative in most cases, especially for small values of ρ (e.g., tables 2.5 and 2.10). The bias was found to be affected by three factors: q, ρ, and n. As q, ρ, and/or n
get larger, the bias decreases. This bias is reduced to an acceptable level by using the adjusted estimator. As expected, when the between-block variance increased, the estimate of the denominator degrees of freedom decreased (e.g., tables 1.3, 2.2, and 3.1). This is because the contribution of the between-block information to the estimation of the fixed effects and their covariance matrix gets smaller. Notice that for all approaches, the denominator degrees of freedom estimates approach the Containment method's estimate as ρ increases. In other words, as ρ increases, the optimal approximate test approaches the intra-block exact F-test, in which the between-block information is not utilized. In general, the K-R method before modification does not perform as well as after modification: before modification it tends to be more liberal (e.g., tables 3.1 and 3.5). Even though the K-R approach using the conventional estimator of the variance-covariance matrix of the fixed effects estimator gives reasonable true levels (table 1.2), that approach is not appropriate, since the denominator degrees of freedom and scale estimates were found to be negative for some data sets. For the PBIB1 design with ρ = 0.25, one data set yielded −120.8 as the estimate of the denominator degrees of freedom. The K-R and the proposed methods perform well and similarly, and the similarity gets stronger as ρ gets larger (for the BIB designs, the K-R and proposed methods are identical). Their true levels tend to be more conservative for designs with a higher efficiency factor, and this is related to the small remaining bias in the adjusted estimator of the variance-covariance matrix of the fixed effects estimator in most cases. This bias is not eliminated in the BIB designs with a small efficiency factor, which makes the K-R and the proposed methods' true levels more liberal (e.g., tables 2.5, 2.10 and 2.15).
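The percentage relative bias reported in the tables is a standard simulation summary. A sketch of how it would be computed from a set of simulated variance estimates (names are illustrative, not from the thesis code):

```python
import numpy as np

def pct_relative_bias(estimates, true_value):
    """100 * (mean of the simulated estimates - true value) / true value."""
    return 100.0 * (np.mean(estimates) - true_value) / true_value

# Estimates averaging 5% below the target value give a bias near -5.0:
print(pct_relative_bias([0.90, 1.00, 0.95], 1.0))  # ≈ -5.0
```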
Indeed, they perform better than the other approaches in most cases, except for a few cases mentioned below. The Satterthwaite approach was found to perform poorly for designs with small values of ρ, q, and n, where the true level tends to be too liberal (e.g., table 2.2). This unwanted liberality is related to the large negative percentage relative bias in the conventional estimator of the variance-covariance matrix of the fixed effects estimator in cases with small values of ρ, q, and n (e.g., table 2.5). When ρ, q, and/or n get larger, the Satterthwaite method tends to be more conservative. This is due to the relatively small bias in the conventional estimator of the variance-covariance matrix of the fixed effects estimator (e.g., table 2.9). Indeed, when q and ρ get larger, the Satterthwaite method tends to perform as well as the K-R and the proposed methods regardless of the sample size (e.g., table 2.12). For the cases where the Satterthwaite method performs poorly, the Containment method was generally found to perform better (e.g., tables 1.2 and 2.2). Moreover, when the efficiency factor is large and ρ and/or n get larger, the performance of the Containment method improves to the point that it becomes as good as the K-R and the proposed methods (e.g., tables 2.12 and 2.22). For the cases where the efficiency factor is relatively small, we found that the Containment method tends to perform better than all other approaches, including the K-R and the proposed methods, when ρ is relatively small (e.g., table 2.2).

9.
CONCLUSION

9.1 Summary

The strength of Kenward and Roger's approach comes from combining three components that no other approach combines. First, instead of approximating the Wald-type F statistic itself by an F distribution, as in the Satterthwaite approximation, Kenward and Roger considered a scaled form of the statistic, which makes the approximation more flexible since both the denominator degrees of freedom and the scale are approximated. Second, Kenward and Roger modified the procedure in such a way that it gives the right values for both the denominator degrees of freedom and the scale in the two special cases considered in chapter 3. Third, the adjusted estimator of the variance-covariance matrix of the fixed effects estimator, which is less biased than the conventional estimator, as we saw in chapter 2, is used in constructing the Wald-type F statistic. In this thesis, by modifying certain steps in Kenward and Roger's derivation, we obtained two alternative methods which are similar to the K-R method but simpler in derivation and application. Whenever the ratio of the two quantities A1 and A2 derived in chapter 3 is the same as for the two special cases, the K-R and the proposed methods are identical. We also showed that the K-R and the proposed methods produce the right values for the denominator degrees of freedom and the scale factor, not only in the special cases, but also in three general models where we have exact F tests for fixed effects. A Wald-type statistic was also constructed using the conventional, rather than the adjusted, estimator of the variance-covariance matrix of the fixed effects estimator, and then modified to attain the right values in the two special cases. Because A3 was found to be null in all cases with exact F tests considered in this thesis, the modification step was problematic; this is consistent with the remark made by Kenward and Roger (1997).
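The Wald-type F statistic underlying all of these methods has a compact matrix form. A minimal NumPy sketch (an illustration, not the thesis code) for testing H0: L′β = 0 with a plug-in covariance estimate such as the adjusted Φ̂A, where L is p × ℓ:

```python
import numpy as np

def wald_f(beta_hat, phi_A, L):
    """Wald-type F statistic  F = (L'b)' (L' Phi_A L)^{-1} (L'b) / ell
    for H0: L'beta = 0 (sketch; argument names are illustrative)."""
    d = L.T @ beta_hat                    # estimated contrast values (length ell)
    M = L.T @ phi_A @ L                   # plug-in covariance of L'beta_hat
    ell = L.shape[1]                      # numerator degrees of freedom
    return float(d @ np.linalg.solve(M, d)) / ell

# With an identity covariance and L = I, this is just the mean of beta_hat**2:
print(wald_f(np.array([3.0, 4.0]), np.eye(2), np.eye(2)))  # 12.5
```

The K-R method then multiplies this statistic by an estimated scale and refers it to an F distribution with an estimated denominator degrees of freedom.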
Another approach to obtaining approximate F tests for fixed effects is the Satterthwaite method. When a one-dimensional hypothesis is of interest, the K-R, the Satterthwaite, and the proposed methods produce the same estimate of the denominator degrees of freedom, and the scale estimate is one. Even though the estimates are the same, the Satterthwaite and the K-R tests are not necessarily identical, because the K-R statistic uses the adjusted estimator of the variance-covariance matrix of the fixed effects estimator, whereas the conventional estimator is used in the Satterthwaite test. However, in some cases, such as balanced mixed classification models where the adjustment to the estimator of the variance-covariance matrix of the fixed effects estimator is null, the Satterthwaite approach is identical to the K-R and the proposed methods when testing a one-dimensional hypothesis. For inference about a multidimensional hypothesis, there are still some cases where the K-R and the Satterthwaite approaches produce the same estimate of the denominator degrees of freedom, or are even identical, as in lemma 7.2.7. In chapter 8, we conducted a simulation study for three kinds of block designs: partially balanced block designs, balanced incomplete block designs, and complete block designs with missing data. The K-R, the proposed, the Satterthwaite, and the Containment methods were compared in the study. Three factors were considered: the variance components ratio, the efficiency factor, and the sample size. In most cases, we found that the K-R and the proposed methods performed as well as or better than the other methods. For designs with small values of all three factors, the Satterthwaite method performed poorly: the true level was found to be quite liberal due to the large bias in the conventional estimator of the variance-covariance matrix of the fixed effects estimator.
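For intuition, the classical Satterthwaite degrees-of-freedom approximation for a linear combination of independent mean squares can be sketched as follows (an illustration of the general idea, not the multivariate version used in the thesis):

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Satterthwaite df for V = sum(a_i * MS_i):
    df ≈ V**2 / sum((a_i * MS_i)**2 / df_i)."""
    v = sum(a * ms for a, ms in zip(coeffs, mean_squares))
    return v ** 2 / sum((a * ms) ** 2 / d
                        for a, ms, d in zip(coeffs, mean_squares, dfs))

# Two equally weighted mean squares with 10 df each act like one with 20 df:
print(satterthwaite_df([0.5, 0.5], [2.0, 2.0], [10, 10]))  # 20.0
```

The effective degrees of freedom shrink toward the smallest df_i when one component dominates the combination.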
In some designs where the efficiency factor and the ratio of the variance components are both relatively small, the K-R and the proposed methods were found to be liberal. For those cases, the Containment method was found to work as well as or better than all other approaches.

9.2 Future Research

In this thesis, some questions about the K-R and other methods used to make inference about fixed effects in mixed linear models were answered. However, many issues still need to be investigated. In chapter 3, we derived approximate expressions for the expectation and variance of the Wald-type F statistic using the conventional estimator of the variance-covariance matrix of the fixed effects estimator. However, as we saw in section 4.4, it was problematic to use the special cases to modify the approach, since A3 was null. As a matter of fact, it is also null in all three general models considered in chapter 6. A possible direction for further research is to investigate whether there is any case where we have an exact F test and A3 is not null, so that a better modification can be accomplished. Two different values of the ratio A1/A2 were found in the two special cases and in the three general models considered in chapter 6. It would be worthwhile to search for a case with an exact F test in which the ratio A1/A2 differs from the two ratios we found; if another ratio is found, it could be used in the modification stage of the K-R and the proposed methods. In balanced mixed classification models, and in the general models considered in sections 6.2 and 6.3, we noticed that the adjustment to the estimator of the variance-covariance matrix of the fixed effects estimator is null. It would be interesting to generalize these results. For inference about a multidimensional hypothesis on the fixed effects, lemma 7.2.7 provides conditions under which the K-R and the Satterthwaite approaches produce the same estimate of the denominator degrees of freedom.
It would be interesting to investigate the possibility of other conditions that lead to the same result. Finally, based on the theoretical derivation of the methods proposed in chapter 5, and on their simulation results for the block designs in chapter 8, the performance of these methods is expected to be comparable to the K-R method. A natural proposal is to investigate the performance of the proposed methods in other models.

BIBLIOGRAPHY

Anderson, T. (2003). An Introduction to Multivariate Statistical Analysis. New York: John Wiley & Sons.

Bellavance, F., Tardif, S. and Stephens, M. (1996). Tests for the Analysis of Variance of Crossover Designs with Correlated Errors. Biometrics 52, 607-612.

Birkes, D. (2004). Linear Model Theory Class Notes, Unpublished. Oregon State University.

Birkes, D. and Lee, Y. (2003). Adjusted Orthogonal Block Structure in Mixed Linear Models. Unpublished.

Birkes, D. (2006). Class Notes. Unpublished. Oregon State University.

Chen, X. and Wei, L. (2003). A Comparison of Recent Methods for the Analysis of Small-Sample Cross-Over Studies. Statistics in Medicine 22, 2821-2833.

Cochran, W. and Cox, G. (1957). Experimental Designs. New York: John Wiley & Sons.

Fai, C. and Cornelius, P. (1996). Approximate F Tests of Multiple Degree of Freedom Hypotheses in Generalized Least Squares Analysis of Unbalanced Split-Plot Experiments. Journal of Statistical Computation and Simulation 54, 363-378.

Giesbrecht, F. and Burns, J. (1985). Two-Stage Analysis Based on a Mixed Model: Large-Sample Asymptotic Theory and Small-Sample Simulation Results. Biometrics 41, 477-486.

Gomes, E., Schaalje, G. and Fellingham, G. (2005). Performance of the Kenward-Roger Method when the Covariance Structure is Selected Using AIC and BIC. Communications in Statistics - Simulation and Computation 34, 377-392.

Graybill, F. and Wang, C. (1980). Confidence Intervals on Nonnegative Linear Combinations of Variances. Journal of the American Statistical Association 75, 869-873.

Green, P. (1974). On the Design of Choice Experiments Involving Multifactor Alternatives. Journal of Consumer Research 1, 61-68.

Harville, D. (1997). Matrix Algebra from a Statistician's Perspective. New York: Springer.

Harville, D. and Jeske, D. (1992). Mean Squared Error of Estimation or Prediction Under a General Linear Model. Journal of the American Statistical Association 87, 724-731.

Huynh, H. and Feldt, L. (1970). Conditions Under Which Mean Square Ratios in Repeated Measurements Designs Have Exact F-Distributions. Journal of the American Statistical Association 65, 1582-1589.

Kackar, R. and Harville, D. (1984). Approximations for Standard Errors of Estimators of Fixed and Random Effects in Mixed Linear Models. Journal of the American Statistical Association 79, 853-862.

Kenward, M. and Roger, J. (1997). Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood. Biometrics 53, 983-997.

Krzanowski, W. (2000). Principles of Multivariate Analysis. Oxford, U.K.: Oxford University Press.

Kuehl, R. (2000). Design of Experiments: Statistical Principles of Research Design and Analysis. Pacific Grove: Duxbury Press.

Mardia, K., Kent, J. and Bibby, J. (1992). Multivariate Analysis. New York: Academic Press.

Pace, L. and Salvan, A. (1997). Principles of Statistical Inference. New Jersey: World Scientific.

Prasad, N. and Rao, J. (1990). The Estimation of the Mean Squared Error of Small-Area Estimators. Journal of the American Statistical Association 85, 163-171.

Pukelsheim, F. (1993). Optimal Design of Experiments. New York: John Wiley & Sons.

Rady, E. (1986). Testing Fixed Effects in Mixed Linear Models. Ph.D. thesis, Unpublished. Oregon State University.

Rao, C. (2002). Linear Statistical Inference and its Applications. New York: John Wiley & Sons.

SAS Institute, Inc. (2002-2006). SAS version 9.1.3, Online Help. Cary, NC: SAS Institute.

Satterthwaite, F. (1941). Synthesis of Variance. Psychometrika 6, 309-316.

Satterthwaite, F. (1946). An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin 2, 110-114.

Savin, A., Wimmer, G. and Witkovsky, V. (2003). Measurement Science Review 3, 53-56.

Schaalje, G., McBride, J. and Fellingham, G. (2002). Journal of Agricultural, Biological, and Environmental Statistics 7, 512-524.

Schott, J. (2005). Matrix Analysis for Statistics. New York: John Wiley & Sons.

Searle, S., Casella, G. and McCulloch, C. (1992). Variance Components. New York: John Wiley & Sons.

Seely, J. (2002). Linear Model Theory Class Notes, Unpublished. Oregon State University.

Seely, J. and Rady, E. (1988). When Can Random Effects Be Treated As Fixed Effects For Computing A Test Statistic For A Linear Hypothesis? Communications in Statistics - Theory and Methods 17, 1089-1109.

Seifert, B. (1979). Optimal Testing for Fixed Effects in General Balanced Mixed Classification Models. Math. Operationsforsch. Statist. Ser. Statistics 10, 237-256.

Spilke, J., Piepho, H. and Meyer, U. (2004). Approximating the Degrees of Freedom for Contrasts of Genotypes Laid Out as Subplots in an Alpha-Design in a Split-Plot Experiment. Plant Breeding 123, 193-197.

Utlaut, T., Birkes, D. and VanLeeuwen, D. (2001). Optimal Exact F Tests and Error-Orthogonal Designs. Conference Proceedings of the 53rd ISI Session, Seoul 2001.

VanLeeuwen, D., Seely, J. and Birkes, D. (1998). Sufficient Conditions for Orthogonal Designs in Mixed Linear Models. Journal of Statistical Planning and Inference 73, 373-389.

VanLeeuwen, D., Birkes, D. and Seely, J. (1999). Balance and Orthogonality in Designs for Mixed Classification Models. The Annals of Statistics 27, 1927-1947.