AN ABSTRACT OF THE DISSERTATION OF
Waseem S. Alnosaier for the degree of Doctor of Philosophy in Statistics presented on
April 25, 2007.
Title: Kenward-Roger Approximate F Test for Fixed Effects in Mixed Linear Models.
Abstract approved:
David S. Birkes
Clifford B. Pereira
For data following a balanced mixed Anova model, the standard Anova method typically
leads to exact F tests of the usual hypotheses about fixed effects. However, for most
unbalanced designs with most hypotheses, the Anova method does not produce exact
tests, and approximate methods are needed. One approach to approximate inference about
fixed effects in mixed linear models is the method of Kenward and Roger (1997), which
is available in SAS and widely used. In this thesis, we strengthen the theoretical
foundations of the method by clarifying and weakening the assumptions, and by
determining the orders of the approximations used in the derivation. We present two
modifications of the K-R method which are comparable in performance but simpler in
derivation and computation. It is known that the K-R method reproduces exact F tests in
two special cases, namely for Hotelling T² and for Anova F ratios in fixed-effects
models. We show that the K-R and proposed methods reproduce exact F tests in three
more general models which include the two special cases, plus tests in all fixed effects
linear models, many balanced mixed Anova models, and all multivariate regression
models. In addition, we present some theorems on the K-R, proposed, and Satterthwaite
methods, investigating conditions under which they are identical. Also, we show the
difficulties in developing a K-R type method using the conventional, rather than adjusted,
estimator of the variance-covariance matrix of the fixed-effects estimator. A simulation
study is conducted to assess the performance of the K-R, proposed, Satterthwaite, and
Containment methods for three kinds of block designs. The K-R and proposed methods
perform quite similarly and they outperform other approaches in most cases.
©Copyright by Waseem S. Alnosaier
April 25, 2007
All Rights Reserved
Kenward-Roger Approximate F Test for Fixed Effects in Mixed Linear Models
by
Waseem S. Alnosaier
A DISSERTATION
submitted to
Oregon State University
in partial fulfillment of
the requirements for the
degree of
Doctor of Philosophy
Presented April 25, 2007
Commencement June 2007
Doctor of Philosophy dissertation of Waseem S. Alnosaier presented on April 25, 2007.
APPROVED:
Co-Major Professor, representing Statistics
Co-Major Professor, representing Statistics
Chair of the Department of Statistics
Dean of the Graduate School
I understand that my dissertation will become part of the permanent collection of Oregon
State University libraries. My signature below authorizes release of my dissertation to
any reader upon request.
Waseem S. Alnosaier, Author
ACKNOWLEDGMENTS
I am deeply indebted to my advisors, Dr. David Birkes and Dr. Clifford Pereira, for
giving me the opportunity to work with them. I especially thank Dr. Birkes who patiently
and tirelessly helped me throughout my doctoral program. Without his encouragement
and constant guidance, this dissertation would not have been finished. I really cannot
thank him enough, and it is an honor to work with him. I also greatly thank Dr. Pereira
for his help and suggestions. I appreciate his attention and his valuable comments, which
improved the work of this thesis.
I would also like to thank the entire faculty and staff in the Department of
Statistics. I especially thank Dr. Alix Gitelman and Dr. Lisa Madsen for serving on my
committee, and for their concern and comments. I thank Professor Robert Smythe and
Professor Daniel Schafer for their help and concern. I also thank Professor Philippe
Rossignol from the Department of Fisheries and Wildlife for serving on my committee.
I am thankful to the faculty in the Department of Statistics at the University of
Pittsburgh, especially Professor Leon Gleser, Professor Satish Iyengar, and Professor
David Stoffer who helped me during my studies there.
Special thanks go to all current and former students, of whom there are too
many to mention, who influenced my statistical education. In particular, I thank my office
mate, Yonghai Li, for the unforgettable memories and valuable discussions we had.
I would like to express my sincere appreciation to my parents for their
compassion, encouragement, and support. They taught me at a young age how to respect
education, and offered me love and support through the course of my life.
Thanks are due foremost to my wonderful wife, Sohailah, and my lovely
children, Sarah, Kasim and Yaser, for supporting me throughout the entire process.
Thank you Sohailah for your patience, sacrifices and believing in me. You helped me in
so many ways, and I will never forget that. Your inspiration, understanding, and love
made this work possible.
TABLE OF CONTENTS
Page
1. INTRODUCTION ……………………………………………………………………..1
1.1 Previous Results ………………………………………………………. ……….....1
1.2 Contributions and Summary of Results …………………………………………...4
2. VARIANCE-COVARIANCE MATRIX OF THE FIXED EFFECTS
ESTIMATOR …………………………………………………………………………..7
2.1 The Model …………………………………………………………………………7
2.2 Notation ……………………………………………………………………………7
2.3 Assumptions ……………………………………………………………………….8
2.4 Estimating Var( β̂ ) ……………………………………………………………….11
3. TESTING THE FIXED EFFECTS …………………………………………………...21
3.1 Constructing a Wald-Type Pivot ...……………………………………………....21
3.2 Estimating the Denominator Degrees of Freedom and the Scale Factor ………...36
4. MODIFYING THE ESTIMATES OF THE DENOMINATOR DEGREES OF
FREEDOM AND THE SCALE FACTOR …………………………………………..38
4.1 Balanced One-Way Anova Model ……………………………………………….38
4.2 The Hotelling T² Model ………………………………………………………….41
4.3 Kenward and Roger’s Modification ……………………………………………...48
4.4 Modifying the Approach with the Usual Variance-Covariance Matrix ………….54
5. TWO PROPOSED MODIFICATIONS FOR THE KENWARD-ROGER
METHOD …………………………………………………………………………….55
5.1 First Proposed Modification ……………………………………………………..55
5.2 Second Proposed Modification …………………………………………………..58
TABLE OF CONTENTS (Continued)
Page
5.3 Comparisons among the K-R and the Proposed Modifications ………………….61
6. KENWARD-ROGER APPROXIMATION FOR THREE GENERAL MODELS ….67
6.1 Modified Rady’s Model ………………………………………………………….67
6.2 A General Multivariate Linear Model …………………………………………...77
6.3 A Balanced Multivariate Model with a Random Group Effect ...………………..83
7. THE SATTERTHWAITE APPROXIMATION ……………………………………..97
7.1 The Satterthwaite Method ………………………………………………………..97
7.2 The K-R, the Satterthwaite and the Proposed Methods ………………………….98
8. SIMULATION STUDY FOR BLOCK DESIGNS …………………………………105
8.1 Preparing Formulas for Computations ………………………………………….105
8.2 Simulation Results ……………………………………………………………...109
8.3 Comments and Conclusion ...…………………………………………………...124
9. CONCLUSION ……………………………………………………………………...127
9.1 Summary ………………………………………………………………………..127
9.2 Future Research ………………………………………………………………...128
Bibliography …………………………………………………………………………130
LIST OF TABLES
Table                                                                           Page
1.1   PBIB1 (t = s = 15, k = 4, and n = 60) …………………………………………… 110
1.2   Simulated size of nominal 5% Wald F-tests for PBIB1 ………………………… 111
1.3   Mean of estimated denominator degrees of freedom for PBIB1 ………………… 111
1.4   Mean of estimated scale for PBIB1 ……………………………………………… 111
1.5   Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for PBIB1 …………………………………………………… 111
1.6   PBIB2 design (t = 16, s = 48, k = 2, and n = 96) ………………………………… 112
1.7   Simulated size of nominal 5% Wald F-tests for PBIB2 ………………………… 112
1.8   Mean of estimated denominator degrees of freedom for PBIB2 ………………… 112
1.9   Mean of estimated scales for PBIB2 ……………………………………………… 113
1.10  Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for PBIB2 …………………………………………………… 113
2.1   BIB1 (t = 6, s = 15, k = 2, and n = 30) …………………………………………… 114
2.2   Simulated size of nominal 5% Wald F-tests for BIB1 …………………………… 114
2.3   Mean of estimated denominator degrees of freedom for BIB1 ………………… 114
2.4   Mean of estimated scales for BIB1 ……………………………………………… 114
2.5   Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for BIB1 ……………………………………………………… 115
2.6   BIB2 (t = 6, s = 10, k = 3, and n = 30) …………………………………………… 115
2.7   Simulated size of nominal 5% Wald F-tests for BIB2 …………………………… 115
2.8   Mean of estimated denominator degrees of freedom for BIB2 ………………… 116
2.9   Mean of estimated scales for BIB2 ……………………………………………… 116
2.10  Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency for BIB2 ………………………………………………………… 116
2.11  BIB3 (t = 7, s = 7, k = 4, and n = 28) …………………………………………… 117
2.12  Simulated size of nominal 5% Wald F-tests for BIB3 …………………………… 117
2.13  Mean of estimated denominator degrees of freedom for BIB3 ………………… 117
2.14  Mean of estimated scales for BIB3 ……………………………………………… 117
2.15  Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency for BIB3 ………………………………………………………… 118
2.16  BIB4 (t = 9, s = 36, k = 2, and n = 72) …………………………………………… 118
2.17  Simulated size of nominal 5% Wald F-tests for BIB4 …………………………… 118
2.18  Mean of estimated denominator degrees of freedom for BIB4 ………………… 119
2.19  Mean of estimated scales for BIB4 ……………………………………………… 119
2.20  Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for BIB4 ……………………………………………………… 119
2.21  BIB5 design (t = 9, s = 18, k = 4, and n = 72) …………………………………… 120
2.22  Simulated size of nominal 5% Wald F-tests for BIB5 …………………………… 120
2.23  Mean of estimated denominator degrees of freedom for BIB5 ………………… 120
2.24  Mean of estimated scales for BIB5 ……………………………………………… 120
2.25  Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for BIB5 ……………………………………………………… 121
2.26  BIB6 (t = 9, s = 12, k = 6, and n = 72) …………………………………………… 121
2.27  Simulated size of nominal 5% Wald F-tests for BIB6 …………………………… 121
2.28  Mean of estimated denominator degrees of freedom for BIB6 ………………… 121
2.29  Mean of estimated scales for BIB6 ……………………………………………… 122
2.30  Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for BIB6 ……………………………………………………… 122
3.1   Simulated size of nominal 5% Wald F-tests for RCB1 …………………………… 122
3.2   Mean of estimated denominator degrees of freedom for RCB1 ………………… 123
3.3   Mean of estimated scales for RCB1 ……………………………………………… 123
3.4   Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for RCB1 ……………………………………………………… 123
3.5   Simulated size of nominal 5% Wald F-tests for RCB2 …………………………… 123
3.6   Mean of estimated denominator degrees of freedom for RCB2 ………………… 124
3.7   Mean of estimated scales for RCB2 ……………………………………………… 124
3.8   Percentage relative bias in the variance estimates, convergence rate (CR),
      and efficiency (E) for RCB2 ……………………………………………………… 124
To my wife, Sohailah,
for her significant support.
KENWARD-ROGER APPROXIMATE F TEST FOR
FIXED EFFECTS IN MIXED LINEAR MODELS
1. INTRODUCTION
When testing fixed effects in a mixed linear model, an exact F test may not exist.
Several approaches are available to perform approximate F tests, and one of the most
common approaches is the method derived by Kenward and Roger (1997). Since the
Kenward-Roger method has been implemented in the MIXED procedure of the SAS
system, it has become well known and widely used. In this thesis, we investigate the
Kenward-Roger approach, and suggest two other modifications.
1.1 Previous Results
In testing a hypothesis about fixed effects, it is desirable to find an exact test. In
balanced mixed linear models, the standard Anova method leads to optimal exact F tests
for the fixed effects in most cases. In such models, Seifert (1979) showed that the
standard Anova F tests are uniformly most powerful invariant unbiased tests (UMPIU).
Rady (1986) studied the needed assumptions for the mixed linear model so that the
Anova method produces optimal exact F tests for the fixed effects. Seely and Rady
(1988) studied conditions where the random effects can be treated as fixed to construct an
exact F test for the fixed effects. VanLeeuwen et al. (1998) introduced the concept of
error orthogonal (EO) designs. In models with EO structure, we have an exact F test for
certain standard hypotheses. Utlaut et al. (2001) introduced the concept of simple error
orthogonal (SEO) designs, in which optimal exact F tests can be obtained. If a mixed
effects Anova model has a type of “partial balance” called b&r balanced (VanLeeuwen
et al. , 1999), then it is SEO. A typical normal balanced Anova model is b&r balanced,
and therefore, has an optimal exact F test as indicated above. In particular, when the
smallest random effect that contains a fixed effect is unique, there is an optimal exact F
test for the fixed effect which coincides with the Anova F test (Birkes, 2004). In addition,
exact F tests have been constructed for certain other models. For instance, an exact F test
can be obtained for testing certain hypotheses about fixed effects in a general multivariate
model (Mardia et al. , 1992). Also, an exact F test can be obtained to test fixed effects in
a balanced multivariate model with a random group effect (Birkes, 2006).
In many cases, constructing an exact test seems hard and approximations need to
be employed. When no two mean squares in the Anova table have the same expectation
under the null hypothesis, the Anova method does not yield an exact F test, but it may
still be used to obtain an approximate one. If a mean square can be created as a linear
combination of other mean squares such that it has the same expectation under the null
hypothesis as the mean square for the fixed effects, then the created mean square is used
in the denominator of an F statistic, and its degrees of freedom can be approximated by
the method of Satterthwaite (1946).
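The Satterthwaite computation just described can be sketched in a few lines (an illustrative Python sketch; the function name and the numerical mean squares are ours, not part of the thesis):

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Satterthwaite (1946) df for MS* = sum(a_i * MS_i), with the MS_i
    independent and df_i degrees of freedom each:
        df* = (sum a_i MS_i)^2 / sum((a_i MS_i)^2 / df_i)."""
    num = sum(a * ms for a, ms in zip(coeffs, mean_squares)) ** 2
    den = sum((a * ms) ** 2 / d for a, ms, d in zip(coeffs, mean_squares, dfs))
    return num / den

# Hypothetical example: a denominator mean square synthesized from two
# mean squares with 8 df each.
df_star = satterthwaite_df([0.5, 0.5], [4.0, 4.0], [8, 8])
```

Note that a single mean square recovers its own degrees of freedom, while combining mean squares of comparable size can yield a larger effective df.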
Another common approach to produce an approximate F test for the fixed effects
is based on the Wald-type statistic. When using a Wald-type test, the numerator degrees
of freedom equal the number of contrasts being tested, but the denominator degrees of
freedom need to be estimated. In the Containment method, which is the default method
in SAS's MIXED procedure, the denominator degrees of freedom are chosen to be the
smallest of the rank contributions to the design matrix among the random effects that
contain the fixed effect being tested. If no such random effects exist, the estimate is the
residual degrees of freedom.
Giesbrecht and Burns (1985) proposed a method based on Satterthwaite (1941) to
determine the denominator degrees of freedom for an approximate Wald-type t test for a
single contrast of the fixed effects. Fai and Cornelius (1996) extended the Giesbrecht
and Burns method, approximating the Wald-type statistic for a multidimensional
hypothesis by an F distribution. Both the Giesbrecht and Burns and the Fai and
Cornelius approaches use the conventional estimator of the variance-covariance matrix
of the fixed effects estimator, namely the variance-covariance matrix of the generalized
least squares (GLS) estimator of the fixed effects with the variance components replaced
by their REML estimates. The Fai and Cornelius approach is referred to as the Satterthwaite
approach in some literature and in this thesis as well. Bellavance et al. (1996) suggested a
method to improve the Anova F test by using a scaled F distribution with modified
degrees of freedom.
It is known that the conventional estimator of the variance-covariance matrix of
the fixed effects estimator underestimates it (Kackar and Harville, 1984). Indeed, Kackar
and Harville expressed the variance-covariance matrix of the fixed effects estimator as a
sum of two components. The first component is the variance-covariance matrix of the
GLS estimator of the fixed effects, and the second component represents what the first
component underestimates. The second component was approximated by Kackar and
Harville (1984) and Prasad and Rao (1990), and estimation of the first component was
addressed by Harville and Jeske (1992) and later by Kenward and Roger (1997). In fact,
Kenward and Roger (1997) combined estimators of the two components to produce an
adjusted estimator for the variance-covariance matrix of the fixed effects estimator. They
plug the adjusted estimator into the Wald-type statistic where a scaled form of the
statistic follows an F distribution approximately. Unlike the Satterthwaite-based
approaches where only the degrees of freedom need to be estimated, in Kenward and
Roger’s approximation, two quantities need to be estimated from the data: the
denominator degrees of freedom and the scaling factor.
After deriving approximate expressions for the expectation and the variance of the
Wald-type statistic, Kenward and Roger then match these with the first and second
moments of the F distribution to determine the estimates of the denominator degrees of
freedom and the scale. Moreover, Kenward and Roger modified the approximation in
such a way that the estimates match the known values for two special cases where a
scaled form of the Wald-type statistic has an exact F test. In fact, the idea of modifying
the approach in a way to produce the exact values was adopted previously by Graybill
and Wang (1980) when they modified approximate confidence intervals on certain
functions of the variance components in a way to make them exact for some special
cases.
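The moment-matching step can be made concrete with a short sketch (illustrative Python; E and V denote approximations to the mean and variance of the Wald-type statistic F with ℓ numerator degrees of freedom, and the closed form below follows from equating the first two moments of λF to those of an F(ℓ, m) distribution; function and variable names are ours):

```python
def match_f_moments(E, V, ell):
    """Given approximations E to E[F] and V to Var(F) for a Wald-type statistic
    F with ell numerator df, solve for (m, lam) so that lam * F is approximately
    F(ell, m). Matching lam*E = m/(m-2) and lam^2*V = Var of F(ell, m) gives
    the closed form below."""
    rho = V / (2.0 * E ** 2)
    m = 4.0 + (ell + 2.0) / (ell * rho - 1.0)  # denominator degrees of freedom
    lam = m / (E * (m - 2.0))                  # scale factor
    return m, lam
```

A round-trip check: computing E and V from the exact moments of a scaled F(ℓ, m) distribution and feeding them back in recovers the original m and λ.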
Recently, the performance of Kenward and Roger's approximation has been the subject of
several simulation studies. For example, Schaalje et al. (2002) compared the K-R and the
Satterthwaite approaches for a split plot design with repeated measures for
several sample sizes and covariance structures. They found that the K-R method performs
as well as or better than the Satterthwaite approach in all situations. They considered
three factors in the comparisons: the complexity of the covariance matrix, imbalance, and
the sample size, and they found that these factors affect the Satterthwaite method more
than the K-R method. The Satterthwaite method was found to work well only when the
sample size was moderately large and the covariance matrix was compound symmetric. The
K-R method was found to have a tendency toward inflated levels when the sample size
was small, except when the covariance structure was compound symmetric. Chen and
Wei (2003) compared the Kenward-Roger approach and a modified Anova method
suggested by Bellavance et al. (1996) for some crossover designs. They recommended
using the K-R approach when the sample size is at least 24. For smaller sample sizes,
they found the modified Anova method works better than the K-R approach. Savin et al.
(2003) found the K-R approach reliable for constructing a confidence interval for the
common mean in interlaboratory trials. Spilke et al. (2004) investigated the K-R and the
Satterthwaite methods to estimate the denominator degrees of freedom for contrasts of
the fixed subplot effects. Like the conclusion drawn by Schaalje et al. (2002) above, they
suggested that Kenward and Roger's approximation be preferred for small datasets, and that the
two methods are comparable for large datasets. Valderas et al. (2005) studied the
performance of the K-R approach when AIC and BIC are used as criteria to select the
covariance structure. They found the K-R method’s level much higher than the target
values. Even with the correct covariance structure, the level was found to be higher than
the target for many cases in which the covariance structure is not compound symmetric.
1.2 Contributions and Summary of Results
Since some of the detailed derivation of Kenward and Roger's approach was
absent from their original work, we provide a detailed theoretical derivation of the
method, including a clarification of the assumptions needed to justify it. Also, we
weaken some of the assumptions that were imposed by Kenward and Roger, and
determine the orders of the approximations used in the derivation. We present two
modifications of the K-R method which are comparable in performance but simpler in
derivation and computation. Kenward and Roger modified their approach in such a way
that their method reproduces exact F tests in two special cases, namely for Hotelling T²
and for Anova F ratios. We show that the K-R and the two proposed methods reproduce
exact F tests in three more general models, two of which are generalizations of the two
special cases. We explore relationships among the K-R, proposed and Satterthwaite
methods by specifying cases where the approaches produce the same estimate of the
denominator degrees of freedom or are even identical. Also, we show the difficulties in
developing a K-R type method using the conventional, rather than adjusted, estimator of
the variance-covariance matrix of the fixed-effects estimator.
In chapter 2, using Taylor series expansions, matrix derivatives and invariance
arguments, we derive the adjusted estimator for the variance-covariance matrix of the
fixed effects estimator that was provided by Kenward and Roger (1997). Besides
clarifying all assumptions that justify the theoretical derivation, we weaken some of the
assumptions imposed by Kenward and Roger. In addition, we determine the orders of the
approximations used in the derivation.
In chapter 3, we derive the approximate expectation and variance of the Wald-type
statistic when it is constructed using the conventional and the adjusted estimators of the
variance-covariance matrix of the fixed effects estimator. In addition, we match the first
and second moments of the F distribution with those of the scaled form of the Wald-type
statistic to obtain the Kenward-Roger approximation for the denominator degrees of
freedom and scale before the modification.
Two special cases in which the Wald-type statistic has an exact F distribution, the
balanced one-way Anova model and the Hotelling T² model, are considered in chapter 4
to establish the modification of the expectation and the variance of the Wald-type
statistic proposed by Kenward and Roger, so that the approach produces the correct,
known values in these two special cases. Kenward and Roger (1997) mentioned that it
can be argued that the conventional estimator of the variance-covariance matrix of the
fixed effects estimator can be used instead of the adjusted estimator. We discuss the
difficulties in modifying the approach by using the conventional estimator instead of the
adjusted estimator.
The K-R approach was derived based on modifying the approximated expressions
for the expectation and the variance of the Wald-type statistic. This modification is not
unique, and hence we introduce two other modifications for the K-R method in chapter 5.
We keep the modification of the expectation of the statistic as modified by Kenward and
Roger; however, instead of modifying the variance, we modify other related quantities.
The proposed modifications for the K-R method are comparable in performance and
simpler in derivation and computation.
As mentioned above, the special cases used by Kenward and Roger are not the
only cases where the K-R and proposed methods produce the exact values. Indeed, the
Kenward-Roger and the proposed modifications produce the exact values for three
general models where there is an optimal exact F test. The models studied in chapter 6
are: (1) Rady’s model (with a slight modification) which includes a wide class of
balanced mixed classification models and is more general than the balanced one-way
Anova model, (2) a general linear multivariate model which is more general than the
Hotelling T² model, and (3) a balanced multivariate model with a random group effect. We
show that the estimate of the denominator degrees of freedom and the scale factor match
the known values for those models.
The Satterthwaite, the Kenward-Roger and the proposed methods perform
similarly in some situations. Chapter 7 is devoted to studying the cases where these methods
produce the same estimate for the denominator degrees of freedom. Moreover, we study
the cases where the approaches become identical to each other.
In chapter 8, we provide a simulation study for three types of block designs:
partially balanced incomplete block designs, balanced incomplete block designs and
complete block designs with some missing data. In the simulation study, the sample size,
the ratio of the variance components, and the efficiency factor are considered to see how
they affect the performance of the Kenward-Roger, the proposed, the Satterthwaite and
the Containment methods.
2. VARIANCE-COVARIANCE MATRIX OF THE
FIXED EFFECTS ESTIMATOR
For a multivariate mixed linear model, the precision of the fixed effects estimates
has traditionally been assessed using an estimator based on the asymptotic distribution.
However, this estimator is known to be biased and to underestimate the variance of the
fixed effects estimator. Kenward and Roger (1997) proposed an adjustment to the
estimator of the variance of the fixed effects estimator, which is investigated in this chapter.
2.1 The Model
Consider n observations y following a multivariate normal distribution,
y ∼ N(Χβ, Σ),
where Χ (n × p) is a full column rank matrix of known covariates, β (p × 1) is a vector of
unknown parameters, and Σ (n × n) is an unknown variance-covariance matrix whose
elements are assumed to be functions of r parameters, σ (r × 1) = (σ_1, …, σ_r)′. The
generalized least squares (GLS) estimator of β is β̃ = (Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹y, and the matrix
Φ = (Χ′Σ⁻¹Χ)⁻¹ is the variance-covariance matrix of this estimator. The REML
estimator of σ is denoted by σ̂, and the REML-based estimated generalized least squares
estimator of β (EGLSE) is β̂ = (Χ′Σ⁻¹(σ̂)Χ)⁻¹Χ′Σ⁻¹(σ̂)y. The center of our study is β, and
Σ is a nuisance to be addressed in the analysis. We are interested in testing H_0: L′β = 0,
for L′ an (ℓ × p) fixed matrix.
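As a numerical illustration of these quantities, the following sketch computes the GLS estimator and Φ for a toy block-diagonal Σ (the compound-symmetry blocks and all numerical values are assumptions chosen for illustration, not part of the model above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setting: m = 50 independent blocks of size 3, with
# compound-symmetry blocks Sigma_k = sigma_e^2 * I + sigma_b^2 * J.
m, nk, p = 50, 3, 2
sigma_e2, sigma_b2 = 1.0, 0.5
Sigma_k = sigma_e2 * np.eye(nk) + sigma_b2 * np.ones((nk, nk))
Sigma = np.kron(np.eye(m), Sigma_k)           # block-diagonal Sigma (n x n)

n = m * nk
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, -2.0])
y = X @ beta + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)

# GLS estimator and its variance-covariance matrix Phi = (X' Sigma^-1 X)^-1
Sigma_inv = np.linalg.inv(Sigma)
Phi = np.linalg.inv(X.T @ Sigma_inv @ X)      # Var of the GLS estimator
beta_gls = Phi @ X.T @ Sigma_inv @ y          # beta-tilde in the text
```

Replacing Σ by Σ(σ̂), with σ̂ the REML estimate, gives β̂ (the EGLSE) and Φ̂ by the same formulas.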
2.2 Notation
Throughout the thesis, we use the following notation. For a matrix A, we use
A′, ℜ(A), N(A), r(A), tr(A), |A| to denote the transpose, range, null space, rank, trace,
and determinant of A, respectively. ℜ(A)⊥ is used to denote the orthogonal complement
of ℜ(A). We use the abbreviation p.d. for positive definite and n.n.d. for nonnegative
definite. The notation p.o. is used for projection operator and o.p.o. for orthogonal
projection operator. P_A is the o.p.o. on ℜ(A). In addition, we use the following:
V = Var(β̂)
W = Var(σ̂), w_ij = Cov(σ̂_i, σ̂_j)
G = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹
Θ = L(L′ΦL)⁻¹L′
Φ = (Χ′Σ⁻¹Χ)⁻¹
P_i = −Χ′Σ⁻¹(∂Σ/∂σ_i)Σ⁻¹Χ
Q_ij = Χ′Σ⁻¹(∂Σ/∂σ_i)Σ⁻¹(∂Σ/∂σ_j)Σ⁻¹Χ
R_ij = Χ′Σ⁻¹(∂²Σ/∂σ_i∂σ_j)Σ⁻¹Χ
Λ = Var(β̂ − β̃)
Λ̃ = ∑_{i=1}^{r} ∑_{j=1}^{r} w_ij Φ(Q_ij − P_i Φ P_j)Φ
Φ̂_A = Φ̂ + 2Φ̂{∑_{i=1}^{r} ∑_{j=1}^{r} ŵ_ij (Q̂_ij − P̂_i Φ̂ P̂_j − (1/4)R̂_ij)}Φ̂
A_1 = ∑_{i=1}^{r} ∑_{j=1}^{r} w_ij tr(ΘΦP_iΦ) tr(ΘΦP_jΦ)
A_2 = ∑_{i=1}^{r} ∑_{j=1}^{r} w_ij tr(ΘΦP_iΦΘΦP_jΦ)
A_3 = ∑_{i=1}^{r} ∑_{j=1}^{r} w_ij tr[ΘΦ(Q_ij − P_iΦP_j − (1/4)R_ij)Φ]
F = (1/ℓ) β̂′L(L′Φ̂_A L)⁻¹L′β̂, unless otherwise mentioned.
For a matrix H(σ), we use Ĥ = H(σ̂).
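As a quick check on these definitions, consider the simplest case Σ = σ²I_n with a single variance parameter (r = 1). There ∂Σ/∂σ² = I and ∂²Σ/∂(σ²)² = 0, so Q_11 − P_1ΦP_1 − (1/4)R_11 = 0 and Φ̂_A reduces to Φ̂. The sketch below verifies this numerically (illustrative Python; the design matrix, variance value, and the use of the known REML variance w_11 = 2σ⁴/(n − p) are our assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.standard_normal((n, p))
s2 = 2.0                                  # sigma^2, the single parameter

Sigma_inv = np.eye(n) / s2
dSigma = np.eye(n)                        # dSigma/dsigma^2 = I
Phi = np.linalg.inv(X.T @ Sigma_inv @ X)  # equals s2 * (X'X)^-1

P1 = -X.T @ Sigma_inv @ dSigma @ Sigma_inv @ X
Q11 = X.T @ Sigma_inv @ dSigma @ Sigma_inv @ dSigma @ Sigma_inv @ X
R11 = np.zeros((p, p))                    # second derivative of Sigma is 0

w11 = 2.0 * s2 ** 2 / (n - p)             # REML Var(sigma^2-hat) here
Phi_A = Phi + 2.0 * Phi @ (w11 * (Q11 - P1 @ Phi @ P1 - 0.25 * R11)) @ Phi
```

The adjustment term vanishes, which is consistent with the exactness of the usual F test in fixed effects linear models noted in chapter 1.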
2.3 Assumptions
For chapters 2 and 3, we impose the following assumptions about the model.
(A1) The expectation of β̂ exists.
(A2) Σ is a block diagonal and nonsingular matrix. Also, we assume that the elements of
Σ_k, Σ_k⁻¹, ∂Σ_k/∂σ_i, ∂²Σ_k/∂σ_i∂σ_j, and Χ_k are bounded, where Σ = diag_{1≤k≤m}(Σ_k)
and Χ = col_{1≤k≤m}(Χ_k), and sup n_i < ∞, where the n_i are the block sizes.
(A3) E[σ̂] = σ + O(n^(-3/2)).
(A4) The possible dependence between σ̂ and β̂ is ignored.
(A5) ∑_{i=1}^{r} ∑_{j=1}^{r} Cov[(∂β̃/∂σ_i)(∂β̃/∂σ_j)′, (σ̂_i − σ_i)(σ̂_j − σ_j)] = O(n^(-5/2)).
(A6) Φ = (Χ′Σ⁻¹Χ)⁻¹ = O(n⁻¹), (L′ΦL)⁻¹ = O(n), ∂β̃/∂σ_i = O_p(n^(-1/2)), and
∂²β̃/∂σ_i² = O_p(n^(-1/2)).
(A7) If T = O_p(n^α), then E[T] = O(n^α).
Remarks
(i) Σ_k is said to be bounded when max(elements of Σ_k) ≤ b(σ) for some constant b.
Since sup n_i < ∞, we have n → ∞ ⇔ m → ∞.
(ii) Even though Kenward and Roger required the bias of the REML estimator to be
ignored, we only require E[σ̂] = σ + O(n^(-3/2)), as stated in assumption (A3). In fact, for
the model mentioned in section 2.1, E[σ̂ − σ] = b(σ) + O(n^(-2)), where b(σ) is of order
O(n^(-1)) (Pace and Salvan, 1997, expression 9.62). Moreover, when the covariance
structure is linear, b(σ) = 0, and hence E[σ̂ − σ] = O(n^(-2)), which is stronger than what
we need.
(iii) Assumption (A4) was also imposed by Kenward and Roger. In fact, we investigated
some models, such as the Hotelling T² model, the fixed effects model, and the one-way
Anova model with fixed group effects and unequal group variances. In these models,
σ̂ and β̂ are exactly independent. Also, for those models, the sum of covariances in
assumption (A5) is zero, which is stronger than the assumption. Assumption (A4) is
needed to derive approximate expressions for the expectation and the variance of the
statistic by conditioning on σ̂, as we will see in chapter 3. However, we should mention
that the same derivation of the expectation and the variance of the statistic can be done
without conditioning on σ̂, in which case assumption (A4) is no longer needed.
(iv) One circumstance in which the covariance in assumption (A5) is exactly zero is
when σ̂ is obtained from a previous sample and β̃(σ̂) from the current data (Kackar and
Harville, 1984). Also, Kackar and Harville argued that when σ̂ is unbiased and
E[(∂β̃/∂σ_i)(∂β̃/∂σ_j)′ | σ̂] can be approximated by a first order Taylor series, then the
covariance in (A5) is expected to be zero. It appears that Kenward and Roger assumed
one of the arguments mentioned. For us, it is enough to assume (A5).
(v) Χ′H(σ)Χ = O(n), where H(σ) = diag_{1≤k≤m}(H_k(σ)) and H_k(σ) is a product of any
combination of Σ_k, Σ_k⁻¹, ∂Σ_k/∂σ_i, and ∂²Σ_k/∂σ_i∂σ_j. This is true because
Χ_k′H_k(σ)Χ_k = O(1) for 1 ≤ k ≤ m, so that max(elements of Χ_k′H_k(σ)Χ_k) ≤ a for all
k, and hence Χ′H(σ)Χ = ∑_{k=1}^{m} Χ_k′H_k(σ)Χ_k = O(m) = O(n).
(vi) A general situation in which the first two conditions in assumption (A6) hold is
when all Χ_k′Σ_k⁻¹Χ_k are contained in a compact set of p.d. matrices, for then
(1/m)Χ′Σ⁻¹Χ lies in a compact set of p.d. matrices, so mΦ is bounded and hence
Φ = O(n⁻¹); likewise mL′ΦL lies in a compact set of p.d. matrices, so (L′ΦL)⁻¹ = O(n).
Also, this assumption holds when we suppose Σ_k = Σ_1 for all k and
(1/m)∑_{k=1}^{m} Χ_k′Σ_1⁻¹Χ_k → A (p.d.). This supposition is reasonable in two situations:
1) Χ_k = Χ_1 for all k, in which case A = Χ_1′Σ_1⁻¹Χ_1.
2) The Χ_k are regarded as iid random covariate matrices, in which case, by the weak law
of large numbers, (1/m)∑_{k=1}^{m} Χ_k′Σ_1⁻¹Χ_k converges to E[Χ_1′Σ_1⁻¹Χ_1].
Then, since inversion is a continuous operation, mΦ → A⁻¹, that is, Φ = O(n⁻¹). Also,
(1/m)(L′ΦL)⁻¹ → (L′A⁻¹L)⁻¹, that is, (L′ΦL)⁻¹ = O(n).
For the last two conditions of (A6), note that ∂β̃/∂σ_i = −ΦΧ′Σ⁻¹(∂Σ/∂σ_i)Gy (as we
will see in lemma 2.4.7). Consider Χ′B(σ)y, where B(σ) = Σ⁻¹(∂Σ/∂σ_i)G; then
Χ′B(σ)y = ∑_{k=1}^{m} Χ_k′B_k(σ)y_k, where y = col_{1≤k≤m}(y_k) and
B(σ) = diag_{1≤k≤m}(B_k(σ)). If the Χ_k′B_k(σ)y_k are iid, then
m^(1/2)[(1/m)∑_{k=1}^{m} Χ_k′B_k(σ)y_k − E[Χ_k′B_k(σ)y_k]] = O_p(1) by the central limit
theorem. However, in lemma 2.4.7 we have E[Χ_k′B_k(σ)y_k] = 0, and hence
Χ′B(σ)y = O_p(n^(1/2)), so that ∂β̃/∂σ_i = O(n⁻¹)O_p(n^(1/2)) = O_p(n^(-1/2)).
(vii) If H(σ) = O_p(n^α), then H(σ̂) = O_p(n^α). This result is obtained by employing a
Taylor series expansion of H(σ̂) about σ.
(viii) For some random matrices T, the expectations are of higher order than the random
matrices themselves. For instance, (σ̂_i − σ_i)(σ̂_j − σ_j)(σ̂_k − σ_k) = O_p(n^(-3/2)), as will
be shown in theorem 2.4.9; however, E[(σ̂_i − σ_i)(σ̂_j − σ_j)(σ̂_k − σ_k)] = O(n^(-2)), from
expression (9.74) in Pace and Salvan. In assumption (A7), we assume that taking the
expectation preserves the order, which does not conflict with the cases mentioned above.
2.4 Estimating Var(\(\hat{\beta}\))

There are two main sources of bias in \(\hat{\Phi}\) when it is used as an estimator of \(\mathrm{Var}(\hat{\beta})\): \(\hat{\Phi}\) underestimates \(\Phi\), and \(\Phi = \mathrm{Var}(\tilde{\beta})\) does not take into account the variability of \(\hat{\sigma}\) in \(\hat{\beta} = \tilde{\beta}(\hat{\sigma})\). Kackar and Harville (1984) proposed that \(\mathrm{Var}(\hat{\beta})\) can be partitioned as \(\mathrm{Var}(\hat{\beta}) = \Phi + \Lambda\), and they addressed the second source of bias by approximating \(\Lambda\). The first source of bias was discussed by Harville and Jeske (1992); Kenward and Roger (1997) proposed an approximation that adjusts for it, and they combined both adjustments to obtain their estimator of \(\mathrm{Var}(\hat{\beta})\), denoted \(\hat{\Phi}_A\). In this section, several lemmas are derived leading to the expression for \(\hat{\Phi}_A\).
Lemma 2.4.1 With \(L_R(\sigma)\) being the likelihood function of \(z = K'y\), where \(K'X = 0\),
\[
2\log L_R(\sigma) = 2\ell_R(\sigma) = \text{constant} - \log|\Sigma| - \log|X'\Sigma^{-1}X| - y'\big[\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\big]y.
\]

Proof  Since \(y \sim N(X\beta,\Sigma)\),
\[
f(y) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\!\big[-\tfrac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)\big],
\]
and \(z \sim N(0, K'\Sigma K)\) (Birkes, 2004, theorem 7.2.2), so
\[
f(z) = (2\pi)^{-q/2}\,|K'\Sigma K|^{-1/2}\exp\!\big[-\tfrac{1}{2}z'(K'\Sigma K)^{-1}z\big], \qquad\text{where } q = n - p.
\]
The REML estimator of \(\sigma\) is the maximum likelihood estimator from the marginal likelihood of \(z = K'y\), where \(K'\) is any \(q\times n\) matrix of full row rank satisfying \(K'X = 0\). Thus
\[
2\ell_R(\sigma) = -q\log(2\pi) - \log|K'\Sigma K| - z'(K'\Sigma K)^{-1}z
= -q\log(2\pi) - \log|K'\Sigma K| - y'K(K'\Sigma K)^{-1}K'y.
\]
To prove the lemma, it suffices to show that
(a) \(y'K(K'\Sigma K)^{-1}K'y = y'\big[\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\big]y\);
(b) \(-q\log(2\pi) - \log|K'\Sigma K| = \text{constant} - \log|\Sigma| - \log|X'\Sigma^{-1}X|\).

For part (a), it suffices to show that
\[
K(K'\Sigma K)^{-1}K' = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}.
\]
Indeed, \(K(K'\Sigma K)^{-1}K'\Sigma = I - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\) transposed appropriately; from Seely (2002, problem 2.B.4),
\[
K(K'\Sigma K)^{-1}K'\Sigma = I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}
\;\Rightarrow\; K(K'\Sigma K)^{-1}K' = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}... 
\]
Observe that \(\Sigma\) is nonsingular (assumption A2) and n.n.d. (Birkes, 2004), so \(\Sigma\) is p.d. (Seely, 2002, corollary 1.9.3), and hence \(r(K'\Sigma K) = r(K')\) (Birkes, 2004, a lemma on Jan 23). Also, we have \(\mathcal{R}(K) = \mathcal{N}(X')\) (because \(\mathcal{R}(K) = \mathcal{R}(X)^{\perp}\)). So all conditions of problem 2.B.4 from the Seely notes are satisfied.

For part (b), choose \(T = X(X'X)^{-1/2}\), so \(T'T = (X'X)^{-1/2}X'X(X'X)^{-1/2} = I\). Also, notice that we can choose \(K\) with orthonormal columns, so \(K'K = I\); and \(K'T = 0\), since \(K'X = 0\). Let \(R = [\,T \;\; K\,]\), so
\[
R'R = \begin{bmatrix} T' \\ K' \end{bmatrix}[\,T \;\; K\,]
= \begin{bmatrix} T'T & T'K \\ K'T & K'K \end{bmatrix}
= \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.
\]
Hence \(|R'R| = 1\), and
\[
|\Sigma| = |R'|\,|\Sigma|\,|R| = |R'\Sigma R|
= \begin{vmatrix} T'\Sigma T & T'\Sigma K \\ K'\Sigma T & K'\Sigma K \end{vmatrix}
= |K'\Sigma K|\,\big|T'\Sigma T - T'\Sigma K(K'\Sigma K)^{-1}K'\Sigma T\big|
= |K'\Sigma K|\,\big|(X'\Sigma^{-1}X)^{-1}\big|\,|X'X|.
\]
So \(\log|\Sigma| = \log|K'\Sigma K| - \log|X'\Sigma^{-1}X| + \log|X'X|\)
\(\Rightarrow -\log|K'\Sigma K| = -\log|\Sigma| - \log|X'\Sigma^{-1}X| + \text{constant}.\)
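The two identities at the heart of this proof can be checked numerically. The sketch below is ours, not part of the thesis; it builds \(K\) from the left singular vectors of \(X\) orthogonal to its column space and verifies the projection identity in part (a) and the determinant factorization in part (b) on random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)          # a p.d. covariance matrix
U = np.linalg.svd(X, full_matrices=True)[0]
K = U[:, p:]                              # n x (n-p): K'X = 0 and K'K = I

Si = np.linalg.inv(Sigma)
G = Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si
lhs = K @ np.linalg.inv(K.T @ Sigma @ K) @ K.T
assert np.allclose(lhs, G)                # part (a) identity

# part (b): |Sigma| = |K' Sigma K| * |(X' Sigma^{-1} X)^{-1}| * |X'X|
det_rhs = (np.linalg.det(K.T @ Sigma @ K)
           / np.linalg.det(X.T @ Si @ X)
           * np.linalg.det(X.T @ X))
assert np.isclose(np.linalg.det(Sigma), det_rhs)
print("identities (a) and (b) hold")
```

Any matrix whose columns form an orthonormal basis of \(\mathcal{N}(X')\) would serve as \(K\) here; the SVD construction is just one convenient choice.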
Lemma 2.4.2 \(\hat{\beta}\) is symmetric about \(\beta\).

Proof Since \(\hat{\sigma}\) is reflection and translation invariant (Birkes, 2004), \(\hat{\beta}\) is reflection and translation equivariant (Birkes, 2004, lemma 7.1). Since \(y \sim N(X\beta,\Sigma)\), \(y - X\beta \stackrel{d}{=} -(y - X\beta)\)
\[
\Rightarrow\; \hat{\beta}(y) - \beta = \hat{\beta}(y) + (-\beta) = \hat{\beta}(y + X(-\beta)) = \hat{\beta}(y - X\beta) \quad (\hat{\beta}\text{ is translation equivariant})
\]
\[
\stackrel{d}{=} \hat{\beta}(-(y - X\beta)) = -\hat{\beta}(y - X\beta) \quad (\hat{\beta}\text{ is reflection equivariant})
= -(\hat{\beta}(y) - \beta).
\]
Lemma 2.4.3 Given that the expectation exists (assumption A1), \(\hat{\beta}\) is an unbiased estimator of \(\beta\).

Proof By applying lemma 2.4.2,
\[
\hat{\beta} - \beta \stackrel{d}{=} -(\hat{\beta} - \beta)
\;\Rightarrow\; E(\hat{\beta} - \beta) = E[-(\hat{\beta} - \beta)]
\;\Rightarrow\; E(\hat{\beta}) - \beta = -E(\hat{\beta}) + \beta
\;\Rightarrow\; 2E(\hat{\beta}) = 2\beta
\;\Rightarrow\; E(\hat{\beta}) = \beta.
\]
Lemma 2.4.4 \(\tilde{\beta}\) and \((I - P_X)y\) are independent.

Proof
\[
\mathrm{Cov}\big[\tilde{\beta},\,(I - P_X)y\big] = \mathrm{Cov}\big[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y,\,(I - P_X)y\big]
= (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\Sigma(I - P_X) = (X'\Sigma^{-1}X)^{-1}X'(I - P_X)
\]
\[
= (X'\Sigma^{-1}X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X'P_X
= (X'\Sigma^{-1}X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X' = 0.
\]
Since \(\tilde{\beta}\) and \((I - P_X)y\) are jointly normally distributed, they are independent (Birkes, 2004, proposition 7.2.3).
Lemma 2.4.5 \(\hat{\beta} - \tilde{\beta}\) is a function of \((I - P_X)y\).

Proof Write \((\hat{\beta} - \tilde{\beta})(y) = \delta(y)\); then \(\delta(y)\) is a translation-invariant function. To show that \(\delta(y)\) is a function of \((I - P_X)y\), it is equivalent to show that
\[
(I - P_X)y^{(1)} = (I - P_X)y^{(2)} \;\Rightarrow\; \delta(y^{(1)}) = \delta(y^{(2)}).
\]
Indeed,
\[
(I - P_X)y^{(1)} = (I - P_X)y^{(2)} \;\Rightarrow\; y^{(1)} - P_Xy^{(1)} = y^{(2)} - P_Xy^{(2)}
\;\Rightarrow\; y^{(2)} = y^{(1)} + P_X(y^{(2)} - y^{(1)})
\]
\[
\Rightarrow\; \delta(y^{(2)}) = \delta(y^{(1)} + Xb) \ \text{for some } b
= \delta(y^{(1)}) \quad (\delta\text{ is translation invariant}).
\]
Lemma 2.4.6 \(\mathrm{Var}(\hat{\beta}) = \Phi + \Lambda\), where \(\Lambda = \mathrm{Var}(\hat{\beta} - \tilde{\beta})\).

Proof From lemma 2.4.4, \(\tilde{\beta}\) and \((I - P_X)y\) are independent, and from lemma 2.4.5, \(\hat{\beta} - \tilde{\beta}\) is a function of \((I - P_X)y\). Then \(\tilde{\beta}\) and \(\hat{\beta} - \tilde{\beta}\) are independent. Write \(\hat{\beta} = \tilde{\beta} + (\hat{\beta} - \tilde{\beta})\)
\[
\Rightarrow\; \mathrm{Var}(\hat{\beta}) = \mathrm{Var}(\tilde{\beta}) + \mathrm{Var}(\hat{\beta} - \tilde{\beta}) = \Phi + \Lambda.
\]
Lemma 2.4.7 \(E\!\left[\dfrac{\partial\tilde{\beta}}{\partial\sigma_i}\right] = 0\), and \(E\!\left[\dfrac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i)\right] = 0\) for \(i = 1,\dots,r\).

Proof
\[
\frac{\partial\tilde{\beta}}{\partial\sigma_i}
= \frac{\partial}{\partial\sigma_i}\big[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y\big]
= \frac{\partial}{\partial\sigma_i}\big[(X'\Sigma^{-1}X)^{-1}\big]X'\Sigma^{-1}y + (X'\Sigma^{-1}X)^{-1}\frac{\partial}{\partial\sigma_i}\big[X'\Sigma^{-1}y\big]
= -\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}Gy, \tag{2.1}
\]
where \(G = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\).
\[
\Rightarrow\; E\!\left[\frac{\partial\tilde{\beta}}{\partial\sigma_i}\right] = E\!\left[-\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}Gy\right] = -\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}E(Gy),
\]
where \(E[Gy] = E[(\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1})y] = \Sigma^{-1}E[y - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y] = \Sigma^{-1}(X\beta - X\beta) = 0\).

For the second claim,
\[
E\!\left[\frac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i)\right] = -\Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}E[g(y)], \qquad\text{where } g(y) = Gy\,(\hat{\sigma}_i(y) - \sigma_i).
\]
First, \(g(-y) = G(-y)[\hat{\sigma}_i(-y) - \sigma_i] = -Gy[\hat{\sigma}_i(y) - \sigma_i] = -g(y)\), because \(\hat{\sigma}_i\) is reflection invariant. Notice also that
\[
g(y + Xb) = G(y + Xb)[\hat{\sigma}_i(y + Xb) - \sigma_i] = (Gy + GXb)[\hat{\sigma}_i(y) - \sigma_i] = Gy[\hat{\sigma}_i(y) - \sigma_i] = g(y),
\]
since \(GX = 0\) and \(\hat{\sigma}_i\) is translation invariant; hence \(g\) is a translation-invariant function.
Since \(y \sim N(X\beta,\Sigma)\),
\[
y - X\beta \stackrel{d}{=} -(y - X\beta) \;\Rightarrow\; y \stackrel{d}{=} -y + 2X\beta,
\]
and hence \(g(y) \stackrel{d}{=} g(-y + 2X\beta) = g(-(y - 2X\beta)) = -g(y + X(-2\beta))\) (by the reflection property of \(g\)) \(= -g(y)\), because \(g\) is translation invariant.
Since \(g(y) \stackrel{d}{=} -g(y)\), then \(E[g(y)] = E[-g(y)] = -E[g(y)]\)
\[
\Rightarrow\; 2E[g(y)] = 0 \;\Rightarrow\; E[g(y)] = 0, \quad\text{and hence}\quad E\!\left[\frac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i)\right] = 0 \ \text{for } i = 1,\dots,r.
\]
Lemma 2.4.8
(a) \(\dfrac{\partial\Phi}{\partial\sigma_i} = -\Phi P_i\Phi\),
(b) \(\dfrac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = \Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi\),
where
\[
P_i = -X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}X, \qquad
Q_{ij} = X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}X, \qquad
R_{ij} = X'\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\sigma_i\partial\sigma_j}\Sigma^{-1}X.
\]

Proof
(a)
\[
\frac{\partial\Phi}{\partial\sigma_i}
= -(X'\Sigma^{-1}X)^{-1}\frac{\partial(X'\Sigma^{-1}X)}{\partial\sigma_i}(X'\Sigma^{-1}X)^{-1}
= -(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_i}X(X'\Sigma^{-1}X)^{-1}
= -\Phi P_i\Phi,
\]
since \(X'\dfrac{\partial\Sigma^{-1}}{\partial\sigma_i}X = -X'\Sigma^{-1}\dfrac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}X = P_i\).

(b)
\[
\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}
= -\frac{\partial}{\partial\sigma_i}\left\{(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1}\right\}
\]
\[
= (X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_i}X(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1}
- (X'\Sigma^{-1}X)^{-1}X'\frac{\partial^2\Sigma^{-1}}{\partial\sigma_i\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1}
\]
\[
\quad + (X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_j}X(X'\Sigma^{-1}X)^{-1}X'\frac{\partial\Sigma^{-1}}{\partial\sigma_i}X(X'\Sigma^{-1}X)^{-1}
= \Phi\left\{P_i\Phi P_j - X'\frac{\partial^2\Sigma^{-1}}{\partial\sigma_i\partial\sigma_j}X + P_j\Phi P_i\right\}\Phi. \tag{2.2}
\]
Observe that
\[
X'\frac{\partial^2\Sigma^{-1}}{\partial\sigma_i\partial\sigma_j}X
= X'\frac{\partial}{\partial\sigma_i}\left(-\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}\right)X
= X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}X
- X'\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\sigma_i\partial\sigma_j}\Sigma^{-1}X
+ X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}X
= Q_{ij} - R_{ij} + Q_{ji}. \tag{2.3}
\]
Combining expressions (2.2) and (2.3), we obtain
\[
\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = \Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi.
\]
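As a sanity check on part (a) of the lemma, the finite-difference sketch below (ours, not part of the thesis) assumes a simple two-component linear covariance structure \(\Sigma = \sigma_1 V_1 + \sigma_2 V_2\) and compares the analytic derivative \(-\Phi P_1\Phi\) with a numerical derivative of \(\Phi\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 2
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, 3))
V1, V2 = np.eye(n), Z @ Z.T              # Sigma = s1*V1 + s2*V2
s = np.array([2.0, 0.5])

def Phi(s):
    Sigma = s[0] * V1 + s[1] * V2
    return np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))

# analytic derivative: dPhi/ds1 = -Phi P1 Phi, with P1 = -X' S^{-1} V1 S^{-1} X
Si = np.linalg.inv(s[0] * V1 + s[1] * V2)
P1 = -X.T @ Si @ V1 @ Si @ X
analytic = -Phi(s) @ P1 @ Phi(s)

# central finite difference in s1
h = 1e-6
numeric = (Phi(s + [h, 0.0]) - Phi(s - [h, 0.0])) / (2 * h)
assert np.allclose(analytic, numeric, atol=1e-6)
print("dPhi/ds1 matches -Phi P1 Phi")
```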
Before proceeding, we present asymptotic orders for some terms that will be used often in chapters 2 and 3.

Lemma 2.4.9
(a) \(\Phi = O(n^{-1})\), (b) \(P_i = O(n)\), (c) \(Q_{ij} = O(n)\), (d) \(R_{ij} = O(n)\),
(e) \(\Theta = O(n)\), (f) \(w_{ij} = \mathrm{Cov}(\hat{\sigma}_i,\hat{\sigma}_j) = O(n^{-1})\), (g) \(\Lambda = O(n^{-2})\),
(h) \(\dfrac{\partial\Phi}{\partial\sigma_i} = O(n^{-1})\), (i) \(\dfrac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = O(n^{-1})\),
(j) \(\dfrac{\partial w_{ij}}{\partial\sigma_i} = O(n^{-1})\), (k) \(\dfrac{\partial\Lambda}{\partial\sigma_i} = O(n^{-2})\),
(l) \(\dfrac{\partial^2 w_{ij}}{\partial\sigma_i^2} = O(n^{-1})\), (m) \(\dfrac{\partial^2\Lambda}{\partial\sigma_i^2} = O(n^{-2})\),
(n) \(\hat{\sigma}_i - \sigma_i = O_p(n^{-1/2})\).

Proof Parts (b), (c), and (d) are direct from remark (v) above. Parts (a) and (e) are direct from assumption (A6). For (f), \(w_{ij} = i^{ij} + a(\sigma)\), where \(i^{ij}\) is the \((i,j)\) entry of the inverse of the expected information matrix (Pace and Salvan, 1997, expression 9.73); Pace and Salvan showed that \(i^{ij} = O(n^{-1})\) and \(a(\sigma) = O(n^{-2})\). From parts (a), (b), (c), and (f), we have \(\Lambda = O(n^{-2})\). Since
\[
\frac{\partial\Phi}{\partial\sigma_i} = -\Phi P_i\Phi, \qquad
\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j} = \Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi \quad\text{(lemma 2.4.8)},
\]
results (h) and (i) follow from the previous parts. By using expression (9.17) in Pace and Salvan (1997), we have \(\partial i^{ij}/\partial\sigma_i = O(n^{-1})\), \(\partial^2 i^{ij}/\partial\sigma_i^2 = O(n^{-1})\), \(\partial a(\sigma)/\partial\sigma_i = O(n^{-2})\), and \(\partial^2 a(\sigma)/\partial\sigma_i^2 = O(n^{-2})\); hence results (j) and (l) are obtained. By computing the derivatives of \(\Lambda\) and using the previous parts, results (k) and (m) are obtained. Finally, result (n) is a direct consequence of the asymptotic normality of \(\sqrt{n}(\hat{\sigma}_i - \sigma_i)\) (Pace and Salvan, expression 3.38).
Lemma 2.4.10 \(\Lambda = \mathrm{Var}(\hat{\beta} - \tilde{\beta}) = \tilde{\Lambda} + O(n^{-5/2})\), where \(\tilde{\Lambda} = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(Q_{ij} - P_i\Phi P_j)\Phi\).

Proof Using a Taylor series expansion about \(\sigma\), we have
\[
\hat{\beta} = \tilde{\beta}(\hat{\sigma}) = \tilde{\beta}(\sigma) + \sum_{i=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i) + O_p(n^{-3/2})
\;\Rightarrow\; \hat{\beta} - \tilde{\beta} = \sum_{i=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i) + O_p(n^{-3/2}),
\]
and
\[
\Lambda = \mathrm{Var}(\hat{\beta} - \tilde{\beta})
= E\!\left[\left(\sum_{i=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i) + O_p(n^{-3/2})\right)\!\left(\sum_{j=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_j}(\hat{\sigma}_j - \sigma_j) + O_p(n^{-3/2})\right)'\right]
\]
\[
\quad - E\!\left[\sum_{i=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i) + O_p(n^{-3/2})\right]
E\!\left[\sum_{j=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_j}(\hat{\sigma}_j - \sigma_j) + O_p(n^{-3/2})\right]'.
\]
Applying lemma 2.4.7, we obtain
\[
\Lambda = E\!\left[\sum_{i=1}^r\sum_{j=1}^r \frac{\partial\tilde{\beta}}{\partial\sigma_i}\!\left(\frac{\partial\tilde{\beta}}{\partial\sigma_j}\right)'(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\right] + O(n^{-5/2})
= \sum_{i=1}^r\sum_{j=1}^r E\!\left[\frac{\partial\tilde{\beta}}{\partial\sigma_i}\!\left(\frac{\partial\tilde{\beta}}{\partial\sigma_j}\right)'\right]E\big[(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\big] + O(n^{-5/2}) \ \text{(assumption A5)}. \tag{2.4}
\]
By assumption (A3), we have
\[
E\big[(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\big] = E(\hat{\sigma}_i\hat{\sigma}_j) - \sigma_i\sigma_j + O(n^{-3/2}) = w_{ij} + O(n^{-3/2}). \tag{2.5}
\]
Since \(E[\partial\tilde{\beta}/\partial\sigma_i] = 0\) (lemma 2.4.7), then \(E\!\left[\dfrac{\partial\tilde{\beta}}{\partial\sigma_i}\!\left(\dfrac{\partial\tilde{\beta}}{\partial\sigma_j}\right)'\right] = \mathrm{Cov}\!\left[\dfrac{\partial\tilde{\beta}}{\partial\sigma_i},\dfrac{\partial\tilde{\beta}}{\partial\sigma_j}\right]\). Observe that
\[
G\Sigma G' = [\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}]\,\Sigma\,[\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}]
\]
\[
= \Sigma^{-1}[I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}][I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}]
= \Sigma^{-1}[I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}] = G.
\]
Then, using expression (2.1), we obtain
\[
\mathrm{Cov}\!\left[\frac{\partial\tilde{\beta}}{\partial\sigma_i},\frac{\partial\tilde{\beta}}{\partial\sigma_j}\right]
= \Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\,G\Sigma G'\,\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}X\Phi
= \Phi X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\Sigma^{-1}[I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}]\frac{\partial\Sigma}{\partial\sigma_j}\Sigma^{-1}X\Phi
= \Phi(Q_{ij} - P_i\Phi P_j)\Phi. \tag{2.6}
\]
Combining expressions (2.4), (2.5), and (2.6), we have
\[
\Lambda = \tilde{\Lambda} + O(n^{-5/2}), \qquad\text{where } \tilde{\Lambda} = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(Q_{ij} - P_i\Phi P_j)\Phi.
\]
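The algebra behind expression (2.6), including the idempotency-type identity \(G\Sigma G' = G\), can be checked numerically. The sketch below is our own illustration, again assuming a two-component linear covariance structure \(\Sigma = \sigma_1 V_1 + \sigma_2 V_2\) so that \(\partial\Sigma/\partial\sigma_i = V_i\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, 2))
V1, V2 = np.eye(n), Z @ Z.T
s = np.array([1.5, 0.7])
Sigma = s[0] * V1 + s[1] * V2
Si = np.linalg.inv(Sigma)
Phi = np.linalg.inv(X.T @ Si @ X)
G = Si - Si @ X @ Phi @ X.T @ Si

assert np.allclose(G @ Sigma @ G, G)     # G Sigma G' = G
assert np.allclose(G @ X, 0)             # G X = 0

# Phi X' S^{-1} (dS/ds_i) G Sigma G (dS/ds_i) S^{-1} X Phi
#   = Phi (Q_ii - P_i Phi P_i) Phi    (expression 2.6 with i = j)
for Vi in (V1, V2):
    Pi = -X.T @ Si @ Vi @ Si @ X
    Qii = X.T @ Si @ Vi @ Si @ Vi @ Si @ X
    lhs = Phi @ X.T @ Si @ Vi @ G @ Sigma @ G @ Vi @ Si @ X @ Phi
    rhs = Phi @ (Qii - Pi @ Phi @ Pi) @ Phi
    assert np.allclose(lhs, rhs)
print("covariance identity (2.6) verified")
```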
Lemma 2.4.11
(a) \(E[\hat{\Phi}] = \Phi - \Lambda + R^* + O(n^{-5/2})\),
(b) \(E[\hat{\Lambda}] = \Lambda + O(n^{-5/2})\),
(c) \(E[\hat{R}^*] = R^* + O(n^{-5/2})\),
where \(R^* = \dfrac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi R_{ij}\Phi\) and \(\hat{\Lambda} = \tilde{\Lambda}(\hat{\sigma})\).

Proof
(a) Using a Taylor series expansion about \(\sigma\),
\[
\hat{\Phi} = \Phi + \sum_{i=1}^r \frac{\partial\Phi}{\partial\sigma_i}(\hat{\sigma}_i - \sigma_i)
+ \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r \frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j) + O_p(n^{-5/2}).
\]
By assumption (A3), we have
\[
E[\hat{\Phi}] = \Phi + \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r \frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\,w_{ij} + O(n^{-5/2})
= \Phi + \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(P_i\Phi P_j + P_j\Phi P_i - Q_{ij} - Q_{ji} + R_{ij})\Phi + O(n^{-5/2}) \ \text{(lemma 2.4.8)} \tag{2.7}
\]
\[
= \Phi + \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(P_i\Phi P_j - Q_{ij})\Phi
+ \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi(P_j\Phi P_i - Q_{ji})\Phi
+ \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi R_{ij}\Phi + O(n^{-5/2})
\]
\[
= \Phi - \frac{1}{2}\tilde{\Lambda} - \frac{1}{2}\tilde{\Lambda} + \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\Phi R_{ij}\Phi + O(n^{-5/2})
= \Phi - \Lambda + R^* + O(n^{-5/2}),
\]
using the symmetry \(w_{ij} = w_{ji}\) and lemma 2.4.10.
(b) Using a Taylor series expansion about \(\sigma\), we obtain \(\hat{\Lambda} = \tilde{\Lambda} + O_p(n^{-5/2})\), and then \(E[\hat{\Lambda}] = \tilde{\Lambda} + O(n^{-5/2}) = \Lambda + O(n^{-5/2})\) (lemma 2.4.10).
(c) Using a Taylor series expansion about \(\sigma\), we obtain \(\hat{R}^* = R^* + O_p(n^{-5/2})\), and then \(E[\hat{R}^*] = R^* + O(n^{-5/2})\).
Proposition 2.4.12 \(E[\hat{\Phi}_A] = \mathrm{Var}[\hat{\beta}] + O(n^{-5/2})\), where
\[
\hat{\Phi}_A = \hat{\Phi} + 2\hat{\Phi}\left\{\sum_{i=1}^r\sum_{j=1}^r \hat{w}_{ij}\Big(\hat{Q}_{ij} - \hat{P}_i\hat{\Phi}\hat{P}_j - \tfrac{1}{4}\hat{R}_{ij}\Big)\right\}\hat{\Phi}.
\]

Proof Notice that \(\hat{\Phi}_A = \hat{\Phi} + 2\hat{\Lambda} - \hat{R}^*\), and hence
\[
E[\hat{\Phi}_A] = \Phi - \Lambda + R^* + 2\Lambda - R^* + O(n^{-5/2}) \ \text{(lemma 2.4.11)}
= \Phi + \Lambda + O(n^{-5/2})
= \mathrm{Var}(\hat{\beta}) + O(n^{-5/2}) \ \text{(lemma 2.4.6)}.
\]
Comments From proposition 2.4.12, we obtain
\[
E[\hat{\Phi}_A] = \mathrm{Var}(\hat{\beta}) + O(n^{-5/2}),
\]
and from lemmas 2.4.10 and 2.4.11, we obtain
\[
E[\hat{\Phi}] = \mathrm{Var}(\hat{\beta}) + O(n^{-2}).
\]
This shows that the adjusted estimator of \(\mathrm{Var}(\hat{\beta})\) has less bias than the conventional estimator.
3. TESTING THE FIXED EFFECTS

Consider the model as described in section 2.1. Suppose that we are interested in making inferences about linear combinations of the elements of \(\beta\); in other words, we are interested in testing \(H_0: L'\beta = 0\), where \(L'\) is a fixed matrix of dimension \(\ell \times p\). A common statistic often used is the Wald statistic:
\[
F = \frac{1}{\ell}(L'\hat{\beta})'\big[L'\hat{V}L\big]^{-1}(L'\hat{\beta}).
\]
In fact, even though we call this statistic \(F\), it does not necessarily have an F-distribution. Kenward and Roger (1997) approximate the distribution of \(F\) by choosing a scale \(\lambda\) and denominator degrees of freedom \(m\) such that \(\lambda F \sim F(\ell, m)\) approximately.

3.1 Constructing a Wald-Type Pivot

The construction of a Wald-type pivot can be approached either through the adjusted estimator \(\hat{\Phi}_A\) of the variance-covariance matrix of the fixed effects (this is what was done by Kenward and Roger, 1997) or through the conventional estimator \(\hat{\Phi}\). Both approaches are considered in this section.
3.1.1 Constructing a Wald-Type Pivot Through \(\hat{\Phi}\)

The Wald-type pivot is
\[
F = \frac{1}{\ell}(\hat{\beta} - \beta)'L(L'\hat{\Phi}L)^{-1}L'(\hat{\beta} - \beta).
\]
In this section, we derive approximate formulas for \(E[F]\) and \(\mathrm{Var}[F]\).

3.1.1.1 Deriving an Approximate Expression for E[F]

\(E[F] = E[E[F\,|\,\hat{\sigma}]]\), and
\[
E[F\,|\,\hat{\sigma}] = \frac{1}{\ell}E\big[(\hat{\beta} - \beta)'L(L'\hat{\Phi}L)^{-1}L'(\hat{\beta} - \beta)\,\big|\,\hat{\sigma}\big]
= \frac{1}{\ell}\Big\{E[(\hat{\beta} - \beta)'L]\,(L'\hat{\Phi}L)^{-1}E[L'(\hat{\beta} - \beta)]
+ \mathrm{tr}\big((L'\hat{\Phi}L)^{-1}\mathrm{Var}[L'(\hat{\beta} - \beta)]\big)\Big\},
\]
using assumption (A4) and Schott (2005, theorem 10.18). Since \(\hat{\beta}\) is an unbiased estimator of \(\beta\) (lemma 2.4.3), then
\[
E[F\,|\,\hat{\sigma}] = \frac{1}{\ell}\mathrm{tr}\big[(L'\hat{\Phi}L)^{-1}(L'VL)\big], \qquad\text{where } V = \mathrm{Var}(\hat{\beta}) = \Phi + \Lambda,
\]
\[
= \frac{1}{\ell}\mathrm{tr}\big[(L'\hat{\Phi}L)^{-1}(L'\Phi L)\big] + \frac{1}{\ell}\mathrm{tr}\big[(L'\hat{\Phi}L)^{-1}(L'\Lambda L)\big].
\]
Using a Taylor series expansion for \((L'\hat{\Phi}L)^{-1}\) about \(\sigma\), we have
\[
(L'\hat{\Phi}L)^{-1} = (L'\Phi L)^{-1} + \sum_{i=1}^r(\hat{\sigma}_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}
+ \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j} + O_p(n^{-1/2}),
\]
and
\[
(L'\hat{\Phi}L)^{-1}(L'\Phi L) = I + \sum_{i=1}^r(\hat{\sigma}_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Phi L)
+ \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L) + O_p(n^{-3/2}).
\]
Since
\[
\mathrm{tr}\left[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\right]
= 2\,\mathrm{tr}\left[(L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_i}L\Big)(L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_j}L\Big)\right]
- \mathrm{tr}\left[(L'\Phi L)^{-1}\Big(L'\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}L\Big)\right], \tag{3.1}
\]
then by assumption (A3), we have
\[
E\big[\mathrm{tr}\{(L'\hat{\Phi}L)^{-1}(L'\Phi L)\}\big]
= \ell + \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi)
- \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-3/2}),
\]
where \(\Theta = L(L'\Phi L)^{-1}L'\), and noticing that \(\dfrac{\partial\Phi}{\partial\sigma_i} = -\Phi P_i\Phi\) (lemma 2.4.8). Similar steps lead to \(E\big[\mathrm{tr}\{(L'\hat{\Phi}L)^{-1}(L'\Lambda L)\}\big] = \mathrm{tr}(\Theta\Lambda) + O(n^{-2})\). Hence,
\[
\ell\,E[F] = \ell + \mathrm{tr}(\Theta\Lambda) + \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi)
- \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-3/2}),
\]
and, since \(\mathrm{tr}(\Theta\Lambda) - \tfrac{1}{2}\sum\sum w_{ij}\,\mathrm{tr}(\Theta\,\partial^2\Phi/\partial\sigma_i\partial\sigma_j) = 2\sum\sum w_{ij}\,\mathrm{tr}\big[\Theta\Phi\big(Q_{ij} - P_i\Phi P_j - \tfrac{1}{4}R_{ij}\big)\Phi\big] + O(n^{-3/2})\) (lemmas 2.4.8 and 2.4.10),
\[
E[F] = 1 + \frac{1}{\ell}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi)
+ \frac{2}{\ell}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(Q_{ij} - P_i\Phi P_j - \tfrac{1}{4}R_{ij}\Big)\Phi\Big] + O(n^{-3/2}).
\]

Proposition 3.1.1.1.1 \(E[F] = 1 + \dfrac{A_2 + 2A_3}{\ell} + O(n^{-3/2})\), where
\[
A_2 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi), \qquad
A_3 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(Q_{ij} - P_i\Phi P_j - \tfrac{1}{4}R_{ij}\Big)\Phi\Big].
\]
3.1.1.2 Deriving an Approximate Expression for Var[F]

\(\mathrm{Var}[F] = E[\mathrm{Var}[F\,|\,\hat{\sigma}]] + \mathrm{Var}[E[F\,|\,\hat{\sigma}]]\).
To derive the right-hand side, we consider one term at a time: the first term is derived in part A, and the second in part B.

A. \(\mathrm{Var}[F\,|\,\hat{\sigma}] = \dfrac{1}{\ell^2}\mathrm{Var}\big[(\hat{\beta} - \beta)'L(L'\hat{\Phi}L)^{-1}L'(\hat{\beta} - \beta)\,\big|\,\hat{\sigma}\big]\). The expression above is a quadratic form, and hence by assumption (A4),
\[
\mathrm{Var}[F\,|\,\hat{\sigma}] = \frac{2}{\ell^2}\,\mathrm{tr}\big\{[(L'\hat{\Phi}L)^{-1}(L'VL)]^2\big\} \quad\text{(Schott, 2005, theorem 10.22)}.
\]
Using a Taylor series expansion for \((L'\hat{\Phi}L)^{-1}\) about \(\sigma\), we have
\[
(L'\hat{\Phi}L)^{-1}(L'VL) = (L'\Phi L)^{-1}(L'VL) + \sum_{i=1}^r(\hat{\sigma}_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)
+ \frac{1}{2}\sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL) + O_p(n^{-3/2}),
\]
and, squaring and taking the trace,
\[
\mathrm{tr}\big\{[(L'\hat{\Phi}L)^{-1}(L'VL)]^2\big\}
= \mathrm{tr}\Big\{[(L'\Phi L)^{-1}(L'VL)]^2
+ 2(L'\Phi L)^{-1}(L'VL)\sum_{i=1}^r(\hat{\sigma}_i - \sigma_i)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)
\]
\[
+ (L'\Phi L)^{-1}(L'VL)\sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)
+ \sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big\} + O_p(n^{-3/2}).
\]
Taking the expectation and applying assumption (A3), we obtain
\[
E[\mathrm{Var}[F\,|\,\hat{\sigma}]] = \frac{2}{\ell^2}\Big\{\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]^2
+ \mathrm{tr}\Big((L'VL)(L'\Phi L)^{-1}(L'VL)\sum_{i=1}^r\sum_{j=1}^r w_{ij}\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}\Big)
+ \mathrm{tr}\Big((L'VL)\sum_{i=1}^r\sum_{j=1}^r w_{ij}\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}\Big)\Big\} + O(n^{-3/2}).
\]
To proceed, we derive each term mentioned in the expression above.

(i)
\[
[(L'\Phi L)^{-1}(L'VL)]^2
= (L'\Phi L)^{-1}[(L'\Phi L) + (L'\Lambda L)](L'\Phi L)^{-1}[(L'\Phi L) + (L'\Lambda L)]
= [I + (L'\Phi L)^{-1}(L'\Lambda L)]^2
\]
\[
= I + 2(L'\Phi L)^{-1}(L'\Lambda L) + (L'\Phi L)^{-1}(L'\Lambda L)(L'\Phi L)^{-1}(L'\Lambda L)
= I + 2(L'\Phi L)^{-1}(L'\Lambda L) + O(n^{-2}), \tag{3.2}
\]
so \(\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]^2 = \ell + 2\,\mathrm{tr}(\Theta\Lambda) + O(n^{-2})\).

(ii) \((L'VL)(L'\Phi L)^{-1}(L'VL) = (L'\Phi L) + 2(L'\Lambda L) + (L'\Lambda L)(L'\Phi L)^{-1}(L'\Lambda L)\). So,
\[
\mathrm{tr}\Big[(L'VL)(L'\Phi L)^{-1}(L'VL)\sum_{i=1}^r\sum_{j=1}^r w_{ij}\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}\Big]
= \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big] + O(n^{-2})
\]
\[
= 2\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi)
- \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-2}), \tag{3.3}
\]
where the last expression is obtained by utilizing (3.1). Alternatively, we can utilize lemma 2.4.8 to rewrite expression (3.3) as
\[
2A_2 - 2\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(P_i\Phi P_j - Q_{ij} + \tfrac{1}{2}R_{ij}\Big)\Phi\Big] + O(n^{-2}).
\]
(iii)
\[
\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)
= (L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_i}L\Big)(L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_j}L\Big)
+ \text{terms involving } \Lambda \text{ of order } O(n^{-1}) \text{ smaller},
\]
and hence
\[
\mathrm{tr}\Big(\sum_{i=1}^r\sum_{j=1}^r w_{ij}\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big)
= \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi) + O(n^{-2}) = A_2 + O(n^{-2}). \tag{3.4}
\]
From (i), (ii), and (iii), we obtain
\[
E[\mathrm{Var}[F\,|\,\hat{\sigma}]] = \frac{2}{\ell^2}\Big\{\ell + 2\,\mathrm{tr}(\Theta\Lambda) + 3A_2
- 2\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(P_i\Phi P_j - Q_{ij} + \tfrac{1}{2}R_{ij}\Big)\Phi\Big]\Big\} + O(n^{-3/2}). \tag{3.5}
\]

B. \(\mathrm{Var}[E[F\,|\,\hat{\sigma}]] = \dfrac{1}{\ell^2}\mathrm{Var}\big[\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big]
= \dfrac{1}{\ell^2}\Big\{E\big[\big(\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big)^2\big] - \big(E\big[\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big]\big)^2\Big\}\).

For the first term, using a Taylor series expansion for \((L'\hat{\Phi}L)^{-1}\) about \(\sigma\),
\[
\big(\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big)^2 = \big(\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\big)^2
+ 2\,\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\sum_{i=1}^r(\hat{\sigma}_i - \sigma_i)\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\Big]
\]
\[
+ \mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)\Big]
\]
\[
+ \sum_{i=1}^r\sum_{j=1}^r(\hat{\sigma}_i - \sigma_i)(\hat{\sigma}_j - \sigma_j)\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\Big]\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big] + O_p(n^{-3/2}).
\]
Taking the expectation, and by assumption (A3), we have
\[
E\big[\big(\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big)^2\big] = \big(\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\big)^2
+ \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)\Big]\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]
\]
\[
+ \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\Big]\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_j}(L'VL)\Big] + O(n^{-3/2}).
\]
To proceed, we derive each term mentioned in the expression above.

(i) \((L'\Phi L)^{-1}(L'VL) = I + (L'\Phi L)^{-1}(L'\Lambda L)\)
\(\Rightarrow \mathrm{tr}[(L'\Phi L)^{-1}(L'VL)] = \ell + \mathrm{tr}(\Theta\Lambda)\)
\(\Rightarrow \big(\mathrm{tr}[(L'\Phi L)^{-1}(L'VL)]\big)^2 = \ell^2 + 2\ell\,\mathrm{tr}(\Theta\Lambda) + O(n^{-2})\). (3.6)

(ii)
\[
\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)
= \frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Phi L) + \frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'\Lambda L)
= -(L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_i}L\Big)
- (L'\Phi L)^{-1}\Big(L'\frac{\partial\Phi}{\partial\sigma_i}L\Big)(L'\Phi L)^{-1}(L'\Lambda L),
\]
so, using \(\partial\Phi/\partial\sigma_i = -\Phi P_i\Phi\),
\[
\mathrm{tr}\Big[\frac{\partial(L'\Phi L)^{-1}}{\partial\sigma_i}(L'VL)\Big] = \mathrm{tr}(\Theta\Phi P_i\Phi) + \mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Lambda). \tag{3.7}
\]
(iii)
\[
\mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'VL)\Big]
= \mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Phi L)\Big]
+ \mathrm{tr}\Big[\frac{\partial^2(L'\Phi L)^{-1}}{\partial\sigma_i\partial\sigma_j}(L'\Lambda L)\Big]
\]
\[
= 2\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi) - \mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)
+ 2\,\mathrm{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi\Theta\Lambda) - \mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Theta\Lambda\Big)
\quad\text{(from expression 3.1)}. \tag{3.8}
\]
From (i), (ii), and (iii), and discarding the products involving both \(w_{ij}\) and \(\Lambda\) (each of order \(O(n^{-3})\)),
\[
E\big[\big(\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big)^2\big]
= \ell^2 + 2\ell\,\mathrm{tr}(\Theta\Lambda) + 2\ell A_2 + A_1
- \ell\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) + O(n^{-3/2}), \tag{3.9}
\]
where \(A_1 = \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi P_i\Phi)\,\mathrm{tr}(\Theta\Phi P_j\Phi)\).
Also, from proposition 3.1.1.1.1,
\[
\big(E\big[\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big]\big)^2 = (\ell + A_2 + 2A_3)^2 + O(n^{-3/2})
= \ell^2 + 2\ell(A_2 + 2A_3) + O(n^{-3/2}). \tag{3.10}
\]
From expressions (3.9) and (3.10), we obtain
\[
\mathrm{Var}[E[F\,|\,\hat{\sigma}]] = \frac{1}{\ell^2}\Big\{A_1 + \ell\Big[2\,\mathrm{tr}(\Theta\Lambda)
- \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) - 4A_3\Big]\Big\} + O(n^{-3/2}).
\]
From parts (A) and (B), we are able to find an expression for \(\mathrm{Var}[F]\):
\[
\mathrm{Var}[F] = \frac{2}{\ell^2}\Big\{\ell + 2\,\mathrm{tr}(\Theta\Lambda) + 3A_2
- 2\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big[\Theta\Phi\Big(P_i\Phi P_j - Q_{ij} + \tfrac{1}{2}R_{ij}\Big)\Phi\Big]\Big\}
+ \frac{1}{\ell^2}\Big\{A_1 + \ell\Big[2\,\mathrm{tr}(\Theta\Lambda) - \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big) - 4A_3\Big]\Big\} + O(n^{-3/2})
\]
\[
= \frac{2}{\ell}\Big\{1 + \frac{3A_2}{\ell} + \frac{A_1}{2\ell} - 2A_3
+ \Big(\frac{2}{\ell} + 1\Big)\mathrm{tr}(\Theta\Lambda)
- \Big(\frac{1}{\ell} + \frac{1}{2}\Big)\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)\Big\} + O(n^{-3/2})
\]
\[
= \frac{2}{\ell}\Big\{1 + \frac{1}{2\ell}\Big[A_1 + 6A_2 - 4\ell A_3
+ (\ell + 2)\Big(2\,\mathrm{tr}(\Theta\Lambda) - \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)\Big)\Big]\Big\} + O(n^{-3/2})
\]
\[
= \frac{2}{\ell}\Big\{1 + \frac{1}{2\ell}\Big[A_1 + 6A_2 - 4\ell A_3
+ (\ell + 2)\Big(4\,\mathrm{tr}(\Theta\Lambda) - \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi)\Big)\Big]\Big\} + O(n^{-3/2})
= \frac{2}{\ell}\Big\{1 + \frac{1}{2\ell}\big[A_1 + 6A_2 + 8A_3\big]\Big\} + O(n^{-3/2}),
\]
by noticing that
\[
-\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}\Big(\Theta\frac{\partial^2\Phi}{\partial\sigma_i\partial\sigma_j}\Big)
= 2\,\mathrm{tr}(\Theta\Lambda) - \sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi) + O(n^{-5/2})
\quad\text{(lemmas 2.4.8 and 2.4.10)},
\]
and
\[
A_3 = \mathrm{tr}(\Theta\Lambda) - \frac{1}{4}\sum_{i=1}^r\sum_{j=1}^r w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi) + O(n^{-5/2}),
\]
so that \((\ell + 2)\big(4\,\mathrm{tr}(\Theta\Lambda) - \sum\sum w_{ij}\,\mathrm{tr}(\Theta\Phi R_{ij}\Phi)\big) - 4\ell A_3 = 4(\ell + 2)A_3 - 4\ell A_3 = 8A_3 + O(n^{-5/2})\).

Proposition 3.1.1.2.1 \(\mathrm{Var}[F] = \dfrac{2}{\ell}(1 + B_1) + O(n^{-3/2})\), where \(B_1 = \dfrac{1}{2\ell}\big[A_1 + 6A_2 + 8A_3\big]\).
3.1.2 Constructing a Wald-Type Pivot Through \(\hat{\Phi}_A\)

The Wald-type pivot is \(F = \dfrac{1}{\ell}(\hat{\beta} - \beta)'L(L'\hat{\Phi}_AL)^{-1}L'(\hat{\beta} - \beta)\). Similarly to section 3.1.1, we derive approximate expressions for \(E[F]\) and \(\mathrm{Var}[F]\).

3.1.2.1 Deriving an Approximate Expression for E[F]

First, we derive a useful lemma that will be needed in this section and in chapter 4.

Lemma 3.1.2.1.1 For a matrix \(B = O_p(n^{-1})\), \((I + B)^{-1} = I - B + O_p(n^{-2})\).

Proof \((I + B)^{-1} = I - (I + B)^{-1}B\) (Schott, 2005, theorem 1.7) \(= I - [I - (I + B)^{-1}B]B\), applying the theorem again to \((I + B)^{-1}\). Notice that since \(B \xrightarrow{p} 0\), then \((I + B)^{-1} \xrightarrow{p} (I + 0)^{-1} = I\), and hence \((I + B)^{-1} = O_p(1)\). Therefore, \((I + B)^{-1} = I - B + (I + B)^{-1}BB = I - B + O_p(n^{-2})\).
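A quick numerical illustration of this expansion (our own check, not part of the thesis): for \(B\) with entries of order \(1/n\), the error of the first-order approximation \((I + B)^{-1} \approx I - B\) shrinks like \(n^{-2}\).

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4
for n in (10**2, 10**4, 10**6):
    B = rng.standard_normal((k, k)) / n              # B = O(1/n)
    approx = np.eye(k) - B                           # first-order inverse
    err = np.abs(np.linalg.inv(np.eye(k) + B) - approx).max()
    assert err < 100.0 / n**2                        # remainder is O(1/n^2)
print("first-order expansion of (I + B)^{-1} verified")
```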
\[
E[F\,|\,\hat{\sigma}] = \frac{1}{\ell}\mathrm{tr}\big((L'\hat{\Phi}_AL)^{-1}\mathrm{Var}[L'(\hat{\beta} - \beta)]\big), \ \text{by assumption (A4)}
\;\Rightarrow\; E[F] = \frac{1}{\ell}E\big(\mathrm{tr}[(L'\hat{\Phi}_AL)^{-1}(L'VL)]\big), \ \text{where } V = \mathrm{Var}(\hat{\beta}) = \Phi + \Lambda.
\]
Since \(\hat{\Phi}_A = \hat{\Phi} + \hat{A}^*\), where \(\hat{A}^* = 2\hat{\Phi}\Big\{\sum_{i=1}^r\sum_{j=1}^r \hat{w}_{ij}\big(\hat{Q}_{ij} - \hat{P}_i\hat{\Phi}\hat{P}_j - \tfrac{1}{4}\hat{R}_{ij}\big)\Big\}\hat{\Phi}\), then
\[
(L'\hat{\Phi}_AL)^{-1}(L'VL) = \big[(L'\hat{\Phi}L) + (L'\hat{A}^*L)\big]^{-1}(L'VL)
= \big\{(L'\hat{\Phi}L)\big[I + (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)\big]\big\}^{-1}(L'VL)
= \big[I + (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)\big]^{-1}(L'\hat{\Phi}L)^{-1}(L'VL).
\]
Since \((L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L) = O_p(n^{-1})\), then by employing lemma 3.1.2.1.1,
\[
(L'\hat{\Phi}_AL)^{-1}(L'VL) = \big[I - (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L) + O_p(n^{-2})\big](L'\hat{\Phi}L)^{-1}(L'VL)
= (L'\hat{\Phi}L)^{-1}(L'VL) - (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)(L'\hat{\Phi}L)^{-1}(L'VL) + O_p(n^{-2}). \tag{3.11}
\]
Notice that
\[
E\big(\mathrm{tr}[(L'\hat{\Phi}L)^{-1}(L'VL)]\big) = \ell + A_2 + 2A_3 + O(n^{-3/2}) \quad\text{(proposition 3.1.1.1.1)}, \tag{3.12}
\]
and
\[
(L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)(L'\hat{\Phi}L)^{-1}(L'VL)
= (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)(L'\hat{\Phi}L)^{-1}(L'\Phi L)
+ (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)(L'\hat{\Phi}L)^{-1}(L'\Lambda L)
= (L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)(L'\hat{\Phi}L)^{-1}(L'\Phi L) + O_p(n^{-2}).
\]
Using a Taylor series expansion for \((L'\hat{\Phi}L)^{-1}\) about \(\sigma\), we have
\[
(L'\hat{\Phi}L)^{-1}(L'\hat{A}^*L)(L'\hat{\Phi}L)^{-1}(L'\Phi L) = (L'\Phi L)^{-1}(L'\hat{A}^*L) + O_p(n^{-3/2}). \tag{3.13}
\]
From expressions (3.12) and (3.13), we obtain
\[
\ell\,E[F] = \ell + A_2 + 2A_3 - E\big[\mathrm{tr}(\Theta\hat{A}^*)\big] + O(n^{-3/2}). \tag{3.14}
\]

Lemma \(E[\hat{A}^*] = A^* + O(n^{-5/2})\), where \(A^* = 2\Phi\Big\{\sum_{i=1}^r\sum_{j=1}^r w_{ij}\big(Q_{ij} - P_i\Phi P_j - \tfrac{1}{4}R_{ij}\big)\Big\}\Phi\).

Proof Direct from lemmas 2.4.10 and 2.4.11.

Applying the lemma above to expression (3.14), and noticing that \(\mathrm{tr}(\Theta A^*) = 2A_3 + O(n^{-5/2})\), we have

Proposition 3.1.2.1.2 \(E[F] = 1 + \dfrac{A_2}{\ell} + O(n^{-3/2})\).
3.1.2.2 Deriving an Approximate Expression for Var[F]
Var[F ] = E[Var[ F | σˆ ]] + Var[E[F | σˆ ]]
32
A. Var[F | σˆ ]] =
1
ˆ L) −1 L′(βˆ − β) | σˆ ]
Var[(βˆ − β )′L(L′Φ
A
2
The expression above is a quadratic form and hence by assumption (A4),
2
ˆ L) −1 (L′VL )]2 ] (Schott, 2005, theorem 10.22).
Var[F | σˆ ]] = 2 tr[(L′Φ
A
Recall that from expression (3.11), we had
ˆ ∗L)(L′ΦL
ˆ L) −1 (L′VL) = (L′ΦL
ˆ ) −1 (L′VL) − L′ΦL
ˆ ) −1 (L′A
ˆ ) −1 (L′VL) + O (n −2 ).
(L′Φ
A
p
(
)
ˆ L) −1 (L′VL)]2 = tr ⎡ (L′ΦL
ˆ )−1 (L′VL) ⎤
and hence, tr [(L′Φ
A
⎣
⎦
2
ˆ ∗L)(L′ΦL
ˆ ) −1 (L′VL)(L′ΦL
ˆ ) −1 (L′A
ˆ ) −1 (L′VL) ⎤ + O (n −2 )
− 2tr ⎡⎣ (L′ΦL
p
⎦
ˆ ) −1 ( (L′ΦL) + (L′ΛL) ) (L′ΦL
ˆ ) −1 ( (L′ΦL) + (L′ΛL) ) ⎤
= tr ⎡⎣ (L′ΦL
⎦
ˆ ∗L)(L′ΦL
ˆ ) −1 ( (L′ΦL) + (L′ΛL) ) (L′ΦL
ˆ ) −1 (L′A
ˆ ) −1 ( (L′ΦL ) + (L′ΛL) ) ⎤ + O (n −2 )
−2tr ⎣⎡(L′ΦL
p
⎦
ˆ ) −1 (L′ΦL)(L′ΦL
ˆ ) −1 (L′ΦL) ⎤ + 2tr ⎡ (L′ΦL
ˆ ) −1 (L′ΦL )(L′ΦL
ˆ ) −1 (L′ΛL) ⎤
= tr ⎡⎣ (L′ΦL
⎣
⎦
⎦
ˆ ∗L)(L′ΦL
ˆ ) −1 (L′ΦL)(L′ΦL
ˆ ) −1 (L′A
ˆ ) −1 (L′ΦL) ⎤ + O (n −2 ).
− 2tr ⎡⎣(L′ΦL
p
⎦
ˆ ) −1 about σ, we have
Using a Taylor series expansion for (L′ΦL
(
)
ˆ L) −1 (L′VL)]2 =
tr [(L′Φ
A
⎧⎡
⎪
r
∂ (L′ΦL) −1
i =1
∂σ i
tr ⎨⎢ Ι + ∑ (σˆ i − σ i )
⎪⎩ ⎣
⎡
r
× ⎢ Ι + ∑ (σˆ i − σ i )
⎣
i =1
⎪⎧⎡
∂ (L′ΦL)
∂σ i
1
r
−1
(L′ΦL) +
1
r
i
− σ i )(σˆ j − σ j )
r
∑∑ (σˆ
2
i =1
∂σ i
(L′ΦL) +
1
i
− σ i )(σˆ j − σ j )
i =1
r
∂ (L′ΦL) −1
⎣
i =1
∂σ i
⎧⎡
⎪
r
∂ (L′ΦL) −1
⎩⎪ ⎣
i =1
∂σ i
r
(L′ΦL) +
r
∑∑ (σˆ
2
⎡
× ⎢ (L′ΦL) −1 (L′ΛL) + ∑ (σˆ i − σ i )
2
∂σ i ∂σ j
i
∂ (L′ΦL)
2
∂σ i ∂σ j
− σ i )(σˆ j − σ j )
j =1
(L′ΛL) +
1
∂ 2 (L′ΦL ) −1
j =1
i =1 j =1
∂ (L′ΦL) −1
−2tr ⎨⎢ Ι + ∑ (σˆ i − σ i )
r
∑∑ (σˆ
2
i =1
r
+2tr ⎨⎢ Ι + ∑ (σˆ i − σ i )
⎩⎪ ⎣
(L′ΦL) +
1
2
r
−1
⎤
(L′ΦL) ⎥
⎦
⎤ ⎫⎪
(L′ΦL) ⎥ ⎬
⎦⎭⎪
∂ 2 (L′ΦL) −1
∂σ i ∂σ j
r
∑∑ (σˆ i − σ i )(σˆ j − σ j )
⎦
∂ 2 (L′ΦL) −1
i =1 j =1
r
r
∂ 2 (L′ΦL) −1
i =1
j =1
∂σ i ∂σ j
∑∑ (σˆ i − σ i )(σˆ j − σ j )
⎤
(L′ΦL) ⎥
∂σ i ∂σ j
⎤
(L′ΦL) ⎥
⎦
r
⎡
⎤
∂ (L′ΦL) −1
∂ 2 (L′ΦL) −1
1 r r
′
ˆ
ˆ
× ⎢Ι + ∑ (σˆ i − σ i )
+
−
−
(
L
ΦL
)
(
)(
)
(L′ΦL) ⎥
σ
σ
σ
σ
∑∑
i
i
j
j
2
∂σ i
∂σ i ∂σ j
2 i =1 j =1
⎣⎢ i =1
⎦⎥
⎤ ⎪⎫
(L′ΛL) ⎥ ⎬
⎦⎭⎪
33
⎡
ˆ ∗L) +
× ⎢ (L′ΦL) −1 (L′A
⎣
+
⎡
= tr ⎢ Ι + 2
⎣⎢
r
∑ (σˆ
i =1
i
−σi )
∂ (L′ΦL) −1
ˆ ∗L )
(L′A
∂σ i
⎤⎫
1 r r
∂ 2 (L′ΦL) −1
ˆ ∗L) ⎥⎬⎪ + O (n −2 )
(σˆ i − σ i )(σˆ j − σ j )
(L′A
∑∑
p
2 i =1 j =1
∂σ i ∂σ j
⎥⎦⎭⎪
r
∑ (σˆi − σ i )
i =1
r
r
⎤
∂ (L′ΦL) −1
∂ 2 (L′ΦL) −1
(L′ΦL) + ∑∑ (σˆ i − σ i )(σˆ j − σ j )
(L′ΦL) ⎥
∂σ i
∂σ i ∂σ j
i =1 j =1
⎦⎥
⎡ r r
⎤
∂ (L′ΦL) −1
∂ (L′ΦL) −1
+ tr ⎢ ∑∑ (σˆ i − σ i )(σˆ j − σ j )
(L′ΦL)
(L′ΦL) ⎥
∂σ i
∂σ j
⎢⎣ i =1 j =1
⎥⎦
ˆ ∗L) ⎤ + O (n − 3 2 ).
+ 2tr ⎡⎣ (L′ΦL) −1 (L′ΛL) ⎤⎦ − 2tr ⎡⎣(L′ΦL) −1 (L′A
p
⎦
By taking the expectation and assumption (A3),
E[Var[ F | σˆ ]] =
r
r
⎡ ∂ 2 (L′ΦL) −1
⎤
2 ⎧⎪
′
L
ΦL
tr
w
+
⎢
⎥
⎨
∑∑
ij
2
i =1 j =1
⎪⎩
⎣⎢ ∂σ i ∂σ j
⎦⎥
r
r
⎫⎪
⎡ ∂ (L′ΦL) −1
⎤
3
∂ (L′ΦL) −1
L′ΦL
L′ΦL ⎥ + 2tr[ΘΛ] − 2tr[ΘA∗ ]⎬ + O(n − 2 ).
+ ∑∑ wij tr ⎢
∂σ i
∂σ j
i =1 j =1
⎪⎭
⎣⎢
⎦⎥
Notice that
⎡ ∂ (L′ΦL) −1
⎤
∂ (L′ΦL) −1
′
′
tr
L
ΦL
L
ΦL
w
⎢
⎥ = A2 ,
∑∑
ij
∂σ i
∂σ j
i =1 j =1
⎣⎢
⎦⎥
r
r
r
r
⎡ ∂ 2 (L′ΦL) −1
⎤
⎡ ∂ 2 (L′ΦL) ⎤
′
=
−
tr
L
ΦL
2
tr
w
A
w
⎢
⎥
⎢Θ
⎥.
∑∑
∑∑
ij
ij
2
∂σ i ∂σ j ⎥⎦
i =1 j =1
i =1 j =1
⎢⎣ ∂σ i ∂σ j
⎥⎦
⎢⎣
r
and from (3.1),
r
Hence,
E[Var[ F | σˆ ]] =
r
r
⎡ ∂ 2 (L′ΦL) ⎤ ⎫⎪
2 ⎧⎪
−3
∗
+
3
2tr[
ΘΛ
]
2tr[
ΘA
]
tr
+
−
−
A
w
⎢Θ
⎥ ⎬ + O(n 2 ).
∑∑
ij
2
2 ⎨
∂σ i ∂σ j ⎥⎦ ⎭⎪
i =1 j =1
⎢⎣
⎩⎪
In addition, by lemmas 2.4.8 and 2.4.10, we have
⎡ ∂ 2 (L′ΦL) ⎤
2tr[ΘΛ ] − 2tr[ΘA ] − ∑∑ wij tr ⎢Θ
⎥
∂σ i ∂σ j ⎦⎥
i =1 j =1
⎣⎢
∗
r
r
r
r
⎡ ∂ 2 (L′ΦL) ⎤
−3
= 2tr[ΘΛ ] − 2tr[ΘA∗ ] − ∑∑ wij tr ⎢Θ
⎥ + O(n 2 )
∂σ i ∂σ j ⎦⎥
i =1 j =1
⎣⎢
r
r
r
r
1
⎡
⎤
= 2∑∑ wij tr ⎡⎣ΘΦ(Qij − Ρ i ΦΡ j )Φ ⎤⎦ − 4∑∑ wij tr ⎢ΘΦ(Qij − Ρ i ΦΡ j − R ij )Φ ⎥
4
⎣
⎦
i =1 j =1
i =1 j =1
34
r
r
1
3
3
3
⎡
⎤
−2∑∑ wij tr ⎢ΘΦ(Ρ i ΦΡ j − Qij + R ij )Φ ⎥ + O( n − 2 ) = 0 + O(n − 2 ) = O (n − 2 )
2
⎣
⎦
i =1 j =1
⇒ E[Var[ F | σˆ ]] =
1
B. Var[E[F | σˆ ]] =
( tr[(L′Φˆ L)
(i )
A
{
+ 3 A2 } + O(n − 2 )
3
{(
1
(3.16)
}) },
) ( {
2
2
ˆ L) −1 (L′VL)] − Ε tr[(L′Φ
ˆ L) −1 (L′VL)]
Ε tr[(L′Φ
A
A
2
−1
2
ˆ L) −1 (L′VL)], by assumption (A4)
Var[tr[(L′Φ
A
2
=
2
(3.15)
(L′VL)
)
2
{
}
ˆ ∗L)(L′ΦL
ˆ ) −1 (L′VL) − (L′ΦL
ˆ )(L′A
ˆ ) −1 (L′VL) + O (n −2 ) ⎤
= tr ⎣⎡ (L′ΦL
p
⎦
{
2
(from 3.11)
ˆ )
} − 2tr ⎡⎣(L′ΦL
−1
ˆ ∗L)(L′ΦL
ˆ )(L′A
ˆ ) −1 (L′VL) ⎤
(L′VL) ⎤⎦ tr ⎡⎣ (L′ΦL
⎦
ˆ )
} − 2tr ⎡⎣(L′ΦL
−1
ˆ ∗L)(L′ΦL
ˆ )(L′A
ˆ ) −1 (L′ΦL) ⎤
(L′ΦL) ⎤⎦ tr ⎡⎣(L′ΦL
⎦
2
ˆ ) −1 (L′VL) ⎤
= tr ⎡⎣(L′ΦL
⎦
+ O p (n −2 )
{
2
ˆ ) −1 (L′VL) ⎤
= tr ⎡⎣(L′ΦL
⎦
+ O p (n −2 )
(recall that V = Φ + Λ ).
(
ˆ ) −1 (L′VL)]
E tr[(L′ΦL
From expression (3.9),
)
2
r
r
⎡
∂ 2Φ ⎤
−3
+ 2 tr[ΘΛ ] + 2 A2 + A1 − ∑∑ wij tr ⎢Θ
⎥ + O(n 2 ).
i =1 j =1
⎢⎣ ∂σ i ∂σ j ⎥⎦
ˆ ) −1 about σ, we have
(ii ) Using a Taylor series expansion for (L′ΦL
=
2
ˆ ∗L)(L′ΦL
ˆ ) −1 (L′ΦL) ⎤ tr ⎡(L′ΦL
ˆ ) −1 (L′A
ˆ ) −1 (L′ΦL) ⎤ =
tr ⎡⎣(L′ΦL
⎦ ⎣
⎦
⎡
r
∂ (L′ΦL) −1
⎣
i =1
∂σ i
tr ⎢ Ι + ∑ (σˆ i − σ i )
⎪⎧⎡
r
∂ (L′ΦL) −1
i =1
∂σ i
×tr ⎨⎢ Ι + ∑ (σˆ i − σ i )
⎩⎪ ⎣
(L′ΦL) +
⎡
ˆ ∗L) +
× ⎢(L′ΦL) −1 (L′A
⎣
+
1
r
r
r
∑∑ (σˆ
2
i =1 j =1
i
2
r
r
∑∑ (σˆ i − σ i )(σˆ j − σ j )
(L′ΦL ) +
i
−σi )
∂ 2 (L′ΦL) −1
i =1 j =1
1
r
r
∑∑ (σˆ
2
i =1
∑ (σˆ
i =1
1
i
− σ i )(σˆ j − σ j )
j =1
∂σ i ∂σ j
∂ 2 (L′ΦL) −1
∂σ i ∂σ j
∂ (L′ΦL) −1
ˆ ∗ L)
(L′A
∂σ i
− σ i )(σˆ j − σ j )
∂ 2 (L′ΦL) −1
∂σ i ∂σ j
⎤⎫
−52
ˆ ∗L) ⎪
(L′A
⎥ ⎬ + O p (n ).
⎦⎭⎪
⎤
(L′ΦL) ⎥
⎦
⎤
(L′ΦL) ⎥
⎦
35
By assumption (A3), we obtain
(
)
ˆ ∗L)(L′ΦL
ˆ ) −1 (L′ΦL) ⎤ tr ⎡(L′ΦL
ˆ ) −1 (L′A
ˆ ) −1 (L′ΦL) ⎤ = tr[ΘA∗ ] + O(n − 3 2 ).
E tr ⎡⎣(L′ΦL
⎦ ⎣
⎦
From (i ) and (ii),
(
)
2
ˆ L) −1 (L′VL)] =
E tr[(L′Φ
A
+ 2 tr[ΘΛ ] + 2 A2 + A1
2
r
r
⎡
∂ 2Φ ⎤
−3
∗
− ∑∑ wij tr ⎢Θ
⎥ − 2 tr[ΘA ] + O(n 2 ),
i =1 j =1
⎢⎣ ∂σ i ∂σ j ⎥⎦
r
r
⎡
∂ 2Φ ⎤
−3
∗
and since 2tr[ΘΛ ] − ∑∑ wij tr ⎢Θ
⎥ − 2tr[ΘA ] = O( n 2 ) (from 3.17),
i =1 j =1
⎢⎣ ∂σ i ∂σ j ⎥⎦
then
(
)
2
ˆ ) −1 (L′VL)] =
E tr[(L′ΦL
2
+ 2 A2 + A1 + O(n − 2 ).
3
(3.17)
Recall that from expression (3.14) we have
\[
\operatorname{E}\operatorname{tr}\bigl[(L'\hat\Phi_A L)^{-1}(L'VL)\bigr]=\ell\operatorname{E}[F]=\ell+A_2+O(n^{-3/2})
\]
\[
\Rightarrow\ \bigl(\operatorname{E}\operatorname{tr}[(L'\hat\Phi_A L)^{-1}(L'VL)]\bigr)^{2}=\ell^{2}+2\ell A_2+O(n^{-2}).
\tag{3.18}
\]
From expressions (3.17) and (3.18), we obtain
\[
\operatorname{Var}\bigl[\operatorname{E}[F\mid\hat\sigma]\bigr]
=\frac{1}{\ell^{2}}\bigl\{\ell^{2}+2\ell A_2+A_1-\ell^{2}-2\ell A_2\bigr\}+O(n^{-3/2})
=\frac{A_1}{\ell^{2}}+O(n^{-3/2}).
\tag{3.19}
\]
Therefore, from expressions (3.16) and (3.19):

Proposition 3.1.2.2.1
\[
\operatorname{Var}[F]=\frac{2}{\ell}(1+B)+O(n^{-3/2}),
\qquad\text{where}\qquad
B=\frac{1}{2\ell}(A_1+6A_2).
\]
Summary

(i) When the adjusted estimate of the variance-covariance matrix is used in constructing the Wald-type statistic,
\[
\operatorname{E}[F]\approx E=1+\frac{A_2}{\ell}
\qquad\text{and}\qquad
\operatorname{Var}[F]\approx V=\frac{2}{\ell}(1+B),
\]
where \(B=\frac{1}{2\ell}(A_1+6A_2)\) and
\[
A_1=\sum_{i=1}^{r}\sum_{j=1}^{r}w_{ij}\operatorname{tr}(\Theta\Phi P_i\Phi)\operatorname{tr}(\Theta\Phi P_j\Phi),
\qquad
A_2=\sum_{i=1}^{r}\sum_{j=1}^{r}w_{ij}\operatorname{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi).
\]

(ii) When the conventional estimate of the variance-covariance matrix is used,
\[
\operatorname{E}[F]\approx E_1=1+\frac{A_2+2A_3}{\ell},
\qquad
\operatorname{Var}[F]\approx V_1=\frac{2}{\ell}(1+B_1),
\qquad\text{where}\qquad
B_1=\frac{6A_2+A_1+8A_3}{2\ell}
\]
and
\[
A_3=\sum_{i=1}^{r}\sum_{j=1}^{r}w_{ij}\operatorname{tr}\bigl[\Theta\Phi(Q_{ij}-P_i\Phi P_j-\tfrac14 R_{ij})\Phi\bigr].
\]
Or alternatively,
\[
\operatorname{E}[F]\approx E_1=1+\frac{S_2}{\ell},
\qquad
\operatorname{Var}[F]\approx V_1=\frac{2}{\ell}(1+B_1),
\qquad\text{where}\qquad
B_1=\frac{6S_2+S_1}{2\ell},
\]
and \(S_1=A_1-4A_3\) and \(S_2=A_2+2A_3\).
3.2 Estimating the Denominator Degrees of Freedom and the Scale Factor

In order to determine \(\lambda\) and \(m\) such that \(F^{*}=\lambda F\sim F(\ell,m)\) approximately, we match the approximate first two moments of \(F^{*}\) with those of the \(F(\ell,m)\) distribution:
\[
\operatorname E[F(\ell,m)]=\frac{m}{m-2},
\qquad
\operatorname{Var}[F(\ell,m)]=2\Bigl(\frac{m}{m-2}\Bigr)^{2}\frac{\ell+m-2}{\ell(m-4)}.
\]
In the previous section, we found expressions for E and V such that
\[
\operatorname E[F]\approx E=1+\frac{A_2}{\ell},
\qquad
\operatorname{Var}[F]\approx V=\frac{2}{\ell}(1+B).
\]
By matching the first moments,
\[
\operatorname E[F^{*}]\approx\lambda E=\frac{m}{m-2}
\ \Rightarrow\ \lambda=\frac{m}{E(m-2)}.
\]
By matching the second moments,
\[
\operatorname{Var}[F^{*}]\approx\lambda^{2}V=2\Bigl(\frac{m}{m-2}\Bigr)^{2}\frac{\ell+m-2}{\ell(m-4)}
\ \Leftrightarrow\ V=2\Bigl[\frac{m}{(m-2)\lambda}\Bigr]^{2}\frac{\ell+m-2}{\ell(m-4)}
\ \Leftrightarrow\ \frac{V}{2E^{2}}=\frac{\ell+m-2}{\ell(m-4)},
\]
and hence
\[
m=4+\frac{\ell+2}{\ell\rho-1},
\qquad\text{where}\qquad
\rho=\frac{V}{2E^{2}}.
\]
The scale and the denominator degrees of freedom are estimated by substituting \(\hat\sigma\) for \(\sigma\) in \(A_1\), \(A_2\), and \(A_3\). The quantity \(w_{ij}\) can be estimated by using the inverse of the expected information matrix.
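The moment-matching step above is straightforward to carry out numerically. The following minimal Python sketch is our own illustration (the function name `match_moments` is not thesis notation); it recovers \(\lambda\) and \(m\) from the approximate moments E and V:

```python
# Moment matching: find lambda and m so that F* = lambda * F is
# approximately F(ell, m), given E ~ E[F] and V ~ Var[F].
def match_moments(E, V, ell):
    rho = V / (2 * E**2)                 # rho = V / (2 E^2)
    m = 4 + (ell + 2) / (ell * rho - 1)  # denominator df
    lam = m / (E * (m - 2))              # scale factor
    return lam, m
```

As a sanity check, when E and V are the exact moments of an F(ℓ, m₀) distribution, the matching returns λ = 1 and m = m₀.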
4. MODIFYING THE ESTIMATES OF THE DENOMINATOR
DEGREES OF FREEDOM AND THE SCALE FACTOR

As we saw in chapter three, the Wald-type statistic F does not usually have an exact F distribution; however, a scaled form of F will have an F distribution approximately. Indeed, there are a number of special cases where the scaled form of F follows an F distribution exactly. Kenward and Roger (1997) considered two of these special cases, the Hotelling T² and the conventional Anova models, and then modified the approximate expressions for the expectation and the variance of F so that the modified expressions produce estimates of the denominator degrees of freedom and the scale that match the exact, known values in those special cases. In this chapter, we derive the Kenward and Roger (K-R) modification based on the Hotelling T² and the balanced one-way Anova models.
4.1 Balanced One-Way Anova Model

4.1.1 The Model and Assumptions

Consider the model \(y_{ij}=\mu+\tau_i+e_{ij}\) for \(i=1,\dots,t\) and \(j=1,\dots,m\), where \(\mu\) is the general mean, the \(\tau_i\) are the treatment effects, and \(e_{ij}\stackrel{\text{iid}}{\sim}\mathrm N(0,\sigma_e^{2})\).

Suppose that we are interested in testing \(H_0:\tau_1=\cdots=\tau_t\ \Leftrightarrow\ L'\beta=0\), where
\[
L'=\begin{bmatrix}
0&1&-1&0&\cdots&0\\
0&0&1&-1&\cdots&0\\
&&&\ddots&\ddots&\\
0&0&0&\cdots&1&-1
\end{bmatrix},
\qquad
\beta'=[\mu\ \ \tau_1\ \cdots\ \tau_t],
\]
and \(\Sigma=\sigma^{2}I_n\), where \(n=mt\). The design matrix X is not of full column rank. We could proceed with this parameterization using a g-inverse; however, to simplify our computations, we reparameterize the model so that X has full column rank, with \(\beta^{*\prime}=[\mu^{*}\ \ \tau_1^{*}\ \cdots\ \tau_{t-1}^{*}]\) and \(\sum_{i=1}^{t}\tau_i^{*}=0\), and the hypothesis becomes
39
⎡0 1 0
⎢0 0 1
L∗′ = ⎢
⎢
⎢
⎣0 0 0
H 0 :L∗′β∗ = 0, where
0
0
0
0⎤
0 ⎥⎥
⎥
⎥
1⎦
To simplify our notation, from now on, consider Χ = Χ∗ , L = L∗ , and β = β∗ .
4.1.2 The REML Estimate of \(\sigma^{2}\)

Since the design is balanced, the REML estimate is the same as the Anova estimate (Searle et al., 1992, section 3.8), and hence
\[
\hat\sigma^{2}_{\mathrm{REML}}=\mathrm{MSE}=\frac{y'(I-P_X)y}{n-t}.
\]
4.1.3 Computing \(P_1\), \(Q_{11}\) and \(R_{11}\)

\[
P_1=-X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma^{2}}\Sigma^{-1}X=-\frac{1}{(\sigma^{2})^{2}}X'X,
\qquad
Q_{11}=X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma^{2}}\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma^{2}}\Sigma^{-1}X=\frac{1}{(\sigma^{2})^{3}}X'X,
\]
and
\[
R_{11}=X'\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial(\sigma^{2})^{2}}\Sigma^{-1}X=0,
\qquad\text{because}\qquad
\frac{\partial^{2}\Sigma}{\partial(\sigma^{2})^{2}}=0.
\]
In addition, since \(\Phi=(X'\Sigma^{-1}X)^{-1}=\sigma^{2}(X'X)^{-1}\), then \(Q_{11}-P_1\Phi P_1=0\), and hence \(\hat\Phi_A=\hat\Phi\). Recall that
\[
\hat\Phi_A=\hat\Phi+2\hat\Phi\Bigl\{\sum_{i=1}^{r}\sum_{j=1}^{r}\hat w_{ij}\bigl(\hat Q_{ij}-\hat P_i\hat\Phi\hat P_j-\tfrac14\hat R_{ij}\bigr)\Bigr\}\hat\Phi.
\]
4.1.4 Computing \(A_1\) and \(A_2\)

\[
A_1=w_{11}\bigl[\operatorname{tr}(\Theta\Phi P_1\Phi)\bigr]^{2},
\qquad
A_2=w_{11}\operatorname{tr}(\Theta\Phi P_1\Phi\Theta\Phi P_1\Phi).
\]
Since \(\hat\sigma^{2}_{\mathrm{REML}}=\mathrm{MSE}\) and \((n-t)\mathrm{MSE}/\sigma^{2}\sim\chi^{2}_{n-t}\), then
\[
w_{11}=\operatorname{Var}(\hat\sigma^{2}_{\mathrm{REML}})=\frac{2(\sigma^{2})^{2}}{n-t}.
\]
Also,
\[
\Theta\Phi P_1\Phi
=-\frac{1}{\sigma^{2}}L\bigl[L'(X'X)^{-1}L\bigr]^{-1}L'(X'X)^{-1}X'X(X'X)^{-1}
=-\frac{1}{\sigma^{2}}L\bigl[L'(X'X)^{-1}L\bigr]^{-1}L'(X'X)^{-1}.
\]
Notice that \(L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}\) is a p.o. on \(\mathcal R(L)\)
\[
\Rightarrow\ \operatorname{tr}(\Theta\Phi P_1\Phi)
=-\frac{1}{\sigma^{2}}\operatorname{tr}\bigl(L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}\bigr)
=-\frac{1}{\sigma^{2}}\operatorname r\bigl(L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}\bigr)
\]
(Harville, 1997, corollary 10.2.2)
\[
=-\frac{1}{\sigma^{2}}\operatorname r(L)=-\frac{\ell}{\sigma^{2}}\qquad(\ell=t-1).
\]
Therefore,
\[
A_1=\frac{2(\sigma^{2})^{2}}{n-t}\cdot\frac{\ell^{2}}{(\sigma^{2})^{2}}=\frac{2\ell^{2}}{n-t},
\qquad
A_2=\frac{2(\sigma^{2})^{2}}{n-t}\cdot\frac{\ell}{(\sigma^{2})^{2}}=\frac{2\ell}{n-t}.
\]
Accordingly, for this model we have \(A_1=\ell A_2\), and
\[
B=\frac{1}{2\ell}(A_1+6A_2)=\frac{\ell+6}{n-t}.
\]
4.1.5 Estimating the Denominator Degrees of Freedom and the Scale Factor

Under the null hypothesis,
\[
F=\frac{1}{\ell}\,\frac{y'X(X'X)^{-1}L\bigl[L'(X'X)^{-1}L\bigr]^{-1}L'(X'X)^{-1}X'y}{\mathrm{MSE}}.
\]
Notice that \(P=X(X'X)^{-1}L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}X'\) is an o.p.o. (clear), and
\[
\operatorname r(P)=\operatorname{tr}(P)
=\operatorname{tr}\bigl(X(X'X)^{-1}L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}X'\bigr)
=\operatorname{tr}\bigl(L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}\bigr).
\]
In addition, \(L[L'(X'X)^{-1}L]^{-1}L'(X'X)^{-1}\) is a p.o. on \(\mathcal R(L)\), and hence \(\operatorname r(P)=\ell\), and
\[
F=\frac{y'Py}{\operatorname r(P)\,\mathrm{MSE}}.
\]
Since \(\mathcal R(P)\subset\mathcal R(X)\), under the null hypothesis \(F\sim F(\ell,\,n-t)\) exactly, where \(n-t=\operatorname r(I-P_X)\) (Birkes, 2003, lemma 4.4). In other words, the denominator degrees of freedom should equal \(n-t\), and the scale should equal one.

Since \(A_1=\dfrac{2\ell^{2}}{n-t}\), \(A_2=\dfrac{2\ell}{n-t}\), and \(B=\dfrac{\ell+6}{n-t}\), then
\[
\operatorname E[F]\approx E=1+\frac{2}{n-t},
\qquad
\operatorname{Var}[F]\approx V=\frac{2}{\ell}\Bigl(1+\frac{\ell+6}{n-t}\Bigr)
\]
\[
\Rightarrow\ \rho=\frac{V}{2E^{2}}=\frac{(n-t)(n-t+\ell+6)}{\ell\,(n-t+2)^{2}},
\]
and then
\[
m=4+\frac{\ell+2}{\ell\rho-1}=4+\frac{(n-t+2)^{2}(\ell+2)}{(n-t)(\ell+2)-4},
\qquad
\lambda=\frac{m}{E(m-2)}=\Bigl(\frac{n-t}{n-t+2}\Bigr)\Bigl(\frac{m}{m-2}\Bigr).
\]
It can be seen that
\[
m-4=\frac{(n-t+2)^{2}(\ell+2)}{(n-t)(\ell+2)-4}
>\frac{(n-t+2)^{2}(\ell+2)}{(n-t)(\ell+2)}
=\frac{(n-t)^{2}+4(n-t)+4}{n-t}
=n-t+4+\frac{4}{n-t}.
\]
So \(m>n-t+8\), and this means that the approach overestimates the denominator degrees of freedom. The approach does not perform well in this setting, especially when \(n-t\) is small. Moreover, since \(m>n-t+8\), then \(2m-2(n-t)>4\)
\[
\Rightarrow\ 2m-2(n-t)-4+m(n-t)>m(n-t)
\ \Rightarrow\ \lambda=\frac{n-t}{n-t+2}\cdot\frac{m}{m-2}
=\frac{m(n-t)}{2m-2(n-t)-4+m(n-t)}<1,
\]
and again the approach does not perform in the way we wish.
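These conclusions can be confirmed numerically. The sketch below is our own illustration (the values ℓ = 3 and n − t = 10 are arbitrary); it computes the unmodified m and λ for the balanced one-way Anova case and exhibits the overestimation m > n − t + 8 together with λ < 1:

```python
# Unmodified K-R moment matching for the balanced one-way Anova model.
# ell = numerator df (t - 1); nt = error df (n - t).
def anova_m_lambda(ell, nt):
    E = 1 + 2 / nt                         # E[F] approximation
    V = (2 / ell) * (1 + (ell + 6) / nt)   # Var[F] approximation
    rho = V / (2 * E**2)
    m = 4 + (ell + 2) / (ell * rho - 1)    # overestimates n - t
    lam = m / (E * (m - 2))                # falls below the exact value 1
    return m, lam
```

For ℓ = 3 and n − t = 10 this gives m ≈ 19.65, far above the exact value 10, and λ < 1.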
4.2 The Hotelling T² Model

4.2.1 The Model and Assumptions

Suppose that \(y_1,\dots,y_n\) are n independent observations from a p-variate population which has a normal distribution with mean \(\mu\) and variance \(\Sigma_p\):
\[
\operatorname E[y]=X\mu,
\quad\text{where X is an }np\times p\text{ design matrix and }\mu\text{ is a }p\times1\text{ vector},
\]
\[
X=\begin{bmatrix}I_p\\ \vdots\\ I_p\end{bmatrix}=1_n\otimes I_p,
\qquad
\Sigma=\begin{bmatrix}\Sigma_p& &0\\ &\ddots& \\ 0& &\Sigma_p\end{bmatrix}=I_n\otimes\Sigma_p.
\]
Also, suppose that we are interested in testing \(H_0:\mu=0\).

Special notation. Since \(\sigma=\{\sigma_{if}\}\) has \(p(p+1)/2\) components (\(i\le f\)), we will use the notation \(P_{if}\), \(P_{jg}\), \(Q_{if,jg}\), \(R_{if,jg}\), and \(w_{if,jg}\).

In the Hotelling T² model, \(\hat\Phi_A=\hat\Phi\).
Lemma 4.2.2  \(\Phi=\Sigma_p/n\), and \(\hat\Phi_A=\hat\Phi\).

Proof
\[
X'\Sigma^{-1}X=(1_n\otimes I_p)'(I_n\otimes\Sigma_p^{-1})(1_n\otimes I_p)
=1_n'1_n\otimes I_p\Sigma_p^{-1}I_p=n\Sigma_p^{-1}
\]
(Harville, 1997, chapter 16)
\[
\Rightarrow\ \Phi=(X'\Sigma^{-1}X)^{-1}=\bigl(n\Sigma_p^{-1}\bigr)^{-1}=\frac{\Sigma_p}{n}.
\]
\[
P_{if}=-X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_{if}}\Sigma^{-1}X
=X'\frac{\partial\Sigma^{-1}}{\partial\sigma_{if}}X
=n\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{if}},
\qquad
P_{if}\Phi P_{jg}=n\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{if}}\Sigma_p\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{jg}},
\]
and
\[
Q_{if,jg}=X'\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_{if}}\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_{jg}}\Sigma^{-1}X
=X'\Bigl(\frac{\partial\Sigma^{-1}}{\partial\sigma_{if}}\Sigma\frac{\partial\Sigma^{-1}}{\partial\sigma_{jg}}\Bigr)X
=n\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{if}}\Sigma_p\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{jg}}
\]
\[
\Rightarrow\ Q_{if,jg}-P_{if}\Phi P_{jg}=0.
\]
Since \(\dfrac{\partial^{2}\Sigma}{\partial\sigma_{if}\,\partial\sigma_{jg}}=0\), then \(R_{if,jg}=0\). We conclude that \(\hat\Phi_A=\hat\Phi\). Recall that
\[
\hat\Phi_A=\hat\Phi+2\hat\Phi\Bigl\{\sum_{i}\sum_{j}\hat w_{ij}\bigl(\hat Q_{ij}-\hat P_i\hat\Phi\hat P_j-\tfrac14\hat R_{ij}\bigr)\Bigr\}\hat\Phi.
\]
Theorem 4.2.3
The REML estimate of \(\Sigma_p\) is
\[
S=\frac{A}{n-1},
\qquad\text{where}\qquad
A=\sum_{i=1}^{n}(y_i-\bar y)(y_i-\bar y)',
\quad
\bar y=\frac1n\sum_{i=1}^{n}y_i.
\]
Proof: When we are interested in deriving the REML estimate, we consider our data as \(z=K'y\) such that \(K'X=0\). Choose M such that \(M'M=I_{n-1}\) and \(MM'=I_n-P_{1_n}\) (Birkes, 2004, chapter 8). We can choose \(K=M\otimes I_p\), and this is true because
\[
K'X=(M'\otimes I_p)(1_n\otimes I_p)=M'1_n\otimes I_p=0
\qquad(\text{notice that }\mathcal R(M')=\mathcal R(1_n)^{\perp}).
\]
Also,
\[
K'K=(M'\otimes I_p)(M\otimes I_p)=I_{n-1}\otimes I_p=I_{np-p},
\]
and
\[
KK'=(M\otimes I_p)(M'\otimes I_p)=MM'\otimes I_p=\bigl(I_n-P_{1_n}\bigr)\otimes I_p=I_{np}-P_{1_n}\otimes I_p.
\]
Observe that since
\[
P_X=(1_n\otimes I_p)\bigl[(1_n'\otimes I_p)(1_n\otimes I_p)\bigr]^{-1}(1_n'\otimes I_p)=P_{1_n}\otimes I_p,
\]
then \(KK'=I_{np}-P_X\).

Since \(z=K'y=(M'\otimes I_p)y\), then
\[
\operatorname{Var}[z]=(M'\otimes I_p)\Sigma(M\otimes I_p)
=(M'\otimes I_p)(I_n\otimes\Sigma_p)(M\otimes I_p)=M'M\otimes\Sigma_p=I_{n-1}\otimes\Sigma_p.
\]
In fact, Var[z] has the same structure as Var[y], but with \(n-1\) blocks of \(\Sigma_p\) instead of n blocks: \(z_1,\dots,z_{n-1}\) are iid \(\mathrm N_p(0,\Sigma_p)\).

Lemma  When \(z_1,\dots,z_{n-1}\) are iid \(\mathrm N_p(0,\Sigma_p)\), then \(\sum_{i=1}^{n-1}z_iz_i'/(n-1)\) is the ML estimator of \(\Sigma_p\).

Proof  The log-likelihood is
\[
\log L=\text{constant}-\frac12(n-1)\log|\Sigma_p|-\frac12\sum_{i=1}^{n-1}z_i'\Sigma_p^{-1}z_i.
\]
Notice that
\[
\sum_{i=1}^{n-1}z_i'\Sigma_p^{-1}z_i
=\sum_{i=1}^{n-1}\operatorname{tr}\bigl(z_i'\Sigma_p^{-1}z_i\bigr)
=\sum_{i=1}^{n-1}\operatorname{tr}\bigl(\Sigma_p^{-1}z_iz_i'\bigr)
=\operatorname{tr}\Bigl(\Sigma_p^{-1}\sum_{i=1}^{n-1}z_iz_i'\Bigr)
\]
\[
\Rightarrow\ \log L=\text{constant}-\frac12(n-1)\log|\Sigma_p|-\frac12\operatorname{tr}\Bigl(\Sigma_p^{-1}\sum_{i=1}^{n-1}z_iz_i'\Bigr),
\]
and hence the MLE of \(\Sigma_p\) is \(\sum_{i=1}^{n-1}z_iz_i'/(n-1)\) (Anderson, 2003, lemma 3.2.2; Rao, 2002, chapter 8).
Remark  Since \(MM'=I_n-P_{1_n}\),
\[
\sum_{i=1}^{n-1}m_{ji}^{2}=\frac{n-1}{n}\quad\text{for }j=1,\dots,n,
\qquad
\sum_{i=1}^{n-1}m_{ji}m_{ki}=\frac{-1}{n}\quad\text{for }j\ne k,\ j,k=1,\dots,n.
\]
Now, since \(z=K'y=(M'\otimes I_p)y\), then \(z_i=\sum_{j=1}^{n}m_{ji}y_j\), and
\[
z_iz_i'=\Bigl(\sum_{j=1}^{n}m_{ji}y_j\Bigr)\Bigl(\sum_{k=1}^{n}m_{ki}y_k'\Bigr)
=\sum_{k=1}^{n}\sum_{j=1}^{n}m_{ji}m_{ki}\,y_jy_k'.
\]
So,
\[
\sum_{i=1}^{n-1}z_iz_i'
=\sum_{k=1}^{n}\sum_{j=1}^{n}y_jy_k'\Bigl(\sum_{i=1}^{n-1}m_{ji}m_{ki}\Bigr)
=\sum_{j=1}^{n}y_jy_j'\Bigl(\frac{n-1}{n}\Bigr)
+\sum_{j=1}^{n}\sum_{\substack{k=1\\k\ne j}}^{n}y_jy_k'\Bigl(\frac{-1}{n}\Bigr)
\quad(\text{from the remark above})
\]
\[
=\sum_{j=1}^{n}y_jy_j'-\frac1n\sum_{j=1}^{n}\sum_{k=1}^{n}y_jy_k'
=\sum_{j=1}^{n}y_jy_j'-\frac1n\Bigl(\sum_{j=1}^{n}y_j\Bigr)\Bigl(\sum_{k=1}^{n}y_k'\Bigr)
=\sum_{j=1}^{n}y_jy_j'-n\bar y\bar y'.
\]
And since
\[
\sum_{i=1}^{n}y_iy_i'=\sum_{i=1}^{n}(y_i-\bar y)(y_i-\bar y)'+n\bar y\bar y'
\quad(\text{Anderson, 2003, lemma 3.2.1}),
\]
then
\[
\sum_{i=1}^{n-1}z_iz_i'=\sum_{i=1}^{n}(y_i-\bar y)(y_i-\bar y)'.
\]
We conclude that the REML estimate of \(\Sigma_p\) is
\[
S=\frac{\sum_{i=1}^{n}(y_i-\bar y)(y_i-\bar y)'}{n-1}=\frac{A}{n-1}.
\]
4.2.4 Computing \(A_1\) and \(A_2\)

Since we are testing \(H_0:\mu=0\), then \(L=I_p\), \(\ell=p\), \(\Theta=L(L'\Phi L)^{-1}L'=\Phi^{-1}\), and
\[
A_1=\sum_{i\le f}\sum_{j\le g}w_{if,jg}\operatorname{tr}(\Theta\Phi P_{if}\Phi)\operatorname{tr}(\Theta\Phi P_{jg}\Phi)
=\sum_{i\le f}\sum_{j\le g}w_{if,jg}\operatorname{tr}(P_{if}\Phi)\operatorname{tr}(P_{jg}\Phi),
\tag{4.1}
\]
\[
A_2=\sum_{i\le f}\sum_{j\le g}w_{if,jg}\operatorname{tr}(\Theta\Phi P_{if}\Phi\Theta\Phi P_{jg}\Phi)
=\sum_{i\le f}\sum_{j\le g}w_{if,jg}\operatorname{tr}(P_{if}\Phi P_{jg}\Phi).
\tag{4.2}
\]
\[
\operatorname{tr}(P_{if}\Phi)=-2^{1-\delta_{if}}\sigma^{if},
\quad
\operatorname{tr}(P_{jg}\Phi)=-2^{1-\delta_{jg}}\sigma^{jg}
\ \Rightarrow\
\operatorname{tr}(P_{if}\Phi)\operatorname{tr}(P_{jg}\Phi)=2^{2-\delta_{if}-\delta_{jg}}\sigma^{if}\sigma^{jg},
\tag{4.3}
\]
where
\[
\delta_{if}=\begin{cases}1&\text{for }i=f\\ 0&\text{for }i\ne f\end{cases}
\qquad\text{and}\qquad
\Sigma_p^{-1}=\{\sigma^{if}\}_{p\times p}.
\]
\(S\sim\text{Wishart}\bigl[\Sigma_p/(n-1),\,n-1\bigr]\) and \(A\sim\text{Wishart}[\Sigma_p,\,n-1]\) (Anderson, 2003, corollary 7.2.3). Also, Anderson provides in section 7.3 the characteristic function of \(A_{11},A_{22},\dots,A_{pp},2A_{12},2A_{13},\dots,2A_{p-1,p}\) as
\[
\operatorname E\bigl[e^{i\operatorname{tr}(A\Theta)}\bigr]=\bigl|I-2i\Theta\Sigma_p\bigr|^{-N/2}=\Psi,
\]
where \(\Theta=\{\theta_{ij}\}_{p\times p}\) with \(\theta_{ij}=\theta_{ji}\), \(A=\{A_{ij}\}_{p\times p}\), and \(N=n-1\).

Finding the moments. For the first moments,
\[
2^{1-\delta_{ij}}\operatorname E[A_{ij}]
=\frac1i\frac{\partial\Psi}{\partial\theta_{ij}}\Big|_{\Theta=0}
=\frac1i\Bigl(-\frac N2\Bigr)\bigl|I-2i\Theta\Sigma_p\bigr|^{-N/2}
\operatorname{tr}\Bigl[(I-2i\Theta\Sigma_p)^{-1}\frac{\partial}{\partial\theta_{ij}}(I-2i\Theta\Sigma_p)\Bigr]\Big|_{\Theta=0}
\]
\[
=N\operatorname{tr}\Bigl(\frac{\partial\Theta}{\partial\theta_{ij}}\Sigma_p\Bigr)
=\begin{cases}(n-1)\sigma_{ii}&\text{for }i=j\\ 2(n-1)\sigma_{ij}&\text{for }i\ne j,\end{cases}
\]
and hence \(\operatorname E[\hat\sigma_{ij}]=\sigma_{ij}\).

For the second moments,
\[
2^{2-\delta_{if}-\delta_{jg}}\operatorname E[A_{if}A_{jg}]
=\frac{1}{i^{2}}\frac{\partial^{2}\Psi}{\partial\theta_{if}\,\partial\theta_{jg}}\Big|_{\Theta=0}
=(n-1)^{2}\operatorname{tr}\Bigl(\frac{\partial\Theta}{\partial\theta_{if}}\Sigma_p\Bigr)\operatorname{tr}\Bigl(\frac{\partial\Theta}{\partial\theta_{jg}}\Sigma_p\Bigr)
+2(n-1)\operatorname{tr}\Bigl(\frac{\partial\Theta}{\partial\theta_{if}}\Sigma_p\frac{\partial\Theta}{\partial\theta_{jg}}\Sigma_p\Bigr)
\]
\[
\Rightarrow\ \operatorname{Cov}[A_{if},A_{jg}]
=\operatorname E[A_{if}A_{jg}]-\operatorname E[A_{if}]\operatorname E[A_{jg}]
=\frac{2(n-1)}{2^{2-\delta_{if}-\delta_{jg}}}\operatorname{tr}\Bigl(\frac{\partial\Theta}{\partial\theta_{if}}\Sigma_p\frac{\partial\Theta}{\partial\theta_{jg}}\Sigma_p\Bigr)
\]
\[
\Rightarrow\ \operatorname{Cov}[\hat\sigma_{if},\hat\sigma_{jg}]
=\frac{2\operatorname{tr}\bigl(\frac{\partial\Theta}{\partial\theta_{if}}\Sigma_p\frac{\partial\Theta}{\partial\theta_{jg}}\Sigma_p\bigr)}{(n-1)\,2^{2-\delta_{if}-\delta_{jg}}}.
\tag{4.4}
\]
By noticing that
\[
\operatorname{tr}\Bigl(\frac{\partial\Theta}{\partial\theta_{if}}\Sigma_p\frac{\partial\Theta}{\partial\theta_{jg}}\Sigma_p\Bigr)
=2^{1-\delta_{if}-\delta_{jg}}\bigl(\sigma_{ig}\sigma_{jf}+\sigma_{fg}\sigma_{ij}\bigr),
\tag{4.5}
\]
\[
\operatorname{Cov}[\hat\sigma_{if},\hat\sigma_{jg}]=\frac{\sigma_{ig}\sigma_{jf}+\sigma_{fg}\sigma_{ij}}{n-1}.
\tag{4.6}
\]
Combining expressions (4.1), (4.3), and (4.6), we obtain
\[
A_1=\sum_{i\le f}\sum_{j\le g}\frac{\sigma_{ig}\sigma_{jf}+\sigma_{fg}\sigma_{ij}}{n-1}\,2^{2-\delta_{if}-\delta_{jg}}\,\sigma^{if}\sigma^{jg}
=\sum_{i=1}^{p}\sum_{f=1}^{p}\sum_{j=1}^{p}\sum_{g=1}^{p}\frac{\sigma_{ig}\sigma_{jf}+\sigma_{fg}\sigma_{ij}}{n-1}\,\sigma^{if}\sigma^{jg}.
\]
Note: in the last expression for \(A_1\) we divided by \(2^{2-\delta_{if}-\delta_{jg}}\); this is valid because the unrestricted summations repeat \(w_{if,jg}\) exactly \(2^{2-\delta_{if}-\delta_{jg}}\) times.
\[
\Rightarrow\ A_1=\frac{1}{n-1}\sum_{g=1}^{p}\sum_{f=1}^{p}\sum_{j=1}^{p}\sum_{i=1}^{p}\bigl(\sigma_{ig}\sigma^{if}\sigma_{jf}\sigma^{jg}+\sigma_{fg}\sigma^{jg}\sigma_{ij}\sigma^{if}\bigr)
\]
\[
=\frac{1}{n-1}\Bigl\{\sum_{f=1}^{p}\sum_{g=1}^{p}\Bigl(\sum_{i=1}^{p}\sigma_{ig}\sigma^{if}\Bigr)\Bigl(\sum_{j=1}^{p}\sigma_{jf}\sigma^{jg}\Bigr)
+\sum_{j=1}^{p}\sum_{f=1}^{p}\Bigl(\sum_{g=1}^{p}\sigma_{fg}\sigma^{jg}\Bigr)\Bigl(\sum_{i=1}^{p}\sigma_{ij}\sigma^{if}\Bigr)\Bigr\}
\]
\[
=\frac{1}{n-1}\Bigl\{\sum_{f=1}^{p}\sum_{g=1}^{p}\delta_{gf}\delta_{gf}+\sum_{j=1}^{p}\sum_{f=1}^{p}\delta_{fj}\delta_{fj}\Bigr\}.
\]
Therefore,
\[
A_1=\frac{2p}{n-1}.
\]
Also,
\[
\operatorname{tr}(P_{if}\Phi P_{jg}\Phi)
=\operatorname{tr}\Bigl(n\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{if}}\cdot\frac{\Sigma_p}{n}\cdot n\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{jg}}\cdot\frac{\Sigma_p}{n}\Bigr)
=\operatorname{tr}\Bigl(\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{if}}\Sigma_p\frac{\partial\Sigma_p^{-1}}{\partial\sigma_{jg}}\Sigma_p\Bigr)
=2^{1-\delta_{if}-\delta_{jg}}\bigl(\sigma^{ig}\sigma^{jf}+\sigma^{fg}\sigma^{ij}\bigr),
\tag{4.7}
\]
by utilizing expression (4.5) above (with the roles of \(\Sigma_p\) and \(\Sigma_p^{-1}\) interchanged).

Combining expressions (4.2), (4.6), and (4.7), we obtain
\[
A_2=\sum_{i\le f}\sum_{j\le g}\frac{\sigma_{ig}\sigma_{jf}+\sigma_{fg}\sigma_{ij}}{n-1}\,2^{1-\delta_{if}-\delta_{jg}}\bigl(\sigma^{ig}\sigma^{jf}+\sigma^{fg}\sigma^{ij}\bigr)
=\sum_{i,f,j,g=1}^{p}\frac{(\sigma_{ig}\sigma_{jf}+\sigma_{fg}\sigma_{ij})(\sigma^{ig}\sigma^{jf}+\sigma^{fg}\sigma^{ij})}{2(n-1)}
\]
\[
=\frac{1}{2(n-1)}\Bigl\{\sum\sigma_{ig}\sigma^{ig}\sigma_{jf}\sigma^{jf}
+\sum\sigma_{ig}\sigma^{fg}\sigma_{jf}\sigma^{ij}
+\sum\sigma_{fg}\sigma^{ig}\sigma_{ij}\sigma^{jf}
+\sum\sigma_{fg}\sigma^{fg}\sigma_{ij}\sigma^{ij}\Bigr\}
=\frac{1}{2(n-1)}\bigl(p^{2}+p+p+p^{2}\bigr).
\]
Therefore,
\[
A_2=\frac{p(p+1)}{n-1}.
\]

4.2.5 Estimating the Denominator Degrees of Freedom and the Scale Factor
When testing \(H_0:\mu=0\), a scaled Hotelling T² has an exact F test (Krzanowski, 2000, chapter 8). Indeed, the denominator degrees of freedom m should equal \(n-p\), and the scale \(\lambda\) should equal
\[
\frac{n-p}{n-p+p-1}=\frac{n-p}{n-1}.
\]
Since \(A_1=\dfrac{2\ell}{n-1}\), \(A_2=\dfrac{\ell(\ell+1)}{n-1}\), and \(B=\dfrac{3\ell+4}{n-1}\) (note that \(\ell=p\)), then
\[
E=1+\frac{A_2}{\ell}=1+\frac{\ell+1}{n-1},
\qquad
V=\frac{2}{\ell}(1+B)=\frac{2}{\ell}\Bigl(1+\frac{3\ell+4}{n-1}\Bigr)
\]
\[
\Rightarrow\ \rho=\frac{V}{2E^{2}}=\frac{(n+3\ell+3)(n-1)}{\ell\,(n+\ell)^{2}},
\qquad
m=4+\frac{\ell+2}{\ell\rho-1}=4+\frac{(n+\ell)^{2}(\ell+2)}{(\ell+2)n-(\ell^{2}+3\ell+3)},
\]
and
\[
\lambda=\frac{m}{E(m-2)}=\Bigl(\frac{n-1}{n+\ell}\Bigr)\Bigl(\frac{m}{m-2}\Bigr).
\]
The estimates do not match the exact values for this model.
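A quick numerical illustration (ours, with arbitrary values p = 2 and n = 12) makes the mismatch concrete: the unmodified estimate gives m = 26.4, far from the exact n − p = 10:

```python
# Unmodified K-R moment matching for the Hotelling T^2 model (ell = p).
def hotelling_m_lambda(p, n):
    ell = p
    A1 = 2 * ell / (n - 1)
    A2 = ell * (ell + 1) / (n - 1)
    B = (A1 + 6 * A2) / (2 * ell)        # = (3*ell + 4) / (n - 1)
    E = 1 + A2 / ell
    V = (2 / ell) * (1 + B)
    rho = V / (2 * E**2)
    m = 4 + (ell + 2) / (ell * rho - 1)  # does not equal n - p
    lam = m / (E * (m - 2))              # does not equal (n-p)/(n-1)
    return m, lam
```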
4.3 Kenward and Roger’s Modification

From section 4.1, we found that for the balanced one-way Anova model,
\[
A_1=\frac{2\ell^{2}}{n-t},\quad A_2=\frac{2\ell}{n-t},\quad B=\frac{\ell+6}{n-t},\quad \frac{A_1}{A_2}=\ell,
\]
\[
m=4+\frac{\ell+2}{\dfrac{\ell V}{2E^{2}}-1}\ne n-t,
\qquad
\lambda=\frac{n-t}{E(n-t-2)}\ne1.
\]
And from section 4.2, we found that for the Hotelling T² model,
\[
A_1=\frac{2\ell}{n-1},\quad A_2=\frac{\ell(\ell+1)}{n-1},\quad B=\frac{3\ell+4}{n-1},\quad \frac{A_1}{A_2}=\frac{2}{\ell+1},
\]
\[
m=4+\frac{\ell+2}{\dfrac{\ell V}{2E^{2}}-1}\ne n-\ell,
\qquad
\lambda=\frac{n-\ell}{E(n-\ell-2)}\ne\frac{n-\ell}{n-1}.
\]
Instead of estimating E[F] and Var[F] by E and V, we will construct modified estimators E* and V* leading to
\[
m^{*}=4+\frac{\ell+2}{\dfrac{\ell V^{*}}{2E^{*2}}-1}
\tag{4.8}
\]
and
\[
\lambda^{*}=\frac{m^{*}}{E^{*}(m^{*}-2)},
\tag{4.9}
\]
such that m* and λ* match the exact values of the denominator degrees of freedom and the scale, respectively, for the balanced one-way Anova and Hotelling T² models.

Applying expression (4.9) to the balanced one-way Anova model,
\[
\frac{n-t}{E^{*}(n-t-2)}=1.
\]
The expression above can be rewritten in terms of \(A_2\) as \(E^{*}(\ell-A_2)/\ell=1\), and hence
\[
E^{*}=\frac{1}{1-\dfrac{A_2}{\ell}}.
\tag{4.10}
\]
Similarly, applying expression (4.9) to the Hotelling T²,
\[
\frac{n-\ell}{E^{*}(n-\ell-2)}=\frac{n-\ell}{n-1}.
\]
The expression above can be rewritten in terms of \(A_2\): since \(n-1=\ell(\ell+1)/A_2\) for this model,
\[
\frac{1}{E^{*}}=\frac{\dfrac{\ell(\ell+1)}{A_2}-(\ell+1)}{\dfrac{\ell(\ell+1)}{A_2}}
=1-\frac{A_2}{\ell},
\]
and hence
\[
E^{*}=\frac{1}{1-\dfrac{A_2}{\ell}}.
\tag{4.11}
\]
From expressions (4.10) and (4.11), we can see that the approximate expression for E[F] should be taken as
\[
E^{*}=\frac{1}{1-\dfrac{A_2}{\ell}}.
\tag{4.12}
\]
Applying expression (4.8) to the balanced one-way Anova model,
\[
4+\frac{\ell+2}{\dfrac{\ell V^{*}}{2E^{*2}}-1}=n-t.
\]
Since \(n-t=(\ell+6)/B\) for this model, the expression above can be rewritten in terms of B as
\[
4+\frac{\ell+2}{\dfrac{\ell V^{*}}{2E^{*2}}-1}=\frac{\ell+6}{B}
\tag{4.13}
\]
\[
\Rightarrow\ \frac{2(\ell+2)E^{*2}}{\ell V^{*}-2E^{*2}}=\frac{\ell+6}{B}-4,
\qquad
V^{*}=\frac{2}{\ell}E^{*2}\left(1+\frac{\ell+2}{\dfrac{\ell+6}{B}-4}\right).
\]
Moreover, since for the balanced one-way Anova model
\[
B=\frac{\ell+6}{n-t}
\qquad\text{and}\qquad
A_2=\frac{2\ell}{n-t},
\]
then \(\dfrac{A_2}{\ell}=\dfrac{2}{\ell+6}B\), so \(\dfrac{1}{E^{*}}=1-\dfrac{2}{\ell+6}B\), and hence
\[
V^{*}=\frac{2}{\ell}\left\{\frac{\ell+6+(\ell-2)B}{\Bigl(1-\dfrac{2}{\ell+6}B\Bigr)^{2}\bigl(\ell+6-4B\bigr)}\right\}
=\frac{2}{\ell}\left\{\frac{1+\dfrac{\ell-2}{\ell+6}B}{\Bigl(1-\dfrac{2}{\ell+6}B\Bigr)^{2}\Bigl(1-\dfrac{4}{\ell+6}B\Bigr)}\right\},
\tag{4.14}
\]
by dividing the numerator and the denominator by \(\ell+6\).
Similarly, applying expression (4.8) to the Hotelling T² model,
\[
4+\frac{\ell+2}{\dfrac{\ell V^{*}}{2E^{*2}}-1}=n-\ell.
\]
Since \(n-1=(3\ell+4)/B\) for this model, the expression above can be rewritten in terms of B as
\[
4+\frac{\ell+2}{\dfrac{\ell V^{*}}{2E^{*2}}-1}=\frac{3\ell+4}{B}+1-\ell
\tag{4.15}
\]
\[
\Rightarrow\ \frac{2(\ell+2)E^{*2}}{\ell V^{*}-2E^{*2}}=\frac{3\ell+4}{B}-3-\ell,
\qquad
V^{*}=\frac{2}{\ell}E^{*2}\left(1+\frac{\ell+2}{\dfrac{3\ell+4}{B}-3-\ell}\right).
\]
Moreover, since for the Hotelling T² model
\[
B=\frac{3\ell+4}{n-1}
\qquad\text{and}\qquad
A_2=\frac{\ell(\ell+1)}{n-1},
\]
then \(\dfrac{A_2}{\ell}=\dfrac{\ell+1}{3\ell+4}B\), so \(\dfrac{1}{E^{*}}=1-\dfrac{\ell+1}{3\ell+4}B\), and hence
\[
V^{*}=\frac{2}{\ell}\left\{\frac{3\ell+4-B}{\Bigl(1-\dfrac{\ell+1}{3\ell+4}B\Bigr)^{2}\bigl[3\ell+4-(\ell+3)B\bigr]}\right\}
=\frac{2}{\ell}\left\{\frac{1-\dfrac{1}{3\ell+4}B}{\Bigl(1-\dfrac{\ell+1}{3\ell+4}B\Bigr)^{2}\Bigl(1-\dfrac{\ell+3}{3\ell+4}B\Bigr)}\right\},
\tag{4.16}
\]
by dividing the numerator and the denominator by \(3\ell+4\).

From expressions (4.14) and (4.16), we can see that the approximate expression for the variance should be of the form
\[
V^{*}=\frac{2}{\ell}\left\{\frac{1+d_1B}{(1-d_2B)^{2}(1-d_3B)}\right\},
\]
where
\[
d_1=\frac{\ell-2}{\ell+6},\quad d_2=\frac{2}{\ell+6},\quad d_3=\frac{4}{\ell+6}
\quad\text{for the balanced one-way Anova model,}
\]
\[
d_1=\frac{-1}{3\ell+4},\quad d_2=\frac{\ell+1}{3\ell+4},\quad d_3=\frac{\ell+3}{3\ell+4}
\quad\text{for the Hotelling T}^{2}\text{ model.}
\]
So far we have obtained a general expression for the variance but not for the coefficients \(d_i\). First, we determine some simple linear relationships among the \(d_i\) that are common to the two cases:

(i) the numerator of \(d_2\) = \(\ell\) − the numerator of \(d_1\);

(ii) the numerator of \(d_3\) = the numerator of \(d_2\) + 2 = \(\ell\) − the numerator of \(d_1\) + 2;

(iii) the common denominator of the \(d_i\) is a linear function of the numerator of \(d_1\), where
\[
c(\ell-2)+d=\ell+6\quad\text{for the balanced one-way Anova model,}
\qquad
-c+d=3\ell+4\quad\text{for the Hotelling T}^{2}\text{ model.}
\]
Solving the two equations above for c and d, we have \(c=-2\) and \(d=3\ell+2\), and hence the denominator of the \(d_i\) equals \(3\ell+2(1-\text{the numerator of }d_1)\).

The \(\ell\)'s involved in relationships (i), (ii), and (iii) for both special cases are fixed, whereas any other \(\ell\), which is involved in the numerator of \(d_1\), is accommodated as a function of the ratio of \(A_1\) and \(A_2\).

Let the numerator of \(d_1=g(A_1,A_2)=a\dfrac{A_1}{A_2}+b\). Notice that \(g=\ell-2\) for the balanced one-way Anova model and \(g=-1\) for the Hotelling T² model. Since \(A_1/A_2=\ell\) and \(A_1/A_2=2/(\ell+1)\) for the balanced one-way Anova and Hotelling T² models respectively, then
\[
a\ell+b=\ell-2,
\qquad
a\frac{2}{\ell+1}+b=-1.
\tag{4.17}
\]
Solving the equations above, we have
\[
a=\frac{(\ell+1)(\ell-1)}{\ell(\ell+1)-2}=\frac{\ell+1}{\ell+2},
\qquad
b=-\frac{\ell+4}{\ell+2}.
\]
From the expressions we obtained for both a and b, we have
\[
g=\frac{\ell+1}{\ell+2}\cdot\frac{A_1}{A_2}-\frac{\ell+4}{\ell+2}
=\frac{(\ell+1)A_1-(\ell+4)A_2}{(\ell+2)A_2}
\]
for both special cases.
Summary

The Kenward and Roger approach that was derived in section 3.2 will be modified as
\[
m^{*}=4+\frac{\ell+2}{\ell\rho^{*}-1},
\qquad\text{where}\qquad
\rho^{*}=\frac{V^{*}}{2E^{*2}},
\qquad\text{and}\qquad
\lambda^{*}=\frac{m^{*}}{E^{*}(m^{*}-2)},
\]
such that
\[
E^{*}=\Bigl(1-\frac{A_2}{\ell}\Bigr)^{-1}
\qquad\text{and}\qquad
V^{*}=\frac{2}{\ell}\left\{\frac{1+d_1B}{(1-d_2B)^{2}(1-d_3B)}\right\},
\]
where
\[
d_1=\frac{g}{3\ell+2(1-g)},
\qquad
d_2=\frac{\ell-g}{3\ell+2(1-g)},
\qquad
d_3=\frac{\ell-g+2}{3\ell+2(1-g)},
\qquad
g=\frac{(\ell+1)A_1-(\ell+4)A_2}{(\ell+2)A_2}.
\]
Recall that before the modification,
\[
\operatorname E[F]\approx E=1+\frac{A_2}{\ell},
\qquad
\operatorname{Var}[F]\approx V=\frac{2}{\ell}(1+B),
\qquad\text{where}\qquad
B=\frac{1}{2\ell}(A_1+6A_2),
\]
and
\[
A_1=\sum_{i=1}^{r}\sum_{j=1}^{r}w_{ij}\operatorname{tr}(\Theta\Phi P_i\Phi)\operatorname{tr}(\Theta\Phi P_j\Phi),
\qquad
A_2=\sum_{i=1}^{r}\sum_{j=1}^{r}w_{ij}\operatorname{tr}(\Theta\Phi P_i\Phi\Theta\Phi P_j\Phi).
\]
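The modified formulas can be checked numerically. The sketch below is an illustration we added (the name `kr_modified` is not thesis notation); it implements the summary above and verifies that the modification reproduces the exact values in both special cases, here with ℓ = 3, n − t = 10 for the Anova case and p = 2, n = 12 for the Hotelling case:

```python
# Modified K-R estimates of the denominator df and scale from A1, A2, ell.
def kr_modified(A1, A2, ell):
    B = (A1 + 6 * A2) / (2 * ell)
    g = ((ell + 1) * A1 - (ell + 4) * A2) / ((ell + 2) * A2)
    den = 3 * ell + 2 * (1 - g)
    d1, d2, d3 = g / den, (ell - g) / den, (ell - g + 2) / den
    E_star = 1 / (1 - A2 / ell)
    V_star = (2 / ell) * (1 + d1 * B) / ((1 - d2 * B) ** 2 * (1 - d3 * B))
    rho = V_star / (2 * E_star**2)
    m = 4 + (ell + 2) / (ell * rho - 1)
    lam = m / (E_star * (m - 2))
    return m, lam
```

In the Anova case the routine returns m = n − t and λ = 1; in the Hotelling case it returns m = n − p and λ = (n − p)/(n − 1), as the derivation requires.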
4.4 Modifying the Approach with the Usual Variance-Covariance Matrix

Recall that when we used the usual variance-covariance matrix of the fixed-effects estimates, \(\hat\Phi\), in section 3.1.1, we found that
\[
\operatorname E[F]\approx1+\frac{S_2}{\ell}
\qquad\text{and}\qquad
\operatorname{Var}[F]\approx\frac{2}{\ell}(1+B_1),
\]
where \(B_1=\dfrac{6S_2+S_1}{2\ell}\), \(S_1=A_1-4A_3\), and \(S_2=A_2+2A_3\).

Since \(A_3=0\) for both special cases, \(S_2\) and \(S_1\) reduce to \(A_2\) and \(A_1\) respectively, and hence a modification based on the same two special cases considered for the K-R modification would be analogous to the K-R modification derived in section 4.3 (substituting \(S_2\) and \(S_1\) for \(A_2\) and \(A_1\), respectively). The problematic part of this modification is that \(A_3=0\) in both special cases, so the special cases provide no guidance on how \(A_3\) should enter the modification. Simulation studies (as we will see in chapter 8) show that this modification does not perform well in estimating the denominator degrees of freedom and the scale.
5. TWO PROPOSED MODIFICATIONS FOR
THE KENWARD-ROGER METHOD

As we saw in chapter four, the modification applied by Kenward and Roger was based on modifying the approximate expressions for the expectation and the variance of the Wald-type statistic F so that the approximation produces the known values of the denominator degrees of freedom and the scale in the special cases where the distribution of F is exact. The modification of the approximate expression for the expectation was relatively simple and direct, whereas the modification Kenward and Roger applied to the approximate expression for the variance was more complicated. In fact, the modification applied by Kenward and Roger is not unique and is ad hoc. In this chapter, we propose two other modifications of Kenward and Roger’s method based on the same two special cases. The approximate expression for the expectation of F is modified as done by Kenward and Roger (1997); however, instead of modifying the approximate expression for the variance, we modify other quantities, as will be seen shortly. These two proposed modifications are simpler than the K-R modification both in computation and in derivation. Moreover, like the K-R method, our proposed modifications produce the exact values of the denominator degrees of freedom and the scale in the two special cases.
5.1 First Proposed Modification

In order to approximate the F test for the fixed effects \(H_0:L'\beta=0\) in a model as described in section 2.1, we use the same form of Wald-type F statistic used in the K-R method,
\[
F=\frac{1}{\ell}\hat\beta'L(L'\hat\Phi_A L)^{-1}L'\hat\beta.
\]
Define
\[
h=\frac{\ell+2}{\dfrac{\ell V}{2E^{2}}-1}.
\]
Recall that \(E=1+\dfrac{A_2}{\ell}\) and \(V=\dfrac{2}{\ell}(1+B)\).

Also, recall from chapter 4 that for the balanced one-way Anova model we had
\[
A_1=\frac{2\ell^{2}}{n-t},\quad A_2=\frac{2\ell}{n-t},\quad B=\frac{\ell+6}{n-t},\quad\frac{A_1}{A_2}=\ell,
\]
\[
m=4+\frac{\ell+2}{\dfrac{\ell V}{2E^{2}}-1}=4+h\ne n-t,
\qquad
\lambda=\frac{n-t}{E(n-t-2)}\ne1;
\]
and for the Hotelling T² model we had
\[
A_1=\frac{2\ell}{n-1},\quad A_2=\frac{\ell(\ell+1)}{n-1},\quad B=\frac{3\ell+4}{n-1},\quad\frac{A_1}{A_2}=\frac{2}{\ell+1},
\]
\[
m=4+h\ne n-\ell,
\qquad
\lambda=\frac{n-\ell}{E(n-\ell-2)}\ne\frac{n-\ell}{n-1}.
\]
In this proposed modification, we keep the modified estimator E* of E[F] as derived in section 4.3; however, instead of constructing a modified estimator of Var[F], we construct a modified estimator of h, say \(h_1\), leading to
\[
m_1=4+h_1,
\tag{5.1}
\]
such that \(m_1\) matches the exact values of the denominator degrees of freedom for the balanced one-way Anova and Hotelling T² models.

Applying expression (5.1) to the balanced one-way Anova model,
\[
4+h_1=\frac{\ell+6}{B}\quad(\text{from }4.13),
\qquad\text{so}\qquad
h_1=\frac{\ell+6}{B}-4.
\]
And applying expression (5.1) to the Hotelling T² model,
\[
4+h_1=\frac{3\ell+4}{B}+(1-\ell)\quad(\text{from }4.15),
\qquad\text{so}\qquad
h_1=\frac{3\ell+4}{B}-3-\ell.
\]
For both models,
\[
h_1=d_0+\frac{d_1}{B},
\tag{5.2}
\]
where
\[
d_0=-4,\quad d_1=\ell+6\quad\text{for the balanced one-way Anova model,}
\]
\[
d_0=-(\ell+3),\quad d_1=3\ell+4\quad\text{for the Hotelling T}^{2}\text{ model.}
\]
Also, notice that \(d_1\) can be expressed as a linear function of \(d_0\) for both models, that is, \(d_1=a_1d_0+a_2\), where
\[
\ell+6=-4a_1+a_2\quad\text{for the balanced one-way Anova model,}
\qquad
3\ell+4=-(\ell+3)a_1+a_2\quad\text{for the Hotelling T}^{2}\text{ model.}
\]
Solving the equations above, we have \(a_1=-2\) and \(a_2=\ell-2\). That is,
\[
d_1=-2d_0+\ell-2.
\tag{5.3}
\]
As in the discussion of the K-R modification in section 4.3, we need to distinguish between the fixed \(\ell\)'s and the one to be considered as a function of the ratio of \(A_1\) and \(A_2\). The \(\ell\) in expression (5.3) is a fixed one, and the \(\ell\) in the \(d_0\) expression is considered to be a function of the ratio of \(A_1\) and \(A_2\).

Let
\[
g(A_1,A_2)=d_0=a+b\frac{A_1}{A_2}
\]
\[
\Rightarrow\ -4=a+b\ell\quad\text{for the balanced one-way Anova model,}
\qquad
-(3+\ell)=a+b\frac{2}{\ell+1}\quad\text{for the Hotelling T}^{2}\text{ model.}
\]
Solving the two equations above,
\[
a=-\frac{\ell(\ell+1)}{\ell+2}-4,
\qquad
b=\frac{\ell+1}{\ell+2},
\]
and hence
\[
d_0=-\frac{\ell(\ell+1)}{\ell+2}-4+\frac{\ell+1}{\ell+2}\Bigl(\frac{A_1}{A_2}\Bigr)
=\frac{(\ell+1)A_1-(\ell^{2}+5\ell+8)A_2}{(\ell+2)A_2}.
\tag{5.4}
\]

Summary

Kenward and Roger’s approach that was derived in section 3.1 is modified as
\[
m_1=4+h_1
\qquad\text{and}\qquad
\lambda_1=\frac{m_1}{E^{*}(m_1-2)},
\]
such that
\[
E^{*}=\Bigl(1-\frac{A_2}{\ell}\Bigr)^{-1}
\qquad\text{and}\qquad
h_1=d_0+\frac{\ell-2-2d_0}{B},
\]
where
\[
d_0=\frac{(\ell+1)A_1-(\ell^{2}+5\ell+8)A_2}{(\ell+2)A_2}
\qquad\Bigl(\text{recall that }B=\frac{6A_2+A_1}{2\ell}\Bigr).
\]
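The first proposed modification is easy to verify numerically. The sketch below is our own illustration (function name ours, test values ℓ = 3, n − t = 10 and p = 2, n = 12 arbitrary); it implements \(m_1\) and confirms that it reproduces the exact denominator degrees of freedom in both special cases:

```python
# First proposed modification: m1 = 4 + h1, with h1 = d0 + (ell-2-2*d0)/B.
def proposed1_m(A1, A2, ell):
    B = (A1 + 6 * A2) / (2 * ell)
    d0 = ((ell + 1) * A1 - (ell**2 + 5 * ell + 8) * A2) / ((ell + 2) * A2)
    return 4 + d0 + (ell - 2 - 2 * d0) / B
```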
5.2 Second Proposed Modification

In order to approximate the F test for the fixed effects \(H_0:L'\beta=0\) in a model as described in section 2.1, we use the same form of Wald-type F statistic used in the K-R method,
\[
F=\frac{1}{\ell}\hat\beta'L(L'\hat\Phi_A L)^{-1}L'\hat\beta.
\]
Recall from chapter 4 that for the balanced one-way Anova model we had
\[
A_1=\frac{2\ell^{2}}{n-t},\quad A_2=\frac{2\ell}{n-t},\quad B=\frac{\ell+6}{n-t},\quad\frac{A_1}{A_2}=\ell,
\]
\[
m=4+\frac{\ell+2}{\ell\rho-1}\ne n-t,
\quad\text{where}\quad
\rho=\frac{V}{2E^{2}},
\qquad
\lambda=\frac{n-t}{E(n-t-2)}\ne1;
\]
and for the Hotelling T² model we had
\[
A_1=\frac{2\ell}{n-1},\quad A_2=\frac{\ell(\ell+1)}{n-1},\quad B=\frac{3\ell+4}{n-1},\quad\frac{A_1}{A_2}=\frac{2}{\ell+1},
\]
\[
m=4+\frac{\ell+2}{\ell\rho-1}\ne n-\ell,
\qquad
\lambda=\frac{n-\ell}{E(n-\ell-2)}\ne\frac{n-\ell}{n-1}.
\]
In this proposed modification, we also keep the modified estimator E* of E[F], as for the Kenward and Roger modification and the first proposed modification.
Since when \(\lambda F\sim F(\ell,m)\) exactly,
\[
\rho=\frac{\operatorname{Var}[F]}{2\operatorname E[F]^{2}}=\frac{\ell+m-2}{\ell(m-4)},
\]
we construct a modified estimator of \(\zeta=\ell\rho\), say \(\zeta_2\), such that
\[
\zeta_2=\frac{\ell+m_2-2}{m_2-4},
\tag{5.5}
\]
where \(m_2\) takes the exact values of the denominator degrees of freedom for the balanced one-way Anova and Hotelling T² models.

For the balanced one-way Anova model, since \(F\sim F(\ell,n-t)\) exactly, then by (5.5),
\[
\zeta_2=\frac{\ell+n-t-2}{n-t-4}
=\frac{1+\dfrac{\ell-2}{n-t}}{1-\dfrac{4}{n-t}}
\qquad(\text{dividing by }n-t).
\]
Similarly, since for the Hotelling T² model \(\dfrac{n-p}{n-1}F\sim F(\ell,n-p)\) exactly, then by (5.5),
\[
\zeta_2=\frac{\ell+n-\ell-2}{n-\ell-4}=\frac{n-2}{n-\ell-4}
=\frac{1-\dfrac{1}{n-1}}{1-\dfrac{\ell+3}{n-1}}
\qquad(\text{dividing by }n-1).
\]
For both models, we posit the form
\[
\zeta_2=\frac{a_0+a_1A_1+a_2A_2}{b_0+b_1A_1+b_2A_2}.
\]
Notice that unlike the first proposed modification, where we expressed \(h_1\) in terms of B, in this proposed modification we express \(\zeta_2\) in terms of \(A_1\) and \(A_2\).

For the balanced one-way Anova model,
\[
\zeta_2=\frac{a_0+a_1\dfrac{2\ell^{2}}{n-t}+a_2\dfrac{2\ell}{n-t}}{b_0+b_1\dfrac{2\ell^{2}}{n-t}+b_2\dfrac{2\ell}{n-t}}
=\frac{1+\dfrac{\ell-2}{n-t}}{1-\dfrac{4}{n-t}}
\]
\[
\Rightarrow\quad a_0=c,\qquad b_0=c,\qquad 2\ell^{2}a_1+2\ell a_2=c(\ell-2),
\tag{5.6}
\]
\[
2\ell^{2}b_1+2\ell b_2=-4c,
\tag{5.7}
\]
for any \(c\ne0\); and for the Hotelling T² model,
\[
\zeta_2=\frac{a_0+a_1\dfrac{2\ell}{n-1}+a_2\dfrac{\ell(\ell+1)}{n-1}}{b_0+b_1\dfrac{2\ell}{n-1}+b_2\dfrac{\ell(\ell+1)}{n-1}}
=\frac{1-\dfrac{1}{n-1}}{1-\dfrac{\ell+3}{n-1}}
\]
\[
\Rightarrow\quad a_0=c,\qquad b_0=c,\qquad 2\ell a_1+\ell(\ell+1)a_2=-c,
\tag{5.8}
\]
\[
2\ell b_1+\ell(\ell+1)b_2=-c(\ell+3).
\tag{5.9}
\]
By solving equations (5.6) and (5.8), we obtain
\[
a_1=\frac{c}{2(\ell+2)},
\qquad
a_2=-\frac{2c}{\ell(\ell+2)}.
\]
By solving equations (5.7) and (5.9), we obtain
\[
b_1=-\frac{c}{\ell(\ell+2)},
\qquad
b_2=-\frac{c(\ell+4)}{\ell(\ell+2)}.
\]
Take \(c=\ell(\ell+2)\):
\[
a_0=b_0=\ell(\ell+2),\qquad a_1=\frac{\ell}{2},\qquad a_2=-2,\qquad b_1=-1,\qquad b_2=-(\ell+4).
\]
So, the modification that we apply for \(\zeta\) will be
\[
\zeta_2=\frac{\ell(\ell+2)+\dfrac{\ell}{2}A_1-2A_2}{\ell(\ell+2)-A_1-(\ell+4)A_2},
\qquad
m_2=4+\frac{\ell+2}{\zeta_2-1}=\frac{2\ell(\ell+2)+2(A_1-\ell A_2)}{A_1+2A_2}.
\]
Summary

The Kenward and Roger approach that was derived in section 3.1 is modified as
\[
m_2=\frac{2\ell(\ell+2)+2(A_1-\ell A_2)}{A_1+2A_2}
\qquad\text{and}\qquad
\lambda_2=\frac{m_2}{E^{*}(m_2-2)},
\]
such that \(E^{*}=\bigl(1-\frac{A_2}{\ell}\bigr)^{-1}\).
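The second proposed modification reduces to a single closed-form ratio, which makes a numerical check trivial. The sketch below is our own illustration (function name ours, test values as in the earlier checks):

```python
# Second proposed modification: closed-form denominator df.
def proposed2_m(A1, A2, ell):
    return (2 * ell * (ell + 2) + 2 * (A1 - ell * A2)) / (A1 + 2 * A2)
```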
5.3 Comparisons among the K-R and the Proposed Modifications

Based on the way we derived the K-R and the proposed modifications, we should expect these modifications to give identical estimates of the denominator degrees of freedom when the ratio of \(A_1\) to \(A_2\) equals one of the ratios arising in the two special cases. In this section, we derive some useful results about the three modifications. From now on, when we say Kenward and Roger’s approach, we mean the approach after the modification that makes it produce the exact values in the two special cases.
Theorem 5.3.1  The two proposed and K-R methods give the same estimate of the denominator degrees of freedom when \(A_1=\ell A_2\) or \(A_1=\dfrac{2}{\ell+1}A_2\), and the estimated values are
\[
\frac{2\ell}{A_2}
\qquad\text{and}\qquad
\frac{\ell(\ell+1)}{A_2}-(\ell-1),
\]
respectively.
Proof  (i) For the case where \(A_1=\ell A_2\).

K-R approach:
\[
g=\frac{(\ell+1)A_1-(\ell+4)A_2}{(\ell+2)A_2}
=\frac{\ell(\ell+1)A_2-(\ell+4)A_2}{(\ell+2)A_2}=\ell-2.
\]
So \(d_1=\dfrac{\ell-2}{\ell+6}\), \(d_2=\dfrac{2}{\ell+6}\), \(d_3=\dfrac{4}{\ell+6}\), since \(3\ell+2(1-g)=\ell+6\).

Since \(E^{*}=\bigl(1-\frac{A_2}{\ell}\bigr)^{-1}\) and \(B=\dfrac{1}{2\ell}(A_1+6A_2)=\dfrac{\ell+6}{2\ell}A_2\), then
\[
V^{*}=\frac{2}{\ell}\left\{\frac{1+d_1B}{(1-d_2B)^{2}(1-d_3B)}\right\}
=\frac{2}{\ell}\left\{\frac{1+\dfrac{\ell-2}{2\ell}A_2}{\Bigl(1-\dfrac{A_2}{\ell}\Bigr)^{2}\Bigl(1-\dfrac{2A_2}{\ell}\Bigr)}\right\}
\]
\[
\Rightarrow\ \rho^{*}=\frac{V^{*}}{2E^{*2}}
=\frac{1}{\ell}\left\{\frac{1+\dfrac{\ell-2}{2\ell}A_2}{1-\dfrac{2A_2}{\ell}}\right\},
\qquad
\ell\rho^{*}-1=\frac{\dfrac{\ell+2}{2\ell}A_2}{1-\dfrac{2A_2}{\ell}}.
\]
Therefore,
\[
m^{*}=4+\frac{\ell+2}{\ell\rho^{*}-1}
=4+\frac{2\ell}{A_2}\Bigl(1-\frac{2A_2}{\ell}\Bigr)
=\frac{2\ell}{A_2}.
\]
First proposed approach:
\[
d_0=\frac{(\ell+1)A_1-(\ell^{2}+5\ell+8)A_2}{(\ell+2)A_2}
=\frac{\ell(\ell+1)-(\ell^{2}+5\ell+8)}{\ell+2}=-4,
\]
\[
m_1=4+d_0+\frac{\ell-2-2d_0}{B}
=4-4+\frac{\ell+6}{B}
=\frac{\ell+6}{B}
=\frac{2\ell}{A_2},
\]
using \(B=\dfrac{\ell+6}{2\ell}A_2\).

Second proposed approach:
\[
m_2=\frac{2\ell(\ell+2)+2(A_1-\ell A_2)}{A_1+2A_2}
=\frac{2\ell(\ell+2)}{(\ell+2)A_2}
=\frac{2\ell}{A_2}.
\]
(ii) For the case where \(A_1=\dfrac{2}{\ell+1}A_2\).

K-R approach:
\[
g=\frac{(\ell+1)A_1-(\ell+4)A_2}{(\ell+2)A_2}
=\frac{2A_2-(\ell+4)A_2}{(\ell+2)A_2}=-1.
\]
So \(d_1=\dfrac{-1}{3\ell+4}\), \(d_2=\dfrac{\ell+1}{3\ell+4}\), \(d_3=\dfrac{\ell+3}{3\ell+4}\), and since \(E^{*}=\bigl(1-\frac{A_2}{\ell}\bigr)^{-1}\) and
\[
B=\frac{1}{2\ell}(A_1+6A_2)=\frac{3\ell+4}{\ell(\ell+1)}A_2,
\]
then
\[
V^{*}=\frac{2}{\ell}\left\{\frac{1-\dfrac{1}{\ell(\ell+1)}A_2}{\Bigl(1-\dfrac{A_2}{\ell}\Bigr)^{2}\Bigl(1-\dfrac{\ell+3}{\ell(\ell+1)}A_2\Bigr)}\right\},
\]
\[
\rho^{*}=\frac{V^{*}}{2E^{*2}}
=\frac{1}{\ell}\left\{\frac{\ell(\ell+1)-A_2}{\ell(\ell+1)-(\ell+3)A_2}\right\},
\qquad
\ell\rho^{*}-1=\frac{(\ell+2)A_2}{\ell(\ell+1)-(\ell+3)A_2},
\]
and therefore
\[
m^{*}=4+\frac{\ell+2}{\ell\rho^{*}-1}
=4+\frac{\ell(\ell+1)-(\ell+3)A_2}{A_2}
=1-\ell+\frac{\ell(\ell+1)}{A_2}.
\]
First proposed approach:
\[
d_0=\frac{(\ell+1)A_1-(\ell^{2}+5\ell+8)A_2}{(\ell+2)A_2}
=\frac{2-(\ell^{2}+5\ell+8)}{\ell+2}
=-\frac{\ell^{2}+5\ell+6}{\ell+2}=-(\ell+3),
\]
\[
m_1=4+d_0+\frac{\ell-2-2d_0}{B}
=4-(\ell+3)+\frac{3\ell+4}{B}
=1-\ell+\frac{\ell(\ell+1)}{A_2},
\]
using \(B=\dfrac{3\ell+4}{\ell(\ell+1)}A_2\).

Second proposed approach:
\[
m_2=\frac{2\ell(\ell+2)+2(A_1-\ell A_2)}{A_1+2A_2}
=\frac{2\ell(\ell+2)+2\Bigl(\dfrac{2}{\ell+1}-\ell\Bigr)A_2}{\dfrac{2(\ell+2)}{\ell+1}A_2}
=1-\ell+\frac{\ell(\ell+1)}{A_2}.
\]
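As a numerical sanity check on Theorem 5.3.1 (our own illustration; ℓ = 3 and A₂ = 0.6 are arbitrary), the sketch below implements all three denominator-df formulas and verifies that they agree at both special ratios of A₁ to A₂:

```python
# Theorem 5.3.1 check: K-R, first, and second proposed df estimates agree
# when A1 = ell*A2 (Anova ratio) or A1 = 2*A2/(ell+1) (Hotelling ratio).
def kr_m(A1, A2, ell):
    B = (A1 + 6 * A2) / (2 * ell)
    g = ((ell + 1) * A1 - (ell + 4) * A2) / ((ell + 2) * A2)
    den = 3 * ell + 2 * (1 - g)
    d1, d2, d3 = g / den, (ell - g) / den, (ell - g + 2) / den
    E = 1 / (1 - A2 / ell)
    V = (2 / ell) * (1 + d1 * B) / ((1 - d2 * B) ** 2 * (1 - d3 * B))
    return 4 + (ell + 2) / (ell * V / (2 * E**2) - 1)

def m1(A1, A2, ell):
    B = (A1 + 6 * A2) / (2 * ell)
    d0 = ((ell + 1) * A1 - (ell**2 + 5 * ell + 8) * A2) / ((ell + 2) * A2)
    return 4 + d0 + (ell - 2 - 2 * d0) / B

def m2(A1, A2, ell):
    return (2 * ell * (ell + 2) + 2 * (A1 - ell * A2)) / (A1 + 2 * A2)
```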
When A1 = A2 or A1 =
Theorem 5.3.2
1−
2
A2 , then the scale estimates are 1 and
+1
−1
A2 respectively for K-R and the proposed modifications as well.
( + 1)
Proof
When A1 = A2 , or A1 =
2
A2 , the three modifications give the same estimates
+1
for the denominator degrees of freedom which are
2
( + 1)
respectively
and 1 − +
A2
A2
(theorem 5.3.1). In addition, since the E[ F ] is modified in the same way for the three
modifications, then the scale estimate is the same for the three modifications. In the
following proof, we use the Kenward and Roger approach to find the scale estimate
(i) For the case where A1 = ℓA2 , m∗ = 2ℓ/A2 , and

λ∗ = m∗/[E∗(m∗ − 2)] = (2ℓ/A2) / { [1/(1 − A2/ℓ)] (2ℓ/A2 − 2) } = 1 .

(ii) For the case A1 = (2/(ℓ + 1))A2 , m∗ = 1 − ℓ + ℓ(ℓ + 1)/A2 , and

λ∗ = m∗/[E∗(m∗ − 2)] = [1 − ℓ + ℓ(ℓ + 1)/A2] / { [1/(1 − A2/ℓ)] [ℓ(ℓ + 1)/A2 − (ℓ + 1)] } ;

multiplying the numerator and the denominator by A2/(ℓ(ℓ + 1)),

λ∗ = [1 + (1 − ℓ)A2/(ℓ(ℓ + 1))] / { [1/(1 − A2/ℓ)] (1 − A2/ℓ) } = 1 − [(ℓ − 1)/(ℓ(ℓ + 1))] A2 .
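As a numeric sanity check (ours, not part of the thesis), the sketch below plugs the two special cases of A1 into the Kenward and Roger quantities and recovers the denominator degrees of freedom and scales stated in theorems 5.3.1 and 5.3.2. The helper name kr_ddf_scale is an illustrative assumption; the constants follow Kenward and Roger (1997).

```python
# Hedged sketch: evaluate the K-R approximations m* and lambda* for given
# (l, A1, A2) and compare with the closed forms claimed in the two theorems.

def kr_ddf_scale(ell, A1, A2):
    g = ((ell + 1) * A1 - (ell + 4) * A2) / ((ell + 2) * A2)
    den = 3 * ell + 2 * (1 - g)
    d1, d2, d3 = g / den, (ell - g) / den, (ell + 2 - g) / den
    B = (A1 + 6 * A2) / (2 * ell)
    E = 1 / (1 - A2 / ell)                          # approximate E*
    V = (2 / ell) * (1 + d1 * B) / ((1 - d2 * B) ** 2 * (1 - d3 * B))
    rho = V / (2 * E ** 2)
    m = 4 + (ell + 2) / (ell * rho - 1)             # denominator df m*
    lam = m / (E * (m - 2))                         # scale lambda*
    return m, lam

ell, A2 = 3.0, 0.4
m1, lam1 = kr_ddf_scale(ell, ell * A2, A2)            # case (i): A1 = l*A2
m2, lam2 = kr_ddf_scale(ell, 2 * A2 / (ell + 1), A2)  # case (ii)
print(m1, lam1)   # expect 2l/A2 = 15 and 1
print(m2, lam2)   # expect 1 - l + l(l+1)/A2 = 28 and 1 - (l-1)A2/(l(l+1))
```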
Corollary 5.3.3 If A1 = ℓA2 or A1 = (2/(ℓ + 1))A2, then the K-R and the proposed approaches are identical.

Proof
From theorems 5.3.1 and 5.3.2, the K-R and the proposed approaches give the same estimate of the denominator degrees of freedom and the same estimate of the scale, and since the statistic used is the same, all three approaches are identical.
Theorem 5.3.4 When ℓ = 1, then A1 = A2 .

Proof

A1 = ∑_{i=1}^r ∑_{j=1}^r wij tr(ΘΦΡiΦ) tr(ΘΦΡjΦ) ,  A2 = ∑_{i=1}^r ∑_{j=1}^r wij tr(ΘΦΡiΦΘΦΡjΦ) .

Since L′ΦL is a scalar, then Θ = LL′/(L′ΦL) ,
tr(ΘΦΡiΦ) = tr[ LL′ΦΡiΦ/(L′ΦL) ] = tr(L′ΦΡiΦL)/(L′ΦL) = L′ΦΡiΦL/(L′ΦL) ,

and hence

A1 = ∑_{i=1}^r ∑_{j=1}^r wij (L′ΦΡiΦL)(L′ΦΡjΦL)/(L′ΦL)² .   (5.10)
Also,

tr(ΘΦΡiΦΘΦΡjΦ) = [1/(L′ΦL)²] tr(LL′ΦΡiΦLL′ΦΡjΦ) = tr(L′ΦΡiΦLL′ΦΡjΦL)/(L′ΦL)²
= (L′ΦΡiΦL)(L′ΦΡjΦL)/(L′ΦL)² ,

and hence

A2 = ∑_{i=1}^r ∑_{j=1}^r wij (L′ΦΡiΦL)(L′ΦΡjΦL)/(L′ΦL)² .   (5.11)

From expressions (5.10) and (5.11), we can see that A1 = A2 .
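The collapse of the two traces when L has a single column can also be confirmed numerically. The following sketch (ours, not from the thesis) uses arbitrary symmetric test matrices for Φ, the Ρi, and the weights wij, and checks that A1 = A2 when ℓ = 1.

```python
# Illustration: with a single-column L (l = 1), Theta = LL'/(L'PhiL) has
# rank one, so tr(Theta Phi P_i Phi) is the scalar L'PhiP_iPhiL/(L'PhiL)
# and the A1 and A2 double sums coincide.  All inputs are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 3
L = rng.standard_normal((n, 1))                  # single column => l = 1
Phi = np.eye(n) + 0.1 * rng.standard_normal((n, n)); Phi = Phi @ Phi.T
P = [rng.standard_normal((n, n)) for _ in range(r)]
P = [(Pi + Pi.T) / 2 for Pi in P]                # symmetric P_i
W = rng.standard_normal((r, r)); W = W @ W.T     # symmetric w_ij

Theta = L @ np.linalg.inv(L.T @ Phi @ L) @ L.T
A1 = sum(W[i, j] * np.trace(Theta @ Phi @ P[i] @ Phi)
         * np.trace(Theta @ Phi @ P[j] @ Phi)
         for i in range(r) for j in range(r))
A2 = sum(W[i, j] * np.trace(Theta @ Phi @ P[i] @ Phi @ Theta @ Phi @ P[j] @ Phi)
         for i in range(r) for j in range(r))
print(A1, A2)
```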
Corollary 5.3.5 When ℓ = 1, the K-R and the proposed approaches are identical. The estimates of the denominator degrees of freedom and the scale are 2/A2 and 1 respectively.

Proof
Since A1 = A2 (theorem 5.3.4), then by applying corollary 5.3.3, the K-R and the proposed approaches are identical. By using theorems 5.3.1 and 5.3.2, the proof is completed.
6. KENWARD AND ROGER’S APPROXIMATION
FOR THREE GENERAL MODELS
In chapter 4, we saw how the K-R approach was modified so the approximation
for the denominator degrees of freedom and the scale match the known values for the two
special cases where F has an exact F distribution. In this chapter, we show that the K-R
approximation produces the exact values for the denominator degrees of freedom and
scale factor for more general models. Three models are considered: the modified Rady model, which is more general than the balanced one-way Anova model; a general multivariate linear model, which is more general than the Hotelling T² model; and a balanced multivariate model with a random group effect.
6.1 Modified Rady’s Model
A model that satisfies assumptions (A1)-(A5) will be called a Rady model. If the model is a Rady model and the hypothesis satisfies conditions (A6) and (A7), the testing problem will be called a Rady testing problem. Rady (1986) proved that a Rady testing problem has an exact F test for testing a linear hypothesis about the fixed effects. The model considered in this section differs slightly from Rady's model in that we add two more assumptions, (A8) and (A9); the resulting model will be called a modified Rady model. When the model is a modified Rady model and the hypothesis satisfies conditions (A6) and (A7), the testing problem is called a modified Rady testing problem.
6.1.1 Model and Assumptions
Consider the mixed linear model,
y ∼ N(Χβ, Σ) ,

where Σ = Cov[y] = σ1²Σ1 + ⋯ + σr−1²Σr−1 + σr²Ι, and the Σi are n.n.d. matrices. Χ is a known matrix, β is a vector of unknown fixed-effect parameters, ℑ denotes the set of all possible vectors σ, and ℵ = {Σ(σ) : σ ∈ ℑ}. We are interested in testing H0 : L′β = 0, for L′ a fixed (ℓ × p) matrix.
We assume that the model satisfies assumptions (A1)-(A9), where
(A1) ℑ contains an r -dimensional rectangle.
(A2) Σ(σ ) is p.d. for all σ.
(A3) β and σ are functionally independent.
(A4) The model has orthogonal block structure (OBS). That is, ℵ is commutative (i.e.
Σ(σ)Σ(σ∗) = Σ(σ∗)Σ(σ) ∀σ, σ∗ ∈ ℑ). The orthogonal block structure is equivalent to the spectral decomposition

Σ(σ) = ∑_{k=1}^q αk(σ)Εk ,

where αk(σ) is a linear function of σ (i.e. αk(σ) = ∑_{i=1}^r αki σi), and the Εk's are pairwise orthogonal o.p. matrices.
(A5) Zyskind’s condition: A model is said to satisfy Zyskind’s condition if
Ρ Χ Σ = ΣΡ Χ for all Σ .
(A6) Zyskind’s condition for the submodel under the null hypothesis; ΣΡ U = Ρ U Σ for all
Σ where ℜ(U) = {Χβ : L′β = 0} .
(A7) ℜ(M ) ⊂ ℜ(B s ) for some s, where Μ = Ρ Χ − Ρ U , and each Ek can be expressed as
Ek = Bk + Ck , where Bk and Ck are symmetric idempotent matrices, with ℜ(Ck) ⊂ N(Χ′),
ℜ(B k ) = ℜ(Ek Χ) and B k Ck = 0.
(A8) (Ι − Ρ Χ ) Σi 's are linearly independent.
(A9) q = r.
Remarks
(i ) B k is an o.p.o on ℜ(Εk Χ), B k = Εk Ρ Χ , and notice that ℜ(Εk Χ) ⊂ ℜ(Εk ). Then,
Ck = Εk − B k = Εk (Ι − Ρ Χ ) is an o.p.o on ℜ(Εk ) ∩ ℜ(Εk Χ) ⊥ (Seely, 2002, problem, 2.F.3).
(ii ) ℜ(Ε k ) ∩ ℜ(Εk Χ) ⊥ ⊂ N(Χ′)
Proof: Take any x ∈ℜ(Εk ) ∩ ℜ(Εk Χ) ⊥
⇒ x ∈ℜ(Εk ) and x ∈ℜ(Εk Χ) ⊥ = N(Χ′Εk )
⇒ Εk v = x for some v, and Χ′Εk x = 0.
⇒ Χ′Εk Εk v = 0 ⇒ Χ′Εk v = 0 ⇒ Χ′x = 0, and this means that x ∈ N(Χ′).
(iii ) From assumption(A7), ℜ(M ) ⊂ ℜ(B s ) = ℜ(Ε s Χ), and since ℜ(Ε s Χ) ⊂ ℜ(Ε s ),
then ℜ(M ) ⊂ ℜ(Ε s ), and hence Ε s M = M.
(iv) Since L′β is estimable, then L = Χ′A for some A. r(L) = r(A) − dim[ℜ(A) ∩ N(Χ′)], and ℜ(A) ∩ N(Χ′) = ℜ(A) ∩ ℜ(Χ)⊥ = 0, because ℜ(A) ⊂ ℜ(Χ). Therefore, r(L) = r(A) = r(M) = ℓ .
(v) Ck ≠ 0 ∀k .

Proof: Suppose that Cr = 0. Then Σ = ∑_{i=1}^r σiΣi = ∑_{k=1}^r αk(σ)Εk

⇒ ∑_{i=1}^r σi(Ι − ΡΧ)Σi = ∑_{k=1}^{r−1} αk(σ)Ck ,

and this combined with assumption (A1) implies that the (Ι − ΡΧ)Σi's lie in span{C1, …, Cr−1}, which contradicts assumption (A8) that the (Ι − ΡΧ)Σi's are linearly independent.
(vi) B = {αki}r×r is invertible.

Proof: Suppose Bσ = Bσ∗. Then αk(σ) = αk(σ∗) ∀k ⇒ ∑_{k=1}^r αk(σ)Ck = ∑_{k=1}^r αk(σ∗)Ck

⇒ (Ι − ΡΧ) ∑_{k=1}^r αk(σ)Εk = (Ι − ΡΧ) ∑_{k=1}^r αk(σ∗)Εk ⇒ ∑_{i=1}^r σi(Ι − ΡΧ)Σi = ∑_{i=1}^r σi∗(Ι − ΡΧ)Σi ,

and since the (Ι − ΡΧ)Σi's are linearly independent (assumption A8), then σi = σi∗ ∀i. Hence B is injective on ℑ, and by assumption (A1), it is injective on ℝ^r, and since it is square, it is invertible.
(vii) The definition of OBS in (A4) was given by Birkes and Lee (2003) and is less restrictive than the definition given by VanLeeuwen et al. (1998).
Examples
1- Fixed effects linear models which include the balanced one-way Anova model are
modified Rady models.
2- Balanced mixed classification nested models are modified Rady models.
3- Many balanced mixed classification models including balanced split-plot models are
modified Rady models.
6.1.2 Useful Properties for the Modified Rady Model

A testing problem that satisfies all or some of the modified Rady testing problem assumptions has several properties, which are summarized in this section.

Lemma 6.1.2.1 In a Rady model, Φ̂A = Φ̂ .

Proof
Consider

Qij − ΡiΦΡj = Χ′(∂Σ⁻¹/∂σi) Σ (∂Σ⁻¹/∂σj)Χ − Χ′(∂Σ⁻¹/∂σi) Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ (∂Σ⁻¹/∂σj)Χ .
Note that Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ is a p.o. on ℜ(Σ⁻¹Χ) along N(Χ′). Since Σ⁻¹ is a nonsingular n.n.d. matrix (by assumption; Birkes, 2004), it is a p.d. matrix (Seely, 2002, corollary 1.9.3)

⇒ r(Χ′Σ⁻¹Χ) = r(Χ′) (Birkes, 2004).

Since r(Χ′Σ⁻¹Χ) = r(Χ′) = r(ΡΧ), we have

Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ = Σ⁻¹ΡΧ(ΡΧΣ⁻¹ΡΧ)⁺ΡΧ (Seely, 2002, problem 1.10.B3).

Also,

ΡΧΣ⁻¹ΡΧ = ∑_{k=1}^q [1/αk(σ)] ΡΧΕkΡΧ = ∑_{k=1}^q [1/αk(σ)] ΕkΡΧ = Σ⁻¹ΡΧ

⇒ (ΡΧΣ⁻¹ΡΧ)⁺ = ∑_{k=1}^q αk(σ)ΕkΡΧ = ΣΡΧ ,

and hence

Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ = Σ⁻¹ΡΧ(ΡΧΣ⁻¹ΡΧ)⁺ΡΧ = Σ⁻¹ΡΧΣΡΧΡΧ = ΡΧ (Zyskind's condition).   (6.1)
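Expression (6.1) says that under Zyskind's condition the GLS projection equals the OLS projection ΡΧ. A small numeric illustration (ours, with arbitrary sizes and variances, not from the thesis) for a balanced one-way mixed Anova model:

```python
# Illustration of (6.1): in a balanced one-way mixed model, P_X Sigma =
# Sigma P_X (Zyskind's condition), and Sigma^{-1} X (X'Sigma^{-1}X)^{-1} X'
# coincides with the orthogonal projection P_X.
import numpy as np

a, b = 4, 3                                    # a groups of size b
n = a * b
X = np.kron(np.eye(a), np.ones((b, 1)))        # one-way design matrix
Sigma = 1.7 * X @ X.T + 0.9 * np.eye(n)        # group + error variance
PX = X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(PX @ Sigma, Sigma @ PX)     # Zyskind's condition holds
Sinv = np.linalg.inv(Sigma)
proj = Sinv @ X @ np.linalg.solve(X.T @ Sinv @ X, X.T)
print(np.abs(proj - PX).max())
```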
Next,

ΡΧ (∂Σ⁻¹/∂σi) Σ (∂Σ⁻¹/∂σj) ΡΧ = ΡΧ Σ⁻¹(∂Σ/∂σi)Σ⁻¹(∂Σ/∂σj)Σ⁻¹ ΡΧ

= ΡΧ ( ∑_k [1/αk(σ)]Εk )( ∑_k αkiΕk )( ∑_k [1/αk(σ)]Εk )( ∑_k αkjΕk )( ∑_k [1/αk(σ)]Εk ) ΡΧ

= ∑_{k=1}^q { αkiαkj/[αk(σ)]³ } ΕkΡΧ .   (6.2)

Similarly,

ΡΧ (∂Σ⁻¹/∂σi) Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ (∂Σ⁻¹/∂σj) ΡΧ
= ΡΧ Σ⁻¹(∂Σ/∂σi) Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹ (∂Σ/∂σj)Σ⁻¹ ΡΧ
= ΡΧ Σ⁻¹(∂Σ/∂σi) ΡΧ Σ⁻¹(∂Σ/∂σj)Σ⁻¹ ΡΧ
= ΡΧ ( ∑_k [1/αk(σ)]Εk )( ∑_k αkiΕk ) ΡΧ ( ∑_k [1/αk(σ)]Εk )( ∑_k αkjΕk )( ∑_k [1/αk(σ)]Εk ) ΡΧ
= ∑_{k=1}^q { αkiαkj/[αk(σ)]³ } ΕkΡΧ .   (6.3)
From expressions (6.2) and (6.3) we obtain

ΡΧ (∂Σ⁻¹/∂σi) Σ (∂Σ⁻¹/∂σj) ΡΧ − ΡΧ (∂Σ⁻¹/∂σi) Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ (∂Σ⁻¹/∂σj) ΡΧ = 0

⇒ Χ′(∂Σ⁻¹/∂σi) Σ (∂Σ⁻¹/∂σj)Χ − Χ′(∂Σ⁻¹/∂σi) Χ(Χ′Σ⁻¹Χ)⁻¹Χ′ (∂Σ⁻¹/∂σj)Χ = 0 .

This shows that Qij − ΡiΦΡj = 0, and since Rij = 0, then Φ̂A = Φ̂ .
Corollary 6.1.2.2 In balanced mixed classification models, Φ̂A = Φ̂ .

Proof
Balanced mixed classification models are Rady models (Rady, 1986), and hence by applying lemma 6.1.2.1, we have Φ̂A = Φ̂ .
Lemma 6.1.2.3 For a Rady testing problem, A1 = ℓA2 .

Proof
Ω = {Χβ : β ∈ ℝ^p} = ℜ(Χ). Since we are interested in testing Η0 : L′β = 0, then L′β is assumed to be estimable ⇔ ℜ(L) ⊂ ℜ(Χ′) = ℜ(Χ′Χ) ⇔ L = Χ′ΧB for some B ⇔ L = Χ′A where A = ΧB, ℜ(A) ⊂ ℜ(Χ), and hence

ΩΗ = {Χβ : L′β = 0} = ℜ(Χ) ∩ N(A′) = ℜ(U) .

Also, since

ΣΡU = ΡUΣ for all Σ (Zyskind's condition for the submodel),
and
ΣΡΧ = ΡΧΣ for all Σ (Zyskind's condition),
then
ΣΜ = ΜΣ for all Σ, where Μ = ΡΧ − ΡU .
Now,

tr(ΘΦΡiΦ) = −tr[ L(L′ΦL)⁻¹L′ (Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹ (∂Σ/∂σi) Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹ ]
= −tr[ Χ′A(A′ΧΦΧ′A)⁻¹A′Χ (Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹ (∂Σ/∂σi) Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹ ]
= −tr[ ΡΧ A(A′ΧΦΧ′A)⁻¹A′ ΡΧ (∂Σ/∂σi) ]  (from 6.1)
= −tr[ A(A′ΧΦΧ′A)⁻¹A′ (∂Σ/∂σi) ] , because ℜ(A) ⊂ ℜ(Χ),
= −tr[ A(A′ΣA)⁻¹A′ (∂Σ/∂σi) ] ,

again noticing that Σ⁻¹ΧΦΧ′ = ΡΧ (from 6.1), so that A′ΧΦΧ′A = A′ΣΡΧA = A′ΣA because ℜ(A) ⊂ ℜ(Χ).
Since ℜ(U) ⊂ ℜ(Χ), Μ is an o.p.o. on Ω ∩ ΩΗ⊥ (Seely, 2002, problem 2.F.3), and

Ω ∩ ΩΗ⊥ = ℜ(Χ) ∩ [ℜ(Χ) ∩ N(A′)]⊥ = ℜ(Χ) ∩ [ℜ(Χ)⊥ + ℜ(A)]
= ℜ(Χ) ∩ ℜ(Χ)⊥ + ℜ(Χ) ∩ ℜ(A) = ℜ(A) .

So we have ℜ(M) = ℜ(A). Analogous to the argument given in the proof of lemma 6.1.2.1, we obtain

ΣA(A′ΣA)⁻¹A′ = ΣM(MΣM)⁺M = M ,

and hence

tr(ΘΦΡiΦ) = −tr[ Σ⁻¹M (∂Σ/∂σi) ] .   (6.4)

Therefore

A1 = ∑_{i=1}^r ∑_{j=1}^r wij tr[ Σ⁻¹M (∂Σ/∂σi) ] tr[ Σ⁻¹M (∂Σ/∂σj) ] ,

and

A2 = ∑_{i=1}^r ∑_{j=1}^r wij tr[ Σ⁻¹M (∂Σ/∂σi) Σ⁻¹M (∂Σ/∂σj) ] .
A1 and A2 can also be expressed in terms of the Εk's as

A1 = ∑_{i=1}^r ∑_{j=1}^r wij [ ∑_{k=1}^q (αki/αk(σ)) tr(ΕkM) ][ ∑_{k=1}^q (αkj/αk(σ)) tr(ΕkM) ] ,   (6.5)

and

A2 = ∑_{i=1}^r ∑_{j=1}^r wij ∑_{k=1}^q { αkiαkj/[αk(σ)]² } tr(ΕkM) .   (6.6)
Since ΕkM = MΕk (Zyskind's condition for the submodel), then ℜ(ΕkM) = ℜ(MΕk) ⊂ ℜ(M), and hence ΕkM is an o.p.o. (Seely, 2002, problem 2.F.1)

⇒ tr(ΕkM) = r(ΕkM) (Harville, 1997, corollary 10.2.2, and also Seely's notes)
= r(M) − dim[ℜ(M) ∩ N(Εk)] (Seely, 2002, proposition 1.8.2).

When k = s, then dim[ℜ(M) ∩ N(Εs)] = 0 (assumption A7), and hence r(ΕsM) = r(M) = ℓ (from the remark above).
When k ≠ s, then ℜ(M) ⊂ ℜ(Εs) ⇒ Εkℜ(M) ⊂ Εkℜ(Εs) ⇒ ℜ(ΕkM) ⊂ ℜ(ΕkΕs) = 0, and then we have ΕkM = 0.

Hence, expressions (6.5) and (6.6) simplify to

A1 = ℓ² ∑_{i=1}^r ∑_{j=1}^r wij αsiαsj/[αs(σ)]² ,  and  A2 = ℓ ∑_{i=1}^r ∑_{j=1}^r wij αsiαsj/[αs(σ)]² .
Lemma 6.1.2.4 For a modified Rady model, the approximate and the exact approaches to deriving W = Var[σ̂REML] are identical.

Proof
The Exact Approach

f(z) = (2π)^(−q/2) |Κ′ΣΚ|^(−1/2) exp[−(1/2) z′(Κ′ΣΚ)⁻¹z] ,  z ∼ N(0, Κ′ΣΚ)

⇒ ℓR(σ) = constant − (1/2) log|Κ′ΣΚ| − (1/2) z′(Κ′ΣΚ)⁻¹z
= constant − (1/2) log|Κ′ΣΚ| − (1/2) y′Κ(Κ′ΣΚ)⁻¹Κ′y .

Since B = {αki}r×r is invertible (from the remarks above), α = {αk(σ)}r×1 is a one-to-one function of σ.
Now, we reparametrize by making γ k = α k (σ ), and hence we have
∂ℓR(γ)/∂γk = −(1/2) tr[ (Κ′ΣΚ)⁻¹Κ′ (∂Σ/∂γk) Κ ] + (1/2) y′Κ(Κ′ΣΚ)⁻¹Κ′ (∂Σ/∂γk) Κ(Κ′ΣΚ)⁻¹Κ′y
= −(1/2) tr[ G (∂Σ/∂γk) ] + (1/2) y′G (∂Σ/∂γk) Gy ,   (6.7)

where
G = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹ = Κ(Κ′ΣΚ)⁻¹Κ′ = Σ⁻¹(Ι − ΡΧ)  (from 6.1)

⇒ ∂ℓR(γ)/∂γk = −(1/2) tr(GΕk) + (1/2) y′GΕkGy ,  since Σ = ∑_{k=1}^r γkΕk ,
= −(1/2) tr(Ck/γk) + (1/2) y′(Ck/γk²)y ,  since G = Σ⁻¹(Ι − ΡΧ) = ∑_{k=1}^r (1/γk)Ck .

Equating ∂ℓR(γ)/∂γk to zero,

αk(σ̂)REML = γ̂k = y′Cky/tk ,  for k = 1, …, r ,  where tk = tr(Ck) .   (6.8)
Thus

Bσ̂ = (α1(σ̂), …, αr(σ̂))′ ⇒ σ̂REML = B⁻¹(α1(σ̂), …, αr(σ̂))′ = B⁻¹s ,  where s = (y′C1y/t1, …, y′Cry/tr)′ .

The above expression is an explicit form for σ̂REML, and

Var[σ̂REML] = Var[B⁻¹s] = B⁻¹Var[s](B⁻¹)′ .
Observe that for any two different elements of s,

Cov[y′Ciy, y′Cjy] = 2 tr(CiΣCjΣ) + 4(Χβ)′CiΣCjΧβ = 0 (Schott, 2005, theorem 10.22),

and

Cov[y′Ciy, y′Ciy] = 2 tr(CiΣCiΣ) + 4(Χβ)′CiΣCiΧβ = 2 tr(CiΣCiΣ) ,

where

CiΣCiΣ = (Ι − ΡΧ)Εi ( ∑_{k=1}^q αk(σ)Εk ) (Ι − ΡΧ)Εi ( ∑_{k=1}^q αk(σ)Εk )
= [αi(σ)]² (Ι − ΡΧ)Εi ,  because the Εk's are orthogonal o.p. matrices,
= [αi(σ)]² Ci .

So, Var[s] = diag{ 2[αi(σ)]²/ti }r×r = 2D , where D = diag{ [αi(σ)]²/ti }r×r ,

and hence

Var[σ̂REML] = 2B⁻¹D(B⁻¹)′ .   (6.9)
The Approximation Approach

W = Var[σ̂REML] ≈ 2 [ { tr( G (∂Σ/∂σi) G (∂Σ/∂σj) ) }_{i,j=1}^r ]⁻¹ (Searle et al., 1992, p. 253),

where

tr[ G (∂Σ/∂σi) G (∂Σ/∂σj) ] = tr[ Σ⁻¹(Ι − ΡΧ)(∂Σ/∂σi) Σ⁻¹(Ι − ΡΧ)(∂Σ/∂σj) ]
= tr[ ∑_{k=1}^r { αkiαkj/[αk(σ)]² } Εk(Ι − ΡΧ) ] = ∑_{k=1}^r { αkiαkj/[αk(σ)]² } tk ,

where tr(Ck) = tk ≥ 1.

Define Ψ = { ∑_{k=1}^r αki (tk/[αk(σ)]²) αkj }_{i,j=1}^r = B′D⁻¹B , where B and D are defined as above

⇒ Ψ⁻¹ = B⁻¹D(B′)⁻¹ ,

hence W = Var[σ̂REML] ≈ 2Ψ⁻¹ = 2B⁻¹D(B′)⁻¹ .   (6.10)

From expressions (6.9) and (6.10), we see that both approaches give the same result.
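The agreement of (6.9) and (6.10) can be checked numerically. The sketch below (ours, with arbitrary sizes and variance components, not from the thesis) builds a balanced one-way random-effects model, computes the information-based matrix Ψ of Searle et al., and compares 2Ψ⁻¹ with the exact form 2B⁻¹D(B⁻¹)′.

```python
# Illustration of lemma 6.1.2.4 for a balanced one-way random-effects model:
# the information approximation and the exact REML variance coincide.
import numpy as np

a, b = 5, 4                                    # a groups of size b
n = a * b
X = np.ones((n, 1))                            # overall mean only
Z = np.kron(np.eye(a), np.ones((b, 1)))        # group indicators
s_grp, s_err = 1.3, 0.7                        # sigma_1^2, sigma_2^2
Sigma = s_grp * Z @ Z.T + s_err * np.eye(n)

PX = X @ np.linalg.solve(X.T @ X, X.T)
G = np.linalg.inv(Sigma) @ (np.eye(n) - PX)
dS = [Z @ Z.T, np.eye(n)]                      # dSigma/dsigma_i
Psi = np.array([[np.trace(G @ dS[i] @ G @ dS[j]) for j in range(2)]
                for i in range(2)])
W_info = 2 * np.linalg.inv(Psi)                # approximation (6.10)

E1 = Z @ Z.T / b                               # group-mean projector
alpha = np.array([b * s_grp + s_err, s_err])   # eigenvalues alpha_k = B sigma
B = np.array([[b, 1.0], [0.0, 1.0]])
C1 = E1 - PX                                   # rank a-1; P_X lies in R(E1)
C2 = np.eye(n) - E1                            # rank n-a
D = np.diag(alpha ** 2 / np.array([np.trace(C1), np.trace(C2)]))
W_exact = 2 * np.linalg.inv(B) @ D @ np.linalg.inv(B).T   # exact (6.9)
print(np.abs(W_info - W_exact).max())
```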
Corollary 6.1.2.5 For a modified Rady testing problem, A1 = 2ℓ²/ts and A2 = 2ℓ/ts .

Proof
From the simplified expressions following (6.5) and (6.6),

A1 = ℓ² ∑_{i=1}^r ∑_{j=1}^r wij αsiαsj/[αs(σ)]² = { ℓ²/[αs(σ)]² } ∑_{i=1}^r ∑_{j=1}^r αsi wij αsj .

Notice that ∑_i ∑_j αsi wij αsj is an entry of BWB′; call it (BWB′)ss, where B is as above. From expressions (6.9) or (6.10), we have BWB′ = B[B⁻¹D(B⁻¹)′]B′ = D, and hence

A1 = { ℓ²/[αs(σ)]² } × { 2[αs(σ)]²/ts } = 2ℓ²/ts ,  and similarly  A2 = 2ℓ/ts .
Lemma 6.1.2.6 (Rady, 1986) Under the null hypothesis, the Anova test statistic FR ∼ F(ℓ, ts), where FR = ts y′My / (ℓ y′Csy) .

Lemma 6.1.2.7 The K-R test statistic F = FR .

Proof
F = (1/ℓ) β̂′L(L′Φ̂AL)⁻¹L′β̂ ,
and since Φ̂A = Φ̂ (lemma 6.1.2.1), then
F = (1/ℓ) β̂′L(L′Φ̂L)⁻¹L′β̂ .

(i) (L′Φ̂L)⁻¹ = [ L′(Χ′Σ̂⁻¹Χ)⁻¹L ]⁻¹ = [ A′Χ(Χ′Σ̂⁻¹Χ)⁻¹Χ′A ]⁻¹ (notice that L = Χ′A)
= [ A′Σ̂ΡΧA ]⁻¹ , utilizing expression (6.1),
= [ A′Σ̂A ]⁻¹ , because ℜ(A) ⊂ ℜ(Χ) .

(ii) β̂ = (Χ′Σ̂⁻¹Χ)⁻¹Χ′Σ̂⁻¹y = (Χ′Χ)⁻¹Χ′y (utilizing expression 6.1).

From (i) and (ii),

F = (1/ℓ) y′Χ(Χ′Χ)⁻¹Χ′A [A′Σ̂A]⁻¹ A′Χ(Χ′Χ)⁻¹Χ′y
= (1/ℓ) y′ΡΧA [A′Σ̂A]⁻¹ A′ΡΧy
= (1/ℓ) y′A [A′Σ̂A]⁻¹ A′y (because ℜ(A) ⊂ ℜ(Χ))
= (1/ℓ) y′Σ̂⁻¹My (utilizing expression 6.4),

but Σ̂⁻¹M = ∑_{k=1}^q [1/αk(σ̂)] ΕkM = [1/αs(σ̂)] M (from the remarks of this section),
where αs(σ̂) = y′Csy/ts (from expression 6.8).

Therefore, F = ts y′My / (ℓ y′Csy) = FR .
6.1.3 Estimating the Denominator Degrees of Freedom and the Scale Factor

From corollary 6.1.2.5, A1 = 2ℓ²/ts and A2 = 2ℓ/ts, and then A1/A2 = ℓ. By utilizing theorems 5.3.1 and 5.3.2 and corollary 5.3.3, the K-R and our proposed approaches are identical, with

m∗ = 2ℓ/A2 = ts ,  and  λ∗ = 1 .

The estimates of the denominator degrees of freedom and the scale factor match the exact values for the Anova test in lemma 6.1.2.6. So, the K-R approach produces the exact values for the modified Rady model, which is more general than the balanced one-way Anova model, one of the special cases considered in modifying the K-R approach.
6.2 A General Multivariate Linear Model
This model was studied by Mardia et al. (1992) where they show that for this
model, we have an exact F test to test a linear hypothesis of the fixed effects.
6.2.1 The Model Setting
Consider the model defined by
Y = ΧB + U,
where Y (n × p ) is an observed matrix of p response variables on each of n individuals,
Χ(n × q ) is the design matrix, B( q × p) is a matrix of unknown parameters, and U is a
matrix of unobserved random disturbances whose rows for given Χ are uncorrelated. We
also assume that U is a data matrix from N p (0, Σ p ).
Here Y has rows y1′, …, yn′; Χ has rows x1′, …, xn′; B = (β(1), …, β(q))′; and Σ = Ιn ⊗ Σp (block diagonal with blocks Σp). The xi's are q × 1 vectors, and the yi's and β(i)'s are p × 1 vectors.

Suppose that we are interested in testing Η0 : c′BM = 0, where c (q × 1) and M (p × r) are given matrices, and M has rank r.

Note A special case of this general model is the Hotelling T² one-sample test, in which Χ = 1n, B = μ′, c = 1, M = Ιp, so that p = r and q = 1. The two-sample Hotelling T² test is also a special case of this model.
Other Setting
To compute the Wald statistic for our problem, so that we are able to compute the estimates of the Kenward and Roger method, we rearrange Y as a vector.
y = (y1′, …, yn′)′ ,  β = (β(1)′, …, β(q)′)′ ,

where y and β are np × 1 and pq × 1 vectors respectively. Notice that the design matrix for the new setting is Χ∗ = Χ ⊗ Ιp .
The testing problem Η0 : c′BM = 0 is

(c1 ⋯ cq)(β′(1), …, β′(q))′ M = (c1 ⋯ cq)(β′(1)M, …, β′(q)M)′ = 0

⇔ ∑_{j=1}^q cj β′(j)M = 0 ⇔ ∑_{j=1}^q cj M′β(j) = 0 ⇔ (c1M′ ⋯ cqM′)(β(1)′, …, β(q)′)′ = 0 ,

which can be written as Η0 : L′β = 0, where L = c ⊗ M .
So under this setting,

Φ = [ (Χ ⊗ Ιp)′(Ιn ⊗ Σp⁻¹)(Χ ⊗ Ιp) ]⁻¹ = (Χ′Χ ⊗ Σp⁻¹)⁻¹ = (Χ′Χ)⁻¹ ⊗ Σp ,

β̂ = [ (Χ′Χ)⁻¹ ⊗ Σp ](Χ′ ⊗ Ιp)(Ιn ⊗ Σp⁻¹)y = [ (Χ′Χ)⁻¹Χ′ ⊗ Ιp ] y ,

and

(L′ΦL)⁻¹ = { (c′ ⊗ M′)[(Χ′Χ)⁻¹ ⊗ Σp](c ⊗ M) }⁻¹ = (M′ΣpM)⁻¹ / [c′(Χ′Χ)⁻¹c] .
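The Kronecker factorization of Φ above can be confirmed numerically; the following sketch (ours, with arbitrary sizes, not from the thesis) checks that the GLS variance built from Χ∗ = Χ ⊗ Ιp equals (Χ′Χ)⁻¹ ⊗ Σp.

```python
# Illustration: [(X (x) I_p)' (I_n (x) Sigma_p^{-1}) (X (x) I_p)]^{-1}
# equals (X'X)^{-1} (x) Sigma_p for full-rank X and p.d. Sigma_p.
import numpy as np

rng = np.random.default_rng(2)
n, q, p = 6, 2, 3
X = rng.standard_normal((n, q))
Sp = rng.standard_normal((p, p)); Sp = Sp @ Sp.T + p * np.eye(p)

Xs = np.kron(X, np.eye(p))                     # design for the vectorized y
Phi = np.linalg.inv(Xs.T @ np.kron(np.eye(n), np.linalg.inv(Sp)) @ Xs)
Phi_kron = np.kron(np.linalg.inv(X.T @ X), Sp)
print(np.abs(Phi - Phi_kron).max())
```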
6.2.2 Useful Properties for the Model
In this section, we derive several lemmas to summarize the useful properties for
this model under the specified testing problem.
Lemma 6.2.2.1 (Mardia et al., 1992, chapter 6)
Let

FM = c′(Χ′Χ)⁻¹Χ′YM(M′Σ̂pM)⁻¹M′Y′Χ(Χ′Χ)⁻¹c / [ (n − q) c′(Χ′Χ)⁻¹c ] ,

where Σ̂p is the REML estimator for Σp. Under the null hypothesis, FM ∼ [r/(n − q − r + 1)] F(r, n − q − r + 1) .

Note Similar to lemma 4.2.3, Σ̂p = ∑_{i=1}^n (yi − ȳ)(yi − ȳ)′ / (n − q) .
Special Notation Since σ = {σif} (i ≤ f) is of dimension p(p + 1)/2, then as in the Hotelling T² case, we use the notation Ρif, Ρjg, Qif,jg, Rif,jg, and wif,jg.
Lemma 6.2.2.2 Φ̂A = Φ̂ .

Proof

Ρif = (Χ′ ⊗ Ιp)( Ιn ⊗ ∂Σp⁻¹/∂σif )(Χ ⊗ Ιp) = Χ′Χ ⊗ ∂Σp⁻¹/∂σif

⇒ ΡifΦΡjg = ( Χ′Χ ⊗ ∂Σp⁻¹/∂σif )[ (Χ′Χ)⁻¹ ⊗ Σp ]( Χ′Χ ⊗ ∂Σp⁻¹/∂σjg )
= Χ′Χ ⊗ (∂Σp⁻¹/∂σif) Σp (∂Σp⁻¹/∂σjg) ,

and

Qif,jg = (Χ′ ⊗ Ιp)( Ιn ⊗ ∂Σp⁻¹/∂σif )(Ιn ⊗ Σp)( Ιn ⊗ ∂Σp⁻¹/∂σjg )(Χ ⊗ Ιp)
= Χ′Χ ⊗ (∂Σp⁻¹/∂σif) Σp (∂Σp⁻¹/∂σjg)

⇒ Qif,jg − ΡifΦΡjg = 0, and since Rif,jg = 0, then we conclude that Φ̂A = Φ̂ .

Lemma 6.2.2.3 The K-R test statistic F = [(n − q)/r] FM .
Proof

F = (1/r) β̂′L(L′Φ̂AL)⁻¹L′β̂ = (1/r) β̂′L(L′Φ̂L)⁻¹L′β̂ (lemma 6.2.2.2)

= (1/r) ( [ (Χ′Χ)⁻¹Χ′ ⊗ Ιp ] y )′ (c ⊗ M) { (M′Σ̂pM)⁻¹ / [c′(Χ′Χ)⁻¹c] } (c′ ⊗ M′) [ (Χ′Χ)⁻¹Χ′ ⊗ Ιp ] y

= [ 1/(r c′(Χ′Χ)⁻¹c) ] y′{ Χ(Χ′Χ)⁻¹cc′(Χ′Χ)⁻¹Χ′ ⊗ M(M′Σ̂pM)⁻¹M′ } y .

In the expression above, we used ℓ = r, and this is because dim(L′) = r × pq. In addition,

y′{ Χ(Χ′Χ)⁻¹cc′(Χ′Χ)⁻¹Χ′ ⊗ M(M′Σ̂pM)⁻¹M′ } y = y′(dd′ ⊗ A)y = (vec Y)′(dd′ ⊗ A) vec Y ,

and

c′(Χ′Χ)⁻¹Χ′YM(M′Σ̂pM)⁻¹M′Y′Χ(Χ′Χ)⁻¹c = d′YAY′d ,

where d = Χ(Χ′Χ)⁻¹c, A = M(M′Σ̂pM)⁻¹M′, and vec Y is the np-dimensional column vector obtained by stacking the rows of Y, so that vec Y = y.

Since (vec Y)′(dd′ ⊗ A) vec Y = d′YAY′d (Harville, 1997, theorem 16.2.2), the proof is completed.
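The vec identity invoked from Harville can be verified directly; the sketch below (ours, with arbitrary data, not from the thesis) uses the row-stacking convention that matches y = (y1′, …, yn′)′.

```python
# Illustration: (vec Y)'(dd' (x) A)(vec Y) = d' Y A Y' d, where vec stacks
# the rows of Y (equivalently, the columns of Y').
import numpy as np

rng = np.random.default_rng(3)
n, p = 5, 3
Y = rng.standard_normal((n, p))
d = rng.standard_normal(n)
A = rng.standard_normal((p, p)); A = A @ A.T   # symmetric p x p

v = Y.flatten()                                # row-stacked Y
lhs = v @ np.kron(np.outer(d, d), A) @ v
rhs = d @ Y @ A @ Y.T @ d
print(lhs, rhs)
```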
Corollary 6.2.2.4 Under the null hypothesis, [(n − q − r + 1)/(n − q)] F ∼ F(r, n − q − r + 1) .

Proof
Direct from lemmas 6.2.2.1 and 6.2.2.3.
Lemma 6.2.2.5 A1 = 2r/(n − q) and A2 = r(r + 1)/(n − q) .

Proof

A1 = ∑_{i=1}^p ∑_{f=i}^p ∑_{j=1}^p ∑_{g=j}^p wif,jg tr(ΘΦΡifΦ) tr(ΘΦΡjgΦ) ,

and

A2 = ∑_{i=1}^p ∑_{f=i}^p ∑_{j=1}^p ∑_{g=j}^p wif,jg tr(ΘΦΡifΦΘΦΡjgΦ) .
Here

Θ = L(L′ΦL)⁻¹L′ = (c ⊗ M) { (M′ΣpM)⁻¹ / [c′(Χ′Χ)⁻¹c] } (c′ ⊗ M′) = [ cc′ ⊗ M(M′ΣpM)⁻¹M′ ] / [c′(Χ′Χ)⁻¹c] ,

and

tr(ΘΦΡifΦ) = [ 1/(c′(Χ′Χ)⁻¹c) ] tr{ [cc′ ⊗ M(M′ΣpM)⁻¹M′][(Χ′Χ)⁻¹ ⊗ Σp][Χ′Χ ⊗ ∂Σp⁻¹/∂σif][(Χ′Χ)⁻¹ ⊗ Σp] }
= [ 1/(c′(Χ′Χ)⁻¹c) ] tr{ cc′(Χ′Χ)⁻¹ ⊗ M(M′ΣpM)⁻¹M′Σp (∂Σp⁻¹/∂σif) Σp }
= −tr[ M(M′ΣpM)⁻¹M′ (∂Σp/∂σif) ] ,

since Σp(∂Σp⁻¹/∂σif)Σp = −∂Σp/∂σif and tr[cc′(Χ′Χ)⁻¹] = c′(Χ′Χ)⁻¹c.

Accordingly, A1 can be expressed as

A1 = ∑_{i≤f} ∑_{j≤g} wif,jg tr[ M(M′ΣpM)⁻¹M′ (∂Σp/∂σif) ] tr[ M(M′ΣpM)⁻¹M′ (∂Σp/∂σjg) ] ,

and similarly,

A2 = ∑_{i≤f} ∑_{j≤g} wif,jg tr[ M(M′ΣpM)⁻¹M′ (∂Σp/∂σif) M(M′ΣpM)⁻¹M′ (∂Σp/∂σjg) ] .
Let Τ = M(M′ΣpM)⁻¹M′. Since Τ = {tij}p×p is a symmetric matrix,

tr( Τ ∂Σp/∂σif ) = tii for i = f, and 2tif for i ≠ f ;

or, tr( Τ ∂Σp/∂σif ) = 2^(1−δif) tif ,  where δif = 1 for i = f and δif = 0 for i ≠ f .   (6.11)

It is easy to check that

tr[ Τ (∂Σp/∂σif) Τ (∂Σp/∂σjg) ] = 2^(1−δif−δjg) (tig tjf + tfg tij) .   (6.12)

Moreover, analogous to the proof given for expression (4.6) in the Hotelling Τ² case,

wif,jg = Cov[σ̂if, σ̂jg] = (σig σjf + σfg σij)/(n − q) .   (6.13)
Utilizing expressions (6.11) and (6.13),

A1 = ∑_{i≤f} ∑_{j≤g} [ (σigσjf + σfgσij)/(n − q) ] 2^(2−δif−δjg) tif tjg
= ∑_{i=1}^p ∑_{f=1}^p ∑_{j=1}^p ∑_{g=1}^p [ (σigσjf + σfgσij)/(n − q) ] tif tjg .

Note In the expression above, we dropped the factor 2^(2−δif−δjg), and this is valid because the unrestricted summations repeat each term wif,jg tif tjg exactly 2^(2−δif−δjg) times.

⇒ A1 = [ 1/(n − q) ] { ∑_{f=1}^p ∑_{g=1}^p ( ∑_{i=1}^p σig tif )( ∑_{j=1}^p σjf tjg ) + ∑_{f=1}^p ∑_{j=1}^p ( ∑_{i=1}^p σij tif )( ∑_{g=1}^p σfg tjg ) } .

Notice that ∑_{f,g} (∑_i σig tif)(∑_j σjf tjg) = tr(ΤΣpΤΣp), and since ΤΣp = M(M′ΣpM)⁻¹M′Σp is a projection on ℜ(M), then ΤΣpΤΣp = ΤΣp and tr(ΤΣp) = r(ΤΣp) = r(M) = r (Seely, 2002, chapter 2). The second double sum equals tr(ΤΣpΤΣp) = r as well. Hence

A1 = [ 1/(n − q) ] (r + r) = 2r/(n − q) .   (6.14)
Similarly, by utilizing expressions (6.12) and (6.13),

A2 = ∑_{i≤f} ∑_{j≤g} [ (σigσjf + σfgσij)/(n − q) ] 2^(1−δif−δjg) (tig tjf + tfg tij)
= ∑_{i=1}^p ∑_{j=1}^p ∑_{g=1}^p ∑_{f=1}^p [ (σigσjf + σfgσij)/(2(n − q)) ] (tig tjf + tfg tij)

= [ 1/(2(n − q)) ] { ∑ σig tig σjf tjf + ∑ σig tfg σjf tij + ∑ σfg tig σij tjf + ∑ σfg tfg σij tij }

= [ 1/(2(n − q)) ] { [ ∑_{g,i} σig tig ][ ∑_{f,j} σjf tjf ] + ∑_{i,f} ( ∑_g σig tfg )( ∑_j σjf tij )
+ ∑_{i,f} ( ∑_g σfg tig )( ∑_j σij tjf ) + [ ∑_{g,f} σfg tfg ][ ∑_{j,i} σij tij ] } .

Since ∑_{f,j} σjf tjf = tr(ΤΣp) = r(ΤΣp) = r(M) = r, and ∑_{i,f} ( ∑_g σfg tig )( ∑_j σij tjf ) = tr(ΤΣpΤΣp) = r (from above),

therefore
A2 = [ 1/(2(n − q)) ] (r² + r + r + r²) = r(r + 1)/(n − q) .

6.2.3 Estimating the Denominator Degrees of Freedom and the Scale

From lemma 6.2.2.5,

A1 = 2r/(n − q) ,  A2 = r(r + 1)/(n − q) ,  and then  A1/A2 = 2/(r + 1) .   (6.15)
By utilizing theorems 5.3.1 and 5.3.2 and corollary 5.3.3, the K-R and the proposed approaches are identical, where

m∗ = r(r + 1)/A2 − (r − 1) = n − q − r + 1 (notice that ℓ = r),

and

λ∗ = 1 − [ (r − 1)/(r(r + 1)) ] A2 = 1 − (r − 1)/(n − q) = (n − q − r + 1)/(n − q) .

The estimates of the denominator degrees of freedom and the scale match the values for the exact multivariate test in corollary 6.2.2.4. So, the K-R approach produces the exact values for the general multivariate linear model considered in this section, which is more general than the Hotelling T² model.
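Lemma 6.2.2.5 can be checked numerically as well. The sketch below (ours, with arbitrary Σp and M, not from the thesis) evaluates the quadruple sums for A1 and A2 using (6.11)-(6.13) and recovers 2r/(n − q) and r(r + 1)/(n − q).

```python
# Illustration of lemma 6.2.2.5: brute-force evaluation of the A1 and A2
# sums over i<=f, j<=g, with T = M(M'Sigma_p M)^{-1}M' and the w_{if,jg}
# covariances of (6.13); n - q enters only as a scale factor.
import numpy as np

rng = np.random.default_rng(1)
p, r, nq = 4, 2, 10.0                          # nq plays the role of n - q
S = rng.standard_normal((p, p)); S = S @ S.T + p * np.eye(p)   # Sigma_p
M = rng.standard_normal((p, r))
T = M @ np.linalg.solve(M.T @ S @ M, M.T)

A1 = A2 = 0.0
pairs = [(i, f) for i in range(p) for f in range(i, p)]
for i, f in pairs:
    for j, g in pairs:
        w = (S[i, g] * S[j, f] + S[f, g] * S[i, j]) / nq       # (6.13)
        tr_if = 2 ** (1 - (i == f)) * T[i, f]                  # (6.11)
        tr_jg = 2 ** (1 - (j == g)) * T[j, g]
        tr2 = 2 ** (1 - (i == f) - (j == g)) * (T[i, g] * T[j, f]
                                                + T[f, g] * T[i, j])  # (6.12)
        A1 += w * tr_if * tr_jg
        A2 += w * tr2
print(A1, A2)   # expect 2r/(n-q) = 0.4 and r(r+1)/(n-q) = 0.6
```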
6.3 A Balanced Multivariate Model with a Random Group Effect
This model was studied by Birkes (2006) where he showed that there exists an
exact F test to test a linear hypothesis of the fixed effects under a specific condition as we
will see shortly. In this section, we prove that the K-R and the proposed approaches likewise give the exact values for the denominator degrees of freedom and the scale.
6.3.1 Model Setting
Suppose an experiment yields measurements on p characteristics of a sample of
subjects, and suppose the subjects are in t groups each of size m . The measurements of
the k -th characteristic of the j -th subject in the i -th group is denoted by yijk (i = 1,..., t ,
j = 1,...., m, k = 1,...., p ) . The experiment is designed so that the j -th subject in each
group is associated with q covariates x jl (l = 1,...., q ) , not depending on i . The design is
balanced in so far as the groups all have the same size m and the same set of covariates.
Each measurement yijk includes a random group effect aik and a random subject error eijk .
Notation

Χ has rows x1′, …, xm′, with xj = (xj1, …, xjq)′; Y = (Y1′, …, Yt′)′, where Yi has rows yi1′, …, yim′ and yij = (yij1, …, yijp)′; B = (β(1) ⋯ β(p)), with β(k) = (βk1, …, βkq)′; ai = (ai1, …, aip)′; and eij = (eij1, …, eijp)′.
Model
yijk = β ( k )′ x j + aik + eijk , hence y ij = B′x j + ai + eij . We assume the β kl are fixed unknown
parameters, the ai are i.i.d. N p (0, Λ), the eij are i.i.d. N p (0, Σ p ), and the ai are independent
of the eij . We also require 1m ∈ℜ( Χ).
Hypothesis
Η0 : c′BM = 0, where c is q × 1 and M is p × r. We require 1′mΧ(Χ′Χ)⁻¹c = 0.
Other Setting
To compute the Wald statistic for our problem, and in order to compare the statistic mentioned above with the Wald statistic, we rearrange Y as a vector:

y = vec(Y′) ,  β = (τ1′, …, τq′)′ ,  where τi = (β1i, …, βpi)′ ;

y and β are mtp × 1 and pq × 1 vectors respectively. Notice that

B = (β(1) ⋯ β(p)) = (τ1, …, τq)′ ,

the design matrix for the new setting is Χ∗ = 1t ⊗ (Χ ⊗ Ιp), and

Σ = Ιt ⊗ Δ ,  where  Δ = Ιm ⊗ Σp + mΡ1m ⊗ Λ .
Special Notation As in sections 4.2 and 6.2, we use the notation Ρ if , Q if , jg , R if , jg , and
wif , jg .
6.3.2 Useful properties for the Model
In this section, we derive several lemmas that show useful properties about the
model and the testing problem.
Lemma 6.3.2.1 Φ̂A = Φ̂ .

Proof

Δ = Ιm ⊗ Σp + mΡ1m ⊗ Λ = (Ιm − Ρ1m) ⊗ Σp + Ρ1m ⊗ (Σp + mΛ) ,

so

Δ⁻¹ = (Ιm − Ρ1m) ⊗ Σp⁻¹ + Ρ1m ⊗ (Σp + mΛ)⁻¹ .   (6.16)
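Expression (6.16) can be verified directly; the sketch below (ours, with arbitrary sizes and p.d. matrices, not from the thesis) checks that the proposed block-spectral inverse really inverts Δ.

```python
# Illustration of (6.16): Delta = I_m (x) Sigma_p + m P_1m (x) Lambda has
# inverse (I_m - P_1m) (x) Sigma_p^{-1} + P_1m (x) (Sigma_p + m Lambda)^{-1}.
import numpy as np

rng = np.random.default_rng(4)
m, p = 4, 3
Sp = rng.standard_normal((p, p)); Sp = Sp @ Sp.T + p * np.eye(p)
Lam = rng.standard_normal((p, p)); Lam = Lam @ Lam.T
P1 = np.full((m, m), 1.0 / m)                  # projection onto 1_m
Delta = np.kron(np.eye(m), Sp) + m * np.kron(P1, Lam)
Dinv = (np.kron(np.eye(m) - P1, np.linalg.inv(Sp))
        + np.kron(P1, np.linalg.inv(Sp + m * Lam)))
print(np.abs(Delta @ Dinv - np.eye(m * p)).max())
```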
{(
)
⇒ ( Χ′ ⊗ Ι p )Δ −1 ( Χ ⊗ Ι p ) = ( Χ′ ⊗ Ι p ) Ι m − Ρ1m ⊗ Σ −p1 + Ρ1m ⊗ ( Σ p + mΛ )
(
)
= Χ′ Ι m − Ρ1m Χ ⊗ Σ −p1 + Χ′Ρ1m Χ ⊗ ( Σ p + mΛ )
−1
} (Χ ⊗ Ι )
p
−1
Hence,
Φ = ⎡⎣(1t ⊗ Χ ⊗ Ι p )′(Ι t ⊗ Δ −1 )(1t ⊗ Χ ⊗ Ι p ) ⎤⎦
⎡⎣ ( Χ′ ⊗ Ι p )Δ −1 ( Χ ⊗ Ι p ) ⎤⎦
=
t
(
−1
−1
−1
)
⎡ Χ′ Ι − Ρ Χ ⊗ Σ −1 + Χ′Ρ Χ ⊗ ( Σ + mΛ )−1 ⎤
m
1m
p
1m
p
⎢
⎦⎥
=⎣
t
−1
( Χ′Χ) Χ′ Ι m − Ρ1m Χ( Χ′Χ) −1 ⊗ Σ p + ( Χ′Χ) −1 Χ′Ρ1m Χ( Χ′Χ) −1 ⊗ ( Σ p + mΛ )
=
t
(
)
⎛
∂Δ −1 ⎞
Ρ if = 1t ⊗ ( Χ′ ⊗ Ι p ) ⎜ Ι t ⊗
⎟⎟ 1t ⊗ ( Χ ⊗ Ι p )
⎜
σ
∂
if
⎝
⎠
= t ( Χ′ ⊗ Ι p )
∂Δ −1
(Χ ⊗ Ι p )
∂σ if
Hence, Ρ if ΦΡ jg = t 2 ( Χ′ ⊗ Ι p )
∂Δ −1
∂Δ −1
( Χ ⊗ Ι p )Φ( Χ′ ⊗ Ι p )
( Χ ⊗ Ι p ).
∂σ if
∂σ jg
⎛
⎛
∂Δ −1 ⎞
∂Δ −1 ⎞
Qif , jg = 1′t ⊗ ( Χ′ ⊗ Ι p ) ⎜ Ι t ⊗
Ιt ⊗ Δ ) ⎜ Ιt ⊗
(
⎟
⎟⎟ 1t ⊗ ( Χ ⊗ Ι p )
⎜
⎟
⎜
σ
σ
∂
∂
if
jg
⎝
⎠
⎝
⎠
−1
−1
∂Δ
∂Δ
= t ( Χ′ ⊗ Ι p )
Δ
( Χ ⊗ Ι p ).
∂σ if ∂σ jg
⇒ Q if , jg − Ρ if ΦΡ jg = t ( Χ′ ⊗ Ι p )
Consider t ( Χ ⊗ Ι p )Φ( Χ′ ⊗ Ι p )
∂Δ −1
∂Δ −1
′
−
⊗
⊗
Δ
Χ
Ι
Φ
Χ
Ι
(
)
(
)
(Χ ⊗ Ι p )
t
{
p
p }
∂σ if
∂σ jg
(6.17)
86
{
(
)
= ( Χ ⊗ Ι p ) ( Χ′Χ) −1 Χ′ Ι m − Ρ1m Χ( Χ′Χ) −1 ⊗ Σ p
}
+ ( Χ′Χ) −1 Χ′Ρ1m Χ( Χ′Χ)−1 ⊗ ( Σ p + mΛ ) ( Χ′ ⊗ Ι p ).
(
)Ρ
)
= Χ( Χ′Χ) −1 Χ′ Ι m − Ρ1m Χ( Χ′Χ) −1 Χ′ ⊗ Σ p + Χ( Χ′Χ) −1 Χ′Ρ1m Χ( Χ′Χ) −1 Χ′ ⊗ ( Σ p + mΛ )
(
= Ρ Χ Ι m − Ρ1m
(
Χ
⊗ Σ p + Ρ Χ Ρ1m Ρ Χ ⊗ ( Σ p + mΛ )
)
= Ρ Χ − Ρ1m ⊗ Σ p + Ρ1m ⊗ ( Σ p + mΛ )
(
(observe that 1m ∈ ℜ( Χ))
)
(
)
⇒ Δ − t ( Χ ⊗ Ι p )Φ( Χ′ ⊗ Ι p ) = Ι m − Ρ1m ⊗ Σ p − Ρ Χ − Ρ1m ⊗ Σ p = ( Ι m − Ρ Χ ) ⊗ Σ p
Moreover, {Δ − t ( Χ ⊗ Ι p )Φ( Χ′ ⊗ Ι p )}
∂Δ −1
(Χ ⊗ Ι p )
∂σ jg
−1
⎧
∂ ( Σ p + mΛ ) ⎫⎪
∂Σ −p1
⎪
= {( Ι m − Ρ Χ ) ⊗ Σ p } ⎨ Ι m − Ρ1m ⊗
+ Ρ1m ⊗
⎬ ( Χ ⊗ Ι p ).
∂σ jg
∂σ jg
⎪⎩
⎪⎭
(
(
)
)
∂Σ −p1
= ( Ι m − Ρ Χ ) Ι m − Ρ1m Χ ⊗ Σ p
∂σ jg
+ ( Ι m − Ρ Χ ) Ρ1m Χ ⊗ Σ p
∂ ( Σ p + mΛ )
∂σ jg
−1
=0
ˆ = Φ
ˆ.
This implies Qif , jg − Ρ if ΦΡ jg = 0, and since R if , jg = 0, then Φ
A
Lemma 6.3.2.2 (Birkes, 2006)
Let Fd =
(tm − t − q − r + 2)t c′( Χ′Χ) −1 Χ′Yi M[M′Y′QYM ]−1 M′Yi′Χ( Χ′Χ) −1 c
.
r
c′( Χ′Χ) −1 c
t
where Yi = ∑ Yi t and Q = Ρ1t ⊗ (Ι m − Ρ Χ ) + (Ι t − Ρ1t ) ⊗ (Ι m − Ρ1m ).
i =1
Under H 0 , Fd ∼ Fr ,tm −t − q − r + 2 .
Lemma 6.3.2.3
(
)
−1
−1
−1
ˆ
t c′( Χ′Χ) Χ′Yi M M ′Σ p M M ′Yi′Χ( Χ′Χ) c
,
F= .
c′( Χ′Χ) −1 c
r
ˆ is the REML estimate of Σ .
where Σ
p
p
Proof
ˆ = Φ̂ (lemma 6.3.2.1), then under the null hypothesis,
Since Φ
A
1
ˆ ) −1 L′βˆ
F = βˆ ′L(L′ΦL
87
(
)
) Δˆ } y = {1′ ⊗ Φˆ ( Χ′ ⊗ Ι ) Δˆ } y,
ˆ (1′t ⊗ Χ′ ⊗ Ι p ) Ι t ⊗ Δˆ −1 y
(i) βˆ = Φ
{
ˆ 1′t ⊗ ( Χ′ ⊗ Ι p
=Φ
−1
−1
t
p
From (6.17), Φ ( Χ′ ⊗ Ι p )
{
(
}
)
= ( Χ′Χ) −1 Χ′ Ι m − Ρ1m Χ( Χ′Χ) −1 ⊗ Σ p + ( Χ′Χ) −1 Χ′Ρ1m Χ( Χ′Χ) −1 ⊗ ( Σ p + mΛ )
{
(
{
(
Χ′ ⊗ Ι p
t
}
=
1
( Χ′Χ) −1 Χ′ Ι m − Ρ1m Ρ Χ ⊗ Σ p + ( Χ′Χ) −1 Χ′Ρ1m Ρ Χ ⊗ ( Σ p + mΛ )
t
=
1
( Χ′Χ) −1 Χ′ Ρ Χ − Ρ1m ⊗ Σ p + ( Χ′Χ) −1 Χ′Ρ1m ⊗ ( Σ p + mΛ ) ,
t
)
}
)
and from (6.16), we have
Φ ( Χ′ ⊗ Ι p ) Δ −1 =
=
{
(
)
{( Ι
{
}
1
( Χ′Χ) −1 Χ′ Ρ Χ − Ρ1m ⊗ Σ p + ( Χ′Χ)−1 Χ′Ρ1m ⊗ ( Σ p + mΛ ) ×
t
1
( Χ′Χ)−1 Χ′ Ρ Χ − Ρ1m
t
(
)( Ι
m
m
)
− Ρ1m ⊗ Σ −p1 + Ρ1m ⊗ ( Σ p + mΛ )
)
(
−1
}
)
− Ρ1m ⊗ Ι p + ( Χ′Χ)−1 Χ′ Ρ Χ − Ρ1m Ρ1m ⊗ Σ p ( Σ p + mΛ )
(
)
+ ( Χ′Χ) −1 Χ′Ρ1m Ι m − Ρ1m ⊗ ( Σ p + mΛ ) Σ −p1 + ( Χ′Χ) −1 Χ′Ρ1m ⊗ Ι p
=
1
( Χ′Χ) −1 Χ′ ⊗ Ι p }
{
t
So,
1
βˆ = {1′t ⊗ ( Χ′Χ) −1 Χ′ ⊗ Ι p } y,
t
1
1
L′βˆ = (c′ ⊗ M′) {1′t ⊗ ( Χ′Χ) −1 Χ′ ⊗ Ι p } y = {1′t ⊗ c′( Χ′Χ) −1 Χ′ ⊗ M′} y
t
t
(ii ) L′ΦL
1
= (c′ ⊗ M′) ( Χ′Χ) −1 Χ′ Ι m − Ρ1m Χ( Χ′Χ) −1 ⊗ Σ p
t
{
(
)
}
+ ( Χ′Χ) −1 Χ′Ρ1m Χ( Χ′Χ) −1 ⊗ ( Σ p + mΛ ) (c ⊗ M )
=
{
1
c′( Χ′Χ) −1 Χ′ Ι m − Ρ1m Χ( Χ′Χ) −1 c ⊗ M′Σ p M
t
(
)
}
+c′( Χ′Χ) −1 Χ′Ρ1m Χ( Χ′Χ) −1 c ⊗ M′ ( Σ p + mΛ ) M
}
−1
88
c′( Χ′Χ) −1 c
( notice that 1′m Χ( Χ′Χ) −1 c = 0)
M′Σ p M
t
Accordingly, from (i ) and (ii ),
=
(
)
(6.18)
−1
−1
ˆ
1 y ′{1t ⊗ Χ( Χ′Χ) c ⊗ M} M′Σ p M {1′t ⊗ c′( Χ′Χ) Χ′ ⊗ M′} y
F= .
rt
c′( Χ′Χ) −1 c
−1
{
(
ˆ M
y′ ⎡⎣1t ⊗ Χ( Χ′Χ) −1 c ⎤⎦ ⎡⎣1′t ⊗ c′( Χ′Χ) −1 Χ′⎤⎦ ⊗ M M′Σ
p
1
= .
c′( Χ′Χ) −1 c
rt
=
)
−1
}
M′ y
1 y ′ {dd′ ⊗ A} y
ˆ M
.
, where d = 1t ⊗ Χ( Χ′Χ) −1 c, and A = M M′Σ
p
rt c′( Χ′Χ) −1 c
(
)
−1
M′.
Since y ′ {dd′ ⊗ A} y = ( vecY′ )′ {dd′ ⊗ A} vecY′
= d′YAY′d
(Harville,1997, theorem 16.2.2),
(
)
−1
−1
ˆ
1 {1′t ⊗ c′( Χ′Χ) Χ′} YM M′Σ p M M′Y′ {1t ⊗ Χ( Χ′Χ) c}
F= .
rt
c′( Χ′Χ) −1 c
then
−1
Moreover, since
{1′ ⊗ c′(Χ′Χ)
t
−1
Χ′} Y = ∑ c′( Χ′Χ) −1 Χ′Yi = c′( Χ′Χ) −1 Χ′∑ Yi = tc′( Χ′Χ) −1 Χ′Yi
t
t
i =1
i =1
then
(
)
−1
−1
ˆ
t c′( Χ′Χ) Χ′Yi M M′Σ p M M′Yi′Χ( Χ′Χ) c
F= .
,
r
c′( Χ′Χ) −1 c
−1
ˆ is the REML estimate of Σ .
where Σ
p
p
Lemma 6.3.2.4 The REML estimator for Σ p is
where
Proof
(
) (
Y′QY
,
mt − q − t + 1
)
Q = Ρ1t ⊗ ( Ι m − Ρ Χ ) + Ι t − Ρ1t ⊗ Ι m − Ρ1m .
Choose Τ such that ΤΤ′ = Ι mt − Ρ1t ⊗ Ρ Χ , and Τ′Τ = Ι mt − q .
Notice that ℜ(Τ′) = ℜ(Τ′Τ) ⇒ r(Τ′) = r(Τ′Τ) = mt − q,
and Τ′(1t ⊗ Χ) = 0
Define Κ = Τ ⊗ Ι p .
Notice that Κ ′(1t ⊗ Χ ⊗ Ι p ) = (Τ′ ⊗ Ι p )(1t ⊗ Χ ⊗ Ι p ) = Τ′(1t ⊗ Χ) ⊗ Ι p = 0
(6.19)
89
1
1
−1
= Constant − log Κ ′(Ι t ⊗ Δ)Κ − y′Κ [ Κ ′(Ι t ⊗ Δ)Κ ] Κ ′y.
2
2
Let D = Κ ′(Ι t ⊗ Δ)Κ = Κ ′(Ι t ⊗ Δ)ΚΚ ′Κ.
R
(
(Ι t ⊗ Δ)ΚΚ ′ = (Ι t ⊗ Δ) Ι mpt − Ρ1t ⊗ Ρ Χ ⊗ Ι p
)
= Ι t ⊗ Δ − Ρ1t ⊗ ⎡⎣Δ ( Ρ Χ ⊗ Ι p ) ⎤⎦ ,
(
where Ι t ⊗ Δ = Ι t ⊗ Ι m ⊗ Σ p + Ρ1m ⊗ mΛ
)
= Ι mt ⊗ Σ p + Ι t ⊗ Ρ1m ⊗ mΛ.
(
)
Ρ1t ⊗ ⎡⎣Δ ( Ρ Χ ⊗ Ι p ) ⎤⎦ = Ρ1t ⊗ Ρ Χ ⊗ Σ p + Ρ1m ⊗ mΛ
= Ρ1t ⊗ Ρ Χ ⊗ Σ p + Ρ1t ⊗ Ρ1m ⊗ mΛ.
and
(
)
(
)
Hence, (Ι t ⊗ Δ)ΚΚ ′ = Ι mt − Ρ1t ⊗ Ρ Χ ⊗ Σ p + Ι t ⊗ Ρ1m − Ρ1t ⊗ Ρ1m ⊗ mΛ.
⇒ D = Κ ′ ⎡⎣( Ι t ⊗ Δ ) ΚΚ ′⎤⎦ Κ
{(
)
(
) Τ ⎤⎦ ⊗ Σ + ⎡⎣Τ′ ( Ι
}
)
= ( Τ′ ⊗ Ι p ) Ι mt − Ρ1t ⊗ Ρ Χ ⊗ Σ p + Ι t ⊗ Ρ1m − Ρ1t ⊗ Ρ1m ⊗ mΛ ( Τ ⊗ Ι p )
(
= Τ′Τ ⊗ Σ p − ⎡Τ′ Ρ1t ⊗ Ρ Χ
⎣
)
(
)
⊗ Ρ1m Τ ⎤ ⊗ mΛ − ⎡Τ′ Ρ 1t ⊗ Ρ1m Τ ⎤ ⊗ mΛ.
⎦
⎣
⎦
Since from expression (6.19), we have Τ′ (1t ⊗ Χ ) = 0 then, ⎣⎡Τ′ Ρ1t ⊗ Ρ Χ Τ ⎦⎤ ⊗ Σ p = 0.
p
(
t
(
)
)
Also, since 1m ∈ ℜ( Χ), then ⎡Τ′ Ρ1t ⊗ Ρ1m Τ ⎤ ⊗ mΛ = 0.
⎣
⎦
1
1
Hence, R = Constant − log D − y ′ΚD−1Κ ′y where
2
2
D = Ι mt − q ⊗ Σ p + ⎡Τ′ Ι t ⊗ Ρ1m Τ ⎤ ⊗ mΛ.
⎣
⎦
We may re-express D as

D = [Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ Σp + [Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ (Σp + mΛ).   (6.20)

Notice that Τ′(Ιt ⊗ Ρ1m)Τ is a p.o., and this is true because

Τ′(Ιt ⊗ Ρ1m)ΤΤ′(Ιt ⊗ Ρ1m)Τ = Τ′(Ιt ⊗ Ρ1m)(Ιmt − Ρ1t ⊗ ΡΧ)(Ιt ⊗ Ρ1m)Τ
= Τ′{(Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m)(Ιt ⊗ Ρ1m)}Τ = Τ′(Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m)Τ
= Τ′(Ιt ⊗ Ρ1m)Τ   {notice that Τ′(Ρ1t ⊗ Ρ1m) = 0}.
Moreover,

r[Τ′(Ιt ⊗ Ρ1m)Τ] = tr[Τ′(Ιt ⊗ Ρ1m)Τ] = tr[ΤΤ′(Ιt ⊗ Ρ1m)]
= tr[(Ιmt − Ρ1t ⊗ ΡΧ)(Ιt ⊗ Ρ1m)]
= tr(Ιt ⊗ Ρ1m) − tr(Ρ1t ⊗ Ρ1m) = t − 1.   (6.21)
ΚD⁻¹Κ′ = (Τ ⊗ Ιp){[Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ Σp⁻¹ + [Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ (Σp + mΛ)⁻¹}(Τ′ ⊗ Ιp)
= [ΤΤ′ − ΤΤ′(Ιt ⊗ Ρ1m)ΤΤ′] ⊗ Σp⁻¹ + [ΤΤ′(Ιt ⊗ Ρ1m)ΤΤ′] ⊗ (Σp + mΛ)⁻¹,

where

[ΤΤ′ − ΤΤ′(Ιt ⊗ Ρ1m)ΤΤ′] ⊗ Σp⁻¹
= [Ιmt − Ρ1t ⊗ ΡΧ − (Ιmt − Ρ1t ⊗ ΡΧ)(Ιt ⊗ Ρ1m)(Ιmt − Ρ1t ⊗ ΡΧ)] ⊗ Σp⁻¹
= [Ιmt − Ρ1t ⊗ ΡΧ − (Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m)] ⊗ Σp⁻¹,

and

[ΤΤ′(Ιt ⊗ Ρ1m)ΤΤ′] ⊗ (Σp + mΛ)⁻¹ = (Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m) ⊗ (Σp + mΛ)⁻¹.

Hence,

ΚD⁻¹Κ′ = [Ιmt − Ρ1t ⊗ ΡΧ − Ιt ⊗ Ρ1m + Ρ1t ⊗ Ρ1m] ⊗ Σp⁻¹ + (Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m) ⊗ (Σp + mΛ)⁻¹.

Notice that

Ιmt − Ρ1t ⊗ ΡΧ − (Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m)
= Ρ1t ⊗ Ιm − Ρ1t ⊗ ΡΧ + Ιmt − Ιt ⊗ Ρ1m + Ρ1t ⊗ Ρ1m − Ρ1t ⊗ Ιm
= Ρ1t ⊗ (Ιm − ΡΧ) + (Ιt − Ρ1t) ⊗ (Ιm − Ρ1m)

⇒ ΚD⁻¹Κ′ = Q ⊗ Σp⁻¹ + (Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m) ⊗ (Σp + mΛ)⁻¹,   (6.22)

where Q = Ρ1t ⊗ (Ιm − ΡΧ) + (Ιt − Ρ1t) ⊗ (Ιm − Ρ1m).
Since Τ′(Ιt ⊗ Ρ1m)Τ, Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ, Σp, and Σp + mΛ are n.n.d. matrices, so are [Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ Σp and [Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ (Σp + mΛ) (Harville, 1997, p. 369).
Hence,

[Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ Σp = GG′ and [Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ (Σp + mΛ) = HH′,

for some full column rank matrices G and H (Seely, 2002, Problem 2.D.2).
Notice that since

{[Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ Σp}{[Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ (Σp + mΛ)} = 0,

then GG′HH′ = 0 ⇒ G′GG′HH′H = 0 ⇒ G′H = 0.
Hence, D = [Ιmt−q − Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ Σp + [Τ′(Ιt ⊗ Ρ1m)Τ] ⊗ (Σp + mΛ) = GG′ + HH′ = [G, H][G, H]′, and so

|D| = |[G, H][G, H]′| = |[G, H]′[G, H]| = |G′G| · |H′H|,

since [G, H]′[G, H] is block diagonal with blocks G′G and H′H when G′H = 0.
Recall that Τ′(Ιt ⊗ Ρ1m)Τ is a projection operator with r[Τ′(Ιt ⊗ Ρ1m)Τ] = t − 1, and hence

|G′G| = |Σp|^(mt−q−t+1), and |H′H| = |Σp + mΛ|^(t−1).
In addition, by using expression (6.22), we obtain

y′ΚD⁻¹Κ′y = y′(Q ⊗ Σp⁻¹)y + y′[(Ιt ⊗ Ρ1m − Ρ1t ⊗ Ρ1m) ⊗ (Σp + mΛ)⁻¹]y.

Accordingly,

R = Constant − ((mt − q − t + 1)/2) log|Σp| − ½ y′(Q ⊗ Σp⁻¹)y + f(Σp + mΛ),

where f(Σp + mΛ) collects the terms involving only Σp + mΛ.
Since Σp and Λ are unstructured, Σp and Σp + mΛ may be treated as separate variables. This means that to maximize R with respect to Σp, it suffices to maximize

R∗ = −½(mt − q − t + 1) log|Σp| − ½ y′(Q ⊗ Σp⁻¹)y

with respect to Σp, where

y′(Q ⊗ Σp⁻¹)y = (vecY′)′(Q ⊗ Σp⁻¹)vecY′ = tr(YΣp⁻¹Y′Q) = tr(Σp⁻¹Y′QY)   (Harville, 1997, theorem 16.2.2).

So,

R∗ = −½(mt − q − t + 1) log|Σp| − ½ tr(Σp⁻¹Y′QY),

and hence Σ̂p = Y′QY/(mt − q − t + 1) (Anderson, 2003, lemma 3.2.2).
Note We have two expressions for Q:

Q = Ρ1t ⊗ (Ιm − ΡΧ) + (Ιt − Ρ1t) ⊗ (Ιm − Ρ1m), and Q = ΤΤ′ − ΤΤ′(Ιt ⊗ Ρ1m)ΤΤ′.

Q is a p.o. and r(Q) = mt − q − t + 1.
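The projection properties of Q can be checked numerically. The sketch below uses illustrative sizes t, m, q and a random Χ whose column space contains 1m (all of these are assumptions chosen for the demonstration, not values from the text), and verifies that Q is symmetric, idempotent, and of rank mt − q − t + 1:

```python
import numpy as np

rng = np.random.default_rng(0)
t, m, q = 4, 6, 3  # illustrative sizes; X is m x q with 1_m in its column space

X = np.column_stack([np.ones(m), rng.standard_normal((m, q - 1))])

def proj(A):
    # orthogonal projection onto the column space of A
    return A @ np.linalg.pinv(A)

P_X = proj(X)
P_1m = np.full((m, m), 1.0 / m)
P_1t = np.full((t, t), 1.0 / t)
I_t, I_m = np.eye(t), np.eye(m)

Q = np.kron(P_1t, I_m - P_X) + np.kron(I_t - P_1t, I_m - P_1m)

assert np.allclose(Q, Q.T)                      # symmetric
assert np.allclose(Q @ Q, Q)                    # idempotent, so a p.o.
assert round(np.trace(Q)) == m * t - q - t + 1  # rank = trace for a p.o.
```

The rank check follows because tr(Ρ1t)tr(Ιm − ΡΧ) + tr(Ιt − Ρ1t)tr(Ιm − Ρ1m) = (m − q) + (t − 1)(m − 1) = mt − q − t + 1.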
Corollary 6.3.2.5

[(mt − t − q − r + 2)/(mt − q − t + 1)] F = Fd.

Proof By combining the results of lemmas 6.3.2.3 and 6.3.2.4, we obtain

F = t(mt − q − t + 1) · c′(Χ′Χ)⁻¹Χ′Ȳ M(M′Y′QYM)⁻¹M′Ȳ′Χ(Χ′Χ)⁻¹c / [r · c′(Χ′Χ)⁻¹c],

where Ȳ = (1/t) Σi=1..t Yi, and therefore

[(mt − t − q − r + 2)/(mt − q − t + 1)] F = Fd.
Corollary 6.3.2.6 Under the null hypothesis,

[(mt − t − q − r + 2)/(mt − q − t + 1)] F ∼ F(r, mt − t − q − r + 2).

Proof Direct from lemma 6.3.2.2 and corollary 6.3.2.5.
Lemma 6.3.2.7 For this model, the approximation and the exact approaches to derive W = Var[σ̂REML] are identical.

Proof

Approximation approach

R∗ = −½(mt − q − t + 1) log|Σp| − ½ y′(Q ⊗ Σp⁻¹)y,

∂R∗/∂σif = −½(mt − q − t + 1) tr(Σp⁻¹ ∂Σp/∂σif) − ½ y′(Q ⊗ ∂Σp⁻¹/∂σif)y,

∂²R∗/∂σif∂σjg = −½(mt − q − t + 1) tr(∂Σp⁻¹/∂σjg · ∂Σp/∂σif) − ½ y′(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)y

⇒ −E[∂²R∗/∂σif∂σjg] = ½(mt − q − t + 1) tr(∂Σp⁻¹/∂σjg · ∂Σp/∂σif)
+ ½{E(y′)(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)E(y) + tr[(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)(Ιt ⊗ Δ)]},

where

E(y′)(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)E(y) = β′(1′t ⊗ Χ′ ⊗ Ιp)(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)(1t ⊗ Χ ⊗ Ιp)β
= β′[1′tΡ1t1t ⊗ Χ′(Ιm − ΡΧ)Χ ⊗ ∂²Σp⁻¹/∂σif∂σjg + 1′t(Ιt − Ρ1t)1t ⊗ Χ′(Ιm − Ρ1m)Χ ⊗ ∂²Σp⁻¹/∂σif∂σjg]β = 0

⇒ −E[∂²R∗/∂σif∂σjg] = ½(mt − q − t + 1) tr(∂Σp⁻¹/∂σjg · ∂Σp/∂σif) + ½ tr[(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)(Ιt ⊗ Δ)].

Moreover,

(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)(Ιt ⊗ Δ)
= [Ρ1t ⊗ (Ιm − ΡΧ) ⊗ ∂²Σp⁻¹/∂σif∂σjg + (Ιt − Ρ1t) ⊗ (Ιm − Ρ1m) ⊗ ∂²Σp⁻¹/∂σif∂σjg] × (Ιt ⊗ Ιm ⊗ Σp + Ιt ⊗ mΡ1m ⊗ Λ)
= Ρ1t ⊗ (Ιm − ΡΧ) ⊗ (∂²Σp⁻¹/∂σif∂σjg)Σp + (Ιt − Ρ1t) ⊗ (Ιm − Ρ1m) ⊗ (∂²Σp⁻¹/∂σif∂σjg)Σp
= Q ⊗ (∂²Σp⁻¹/∂σif∂σjg)Σp,

and hence,

tr[(Q ⊗ ∂²Σp⁻¹/∂σif∂σjg)(Ιt ⊗ Δ)] = tr(Q) tr[(∂²Σp⁻¹/∂σif∂σjg)Σp] = (mt − q − t + 1) tr[(∂²Σp⁻¹/∂σif∂σjg)Σp]

⇒ −E[∂²R∗/∂σif∂σjg] = ½(mt − q − t + 1) tr(∂Σp⁻¹/∂σjg · ∂Σp/∂σif) + ½(mt − q − t + 1) tr[(∂²Σp⁻¹/∂σif∂σjg)Σp].

Notice that since

(∂²Σp⁻¹/∂σif∂σjg)Σp = −(∂Σp⁻¹/∂σjg · ∂Σp/∂σif + Σp⁻¹ · ∂Σp/∂σjg · ∂Σp⁻¹/∂σif · Σp),

then

tr[(∂²Σp⁻¹/∂σif∂σjg)Σp] = −2 tr(∂Σp⁻¹/∂σjg · ∂Σp/∂σif)

⇒ −E[∂²R∗/∂σif∂σjg] = −½(mt − q − t + 1) tr(∂Σp⁻¹/∂σjg · ∂Σp/∂σif) = ½(mt − q − t + 1) tr(Σp⁻¹ · ∂Σp/∂σjg · Σp⁻¹ · ∂Σp/∂σif).
Since from expression (4.7),

tr(Σp⁻¹ · ∂Σp/∂σif · Σp⁻¹ · ∂Σp/∂σjg) = 2^(1−δif−δjg)(σ^ig σ^jf + σ^fg σ^ij),

then

W ≈ [2/(mt − q − t + 1)] {[2^(1−δif−δjg)(σ^ig σ^jf + σ^fg σ^ij)]i,f,j,g=1..p}⁻¹.

Observing that Σi=1..p σ^ij σif = 1 for j = f and 0 for j ≠ f, it can be seen that

{[2^(1−δif−δjg)(σ^ig σ^jf + σ^fg σ^ij)]i,f,j,g=1..p}⁻¹ = ½ [(σig σjf + σfg σij)]i,f,j,g=1..p,

and hence,

W ≈ [1/(mt − q − t + 1)] [(σig σjf + σfg σij)]i,f,j,g=1..p.   (6.23)
Exact Approach

Since Yi = [y′i1; …; y′im] = [x′1B + a′i + e′i1; …; x′mB + a′i + e′im],

we may express Yi as Yi = ΧB + 1m a′i + Ei, where Ei = [ei1, …, eim]′,

and Y = [Y1; …; Yt] = 1t ⊗ ΧB + G + E, where E = [E1; …; Et] and G = [1m a′1; …; 1m a′t]

⇒ QY = Q(1t ⊗ ΧB) + QG + QE.

Observe that Q(1t ⊗ ΧB) = Q(1t ⊗ Χ)B, but

Q(1t ⊗ Χ) = ΤΤ′(1t ⊗ Χ) − ΤΤ′(Ιt ⊗ Ρ1m)ΤΤ′(1t ⊗ Χ) = 0,

and hence Q(1t ⊗ ΧB) = 0.
Also,

QG = [Ρ1t ⊗ (Ιm − ΡΧ) + (Ιt − Ρ1t) ⊗ (Ιm − Ρ1m)]G = Q[Ιt ⊗ (Ιm − Ρ1m)]G = 0.

We conclude that QG = 0, and so we have QY = QE.
Since Q is idempotent, Y′QY = E′QE.
In addition, since E is a data matrix from Np(0, Σp), and Q is idempotent with r(Q) = mt − q − t + 1, then E′QE = Y′QY ∼ Wp(Σp, mt − q − t + 1) (Mardia, 1992, theorem 3.4.4).
Analogous to the argument given for the Hotelling T² case in section 4.2.4 that leads to expression (4.6), we have

wif,jg = (σig σjf + σfg σij)/(mt − q − t + 1).   (6.24)

From expressions (6.23) and (6.24), we can see that the approximation and the exact methods give the same estimate for wif,jg.
Lemma 6.3.2.8

A1 = 2r/(mt − q − t + 1) and A2 = r(r + 1)/(mt − q − t + 1).

Proof From expression (6.18), we found that

L′ΦL = [c′(Χ′Χ)⁻¹c / t] M′ΣpM

⇒ Θ = L(L′ΦL)⁻¹L′ = [t/(c′(Χ′Χ)⁻¹c)] (c ⊗ M)(M′ΣpM)⁻¹(c′ ⊗ M′) = [t/(c′(Χ′Χ)⁻¹c)] cc′ ⊗ M(M′ΣpM)⁻¹M′,

and from expression (6.17), we obtain

∂Φ/∂σif = −ΦΡifΦ = −[(Χ′Χ)⁻¹Χ′(Ιm − Ρ1m)Χ(Χ′Χ)⁻¹ / t] ⊗ ∂Σp/∂σif.

Hence,

tr(ΘΦΡifΦ)
= [−t/(c′(Χ′Χ)⁻¹c)] tr{[cc′ ⊗ M(M′ΣpM)⁻¹M′][(Χ′Χ)⁻¹Χ′(Ιm − Ρ1m)Χ(Χ′Χ)⁻¹/t ⊗ ∂Σp/∂σif]}
= [−1/(c′(Χ′Χ)⁻¹c)] tr[cc′(Χ′Χ)⁻¹Χ′(Ιm − Ρ1m)Χ(Χ′Χ)⁻¹ ⊗ M(M′ΣpM)⁻¹M′ ∂Σp/∂σif]
= −[c′(Χ′Χ)⁻¹Χ′(Ιm − Ρ1m)Χ(Χ′Χ)⁻¹c / (c′(Χ′Χ)⁻¹c)] tr[M(M′ΣpM)⁻¹M′ ∂Σp/∂σif]
= −tr[M(M′ΣpM)⁻¹M′ ∂Σp/∂σif]   (notice that Ρ1mΧ(Χ′Χ)⁻¹c = 0).

Accordingly, A1 and A2 can be expressed as

A1 = Σg=1..p Σf=1..p Σi=1..p Σj=1..p wif,jg tr[M(M′ΣpM)⁻¹M′ ∂Σp/∂σif] tr[M(M′ΣpM)⁻¹M′ ∂Σp/∂σjg],

A2 = Σg=1..p Σf=1..p Σi=1..p Σj=1..p wif,jg tr[M(M′ΣpM)⁻¹M′ ∂Σp/∂σif M(M′ΣpM)⁻¹M′ ∂Σp/∂σjg].

Analogous to the proof of lemma 6.2.2.5, we obtain

A1 = 2r/(mt − q − t + 1), and A2 = r(r + 1)/(mt − q − t + 1).
6.3.3 Estimating the Denominator Degrees of Freedom and the Scale Factor

A1 = 2r/(mt − q − t + 1) and A2 = r(r + 1)/(mt − q − t + 1), and then A1/A2 = 2/(r + 1) (lemma 6.3.2.8).
By utilizing theorems 5.3.1 and 5.3.2 and corollary 5.3.3, the K-R and the proposed approaches are identical, where

m∗ = r(r + 1)/A2 − (r − 1) = mt − q − t − r + 2,

and

λ∗ = 1 − [(r − 1)/(r(r + 1))] A2 = 1 − (r − 1)/(mt − q − t + 1) = (mt − q − t − r + 2)/(mt − q − t + 1).

The estimates of the denominator degrees of freedom and the scale factor match the values for the exact multivariate test in corollary 6.3.2.6.
7. THE SATTERTHWAITE APPROXIMATIONS
Another method to approximate F tests is the Satterthwaite method. Building on Satterthwaite's original approximation (1941), the method was developed by Giesbrecht
and Burns (1985) and then by Fai and Cornelius (1996). In this chapter, we do not intend
to investigate the theoretical derivation of the method. However, we present some useful
theoretical results.
7.1 The Satterthwaite Method (SAS Institute Inc., 2002-2006)

In order to approximate the F test for the fixed effects, H0: L′β = 0, in a model as described in section 2.1, the Satterthwaite method uses the Wald-type statistic

FS = (1/ℓ) β̂′L(L′Φ̂L)⁻¹L′β̂,

where ℓ is the rank of L′.

(i) The multi-dimensional case (ℓ > 1)

First, we perform the spectral decomposition L′Φ̂L = Ρ′DΡ, where Ρ is an orthogonal matrix of eigenvectors, and D is a diagonal matrix of eigenvalues.
Let νm = 2(dm)²/(g′mWgm), where dm is the mth diagonal element of D, and gm is the gradient of amΦa′m with respect to σ, evaluated at σ̂, where am is the mth row of ΡL′.
Notice that

gm = […, am(∂Φ/∂σi²)a′m, …]′ evaluated at σ = σ̂, so gm = −[amΦΡiΦa′m] evaluated at σ = σ̂.

Then let

E = Σm=1..ℓ [νm/(νm − 2)] I(νm > 2),

so we eliminate the terms for which νm ≤ 2.
The degrees of freedom for F are then computed as

ν = 2E/(E − ℓ) for E > ℓ, and ν = 0 otherwise.
(ii) The one-dimensional case (ℓ = 1)

In this case, the F statistic simplifies to

FS = (L′β̂)² / (L′Φ̂L).

Notice that we may use the t-statistic instead since the numerator degrees of freedom equals one. The denominator degrees of freedom is computed as

ν = 2(L′Φ̂L)² / (g′Wg),

where g is the gradient of L′Φ̂L with respect to σ (we assume that L′β is estimable).
7.2 The K-R, Satterthwaite, and Proposed Methods

In this section, we provide some useful lemmas that show the relationship among the Satterthwaite, K-R and the proposed methods.

Lemma 7.2.1 When ℓ = 1, the Satterthwaite, the K-R and the proposed methods give the same estimate of the denominator degrees of freedom.

Proof The K-R and the proposed methods give the same estimate of the denominator degrees of freedom, which is 2/A1 (corollary 5.3.5).

A1 = Σi=1..r Σj=1..r wij tr(ΘΦΡiΦ)tr(ΘΦΡjΦ), where Θ = LL′/(L′ΦL)
= Σi=1..r Σj=1..r wij tr(LL′ΦΡiΦ/(L′ΦL)) tr(LL′ΦΡjΦ/(L′ΦL))
= Σi=1..r Σj=1..r wij (L′ΦΡiΦL)(L′ΦΡjΦL)/(L′ΦL)²

⇒ 2/A1 = 2(L′ΦL)² / [Σi=1..r Σj=1..r wij L′ΦΡiΦL · L′ΦΡjΦL] = 2(L′ΦL)²/(g′Wg), where g = [L′ΦΡiΦL]r×1.

By noticing that ∂Φ/∂σi = −ΦΡiΦ, the proof is completed.
Note Even though the estimates of the denominator degrees of freedom are the same,
the true levels do not have to be the same for the Satterthwaite method and the K-R and
proposed methods. This is because the statistics used are not the same. The Satterthwaite
statistic uses Φ̂ as the estimator of the variance-covariance matrix of the fixed effects
estimator, whereas the K-R and the proposed methods statistic uses Φ̂A . The following
corollary clarifies this fact.
Corollary 7.2.2 When ℓ = 1 and Φ̂ = Φ̂A, the Satterthwaite, K-R and proposed methods are identical.

Proof From lemma 7.2.1, the Satterthwaite, K-R and proposed methods give the same estimate of the denominator degrees of freedom. Since the scale is 1 (corollary 5.3.5) and Φ̂ = Φ̂A, we have the same statistics for all approaches. This shows that the four approaches are identical.
Corollary 7.2.3 In balanced mixed classification models, and when ℓ = 1, the Satterthwaite, K-R and proposed methods are identical. The denominator degrees of freedom and the scale estimate are 2/A2 and 1, respectively.

Proof Direct from corollaries 5.3.5, 6.1.1.2, and 7.2.2.
Lemma 7.2.4 When ℓ = 1, the Satterthwaite statistic FS ≥ the K-R and the proposed methods statistic F for variance components models.

Proof It suffices to show that L′Φ̂AL ≥ L′Φ̂L.
Since

Φ̂A = Φ̂ + 2Φ̂{Σi=1..r Σj=1..r ŵij(Q̂ij − Ρ̂iΦ̂Ρ̂j)}Φ̂

⇒ Φ̂A − Φ̂ = 2Φ̂{Σi=1..r Σj=1..r ŵij(Q̂ij − Ρ̂iΦ̂Ρ̂j)}Φ̂.

Notice that

Qij − ΡiΦΡj = Χ′Σ⁻¹(∂Σ/∂σi)Σ⁻¹(∂Σ/∂σj)Σ⁻¹Χ − Χ′Σ⁻¹(∂Σ/∂σi)Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹(∂Σ/∂σj)Σ⁻¹Χ
= Χ′Σ⁻¹(∂Σ/∂σi) G (∂Σ/∂σj)Σ⁻¹Χ,

where G = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹, and hence,

ΦA − Φ = 2Φ{Σi=1..r Σj=1..r wij BiGB′j}Φ, where Bi = Χ′Σ⁻¹(∂Σ/∂σi)
= 2ΦBMB′Φ, where B = [B1, …, Br] and M = W ⊗ G.

Since W and G are both n.n.d., so is M (Harville, 1997, p. 367), and hence so is BMB′.
So Φ̂A − Φ̂ is n.n.d., and hence L′(Φ̂A − Φ̂)L ≥ 0.
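The closure property used in the last step — that M = W ⊗ G is n.n.d. whenever W and G are — can be illustrated numerically; the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues, hence nonnegative (the random Gram matrices below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_nnd(k):
    A = rng.standard_normal((k, k))
    return A @ A.T  # Gram matrix, hence n.n.d.

W, G = random_nnd(3), random_nnd(4)
M = np.kron(W, G)

# all eigenvalues of W ⊗ G are products of eigenvalues of W and of G,
# so none can be negative (up to roundoff)
assert np.linalg.eigvalsh(M).min() > -1e-8
```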
Definition The Loewner ordering of symmetric matrices (Pukelsheim, 1993, chapter 1): For A and B symmetric matrices, we say A ≥ B when A − B is n.n.d.

Lemma 7.2.5 For A and B p.d. matrices with A ≥ B, we have A⁻¹ ≤ B⁻¹.

Proof Since A ≥ B, then A = B + V = B + CC′ for some n.n.d. matrix V

⇒ A⁻¹ = B⁻¹ − B⁻¹C(C′B⁻¹C + Ι)⁻¹C′B⁻¹ (theorem 1.7, Schott, 2005)

⇒ B⁻¹ − A⁻¹ = B⁻¹C(C′B⁻¹C + Ι)⁻¹C′B⁻¹, which is n.n.d., and hence B⁻¹ ≥ A⁻¹.
Note Lemma 7.2.4 is applicable for any ℓ for variance components models as long as the scale is less than or equal to 1, and this is true because from lemma 7.2.5, we have

L′Φ̂AL ≥ L′Φ̂L ⇒ (L′Φ̂AL)⁻¹ ≤ (L′Φ̂L)⁻¹, and hence F ≤ FS.
Consider the partition of the design matrix Χ as Χ = [Χ1, Χ2], where Χ1 and Χ2 are matrices of dimensions n × p1 and n × p2 respectively, and p = p1 + p2. Then

Χ′Σ⁻¹Χ = [Χ′1Σ⁻¹Χ1, Χ′1Σ⁻¹Χ2; Χ′2Σ⁻¹Χ1, Χ′2Σ⁻¹Χ2].

Let Χ2 correspond to the fixed effects that are of interest to be tested.
Consider the case where
(a) Χ′1Σ⁻¹Χ2 = 0.
(b) Χ′2Σ⁻¹Χ2 = f(σ)A, where A is a fixed matrix. A is invertible when Χ is a full column rank matrix.
Lemma 7.2.6 For a model that satisfies conditions (a) and (b) mentioned above, we have A1 = A2.

Proof According to the above,

Χ′Σ⁻¹Χ = [C1, 0; 0, C2], where C2 = f(σ)A

⇒ Φ = (Χ′Σ⁻¹Χ)⁻¹ = [C1⁻¹, 0; 0, C2⁻¹].

Also, since L′ = [0ℓ×p1, B′ℓ×p2], then

L′ΦL = [0, B′][C1⁻¹, 0; 0, C2⁻¹][0; B] = B′C2⁻¹B,

Θ = L(L′ΦL)⁻¹L′ = [0p1×ℓ; Bp2×ℓ](B′C2⁻¹B)⁻¹[0ℓ×p1, B′ℓ×p2] = [0, 0; 0, B(B′C2⁻¹B)⁻¹B′],

ΦΘΦ = [C1⁻¹, 0; 0, C2⁻¹][0, 0; 0, B(B′C2⁻¹B)⁻¹B′][C1⁻¹, 0; 0, C2⁻¹] = [0, 0; 0, C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹].

Moreover, since Ρi = Χ′(∂Σ⁻¹/∂σi)Χ = [∂C1/∂σi, 0; 0, ∂C2/∂σi], then

ΦΘΦΡi = [0, 0; 0, C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹ ∂C2/∂σi].

So,

tr(ΘΦΡiΦ) = tr(C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹ ∂C2/∂σi).   (7.1)

Since C2 = f(σ)A, then

C2⁻¹ ∂C2/∂σi = [A⁻¹/f(σ)] [∂f(σ)/∂σi] A = [1/f(σ)] [∂f(σ)/∂σi] Ι.
So, expression (7.1) above can be simplified as

tr(ΘΦΡiΦ) = [1/f(σ)] [∂f(σ)/∂σi] tr(C2⁻¹B(B′C2⁻¹B)⁻¹B′)
= [1/f(σ)] [∂f(σ)/∂σi] tr(B′C2⁻¹B(B′C2⁻¹B)⁻¹) = [1/f(σ)] ∂f(σ)/∂σi   (7.2)

⇒ tr(ΘΦΡiΦ)tr(ΘΦΡjΦ) = [1/[f(σ)]²] [∂f(σ)/∂σi][∂f(σ)/∂σj].   (7.3)

Also,

ΦΘΦΡiΦΘΦΡj = [0, 0; 0, C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹(∂C2/∂σi) C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹(∂C2/∂σj)].

So,

tr(ΘΦΡiΦΘΦΡjΦ) = tr(C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹(∂C2/∂σi) C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹(∂C2/∂σj))
= [1/[f(σ)]²] [∂f(σ)/∂σi][∂f(σ)/∂σj] tr(C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹B(B′C2⁻¹B)⁻¹B′)
= [1/[f(σ)]²] [∂f(σ)/∂σi][∂f(σ)/∂σj] tr(B′C2⁻¹B(B′C2⁻¹B)⁻¹B′C2⁻¹B(B′C2⁻¹B)⁻¹)
= [1/[f(σ)]²] [∂f(σ)/∂σi][∂f(σ)/∂σj].   (7.4)

From expressions (7.3) and (7.4), we have

tr(ΘΦΡiΦ)tr(ΘΦΡjΦ) = tr(ΘΦΡiΦΘΦΡjΦ),

and hence A1 = A2.

Lemma 7.2.7 For a model that satisfies conditions (a) and (b) mentioned above,
(i) The K-R and the proposed approaches are identical.
(ii) The Satterthwaite, K-R and proposed approaches give the same estimate of the denominator degrees of freedom, given that the estimate > 2.
(iii) If Φ̂A = Φ̂ and the estimate of the denominator degrees of freedom > 2, then the Satterthwaite, K-R and proposed methods are identical.

Proof (i) Observe that A1 = A2 (lemma 7.2.6), and hence the K-R and the proposed methods are identical (corollary 5.3.3). They all give 1 and 2/A2 as the estimates of the scale and the denominator degrees of freedom, respectively (theorems 5.3.1 and 5.3.2).

(ii) Claim 2dm²/(g′mWgm) = 2/A2 for any m.
Proof
A1 = Σi=1..r Σj=1..r wij tr(ΘΦΡiΦ)tr(ΘΦΡjΦ) = a′Wa, where a = [tr(ΘΦΡiΦ)]r×1.

Since from expression (7.2), tr(ΘΦΡiΦ) = [1/f(σ)] ∂f(σ)/∂σi, then a = [(1/f(σ)) ∂f(σ)/∂σi]r×1, and hence

the R.H.S. = 2/A2 = 2/A1 = 2/(a′Wa) = 2[f(σ)]²/(g′Wg), where g = [∂f(σ)/∂σi]r×1.

Since gm = [am(∂Φ/∂σi)a′m]r×1, where am is the mth row of ΡL′, then gm = [bmL′(∂Φ/∂σi)Lb′m]r×1, where bm is the mth row of Ρ

⇒ gm = [bmB′(∂C2⁻¹/∂σi)Bb′m]r×1 = [−(1/[f(σ)]²)(∂f(σ)/∂σi) bmB′A⁻¹Bb′m]r×1 = −[bmB′C2⁻¹Bb′m/f(σ)] [∂f(σ)/∂σi]r×1.

And the L.H.S. = 2dm²/(g′mWgm) = [2dm²/(g′Wg)] [f(σ)/(bmB′C2⁻¹Bb′m)]².

Notice that bmB′C2⁻¹Bb′m = bmL′ΦLb′m = dm

⇒ the L.H.S. = 2dm²/(g′mWgm) = 2[f(σ)]²/(g′Wg) = the R.H.S.

We note that the L.H.S. does not depend on row m, and this means that our claim is true for any row m.
Since νm = 2dm²/(g′mWgm) = 2/A2 > 2 for any m (by assumption), then

E = Σm=1..ℓ νm/(νm − 2) = ℓνm/(νm − 2),

and

ν = 2E/(E − ℓ) = 2dm²/(g′mWgm) = the L.H.S.

(iii) Direct from parts (i) and (ii).
Comments
(i) When conditions (a) and (b) are satisfied, the K-R and the proposed methods are identical. However, these conditions are not enough to make the Satterthwaite approach identical with the K-R and the proposed methods.
(ii) However, when the denominator degrees of freedom estimate is greater than two, conditions (a) and (b) are enough to make the Satterthwaite method produce the same estimate of the denominator degrees of freedom as the K-R and the proposed methods.
(iii) When the K-R and the proposed methods' estimate of the denominator degrees of freedom is less than or equal to two, the Satterthwaite estimate is zero.
(iv) Even though the Satterthwaite method produces the same estimate of the denominator degrees of freedom as the K-R and the proposed methods when that estimate is greater than two and conditions (a) and (b) are satisfied, this does not mean that the Satterthwaite approach is identical to the K-R and proposed methods. This is true because the statistics used are different. The Satterthwaite statistic uses Φ̂ as an estimator of the variance-covariance matrix of the fixed effects estimator, whereas the K-R and the proposed methods statistic uses Φ̂A.
(v) Conditions (a) and (b) are not enough to make Φ̂A = Φ̂. For example, in BIB designs, conditions (a) and (b) are satisfied; however, Φ̂A ≠ Φ̂ (chapter 8).
8. SIMULATION STUDY FOR BLOCK DESIGNS
To compare the tests discussed in the previous chapters, we have conducted
simulation studies for three different types of block designs: partially balanced
incomplete block design (PBIB), balanced incomplete block design (BIB), and complete
block design with missing data (two observations are missing: one from the first block
under the first treatment, and the other from the second block under the second
treatment). To test the fixed effects in these models, there are two approaches. The first
approach is the intra-block approach where we consider the blocks fixed, so that only the
within blocks information is utilized. Even though this approach leads to exact F tests, it
does not lead to an optimal test. When the design is efficient and blocking is effective,
this approach is close to optimal. We use the second approach where the blocks are
considered random, and information from both within and between blocks is utilized.
With the second approach, the model does not satisfy Zyskind’s condition, so it is not a
Rady model, and hence we do not have an exact F-test to test the fixed effect.
8.1 Preparing Formulas for Computations
Model for a Block Design

yij = μ + αi + bj + eij, for i = 1, …, t, j = 1, …, s,

where μ is the general mean, αi is the treatment effect, bj is the block effect, bj ∼ N(0, σb²), eij ∼ N(0, σe²), and the bj's and eij's are all independent.
The model can be expressed as

y = 1nμ + Aa + Bb + e,

where b = [b1, …, bs]′, a = [α1, …, αt]′,

E(y) = 1nμ + Aa, and Var(y) = Σ = σe²Ιn + σb²BB′ = Σi=1..2 σi²Di, where D1 = Ιn and D2 = BB′.

Suppose that we are interested in testing H: α1 = ⋯ = αt ⇔ L′β = 0,
where

L′ = [0 1 −1 0 ⋯ 0
      0 0 1 −1 ⋯ 0
      ⋮            ⋮
      0 0 ⋯ 0 1 −1],   β′ = [μ, α1, …, αt].

The design matrix Χ = [1n, A] is not of full column rank. We can proceed with this parameterization using the g-inverse. However, to simplify our computations, we will reparameterize the model so Χ will be a full column rank matrix.
Reparameterize the model in the way that β∗′ = [μ∗, α∗1, …, α∗t−1] with Σi=1..t α∗i = 0, and

L∗′ = [0 1 0 ⋯ 0
       0 0 1 ⋯ 0
       ⋮         ⋮
       0 0 0 ⋯ 1].

To simplify our notation, from now on, consider Χ = Χ∗, L = L∗, and β = β∗.
The REML Estimates

The REML equations are:

2 ∂ℓ(σ)/∂σe² = −tr(G) + y′GGy,
2 ∂ℓ(σ)/∂σb² = −tr(GD2) + y′GD2Gy,

where ℓ(σ) is the REML log-likelihood and G = Σ⁻¹ − Σ⁻¹Χ(Χ′Σ⁻¹Χ)⁻¹Χ′Σ⁻¹ = Κ(Κ′ΣΚ)⁻¹Κ′.
A useful form to find the REML estimates by iteration is

{tr(DiGDjG)}i,j=1..r σ = {y′GDiGy}i=1..r,

where σ(r×1) = (σ1, …, σr)′ (Searle et al., 1992, chapter 6).
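The iteration above can be sketched as follows for a toy complete block design. The design sizes, true variance components, and starting values are illustrative assumptions, and no nonnegativity constraint is imposed on the estimates (for some samples, REML estimates of σb² can be negative):

```python
import numpy as np

rng = np.random.default_rng(4)
t, s = 5, 8                                   # toy complete block design
n = t * s
A = np.kron(np.ones((s, 1)), np.eye(t))       # treatment incidence (n x t)
B = np.kron(np.eye(s), np.ones((t, 1)))       # block incidence (n x s)
X = np.column_stack([np.ones(n), A[:, :-1]])  # full-column-rank parameterization
D = [np.eye(n), B @ B.T]                      # D1 = I_n, D2 = BB'

sig_true = np.array([1.0, 0.5])               # (sigma_e^2, sigma_b^2)
y = X @ rng.standard_normal(X.shape[1]) + rng.multivariate_normal(
    np.zeros(n), sig_true[0] * D[0] + sig_true[1] * D[1])

sig = np.array([1.0, 1.0])                    # starting values
for _ in range(50):
    S = sig[0] * D[0] + sig[1] * D[1]
    Si = np.linalg.inv(S)
    G = Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si
    # solve {tr(Di G Dj G)} sigma = {y' G Di G y} for the next iterate
    M = np.array([[np.trace(Di @ G @ Dj @ G) for Dj in D] for Di in D])
    c = np.array([y @ G @ Di @ G @ y for Di in D])
    new = np.linalg.solve(M, c)
    if np.allclose(new, sig, atol=1e-10):
        break
    sig = new
print(sig)   # REML estimates of (sigma_e^2, sigma_b^2)
```

The fixed point of this iteration satisfies the REML equations, since tr(GDi) = Σj σj tr(DiGDjG) at GΣG = G.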
To estimate wij, we may use the inverse of the expected information matrix, where

−E[∂²ℓ/∂σi²∂σj²] = ½ tr(GDjGDi)   (Searle et al., 1992, chapter 6),

and hence

−E[∂²ℓ/∂(σe²)²] = ½ tr(G²),
−E[∂²ℓ/∂(σb²)²] = ½ tr(GD2GD2),

and

−E[∂²ℓ/∂σe²∂σb²] = ½ tr(GD2G).

The entries of the inverse of the information matrix are estimated by using the REML estimates of σe² and σb².
How we get matrix Κ

Κ′ = C′[Ιn − ΡΧ] for any C′ (Searle et al., 1992, chapter 6).
Notice that Κ′Χ = C′[Ιn − ΡΧ]Χ = C′[Χ − Χ] = 0.
However, it is desirable to have Κ′ with full row rank so that Κ′ΣΚ is invertible.
For our example, we obtain Ιn − ΡΧ, and then we can pick the independent rows of it.
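A minimal sketch of this construction — form Ιn − ΡΧ and greedily keep rows that increase the rank — for a hypothetical small Χ:

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 8, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, q - 1))])
P_X = X @ np.linalg.pinv(X)
M = np.eye(n) - P_X          # every row of M is orthogonal to the columns of X

# pick a maximal set of linearly independent rows of I - P_X
rows, K_t = [], np.empty((0, n))
for i in range(n):
    cand = np.vstack([K_t, M[i]])
    if np.linalg.matrix_rank(cand) > len(rows):
        rows.append(i)
        K_t = cand           # K_t plays the role of K'

assert K_t.shape[0] == n - q       # full row rank n - q
assert np.allclose(K_t @ X, 0)     # K'X = 0
```

Any other basis of the row space of Ιn − ΡΧ (e.g., from a QR or singular value decomposition) would serve equally well.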
Computing Ρi, Qij and Rij

Ρ1 = Χ′(∂Σ⁻¹/∂σe²)Χ = −Χ′Σ⁻¹(∂Σ/∂σe²)Σ⁻¹Χ = −Χ′Σ⁻¹Σ⁻¹Χ,

Ρ2 = Χ′(∂Σ⁻¹/∂σb²)Χ = −Χ′Σ⁻¹(∂Σ/∂σb²)Σ⁻¹Χ = −Χ′Σ⁻¹D2Σ⁻¹Χ,

Q11 = Χ′(∂Σ⁻¹/∂σe²)Σ(∂Σ⁻¹/∂σe²)Χ = Χ′Σ⁻¹(∂Σ/∂σe²)Σ⁻¹(∂Σ/∂σe²)Σ⁻¹Χ = Χ′Σ⁻¹Σ⁻¹Σ⁻¹Χ,

Q12 = Q21 = Χ′(∂Σ⁻¹/∂σe²)Σ(∂Σ⁻¹/∂σb²)Χ = Χ′Σ⁻¹(∂Σ/∂σe²)Σ⁻¹(∂Σ/∂σb²)Σ⁻¹Χ = Χ′Σ⁻¹Σ⁻¹D2Σ⁻¹Χ,

Q22 = Χ′(∂Σ⁻¹/∂σb²)Σ(∂Σ⁻¹/∂σb²)Χ = Χ′Σ⁻¹D2Σ⁻¹D2Σ⁻¹Χ,

Rij = 0 for all i, j = 1, 2 (clear).

The above quantities are to be estimated by using the REML estimates of σ.
Block designs with equal block sizes

When the design has equal block sizes (e.g., BIBD and PBIBD), then D2 = BB′ = Ιs ⊗ Jk, where k is the block size. Also, Σ = Ιs ⊗ Σk, where Σk = σe²Ιk + σb²Jk, and Σ⁻¹ = Ιs ⊗ Σk⁻¹, where

Σk⁻¹ = (σe²Ιk + σb²Jk)⁻¹ = (1/σe²)[Ιk − (σb²/(σe² + kσb²)) Jk]   (Schott, 2005, theorem 1.7).
Notice that D2Σ⁻¹ = (Ιs ⊗ Jk)(Ιs ⊗ Σk⁻¹) = Ιs ⊗ JkΣk⁻¹ = (Ιs ⊗ Jk)/(σe² + kσb²) = D2/(σe² + kσb²).
Hence,

Ρ2 = Χ′(∂Σ⁻¹/∂σb²)Χ = −Χ′Σ⁻¹D2Σ⁻¹Χ = −Χ′D2Χ/(σe² + kσb²)²,

Q12 = Q21 = Χ′(∂Σ⁻¹/∂σe²)Σ(∂Σ⁻¹/∂σb²)Χ = Χ′Σ⁻¹Σ⁻¹D2Σ⁻¹Χ = Χ′D2Χ/(σe² + kσb²)³,

Q22 = Χ′(∂Σ⁻¹/∂σb²)Σ(∂Σ⁻¹/∂σb²)Χ = Χ′Σ⁻¹D2Σ⁻¹D2Σ⁻¹Χ = Χ′D2D2Χ/(σe² + kσb²)³ = kΧ′D2Χ/(σe² + kσb²)³ = kQ12.
Lemma For the mixed linear model y = Χβ + u, u ∼ N(0, Σ), Σ = Σi=1..r σiVi, and the null hypothesis H0: L′β = 0:
(i) In simulation of the K-R and Satterthwaite statistics (FKR and FS) under H0, it suffices to take the fixed effects β = 0.
(ii) FKR(u) = FKR(au) and FS(u) = FS(au) for any constant a > 0.

Proof (i) FKR = (1/ℓ)(L′β̂)′(L′Φ̂AL)⁻¹(L′β̂).
To simulate FKR under H0, we generate u ∼ N(0, Σ), and then y = Χβ + u, where L′β = 0. However, we should notice that Σ̂, Φ̂, and Φ̂A do not depend on Χβ. Indeed, they depend only on u.
Moreover,

β̂ = Φ̂Χ′Σ̂⁻¹y = Φ̂Χ′Σ̂⁻¹(Χβ + u) = β + Φ̂Χ′Σ̂⁻¹u

⇒ L′β̂ = L′β + L′Φ̂Χ′Σ̂⁻¹u = L′Φ̂Χ′Σ̂⁻¹u = L′β̂0 (under H0), where β̂0 is β̂ calculated from u = y with β = 0.

Similarly, it can be shown that in simulation of FS under H0, it suffices to take β = 0.

(ii) Observe that u ∼ N(0, Σ) ⇔ au ∼ N(0, a²Σ), and hence Σ̂(au) = a²Σ̂(u),

Φ̂(au) = (Χ′Σ̂⁻¹(au)Χ)⁻¹ = a²(Χ′Σ̂⁻¹(u)Χ)⁻¹ = a²Φ̂(u),

β̂(au) = Φ̂(au)Χ′Σ̂⁻¹(au)au = aΦ̂(u)Χ′Σ̂⁻¹(u)u = aβ̂(u).
Ρ̂i(au) = −Χ′Σ̂⁻¹(au)(∂Σ̂(au)/∂σi)Σ̂⁻¹(au)Χ = (1/a⁴)[−Χ′Σ̂⁻¹(u)(∂Σ̂(u)/∂σi)Σ̂⁻¹(u)Χ] = (1/a⁴)Ρ̂i(u),

and

Q̂ij(au) = Χ′Σ̂⁻¹(au)(∂Σ̂(au)/∂σi)Σ̂⁻¹(au)(∂Σ̂(au)/∂σj)Σ̂⁻¹(au)Χ
= (1/a⁶) Χ′Σ̂⁻¹(u)(∂Σ̂(u)/∂σi)Σ̂⁻¹(u)(∂Σ̂(u)/∂σj)Σ̂⁻¹(u)Χ = (1/a⁶)Q̂ij(u).

Also, since

G(au) = Σ̂⁻¹(au) − Σ̂⁻¹(au)Χ(Χ′Σ̂⁻¹(au)Χ)⁻¹Χ′Σ̂⁻¹(au) = (1/a²)[Σ̂⁻¹(u) − Σ̂⁻¹(u)Χ(Χ′Σ̂⁻¹(u)Χ)⁻¹Χ′Σ̂⁻¹(u)] = (1/a²)G(u),

then

W(au) ≈ 2 [{tr(G(au)(∂Σ̂(au)/∂σi)G(au)(∂Σ̂(au)/∂σj))}i,j=1..r]⁻¹   (section 6.1.2)
= 2 [{(1/a⁴) tr(G(u)(∂Σ̂(u)/∂σi)G(u)(∂Σ̂(u)/∂σj))}i,j=1..r]⁻¹ = a⁴W(u).

Since Rij = 0, then we obtain Φ̂A(au) = a²Φ̂A(u), and hence FKR(u) = FKR(au) and FS(u) = FS(au).
8.2 Simulation Results
Seven approaches for testing fixed effects are considered in the simulation study: Kenward-Roger (after modification), Kenward-Roger before the modification, Kenward-Roger using the conventional variance-covariance matrix of the fixed effects, Satterthwaite, Containment, and our two proposed methods. Five settings for the ratio of σb to σe, denoted by ρ, have been used: 0.25, 0.5, 1, 2, 4. For each setting of ρ, 10,000 data sets
have been simulated from the corresponding multivariate distribution with zero treatment
effects. The level for each test, mean of scales, and mean of the denominator degrees of
freedom were computed for the data sets at nominal 0.05. We use †† to denote levels
below 0.0400 or above 0.0600, and † for levels between 0.0400 and 0.0449 or between
0.0551 and 0.0600. So, all unmarked levels are between 0.0450 and 0.0550. For each data
set the REML estimates of the variance components have been calculated by the iteration
method mentioned above, and convergence was achieved for almost all data sets. The
convergence rate is reported. Also, we present the percentage relative bias in the
conventional and the adjusted estimator of variance-covariance matrix of the fixed effects
estimator. In each type of block design, we consider some examples where we have
different efficiency factors. The relative efficiency of a random block model to a complete block model for a given number t of treatments and a given number n of observations can be expressed as

E = (t/n) / {trace[Τ′(Α′Σρ⁻¹Α)⁻¹Τ] / (t − 1)},

where Σρ = ρ²BB′ + Ιn, Τ′Τ = Ιt−1, and Τ′1t = 0.
The efficiency factor, which is denoted by q, is defined to be the lower bound of E, and it describes the design alone. For an equireplicate block design with equal block sizes,

q = (t − 1) / (r · trace{[Α′(Ι − ΡB)Α]⁺}),

where r is the replication of the treatments (Birkes, 2006).
Observe that BIB and PBIB designs are equireplicate and have equal block sizes.
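The efficiency factor q = (t − 1)/(r · trace{[Α′(Ι − ΡB)Α]⁺}) can be computed directly. The sketch below uses a small hypothetical BIBD (t = 4 treatments in all six blocks of size 2), not one of the designs studied here; for a BIBD this agrees with the familiar value tλ/(rk):

```python
import numpy as np
from itertools import combinations

# BIBD: t = 4 treatments, all 6 blocks of size k = 2, replication r = 3
t, k = 4, 2
blocks = list(combinations(range(t), 2))
n = len(blocks) * k
A = np.zeros((n, t))             # treatment incidence
Bm = np.zeros((n, len(blocks)))  # block incidence
for j, blk in enumerate(blocks):
    for pos, trt in enumerate(blk):
        A[j * k + pos, trt] = 1
        Bm[j * k + pos, j] = 1

r = int(A.sum(axis=0)[0])        # replication = 3
P_B = Bm @ np.linalg.pinv(Bm)    # projection onto the block space
C = A.T @ (np.eye(n) - P_B) @ A  # intra-block information matrix
q = (t - 1) / (r * np.trace(np.linalg.pinv(C)))
print(round(q, 4))   # 0.6667 for this BIBD (= t*lambda/(r*k) with lambda = 1)
```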
8.2.1 Partially Balanced Incomplete Block Design (PBIB)
I. The first PBIB design (PBIB1) is from Cochran and Cox (1957, P.456) where we have
fifteen blocks, fifteen treatments, four treatments per block and q = 0.7955 .
Table 1.1
PBIB1 (t = s = 15, k = 4, and n = 60)

Block  Treat.      Block  Treat.      Block  Treat.
1      15,9,1,13   6      12,4,3,1    11     9,7,10,3
2      5,7,8,1     7      12,14,15,8  12     8,6,2,9
3      10,1,14,2   8      6,3,14,5    13     5,9,11,12
4      15,11,2,3   9      5,4,2,13    14     7,13,14,11
5      6,15,4,7    10     10,12,13,6  15     10,4,8,11
Table 1.2
Simulated size of nominal 5% Wald F-tests for PBIB1

ρ     K-R     Prop.1   Prop.2  Satter   K-R∗      K-R∗∗     Contain
0.25  0.0528  0.0466   0.0478  0.0468   0.0697††  0.0390††  0.0674††
0.5   0.0536  0.0423†  0.0475  0.0483   0.0480    0.0622††  0.0558†
1.0   0.0521  0.0521   0.0521  0.0560†  0.0573†   0.0486    0.0528
2.0   0.0548  0.0467   0.0480  0.0480   0.0480    0.0497    0.0483
4.0   0.0544  0.0481   0.0484  0.0484   0.0484    0.0492    0.0485
∗ before modification; ∗∗ using Φ̂ instead of Φ̂A. S.E. of entries ∈ (0.0019, 0.0025).
Table 1.3
Mean of estimated denominator degrees of freedom for PBIB1

ρ     K-R      Prop.1   Prop.2   Satter   K-R∗     K-R∗∗    Contain
0.25  48.8821  31.9638  39.3355  41.7298  40.2042  38.8929  31
0.5   47.4572  31.8812  38.4884  39.6942  38.9313  38.3272  31
1.0   43.1678  30.0484  34.6682  34.8640  34.7480  34.6475  31
2.0   40.5233  29.9220  32.1060  32.1205  32.1125  32.1052  31
4.0   39.7016  30.5696  31.2880  31.2889  31.2884  31.2830  31
∗ before modification; ∗∗ using Φ̂ instead of Φ̂A.
Table 1.4
Mean of estimated scale for PBIB1

ρ     K-R     Prop.1  K-R∗    K-R∗∗   Prop.2
0.25  0.9890  0.8787  0.9974  0.9954  0.9975
0.5   0.9916  0.9459  0.9991  0.9978  0.9989
1.0   0.9913  0.9829  0.9999  0.9996  0.9998
2.0   0.9902  0.9970  1.0000  1.0000  1.0000
4.0   0.9898  0.9996  1.0000  1.0000  1.0000
∗ before modification; ∗∗ using Φ̂ instead of Φ̂A.
Table 1.5
Percentage relative bias in the variance estimates, convergence rate (CR) and efficiency (E) for PBIB1

ρ     Φ̂     Φ̂A    CR      E
0.25  -6.5  1.6   0.9993  0.9604
0.5   -4.2  1.0   0.9999  0.8999
1.0   -0.6  1.5   1.0000  0.8378
2.0   -0.4  0.0   1.0000  0.8080
4.0   -0.6  -0.5  1.0000  0.7987
II. The second design (PBIB2) is from Green (1974, P.65) where we have forty-eight
blocks, sixteen treatments, two treatments per block and q = 0.4762 .
Table 1.6
PBIB2 design (t = 16, s = 48, k = 2, and n = 96)

blk  trt    blk  trt    blk  trt     blk  trt    blk  trt     blk  trt
1    1,2    9    5,8    17   10,12   25   1,5    33   2,14    41   7,15
2    1,3    10   6,7    18   11,12   26   1,9    34   6,10    42   11,15
3    1,4    11   6,8    19   13,14   27   1,13   35   6,14    43   4,8
4    2,3    12   7,8    20   13,15   28   5,9    36   10,14   44   4,12
5    2,4    13   9,10   21   13,16   29   5,13   37   3,7     45   4,16
6    3,4    14   9,11   22   14,15   30   9,13   38   3,11    46   8,12
7    5,6    15   9,12   23   14,16   31   2,6    39   3,15    47   8,16
8    5,7    16   10,11  24   15,16   32   2,10   40   7,11    48   12,16

Table 1.7
Simulated size of nominal 5% Wald F tests for PBIB2

ρ      K-R      Prop.1  Prop.2   Satter   K-R∗     K-R∗∗     Contain
0.25   0.0593†  0.0532  0.0568†  0.0585†  0.0573†  0.0730††  0.0440†
0.5    0.0585†  0.0529  0.0561†  0.0579†  0.0565†  0.0711††  0.0453
1.0    0.0570†  0.0504  0.0540   0.0549   0.0543   0.0644††  0.0468
2.0    0.0557†  0.0473  0.0508   0.0511   0.0509   0.0542    0.0480
4.0    0.0562†  0.0470  0.0493   0.0494   0.0493   0.0507    0.0480

∗ before modification; ∗∗ using Φ̂ instead of Φ̂A. S.E. of entries ∈ (0.0021, 0.0026)

Table 1.8
Mean of estimated denominator degrees of freedom for PBIB2

ρ      K-R      Prop.1   Prop.2   Satter   Contain  K-R∗     K-R∗∗
0.25   68.9973  72.6495  82.2865  74.7209  33       83.6874  65.3837
0.5    65.9502  69.0306  77.4455  70.9267  33       79.8284  61.8836
1.0    54.6475  55.8740  59.9145  56.9966  33       65.6409  49.2069
2.0    41.1791  41.2927  41.9262  41.5348  33       49.9531  36.2898
4.0    35.2835  35.2892  35.3400  35.3120  33       43.6954  32.8601

∗ before modification; ∗∗ using Φ̂ instead of Φ̂A.
Table 1.9
Mean of estimated scales for PBIB2

ρ      K-R     Prop.1  Prop.2  K-R∗    K-R∗∗
0.25   0.9949  0.9981  0.9973  0.9953  0.9482
0.5    0.9952  0.9983  0.9975  0.9952  0.9511
1.0    0.9968  0.9991  0.9984  0.9948  0.9639
2.0    0.9991  0.9999  0.9996  0.9934  0.9859
4.0    0.9999  1.0000  1.0000  0.9917  0.9974

∗ before modification; ∗∗ using Φ̂ instead of Φ̂A.
Table 1.10
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for PBIB2

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -7.0    -2.5     1.0000  0.9478
0.5    -7.0    -2.7     1.0000  0.8408
1.0    -5.9    -2.4     1.0000  0.6705
2.0    -3.3    -1.6     1.0000  0.5451
4.0    -2.6    -2.2     1.0000  0.4955
8.2.2 Balanced Incomplete Block Design (BIBD)
Since the design is balanced with respect to treatments, it can be shown that
the model satisfies conditions (a) and (b) of lemma 7.2.6, and hence A1 = A2, so the
K-R and the proposed approaches are identical. The scale estimate is 1 and the
denominator degrees of freedom estimate is 2/A2 (theorems 5.3.1 and 5.3.2, and corollary
5.3.3). The Satterthwaite method gives the same estimate of the denominator degrees of
freedom as the K-R and the proposed approaches when the estimate is greater than 2 (lemma 7.2.7).
Six BIB designs with different efficiency factors are adopted from Cochran and Cox
(1957, chapter 11). Notice that the efficiency factor simplifies to

q = tα/(rk) = (1 − 1/k)/(1 − 1/t),

where r is the number of replications of each treatment and α is the number of blocks in which
each pair of treatments occurs together (Kuehl, 2000, chapter 9).
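The simplified form of the efficiency factor can be checked against the six designs below using only t (treatments) and k (block size); the design names and the (t, k) pairs are taken from this section:

```python
# Efficiency factor q of a balanced incomplete block design (BIBD).
# For a BIBD, alpha * (t - 1) = r * (k - 1), so q = t*alpha/(r*k)
# simplifies to (1 - 1/k) / (1 - 1/t).

def efficiency_factor(t: int, k: int) -> float:
    """q = (1 - 1/k) / (1 - 1/t) for a BIBD with t treatments and blocks of size k."""
    return (1 - 1 / k) / (1 - 1 / t)

# (t, k) for the six designs BIB1-BIB6 used in the simulation study
designs = {"BIB1": (6, 2), "BIB2": (6, 3), "BIB3": (7, 4),
           "BIB4": (9, 2), "BIB5": (9, 4), "BIB6": (9, 6)}

for name, (t, k) in designs.items():
    print(f"{name}: q = {efficiency_factor(t, k):.4f}")
```

The printed values reproduce the efficiency factors quoted for the six designs (0.6, 0.8, 0.8750, 0.5625, 0.8438, 0.9375).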
I. In the first design (BIB1), we have fifteen blocks, six treatments, and two
treatments per block (t = 6, s = 15, k = 2, and n = 30), and the efficiency factor is q = 0.6.
Table 2.1
BIB1 (t = 6, s = 15, k = 2, and n = 30)

Block  Treat.   Block  Treat.   Block  Treat.   Block  Treat.   Block  Treat.
1      1,2      4      1,3      7      1,4      10     1,5      13     1,6
2      3,4      5      2,5      8      2,6      11     2,4      14     2,3
3      5,6      6      4,6      9      3,5      12     3,6      15     4,5

Table 2.2
Simulated size of nominal 5% Wald F tests for BIB1

ρ      K-R&prop.  Satter    Contain   K-R∗
0.25   0.0676††   0.0998††  0.0619††  0.0786††
0.5    0.0635††   0.0930††  0.0589†   0.0731††
1.0    0.0566†    0.0746††  0.0534    0.0710††
2.0    0.0551†    0.0599†   0.0539    0.0697††
4.0    0.0540     0.0559†   0.0542    0.0727††

∗ before modification. S.E. of entries ∈ (0.0023, 0.003)

Table 2.3
Mean of estimated denominator degrees of freedom for BIB1

ρ      K-R&prop.  Satter   Contain  K-R∗
0.25   20.1712    20.1712  10       29.0917
0.5    19.0591    19.0591  10       28.0031
1.0    15.5784    15.5784  10       24.6097
2.0    11.8539    11.8539  10       21.0182
4.0    10.4807    10.4807  10       19.7201

∗ before modification.

Table 2.4
Mean of estimated scales for BIB1

ρ      K-R     K-R∗
0.25   1.0000  0.9753
0.5    1.0000  0.9725
1.0    1.0000  0.9619
2.0    1.0000  0.9446
4.0    1.0000  0.9344

∗ before modification.
Table 2.5
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for BIB1

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -16.0   -3.7     0.9968  0.9556
0.5    -13.5   -2.3     0.9978  0.8667
1.0    -8.9    -1.2     0.9994  0.7333
2.0    -2.0    -0.6     1.0000  0.6444
4.0     0.6     0.9     1.0000  0.6121
II. In the second design (BIB2), we have ten blocks, six treatments, and three treatments
per block (t = 6, s = 10, k = 3, and n = 30), and q = 0.8.
Table 2.6
BIB2 (t = 6, s = 10, k = 3, and n = 30)

Block  Treat.   Block  Treat.
1      1,2,5    6      2,3,4
2      1,2,6    7      2,3,5
3      1,3,4    8      2,4,6
4      1,3,6    9      3,5,6
5      1,4,5    10     4,5,6
Table 2.7
Simulated size of nominal 5% Wald F tests for BIB2

ρ      K-R&prop.  Satter    Contain   K-R∗
0.25   0.0577†    0.0830††  0.0710††  0.0673††
0.5    0.0584†    0.0775††  0.0664††  0.0670††
1.0    0.0524     0.0604††  0.0552†   0.0633††
2.0    0.0491     0.0515    0.0497    0.0606††
4.0    0.0474     0.0482    0.0475    0.0579†

∗ before modification. S.E. of entries ∈ (0.0019, 0.0025)
Table 2.8
Mean of estimated denominator degrees of freedom for BIB2

ρ      K-R&prop.  Satter   Contain  K-R∗
0.25   20.5425    20.5425  15       29.4561
0.5    19.6163    19.6163  15       28.5419
1.0    17.4069    17.4069  15       26.3744
2.0    15.7620    15.7620  15       24.7693
4.0    15.2012    15.2012  15       24.2247

∗ before modification.
∗ before modification.
Table 2.9
Mean of estimated scales for BIB2
ρ
K-R
K-R ∗
0.25 0.9761 1.0000
0.5 0.9749 1.0000
1.0 0.9701 1.0000
2.0 0.9653 1.0000
4.0 0.9632 1.0000
∗ before modification.
Table 2.10
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for BIB2

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -12.0   -2.1     0.9977  0.9684
0.5    -7.4     0.2     0.9991  0.9143
1.0    -5.3    -2.1     1.0000  0.8500
2.0    -2.6    -2.0     1.0000  0.8154
4.0    -0.5    -0.5     1.0000  0.8041
III. In the third design (BIB3), we have seven blocks, seven treatments, and four treatments
per block (t = 7, s = 7, k = 4, and n = 28), and q = 0.8750.
Table 2.11
BIB3 (t = 7, s = 7, k = 4, and n = 28)

Block  Treat.
1      3,5,6,7
2      1,4,6,7
3      1,2,5,7
4      1,2,3,6
5      2,3,4,7
6      1,3,4,5
7      2,4,5,6
Table 2.12
Simulated size of nominal 5% Wald F tests for BIB3

ρ      K-R&prop.  Satter    Contain   K-R∗
0.25   0.0569†    0.0472    0.0697††  0.0769††
0.5    0.0508     0.0613††  0.0660††  0.0644††
1.0    0.0494     0.0524    0.0607††  0.0553†
2.0    0.0450     0.0470    0.0454    0.0581†
4.0    0.0455     0.0456    0.0456    0.0570†

∗ before modification. S.E. of entries ∈ (0.0021, 0.0027)
Table 2.13
Mean of estimated denominator degrees of freedom for BIB3

ρ      K-R&prop.  Satter   Contain  K-R∗
0.25   17.6322    17.6218  15       26.3479
0.5    17.3494    17.3468  15       26.2043
1.0    16.2466    16.2464  15       25.1420
2.0    15.3868    15.3868  15       24.3069
4.0    15.0992    15.0992  15       24.0273

∗ before modification.
Table 2.14
Mean of estimated scales for BIB3

ρ      K-R     K-R∗
0.25   1.0000  0.9549
0.5    1.0000  0.9653
1.0    1.0000  0.9667
2.0    1.0000  0.9643
4.0    1.0000  0.9632

∗ before modification.
Table 2.15
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for BIB3

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -10.8    3.9     1.0000  0.9750
0.5    -8.9    -0.5     1.0000  0.9357
1.0    -0.4     2.2     1.0000  0.9000
2.0    -1.3    -0.9     1.0000  0.8824
4.0    -2.0    -1.9     1.0000  0.8769
IV. In the fourth design (BIB4), we have thirty-six blocks, nine treatments, and two
treatments per block (t = 9, s = 36, k = 2, and n = 72), and q = 0.5625.
Table 2.16
BIB4 (t = 9, s = 36, k = 2, and n = 72)

Block  Treat.   Block  Treat.   Block  Treat.   Block  Treat.
1      1,2      10     1,3      19     1,4      28     1,5
2      2,8      11     2,5      20     2,6      29     2,4
3      3,4      12     3,6      21     2,3      30     3,8
4      4,7      13     4,9      22     4,5      31     4,6
5      5,6      14     5,8      23     5,7      32     3,5
6      1,6      15     6,7      24     6,8      33     6,9
7      3,7      16     1,7      25     7,9      34     2,7
8      8,9      17     4,8      26     1,8      35     7,8
9      5,9      18     2,9      27     3,9      36     1,9

Table 2.17
Simulated size of nominal 5% Wald F tests for BIB4

ρ      K-R&prop.  Satter  Contain  K-R∗
0.25   0.0578†    0.0520  0.0566†  0.0731††
0.5    0.0587†    0.0521  0.0562†  0.0709††
1.0    0.0555†    0.0524  0.0501   0.0617††
2.0    0.0598†    0.0536  0.0525   0.0592†
4.0    0.0599†    0.0547  0.0546   0.0557†

∗ before modification. S.E. of entries ∈ (0.0022, 0.0026)
Table 2.18
Mean of estimated denominator degrees of freedom for BIB4

ρ      K-R&prop.  Satter   Contain  K-R∗
0.25   58.3711    58.3711  28       66.8712
0.5    54.7945    54.7945  28       63.3018
1.0    43.9746    43.9746  28       52.5089
2.0    33.3083    33.3083  28       41.8840
4.0    29.4118    29.4118  28       38.0104

∗ before modification.
Table 2.19
Mean of estimated scales for BIB4

ρ      K-R     K-R∗
0.25   1.0000  0.9966
0.5    1.0000  0.9961
1.0    1.0000  0.9942
2.0    1.0000  0.9906
4.0    1.0000  0.9883

∗ before modification.
Table 2.20
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for BIB4

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -7.5    -2.0     1.0000  0.9514
0.5    -5.9    -0.8     1.0000  0.8542
1.0    -4.2    -0.7     1.0000  0.7083
2.0    -0.7     0.5     1.0000  0.6111
4.0     0.9     1.1     1.0000  0.5758
V. In the fifth design (BIB5), we have eighteen blocks, nine treatments, and four treatments per
block (t = 9, s = 18, k = 4, and n = 72), and q = 0.8438.
Table 2.21
BIB5 design (t = 9, s = 18, k = 4, and n = 72)

Block  Treat.     Block  Treat.     Block  Treat.
1      1,4,6,7    7      2,3,6,7    13     1,2,4,9
2      2,6,8,9    8      2,4,5,8    14     1,5,6,9
3      1,3,8,9    9      3,5,7,9    15     1,3,6,8
4      1,2,3,4    10     1,2,5,7    16     4,6,7,8
5      1,5,7,8    11     2,3,5,6    17     3,4,5,8
6      4,5,6,9    12     3,4,7,9    18     2,7,8,9
Table 2.22
Simulated size of nominal 5% Wald F tests for BIB5

ρ      K-R&prop.  Satter  Contain   K-R∗
0.25   0.0546     0.0532  0.0612††  0.0585†
0.5    0.0534     0.0523  0.0546    0.0577†
1.0    0.0518     0.0494  0.0525    0.0514
2.0    0.0498     0.0468  0.0476    0.0480
4.0    0.0505     0.0472  0.0475    0.0480

∗ before modification. S.E. of entries ∈ (0.0021, 0.0024)
Table 2.23
Mean of estimated denominator degrees of freedom for BIB5

ρ      K-R&prop.  Satter   Contain  K-R∗
0.25   57.9582    57.9582  46       66.4586
0.5    54.6281    54.6281  46       63.1346
1.0    49.7502    49.7502  46       58.2671
2.0    47.1284    47.1284  46       55.6517
4.0    46.2951    46.2951  46       54.8206

∗ before modification.
Table 2.24
Mean of estimated scales for BIB5

ρ      K-R     K-R∗
0.25   1.0000  0.9966
0.5    1.0000  0.9962
1.0    1.0000  0.9955
2.0    1.0000  0.9950
4.0    1.0000  0.9949

∗ before modification.
Table 2.25
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for BIB5

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -3.0     0.8     1.0000  0.9688
0.5    -2.4     0.0     1.0000  0.9219
1.0    -1.7    -0.8     1.0000  0.8750
2.0    -0.9    -0.7     1.0000  0.8529
4.0    -0.6    -0.6     1.0000  0.8462
VI. In the sixth design (BIB6), we have twelve blocks, nine treatments, and six
treatments per block (t = 9, s = 12, k = 6, and n = 72), and q = 0.9375.
Table 2.26
BIB6 (t = 9, s = 12, k = 6, and n = 72)

Block  Treat.         Block  Treat.         Block  Treat.         Block  Treat.
1      4,5,6,7,8,9    4      2,3,4,5,7,9    7      1,3,4,5,8,9    10     1,2,4,5,7,8
2      2,3,5,6,8,9    5      1,3,5,6,7,8    8      1,2,5,6,7,9    11     1,2,3,7,8,9
3      2,3,4,6,7,8    6      1,3,4,6,7,9    9      1,2,4,6,8,9    12     1,2,3,4,5,6
Table 2.27
Simulated size of nominal 5% Wald F tests for BIB6

ρ      K-R&prop.  Satter  Contain  K-R∗
0.25   0.0496     0.0477  0.0543   0.0519
0.5    0.0503     0.0483  0.0518   0.0503
1.0    0.0517     0.0501  0.0512   0.0501
2.0    0.0527     0.0509  0.0511   0.0510
4.0    0.0526     0.0505  0.0505   0.0505

∗ before modification. S.E. of entries ∈ (0.0021, 0.0023)
Table 2.28
Mean of estimated denominator degrees of freedom for BIB6

ρ      K-R&prop.  Satter   Contain  K-R∗
0.25   57.2847    57.2847  52       65.7861
0.5    55.1839    55.1839  52       63.6891
1.0    53.1920    53.1920  52       61.7011
2.0    52.3371    52.3371  52       60.8480
4.0    52.0870    52.0870  52       60.5985

∗ before modification.
Table 2.29
Mean of estimated scales for BIB6

ρ      K-R     K-R∗
0.25   1.0000  0.9965
0.5    1.0000  0.9963
1.0    1.0000  0.9960
2.0    1.0000  0.9959
4.0    1.0000  0.9959

∗ before modification.
Table 2.30
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for BIB6

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -1.5     0.7     1.0000  0.9750
0.5     0.2     1.3     1.0000  0.9357
1.0    -0.2     0.1     1.0000  0.9000
2.0    -0.4    -0.4     1.0000  0.8824
4.0    -0.8    -0.8     1.0000  0.8769
8.2.3 Complete Block Designs with Missing Data
As mentioned in the introduction of this chapter, two observations are missing in each
of the following designs: one from the first block under the first treatment, and the
other from the second block under the second treatment.
I. Consider a complete block design with five blocks and four treatments (RCB1),
q = 0.9477.
Table 3.1
Simulated size of nominal 5% Wald F tests for RCB1

ρ      K-R      Prop.1   Prop.2   Satter   Contain  K-R∗
0.25   0.0591†  0.0557†  0.0557†  0.0557†  0.0562†  0.0696††
0.5    0.0575†  0.0545   0.0546   0.0545   0.0558†  0.0691††
1.0    0.0508   0.0490   0.0490   0.0490   0.0498   0.0631††
2.0    0.0498   0.0494   0.0494   0.0494   0.0497   0.0622††
4.0    0.0492   0.0490   0.0490   0.0490   0.0491   0.0622††

∗ before modification. S.E. of entries ∈ (0.0022, 0.0025)
Table 3.2
Mean of estimated denominator degrees of freedom for RCB1

ρ      K-R      Prop.1   Prop.2   Satter   Contain  K-R∗
0.25   10.7542  10.7678  10.7747  10.7699  10       20.3740
0.5    10.5708  10.5812  10.5863  10.5828  10       20.1975
1.0    10.2960  10.3008  10.3029  10.3014  10       19.9325
2.0    10.1039  10.1050  10.1054  10.1051  10       19.7490
4.0    10.0295  10.0296  10.0297  10.0297  10       19.6793

∗ before modification.
Table 3.3
Mean of estimated scales for RCB1

ρ      K-R     Prop.1  Prop.2  K-R∗
0.25   0.9997  0.9997  0.9996  0.9348
0.5    0.9998  0.9998  0.9997  0.9332
1.0    0.9999  0.9999  0.9999  0.9307
2.0    1.0000  1.0000  1.0000  0.9288
4.0    1.0000  1.0000  1.0000  0.9281

∗ before modification.
Table 3.4
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for RCB1

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -3.0    -0.5     0.9939  0.9814
0.5    -2.3    -0.5     0.9972  0.9706
1.0    -1.0    -0.3     0.9992  0.9577
2.0    -0.6     0.0     0.9998  0.9508
4.0    -0.2    -0.2     1.0000  0.9485
II. Consider a complete block design with seven blocks and six treatments (RCB2),
q = 0.9839.
Table 3.5
Simulated size of nominal 5% Wald F tests for RCB2

ρ      K-R     Prop.1  Prop.2  Satter  Contain  K-R∗
0.25   0.0531  0.0521  0.0521  0.0521  0.0525   0.0564†
0.5    0.0472  0.0466  0.0466  0.0466  0.0472   0.0510
1.0    0.0528  0.0524  0.0524  0.0524  0.0526   0.0576†
2.0    0.0525  0.0525  0.0525  0.0525  0.0525   0.0585†
4.0    0.0527  0.0527  0.0527  0.0527  0.0527   0.0585†

∗ before modification. S.E. of entries ∈ (0.0021, 0.0023)
Table 3.6
Mean of estimated denominator degrees of freedom for RCB2

ρ      K-R      Prop.1   Prop.2   Satter   Contain  K-R∗
0.25   28.4925  28.5050  28.5173  28.5076  28       37.3202
0.5    28.3143  28.3205  28.3266  28.3218  28       37.1337
1.0    28.1307  28.1321  28.1335  28.1324  28       36.9442
2.0    28.0384  28.0386  28.0387  28.0386  28       36.8508
4.0    28.0102  28.0102  28.0102  28.0102  28       36.8226

∗ before modification.
Table 3.7
Mean of estimated scales for RCB2

ρ      K-R     Prop.1  Prop.2  K-R∗
0.25   1.0000  1.0000  0.9999  0.9873
0.5    1.0000  1.0000  1.0000  0.9872
1.0    1.0000  1.0000  1.0000  0.9871
2.0    1.0000  1.0000  1.0000  0.9870
4.0    1.0000  1.0000  1.0000  0.9870

∗ before modification.
Table 3.8
Percentage relative bias in the variance estimates,
convergence rate (CR) and efficiency (E) for RCB2

ρ      Φ̂ (%)   Φ̂A (%)  CR      E
0.25   -0.4     0.0     0.9995  0.9922
0.5    -0.5    -0.3     0.9998  0.9887
1.0    -1.6    -1.6     1.0000  0.9857
2.0     0.9     0.9     1.0000  0.9844
4.0     0.7     0.7     1.0000  0.9840
8.3 Comments and Conclusion
The bias of the conventional estimator of the variance-covariance matrix of the
fixed effects estimator is, as expected, large and negative in most cases, especially for
small values of ρ (e.g., tables 2.5 and 2.10). The bias was found to be affected by
three factors: q, ρ, and n; as q, ρ, and/or n get larger, the bias decreases. The bias
is reduced to an acceptable level by using the adjusted estimator. As expected, when the
between-block variance increased, the estimate of the denominator degrees of freedom
decreased (e.g., tables 1.3, 2.3, and 3.2). This is because the contribution of the
between-block information to estimating the fixed effects and their covariance matrix
gets smaller. Notice that for all approaches, the denominator degrees of freedom
estimates approach the Containment method's estimate as ρ increases. In other words,
as ρ increases, the optimal approximate test approaches the exact intra-block F test, in
which the between-block information is not utilized.
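The percentage relative bias reported in the tables above can be computed from simulated estimates as 100 × (mean of estimates − true value)/true value. A minimal sketch with hypothetical numbers (not the thesis data):

```python
import numpy as np

def percent_relative_bias(estimates: np.ndarray, true_value: float) -> float:
    """100 * (mean of simulated variance estimates - true value) / true value."""
    return 100.0 * (np.mean(estimates) - true_value) / true_value

# Hypothetical illustration: an estimator that averages 0.94 when the
# true variance is 1.00 has about -6% relative bias, the size of
# distortion seen in the conventional estimator at small rho.
rng = np.random.default_rng(0)
est = rng.normal(loc=0.94, scale=0.01, size=1000)
print(percent_relative_bias(est, 1.0))
```

A negative value means the variance of the fixed-effects estimator is understated, which is what drives the liberal test sizes discussed below.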
In general, the K-R method before modification does not perform as well as the K-R
method after modification; before modification it tends to be more liberal (e.g., tables
3.1 and 3.5). Even though the K-R approach using the conventional estimator of the
variance-covariance matrix of the fixed effects estimator gives reasonable true levels
(table 1.2), the approach is not appropriate, since the denominator degrees of freedom
and scale estimates were found to be negative for some data sets. For the PBIB1 design
with ρ = 0.25, one data set gave −120.8 as the estimate of the denominator degrees of
freedom.
The K-R and the proposed methods perform well and similarly, and the similarity
gets stronger as ρ gets larger (for the BIB designs, the K-R and proposed methods are
identical). Their true levels tend to be more conservative in designs where the
efficiency factor is higher, which is related to the small remaining bias in the adjusted
estimator of the variance-covariance matrix of the fixed effects estimator in most cases.
This bias is not eliminated in the BIB designs where the efficiency factor is small,
which makes the K-R and the proposed methods' true levels more liberal (e.g., tables 2.5,
2.10 and 2.15). Indeed, they perform better than the other approaches in most cases,
except in a few cases mentioned below.
The Satterthwaite approach was found to perform poorly for designs with small
values of ρ, q, and n, where the true level tends to be too liberal (e.g., table 2.2). This
unwanted liberality of the true level is related to the large negative percentage relative
bias in the conventional estimator of the variance-covariance matrix of the fixed effects
estimator when ρ, q, and n are small (e.g., table 2.5).
When ρ, q, and/or n get larger, the Satterthwaite method tends to be more conservative,
owing to the relatively small bias in the conventional estimator of the variance-covariance
matrix of the fixed effects estimator (e.g., table 2.10). Indeed, when q and ρ get larger,
the Satterthwaite method tends to perform as well as the K-R and the proposed methods
regardless of the sample size (e.g., table 2.12).
For the cases where the Satterthwaite method performs poorly, the Containment
method was generally found to perform better (e.g., tables 1.2 and 2.2). Moreover, for
the cases where the efficiency factor is large and ρ and/or n get larger, the performance
of the Containment method improves to the point that it becomes as good as the K-R and
the proposed methods (e.g., tables 2.12 and 2.22). For the cases where the efficiency
factor is relatively small, the Containment method tends to perform better than all other
approaches, including the K-R and the proposed methods, when ρ is relatively small
(e.g., table 2.2).
9. CONCLUSION
9.1 Summary
The strength of Kenward and Roger's approach comes from combining three
components in a way that no other approach does. First, instead of approximating the
Wald-type F statistic itself by an F distribution, as in the Satterthwaite approximation,
Kenward and Roger considered a scaled form of the statistic, which makes the
approximation more flexible since both the denominator degrees of freedom and the
scale are approximated. Second, Kenward and Roger modified the procedure so that it
gives the right values of the denominator degrees of freedom and the scale in the two
special cases considered in chapter 3. Third, the adjusted estimator of the
variance-covariance matrix of the fixed effects estimator, which is less biased than the
conventional estimator, as we saw in chapter 2, is used in constructing the Wald-type
F statistic.
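As a sketch (not the thesis's implementation), the unscaled Wald-type statistic for H0: L′β = 0 can be written as F = (L′β̂)′(L′ΦL)⁻¹(L′β̂)/ℓ; the K-R method then compares λF with an F(ℓ, m) reference, with the scale λ and denominator degrees of freedom m estimated from the data. All numbers below are hypothetical:

```python
import numpy as np

def wald_f(beta_hat: np.ndarray, Phi_hat: np.ndarray, L: np.ndarray) -> float:
    """Unscaled Wald-type F statistic for H0: L' beta = 0 (l = number of columns of L)."""
    l = L.shape[1]
    est = L.T @ beta_hat                              # estimated contrasts
    W = est.T @ np.linalg.solve(L.T @ Phi_hat @ L, est)  # quadratic form
    return W / l

# Hypothetical example: 3 fixed effects, testing the last two.
beta_hat = np.array([1.0, 0.3, -0.2])
Phi_hat = np.diag([0.10, 0.05, 0.05])   # stands in for the adjusted estimator
L = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
F = wald_f(beta_hat, Phi_hat, L)
print(F)  # (0.3**2/0.05 + 0.2**2/0.05) / 2 = 1.3
```

Using the adjusted estimator Φ̂A in place of Φ̂ in this quadratic form is the third component described above.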
In this thesis, by modifying certain steps in Kenward and Roger’s derivation, we
obtain two alternative methods which are similar to the K-R method but simpler in
derivation and application. Whenever the ratio of the two quantities A1 and A2 derived in
chapter 3 is the same as for the two special cases, the K-R and the proposed methods are
identical. We also showed that the K-R and the proposed methods produce the right
values for the denominator degrees of freedom and the scale factor, not only in the
special cases, but also in three general models where we have exact F tests for fixed
effects.
A Wald-type statistic was constructed using the conventional, rather than the
adjusted, estimator of the variance-covariance matrix of the fixed effects estimator, and
then modified to attain the right values in the two special cases. Because A3 was found to
be null in all cases with exact F tests considered in this thesis, the modification step was
problematic and this is consistent with the remark made by Kenward and Roger (1997).
Another approach to obtain approximate F tests for fixed effects is the
Satterthwaite method. When a one-dimensional hypothesis is of interest, then the K-R,
the Satterthwaite, and the proposed methods produce the same estimate of the
denominator degrees of freedom, and the scale estimate is one. Even though the estimates
are the same, the Satterthwaite and the K-R tests are not necessarily identical, because
in the K-R statistic we use the adjusted estimator of the variance-covariance matrix of
the fixed effects estimator, whereas the conventional estimator is used in the
Satterthwaite test. However, in some cases, like balanced mixed classification models
where the adjustment in the estimator of the variance-covariance matrix of the fixed
effects estimator is null, the Satterthwaite approach is identical to the K-R and the
proposed methods when testing a one-dimensional hypothesis. When we make inference
about a multidimensional hypothesis, we still have some cases where the K-R and the
Satterthwaite approaches produce the same estimate of the denominator degrees of
freedom, or are even identical approaches as in lemma 7.2.7.
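The classical Satterthwaite device behind these comparisons approximates the degrees of freedom of a variance estimate that is a linear combination of independent mean squares: ν ≈ (Σ cᵢMSᵢ)² / Σ (cᵢMSᵢ)²/dfᵢ. A minimal sketch with hypothetical inputs (not values from the thesis):

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Satterthwaite approximate df for sum(c_i * MS_i), where the MS_i are
    independent mean squares with df_i degrees of freedom."""
    total = sum(c * ms for c, ms in zip(coefs, mean_squares))
    denom = sum((c * ms) ** 2 / df for c, ms, df in zip(coefs, mean_squares, dfs))
    return total ** 2 / denom

# Hypothetical one-dimensional example: variance estimate 0.5*MS1 + 0.5*MS2.
print(satterthwaite_df([0.5, 0.5], [4.0, 2.0], [10, 5]))
```

For a single mean square the formula returns that mean square's own degrees of freedom, which is the degenerate case of the approximation.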
In chapter 8, we conducted a simulation study for three kinds of block designs:
partially balanced incomplete block designs, balanced incomplete block designs, and
complete block designs with missing data. The K-R, the proposed, the Satterthwaite, and
the Containment methods were compared. Three factors were considered in the study:
the variance components ratio, the efficiency factor, and the sample size. In most cases,
the K-R and the proposed methods performed as well as or better than the other methods.
For designs with small values of all three factors, the Satterthwaite method performed
poorly; its true level was quite liberal due to the large bias in the conventional
estimator of the variance-covariance matrix of the fixed effects estimator. In some
designs where the efficiency factor and the variance components ratio are both relatively
small, the K-R and the proposed methods were found to be liberal. For those cases, the
Containment method was found to work as well as or better than all other approaches.
9.2 Future Research
In this thesis, some questions about the K-R and other methods used to make
inference about fixed effects in mixed linear models were answered, but many issues
still need to be investigated. In chapter 3, we derived approximate expressions for the
expectation and variance of the Wald-type F statistic using the conventional estimator
of the variance-covariance matrix of the fixed effects estimator. However, as we saw in
section 4.4, it was problematic to use the special cases to modify the approach since
A3 was null; in fact, A3 is also null in all three general models considered in chapter 6.
A possible direction for further research is to investigate whether there is any case with
an exact F test in which A3 is not null, so that a better modification can be
accomplished. Two different values of the ratio A1/A2 were found in the two special
cases and in the three general models considered in chapter 6. It would be worth
searching for a case with an exact F test in which the ratio A1/A2 differs from these
two; if another ratio is found, it could be used in the modification stage of the K-R
and the proposed methods.
In balanced mixed classification models, and in the general models considered in
sections 6.2 and 6.3, we noticed that the adjustment of the estimator of the
variance-covariance matrix of the fixed effects estimator is null. It would be interesting
to generalize these results.
When we make inference about a multidimensional hypothesis on the fixed
effects, lemma 7.2.7 provides conditions under which the K-R and the Satterthwaite
approaches produce the same estimate of the denominator degrees of freedom. It would
be interesting to investigate other conditions that lead to the same result.
Finally, based on the theoretical derivation of the methods proposed in chapter 5,
and on their simulation results for the block designs in chapter 8, the proposed methods'
performance is expected to be comparable to that of the K-R method. A natural next step
is to investigate the performance of the proposed methods in other models.
BIBLIOGRAPHY
Anderson, T. (2003). An Introduction to Multivariate Statistical Analysis. New York:
John Wiley & Sons.
Bellavance, F., Tardif, S. and Stephens, M. (1996). Tests for the Analysis of Variance of
Crossover Designs with Correlated Errors. Biometrics 52, 607-612.
Birkes, D. (2004). Linear Model Theory Class Notes, Unpublished. Oregon State
University.
Birkes, D. and Lee, Y. (2003). Adjusted Orthogonal Block Structure in Mixed Linear
Models. Unpublished.
Birkes, D. (2006). Class Notes. Unpublished. Oregon State University.
Chen, X. and Wei, L. (2003). A Comparison of Recent Methods for the Analysis of
Small-Sample Cross-Over Studies. Statistics in Medicine 22, 2821-2833.
Cochran, W. and Cox, G. (1957). Experimental Designs. New York: John Wiley & Sons.
Fai, C. and Cornelius, P. (1996). Approximate F Tests of Multiple Degree of Freedom
Hypotheses in Generalized Least Squares Analysis of Unbalanced Split-Plot
Experiments. Journal of Statistical Computation and Simulation 54, 363-378.
Giesbrecht, F. and Burns, J. (1985). Two-Stage Analysis Based on a Mixed Model:
Large-Sample Asymptotic Theory and Small-Sample Simulation Results. Biometrics 41,
477-486.
Gomes, E., Schaalje, G. and Fellingham, G. (2005). Performance of the Kenward-Roger
Method when the Covariance Structure is Selected Using AIC and BIC.
Communications in Statistics-Simulation and Computation 34, 377-392.
Graybill, F. and Wang, C. (1980). Confidence Intervals on Nonnegative Linear
Combinations of Variances. Journal of the American Statistical Association 75,
869-873.
Green, P. (1974). On the Design of Choice Experiments Involving Multifactor
Alternatives. Journal of Consumer Research 1, 61-68.
Harville, D. (1997). Matrix Algebra from a Statistician's Perspective. New York:
Springer.
Harville, D. and Jeske, D. (1992). Mean Squared Error of Estimation or Prediction Under
a General Linear Model. Journal of the American Statistical Association 87, 724-731.
Huynh, H. and Feldt, L. (1970). Conditions Under Which Mean Square Ratios in
Repeated Measurements Designs Have Exact F-Distributions. Journal of the
American Statistical Association 65, 1582-1589.
Kackar, R. and Harville, D. (1984). Approximations for Standard Errors of Estimators of
Fixed and Random Effects in Mixed Linear Models. Journal of the American
Statistical Association 79, 853-862.
Kenward, M. and Roger, J. (1997). Small Sample Inference for Fixed Effects from
Restricted Maximum Likelihood. Biometrics 53, 983-997.
Krzanowski, W. (2000). Principles of Multivariate Analysis. Oxford, U.K.: Oxford
University Press.
Kuehl, R. (2000). Design of Experiments: Statistical Principles of Research Design and
Analysis. Pacific Grove: Duxbury Press.
Mardia, K., Kent, J. and Bibby, J. (1992). Multivariate Analysis. New York: Academic
Press.
Pace, L. and Salvan, A. (1997). Principles of Statistical Inference. New Jersey: World
Scientific.
Prasad, N. and Rao, J. (1990). The Estimation of the Mean Squared Error of Small-Area
Estimators. Journal of the American Statistical Association 85, 163-171.
Pukelsheim, F. (1993). Optimal Design of Experiments. New York: John Wiley & Sons.
Rady, E. (1986). Testing Fixed Effects in Mixed Linear Models. Ph.D. thesis,
Unpublished. Oregon State University.
Rao, C. (2002). Linear Statistical Inference and its Applications. New York: John Wiley
& Sons.
SAS Institute, Inc. (2002-2006), SAS version 9.1.3, Online Help, Cary, NC: SAS
Institute.
Satterthwaite, F. (1941). Synthesis of Variance. Psychometrika 6, 309-316.
Satterthwaite, F. (1946). Approximate Distribution of Estimates of Variance
Components. Biometrics 2, 110-114.
Savin, A., Wimmer, G. and Witkovsky, V. (2003). Measurement Science Review 3, 53-56.
Schaalje, G., McBride, J. and Fellingham, G. (2002). Journal of Agricultural, Biological,
and Environmental Statistics 7, 512-524.
Schott, J. (2005). Matrix Analysis for Statistics. New York: John Wiley & Sons.
Searle, S., Casella, G. and McCulloch, C. (1992). Variance Components. New York:
John Wiley & Sons.
Seely, J. (2002). Linear Model Theory Class Notes, Unpublished. Oregon State
University.
Seely, J. and Rady, E. (1988). When Can Random Effects Be Treated As Fixed Effects
For Computing A Test Statistic For A Linear Hypothesis? Communications in
Statistics- Theory and Methods 17, 1089-1109.
Seifert, B. (1979). Optimal Testing for Fixed Effects in General Balanced Mixed
Classification Models. Math. Operationsforsch. Statist. Ser. Statistics 10, 237-256.
Spilke, J., Piepho, H. and Meyer, U. (2004). Approximating the Degrees of Freedom for
Contrasts of Genotypes Laid Out as Subplots in an Alpha-Design in a Split-Plot
Experiment. Plant Breeding 123, 193-197.
Utlaut, T., Birkes, D. and VanLeeuwen, D. (2001). Optimal Exact F Tests and
Error-Orthogonal Designs. Conference Proceedings of the 53rd ISI Session, Seoul, 2001.
VanLeeuwen, D., Seely, J. and Birkes, D. (1998). Sufficient Conditions for Orthogonal
Designs in Mixed Linear Models. Journal of Statistical Planning and Inference 73,
373-389.
VanLeeuwen, D., Birkes, D. and Seely, J. (1999). Balance and Orthogonality in Designs
for Mixed Classification Models. The Annals of Statistics 27, 1927-1947.