Chapter 2
General Linear Hypothesis and Analysis of Variance
Regression model for the general linear hypothesis
Let Y1 , Y2 ,..., Yn be a sequence of n independent random variables associated with responses. Then
we can write it as
E(Yi) = ∑_{j=1}^{p} βj xij ,   i = 1, 2, ..., n;  j = 1, 2, ..., p
Var(Yi) = σ².
This is the linear model in the expectation form where β1 , β 2 ,..., β p are the unknown parameters and
xij ’s are the known values of independent covariates X 1 , X 2 ,..., X p .
Alternatively, the linear model can be expressed as
Yi = ∑_{j=1}^{p} βj xij + εi ,   i = 1, 2, ..., n;  j = 1, 2, ..., p
where the εi's are identically and independently distributed random error components with mean 0 and variance σ², i.e., E(εi) = 0, Var(εi) = σ² and Cov(εi, εj) = 0 (i ≠ j).
In matrix notations, the linear model can be expressed as
Y = Xβ + ε
where
• Y = (Y1, Y2, ..., Yn)′ is an n×1 vector of observations on the response variable,
• the matrix
        ⎛ X11  X12  ...  X1p ⎞
        ⎜ X21  X22  ...  X2p ⎟
  X  =  ⎜  ⋮    ⋮          ⋮ ⎟
        ⎝ Xn1  Xn2  ...  Xnp ⎠
  is an n × p matrix of n observations on the p independent covariates X1, X2, ..., Xp,
• β = (β1, β2, ..., βp)′ is a p × 1 vector of unknown regression parameters (or regression coefficients) β1, β2, ..., βp associated with X1, X2, ..., Xp, respectively, and
• ε = (ε1, ε2, ..., εn)′ is an n×1 vector of random errors or disturbances.
• We assume that E(ε) = 0, the covariance matrix V(ε) = E(εε′) = σ²In, and rank(X) = p.
In the context of analysis of variance and design of experiments,
 the matrix X is termed the design matrix;
 the unknown parameters β1, β2, ..., βp are termed effects;
 the covariates X1, X2, ..., Xp are counter variables or indicator variables, where xij counts the number of times the effect βj occurs in the ith observation xi;
 xij mostly takes the value 1 or 0, but not always. The value xij = 1 indicates the presence of the effect βj in xi, and xij = 0 indicates the absence of the effect βj in xi.
Note that in the linear regression model, the covariates are usually continuous variables.
When some of the covariates are counter variables and the rest are continuous variables, the model is called a mixed model and is used in the analysis of covariance.
Relationship between the regression model and analysis of variance model
The same linear model is used in the linear regression analysis as well as in the analysis of variance.
So it is important to understand the role of linear model in the context of linear regression analysis
and analysis of variance.
Consider the multiple linear model
Y = β 0 + X 1β1 + X 2 β 2 + ... + X p β p + ε .
In the case of analysis of variance model,
• the one-way classification considers only one covariate,
• two-way classification model considers two covariates,
• three-way classification model considers three covariates and so on.
If β , γ and δ denote the effects associated with the covariates X , Z and W which are the counter
variables, then in
One-way model:   Y = α + Xβ + ε
Two-way model:   Y = α + Xβ + Zγ + ε
Three-way model: Y = α + Xβ + Zγ + Wδ + ε, and so on.
Consider an example of agricultural yield. The study variable Y denotes the yield which depends on
various covariates X 1 , X 2 ,..., X p . In case of regression analysis, the covariates X 1 , X 2 ,..., X p are the
different variables like temperature, quantity of fertilizer, amount of irrigation etc.
Now consider the case of the one-way model and try to understand its interpretation in terms of the multiple regression model. The covariate X is now measured at different levels, e.g., if X is the quantity of fertilizer, suppose there are p possible values, say 1 Kg., 2 Kg., ..., p Kg.; then X1, X2, ..., Xp denote these p values in the following way.
The linear model can now be expressed as
Y = β0 + β1X1 + β2X2 + ... + βpXp + ε
by defining
X1 = 1 if the effect of 1 Kg. fertilizer is present, X1 = 0 if the effect of 1 Kg. fertilizer is absent,
X2 = 1 if the effect of 2 Kg. fertilizer is present, X2 = 0 if the effect of 2 Kg. fertilizer is absent,
⋮
Xp = 1 if the effect of p Kg. fertilizer is present, Xp = 0 if the effect of p Kg. fertilizer is absent.
If effect of 1 Kg. of fertilizer is present, then other effects will obviously be absent and the linear
model is expressible as
Y = β 0 + β1 ( X 1 = 1) + β 2 ( X 2 = 0) + ... + β p ( X p = 0) + ε
= β 0 + β1 + ε
If effect of 2 Kg. of fertilizer is present then
Y = β 0 + β1 ( X 1 = 0) + β 2 ( X 2 = 1) + ... + β p ( X p = 0) + ε
= β0 + β2 + ε
If effect of p Kg. of fertilizer is present then
Y = β 0 + β1 ( X 1 = 0) + β 2 ( X 2 = 0) + ... + β p ( X p = 1) + ε
= β0 + β p + ε
and so on.
If the experiment with 1 Kg. of fertilizer is repeated n1 times, then n1 observations on the response variable are recorded, which can be represented as
Y11 = β 0 + β1.1 + β 2 .0 + ... + β p .0 + ε11
Y12 = β 0 + β1.1 + β 2 .0 + ... + β p .0 + ε12

Y1n1 = β 0 + β1.1 + β 2 .0 + ... + β p .0 + ε1n1
If X2 = 1 is repeated n2 times, then on the same lines n2 observations on the response variable are recorded, which can be represented as
Y21 = β 0 + β1.0 + β 2 .1 + ... + β p .0 + ε 21
Y22 = β 0 + β1.0 + β 2 .1 + ... + β p .0 + ε 22

Y2 n2 = β 0 + β1.0 + β 2 .1 + ... + β p .0 + ε 2 n2
The experiment is continued and if X p = 1 is repeated n p times, then on the same lines
Yp1 = β0 + β1.0 + β2.0 + ... + βp.1 + εp1
Yp2 = β0 + β1.0 + β2.0 + ... + βp.1 + εp2

Ypn p = β 0 + β1.0 + β 2 .0 + ... + β p .1 + ε pn p
All these n1 , n2 ,.., n p observations can be represented as
⎡ y11  ⎤   ⎡ 1  1  0  ⋯  0 ⎤            ⎡ ε11  ⎤
⎢ y12  ⎥   ⎢ 1  1  0  ⋯  0 ⎥            ⎢ ε12  ⎥
⎢  ⋮   ⎥   ⎢ ⋮  ⋮  ⋮      ⋮ ⎥   ⎡ β0 ⎤   ⎢  ⋮   ⎥
⎢ y1n1 ⎥   ⎢ 1  1  0  ⋯  0 ⎥   ⎢ β1 ⎥   ⎢ ε1n1 ⎥
⎢ y21  ⎥   ⎢ 1  0  1  ⋯  0 ⎥   ⎢ β2 ⎥   ⎢ ε21  ⎥
⎢ y22  ⎥ = ⎢ 1  0  1  ⋯  0 ⎥   ⎢ ⋮  ⎥ + ⎢ ε22  ⎥
⎢  ⋮   ⎥   ⎢ ⋮  ⋮  ⋮      ⋮ ⎥   ⎣ βp ⎦   ⎢  ⋮   ⎥
⎢ y2n2 ⎥   ⎢ 1  0  1  ⋯  0 ⎥            ⎢ ε2n2 ⎥
⎢  ⋮   ⎥   ⎢ ⋮  ⋮  ⋮      ⋮ ⎥            ⎢  ⋮   ⎥
⎢ yp1  ⎥   ⎢ 1  0  0  ⋯  1 ⎥            ⎢ εp1  ⎥
⎢  ⋮   ⎥   ⎢ ⋮  ⋮  ⋮      ⋮ ⎥            ⎢  ⋮   ⎥
⎣ ypnp ⎦   ⎣ 1  0  0  ⋯  1 ⎦            ⎣ εpnp ⎦
where the first n1 rows of X are (1, 1, 0, ..., 0), the next n2 rows are (1, 0, 1, 0, ..., 0), and so on, and the last np rows are (1, 0, 0, ..., 0, 1),
or
Y = Xβ + ε.
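The mechanical construction of this design matrix is easy to sketch in code. The following Python fragment is a minimal illustration and is not part of the original notes; the group sizes and the number of levels are invented purely for illustration. Note that the indicator columns sum to the intercept column, so this X is not of full column rank, which is exactly the situation where generalized inverses or side restrictions become necessary, as discussed later.

```python
import numpy as np

# Hypothetical group sizes n_1, n_2, n_3 for p = 3 fertilizer levels (invented)
sizes = [4, 3, 5]
p = len(sizes)
n = sum(sizes)

# Build X = [intercept column | indicator of level 1 | ... | indicator of level p]
X = np.zeros((n, p + 1))
X[:, 0] = 1.0                         # column for the general mean effect beta_0
row = 0
for i, ni in enumerate(sizes):
    X[row:row + ni, i + 1] = 1.0      # x_ij = 1 when effect beta_i is present
    row += ni

print(X)                              # first n_1 rows: (1, 1, 0, 0), next n_2 rows: (1, 0, 1, 0), ...
```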
In the two-way analysis of variance model, there are two covariates and the linear model is
expressible as
Y = β0 + β1X1 + β2X2 + ... + βpXp + γ1Z1 + γ2Z2 + ... + γqZq + ε
where X1, X2, ..., Xp denote, e.g., the p levels of the quantity of fertilizer, say 1 Kg., 2 Kg., ..., p Kg., and Z1, Z2, ..., Zq denote, e.g., the q levels of irrigation, say 10 Cms., 20 Cms., ..., 10q Cms., etc. The levels X1, X2, ..., Xp, Z1, Z2, ..., Zq are the counter variables indicating the presence or absence of the effect, as in the earlier case. If the effects of X1 and Z1 are present, i.e., 1 Kg. of fertilizer and 10 Cms. of irrigation are used, then the linear model is written as
Y = β0 + β1.1 + β2.0 + ... + βp.0 + γ1.1 + γ2.0 + ... + γq.0 + ε
  = β0 + β1 + γ1 + ε.
If X2 = 1 and Z2 = 1 are used, then the model is
Y = β0 + β2 + γ2 + ε.
The design matrix can be written accordingly as in the one-way analysis of variance case.
In the three-way analysis of variance model
Y = α + β1 X 1 + ... + β p X p + γ 1Z1 + ... + γ q Z q + δ1W1 + ... + δ rWr + ε
The regression parameters β's can be fixed or random.
• If all β's are unknown constants, they are called the parameters of the model, and the model is called a fixed effect model or model I. The objective in this case is to make inferences about the parameters and the error variance σ².
• If for some j, xij = 1 for all i = 1, 2, ..., n, then βj is termed an additive constant. In this case, βj occurs with every observation, and so it is also called the general mean effect.
• If all β's are observable random variables except the additive constant, then the linear model is termed a random effect model, model II or variance components model. The objective in this case is to make inferences about the variances of the β's, i.e., σ²β1, σ²β2, ..., σ²βp, the error variance σ² and/or certain functions of them.
• If some parameters are fixed and some are random variables, then the model is called a mixed effect model or model III. In a mixed effect model, at least one βj is a constant and at least one βj is a random variable. The objective is to make inferences about the fixed effect parameters, the variances of the random effects and the error variance σ².
Analysis of variance
Analysis of variance is a body of statistical methods of analyzing the measurements assumed to be
structured as
yi = β1 xi1 + β2 xi2 + ... + βp xip + εi ,   i = 1, 2, ..., n
where xij are integers, generally 0 or 1 indicating usually the absence or presence of effects β j ; and
ε i ’s are assumed to be identically and independently distributed with mean 0 and variance σ 2 . It
may be noted that the εi's can additionally be assumed to follow a normal distribution N(0, σ²). This assumption is needed from the beginning for the maximum likelihood estimation of the parameters, but in least squares estimation it is needed only when conducting the tests of hypothesis and the confidence interval estimation of the parameters. The least squares method does not require any distributional assumption, such as normality, up to the stage of estimation of the parameters.
We need some basic concepts to develop the tools.
Least squares estimate of β :
Let y1, y2, ..., yn be a sample of observations on Y1, Y2, ..., Yn. The least squares estimate of β is the value β̂ of β for which the sum of squares due to error, i.e.,
S² = ∑_{i=1}^{n} εi² = ε′ε = (y − Xβ)′(y − Xβ)
   = y′y − 2β′X′y + β′X′Xβ,
is minimum, where y = (y1, y2, ..., yn)′. Differentiating S² with respect to β and setting it equal to zero, the normal equations are obtained as
dS²/dβ = 2X′Xβ − 2X′y = 0
or X′Xβ = X′y.
If X has full rank p, then (X′X)⁻¹ exists and the unique least squares estimate of β is
β̂ = (X′X)⁻¹X′y
which is the best linear unbiased estimator of β in the sense of having minimum variance in the class of linear and unbiased estimators. If the rank of X is not full, then a generalized inverse is used in place of the inverse of (X′X).
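As a quick numerical sketch (with simulated data, purely for illustration and not part of the original notes), the normal equations and the least squares estimate can be computed as follows; NumPy's Moore-Penrose pseudoinverse plays the role of the generalized inverse mentioned above when X is not of full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))             # full-rank design, simulated for illustration
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=n)

# Full-rank case: solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Rank-deficient case: a generalized (Moore-Penrose) inverse gives one solution
beta_ginv = np.linalg.pinv(X.T @ X) @ X.T @ y
```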
If L′β is a linear parametric function where L = (1 ,  2 ,...,  p )′ is a non-null vector, then the least
squares estimate of L′β is L′βˆ .
A question arises: what are the conditions under which a linear parametric function L′β admits a unique least squares estimate in the general case? The concept of an estimable function is needed to find such conditions.
Estimable functions:
A linear function λ′β of the parameters with known λ is said to be an estimable parametric function (or estimable) if there exists a linear function L′Y of Y such that
E(L′Y) = λ′β for all β ∈ ℝᵖ.
Note that not all parametric functions are estimable.
Following results will be useful in understanding the further topics.
Theorem 1: A linear parametric function
L′β admits a unique least squares estimate if and only
if L′β is estimable.
Theorem 2 (Gauss Markoff theorem):
If the linear parametric function L′β
is estimable then the linear estimator L′βˆ
where β̂ is a
solution of
X ′X βˆ = X ′Y
is the best linear unbiased estimator of L ' β in the sense of having minimum variance in the class of
all linear and unbiased estimators of L′β .
Theorem 3: If the linear parametric functions φ1 = ℓ1′β, φ2 = ℓ2′β, ..., φk = ℓk′β are estimable, then any linear combination of φ1, φ2, ..., φk is also estimable.
Theorem 4: All linear parametric functions in β are estimable if and only if X has full rank.
If X is not of full rank, then some linear parametric functions do not admit the unbiased linear
estimators and nothing can be inferred about them. The linear parametric functions which are not
estimable are said to be confounded. A possible solution to this problem is to add linear restrictions
on β so as to reduce the linear model to a full rank.
Theorem 5: Let L1' β and L'2 β be two estimable parametric functions and let L1' βˆ and L'2 βˆ be
their least squares estimators. Then
Var(L1′β̂) = σ² L1′(X′X)⁻¹L1
Cov(L1′β̂, L2′β̂) = σ² L1′(X′X)⁻¹L2
assuming that X is a full rank matrix. If not, the generalized inverse of X ′X can be used in place of
unique inverse.
Estimator of σ 2 based on least squares estimation:
Consider an estimator of σ² as
σ̂² = (1/(n − p)) (y − Xβ̂)′(y − Xβ̂)
    = (1/(n − p)) [y − X(X′X)⁻¹X′y]′[y − X(X′X)⁻¹X′y]
    = (1/(n − p)) y′[I − X(X′X)⁻¹X′][I − X(X′X)⁻¹X′]y
    = (1/(n − p)) y′[I − X(X′X)⁻¹X′]y
where the matrix [I − X(X′X)⁻¹X′] is idempotent with trace
tr[I − X(X′X)⁻¹X′] = tr I − tr X(X′X)⁻¹X′
                   = n − tr (X′X)⁻¹X′X   (using the result tr(AB) = tr(BA))
                   = n − tr Ip
                   = n − p.
Note that using E ( y ' Ay )= µ ' Aµ + tr ( AΣ) , we have
E(σ̂²) = (σ²/(n − p)) tr[I − X(X′X)⁻¹X′] = σ²
and so σ̂² is an unbiased estimator of σ².
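A small code sketch of these two facts, again with invented data and not part of the original notes: the first function computes σ̂² with divisor n − p, and the second checks the trace identity tr[I − X(X′X)⁻¹X′] = n − p numerically.

```python
import numpy as np

def sigma2_hat(y, X):
    """Unbiased estimator of sigma^2 based on the least squares residuals."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    return resid @ resid / (n - p)        # divisor n - p gives unbiasedness

def check_trace(X):
    """Verify numerically that tr(I - X(X'X)^{-1}X') equals n - p."""
    n, p = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    return np.isclose(np.trace(M), n - p)
```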
Maximum Likelihood Estimation
The least squares method does not use any distributional assumption about the random variables for the estimation of parameters. In the case of least squares, we need the distributional assumption only while constructing the tests of hypothesis and the confidence intervals. For maximum likelihood estimation, we need the distributional assumption from the beginning.
Suppose y1, y2, ..., yn are independently and identically distributed following a normal distribution with mean E(yi) = ∑_{j=1}^{p} βj xij and variance Var(yi) = σ² (i = 1, 2, ..., n). Then the likelihood function of y1, y2, ..., yn is
L(y | β, σ²) = [1/((2π)^{n/2} (σ²)^{n/2})] exp[ −(1/(2σ²)) (y − Xβ)′(y − Xβ) ]
where y = (y1, y2, ..., yn)′. Then
L = ln L(y | β, σ²) = −(n/2) log 2π − (n/2) log σ² − (1/(2σ²)) (y − Xβ)′(y − Xβ).
Differentiating the log likelihood with respect to β and σ², we have
∂L/∂β = 0  ⇒  X′Xβ = X′y
∂L/∂σ² = 0  ⇒  σ² = (1/n)(y − Xβ)′(y − Xβ).
Assuming the full rank of X , the normal equations are solved and the maximum likelihood
estimators are obtained as
β = ( X ′X ) −1 X ′y
1
σ 2 =( y − X β )′( y − X β )
n
1
=
y′  I − X ( X ′X ) −1 X ′ y.
n
The second-order conditions can be checked, and they are satisfied for β̃ and σ̃² to be the maximum likelihood estimators.
Note that the maximum likelihood estimator β̃ is the same as the least squares estimator β̂ and
• β̃ is an unbiased estimator of β, i.e., E(β̃) = β, like the least squares estimator, but
• σ̃² is not an unbiased estimator of σ², i.e., E(σ̃²) = ((n − p)/n) σ² ≠ σ², unlike the least squares estimator σ̂².
Now we use the following theorems for developing the test of hypothesis.
Theorem 6: Let Y = (Y1 , Y2 ,..., Yn )′ follow a multivariate normal distribution
N ( µ , Σ) with mean
vector µ and positive definite covariance matrix Σ . Then Y ′AY follows a noncentral chi-square
distribution with p degrees of freedom and noncentrality parameter µ ′ Aµ , i.e., χ 2 ( p, µ ′ Aµ ) if and
only if ΣA is an idempotent matrix of rank p.
Theorem 7: Let Y = (Y1, Y2, ..., Yn)′ follow a multivariate normal distribution N(µ, Σ) with mean vector µ and positive definite covariance matrix Σ. Let Y′A1Y follow χ²(p1, µ′A1µ) and Y′A2Y follow χ²(p2, µ′A2µ). Then Y′A1Y and Y′A2Y are independently distributed if A1ΣA2 = 0.
Theorem 8: Let Y = (Y1, Y2, ..., Yn)′ follow a multivariate normal distribution N(µ, σ²I). Then the maximum likelihood (or least squares) estimator L′β̂ of an estimable linear parametric function is independently distributed of σ̂²; L′β̂ follows N[L′β, σ²L′(X′X)⁻¹L] and nσ̂²/σ² follows χ²(n − p), where rank(X) = p.
Proof: Consider β̂ = (X′X)⁻¹X′Y. Then
E(L′β̂) = L′(X′X)⁻¹X′E(Y) = L′(X′X)⁻¹X′Xβ = L′β
Var(L′β̂) = L′ Var(β̂) L = L′E[(β̂ − β)(β̂ − β)′]L = σ² L′(X′X)⁻¹L.
Since β̂ is a linear function of y and L′β̂ is a linear function of β̂, L′β̂ follows a normal distribution N[L′β, σ²L′(X′X)⁻¹L]. Let
A = I − X(X′X)⁻¹X′  and  B = L′(X′X)⁻¹X′,
then
L′β̂ = L′(X′X)⁻¹X′Y = BY
and
nσ̂² = (Y − Xβ)′[I − X(X′X)⁻¹X′](Y − Xβ) = Y′AY.
So, using Theorem 6 with rank(A) = n − p, nσ̂²/σ² follows a χ²(n − p). Also
BA = L′(X′X)⁻¹X′ − L′(X′X)⁻¹X′X(X′X)⁻¹X′ = 0.
So using Theorem 7, Y ' AY and BY are independently distributed.
Tests of Hypothesis in the Linear Regression Model
First we discuss the development of the tests of hypothesis concerning the parameters of a linear
regression model. These tests of hypothesis will be used later in the development of tests based on
the analysis of variance.
Analysis of Variance
The technique of the analysis of variance involves the breaking down of the total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.
Model
Let Y1, Y2, ..., Yn be independently distributed following a normal distribution with mean E(Yi) = ∑_{j=1}^{p} βj xij and variance σ². Denoting Y = (Y1, Y2, ..., Yn)′, an n×1 column vector, such an assumption can be expressed in the form of a linear regression model
Y = Xβ + ε
where X is an n × p matrix, β is a p × 1 vector and ε is an n×1 vector of disturbances with
E(ε) = 0, Cov(ε) = σ²I, and ε follows a normal distribution.
This implies that
E(Y) = Xβ
E[(Y − Xβ)(Y − Xβ)′] = σ²I.
Now we consider four different types of tests of hypothesis .
In the first two cases, we develop the likelihood ratio test for the null hypothesis related to the
analysis of variance. Note that, later we will derive the same test on the basis of least squares
principle also. An important idea behind the development of this test is to demonstrate that the test
used in the analysis of variance can be derived using least squares principle as well as likelihood
ratio test.
Case 1: Consider the null hypothesis for testing
H0 : β = β 0
where β = (β1, β2, ..., βp)′, β0 = (β1⁰, β2⁰, ..., βp⁰)′ is specified and σ² is unknown. This null hypothesis is equivalent to
H0 : β1 = β1⁰, β2 = β2⁰, ..., βp = βp⁰.
Assume that all βi ' s are estimable, i.e., rank ( X ) = p (full column rank). We now develop the
likelihood ratio test.
The (p + 1)-dimensional parametric space Ω is the collection of points such that
Ω = {(β, σ²); −∞ < βi < ∞, σ² > 0, i = 1, 2, ..., p}.
Under H0, all β's are known, say equal to β0, and Ω reduces to the one-dimensional space given by
ω = {(β0, σ²); σ² > 0}.
The likelihood function of y1, y2, ..., yn is
L(y | β, σ²) = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) (y − Xβ)′(y − Xβ) ].
The likelihood function is maximum over Ω when β and σ² are substituted with their maximum likelihood estimators, i.e.,
β̂ = (X′X)⁻¹X′y
σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂).
Substituting β̂ and σ̂² in L(y | β, σ²) gives
Max_Ω L(y | β, σ²) = (1/(2πσ̂²))^{n/2} exp[ −(1/(2σ̂²)) (y − Xβ̂)′(y − Xβ̂) ]
                   = [ n / (2π (y − Xβ̂)′(y − Xβ̂)) ]^{n/2} exp(−n/2).
Under H0, the maximum likelihood estimator of σ² is σ̂² = (1/n)(y − Xβ0)′(y − Xβ0).
The maximum value of the likelihood function under H0 is
Max_ω L(y | β, σ²) = (1/(2πσ̂²))^{n/2} exp[ −(1/(2σ̂²)) (y − Xβ0)′(y − Xβ0) ]
                   = [ n / (2π (y − Xβ0)′(y − Xβ0)) ]^{n/2} exp(−n/2).
The likelihood ratio test statistic is
λ = Max_ω L(y | β, σ²) / Max_Ω L(y | β, σ²)
  = [ (y − Xβ̂)′(y − Xβ̂) / (y − Xβ0)′(y − Xβ0) ]^{n/2}
  = [ (y − Xβ̂)′(y − Xβ̂) / {(y − Xβ̂) + (Xβ̂ − Xβ0)}′{(y − Xβ̂) + (Xβ̂ − Xβ0)} ]^{n/2}
  = [ 1 + (β̂ − β0)′X′X(β̂ − β0) / ((y − Xβ̂)′(y − Xβ̂)) ]^{−n/2}
  = [ 1 + q1/q2 ]^{−n/2}
where q2 = (y − Xβ̂)′(y − Xβ̂) and q1 = (β̂ − β0)′X′X(β̂ − β0).
The expression of q1 and q2 can be further simplified as follows.
Consider
q1 = (β̂ − β0)′X′X(β̂ − β0)
   = [(X′X)⁻¹X′y − β0]′ X′X [(X′X)⁻¹X′y − β0]
   = [(X′X)⁻¹X′(y − Xβ0)]′ X′X [(X′X)⁻¹X′(y − Xβ0)]
   = (y − Xβ0)′ X(X′X)⁻¹X′X(X′X)⁻¹X′ (y − Xβ0)
   = (y − Xβ0)′ X(X′X)⁻¹X′ (y − Xβ0)

q2 = (y − Xβ̂)′(y − Xβ̂)
   = [y − X(X′X)⁻¹X′y]′ [y − X(X′X)⁻¹X′y]
   = y′[I − X(X′X)⁻¹X′]y
   = [(y − Xβ0) + Xβ0]′ [I − X(X′X)⁻¹X′] [(y − Xβ0) + Xβ0]
   = (y − Xβ0)′[I − X(X′X)⁻¹X′](y − Xβ0).
The other two terms become zero using [I − X(X′X)⁻¹X′]X = 0.
In order to find the decision rule for H0 based on λ, we first need to find out whether λ is a monotonically increasing or decreasing function of q1/q2. Let g = q1/q2, so that
λ = (1 + q1/q2)^{−n/2} = (1 + g)^{−n/2}.
Then
dλ/dg = −(n/2) (1 + g)^{−(n/2) − 1}.
So as g increases, λ decreases. Thus λ is a monotonically decreasing function of q1/q2.
The decision rule is to reject H0 if λ ≤ λ0, where λ0 is a constant to be determined on the basis of the size α of the test. Let us simplify this in our context.
λ ≤ λ0
or (1 + q1/q2)^{−n/2} ≤ λ0
or (1 + g)^{−n/2} ≤ λ0
or (1 + g) ≥ λ0^{−2/n}
or g ≥ λ0^{−2/n} − 1
or g ≥ C
where C is a constant to be determined by the size α condition of the test.
So reject H0 whenever q1/q2 ≥ C.
Note that the statistic q1/q2 can also be obtained by the least squares method as follows. The least squares methodology will also be discussed in further lectures.
q1 = (β̂ − β0)′X′X(β̂ − β0)
q1 = Min_ω (y − Xβ)′(y − Xβ)  −  Min_Ω (y − Xβ)′(y − Xβ)
     [sum of squares due to H0, or total sum of squares]  −  [sum of squares due to error]
so that q1 is the sum of squares due to deviation from H0, or the sum of squares due to β.
It will be seen later that the test statistic will be based on the ratio q1/q2. In order to find an appropriate distribution of q1/q2, we use the following theorem:
Theorem 9: Let
Z = Y − Xβ0
Q1 = Z′X(X′X)⁻¹X′Z
Q2 = Z′[I − X(X′X)⁻¹X′]Z.
Then Q1/σ² and Q2/σ² are independently distributed. Further, when H0 is true,
Q1/σ² ~ χ²(p) and Q2/σ² ~ χ²(n − p)
where χ²(m) denotes the χ² distribution with m degrees of freedom.
Proof: Under H0,
E(Z) = Xβ0 − Xβ0 = 0
Var(Z) = Var(Y) = σ²I.
Further, Z is a linear function of Y, and Y follows a normal distribution. So
Z ~ N(0, σ²I).
The matrices X(X′X)⁻¹X′ and [I − X(X′X)⁻¹X′] are idempotent matrices. So
tr[X(X′X)⁻¹X′] = tr[(X′X)⁻¹X′X] = tr(Ip) = p
tr[I − X(X′X)⁻¹X′] = tr In − tr[X(X′X)⁻¹X′] = n − p.
So using Theorem 6, we can write that under H0
Q1/σ² ~ χ²(p)  and  Q2/σ² ~ χ²(n − p)
where the degrees of freedom p and (n − p) are obtained from the trace of X(X′X)⁻¹X′ and the trace of [I − X(X′X)⁻¹X′], respectively.
Since [I − X(X′X)⁻¹X′] X(X′X)⁻¹X′ = 0,
using Theorem 7, the quadratic forms Q1 and Q2 are independent under H0.
Hence the theorem is proved.
Since Q1 and Q2 are independently distributed, under H0 the ratio (Q1/p) / (Q2/(n − p)) follows a central F-distribution, i.e.,
((n − p)/p) (Q1/Q2) ~ F(p, n − p).
Hence the constant C in the likelihood ratio test statistic λ is given by
C = F1−α(p, n − p)
where F1−α(n1, n2) denotes the upper 100α% point of the F-distribution with n1 and n2 degrees of freedom.
The computations of this test of hypothesis can be represented in the form of an analysis of variance
table.
ANOVA for testing H0 : β = β0
______________________________________________________________________________
Source of      Degrees of     Sum of                   Mean            F-value
variation      freedom        squares                  squares
______________________________________________________________________________
Due to β       p              q1                       q1/p            ((n − p)/p)(q1/q2)
Error          n − p          q2                       q2/(n − p)
Total          n              (y − Xβ0)′(y − Xβ0)
______________________________________________________________________________
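A minimal numerical sketch of this test in Python (simulated data; the hypothesised value β0 and all numbers are invented for illustration and are not part of the original notes): q1 and q2 are computed exactly as in the table and the scaled ratio is compared with the F critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
beta0 = np.zeros(p)                                  # hypothesised value (assumption)
y = X @ np.array([0.3, 0.0, -0.2]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
q1 = (beta_hat - beta0) @ (X.T @ X) @ (beta_hat - beta0)   # sum of squares due to beta
q2 = (y - X @ beta_hat) @ (y - X @ beta_hat)               # sum of squares due to error

F = ((n - p) / p) * q1 / q2
F_crit = stats.f.ppf(0.95, p, n - p)                 # F_{1-alpha}(p, n-p), alpha = 0.05
reject = F >= F_crit
```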
Case 2: Test of a subset of parameters
H0 : βk = βk⁰, k = 1, 2, ..., r < p, when βr+1, βr+2, ..., βp and σ² are unknown.
In Case 1, the test of hypothesis was developed when all β's are considered, in the sense that we test βi = βi⁰ for each i = 1, 2, ..., p. Now consider another situation in which the interest is to test only a subset of β1, β2, ..., βp, i.e., not all but only a few parameters. This type of test of hypothesis can be used, e.g., in the following situation. Suppose five levels of voltage are applied to check the rotations per minute (rpm) of a fan at 160 volts, 180 volts, 200 volts, 220 volts and 240 volts. It can be realized in practice that when the voltage is low, the rpm at 160, 180 and 200 volts can be observed easily. At 220 and 240 volts, the fan rotates at full speed and there is not much difference in the rotations per minute at these voltages. So the interest of the experimenter lies in testing the hypothesis related to only the first three effects, viz., β1 for 160 volts, β2 for 180 volts and β3 for 200 volts. The null hypothesis in this case can be written as:
H0 : β1 = β1⁰, β2 = β2⁰, β3 = β3⁰
when β4, β5 and σ² are unknown.
Note that under Case 1, the null hypothesis would be
H0 : β1 = β1⁰, β2 = β2⁰, β3 = β3⁰, β4 = β4⁰, β5 = β5⁰.
Let β1, β2, ..., βp be the p parameters. We can divide them into two parts, β1, β2, ..., βr and βr+1, ..., βp, and we are interested in testing a hypothesis about a subset of them.
Suppose we want to test the null hypothesis
H0 : βk = βk⁰, k = 1, 2, ..., r < p, when βr+1, βr+2, ..., βp and σ² are unknown.
The alternative hypothesis under consideration is H1 : βk ≠ βk⁰ for at least one k = 1, 2, ..., r < p.
In order to develop a test for such a hypothesis, the linear model
Y = Xβ + ε
under the usual assumptions can be rewritten as follows.
Partition X = (X1  X2), β = (β(1)′, β(2)′)′
where β(1) = (β1, β2, ..., βr)′ and β(2) = (βr+1, βr+2, ..., βp)′,
with orders X1 : n × r, X2 : n × (p − r), β(1) : r × 1 and β(2) : (p − r) × 1.
The model can be rewritten as
Y = Xβ + ε = (X1  X2)(β(1)′, β(2)′)′ + ε = X1β(1) + X2β(2) + ε.
The null hypothesis of interest is now
H0 : β(1) = β(1)⁰ = (β1⁰, β2⁰, ..., βr⁰)′, where β(2) and σ² are unknown.
The complete parametric space is
Ω = {(β, σ²); −∞ < βi < ∞, σ² > 0, i = 1, 2, ..., p}
and the sample space under H0 is
ω = {(β(1)⁰, β(2), σ²); −∞ < βi < ∞, σ² > 0, i = r + 1, r + 2, ..., p}.
The likelihood function is
L(y | β, σ²) = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) (y − Xβ)′(y − Xβ) ].
The maximum value of the likelihood function under Ω is obtained by substituting the maximum likelihood estimates of β and σ², i.e.,
β̂ = (X′X)⁻¹X′y
σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂),
as
Max_Ω L(y | β, σ²) = (1/(2πσ̂²))^{n/2} exp[ −(1/(2σ̂²)) (y − Xβ̂)′(y − Xβ̂) ]
                   = [ n / (2π (y − Xβ̂)′(y − Xβ̂)) ]^{n/2} exp(−n/2).
Now we find the maximum value of the likelihood function under H0. The model under H0 becomes
Y = X1β(1)⁰ + X2β(2) + ε. The likelihood function under H0 is
L(y | β, σ²) = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) (y − X1β(1)⁰ − X2β(2))′(y − X1β(1)⁰ − X2β(2)) ]
             = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) (y* − X2β(2))′(y* − X2β(2)) ]
where y* = y − X1β(1)⁰. Note that β(2) and σ² are the unknown parameters. This likelihood function looks as if it is written for y* ~ N(X2β(2), σ²I).
This helps in writing the maximum likelihood estimators of β(2) and σ² directly as
β̂(2) = (X2′X2)⁻¹X2′y*
σ̂² = (1/n)(y* − X2β̂(2))′(y* − X2β̂(2)).
Note that X2′X2 is a principal minor of X′X. Since X′X is a positive definite matrix, X2′X2 is also positive definite. Thus (X2′X2)⁻¹ exists and is unique.
Thus the maximum value of likelihood function under H 0 is obtained as
Max_ω L(y* | β̂, σ̂²) = (1/(2πσ̂²))^{n/2} exp[ −(1/(2σ̂²)) (y* − X2β̂(2))′(y* − X2β̂(2)) ]
                     = [ n / (2π (y* − X2β̂(2))′(y* − X2β̂(2))) ]^{n/2} exp(−n/2).
The likelihood ratio test statistic for H0 : β(1) = β(1)⁰ is
λ = Max_ω L(y | β, σ²) / Max_Ω L(y | β, σ²)
  = [ (y − Xβ̂)′(y − Xβ̂) / (y* − X2β̂(2))′(y* − X2β̂(2)) ]^{n/2}
  = [ { (y* − X2β̂(2))′(y* − X2β̂(2)) − (y − Xβ̂)′(y − Xβ̂) + (y − Xβ̂)′(y − Xβ̂) } / (y − Xβ̂)′(y − Xβ̂) ]^{−n/2}
  = [ 1 + { (y* − X2β̂(2))′(y* − X2β̂(2)) − (y − Xβ̂)′(y − Xβ̂) } / (y − Xβ̂)′(y − Xβ̂) ]^{−n/2}
  = [ 1 + q1/q2 ]^{−n/2}
where q1 = (y* − X2β̂(2))′(y* − X2β̂(2)) − (y − Xβ̂)′(y − Xβ̂) and q2 = (y − Xβ̂)′(y − Xβ̂).
Now we simplify q1 and q2 .
Consider
(y* − X2β̂(2))′(y* − X2β̂(2))
  = [y* − X2(X2′X2)⁻¹X2′y*]′ [y* − X2(X2′X2)⁻¹X2′y*]
  = y*′[I − X2(X2′X2)⁻¹X2′]y*
  = [(y − X1β(1)⁰ − X2β(2)) + X2β(2)]′ [I − X2(X2′X2)⁻¹X2′] [(y − X1β(1)⁰ − X2β(2)) + X2β(2)]
  = (y − X1β(1)⁰ − X2β(2))′ [I − X2(X2′X2)⁻¹X2′] (y − X1β(1)⁰ − X2β(2)).
The other terms become zero using the result X2′[I − X2(X2′X2)⁻¹X2′] = 0.
Note that under H0, X1β(1)⁰ + X2β(2) can be expressed as (X1  X2)(β(1)⁰′, β(2)′)′.
Consider
(y − Xβ̂)′(y − Xβ̂)
  = [y − X(X′X)⁻¹X′y]′ [y − X(X′X)⁻¹X′y]
  = y′[I − X(X′X)⁻¹X′]y
  = [(y − X1β(1)⁰ − X2β(2)) + (X1β(1)⁰ + X2β(2))]′ [I − X(X′X)⁻¹X′] [(y − X1β(1)⁰ − X2β(2)) + (X1β(1)⁰ + X2β(2))]
  = (y − X1β(1)⁰ − X2β(2))′ [I − X(X′X)⁻¹X′] (y − X1β(1)⁰ − X2β(2))
and the other terms become zero using the result X′[I − X(X′X)⁻¹X′] = 0.
Note that under H0, the term X1β(1)⁰ + X2β(2) can be expressed as (X1  X2)(β(1)⁰′, β(2)′)′.
Thus
q1 = (y* − X2β̂(2))′(y* − X2β̂(2)) − (y − Xβ̂)′(y − Xβ̂)
   = y*′[I − X2(X2′X2)⁻¹X2′]y* − y′[I − X(X′X)⁻¹X′]y
   = (y − X1β(1)⁰ − X2β(2))′ [I − X2(X2′X2)⁻¹X2′] (y − X1β(1)⁰ − X2β(2))
     − (y − X1β(1)⁰ − X2β(2))′ [I − X(X′X)⁻¹X′] (y − X1β(1)⁰ − X2β(2))
   = (y − X1β(1)⁰ − X2β(2))′ [X(X′X)⁻¹X′ − X2(X2′X2)⁻¹X2′] (y − X1β(1)⁰ − X2β(2))

q2 = (y − Xβ̂)′(y − Xβ̂)
   = y′[I − X(X′X)⁻¹X′]y
   = [(y − X1β(1)⁰ − X2β(2)) + (X1β(1)⁰ + X2β(2))]′ [I − X(X′X)⁻¹X′] [(y − X1β(1)⁰ − X2β(2)) + (X1β(1)⁰ + X2β(2))]
   = (y − X1β(1)⁰ − X2β(2))′ [I − X(X′X)⁻¹X′] (y − X1β(1)⁰ − X2β(2)).
The other terms become zero. Note that in simplifying the terms q1 and q2, we have written them as quadratic forms in the same variable (y − X1β(1)⁰ − X2β(2)).
Using the same argument as in Case 1, we can say that since λ is a monotonically decreasing function of q1/q2, the likelihood ratio test rejects H0 whenever
q1/q2 > C
where C is a constant to be determined by the size α of the test.
The likelihood ratio test statistic can also be obtained through least squares method as follows:
(q1 + q2) : the minimum value of (y − Xβ)′(y − Xβ) when H0 : β(1) = β(1)⁰ holds true, i.e., the sum of squares due to H0;
q2 : the sum of squares due to error;
q1 : the sum of squares due to deviation from H0, or the sum of squares due to β(1) adjusted for β(2).
If β(1)⁰ = 0, then
q1 = (y − X2β̂(2))′(y − X2β̂(2)) − (y − Xβ̂)′(y − Xβ̂)
   = (y′y − β̂(2)′X2′y) − (y′y − β̂′X′y)
   = β̂′X′y − β̂(2)′X2′y
where β̂′X′y is the reduction sum of squares (the sum of squares due to β) and β̂(2)′X2′y is the sum of squares due to β(2), i.e., the sum of squares due to β ignoring β(1).
Now we have the following theorem based on the Theorems 6 and 7.
Theorem 10: Let
Z = Y − X1β(1)⁰ − X2β(2)
Q1 = Z′AZ
Q2 = Z′BZ
where A = X(X′X)⁻¹X′ − X2(X2′X2)⁻¹X2′ and B = I − X(X′X)⁻¹X′.
Then Q1/σ² and Q2/σ² are independently distributed. Further, Q1/σ² ~ χ²(r) and Q2/σ² ~ χ²(n − p).
Thus under H0,
(Q1/r) / (Q2/(n − p)) = ((n − p)/r)(Q1/Q2) follows an F-distribution F(r, n − p).
Hence the constant C in λ is
C = F1−α(r, n − p),
where F1−α(r, n − p) denotes the upper α% point of the F-distribution with r and (n − p) degrees of freedom.
The analysis of variance table for this null hypothesis is as follows:
ANOVA for testing H0 : β(1) = β(1)⁰
______________________________________________________________________________
Source of      Degrees of     Sum of       Mean            F-value
variation      freedom        squares      squares
______________________________________________________________________________
Due to β(1)    r              q1           q1/r            ((n − p)/r)(q1/q2)
Error          n − p          q2           q2/(n − p)
Total          n − (p − r)    q1 + q2
______________________________________________________________________________
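A sketch of the same computation in code (simulated data; the partition into X1, X2 and the hypothesised β(1)⁰ are arbitrary choices for illustration, not part of the original notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, r = 60, 4, 2                        # test the first r of the p coefficients
X = rng.normal(size=(n, p))
y = X @ np.array([0.0, 0.0, 1.0, -1.0]) + rng.normal(size=n)

X1, X2 = X[:, :r], X[:, r:]
beta1_0 = np.zeros(r)                     # hypothesised beta_(1)^0 (assumption)
y_star = y - X1 @ beta1_0

beta_hat  = np.linalg.solve(X.T @ X, X.T @ y)
beta2_hat = np.linalg.solve(X2.T @ X2, X2.T @ y_star)

q2 = (y - X @ beta_hat) @ (y - X @ beta_hat)                       # sum of squares due to error
q1 = (y_star - X2 @ beta2_hat) @ (y_star - X2 @ beta2_hat) - q2    # SS due to beta_(1) adjusted for beta_(2)

F = ((n - p) / r) * q1 / q2
reject = F >= stats.f.ppf(0.95, r, n - p)
```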
Case 3: Test of H 0 : L′β = δ
Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric function L′β is estimable, where L = (ℓ1, ℓ2, ..., ℓp)′ is a p × 1 vector of known constants and β = (β1, β2, ..., βp)′. The null hypothesis of interest is
H0 : L′β = δ
where δ is some specified constant.
Consider the setup of the linear model Y = Xβ + ε where Y = (Y1, Y2, ..., Yn)′ follows N(Xβ, σ²I). The maximum likelihood estimators of β and σ² are
β̂ = (X′X)⁻¹X′y  and  σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂)
respectively. The maximum likelihood estimate of the estimable L′β is L′β̂, with
E(L′β̂) = L′β
Cov(L′β̂) = σ² L′(X′X)⁻¹L
L′β̂ ~ N[L′β, σ² L′(X′X)⁻¹L]
and nσ̂²/σ² ~ χ²(n − p),
assuming X to be a full column rank matrix. Further, L′β̂ and nσ̂²/σ² are also independently distributed.
Under H0 : L′β = δ, the statistic
t = √(n − p) (L′β̂ − δ) / √( nσ̂² L′(X′X)⁻¹L )
follows a t-distribution with (n − p) degrees of freedom. So the test for H0 : L′β = δ against H1 : L′β ≠ δ rejects H0 whenever
|t| ≥ t1−α/2(n − p)
where t1−α(n1) denotes the upper α% point of the t-distribution with n1 degrees of freedom.
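In code, the statistic can be sketched as follows (simulated data; L and δ are arbitrary illustrative choices and not part of the original notes). The estimator σ̂² below is the maximum likelihood version with divisor n, matching the formula for t above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 1.0, 0.0]) + rng.normal(size=n)

L = np.array([1.0, -1.0, 0.0])            # tests beta_1 - beta_2 = delta (illustrative)
delta = 0.0

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
sigma2_ml = (y - X @ beta_hat) @ (y - X @ beta_hat) / n      # MLE of sigma^2, divisor n

t = np.sqrt(n - p) * (L @ beta_hat - delta) / np.sqrt(n * sigma2_ml * (L @ XtX_inv @ L))
reject = abs(t) >= stats.t.ppf(1 - 0.05 / 2, n - p)
```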
Case 4: Test of H0 : φ1 = δ1, φ2 = δ2, ..., φk = δk
Now we develop the test of hypothesis related to more than one linear parametric function. Let the ith estimable linear parametric function be φi = Li′β, and let there be k such functions, with Li and β both being p × 1 vectors as in Case 3.
Our interest is to test the hypothesis
H0 : φ1 = δ1, φ2 = δ2, ..., φk = δk
where δ1, δ2, ..., δk are known constants.
Let φ = (φ1, φ2, ..., φk)′ and δ = (δ1, δ2, ..., δk)′.
Then H0 is expressible as H0 : φ = L′β = δ
where L′ is a k × p matrix of constants associated with L1, L2, ..., Lk. The maximum likelihood estimator of φi is φ̂i = Li′β̂, where β̂ = (X′X)⁻¹X′y.
Then φ̂ = (φ̂1, φ̂2, ..., φ̂k)′ = L′β̂.
Also E(φ̂) = φ
Cov(φ̂) = σ²V,  where V = (( Li′(X′X)⁻¹Lj ))  and Li′(X′X)⁻¹Lj is the (i, j)th element of V. Thus
(φ̂ − φ)′V⁻¹(φ̂ − φ) / σ²
follows a χ²-distribution with k degrees of freedom, and
nσ̂²/σ²
follows a χ²-distribution with (n − p) degrees of freedom, where σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂) is the maximum likelihood estimator of σ².
Further, (φ̂ − φ)′V⁻¹(φ̂ − φ)/σ² and nσ̂²/σ² are also independently distributed.
Thus under H0 : φ = δ,
F = [ (φ̂ − δ)′V⁻¹(φ̂ − δ)/σ² ] / k  ÷  [ (nσ̂²/σ²) / (n − p) ]
  = ((n − p)/k) (φ̂ − δ)′V⁻¹(φ̂ − δ) / (nσ̂²)
follows the F-distribution with k and (n − p) degrees of freedom. So the hypothesis H0 : φ = δ is rejected against H1 : at least one φi ≠ δi for i = 1, 2, ..., k, whenever F ≥ F1−α(k, n − p), where F1−α(k, n − p) denotes the upper 100α% point of the F-distribution with k and (n − p) degrees of freedom.
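A code sketch of this joint test (simulated data; the matrix L and the vector δ are illustrative choices, not part of the original notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, k = 50, 4, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 1.0, 1.0, 0.0]) + rng.normal(size=n)

Lmat = np.array([[1.0, -1.0, 0.0, 0.0],           # phi_1 = beta_1 - beta_2
                 [0.0, 1.0, -1.0, 0.0]])          # phi_2 = beta_2 - beta_3 (illustrative)
delta = np.zeros(k)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
phi_hat = Lmat @ beta_hat
V = Lmat @ XtX_inv @ Lmat.T                       # V = ((L_i'(X'X)^{-1} L_j))
n_sigma2 = (y - X @ beta_hat) @ (y - X @ beta_hat)   # n * sigma_hat^2 (MLE)

F = ((n - p) / k) * (phi_hat - delta) @ np.linalg.solve(V, phi_hat - delta) / n_sigma2
reject = F >= stats.f.ppf(0.95, k, n - p)
```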
One-way classification with fixed effect linear models of full rank:
The objective in the one-way classification is to test the hypothesis about the equality of means on
the basis of several samples which have been drawn from univariate normal populations with
different means but the same variances.
Let there be p univariate normal populations, and let samples of different sizes be drawn from each of the populations. Let yij (j = 1, 2, ..., ni) be a random sample from the ith normal population with mean βi and variance σ², i = 1, 2, ..., p, i.e.,
Yij ~ N(βi, σ²), j = 1, 2, ..., ni; i = 1, 2, ..., p.
The random samples from different populations are assumed to be independent of each other.
These observations follow the setup of the linear model
Y = Xβ + ε
where
Y = (Y11, Y12, ..., Y1n1, Y21, ..., Y2n2, ..., Yp1, Yp2, ..., Ypnp)′
y = (y11, y12, ..., y1n1, y21, ..., y2n2, ..., yp1, yp2, ..., ypnp)′
β = (β1, β2, ..., βp)′
ε = (ε11, ε12, ..., ε1n1, ε21, ..., ε2n2, ..., εp1, εp2, ..., εpnp)′

      ⎡ 1  0  ...  0 ⎤ ⎫
      ⎢ ⋮  ⋮       ⋮ ⎥ ⎬ n1 rows
      ⎢ 1  0  ...  0 ⎥ ⎭
      ⎢ 0  1  ...  0 ⎥ ⎫
X  =  ⎢ ⋮  ⋮       ⋮ ⎥ ⎬ n2 rows
      ⎢ 0  1  ...  0 ⎥ ⎭
      ⎢ ⋮  ⋮       ⋮ ⎥
      ⎢ 0  0  ...  1 ⎥ ⎫
      ⎢ ⋮  ⋮       ⋮ ⎥ ⎬ np rows
      ⎣ 0  0  ...  1 ⎦ ⎭

xij = 1 if βi occurs in the jth observation xj, i.e., if the effect βi is present in xj,
xij = 0 if the effect βi is absent in xj,
n = ∑_{i=1}^{p} ni.
So X is a matrix of order n × p, β is fixed, and
- the first n1 rows of X are (1, 0, 0, ..., 0),
- the next n2 rows of X are (0, 1, 0, ..., 0),
- and similarly the last np rows of X are (0, 0, ..., 0, 1).
Obviously, rank(X) = p, E(Y) = Xβ and Cov(Y) = σ²I.
This completes the representation of a fixed effect linear model of full rank.
The null hypothesis of interest is H 0 : β1= β 2= ...= β p= β (say)
and H1 : At least one βi ≠ β j (i ≠ j )
where β and σ 2 are unknown.
We would develop here the likelihood ratio test. It may be noted that the same test can also be
derived through the least squares method. This will be demonstrated in the next module. This way
the readers will understand both the methods.
We have already developed the likelihood ratio test for the hypothesis H0 : β1 = β2 = ... = βp in Case 1.
The whole parametric space Ω is a (p + 1)-dimensional space
Ω = {(β, σ²) : −∞ < βi < ∞, σ² > 0, i = 1, 2, ..., p}.
Note that the (p + 1) parameters are β1, β2, ..., βp and σ².
Under H0, Ω reduces to the two-dimensional space ω given by
ω = {(β, σ²); −∞ < β < ∞, σ² > 0}.
The likelihood function under Ω is
L(y | β, σ²) = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − βi)² ]
L = ln L(y | β, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − βi)²
∂L/∂βi = 0  ⇒  β̂i = (1/ni) ∑_{j=1}^{ni} yij = yio
∂L/∂σ² = 0  ⇒  σ̂² = (1/n) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)².
The dot sign (o) in yio indicates that the average has been taken over the second subscript j. The Hessian matrix of second-order partial derivatives of ln L with respect to βi and σ² is negative definite at βi = yio and σ² = σ̂², which ensures that the likelihood function is maximized at these values.
Thus the maximum value of L(y | β, σ²) over Ω is
Max_Ω L(y | β, σ²) = (1/(2πσ̂²))^{n/2} exp[ −(1/(2σ̂²)) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − β̂i)² ]
                   = [ n / (2π ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)²) ]^{n/2} exp(−n/2).
The likelihood function under ω is
L(y | β, σ²) = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − β)² ]
ln L(y | β, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − β)².
The normal equations and the least squares estimates are obtained as follows:
∂ ln L(y | β, σ²)/∂β = 0  ⇒  β̂ = (1/n) ∑_{i=1}^{p} ∑_{j=1}^{ni} yij = yoo
∂ ln L(y | β, σ²)/∂σ² = 0  ⇒  σ̂² = (1/n) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yoo)².
The maximum value of the likelihood function over ω under H0 is
Max_ω L(y | β, σ²) = (1/(2πσ̂²))^{n/2} exp[ −(1/(2σ̂²)) ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − β̂)² ]
                   = [ n / (2π ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yoo)²) ]^{n/2} exp(−n/2).
The likelihood ratio test statistic is
λ = Max_ω L(y | β, σ²) / Max_Ω L(y | β, σ²)
  = [ ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)² / ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yoo)² ]^{n/2}
We have
∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yoo)² = ∑_{i=1}^{p} ∑_{j=1}^{ni} [(yij − yio) + (yio − yoo)]²
                                      = ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)² + ∑_{i=1}^{p} ni (yio − yoo)².
Thus
λ = [ { ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)² + ∑_{i=1}^{p} ni (yio − yoo)² } / ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)² ]^{−n/2}
  = [ 1 + q1/q2 ]^{−n/2}
where q1 = ∑_{i=1}^{p} ni (yio − yoo)² and q2 = ∑_{i=1}^{p} ∑_{j=1}^{ni} (yij − yio)².
Note that if the least squares principle is used, then
q1 : the sum of squares due to deviations from H0, or the between-population sum of squares,
q2 : the sum of squares due to error, or the within-population sum of squares,
q1 + q2 : the sum of squares due to H0, or the total sum of squares.
Using Theorems 6 and 7, let
Q1 = ∑_{i=1}^{p} ni (Yio − Yoo)² ,   Q2 = ∑_{i=1}^{p} Si²
where
Si² = ∑_{j=1}^{ni} (Yij − Yio)²
Yoo = (1/n) ∑_{i=1}^{p} ∑_{j=1}^{ni} Yij
Yio = (1/ni) ∑_{j=1}^{ni} Yij
then under H0
Q1/σ² ~ χ²(p − 1)
Q2/σ² ~ χ²(n − p)
and Q1/σ² and Q2/σ² are independently distributed.
Thus under H0
[ (Q1/σ²)/(p − 1) ] / [ (Q2/σ²)/(n − p) ] ~ F(p − 1, n − p).
The likelihood ratio test rejects H0 whenever
q1/q2 > C
where the constant C = F1−α(p − 1, n − p).
The analysis of variance table for the one-way classification in the fixed effect model is
______________________________________________________________________________
Source of            Degrees of     Sum of       Mean sum        F
variation            freedom        squares      of squares
______________________________________________________________________________
Between populations  p − 1          q1           q1/(p − 1)      ((n − p)/(p − 1))(q1/q2)
Within populations   n − p          q2           q2/(n − p)
Total                n − 1          q1 + q2
______________________________________________________________________________
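The whole table can be reproduced from raw samples in a few lines; the sketch below uses invented data for illustration and is not part of the original notes. The same F-value is returned by scipy.stats.f_oneway, which can serve as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical samples from p = 3 populations (illustrative data only)
samples = [np.array([12.1, 11.8, 12.5, 12.0]),
           np.array([13.0, 13.4, 12.9]),
           np.array([11.2, 11.5, 11.1, 11.4, 11.3])]

p = len(samples)
n = sum(len(s) for s in samples)
grand_mean = np.concatenate(samples).mean()

q1 = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)   # between-population SS
q2 = sum(((s - s.mean()) ** 2).sum() for s in samples)             # within-population SS

F = ((n - p) / (p - 1)) * q1 / q2
p_value = stats.f.sf(F, p - 1, n - p)
# Cross-check: stats.f_oneway(*samples) gives the same F and p-value
```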
Note that
E[ Q2/(n − p) ] = σ²
E[ Q1/(p − 1) ] = σ² + ∑_{i=1}^{p} (βi − β̄)² / (p − 1),  where β̄ = (1/p) ∑_{i=1}^{p} βi.
Case of rejection of H0
If F > F1−α(p − 1, n − p), then H0 : β1 = β2 = ... = βp is rejected. This means that at least one βi is different from the others and is responsible for the rejection. So the objective is to investigate and find such βi's and to divide the populations into groups such that the means of the populations within a group are the same. This can be done by pairwise testing of the β's.
Test H0 : βi = βk (i ≠ k) against H1 : βi ≠ βk.
This can be tested using the following t-statistic:
t = (Yio − Yko) / √[ s² (1/ni + 1/nk) ]
which follows the t-distribution with (n − p) degrees of freedom under H0, where s² = q2/(n − p). Thus the decision rule is to reject H0 at level α if the observed difference satisfies
| yio − yko | > t1−α/2, n−p √[ s² (1/ni + 1/nk) ].
The quantity t1−α/2, n−p √[ s² (1/ni + 1/nk) ] is called the critical difference.
Thus the following steps are followed:
1. Compute all possible critical differences arising out of all possible pairs (βi, βk), i ≠ k = 1, 2, ..., p.
2. Compare them with the corresponding observed differences.
3. Divide the p populations into different groups such that the populations in the same group have the same means.
The computations are simplified if ni = n for all i. In such a case, the common critical difference (CCD) is
CCD = t1−α/2, n−p √(2s²/n)
and the observed differences (yio − yko), i ≠ k, are compared with the CCD.
If | yio − yko | > CCD,
then the corresponding effects/means yio and yko come from populations with different means.
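A sketch of this pairwise procedure with a common critical difference (equal group sizes assumed; data invented for illustration, not part of the original notes):

```python
import numpy as np
from itertools import combinations
from scipy import stats

# p groups, each of size n_per (illustrative data)
groups = [np.array([12.1, 11.8, 12.5, 12.0]),
          np.array([13.0, 13.4, 12.9, 13.2]),
          np.array([11.2, 11.5, 11.1, 11.4])]
p, n_per = len(groups), len(groups[0])
n = p * n_per

s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - p)    # s^2 = q2/(n - p)
ccd = stats.t.ppf(1 - 0.05 / 2, n - p) * np.sqrt(2 * s2 / n_per)   # common critical difference

means = [g.mean() for g in groups]
for i, k in combinations(range(p), 2):
    if abs(means[i] - means[k]) > ccd:
        print(f"groups {i + 1} and {k + 1}: means differ")
```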
Note: In general, we say that if there are three effects β1, β2, β3, and
if H01 : β1 = β2 (≡ denote as event A) is accepted
and if H02 : β2 = β3 (≡ denote as event B) is accepted,
then H03 : β1 = β3 (≡ denote as event C) will be accepted. The question arises here: in what sense do we conclude such a statement about the acceptance of H03? The reason is as follows:
Since the event A ∩ B ⊂ C, so
P(A ∩ B) ≤ P(C).
In this sense the probability of the event C is higher than that of the intersection of the events, i.e., the probability that H03 is accepted is higher than the probability of acceptance of H01 and H02 both. So we conclude, in general, that the acceptance of H01 and H02 implies the acceptance of H03.
Multiple comparison test:
One interest in the analysis of variance is to decide whether population means are equal or not. If the
hypothesis of equal means is rejected then one would like to divide the populations into subgroups
such that all populations with same means come to the same subgroup. This can be achieved by the
multiple comparison tests.
A multiple comparison test procedure conducts the test of hypothesis for all the pairs of effects and compares them at a significance level α, i.e., it works on a per-comparison basis.
This is based mainly on the t-statistic. If we want to ensure that the significance level α holds simultaneously for all group comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per-experiment basis.
There are various available multiple comparison tests. We will discuss some of them in the context
of one-way classification. In two-way or higher classification, they can be used on similar lines.
1. Studentized range test:
It is assumed in the Studentized range test that the p samples, each of size n, have been drawn from p normal populations. Let their sample means be y1o, y2o, ..., ypo. These means are ranked and arranged in ascending order as y1*, y2*, ..., yp*, where y1* = Min_i yio and yp* = Max_i yio, i = 1, 2, ..., p.
Find the range as R = yp* − y1*.
The Studentized range is defined as
qp, n−p = R√n / s
where qα, p, γ is the upper α% point of the Studentized range when γ = n − p. Tables for qα, p, γ are available.
The testing procedure involves the comparison of qp, γ with qα, p, γ in the usual way:
• if qp, n−p < qα, p, n−p then conclude that β1 = β2 = ... = βp;
• if qp, n−p > qα, p, n−p then all the β's in the group are not the same.
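The critical point qα, p, γ need not be read from tables if SciPy is available: version 1.7 or later exposes the Studentized range distribution. The sketch below assumes that SciPy version and equal sample sizes, and uses invented data; it is not part of the original notes.

```python
import numpy as np
from scipy import stats

groups = [np.array([12.1, 11.8, 12.5, 12.0]),
          np.array([13.0, 13.4, 12.9, 13.2]),
          np.array([11.2, 11.5, 11.1, 11.4])]
p, n_per = len(groups), len(groups[0])
gamma = p * n_per - p                                   # error degrees of freedom, n - p

s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / gamma
means = np.array([g.mean() for g in groups])
R = means.max() - means.min()

q_obs = R * np.sqrt(n_per) / np.sqrt(s2)                # observed Studentized range R*sqrt(n)/s
q_crit = stats.studentized_range.ppf(0.95, p, gamma)    # q_{alpha, p, gamma} with alpha = 0.05
all_equal = q_obs < q_crit
```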
2. Student-Newman-Keuls test:
The Student-Newman-Keuls test is similar to the Studentized range test in the sense that the range is compared with the α% point of the critical Studentized range Wp given by
Wp = qα, p, γ √(s²/n).
The observed range R = yp* − y1* is now compared with Wp.
• If R < Wp, then stop the process of comparison and conclude that β1 = β2 = ... = βp.
• If R > Wp, then
(i) divide the ranked means y1*, y2*, ..., yp* into two subgroups containing (yp*, yp−1*, ..., y2*) and (yp−1*, yp−2*, ..., y1*);
(ii) compute the ranges R1 = yp* − y2* and R2 = yp−1* − y1*, and then compare the ranges R1 and R2 with Wp−1.
 If either range (R1 or R2) is smaller than Wp−1, then the means (or βi's) in the corresponding group are equal.
 If R1 and/or R2 are greater than Wp−1, then the (p − 1) means (or βi's) in the group concerned are divided into two groups of (p − 2) means (or βi's) each, and the ranges of these groups are compared with Wp−2.
Continue with this procedure until a group of i means (or βi's) is found whose range does not exceed Wi.
By this method, the difference between any two means under test is significant when the range of the
observed means of each and every subgroup containing the two means under test is significant
according to the studentized critical range.
This procedure can be easily understood by the following flow chart.
Arrange the yio's in increasing order: y1* ≤ y2* ≤ ... ≤ yp*.
Compute R = yp* − y1* and compare it with Wp = qα, p, γ √(s²/n).
• If R < Wp: stop and conclude β1 = β2 = ... = βp.
• If R > Wp: continue; divide the ranked means into two groups, (yp*, ..., y2*) and (yp−1*, ..., y1*).
Compute R1 = yp* − y2* and R2 = yp−1* − y1*, and compare R1 and R2 with Wp−1. There are 4 possibilities for R1 and R2 relative to Wp−1:
• R1 < Wp−1 and R2 < Wp−1 ⇒ β2 = β3 = ... = βp and β1 = β2 = ... = βp−1 ⇒ β1 = β2 = ... = βp.
• R1 < Wp−1 and R2 > Wp−1 ⇒ β2 = β3 = ... = βp and at least one βi ≠ βj, i ≠ j = 1, 2, ..., p − 1, which is β1 ⇒ one subgroup is (β2, β3, ..., βp) and another group has only β1.
• R1 > Wp−1 and R2 < Wp−1 ⇒ β1 = β2 = ... = βp−1 and at least one βi ≠ βj, i ≠ j = 2, 3, ..., p, which is βp ⇒ one subgroup has (β1, β2, ..., βp−1) and another has only βp.
• R1 > Wp−1 and R2 > Wp−1 ⇒ divide the ranked means into four groups:
  1. yp*, ..., y3*
  2. yp−1*, ..., y2*
  3. yp−1*, ..., y2* (same as in 2)
  4. yp−2*, ..., y1*
  Compute R3 = yp* − y3*, R4 = yp−1* − y2*, R5 = yp−2* − y1* and compare them with Wp−2.
Continue till we get subgroups with the same β's.
3. Duncan’s multiple comparison test:
The test procedure in Duncan’s multiple comparison test is the same as in the Student-Newman-Keuls test except that the observed ranges are compared with Duncan’s α% critical range
Dp = q*αp, p, γ √(s²/n)
where αp = 1 − (1 − α)^{p−1} and q*αp, p, γ denotes the upper 100αp% point of the Studentized range based on Duncan’s range.
Tables for Duncan’s range are available.
Duncan feels that this test is better than the Student-Newman-Keuls test for comparing the differences between any two ranked means. Duncan regarded the Student-Newman-Keuls method as too stringent in the sense that the true differences between the means will tend to be missed too often. Duncan notes that in testing the equality of a subset of k, (2 ≤ k ≤ p), means through the null hypothesis, we are in fact testing whether (p − 1) orthogonal contrasts between the β's differ from zero or not. If these contrasts were tested in separate independent experiments, each at level α, the probability of incorrectly rejecting the null hypothesis would be [1 − (1 − α)^{p−1}]. So Duncan proposed to use [1 − (1 − α)^{p−1}] in place of α in the Student-Newman-Keuls test.
[Reference: Contributions to order statistics, Wiley 1962, Chapter 9 (Multiple decision and multiple
comparisons, H.A. David, pages 147-148)].
Case of unequal sample sizes:
When sample means are not based on the same number of observations, the procedures based on the Studentized range, the Student-Newman-Keuls test and Duncan’s test are not applicable. Kramer proposed that in Duncan’s method, if a set of p means is to be tested for equality, then replace
q*αp, p, γ · s/√n   by   q*αp, p, γ · s √[ (1/2)(1/nU + 1/nL) ]
where nU and nL are the numbers of observations corresponding to the largest and smallest means in the data. This procedure is only an approximate procedure but will tend to be conservative, since means based on a small number of observations will tend to be overrepresented in the extreme groups of means.
Another option is to replace n by the harmonic mean of n1, n2, ..., np, i.e.,
p / ∑_{i=1}^{p} (1/ni).
4. The “Least Significant Difference” (LSD):
In the usual testing of H0 : βi = βk against H1 : βi ≠ βk, the t-statistic
t = (yio − yko) / √[ V̂ar(yio − yko) ]
is used, which follows a t-distribution, say with df degrees of freedom. Thus H0 is rejected whenever
| t | > tdf, 1−α/2
and it is concluded that βi and βk are significantly different. The inequality | t | > tdf, 1−α/2 can equivalently be written as
| yio − yko | > tdf, 1−α/2 √[ V̂ar(yio − yko) ].
If, for a pair of samples, | yio − yko | exceeds tdf, 1−α/2 √[ V̂ar(yio − yko) ], this indicates that the difference between βi and βk is significant. So, according to this, the quantity tdf, 1−α/2 √[ V̂ar(yio − yko) ] is the least difference of yio and yko for which it will be declared that the difference between βi and βk is significant. Based on this idea, we use the pooled variance of the two samples in Var(yio − yko) as s², and the Least Significant Difference (LSD) is defined as
LSD = tdf, 1−α/2 √[ s² (1/ni + 1/nk) ].
If n1 = n2 = n, then
LSD = tdf, 1−α/2 √(2s²/n).
Now all p(p − 1)/2 pairs of yio and yko, (i ≠ k = 1, 2, ..., p), are compared with the LSD. Use of the LSD criterion may not lead to good results if it is used for comparisons suggested by the data
(largest/smallest sample mean) or if all pairwise comparisons are done without correction of the test
level. If LSD is used for all the pairwise comparisons then these tests are not independent. Such
correction for test levels was incorporated in Duncan’s test.
5. Tukey’s “Honestly Significant Difference” (HSD)
In this procedure, the Studentized range values qα, p, γ are used in place of the t-quantiles, and the standard error of the difference of the pooled means is used in place of the standard error of the mean in the common critical difference for testing H0 : βi = βk against H1 : βi ≠ βk. Tukey’s Honestly Significant Difference is computed as
HSD = q1−α/2, p, γ √(MSerror / n),
assuming all samples are of the same size n. All p(p − 1)/2 pairs | yio − yko | are compared with the HSD.
If | yio − yko | > HSD, then βi and βk are significantly different.
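In practice the full set of pairwise HSD comparisons is available off the shelf. The sketch below uses the statsmodels routine pairwise_tukeyhsd, an assumption about the reader's environment rather than part of the original notes, with invented data.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Stack the observations with a group label for each (illustrative data)
y = np.concatenate([[12.1, 11.8, 12.5, 12.0],
                    [13.0, 13.4, 12.9, 13.2],
                    [11.2, 11.5, 11.1, 11.4]])
labels = np.repeat(["A", "B", "C"], 4)

result = pairwise_tukeyhsd(endog=y, groups=labels, alpha=0.05)
print(result)   # table of pairwise differences, confidence intervals and reject flags
```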
We notice that all the multiple comparison test procedures discussed up to now are based on the testing of hypothesis. There is a one-to-one relationship between the testing of hypothesis and confidence interval estimation. So the confidence interval can also be used for such comparisons. Since H0 : βi = βk is the same as H0 : βi − βk = 0, we first establish this relationship and then describe Tukey's and Scheffé's procedures for multiple comparison tests, which are based on the confidence interval. We need the following concepts.
Contrast:
A linear parametric function L = ℓ′β = ∑_{i=1}^{p} ℓi βi, where β = (β1, β2, ..., βp)′ and ℓ = (ℓ1, ℓ2, ..., ℓp)′ are the p × 1 vectors of parameters and constants respectively, is said to be a contrast when ∑_{i=1}^{p} ℓi = 0.
For example, β1 − β2 = 0, β1 + β2 − β3 − β1 = 0, β1 + 2β2 − 3β3 = 0, etc. are contrasts, whereas β1 + β2 = 0, β1 + β2 + β3 + β4 = 0, β1 − 2β2 − 3β3 = 0, etc. are not contrasts.
Orthogonal contrast:
If L1 = ℓ′β = ∑_{i=1}^{p} ℓi βi and L2 = m′β = ∑_{i=1}^{p} mi βi are contrasts such that ℓ′m = 0, or ∑_{i=1}^{p} ℓi mi = 0, then L1 and L2 are called orthogonal contrasts.
For example, L1 = β1 + β2 − β3 − β4 and L2 = β1 − β2 + β3 − β4 are contrasts. They are also orthogonal contrasts.
The condition ∑_{i=1}^{p} ℓi mi = 0 ensures that L1 and L2 are independent in the sense that
Cov(L̂1, L̂2) = σ² ∑_{i=1}^{p} ℓi mi = 0.
Mutually orthogonal contrasts:
If there are more than two contrasts, then they are said to be mutually orthogonal if they are pairwise orthogonal.
It may be noted that the maximum number of mutually orthogonal contrasts equals the number of degrees of freedom, e.g., $p - 1$ contrasts among $p$ effects.
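To make the definitions concrete, here is a small sketch (not from the notes; the coefficient vectors are hypothetical) that checks the contrast and orthogonality conditions with NumPy, using Helmert-type coefficients, which give $p - 1$ mutually orthogonal contrasts among $p$ effects.

```python
import numpy as np

def is_contrast(c):
    # A coefficient vector defines a contrast when its entries sum to zero.
    return np.isclose(np.sum(c), 0.0)

def are_orthogonal(c, m):
    # Two contrasts are orthogonal when the dot product of their coefficients is zero.
    return np.isclose(np.dot(c, m), 0.0)

# Helmert-type contrasts among p = 4 effects: 3 mutually orthogonal contrasts,
# matching the statement that their number equals the degrees of freedom.
helmert = np.array([
    [1, -1,  0,  0],
    [1,  1, -2,  0],
    [1,  1,  1, -3],
])

print([is_contrast(row) for row in helmert])                  # [True, True, True]
print([are_orthogonal(helmert[i], helmert[j])
       for i in range(3) for j in range(i + 1, 3)])           # all True (pairwise orthogonal)
print(is_contrast(np.array([1, 1, 0, 0])))                    # False: not a contrast
```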
Coming back to the multiple comparison test, if the null hypothesis of equality of all effects is
rejected then it is reasonable to look for the contrasts which are responsible for the rejection. In terms
of contrasts, it is desirable to have a procedure
(i) that permits the selection of the contrasts after the data is available, and
(ii) with which a known level of significance is associated.
Such procedures are Tukey's and Scheffe's procedures. Before discussing these procedures, let us
consider the following example which illustrates the relationship between the testing of hypothesis
and confidence intervals.
Example: Consider the test of hypothesis for
$$ H_0: \beta_i = \beta_j \;\; (i \neq j = 1, 2, ..., p) $$
or $H_0: \beta_i - \beta_j = 0$
or $H_0: \text{contrast} = 0$
or $H_0: L = 0$.
The test statistic for $H_0: \beta_i = \beta_j$ is
$$ t = \frac{(\hat{\beta}_i - \hat{\beta}_j) - (\beta_i - \beta_j)}{\sqrt{\widehat{Var}(\hat{\beta}_i - \hat{\beta}_j)}} = \frac{\hat{L} - L}{\sqrt{\widehat{Var}(\hat{L})}} $$
where $\hat{\beta}$ denotes the maximum likelihood (or least squares) estimator of $\beta$ and $t$ follows a $t$-distribution with $df$ degrees of freedom. This statistic, in fact, can be extended to any linear contrast, e.g., $L = \beta_1 + \beta_2 - \beta_3 - \beta_4$, $\hat{L} = \hat{\beta}_1 + \hat{\beta}_2 - \hat{\beta}_3 - \hat{\beta}_4$.
The decision rule is to reject $H_0: L = 0$ against $H_1: L \neq 0$ if
$$ |\hat{L}| > t_{df} \sqrt{\widehat{Var}(\hat{L})}. $$
The $100(1-\alpha)\%$ confidence interval of $L$ is obtained from
$$ P\left[ -t_{df} \leq \frac{\hat{L} - L}{\sqrt{\widehat{Var}(\hat{L})}} \leq t_{df} \right] = 1 - \alpha $$
or
$$ P\left[ \hat{L} - t_{df}\sqrt{\widehat{Var}(\hat{L})} \leq L \leq \hat{L} + t_{df}\sqrt{\widehat{Var}(\hat{L})} \right] = 1 - \alpha, $$
so that the $100(1-\alpha)\%$ confidence interval of $L$ is
$$ \left[ \hat{L} - t_{df}\sqrt{\widehat{Var}(\hat{L})},\;\; \hat{L} + t_{df}\sqrt{\widehat{Var}(\hat{L})} \right] $$
and
$$ \hat{L} - t_{df}\sqrt{\widehat{Var}(\hat{L})} \leq L \leq \hat{L} + t_{df}\sqrt{\widehat{Var}(\hat{L})}. $$
If this interval includes $L = 0$ between the lower and upper confidence limits, then $H_0: L = 0$ is accepted. Our objective is to know whether the confidence interval contains zero or not.
Suppose for some given data the confidence intervals for $\beta_1 - \beta_2$ and $\beta_1 - \beta_3$ are obtained as
$$ -3 \leq \beta_1 - \beta_2 \leq 2 \quad \text{and} \quad 2 \leq \beta_1 - \beta_3 \leq 4. $$
Thus we find that the interval for $\beta_1 - \beta_2$ includes zero, which implies that $H_0: \beta_1 - \beta_2 = 0$ is accepted; thus $\beta_1 = \beta_2$. On the other hand, the interval for $\beta_1 - \beta_3$ does not include zero, and so $H_0: \beta_1 - \beta_3 = 0$ is not accepted; thus $\beta_1 \neq \beta_3$.
If the interval for $\beta_1 - \beta_3$ were instead $-1 \leq \beta_1 - \beta_3 \leq 1$, then $H_0: \beta_1 = \beta_3$ would be accepted. If both $H_0: \beta_1 = \beta_2$ and $H_0: \beta_1 = \beta_3$ are accepted, then we can conclude that $\beta_1 = \beta_2 = \beta_3$.
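The following is a small sketch of this interval-based check for a single contrast, with hypothetical data (reusing the balanced layout from the earlier sketches) and the $t$-quantile from SciPy.

```python
import numpy as np
from scipy import stats

# Hypothetical one-way layout
data = {
    "A": np.array([12.1, 11.8, 12.5, 12.0, 11.6]),
    "B": np.array([13.0, 13.4, 12.8, 13.1, 13.3]),
    "C": np.array([12.2, 12.0, 12.4, 11.9, 12.3]),
}
means = np.array([y.mean() for y in data.values()])
ns = np.array([len(y) for y in data.values()])
df = ns.sum() - len(data)
s2 = sum(((y - y.mean()) ** 2).sum() for y in data.values()) / df

# Contrast L = beta_1 - beta_2, coefficients C = (1, -1, 0)
C = np.array([1.0, -1.0, 0.0])
L_hat = C @ means
var_L_hat = s2 * np.sum(C**2 / ns)          # estimated Var(L_hat)

alpha = 0.05
half_width = stats.t.ppf(1 - alpha / 2, df) * np.sqrt(var_L_hat)
lower, upper = L_hat - half_width, L_hat + half_width

# H0: L = 0 is accepted exactly when the interval contains zero
decision = "accept H0: L = 0" if lower <= 0.0 <= upper else "reject H0: L = 0"
print(f"CI for L: [{lower:.3f}, {upper:.3f}] -> {decision}")
```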
Tukey’s procedure for multiple comparison (T-method)
The T-method uses the distribution of the studentized range statistic. (The S-method, discussed next, utilizes the F-distribution.) The T-method can be used to make simultaneous confidence statements about contrasts ($\beta_i - \beta_j$) among a set of parameters $\{\beta_1, \beta_2, ..., \beta_p\}$, together with an estimate $s^2$ of the error variance, provided certain restrictions are satisfied.
These restrictions have to be viewed according to the given conditions. For example, one of the restrictions is that all $\hat{\beta}_i$'s have equal variances. In the setup of one-way classification, $\hat{\beta}_i$ is the sample mean $\bar{Y}_i$ and its variance is $\frac{\sigma^2}{n_i}$. This reduces to the simple condition that all $n_i$'s are the same, i.e., $n_i = n$ for all $i$, so that all the variances are the same.
Another assumption is that $\hat{\beta}_1, \hat{\beta}_2, ..., \hat{\beta}_p$ are statistically independent and the only contrasts considered are the $\frac{p(p-1)}{2}$ differences $\{\beta_i - \beta_j,\; i \neq j = 1, 2, ..., p\}$.
We make the following assumptions:
(i) $\hat{\beta}_1, \hat{\beta}_2, ..., \hat{\beta}_p$ are statistically independent.
(ii) $\hat{\beta}_i \sim N(\beta_i, a^2\sigma^2)$, $i = 1, 2, ..., p$, where $a > 0$ is a known constant.
(iii) $s^2$ is an independent estimate of $\sigma^2$ with $\gamma$ degrees of freedom (here $\gamma = n - p$), i.e.,
$$ \frac{\gamma s^2}{\sigma^2} \sim \chi^2(\gamma), $$
and
(iv) $s^2$ is statistically independent of $\hat{\beta}_1, \hat{\beta}_2, ..., \hat{\beta}_p$.
The statement of the T-method is as follows:
Under the assumptions (i)-(iv), the probability is $(1-\alpha)$ that the values of all contrasts
$$ L = \sum_{i=1}^{p} C_i \beta_i \quad \left( \sum_{i=1}^{p} C_i = 0 \right) $$
simultaneously satisfy
$$ \hat{L} - Ts\left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right) \leq L \leq \hat{L} + Ts\left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right) $$
where $\hat{L} = \sum_{i=1}^{p} C_i \hat{\beta}_i$, $\hat{\beta}_i$ is the maximum likelihood (or least squares) estimate of $\beta_i$, and $T = a\, q_{\alpha, p, \gamma}$, with $q_{\alpha, p, \gamma}$ being the upper $\alpha$ point of the distribution of the studentized range.
Note that if $L$ is a contrast like $\beta_i - \beta_j$ $(i \neq j)$, then $\frac{1}{2}\sum_{i=1}^{p} |C_i| = 1$ and the variance is $\sigma^2$, so that $a = 1$ and the interval simplifies to
$$ (\hat{\beta}_i - \hat{\beta}_j) - Ts \leq \beta_i - \beta_j \leq (\hat{\beta}_i - \hat{\beta}_j) + Ts $$
where $T = q_{\alpha, p, \gamma}$. Thus the maximum likelihood (or least squares) estimate $\hat{L} = \hat{\beta}_i - \hat{\beta}_j$ of $L = \beta_i - \beta_j$ is said to be significantly different from zero according to the T-criterion if the interval $(\hat{\beta}_i - \hat{\beta}_j - Ts,\; \hat{\beta}_i - \hat{\beta}_j + Ts)$ does not cover $\beta_i - \beta_j = 0$, i.e., if
$$ |\hat{\beta}_i - \hat{\beta}_j| > Ts, $$
or more generally, if
$$ |\hat{L}| > Ts\left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right). $$
The testing now involves the following steps:
- Compute $\hat{L}$ (or $\hat{\beta}_i - \hat{\beta}_j$).
- Compute all possible pairwise differences.
- Compare all the differences with
$$ q_{\alpha, p, \gamma}\, \frac{s}{\sqrt{n}} \left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right). $$
- If $|\hat{L}|$ (or $|\hat{\beta}_i - \hat{\beta}_j|$) exceeds $Ts\left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right)$, then $\hat{\beta}_i$ and $\hat{\beta}_j$ are significantly different, where $T = \frac{q_{\alpha, p, \gamma}}{\sqrt{n}}$.

Tables for $T$ are available.
When the sample sizes are not equal, the Tukey-Kramer procedure suggests comparing $\hat{L}$ with
$$ q_{\alpha, p, \gamma}\; s\, \sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\, \left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right) $$
or
$$ T\, \sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\, \left( \frac{1}{2}\sum_{i=1}^{p} |C_i| \right). $$
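A minimal sketch of the Tukey-Kramer comparison for unequal group sizes follows, again leaning on SciPy's studentized_range quantile; the data and group sizes are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical unbalanced one-way layout
data = {
    "A": np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3]),   # n_A = 6
    "B": np.array([13.0, 13.4, 12.8, 13.1]),               # n_B = 4
    "C": np.array([12.2, 12.0, 12.4, 11.9, 12.3]),         # n_C = 5
}
p = len(data)
N = sum(len(y) for y in data.values())
gamma = N - p
s = np.sqrt(sum(((y - y.mean()) ** 2).sum() for y in data.values()) / gamma)

alpha = 0.05
q = stats.studentized_range.ppf(1 - alpha, p, gamma)

names = list(data)
for i in range(p):
    for j in range(i + 1, p):
        ni, nj = len(data[names[i]]), len(data[names[j]])
        diff = abs(data[names[i]].mean() - data[names[j]].mean())
        # Tukey-Kramer critical difference: q * s * sqrt((1/n_i + 1/n_j) / 2)
        crit = q * s * np.sqrt(0.5 * (1 / ni + 1 / nj))
        verdict = "different" if diff > crit else "not different"
        print(f"{names[i]} vs {names[j]}: |diff| = {diff:.3f}, critical = {crit:.3f} -> {verdict}")
```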
Scheffe's method (S-method) of multiple comparison
The S-method generally gives shorter confidence intervals than the T-method. It can be used in a number of situations where the T-method is not applicable, e.g., when the sample sizes are not equal.
A set $L$ of estimable functions $\{\psi\}$ is called a $p$-dimensional space of estimable functions if there exist $p$ linearly independent estimable functions $(\psi_1, \psi_2, ..., \psi_p)$ such that every $\psi$ in $L$ is of the form $\psi = \sum_{i=1}^{p} C_i \psi_i$, where $C_1, C_2, ..., C_p$ are known constants. In other words, $L$ is the set of all linear combinations of $\psi_1, \psi_2, ..., \psi_p$.
Under the assumption that the parametric space $\Omega$ is given by $Y \sim N(X\beta, \sigma^2 I)$ with $rank(X) = p$, $\beta = (\beta_1, ..., \beta_p)'$ and $X$ an $n \times p$ matrix, consider a $p$-dimensional space $L$ of estimable functions generated by a set of $p$ linearly independent estimable functions $\{\psi_1, \psi_2, ..., \psi_p\}$.
For any $\psi \in L$, let $\hat{\psi} = \sum_{i=1}^{n} C_i y_i$ be its least squares (or maximum likelihood) estimator, with
$$ Var(\hat{\psi}) = \sigma^2 \sum_{i=1}^{n} C_i^2 = \sigma_{\psi}^2 \;\; (say) $$
and
$$ \hat{\sigma}_{\psi}^2 = s^2 \sum_{i=1}^{n} C_i^2, $$
where $s^2$ is the mean square due to error with $(n - p)$ degrees of freedom.
The statement of the S-method is as follows:
Under the parametric space $\Omega$, the probability is $(1-\alpha)$ that simultaneously for all $\psi \in L$,
$$ \hat{\psi} - S\hat{\sigma}_{\hat{\psi}} \leq \psi \leq \hat{\psi} + S\hat{\sigma}_{\hat{\psi}}, $$
where the constant $S = \sqrt{p\, F_{1-\alpha}(p, n-p)}$.
Method: For a given space $L$ of estimable functions and confidence coefficient $(1-\alpha)$, the least squares (or maximum likelihood) estimate $\hat{\psi}$ of $\psi \in L$ will be said to be significantly different from zero according to the S-criterion if the confidence interval
$$ \hat{\psi} - S\hat{\sigma}_{\hat{\psi}} \leq \psi \leq \hat{\psi} + S\hat{\sigma}_{\hat{\psi}} $$
does not cover $\psi = 0$, i.e., if $|\hat{\psi}| > S\hat{\sigma}_{\hat{\psi}}$.
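A minimal sketch of the S-criterion for an arbitrary contrast follows, using SciPy's F-quantile and the constant $S = \sqrt{p F_{1-\alpha}(p, n-p)}$ as stated above; the data and contrast coefficients are hypothetical, and note that some texts use $p - 1$ in place of $p$ for contrasts among $p$ means.

```python
import numpy as np
from scipy import stats

# Hypothetical one-way layout with p = 3 groups (unequal sizes allowed)
data = {
    "A": np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3]),
    "B": np.array([13.0, 13.4, 12.8, 13.1]),
    "C": np.array([12.2, 12.0, 12.4, 11.9, 12.3]),
}
p = len(data)
N = sum(len(y) for y in data.values())
df_error = N - p
s2 = sum(((y - y.mean()) ** 2).sum() for y in data.values()) / df_error

means = np.array([y.mean() for y in data.values()])
ns = np.array([len(y) for y in data.values()])

# An arbitrary contrast, e.g. psi = beta_A - (beta_B + beta_C) / 2
C = np.array([1.0, -0.5, -0.5])
psi_hat = C @ means
sigma_hat_psi = np.sqrt(s2 * np.sum(C**2 / ns))

# Scheffe constant S = sqrt(p * F_{1-alpha}(p, n - p)), following the notes' statement
alpha = 0.05
S = np.sqrt(p * stats.f.ppf(1 - alpha, p, df_error))

lower, upper = psi_hat - S * sigma_hat_psi, psi_hat + S * sigma_hat_psi
verdict = "significant" if abs(psi_hat) > S * sigma_hat_psi else "not significant"
print(f"psi_hat = {psi_hat:.3f}, Scheffe interval = [{lower:.3f}, {upper:.3f}] -> {verdict}")
```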
The S-method is less sensitive to the violation of assumptions of normality and homogeneity of
variances.
Comparison of Tukey’s and Scheffe’s methods:
1. Tukey's method can be used only with equal sample sizes for all factor levels, but the S-method is applicable whether the sample sizes are equal or not.
2. Although Tukey's method is applicable for any general contrast, the procedure is more powerful when comparing simple pairwise differences rather than making more complex comparisons.
3. If only pairwise comparisons are of interest, and all factor levels have equal sample sizes, Tukey's method gives shorter confidence intervals and is thus more powerful.
4. In the case of comparisons involving general contrasts, Scheffe's method tends to give narrower confidence intervals and provides a more powerful test.
5. Scheffe's method is less sensitive to violations of the assumptions of normal distribution and homogeneity of variances.