Chapter 2
General Linear Hypothesis and Analysis of Variance

Regression model for the general linear hypothesis
Let $Y_1, Y_2, \ldots, Y_n$ be a sequence of $n$ independent random variables associated with the responses. Then we can write
$E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, p$
$Var(Y_i) = \sigma^2.$
This is the linear model in expectation form, where $\beta_1, \beta_2, \ldots, \beta_p$ are the unknown parameters and the $x_{ij}$'s are the known values of the independent covariates $X_1, X_2, \ldots, X_p$.

Alternatively, the linear model can be expressed as
$Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, p$
where the $\varepsilon_i$'s are identically and independently distributed random error components with mean 0 and variance $\sigma^2$, i.e.,
$E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j) = 0 \ (i \neq j)$.

In matrix notation, the linear model can be expressed as
$Y = X\beta + \varepsilon$
where
• $Y = (Y_1, Y_2, \ldots, Y_n)'$ is an $n \times 1$ vector of observations on the response variable,
• $X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$ is an $n \times p$ matrix of $n$ observations on the $p$ independent covariates $X_1, X_2, \ldots, X_p$,
• $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown regression parameters (or regression coefficients) $\beta_1, \beta_2, \ldots, \beta_p$ associated with $X_1, X_2, \ldots, X_p$, respectively, and
• $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors or disturbances.
We assume that $E(\varepsilon) = 0$, the covariance matrix $V(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I_n$, and rank$(X) = p$.

In the context of analysis of variance and design of experiments, the matrix $X$ is termed the design matrix; the unknown $\beta_1, \beta_2, \ldots, \beta_p$ are termed effects, and the covariates $X_1, X_2, \ldots, X_p$ are counter (or indicator) variables, where $x_{ij}$ counts the number of times the effect $\beta_j$ occurs in the $i$th observation $x_i$. Mostly $x_{ij}$ takes the value 1 or 0, but not always. The value $x_{ij} = 1$ indicates the presence of the effect $\beta_j$ in $x_i$ and $x_{ij} = 0$ indicates its absence. Note that in the linear regression model the covariates are usually continuous variables. When some of the covariates are counter variables and the rest are continuous variables, the model is called a mixed model and is used in the analysis of covariance.

Relationship between the regression model and the analysis of variance model
The same linear model is used in linear regression analysis as well as in the analysis of variance, so it is important to understand the role of the linear model in both contexts. Consider the multiple linear model
$Y = \beta_0 + X_1\beta_1 + X_2\beta_2 + \ldots + X_p\beta_p + \varepsilon.$
In the analysis of variance model,
• the one-way classification considers only one covariate,
• the two-way classification model considers two covariates,
• the three-way classification model considers three covariates, and so on.
If $\beta$, $\gamma$ and $\delta$ denote the effects associated with the covariates $X$, $Z$ and $W$, which are counter variables, then
One-way model:   $Y = \alpha + X\beta + \varepsilon$
Two-way model:   $Y = \alpha + X\beta + Z\gamma + \varepsilon$
Three-way model: $Y = \alpha + X\beta + Z\gamma + W\delta + \varepsilon$
and so on.

Consider an example of agricultural yield. The study variable $Y$ denotes the yield, which depends on various covariates $X_1, X_2, \ldots, X_p$.
In the case of regression analysis, the covariates $X_1, X_2, \ldots, X_p$ are different variables like temperature, quantity of fertilizer, amount of irrigation, etc. Now consider the one-way model and its interpretation in terms of the multiple regression model. The covariate $X$ is now measured at different levels. For example, if $X$ is the quantity of fertilizer with $p$ possible values, say 1 Kg, 2 Kg, ..., $p$ Kg, then $X_1, X_2, \ldots, X_p$ denote these $p$ levels in the following way. The linear model can be expressed as
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon$
by defining
$X_1 = 1$ if the effect of 1 Kg fertilizer is present, $X_1 = 0$ if it is absent;
$X_2 = 1$ if the effect of 2 Kg fertilizer is present, $X_2 = 0$ if it is absent;
...
$X_p = 1$ if the effect of $p$ Kg fertilizer is present, $X_p = 0$ if it is absent.

If the effect of 1 Kg of fertilizer is present, then the other effects are obviously absent and the linear model becomes
$Y = \beta_0 + \beta_1(X_1 = 1) + \beta_2(X_2 = 0) + \ldots + \beta_p(X_p = 0) + \varepsilon = \beta_0 + \beta_1 + \varepsilon.$
If the effect of 2 Kg of fertilizer is present, then
$Y = \beta_0 + \beta_1(X_1 = 0) + \beta_2(X_2 = 1) + \ldots + \beta_p(X_p = 0) + \varepsilon = \beta_0 + \beta_2 + \varepsilon.$
If the effect of $p$ Kg of fertilizer is present, then
$Y = \beta_0 + \beta_1(X_1 = 0) + \beta_2(X_2 = 0) + \ldots + \beta_p(X_p = 1) + \varepsilon = \beta_0 + \beta_p + \varepsilon,$
and so on.

If the experiment with 1 Kg of fertilizer is repeated $n_1$ times, then $n_1$ observations on the response variable are recorded as
$Y_{11} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{11}$
$Y_{12} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{12}$
$\vdots$
$Y_{1n_1} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{1n_1}.$
If $X_2 = 1$ is repeated $n_2$ times, then on the same lines $n_2$ observations are recorded as
$Y_{21} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{21}$
$\vdots$
$Y_{2n_2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{2n_2}.$
The experiment is continued, and if $X_p = 1$ is repeated $n_p$ times, then
$Y_{p1} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{p1}$
$\vdots$
$Y_{pn_p} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{pn_p}.$

All these $n_1, n_2, \ldots, n_p$ observations can be represented together as
$\begin{pmatrix} y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{p1} \\ \vdots \\ y_{pn_p} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \vdots \\ \varepsilon_{1n_1} \\ \varepsilon_{21} \\ \vdots \\ \varepsilon_{2n_2} \\ \vdots \\ \varepsilon_{p1} \\ \vdots \\ \varepsilon_{pn_p} \end{pmatrix}$
(the first $n_1$ rows correspond to 1 Kg, the next $n_2$ rows to 2 Kg, and the last $n_p$ rows to $p$ Kg), or
$Y = X\beta + \varepsilon.$

In the two-way analysis of variance model, there are two covariates and the linear model is expressible as
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \gamma_1 Z_1 + \gamma_2 Z_2 + \ldots + \gamma_q Z_q + \varepsilon$
where $X_1, X_2, \ldots, X_p$ denote, e.g., the $p$ levels of the quantity of fertilizer, say 1 Kg, 2 Kg, ..., $p$ Kg, and $Z_1, Z_2, \ldots, Z_q$ denote, e.g., the $q$ levels of irrigation, say 10 Cms, 20 Cms, ..., $10q$ Cms. The variables $X_1, X_2, \ldots, X_p, Z_1, Z_2, \ldots, Z_q$ are counter variables indicating the presence or absence of the corresponding effect, as in the earlier case. If the effects of $X_1$ and $Z_1$ are present, i.e., 1 Kg of fertilizer and 10 Cms of irrigation are used, then the linear model is
$Y = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \gamma_1 \cdot 1 + \gamma_2 \cdot 0 + \ldots + \gamma_q \cdot 0 + \varepsilon = \beta_0 + \beta_1 + \gamma_1 + \varepsilon.$
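As a side note on implementation, the indicator coding used in these models is mechanical to construct. The following is a minimal sketch (Python with NumPy; the replication numbers and number of levels are purely hypothetical) of the one-way design matrix with a column of ones for the general mean and one indicator column per fertilizer level, as laid out above.

```python
import numpy as np

# Hypothetical replication pattern: level 1 used n1 = 2 times,
# level 2 used n2 = 3 times, level 3 used n3 = 2 times (p = 3 levels).
levels = np.array([1, 1, 2, 2, 2, 3, 3])
n, p = len(levels), levels.max()

# x_ij = 1 if the effect of level j is present in observation i, 0 otherwise
X_levels = np.zeros((n, p))
X_levels[np.arange(n), levels - 1] = 1

# Column of ones for the general mean effect beta_0, as in the model above
X = np.column_stack([np.ones(n), X_levels])
print(X)
```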
If $X_2 = 1$ and $Z_2 = 1$ are used, then the model is
$Y = \beta_0 + \beta_2 + \gamma_2 + \varepsilon.$
The design matrix can be written accordingly, as in the one-way analysis of variance case.

In the three-way analysis of variance model,
$Y = \alpha + \beta_1 X_1 + \ldots + \beta_p X_p + \gamma_1 Z_1 + \ldots + \gamma_q Z_q + \delta_1 W_1 + \ldots + \delta_r W_r + \varepsilon.$

The regression parameters $\beta$'s can be fixed or random.
• If all $\beta$'s are unknown constants, they are called parameters of the model and the model is called a fixed effect model, or model I. The objective in this case is to make inferences about the parameters and the error variance $\sigma^2$.
• If for some $j$, $x_{ij} = 1$ for all $i = 1, 2, \ldots, n$, then $\beta_j$ is termed an additive constant. In this case $\beta_j$ occurs with every observation, so it is also called the general mean effect.
• If all $\beta$'s except the additive constant are unobservable random variables, then the linear model is termed a random effect model, model II, or variance components model. The objective in this case is to make inferences about the variances of the $\beta$'s, i.e., $\sigma^2_{\beta_1}, \sigma^2_{\beta_2}, \ldots, \sigma^2_{\beta_p}$, the error variance $\sigma^2$, and/or certain functions of them.
• If some parameters are fixed and some are random variables, then the model is called a mixed effect model, or model III. In a mixed effect model, at least one $\beta_j$ is a constant and at least one $\beta_j$ is a random variable. The objective is to make inferences about the fixed effect parameters, the variances of the random effects, and the error variance $\sigma^2$.

Analysis of variance
Analysis of variance is a body of statistical methods for analyzing measurements assumed to be structured as
$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n$
where the $x_{ij}$ are integers, generally 0 or 1, indicating usually the absence or presence of the effects $\beta_j$, and the $\varepsilon_i$'s are assumed to be identically and independently distributed with mean 0 and variance $\sigma^2$. The $\varepsilon_i$'s can additionally be assumed to follow a normal distribution $N(0, \sigma^2)$. This assumption is needed for maximum likelihood estimation of the parameters from the beginning of the analysis, but in least squares estimation it is needed only when conducting tests of hypotheses and confidence interval estimation of the parameters. The least squares method does not require any distributional assumption such as normality up to the stage of estimation of the parameters. We need some basic concepts to develop the tools.

Least squares estimate of $\beta$:
Let $y_1, y_2, \ldots, y_n$ be a sample of observations on $Y_1, Y_2, \ldots, Y_n$. The least squares estimate of $\beta$ is the value $\hat{\beta}$ of $\beta$ for which the sum of squares due to errors, i.e.,
$S^2 = \sum_{i=1}^{n}\varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta,$
is minimum, where $y = (y_1, y_2, \ldots, y_n)'$. Differentiating $S^2$ with respect to $\beta$ and setting the derivative to zero, the normal equations are obtained as
$\dfrac{dS^2}{d\beta} = 2X'X\beta - 2X'y = 0$
or
$X'X\beta = X'y.$
If $X$ has full rank $p$, then $X'X$ has a unique inverse and the unique least squares estimate of $\beta$ is
$\hat{\beta} = (X'X)^{-1}X'y,$
which is the best linear unbiased estimator of $\beta$ in the sense of having minimum variance in the class of linear and unbiased estimators. If the rank of $X$ is not full, then a generalized inverse is used in place of $(X'X)^{-1}$.

If $L'\beta$ is a linear parametric function, where $L = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a non-null vector, then the least squares estimate of $L'\beta$ is $L'\hat{\beta}$.
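As a small numerical companion to the normal equations above, the following sketch (Python with NumPy; the simulated design, coefficients and noise level are illustrative assumptions) computes $\hat{\beta} = (X'X)^{-1}X'y$ by solving $X'X\beta = X'y$ and evaluates a linear parametric function $L'\hat{\beta}$.

```python
import numpy as np

# Simulated full-rank design (illustrative values only)
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations X'X beta = X'y; solving is numerically safer than an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares estimate of a linear parametric function L'beta
L = np.array([1.0, 1.0, 0.0])
print("beta_hat   :", beta_hat)
print("L'beta_hat :", L @ beta_hat)
```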
A question arises: under what conditions does a linear parametric function $L'\beta$ admit a unique least squares estimate in the general case? The concept of an estimable function is needed to find such conditions.

Estimable functions:
A linear function $\lambda'\beta$ of the parameters with known $\lambda$ is said to be an estimable parametric function (or estimable) if there exists a linear function $L'Y$ of $Y$ such that
$E(L'Y) = \lambda'\beta$ for all $\beta \in \mathbb{R}^p$.
Note that not all parametric functions are estimable. The following results will be useful in understanding the further topics.

Theorem 1: A linear parametric function $L'\beta$ admits a unique least squares estimate if and only if $L'\beta$ is estimable.

Theorem 2 (Gauss-Markov theorem): If the linear parametric function $L'\beta$ is estimable, then the linear estimator $L'\hat{\beta}$, where $\hat{\beta}$ is a solution of $X'X\hat{\beta} = X'Y$, is the best linear unbiased estimator of $L'\beta$ in the sense of having minimum variance in the class of all linear and unbiased estimators of $L'\beta$.

Theorem 3: If the linear parametric functions $\phi_1 = \ell_1'\beta, \ \phi_2 = \ell_2'\beta, \ldots, \phi_k = \ell_k'\beta$ are estimable, then any linear combination of $\phi_1, \phi_2, \ldots, \phi_k$ is also estimable.

Theorem 4: All linear parametric functions in $\beta$ are estimable if and only if $X$ has full rank.

If $X$ is not of full rank, then some linear parametric functions do not admit unbiased linear estimators and nothing can be inferred about them. The linear parametric functions which are not estimable are said to be confounded. A possible solution to this problem is to add linear restrictions on $\beta$ so as to reduce the linear model to full rank.

Theorem 5: Let $L_1'\beta$ and $L_2'\beta$ be two estimable parametric functions and let $L_1'\hat{\beta}$ and $L_2'\hat{\beta}$ be their least squares estimators. Then
$Var(L_1'\hat{\beta}) = \sigma^2 L_1'(X'X)^{-1}L_1$
$Cov(L_1'\hat{\beta}, L_2'\hat{\beta}) = \sigma^2 L_1'(X'X)^{-1}L_2,$
assuming that $X$ is a full rank matrix. If not, the generalized inverse of $X'X$ can be used in place of the unique inverse.

Estimator of $\sigma^2$ based on least squares estimation:
Consider the estimator
$\hat{\sigma}^2 = \dfrac{1}{n-p}(y - X\hat{\beta})'(y - X\hat{\beta})$
$= \dfrac{1}{n-p}[y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$
$= \dfrac{1}{n-p}y'[I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X']y$
$= \dfrac{1}{n-p}y'[I - X(X'X)^{-1}X']y,$
where $[I - X(X'X)^{-1}X']$ is an idempotent matrix with trace
$tr[I - X(X'X)^{-1}X'] = tr\,I_n - tr\,X(X'X)^{-1}X' = n - tr\,(X'X)^{-1}X'X$ (using the result $tr(AB) = tr(BA)$) $= n - tr\,I_p = n - p.$
Note that using $E(y'Ay) = \mu'A\mu + tr(A\Sigma)$, we have
$E(\hat{\sigma}^2) = \dfrac{\sigma^2}{n-p}\,tr[I - X(X'X)^{-1}X'] = \sigma^2,$
and so $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.

Maximum likelihood estimation
The least squares method does not use any distributional assumption on the random variables for the estimation of the parameters; the distributional assumption is needed there only for constructing the tests of hypotheses and the confidence intervals. For maximum likelihood estimation, the distributional assumption is needed from the beginning. Suppose $y_1, y_2, \ldots, y_n$ are independently distributed following a normal distribution with mean $E(y_i) = \sum_{j=1}^{p}\beta_j x_{ij}$ and variance $Var(y_i) = \sigma^2$ $(i = 1, 2, \ldots, n)$. Then the likelihood function of $y_1, y_2, \ldots, y_n$ is
$L(y \mid \beta, \sigma^2) = \dfrac{1}{(2\pi)^{n/2}(\sigma^2)^{n/2}}\exp\left[-\dfrac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right]$
where $y = (y_1, y_2, \ldots, y_n)'$.
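Before maximizing this likelihood, here is a quick numerical check (a minimal sketch in Python/NumPy with simulated, purely illustrative data) of the unbiasedness result derived above for $\hat{\sigma}^2$ with divisor $n-p$, contrasted with the divisor $n$ that the maximum likelihood derivation below will produce.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 30, 4, 4.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

rss_values = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    rss_values.append(resid @ resid)            # (y - X beta_hat)'(y - X beta_hat)
rss = np.array(rss_values)

print("mean of RSS/(n-p):", (rss / (n - p)).mean())   # close to sigma^2 = 4
print("mean of RSS/n    :", (rss / n).mean())         # close to (n-p)/n * sigma^2 ~ 3.47
```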
The log-likelihood is
$\ln L(y \mid \beta, \sigma^2) = -\dfrac{n}{2}\ln 2\pi - \dfrac{n}{2}\ln\sigma^2 - \dfrac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$
Differentiating the log-likelihood with respect to $\beta$ and $\sigma^2$, we have
$\dfrac{\partial \ln L}{\partial \beta} = 0 \Rightarrow X'X\beta = X'y$
$\dfrac{\partial \ln L}{\partial \sigma^2} = 0 \Rightarrow \sigma^2 = \dfrac{1}{n}(y - X\beta)'(y - X\beta).$
Assuming full rank of $X$, the normal equations are solved and the maximum likelihood estimators are obtained as
$\tilde{\beta} = (X'X)^{-1}X'y$
$\tilde{\sigma}^2 = \dfrac{1}{n}(y - X\tilde{\beta})'(y - X\tilde{\beta}) = \dfrac{1}{n}y'[I - X(X'X)^{-1}X']y.$
The second order conditions can be checked, and they are satisfied, so $\tilde{\beta}$ and $\tilde{\sigma}^2$ are indeed the maximum likelihood estimators.

Note that the maximum likelihood estimator $\tilde{\beta}$ is the same as the least squares estimator $\hat{\beta}$, and
• $\tilde{\beta}$ is an unbiased estimator of $\beta$, i.e., $E(\tilde{\beta}) = \beta$, like the least squares estimator, but
• $\tilde{\sigma}^2$ is not an unbiased estimator of $\sigma^2$, since $E(\tilde{\sigma}^2) = \dfrac{n-p}{n}\sigma^2 \neq \sigma^2$, unlike the least squares based estimator $\hat{\sigma}^2$.

Now we use the following theorems for developing the tests of hypotheses.

Theorem 6: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Then $Y'AY$ follows a noncentral chi-square distribution with $p$ degrees of freedom and noncentrality parameter $\mu'A\mu$, i.e., $\chi^2(p, \mu'A\mu)$, if and only if $\Sigma A$ is an idempotent matrix of rank $p$.

Theorem 7: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $Y'A_1Y$ follow $\chi^2(p_1, \mu'A_1\mu)$ and $Y'A_2Y$ follow $\chi^2(p_2, \mu'A_2\mu)$. Then $Y'A_1Y$ and $Y'A_2Y$ are independently distributed if $A_1\Sigma A_2 = 0$.

Theorem 8: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \sigma^2 I)$. Then the maximum likelihood (or least squares) estimator $L'\hat{\beta}$ of an estimable linear parametric function is independently distributed of $\hat{\sigma}^2$; $L'\hat{\beta}$ follows $N(L'\beta, \sigma^2 L'(X'X)^{-1}L)$ and $\dfrac{n\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-p)$, where rank$(X) = p$.

Proof: Consider $\hat{\beta} = (X'X)^{-1}X'Y$. Then
$E(L'\hat{\beta}) = L'(X'X)^{-1}X'E(Y) = L'(X'X)^{-1}X'X\beta = L'\beta$
$Var(L'\hat{\beta}) = L'Var(\hat{\beta})L = L'E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']L = \sigma^2 L'(X'X)^{-1}L.$
Since $\hat{\beta}$ is a linear function of $y$ and $L'\hat{\beta}$ is a linear function of $\hat{\beta}$, $L'\hat{\beta}$ follows a normal distribution $N(L'\beta, \sigma^2 L'(X'X)^{-1}L)$.
Let $A = I - X(X'X)^{-1}X'$ and $B = L'(X'X)^{-1}X'$; then
$BY = L'(X'X)^{-1}X'Y = L'\hat{\beta}$
and
$n\hat{\sigma}^2 = (Y - X\beta)'[I - X(X'X)^{-1}X'](Y - X\beta) = Y'AY.$
So, using Theorem 6 with rank$(A) = n - p$, $\dfrac{n\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-p)$. Also,
$BA = L'(X'X)^{-1}X' - L'(X'X)^{-1}X'X(X'X)^{-1}X' = 0,$
so, using Theorem 7, $Y'AY$ and $BY$ are independently distributed.

Tests of hypothesis in the linear regression model
First we discuss the development of the tests of hypothesis concerning the parameters of a linear regression model. These tests will be used later in the development of tests based on the analysis of variance.

Analysis of variance
The technique of the analysis of variance involves breaking down the total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.

Model
Let $Y_1, Y_2, \ldots, Y_n$ be independently distributed following a normal distribution with mean $E(Y_i) = \sum_{j=1}^{p}\beta_j x_{ij}$ and variance $\sigma^2$.
Denoting $Y = (Y_1, Y_2, \ldots, Y_n)'$ as an $n \times 1$ column vector, this assumption can be expressed in the form of the linear regression model
$Y = X\beta + \varepsilon$
where $X$ is an $n \times p$ matrix, $\beta$ is a $p \times 1$ vector and $\varepsilon$ is an $n \times 1$ vector of disturbances with
$E(\varepsilon) = 0$, $Cov(\varepsilon) = \sigma^2 I$,
and $\varepsilon$ follows a normal distribution. This implies that
$E(Y) = X\beta$, $E[(Y - X\beta)(Y - X\beta)'] = \sigma^2 I.$

Now we consider four different types of tests of hypotheses. In the first two cases, we develop the likelihood ratio test for the null hypothesis related to the analysis of variance. Note that later we will derive the same test on the basis of the least squares principle as well; an important idea behind this development is to demonstrate that the test used in the analysis of variance can be derived using the least squares principle as well as the likelihood ratio test.

Case 1: Consider the null hypothesis
$H_0: \beta = \beta^0$
where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$, $\beta^0 = (\beta_1^0, \beta_2^0, \ldots, \beta_p^0)'$ is specified and $\sigma^2$ is unknown. This null hypothesis is equivalent to
$H_0: \beta_1 = \beta_1^0, \ \beta_2 = \beta_2^0, \ldots, \beta_p = \beta_p^0.$
Assume that all the $\beta_i$'s are estimable, i.e., rank$(X) = p$ (full column rank). We now develop the likelihood ratio test.

The $(p+1)$-dimensional parametric space $\Omega$ is the collection of points
$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \ \sigma^2 > 0, \ i = 1, 2, \ldots, p\}.$
Under $H_0$, all the $\beta$'s are specified (equal to the known $\beta^0$), and $\Omega$ reduces to the one-dimensional space
$\omega = \{(\beta^0, \sigma^2): \sigma^2 > 0\}.$
The likelihood function of $y_1, y_2, \ldots, y_n$ is
$L(y \mid \beta, \sigma^2) = \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\dfrac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$
The likelihood function is maximized over $\Omega$ when $\beta$ and $\sigma^2$ are substituted by their maximum likelihood estimators, i.e.,
$\hat{\beta} = (X'X)^{-1}X'y, \quad \hat{\sigma}^2 = \dfrac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$
Substituting $\hat{\beta}$ and $\hat{\sigma}^2$ in $L(y \mid \beta, \sigma^2)$ gives
$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\dfrac{1}{2\pi\hat{\sigma}^2}\right)^{n/2}\exp\left[-\dfrac{1}{2\hat{\sigma}^2}(y - X\hat{\beta})'(y - X\hat{\beta})\right] = \left[\dfrac{n}{2\pi(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{n/2}\exp\left(-\dfrac{n}{2}\right).$
Under $H_0$, the maximum likelihood estimator of $\sigma^2$ is $\hat{\sigma}_\omega^2 = \dfrac{1}{n}(y - X\beta^0)'(y - X\beta^0)$, and the maximum value of the likelihood function under $H_0$ is
$\max_{\omega} L(y \mid \beta, \sigma^2) = \left[\dfrac{n}{2\pi(y - X\beta^0)'(y - X\beta^0)}\right]^{n/2}\exp\left(-\dfrac{n}{2}\right).$
The likelihood ratio test statistic is
$\lambda = \dfrac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\dfrac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\beta^0)'(y - X\beta^0)}\right]^{n/2}$
$= \left[\dfrac{(y - X\hat{\beta})'(y - X\hat{\beta})}{\{(y - X\hat{\beta}) + (X\hat{\beta} - X\beta^0)\}'\{(y - X\hat{\beta}) + (X\hat{\beta} - X\beta^0)\}}\right]^{n/2}$
$= \left[1 + \dfrac{(\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{-n/2} = \left(1 + \dfrac{q_1}{q_2}\right)^{-n/2}$
where $q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)$ and $q_2 = (y - X\hat{\beta})'(y - X\hat{\beta})$.
The expressions for $q_1$ and $q_2$ can be simplified further as follows.
Consider
$q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)$
$= [(X'X)^{-1}X'y - \beta^0]'X'X[(X'X)^{-1}X'y - \beta^0]$
$= [(X'X)^{-1}X'(y - X\beta^0)]'X'X[(X'X)^{-1}X'(y - X\beta^0)]$
$= (y - X\beta^0)'X(X'X)^{-1}X'X(X'X)^{-1}X'(y - X\beta^0)$
$= (y - X\beta^0)'X(X'X)^{-1}X'(y - X\beta^0)$
and
$q_2 = (y - X\hat{\beta})'(y - X\hat{\beta})$
$= [y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$
$= y'[I - X(X'X)^{-1}X']y$
$= [(y - X\beta^0) + X\beta^0]'[I - X(X'X)^{-1}X'][(y - X\beta^0) + X\beta^0]$
$= (y - X\beta^0)'[I - X(X'X)^{-1}X'](y - X\beta^0),$
the other two terms becoming zero because $[I - X(X'X)^{-1}X']X = 0$.

In order to find the decision rule for $H_0$ based on $\lambda$, we first check whether $\lambda$ is a monotonically increasing or decreasing function of $q_1/q_2$. Let $g = q_1/q_2$, so that
$\lambda = \left(1 + \dfrac{q_1}{q_2}\right)^{-n/2} = (1 + g)^{-n/2};$
then
$\dfrac{d\lambda}{dg} = -\dfrac{n}{2}\dfrac{1}{(1+g)^{\frac{n}{2}+1}} < 0.$
So as $g$ increases, $\lambda$ decreases; thus $\lambda$ is a monotonically decreasing function of $q_1/q_2$. The decision rule is to reject $H_0$ if $\lambda \leq \lambda_0$, where $\lambda_0$ is a constant to be determined on the basis of the size $\alpha$ of the test. Let us simplify this in our context:
$\lambda \leq \lambda_0$
or $\left(1 + \dfrac{q_1}{q_2}\right)^{-n/2} \leq \lambda_0$
or $(1 + g)^{-n/2} \leq \lambda_0$
or $(1 + g) \geq \lambda_0^{-2/n}$
or $g \geq \lambda_0^{-2/n} - 1$
or $g \geq C,$
where $C$ is a constant to be determined by the size $\alpha$ condition of the test. So reject $H_0$ whenever
$\dfrac{q_1}{q_2} \geq C.$

Note that the statistic $q_1/q_2$ can also be obtained by the least squares method as follows (the least squares methodology will also be discussed in further lectures):
$q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0) = \min_{\omega}(y - X\beta)'(y - X\beta) - \min_{\Omega}(y - X\beta)'(y - X\beta),$
i.e., $q_1$, the sum of squares due to deviation from $H_0$ (or sum of squares due to $\beta$), is the difference between the sum of squares due to $H_0$ (the total sum of squares under the restriction) and the sum of squares due to error.

It will be seen later that the test statistic is based on the ratio $q_1/q_2$. In order to find an appropriate distribution of $q_1/q_2$, we use the following theorem.

Theorem 9: Let
$Z = Y - X\beta^0$
$Q_1 = Z'X(X'X)^{-1}X'Z$
$Q_2 = Z'[I - X(X'X)^{-1}X']Z.$
Then $\dfrac{Q_1}{\sigma^2}$ and $\dfrac{Q_2}{\sigma^2}$ are independently distributed. Further, when $H_0$ is true,
$\dfrac{Q_1}{\sigma^2} \sim \chi^2(p)$ and $\dfrac{Q_2}{\sigma^2} \sim \chi^2(n-p),$
where $\chi^2(m)$ denotes the $\chi^2$ distribution with $m$ degrees of freedom.

Proof: Under $H_0$,
$E(Z) = X\beta^0 - X\beta^0 = 0$, $Var(Z) = Var(Y) = \sigma^2 I.$
Further, $Z$ is a linear function of $Y$ and $Y$ follows a normal distribution, so $Z \sim N(0, \sigma^2 I)$. The matrices $X(X'X)^{-1}X'$ and $[I - X(X'X)^{-1}X']$ are idempotent, so
$tr[X(X'X)^{-1}X'] = tr[(X'X)^{-1}X'X] = tr(I_p) = p$
$tr[I - X(X'X)^{-1}X'] = tr\,I_n - tr[X(X'X)^{-1}X'] = n - p.$
So, using Theorem 6, under $H_0$,
$\dfrac{Q_1}{\sigma^2} \sim \chi^2(p)$ and $\dfrac{Q_2}{\sigma^2} \sim \chi^2(n-p),$
where the degrees of freedom $p$ and $(n-p)$ are obtained from the traces of $X(X'X)^{-1}X'$ and $I - X(X'X)^{-1}X'$, respectively. Since
$[I - X(X'X)^{-1}X']\,X(X'X)^{-1}X' = 0,$
Theorem 7 implies that the quadratic forms $Q_1$ and $Q_2$ are independent under $H_0$. Hence the theorem is proved.

Since $Q_1$ and $Q_2$ are independently distributed, under $H_0$
$\dfrac{Q_1/p}{Q_2/(n-p)} = \dfrac{n-p}{p}\dfrac{Q_1}{Q_2}$
follows a central $F$ distribution $F(p, n-p)$.
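A minimal numerical sketch of this Case-1 test (Python with NumPy/SciPy; the simulated data and the hypothesized $\beta^0$ are illustrative assumptions) computes $q_1$, $q_2$ and the $F$ statistic with $(p, n-p)$ degrees of freedom just derived.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, 0.5]) + rng.normal(size=n)
beta0 = np.zeros(p)                        # hypothesized value under H0: beta = beta0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
q1 = (beta_hat - beta0) @ (X.T @ X) @ (beta_hat - beta0)   # SS due to deviation from H0
q2 = (y - X @ beta_hat) @ (y - X @ beta_hat)               # SS due to error

F = (n - p) / p * q1 / q2
print("F =", F, " p-value =", stats.f.sf(F, p, n - p))     # reject H0 if F > F_{1-alpha}(p, n-p)
```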
Hence the constant $C$ in the likelihood ratio test is given by
$C = F_{1-\alpha}(p, n-p),$
where $F_{1-\alpha}(n_1, n_2)$ denotes the upper $\alpha$ percent point (i.e., the $100(1-\alpha)\%$ quantile) of the $F$-distribution with $n_1$ and $n_2$ degrees of freedom.

The computations of this test of hypothesis can be represented in the form of an analysis of variance table.

ANOVA for testing $H_0: \beta = \beta^0$
Source of variation | Degrees of freedom | Sum of squares                  | Mean squares | F-value
Due to $\beta$      | $p$                | $q_1$                           | $q_1/p$      | $\dfrac{n-p}{p}\dfrac{q_1}{q_2}$
Error               | $n-p$              | $q_2$                           | $q_2/(n-p)$  |
Total               | $n$                | $(y - X\beta^0)'(y - X\beta^0)$ |              |

Case 2: Test of a subset of parameters
$H_0: \beta_k = \beta_k^0, \ k = 1, 2, \ldots, r < p$, when $\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p$ and $\sigma^2$ are unknown.
In Case 1, the test of hypothesis was developed for all the $\beta$'s together, in the sense that we test $\beta_i = \beta_i^0$ for each $i = 1, 2, \ldots, p$. Now consider another situation, in which the interest is to test only a subset of $\beta_1, \beta_2, \ldots, \beta_p$, i.e., not all but only a few of the parameters. This type of test can be used, e.g., in the following situation. Suppose five levels of voltage are applied to check the rotations per minute (rpm) of a fan at 160, 180, 200, 220 and 240 volts. In practice, when the voltage is low, the differences in rpm at 160, 180 and 200 volts can be observed easily; at 220 and 240 volts the fan rotates at full speed and there is not much difference in the rpm at these voltages. So the interest of the experimenter lies in testing the hypothesis related to only the first three effects, viz., $\beta_1$ for 160 volts, $\beta_2$ for 180 volts and $\beta_3$ for 200 volts. The null hypothesis in this case can be written as
$H_0: \beta_1 = \beta_1^0, \ \beta_2 = \beta_2^0, \ \beta_3 = \beta_3^0,$
when $\beta_4, \beta_5$ and $\sigma^2$ are unknown. Note that under Case 1 the null hypothesis would be
$H_0: \beta_1 = \beta_1^0, \ \beta_2 = \beta_2^0, \ \beta_3 = \beta_3^0, \ \beta_4 = \beta_4^0, \ \beta_5 = \beta_5^0.$

Let $\beta_1, \beta_2, \ldots, \beta_p$ be the $p$ parameters. We can divide them into two parts, $\beta_1, \beta_2, \ldots, \beta_r$ and $\beta_{r+1}, \ldots, \beta_p$, and we are interested in testing a hypothesis about the first subset. Suppose we want to test the null hypothesis
$H_0: \beta_k = \beta_k^0, \ k = 1, 2, \ldots, r < p$, when $\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p$ and $\sigma^2$ are unknown.
The alternative hypothesis under consideration is
$H_1: \beta_k \neq \beta_k^0$ for at least one $k = 1, 2, \ldots, r$.
In order to develop a test for such a hypothesis, the linear model
$Y = X\beta + \varepsilon$
under the usual assumptions can be rewritten as follows. Partition
$X = (X_1 \ X_2), \quad \beta = \begin{pmatrix}\beta_{(1)} \\ \beta_{(2)}\end{pmatrix}$
where $\beta_{(1)} = (\beta_1, \beta_2, \ldots, \beta_r)'$ and $\beta_{(2)} = (\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p)'$, with orders $X_1: n \times r$, $X_2: n \times (p-r)$, $\beta_{(1)}: r \times 1$ and $\beta_{(2)}: (p-r) \times 1$. The model can be rewritten as
$Y = X\beta + \varepsilon = (X_1 \ X_2)\begin{pmatrix}\beta_{(1)} \\ \beta_{(2)}\end{pmatrix} + \varepsilon = X_1\beta_{(1)} + X_2\beta_{(2)} + \varepsilon.$
The null hypothesis of interest is now
$H_0: \beta_{(1)} = \beta_{(1)}^0 = (\beta_1^0, \beta_2^0, \ldots, \beta_r^0)'$, where $\beta_{(2)}$ and $\sigma^2$ are unknown.
The complete parametric space is
$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \ \sigma^2 > 0, \ i = 1, 2, \ldots, p\}$
and the parametric space under $H_0$ is
$\omega = \{(\beta_{(1)}^0, \beta_{(2)}, \sigma^2): -\infty < \beta_i < \infty, \ \sigma^2 > 0, \ i = r+1, r+2, \ldots, p\}.$
The likelihood function is
L( y β= 2 2πσ 2σ The maximum value of likelihood function under Ω is obtained by substituting the maximum likelihood estimates of β and σ 2 , i.e., βˆ = ( X ′X ) −1 X ′y 1 n σˆ 2 =( y − X βˆ )′( y − X βˆ ) as n 1 2 1 ,σ 2 ) exp − 2 ( y − X βˆ )′( y − X βˆ ) Max L( y β= 2 Ω 2πσˆ 2σˆ n 2 n n exp − . = ' 2 2π ( y − X βˆ ) ( y − X βˆ ) Now we find the maximum value of likelihood function under H 0 . The model under H 0 becomes 0 + X 2 β 2 + ε . The likelihood function under H 0 is Y = X 1β (1) n 1 2 1 0 0 exp − 2 ( y − X 1β (1) − X 2 β (2) )′( y − X 1β (1) − X 2 β (2) ) L( y = β ,σ 2 ) 2 2πσ 2σ n 1 2 1 exp − 2 ( y * − X 2 β (2) )′( y * − X 2 β (2) ) = 2 2πσ 2σ (0) where y*= y − X 1β (1) . Note that β (2) and σ 2 are the unknown parameters . This likelihood function looks like as if it is written for y* ~ N ( X 2 β (2) , σ 2 ). This helps is writing the maximum likelihood estimators of β (2) and σ 2 directly as βˆ(2) = ( X 2' X 2 ) −1 X 2' y * 1 n ( y * − X 2 βˆ(2) )′( y * − X 2 βˆ(2) ). σˆ 2 = . Note that X 2' X 2 is a principal minor of X ′X . Since X ′X is a positive definite matrix, so X 2' X 2 is also positive definite . Thus ( X 2' X 2 ) −1 exists and is unique. Thus the maximum value of likelihood function under H 0 is obtained as Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 19 n 1 2 1 exp − 2 ( y * − X 2 βˆ(2) )′( y * − X 2 βˆ(2) ) Max L( y * βˆ ,= σ ) 2 ω 2πσˆ 2σˆ 2 n 2 n n exp − ˆ ˆ 2 2π ( y * − X 2 β (2) ) '( y * − X 2 β (2) ) 0 The likelihood ratio test statistic for H 0 : β (1) = β (1) is λ= max L( y β , σ 2 ) ω max L( y β , σ 2 ) Ω ( y − X βˆ )′( y − X βˆ ) = ˆ ˆ ( y * − X 2 β (2) )′( y * − X 2 β (2) ) n 2 ( y * − X 2 βˆ(2) )′( y * − X 2 βˆ(2) ) − ( y − X βˆ )′( y − X βˆ ) + ( y − X βˆ )′( y − X βˆ ) = ( y − X βˆ )′( y − X βˆ ) ( y * − X 2 βˆ(2) )′( y * − X 2 βˆ(2) ) − ( y − X βˆ )′( y − X βˆ ) = 1 + ( y − X βˆ )′( y − X βˆ ) -n q1 2 = 1 + q2 - - n2 n 2 where q1 = ( y * − X 2 βˆ(2) )′( y * − X 2 βˆ(2) ) − ( y − X βˆ )′( y − X βˆ ) and q2 = ( y − X βˆ )′( y − X βˆ ). Now we simplify q1 and q2 . Consider ( y * − X 2 βˆ(2) )′( y * − X 2 βˆ(2) )′ = = ( y * − X 2 ( X 2' X 2 ) −1 X 2' y*)′( y * − X 2 ( X 2' X 2 ) −1 X 2' y*) = y *' I − X 2 ( X 2' X 2 ) −1 X 2' y * ' 0 0 ( y − X 1β (1) = − X 2 β (2) ) + X 2 β (2) I − X 2 ( X 2' X 2 ) −1 X 2' ( y − X 1β (1) − X 2 β (2) ) + X 2 β (2) 0 0 = ( y − X 1β (1) − X 2 β (2) )′ I − X 2 ( X 2' X 2 ) −1 X 2' ( y − X 1β (1) − X 2 β (2) ). 0. The other terms becomes zero using the result X 2' I − X 2 ( X 2' X 2 ) −1 X 2' = 0 0 Note that under H 0 , X 1β (1) + X 2 β (2) can be expressed as ( X 1 X 2 )( β (1) β (2) ) ' , Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 20 Consider ( y − X βˆ )′( y − X βˆ )′ = = ( y − X ( X ' X ) −1 X ' y )′( y − X ( X ' X ) −1 X ' y ) = y′ I − X ( X ′X ) −1 X ' y ′ 0 0 0 0 − X 2 βˆ(2) ) + X 1β (1) + X 2 β (2) ) I − X ( X ' X ) −1 X ′ ( y − X 1β (1) − X 2 βˆ(2) ) + X 1β (1) + X 2 β (2) ) = ( y − X 1β (1) 0 0 − ( y − X 1β (1) − X 2 β (2) ) I − X ( X ′X ) −1 X ′ ( y − X 1β (1) − X 2 β (2) ) 0 0 − X 2 β (2) ) ' I − X ( X ′X ) −1 X ′ ( y − X 1β (1) − X 2 β (2) ) = ( y − X 1β (1) 0. and other term becomes zero using the result X ' I − X ( X ′X ) −1 X ′ = 0 Note that under H 0 , the term X 1β (1) + X 2 β (2) can be expressed as ( X 1 0 X 2 )( β (1) β (2) ) ' . 
Thus
$q_1 = (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta})$
$= y^{*\prime}[I - X_2(X_2'X_2)^{-1}X_2']y^* - y'[I - X(X'X)^{-1}X']y$
$= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'[X(X'X)^{-1}X' - X_2(X_2'X_2)^{-1}X_2'](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})$
and
$q_2 = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'[I - X(X'X)^{-1}X']y = (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'[I - X(X'X)^{-1}X'](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}),$
the other terms becoming zero. Note that in simplifying $q_1$ and $q_2$, we have written them as quadratic forms in the same variable $(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})$.

Using the same argument as in Case 1, since $\lambda$ is a monotonically decreasing function of $q_1/q_2$, the likelihood ratio test rejects $H_0$ whenever
$\dfrac{q_1}{q_2} > C$
where $C$ is a constant to be determined by the size $\alpha$ of the test. The likelihood ratio test statistic can also be obtained through the least squares method as follows:
$(q_1 + q_2)$: minimum value of $(y - X\beta)'(y - X\beta)$ when $H_0: \beta_{(1)} = \beta_{(1)}^0$ holds true, i.e., the sum of squares due to $H_0$;
$q_2$: sum of squares due to error;
$q_1$: sum of squares due to the deviation from $H_0$, or sum of squares due to $\beta_{(1)}$ adjusted for $\beta_{(2)}$.
If $\beta_{(1)}^0 = 0$, then
$q_1 = (y - X_2\hat{\beta}_{(2)})'(y - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta}) = (y'y - \hat{\beta}_{(2)}'X_2'y) - (y'y - \hat{\beta}'X'y) = \hat{\beta}'X'y - \hat{\beta}_{(2)}'X_2'y,$
where $\hat{\beta}'X'y$ is the reduction sum of squares (the sum of squares due to $\beta$) and $\hat{\beta}_{(2)}'X_2'y$ is the sum of squares due to $\beta_{(2)}$ ignoring $\beta_{(1)}$.

Now we have the following theorem, based on Theorems 6 and 7.

Theorem 10: Let
$Z = Y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}$
$Q_1 = Z'AZ$, $Q_2 = Z'BZ$
where $A = X(X'X)^{-1}X' - X_2(X_2'X_2)^{-1}X_2'$ and $B = I - X(X'X)^{-1}X'$. Then $\dfrac{Q_1}{\sigma^2}$ and $\dfrac{Q_2}{\sigma^2}$ are independently distributed; further, $\dfrac{Q_1}{\sigma^2} \sim \chi^2(r)$ and $\dfrac{Q_2}{\sigma^2} \sim \chi^2(n-p)$.

Thus under $H_0$,
$\dfrac{Q_1/r}{Q_2/(n-p)} = \dfrac{n-p}{r}\dfrac{Q_1}{Q_2}$
follows an $F$-distribution $F(r, n-p)$. Hence the constant $C$ in $\lambda$ is $C = F_{1-\alpha}(r, n-p)$, where $F_{1-\alpha}(r, n-p)$ denotes the upper $\alpha$ percent point of the $F$-distribution with $r$ and $(n-p)$ degrees of freedom.

The analysis of variance table for this null hypothesis is as follows:

ANOVA for testing $H_0: \beta_{(1)} = \beta_{(1)}^0$
Source of variation  | Degrees of freedom | Sum of squares | Mean squares | F-value
Due to $\beta_{(1)}$ | $r$                | $q_1$          | $q_1/r$      | $\dfrac{n-p}{r}\dfrac{q_1}{q_2}$
Error                | $n-p$              | $q_2$          | $q_2/(n-p)$  |
Total                | $n - (p - r)$      | $q_1 + q_2$    |              |
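Before moving on, here is a minimal sketch of this Case-2 subset test (Python with NumPy/SciPy; the simulated data, partition sizes and $\beta_{(1)}^0$ are illustrative assumptions). It computes $q_2$ from the unrestricted fit, $q_1$ as the extra sum of squares under $H_0$, and the $F$ statistic with $(r, n-p)$ degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, r, p = 30, 2, 5                               # test the first r of the p parameters
X = rng.normal(size=(n, p))
y = X @ np.array([0.0, 0.0, 1.0, -1.0, 0.5]) + rng.normal(size=n)
X1, X2 = X[:, :r], X[:, r:]
beta1_0 = np.zeros(r)                            # hypothesized value of beta_(1)

# Unrestricted fit: q2 = (y - X beta_hat)'(y - X beta_hat)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
q2 = np.sum((y - X @ beta_hat) ** 2)

# Restricted fit under H0: regress y* = y - X1 beta_(1)^0 on X2; q1 is the extra SS
y_star = y - X1 @ beta1_0
beta2_hat = np.linalg.solve(X2.T @ X2, X2.T @ y_star)
q1 = np.sum((y_star - X2 @ beta2_hat) ** 2) - q2

F = (n - p) / r * q1 / q2
print("F =", F, " p-value =", stats.f.sf(F, r, n - p))
```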
Case 3: Test of $H_0: L'\beta = \delta$
Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric function $L'\beta$ is estimable, where $L = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a $p \times 1$ vector of known constants and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$. The null hypothesis of interest is
$H_0: L'\beta = \delta,$
where $\delta$ is some specified constant.
Consider the setup of the linear model $Y = X\beta + \varepsilon$, where $Y = (Y_1, Y_2, \ldots, Y_n)'$ follows $N(X\beta, \sigma^2 I)$. The maximum likelihood estimators of $\beta$ and $\sigma^2$ are
$\hat{\beta} = (X'X)^{-1}X'y$ and $\hat{\sigma}^2 = \dfrac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}),$
respectively. The maximum likelihood estimate of the estimable function $L'\beta$ is $L'\hat{\beta}$, with
$E(L'\hat{\beta}) = L'\beta$
$Var(L'\hat{\beta}) = \sigma^2 L'(X'X)^{-1}L$
$L'\hat{\beta} \sim N(L'\beta, \sigma^2 L'(X'X)^{-1}L)$ and $\dfrac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n-p),$
assuming $X$ to be of full column rank. Further, $L'\hat{\beta}$ and $\dfrac{n\hat{\sigma}^2}{\sigma^2}$ are independently distributed. Under $H_0: L'\beta = \delta$, the statistic
$t = \dfrac{\sqrt{n-p}\,(L'\hat{\beta} - \delta)}{\sqrt{n\hat{\sigma}^2\,L'(X'X)^{-1}L}}$
follows a $t$-distribution with $(n-p)$ degrees of freedom. So the test for $H_0: L'\beta = \delta$ against $H_1: L'\beta \neq \delta$ rejects $H_0$ whenever
$|t| \geq t_{1-\frac{\alpha}{2}}(n-p),$
where $t_{1-\alpha}(n_1)$ denotes the upper $\alpha$ percent point of the $t$-distribution with $n_1$ degrees of freedom.

Case 4: Test of $H_0: \phi_1 = \delta_1, \phi_2 = \delta_2, \ldots, \phi_k = \delta_k$
Now we develop the test of hypothesis related to more than one linear parametric function. Let the $i$th estimable linear parametric function be $\phi_i = L_i'\beta$, and suppose there are $k$ such functions, with $L_i$ and $\beta$ both $p \times 1$ vectors as in Case 3. Our interest is to test the hypothesis
$H_0: \phi_1 = \delta_1, \ \phi_2 = \delta_2, \ldots, \phi_k = \delta_k,$
where $\delta_1, \delta_2, \ldots, \delta_k$ are known constants. Let $\phi = (\phi_1, \phi_2, \ldots, \phi_k)'$ and $\delta = (\delta_1, \delta_2, \ldots, \delta_k)'$. Then $H_0$ is expressible as $H_0: \phi = L'\beta = \delta$, where $L'$ is a $k \times p$ matrix of constants associated with $L_1, L_2, \ldots, L_k$. The maximum likelihood estimator of $\phi_i$ is $\hat{\phi}_i = L_i'\hat{\beta}$, where $\hat{\beta} = (X'X)^{-1}X'y$, and $\hat{\phi} = (\hat{\phi}_1, \hat{\phi}_2, \ldots, \hat{\phi}_k)' = L'\hat{\beta}$. Also,
$E(\hat{\phi}) = \phi$
$Cov(\hat{\phi}) = \sigma^2 V$, where $V = ((L_i'(X'X)^{-1}L_j))$, i.e., $L_i'(X'X)^{-1}L_j$ is the $(i,j)$th element of $V$.
Thus
$\dfrac{(\hat{\phi} - \phi)'V^{-1}(\hat{\phi} - \phi)}{\sigma^2}$
follows a $\chi^2$-distribution with $k$ degrees of freedom, and $\dfrac{n\hat{\sigma}^2}{\sigma^2}$ follows a $\chi^2$-distribution with $(n-p)$ degrees of freedom, where $\hat{\sigma}^2 = \dfrac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta})$ is the maximum likelihood estimator of $\sigma^2$. Further, $\dfrac{(\hat{\phi} - \phi)'V^{-1}(\hat{\phi} - \phi)}{\sigma^2}$ and $\dfrac{n\hat{\sigma}^2}{\sigma^2}$ are independently distributed. Thus under $H_0: \phi = \delta$,
$F = \dfrac{(\hat{\phi} - \delta)'V^{-1}(\hat{\phi} - \delta)/k}{n\hat{\sigma}^2/(n-p)} = \dfrac{n-p}{k}\dfrac{(\hat{\phi} - \delta)'V^{-1}(\hat{\phi} - \delta)}{n\hat{\sigma}^2}$
follows an $F$-distribution with $k$ and $(n-p)$ degrees of freedom. So the hypothesis $H_0: \phi = \delta$ is rejected against $H_1$: at least one $\phi_i \neq \delta_i$ $(i = 1, 2, \ldots, k)$ whenever $F \geq F_{1-\alpha}(k, n-p)$, where $F_{1-\alpha}(k, n-p)$ denotes the upper $\alpha$ percent point of the $F$-distribution with $k$ and $(n-p)$ degrees of freedom.
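The Case-3 test is equally direct to compute. The following sketch (Python with NumPy/SciPy; the simulated data, $L$ and $\delta$ are illustrative assumptions) evaluates the $t$ statistic for $H_0: L'\beta = \delta$ using the unbiased error variance estimate $s^2 = q_2/(n-p)$, which is algebraically the same statistic as above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

L = np.array([1.0, -1.0, 0.0])                   # estimable linear parametric function L'beta
delta = 0.0                                      # hypothesized value under H0: L'beta = delta

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)   # unbiased estimator of sigma^2

t = (L @ beta_hat - delta) / np.sqrt(s2 * (L @ XtX_inv @ L))
p_value = 2 * stats.t.sf(abs(t), n - p)          # reject H0 if |t| >= t_{1-alpha/2}(n-p)
print("t =", t, " p-value =", p_value)
```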
One-way classification with fixed effect linear models of full rank:
The objective in the one-way classification is to test the hypothesis about the equality of means on the basis of several samples which have been drawn from univariate normal populations with different means but the same variance.

Let there be $p$ univariate normal populations, and let samples of different sizes be drawn from each of them. Let $y_{ij}$ $(j = 1, 2, \ldots, n_i)$ be a random sample from the $i$th normal population with mean $\beta_i$ and variance $\sigma^2$, $i = 1, 2, \ldots, p$, i.e.,
$Y_{ij} \sim N(\beta_i, \sigma^2), \quad j = 1, 2, \ldots, n_i; \ i = 1, 2, \ldots, p.$
The random samples from different populations are assumed to be independent of each other. These observations follow the setup of the linear model
$Y = X\beta + \varepsilon$
where
$Y = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, \ldots, Y_{2n_2}, \ldots, Y_{p1}, Y_{p2}, \ldots, Y_{pn_p})'$
$y = (y_{11}, y_{12}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{2n_2}, \ldots, y_{p1}, y_{p2}, \ldots, y_{pn_p})'$
$\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$
$\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{p1}, \varepsilon_{p2}, \ldots, \varepsilon_{pn_p})'$
and $X$ is the $n \times p$ matrix whose first $n_1$ rows are $(1, 0, \ldots, 0)$, next $n_2$ rows are $(0, 1, 0, \ldots, 0)$, and so on, with the last $n_p$ rows equal to $(0, 0, \ldots, 0, 1)$; that is,
$x_{ij} = 1$ if the effect $\beta_j$ is present in the corresponding observation, and $x_{ij} = 0$ if the effect $\beta_j$ is absent, with $n = \sum_{i=1}^{p} n_i$.
So $X$ is a matrix of order $n \times p$ and $\beta$ is fixed. Obviously,
rank$(X) = p$, $E(Y) = X\beta$ and $Cov(Y) = \sigma^2 I$.
This completes the representation of a fixed effect linear model of full rank.

The null hypothesis of interest is
$H_0: \beta_1 = \beta_2 = \ldots = \beta_p = \beta$ (say)
against $H_1$: at least one $\beta_i \neq \beta_j$ $(i \neq j)$, where $\beta$ and $\sigma^2$ are unknown.

We develop here the likelihood ratio test. It may be noted that the same test can also be derived through the least squares method; this will be demonstrated in the next module, so that the reader sees both methods. We have already developed the likelihood ratio test for a hypothesis of this form in Case 1. The whole parametric space $\Omega$ is the $(p+1)$-dimensional space
$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \ \sigma^2 > 0, \ i = 1, 2, \ldots, p\};$
note that there are $(p+1)$ parameters, namely $\beta_1, \beta_2, \ldots, \beta_p$ and $\sigma^2$. Under $H_0$, $\Omega$ reduces to the two-dimensional space
$\omega = \{(\beta, \sigma^2): -\infty < \beta < \infty, \ \sigma^2 > 0\}.$
The likelihood function under $\Omega$ is
$L(y \mid \beta, \sigma^2) = \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta_i)^2\right]$
$\ln L(y \mid \beta, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta_i)^2.$
The normal equations give
$\dfrac{\partial \ln L}{\partial \beta_i} = 0 \Rightarrow \hat{\beta}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i}y_{ij} = \bar{y}_{io}$
$\dfrac{\partial \ln L}{\partial \sigma^2} = 0 \Rightarrow \hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2.$
The dot sign ($o$) in $\bar{y}_{io}$ indicates that the average has been taken over the second subscript $j$. The Hessian matrix of second order partial derivatives of $\ln L$ with respect to $\beta_i$ and $\sigma^2$ is negative definite at $\beta_i = \bar{y}_{io}$ and $\sigma^2 = \hat{\sigma}^2$, which ensures that the likelihood function is maximized at these values. Thus the maximum value of $L(y \mid \beta, \sigma^2)$ over $\Omega$ is
$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left[\dfrac{n}{2\pi\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}\right]^{n/2}\exp\left(-\dfrac{n}{2}\right).$
The likelihood function under $\omega$ is
$L(y \mid \beta, \sigma^2) = \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta)^2\right]$
$\ln L(y \mid \beta, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta)^2,$
and the normal equations give
$\hat{\beta} = \dfrac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}y_{ij} = \bar{y}_{oo}, \quad \hat{\sigma}_\omega^2 = \dfrac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2.$
The maximum value of the likelihood function over $\omega$ under $H_0$ is
$\max_{\omega} L(y \mid \beta, \sigma^2) = \left[\dfrac{n}{2\pi\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2}\right]^{n/2}\exp\left(-\dfrac{n}{2}\right).$
The likelihood ratio test statistic is
$\lambda = \dfrac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\dfrac{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2}\right]^{n/2}.$
We have
$\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}[(y_{ij} - \bar{y}_{io}) + (\bar{y}_{io} - \bar{y}_{oo})]^2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2 + \sum_{i=1}^{p}n_i(\bar{y}_{io} - \bar{y}_{oo})^2.$
Thus
$\lambda = \left[1 + \dfrac{\sum_{i=1}^{p}n_i(\bar{y}_{io} - \bar{y}_{oo})^2}{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}\right]^{-n/2} = \left(1 + \dfrac{q_1}{q_2}\right)^{-n/2}$
where
$q_1 = \sum_{i=1}^{p}n_i(\bar{y}_{io} - \bar{y}_{oo})^2$ and $q_2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2.$
Note that if the least squares principle is used, then
$q_1$: sum of squares due to deviations from $H_0$, or the between-population sum of squares,
$q_2$: sum of squares due to error, or the within-population sum of squares,
$q_1 + q_2$: sum of squares due to $H_0$, or the total sum of squares.

Using Theorems 6 and 7, let
$Q_1 = \sum_{i=1}^{p}n_i(\bar{Y}_{io} - \bar{Y}_{oo})^2, \quad Q_2 = \sum_{i=1}^{p}S_i^2$
where
$S_i^2 = \sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{io})^2, \quad \bar{Y}_{oo} = \dfrac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}Y_{ij}, \quad \bar{Y}_{io} = \dfrac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}.$
Then under $H_0$,
$\dfrac{Q_1}{\sigma^2} \sim \chi^2(p-1), \quad \dfrac{Q_2}{\sigma^2} \sim \chi^2(n-p),$
and $\dfrac{Q_1}{\sigma^2}$ and $\dfrac{Q_2}{\sigma^2}$ are independently distributed. Thus under $H_0$,
$\dfrac{Q_1/(p-1)}{Q_2/(n-p)} \sim F(p-1, n-p).$
The likelihood ratio test rejects $H_0$ whenever
$\dfrac{q_1}{q_2} > C$
where the constant $C = F_{1-\alpha}(p-1, n-p)$.

The analysis of variance table for the one-way classification in the fixed effect model is

Source of variation | Degrees of freedom | Sum of squares | Mean sum of squares | F
Between populations | $p-1$              | $q_1$          | $q_1/(p-1)$         | $\dfrac{n-p}{p-1}\dfrac{q_1}{q_2}$
Within populations  | $n-p$              | $q_2$          | $q_2/(n-p)$         |
Total               | $n-1$              | $q_1 + q_2$    |                     |

Note that
$E\left(\dfrac{Q_2}{n-p}\right) = \sigma^2, \quad E\left(\dfrac{Q_1}{p-1}\right) = \sigma^2 + \dfrac{\sum_{i=1}^{p}n_i(\beta_i - \bar{\beta})^2}{p-1}$, where $\bar{\beta} = \dfrac{1}{n}\sum_{i=1}^{p}n_i\beta_i.$

Case of rejection of $H_0$
If $F > F_{1-\alpha}(p-1, n-p)$, then $H_0: \beta_1 = \beta_2 = \ldots = \beta_p$ is rejected. This means that at least one $\beta_i$ is different from the others and is responsible for the rejection. So the objective is to investigate and find such $\beta_i$'s, and to divide the populations into groups such that the means of the populations within each group are the same. This can be done by pairwise testing of the $\beta$'s.

Test $H_0: \beta_i = \beta_k$ $(i \neq k)$ against $H_1: \beta_i \neq \beta_k$. This can be tested using the following $t$-statistic:
$t = \dfrac{\bar{Y}_{io} - \bar{Y}_{ko}}{\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}}$
which follows the $t$-distribution with $(n-p)$ degrees of freedom under $H_0$, with $s^2 = \dfrac{q_2}{n-p}$. Thus the decision rule is to reject $H_0$ at level $\alpha$ if the observed difference
$|\bar{y}_{io} - \bar{y}_{ko}| > t_{1-\frac{\alpha}{2},\, n-p}\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}.$
The quantity $t_{1-\frac{\alpha}{2},\, n-p}\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}$ is called the critical difference. Thus the following steps are followed:
1. Compute all possible critical differences arising out of all possible pairs $(\beta_i, \beta_k)$, $i \neq k = 1, 2, \ldots, p$.
2. Compare them with the corresponding observed differences.
3. Divide the $p$ populations into different groups such that the populations in the same group have the same means.
The computations are simplified if $n_i = n$ for all $i$. In such a case, the common critical difference (CCD) is
$CCD = t_{1-\frac{\alpha}{2},\, n-p}\sqrt{\dfrac{2s^2}{n}},$
and the observed differences $|\bar{y}_{io} - \bar{y}_{ko}|$, $i \neq k$, are compared with the CCD. If $|\bar{y}_{io} - \bar{y}_{ko}| > CCD$, then the corresponding effects/means $\bar{y}_{io}$ and $\bar{y}_{ko}$ come from populations with different means.
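A minimal sketch of these one-way computations (Python with NumPy/SciPy; the balanced group data and $\alpha$ are invented for illustration) forms the between and within sums of squares $q_1$ and $q_2$, the $F$ statistic with $(p-1, n-p)$ degrees of freedom, and the common critical difference for the balanced case.

```python
import numpy as np
from scipy import stats

alpha = 0.05
# Hypothetical balanced data: p = 3 populations, n_i = 4 observations each
groups = np.array([[12.1, 11.8, 12.5, 12.0],
                   [13.0, 13.4, 12.9, 13.1],
                   [11.2, 11.5, 11.1, 11.4]])
p, n_i = groups.shape
n = p * n_i

group_means = groups.mean(axis=1)                      # y_io
grand_mean = groups.mean()                             # y_oo

q1 = n_i * np.sum((group_means - grand_mean) ** 2)     # between-population SS
q2 = np.sum((groups - group_means[:, None]) ** 2)      # within-population (error) SS

F = (q1 / (p - 1)) / (q2 / (n - p))
print("F =", F, " p-value =", stats.f.sf(F, p - 1, n - p))

# Common critical difference for pairwise comparisons in the balanced case
s2 = q2 / (n - p)
ccd = stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(2 * s2 / n_i)
print("CCD =", ccd)
print("observed |differences|:\n", np.abs(np.subtract.outer(group_means, group_means)))
```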
Note: In general, if there are three effects $\beta_1, \beta_2, \beta_3$, and if $H_{01}: \beta_1 = \beta_2$ (denote this event as $A$) is accepted and $H_{02}: \beta_2 = \beta_3$ (denote this event as $B$) is accepted, then $H_{03}: \beta_1 = \beta_3$ (denote this event as $C$) will be accepted. The question arises: in what sense do we draw such a conclusion about the acceptance of $H_{03}$? The reason is as follows. Since the event $A \cap B \subset C$, we have $P(A \cap B) \leq P(C)$. In this sense, the probability that $H_{03}$ is accepted is at least as high as the probability that both $H_{01}$ and $H_{02}$ are accepted; so we conclude, in general, that the acceptance of $H_{01}$ and $H_{02}$ implies the acceptance of $H_{03}$.

Multiple comparison tests:
One interest in the analysis of variance is to decide whether the population means are equal or not. If the hypothesis of equal means is rejected, one would like to divide the populations into subgroups such that all populations with the same mean come into the same subgroup. This can be achieved by multiple comparison tests. A multiple comparison test procedure conducts the test of hypothesis for all pairs of effects and compares them at a significance level $\alpha$, i.e., it works on a per-comparison basis and is based mainly on the $t$-statistic. If we want to ensure the significance level $\alpha$ simultaneously for all comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per-experiment basis. There are various multiple comparison tests available. We discuss some of them in the context of the one-way classification; in two-way or higher classifications, they can be used on similar lines.

1. Studentized range test:
It is assumed in the Studentized range test that the $p$ samples, each of size $n$, have been drawn from $p$ normal populations. Let their sample means be $\bar{y}_{1o}, \bar{y}_{2o}, \ldots, \bar{y}_{po}$. These means are ranked and arranged in ascending order as $y_1^*, y_2^*, \ldots, y_p^*$, where $y_1^* = \min_i \bar{y}_{io}$ and $y_p^* = \max_i \bar{y}_{io}$, $i = 1, 2, \ldots, p$. Find the range $R = y_p^* - y_1^*$. The Studentized range is defined as
$q_{p,\, n-p} = \dfrac{R\sqrt{n}}{s},$
and $q_{\alpha, p, \gamma}$ denotes the upper $\alpha$ percent point of the Studentized range distribution with $\gamma = n - p$; tables for $q_{\alpha, p, \gamma}$ are available. The testing procedure involves the comparison of $q_{p, n-p}$ with $q_{\alpha, p, n-p}$ in the usual way:
• if $q_{p, n-p} < q_{\alpha, p, n-p}$, then conclude that $\beta_1 = \beta_2 = \ldots = \beta_p$;
• if $q_{p, n-p} > q_{\alpha, p, n-p}$, then all the $\beta$'s in the group are not the same.

2. Student-Newman-Keuls test:
The Student-Newman-Keuls test is similar to the Studentized range test in the sense that the range is compared with the $\alpha$ percent point of the critical Studentized range $W_p$ given by
$W_p = q_{\alpha, p, \gamma}\sqrt{\dfrac{s^2}{n}}.$
The observed range $R = y_p^* - y_1^*$ is now compared with $W_p$.
• If $R < W_p$, then stop the process of comparison and conclude that $\beta_1 = \beta_2 = \ldots = \beta_p$.
• If $R > W_p$, then
(i) divide the ranked means $y_1^*, y_2^*, \ldots, y_p^*$ into two subgroups containing $(y_p^*, y_{p-1}^*, \ldots, y_2^*)$ and $(y_{p-1}^*, y_{p-2}^*, \ldots, y_1^*)$;
(ii) compute the ranges $R_1 = y_p^* - y_2^*$ and $R_2 = y_{p-1}^* - y_1^*$, and compare them with $W_{p-1}$.
If either range ($R_1$ or $R_2$) is smaller than $W_{p-1}$, then the means (or $\beta_i$'s) in the corresponding group are equal.
If $R_1$ and/or $R_2$ are greater than $W_{p-1}$, then the $(p-1)$ means (or $\beta_i$'s) in the group concerned are divided into two groups of $(p-2)$ means (or $\beta_i$'s) each, and the ranges of these groups are compared with $W_{p-2}$. This procedure is continued until a group of $i$ means is found whose range does not exceed $W_i$. By this method, the difference between any two means under test is significant when the range of the observed means of each and every subgroup containing the two means under test is significant according to the Studentized critical range. The procedure can be easily understood from the following scheme:
• Arrange the $\bar{y}_{io}$'s in increasing order $y_1^* \leq y_2^* \leq \ldots \leq y_p^*$; compute $R = y_p^* - y_1^*$ and compare it with $W_p = q_{\alpha, p, \gamma}\sqrt{s^2/n}$.
• If $R < W_p$, stop and conclude $\beta_1 = \beta_2 = \ldots = \beta_p$. If $R > W_p$, continue: divide the ranked means into the two groups $(y_p^*, \ldots, y_2^*)$ and $(y_{p-1}^*, \ldots, y_1^*)$, compute $R_1 = y_p^* - y_2^*$ and $R_2 = y_{p-1}^* - y_1^*$, and compare them with $W_{p-1}$. Four possibilities arise:
  - $R_1 < W_{p-1}$ and $R_2 < W_{p-1}$: then $\beta_2 = \beta_3 = \ldots = \beta_p$ and $\beta_1 = \beta_2 = \ldots = \beta_{p-1}$, hence $\beta_1 = \beta_2 = \ldots = \beta_p$.
  - $R_1 < W_{p-1}$ and $R_2 > W_{p-1}$: then $\beta_2 = \beta_3 = \ldots = \beta_p$, and at least one $\beta_i \neq \beta_j$ for $i \neq j = 1, 2, \ldots, p-1$, the different one being $\beta_1$; so one subgroup is $(\beta_2, \beta_3, \ldots, \beta_p)$ and the other group has only $\beta_1$.
  - $R_1 > W_{p-1}$ and $R_2 < W_{p-1}$: then $\beta_1 = \beta_2 = \ldots = \beta_{p-1}$, and at least one $\beta_i \neq \beta_j$ for $i \neq j = 2, 3, \ldots, p$, the different one being $\beta_p$; so one subgroup is $(\beta_1, \beta_2, \ldots, \beta_{p-1})$ and the other group has only $\beta_p$.
  - $R_1 > W_{p-1}$ and $R_2 > W_{p-1}$: divide the ranked means into four groups, 1. $y_p^*, \ldots, y_3^*$; 2. $y_{p-1}^*, \ldots, y_2^*$; 3. $y_{p-1}^*, \ldots, y_2^*$ (same as in 2); 4. $y_{p-2}^*, \ldots, y_1^*$; compute $R_3 = y_p^* - y_3^*$, $R_4 = y_{p-1}^* - y_2^*$ and $R_5 = y_{p-2}^* - y_1^*$ and compare them with $W_{p-2}$.
• Continue in this way until subgroups with the same $\beta$'s are obtained.

3. Duncan's multiple comparison test:
The test procedure in Duncan's multiple comparison test is the same as in the Student-Newman-Keuls test, except that the observed ranges are compared with Duncan's $\alpha_p$ percent critical range
$D_p = q^*_{\alpha_p, p, \gamma}\sqrt{\dfrac{s^2}{n}}$
where $\alpha_p = 1 - (1-\alpha)^{p-1}$ and $q^*_{\alpha_p, p, \gamma}$ denotes the upper $100\alpha_p\%$ point of the Studentized range based on Duncan's range. Tables for Duncan's range are available.
Duncan felt that this test is better than the Student-Newman-Keuls test for comparing the differences between any two ranked means. Duncan regarded the Student-Newman-Keuls method as too stringent, in the sense that true differences between the means will tend to be missed too often. Duncan notes that in testing the equality of a subset of $k$ $(2 \leq k \leq p)$ means through a null hypothesis, we are in fact testing whether $(p-1)$ orthogonal contrasts between the $\beta$'s differ from zero or not. If these contrasts were tested in separate independent experiments, each at level $\alpha$, the probability of incorrectly rejecting the null hypothesis would be $1 - (1-\alpha)^{p-1}$. So Duncan proposed to use $1 - (1-\alpha)^{p-1}$ in place of $\alpha$ in the Student-Newman-Keuls test.
[Reference: H. A. David, "Multiple decisions and multiple comparisons", Chapter 9 in Contributions to Order Statistics, Wiley, 1962, pages 147-148.]

Case of unequal sample sizes:
When the sample means are not based on the same number of observations, the procedures based on the Studentized range, the Student-Newman-Keuls test and Duncan's test are not applicable.
Kramer proposed that in Duncan's method, if a set of $p$ means is to be tested for equality, then one should replace
$q^*_{\alpha_p, p, \gamma}\dfrac{s}{\sqrt{n}}$ by $q^*_{\alpha_p, p, \gamma}\, s\,\sqrt{\dfrac{1}{2}\left(\dfrac{1}{n_U} + \dfrac{1}{n_L}\right)}$
where $n_U$ and $n_L$ are the numbers of observations corresponding to the largest and smallest means in the data. This procedure is only approximate but will tend to be conservative, since means based on a small number of observations will tend to be over-represented in the extreme groups of means. Another option is to replace $n$ by the harmonic mean of $n_1, n_2, \ldots, n_p$, i.e., $p\Big/\sum_{i=1}^{p}\dfrac{1}{n_i}$.

4. The "Least Significant Difference" (LSD):
In the usual testing of $H_0: \beta_i = \beta_k$ against $H_1: \beta_i \neq \beta_k$, the $t$-statistic
$t = \dfrac{\bar{y}_{io} - \bar{y}_{ko}}{\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}}$
is used, which follows a $t$-distribution, say with $df$ degrees of freedom. Thus $H_0$ is rejected whenever
$|t| > t_{df,\, 1-\frac{\alpha}{2}}$
and it is concluded that $\beta_i$ and $\beta_k$ are significantly different. This inequality can equivalently be written as
$|\bar{y}_{io} - \bar{y}_{ko}| > t_{df,\, 1-\frac{\alpha}{2}}\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}.$
So for every pair of samples for which $|\bar{y}_{io} - \bar{y}_{ko}|$ exceeds $t_{df,\, 1-\frac{\alpha}{2}}\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}$, the difference between $\beta_i$ and $\beta_k$ is declared significant. Accordingly, the quantity $t_{df,\, 1-\frac{\alpha}{2}}\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}$ is the least difference between $\bar{y}_{io}$ and $\bar{y}_{ko}$ for which the difference between $\beta_i$ and $\beta_k$ will be declared significant. Based on this idea, we use the pooled variance of the two samples, $\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko}) = s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)$, and the Least Significant Difference (LSD) is defined as
$LSD = t_{df,\, 1-\frac{\alpha}{2}}\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}.$
If $n_i = n_k = n$, then
$LSD = t_{df,\, 1-\frac{\alpha}{2}}\sqrt{\dfrac{2s^2}{n}}.$
Now all $\dfrac{p(p-1)}{2}$ pairs of $\bar{y}_{io}$ and $\bar{y}_{ko}$ $(i \neq k = 1, 2, \ldots, p)$ are compared with the LSD. Use of the LSD criterion may not lead to good results if it is used for comparisons suggested by the data (largest/smallest sample means) or if all pairwise comparisons are done without correction of the test level. If the LSD is used for all pairwise comparisons, then these tests are not independent. Such correction of test levels was incorporated in Duncan's test.

5. Tukey's "Honestly Significant Difference" (HSD):
In this procedure, the Studentized range values $q_{\alpha, p, \gamma}$ are used in place of the $t$-quantiles, and the standard error of the difference of the pooled means is used in place of the standard error of the mean in the common critical difference for testing $H_0: \beta_i = \beta_k$ against $H_1: \beta_i \neq \beta_k$. Tukey's Honestly Significant Difference is computed as
$HSD = q_{1-\frac{\alpha}{2},\, p, \gamma}\sqrt{\dfrac{MS_{error}}{n}},$
assuming all samples are of the same size $n$. All $\dfrac{p(p-1)}{2}$ pairs $|\bar{y}_{io} - \bar{y}_{ko}|$ are compared with the HSD. If $|\bar{y}_{io} - \bar{y}_{ko}| > HSD$, then $\beta_i$ and $\beta_k$ are significantly different.
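To make the LSD and HSD rules concrete, here is a small sketch (Python with NumPy/SciPy; the balanced data and $\alpha$ are illustrative assumptions, and the Studentized range quantile uses scipy.stats.studentized_range, available in recent SciPy versions). Computing both critical differences on the same data lets the per-comparison LSD be contrasted with the familywise HSD.

```python
import numpy as np
from scipy import stats

alpha = 0.05
groups = np.array([[12.1, 11.8, 12.5, 12.0],      # hypothetical balanced one-way data
                   [13.0, 13.4, 12.9, 13.1],
                   [11.2, 11.5, 11.1, 11.4]])
p, n_i = groups.shape
n = p * n_i
means = groups.mean(axis=1)
s2 = np.sum((groups - means[:, None]) ** 2) / (n - p)    # error mean square, gamma = n - p df

lsd = stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(2 * s2 / n_i)
hsd = stats.studentized_range.ppf(1 - alpha, p, n - p) * np.sqrt(s2 / n_i)

for i in range(p):
    for k in range(i + 1, p):
        d = abs(means[i] - means[k])
        print(f"groups {i+1},{k+1}: |diff| = {d:.3f}, "
              f"LSD says {'different' if d > lsd else 'same'}, "
              f"HSD says {'different' if d > hsd else 'same'}")
```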
Contrast: A linear parametric function= L l= 'β p ∑ β i =1 i i where β = ( β1 , β 2 ,..., β P ) and = (1 , 2 ,..., p ) are the p × 1 vectors of parameters and constants respectively is said to be a contrast when p ∑ i = 0. i=1 For example β1 − β 2= 0, β1 + β 2 − β3 − β1= 0, β1 + 2β 2 − 3β3= 0 etc. are contrast whereas β1 + β 2 = 0, β1 + β 2 + β3 + β 4 = 0, β1 − 2β 2 − 3β3 = 0 etc. are not contrasts. Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 38 Orthogonal contrast: If = L1 = 'β p L2 m= 'β ∑ i βi and= i =1 p ∑ mi βi are contrasts such that ′m = 0 or i =1 p ∑ m = 0 then L i =1 i i 1 and L2 are called orthogonal contrasts. For example, L1 = β1 + β 2 − β3 − β 4 and L2 = β1 − β 2 + β3 − β 4 are contrasts. They are also the orthogonal contrasts. p The condition ∑ m i =1 i i = 0 ensures that L1 and L2 are independent in the sense that p Cov ( L1 , L2 ) σ= = ∑ i mi 0. 2 i =1 Mutually orthogonal contrasts: If there are more than two contrasts then they are said to be mutually orthogonal, if they are pairwise orthogonal. It may be noted that the number of mutually orthogonal contrasts is the number of degrees of freedom. Coming back to the multiple comparison test, if the null hypothesis of equality of all effects is rejected then it is reasonable to look for the contrasts which are responsible for the rejection. In terms of contrasts, it is desirable to have a procedure (i) that permits the selection of the contrasts after the data is available. (ii) with which a known level of significance is associated. Such procedures are Tukey’s and Scheffe’s procedures. Before discussing these procedure, let us consider the following example which illustrates the relationship between the testing of hypothesis and confidence intervals. Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 39 Example: Consider the test of hypothesis for H 0 : β= β j (i ≠ = j 1, 2,..., p) i or H 0 : βi − β j = 0 or H 0 : contrast = 0 or H 0 : L = 0. The test statistic for H 0 : βi = β j is t ( βˆi − βˆ j ) − ( βi − β j ) = (y − y ) Var io ko Lˆ − L ( Lˆ ) Var where β̂ denotes the maximum likelihood (or least squares) estimator of β and t follows a tdistribution with df degrees of freedom. This statistic, in fact, can be extended to any linear contrast, say e.g., L = β1 + β 2 − β3 − β 4 , Lˆ = βˆ1 + βˆ2 − βˆ3 − βˆ4 . The decision rule is reject H 0 : L = 0 against H1 : L ≠ 0 . If ( Lˆ ) . Lˆ > tdf Var The 100 (1 − α ) % confidence interval of L is obtained as Lˆ − L P −tdf ≤ ≤ tdf =1 − α ˆ Var ( L) ( Lˆ ) ≤ L ≤ Lˆ + t Var ( Lˆ ) =1 − α or P Lˆ − tdf Var df so that the 100(1 − α ) % confidence interval of L is Lˆ − t Var ( Lˆ ), Lˆ + t Var ( Lˆ ) df df and ( Lˆ ) ≤ L ≤ Lˆ + t Var ( Lˆ ) Lˆ − tdf Var df If this interval includes L = 0 between lower and upper confidence limits, then H 0 : L = 0 is accepted. Our objective is to know if the confidence interval contains zero or not. Suppose for some given data the confidence intervals for β1 − β 2 and β1 − β3 are obtained as −3 ≤ β1 − β 2 ≤ 2 and 2 ≤ β1 − β3 ≤ 4. Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 40 Thus we find that the interval of β1 − β 2 includes zero which implies that accepted. Thus β1 = β 2 . H 0 : β1 − β 2 = 0 is On the other hand interval of β1 − β 3 does not include zero and so H 0 : β1 − β3 = 0 is not accepted . Thus β1 ≠ β3 . If the interval of β1 − β3 is −1 ≤ β1 − β3 ≤ 1 then H 0 : β1 = β3 is accepted. 
Tukey's procedure for multiple comparison (T-method)
The T-method uses the distribution of the Studentized range statistic. (The S-method, discussed next, utilizes the F-distribution.) The T-method can be used to make simultaneous confidence statements about contrasts ($\beta_i - \beta_j$) among a set of parameters $\{\beta_1, \beta_2, \ldots, \beta_p\}$ with an estimate $s^2$ of the error variance, provided certain restrictions are satisfied. These restrictions have to be viewed according to the given conditions. For example, one of the restrictions is that all the $\hat{\beta}_i$'s have equal variances. In the setup of the one-way classification, $\hat{\beta}_i$ has mean $\beta_i$ and variance $\sigma^2/n_i$, so this reduces to the simple condition that all the $n_i$'s are the same, i.e., $n_i = n$ for all $i$, so that all the variances are equal. Another assumption is that $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are statistically independent and that the only contrasts considered are the $\dfrac{p(p-1)}{2}$ differences $\beta_i - \beta_j$, $i \neq j = 1, 2, \ldots, p$.

We make the following assumptions:
(i) $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are statistically independent;
(ii) $\hat{\beta}_i \sim N(\beta_i, a^2\sigma^2)$, $i = 1, 2, \ldots, p$, where $a > 0$ is a known constant;
(iii) $s^2$ is an independent estimate of $\sigma^2$ with $\gamma$ degrees of freedom (here $\gamma = n - p$), i.e., $\dfrac{\gamma s^2}{\sigma^2} \sim \chi^2(\gamma)$; and
(iv) $s^2$ is statistically independent of $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$.

The statement of the T-method is as follows: under the assumptions (i)-(iv), the probability is $(1-\alpha)$ that the values of all contrasts $L = \sum_{i=1}^{p}C_i\beta_i$ $\left(\sum_{i=1}^{p}C_i = 0\right)$ simultaneously satisfy
$\hat{L} - Ts\,\dfrac{1}{2}\sum_{i=1}^{p}|C_i| \leq L \leq \hat{L} + Ts\,\dfrac{1}{2}\sum_{i=1}^{p}|C_i|$
where $\hat{L} = \sum_{i=1}^{p}C_i\hat{\beta}_i$, $\hat{\beta}_i$ is the maximum likelihood (or least squares) estimate of $\beta_i$, and $T = a\,q_{\alpha, p, \gamma}$, with $q_{\alpha, p, \gamma}$ being the upper $\alpha$ point of the distribution of the Studentized range.

Note that if $L$ is a contrast like $\beta_i - \beta_j$ $(i \neq j)$, then $\dfrac{1}{2}\sum_{i=1}^{p}|C_i| = 1$; in the balanced one-way setup the variance of $\hat{\beta}_i$ is $\sigma^2/n$, so that $a = 1/\sqrt{n}$, $T = q_{\alpha, p, \gamma}/\sqrt{n}$, and the interval simplifies to
$(\hat{\beta}_i - \hat{\beta}_j) - Ts \leq \beta_i - \beta_j \leq (\hat{\beta}_i - \hat{\beta}_j) + Ts.$
Thus the maximum likelihood (or least squares) estimate $\hat{L} = \hat{\beta}_i - \hat{\beta}_j$ of $L = \beta_i - \beta_j$ is said to be significantly different from zero according to the T-criterion if the interval $(\hat{\beta}_i - \hat{\beta}_j - Ts, \ \hat{\beta}_i - \hat{\beta}_j + Ts)$ does not cover $\beta_i - \beta_j = 0$, i.e., if
$|\hat{\beta}_i - \hat{\beta}_j| > Ts,$
or more generally if $|\hat{L}| > Ts\,\dfrac{1}{2}\sum_{i=1}^{p}|C_i|$.

The testing now involves the following steps:
- Compute $\hat{L}$ or $(\hat{\beta}_i - \hat{\beta}_j)$.
- Compute all possible pairwise differences.
- Compare all the differences with $Ts\,\dfrac{1}{2}\sum_{i=1}^{p}|C_i|$, where $T = q_{\alpha, p, \gamma}/\sqrt{n}$.
- If $|\hat{L}|$ or $|\hat{\beta}_i - \hat{\beta}_j|$ exceeds $Ts\,\dfrac{1}{2}\sum_{i=1}^{p}|C_i|$, then $\hat{\beta}_i$ and $\hat{\beta}_j$ are declared significantly different.
Tables for $T$ are available.
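A minimal sketch of the T-method statement above for the balanced one-way case (Python with NumPy/SciPy; the data, $\alpha$ and the contrast are illustrative assumptions, and the Studentized range quantile again comes from scipy.stats.studentized_range). With $n_i = n$ for all $i$, $a = 1/\sqrt{n}$, so $T = q_{\alpha, p, \gamma}/\sqrt{n}$ and a contrast is declared significant when $|\hat{L}| > Ts\cdot\tfrac{1}{2}\sum|C_i|$.

```python
import numpy as np
from scipy import stats

alpha = 0.05
groups = np.array([[12.1, 11.8, 12.5, 12.0],      # hypothetical balanced one-way data
                   [13.0, 13.4, 12.9, 13.1],
                   [11.2, 11.5, 11.1, 11.4]])
p, n_per = groups.shape
n = p * n_per
gamma = n - p                                     # degrees of freedom of s^2

means = groups.mean(axis=1)                       # estimates of beta_i
s = np.sqrt(np.sum((groups - means[:, None]) ** 2) / gamma)

C = np.array([1.0, -0.5, -0.5])                   # a contrast, sum(C) = 0
L_hat = C @ means

T = stats.studentized_range.ppf(1 - alpha, p, gamma) / np.sqrt(n_per)   # T = a * q_{alpha,p,gamma}
half_width = T * s * 0.5 * np.abs(C).sum()

print("L_hat =", L_hat, " simultaneous half-width =", half_width)
print("significant" if abs(L_hat) > half_width else "not significant")
```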
A set L of estimable functions {ψ } is called a p-dimensional space of estimable functions if there exists p linearly independent estimable functions (ψ 1 ,ψ 2 ,...,ψ p ) such that every ψ in L is of the p form ψ = ∑ Ci yi where C1 , C2 ,..., C p are known constants. In other word, L is the set of all i =1 linear combinations of ψ 1 ,ψ 2 ,...,ψ p . Under the assumption that the parametric space Ω is Y ~ N ( X β , σ 2 I ) with rank ( X ) = p, β = ( β1 ,..., β p ), X is n × p matrix, consider a p-dimensional space L of estimable functions generated by a set of p linearly independent estimable functions {ψ 1 ,ψ 2 ,...,ψ p }. n For any ψ ∈ L, Let ψˆ = ∑ Ci yi be its least squares (or maximum likelihood) estimator, i =1 n Var (ψˆ ) = σ 2 ∑ Ci2 i =1 = σψ ( say ) 2 n and σˆψ2 = s 2 ∑ Ci2 i =1 where s 2 is the mean square due to error with (n − p ) degrees of freedom. The statement of S-method is as follows: Under the parametric space Ω , the probability is (1 − α ) that simultaneously for all constant S ψ ∈ L, ψˆ − Sσˆψˆ ≤ ψ ≤ ψˆ + Sσˆψˆ where the = pF1−α ( p, n − p ) . Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 43 Method: For a given space L of estimable functions and confidence coefficient (1 − α ) , the least square (or maximum likelihood) estimate ψˆ of ψ ∈ L will be said to be significantly different from zero according to S-criterion if the confidence interval (ψˆ − Sσˆψˆ ≤ ψ ≤ ψˆ + Sσˆψˆ ) does not cover ψ = 0, i.e., if ψˆ > Sσˆψˆ . The S-method is less sensitive to the violation of assumptions of normality and homogeneity of variances. Comparison of Tukey’s and Scheffe’s methods: 1. Tukey’s method can be used only with equal sample size for all factor level but S-method is applicable whether the sample sizes are equal or not. 2. Although, Tukey’s method is applicable for any general contrast, the procedure is more powerful when comparing simple pairwise differences and not making more complex comparisons. 3. It only pairwise comparisons are of interest, and all factor levels have equal sample sizes, Tukey’s method gives shorter confidence interval and thus is more powerful. 4. In the case of comparisons involving general contrasts, Scheffe’s method tends to give marrower confidence interval and provides a more powerful test. 5. Scheffe’s method is less sensitive to the violations of assumptions of normal distribution and homogeneity of variances. Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur 44