The General Linear Hypothesis
George H Olson, Ph. D.
Leadership and Educational Studies
Appalachian State University
Manuscript in Preparation
Outline of Manuscript (colored entries are already developed. See below)
Introduction
What is the General Linear Hypothesis?
How does it differ from the General Linear Model (GLM)?
Early work
Paucity of contemporary references
The Univariate General Linear Hypothesis (GLH)
Definition of the Model (the Theory)
Assumptions
Estimation of β and σ²
Expectation and variance of b
Estimating the variance, σ2.
Testing General Linear Hypotheses
Typical Applications
Analysis of Variance Models
Choice of design matrices
Dummy variable coding
Effect coding
Using a super identity design matrix
Simple examples that can be completed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT.
Analysis of Covariance Models
Choice of design matrices
Simple examples that can be completed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
Multiple Regression Models
Simple examples that can be completed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
The Multivariate General Linear Model
Definition of the Model (the Theory)
Additional Assumptions
Types of Hypotheses that can be Tested
Most General Model: AβC = D
Estimation of Parameter Matrices, β and Σ
Typical Applications
Analysis of Variance
The Equations
The statistical tests
Simple Examples that can be computed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
Analysis of Variance with Repeated Measures
The Equations
The statistical tests
Simple Examples that can be computed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
Multivariate Profile Analysis
The Equations
The statistical tests
Simple Examples that can be computed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
Analysis of Covariance
The Equations
The statistical tests
Simple Examples that can be computed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
Analysis of Covariance with Repeated Measures
Covariance Measures Collected Prior to Repeated Measures
Covariance Measures Collected Concomitant with Repeated Measures
Multivariate Regression Analysis
The Equations
The statistical tests
Simple Examples that can be computed using a calculator (or Excel)
More complex examples using SPSS, SAS, and SYSTAT
Appendices
Summation Operators (Use paper from my website)
Expectation Operators
Matrix Algebra (Use paper from my website)
Introduction
By now, education researchers should be disposed to the idea that both causes and
effects of educational phenomena are inherently multivariate in nature. Researchers have held
this view for over six decades (see, for example, Tatsuoka, 1973, as well as the older surveys of
multivariate statistical applications in education and psychology contained in books by Cattell,
1962, and Whitla, 1968, and more recent surveys by Srivastava & Carter, 1983, and Myers,
2008). In their statistical applications, however, researchers, while recognizing that causes of
educational phenomena are, indeed, multidimensional, have tended to employ techniques of
multiple linear regression (MLR), with increasing levels of sophistication, to investigate both the
combined effects of, and interrelationships among, multiple independent variables. For the
most part, however, researchers have tended to not investigate the "multivariateness" of
educational outcomes. Instead, the typical approach has been to study the effects of a common
set of independent variables on each of several criterion variables separately.
A less well-known data-analytic strategy (at least in the social sciences), one that also utilizes
GLMs, is the general linear hypothesis (GLH), first proposed by Kolodziejczyk (1935).
Kempthorne (1952), in a now out-of-print book, provided a very tractable development of GLH
theory and showed how it could be applied to a wide variety of research situations. Graybill
(1961) provided extensions. Both authors used matrix notation to extend Kolodziejczyk’s
original formulation.
In preparing for this paper, I did a quick, informal review of the studies published in the
last three years in the American Educational Research Journal and the Journal of Educational
Psychology. Over 80 percent of the experimental and comparative studies reported in these
journals investigated effects on multiple criteria. Although several of these studies employed
multivariate techniques, the majority of them (approximately three-fourths) employed multiple
univariate analyses of variance.
The typical example would be a study in which several independent variables (including
pretests, demographic data, and incidence of treatment measures) are examined separately for
their effect on, say, measures of motivation, achievement, and attitude toward school—
variables that few would deny are correlated (perhaps highly) in the population. The possible
ramifications of performing separate univariate analyses on correlated criteria are well-known
(e.g., Myers, 1990). If a Type I error occurs in tests involving one criterion, the probability is
greater than α that it will also occur in the tests involving the other criteria. This fact, by itself,
should provide sufficient motivation to the researcher to seek multivariate techniques for the
analysis of multiple-criteria designs. At the very least a multivariate technique offers the
researcher a procedure for controlling experiment-wise probability of a Type I error.
Multivariate statistical procedures fit well with the epistemological point of view
that the "effects" in educational settings are rarely, if ever, unidimensional and therefore
should not be studied in isolation. The study of multiple outcomes simultaneously can afford
the researcher an opportunity to "uncover" complex relationships among treatment and
outcome variables that might otherwise go undiscovered. An example might be a situation in
which it is found that two experimental instructional programs lead not only to increases in
several measures of achievement and motivation, but also, under one of the programs, to
increases in the correlation among the variables. Such a finding might have important
implications for understanding the dynamics of the instructional programs.
If the need for greater application of multivariate statistical procedures can be
established so easily, why then are these procedures not used more frequently in educational
research? Although it is impossible to answer this question completely, there is no doubt that
at least part of the answer lies in the fact that multivariate procedures are more complex, both
mathematically and conceptually, than univariate procedures. Until recently, researchers who
sought to use multivariate techniques had to first develop a fairly high level of mathematical
skill to be able to read the existing textbooks. Even given the requisite mathematical ability,
the existing textbooks have often proven less than useful to the applied researcher since they
contained a paucity of examples on anything even bearing a passing resemblance to the kinds
of problems encountered in educational research.
In terms of practical use of the multivariate general linear model, there has been very
little published since the disappearance of the programs BMD 05V and BMDX 63, when SPSS,
Inc. purchased the BMD suite of statistical programs. Most of the literature relating to the
general linear hypothesis is quite dated.
Several years (decades) ago there were a number of intermediate-level (in terms of
mathematics) books on multivariate statistics. Many of these are still useful (e.g., Cooley & Lohnes,
1971, Finn, 1974, Harris, 1975, Morrison, 1967, Press, 1972, Tatsuoka, 1971, and Van de Geer,
1971). More recently, books by Srivastava & Carter (1983) and Myers (1990) have provided
fairly tractable accounts of the general linear hypothesis. Although these books usually require
some ability in matrix algebra, the mathematical developments tend to be quite readable.
More than that, however, these books provide a rich source of examples of applications of
multivariate procedures. It is likely that, due to these books alone, many more applications of
multivariate procedures have begun to appear, and will continue to appear in the published
literature. Unfortunately, much of the literature on the general linear hypothesis that has been
published in recent years has not appeared in the education research literature. Rather, it
appears in such areas as brain research (Friston et al., 1994; Keselman, Wilcox, & Lix, 2003)
and marketing research (Perreault & Darden, 1975). Most of the more recent literature on the
general linear hypothesis has consisted of highly technical, mathematical treatises (e.g., Muller &
Peterson, 1984, an examination of methods for computing power in testing the multivariate
general linear hypotheses; McDonald, 1975, tests for general linear hypotheses under various
complex multivariate designs; and Hothorn, Bretz, & Westfall, 2008, simultaneous inference).
The Univariate General Linear Hypothesis (GLH)
We will begin with the univariate general linear hypothesis (GLH) and treat the
multivariate case later. The GLH is the most general of all parametric linear statistical
procedures. In fact, all linear statistical tests (univariate and multivariate) can be developed as
special subclasses of GLH theory. This includes, among other procedures, factor analysis,
discriminant analysis, prediction (or regression) analysis, and the analysis of variance and
covariance. Although the mathematical development of GLH theory is complex, its
conceptualization, at least insofar as the most common applied situations are concerned, is not
particularly difficult. It does require an elementary facility for matrix algebra, however. A simple
statement of the theory is that if a set of dependent variables, Y (where the bolding denotes a
matrix), is linearly related to a set of independent variables, X, by the equation,
y = Xβ + ε,
(1)
where β is a set of unknown parameters of interest, then any hypothesis of the form
H0: A β C = D
(2)
is testable. In this equation, the matrices A and C are used to select particular elements from
among rows and columns of β . The matrix, D, is a matrix of constants specified by the
investigator. Usually D is set equal to 0, a matrix of zeros.
In the next section a brief introduction to the theory of the MGLH is provided. Readers
who wish to pursue this development in greater depth are encouraged to consult the
appropriate chapters in Kempthorne (1952), Kullback (1968), Mendenhall (1968), and Seal
(1964). Early papers by Smith, Gnanadesikan, and Hughes (1962) and Bock (1963b) develop the
MGLH in a way that is useful to those who may wish to program the computations. Thorough,
but highly mathematical, presentations of the MGLH have been given by Anderson (1958), Rao
(1965), and Seber (1966).
In the third section of this paper, several general algebraic examples of typical
applications are provided. Other examples of applications of the MGLH, using real data, can be
found in Bock (1963a), Bock and Haggard (1968), Jones (1966), and Finn (1974).
Introduction to the Theory of the GLH
In this paper, interest is primarily focused upon the analysis of variance and the analysis of
covariance. Repeated measures (or profile) analyses of variance and covariance are also
covered, but are treated as special instances of multivariate analyses of variance and covariance.
Before proceeding to illustrative applications of the GLH, however, it is first necessary to lay down
some of the mathematical groundwork.
Definition of the Model
We begin by rewriting the general equation for a univariate linear model given earlier,
y = Xβ + ε,
(1)
where y is an (n × 1) column vector of observations or measurements (dependent variates),

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},

X is an (n × p) matrix of known predictor or design variables,

X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix},

β is a (p × 1) vector of parameters (coefficients) to be estimated,

β = \begin{bmatrix} β_1 \\ β_2 \\ \vdots \\ β_p \end{bmatrix},

and ε is an (n × 1) vector of disturbances (residuals, measurement errors),

ε = \begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_n \end{bmatrix}.
It should be noted that the model (1) is very general. Depending upon the choice of X
the model forms the basis for a wide variety of common linear statistical procedures. For
instance, if X is a matrix of dummy classification variables, then (1) is the model for the
analysis of variance. Furthermore, if X consists of measures taken on a set of p predictor
variables, then (1) is a prediction model. Finally, if X contains both classification variables and
predictor variables, (1) is a model for the analysis of covariance.
If y is a vector of “1’s” and “0’s” and X contains measures on a set of predictor
variables, then (1) provides a model for a two-group discriminant analysis. Additionally, if y is
expanded to an (n x q) matrix, Y, of observations on q variates, then (1) serves as a model for an
even wider class of statistical procedures including factor analysis. In this sequel, only those
models where X is a matrix of classification variables, predictor variables, or both are
considered.
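To make this point concrete, the sketch below (an added illustration, not part of the manuscript) builds the three kinds of design matrices for the same small set of made-up observations, using Python with NumPy; the data and variable names are assumptions for illustration only.

```python
import numpy as np

group = np.array([0, 0, 1, 1, 2, 2])          # classification variable: three groups
z = np.array([1.2, 0.7, 2.5, 1.9, 3.1, 2.8])  # a measured predictor (covariate)

# One 0/1 indicator column per group (dummy classification variables).
dummies = (group[:, None] == np.arange(3)).astype(float)

X_anova = dummies                                # analysis of variance
X_regression = np.column_stack([np.ones(6), z])  # prediction (regression) model
X_ancova = np.column_stack([dummies, z])         # analysis of covariance

for name, X in [("ANOVA", X_anova), ("regression", X_regression), ("ANCOVA", X_ancova)]:
    print(name, X.shape)
```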
Assumptions
The assumptions applicable to the model defined above, for purposes of testing hypotheses,
are the following:
1. The model is linear in terms of the parameters,
2. N > p + q.
3. X is of full rank, p.
4. ε is distributed normally around μ = 0, with variance = σ2 In, where 0 is a column
vector of zeros, and In is an (n × n) identity matrix.
The first assumption is rarely limiting in educational research and evaluation. Many models
which appear non-linear at first sight are actually intrinsically linear. In these situations,
suitable linear models can be written following acceptable transformations of the original
measures (see Draper and Smith, 1968, ch. 5, for a discussion of these types of models).
Assumption 2 is required to ensure the availability of a sufficient number of
observations to estimate the p elements in β. The third assumption is necessary to ensure a
unique solution for β. Since this solution requires the inverse of X'X, it is necessary that X'X be
nonsingular; hence X must be of full rank.
More general, non-unique solutions, which use generalized inverses of X'X, allow X to
have more columns than its rank (i.e., they allow X to be less than full rank). Since
full-rank design matrices are usually quite easily constructed for most designs in educational
research, the generalized inverse approach is rarely, if ever, needed. In any case, for any design
matrix less than full rank, there always exists a transformation matrix, T, such that
X* = XT
is of full rank (see Bock [1963], Graybill [1961; pp. 235-239], or Smith [1972], for details on
computing T).
The fourth assumption provides the foundation for the theory underlying tests of GLHs.
The assumption implies that
y ∼ MVN(Xβ, σ²I_n).
Estimation of β and σ²
Since the parameters β and σ² are not generally known in practice, they must be
estimated. Two procedures are available, viz. maximum likelihood (ML) and least squares (LS).
Since the former requires the assumption of multivariate normality whereas the latter does
not, and since the solutions for β are the same in either case, only the LS procedure will be
pursued.
The LS procedure calls for obtaining an estimate of β, say b, such that the error sum of
squares, s² = Σ_i e_i² = e'e, is a minimum. This is accomplished by differentiating the quadratic form,

s² = (y - Xb)'(y - Xb)
   = y'y - y'Xb - b'X'y + b'X'Xb
   = y'y - 2b'X'y + b'X'Xb,

with respect to b, setting the result to zero, and solving for b:

∂s²/∂b = -2X'(y - Xb) = 0,
or
-2X'y + 2X'Xb = 0.    (3)

Rearranging terms,

X'y = (X'X)b,

where the solution for b is given by

(X'X)^{-1}X'y = (X'X)^{-1}(X'X)b
             = b.    (4)
The solution (4) depends upon X'X being nonsingular, a condition that always exists when X is
of full column rank (i.e., rank{X} = p). There are more general solutions to (4), using, for
instance, generalized inverses, but these yield non-unique solutions for b. Usually, however, a
more general solution for b is unnecessary: when X is less than full rank (i.e., when the
number of columns in X is greater than its rank), there always exists a transformation matrix, T,
such that X* = XT is of full rank. Furthermore, it is fairly easy, in most applications, to construct
X so that it is of full column rank.
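As a quick numerical illustration of (4) (added here; the manuscript's worked examples appear later), the estimate b can be computed directly with a few lines of matrix code. The sketch below uses Python with NumPy, and the design matrix and response are made-up values.

```python
import numpy as np

# Made-up data: n = 6 observations, an intercept plus two predictors (p = 3).
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 5.0, 2.0],
              [1.0, 7.0, 1.0],
              [1.0, 8.0, 3.0],
              [1.0, 9.0, 2.0]])
y = np.array([4.0, 5.0, 9.0, 11.0, 14.0, 15.0])

# Least-squares estimate b = (X'X)^{-1} X'y, Eq. (4).
b = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(b)

# The same solution from a library routine, as a check.
print(np.linalg.lstsq(X, y, rcond=None)[0])
```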
The estimates, b, have been shown to be unbiased and minimally dispersed by several
authors (e.g., Anderson [1958], Kullback [1968], Press [1972], Rao [1965]). This is fairly easy to
show.
Expectation and variance of b. Recall that in the population, the parametric equation (1)
is y = Xβ + ε. Hence, (4) can be re-written as

b = (X'X)^{-1}X'y
  = (X'X)^{-1}X'(Xβ + ε)
  = (X'X)^{-1}(X'X)β + (X'X)^{-1}X'ε
  = β + (X'X)^{-1}X'ε.    (5)

Then, the expectation of b is given by

E{b} = E{β + (X'X)^{-1}X'ε}
     = β + (X'X)^{-1}X'E{ε}
     = β,    (6)

since, under the assumptions given earlier, E{ε} = 0. Hence, the expectation of b is β,
demonstrating the unbiasedness of b.
From (1), Xβ = (y - ε), from which β = (X'X)^{-1}X'(y - ε). Then, the variance of b, using
the expectation operator E{·}, is given by

V{b} = E{(b - β)(b - β)'}
     = E{[(X'X)^{-1}X'y - (X'X)^{-1}X'(y - ε)][(X'X)^{-1}X'y - (X'X)^{-1}X'(y - ε)]'}
     = E{[(X'X)^{-1}X'ε][(X'X)^{-1}X'ε]'}
     = E{(X'X)^{-1}X'εε'X(X'X)^{-1}}
     = (X'X)^{-1}X'E{εε'}X(X'X)^{-1}
     = σ²(X'X)^{-1}.    (7)

Furthermore, it can easily be shown that

(b - β) = (X'X)^{-1}X'y - (X'X)^{-1}X'(y - ε)
        = (X'X)^{-1}X'y - (X'X)^{-1}X'y + (X'X)^{-1}X'ε
        = (X'X)^{-1}X'ε.    (8)

Hence, using (8), the variance can be written directly as

V{b} = E{[(X'X)^{-1}X'ε][(X'X)^{-1}X'ε]'}
     = E{(X'X)^{-1}X'εε'X(X'X)^{-1}}
     = (X'X)^{-1}X'E{εε'}X(X'X)^{-1}
     = σ²(X'X)^{-1}.    (9)
Any other unbiased estimate of b, say b*, where b* = [(X'X)^{-1}X' + a]y, would yield
the expected value

E{b*} = E{[(X'X)^{-1}X' + a]y}
      = [(X'X)^{-1}X' + a]E{y}
      = [(X'X)^{-1}X' + a]Xβ
      = β + aXβ.    (10)

However, since b* is unbiased, aX = 0. Now, following the steps given earlier (eqs?), only
this time substituting b* for b, the variance of b* is given by

V(b*) = σ²[(X'X)^{-1}X' + a][(X'X)^{-1}X' + a]'
      = σ²[(X'X)^{-1}X'X(X'X)^{-1} + aX(X'X)^{-1} + (X'X)^{-1}X'a' + aa']
      = σ²[(X'X)^{-1} + aa'],    (11)

which is clearly greater (by aa') than V(b).
Estimating the variance, σ². In the population, the residual variance for a single
observation, V{ε_i}, is given by

σ² = V{ε} = V{y - Xβ}
   = E{[y - Xβ]'[y - Xβ]}
   = E{y'y - y'Xβ - β'X'y + β'X'Xβ}
   = E{y'y - [Xβ + ε]'Xβ - β'X'[Xβ + ε] + β'X'Xβ}
   = E{y'y} - β'X'Xβ - E{ε'}Xβ - β'X'E{ε}
   = E{y'y} - β'X'Xβ,    (12)

since, again under the assumptions, E{ε} = 0.
Typically, however, σ² is estimated from sample observations. Given an independent
random sample of observations, Eq. (1) can be re-written as y = Xb + e, to reflect that all terms
are generated from a sample. The sample variance is then given by s² = V{e} where, since
e = y - Xb,

s² = V{e} = V{y - Xb} = E{(y - Xb)'(y - Xb)}.    (13)

But it was shown earlier that b = β + (X'X)^{-1}X'ε, so that

y - Xb = (Xβ + ε) - X[β + (X'X)^{-1}X'ε]
       = Xβ + ε - Xβ - X(X'X)^{-1}X'ε
       = ε - X(X'X)^{-1}X'ε
       = [I - X(X'X)^{-1}X']ε,    (14)

so that

s² = E{(y - Xb)'(y - Xb)}
   = E{[(I - X(X'X)^{-1}X')ε]'[(I - X(X'X)^{-1}X')ε]}
   = E{ε'[I - X(X'X)^{-1}X']ε},    (15)

the last step following because [I - X(X'X)^{-1}X'] is idempotent (as is easily shown, since
X(X'X)^{-1}X' itself is idempotent).
Note that the product under the expectation operator in (15) is a scalar and thus equal to
its trace. Using the property that the trace of a product of matrices is invariant under cyclic
permutations of the matrices, (15) can be re-written as

s² = E{tr{[I - X(X'X)^{-1}X']εε'}}
   = tr{[I - X(X'X)^{-1}X']E{εε'}}
   = σ²[tr{I} - tr{(X'X)^{-1}X'X}]
   = σ²(tr{I_N} - tr{I_p})
   = σ²(N - p).    (16)

Equation (16) demonstrates that the sample variance, s², is a biased estimate of the population
variance, i.e., it is inflated by the factor (N - p). Therefore, the unbiased estimate is given
by σ̂² = s²/(N - p).
Testing General Linear Hypotheses
Typically, we are interested in testing one or more of three types of linear hypotheses:

(i) a hypothesis that the parameters in β are equal to a specific vector of constants:
H01: Aβ = d, where A = I_p;

(ii) a hypothesis that some subset of the parameters is equal to a given vector of
constants: H02: A(k × p)β = d, where k < p; and

(iii) a hypothesis that some linear function (or functions) of the parameters, β, is (are)
equal to a given vector of constants: H03: A(k × p)β = d, where k ≤ p.
In (i) A is a p-order identity matrix; in (ii) A(k) is a (k × p) matrix in which the rows each
contain one element equal to “1” and the rest equal to “0;” in (iii) the rows of A (i.e., the ai') are
vectors of weights for obtaining linear combinations of the parameters in β. Actually any
general linear hypothesis may contain any combination of the hypotheses: i, ii, and iii, subject
to the constraint that the inverse, (A'A)^{-1}, exists.
All three types of hypotheses have the same general form, which can be expressed as:
H0: A'β = β*, or equivalently as
H0: A'β - β* = 0.
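Although the test statistic is not given in this excerpt, the standard univariate result under the assumptions stated earlier is that F = [(Ab − d)'(A(X'X)^{-1}A')^{-1}(Ab − d)/k] / [e'e/(N − p)] follows an F distribution with k and N − p degrees of freedom under H0, where k is the number of rows of A and A has full row rank. The sketch below (an added illustration, not the author's code) implements that standard formula in Python with NumPy and SciPy, using arbitrary made-up data.

```python
import numpy as np
from scipy import stats

def glh_f_test(X, y, A, d):
    """Standard F test of H0: A b = d for the model y = X beta + epsilon."""
    N, p = X.shape
    k = A.shape[0]                              # A is assumed to have full row rank k
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                       # least-squares estimate, Eq. (4)
    e = y - X @ b
    s2 = (e @ e) / (N - p)                      # unbiased residual variance
    diff = A @ b - d
    F = (diff @ np.linalg.inv(A @ XtX_inv @ A.T) @ diff) / k / s2
    return F, stats.f.sf(F, k, N - p)           # statistic and p value

# Arbitrary made-up data: intercept plus two predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=30)

# H0: both slope parameters equal zero (a type (ii) hypothesis).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(glh_f_test(X, y, A, np.zeros(2)))
```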
Estimates of parameters under H0. Consider first case (ii), above, where A is a (k × p)
matrix. Under this hypothesis, A constrains k of the parameters in β by setting them equal to
specified constants, while leaving the remaining p - k parameters unconstrained (i.e., free to vary). The
hypothesis can be written as
H_0: \begin{bmatrix} β_1 \\ β_2 \\ \vdots \\ β_k \\ β_{k+1} \\ β_{k+2} \\ \vdots \\ β_p \end{bmatrix} = \begin{bmatrix} β_1^* \\ β_2^* \\ \vdots \\ β_k^* \\ γ_{k+1} \\ γ_{k+2} \\ \vdots \\ γ_p \end{bmatrix} = \begin{bmatrix} β^* \\ γ \end{bmatrix},    (17)
where the βi, i = 1,2, …,k, are selected by the rows of A, the βi*, i = 1,2, …,k, are known
constants, and the γ’s are new parameters resulting from the constraints. To estimate the γ’s,
let

A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix},

where A_1 is now the (k × p) matrix given in (ii) above and A_2 is (p - k) × p, typically having the form

A_2 = \begin{bmatrix} 0 & I \end{bmatrix}.
Under the null hypothesis, H02, y = Xβ + ε, which, when X is partitioned in conformance
with A, can be written as

y = \begin{bmatrix} X_1 & X_2 \end{bmatrix}\begin{bmatrix} A_1β \\ A_2β \end{bmatrix} + e
  = \begin{bmatrix} X_1 & X_2 \end{bmatrix}\begin{bmatrix} β^* \\ γ \end{bmatrix} + e
  = X_1β^* + X_2γ + e.    (18)
Or, since the solutions are computed on sample data, y = X_1β^* + X_2g + e. As before, the
objective is to find a solution for g such that the scalar product,

s² = e'e = (y - X_1β^* - X_2g)'(y - X_1β^* - X_2g),    (19)

is a minimum. This is accomplished by differentiating (19) with respect to g and setting the result
equal to zero:

∂s²/∂g = -2X_2'(y - X_1β^* - X_2g)
       = -2X_2'y + 2X_2'X_1β^* + 2X_2'X_2g
       = 0.    (20)

Solving for g,

X_2'X_2g = X_2'y - X_2'X_1β^*
         = X_2'(y - X_1β^*),    (21)

and

g = (X_2'X_2)^{-1}X_2'(y - X_1β^*)
  = (X_2'X_2)^{-1}X_2'y - (X_2'X_2)^{-1}X_2'X_1β^*.    (22)
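As a small numerical sketch (added here, not one of the manuscript's examples), Eq. (22) can be computed directly once X has been partitioned into X1, the columns attached to the constrained parameters, and X2, the columns attached to the free parameters. The data and the hypothesized value below are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=20)

# Partition X conformably with A: X1 holds the constrained column, X2 the rest.
X1, X2 = X[:, :1], X[:, 1:]
beta_star = np.array([0.0])        # hypothesized value of the constrained parameter

# Constrained (restricted) estimate g = (X2'X2)^{-1} X2'(y - X1 beta*), Eq. (22).
g = np.linalg.solve(X2.T @ X2, X2.T @ (y - X1 @ beta_star))
print(g)
```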
Expectation and variance of g. The expected value of g is given by

E{g} = E{(X_2'X_2)^{-1}X_2'(y - X_1β^*)}
     = E{(X_2'X_2)^{-1}X_2'[(X_1β^* + X_2γ + e) - X_1β^*]}
     = E{(X_2'X_2)^{-1}X_2'(X_2γ + e)}
     = E{γ + (X_2'X_2)^{-1}X_2'e}
     = γ + (X_2'X_2)^{-1}X_2'E{e}
     = γ.    (23)

Hence, g provides an unbiased estimate of γ. The variance of g is given by

V{g} = E{(g - E{g})(g - E{g})'}
     = E{(g - γ)(g - γ)'}
     = E{[(X_2'X_2)^{-1}X_2'(y - X_1β^*) - γ][(X_2'X_2)^{-1}X_2'(y - X_1β^*) - γ]'}
     = E{[(X_2'X_2)^{-1}X_2'(X_2γ + e) - γ][(X_2'X_2)^{-1}X_2'(X_2γ + e) - γ]'}
     = E{[γ + (X_2'X_2)^{-1}X_2'e - γ][γ + (X_2'X_2)^{-1}X_2'e - γ]'}
     = E{(X_2'X_2)^{-1}X_2'ee'X_2(X_2'X_2)^{-1}}
     = (X_2'X_2)^{-1}X_2'E{ee'}X_2(X_2'X_2)^{-1}
     = σ²(X_2'X_2)^{-1}.    (24)
Residual variance under the null hypothesis. Under the null hypothesis, H0,

y = X_1β^* + X_2γ + e,

where

e = y - X_1β^* - X_2γ.

Then, the variance of e is given by

V{e} = β^{*'}[X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1]β^* + σ².

But

[X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1] = [A_1(X'X)^{-1}A_1']^{-1},

so that

V{e} = β^{*'}[A_1(X'X)^{-1}A_1']^{-1}β^* + σ²,
which is greater than the variance given for the unrestricted model. Since the identity earlier
may not be obvious, it may be useful to show the derivation.
Begin with the partitioned matrix, X = [X_1  X_2], and compute

X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}.

Then, let

(X'X)^{-1} = \begin{bmatrix} X^{11} & X^{12} \\ X^{21} & X^{22} \end{bmatrix}
           = \begin{bmatrix} (X_{11} - X_{12}X_{22}^{-1}X_{21})^{-1} & -X_{11}^{-1}X_{12}(X_{22} - X_{21}X_{11}^{-1}X_{12})^{-1} \\ -X_{22}^{-1}X_{21}(X_{11} - X_{12}X_{22}^{-1}X_{21})^{-1} & (X_{22} - X_{21}X_{11}^{-1}X_{12})^{-1} \end{bmatrix}.

Now, if A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, then

A_1(X'X)^{-1}A_1' = X^{11} = [X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1]^{-1},

so that

[A_1(X'X)^{-1}A_1']^{-1} = X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1.

Here X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1 = X_1'[I - X_2(X_2'X_2)^{-1}X_2']X_1, and [I - X_2(X_2'X_2)^{-1}X_2'] is idempotent.
Typical Applications
Having described the basic equations of the GLH, we now turn to an algebraic exposition of
some of the more typical models found in educational research. Examples of typical
applications using real data can be found in many of the references cited earlier as
well as in Olson (1971). In the discussion which follows, we begin with the model for the
analysis of variance, move to a brief discussion of the analysis of covariance, and finally present
a discussion on the analysis of repeated measures designs.
Choice of Design Matrices
Dummy variable coding.
Effect coding.
Using a super identity design matrix.
While there exists a mathematical
equivalence between hypothesis tests performed using GLH and the more traditional MLR, GLH
provides a more elegant, straight-forward approach to hypothesis testing. This is especially so in
the cases illustrated here.
If we let

X = \begin{bmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_p} \end{bmatrix},

what I prefer to call a “super identity matrix,” then the usual solution to the normal equations
yields

b = \begin{bmatrix} \hat{μ}_1 \\ \hat{μ}_2 \\ \vdots \\ \hat{μ}_p \end{bmatrix},

the vector of group means.
Then A can be used in a straight-forward fashion to test hypotheses involving linear
combinations of group means.
As a simple example, suppose we have a single-factor, three-group design. In this
situation, the terms in (1) would be specified as

y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, X = \begin{bmatrix} 1_{n_1} & 0 & 0 \\ 0 & 1_{n_2} & 0 \\ 0 & 0 & 1_{n_3} \end{bmatrix}, β = μ = \begin{bmatrix} μ_1 \\ μ_2 \\ μ_3 \end{bmatrix}, and ε = \begin{bmatrix} ε_1 \\ ε_2 \\ ε_3 \end{bmatrix},

where, in general, y_i is an (n_i × 1) column vector of observations for individuals in group i, 1_{n_i} is an
(n_i × 1) column vector of 1s corresponding to the vector y_i for group i, μ_i is the mean of the
observations in group i, and ε_i is an (n_i × 1) column vector of random disturbances.
Letting A and d in (3) be

A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix} and d = \begin{bmatrix} 0 \\ 0 \end{bmatrix},

respectively, sets up the following null hypothesis for testing the “group effect”:

H_0: \begin{bmatrix} μ_1 - μ_3 \\ μ_2 - μ_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
Appendices
Summation Operators
The summation operator (∑) {Greek letter, capital sigma} is an instruction to sum over a
series of values. For instance, if we have the set of values for the variable, X = {X1, X2, X3, X4,
X5}, then
n 5
X
i 1
i
= X1+ X2+X3+ X4+ X5
n 5
Literally, the expression,  X i , says: beginning with i=1 and ending with i=5, sum over
i 1
the variables Xi. As an example, let
Xi = 8, X2 = 10, X3 = 11, X4 = 15, X5 = 16.
Then n = 5 {the number of cases}, and
n 5
X
i 1
i
= 8 + 10 + 11 + 15 + 16
= 60.
In many contexts, it is clear that the summation is over all cases and we do not need the
superscript over the summation operator. Furthermore, in most contexts it is assumed that the
n 5
summation begins with i = 1. Hence, the notation,  X i is taken to imply  X i . In most
i
i 1
situations, where the variable has only one subscript, as in Xi, the subscript can be omitted. In
these situations,
X
n 5
implies  X i .
i 1
In other contexts, the variable X may have more than one subscript, e.g., Xij. This occurs, for
instance, when individuals belong to two or more subgroupings or cross-classifications. We
might have a situation as shown below in Table 1.
Table 1: Three groups.
Group 1: X11, X21, X31, X41
Group 2: X12, X22, X32, X42, X52, X62
Group 3: X13, X23, X33, X43, X53
Here we have three groups, each with a different number of cases. We denote the ith case
in the jth group with the symbol, Xij. To sum all the cases, over all three groups, we would use
the following, double summation operator,
\sum_{j=1}^{3} \sum_{i=1}^{n_j} X_{ij},

which instructs us to sum over the three groups (j = 1, 2, and 3) and, within each group, sum over
the number of cases in the group (i = 1, 2, 3, 4 for Group 1; i = 1, 2, 3, 4, 5, 6 for Group 2; i = 1, 2, 3,
4, 5 for Group 3). For simplicity, we often write the summation expression as

\sum_j \sum_i X_{ij},
where it is assumed that we are to sum over all groups and all cases within each group. For a
more concrete example, let’s substitute the following numbers for the symbolic values given
above.

Table 2: Three groups with real numbers.
Group 1: 10, 8, 12, 13
Group 2: 6, 11, 8, 10, 8, 12
Group 3: 14, 6, 6, 10, 9

Then,

\sum_j \sum_i X_{ij} = [10+8+12+13] + [6+11+8+10+8+12] + [14+6+6+10+9]
                     = 43 + 55 + 45
                     = 143.
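The same double summation takes only a couple of lines of code; the sketch below (an added illustration in Python) reproduces the Table 2 total.

```python
# Table 2 data: one list per group.
groups = [[10, 8, 12, 13],
          [6, 11, 8, 10, 8, 12],
          [14, 6, 6, 10, 9]]

group_sums = [sum(g) for g in groups]   # inner sums over i within each group j
print(group_sums, sum(group_sums))      # [43, 55, 45] 143
```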
A more complex situation occurs when cases are grouped into cross-classifications. Table
3 represents a situation where cases are cross-classified by sex and age-category.
Table 3: Cross-classification of sex and age.

                              Age Category
Sex        Child                    Adolescent               Adult
Male       X111, X211, X311, X411   X112, X212, X312, X412   X113, X213, X313, X413
Female     X121, X221, X321, X421   X122, X222, X322, X422   X123, X223, X323, X423
In the table the notation, Xijk, denotes the ith individual in the jth row (Sex category) and
kth column (Age Category). Hence, X423 is the last case in the Adult-Female cell. To indicate
summation over all the cases in the table, we would use the notation

\sum_k \sum_j \sum_i X_{ijk},
where it is assumed that the summation is over all N cases, i, over all J rows, j, and all K
columns, k.
Dot Notation
It is often easier to denote aggregates using dot notation. In an expression such as
X_\bullet = \sum_i X_i,

the dot (∙) represents aggregation or summation over the missing (dotted) subscript. For instance,
in the example given earlier, where we had \sum_{i=1}^{n=5} X_i = 60, we could simply write
X∙ = 60 {read X-dot = 60}. Using dot notation we would represent the cell aggregates in the Sex
by Age Category table above as shown in Table 4.
Table 4: Cross-classification using dot notation.

                     Age Category
Sex        Child    Adolescent   Adult    Total
Male       X∙11     X∙12         X∙13     X∙1∙
Female     X∙21     X∙22         X∙23     X∙2∙
Total      X∙∙1     X∙∙2         X∙∙3     X∙∙∙
where, for the Male-Adolescent cell, X∙12 = \sum_i X_{i12}. To denote the sum of the values for all the
males, we write X∙1∙; for all females, X∙2∙. Similarly, the sum of the values for all children is
X∙∙1; all adolescents, X∙∙2; and all adults, X∙∙3. The sum of the values for all the cases is X∙∙∙.
Note that

X∙1∙ = \sum_k \sum_i X_{i1k},

X∙∙1 = \sum_j \sum_i X_{ij1}, and

X∙∙∙ = \sum_k \sum_j \sum_i X_{ijk}.
Rules of summation
Summation of a constant. Let c be some constant value. Then,
\sum_{i=1}^{N} c = Nc.

In other words, summing a constant N times is the same as multiplying the constant by N. Hence,
if c = 5, then

\sum_{i=1}^{12} c = 12 × 5 = 60.
This rule can be extended to double summation. Thus,

\sum_{j}^{J} \sum_{i}^{n_j} c = \sum_{j}^{J} n_j c.

As an example, consider the situation involving the three groups given earlier in Table 2. If all
cases, in all groups, have the constant value 10, then

\sum_{j}^{3} \sum_{i}^{n_j} 10 = [10+10+10+10] + [10+10+10+10+10+10] + [10+10+10+10+10]
                              = (4×10) + (6×10) + (5×10)
                              = 15×10
                              = 150.
Multiplication by a constant. If all the values of a variable, X, are multiplied by the same
constant, c, then

\sum_{i}^{N} cX_i = c\sum_{i}^{N} X_i.

For example, let the set of 5 values of the variable X be {3, 9, 5, 7, 10}, and assume that each is
multiplied by the constant 2. Then,

\sum_i 2X_i = 2(3) + 2(9) + 2(5) + 2(7) + 2(10)
            = 2(3 + 9 + 5 + 7 + 10)
            = 2(34)
            = 68
            = 2\sum_i X_i
            = cX_\bullet.
Again, this can be expanded to multiple summations:

\sum_j \sum_i cX_{ij} = c\sum_j \sum_i X_{ij} = cX_{\bullet\bullet},

\sum_k \sum_j \sum_i cX_{ijk} = c\sum_k \sum_j \sum_i X_{ijk} = cX_{\bullet\bullet\bullet},

and so on.
As an example, again consider the three-group situation given earlier in Table 2. If all
cases were multiplied by 5, then

\sum_{j}^{3} \sum_{i}^{n_j} 5(X_{ij}) = [(5×10) + (5×8) + (5×12) + (5×13)]
                                      + [(5×6) + (5×11) + (5×8) + (5×10) + (5×8) + (5×12)]
                                      + [(5×14) + (5×6) + (5×6) + (5×10) + (5×9)]
                                      = 5(43 + 55 + 45)
                                      = 5\left[\sum_{j}^{3} \sum_{i}^{n_j} X_{ij}\right]
                                      = 5(143)
                                      = 715.
In some situations, the values in different groups are multiplied by different constants.
For instance, suppose the values in Group 1 (in the example just given) were multiplied by the
constant c1, the values in Group 2 by c2, and the values in group 3 by c3. Then, the sum of all the
cases would be given by

\sum_j c_j \sum_i X_{ij}.
In this situation, it is necessary that the constants remain within the summation operator.
Now, suppose that in the Sex by Age Category example given earlier, we have the values as
shown below in Table 5.
Table 5

                   Age Category
Sex        Child         Adolescent    Adult
Male       2, 3, 5, 4    5, 1, 1, 3    2, 2, 3, 0
Female     1, 3, 5, 2    2, 0, 3, 1    5, 3, 4, 3
Suppose all the male values are multiplied by the constant 5 and all the female values are
multiplied by the constant 10. Then, letting c_1 = 5 and c_2 = 10, the sum over all cases, rows, and
columns is given by

\sum_k \sum_j \sum_i c_j X_{ijk} = \sum_j c_j \left[\sum_k \sum_i X_{ijk}\right]
                                 = 5(14 + 10 + 7) + 10(11 + 6 + 15)
                                 = 5(31) + 10(32)
                                 = 475.
Note that the right-hand side of the summation notation above could have been written as

\sum_j c_j \sum_k \sum_i X_{ijk},

without the parentheses. Using dot notation, the above expression can be written as

\sum_j c_j \sum_k X_{\bullet jk}.
Order of operations. The order of operations, when using summation operators, is the
same as that for arithmetic. That is, operations involving the values in a summation operation are
indicated by mathematical punctuation. When indicated by punctuation, operations are to be
carried out on values prior to summation. Some examples should suffice.
\sum_i X_i^2 = (X_1^2 + X_2^2 + X_3^2 + \cdots + X_N^2),

\sum_i |X_i| = |X_1| + |X_2| + \cdots + |X_N|, and

\sum_i \log X_i = \log X_1 + \log X_2 + \cdots + \log X_N.

On the other hand,

\left(\sum_i X_i\right)^2 = (X_1 + X_2 + \cdots + X_N)^2,

\left|\sum_i X_i\right| = |X_1 + X_2 + \cdots + X_N|, and

\log\left(\sum_i X_i\right) = \log(X_1 + X_2 + \cdots + X_N).
Note that these rules apply even when we have multiple summations. For example, letting capital
J represent the number of groups,
\sum_{j=1}^{J}\left(\sum_i X_{ij}\right)^2 = \sum_{j}^{J}\left(X_{1j} + X_{2j} + \cdots + X_{n_j j}\right)^2 = \sum_{j}^{J} X_{\bullet j}^2.
In words, within each group, j: sum the values over cases, i, then square the sum. After
squaring the sums within all of the J groups, sum the squared sums over groups. For example,
consider the data given in Table 2, earlier.
\sum_{j=1}^{J}\left(\sum_i X_{ij}\right)^2 = 43^2 + 55^2 + 45^2 = 6899.
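The sum-then-square computation is equally short in code (an added check):

```python
groups = [[10, 8, 12, 13], [6, 11, 8, 10, 8, 12], [14, 6, 6, 10, 9]]
print(sum(sum(g) ** 2 for g in groups))   # 43**2 + 55**2 + 45**2 = 6899
```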
Distributive rule of summation. The summation operator is distributive when the value
being operated upon is itself a sum (or difference). For example,

\sum_i (X_i + Y_i) = \sum_i X_i + \sum_i Y_i,

and

\sum_{j}^{J} \sum_i (X_{ij} - \bar{X})^2 = \sum_{j}^{J} \sum_i (X_{ij}^2 - 2\bar{X}X_{ij} + \bar{X}^2)
                                         = \sum_{j}^{J} \sum_i X_{ij}^2 - 2\bar{X}\sum_{j}^{J} \sum_i X_{ij} + \sum_{j}^{J} \sum_i \bar{X}^2
                                         = \sum_{j}^{J} \sum_i X_{ij}^2 - 2\bar{X}\sum_{j}^{J} \sum_i X_{ij} + \bar{X}^2\sum_{j}^{J} n_j.

Note that the last step in this equation follows from the rule pertaining to summation over a
constant, given earlier. Hence,

\sum_{j}^{J} \sum_{i}^{n_j} \bar{X}^2 = \sum_{i}^{n_1} \bar{X}^2 + \sum_{i}^{n_2} \bar{X}^2 + \cdots + \sum_{i}^{n_J} \bar{X}^2
                                      = n_1\bar{X}^2 + n_2\bar{X}^2 + \cdots + n_J\bar{X}^2
                                      = \sum_{j}^{J} n_j\bar{X}^2
                                      = \bar{X}^2\sum_{j}^{J} n_j.
Summation involving two or more variables. If each case has values on two variables, Xi
and Yi, say, then,
\sum_{i}^{N} X_iY_i ≠ \left(\sum_i X_i\right)\left(\sum_i Y_i\right).

Instead,

\sum_i X_iY_i = X_1Y_1 + X_2Y_2 + \cdots + X_NY_N.
These rules can be extended, quite logically, to situations involving more than two
variables.
Expectation Operators
This appendix has yet to be developed.
Matrix Algebra
<Need an introduction>
Dimensions and order of a matrix. A p by q dimensioned matrix is a p (rows) by q
(columns) array of numbers, or symbols, and will itself be represented by an upper-case, bold,
italic symbol, e.g., A. For instance, a 2 by 3 matrix, A, can be represented symbolically as the array

A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix},

or as an actual array of numbers

A = \begin{bmatrix} 3 & 5 & 6 \\ 2 & 4 & 3 \end{bmatrix}.

In either case, the order of A is said to be 2 × 3. On the other hand, the matrix

D = \begin{bmatrix} d_{11} & d_{12} & d_{13} \\ d_{21} & d_{22} & d_{23} \\ d_{31} & d_{32} & d_{33} \\ d_{41} & d_{42} & d_{43} \end{bmatrix}

is a 4 × 3 matrix. That is, its order is 4 × 3. A matrix having only one column or one row is
called a vector and is represented by a lower-case, bold, italic symbol, e.g., a, β. Hence the 1 × 4
and 3 × 1 matrices

a = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \end{bmatrix} and β = \begin{bmatrix} β_1 \\ β_2 \\ β_3 \end{bmatrix}

are both vectors; a is a row vector and β is a column vector.
The numbers (or symbols) inside a matrix are called elements. Thus in the first matrix, A,
given above, a11 and a23 are elements of the matrix A. Similarly, in the second matrix, A, above,
the numbers 4 and 6 are elements.
Elements in a matrix are indexed (or referred to) by subscripts that give their row and
column locations. For, instance, in the first matrix, A, above, a23 is the element in the second
row of the third column. Similarly, in the matrix, D, above, element d41 is found in the fourth
row, first column. In general, aij is the i,j’th element of A. When referring to elements of
vectors, however, the first row (or column) is assumed. Hence, in the vector a, above, a_2 is its
second element; in β, β_3 is the third element. In general, elements of vectors are indicated by
the j’th element for row (i.e., 1 × q) vectors and the i’th element for column (i.e., p × 1) vectors.
A matrix in which the number of rows (p) equals the number of columns (q) is called a
square matrix; otherwise, providing it is not a vector, it is called a rectangular matrix.
The natural order of a rectangular matrix has more rows than columns (i.e., p > q).
Hence,

A = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{bmatrix},

a 3 × 2 matrix, is presented in its natural order.
The transpose of a matrix is obtained by interchanging its rows and columns. Thus, the
transpose of A, represented as A', is

A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix},

a 2 × 3 matrix. As a concrete example, consider

X = \begin{bmatrix} 2 & 4 & 3 \\ 5 & 7 & 4 \\ 1 & 6 & 7 \\ 2 & 2 & 6 \end{bmatrix},

which, when transposed, becomes

X' = \begin{bmatrix} 2 & 5 & 1 & 2 \\ 4 & 7 & 6 & 2 \\ 3 & 4 & 7 & 6 \end{bmatrix}.

The natural order of a vector is a p × 1 column vector. Here, the 4 × 1 column vector

a = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}

is presented in its natural order. The transpose of a,

a' = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \end{bmatrix},

is a 1 × 4 row vector.
As mentioned earlier, a matrix where p = q is a square matrix. Two types of square
matrices are particularly important: symmetric matrices and diagonal matrices.
A symmetric matrix is a square matrix in which the elements above the main diagonal are
mirror images of the elements below the main diagonal (the main diagonal is that set of
elements running from the upper left-hand corner of a square matrix to the lower right-hand corner). In
the symmetric matrix V, given below, the elements in bold represent the main diagonal. Note
that the elements above the main diagonal are mirror images of the elements below the main
diagonal.

V = \begin{bmatrix} 56 & 52 & 38 \\ 52 & 64 & 61 \\ 38 & 61 & 59 \end{bmatrix}.
Diagonal matrices are square matrices in which all elements except the main diagonal
are zero (0). For instance,
37 0 0 0 
 0 52 0 0 

D
 0 0 49 0 


 0 0 0 61
is a symmetric diagonal matrix.
An important diagonal matrix is the identity matrix. The identity matrix is a diagonal
matrix in which all the elements along the main diagonal are unity (1). For example,
1 0 0
I  0 1 0
0 0 1
is a 3 x 3 identity matrix.
Symmetric matrices have important properties. For instance, a symmetric matrix is
equal to its transpose. That is, for the matrix V given earlier, V = V'.
Matrix Operations
Addition. The addition of two matrices, A and B, is accomplished by adding
corresponding elements, e.g., a_{ij} + b_{ij}; hence,

C = A + B
  = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \\ b_{41} & b_{42} & b_{43} \end{bmatrix}
  = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33} \\ a_{41}+b_{41} & a_{42}+b_{42} & a_{43}+b_{43} \end{bmatrix}
  = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \\ c_{41} & c_{42} & c_{43} \end{bmatrix}.
It is obvious (or should be obvious) that the orders of the two matrices being added
need to be identical. Here, for instance, both matrices are of order 4 x 3. When the orders of
the two matrices are not identical, addition is impossible.
Addition of matrices is commutative. Hence A + B = B + A.
As a concrete example, let

A = \begin{bmatrix} 7 & 5 & 8 \\ 3 & 1 & 1 \\ 4 & 5 & 8 \\ 1 & 3 & 4 \end{bmatrix} and B = \begin{bmatrix} 2 & 0 & 3 \\ 1 & 3 & 3 \\ 4 & 1 & 2 \\ 3 & 1 & 3 \end{bmatrix}.

Then, C = A + B = B + A

  = \begin{bmatrix} 7+2 & 5+0 & 8+3 \\ 3+1 & 1+3 & 1+3 \\ 4+4 & 5+1 & 8+2 \\ 1+3 & 3+1 & 4+3 \end{bmatrix}
  = \begin{bmatrix} 9 & 5 & 11 \\ 4 & 4 & 4 \\ 8 & 6 & 10 \\ 4 & 4 & 7 \end{bmatrix}.
Multiplication. Multiplication is somewhat more complicated. When multiplying
matrices we need to distinguish among multiplication by a scalar, vector multiplication,
pre-multiplication and post-multiplication, and inner-products and outer-products. But first, let us
define a scalar. A scalar is simply a single number, such as 2, 17.6, or -300.335. Symbolically, we
typically use the symbol, c, to represent a scalar.
Scalar multiplication. Scalar multiplication is easily shown by example. Given the 3 × 2 matrix X
and the scalar c, the product, cX, is given by

cX = c\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix} = \begin{bmatrix} cx_{11} & cx_{12} \\ cx_{21} & cx_{22} \\ cx_{31} & cx_{32} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix}c = \begin{bmatrix} x_{11}c & x_{12}c \\ x_{21}c & x_{22}c \\ x_{31}c & x_{32}c \end{bmatrix}.

As a concrete example, let

X = \begin{bmatrix} 4 & 7 \\ 3 & 0 \\ -1 & 2 \end{bmatrix}

and c = 5. Then

cX = 5\begin{bmatrix} 4 & 7 \\ 3 & 0 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ 3 & 0 \\ -1 & 2 \end{bmatrix}5 = \begin{bmatrix} 20 & 35 \\ 15 & 0 \\ -5 & 10 \end{bmatrix}.
In the above expression, the product, cX, results from pre-multiplying X by the scalar c;
whereas the product Xc results from post-multiplying X by the scalar, c. Since scalar
multiplication is commutative, cX will always be equal to Xc. This is not the case with matrix
multiplication, however. Only in certain special cases will the products formed by premultiplying and post-multiplying two matrices, X and Y be equal. In general XY ≠ YX.
Matrix Multiplication. First, note that given two matrices, e.g., A of order n x m, and B of
order p x q, multiplication is only possible when the number of columns in the pre-multiplier is
equal to the number of rows in the post-multiplier. Hence the product, AB, is only possible
when m = p. Similarly, the product BA is only possible when q = n.
When m = p the product, C = AB will have order n x q. Similarly, when q = n, the product,
C = BA will have order p x m.
We can illustrate this by defining the two matrices

X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix} and Y' = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \end{bmatrix}.

Note that X has order 4 × 3, and Y' has order 2 × 3. Here, only the products XY (not XY') and
Y'X' (not YX) are possible. Post-multiplying X by Y, we have

XY = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix}\begin{bmatrix} y_{11} & y_{21} \\ y_{12} & y_{22} \\ y_{13} & y_{23} \end{bmatrix}
   = \begin{bmatrix} x_{11}y_{11}+x_{12}y_{12}+x_{13}y_{13} & x_{11}y_{21}+x_{12}y_{22}+x_{13}y_{23} \\ x_{21}y_{11}+x_{22}y_{12}+x_{23}y_{13} & x_{21}y_{21}+x_{22}y_{22}+x_{23}y_{23} \\ x_{31}y_{11}+x_{32}y_{12}+x_{33}y_{13} & x_{31}y_{21}+x_{32}y_{22}+x_{33}y_{23} \\ x_{41}y_{11}+x_{42}y_{12}+x_{43}y_{13} & x_{41}y_{21}+x_{42}y_{22}+x_{43}y_{23} \end{bmatrix},
where the product is of order 4 x 2. As a more concrete example, let
2
1
X 
2

1
Then
0 1
3 1 
1 0

3 2
2 1 1
and Y   
.
3 0 2
2
1
XY  
2

1
0 1
2 3
3 1  
1 0 


1 0
 1 2 
3 2 
 4  0  1 6  0  3
2  3 1 3  0  2


4  1  0 6  0  0


2  3  2 3  0  4
5 9 
6 5
,

5 6 


7 7 
a 4 x 2 matrix. The only other product possible between these two matrices is Y’X’:
Y'X' = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \end{bmatrix}\begin{bmatrix} x_{11} & x_{21} & x_{31} & x_{41} \\ x_{12} & x_{22} & x_{32} & x_{42} \\ x_{13} & x_{23} & x_{33} & x_{43} \end{bmatrix}
     = \begin{bmatrix} y_{11}x_{11}+y_{12}x_{12}+y_{13}x_{13} & y_{11}x_{21}+y_{12}x_{22}+y_{13}x_{23} & y_{11}x_{31}+y_{12}x_{32}+y_{13}x_{33} & y_{11}x_{41}+y_{12}x_{42}+y_{13}x_{43} \\ y_{21}x_{11}+y_{22}x_{12}+y_{23}x_{13} & y_{21}x_{21}+y_{22}x_{22}+y_{23}x_{23} & y_{21}x_{31}+y_{22}x_{32}+y_{23}x_{33} & y_{21}x_{41}+y_{22}x_{42}+y_{23}x_{43} \end{bmatrix}.

For the concrete example,

Y'X' = \begin{bmatrix} 2 & 1 & 1 \\ 3 & 0 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 & 2 & 1 \\ 0 & 3 & 1 & 3 \\ 1 & 1 & 0 & 2 \end{bmatrix}
     = \begin{bmatrix} 4+0+1 & 2+3+1 & 4+1+0 & 2+3+2 \\ 6+0+2 & 3+0+2 & 6+0+0 & 3+0+4 \end{bmatrix}
     = \begin{bmatrix} 5 & 6 & 5 & 7 \\ 8 & 5 & 6 & 7 \end{bmatrix},
which is a 2 x 4 matrix.
When A and B are both square matrices of the same order, then all the products AB, BA, A'B,
AB', B'A, BA', A'B', and B'A' are possible. Furthermore, except in certain special cases, the
matrices formed by these various products will be different from each other. As an exercise, try
computing all possible products of the following two square matrices:

A = \begin{bmatrix} 2 & 2 & 1 \\ 2 & 3 & 1 \\ 2 & 1 & 1 \end{bmatrix} and B = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}.
Vector Multiplication. Having learned how to compute matrix multiplication, vector
multiplication is easy. It follows the same rules of matrix multiplication except that here one of
the matrices is a vector. For instance, given the 4 × 1 column vector

n = \begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \end{bmatrix}

and the 4 × 2 matrix

Y = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \\ y_{31} & y_{32} \\ y_{41} & y_{42} \end{bmatrix},

then the product n'Y is the only possible product. Hence,

n'Y = \begin{bmatrix} n_1 & n_2 & n_3 & n_4 \end{bmatrix}\begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \\ y_{31} & y_{32} \\ y_{41} & y_{42} \end{bmatrix}
    = \begin{bmatrix} n_1y_{11}+n_2y_{21}+n_3y_{31}+n_4y_{41} & n_1y_{12}+n_2y_{22}+n_3y_{32}+n_4y_{42} \end{bmatrix}
    = \begin{bmatrix} \sum_i n_iy_{i1} & \sum_i n_iy_{i2} \end{bmatrix}.
To put this in concrete terms, let

n' = \begin{bmatrix} 3 & 1 & 2 & 5 \end{bmatrix} and Y = \begin{bmatrix} 2 & 1 \\ 2 & 3 \\ 2 & 2 \\ 4 & 3 \end{bmatrix}.

Then

n'Y = \begin{bmatrix} 3 & 1 & 2 & 5 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 2 & 3 \\ 2 & 2 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 6+2+4+20 & 3+3+4+15 \end{bmatrix} = \begin{bmatrix} 32 & 25 \end{bmatrix}.
Some special cases of vector and matrix multiplication are particularly important. For instance,
let the vector 1 be defined as a vector with all elements equal to 1:

1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} and Y = \begin{bmatrix} y_1 & y_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 1 & 2 \\ 4 & 0 \\ 3 & 1 \\ 4 & 2 \end{bmatrix}. {Note, y_1 and y_2 are column vectors}.

Then,

1'Y = \begin{bmatrix} \sum y_1 & \sum y_2 \end{bmatrix} = \begin{bmatrix} 14 & 8 \end{bmatrix}.
If we let

X = \begin{bmatrix} x_1 & x_2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \\ 1 & 3 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} and Y = \begin{bmatrix} y_1 & y_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 1 & 2 \\ 4 & 0 \\ 3 & 1 \\ 4 & 3 \end{bmatrix},

then,

X'Y = \begin{bmatrix} \sum y_1 & \sum y_2 \\ \sum x_2y_1 & \sum x_2y_2 \end{bmatrix} = \begin{bmatrix} 14 & 9 \\ 31 & 16 \end{bmatrix}.
Note also that

X'X = \begin{bmatrix} x_1'x_1 & x_1'x_2 \\ x_2'x_1 & x_2'x_2 \end{bmatrix} = \begin{bmatrix} n & \sum x_2 \\ \sum x_2 & \sum x_2^2 \end{bmatrix} = \begin{bmatrix} 5 & 10 \\ 10 & 22 \end{bmatrix}.
Inner-products and outer-products. Given the two matrices

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} and B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix},

verify that the following two products are possible: AB and BA (of course, A'B' and B'A' also are
possible). The orders of the two products are not identical. AB has order, 3 x 3, while BA has
order 2 x 2. The product AB is an outer-product while the product BA is an inner-product. In
general, whenever both an inner-product and an outer-product exist for two matrices, the
order of the inner-product will be less than the outer-product. Furthermore, in general, the
inner-product will not be identical to the outer-product.
For example, let

A = \begin{bmatrix} 2 & 1 \\ 3 & 2 \\ 3 & 1 \end{bmatrix} and B = \begin{bmatrix} 1 & 2 & 4 \\ 3 & 2 & 1 \end{bmatrix}.

Then

AB = \begin{bmatrix} 5 & 6 & 9 \\ 9 & 10 & 14 \\ 6 & 8 & 13 \end{bmatrix} and BA = \begin{bmatrix} 20 & 9 \\ 15 & 8 \end{bmatrix}.
Inner- and outer-products are particularly important in vector multiplication. Let

a = \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}, b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, and X = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 3 & 2 \\ 2 & 1 & 1 \end{bmatrix}.

Then

aX = \begin{bmatrix} 9 & 9 & 10 \end{bmatrix} = \begin{bmatrix} \sum_i x_{i1} & \sum_i x_{i2} & \sum_i x_{i3} \end{bmatrix} {note the summation is over rows}, and

Xb = \begin{bmatrix} 6 \\ 9 \\ 9 \\ 4 \end{bmatrix} = \begin{bmatrix} \sum_j x_{1j} \\ \sum_j x_{2j} \\ \sum_j x_{3j} \\ \sum_j x_{4j} \end{bmatrix} {here the summation is over columns}.

Also, it is easy to verify that

aXb = 28 = \sum_j \sum_i x_{ij}, where the summation is over rows and columns.