Uploaded by Ali Alhilali

Linea Models 2

advertisement
BIOS 2083: Linear Models
Abdus S Wahed
August 30, 2010
BIOS 2083
Chapter 0
Linear Models
Abdus S. Wahed
2
Chapter 1
Introduction to linear models
1.1
Linear Models: Definition and Examples
Example 1.1.1. Estimating the mean of a N(μ, σ 2 ) random variable.
1. Suppose Y1, Y2, . . . Yn be n i.i.d random variables from N (μ, σ 2) distribution.
• What can we tell about μ based on these n observations?
• Likelihood:
2
L(μ, σ ) =
n
fYi (yi )
√ −n
1 2
exp − 2
(yi − μ)
= σ 2π
2σ
i=1
3
(1.1.1)
BIOS 2083
Linear Models
Abdus S. Wahed
• Maximum likelihood estimator:
1
= Ȳ =
Yi .
n i=1
n
μ̂M LE
(1.1.2)
2. Now consider Y1 , Y2, . . . Yn be n random variables such that
Yi = μ + i , i = 1, 2, . . . , n,
(1.1.3)
where i ’s are i.i.d N (0, σ 2) random variables.
• How can we draw inference on μ?
• Likelihood:
2
L(μ, σ ) =
=
n
i=1
n
fYi (yi )
fi (yi − μ)
√ −n
1 2
= σ 2π
exp − 2
(yi − μ)
2σ
i=1
(1.1.4)
• Maximum likelihood estimator:
1
= Ȳ =
Yi .
n i=1
n
μ̂M LE
(1.1.5)
SAME RESULT AS BEFORE
In the Equation (1.1.3), we have expressed the random variable Y as a
linear function of the parameter μ plus an error term.
Chapter 1
4
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 1.1.1. A model which expresses the response as a linear function of the parameter(s) (plus an error term that has mean zero) is known
as a linear model.
The model
• (log-linear regression) ln Yi = α + βXi + i ; E(i) = 0
is a linear model, while
• Yi = exp(α + βXi ) + i ; E(i) = 0
is not.
Chapter 1
5
BIOS 2083
Linear Models
Abdus S. Wahed
Figure 1.1: Age distribution in African American(AA) and Caucasian Americans (CA) volunteered to participate in a clinical study.
Example 1.1.2. Testing the equality of the means of two independent normal populations.
1. Suppose Yi1, Yi2, . . . Yini be ni i.i.d random variables from N (μi , σ 2) distribution for i = 1, 2.
• How do we test the equality of the two means μ1 and μ2 based on
these n1 + n2 = n observations?
Chapter 1
6
BIOS 2083
Linear Models
Abdus S. Wahed
• Hypothesis: H0 : μ1 = μ2 .
• Usual test statistic:
Tpooled =
Ȳ1 − Ȳ2
∼ tn−2 ,
1
1
S n1 + n2
where S is the pooled sample standard deviation.
• Alternatively: Write
Yij = μi + ij , i = 1, 2; j = 1, 2, . . . , ni,
(1.1.6)
where ij ’s are i.i.d. N (0, σ 2) random variables.
• We will show later that this alternative representation also leads to
the same test statistic as Tpooled.
• The equation (1.1.6) shows that two-sample t-test can be viewed as
a linear model, as well.
Chapter 1
7
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.3. Paired experiment (before and after test). Suppose
observations are collected on a number of individuals in two separate conditions (temperatures, times, before/after treatment). Let Yji denote the
response from the ith individual at condition j, i = 1, 2, . . . , n; j = 1, 2.
The goal is to see if the mean response is similar across the two conditions. Since the observations are paired, one would be willing to construct
the differences Di = Y2i − Y1i and draw inference on the expected difference
δ = E(D) = E(Y2) − E(Y1). We can write this problem as a linear model in
many different ways:
Di = δ + i , E(i) = 0.
(1.1.7)
Yji = μ + αj + ji , E(ji) = 0.
(1.1.8)
and
In the second model, one draws inference on the difference α2 − α1 . It can
be shown that both models lead to the same conclusion under normality
assumption when 1i and 2i in the second construction are assumed to be
correlated.
Chapter 1
8
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.4. Simple linear regression.
Very often we are interested in associating one variable (covariate) to
another variable (outcome). For instance, consider a random sample of n
leukemia patients who were diagnosed at age wi, i = 1, 2, . . . , n and died at
age Yi , i = 1, 2, . . . , n. The objective is to relate the survival times to the age
at diagnosis.
• The simple linear regression assumes the following model:
Yi = α + βwi + i .
(1.1.9)
• The goal is to estimate the “parameters” α (intercept) and β (slope) so
that given the age at prognosis, one can predict how long the patient is
going to survive.
Chapter 1
9
BIOS 2083
Linear Models
Abdus S. Wahed
Figure 1.2: Regression of survival time on age at prognosis
Chapter 1
10
BIOS 2083
Linear Models
Abdus S. Wahed
Figure 1.3: Polynomial regression of survival time on age at prognosis
Example 1.1.5. Polynomial regression (Example 1.1.4 continued..)
• The quadratic linear regression assumes the following model:
Yi = α + βwi + γwi2 + i .
(1.1.10)
• The goal is to estimate the “parameters” α (intercept), β (linear coefficient) and γ (quadratic coefficient) so that again, as in previous example,
given the age at prognosis, one can predict how long the patient is going
to survive.
Chapter 1
11
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.6. Multiple linear regression
• Extend the idea of Examples 1.1.4 and 1.1.5 to associate a single response
to a set of k explanatory variables. Multiple linear regression assumes
the following model:
Yi = β0 + β1 x1i + β2 x2i + . . . + βk xki + i .
(1.1.11)
• The goal is to estimate the “parameters” βj , j = 0, 1, . . . , k with the
goal of investigating the relationship between the response Y and the
explanatory variables Xj , j = 1, 2, . . . , k.
Chapter 1
12
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.7. Transformed data.
Inverse square law states that the force of gravity between two particle
situated at D distance apart can be modeled by
F =
Dβ
.
(1.1.12)
• Consider a log transformation on the both sides of (1.1.12) to obtain
Y = α + βx + .
(1.1.13)
where Y = ln(F ), x = ln(D), and α = ln( ).
• Model (1.1.13) is basically in the form of model (1.1.9).
Chapter 1
13
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.8. One-way analysis of variance (ANOVA). Consider a
clinical trial in which we are interested in comparing a treatments. Suppose ni patients are randomized to the ith treatment. Let Yij denote the
response from the jth patient receiving the ith treatment, μ the overall mean
response, and αi the incremental effect of treatment i, i = 1, 2, . . . , a and
j = 1, 2, . . . , ni. Then:
• The one-way analysis of variance model is written as
Yij = μ + αi + ij ,
(1.1.14)
where ij is the error term associated with the jth observation from the
ith treatment and have zero mean.
Chapter 1
14
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.9. Two-way analysis of variance (ANOVA).
A clinical trial is being planned to compare a treatments. However, the
treatments are known to have different effect in different racial groups and one
would like to adjust for race while determining the effect of treatment on the
response. Suppose ni = ni1 +ni2 patients are randomized to the ith treatment
with nij patients belonging to the jth racial group, j = 1, 2. Let Yijk denote
the response from the kth patient belonging to the jth racial group receiving
the ith treatment, μ the overall mean response, αi the incremental effect of
treatment i, and βj the incremental effect for race j, i = 1, 2, . . . , a, j = 1, 2,
and k = 1, 2, . . . , nij . Then:
• The two-way analysis of variance model is written as
Yijk = μ + αi + βj + ijk ,
(1.1.15)
where ijk is the error term associated with the kth patient belonging to
racial group j and receiving treatment i.
Chapter 1
15
BIOS 2083
Linear Models
Abdus S. Wahed
Example 1.1.10. Analysis of covariance
The objective of analysis of covariance is similar to the previous two examples. Here we are interested in comparing a treatments adjusting for
continuous covariates.
• There are multiple representation of an analysis of covariance model with
a single adjusting covariate.
Yij = μ + αi + βxij + ij ,
(1.1.16)
Yij = μ + αi + β(xij − x̄..) + ij ,
(1.1.17)
Yij = μ + αi + βi xij + ij ,
(1.1.18)
Yij = μ + αi + β(xij − x̄i.) + ij ,
(1.1.19)
where
– μ = overall mean response,
– αi = incremental mean response from ith treatment,
– β = effect of the adjusting covariate X,
– xij = value of the covariate X for the subject j from treat. group i,
– x̄.. = overall mean for the adjusting covariate
Chapter 1
16
BIOS 2083
Linear Models
Abdus S. Wahed
– x̄i. = ith treatment group-specific mean for the adjusting covariate X.
ij is the error term associated with the jth patient receiving treatment
i.
You can find more examples of linear models in different applications of
statistics. I have cited only a few simple ones that are commonly applied in
day-to-day data analysis. More complex linear models can be constructed to
address particular problems of interest.
1.2
General form of linear model
All the models in the examples from the previous section can be written in a
general form using a response vector Y , a matrix of constants X, a parameter
vector β and an error vector .
• Specifically, a general linear model will have the form
Y = Xβ + ,
(1.2.1)
where
Chapter 1
17
BIOS 2083
–
–
–
–
⎛
⎞
Linear Models
Abdus S. Wahed
⎜ Y1 ⎟
⎟
⎜
⎟
⎜
⎜ Y2 ⎟
⎟ is an n × 1 vector of response,
Y =⎜
⎟
⎜
⎜ ··· ⎟
⎟
⎜
⎠
⎝
Yn
⎡
⎤
⎢ x11 x12 . . . x1p ⎥
⎢
⎥
⎢
⎥
⎢ x21 x22 . . . x2p ⎥
⎥ is an n × p matrix of constants,
X=⎢
⎢ .
⎥
.
.
.
⎢ ..
..
..
.. ⎥
⎢
⎥
⎣
⎦
xn1 xn2 . . . xnp
⎛
⎞
⎜ β1 ⎟
⎜
⎟
⎜
⎟
⎜ β2 ⎟
⎟ is an p × 1 vector of parameters, and
β=⎜
⎜
⎟
⎜ ··· ⎟
⎜
⎟
⎝
⎠
βp
⎛
⎞
⎜ 1 ⎟
⎜
⎟
⎜
⎟
⎜ 2 ⎟
⎟ is an n × 1 vector of error terms.
=⎜
⎜
⎟
⎜ ··· ⎟
⎜
⎟
⎝
⎠
n
• The response vector Y usually contains the responses from patients,
subjects, or experimental units.
Chapter 1
18
BIOS 2083
Linear Models
Abdus S. Wahed
• The columns of X-matrix represents the values of the variables, the
effect of which on the response is being studied, known as predictors,
covariates, regressors, or independent variables.
• Usually the first column of the X-matrix is a column of 1’s where there
is an intercept in the model.
• β is referred to as the parameter vector, regression coefficient (coefficient,
in short).
• The model is linear in the unknown coefficients β1, β2 , . . . , βp as (1.2.1)
can be written as
Y =
p
βj xj + ,
(1.2.2)
j=1
where xj is the jth column of X.
• Typically, for fixed X, the assumption that is required on a general linear
model (1.2.1), or equivalently, (1.2.2) is that the error vector has mean
zero. That is,
Assumption I. E() = 0.
• For random X, we require that E(Y |X) = Xβ, an assumption that is
guaranteed to hold when E(|X) = 0.
Chapter 1
19
BIOS 2083
Linear Models
Abdus S. Wahed
Now, how do we show that all the models considered as examples of linear
models in the previous section can be written in the form (1.2.1)?
1. Example 1.1.1. Writing (1.1.3) specifically for each i = 1, 2, . . . , n, we
can easily see that
Y1 = μ + 1
Y2 = μ + 2
..
.
. = ..
Yn = μ + n ,
leading to
⎛
⎞
⎡
⎜ Y1 ⎟
⎢
⎟
⎜
⎢
⎟
⎜
⎢
⎜ Y2 ⎟
⎢
⎟, X = ⎢
where Y = ⎜
⎟
⎜
⎢
⎜ ··· ⎟
⎢
⎟
⎜
⎢
⎠
⎝
⎣
Yn
Chapter 1
Y = Xβ + ,
⎛
⎤
1⎥
⎜
⎜
⎥
⎜
⎥
1⎥
⎜
⎥ = 1n , β = μ, and = ⎜
⎜
.. ⎥
⎜
. ⎥
⎜
⎥
⎝
⎦
1
⎞
1 ⎟
⎟
⎟
2 ⎟
⎟.
⎟
··· ⎟
⎟
⎠
n
20
BIOS 2083
Linear Models
Abdus S. Wahed
2. Example 1.1.2. For this problem, follow Equation (1.1.6) and write it
out for all i and j which will lead to
⎡
⎞
⎛
⎢1
⎜ Y11 ⎟
⎢
⎟
⎜
⎢
⎟
⎜
⎢1
⎜ Y12 ⎟
⎢
⎟
⎜
⎢ .
⎟
⎜
⎢ ..
⎜ ··· ⎟
⎢
⎟
⎜
⎢
⎟
⎜
⎢
⎟
⎜
⎢1
⎜ Y1n1 ⎟
⎟, X = ⎢
Y =⎜
⎢
⎟
⎜
⎢0
⎜ Y21 ⎟
⎢
⎟
⎜
⎢
⎟
⎜
⎢
⎟
⎜
⎢0
⎜ Y22 ⎟
⎢
⎟
⎜
⎢ .
⎟
⎜
⎢ ..
⎜ ··· ⎟
⎢
⎟
⎜
⎣
⎠
⎝
0
Y2n2
⎤
0⎥
⎥
⎥
0⎥
⎥
⎥
⎥
⎥ ⎡
⎤
⎥
⎥
0 ⎥ ⎢ 1 n1 0 n1 ⎥
⎥=⎣
⎦,
⎥
1⎥
0 n2 1 n2
⎥
⎥
⎥
1⎥
⎥
⎥
⎥
⎥
⎦
1
⎞
⎛
⎜ 11 ⎟
⎟
⎜
⎟
⎜
⎜ 12 ⎟
⎟
⎜
⎟
⎜
⎜ ··· ⎟
⎟
⎜
⎛
⎞
⎟
⎜
⎟
⎜
⎜ 1n1 ⎟
⎜ μ1 ⎟
⎟.
β=⎝
⎠, and = ⎜
⎟
⎜
⎜ 21 ⎟
μ2
⎟
⎜
⎟
⎜
⎟
⎜
⎜ 22 ⎟
⎟
⎜
⎟
⎜
⎜ ··· ⎟
⎟
⎜
⎠
⎝
2n2
Chapter 1
21
BIOS 2083
Linear Models
Abdus S. Wahed
3. Example 1.1.4.
Chapter 1
22
BIOS 2083
Linear Models
Abdus S. Wahed
4. Example 1.1.5.
Chapter 1
23
BIOS 2083
Linear Models
Abdus S. Wahed
5. Example 1.1.6.
Chapter 1
24
BIOS 2083
Linear Models
Abdus S. Wahed
6. Example 1.1.7.
Chapter 1
25
BIOS 2083
Linear Models
Abdus S. Wahed
7. Example 1.1.8.
Chapter 1
26
BIOS 2083
Linear Models
Abdus S. Wahed
8. Example 1.1.9.
Chapter 1
27
BIOS 2083
Linear Models
Abdus S. Wahed
9. Example 1.1.10.
Chapter 1
28
BIOS 2083
1.3
Linear Models
Abdus S. Wahed
Problems
1. A clinical trial was designed to compare three treatments based on a
continuous endpoint (Y ). Each treatment consists of doses of 6 pills to
be taken orally everyday for 6 weeks. The patient population is highly
variable regarding their medication adherence. A measure of adherence
is given by the proportion of pills taken during the course of treatment.
Suppose that the investigators would like to compare the treatments
adjusting for the effect of adherence. They also suspect that the effect
of adherence on response will vary by treatment group. Use your own
sets of notations to propose a linear model to analyze the data from this
trial. Write the model in matrix form.
2. An immunologist is investigating the effect of treatment on the expressions of Programmed Death - 1 (PD-1) molecules on disease-specific CD8
cells. For each patient, PD-1 levels are measured on 5 fixed pentamers (a
viral capsomer having five structural units) before and after the end of
the therapy. Patients are classified into early response groups (marked,
intermediate, or poor) based on characteristics observed prior to meaChapter 1
29
BIOS 2083
Linear Models
Abdus S. Wahed
suring the PD-1 levels. It is well-known that pre-treatment PD-1 levels
vary across early response groups. Accordingly, the immunologist would
like to adjust for pre-treatment PD-1 levels while assessing the effect of
treatment and early response on the change in PD-1 expressions.
Assuming that there are n patients in each early response group, use
your own set of notation to set up a linear model that will answer the
immunologist’s questions. Write the model in matrix form.
3. Consider the linear model:
Yijk = βi + βj + ijk , i, j = 1, 2, 3; i < j; k = 1, 2,
(1.3.1)
so that there are a total of 6 observations.
Write the model in matrix form.
4. Suppose the investigators want to compare the effect of a treatments by
treating N = an individuals. Treatments are allocated randomly in such
a way that there are n individuals in each treatment group. Even though
the treatments were assigned randomly, investigators are concerned that
younger patients might respond better than the older patients. Therefore, the analyst needs to adjust for the factor age while comparing the
Chapter 1
30
BIOS 2083
Linear Models
Abdus S. Wahed
treatments. The observed data for this problem is (Yij , Xij ), where Yij
and Xij respectively denote the response and age for the jth individual
assigned to the ith treatment. Suppose we want to treat age as a continuous variable and want to model the response as a linear function of
treatment effect αi and age effect β. Write the linear model in the form
Y = Xβ + .
5. Suppose 2n Hepatitis C patients are randomized equally to two treatments IFN (treatment 1) and IFN-RBV (treatment 2). Hepatitis C virus
(HCV) RNA levels are measured on each patient at day 0 (timepoint 0)
and at week 24 (timepoint 1). The objective of interest is to compare
the effect of two treatments in reducing the HCV RNA levels after 24
weeks of therapy. The following linear model have been assumed:
⎧
⎪
⎨ μ + eijk ,
k = 0,
yijk =
i = 1, 2; j = 1, 2, . . . , n,
(1.3.2)
⎪
⎩ μ − αi + eijk , k = 1,
where yijk denote the HCV RNA levels at timepoint k for the jth patient
in the ith group.
Write the above model in the form Y = Xβ + by explicitly defining Y ,
X and β.
Chapter 1
31
BIOS 2083
Linear Models
Abdus S. Wahed
6. Suppose Y11, Y12, . . . , Y1ni be n1 independent observations from a N (μ +
α1 , σ 2) distribution and Y21, Y22, . . . , Y2n2 be n2 independent observations
from a N (μ − α2 , σ 2) distribution. Notice that the two populations
have different means but the same standard deviation. Assume that
Y1j and Y2j are independent for all j. Define n1 + n2 = n, and Y =
(Y11, Y12, . . . , Y1n1 , Y21, Y22, . . . , Y2n2 )T as the n × 1 vector consisting of all
n observations. We write Y as
Y = Xβ + .
(1.3.3)
What are the Y , X and β in the above equation?
7. Homeostasis Model Assessment (HOMA) is a measure of insulin resistance, calculated as a product of fasting glucose and insulin levels. The
higher the HOMA score, the higher the insulin resistance.
Researchers at the University of Michigan, An Arbor have collected fasting glucose and insulin levels for a group of hepatitis C patients undergoing peg-interferon therapy. HOMA score was computed for all patients
at baseline and at 24 week post therapy. The goal was to identify factors
associated with changes in insulin resistance in response to peg-interferon
Chapter 1
32
BIOS 2083
Linear Models
Abdus S. Wahed
therapy. The candidate factors are:
(i) Baseline BMI (a continuous measure)
(ii) Peg-interferon dose (0 = placebo, 1 = 135mcg, and 2 = 180mcg)
(iii) HCV negativity at week 24 (1 = negative, 0 = positive), and
(iv) Interaction between (ii) and (iii).
Use your own set of notation to develop a linear model for this problem.
Make sure to clearly define each symbol that appears in your model.
8. In a recent weight loss study, subjects were randomized to two treatment groups - SBWP (standard behavioral weight-control program) and
EWLI (extended weight loss intervention). Subjects in both treatment
groups received instructions on exercise and diet in batches, (a subject
could belong to one batch only). Subjects in EWLI group additionally
received personalized text messages on their cellular phones. The study
weighed each subject at baseline (Month 0), and then at months 6, 12,
and 24. The aim of the study was to compare weight loss between the
two groups at months 6, 12, and 24 from baseline. Using weight as the
outcome variable, we would like to develop a linear model to conduct the
Chapter 1
33
BIOS 2083
Linear Models
Abdus S. Wahed
statistical analysis for this study. The proposed model should treat time
as a categorical independent variable. Since subjects received instructions in batches, the model should account for the correlation among
patients belonging to the same batch.
(a) Write the linear model using your own notation for random variables,
parameters, and error term. You must define each of the terms, and
describe assumptions you make about the random variables and the
parameters in your model.
(b) Express the null hypothesis ”The weight loss after 24 months of
treatment is similar between the two treatment groups” in terms of
the parameters of your model in part (a).
Chapter 1
34
Chapter 2
A short review of matrix algebra
2.1
Vectors and vector spaces
Definition 2.1.1. A vector a of dimension n is a collection of n elements
typically written as
⎞
⎛
⎜
⎜
⎜
a=⎜
⎜
⎜
⎝
a1
a2
..
.
⎟
⎟
⎟
⎟ = (ai )n.
⎟
⎟
⎠
an
Vectors of length 2 (two-dimensional vectors) can be thought of points in
the plane (See figures).
35
BIOS 2083
Linear Models
Abdus S. Wahed
Figure 2.1: Vectors in two and three dimensional spaces
(-1.5,2)
(1, 1)
(1, -2)
x1
(2.5, 1.5, 0.95)
x2
(0, 1.5, 0.95)
x3
Chapter 2
36
BIOS 2083
Linear Models
Abdus S. Wahed
• A vector with all elements equal to zero is known as a zero vector and
is denoted by 0.
• A vector whose elements are stacked vertically is known as column
vector whereas a vector whose elements are stacked horizontally will be
referred to as row vector. (Unless otherwise mentioned, all vectors will
be referred to as column vectors).
• A row vector representation of a column vector is known as its trans
T
pose. We will use
⎛ the⎞notation ‘ ’ or ‘ ’ to indicate a transpose. For
a
⎜ 1 ⎟
⎟
⎜
⎜ a2 ⎟
T
⎟
instance, if a = ⎜
⎜ .. ⎟ and b = (a1 a2 . . . an ), then we write b = a
⎜ . ⎟
⎠
⎝
an
or a = bT .
• Vectors of same dimension are conformable to algebraic operations such
as additions and subtractions. Sum of two or more vectors of dimension
n results in another n-dimensional vector with elements as the sum of
the corresponding elements of summand vectors. That is,
(ai)n ± (bi)n = (ai ± bi)n .
Chapter 2
37
BIOS 2083
Linear Models
Abdus S. Wahed
• Vectors can be multiplied by a scalar.
c(ai )n = (cai )n.
• Product of two vectors of same dimension can be formed when one of
them is a row vector and the other is a column
The result
is called
⎛
⎞
⎞
⎛ vector.
b
a
⎜ 1 ⎟
⎜ 1 ⎟
⎜
⎟
⎟
⎜
⎜ b2 ⎟
⎜ a2 ⎟
⎜
⎟
⎟
inner, dot or scalar product. if a = ⎜
⎜ .. ⎟ and b = ⎜ .. ⎟, then
⎜ . ⎟
⎜ . ⎟
⎝
⎠
⎠
⎝
an
bn
aT b = a1 b1 + a2 b2 + . . . + an bn .
Definition 2.1.2. The length, magnitude, or Euclidean norm of a vector is defined as the square root of the sum of squares of its elements and is
denoted by ||.||. For example,
n
a2i =
||a|| = ||(ai )n|| =
√
aT a.
i=1
• The length of the sum of two or more vectors is less than or equal to the
sum of the lengths of each vector. (Cauchy-Schwarz Inequality).
||a + b|| ≤ ||a|| + ||b||
Chapter 2
38
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.1.3. A set of vectors {a1 , a2 , . . . , am} is linearly dependent
if at least one of them can be written as a linear combination of the others.
In other words, {a1 , a2 , . . . , am } are linearly dependent if there exists at
least one non-zero cj such that
m
cj aj = 0.
(2.1.1)
j=1
In other words, for some k,
ak = −(1/ck )
cj aj .
j=k
Definition 2.1.4. A set of vectors are linearly independent if they are
not linearly dependent. That is, in order for (2.1.1) to hold, all cj ’s must be
equal to zero.
Chapter 2
39
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.1.5. Two vectors a and b are orthogonal if their scalar product is zero. That is, aT b = 0, and we write a ⊥ b.
Definition 2.1.6. A set of vectors is said to be mutually orthogonal if
members of any pair of vectors belonging to the set are orthogonal.
• If vectors are mutually orthogonal then they are linearly independent.
Chapter 2
40
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.1.7. Vector space. A set of vectors which are closed under
addition and scalar multiplication is known as a vector space.
Thus if V is a vector space, for any two vectors a and b from V, (i)
ca a + cb b ∈ V, and (ii) ca a ∈ V for any two constants ca and cb .
Definition 2.1.8. Span. All possible linear combinations of a set of linearly
independent vectors form a Span of that set.
Thus if A = {a1 , a2 , . . . , am } is a set of m linearly independent vectors,
then the span of A is given by
m
span(A) =
cj aj
a:a=
,
j=1
for some numbers cj , j = 1, 2, . . . , m. Viewed differently, the set of vectors A
generates the vector space span(A) and is referred to as a basis of span(A).
Formally,
• Let a1 , a2 , . . . , am be a set of m linearly independent n-dimensional vector in a vector space V that spans V. Then a1 , a2, . . . , am together forms
a basis of V and the dimension of a vector space is defined by the number
of vectors in its basis. That is, dim(V) = m.
Chapter 2
41
BIOS 2083
2.2
Linear Models
Abdus S. Wahed
Matrix
Definition 2.2.1. A matrix is a rectangular or square arrangement of numbers. A matrix with m rows and n columns is referred to as an m × n (read
as ‘m by n’) matrix. An m × n matrix A with (i, j)th element aij is written
as
⎡
A = (aij )m×n
a
⎢ 11
⎢
⎢ a21
=⎢
⎢
⎢ ···
⎣
am1
⎤
a12 . . . a1n
⎥
⎥
a22 . . . a2n ⎥
⎥.
⎥
··· ... ··· ⎥
⎦
am2 . . . amn
If m = n then the matrix is a square matrix.
Definition 2.2.2. A diagonal matrix is a square matrix with non-zero
elements in the diagonal cells and zeros elsewhere.
A diagonal matrix with diagonal elements a1 , a2 , . . . , an is written as
⎤
⎡
0 ... 0
a
⎥
⎢ 1
⎥
⎢
⎢ 0 a2 . . . 0 ⎥
⎥.
diag(a1 , a2, . . . , an ) = ⎢
⎥
⎢
⎢ ··· ··· ... ··· ⎥
⎦
⎣
0
0 . . . an
Definition 2.2.3. An n × n diagonal matrix with all diagonal elements equal
to 1 is known as identity matrix of order n and is denoted by In .
Chapter 2
42
BIOS 2083
Linear Models
Abdus S. Wahed
A similar notation Jmn is sometimes used for an m × n matrix with all
elements equal to 1, i.e.,
⎡
Jmn
⎤
1
1
⎢
⎢
⎢ 1
1
=⎢
⎢
⎢ ··· ···
⎣
1
1
...
1
⎥
⎥
... 1 ⎥
⎥ = [1m 1m . . . 1m ] .
⎥
... ··· ⎥
⎦
... 1
Like vectors, matrices with the same dimensions can be added together
and results in another matrix. Any matrix is conformable to multiplication
by a scalar. If A = (aij )m×n and B = (bij )m×n, then
1. A ± B = (aij ± bij )m×n, and
2. cA = (caij )m×n.
Definition 2.2.4. The transpose of a matrix A = (aij )m×n is defined by
AT = (aji)n×m.
• If A = AT , then A is symmetric.
• (A + B)T = (AT + BT ).
Chapter 2
43
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.2.5. Matrix product. If A = (aij )m×n and B = (aij )n×p,
then
AB = (cij )m×p,
aik bkj = aTi bj ,
cij =
k
where ai is the ith row (imagine as a vector) of A and bj is the jth column
(vector) of B.
• (AB)T = BT AT ,
• (AB)C = A(BC),whenever defined,
• A(B + C) = AB + AC, whenever defined,
• Jmn Jnp = nJmp .
Chapter 2
44
BIOS 2083
2.3
Linear Models
Abdus S. Wahed
Rank, Column Space and Null Space
Definition 2.3.1. The rank of a matrix A is the number of linearly independent rows or columns of A. We denote it by rank(A).
• rank(AT ) = rank(A).
• An m × n matrix A with with rank m (n) is said to have full row
(column) rank.
• If A is a square matrix with n rows and rank(A) < n, then A is singular
and the inverse does not exist.
• rank(AB) ≤ min(rank(A), rank(B)).
• rank(AT A) = rank(AAT ) = rank(A) = rank(AT ).
Chapter 2
45
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.3.2. Inverse of a square matrix. If A is a square matrix
with n rows and rank(A) = n, then A is called non-singular and there exists
a matrix A−1 such that AA−1 = A−1A = In . The matrix A−1 is known as
the inverse of A.
• A−1 is unique.
• If A and B are invertible and has the same dimension, then
(AB)−1 = B−1A−1.
• (cA)−1 = A−1/c.
• (AT )−1 = (A−1)T .
Chapter 2
46
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.3.3. Column space. The column space of a matrix A is the
vector space generated by the columns of A. If A = (aij )m×n = (a1 a2 . . . an ,
then the column space of A, denoted by C(A) or R(A) is given by
n
C(A) =
cj aj
a:a=
,
j=1
for scalars cj , j = 1, 2, . . . , n.
Alternatively, a ∈ C(A) iff there exists a vector c such that
a = Ac.
• What is the dimension of the vectors in C(A)?
• How many vectors will a basis of C(A) have?
• dim(C(A)) =?
• If A = BC, then C(A) ⊆ C(B).
• If C(A) ⊆ C(B), then there exist a matrix C such that A = BC.
Example 2.3.1. Find a basis for the column space of the matrix
⎡
⎤
−1 2 −1
⎢
⎥
⎢
⎥
A = ⎢ 1 1 4 ⎥.
⎣
⎦
0 2 2
Chapter 2
47
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.3.4. Null Space. The null space of an m × n matrix A is defined as the vector space consisting of the solution of the system of equations
Ax = 0. Null space of A is denoted by N (A) and can be written as
N (A) = {x : Ax = 0} .
• What is the dimension of the vectors in N (A)?
• How many vectors are there in a basis of N (A)?
• dim(N (A)) = n − rank(A) → Nullity of A.
Chapter 2
48
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.3.5. Orthogonal complements. Two sub spaces V1 and V2
of a vector space V forms orthogonal complements relative to V if every vector
in V1 is orthogonal to every vector in V2 . We write V1 = V2⊥ or equivalently,
V2 = V1⊥ .
• V1 ∩ V2 = {0}.
• If dim(V1 ) = r, then dim(V2) = n − r, where n is the dimension of the
vectors in the vector space V.
• Every vector a in V can be uniquely decomposed into two components
a1 and a2 such that
a = a1 + a2 ,
a1 ∈ V1 , a2 ∈ V2 .
(2.3.1)
• If (2.3.1) holds, then
a
2
= a1
2
+ a2 2.
(2.3.2)
How?
Chapter 2
49
BIOS 2083
Linear Models
Abdus S. Wahed
Proof of (2.3.1).
• Existence. Suppose it is not possible. Then a is independent of the
basis vectors of V1 and V2 . But that would make the total number of
independent vectors in V n + 1. Is that possible?
• Uniqueness. Let two such decompositions are possible, namely,
a = a1 + a2 ,
a1 ∈ V1 , a2 ∈ V2 ,
a = b1 + b2 ,
b1 ∈ V1 , b2 ∈ V2 .
and
Then,
a1 − b1 = b2 − a2 .
This implies
a1 = b1 & b2 = a2 .(Why?)
.
Chapter 2
50
BIOS 2083
Linear Models
Abdus S. Wahed
Proof of (2.3.2).
• From (2.3.1),
a
2
= aT a
= (a1 + a2 )T (a1 + a2 )
= aT1 a1 + aT1 a2 + aT2 a1 + aT2 a2
=
a1
2
+ a2 2.
(2.3.3)
This result is known as Pythagorean theorem.
Chapter 2
51
BIOS 2083
Linear Models
Abdus S. Wahed
Figure 2.2: Orthogonal decomposition (direct sum)
V1 = {(x, y): x = y İ R2}
(3/2, 3/2)
(2, 1) =
V = {(x, y): x, y İ R2}
+
( 1/2, -1/2 )
V2 = {(x, y): x, y İ R2,x+ y = 0}
Chapter 2
52
BIOS 2083
Linear Models
Abdus S. Wahed
Theorem 2.3.2. If A is an m × n matrix, and C(A) and N (AT ) respectively
denote the column and null space of A and AT , then
C(A) = N (AT )⊥.
Proof.
• dim(C(A)) = rank(A) = rank(AT ) = r (say), dim(N (AT )) =
m − r.
• Suppose a1 ∈ C(A) and a2 ∈ N (AT ). Then, there exist a c such that
Ac = a1 ,
and
AT a2 = 0.
Now,
aT1 a2 = cT AT a2
= 0.
Chapter 2
(2.3.4)
53
BIOS 2083
Linear Models
Abdus S. Wahed
• (More on Orthogonality.) If V1 ⊆ V2 , and V1⊥ and V2⊥ respectively denote
their orthogonal complements, then
V2⊥ ⊆ V1⊥ .
Chapter 2
54
BIOS 2083
Linear Models
Abdus S. Wahed
Proof. Proof of the result on previous page. Suppose a1 ∈ V1. Then we
can write
a1 = A1c1,
for some vector c1 and the columns of matrix A1 consisting of the basis
vectors of V1 . And similarly,
a2 = A2c2 , ∀ a2 ∈ V2 .
In other words,
V1 = C(A1)
and
V2 = C(A2).
Since V1 ⊆ V2 , there exists a matrix B such that A1 = A2B. (See PAGE 39)
Now let, a ∈ V2⊥ =⇒ a ∈ N (AT2 ) implying
AT2 a = 0.
But
AT1 a = BT AT2 a = 0,
providing that a ∈ N (AT1 ) = V2⊥ .
Chapter 2
55
BIOS 2083
2.4
Linear Models
Abdus S. Wahed
Trace
The trace of a matrix will become handy when we will talk about the distribution of quadratic forms.
Definition 2.4.1. Trace of a square matrix is the sum of its diagonal
elements. Thus, if A = (aij )n×n, then
n
aii
trace(A) =
i=1
.
• trace(In ) =
• trace(A) = trace(AT )
• trace(A + B) = trace(A) + trace(B)
• trace(AB) = trace(BA)
• trace(AT A) = trace(A2 ) =
Chapter 2
n n
i=1
2
j=1 aij .
56
BIOS 2083
2.5
Linear Models
Abdus S. Wahed
Determinants
Definition 2.5.1. Determinant. The determinant of a scalar is the scalar
itself. The determinants of an n × n matrix A = (aij )m×n is given by a scalar,
written as |A|, where,
n
aij (−1)i+j |Mij |,
|A| =
j=1
for any fixed i, where, the determinant |Mij | of the matrix Mij is known as
the minor of aij and the matrix Mij is obtained by deleting the ith row and
jth column of matrix A.
• |A| = |AT |
• |diag(di , i = 1, 2, . . . , n)| =
n
i=1 di .
This also holds if the matrix is an upper or lower triangular matrix with
diagonal elements di, i = 1, 2, . . . , n.
Chapter 2
57
BIOS 2083
Linear Models
Abdus S. Wahed
• |AB| = |A||B|
• |cA| = cn |A|
• If A is singular (rank(A) < n), then |A| = 0.
• |A−1| = 1/|A|.
• The determinants of block-diagonal (block-triangular) matrices works
the way as you would expect. For instance,
A C
= |A||B|.
0 B
In general
Chapter 2
A B
C D
= |A||D − CA−1B|.
58
BIOS 2083
2.6
Linear Models
Abdus S. Wahed
Eigenvalues and Eigenvectors
Definition 2.6.1. Eigenvalues and eigen vectors. The eigenvalues (λ)
of a square matrix An×n and the corresponding eigenvectors (a) are defined
by the set of equations
Aa = λa.
(2.6.1)
Equation (2.6.1) leads to the polynomial equation
|A − λIn | = 0.
(2.6.2)
For a given eigenvalue, the corresponding eigenvector is obtained as the solution to the equation (2.6.1). The solutions to equation (2.6.1) constitutes
the eigenspace of the matrix A.
Example 2.6.1. Find the eigenvalues
⎡
−1
⎢
⎢
A=⎢ 1
⎣
0
Chapter 2
and eigenvectors for the matrix
⎤
2 0
⎥
⎥
2 1 ⎥.
⎦
2 −1
59
BIOS 2083
Linear Models
Abdus S. Wahed
Since in this course our focus will be on the eigenvalues of symmetric
matrices, hereto forth we state the results on eigenvalues and eigenvectors
applied to a symmetric matrix A. Some of the results will, however, hold for
general A. If you are interested, please consult a linear algebra book such as
Harville’s Matrix algebra from statistics perspective.
Definition 2.6.2. Spectrum. The spectrum of a matrix A is defined as the
set of distinct (real) eigenvalues {λ1, λ2 , . . . , λk } of A.
• The eigenspace L of a matrix A corresponding to an igenvalue λ can be
written as
L = N (A − λIn ).
• trace(A) =
• |A| =
n
i=1 λi .
n
i=1 λi .
• |In ± A| =
n
i=1 (1
± λi ).
• Eigenvectors associated with different eigenvalues are mutually orthogonal or can be chosen to be mutually orthogonal and hence linearly
independent.
• rank(A) is the number of non-zero λi ’s.
Chapter 2
60
BIOS 2083
Linear Models
Abdus S. Wahed
The proof of some of these results can be easily obtained through the
application of a special theorem called spectral decomposition theorem.
Definition 2.6.3. Orthogonal Matrix. A matrix An×n is said to be orthogonal if
AT A = In = AAT .
This immediately implies that A−1 = AT .
Theorem 2.6.2. Spectral decomposition. Any symmetric matrix Acan
be decomposed as
A = BΛBT ,
where Λ = diag(λ1 , . . . , λn ), is the diagonal matrix of eigenvalues and B is
an orthogonal matrix having its columns as the eigenvectors of A, namely,
A = [a1 a2 . . . an ], where aj ’s are orthonormal eigenvectors corresponding to
the eigenvalues λj , j = 1, 2, . . . , n.
Proof.
Chapter 2
61
BIOS 2083
Linear Models
Abdus S. Wahed
Outline of the proof of spectral decomposition theorem:
• By definition, B satisfies
AB = BΛ,
(2.6.3)
and
B T B = In .
Then from (2.6.3),
A = BΛB−1 = BΛBT .
Spectral decomposition of a symmetric matrix allows one to form ’square
root’ of that matrix. If we define
√
√
A = B ΛBT ,
it is easy to verify that
√ √
A A = A.
In general, one can define
Aα = BΛα BT , α ∈ R.
Chapter 2
62
BIOS 2083
Linear Models
Abdus S. Wahed
Example 2.6.3. Find a matrix B and the matrix Λ (the diagonal matrix of
eigenvalues) such that
⎡
A=⎣
Chapter 2
6
−2
−2
9
⎤
⎦ = BT ΛB.
63
BIOS 2083
2.7
Linear Models
Abdus S. Wahed
Solutions to linear systems of equations
A linear system of m equations in n unknowns is written as
Ax = b,
(2.7.1)
where Am×n is a matrix and b is a vector of known constants and x is an
unknown vector. The goal usually is to find a value (solution) of x such that
(2.7.1) is satisfied. When b = 0, the system is said to be homogeneous. It
is easy to see that homogeneous systems are always consistent, that is, has
at least one solution.
• The solution set of a homogeneous system of equation Ax = 0 forms a
vector space and is given by N (A).
• A non-homogeneous system of equations Ax = b is consistent iff
rank(A, b) = rank(A).
– The system of linear equations Ax = b is consistent iff b ∈ C(A).
– If A is square and rank(A) = n, then Ax = b has a unique solution
given by x = A−1b.
Chapter 2
64
BIOS 2083
2.7.1
Linear Models
Abdus S. Wahed
G-inverse
One way to obtain the solutions to a system of equations (2.7.1) is just to
transform the augmented matrix (A, b) into a row-reduced-echelon form.
However, such forms are not algebraically suitable for further algebraical
treatment. Equivalent to the inverse of a non-singular matrix, one can define
an inverse, referred to as generalized inverse or in short g-inverse of any matrix, square or rectangular, singular or non-singular. This generalized inverse
helps finding the solutions of linear equations easier. Theoretical developments based on g-inverse are very powerful for solving problems arising in
linear models.
Definition 2.7.1. G-inverse. The g-inverse of a matrix Am×n is a matrix
Gn×m that satisfies the relationship
AGA = A.
Chapter 2
65
BIOS 2083
Linear Models
Abdus S. Wahed
The following two lemmas are useful for finding the g-inverse of a matrix
A.
Lemma 2.7.1. Suppose rank(Am×n) = r, and Am×n can be factorized as
⎤
⎡
A11 A12
⎦
Am×n = ⎣
A21 A22
such that A11 is of dimension r × r with rank(A11) = r. Then, a g-inverse
of A is given by
⎡
Gn×m = ⎣
Example 2.7.2. Find the g-inverse
⎡
1
⎢
⎢
A=⎢0
⎣
1
Chapter 2
⎤
A−1
11
0
0
0
⎦.
of the matrix
⎤
1 1 1
⎥
⎥
1 0 −1 ⎥ .
⎦
0 1 2
66
BIOS 2083
Linear Models
Abdus S. Wahed
Suppose you do not have an r × r minor to begin with. What do you do
then?
Lemma 2.7.3. Suppose rank(Am×n) = r, and there exists non-singular matrices B and C such that
⎡
BAC = ⎣
⎤
D 0
⎦.
0 0
where D is a diagonal matrix with rank(D) = r. Then, a g-inverse of A is
given by
⎡
Gn×m = C−1 ⎣
Chapter 2
−1
⎤
D
0
0
0
⎦ B−1.
67
BIOS 2083
Linear Models
Abdus S. Wahed
• rank(G) ≥ rank(A).
• G-inverse of a matrix is not necessarily unique. For instance,
– If G is a g-inverse of a symmetric matrix A, then GAG is also a
g-inverse of A.
– If G is a g-inverse of a symmetric matrix A, then G1 = (G+GT )/2
is also a g-inverse of A.
– The g-inverse of a diagonal matrix D = diag(d1 , . . . , dn) is another
diagonal matrix Dg = diag(dg1 , . . . , dgn), where
⎧
⎨ 1/di, di = 0,
g
di =
⎩ 0, d = 0.
i
Again, as you can see, we concentrate on symmetric matrices as this matrix
properties will be applied to mostly symmetric matrices in this course.
Chapter 2
68
BIOS 2083
Linear Models
Abdus S. Wahed
Another way of finding a g-inverse of a symmetric matrix.
Lemma 2.7.4. Let A be an n-dimensional symmetric matrix. Then a ginverse of A, G is given by
G = QT ΛQ,
where Q and Λ bears the same meaning as in spectral decomposition theorem.
2.7.2
Back to the system of equations
Theorem 2.7.5. If Ax = b is a consistent system of linear equations and
G be a g-inverse of A, then Gb is a solution to Ax = b.
Proof.
Chapter 2
69
BIOS 2083
Linear Models
Abdus S. Wahed
Theorem 2.7.6. x∗ is a solution to the consistent system of linear equation
Ax = b iff there exists a vector c such that
x∗ = Gb + (I − GA)c,
for some g-inverse G of A.
Proof.
Chapter 2
70
BIOS 2083
Linear Models
Abdus S. Wahed
Proof. Proof of Theorem 2.7.6.
If part.
For any compatible vector c and for any g-inverse G of A, define
x∗ = Gb + (I − GA)c.
Then,
Ax∗ = A[Gb + (I − GA)c] = AGb + (A − AGA)c = b + 0 = b.
Only If part.
Suppose x∗ is a solution to the consistent system of linear equation Ax = b.
Then
x∗ = Gb + (x∗ − Gb) = Gb + (x∗ − GAx∗) = Gb + (I − GA)c,
where c = x∗ .
Remark 2.7.1.
1. Any solution to the system of equations Ax = b can be
written as a sum of two components: one being a solution by itself and
the other being in the null space of A.
2. If one computes one g-inverse of A, then he/she has identified all possible
solutions of Ax = b.
Chapter 2
71
BIOS 2083
Linear Models
Abdus S. Wahed
Example 2.7.7. Give a general form of the solutions to the system of equations
1 2
⎢
⎢
⎢1 1
⎢
⎢
⎢0 1
⎣
1 −1
Chapter 2
⎤⎡
⎡
⎤
⎡
⎤
5
x
⎥
⎥⎢ 1 ⎥ ⎢
⎥
⎥ ⎢
⎥⎢
1 1 ⎥ ⎢ x2 ⎥ ⎢ 3 ⎥
⎥.
⎥=⎢
⎥⎢
⎥
⎥ ⎢
⎥⎢
0 −1 ⎥ ⎢ x3 ⎥ ⎢ 2 ⎥
⎦
⎦ ⎣
⎦⎣
−1
x4
1 3
1
0
72
BIOS 2083
Linear Models
Abdus S. Wahed
Idempotent matrix and projections
Definition 2.7.2. Idempotent matrix. A square matrix B is idempotent
if B2 = BB = B.
• If B is idempotent, then rank(B) = trace(B).
• If Bn×n is idempotent, then In −B is also idempotent with rank(In −B) =
n − trace(B).
• If Bn×n is idempotent with rank(B) = n, then B = In .
Lemma 2.7.8. If the m × n matrix A has rank r, then the matrix In − GA
is idempotent with rank n − r, where G is a g-inverse of A.
Chapter 2
73
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 2.7.3. Projection. A square matrix Pn×n is a projection onto a
vector space V ⊆ Rn iff all three of the following holds: (a) P is idempotent,
(b) ∀x ∈ Rn , Px ∈ V, and (c)∀x ∈ V, Px = x. An idempotent matrix is a
projection onto its own column space.
Example 2.7.9. Let the vector space be defined as
V = {(v1, v2), v2 = kv1} ⊆ R2 ,
⎧
⎫
⎨ t (1 − t)/k ⎬
for some non-zero real constant k. Consider the matrix P =
⎩ kt (1 − t) ⎭
for any real number t ∈ R. Notice that
(a) PP = P,
(b) For any x = (x1, x2)T ∈ R2 , Px = (tx1 +(1−t)x2/k, ktx1 +(1−t)x2)T ∈
V.
(c) For any x = (x1, x2)T = (x1, kx1) ∈ V, Px = x.
Thus, P is a projection onto the vector space V. Notice that the projection
P is not unique as it depends on the coice of t. Consider k = 1. Then V is the
linear space representing the line with unit slope passing through
the⎫
origin.
⎧
⎨ 2 −1 ⎬
When multiplied by the projection matrix (for t = 2) P1 =
, any
⎩ 2 −1 ⎭
point in the two-dimensional real space produces a point in V. For instance,
the point (1, .5) when multiplied by P1 produces (1.5, 1.5) which belongs to
Chapter 2
74
BIOS 2083
Linear Models
Abdus S. Wahed
Figure 2.3: Projections.
V = {(x,y), x = y}
(1.5,1.5)
(.75, .75)
P1
P2
(1,1/2)
⎧
⎫
⎨ .5 .5 ⎬
V. But the projection P2 =
projects the point (1, .5) onto V at
⎩ .5 .5 ⎭
(0.75, 0.75). See figure.
Chapter 2
75
BIOS 2083
Linear Models
Abdus S. Wahed
Back to g-inverse and solution of system of equations
Lemma 2.7.10. If G is a g-inverse of A, then I − GA is a projection onto
N (A).
Proof. Left as an exercise.
Lemma 2.7.11. If G is a g-inverse of A, then AG is a projection onto
C(A).
Proof. Left as an exercise (Done in class).
Lemma 2.7.12. If P and Q are symmetric and both project onto the same
space V ⊆ Rn , then P = Q.
Proof.
Chapter 2
76
BIOS 2083
Linear Models
Abdus S. Wahed
By definition, for any x ∈ Rn , Px ∈ V & Qx ∈ V. Let
Px = x1 ∈ V & Qx = x2 ∈ V.
Then,
(P − Q)x = (x1 − x2 ), ∀x ∈ Rn .
(2.7.2)
Multiplying both sides by PT = P,
PT (P − Q)x = P T (x1 − x2 ) = (x1 − x2 ), ∀x ∈ Rn .
We get,
P(P − Q)x = P(x1 − x2 ) = (x1 − x2 ), ∀x ∈ Rn .
(2.7.3)
Subtracting (2.7.2) from (2.7.3) we obtain,
=⇒ [P(P − Q) − (P − Q)]x = 0, ∀x ∈ Rn ,
=⇒ Q = PQ.
Multiplying both sides of (2.7.2) by QT = Q and following similar procedure
we can show that P = PQ = Q.
Chapter 2
77
BIOS 2083
Linear Models
Abdus S. Wahed
Lemma 2.7.13. Suppose V1 , V2(V1 ⊆ V2) are vector spaces in Rn and P1,
⊥
P2 , and P⊥
1 are symmetric projections onto V1 , V2 , and V1 respectively.
Then,
1. P1P2 = P2 P1 = P1 . (The smaller projection survives.)
⊥
2. P⊥
1 P1 = P1 P1 = 0.
3. P2 − P1 is a projection matrix. (What does it project onto?)
Proof. See Ravishanker and Dey, Page 62, Result 2.6.7.
Chapter 2
78
BIOS 2083
2.8
Linear Models
Abdus S. Wahed
Definiteness
Definition 2.8.1. Quadratic form. If x is a vector in Rn and A is a matrix
in Rn×n, then the scalar xT Ax is known as a quadratic form in x.
The matrix A does not need to be symmetric but any quadratic form
xT Ax can be expressed in terms of symmetric matrices, for,
xT Ax = (xT Ax + xT AT x)/2 = xT [(A + AT )/2]x.
Thus, without loss of generality, the matrix associated with a quadratic form
will be assumed symmetric.
Definition 2.8.2. Non-negative definite/Positive semi-definite.
A
quadratic form xT Ax and the corresponding matrix A is non-negative definite if xT Ax ≥ 0 for all x ∈ Rn .
Definition 2.8.3. Positive definite. A quadratic form xT Ax and the corresponding matrix A is positive definite if xT Ax > 0 for all x ∈ Rn , x = 0,
and xT Ax = 0 only when x = 0.
Chapter 2
79
BIOS 2083
Linear Models
Abdus S. Wahed
Properties related to definiteness
1. Positive definite matrices are non-singular. The inverse of a positive
definite matrix is also positive definite.
2. A symmetric matrix is positive (non-negative) definite iff all of its eigenvalues are positive (non-negative).
3. All diagonal elements and hence the trace of a positive definite matrix
are positive.
4. If A is symmetric positive definite then there exists a nonsingular matrix
Q such that A = QQT .
5. A projection matrix is always positive semi-definite.
6. If A and B are non-negative definite, then so is A + B. If one of A or
B is positive definite, then so is A + B.
Chapter 2
80
BIOS 2083
2.9
Linear Models
Abdus S. Wahed
Derivatives with respect to (and of ) vectors
Definition 2.9.1. Derivative with respect to a vector. Let f (a) be any
scalar function of the vector an×1 . Then the derivative of f with respect to
a is defined as the vector
⎡
δf
δa1
⎤
⎢
⎥
⎢ δf ⎥
⎢ δa2 ⎥
δf
⎥,
=⎢
⎢
.
δa ⎢ .. ⎥
⎥
⎣
⎦
δf
δan
and the derivative with respect to the aT is defined as
T
δf
δf
=
.
δaT
δa
The second derivative of f with respect to a is written as the derivative of
each of the elements in
δf
δa
with respect to aT and stacked as rows of n × n
matrix,. i.e.,
⎡
2
δ
δ f
=
δaδaT
δaT
Chapter 2
δf
δa
⎢
⎢
⎢
=⎢
⎢
⎢
⎣
δ2f
δa21
δ2f
δa2 δa1
..
.
δ2f
δan δa1
δ2 f
δa1 δa2
δ2 f
δa22
..
.
δ2 f
δan δa2
...
...
..
.
...
δ2f
δa1 δan
δ2f
δa2 δan
..
.
δ2f
δ 2 an
⎤
⎥
⎥
⎥
⎥.
⎥
⎥
⎦
81
BIOS 2083
Linear Models
Abdus S. Wahed
Example 2.9.1. Derivative of linear and quadratic functions of a
vector.
1.
δaT b
δb
2.
δbT Ab
δb
= a.
= Ab + AT b.
Derivatives with respect to matrices can be defined in a similar fashion.
We will only remind ourselves about one result on matrix derivatives which
will become handy when we talk about likelihood inference.
Lemma 2.9.2. If An×n is a symmetric non-singular matrix, then,
δ ln |A|
= A−1 .
δA
2.10
Problems
1. Are the following sets of vectors linearly independent? If not, in each
case find at least one vectors that are dependent on the others in the
set.
(a) v1T = (0, −1, 0), v2T = (0, 0, 1), v3T = (−1, 0, 0)
(b) v1T = (2, −2, 6), v2T = (1, 1, 1)
(c) v1T = (2, 2, 0, −2), v2T = (2, 0, 1, −1),v3T = (0, −2, 1, 1)
Chapter 2
82
78
BIOS 2083
Linear Models
Abdus S. Wahed
2. Show that a set of non-zero mutually orthogonal vectors v1 , v2 , . . . , vn
are linearly independent.
3. Find the determinant and inverse of the matrices




1 ρ



(a)
,

ρ 1


n×n


1 ρ ρ
 
 
ρ 1 ρ ,
 


ρ ρ 1

1 ρ ... ρ
ρ
..
.

2


(c) 
Chapter 2



,


ρ 1
1 ρ


 

1 ρ 0

 
 
ρ 1 ρ ,
 


0 ρ 1



ρ
..
.
ρ ρ ... 1


1 ρ ρ




1 ρ


, ρ 1 ρ ,
(b) 

 

ρ 1

2
ρ ρ 1

1 ...
..
. ...







1
ρ
..
.
ρ
1
..
.
ρ
2
ρ
...
...
ρ
... ρ
..
.
ρn ρn−1 ρn−2 . . .
n
n−1








1
n×n
1 ρ 0 ... 0 0 

ρ 1 ρ ... 0 0 


0 ρ 1 ... 0 0 

.. .. ..
.. .. 
. . . ... . . 


0 0 0 ... 1 ρ 


0 0 0 ... ρ 1
83
79
BIOS 2083
Linear Models
4. Find the the rank and a basis for

1 2

1 3


1 1


0 1


1 2
Chapter 2
Abdus S. Wahed
the null space of the matrix

2 −1 

1 −2 


3 0 


−1 −1 


2 −1
84
80
BIOS 2083
Chapter 2
Linear Models
Abdus S. Wahed
84
Chapter 3
Random Vectors and Multivariate
Normal Distributions
3.1
Random vectors
Definition 3.1.1. Random vector. Random vectors are vectors of random
variables. For instance,
⎞
⎛
⎜
⎜
⎜
X=⎜
⎜
⎜
⎝
X1
X2
..
.
⎟
⎟
⎟
⎟,
⎟
⎟
⎠
Xn
where each element represent a random variable, is a random vector.
Definition 3.1.2. Mean and covariance matrix of a random vector.
The mean (expectation) and covariance matrix of a random vector X is de85
BIOS 2083
Linear Models
fined as follows:
Abdus S. Wahed
⎛
⎞
E [X1]
⎜
⎜
⎜ E [X2]
E [X] = ⎜
⎜
..
⎜
.
⎝
E [Xn]
⎟
⎟
⎟
⎟,
⎟
⎟
⎠
and
cov(X) = E {X − E (X)} {X − E (X)}T
⎤
⎡
2
σ σ12 . . . σ1n
⎥
⎢ 1
⎥
⎢
2
⎢ σ21 σ2 . . . σ2n ⎥
⎥,
= ⎢
⎥
⎢ ..
.
.
.
.
.
.
⎢ .
.
.
. ⎥
⎦
⎣
σn1 σn2 . . . σn2
(3.1.1)
where σj2 = var(Xj ) and σjk = cov(Xj , Xk ) for j, k = 1, 2, . . . , n.
Properties of Mean and Covariance.
1. If X and Y are random vectors and A, B, C and D are constant matrices,
then
E [AXB + CY + D] = AE [X] B + CE[Y] + D.
(3.1.2)
Proof. Left as an exercise.
Chapter 3
86
BIOS 2083
Linear Models
Abdus S. Wahed
2. For any random vector X, the covariance matrix cov(X) is symmetric.
Proof. Left as an exercise.
3. If Xj , j = 1, 2, . . . , n are independent random variables, then cov(X) =
diag(σj2, j = 1, 2, . . . , n).
Proof. Left as an exercise.
4. cov(X + a) = cov(X) for a constant vector a.
Proof. Left as an exercise.
Properties of Mean and Covariance (cont.)
5. cov(AX) = Acov(X)AT for a constant matrix A.
Proof. Left as an exercise.
6. cov(X) is positive semi-definite.
Proof. Left as an exercise.
7. cov(X) = E[XXT ] − E[X] {E[X]}T .
Proof. Left as an exercise.
Chapter 3
87
BIOS 2083
Linear Models
Abdus S. Wahed
Definition 3.1.3. Correlation Matrix.
A correlation matrix of a vector of random variable X is defined as the
matrix of pairwise correlations between the elements of X. Explicitly,
⎤
⎡
⎢
⎢
⎢
corr(X) = ⎢
⎢
⎢
⎣
1
ρ21
..
.
ρ12 . . . ρ1n
1
..
.
ρn1 ρn2
⎥
⎥
. . . ρ2n ⎥
⎥,
..
.. ⎥
.
. ⎥
⎦
... 1
(3.1.3)
where ρjk = corr(Xj , Xk ) = σjk /(σj σk ), j, k = 1, 2, . . . , n.
Example 3.1.1. If only successive random variables in the random vector X
are correlated and have the same correlation ρ, then the correlation matrix
corr(X) is given by
⎡
⎢
⎢
⎢
⎢
⎢
corr(X) = ⎢
⎢
⎢
⎢
⎢
⎣
⎤
1 ρ 0 ... 0 ⎥
⎥
ρ 1 ρ ... 0 ⎥
⎥
⎥
0 ρ 1 ... 0 ⎥
⎥,
.. .. .. .. .. ⎥
. . . . . ⎥
⎥
⎦
0 0 0 ... 1
(3.1.4)
Example 3.1.2. If every pair of random variables in the random vector X
Chapter 3
88
BIOS 2083
Linear Models
Abdus S. Wahed
have the same correlation ρ, then the correlation matrix corr(X) is given by
⎤
⎡
⎢ 1 ρ ρ ... ρ ⎥
⎥
⎢
⎢ ρ 1 ρ ... ρ ⎥
⎥
⎢
⎥
⎢
⎥,
(3.1.5)
corr(X) = ⎢
ρ
ρ
1
.
.
.
ρ
⎥
⎢
⎢ . . . . . ⎥
⎢ .. .. .. .. .. ⎥
⎥
⎢
⎦
⎣
ρ ρ ρ ... 1
and the random variables are said to be exchangeable.
3.2
Multivariate Normal Distribution
Definition 3.2.1. Multivariate Normal Distribution. A random vector
X = (X1, X2, . . . , Xn)T is said to follow a multivariate normal distribution
with mean μ and covariance matrix Σ if X can be expressed as
X = AZ + μ,
where Σ = AAT and Z = (Z1, Z2, . . . , Zn ) with Zi , i = 1, 2, . . . , n iid N (0, 1)
variables.
Definition 3.2.2. Multivariate Normal Distribution. A random vector
X = (X1, X2, . . . , Xn)T is said to follow a multivariate normal distribution
with mean μ and a positive definite covariance matrix Σ if X has the density
1
1
exp − (x − μ)T Σ−1 (x − μ)
(3.2.1)
fX (x) =
n/2
1/2
2
(2π) |Σ|
Chapter 3
89
BIOS 2083
Linear Models
⎡ Abdus S.⎤Wahed
0.25 0.3
⎦
Bivariate normal distribution with mean (0, 0)T and covariance matrix ⎣
0.3 1.0
Probability Density
0.4
0.3
0.2
0.1
0
2
0
−2
x2
−3
−2
0
−1
1
2
3
x1
.
Properties
1. Moment generating function of a N (μ, Σ) random variable X is given
by
Chapter 3
1 T
T
MX (t) = exp μ t + t Σt .
2
(3.2.2)
90
BIOS 2083
Linear Models
Abdus S. Wahed
2. E(X) = μ and cov(X) = Σ.
3. If X1, X2, . . . , Xn are i.i.d N (0, 1) random variables, then their joint
distribution can be characterized by X = (X1, X2, . . . , Xn)T ∼ N (0, In).
4. X ∼ Nn(μ, Σ) if and only if all non-zero linear combinations of the
components of X are normally distributed.
Linear transformation
5. If X ∼ Nn(μ, Σ) and Am×n is a constant matrix of rank m, then Y =
Ax ∼ Np(Aμ, AΣAT ).
Proof. Use definition 3.2.1 or property 1 above.
Orthogonal linear transformation
6. If X ∼ Nn (μ, In) and An×n is an orthogonal matrix and Σ = In , then
Y = Ax ∼ Nn (Aμ, In).
Marginal and Conditional distributions
Suppose X is Nn (μ, Σ) and X is partitioned as follows,
⎞
⎛
X1
⎠,
X=⎝
X2
Chapter 3
91
BIOS 2083
Linear Models
Abdus S. Wahed
where X1 is of dimension p×1 and X2 is of dimension n−p×1. Suppose
the corresponding partitions for μ and Σ are given by
⎛
⎛
⎞
⎞
Σ Σ12
μ1
⎠ , and Σ = ⎝ 11
⎠
μ=⎝
μ2
Σ21 Σ22
respectively. Then,
7. Marginal distribution. X1 is multivariate normal - Np (μ1 , Σ11).
Proof. Use the result from property 5 above.
8. Conditional distribution. The distribution of X1 |X2 is p-variate normal - Np(μ1|2, Σ1|2), where,
μ1|2 = μ1 + Σ12Σ−1
22 (X2 − μ2 ),
and
Σ1|2 = Σ11 − Σ12 Σ−1
22 Σ21 ,
provided Σ is positive definite.
Proof. See Result 5.2.10, page 156 (Ravishanker and Dey).
Uncorrelated implies independence for multivariate normal random variables
9. If X, μ, and Σ are partitioned as above, then X1 and X2 are independent
if and only if Σ12 = 0 = ΣT21.
Chapter 3
92
BIOS 2083
Linear Models
Abdus S. Wahed
Proof. We will use m.g.f to prove this result. Two random vectors X1
and X2 are independent iff
M(X1 ,X2 ) (t1 , t2) = MX1 (t1 )MX2 (t2).
3.3
Non-central distributions
We will start with the standard chi-square distribution.
Definition 3.3.1. Chi-square distribution. If X1 , X2, . . . , Xn be n inde
pendent N (0, 1) variables, then the distribution of ni=1 Xi2 is χ2n (ch-square
with degrees of freedom n).
χ2n -distribution is a special case of gamma distribution when the scale
parameter is set to 1/2 and the shape parameter is set to be n/2. That is,
the density of χ2n is given by
fχ2n (x) =
(1/2)n/2 −x/2 n/2−1
e
x
, x ≥ 0; n = 1, 2, . . . , .
Γ(n/2)
Example 3.3.1. The distribution of (n − 1)S 2/σ 2, where S 2 =
(3.3.1)
n
i=1 (Xi
−
X̄)2/(n−1) is the sample variance of a random sample of size n from a normal
distribution with mean μ and variance σ 2 , follows a χ2n−1 .
Chapter 3
93
BIOS 2083
Linear Models
Abdus S. Wahed
The moment generating function of a chi-square distribution with n d.f.
is given by
Mχ2n (t) = (1 − 2t)−n/2, t < 1/2.
(3.3.2)
The m.g.f (3.3.2) shows that the sum of two independent ch-square random
variables is also a ch-square. Therefore, differences of sequantial sums of
squares of independent normal random variables will be distributed independently as chi-squares.
Theorem 3.3.2. If X ∼ Nn (μ, Σ) and Σ is positive definite, then
(X − μ)T Σ−1(X − μ) ∼ χ2n .
(3.3.3)
Proof. Since Σ is positive definite, there exists a non-singular An×n such that
Σ = AAT (Cholesky decomposition). Then, by definition of multivariate
normal distribution,
X = AZ + μ,
where Z is a random sample from a N (0, 1) distribution. Now,
Definition 3.3.2. Non-central chi-square distribution. Suppose X’s
are as in Definition (3.3.1) except that each Xi has mean μi , i = 1, 2, . . . , n.
Equivalently, suppose, X = (X1 , . . . , Xn)T be a random vector distributed
as Nn (μ, In ), where μ = (μ1, . . . , μn )T . Then the distribution of ni=1 Xi2 =
XT X is referred to as non-central chi-square with d.f. n and non-centrality
Chapter 3
94
BIOS 2083
Linear Models
Abdus S. Wahed
0.16
← λ=0
0.14
0.12
← λ=2
0.1
0.08
← λ=4
0.06
← λ=6
0.04
← λ=8
0.02
0
← λ=10
0
5
10
15
20
Figure 3.1: Non-central chi-square densities with df 5 and non-centrality parameter λ.
parameter λ =
n
2
i=1 μi /2
= 12 μT μ. The density of such a non-central chi-
square variable χ2n (λ) can be written as a infinite poisson mixture of central
chi-square densities as follows:
∞
e−λ λj (1/2)(n+2j)/2 −x/2 (n+2j)/2−1
e
x
.
fχ2n (λ) (x) =
j!
Γ((n
+
2j)/2)
j=1
(3.3.4)
Properties
1. The moment generating function of a non-central chi-square variable
χ2n (λ) is given by
Mχ2n (n,λ) (t) = (1 − 2t)
Chapter 3
−n/2
2λt
, t < 1/2.
exp
1 − 2t
(3.3.5)
95
BIOS 2083
Linear Models
Abdus S. Wahed
2. E χ2n (λ) = n + 2λ.
3. V ar χ2n (λ) = 2(n + 4λ).
4. χ2n (0) ≡ χ2n .
5. For a given constant c,
(a) P (χ2n (λ) > c) is an increasing function of λ.
(b) P (χ2n (λ) > c) ≥ P (χ2n > c).
Theorem 3.3.3. If X ∼ Nn (μ, Σ) and Σ is positive definite, then
XT Σ−1X ∼ χ2n (λ = μT Σ−1 μ/2).
(3.3.6)
Proof. Since Σ is positive definite, there exists a non-singular matrix An×n
such that Σ = AAT (Cholesky decomposition). Define,
Y = {AT }−1X.
Then,
Definition 3.3.3. Non-central F -distribution. If U1 ∼ χ2n1 (λ) and U2 ∼
χ2n2 and U1 and U2 are independent, then, the distribution of
F =
U1/n1
U2/n2
(3.3.7)
is referred to as non-central F -distribution with df n1 and n2 , and noncentrality parameter λ.
Chapter 3
96
BIOS 2083
Linear Models
Abdus S. Wahed
0.8
0.7
← λ=0
0.6
← λ=2
0.5
0.4
← λ=4
0.3
← λ=6
0.2
← λ=8
← λ=10
0.1
0
0
1
2
3
4
5
6
7
8
Figure 3.2: Non-central F-densities with df 5 and 15 and non-centrality parameter λ.
Chapter 3
97
BIOS 2083
Linear Models
0.4
Abdus S. Wahed
← λ=0
0.35
← λ=2
0.3
0.25
← λ=4
0.2
← λ=6
0.15
← λ=8
0.1
← λ=10
0.05
0
−0.05
−5
0
5
10
15
20
Figure 3.3: Non-central t-densities with df 5 and non-centrality parameter λ.
Definition 3.3.4. Non-central t-distribution. If U1 ∼ N (λ, 1) and U2 ∼
χ2n and U1 and U2 are independent, then, the distribution of
U1
T =
U2/n
(3.3.8)
is referred to as non-central t-distribution with df n and non-centrality parameter λ.
Chapter 3
98
BIOS 2083
3.4
Linear Models
Abdus S. Wahed
Distribution of quadratic forms
Caution: We assume that our matrix of quadratic form is symmetric.
Lemma 3.4.1. If An×n is symmetric and idempotent with rank r, then r of
its eigenvalues are exactly equal to 1 and n − r are equal to zero.
Proof. Use spectral decomposition theorem. (See Result 2.3.10 on page 51 of
Ravishanker and Dey).
Theorem 3.4.2. Let X ∼ Nn (0, In). The quadratic form XT AX ∼ χ2r iff A
is idempotent with rank(A) = r.
Proof. Let A be (symmetric) idempotent matrix of rank r. Then, by spectral
decomposition theorem, there exists an orthogonal matrix P such that
⎡
PT AP = Λ = ⎣
⎤
Ir 0
⎦.
(3.4.1)
0 0
⎡
Define Y = PT X = ⎣
Chapter 3
PT1 X
PT2 X
⎤
⎤
⎡
⎦=⎣
Y1
Y2
⎦, so that PT1 P1 = Ir . Thus, X =
99
BIOS 2083
Linear Models
Abdus S. Wahed
PY and Y1 ∼ Nr (0, Ir ). Now,
XT Ax = (PY)T APY
⎡
⎤
Ir 0
⎦Y
= YT ⎣
0 0
= Y1T Y1 ∼ χ2r .
(3.4.2)
Now suppose XT AX ∼ χ2r . This means that the moment generating
function of XT AX is given by
MXT AX (t) = (1 − 2t)−r/2.
(3.4.3)
But, one can calculate the m.g.f. of XT AX directly using the multivariate
normal density as
MXT AX (t) = E exp (XT AX)t
=
exp (XT AX)t fX (x)dx
T
1
1 T
=
exp (X AX)t
exp − x x dx
2
(2π)n/2
1 T
1
x (In − 2tA)x dx
exp
−
=
2
(2π)n/2
= |In − 2tA|−1/2
n
=
(1 − 2tλi )−1/2.
(3.4.4)
i=1
Chapter 3
100
BIOS 2083
Linear Models
Abdus S. Wahed
Equate (3.4.3) and (3.4.4) to obtain the desired result.
Theorem 3.4.3. Let X ∼ Nn (μ, Σ) where Σ is positive definite. The quadratic
form XT AX ∼ χ2r (λ) where λ = μT Aμ/2, iff AΣ is idempotent with rank(AΣ) =
r.
Proof. Omitted.
Theorem 3.4.4. Independence of two quadratic forms. Let X ∼
Nn (μ, Σ) where Σ is positive definite. The two quadratic forms XT AX and
XT BX are independent if and only if
AΣB = 0 = BΣA.
(3.4.5)
Proof. Omitted.
Remark 3.4.1. Note that in the above theorem, the two quadratic forms need not have chi-square distributions. When they do, the theorem is referred to as Craig's theorem.
Theorem 3.4.5. Independence of linear and quadratic forms. Let
X ∼ Nn (μ, Σ) where Σ is positive definite. The quadratic form XT AX and
the linear form BX are independently distributed if and only if
BΣA = 0.
(3.4.6)
Proof. Omitted.
Remark 3.4.2. Note that in the above theorem, the quadratic form need not
have a chi-square distribution.
Example 3.4.6. Independence of sample mean and sample variance. Suppose X ∼ N_n(0, I_n). Then X̄ = Σ_{i=1}^{n} X_i/n = 1ᵀX/n and S²_X = Σ_{i=1}^{n} (X_i − X̄)²/(n − 1) are independently distributed.
Proof.
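(Not the proof itself, but a quick numerical illustration added here: write X̄ = BX with B = 1ᵀ/n and S²_X = XᵀAX with A = (I − J/n)/(n − 1), and check the condition of Theorem 3.4.5 with Σ = I_n.)

import numpy as np

n = 5
B = np.ones((1, n)) / n                           # X_bar = B X (linear form)
A = (np.eye(n) - np.ones((n, n)) / n) / (n - 1)   # S_X^2 = X' A X (quadratic form)
Sigma = np.eye(n)

print(np.allclose(B @ Sigma @ A, 0))              # True => independence by Theorem 3.4.5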
Theorem 3.4.7. Let X ∼ N_n(μ, Σ). Then

E[XᵀAX] = μᵀAμ + trace(AΣ).   (3.4.7)
Remark 3.4.3. Note that in the above theorem, the quadratic form need not
have a chi-square distribution.
Proof.
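(Not the proof, but the expectation formula (3.4.7) is easy to verify by simulation; the sketch below is an illustration added to the notes, with arbitrarily chosen μ, Σ, and A.)

import numpy as np

rng = np.random.default_rng(3)
n = 4
mu = np.array([1.0, -2.0, 0.5, 0.0])
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)                      # a positive definite covariance
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric matrix of the form

L = np.linalg.cholesky(Sigma)
X = mu + rng.standard_normal((200_000, n)) @ L.T     # X ~ N_n(mu, Sigma)
q = np.einsum('ij,jk,ik->i', X, A, X)

print(q.mean())                                      # Monte Carlo estimate of E[X'AX]
print(mu @ A @ mu + np.trace(A @ Sigma))             # right-hand side of (3.4.7)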
Theorem 3.4.8. Fisher–Cochran theorem. Suppose X ∼ N_n(μ, I_n). Let Q_j = XᵀA_jX, j = 1, 2, . . . , k be k quadratic forms with rank(A_j) = r_j such that XᵀX = Σ_{j=1}^{k} Q_j. Then the Q_j's are independently distributed as χ²_{r_j}(λ_j), where λ_j = μᵀA_jμ/2, if and only if Σ_{j=1}^{k} r_j = n.

Proof. Omitted.

Theorem 3.4.9. Generalization of the Fisher–Cochran theorem. Suppose X ∼ N_n(μ, I_n). Let A_j, j = 1, 2, . . . , k be k n × n symmetric matrices with rank(A_j) = r_j such that A = Σ_{j=1}^{k} A_j with rank(A) = r. Then,

1. XᵀA_jX's are independently distributed as χ²_{r_j}(λ_j), where λ_j = μᵀA_jμ/2, and

2. XᵀAX ∼ χ²_r(λ), where λ = Σ_{j=1}^{k} λ_j,

if and only if any one of the following conditions is satisfied:

C1. A_jΣ is idempotent for all j and A_jΣA_k = 0 for all j < k.

C2. A_jΣ is idempotent for all j and AΣ is idempotent.

C3. A_jΣA_k = 0 for all j < k and AΣ is idempotent.

C4. r = Σ_{j=1}^{k} r_j and AΣ is idempotent.

C5. The matrices AΣ and A_jΣ, j = 1, 2, . . . , k − 1, are idempotent and A_kΣ is non-negative definite.
3.5 Problems
1. Consider the matrix

   A = [ 8 4 4 2 2 2 2
         4 4 0 2 2 0 0
         4 0 4 0 0 2 2
         2 2 0 2 0 0 0
         2 2 0 0 2 0 0
         2 0 2 0 0 2 0
         2 0 2 0 0 0 2 ].
(a) Find the rank of this matrix.
(b) Find a basis for the null space of A.
(c) Find a basis for the column space of A.
2. Let Xᵢ, i = 1, 2, 3, be independent standard normal random variables. Show that the variance-covariance matrix of the 3-dimensional vector Y, defined as

   Y = ( 5X₁, 1.6X₁ − 1.2X₂, 2X₁ − X₂ )ᵀ,

   is not positive definite.
3. Let

   X = (X₁, X₂, X₃)ᵀ ∼ N₃( (μ₁, μ₂, μ₃)ᵀ, [ 1 ρ 0 ; ρ 1 ρ ; 0 ρ 1 ] ).
(a) Find the marginal distribution of X2 .
(b) What is the conditional distribution of X2 given X1 = x1 and X3 = x3 ? Under
what condition does this distribution coincide with the marginal distribution of
X2 ?
4. If X ∼ Nn (μ, Σ), then show that (X − μ)T Σ−1 (X − μ) ∼ χ2n .
5. Suppose Y = (Y₁, Y₂, Y₃)ᵀ is distributed as N₃(0, σ²I₃).
   (a) Consider the quadratic form

       Q = [ (Y₁ − Y₂)² + (Y₂ − Y₃)² + (Y₃ − Y₁)² ] / 3.   (3.5.1)

       Write Q as YᵀAY where A is symmetric. Is A idempotent? What is the distribution of Q/σ²? Find E(Q).

   (b) What is the distribution of L = Y₁ + Y₂ + Y₃? Find E(L) and Var(L).

   (c) Are Q and L independent? Find E(Q/L²).
6. Write each of the following quadratic forms in XᵀAX form:

   (a) (1/6)X₁² + (2/3)X₂² + (1/6)X₃² − (2/3)X₁X₂ + (1/3)X₁X₃ − (2/3)X₂X₃

   (b) nX̄², where X̄ = (X₁ + X₂ + . . . + Xₙ)/n.

   (c) Σ_{i=1}^{n} Xᵢ²

   (d) Σ_{i=1}^{n} (Xᵢ − X̄)²

   (e) Σ_{i=1}^{2} Σ_{j=1}^{2} (Xᵢⱼ − X̄ᵢ.)², where X̄ᵢ. = (Xᵢ₁ + Xᵢ₂)/2

   (f) 2 Σ_{i=1}^{2} (X̄ᵢ. − X̄..)², where X̄.. = (X₁₁ + X₁₂ + X₂₁ + X₂₂)/4.

   (g) 2(X̄₁. − X̄..)² + 3(X̄₂. − X̄..)², where X̄₁. = (X₁₁ + X₁₂)/2, X̄₂. = (X₂₁ + X₂₂ + X₂₃)/3, and X̄.. = (2X̄₁. + 3X̄₂.)/5.
In each case, determine if A is idempotent. If A is idempotent, find rank(A).
7. Let X ∼ N₂(μ, Σ), where μ = (μ₁, μ₂)ᵀ and Σ = [ 1 0.5 ; 0.5 1 ]. Show that Q₁ = (X₁ − X₂)² and Q₂ = (X₁ + X₂)² are independently distributed. Find the distribution of Q₁, Q₂, and Q₂/(3Q₁).
8. Assume that Y ∼ N₃(0, I₃). Define Q₁ = YᵀAY and Q₂ = YᵀBY, where

   A = [ 1 1 0 ; 1 1 0 ; 0 0 1 ]   and   B = [ 1 −1 0 ; −1 1 0 ; 0 0 0 ].   (3.5.2)
Are Q1 and Q2 independent? Do Q1 and Q2 follow χ2 distribution?
9. Let Y ∼ N₃(0, I₃). Let U₁ = YᵀA₁Y, U₂ = YᵀA₂Y, and V = BY, where

   A₁ = [ 1/2 1/2 0 ; 1/2 1/2 0 ; 0 0 1 ],   A₂ = [ 1/2 −1/2 0 ; −1/2 1/2 0 ; 0 0 0 ],   and   B = ( 1 1 0 ).
(a) Are U1 and U2 independent?
(b) Are U1 and V independent?
(c) Are U2 and V independent?
(d) Find the distribution of V .
   (e) Find the distribution of U₂/U₁. (Include specific values for any parameters of the distribution.)
10. Suppose X = (X₁, X₂, X₃)ᵀ ∼ N₃(μ, σ²V), where μ = (1, 1, 0)ᵀ and

    V = [ 1 0.5 0 ; 0.5 1 0.5 ; 0 0.5 1 ].
(a) What is the joint distribution of (X1 , X3 )?
(b) What is the conditional distribution of X1 , X3 |X2 ?
    (c) For what value or values of a does L = (aX₁² + (1/2)X₃² + √(2a) X₁X₃)/σ² follow a chi-square distribution?
(d) Find the value of b for which L in (c) and M = X1 + bX3 are independently
distributed.
11. Suppose X ∼ N₃(μ, Σ), where

    μ = (μ₁, μ₂, μ₃)ᵀ   and   Σ = diag(σ₁², σ₂², σ₃²).

    Find the distribution of Q = Σ_{i=1}^{3} Xᵢ²/σᵢ². Express the parameters of its distribution in terms of μᵢ and σᵢ², i = 1, 2, 3. What is the variance of Q?
12. Suppose X ∼ N(0, 1) and Y = UX, where U follows a uniform distribution on the
discrete space {−1, 1} independently of X.
(a) Find E(Y ) and cov(X, Y ).
(b) Show that Y and X are not independent.
13. Suppose X ∼ N₄(μ, I₄), where

    X = (X₁₁, X₁₂, X₂₁, X₂₂)ᵀ   and   μ = (α + a₁, α + a₁, α + a₂, α + a₂)ᵀ.

    (a) Find the distribution of E = Σ_{i=1}^{2} Σ_{j=1}^{2} (Xᵢⱼ − X̄ᵢ.)², where X̄ᵢ. = (Xᵢ₁ + Xᵢ₂)/2.

    (b) Find the distribution of Q = 2 Σ_{i=1}^{2} (X̄ᵢ. − X̄..)², where X̄.. = (X₁₁ + X₁₂ + X₂₁ + X₂₂)/4.

    (c) Use the Fisher–Cochran theorem to prove that E and Q are independently distributed.

    (d) What is the distribution of Q/E?
Chapter 4

General linear model: the least squares problem

4.1 Least squares (LS) problem
As observed in Chapter 1, any linear model can be expressed in the form Y = Xβ + ε, that is,

( Y₁ )   [ x₁₁ x₁₂ . . . x₁ₚ ] ( β₁ )   ( ε₁ )
( Y₂ ) = [ x₂₁ x₂₂ . . . x₂ₚ ] ( β₂ ) + ( ε₂ )
( ⋮  )   [  ⋮   ⋮         ⋮  ] ( ⋮  )   ( ⋮  )
( Yₙ )   [ xₙ₁ xₙ₂ . . . xₙₚ ] ( βₚ )   ( εₙ )   (4.1.1)
Usually X is a matrix of known constants representing the values of covariates, Y is the vector of responses, and ε is an error vector with the assumption that E(ε | X) = 0.
The goal is to find a value of β for which Xβ is a "close" approximation to Y. In statistical terms, one would like to estimate β such that the "distance" between Y and Xβ is minimum. One measure of distance in real vector spaces is the squared length of the difference between the two vectors Y and Xβ, namely,

‖Y − Xβ‖² = (Y − Xβ)ᵀ(Y − Xβ).   (4.1.2)

Note that for a given β, both Y and Xβ are vectors in Rⁿ. In addition, Xβ is always a member of C(X). Thus, for given Y and X, the least squares problem can be characterized as a restricted minimization problem:

Minimize ‖Y − Xβ‖² over β ∈ Rᵖ.

Or equivalently,

Minimize ‖Y − θ‖² over θ ∈ C(X).
4.2 Solution to the LS problem
Since θ belongs to C(X), the value of θ that minimizes the distance between Y and θ is given by the orthogonal projection of Y onto the column space of X (see a formal proof below). Let

Ŷ = Xβ̂ ∈ C(X)   (4.2.1)

be the orthogonal projection of Y onto C(X). Then, since N(Xᵀ) = C(X)⊥, one can write

Y = Ŷ + e,   (4.2.2)

where e ∈ N(Xᵀ). Thus,

Y − Ŷ ∈ N(Xᵀ).   (4.2.3)

Lemma 4.2.1. For any θ ∈ C(X),

(Y − Ŷ)ᵀ(Ŷ − θ) = 0.   (4.2.4)

Proof.
Lemma 4.2.2. ‖Y − θ‖² is minimized when θ = Ŷ.

Proof.

‖Y − θ‖² = (Y − θ)ᵀ(Y − θ)
 = (Y − Ŷ + (Ŷ − θ))ᵀ(Y − Ŷ + (Ŷ − θ))
 = (Y − Ŷ)ᵀ(Y − Ŷ) + (Ŷ − θ)ᵀ(Ŷ − θ)
 = ‖Y − Ŷ‖² + ‖Ŷ − θ‖²,   (4.2.5)

which is minimized when θ = Ŷ.
Thus, we have shown that ‖Y − Xβ‖² is minimized when β = β̂ is such that Ŷ = Xβ̂ is the orthogonal projection of Y onto the column space of X. But how do we find the orthogonal projection?
Normal equations

Notice from the result in (4.2.3) that

Y − Ŷ ∈ N(Xᵀ)
⟹ Xᵀ(Y − Ŷ) = 0
⟹ Xᵀ(Y − Xβ̂) = 0
⟹ XᵀXβ̂ = XᵀY.   (4.2.6)

Equation (4.2.6) is referred to as the normal equations; a solution of these equations, if it exists, leads us to the orthogonal projection.
Example 4.2.3. Example 1.1.3 (continued). The linear model in matrix form can be written as

( Y₁ )   [ 1 w₁ ]           ( ε₁ )
( Y₂ ) = [ 1 w₂ ] ( α )  +  ( ε₂ )
( ⋮  )   [ ⋮  ⋮ ] ( β )     ( ⋮  )
( Yₙ )   [ 1 wₙ ]           ( εₙ )   (4.2.7)
Here,

XᵀX = [ n      Σwᵢ
        Σwᵢ   Σwᵢ² ],   (4.2.8)

and

XᵀY = ( ΣYᵢ
        ΣwᵢYᵢ ).   (4.2.9)

The normal equations are then

αn + βΣwᵢ = ΣYᵢ,
αΣwᵢ + βΣwᵢ² = ΣwᵢYᵢ.   (4.2.10)
From the linear regression course, you know that the solution to these normal equations is given by

β̂ = Σ(wᵢ − w̄)(Yᵢ − Ȳ) / Σ(wᵢ − w̄)²,
α̂ = Ȳ − β̂w̄,   (4.2.11)

provided Σ(wᵢ − w̄)² > 0.
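As an illustration (added here, with made-up data), the normal equations for this example can be solved numerically and checked against the closed-form solution (4.2.11); the residual is also orthogonal to the columns of X, as (4.2.3) requires.

import numpy as np

rng = np.random.default_rng(4)
w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(w), w])        # columns: intercept, w
Y = 2.0 + 0.5 * w + rng.standard_normal(w.size)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)     # solve the normal equations (4.2.6)

# closed form (4.2.11)
slope = np.sum((w - w.mean()) * (Y - Y.mean())) / np.sum((w - w.mean()) ** 2)
intercept = Y.mean() - slope * w.mean()

print(beta_hat, (intercept, slope))              # the two solutions agree
print(X.T @ (Y - X @ beta_hat))                  # ~ 0: residual lies in N(X^T)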
Example 4.2.4. Example 1.1.7 (continued). The linear model in matrix form can be written as

( Y₁ )   [ 1_{n₁} 1_{n₁} 0_{n₁} . . . 0_{n₁} ] ( μ  )   ( ε₁ )
( Y₂ ) = [ 1_{n₂} 0_{n₂} 1_{n₂} . . . 0_{n₂} ] ( α₁ ) + ( ε₂ )
( ⋮  )   [   ⋮      ⋮      ⋮           ⋮     ] ( ⋮  )   ( ⋮  )
( Yₐ )   [ 1_{nₐ} 0_{nₐ} 0_{nₐ} . . . 1_{nₐ} ] ( αₐ )   ( εₐ )   (4.2.12)

where Yᵢ = (Yᵢ₁, Yᵢ₂, . . . , Y_{i nᵢ})ᵀ and εᵢ = (εᵢ₁, εᵢ₂, . . . , ε_{i nᵢ})ᵀ for i = 1, 2, . . . , a. Here,
XᵀX = [ n   n₁  n₂  . . .  nₐ
        n₁  n₁  0   . . .  0
        ⋮   ⋮   ⋮          ⋮
        nₐ  0   0   . . .  nₐ ],   (4.2.13)
BIOS 2083
and
c
Abdus
S. Wahed
Linear Models
⎛
⎞
⎜ i j Yij
⎜ n
1
⎜
j Y1j
⎜
⎜ n
T
2
X Y=⎜
⎜
j Y2j
⎜
⎜
...
⎜
⎝ na
j Yaj
⎞
⎛
⎟ ⎜ Y..
⎟ ⎜
⎟ ⎜ Y1.
⎟ ⎜
⎟ ⎜
⎟=⎜Y
⎟ ⎜ 2.
⎟ ⎜
⎟ ⎜ ...
⎟ ⎜
⎠ ⎝
Ya.
⎞
⎛
⎟ ⎜ nȲ.. ⎟
⎟ ⎜
⎟
⎟ ⎜ n1Ȳ1. ⎟
⎟ ⎜
⎟
⎟ ⎜
⎟
⎟ = ⎜ n Ȳ ⎟
⎟ ⎜ 2 2. ⎟
⎟ ⎜
⎟
⎟ ⎜ ... ⎟
⎟ ⎜
⎟
⎠ ⎝
⎠
naȲa.
The normal equations are then calculated as
= nȲ..
⎫
⎬
niμ + niαi = niȲi., i = 1, 2, . . . , a.
⎭
nμ +
a
i=1 ni αi
Two solutions to this set of normal equations is given by
⎫
⎬
μ̂(1) = 0
(1)
⎭
α̂ = Ȳ , i = 1, 2, . . . , a,
i
Chapter 4
(4.2.15)
(4.2.16)
i.
μ̂(2) = Ȳ..
⎫
⎬
= Ȳi. − Ȳ.., i = 1, 2, . . . , a.
⎭
and
(2)
α̂i
(4.2.14)
(4.2.17)
Solutions to the normal equations
In Example 4.2.3, the normal equations have a unique solutions,
whereas in Example 4.2.4, there are more than one (in fact, infinitely
many) solutions. Are normal equations always consistent? If we
closely look at the normal equations (4.2.6)
XT Xβ = XT Y,
(4.2.18)
we see that if XT X is non-singular, then there exists a unique
solution to the normal equations, namely,
β̂ = (XT X)−1XT Y,
(4.2.19)
which is the case for the simple linear regression in Example 4.2.3,
or more generally for any linear regression problem (multiple, polynomial).
Theorem 4.2.5. Normal equations (4.2.6) are always consistent.
Proof. From Chapter 2, Page 64, a system of equations Ax = b is
consistent iff b ∈ C(A). Thus, in our case, we need to show that,
XT Y ∈ C(XT X).
(4.2.20)
Now, XT Y ∈ C(XT ). If we can show that C(XT ) ⊆ C(XT X), then
the result is established. Let us look at the following lemma first:
Lemma 4.2.6. N (XT X) = N (X).
Proof. If a ∈ N(XᵀX), then

XᵀXa = 0 ⟹ aᵀXᵀXa = 0 ⟹ ‖Xa‖² = 0 ⟹ Xa = 0 ⟹ a ∈ N(X).   (4.2.21)
On the other hand, if a ∈ N (X), then Xa = 0, and hence
XT Xa = 0 which implies that a ∈ N (XT X), which completes
the proof.
Now, from the above lemma, and from the result stated in chapter
2, Page 54, and Theorem 2.3.2 on Page 53 ,
N ⊥ (XT X) = N ⊥ (X)
=⇒ C(XT X) = C(XT ),
(4.2.22)
which completes the proof.
Least squares estimator

The above theorem shows that the normal equations are always consistent. Using a g-inverse of XᵀX, we can write out all possible solutions of the normal equations. Namely,

β̂ = (XᵀX)^g XᵀY + [ I − (XᵀX)^g XᵀX ] c   (4.2.23)

gives all possible solutions to the normal equations (4.2.6) for an arbitrary vector c. The estimator β̂ is known as a least squares estimator of β for a given c. Note that one could also write all possible solutions using the arbitrariness of the g-inverse of XᵀX.
We know that the orthogonal projection Ŷ of Y onto C(X) is
unique. However, the solutions to the normal equations are not.
Does any solution of the normal equation lead to the orthogonal
projection? In fact, it does. Specifically, if βˆ1 and βˆ2 are any two
solutions to the normal equations, then
Xβˆ1 = Xβˆ2.
(4.2.24)
Projection and projection matrix
From the equation (4.2.23), the projection of Y onto the column
space C(X) is given by the prediction vector
Ŷ = Xβ̂ = X(XT X)g XT Y = PY,
(4.2.25)
where P = X(XT X)g XT is the projection matrix.
A very useful lemma:
Lemma 4.2.7. XT XA = XT XB if and only if XA = XB for
any two matrices A and B.
Proposition 4.2.8. Verify (algebraically) the following results:
1. P = X(XT X)g XT is idempotent.
2. P is invariant to the choice of the g-inverse (XT X)g .
3. P is symmetric.(Note (XT X)g does not need to be symmetric).
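These three properties are also easy to confirm numerically; the sketch below (an added illustration) uses a rank-deficient one-way ANOVA design and two different generalized inverses, the Moore–Penrose inverse and the diagonal g-inverse that appears later in Example 4.2.12.

import numpy as np

n1, n2 = 3, 4
X = np.block([
    [np.ones((n1, 1)), np.ones((n1, 1)), np.zeros((n1, 1))],
    [np.ones((n2, 1)), np.zeros((n2, 1)), np.ones((n2, 1))],
])                                              # one-way ANOVA design, rank 2

XtX = X.T @ X
G1 = np.linalg.pinv(XtX)                        # Moore-Penrose inverse (one g-inverse)
G2 = np.diag([0.0, 1 / n1, 1 / n2])             # the g-inverse of Example 4.2.12

P1 = X @ G1 @ X.T
P2 = X @ G2 @ X.T
print(np.allclose(P1, P1 @ P1))                 # idempotent
print(np.allclose(P1, P1.T))                    # symmetric
print(np.allclose(P1, P2))                      # invariant to the choice of g-inverse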
Proposition 4.2.9. If P = X(XᵀX)^g Xᵀ is the orthogonal projection onto the column space of X, then show that

XᵀP = Xᵀ   (4.2.26)

and

rank(P) = rank(X).   (4.2.27)
Residual vector
Definition 4.2.1. The vector e = Y − Ŷ is known as the residual vector.
Notice,
e = Y − Ŷ = (In − P)Y,
(4.2.28)
and Y can be decomposed into two orthogonal components,
Y = Ŷ + e,
(4.2.29)
Ŷ = PY belonging to the column space of X and e = (In − P)Y
belonging to N (XT ).
Example 4.2.10. Show that Ŷ and e are uncorrelated when the
elements of Y are independent with equal variance.
Proof. Let cov(Y) = σ²I_n. Then,

E(Ŷeᵀ) = E(PYYᵀ(I_n − P))
 = P E(YYᵀ)(I_n − P)
 = P (σ²I_n + XββᵀXᵀ)(I_n − P)
 = σ²P(I_n − P)
 = 0.   (4.2.30)

Also, E[e] = 0. Together we get cov(Ŷ, e) = 0.
Example 4.2.11. For the simple linear regression problem in Example 4.2.3, we find that rank(XᵀX) = 2, provided Σ(wᵢ − w̄)² > 0. Then,

(XᵀX)⁻¹ = [ Σwᵢ²  −Σwᵢ ; −Σwᵢ  n ] / ( n Σ(wᵢ − w̄)² ).   (4.2.31)

Recall the XᵀY vector,

XᵀY = ( ΣYᵢ, ΣwᵢYᵢ )ᵀ,   (4.2.32)

leading to the least squares estimator

β̂ = (XᵀX)⁻¹XᵀY
 = [ Σwᵢ²  −Σwᵢ ; −Σwᵢ  n ] ( ΣYᵢ ; ΣwᵢYᵢ ) / ( n Σ(wᵢ − w̄)² )
 = ( ΣYᵢΣwᵢ² − ΣwᵢΣwᵢYᵢ ; nΣwᵢYᵢ − ΣwᵢΣYᵢ ) / ( n Σ(wᵢ − w̄)² )
 = ( Ȳ − β̂w̄ = α̂ (verify!) ; [nΣwᵢYᵢ − ΣwᵢΣYᵢ]/[nΣ(wᵢ − w̄)²] = β̂ ).   (4.2.33)
Example 4.2.12. For the one-way ANOVA model in Example 4.2.4,

XᵀX = [ n   n₁  n₂  . . .  nₐ
        n₁  n₁  0   . . .  0
        ⋮   ⋮   ⋮          ⋮
        nₐ  0   0   . . .  nₐ ].   (4.2.34)

A g-inverse is given by

(XᵀX)^g = diag( 0, 1/n₁, 1/n₂, . . . , 1/nₐ ).   (4.2.35)

The projection P is obtained as

P = X(XᵀX)^g Xᵀ = blockdiag( (1/nᵢ) J_{nᵢ}, i = 1, 2, . . . , a ).   (4.2.36)
A solution to the normal equations is then obtained as

β̂ = (XᵀX)^g XᵀY = ( 0, Ȳ₁., Ȳ₂., . . . , Ȳₐ. )ᵀ.   (4.2.37)

The corresponding prediction vector Ŷ is given by

Ŷ = PY = Xβ̂ = ( 1_{n₁}Ȳ₁., 1_{n₂}Ȳ₂., . . . , 1_{nₐ}Ȳₐ. )ᵀ.   (4.2.38)

Notice that

e = (I_n − P)Y = Y − Xβ̂ = ( Y₁ − 1_{n₁}Ȳ₁., Y₂ − 1_{n₂}Ȳ₂., . . . , Yₐ − 1_{nₐ}Ȳₐ. )ᵀ.   (4.2.39)
‖Ŷ‖² = ŶᵀŶ = n₁Ȳ₁.² + n₂Ȳ₂.² + . . . + nₐȲₐ.² = Σ_{i=1}^{a} nᵢȲᵢ.²,   (4.2.40)

and

‖e‖² = eᵀe = (Y₁ᵀY₁ − n₁Ȳ₁.²) + (Y₂ᵀY₂ − n₂Ȳ₂.²) + . . . + (YₐᵀYₐ − nₐȲₐ.²)
 = Σ_{i=1}^{a} ( YᵢᵀYᵢ − nᵢȲᵢ.² )
 = Σ_{i=1}^{a} Σ_{j=1}^{nᵢ} ( Yᵢⱼ − Ȳᵢ. )²
 = Σ_{i=1}^{a} Σ_{j=1}^{nᵢ} Yᵢⱼ² − Σ_{i=1}^{a} nᵢȲᵢ.²
 = ‖Y − Ŷ‖².   (4.2.41)

That is, "Residual SS" = Total SS − "Regression SS", or
Total SS = "Regression SS" + "Residual SS".
Theorem 4.2.13. If β̂ is a solution to the normal equations (4.2.6), then

‖Y‖² = ‖Ŷ‖² + ‖e‖²,   (4.2.42)

where Ŷ = Xβ̂ and e = Y − Xβ̂.

Proof. Left as an exercise.

Definition 4.2.2. Regression SS, Residual SS. The quantity ‖Ŷ‖² is referred to as the regression sum of squares or model sum of squares, the portion of the total sum of squares explained by the linear model, whereas the other part, ‖e‖², is the error sum of squares or residual sum of squares (unexplained variation).
Coefficient of determination (R²)

To have a general definition, let the model Y = Xβ + ε contain an intercept term, meaning the first column of X is 1_n.
Table 4.1: Analysis of variance

Models with/without an intercept term
  Source              df      SS
  Regression (Model)  r       YᵀPY
  Residual (Error)    n − r   Yᵀ(I − P)Y
  Total               n       YᵀY

Models with an intercept term
  Source                            df      SS
  Mean                              1       Yᵀ1_n1_nᵀY/n
  Regression (corrected for mean)   r − 1   Yᵀ(P − 1_n1_nᵀ/n)Y
  Residual (Error)                  n − r   Yᵀ(I − P)Y
  Total                             n       YᵀY

Models with an intercept term
  Source                            df      SS
  Regression (corrected for mean)   r − 1   Yᵀ(P − 1_n1_nᵀ/n)Y
  Residual (Error)                  n − r   Yᵀ(I − P)Y
  Total (corrected)                 n − 1   YᵀY − Yᵀ1_n1_nᵀY/n
The total sum of squares corrected for the intercept term (or mean) is then written as

Total SS(corr.) = YᵀY − nȲ² = Yᵀ( I_n − (1/n)J_n )Y.   (4.2.43)

Similarly, the regression SS is also corrected for the intercept term and is expressed as

Regression SS(corr.) = YᵀPY − nȲ² = Yᵀ( P − (1/n)J_n )Y.   (4.2.44)

This is the portion of the total corrected sum of squares that is purely explained by the design variables in the model. An equality similar to (4.2.42) still holds for the corrected sums of squares, and the ratio

R² = Reg. SS(corr.) / Total SS(corr.) = Yᵀ(P − (1/n)J_n)Y / Yᵀ(I_n − (1/n)J_n)Y   (4.2.45)

gives the proportion of total variation explained by the model. This ratio is known as the coefficient of determination and is denoted by R².
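A small numerical sketch (added here as an illustration, with simulated data) computing R² directly from (4.2.45) for a simple regression fit and comparing it with the squared correlation between Y and Ŷ.

import numpy as np

rng = np.random.default_rng(5)
n = 30
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])             # model with an intercept term
Y = 1.0 + 0.7 * x + rng.standard_normal(n)

P = X @ np.linalg.pinv(X.T @ X) @ X.T            # projection onto C(X)
J = np.ones((n, n))
reg_ss_corr = Y @ (P - J / n) @ Y                # regression SS corrected for the mean
tot_ss_corr = Y @ (np.eye(n) - J / n) @ Y        # total SS corrected for the mean

R2 = reg_ss_corr / tot_ss_corr                   # (4.2.45)
print(R2, np.corrcoef(Y, P @ Y)[0, 1] ** 2)      # the two agree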
Two important results:
Lemma 4.2.14. Ip − (XT X)g XT X is a projection onto N (X).
Proof. Use lemma 2.7.10.
Lemma 4.2.15. XT X(XT X)g is a projection onto C(XT ).
Proof. Use lemma 2.7.11.
Importance:
Sometimes it is easy to obtain a basis for the null space of X
or column space of XT by careful examination of the relationship
between the columns of X. However, in some cases it is not as
straightforward. In such cases, independent non-zero columns from
the projection matrix Ip − (XT X)g XT X can be used as a basis for
the null space of X. Similarly, independent non-zero columns from
the projection matrix XT X(XT X)g can be used as a basis for the
column space of XT .
Example 4.2.16. Example 4.2.12 continued.

XᵀX(XᵀX)^g = [ 0 1 1 . . . 1
               0 1 0 . . . 0
               0 0 1 . . . 0
               ⋮ ⋮ ⋮       ⋮
               0 0 0 . . . 1 ].   (4.2.46)

Therefore a basis for the column space of Xᵀ is given by the last a columns of the above matrix. Similarly,

I_{a+1} − (XᵀX)^g XᵀX = [  1 0 0 . . . 0
                          −1 0 0 . . . 0
                          −1 0 0 . . . 0
                           ⋮ ⋮ ⋮       ⋮
                          −1 0 0 . . . 0 ].   (4.2.47)

Therefore, the only basis vector for the null space of X is (1, −1ₐᵀ)ᵀ.
4.3 Interpreting the LS estimator
Usually, an estimator is interpreted by the quantity it estimates.
Remember, a solution to the normal equation (4.2.6) is given by
β̂ = (XT X)g XT Y. What does β̂ really estimates?
E(β̂) = (XT X)g XT E(Y) = (XT X)g XT Xβ = Hβ.
(4.3.1)
Unless X has full column rank, β̂ is not an unbiased estimator
of β. It is an unbiased estimator of Hβ, which may not be unique
(depends on g-inverse of XT X). Therefore, when X is not of full column rank, the estimator β̂ is practically meaningless. Nevertheless,
being a solution to the normal equations, it helps us construct useful
estimators for other important functions of β (will discuss later).
Estimating E(Y)

Even though the normal equations (4.2.6) may not have a unique solution, they facilitate a unique LS estimator for E(Y) = Xβ, since

E(Ŷ) = E(PY) = PXβ = Xβ = E(Y).   (4.3.2)

Thus Ê(Y) = Ŷ = Xβ̂ = PY is a unique unbiased estimator of E(Y).
Introducing assumptions

So far the only assumption we have put on the response vector Y, or equivalently on the error vector ε, is that

E(ε) = 0.   (4.3.3)

This was a defining assumption of the general linear model. It allowed us to obtain a unique unbiased estimator for the mean response Xβ. However, without further assumptions on the variance of the responses (or, equivalently, of the random errors) it is difficult or even impossible to ascertain how efficient this estimator of the mean response is. We will introduce assumptions as we need them. Let us assume:

Assumption II. Error components are independently and identically distributed with constant variance σ².
Variance-covariance matrix for the LS estimator

Under Assumption II, cov(Y) = σ²I_n. The variance-covariance matrix cov(β̂) of a LS estimator β̂ = (XᵀX)^g XᵀY is given by

cov(β̂) = cov( (XᵀX)^g XᵀY )
 = (XᵀX)^g Xᵀ cov(Y) [ (XᵀX)^g Xᵀ ]ᵀ
 = (XᵀX)^g XᵀX [ (XᵀX)^g ]ᵀ σ².   (4.3.4)

For full rank cases (4.3.4) reduces to the familiar form cov(β̂) = (XᵀX)⁻¹σ².
Variance-covariance matrix for Ŷ
Example 4.3.1. Show that
1. cov(Ŷ) = Pσ 2 .
2. cov(e) = (I − P)σ 2.
Chapter 4
135
BIOS 2083
c
Abdus
S. Wahed
Linear Models
Estimating the error variance

Note that, using Theorem 3.4.7,

E(Residual SS) = E[ Yᵀ(I − P)Y ]
 = trace{ (I − P)σ²I_n } + (Xβ)ᵀ(I − P)Xβ
 = σ² trace{ I − P } + βᵀXᵀ(I − P)Xβ
 = σ²(n − r),   (4.3.5)

where r = rank(X). Therefore, an unbiased estimator of the error variance σ² is given by

σ̂² = Residual MS = Residual SS / (n − r) = Yᵀ(I − P)Y / (n − r).   (4.3.6)
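A quick Monte Carlo check of (4.3.5)–(4.3.6), added here as an illustration with a small rank-deficient design: the average residual sum of squares over many simulated responses is close to (n − r)σ².

import numpy as np

rng = np.random.default_rng(6)
n1, n2, sigma = 4, 5, 2.0
X = np.block([
    [np.ones((n1, 1)), np.ones((n1, 1)), np.zeros((n1, 1))],
    [np.ones((n2, 1)), np.zeros((n2, 1)), np.ones((n2, 1))],
])                                               # one-way ANOVA design, rank r = 2
n, r = X.shape[0], np.linalg.matrix_rank(X)
P = X @ np.linalg.pinv(X.T @ X) @ X.T
mu = X @ np.array([1.0, 0.5, -0.5])              # E(Y) = X beta

Y = mu + sigma * rng.standard_normal((20_000, n))
rss = np.einsum('ij,jk,ik->i', Y, np.eye(n) - P, Y)
print(rss.mean(), (n - r) * sigma ** 2)          # should be close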
4.4 Estimability
Unless X is of full column rank, solution to the normal equations
(4.2.6) is not unique. Therefore, in such cases, a solution to the
normal equation does not estimate any useful population quantity.
More specifically, we have shown that E(β̂) = Hβ, where H =
(XᵀX)^g XᵀX. Consider the following XᵀX matrix,

XᵀX = [ 6 3 3 ; 3 3 0 ; 3 0 3 ],   (4.4.1)
from a one-way ANOVA experiment with two treatments each replicated 3 times. Let us consider two g-inverses
G₁ = [ 0 0 0 ; 0 1/3 0 ; 0 0 1/3 ]   (4.4.2)

and

G₂ = [ 1/3 −1/3 0 ; −1/3 2/3 0 ; 0 0 0 ],   (4.4.3)

with

H₁ = G₁XᵀX = [ 0 0 0 ; 1 1 0 ; 1 0 1 ]   (4.4.4)
and

H₂ = G₂XᵀX = [ 1 0 1 ; 0 1 −1 ; 0 0 0 ],   (4.4.5)

respectively. Now, if β = (μ, α₁, α₂)ᵀ, then

H₁β = ( 0, μ + α₁, μ + α₂ )ᵀ,   (4.4.6)

whereas

H₂β = ( μ + α₁, α₁ − α₂, 0 )ᵀ.   (4.4.7)
Thus two solutions to the same normal equations set estimate two
different quantities. However, in practice, one would like to construct
estimators that estimate the same population quantity, no matter
what solution to the normal equation is used to derive that estimator.
One important goal in one-way ANOVA is to estimate the difference
between two treatment effects, namely, δ = α1 − α2 = (0, 1, −1)β.
Two different solutions based on the two g-inverses G1 and G2 are
given by β̂ 1 = (0, Ȳ1., Ȳ2.)T and β̂ 2 = (Ȳ2., Ȳ1. − Ȳ2., 0)T . If we
construct our estimator for δ based on the solution β̂ 1, we obtain
δ̂1 = (0, 1, −1)β̂1 = Ȳ1. − Ȳ2.,
(4.4.8)
exactly the quantity you would expect. Now let us see if the same
happens with the other solution β̂ 2. For this solution,
δ̂2 = (0, 1, −1)β̂2 = Ȳ1. − Ȳ2.,
(4.4.9)
same as δ̂1. Now we will show that no matter what solution you pick
for the normal equation, δ̂ will always be the same. To see it, let us
write δ̂ as
δ̂ = (0, 1, −1)(XT X)g XT Y
= Pδ Y,
(4.4.10)
where Pδ = (0, 1, −1)(XT X)g XT . If we can show that Pδ does not
depend on the choice of g-inverse (XT X)g , then we are through. Let
us first look at the Xᵀ matrix for this simpler version of the one-way ANOVA problem:

Xᵀ = [ 1 1 1 1 1 1 ; 1 1 1 0 0 0 ; 0 0 0 1 1 1 ].   (4.4.11)

Notice that (0, 1, −1)ᵀ belongs to C(Xᵀ); e.g.,

( 0, 1, −1 )ᵀ = Xᵀ ( 1, 0, 0, −1, 0, 0 )ᵀ.   (4.4.12)
(0, 1, −1)T = XT c. Now,
Pδ = (0, 1, −1)(XT X)g XT = cT X(XT X)g XT = cT P.
(4.4.13)
Since c is unique, and P does not depend on the choice of (XT X)g ,
from the above equation we see that P_δ, and hence δ̂ = P_δY, is invariant to the choice of a g-inverse.
Summary
• Not all linear functions of β may be estimated uniquely based
on the LS method.
• Linear functions λT β of β, where λ is a linear combination of
the columns of XT allow unique estimators based on the LS
estimator.
Estimable functions
Definition 4.4.1. θ̂(Y) is an unbiased estimator of θ if and only
if E θ̂(Y) = θ, for all θ.
Definition 4.4.2. θ̂(Y) is a linear estimator of θ if and only if
θ̂(Y) = aT Y + b, for some constant (vector) b and vector (matrix)
a.
Definition 4.4.3. A linear function θ = λT β is linearly estimable
if and only if there exists a linear function cT Y such that E(cT Y) =
λT β = θ, for all β.
We will drop “linearly” from “linearly estimable” for simplicity.
That means “estimable” will always refer to linearly estimable unless
mentioned specifically.
Example 4.4.1.
1. Components of the mean vector Xβ are estimable.
2. Components of the vector XT Xβ are estimable.
Proposition 4.4.2. Linear combinations of estimable functions
are estimable.
Proof. Follows from the definition 4.4.3.
Proposition 4.4.3. A linear function θ = λT β is estimable if
and only if λ ∈ C(XT ).
Proof. Suppose θ = λT β is estimable. Then, by definition, there
exists a vector c such that
E(cT Y) = λT β, for all β
=⇒ cT Xβ = λT β, for all β
=⇒ cT X = λT ,
=⇒ λ = XT c
=⇒ λ ∈ C(XT ).
(4.4.14)
Now, suppose λ ∈ C(XT ). This implies that λ = XT c for some c.
Then, for all β,
λᵀβ = cᵀXβ = cᵀE(Y) = E(cᵀY).   (4.4.15)
Proposition 4.4.4. If θ = λT β is estimable then there exists a
unique c∗ ∈ C(X) such that λ = XT c∗.
Proof. Proposition 4.4.3 indicates that there exists a c such that
λ = XT c. But any vector c can be written as a direct sum of two
unique components belonging to two orthogonal complements. Thus,
we can find c∗ ∈ C(X) and c∗∗ ∈ N (XT ) such that
c = c∗ + c∗∗.
(4.4.16)
λ = XT c = XT c∗ + XT c∗∗ = XT c∗.
(4.4.17)
Now
Hence, the proof.
Proposition 4.4.5. Collection of all possible estimable functions constitutes a vector space of dimension r = rank(X).
Proof. Hint: (i) Show that linear combinations of estimable functions are also estimable, and (ii) Use proposition 4.4.3.
Methods to determine estimability
Method 1. λT β is estimable if and only if it can be expressed as a linear
combinations of the rows of Xβ.
Method 2. λᵀβ is estimable if and only if λᵀe = 0 for all basis vectors e of the null space of X.
Method 3. λT β is estimable if and only if λ is a linear combination of the
basis vectors of C(XT ).
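Method 2 is easy to automate. The sketch below (an added illustration; helper names are my own) computes a numerical basis of N(X) for a one-way ANOVA design and tests a few candidate λ's.

import numpy as np
from scipy.linalg import null_space

n1, n2 = 3, 3
X = np.block([
    [np.ones((n1, 1)), np.ones((n1, 1)), np.zeros((n1, 1))],
    [np.ones((n2, 1)), np.zeros((n2, 1)), np.ones((n2, 1))],
])                                               # columns: mu, alpha1, alpha2

E = null_space(X)                                # columns form a basis of N(X)

def is_estimable(lam, E, tol=1e-10):
    """Method 2: lambda'beta is estimable iff lambda is orthogonal to N(X)."""
    return bool(np.all(np.abs(lam @ E) < tol))

print(is_estimable(np.array([0.0, 1.0, -1.0]), E))   # alpha1 - alpha2: True
print(is_estimable(np.array([1.0, 1.0, 0.0]), E))    # mu + alpha1:     True
print(is_estimable(np.array([0.0, 1.0, 0.0]), E))    # alpha1 alone:    False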
Example 4.4.6. Multiple linear regression (Example 1.1.5
continued..) In the case of multiple regression with p independent
variables (which may include the intercept term) and n observations
(n > p) , columns of X are all independent. Therefore, N (X) = {0}.
By method 2, all linear functions of β are estimable. In particular,
1. Individual coefficients βj are estimable.
2. Differences between two coefficients are estimable.
Example 4.4.7. Example 4.2.12 continued.

1. Treatment-specific means μ + αᵢ, i = 1, 2, . . . , a, are estimable (using Method 1).

2. The difference between two treatment effects (αᵢ − αᵢ′) is estimable. (Follows from the above, or can be inferred by Method 2.)

3. In general, any linear combination λᵀβ = λ₀μ + Σ_{i=1}^{a} λᵢαᵢ is estimable if and only if λ₀ = Σ_{i=1}^{a} λᵢ. (Use Method 2.)
Example 4.4.8. Two-way nested design. Suppose nᵢ patients are randomized to the ith level of treatment A, i = 1, 2, . . . , a, and within the ith treatment group a second randomization is done to bᵢ levels of treatment B which are unique to each level of treatment A. The linear model for this problem can be written as

Y_{ijk} = μ + αᵢ + β_{ij} + ε_{ijk}, i = 1, 2, . . . , a; j = 1, 2, . . . , bᵢ; k = 1, 2, . . . , n_{ij}.   (4.4.18)
Then the X matrix for this problem is given by

X = [ 1 1 0 1 0 0 0
      1 1 0 1 0 0 0
      1 1 0 0 1 0 0
      1 1 0 0 1 0 0
      1 0 1 0 0 1 0
      1 0 1 0 0 1 0
      1 0 1 0 0 0 1
      1 0 1 0 0 0 1 ],   (4.4.19)
where we have simplified the problem by taking a = 2, b1 = b2 = 2,
and n11 = n12 = n21 = n22 = 2. Clearly rank(X) = 4. Dimension
of the null space of X is 7 - 4 = 3. A set of basis vectors for the null
space of X can be written as:
e₁ = (1, 0, 0, −1, −1, −1, −1)ᵀ,  e₂ = (0, 1, 0, −1, −1, 0, 0)ᵀ,  e₃ = (0, 0, 1, 0, 0, −1, −1)ᵀ.   (4.4.20)
Thus, using Method 2, λᵀβ is estimable if

λᵀeⱼ = 0, j = 1, 2, 3.   (4.4.21)

Specifically, if λ = (λ₀, λ₁, λ₂, λ₁₁, λ₁₂, λ₂₁, λ₂₂)ᵀ, then λᵀβ is estimable if the following three conditions are satisfied:
(1) λ₀ = Σ_{i=1}^{2} Σ_{j=1}^{2} λᵢⱼ,
(2) λ₁ = Σ_{j=1}^{2} λ₁ⱼ,
(3) λ₂ = Σ_{j=1}^{2} λ₂ⱼ.   (4.4.22)
Let us consider some special cases:
1. Is α1 estimable?
2. Is μ + α1 estimable?
3. Is α1 − α2 estimable?
4. Is α1 − α2 + (β11 + β12)/2 − (β21 + β22)/2 estimable?
Definition 4.4.4. Least squares estimator of an estimable
function λT β is given by λT β̂, where β̂ is a solution to the normal
equations (4.2.6).
Properties of least squares estimator
Proposition 4.4.9. Uniqueness. Least squares estimator (of
an estimable function) is invariant to the choice of a solution to
the normal equations.
Proof. Let us consider the class of solutions to the normal equations,

β̂ = (XᵀX)^g XᵀY.

The least squares estimator of an estimable function λᵀβ is then given by

λᵀβ̂ = λᵀ(XᵀX)^g XᵀY.   (4.4.23)
From proposition 4.4.4, since λT β is estimable, there exists a unique
c ∈ C(X) such that
λ = XT c.
Chapter 4
(4.4.24)
150
BIOS 2083
Linear Models
c
Abdus
S. Wahed
Therefore, Equation (4.4.23) combined with (4.4.24) leads to
λT β̂ = cT X(XT X)g XT Y = cT PY.
(4.4.25)
Since both c and P are unique (they do not depend on the choice of g-inverse), the result follows.
Proposition 4.4.10. Linearity and Unbiasedness. LS estimator is linear and unbiased.
Proof. Left as an exercise.
Proposition 4.4.11. Variance. Under Assumption II,

Var(λᵀβ̂) = σ²λᵀ(XᵀX)^g λ.   (4.4.26)

Proof.

Var(λᵀβ̂) = Var( λᵀ(XᵀX)^g XᵀY )
 = λᵀ(XᵀX)^g Xᵀ cov(Y) [ λᵀ(XᵀX)^g Xᵀ ]ᵀ
 = σ²λᵀ(XᵀX)^g XᵀX [ (XᵀX)^g ]ᵀ λ
 = σ²λᵀ(XᵀX)^g λ.  (verify the last step)   (4.4.27)
Proposition 4.4.12. Characterization. If an estimator λᵀβ̂ of a linear function λᵀβ is invariant to the choice of the solution β̂ to the normal equations, then λᵀβ is estimable.

Proof. For a given g-inverse G of XᵀX, consider the general form of the solutions to the normal equations:

β̂ = GXᵀY + (I − GXᵀX)c   (4.4.28)

for any vector c ∈ Rᵖ. Then,

λᵀβ̂ = λᵀ[ GXᵀY + (I − GXᵀX)c ] = λᵀGXᵀY + λᵀ(I − GXᵀX)c.   (4.4.29)

Since G is given, in order for the above to be equal for all c, we must have

λᵀ(I − GXᵀX) = 0,   (4.4.30)

or, equivalently,

λᵀ = λᵀGXᵀX.   (4.4.31)

This last equation implies that λ ∈ C(Xᵀ). This completes the proof.
Theorem 4.4.13. Gauss-Markov Theorem. Under Assumptions I and II, if λT β is estimable, then the least squares estimator λT β̂ is the unique minimum variance linear unbiased
estimator.
In the econometric literature, minimum variance is referred to as
best and along with the linearity and unbiasedness the least squares
estimator becomes best linear unbiased estimator (BLUE).
Proof. Uniqueness follows from the proposition 4.4.9. Linearity and
unbiasedness follows from the proposition 4.4.10. The only thing
remains to be shown is that no other linear unbiased estimator of
λT β can have smaller variance than λT β̂.
Since λT β is estimable, there exists a c such that λ = XT c. Let
a + dT Y be any other linear unbiased estimator of λT β. Then, we
must have a = 0 and λᵀ = dᵀX. Then,

Xᵀd = Xᵀc ⟹ Xᵀ(c − d) = 0 ⟹ (c − d) ∈ N(Xᵀ) ⟹ P(c − d) = 0 ⟹ Pc = Pd.   (4.4.32)
Now, by Proposition 4.4.11,

var(λᵀβ̂) = σ²cᵀX(XᵀX)^g Xᵀc = σ²cᵀPc,   (4.4.33)

and

var(dᵀY) = σ²dᵀd.   (4.4.34)
Thus,

var(dᵀY) − var(λᵀβ̂) = σ²( dᵀd − cᵀPc )
 = σ²( dᵀd − cᵀP²c )
 = σ²( dᵀd − dᵀP²d )
 = σ²dᵀ(I − P)d
 ≥ 0.   (4.4.35)

Therefore the LS estimator has the minimum variance among all linear unbiased estimators. Equation (4.4.35) shows that var(dᵀY) = var(λᵀβ̂) if and only if (I − P)d = 0, or equivalently d = Pd = Pc, leading to dᵀY = cᵀPY = cᵀX(XᵀX)^g XᵀY = λᵀβ̂.
Example 4.4.14. Example 4.4.8 continued.

XᵀX = [ 8 4 4 2 2 2 2
        4 4 0 2 2 0 0
        4 0 4 0 0 2 2
        2 2 0 2 0 0 0
        2 2 0 0 2 0 0
        2 0 2 0 0 2 0
        2 0 2 0 0 0 2 ],   (4.4.36)
a g-inverse of which is given by

(XᵀX)^g = diag( 0, 0, 0, 1/2, 1/2, 1/2, 1/2 ).   (4.4.37)
Also,

XᵀY = ( Y..., Y₁.., Y₂.., Y₁₁., Y₁₂., Y₂₁., Y₂₂. )ᵀ.   (4.4.38)
Thus, a solution to the normal equations is given by
( μ̂, α̂₁, α̂₂, β̂₁₁, β̂₁₂, β̂₂₁, β̂₂₂ )ᵀ = β̂ = (XᵀX)^g XᵀY = ( 0, 0, 0, Ȳ₁₁., Ȳ₁₂., Ȳ₂₁., Ȳ₂₂. )ᵀ.   (4.4.39)
Therefore the linear MVUE (or BLUE) of the estimable function
α1 − α2 + (β11 + β12)/2 − (β21 + β22)/2 is given by (Ȳ11. + Ȳ12.)/2 −
(Ȳ21. + Ȳ22.)/2.
4.4.1 A comment on estimability and missing data
The concept of estimability is very important in drawing statistical
inference from a linear model. What effects can be estimated from
an experiment totally depends on how the experiment was designed.
For instance, in a two-way nested model, difference between two
main effects is not estimable, whereas difference between two nested
effects within the same main effect is. In an over-parameterized one-way ANOVA model (one-way ANOVA with an intercept term), the treatment effects are not estimable, while the difference between any pair of treatments is estimated by the difference in the corresponding cell means.
When observations in some cells are missing, the problem of estimability becomes more acute. We illustrate the concept by using an
example. Consider the two-way nested design considered in Example
4.4.8. Suppose after planning the experiment, the observation corresponding to the last two rows of X matrix could not be observed.
Thus the observed design matrix is given by

X_M = [ 1 1 0 1 0 0 0
        1 1 0 1 0 0 0
        1 1 0 0 1 0 0
        1 1 0 0 1 0 0
        1 0 1 0 0 1 0
        1 0 1 0 0 1 0 ].   (4.4.40)
How does this affect the estimability of certain functions? Note that rank(X_M) = 3. A basis for the null space of X_M is given by
e₁ = (1, 0, 0, −1, −1, −1, 1)ᵀ,  e₂ = (0, 1, 0, −1, −1, 0, 1)ᵀ,  e₃ = (0, 0, 1, 0, 0, −1, 1)ᵀ,  e₄ = (0, 0, 0, 0, 0, 0, 1)ᵀ.   (4.4.41)
1. Is α1 estimable?
α₁ = (0, 1, 0, 0, 0, 0, 0)β = λ₁ᵀβ.
λ₁ᵀe₁ = 0, but λ₁ᵀe₂ ≠ 0 → Not estimable.
2. Is μ + α1 estimable?
3. Is α1 − α2 estimable?
4. Is α1 − α2 + (β11 + β12)/2 − (β21 + β22)/2 estimable?
Here, λ₄ᵀ = (0, 1, −1, 1/2, 1/2, −1/2, −1/2), and
λ₄ᵀe₁ ≠ 0 → Not estimable.
5. Is α1 − α2 + (β11 + β12)/2 − β21 estimable?
Here, λT5 = (0, 1, −1, 1/2, 1/2, −1, 0), and you can check that
λT5 ej = 0, j = 1, 2, 3, 4 → Estimable.
4.5 Least squares estimation under linear constraints
Often it is desirable to estimate the parameters from a linear model
under certain linear constraints. Two possible scenarios where such
constrained minimization of the error sum of squares ‖Y − Xβ‖²
becomes handy are as follows.
1. Converting a non-full rank model to a full rank model.
A model of non-full rank can be transformed into a full rank
model by imposing a linear constraint on the model. Let us
take a simple example of a balanced one-way ANOVA with two
treatments. The over-parameterized version of this model can
be written as
Yij = μ + αi + ij , i = 1, 2; j = 1, 2, . . . , n.
(4.5.1)
We know from our discussion that αi is not estimable in this
model. We also know that the X-matrix is not of full rank,
leading to more than one solution to the normal equations

2μ + α₁ + α₂ = 2Ȳ..,
μ + α₁ = Ȳ₁.,
μ + α₂ = Ȳ₂..   (4.5.2)
One traditional way of obtaining a unique solution is to impose
some restrictions on the parameters. A popular one is to treat
one of the treatment effect as a reference by setting it equal to
zero. Treating α2 = 0 leads to the solution α̂1 = Ȳ1. − Ȳ2. and
μ̂ = Ȳ2.. Another restriction that is commonly applied is that
the treatment effects are centered to zero. That is, α1 + α2 = 0.
If we apply this last restriction to the above normal equations,
we obtain a unique solution: μ̂ = Ȳ.., α̂1 = Ȳ1. − Ȳ.., and α̂2 =
Ȳ2. − Ȳ...
2. Testing a linear hypothesis. One major goal in statistical analysis involving linear models is to test certain hypothesis
regarding the parameters. A certain linear hypothesis can be
tested by comparing the residual sum of squares from the model
under the null hypothesis to the same from unrestricted model
(no hypothesis). Details will follow in Chapter 6.
4.5.1 Restricted Least Squares
Suppose the linear model is of the form

Y = Xβ + ε,   (4.5.3)

where a set of linear restrictions

Aᵀβ = b   (4.5.4)

has been imposed on the parameters for given matrices A and b. We want to minimize the residual sum of squares

‖Y − Xβ‖² = (Y − Xβ)ᵀ(Y − Xβ)   (4.5.5)

with respect to β to obtain the LS estimators under the constraints (4.5.4). The problem can easily be written as a Lagrangian optimization problem by constructing the objective function

E = (Y − Xβ)ᵀ(Y − Xβ) + 2λᵀ(Aᵀβ − b),   (4.5.6)
which needs to be minimized unconditionally with respect to β and λ. Taking derivatives of (4.5.6) with respect to β and λ and setting them equal to zero, we obtain

XᵀXβ + Aλ = XᵀY,   (4.5.7)
Aᵀβ = b.   (4.5.8)

The above equations will be referred to as the restricted normal equations (RNE). We will consider two different scenarios.
CASE I. AT β is estimable.
A set of q linear constraints AT β is estimable if and only if each
constraint is estimable. If we write A as (a1 a2 . . . aq ) and b =
(b1, b2, . . . , bq ), then AT β is estimable iff each component aTi β is
estimable. Although the q constraints need not be linearly independent, we assume that they are, so that rank(A) = q.
If they are not, one can easily reduce them into a set of linearly
independent constraints.
Now, if (β̂ r , λ̂r ) is a solution to the restricted normal equations,
then from (4.5.7) we obtain

β̂ᵣ = (XᵀX)^g (XᵀY − Aλ̂ᵣ) = β̂ − (XᵀX)^g Aλ̂ᵣ.   (4.5.9)

From (4.5.8), using (4.5.9) and assuming the required inverse exists,

λ̂ᵣ = [ Aᵀ(XᵀX)^g A ]⁻¹ (Aᵀβ̂ − b).   (4.5.10)

But we have not yet shown that Aᵀ(XᵀX)^g A is invertible. The following proposition takes care of that.

Proposition 4.5.1. In terms of the notation of this section, when Aᵀβ is estimable,

rank( Aᵀ(XᵀX)^g A ) = rank(A) = q.   (4.5.11)

Proof.
Using (4.5.9) and (4.5.10), it is possible to express the restricted least squares estimator β̂ᵣ in terms of an unrestricted LS estimator β̂:

β̂ᵣ = β̂ − (XᵀX)^g A [ Aᵀ(XᵀX)^g A ]⁻¹ (Aᵀβ̂ − b).   (4.5.12)
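The formula (4.5.12) can be exercised numerically; the sketch below (an added illustration, with simulated data) reproduces the restricted solution of the balanced one-way ANOVA example that follows, under the restriction α₁ = α₂.

import numpy as np

rng = np.random.default_rng(7)
n = 4
X = np.block([
    [np.ones((n, 1)), np.ones((n, 1)), np.zeros((n, 1))],
    [np.ones((n, 1)), np.zeros((n, 1)), np.ones((n, 1))],
])
Y = rng.normal(loc=np.r_[np.full(n, 3.0), np.full(n, 5.0)])

G = np.diag([0.0, 1 / n, 1 / n])                 # a g-inverse of X'X
beta_hat = G @ X.T @ Y                           # unrestricted solution (0, Y1bar, Y2bar)

A = np.array([[0.0, 1.0, -1.0]]).T               # restriction alpha1 - alpha2 = 0
b = np.zeros(1)
adj = G @ A @ np.linalg.inv(A.T @ G @ A) @ (A.T @ beta_hat - b)
beta_r = beta_hat - adj                          # restricted LS estimator (4.5.12)

print(beta_hat)
print(beta_r)                                    # (0, pooled mean, pooled mean)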
Example 4.5.2. Take the simple example of the one-way balanced ANOVA from the beginning of this section. Consider the restriction α₁ − α₂ = 0, which can be written as Aᵀβ = 0, where

A = ( 0, 1, −1 )ᵀ.   (4.5.13)

A g-inverse of the XᵀX matrix is given by

(XᵀX)^g = diag( 0, 1/n, 1/n ),   (4.5.14)

with corresponding unrestricted solution

β̂ = ( 0, Ȳ₁., Ȳ₂. )ᵀ.   (4.5.15)
Then

Aᵀβ̂ = Ȳ₁. − Ȳ₂.,   (4.5.16)

Aᵀ(XᵀX)^g A = 2/n,   and   (XᵀX)^g A = ( 0, 1/n, −1/n )ᵀ.   (4.5.17)

Using these in Equation (4.5.12), we obtain

β̂ᵣ = ( 0, Ȳ₁., Ȳ₂. )ᵀ − ( 0, 1/n, −1/n )ᵀ (n/2)(Ȳ₁. − Ȳ₂.)   (4.5.18)
   = ( 0, (Ȳ₁. + Ȳ₂.)/2, (Ȳ₁. + Ȳ₂.)/2 )ᵀ.   (4.5.19)
Is this restricted solution unique? Try with a different g-inverse.
(Note you do not have to compute AT (XT X)g A, as it is invariant to
the choice of a g-inverse.)
Properties of the restricted LS estimator

Proposition 4.5.3.

1. E(β̂ᵣ) = (XᵀX)^g XᵀXβ = Hβ = E(β̂).

2. cov(β̂ᵣ) = σ²(XᵀX)^g D [ (XᵀX)^g ]ᵀ, where D = I − A[ Aᵀ(XᵀX)^g A ]⁻¹Aᵀ(XᵀX)^g.
3. E(RSSᵣ) = E[ (Y − Xβ̂ᵣ)ᵀ(Y − Xβ̂ᵣ) ] = (n − r + q)σ².

Proof. We will leave the first two as exercises. For the third one,

RSSᵣ = (Y − Xβ̂ᵣ)ᵀ(Y − Xβ̂ᵣ)
 = (Y − Xβ̂ + Xβ̂ − Xβ̂ᵣ)ᵀ(Y − Xβ̂ + Xβ̂ − Xβ̂ᵣ)   [Y − Xβ̂ ∈ N(Xᵀ); Xβ̂ − Xβ̂ᵣ ∈ C(X)]
 = (Y − Xβ̂)ᵀ(Y − Xβ̂) + (β̂ − β̂ᵣ)ᵀXᵀX(β̂ − β̂ᵣ)
 = RSS + (Aᵀβ̂ − b)ᵀ[ Aᵀ(XᵀX)^g A ]⁻¹Aᵀ[(XᵀX)^g]ᵀ XᵀX (XᵀX)^g A[ Aᵀ(XᵀX)^g A ]⁻¹(Aᵀβ̂ − b)
 = RSS + (Aᵀβ̂ − b)ᵀ[ Aᵀ(XᵀX)^g A ]⁻¹(Aᵀβ̂ − b).

E(RSSᵣ) = E(RSS) + E{ (Aᵀβ̂ − b)ᵀ[ Aᵀ(XᵀX)^g A ]⁻¹(Aᵀβ̂ − b) }
 = (n − r)σ² + trace{ [ Aᵀ(XᵀX)^g A ]⁻¹ cov(Aᵀβ̂ − b) }
 = (n − r)σ² + trace{ [ Aᵀ(XᵀX)^g A ]⁻¹ σ²Aᵀ(XᵀX)^g A }
 = (n − r + q)σ².   (4.5.20)
CASE II. AT β is not estimable.
A set of q linear constraints AT β is non-estimable if and only if each
constraint is non-estimable and no linear combination of the linear
constraints is estimable. Assume as before that columns of A are
independent. That is, rank(A) = q. This means Ac ∉ C(Xᵀ) for all non-zero q × 1 vectors c (why?). This in turn implies that
C(A) ∩ C(XT ) = {0} .
(4.5.21)
On the other hand, from the RNEs,
Aλ̂r = XT (Y − Xβˆr ) ∈ C(XT ).
(4.5.22)
But by definition,
Aλ̂r ∈ C(A).
(4.5.23)
Aλ̂r = 0.
(4.5.24)
Together we get,
Since the columns of A are independent, this last equation implies
that λ̂r = 0. The normal equation (4.5.7) then reduces to
XT Xβ = XT Y,
Chapter 4
(4.5.25)
169
BIOS 2083
Linear Models
c
Abdus
S. Wahed
which is the normal equation for the unrestricted LS problem. Thus
RNEs in this case have a solution
β̂ r = β̂ = (XT X)g XT Y, and
(4.5.26)
λ̂r = 0.
(4.5.27)
Therefore, in this case the residual sums of squares from the restricted and unrestricted models are identical, i.e., RSSᵣ = RSS.
4.6 Problems
1. The least squares estimator of β can be obtained by minimizing ‖Y − Xβ‖². Use the
derivative approach to derive the normal equations for estimating β.
2. For the linear model
yi = μ + αxi + i , i = 1, 2, 3,
where xi = (i − 1).
(a) Find P and I − P.
(b) Find a solution to the equation Xβ = PY.
(c) Find a solution to the equation XT Xβ = XT Y. Is this solution same as the
solution you found for the previous equation?
(d) What is the null space of XT for this problem?
3. Show that, for any general linear model, the solutions to the system of linear equations
Xβ = PY are the same as the solutions to the normal equations XT Xβ = XT Y.
4. Show that
(a) I − P is a projection matrix onto the null space of XT , and
(b) XT X(XT X)g is a projection onto the column space of XT .
5. (a) If Ag is a generalized inverse of A, then show that A− = Ag AAg + (I − Ag A)B +
C(I − AAg ) is also a g-inverse of A for any conformable matrices B and C.
(b) In class, we have shown that β̂ = (XT X)g XT Y is a solution to the normal equations XT Xβ = XT Y for a given g-inverse(XT X)g of XT X. Show that β̃ is a
solution to the normal equations if and only if there exists a vector z such that
β̃ = (XT X)g XT Y + (I − (XT X)g XT X)z. (Thus, by varying z, one can sweep out
all possible solutions to the normal equations.)
(c) In fact, β̃ = GXT Y generates all solutions to the normal equations, for all possible
generalized inverses G of XT X. To show this, start with the general solution
β̃ = (XT X)g XT Y + (I − (XT X)g XT X)z (from part (b)). Also take it as a fact
that for a given non-zero vector Y and an arbitrary vector z, there exists an
arbitrary matrix M such that z = MY. Use this fact, along with the result from
part (a) to write β̃ as GXT Y where G is a g-inverse of XT X.
6. For the general one-way ANOVA model,
yij = μ + αi + ij , i = 1, 2, . . . , a; j = 1, 2, . . . , ni ,
(a) What is the X matrix?
(b) Find r(X).
(c) Find a basis for the null space of X.
(d) Give a basis for the set of all possible linearly independent estimable functions.
(e) Give conditions under which c₀μ + Σ_{i=1}^{a} cᵢαᵢ is estimable. In particular, is μ
estimable? Is α1 − α2 estimable?
(f) Obtain a solution to the normal equation for this problem and find the least
square estimator of αa − α1 .
7. Consider the linear model
Y = Xβ + , E() = 0, cov() = σ 2 In .
(4.6.1)
Follow the following steps to show that if λT β is estimable, then λT β̂ is the BLUE of
λT β, where β̂ is a solution to the normal equations (XT X)β = XT Y.
(a) Consider another linear unbiased estimator c + dt Y of λT β. Show that c must be
equal to zero and dT X = λT .
(b) Now we will show that var(c + dT Y) can be written as the var(λT β̂) plus some
non-negative quantity. To do this, write
var(c + dt Y) = var(dT Y) = var(λT β̂ + dT Y − λT β̂ ).
g(Y)
Show that g(Y) defined in this manner is a linear function of Y.
(c) Show that λT β̂ and g(Y) are uncorrelated. Hint: Use (i) cov(AY, BY) =
Acov(Y)B T (ii) Result from part (b).
(d) Hence
var(c + dT Y) = var(dT Y) = var(λT β̂) + . . . .
In other words, variance of any other linear unbiased estimator is greater than or
equal to the variance of the least square estimator.
(e) Show that var(c + dT Y) = var(λT β̂) only if c + dT Y = λT β̂.
8. One example of a simple two-way nested model is as follows. Suppose two instructors
taught two classes using Teaching Method I, and three instructors taught two classes
with Teaching Method II. Let Yijk is the average score for the kth class taught by jth
instructor with ith teaching method. The model can be written as:
Yijk = μ + αi + βij + εijk.
Assume E(εijk) = 0, and cov(εijk, εi₁j₁k₁) = σ² if i = i₁, j = j₁, k = k₁, and 0 otherwise.
(a) Write this model as Y = Xβ + , explicitly describing the X matrix and β.
(b) Find r, the rank of X. Give a basis for the null space of X.
(c) Write out the normal equations and give a solution to the normal equations.
(d) How many linearly independent estimable functions can you have in this problem?
Provide a list of such estimable functions and give the least squares estimators
for each one.
(e) Show that the difference in the effect of two teaching methods is not estimable.
9. Consider the linear model

   Yᵢⱼ = Σ_{k=0}^{i−1} βₖ + εᵢⱼ, i = 1, 2, 3; j = 1, 2,   (4.6.2)

   with E(εᵢⱼ) = 0; Var(εᵢⱼ) = σ²; cov(εᵢⱼ, εᵢ′ⱼ′) = 0 whenever i ≠ i′ or j ≠ j′.

   9(a) Write the above model in the form of a general linear model. Find rank(X).

   9(b) Find β = (β₀, β₁, β₂)ᵀ such that the quantity

        E = Σ_{i=1}^{3} Σ_{j=1}^{2} ( Yᵢⱼ − Σ_{k=0}^{i−1} βₖ )²   (4.6.3)

        is minimized. Call it β̂ = (β̂₀, β̂₁, β̂₂).
9(c) Find the mean and variance of β̂.
For the rest of the parts of this question, assume that ij ’s are normally
distributed.
9(d) What is the distribution of β̂?
9(e) What is the distribution of βˆ1 ?
9(f) What is the distribution of D = βˆ1 − βˆ2 ?
   9(g) Find the distribution of

        Ê = Σ_{i=1}^{3} Σ_{j=1}^{2} ( Yᵢⱼ − Σ_{k=0}^{i−1} β̂ₖ )².   (4.6.4)
9(h) Are D and Ê independent?
   9(i) Find the distribution of D/√Ê.
10. Consider the analysis of covariance model
Yij = μ + αi + γXij + εij , i = 1, 2; j = 1, 2, . . . , n,
where Xij represents the value of a continuous explanatory variable.
(a) Write this model as Y = Xβ + , explicitly describing the X matrix and β.
(b) Find r, the rank of X. Give a basis for the null space of X.
(c) Give a basis for the null space of X.
(d) Is the regression coefficient γ estimable?
(e) Give conditions under which a linear function aμ+bα1 +cα2 +dγ will be estimable.
For the rest of the problem, assume n = 5, and Xi1 = −2, Xi2 = −1, Xi3 =
0, Xi4 = 1, and Xi5 = 2, i = 1, 2.
(f) Give an expression for the LS estimator of γ and α1 − α2 , if exists.
(g) Obtain the LS estimator of γ under the restriction that α1 = α2 .
(h) Obtain the LS estimator of α1 − α2 under the restriction that γ = 0.
(i) Obtain the LS estimator of γ under the restriction that α1 + α2 = 0.
11. Consider the two-way crossed ANOVA model with an additional continuous baseline
covariate Xij :
Yijk = μi + αj + γXij + εijk , i = 1, 2; j = 1, 2; k = 1, 2,   (4.6.5)
under usual assumptions (I and II from the lecture notes). Let the parameter vector be β = (μ₁, μ₂, α₁, α₂, γ)ᵀ and X be the corresponding X matrix. Define X̄ᵢ. = Σ_{j=1}^{2} Xᵢⱼ/2, i = 1, 2, and X̄.ⱼ = Σ_{i=1}^{2} Xᵢⱼ/2, j = 1, 2.
(a) Find rank(X).
(b) Give a basis for the null space of X.
(c) Give conditions under which λT β will be estimable. In particular:
i. Is γ estimable?
ii. Is μ1 − μ2 estimable?
iii. Is α1 − α2 + γ(X̄1. − X̄2. ) estimable?
iv. Is μ1 − μ2 + γ(X̄.1 − X̄.2 ) estimable?
v. Is μ1 + γ(X̄.1 + X̄.2 )/2 estimable?
12. Consider the linear model:
Yijk = βi + βj + εijk , i, j = 1, 2, 3; i < j; k = 1, 2,
(4.6.6)
so that there are a total of 6 observations.
(a) Write the model in matrix form and compute the X T X-matrix.
(b) Write down the normal equations explicitly.
(c) Give condition(s), if any, under which a linear function Σ_{i=1}^{3} λᵢβᵢ is estimable, where λᵢ, i = 1, 2, 3, are known constants.
(d) If the observation corresponding to (i, j) = (2, 3) is missing, then the above model
reduces to a familiar model. How would you respond to part (c) in this situation?
13. I have come across a tiny dataset with 5 variables y, x1 , x2 , x3 , and x4 . I use SAS for
most of my day-to-day data analysis work. Here are the data, program, and the result
of an analysis to “regress” y on x1 , x2 , x3 , and x4 .
data x;
input y x1 x2 x3 x4;
cards;
11 1 -3 0 4
21 1 -2 1 3
13 1 -1 0 2
45 1 0 1 1
50 1 1 0 0
;run;
proc glm;
model y=x1 x2 x3 x4/noint solution;
estimate "2b1+b2+b4" x1 2 x2 1 x3 0 x4 1;
estimate "2b1-b2-b3" x1 2 x2 -1 x3 -1 x4 0;
estimate "b1+b2" x1 1 x2 1 x3 0 x4 0;
estimate "b4" x1 0 x2 0 x3 0 x4 1;
estimate "b1+b4" x1 1 x2 0 x3 0 x4 1;
run;
quit;
Output:
========================================
/* Parameter Estimates */
Parameter  Estimate        SE          t     Pr > |t|
x1         34.86666667 B   6.78167465  5.14  0.0358
x2         10.20000000 B   3.25781113  3.13  0.0887
x3          8.33333333     9.40449065  0.89  0.4690
x4          0.00000000 B   .           .     .

/* Contrast Estimates */
Parameter  Estimate     SE          t     Pr > |t|
2b1+b2+b4  79.9333333   15.3958147  5.19  0.0352
b1+b2      45.0666667    8.8221942  5.11  0.0363
b1+b4      34.8666667    6.7816747  5.14  0.0358
I am puzzled by several things I see in the output.
(a) All the parameter estimates except the one corresponding to x3 has a letter ‘B’
next to it. What explanation can you provide for that?
(b) What happens to the parameter estimates if you set-up the model as ‘model y=x2
x3 x4 x1’ or ‘y=x1 x2 x4 x3’ ? Can you explain the differences across these three
sets of parameter estimates?
(c) Although I set up 5 contrasts, the output only shows three of them. Why? Justify
your answers using the techniques you have learned in Chapter 4.
14. Consider the simple linear model
Yi = μ + α(−1)^i + εi , i = 1, 2, . . . , 2n − 1, 2n.   (4.6.7)
(a) Show that U = (Y2 + Y1 )/2 and V = (Y2 − Y1 )/2 are unbiased estimators of μ and
α, respectively. What is the joint distribution of U and V under normality and
independence assumptions for Yi ’s? You can make any assumptions regarding the
variances of Yi ’s.
(b) Find the least square estimators of μ and α, respectively. Obtain their joint distribution under the same assumption as above. Are they independently distributed?
(c) Compare estimators in (a) and (b) and comment.
15. In an experiment where several treatments are compared with a control, it may be desirable to replicate the control more than the experimental treatments since the control
enters into every difference investigated. Suppose each of t experimental treatment is
replicated m times while the control is replicated n (> m) times. Let Yij denote the
jth observation on the ith experimental treatment, j = 1, 2, . . . , m; i = 1, 2, . . . , t, and
let Y0j denote the jth observation on the control, j = 1, 2, . . . , n. Assume the following
linear model:
Yij = τi + εij ,
where i = 0, 1, 2, . . . , t; j = 1, 2, . . . , n for i = 0, and j = 1, 2, . . . , m for i = 1, 2, . . . , t.
(a) Write the above linear model in the matrix form Y = Xβ + ε by explicitly
specifying the random response vector Y, the design matrix X, and the parameter
vector β.
(b) Show that the differences between the treatments and controls θi = τi − τ0 , i =
1, 2, . . . , t are estimable.
(c) Obtain the least square estimator of θi , i = 1, 2, . . . , t.
BIOST 2083: Linear Models
Homework Assignment I
Due: September 9 2010
1. Determine if the following are linear models or not:
(a) Yi = α + βxi + i , where Yi , i = 1, 2, . . . , n are independent Bernoulli random
variables, and xi ’s are fixed values of a continuous covariate.
(b) Yi , i = 1, 2, . . . , n are independent Poisson random variables with mean e^(α+βxi),
where xi ’s are fixed values of a continuous covariate.
(c) E[Y1i /Y2i ] = α + βxi , where (Y1i , Y2i ), i = 1, 2, . . . , n are pre- and post-treatment
levels of certain cytokine from the ith individual, and xi ’s are fixed values of a
continuous covariate.
2. Solve problems 1, 4, and 8 from Chapter 1.
3. Assuming i = 1, 2, . . . , n and j = 1, 2, write model (1.1.8) in matrix form. What is
rank(X)? Find a basis for the null space of X.