BIOS 2083: Linear Models
Abdus S. Wahed
August 30, 2010

Chapter 1
Introduction to linear models

1.1 Linear Models: Definition and Examples

Example 1.1.1. Estimating the mean of a $N(\mu, \sigma^2)$ random variable.

1. Suppose $Y_1, Y_2, \ldots, Y_n$ are $n$ i.i.d. random variables from a $N(\mu, \sigma^2)$ distribution.

• What can we tell about $\mu$ based on these $n$ observations?

• Likelihood:
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} f_{Y_i}(y_i) = \left(\sigma\sqrt{2\pi}\right)^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right\}. \quad (1.1.1)$$

• Maximum likelihood estimator:
$$\hat{\mu}_{MLE} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i. \quad (1.1.2)$$

2. Now consider $Y_1, Y_2, \ldots, Y_n$ to be $n$ random variables such that
$$Y_i = \mu + \epsilon_i, \quad i = 1, 2, \ldots, n, \quad (1.1.3)$$
where the $\epsilon_i$'s are i.i.d. $N(0, \sigma^2)$ random variables.

• How can we draw inference on $\mu$?

• Likelihood:
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} f_{Y_i}(y_i) = \prod_{i=1}^{n} f_{\epsilon_i}(y_i - \mu) = \left(\sigma\sqrt{2\pi}\right)^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right\}. \quad (1.1.4)$$

• Maximum likelihood estimator:
$$\hat{\mu}_{MLE} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i. \quad (1.1.5)$$

SAME RESULT AS BEFORE.

In Equation (1.1.3), we have expressed the random variable $Y$ as a linear function of the parameter $\mu$ plus an error term.

Definition 1.1.1. A model which expresses the response as a linear function of the parameter(s) (plus an error term that has mean zero) is known as a linear model. For example,

• (log-linear regression) $\ln Y_i = \alpha + \beta X_i + \epsilon_i$, $E(\epsilon_i) = 0$, is a linear model, while

• $Y_i = \exp(\alpha + \beta X_i) + \epsilon_i$, $E(\epsilon_i) = 0$, is not.

Figure 1.1: Age distribution of African American (AA) and Caucasian American (CA) volunteers in a clinical study.

Example 1.1.2. Testing the equality of the means of two independent normal populations.

1. Suppose $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ are $n_i$ i.i.d. random variables from a $N(\mu_i, \sigma^2)$ distribution, for $i = 1, 2$.

• How do we test the equality of the two means $\mu_1$ and $\mu_2$ based on these $n_1 + n_2 = n$ observations?

• Hypothesis: $H_0: \mu_1 = \mu_2$.

• Usual test statistic:
$$T_{pooled} = \frac{\bar{Y}_1 - \bar{Y}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n-2},$$
where $S$ is the pooled sample standard deviation.

• Alternatively, write
$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2; \ j = 1, 2, \ldots, n_i, \quad (1.1.6)$$
where the $\epsilon_{ij}$'s are i.i.d. $N(0, \sigma^2)$ random variables.

• We will show later that this alternative representation also leads to the same test statistic as $T_{pooled}$.

• Equation (1.1.6) shows that the two-sample t-test can be viewed as a linear model as well.

Example 1.1.3. Paired experiment (before-and-after test). Suppose observations are collected on a number of individuals under two separate conditions (temperatures, times, before/after treatment). Let $Y_{ji}$ denote the response from the $i$th individual at condition $j$, $i = 1, 2, \ldots, n$; $j = 1, 2$. The goal is to see whether the mean response is similar across the two conditions. Since the observations are paired, one would construct the differences $D_i = Y_{2i} - Y_{1i}$ and draw inference on the expected difference $\delta = E(D) = E(Y_2) - E(Y_1)$. We can write this problem as a linear model in several different ways:
$$D_i = \delta + \epsilon_i, \quad E(\epsilon_i) = 0, \quad (1.1.7)$$
and
$$Y_{ji} = \mu + \alpha_j + \epsilon_{ji}, \quad E(\epsilon_{ji}) = 0. \quad (1.1.8)$$
In the second model, one draws inference on the difference $\alpha_2 - \alpha_1$. It can be shown that both models lead to the same conclusion under the normality assumption when $\epsilon_{1i}$ and $\epsilon_{2i}$ in the second construction are allowed to be correlated.
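The equivalence noted in Example 1.1.1 can be checked numerically. The following is a minimal sketch (not part of the original notes; the sample size, mean, and variance are arbitrary choices) that fits model (1.1.3) by least squares with an intercept-only design matrix and confirms that the resulting estimate of $\mu$ is the sample mean $\bar{Y}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 50, 2.0, 1.5          # arbitrary values for illustration
y = rng.normal(mu, sigma, size=n)

# Model (1.1.3): Y = 1_n * mu + eps, so the design matrix is a single column of ones.
X = np.ones((n, 1))

# Least-squares estimate (also the MLE under normality)
mu_hat = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(np.isclose(mu_hat, y.mean()))   # True: the estimate equals the sample mean
```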
Example 1.1.4. Simple linear regression.

Very often we are interested in associating one variable (covariate) with another variable (outcome). For instance, consider a random sample of $n$ leukemia patients who were diagnosed at age $w_i$, $i = 1, 2, \ldots, n$, and died at age $Y_i$, $i = 1, 2, \ldots, n$. The objective is to relate the survival times to the age at diagnosis.

• Simple linear regression assumes the model
$$Y_i = \alpha + \beta w_i + \epsilon_i. \quad (1.1.9)$$

• The goal is to estimate the "parameters" $\alpha$ (intercept) and $\beta$ (slope) so that, given the age at diagnosis, one can predict how long the patient is going to survive.

Figure 1.2: Regression of survival time on age at diagnosis.

Figure 1.3: Polynomial regression of survival time on age at diagnosis.

Example 1.1.5. Polynomial regression (Example 1.1.4 continued).

• Quadratic regression assumes the model
$$Y_i = \alpha + \beta w_i + \gamma w_i^2 + \epsilon_i. \quad (1.1.10)$$

• The goal is to estimate the "parameters" $\alpha$ (intercept), $\beta$ (linear coefficient), and $\gamma$ (quadratic coefficient) so that, as in the previous example, given the age at diagnosis one can predict how long the patient is going to survive.

Example 1.1.6. Multiple linear regression.

• Extending the idea of Examples 1.1.4 and 1.1.5 to associate a single response with a set of $k$ explanatory variables, multiple linear regression assumes the model
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_k x_{ki} + \epsilon_i. \quad (1.1.11)$$

• The goal is to estimate the "parameters" $\beta_j$, $j = 0, 1, \ldots, k$, in order to investigate the relationship between the response $Y$ and the explanatory variables $X_j$, $j = 1, 2, \ldots, k$.

Example 1.1.7. Transformed data. The inverse square law states that the force of gravity between two particles situated a distance $D$ apart can be modeled by
$$F = \gamma D^{\beta}. \quad (1.1.12)$$

• Taking a log transformation on both sides of (1.1.12), and allowing for error, gives
$$Y = \alpha + \beta x + \epsilon, \quad (1.1.13)$$
where $Y = \ln(F)$, $x = \ln(D)$, and $\alpha = \ln(\gamma)$.

• Model (1.1.13) is basically in the form of model (1.1.9).

Example 1.1.8. One-way analysis of variance (ANOVA). Consider a clinical trial in which we are interested in comparing $a$ treatments. Suppose $n_i$ patients are randomized to the $i$th treatment. Let $Y_{ij}$ denote the response from the $j$th patient receiving the $i$th treatment, $\mu$ the overall mean response, and $\alpha_i$ the incremental effect of treatment $i$, $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$. Then:

• The one-way analysis of variance model is written as
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad (1.1.14)$$
where $\epsilon_{ij}$ is the error term associated with the $j$th observation from the $i$th treatment and has mean zero.
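As an aside (not part of the original notes), the design matrix implied by the one-way ANOVA model (1.1.14) can be built directly. The sketch below, with arbitrary sizes $a = 3$ and $n_i = 2$, also shows that the columns of this matrix are linearly dependent, a point that becomes important when rank and generalized inverses are discussed in Chapter 2.

```python
import numpy as np

# One-way ANOVA design for model (1.1.14): columns (intercept, alpha_1, alpha_2, alpha_3),
# with a = 3 treatments and n_i = 2 patients per treatment (arbitrary illustrative sizes).
a, n_i = 3, 2
rows = []
for i in range(a):
    for _ in range(n_i):
        indicator = [1.0 if j == i else 0.0 for j in range(a)]
        rows.append([1.0] + indicator)
X = np.array(rows)

print(X.shape)                    # (6, 4)
print(np.linalg.matrix_rank(X))   # 3: the treatment indicator columns sum to the intercept column
```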
Example 1.1.9. Two-way analysis of variance (ANOVA). A clinical trial is being planned to compare $a$ treatments. However, the treatments are known to have different effects in different racial groups, and one would like to adjust for race while determining the effect of treatment on the response. Suppose $n_i = n_{i1} + n_{i2}$ patients are randomized to the $i$th treatment, with $n_{ij}$ patients belonging to the $j$th racial group, $j = 1, 2$. Let $Y_{ijk}$ denote the response from the $k$th patient belonging to the $j$th racial group and receiving the $i$th treatment, $\mu$ the overall mean response, $\alpha_i$ the incremental effect of treatment $i$, and $\beta_j$ the incremental effect for race $j$, $i = 1, 2, \ldots, a$, $j = 1, 2$, and $k = 1, 2, \ldots, n_{ij}$. Then:

• The two-way analysis of variance model is written as
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}, \quad (1.1.15)$$
where $\epsilon_{ijk}$ is the error term associated with the $k$th patient belonging to racial group $j$ and receiving treatment $i$.

Example 1.1.10. Analysis of covariance. The objective of analysis of covariance is similar to that of the previous two examples. Here we are interested in comparing $a$ treatments adjusting for continuous covariates.

• There are multiple representations of an analysis of covariance model with a single adjusting covariate:
$$Y_{ij} = \mu + \alpha_i + \beta x_{ij} + \epsilon_{ij}, \quad (1.1.16)$$
$$Y_{ij} = \mu + \alpha_i + \beta(x_{ij} - \bar{x}_{..}) + \epsilon_{ij}, \quad (1.1.17)$$
$$Y_{ij} = \mu + \alpha_i + \beta_i x_{ij} + \epsilon_{ij}, \quad (1.1.18)$$
$$Y_{ij} = \mu + \alpha_i + \beta(x_{ij} - \bar{x}_{i.}) + \epsilon_{ij}, \quad (1.1.19)$$
where
– $\mu$ = overall mean response,
– $\alpha_i$ = incremental mean response from the $i$th treatment,
– $\beta$ = effect of the adjusting covariate $X$,
– $x_{ij}$ = value of the covariate $X$ for subject $j$ from treatment group $i$,
– $\bar{x}_{..}$ = overall mean of the adjusting covariate,
– $\bar{x}_{i.}$ = $i$th treatment-group-specific mean of the adjusting covariate $X$, and
– $\epsilon_{ij}$ is the error term associated with the $j$th patient receiving treatment $i$.

You can find more examples of linear models in different applications of statistics. I have cited only a few simple ones that are commonly applied in day-to-day data analysis. More complex linear models can be constructed to address particular problems of interest.

1.2 General form of linear model

All the models in the examples from the previous section can be written in a general form using a response vector $Y$, a matrix of constants $X$, a parameter vector $\beta$, and an error vector $\epsilon$.

• Specifically, a general linear model has the form
$$Y = X\beta + \epsilon, \quad (1.2.1)$$
where
– $Y = (Y_1, Y_2, \ldots, Y_n)^T$ is an $n \times 1$ vector of responses,
– $X = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1p} \\ x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{np} \end{bmatrix}$ is an $n \times p$ matrix of constants,
– $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is a $p \times 1$ vector of parameters, and
– $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T$ is an $n \times 1$ vector of error terms.

• The response vector $Y$ usually contains the responses from patients, subjects, or experimental units.

• The columns of the $X$ matrix represent the values of the variables whose effect on the response is being studied, known as predictors, covariates, regressors, or independent variables.

• Usually the first column of the $X$ matrix is a column of 1's when there is an intercept in the model.

• $\beta$ is referred to as the parameter vector or vector of regression coefficients (coefficients, in short).

• The model is linear in the unknown coefficients $\beta_1, \beta_2, \ldots, \beta_p$, as (1.2.1) can be written as
$$Y = \sum_{j=1}^{p}\beta_j x_j + \epsilon, \quad (1.2.2)$$
where $x_j$ is the $j$th column of $X$.

• Typically, for fixed $X$, the assumption required of a general linear model (1.2.1), or equivalently (1.2.2), is that the error vector has mean zero. That is,

Assumption I. $E(\epsilon) = 0$.

• For random $X$, we require that $E(Y|X) = X\beta$, an assumption that is guaranteed to hold when $E(\epsilon|X) = 0$.

Now, how do we show that all the models considered as examples of linear models in the previous section can be written in the form (1.2.1)?
1. Example 1.1.1. Writing (1.1.3) out for each $i = 1, 2, \ldots, n$, we can easily see that
$$Y_1 = \mu + \epsilon_1, \quad Y_2 = \mu + \epsilon_2, \quad \ldots, \quad Y_n = \mu + \epsilon_n,$$
leading to $Y = X\beta + \epsilon$, where
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = 1_n, \quad \beta = \mu, \quad \text{and} \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}.$$

2. Example 1.1.2. For this problem, follow Equation (1.1.6) and write it out for all $i$ and $j$, which leads to
$$Y = \begin{pmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n_1} \\ Y_{21} \\ Y_{22} \\ \vdots \\ Y_{2n_2} \end{pmatrix}, \quad X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1_{n_1} & 0_{n_1} \\ 0_{n_2} & 1_{n_2} \end{bmatrix}, \quad \beta = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \text{and} \quad \epsilon = \begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n_1} \\ \epsilon_{21} \\ \epsilon_{22} \\ \vdots \\ \epsilon_{2n_2} \end{pmatrix}.$$
(A numerical check that this formulation reproduces $T_{pooled}$ is sketched after the list below.)

3. Example 1.1.4.
4. Example 1.1.5.
5. Example 1.1.6.
6. Example 1.1.7.
7. Example 1.1.8.
8. Example 1.1.9.
9. Example 1.1.10.
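The claim in Example 1.1.2, that the linear-model formulation reproduces the pooled two-sample t statistic, can be verified numerically. The sketch below is not from the notes; the group sizes and parameters are arbitrary. It fits the two-column design matrix above by least squares and compares the resulting t statistic for $\mu_1 - \mu_2$ with $T_{pooled}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 12, 15
y1 = rng.normal(1.0, 2.0, n1)
y2 = rng.normal(0.0, 2.0, n2)
y = np.concatenate([y1, y2])

# Design matrix of Example 1.1.2: block indicators for the two groups
X = np.zeros((n1 + n2, 2))
X[:n1, 0] = 1.0
X[n1:, 1] = 1.0

# Least-squares fit: beta_hat = (X'X)^{-1} X'Y gives the two group means
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
s2 = resid @ resid / (n1 + n2 - 2)             # residual (pooled) variance estimate
t_model = (beta[0] - beta[1]) / np.sqrt(s2 * (XtX_inv[0, 0] + XtX_inv[1, 1]))

# Classical pooled two-sample t statistic
sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
t_pooled = (y1.mean() - y2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

print(np.isclose(t_model, t_pooled))           # True
```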
1.3 Problems

1. A clinical trial was designed to compare three treatments based on a continuous endpoint ($Y$). Each treatment consists of doses of 6 pills to be taken orally every day for 6 weeks. The patient population is highly variable with regard to medication adherence. A measure of adherence is given by the proportion of pills taken during the course of treatment. Suppose that the investigators would like to compare the treatments adjusting for the effect of adherence. They also suspect that the effect of adherence on response will vary by treatment group. Use your own set of notation to propose a linear model to analyze the data from this trial. Write the model in matrix form.

2. An immunologist is investigating the effect of treatment on the expression of Programmed Death-1 (PD-1) molecules on disease-specific CD8 cells. For each patient, PD-1 levels are measured on 5 fixed pentamers (a viral capsomer having five structural units) before and after the end of therapy. Patients are classified into early response groups (marked, intermediate, or poor) based on characteristics observed prior to measuring the PD-1 levels. It is well known that pre-treatment PD-1 levels vary across early response groups. Accordingly, the immunologist would like to adjust for pre-treatment PD-1 levels while assessing the effect of treatment and early response on the change in PD-1 expression. Assuming that there are $n$ patients in each early response group, use your own set of notation to set up a linear model that will answer the immunologist's questions. Write the model in matrix form.

3. Consider the linear model
$$Y_{ijk} = \beta_i + \beta_j + \epsilon_{ijk}, \quad i, j = 1, 2, 3;\ i < j;\ k = 1, 2, \quad (1.3.1)$$
so that there are a total of 6 observations. Write the model in matrix form.

4. Suppose the investigators want to compare the effect of $a$ treatments by treating $N = an$ individuals. Treatments are allocated randomly in such a way that there are $n$ individuals in each treatment group. Even though the treatments were assigned randomly, the investigators are concerned that younger patients might respond better than older patients. Therefore, the analyst needs to adjust for age while comparing the treatments. The observed data for this problem are $(Y_{ij}, X_{ij})$, where $Y_{ij}$ and $X_{ij}$ respectively denote the response and age for the $j$th individual assigned to the $i$th treatment. Suppose we want to treat age as a continuous variable and model the response as a linear function of the treatment effect $\alpha_i$ and the age effect $\beta$. Write the linear model in the form $Y = X\beta + \epsilon$.

5. Suppose $2n$ Hepatitis C patients are randomized equally to two treatments, IFN (treatment 1) and IFN-RBV (treatment 2). Hepatitis C virus (HCV) RNA levels are measured on each patient at day 0 (timepoint 0) and at week 24 (timepoint 1). The objective is to compare the effect of the two treatments in reducing HCV RNA levels after 24 weeks of therapy. The following linear model has been assumed:
$$y_{ijk} = \begin{cases} \mu + e_{ijk}, & k = 0, \\ \mu - \alpha_i + e_{ijk}, & k = 1, \end{cases} \qquad i = 1, 2;\ j = 1, 2, \ldots, n, \quad (1.3.2)$$
where $y_{ijk}$ denotes the HCV RNA level at timepoint $k$ for the $j$th patient in the $i$th group. Write the above model in the form $Y = X\beta + \epsilon$ by explicitly defining $Y$, $X$, and $\beta$.

6. Suppose $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$ are $n_1$ independent observations from a $N(\mu + \alpha_1, \sigma^2)$ distribution and $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$ are $n_2$ independent observations from a $N(\mu - \alpha_2, \sigma^2)$ distribution. Notice that the two populations have different means but the same standard deviation. Assume that $Y_{1j}$ and $Y_{2j}$ are independent for all $j$. Define $n_1 + n_2 = n$, and $Y = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2})^T$ as the $n \times 1$ vector consisting of all $n$ observations. We write $Y$ as
$$Y = X\beta + \epsilon. \quad (1.3.3)$$
What are $Y$, $X$, and $\beta$ in the above equation?

7. Homeostasis Model Assessment (HOMA) is a measure of insulin resistance, calculated as the product of fasting glucose and insulin levels. The higher the HOMA score, the higher the insulin resistance. Researchers at the University of Michigan, Ann Arbor, have collected fasting glucose and insulin levels for a group of hepatitis C patients undergoing peg-interferon therapy. The HOMA score was computed for all patients at baseline and at 24 weeks post therapy. The goal was to identify factors associated with changes in insulin resistance in response to peg-interferon therapy. The candidate factors are: (i) baseline BMI (a continuous measure); (ii) peg-interferon dose (0 = placebo, 1 = 135 mcg, and 2 = 180 mcg); (iii) HCV negativity at week 24 (1 = negative, 0 = positive); and (iv) the interaction between (ii) and (iii). Use your own set of notation to develop a linear model for this problem. Make sure to clearly define each symbol that appears in your model.

8. In a recent weight loss study, subjects were randomized to two treatment groups: SBWP (standard behavioral weight-control program) and EWLI (extended weight loss intervention). Subjects in both treatment groups received instructions on exercise and diet in batches (a subject could belong to one batch only). Subjects in the EWLI group additionally received personalized text messages on their cellular phones. The study weighed each subject at baseline (month 0) and then at months 6, 12, and 24. The aim of the study was to compare weight loss between the two groups at months 6, 12, and 24 from baseline.
Using weight as the outcome variable, we would like to develop a linear model to conduct the statistical analysis for this study. The proposed model should treat time as a categorical independent variable. Since subjects received instructions in batches, the model should account for the correlation among patients belonging to the same batch.

(a) Write the linear model using your own notation for random variables, parameters, and error terms. You must define each of the terms and describe the assumptions you make about the random variables and the parameters in your model.

(b) Express the null hypothesis "The weight loss after 24 months of treatment is similar between the two treatment groups" in terms of the parameters of your model in part (a).

Chapter 2
A short review of matrix algebra

2.1 Vectors and vector spaces

Definition 2.1.1. A vector $a$ of dimension $n$ is a collection of $n$ elements, typically written as
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = (a_i)_n.$$
Vectors of length 2 (two-dimensional vectors) can be thought of as points in the plane (see Figure 2.1).

Figure 2.1: Vectors in two- and three-dimensional spaces, e.g., $(-1.5, 2)$, $(1, 1)$, $(1, -2)$ in the plane and $(2.5, 1.5, 0.95)$, $(0, 1.5, 0.95)$ in three dimensions.

• A vector with all elements equal to zero is known as a zero vector and is denoted by $0$.

• A vector whose elements are stacked vertically is known as a column vector, whereas a vector whose elements are stacked horizontally will be referred to as a row vector. (Unless otherwise mentioned, all vectors will be treated as column vectors.)

• A row vector representation of a column vector is known as its transpose. We will use the notation $'$ or $^T$ to indicate a transpose. For instance, if $a = (a_1, a_2, \ldots, a_n)^T$ is a column vector and $b = (a_1\ a_2\ \ldots\ a_n)$ is the corresponding row vector, then we write $b = a^T$ or $a = b^T$.

• Vectors of the same dimension are conformable to algebraic operations such as addition and subtraction. The sum of two or more vectors of dimension $n$ is another $n$-dimensional vector whose elements are the sums of the corresponding elements of the summands. That is,
$$(a_i)_n \pm (b_i)_n = (a_i \pm b_i)_n.$$

• Vectors can be multiplied by a scalar: $c(a_i)_n = (ca_i)_n$.

• A product of two vectors of the same dimension can be formed when one of them is a row vector and the other is a column vector. The result is called the inner, dot, or scalar product. If $a = (a_1, \ldots, a_n)^T$ and $b = (b_1, \ldots, b_n)^T$, then
$$a^Tb = a_1b_1 + a_2b_2 + \ldots + a_nb_n.$$

Definition 2.1.2. The length, magnitude, or Euclidean norm of a vector is defined as the square root of the sum of squares of its elements and is denoted by $||\cdot||$. For example,
$$||a|| = ||(a_i)_n|| = \sqrt{\sum_{i=1}^{n}a_i^2} = \sqrt{a^Ta}.$$

• The length of the sum of two or more vectors is less than or equal to the sum of the lengths of the vectors (the triangle inequality, which follows from the Cauchy-Schwarz inequality):
$$||a + b|| \leq ||a|| + ||b||.$$

Definition 2.1.3. A set of vectors $\{a_1, a_2, \ldots, a_m\}$ is linearly dependent if at least one of them can be written as a linear combination of the others. In other words, $\{a_1, a_2, \ldots, a_m\}$ are linearly dependent if there exists at least one non-zero $c_j$ such that
$$\sum_{j=1}^{m}c_ja_j = 0. \quad (2.1.1)$$
Equivalently, for some $k$, $a_k = -(1/c_k)\sum_{j \neq k}c_ja_j$.
Definition 2.1.4. A set of vectors is linearly independent if it is not linearly dependent. That is, in order for (2.1.1) to hold, all the $c_j$'s must be equal to zero.

Definition 2.1.5. Two vectors $a$ and $b$ are orthogonal if their scalar product is zero, that is, $a^Tb = 0$, and we write $a \perp b$.

Definition 2.1.6. A set of vectors is said to be mutually orthogonal if the members of every pair of vectors belonging to the set are orthogonal.

• If (non-zero) vectors are mutually orthogonal, then they are linearly independent.

Definition 2.1.7. Vector space. A set of vectors which is closed under addition and scalar multiplication is known as a vector space. Thus if $\mathcal{V}$ is a vector space, then for any two vectors $a$ and $b$ from $\mathcal{V}$, (i) $c_aa + c_bb \in \mathcal{V}$, and (ii) $c_aa \in \mathcal{V}$, for any two constants $c_a$ and $c_b$.

Definition 2.1.8. Span. All possible linear combinations of a set of linearly independent vectors form the span of that set. Thus if $A = \{a_1, a_2, \ldots, a_m\}$ is a set of $m$ linearly independent vectors, then the span of $A$ is given by
$$span(A) = \left\{a : a = \sum_{j=1}^{m}c_ja_j, \text{ for some numbers } c_j,\ j = 1, 2, \ldots, m\right\}.$$
Viewed differently, the set of vectors $A$ generates the vector space $span(A)$ and is referred to as a basis of $span(A)$. Formally,

• Let $a_1, a_2, \ldots, a_m$ be a set of $m$ linearly independent $n$-dimensional vectors in a vector space $\mathcal{V}$ that spans $\mathcal{V}$. Then $a_1, a_2, \ldots, a_m$ together form a basis of $\mathcal{V}$, and the dimension of a vector space is defined as the number of vectors in its basis. That is, $dim(\mathcal{V}) = m$.

2.2 Matrix

Definition 2.2.1. A matrix is a rectangular or square arrangement of numbers. A matrix with $m$ rows and $n$ columns is referred to as an $m \times n$ (read as '$m$ by $n$') matrix. An $m \times n$ matrix $A$ with $(i, j)$th element $a_{ij}$ is written as
$$A = (a_{ij})_{m \times n} = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{bmatrix}.$$
If $m = n$, then the matrix is a square matrix.

Definition 2.2.2. A diagonal matrix is a square matrix with non-zero elements in the diagonal cells and zeros elsewhere. A diagonal matrix with diagonal elements $a_1, a_2, \ldots, a_n$ is written as
$$diag(a_1, a_2, \ldots, a_n) = \begin{bmatrix} a_1 & 0 & \ldots & 0 \\ 0 & a_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & a_n \end{bmatrix}.$$

Definition 2.2.3. An $n \times n$ diagonal matrix with all diagonal elements equal to 1 is known as the identity matrix of order $n$ and is denoted by $I_n$.

A similar notation $J_{mn}$ is sometimes used for an $m \times n$ matrix with all elements equal to 1, i.e.,
$$J_{mn} = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ 1 & 1 & \ldots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \ldots & 1 \end{bmatrix} = [1_m\ 1_m\ \ldots\ 1_m].$$

Like vectors, matrices with the same dimensions can be added together, and the result is another matrix. Any matrix is conformable to multiplication by a scalar. If $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$, then

1. $A \pm B = (a_{ij} \pm b_{ij})_{m \times n}$, and
2. $cA = (ca_{ij})_{m \times n}$.

Definition 2.2.4. The transpose of a matrix $A = (a_{ij})_{m \times n}$ is defined by $A^T = (a_{ji})_{n \times m}$.

• If $A = A^T$, then $A$ is symmetric.
• $(A + B)^T = A^T + B^T$.

Definition 2.2.5. Matrix product. If $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{n \times p}$, then $AB = (c_{ij})_{m \times p}$, where
$$c_{ij} = \sum_{k}a_{ik}b_{kj} = a_i^Tb_j,$$
with $a_i$ the $i$th row of $A$ (viewed as a vector) and $b_j$ the $j$th column (vector) of $B$.

• $(AB)^T = B^TA^T$,
• $(AB)C = A(BC)$, whenever defined,
• $A(B + C) = AB + AC$, whenever defined,
• $J_{mn}J_{np} = nJ_{mp}$.
2.3 Rank, Column Space and Null Space

Definition 2.3.1. The rank of a matrix $A$ is the number of linearly independent rows or columns of $A$. We denote it by $rank(A)$.

• $rank(A^T) = rank(A)$.
• An $m \times n$ matrix $A$ with rank $m$ ($n$) is said to have full row (column) rank.
• If $A$ is a square matrix with $n$ rows and $rank(A) < n$, then $A$ is singular and its inverse does not exist.
• $rank(AB) \leq \min(rank(A), rank(B))$.
• $rank(A^TA) = rank(AA^T) = rank(A) = rank(A^T)$.

Definition 2.3.2. Inverse of a square matrix. If $A$ is a square matrix with $n$ rows and $rank(A) = n$, then $A$ is called non-singular and there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I_n$. The matrix $A^{-1}$ is known as the inverse of $A$.

• $A^{-1}$ is unique.
• If $A$ and $B$ are invertible and have the same dimension, then $(AB)^{-1} = B^{-1}A^{-1}$.
• $(cA)^{-1} = A^{-1}/c$.
• $(A^T)^{-1} = (A^{-1})^T$.

Definition 2.3.3. Column space. The column space of a matrix $A$ is the vector space generated by the columns of $A$. If $A = (a_{ij})_{m \times n} = (a_1\ a_2\ \ldots\ a_n)$, then the column space of $A$, denoted by $\mathcal{C}(A)$ or $\mathcal{R}(A)$, is given by
$$\mathcal{C}(A) = \left\{a : a = \sum_{j=1}^{n}c_ja_j, \text{ for scalars } c_j,\ j = 1, 2, \ldots, n\right\}.$$
Alternatively, $a \in \mathcal{C}(A)$ iff there exists a vector $c$ such that $a = Ac$.

• What is the dimension of the vectors in $\mathcal{C}(A)$?
• How many vectors will a basis of $\mathcal{C}(A)$ have?
• $dim(\mathcal{C}(A)) = ?$
• If $A = BC$, then $\mathcal{C}(A) \subseteq \mathcal{C}(B)$.
• If $\mathcal{C}(A) \subseteq \mathcal{C}(B)$, then there exists a matrix $C$ such that $A = BC$.

Example 2.3.1. Find a basis for the column space of the matrix
$$A = \begin{bmatrix} -1 & 2 & -1 \\ 1 & 1 & 4 \\ 0 & 2 & 2 \end{bmatrix}.$$

Definition 2.3.4. Null Space. The null space of an $m \times n$ matrix $A$ is defined as the vector space consisting of the solutions of the system of equations $Ax = 0$. The null space of $A$ is denoted by $\mathcal{N}(A)$ and can be written as
$$\mathcal{N}(A) = \{x : Ax = 0\}.$$

• What is the dimension of the vectors in $\mathcal{N}(A)$?
• How many vectors are there in a basis of $\mathcal{N}(A)$?
• $dim(\mathcal{N}(A)) = n - rank(A)$, the nullity of $A$.

Definition 2.3.5. Orthogonal complements. Two subspaces $\mathcal{V}_1$ and $\mathcal{V}_2$ of a vector space $\mathcal{V}$ form orthogonal complements relative to $\mathcal{V}$ if every vector in $\mathcal{V}_1$ is orthogonal to every vector in $\mathcal{V}_2$. We write $\mathcal{V}_1 = \mathcal{V}_2^{\perp}$ or, equivalently, $\mathcal{V}_2 = \mathcal{V}_1^{\perp}$.

• $\mathcal{V}_1 \cap \mathcal{V}_2 = \{0\}$.
• If $dim(\mathcal{V}_1) = r$, then $dim(\mathcal{V}_2) = n - r$, where $n$ is the dimension of the vectors in the vector space $\mathcal{V}$.
• Every vector $a$ in $\mathcal{V}$ can be uniquely decomposed into two components $a_1$ and $a_2$ such that
$$a = a_1 + a_2, \quad a_1 \in \mathcal{V}_1, \ a_2 \in \mathcal{V}_2. \quad (2.3.1)$$
• If (2.3.1) holds, then
$$||a||^2 = ||a_1||^2 + ||a_2||^2. \quad (2.3.2)$$
How?

Proof of (2.3.1).

• Existence. Suppose such a decomposition were not possible. Then $a$ would be independent of the basis vectors of $\mathcal{V}_1$ and $\mathcal{V}_2$. But that would make the total number of independent vectors in $\mathcal{V}$ equal to $n + 1$. Is that possible?

• Uniqueness. Suppose two such decompositions were possible, namely,
$$a = a_1 + a_2, \quad a_1 \in \mathcal{V}_1, \ a_2 \in \mathcal{V}_2, \qquad \text{and} \qquad a = b_1 + b_2, \quad b_1 \in \mathcal{V}_1, \ b_2 \in \mathcal{V}_2.$$
Then $a_1 - b_1 = b_2 - a_2$. This implies $a_1 = b_1$ and $b_2 = a_2$. (Why?)

Proof of (2.3.2). From (2.3.1),
$$||a||^2 = a^Ta = (a_1 + a_2)^T(a_1 + a_2) = a_1^Ta_1 + a_1^Ta_2 + a_2^Ta_1 + a_2^Ta_2 = ||a_1||^2 + ||a_2||^2. \quad (2.3.3)$$
This result is known as the Pythagorean theorem.
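Returning to Example 2.3.1, the rank, a column-space basis, and a null-space basis can be checked numerically. The following is a small sketch (not part of the notes) using NumPy's SVD; the orthonormal bases it produces are one valid choice among many.

```python
import numpy as np

# The matrix from Example 2.3.1
A = np.array([[-1., 2., -1.],
              [ 1., 1.,  4.],
              [ 0., 2.,  2.]])

r = np.linalg.matrix_rank(A)
print(r)   # 2: the third column equals 3*(column 1) + (column 2)

# SVD: columns of U paired with nonzero singular values span C(A);
# rows of Vt paired with zero singular values span N(A), so dim(N(A)) = 3 - 2 = 1.
U, s, Vt = np.linalg.svd(A)
col_space_basis = U[:, :r]
null_space_basis = Vt[r:].T
print(np.allclose(A @ null_space_basis, 0.0))   # True
```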
Figure 2.2: Orthogonal decomposition (direct sum): with $\mathcal{V} = \{(x, y) : x, y \in \mathbb{R}\} = \mathbb{R}^2$, $\mathcal{V}_1 = \{(x, y) : x = y\}$, and $\mathcal{V}_2 = \{(x, y) : x + y = 0\}$, the point $(2, 1)$ decomposes as $(3/2, 3/2) + (1/2, -1/2)$.

Theorem 2.3.2. If $A$ is an $m \times n$ matrix, and $\mathcal{C}(A)$ and $\mathcal{N}(A^T)$ respectively denote the column space of $A$ and the null space of $A^T$, then $\mathcal{C}(A) = \mathcal{N}(A^T)^{\perp}$.

Proof.
• $dim(\mathcal{C}(A)) = rank(A) = rank(A^T) = r$ (say), and $dim(\mathcal{N}(A^T)) = m - r$.
• Suppose $a_1 \in \mathcal{C}(A)$ and $a_2 \in \mathcal{N}(A^T)$. Then there exists a $c$ such that $Ac = a_1$, and $A^Ta_2 = 0$. Now,
$$a_1^Ta_2 = c^TA^Ta_2 = 0. \quad (2.3.4)$$

• (More on orthogonality.) If $\mathcal{V}_1 \subseteq \mathcal{V}_2$, and $\mathcal{V}_1^{\perp}$ and $\mathcal{V}_2^{\perp}$ respectively denote their orthogonal complements, then $\mathcal{V}_2^{\perp} \subseteq \mathcal{V}_1^{\perp}$.

Proof of the result above. Suppose $a_1 \in \mathcal{V}_1$. Then we can write $a_1 = A_1c_1$ for some vector $c_1$, where the columns of the matrix $A_1$ consist of the basis vectors of $\mathcal{V}_1$. Similarly, $a_2 = A_2c_2$ for all $a_2 \in \mathcal{V}_2$. In other words, $\mathcal{V}_1 = \mathcal{C}(A_1)$ and $\mathcal{V}_2 = \mathcal{C}(A_2)$. Since $\mathcal{V}_1 \subseteq \mathcal{V}_2$, there exists a matrix $B$ such that $A_1 = A_2B$ (see Definition 2.3.3). Now let $a \in \mathcal{V}_2^{\perp}$, so that $a \in \mathcal{N}(A_2^T)$, implying $A_2^Ta = 0$. But then $A_1^Ta = B^TA_2^Ta = 0$, showing that $a \in \mathcal{N}(A_1^T) = \mathcal{V}_1^{\perp}$.

2.4 Trace

The trace of a matrix will come in handy when we talk about the distribution of quadratic forms.

Definition 2.4.1. The trace of a square matrix is the sum of its diagonal elements. Thus, if $A = (a_{ij})_{n \times n}$, then $trace(A) = \sum_{i=1}^{n}a_{ii}$.

• $trace(I_n) = n$.
• $trace(A) = trace(A^T)$.
• $trace(A + B) = trace(A) + trace(B)$.
• $trace(AB) = trace(BA)$.
• $trace(A^TA) = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}^2$; for symmetric $A$ this also equals $trace(A^2)$.

2.5 Determinants

Definition 2.5.1. Determinant. The determinant of a scalar is the scalar itself. The determinant of an $n \times n$ matrix $A = (a_{ij})_{n \times n}$ is the scalar, written $|A|$, given by
$$|A| = \sum_{j=1}^{n}a_{ij}(-1)^{i+j}|M_{ij}|,$$
for any fixed $i$, where the determinant $|M_{ij}|$ is known as the minor of $a_{ij}$ and the matrix $M_{ij}$ is obtained by deleting the $i$th row and $j$th column of $A$.

• $|A| = |A^T|$.
• $|diag(d_i, i = 1, 2, \ldots, n)| = \prod_{i=1}^{n}d_i$. This also holds if the matrix is an upper or lower triangular matrix with diagonal elements $d_i$, $i = 1, 2, \ldots, n$.
• $|AB| = |A||B|$.
• $|cA| = c^n|A|$.
• If $A$ is singular ($rank(A) < n$), then $|A| = 0$.
• $|A^{-1}| = 1/|A|$.
• The determinants of block-diagonal (block-triangular) matrices work the way you would expect. For instance,
$$\begin{vmatrix} A & C \\ 0 & B \end{vmatrix} = |A||B|.$$
In general,
$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A||D - CA^{-1}B|.$$

2.6 Eigenvalues and Eigenvectors

Definition 2.6.1. Eigenvalues and eigenvectors. The eigenvalues ($\lambda$) of a square matrix $A_{n \times n}$ and the corresponding eigenvectors ($a$) are defined by the set of equations
$$Aa = \lambda a. \quad (2.6.1)$$
Equation (2.6.1) leads to the polynomial equation
$$|A - \lambda I_n| = 0. \quad (2.6.2)$$
For a given eigenvalue, the corresponding eigenvector is obtained as a solution to equation (2.6.1). The solutions to equation (2.6.1) constitute the eigenspace of the matrix $A$.

Example 2.6.1. Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{bmatrix} -1 & 2 & 0 \\ 1 & 2 & 1 \\ 0 & 2 & -1 \end{bmatrix}.$$
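For Example 2.6.1, a quick numerical check (not part of the notes) can be done with NumPy. By hand one would solve the cubic $|A - \lambda I_3| = 0$; the code below simply confirms the roots and verifies the defining relation $Aa = \lambda a$ for each computed eigenpair.

```python
import numpy as np

A = np.array([[-1., 2.,  0.],
              [ 1., 2.,  1.],
              [ 0., 2., -1.]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals.real))          # approximately [-2., -1., 3.]

# Verify A a = lambda a for every eigenpair (eigenvectors are the columns of eigvecs)
for lam, a in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ a, lam * a))   # True, True, True
```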
Since in this course our focus will be on the eigenvalues of symmetric matrices, henceforth we state the results on eigenvalues and eigenvectors as applied to a symmetric matrix $A$. Some of the results will, however, hold for a general $A$. If you are interested, please consult a linear algebra book such as Harville's Matrix Algebra From a Statistician's Perspective.

Definition 2.6.2. Spectrum. The spectrum of a matrix $A$ is defined as the set of distinct (real) eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_k\}$ of $A$.

• The eigenspace $\mathcal{L}$ of a matrix $A$ corresponding to an eigenvalue $\lambda$ can be written as $\mathcal{L} = \mathcal{N}(A - \lambda I_n)$.
• $trace(A) = \sum_{i=1}^{n}\lambda_i$.
• $|A| = \prod_{i=1}^{n}\lambda_i$.
• $|I_n \pm A| = \prod_{i=1}^{n}(1 \pm \lambda_i)$.
• Eigenvectors associated with different eigenvalues are mutually orthogonal, or can be chosen to be mutually orthogonal, and hence are linearly independent.
• $rank(A)$ is the number of non-zero $\lambda_i$'s.

The proofs of some of these results can be obtained easily through the application of a special theorem called the spectral decomposition theorem.

Definition 2.6.3. Orthogonal Matrix. A matrix $A_{n \times n}$ is said to be orthogonal if $A^TA = I_n = AA^T$. This immediately implies that $A^{-1} = A^T$.

Theorem 2.6.2. Spectral decomposition. Any symmetric matrix $A$ can be decomposed as $A = B\Lambda B^T$, where $\Lambda = diag(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of eigenvalues and $B$ is an orthogonal matrix having as its columns the eigenvectors of $A$, namely $B = [a_1\ a_2\ \ldots\ a_n]$, where the $a_j$'s are orthonormal eigenvectors corresponding to the eigenvalues $\lambda_j$, $j = 1, 2, \ldots, n$.

Proof. Outline of the proof of the spectral decomposition theorem:

• By definition, $B$ satisfies
$$AB = B\Lambda, \quad (2.6.3)$$
and $B^TB = I_n$. Then from (2.6.3), $A = B\Lambda B^{-1} = B\Lambda B^T$.

The spectral decomposition of a symmetric matrix allows one to form a 'square root' of that matrix. If we define
$$\sqrt{A} = B\sqrt{\Lambda}B^T,$$
it is easy to verify that $\sqrt{A}\sqrt{A} = A$. In general, one can define
$$A^{\alpha} = B\Lambda^{\alpha}B^T, \quad \alpha \in \mathbb{R}.$$

Example 2.6.3. Find a matrix $B$ and the matrix $\Lambda$ (the diagonal matrix of eigenvalues) such that
$$A = \begin{bmatrix} 6 & -2 \\ -2 & 9 \end{bmatrix} = B^T\Lambda B.$$
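A numerical companion to Example 2.6.3 (a sketch, not part of the notes): np.linalg.eigh returns the eigenvalues and an orthogonal matrix of eigenvectors of a symmetric matrix, from which the decomposition and the matrix square root described above can be verified. Note that eigh's convention corresponds to $A = B\Lambda B^T$, the transpose of the ordering written in the example statement.

```python
import numpy as np

A = np.array([[ 6., -2.],
              [-2.,  9.]])

# eigh is for symmetric (Hermitian) matrices; the columns of B are orthonormal eigenvectors
lam, B = np.linalg.eigh(A)
print(lam)                                       # approximately [5., 10.]
print(np.allclose(B @ np.diag(lam) @ B.T, A))    # True: A = B Lambda B^T

# Matrix square root via the spectral decomposition
sqrtA = B @ np.diag(np.sqrt(lam)) @ B.T
print(np.allclose(sqrtA @ sqrtA, A))             # True
```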
2.7 Solutions to linear systems of equations

A linear system of $m$ equations in $n$ unknowns is written as
$$Ax = b, \quad (2.7.1)$$
where $A_{m \times n}$ is a matrix and $b$ is a vector of known constants, and $x$ is an unknown vector. The goal usually is to find a value (solution) of $x$ such that (2.7.1) is satisfied. When $b = 0$, the system is said to be homogeneous. It is easy to see that homogeneous systems are always consistent, that is, they have at least one solution.

• The solution set of a homogeneous system of equations $Ax = 0$ forms a vector space and is given by $\mathcal{N}(A)$.
• A non-homogeneous system of equations $Ax = b$ is consistent iff $rank(A, b) = rank(A)$.
– The system of linear equations $Ax = b$ is consistent iff $b \in \mathcal{C}(A)$.
– If $A$ is square and $rank(A) = n$, then $Ax = b$ has a unique solution given by $x = A^{-1}b$.

2.7.1 G-inverse

One way to obtain the solutions to a system of equations (2.7.1) is simply to transform the augmented matrix $(A, b)$ into row-reduced echelon form. However, such forms are not convenient for further algebraic treatment. Analogous to the inverse of a non-singular matrix, one can define an inverse, referred to as a generalized inverse or, in short, g-inverse, of any matrix, square or rectangular, singular or non-singular. The generalized inverse makes finding the solutions of linear equations easier, and theoretical developments based on the g-inverse are very powerful for solving problems arising in linear models.

Definition 2.7.1. G-inverse. A g-inverse of a matrix $A_{m \times n}$ is a matrix $G_{n \times m}$ that satisfies the relationship
$$AGA = A.$$

The following two lemmas are useful for finding a g-inverse of a matrix $A$.

Lemma 2.7.1. Suppose $rank(A_{m \times n}) = r$, and $A_{m \times n}$ can be partitioned as
$$A_{m \times n} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
such that $A_{11}$ is of dimension $r \times r$ with $rank(A_{11}) = r$. Then a g-inverse of $A$ is given by
$$G_{n \times m} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$

Example 2.7.2. Find a g-inverse of the matrix
$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ 1 & 0 & 1 & 2 \end{bmatrix}.$$

Suppose you do not have an $r \times r$ non-singular minor to begin with. What do you do then?

Lemma 2.7.3. Suppose $rank(A_{m \times n}) = r$, and there exist non-singular matrices $B$ and $C$ such that
$$BAC = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix},$$
where $D$ is a diagonal matrix with $rank(D) = r$. Then a g-inverse of $A$ is given by
$$G_{n \times m} = C\begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix}B.$$

• $rank(G) \geq rank(A)$.
• A g-inverse of a matrix is not necessarily unique. For instance,
– If $G$ is a g-inverse of a symmetric matrix $A$, then $GAG$ is also a g-inverse of $A$.
– If $G$ is a g-inverse of a symmetric matrix $A$, then $G_1 = (G + G^T)/2$ is also a g-inverse of $A$.
– A g-inverse of a diagonal matrix $D = diag(d_1, \ldots, d_n)$ is another diagonal matrix $D^g = diag(d_1^g, \ldots, d_n^g)$, where
$$d_i^g = \begin{cases} 1/d_i, & d_i \neq 0, \\ 0, & d_i = 0. \end{cases}$$
Again, as you can see, we concentrate on symmetric matrices, as these matrix properties will be applied mostly to symmetric matrices in this course.

Another way of finding a g-inverse of a symmetric matrix:

Lemma 2.7.4. Let $A$ be an $n$-dimensional symmetric matrix with spectral decomposition $A = B\Lambda B^T$ as in Theorem 2.6.2. Then a g-inverse of $A$ is given by $G = B\Lambda^gB^T$, where $\Lambda^g$ is the diagonal g-inverse of $\Lambda$ defined above.

2.7.2 Back to the system of equations

Theorem 2.7.5. If $Ax = b$ is a consistent system of linear equations and $G$ is a g-inverse of $A$, then $Gb$ is a solution to $Ax = b$.

Proof.

Theorem 2.7.6. $x^*$ is a solution to the consistent system of linear equations $Ax = b$ iff there exists a vector $c$ such that $x^* = Gb + (I - GA)c$, for some g-inverse $G$ of $A$.

Proof of Theorem 2.7.6.

If part. For any compatible vector $c$ and for any g-inverse $G$ of $A$, define $x^* = Gb + (I - GA)c$. Then,
$$Ax^* = A[Gb + (I - GA)c] = AGb + (A - AGA)c = b + 0 = b.$$

Only if part. Suppose $x^*$ is a solution to the consistent system of linear equations $Ax = b$. Then
$$x^* = Gb + (x^* - Gb) = Gb + (x^* - GAx^*) = Gb + (I - GA)c,$$
where $c = x^*$.

Remark 2.7.1.
1. Any solution to the system of equations $Ax = b$ can be written as the sum of two components: one being a solution by itself and the other being in the null space of $A$.
2. If one computes one g-inverse of $A$, then one has identified all possible solutions of $Ax = b$.

Example 2.7.7. Give a general form of the solutions to the system of equations
$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ 1 & -1 & 1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \\ 2 \\ -1 \end{bmatrix}.$$
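A numerical illustration (not from the notes) of Theorems 2.7.5 and 2.7.6, applied to the system in Example 2.7.7. The Moore-Penrose pseudoinverse computed by np.linalg.pinv is one particular g-inverse (it satisfies $AGA = A$), so $Gb$ is one solution, and $Gb + (I - GA)c$ sweeps out all solutions as $c$ varies.

```python
import numpy as np

A = np.array([[1.,  2., 1., 0.],
              [1.,  1., 1., 1.],
              [0.,  1., 0., -1.],
              [1., -1., 1., 3.]])
b = np.array([5., 3., 2., -1.])

G = np.linalg.pinv(A)                 # a particular g-inverse of A
print(np.allclose(A @ G @ A, A))      # True: AGA = A

x_star = G @ b                        # one solution (Theorem 2.7.5)
print(np.allclose(A @ x_star, b))     # True: the system is consistent

# General solution: Gb + (I - GA)c for an arbitrary c (Theorem 2.7.6)
rng = np.random.default_rng(0)
c = rng.normal(size=4)
x_general = x_star + (np.eye(4) - G @ A) @ c
print(np.allclose(A @ x_general, b))  # True
```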
Idempotent matrices and projections

Definition 2.7.2. Idempotent matrix. A square matrix $B$ is idempotent if $B^2 = BB = B$.

• If $B$ is idempotent, then $rank(B) = trace(B)$.
• If $B_{n \times n}$ is idempotent, then $I_n - B$ is also idempotent, with $rank(I_n - B) = n - trace(B)$.
• If $B_{n \times n}$ is idempotent with $rank(B) = n$, then $B = I_n$.

Lemma 2.7.8. If the $m \times n$ matrix $A$ has rank $r$, then the matrix $I_n - GA$ is idempotent with rank $n - r$, where $G$ is a g-inverse of $A$.

Definition 2.7.3. Projection. A square matrix $P_{n \times n}$ is a projection onto a vector space $\mathcal{V} \subseteq \mathbb{R}^n$ iff all three of the following hold: (a) $P$ is idempotent, (b) $\forall x \in \mathbb{R}^n$, $Px \in \mathcal{V}$, and (c) $\forall x \in \mathcal{V}$, $Px = x$.

An idempotent matrix is a projection onto its own column space.

Example 2.7.9. Let the vector space be defined as $\mathcal{V} = \{(v_1, v_2) : v_2 = kv_1\} \subseteq \mathbb{R}^2$ for some non-zero real constant $k$. Consider the matrix
$$P = \begin{bmatrix} t & (1-t)/k \\ kt & 1-t \end{bmatrix}$$
for any real number $t \in \mathbb{R}$. Notice that
(a) $PP = P$,
(b) for any $x = (x_1, x_2)^T \in \mathbb{R}^2$, $Px = (tx_1 + (1-t)x_2/k,\ ktx_1 + (1-t)x_2)^T \in \mathcal{V}$, and
(c) for any $x = (x_1, x_2)^T = (x_1, kx_1)^T \in \mathcal{V}$, $Px = x$.
Thus, $P$ is a projection onto the vector space $\mathcal{V}$. Notice that the projection $P$ is not unique, as it depends on the choice of $t$.

Consider $k = 1$. Then $\mathcal{V}$ is the linear space representing the line with unit slope passing through the origin. When multiplied by the projection matrix (for $t = 2$)
$$P_1 = \begin{bmatrix} 2 & -1 \\ 2 & -1 \end{bmatrix},$$
any point in the two-dimensional real space produces a point in $\mathcal{V}$. For instance, the point $(1, 0.5)$ when multiplied by $P_1$ produces $(1.5, 1.5)$, which belongs to $\mathcal{V}$. But the projection
$$P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}$$
projects the point $(1, 0.5)$ onto $\mathcal{V}$ at $(0.75, 0.75)$. See Figure 2.3.

Figure 2.3: Projections onto $\mathcal{V} = \{(x, y) : x = y\}$: $P_1$ maps $(1, 1/2)$ to $(1.5, 1.5)$, while $P_2$ maps it to $(0.75, 0.75)$.

Back to g-inverses and the solution of systems of equations

Lemma 2.7.10. If $G$ is a g-inverse of $A$, then $I - GA$ is a projection onto $\mathcal{N}(A)$.
Proof. Left as an exercise.

Lemma 2.7.11. If $G$ is a g-inverse of $A$, then $AG$ is a projection onto $\mathcal{C}(A)$.
Proof. Left as an exercise (done in class).

Lemma 2.7.12. If $P$ and $Q$ are symmetric and both project onto the same space $\mathcal{V} \subseteq \mathbb{R}^n$, then $P = Q$.

Proof. By definition, for any $x \in \mathbb{R}^n$, $Px \in \mathcal{V}$ and $Qx \in \mathcal{V}$. Let $Px = x_1 \in \mathcal{V}$ and $Qx = x_2 \in \mathcal{V}$. Then,
$$(P - Q)x = x_1 - x_2, \quad \forall x \in \mathbb{R}^n. \quad (2.7.2)$$
Multiplying both sides by $P^T = P$, and using the fact that $P$ leaves vectors in $\mathcal{V}$ unchanged, we get
$$P(P - Q)x = P(x_1 - x_2) = x_1 - x_2, \quad \forall x \in \mathbb{R}^n. \quad (2.7.3)$$
Subtracting (2.7.2) from (2.7.3) we obtain
$$[P(P - Q) - (P - Q)]x = 0, \quad \forall x \in \mathbb{R}^n \implies Q = PQ.$$
Multiplying both sides of (2.7.2) by $Q^T = Q$ and following a similar procedure, we can show that $P = PQ = Q$.

Lemma 2.7.13. Suppose $\mathcal{V}_1, \mathcal{V}_2$ ($\mathcal{V}_1 \subseteq \mathcal{V}_2$) are vector spaces in $\mathbb{R}^n$ and $P_1$, $P_2$, and $P_1^{\perp}$ are symmetric projections onto $\mathcal{V}_1$, $\mathcal{V}_2$, and $\mathcal{V}_1^{\perp}$, respectively. Then,
1. $P_1P_2 = P_2P_1 = P_1$. (The smaller projection survives.)
2. $P_1^{\perp}P_1 = P_1P_1^{\perp} = 0$.
3. $P_2 - P_1$ is a projection matrix. (What does it project onto?)
Proof. See Ravishanker and Dey, Page 62, Result 2.6.7.
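A quick numerical check (not part of the notes) of Example 2.7.9 and Lemma 2.7.12: both $P_1$ and $P_2$ are idempotent and map every point onto the line $x = y$, but only the symmetric one, $P_2$, is the orthogonal projection; the non-symmetric $P_1$ illustrates that a projection onto a space need not be unique.

```python
import numpy as np

P1 = np.array([[2., -1.],
               [2., -1.]])    # oblique projection onto the line x = y (t = 2, k = 1)
P2 = np.array([[.5, .5],
               [.5, .5]])     # symmetric (orthogonal) projection onto the same line

x = np.array([1., 0.5])

for P in (P1, P2):
    print(np.allclose(P @ P, P))          # True: idempotent
    y = P @ x
    print(y, np.isclose(y[0], y[1]))      # image lies on the line x = y

# Only P2 is symmetric; by Lemma 2.7.12 it is the unique symmetric projection onto this line.
print(np.allclose(P1, P1.T), np.allclose(P2, P2.T))   # False True
```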
2.8 Definiteness

Definition 2.8.1. Quadratic form. If $x$ is a vector in $\mathbb{R}^n$ and $A$ is a matrix in $\mathbb{R}^{n \times n}$, then the scalar $x^TAx$ is known as a quadratic form in $x$. The matrix $A$ does not need to be symmetric, but any quadratic form $x^TAx$ can be expressed in terms of a symmetric matrix, since
$$x^TAx = (x^TAx + x^TA^Tx)/2 = x^T[(A + A^T)/2]x.$$
Thus, without loss of generality, the matrix associated with a quadratic form will be assumed symmetric.

Definition 2.8.2. Non-negative definite/positive semi-definite. A quadratic form $x^TAx$ and the corresponding matrix $A$ are non-negative definite if $x^TAx \geq 0$ for all $x \in \mathbb{R}^n$.

Definition 2.8.3. Positive definite. A quadratic form $x^TAx$ and the corresponding matrix $A$ are positive definite if $x^TAx > 0$ for all $x \in \mathbb{R}^n$, $x \neq 0$, and $x^TAx = 0$ only when $x = 0$.

Properties related to definiteness

1. Positive definite matrices are non-singular. The inverse of a positive definite matrix is also positive definite.
2. A symmetric matrix is positive (non-negative) definite iff all of its eigenvalues are positive (non-negative).
3. All diagonal elements, and hence the trace, of a positive definite matrix are positive.
4. If $A$ is symmetric positive definite, then there exists a non-singular matrix $Q$ such that $A = QQ^T$.
5. A symmetric projection matrix is always positive semi-definite.
6. If $A$ and $B$ are non-negative definite, then so is $A + B$. If, in addition, one of $A$ or $B$ is positive definite, then so is $A + B$.

2.9 Derivatives with respect to (and of) vectors

Definition 2.9.1. Derivative with respect to a vector. Let $f(a)$ be any scalar function of the vector $a_{n \times 1}$. Then the derivative of $f$ with respect to $a$ is defined as the vector
$$\frac{\delta f}{\delta a} = \begin{pmatrix} \frac{\delta f}{\delta a_1} \\ \frac{\delta f}{\delta a_2} \\ \vdots \\ \frac{\delta f}{\delta a_n} \end{pmatrix},$$
and the derivative with respect to $a^T$ is defined as
$$\frac{\delta f}{\delta a^T} = \left(\frac{\delta f}{\delta a}\right)^T.$$
The second derivative of $f$ with respect to $a$ is written as the derivative of each of the elements of $\frac{\delta f}{\delta a}$ with respect to $a^T$, stacked as the rows of an $n \times n$ matrix, i.e.,
$$\frac{\delta^2 f}{\delta a\,\delta a^T} = \frac{\delta}{\delta a^T}\left(\frac{\delta f}{\delta a}\right) = \begin{bmatrix} \frac{\delta^2 f}{\delta a_1^2} & \frac{\delta^2 f}{\delta a_1\delta a_2} & \ldots & \frac{\delta^2 f}{\delta a_1\delta a_n} \\ \frac{\delta^2 f}{\delta a_2\delta a_1} & \frac{\delta^2 f}{\delta a_2^2} & \ldots & \frac{\delta^2 f}{\delta a_2\delta a_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\delta^2 f}{\delta a_n\delta a_1} & \frac{\delta^2 f}{\delta a_n\delta a_2} & \ldots & \frac{\delta^2 f}{\delta a_n^2} \end{bmatrix}.$$

Example 2.9.1. Derivatives of linear and quadratic functions of a vector.
1. $\frac{\delta a^Tb}{\delta b} = a$.
2. $\frac{\delta b^TAb}{\delta b} = Ab + A^Tb$.

Derivatives with respect to matrices can be defined in a similar fashion. We will only remind ourselves of one result on matrix derivatives, which will come in handy when we talk about likelihood inference.

Lemma 2.9.2. If $A_{n \times n}$ is a symmetric non-singular matrix, then
$$\frac{\delta \ln|A|}{\delta A} = A^{-1}.$$

2.10 Problems

1. Are the following sets of vectors linearly independent? If not, in each case find at least one vector that is dependent on the others in the set.
(a) $v_1^T = (0, -1, 0)$, $v_2^T = (0, 0, 1)$, $v_3^T = (-1, 0, 0)$
(b) $v_1^T = (2, -2, 6)$, $v_2^T = (1, 1, 1)$
(c) $v_1^T = (2, 2, 0, -2)$, $v_2^T = (2, 0, 1, -1)$, $v_3^T = (0, -2, 1, 1)$

2. Show that a set of non-zero mutually orthogonal vectors $v_1, v_2, \ldots, v_n$ is linearly independent.

3. Find the determinant and inverse of each of the following patterned matrices (in both their small and general $n \times n$ forms):
(a) the compound-symmetry (equicorrelation) matrices
$$\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & \rho & \ldots & \rho \\ \rho & 1 & \ldots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \ldots & 1 \end{bmatrix}_{n \times n};$$
(b) the matrices with $(i, j)$th element $\rho^{|i-j|}$,
$$\begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & \rho & \ldots & \rho^{n-1} \\ \rho & 1 & \ldots & \rho^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \ldots & 1 \end{bmatrix}_{n \times n};$$
(c) the tridiagonal matrices
$$\begin{bmatrix} 1 & \rho & 0 \\ \rho & 1 & \rho \\ 0 & \rho & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & \rho & 0 & \ldots & 0 & 0 \\ \rho & 1 & \rho & \ldots & 0 & 0 \\ 0 & \rho & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & 1 & \rho \\ 0 & 0 & 0 & \ldots & \rho & 1 \end{bmatrix}_{n \times n}.$$

4. Find the rank of, and a basis for the null space of, the matrix
$$\begin{bmatrix} 1 & 2 & 1 & 3 & 1 \\ 1 & 0 & 1 & 1 & 2 \\ 2 & -1 & 1 & -2 & 3 \\ 0 & -1 & -1 & 2 & -1 \end{bmatrix}.$$

Chapter 3
Random Vectors and Multivariate Normal Distributions

3.1 Random vectors

Definition 3.1.1. Random vector. Random vectors are vectors of random variables. For instance,
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix},$$
where each element represents a random variable, is a random vector.
Definition 3.1.2. Mean and covariance matrix of a random vector. The mean (expectation) and covariance matrix of a random vector $X$ are defined as follows:
$$E[X] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix},$$
and
$$cov(X) = E\left[\{X - E(X)\}\{X - E(X)\}^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \ldots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \ldots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \ldots & \sigma_n^2 \end{bmatrix}, \quad (3.1.1)$$
where $\sigma_j^2 = var(X_j)$ and $\sigma_{jk} = cov(X_j, X_k)$ for $j, k = 1, 2, \ldots, n$.

Properties of mean and covariance.

1. If $X$ and $Y$ are random vectors and $A$, $B$, $C$, and $D$ are constant matrices, then
$$E[AXB + CY + D] = AE[X]B + CE[Y] + D. \quad (3.1.2)$$
Proof. Left as an exercise.

2. For any random vector $X$, the covariance matrix $cov(X)$ is symmetric.
Proof. Left as an exercise.

3. If $X_j$, $j = 1, 2, \ldots, n$, are independent random variables, then $cov(X) = diag(\sigma_j^2, j = 1, 2, \ldots, n)$.
Proof. Left as an exercise.

4. $cov(X + a) = cov(X)$ for a constant vector $a$.
Proof. Left as an exercise.

Properties of mean and covariance (cont.)

5. $cov(AX) = A\,cov(X)\,A^T$ for a constant matrix $A$.
Proof. Left as an exercise.

6. $cov(X)$ is positive semi-definite.
Proof. Left as an exercise.

7. $cov(X) = E[XX^T] - E[X]\{E[X]\}^T$.
Proof. Left as an exercise.

Definition 3.1.3. Correlation Matrix. The correlation matrix of a vector of random variables $X$ is defined as the matrix of pairwise correlations between the elements of $X$. Explicitly,
$$corr(X) = \begin{bmatrix} 1 & \rho_{12} & \ldots & \rho_{1n} \\ \rho_{21} & 1 & \ldots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \ldots & 1 \end{bmatrix}, \quad (3.1.3)$$
where $\rho_{jk} = corr(X_j, X_k) = \sigma_{jk}/(\sigma_j\sigma_k)$, $j, k = 1, 2, \ldots, n$.

Example 3.1.1. If only successive random variables in the random vector $X$ are correlated and have the same correlation $\rho$, then the correlation matrix $corr(X)$ is given by
$$corr(X) = \begin{bmatrix} 1 & \rho & 0 & \ldots & 0 \\ \rho & 1 & \rho & \ldots & 0 \\ 0 & \rho & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix}. \quad (3.1.4)$$

Example 3.1.2. If every pair of random variables in the random vector $X$ has the same correlation $\rho$, then the correlation matrix $corr(X)$ is given by
$$corr(X) = \begin{bmatrix} 1 & \rho & \rho & \ldots & \rho \\ \rho & 1 & \rho & \ldots & \rho \\ \rho & \rho & 1 & \ldots & \rho \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \rho & \ldots & 1 \end{bmatrix}, \quad (3.1.5)$$
and the random variables are said to be exchangeable.

3.2 Multivariate Normal Distribution

Definition 3.2.1. Multivariate normal distribution. A random vector $X = (X_1, X_2, \ldots, X_n)^T$ is said to follow a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if $X$ can be expressed as $X = AZ + \mu$, where $\Sigma = AA^T$ and $Z = (Z_1, Z_2, \ldots, Z_n)^T$ with $Z_i$, $i = 1, 2, \ldots, n$, i.i.d. $N(0, 1)$ variables.

Definition 3.2.2. Multivariate normal distribution. A random vector $X = (X_1, X_2, \ldots, X_n)^T$ is said to follow a multivariate normal distribution with mean $\mu$ and a positive definite covariance matrix $\Sigma$ if $X$ has the density
$$f_X(x) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right\}. \quad (3.2.1)$$

Figure: Bivariate normal density with mean $(0, 0)^T$ and covariance matrix $\begin{bmatrix} 0.25 & 0.3 \\ 0.3 & 1.0 \end{bmatrix}$.
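Definition 3.2.1 is constructive: it says a multivariate normal vector can be simulated by linearly transforming i.i.d. standard normals. The sketch below (not from the notes; the particular $\mu$ and $\Sigma$ are the ones from the bivariate figure above) takes $A$ to be the Cholesky factor of $\Sigma$ and checks that the simulated sample has roughly the intended mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[0.25, 0.3],
                  [0.3,  1.0]])

# Definition 3.2.1: X = A Z + mu with Sigma = A A^T; take A as the Cholesky factor.
A = np.linalg.cholesky(Sigma)
Z = rng.standard_normal(size=(100_000, 2))
X = Z @ A.T + mu

print(np.round(X.mean(axis=0), 2))              # close to (0, 0)
print(np.round(np.cov(X, rowvar=False), 2))     # close to Sigma
```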
Properties

1. The moment generating function of a $N(\mu, \Sigma)$ random vector $X$ is given by
$$M_X(t) = \exp\left\{\mu^Tt + \frac{1}{2}t^T\Sigma t\right\}. \quad (3.2.2)$$

2. $E(X) = \mu$ and $cov(X) = \Sigma$.

3. If $X_1, X_2, \ldots, X_n$ are i.i.d. $N(0, 1)$ random variables, then their joint distribution can be characterized by $X = (X_1, X_2, \ldots, X_n)^T \sim N(0, I_n)$.

4. $X \sim N_n(\mu, \Sigma)$ if and only if all non-zero linear combinations of the components of $X$ are normally distributed.

Linear transformation

5. If $X \sim N_n(\mu, \Sigma)$ and $A_{m \times n}$ is a constant matrix of rank $m$, then $Y = AX \sim N_m(A\mu, A\Sigma A^T)$.
Proof. Use Definition 3.2.1 or Property 1 above.

Orthogonal linear transformation

6. If $X \sim N_n(\mu, I_n)$ and $A_{n \times n}$ is an orthogonal matrix, then $Y = AX \sim N_n(A\mu, I_n)$.

Marginal and conditional distributions

Suppose $X$ is $N_n(\mu, \Sigma)$ and $X$ is partitioned as
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},$$
where $X_1$ is of dimension $p \times 1$ and $X_2$ is of dimension $(n - p) \times 1$. Suppose the corresponding partitions of $\mu$ and $\Sigma$ are given by
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
respectively. Then:

7. Marginal distribution. $X_1$ is multivariate normal, $N_p(\mu_1, \Sigma_{11})$.
Proof. Use the result from Property 5 above.

8. Conditional distribution. The distribution of $X_1|X_2$ is $p$-variate normal, $N_p(\mu_{1|2}, \Sigma_{1|2})$, where
$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2), \quad \text{and} \quad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$
provided $\Sigma$ is positive definite.
Proof. See Result 5.2.10, page 156 (Ravishanker and Dey).

Uncorrelated implies independent for multivariate normal random variables

9. If $X$, $\mu$, and $\Sigma$ are partitioned as above, then $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0 = \Sigma_{21}^T$.
Proof. We will use the m.g.f. to prove this result. Two random vectors $X_1$ and $X_2$ are independent iff $M_{(X_1, X_2)}(t_1, t_2) = M_{X_1}(t_1)M_{X_2}(t_2)$.

3.3 Non-central distributions

We will start with the standard chi-square distribution.

Definition 3.3.1. Chi-square distribution. If $X_1, X_2, \ldots, X_n$ are $n$ independent $N(0, 1)$ variables, then the distribution of $\sum_{i=1}^{n}X_i^2$ is $\chi_n^2$ (chi-square with $n$ degrees of freedom).

The $\chi_n^2$ distribution is a special case of the gamma distribution with the scale parameter set to 1/2 and the shape parameter set to $n/2$. That is, the density of $\chi_n^2$ is given by
$$f_{\chi_n^2}(x) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}e^{-x/2}x^{n/2-1}, \quad x \geq 0;\ n = 1, 2, \ldots. \quad (3.3.1)$$

Example 3.3.1. The distribution of $(n-1)S^2/\sigma^2$, where $S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$ is the sample variance of a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$, is $\chi_{n-1}^2$.

The moment generating function of a chi-square distribution with $n$ d.f. is given by
$$M_{\chi_n^2}(t) = (1 - 2t)^{-n/2}, \quad t < 1/2. \quad (3.3.2)$$
The m.g.f. (3.3.2) shows that the sum of two independent chi-square random variables is also chi-square. Therefore, differences of sequential sums of squares of independent normal random variables will be distributed independently as chi-squares.

Theorem 3.3.2. If $X \sim N_n(\mu, \Sigma)$ and $\Sigma$ is positive definite, then
$$(X - \mu)^T\Sigma^{-1}(X - \mu) \sim \chi_n^2. \quad (3.3.3)$$
Proof. Since $\Sigma$ is positive definite, there exists a non-singular $A_{n \times n}$ such that $\Sigma = AA^T$ (Cholesky decomposition). Then, by the definition of the multivariate normal distribution, $X = AZ + \mu$, where $Z$ is a random sample from a $N(0, 1)$ distribution. Now, ...
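Although the formal argument is completed in class, Theorem 3.3.2 can be checked quickly by Monte Carlo. The sketch below (not part of the notes; $n$, $\mu$, and $\Sigma$ are arbitrary) simulates the quadratic form and compares its sample mean and variance with the $\chi_n^2$ values $n$ and $2n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
mu = np.array([1., 0., 2.])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.5],
                  [0.0, 0.5, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
diff = X - mu
Sigma_inv = np.linalg.inv(Sigma)
d = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)   # (X - mu)' Sigma^{-1} (X - mu)

# A chi-square with n df has mean n and variance 2n (Theorem 3.3.2)
print(round(d.mean(), 2), round(d.var(), 2))          # close to 3 and 6
```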
Definition 3.3.2. Non-central chi-square distribution. Suppose the $X_i$'s are as in Definition 3.3.1 except that each $X_i$ has mean $\mu_i$, $i = 1, 2, \ldots, n$. Equivalently, suppose $X = (X_1, \ldots, X_n)^T$ is a random vector distributed as $N_n(\mu, I_n)$, where $\mu = (\mu_1, \ldots, \mu_n)^T$. Then the distribution of $\sum_{i=1}^{n}X_i^2 = X^TX$ is referred to as non-central chi-square with d.f. $n$ and non-centrality parameter $\lambda = \sum_{i=1}^{n}\mu_i^2/2 = \frac{1}{2}\mu^T\mu$. The density of such a non-central chi-square variable $\chi_n^2(\lambda)$ can be written as an infinite Poisson mixture of central chi-square densities as follows:
$$f_{\chi_n^2(\lambda)}(x) = \sum_{j=0}^{\infty}\frac{e^{-\lambda}\lambda^j}{j!}\,\frac{(1/2)^{(n+2j)/2}}{\Gamma((n+2j)/2)}e^{-x/2}x^{(n+2j)/2-1}. \quad (3.3.4)$$

Figure 3.1: Non-central chi-square densities with df 5 and non-centrality parameter $\lambda = 0, 2, 4, 6, 8, 10$.

Properties

1. The moment generating function of a non-central chi-square variable $\chi_n^2(\lambda)$ is given by
$$M_{\chi_n^2(\lambda)}(t) = (1 - 2t)^{-n/2}\exp\left\{\frac{2\lambda t}{1 - 2t}\right\}, \quad t < 1/2. \quad (3.3.5)$$
2. $E[\chi_n^2(\lambda)] = n + 2\lambda$.
3. $Var[\chi_n^2(\lambda)] = 2(n + 4\lambda)$.
4. $\chi_n^2(0) \equiv \chi_n^2$.
5. For a given constant $c$,
(a) $P(\chi_n^2(\lambda) > c)$ is an increasing function of $\lambda$.
(b) $P(\chi_n^2(\lambda) > c) \geq P(\chi_n^2 > c)$.

Theorem 3.3.3. If $X \sim N_n(\mu, \Sigma)$ and $\Sigma$ is positive definite, then
$$X^T\Sigma^{-1}X \sim \chi_n^2(\lambda = \mu^T\Sigma^{-1}\mu/2). \quad (3.3.6)$$
Proof. Since $\Sigma$ is positive definite, there exists a non-singular matrix $A_{n \times n}$ such that $\Sigma = AA^T$ (Cholesky decomposition). Define $Y = \{A^T\}^{-1}X$. Then, ...

Definition 3.3.3. Non-central $F$-distribution. If $U_1 \sim \chi_{n_1}^2(\lambda)$ and $U_2 \sim \chi_{n_2}^2$, and $U_1$ and $U_2$ are independent, then the distribution of
$$F = \frac{U_1/n_1}{U_2/n_2} \quad (3.3.7)$$
is referred to as the non-central $F$-distribution with df $n_1$ and $n_2$ and non-centrality parameter $\lambda$.

Figure 3.2: Non-central F-densities with df 5 and 15 and non-centrality parameter $\lambda = 0, 2, 4, 6, 8, 10$.

Figure 3.3: Non-central t-densities with df 5 and non-centrality parameter $\lambda = 0, 2, 4, 6, 8, 10$.

Definition 3.3.4. Non-central $t$-distribution. If $U_1 \sim N(\lambda, 1)$ and $U_2 \sim \chi_n^2$, and $U_1$ and $U_2$ are independent, then the distribution of
$$T = \frac{U_1}{\sqrt{U_2/n}} \quad (3.3.8)$$
is referred to as the non-central $t$-distribution with df $n$ and non-centrality parameter $\lambda$.

3.4 Distribution of quadratic forms

Caution: We assume that the matrix of a quadratic form is symmetric.

Lemma 3.4.1. If $A_{n \times n}$ is symmetric and idempotent with rank $r$, then $r$ of its eigenvalues are exactly equal to 1 and $n - r$ are equal to zero.
Proof. Use the spectral decomposition theorem. (See Result 2.3.10 on page 51 of Ravishanker and Dey.)

Theorem 3.4.2. Let $X \sim N_n(0, I_n)$. The quadratic form $X^TAX \sim \chi_r^2$ iff $A$ is idempotent with $rank(A) = r$.

Proof. Let $A$ be a (symmetric) idempotent matrix of rank $r$. Then, by the spectral decomposition theorem, there exists an orthogonal matrix $P$ such that
$$P^TAP = \Lambda = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}. \quad (3.4.1)$$
Define
$$Y = P^TX = \begin{bmatrix} P_1^TX \\ P_2^TX \end{bmatrix} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix},$$
so that $P_1^TP_1 = I_r$. Thus $X = PY$ and $Y_1 \sim N_r(0, I_r)$. Now,
$$X^TAX = (PY)^TA(PY) = Y^T\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}Y = Y_1^TY_1 \sim \chi_r^2. \quad (3.4.2)$$
Now suppose $X^TAX \sim \chi_r^2$. This means that the moment generating function of $X^TAX$ is given by
$$M_{X^TAX}(t) = (1 - 2t)^{-r/2}. \quad (3.4.3)$$
But one can calculate the m.g.f. of $X^TAX$ directly using the multivariate normal density as
$$M_{X^TAX}(t) = E\left[e^{(X^TAX)t}\right] = \int e^{(x^TAx)t}\frac{1}{(2\pi)^{n/2}}e^{-\frac{1}{2}x^Tx}\,dx = \int \frac{1}{(2\pi)^{n/2}}\exp\left\{-\frac{1}{2}x^T(I_n - 2tA)x\right\}dx = |I_n - 2tA|^{-1/2} = \prod_{i=1}^{n}(1 - 2t\lambda_i)^{-1/2}. \quad (3.4.4)$$
Equating (3.4.3) and (3.4.4) yields the desired result.

Theorem 3.4.3. Let $X \sim N_n(\mu, \Sigma)$, where $\Sigma$ is positive definite. The quadratic form $X^TAX \sim \chi_r^2(\lambda)$, where $\lambda = \mu^TA\mu/2$, iff $A\Sigma$ is idempotent with $rank(A\Sigma) = r$.
Proof. Omitted.

Theorem 3.4.4. Independence of two quadratic forms. Let $X \sim N_n(\mu, \Sigma)$, where $\Sigma$ is positive definite. The two quadratic forms $X^TAX$ and $X^TBX$ are independent if and only if
$$A\Sigma B = 0 = B\Sigma A. \quad (3.4.5)$$
Proof. Omitted.

Remark 3.4.1. Note that in the above theorem, the two quadratic forms need not have chi-square distributions. When they do, the theorem is referred to as Craig's theorem.

Theorem 3.4.5. Independence of linear and quadratic forms. Let $X \sim N_n(\mu, \Sigma)$, where $\Sigma$ is positive definite. The quadratic form $X^TAX$ and the linear form $BX$ are independently distributed if and only if
$$B\Sigma A = 0. \quad (3.4.6)$$
Proof. Omitted.

Remark 3.4.2. Note that in the above theorem, the quadratic form need not have a chi-square distribution.

Example 3.4.6. Independence of the sample mean and sample variance. Suppose $X \sim N_n(0, I_n)$. Then $\bar{X} = \sum_{i=1}^{n}X_i/n = 1^TX/n$ and $S_X^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$ are independently distributed.
Proof.

Theorem 3.4.7. Let $X \sim N_n(\mu, \Sigma)$. Then
$$E[X^TAX] = \mu^TA\mu + trace(A\Sigma). \quad (3.4.7)$$
Remark 3.4.3. Note that in the above theorem, the quadratic form need not have a chi-square distribution.
Proof. (A numerical illustration of (3.4.7) is given at the end of this section.)

Theorem 3.4.8. Fisher-Cochran theorem. Suppose $X \sim N_n(\mu, I_n)$. Let $Q_j = X^TA_jX$, $j = 1, 2, \ldots, k$, be $k$ quadratic forms with $rank(A_j) = r_j$ such that $X^TX = \sum_{j=1}^{k}Q_j$. Then the $Q_j$'s are independently distributed as $\chi_{r_j}^2(\lambda_j)$, where $\lambda_j = \mu^TA_j\mu/2$, if and only if $\sum_{j=1}^{k}r_j = n$.
Proof. Omitted.

Theorem 3.4.9. Generalization of the Fisher-Cochran theorem. Suppose $X \sim N_n(\mu, \Sigma)$. Let $A_j$, $j = 1, 2, \ldots, k$, be $k$ $n \times n$ symmetric matrices with $rank(A_j) = r_j$ such that $A = \sum_{j=1}^{k}A_j$ with $rank(A) = r$. Then,
1. the $X^TA_jX$'s are independently distributed as $\chi_{r_j}^2(\lambda_j)$, where $\lambda_j = \mu^TA_j\mu/2$, and
2. $X^TAX \sim \chi_r^2(\lambda)$, where $\lambda = \sum_{j=1}^{k}\lambda_j$,
if and only if any one of the following conditions is satisfied:
C1. $A_j\Sigma$ is idempotent for all $j$ and $A_j\Sigma A_k = 0$ for all $j < k$.
C2. $A_j\Sigma$ is idempotent for all $j$ and $A\Sigma$ is idempotent.
C3. $A_j\Sigma A_k = 0$ for all $j < k$ and $A\Sigma$ is idempotent.
C4. $r = \sum_{j=1}^{k}r_j$ and $A\Sigma$ is idempotent.
C5. The matrices $A\Sigma$ and $A_j\Sigma$, $j = 1, 2, \ldots, k - 1$, are idempotent and $A_k\Sigma$ is non-negative definite.
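As a numerical companion to Theorem 3.4.7 (a sketch, not part of the notes): take the centering matrix $A = I_n - J_{nn}/n$, for which $X^TAX = \sum_i(X_i - \bar{X})^2$, and compare a Monte Carlo estimate of $E[X^TAX]$ with $\mu^TA\mu + trace(A\Sigma)$. The chosen $n$, $\mu$, and $\Sigma$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
mu = np.array([1., 2., 0., -1.])
Sigma = 0.5 * np.eye(n) + 0.5           # exchangeable covariance: 1 on diagonal, 0.5 off

A = np.eye(n) - np.ones((n, n)) / n     # centering matrix, so x'Ax = sum_i (x_i - xbar)^2

# Theoretical value from Theorem 3.4.7
theory = mu @ A @ mu + np.trace(A @ Sigma)

# Monte Carlo estimate of E[X'AX]
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Xc = X - X.mean(axis=1, keepdims=True)
mc = (Xc ** 2).sum(axis=1).mean()

print(round(theory, 3), round(mc, 3))   # the two values should agree closely
```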
(a) Consider the quadratic form: Q= Chapter 3 (Y1 − Y2 )2 + (Y2 − Y3 )2 + (Y3 − Y1 )2 . 3 (3.5.1) 104 BIOS 2083 Linear Models Abdus S. Wahed Write Q as Y T AY where A is symmetric. Is A idempotent? What is the distribution of Q/σ 2 ? Find E(Q). (b) What is the distribution of L = Y1 + Y2 + Y3 ? Find E(L) and V ar(L). (c) Are Q and L independent? Find E(Q/L2 ) 6. Write each of the following quadratic forms in XT AX form: (a) 1 X2 6 1 + 23 X22 + 16 X32 − 23 X1 X2 + 13 X1 X3 − 23 X2 X3 (b) nX̄ 2 , where X̄ = (X1 + X2 + . . . + Xn )/n. n 2 (c) i=1 Xi 2 n (d) i=1 Xi − X̄ 2 2 2 i2 (e) − X̄ , where X̄i. = Xi1 +X X ij i. i=1 j=1 2 2 21 +X22 (f) 2 2i=1 X̄i. − X̄.. , where X̄.. = X11 +X12 +X . 4 2 2 12 (g) 2 X̄1. − X̄.. + 3 X̄2. − X̄.. , where X̄1. = X11 +X ,X̄2. = 2 X21 +X22 +X23 , 3 X̄.. = 2X̄1. +3X̄2. . 5 In each case, determine if A is idempotent. If A is idempotent, find rank(A). ⎛ ⎞ ⎛ ⎞ μ1 1 0.5 ⎠, and Σ = ⎝ ⎠. Show that Q1 = 7. Let X ∼ N2 (μ, Σ), where μ = ⎝ 0.5 1 μ2 2 2 (X1 − X2 ) and Q2 = (X1 + X2 ) are independently distributed. Find the distribution of Q1 , Q2 , and Q2 . 3Q1 8. Assume that Y ∼ N3 (0, I3 ). Define Q1 = Y T AY and Q2 = Y T BY , where ⎛ ⎞ ⎛ ⎞ 1 1 0 1 −1 0 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ A = ⎜ 1 1 0 ⎟ , and, B = ⎜ −1 1 0 ⎟. ⎝ ⎠ ⎝ ⎠ 0 0 1 0 0 0 Chapter 3 (3.5.2) 105 BIOS 2083 Linear Models Abdus S. Wahed Are Q1 and Q2 independent? Do Q1 and Q2 follow χ2 distribution? 9. Let Y ∼ N3 (0, I3 ). Let U1 = Y T A1 Y , U2 = Y T A2 Y , and V = BY where ⎛ ⎛ ⎞ ⎞ 1/2 1/2 0 1/2 −1/2 0 ⎜ ⎜ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ A1 = ⎜ 1/2 1/2 0 ⎟ , A2 = ⎜ −1/2 1/2 0 ⎟ , and, B = 1 1 0 ⎝ ⎝ ⎠ ⎠ 0 0 1 0 0 0 . (a) Are U1 and U2 independent? (b) Are U1 and V independent? (c) Are U2 and V independent? (d) Find the distribution of V . (e) Find the distribution of U2 . U1 (Include specific values for any parameters of the distribution.) 10. Suppose X = (X1 , X2 , X3 )T ∼ N3 (μ, σ 2 V ), where, μ = (1, 1, 0)T , and ⎡ ⎤ 1 0.5 0 ⎢ ⎥ ⎢ ⎥ V = ⎢ 0.5 1 0.5 ⎥ ⎣ ⎦ 0 0.5 1 (a) What is the joint distribution of (X1 , X3 )? (b) What is the conditional distribution of X1 , X3 |X2 ? (c) For what value or values of a does L = (aX12 + 12 X32 + √ 2aX1 X3 )/σ 2 follow a chi-square distribution? (d) Find the value of b for which L in (c) and M = X1 + bX3 are independently distributed. Chapter 3 106 BIOS 2083 Linear Models 11. Suppose X ∼ N3 (μ, Σ), where ⎛ μ ⎜ 1 ⎜ μ = ⎜ μ2 ⎝ μ3 Find the distribution of Q = ⎞ ⎛ Abdus S. Wahed σ12 ⎟ ⎜ ⎟ ⎜ ⎟ , and Σ = ⎜ 0 ⎠ ⎝ 0 3 Xi2 i=1 σi2 . ⎞ 0 σ22 0 0 ⎟ ⎟ 0 ⎟. ⎠ 2 σ3 Express the parameters of its distribution in terms of μi and σi2 , i = 1, 2, 3. What is the variance of Q? 12. Suppose X ∼ N(0, 1) and Y = UX, where U follows a uniform distribution on the discrete space {−1, 1} independently of X. (a) Find E(Y ) and cov(X, Y ). (b) Show that Y and X are not independent. 13. Suppose X ∼ N4 (μ, I4), where ⎞ ⎛ X ⎜ 11 ⎜ ⎜ X12 X=⎜ ⎜ ⎜ X21 ⎝ X22 ⎛ ⎞ α + a1 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ α + a1 ⎟,μ = ⎜ ⎟ ⎜ ⎟ ⎜ α + a2 ⎠ ⎝ α + a2 ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎠ 2 i2 X − X̄ , where X̄i. = Xi1 +X . ij i. i=1 j=1 2 2 21 +X22 (b) Find the distribution of Q = 2 2i=1 X̄i. − X̄.. , where X̄.. = X11 +X12 +X . 4 (a) Find the distribution of E = 2 2 (c) Use Fisher-Cochran theorem to prove that E and Q are independently distributed. (d) What is the distribution of Q/E? Chapter 3 107 107 BIOS 2083 c Abdus S. Wahed Linear Models Chapter 4 General linear model: the least squares problem 4.1 Least squares (LS) problem As observed in Chapter 1, any linear model can be expressed in the form ⎛ Y ⎞ ⎡ X ⎜ Y1 ⎟ ⎢ x11 x12 . . . x1p ⎜ ⎟ ⎢ ⎜ Y2 ⎟ ⎢ x21 x22 . . . x2p ⎜ ⎟ ⎢ ⎜ ⎟=⎢ ... ... ... 
⎜ · · · ⎟ ⎢ ... ⎜ ⎟ ⎢ ⎝ ⎠ ⎣ xn1 xn2 . . . xnp Yn Chapter 4 ⎤⎛ β ⎞ ⎛ ⎞ ⎥⎜ β1 ⎟ ⎜ 1 ⎟ ⎥⎜ ⎟ ⎜ ⎟ ⎥⎜ β2 ⎟ ⎜ 2 ⎟ ⎥⎜ ⎟ ⎜ ⎟ ⎥⎜ ⎟+⎜ ⎟. ⎥⎜ · · · ⎟ ⎜ · · · ⎟ ⎥⎜ ⎟ ⎜ ⎟ ⎦⎝ ⎠ ⎝ ⎠ βp n (4.1.1) 108 BIOS 2083 Linear Models c Abdus S. Wahed Usually X is a matrix of known constants representing the values of covariates, and Y is the vector of response and is an error vector with the assumption that E(|X) = 0. The goal is to find a value of β for which Xβ is a “close” approximation of Y. In statistical terms, one would like to estimate β such that the “distance” between Y and Xβ is minimum. One form of distance in real vector spaces is given by the length of the difference between two vectors Y and Xβ, namely, Y − Xβ2 = (Y − Xβ)T (Y − Xβ). (4.1.2) Note that for a given β, both Y and Xβ are vectors in Rn. In addition, Xβ is always a member of C(X). Thus, for given Y and X, the least squares problem can be characterized as a restricted minimization problem: Minimize Y − Xβ2 over β ∈ Rn. Or equivalently, Minimize Y − θ2 over θ ∈ C(X). Chapter 4 109 BIOS 2083 4.2 Linear Models c Abdus S. Wahed Solution to the LS problem Since θ belongs to the C(X), the value of θ that minimizes the distance between Y and θ is given by the orthogonal projection of Y onto the column space of X (see a formal proof below). Let Ŷ = Xβ̂ ∈ C(X) (4.2.1) is the orthogonal projection of Y onto the C(X). Then, since N (XT ) = C(X)⊥, one can write Y = Ŷ + e, (4.2.2) Y − Ŷ ∈ N (XT ). (4.2.3) where e ∈ N (XT ). Thus, Lemma 4.2.1. For any θ ∈ C(X), (Y − Ŷ)T (Ŷ − θ) = 0. (4.2.4) Proof. Chapter 4 110 BIOS 2083 Linear Models c Abdus S. Wahed Lemma 4.2.2. Y − θ2 is minimized when θ = Ŷ. Proof. Y − θ2 = (Y − θ)T (Y − θ) = (Y − Ŷ + (Ŷ − θ))T (Y − Ŷ + (Ŷ − θ)) = (Y − Ŷ)T (Y − Ŷ) + (Ŷ − θ)T (Ŷ − θ) = Y − Ŷ2 + Ŷ − θ2, (4.2.5) which is minimized when θ = Ŷ. Thus, we have figured out that Y − Xβ2 is minimum when β = β̂ is such that Ŷ = Xβ̂ is the orthogonal projection of Y onto the column space of X. But how do we find the orthogonal projection? Chapter 4 111 BIOS 2083 Linear Models c Abdus S. Wahed Normal equations Notice from the result in (4.2.3) that Y − Ŷ ∈ N (XT ) =⇒ XT (Y − Ŷ) = 0 =⇒ XT (Y − Xβ̂) = 0 =⇒ XT Y = XT Xβ̂ (4.2.6) Equation (4.2.6) is referred to as normal equations; solution of which, if exists will lead us to the orthogonal projection. Example 4.2.3. Example 1.1.3 (continued). The linear model in matrix form can be written as Y X ⎛ ⎞ ⎡ ⎜ Y1 ⎟ ⎢ 1 w1 ⎜ ⎟ ⎢ ⎜ Y2 ⎟ ⎢ 1 w2 ⎜ ⎟ ⎢ ⎜ ⎟=⎢ ⎜ · · · ⎟ ⎢ ... ... ⎜ ⎟ ⎢ ⎝ ⎠ ⎣ 1 wn Yn Chapter 4 ⎛ ⎞ β ⎥ ⎛ ⎞ ⎜ 1 ⎟ ⎜ ⎥ ⎟ ⎜ 2 ⎟ ⎥ α ⎥⎝ ⎠ ⎜ ⎟ +⎜ ⎥ ⎟. ⎜ ··· ⎟ ⎥ β ⎜ ⎥ ⎟ ⎝ ⎦ ⎠ n ⎤ (4.2.7) 112 BIOS 2083 Linear Models Here, ⎡ c Abdus S. Wahed ⎤ n wi ⎦ XT X = ⎣ 2 , wi wi and ⎞ ⎛ Yi T ⎠ ⎝ X Y= wi Yi The normal equations are then calculated as (4.2.8) (4.2.9) ⎫ ⎬ αn + β wi = Yi α wi + β wi Yi = wi Yi ⎭ (4.2.10) From the linear regression course, you know that, the solution to these normal equations is given by β̂ = (w Ȳ ) i −w̄)(Yi − (wi −w̄)2 α̂ = Ȳ − β̂ w̄, provided Chapter 4 ⎫ ⎬ ⎭ (4.2.11) (wi − w̄)2 > 0. 113 BIOS 2083 c Abdus S. Wahed Linear Models Example 4.2.4. Example 1.1.7 (continued). The linear model in matrix form can be written as β Y ⎞ ⎡ ⎛ ⎜ Y1 ⎟ ⎢ 1n1 1n1 ⎟ ⎢ ⎜ ⎜ Y2 ⎟ ⎢ 1 n 0 n ⎟ ⎢ 2 ⎜ 2 ⎟=⎢ ⎜ ... ⎜ · · · ⎟ ⎢ ... ⎟ ⎢ ⎜ ⎠ ⎣ ⎝ 1na 0na Ya ⎛ ⎞ X ⎤ ⎞ ⎛ μ ⎟ ⎜ 0n1 . . . 0n1 ⎥ ⎜ ⎟ ⎜ 1 ⎟ ⎜ ⎥ ⎜ α1 ⎟ ⎟ ⎟ ⎜ ⎥ ⎜ ⎟ ⎜ 2 ⎟ 1n2 . . . 0n2 ⎥ ⎜ ⎟ ⎟+⎜ ⎥⎜ ⎟, α 2 ⎟ ... ⎥ ⎜ ... ... ⎜ ⎟ ⎜ ··· ⎟ ⎥⎜ ⎟ ... ⎟ ⎝ ⎦⎜ ⎠ ⎟ ⎜ ⎠ a 0na . . . 1na ⎝ αa (4.2.12) where Yi = (Yi1, Yi2, . . . , Yini )T and i = (i1, i2, . . . 
, ini )T for i = 1, 2, . . . , a. Here, ⎡ ⎢ n ⎢ ⎢ n1 ⎢ T X X=⎢ ⎢ ... ⎢ ⎣ na Chapter 4 ⎤ n1 n2 . . . na ⎥ ⎥ n1 0 . . . 0 ⎥ ⎥ ⎥, ... ... ... ... ⎥ ⎥ ⎦ 0 0 . . . na (4.2.13) 114 BIOS 2083 and c Abdus S. Wahed Linear Models ⎛ ⎞ ⎜ i j Yij ⎜ n 1 ⎜ j Y1j ⎜ ⎜ n T 2 X Y=⎜ ⎜ j Y2j ⎜ ⎜ ... ⎜ ⎝ na j Yaj ⎞ ⎛ ⎟ ⎜ Y.. ⎟ ⎜ ⎟ ⎜ Y1. ⎟ ⎜ ⎟ ⎜ ⎟=⎜Y ⎟ ⎜ 2. ⎟ ⎜ ⎟ ⎜ ... ⎟ ⎜ ⎠ ⎝ Ya. ⎞ ⎛ ⎟ ⎜ nȲ.. ⎟ ⎟ ⎜ ⎟ ⎟ ⎜ n1Ȳ1. ⎟ ⎟ ⎜ ⎟ ⎟ ⎜ ⎟ ⎟ = ⎜ n Ȳ ⎟ ⎟ ⎜ 2 2. ⎟ ⎟ ⎜ ⎟ ⎟ ⎜ ... ⎟ ⎟ ⎜ ⎟ ⎠ ⎝ ⎠ naȲa. The normal equations are then calculated as = nȲ.. ⎫ ⎬ niμ + niαi = niȲi., i = 1, 2, . . . , a. ⎭ nμ + a i=1 ni αi Two solutions to this set of normal equations is given by ⎫ ⎬ μ̂(1) = 0 (1) ⎭ α̂ = Ȳ , i = 1, 2, . . . , a, i Chapter 4 (4.2.15) (4.2.16) i. μ̂(2) = Ȳ.. ⎫ ⎬ = Ȳi. − Ȳ.., i = 1, 2, . . . , a. ⎭ and (2) α̂i (4.2.14) (4.2.17) 115 BIOS 2083 Linear Models c Abdus S. Wahed Solutions to the normal equations In Example 4.2.3, the normal equations have a unique solutions, whereas in Example 4.2.4, there are more than one (in fact, infinitely many) solutions. Are normal equations always consistent? If we closely look at the normal equations (4.2.6) XT Xβ = XT Y, (4.2.18) we see that if XT X is non-singular, then there exists a unique solution to the normal equations, namely, β̂ = (XT X)−1XT Y, (4.2.19) which is the case for the simple linear regression in Example 4.2.3, or more generally for any linear regression problem (multiple, polynomial). Chapter 4 116 BIOS 2083 Linear Models c Abdus S. Wahed Theorem 4.2.5. Normal equations (4.2.6) are always consistent. Proof. From Chapter 2, Page 64, a system of equations Ax = b is consistent iff b ∈ C(A). Thus, in our case, we need to show that, XT Y ∈ C(XT X). (4.2.20) Now, XT Y ∈ C(XT ). If we can show that C(XT ) ⊆ C(XT X), then the result is established. Let us look at the following lemma first: Lemma 4.2.6. N (XT X) = N (X). Proof. . If a ∈ N (XT X), then XT Xa = 0 =⇒ aT XT Xa = 0 =⇒ Xa2 = 0 =⇒ Xa = 0 =⇒ a ∈ N (X). (4.2.21) On the other hand, if a ∈ N (X), then Xa = 0, and hence XT Xa = 0 which implies that a ∈ N (XT X), which completes the proof. Chapter 4 117 BIOS 2083 c Abdus S. Wahed Linear Models Now, from the above lemma, and from the result stated in chapter 2, Page 54, and Theorem 2.3.2 on Page 53 , N ⊥ (XT X) = N ⊥ (X) =⇒ C(XT X) = C(XT ), (4.2.22) which completes the proof. Least squares estimator The above theorem shows that the normal equations are always consistent. Using a g-inverse of XT X, we can write out all possible solutions of the normal equations. Namely, β̂ = (X X) X Y + I − (X X) X X c T g T T g T (4.2.23) gives all possible solution to the normal equations (4.2.6) for arbitrary vector c. The estimator β̂ is known as a least squares estimator of β for a given c. Note that one could write all possible solutions using the arbitrariness of the g-inverse of XT X. Chapter 4 118 BIOS 2083 Linear Models c Abdus S. Wahed We know that the orthogonal projection Ŷ of Y onto C(X) is unique. However, the solutions to the normal equations are not. Does any solution of the normal equation lead to the orthogonal projection? In fact, it does. Specifically, if βˆ1 and βˆ2 are any two solutions to the normal equations, then Xβˆ1 = Xβˆ2. (4.2.24) Projection and projection matrix From the equation (4.2.23), the projection of Y onto the column space C(X) is given by the prediction vector Ŷ = Xβ̂ = X(XT X)g XT Y = PY, (4.2.25) where P = X(XT X)g XT is the projection matrix. A very useful lemma: Lemma 4.2.7. XT XA = XT XB if and only if XA = XB for any two matrices A and B. 
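Before the properties of the projection matrix are verified algebraically (Proposition 4.2.8 below), a quick numerical check can be reassuring. The sketch below is an illustrative aside, not part of the notes; the design, the simulated data, and the two generalized inverses are arbitrary choices, with numpy's Moore-Penrose pseudoinverse serving as one convenient g-inverse and a "zero out the intercept row and column" matrix (the form used for the one-way ANOVA example in these notes) as another. It confirms that $P = X(X^TX)^gX^T$ is symmetric, idempotent, and invariant to the choice of g-inverse, and that any solution of the normal equations reproduces the same fitted vector $X\hat{\beta} = PY$.

import numpy as np

# Overparameterized one-way ANOVA design, a = 2 treatments, 3 replicates each:
# columns are (intercept, treatment-1 indicator, treatment-2 indicator).
X = np.array([[1, 1, 0]] * 3 + [[1, 0, 1]] * 3, dtype=float)
rng = np.random.default_rng(1)
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(6)

XtX = X.T @ X

# Two generalized inverses of X'X.
G1 = np.linalg.pinv(XtX)                                   # Moore-Penrose inverse
G2 = np.array([[0, 0, 0], [0, 1/3, 0], [0, 0, 1/3]])       # "zeroed intercept" g-inverse
for G in (G1, G2):
    assert np.allclose(XtX @ G @ XtX, XtX)                 # each is a g-inverse of X'X

P1 = X @ G1 @ X.T
P2 = X @ G2 @ X.T
print(np.allclose(P1, P2))                                 # P is invariant to the g-inverse
print(np.allclose(P1, P1 @ P1), np.allclose(P1, P1.T))     # idempotent and symmetric

beta_hat = G2 @ X.T @ Y                                    # one solution of the normal equations
print(np.allclose(XtX @ beta_hat, X.T @ Y))                # satisfies X'X b = X'Y
print(np.allclose(X @ beta_hat, P1 @ Y))                   # X beta_hat equals the projection PY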
Chapter 4 119 BIOS 2083 Linear Models c Abdus S. Wahed Proposition 4.2.8. Verify (algebraically) the following results: 1. P = X(XT X)g XT is idempotent. 2. P is invariant to the choice of the g-inverse (XT X)g . 3. P is symmetric.(Note (XT X)g does not need to be symmetric). Chapter 4 120 BIOS 2083 Linear Models c Abdus S. Wahed Proposition 4.2.9. If P = X(XT X)g XT is the orthogonal projection onto the column space of XT , then show that XT P = XT . (4.2.26) rank(P) = rank(X). (4.2.27) and Chapter 4 121 BIOS 2083 Linear Models c Abdus S. Wahed Residual vector Definition 4.2.1. The vector e = Y − Ŷ is known to be the residual vector. Notice, e = Y − Ŷ = (In − P)Y, (4.2.28) and Y can be decomposed into two orthogonal components, Y = Ŷ + e, (4.2.29) Ŷ = PY belonging to the column space of X and e = (In − P)Y belonging to N (XT ). Example 4.2.10. Show that Ŷ and e are uncorrelated when the elements of Y are independent with equal variance. Chapter 4 122 BIOS 2083 Linear Models c Abdus S. Wahed Proof. Let cov(Y) = σ 2In. Then, E(ŶeT ) = E(PYYT (In − P)) = PE(YYT )(In − P) = P σ 2In + Xββ T XT (In − P) = σ 2P(In − P) = 0. (4.2.30) Also, E [e] = 0. Together we get, cov(Ŷ, e) = 0. Chapter 4 123 BIOS 2083 Linear Models c Abdus S. Wahed Example 4.2.11. For the simple linear regression problem in exam ple (4.2.3), we find that rank(XT X) = 2, provided (wi − w̄)2 > 0. Then, (XT X)−1 ⎤ ⎡ wi2 − wi 1 ⎦. ⎣ = 2 n (wi − w̄) − wi n (4.2.31) Recall the XT Y matrix, ⎞ ⎛ Yi T ⎠, ⎝ X Y= wiYi (4.2.32) leading to the least squares estimator β̂ = (XT X)−1XT Y ⎡ ⎞ ⎤⎛ 2 wi − wi Yi 1 ⎣ ⎠ ⎦⎝ = 2 n (wi − w̄) wiYi − wi n ⎛ ⎞ Yi wi2 − wi Yi wi 1 ⎝ ⎠ = 2 n (wi − w̄) n wi Yi − wi Yi ⎞ ⎛ Ȳ − β̂ w̄ = α̂ ? ⎠. (4.2.33) = ⎝ n wi Yi − wi Yi = β̂ n (w −w̄)2 i Chapter 4 124 BIOS 2083 c Abdus S. Wahed Linear Models Example 4.2.12. For the one-way ANOVA model in Example (4.2.4), ⎤ ⎡ ⎢ n ⎢ ⎢ n1 ⎢ T X X=⎢ ⎢ ... ⎢ ⎣ na n1 n2 . . . na ⎥ ⎥ n1 0 . . . 0 ⎥ ⎥ ⎥, ... ... ... ... ⎥ ⎥ ⎦ 0 0 . . . na (4.2.34) A g-inverse is given by, ⎡ ⎤ ⎢ 0 0 0 ... 0 ⎢ ⎢ 0 1/n1 0 . . . 0 ⎢ T g (X X) = ⎢ ... ... ... ... ⎢ ... ⎢ ⎣ 0 0 0 . . . 1/na ⎥ ⎥ ⎥ ⎥ ⎥, ⎥ ⎥ ⎦ (4.2.35) The projection, P is obtained as, P = X(XT X)g XT = blockdiag Chapter 4 1 Jn , i = 1, 2, . . . , a. ni i (4.2.36) 125 BIOS 2083 Linear Models c Abdus S. Wahed A solution to the normal equation is then obtained as: ⎛ ⎞ ⎜ 0 ⎟ ⎜ ⎟ ⎜ Ȳ1. ⎟ ⎜ ⎟ ⎜ ⎟ T g T ⎜ β̂ = (X X) X Y = ⎜ Ȳ2. ⎟ ⎟. ⎜ ⎟ ⎜ ... ⎟ ⎜ ⎟ ⎝ ⎠ Ȳa. Corresponding prediction vector Ŷ is given by, ⎞ ⎛ ⎜ 1n1 Ȳ1. ⎟ ⎟ ⎜ ⎜ 1n Ȳ2. ⎟ ⎜ 2 ⎟ Ŷ = PY = Xβ̂ = ⎜ ⎟. ⎜ ... ⎟ ⎟ ⎜ ⎠ ⎝ 1na Ȳa. Notice that, ⎛ ⎜ Y1 − 1n1 Ȳ1. ⎜ ⎜ Y2 − 1n Ȳ2. ⎜ 2 e = (In − P)Y = (Y − Xβ̂) = ⎜ ... ⎜ ⎜ ⎝ Ya − 1na Ȳa. Chapter 4 (4.2.37) (4.2.38) ⎞ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎠ (4.2.39) 126 BIOS 2083 Linear Models c Abdus S. Wahed Ŷ2 = ŶT Ŷ = n1Ȳ1.2 + n2Ȳ2.2 + . . . + naȲa.2 a = ni Ȳi.2. (4.2.40) i=1 and e2 = eT e = Y1T Y1 − n1Ȳ1.2 + Y2T Y2 − n2Ȳ2.2 + . . . + YaT Ya − naȲa.2 a T 2 = Yi Yi − niȲi. = = i=1 ni a i=1 j=1 ni a i=1 j=1 2 2 Yij − Ȳi. Yij2 − a niȲi.2 i=1 = Y − Ŷ2. (4.2.41) “Residual SS” = Total SS -“Regression SS”. Or, Total SS = “Regression SS” + “Residual SS”. Chapter 4 127 BIOS 2083 Linear Models c Abdus S. Wahed Theorem 4.2.13. If β̂ is a solution to the normal equations (4.2.6), then, Y2 = Ŷ2 + e2, (4.2.42) where, Ŷ = Xβ̂ and e = Y − Xβ̂. Proof. Left as an exercise. Definition 4.2.2. Regression SS, Residual SS. 
The quantity Ŷ2 is referred to as regression sum of squares or model sum of squares, the portion of total sum of squares explained by the linear model whereas the other part e2 is the error sum of squares or residual sum of squares (unexplained variation). Coefficient of determination (R2 ) To have a general definition, let the model Y = Xβ + contains an intercept term, meaning the first column of X is 1n. Total sum of Chapter 4 128 BIOS 2083 Linear Models c Abdus S. Wahed Table 4.1: Analysis of variance Models with/without an intercept term Source df SS Regression (Model) r Y T PY Residual (Error) n−r Y T (I − P)Y Total n YT Y Models with an intercept term Source df SS Mean 1 Y T 1n 1Tn Y/n Regression (corrected for mean) r − 1 Y T (P − 1n 1Tn /n)Y Residual (Error) n−r Y T (I − P)Y Total n YT Y Models with an intercept term Source df Regression (corrected for mean) r − 1 SS Y T (P − 1n 1Tn /n)Y Residual (Error) n−r Y T (I − P)Y Total (corrected) n−1 Y T Y − Y T 1n 1Tn Y/n Chapter 4 129 BIOS 2083 Linear Models c Abdus S. Wahed squares corrected for the intercept term (or mean) is then written as T otal SS(corr.) = YT Y − nȲ 2 1 = YT (In − Jn )Y. n (4.2.43) Similarly, the regression SS is also corrected for the intercept term and is expressed as Regression SS(corr.) = YT PY − nȲ 2 1 = YT (P − Jn )Y. n (4.2.44) This is the portion of total corrected sum of squares that is purely explained by the design variables in the model. However, an equality similar to (4.4.42) applied to the corrected sums of squares still follows, and the ratio YT (P − n1 Jn)Y Reg. SS(Corr.) R = = T otal SS(Corr.) YT (In − n1 Jn)Y 2 (4.2.45) explains the proportion of total variation explained by the model. This ratio is known as the coefficient of determination and is denoted by R2. Chapter 4 130 BIOS 2083 Linear Models c Abdus S. Wahed Two important results: Lemma 4.2.14. Ip − (XT X)g XT X is a projection onto N (X). Proof. Use lemma 2.7.10. Lemma 4.2.15. XT X(XT X)g is a projection onto C(XT ). Proof. Use lemma 2.7.11. Importance: Sometimes it is easy to obtain a basis for the null space of X or column space of XT by careful examination of the relationship between the columns of X. However, in some cases it is not as straightforward. In such cases, independent non-zero columns from the projection matrix Ip − (XT X)g XT X can be used as a basis for the null space of X. Similarly, independent non-zero columns from the projection matrix XT X(XT X)g can be used as a basis for the column space of XT . Chapter 4 131 BIOS 2083 c Abdus S. Wahed Linear Models Example 4.2.16. Example 4.2.12 ⎡ ⎢0 1 ⎢ ⎢0 1 ⎢ ⎢ T T g X X(X X) = ⎢ ⎢0 0 ⎢ ⎢ ... ... ⎢ ⎣ 0 0 continued. ⎤ 1 ... 1 ⎥ ⎥ 0 ... 0 ⎥ ⎥ ⎥ 1 ... 0 ⎥ ⎥. ⎥ ... ... ... ⎥ ⎥ ⎦ 0 ... 1 (4.2.46) Therefore a basis for the column space of XT is given by the last n columns of the above matrix. Similarly, ⎡ ⎢ 1 0 ⎢ ⎢ −1 0 ⎢ ⎢ Ia+1 − (XT X)g XT X = ⎢ ⎢ −1 0 ⎢ ⎢ ... ... ⎢ ⎣ −1 0 ⎤ 0 ... 0 ⎥ ⎥ 0 ... 0 ⎥ ⎥ ⎥ 0 ... 0 ⎥ ⎥. ⎥ ... ... ... ⎥ ⎥ ⎦ 0 ... 0 (4.2.47) Therefore, the only basis vector in the null space of X is (1, −1Ta )T . Chapter 4 132 BIOS 2083 4.3 Linear Models c Abdus S. Wahed Interpreting LS estimator Usually, an estimator is interpreted by the quantity it estimates. Remember, a solution to the normal equation (4.2.6) is given by β̂ = (XT X)g XT Y. What does β̂ really estimates? E(β̂) = (XT X)g XT E(Y) = (XT X)g XT Xβ = Hβ. (4.3.1) Unless X has full column rank, β̂ is not an unbiased estimator of β. 
It is an unbiased estimator of Hβ, which may not be unique (depends on g-inverse of XT X). Therefore, when X is not of full column rank, the estimator β̂ is practically meaningless. Nevertheless, being a solution to the normal equations, it helps us construct useful estimators for other important functions of β (will discuss later). Estimating E(Y) Even though the normal equations (4.2.6) may not have a unique solution, it facilitates a unique LS estimator for E(Y) = Xβ since E(Ŷ) = E(PY) = PXβ = Xβ = E(Y). Chapter 4 (4.3.2) 133 BIOS 2083 Linear Models c Abdus S. Wahed = Ŷ = Xβ̂ = PY is an unique unbiased estimator of Thus E(Y) E(Y). Introducing assumptions So far the only assumptions we put on the response vector Y or equivalently on the error vector is that E() = 0. (4.3.3) This was a defining assumption of the general linear model. This allowed us to obtain a unique unbiased estimator for the mean response Xβ. However, without further assumptions on the variance of the responses (or, equivalently of the random errors) it is difficult or even impossible to ascertain how efficient this estimator of the mean response is. We will introduce assumptions as we need them. Let us assume that Assumption II. Error components are independently and identically distributed with constant variance σ 2. Chapter 4 134 BIOS 2083 Linear Models c Abdus S. Wahed Variance-covariance matrix for LS estimator Under assumption II, cov(Y) = σ 2In . Variance-covariance matrix cov(β̂) of a LS estimator β̂ = (XT X)g XT Y is given by cov(β̂) = cov((XT X)g XT Y) T = (XT X)g XT cov(Y) (XT X)g XT T = (XT X)g XT X (XT X)g σ 2 (4.3.4) For full rank cases (4.3.4) reduces to the familiar form cov(β̂) = (XT X)−1σ 2. Variance-covariance matrix for Ŷ Example 4.3.1. Show that 1. cov(Ŷ) = Pσ 2 . 2. cov(e) = (I − P)σ 2. Chapter 4 135 BIOS 2083 c Abdus S. Wahed Linear Models Estimating the error variance Note that, using Theorem 3.4.7, E(Residual SS) = E Y (I − P)Y 2 = trace (I − P)σ In + (Xβ)T (I − P)Xβ T = σ 2trace {(I − P)} + β T XT (I − P)Xβ = σ 2(n − r), (4.3.5) where r = rank(X). Therefore, an unbiased estimator of the error variance σ 2 is given by Residual SS Residual M S YT (I − P)Y ˆ 2 σ = = = . n−r n−r n−r 4.4 (4.3.6) Estimability Unless X is of full column rank, solution to the normal equations (4.2.6) is not unique. Therefore, in such cases, a solution to the normal equation does not estimate any useful population quantity. More specifically, we have shown that E(β̂) = Hβ, where H = Chapter 4 136 BIOS 2083 c Abdus S. Wahed Linear Models (XT X)g XT X. Consider the following XT X matrix ⎡ ⎤ ⎢6 3 3⎥ ⎢ ⎥ T ⎢ X X=⎢3 3 0⎥ ⎥ ⎣ ⎦ 3 0 3 (4.4.1) from a one-way ANOVA experiment with two treatments each replicated 3 times. Let us consider two g-inverses ⎡ ⎤ 0 ⎥ ⎢0 0 ⎢ ⎥ ⎥ G1 = ⎢ ⎢ 0 1/3 0 ⎥ ⎣ ⎦ 0 0 1/3 and ⎡ (4.4.2) ⎤ ⎢ 1/3 −1/3 0 ⎥ ⎢ ⎥ ⎢ G2 = ⎢ −1/3 2/3 0 ⎥ ⎥ ⎣ ⎦ 0 0 0 with ⎡ (4.4.3) ⎤ ⎢0 0 0⎥ ⎢ ⎥ T ⎢ H1 = G1X X = ⎢ 1 1 0 ⎥ ⎥ ⎣ ⎦ 1 0 1 Chapter 4 (4.4.4) 137 BIOS 2083 c Abdus S. Wahed Linear Models ⎡ and ⎤ ⎢1 0 1 ⎥ ⎢ ⎥ T ⎢ H2 = G2X X = ⎢ 0 1 −1 ⎥ ⎥ ⎣ ⎦ 0 0 0 respectively. Now, if β = (μ, α1, α2 )T , then, ⎞ ⎛ ⎜ 0 ⎟ ⎟ ⎜ ⎜ H1β = ⎜ μ + α1 ⎟ ⎟ ⎠ ⎝ μ + α2 whereas ⎛ ⎞ ⎜ μ + α1 ⎜ H2β = ⎜ ⎜ α1 − α2 ⎝ 0 ⎟ ⎟ ⎟. ⎟ ⎠ (4.4.5) (4.4.6) (4.4.7) Thus two solutions to the same normal equations set estimate two different quantities. However, in practice, one would like to construct estimators that estimate the same population quantity, no matter what solution to the normal equation is used to derive that estimator. 
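The next passage shows this algebraically for the difference of two treatment effects. As a quick numerical companion (an aside with arbitrary simulated data), the sketch below uses the same $X^TX$ as in (4.4.1) and the two g-inverses $G_1$ and $G_2$ displayed above. It produces the two distinct solutions of the normal equations derived just below and confirms that the contrast $\alpha_1 - \alpha_2$ receives the same estimate from both, while a function such as $\alpha_1$ alone does not.

import numpy as np

# Two-treatment, three-replicate design, so that X'X = [[6,3,3],[3,3,0],[3,0,3]] as in (4.4.1).
X = np.array([[1, 1, 0]] * 3 + [[1, 0, 1]] * 3, dtype=float)
rng = np.random.default_rng(7)
Y = np.r_[10 + rng.standard_normal(3), 7 + rng.standard_normal(3)]

XtY = X.T @ Y
G1 = np.array([[0, 0, 0], [0, 1/3, 0], [0, 0, 1/3]])        # g-inverse (4.4.2)
G2 = np.array([[1/3, -1/3, 0], [-1/3, 2/3, 0], [0, 0, 0]])  # g-inverse (4.4.3)

b1 = G1 @ XtY            # (0, Ybar_1., Ybar_2.)
b2 = G2 @ XtY            # (Ybar_2., Ybar_1. - Ybar_2., 0)
print(b1, b2)            # two different solutions to the same normal equations

lam = np.array([0, 1, -1])      # alpha_1 - alpha_2: estimable
print(lam @ b1, lam @ b2)       # identical, both equal Ybar_1. - Ybar_2.

lam_bad = np.array([0, 1, 0])   # alpha_1 alone: not estimable
print(lam_bad @ b1, lam_bad @ b2)   # solution-dependent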
One important goal in one-way ANOVA is to estimate the difference Chapter 4 138 BIOS 2083 Linear Models c Abdus S. Wahed between two treatment effects, namely, δ = α1 − α2 = (0, 1, −1)β. Two different solutions based on the two g-inverses G1 and G2 are given by β̂ 1 = (0, Ȳ1., Ȳ2.)T and β̂ 2 = (Ȳ2., Ȳ1. − Ȳ2., 0)T . If we construct our estimator for δ based on the solution β̂ 1, we obtain δ̂1 = (0, 1, −1)β̂1 = Ȳ1. − Ȳ2., (4.4.8) exactly the quantity you would expect. Now let us see if the same happens with the other solution β̂ 2. For this solution, δ̂2 = (0, 1, −1)β̂2 = Ȳ1. − Ȳ2., (4.4.9) same as δ̂1. Now we will show that no matter what solution you pick for the normal equation, δ̂ will always be the same. To see it, let us write δ̂ as δ̂ = (0, 1, −1)(XT X)g XT Y = Pδ Y, (4.4.10) where Pδ = (0, 1, −1)(XT X)g XT . If we can show that Pδ does not depend on the choice of g-inverse (XT X)g , then we are through. Let Chapter 4 139 BIOS 2083 c Abdus S. Wahed Linear Models us first look at the XT -matrix for this simpler version of one-way ANOVA problem: ⎡ ⎤ ⎢1 1 1 1 1 1⎥ ⎢ ⎥ T ⎢ X =⎢1 1 1 0 0 0⎥ ⎥. ⎣ ⎦ 0 0 0 1 1 1 Notice that, (0, 1, −1)T belongs to C(XT ), e.g. ⎛ ⎞ ⎜ 1 ⎟ ⎟ ⎛ ⎞ ⎡ ⎤⎜ ⎜ 0 ⎟ ⎜ ⎟ 0 1 1 1 1 1 1 ⎜ ⎟ ⎜ ⎟ ⎢ ⎥⎜ ⎜ ⎟ ⎢ ⎥⎜ 0 ⎟ ⎟ ⎜ 1 ⎟ = ⎢ 1 1 1 0 0 0 ⎥⎜ ⎟. ⎜ ⎟ ⎢ ⎥⎜ ⎝ ⎠ ⎣ ⎦ ⎜ −1 ⎟ ⎟ ⎟ −1 0 0 0 1 1 1 ⎜ ⎜ ⎟ ⎜ 0 ⎟ ⎝ ⎠ 0 (4.4.11) (4.4.12) But we know that there exists a unique c ∈ C(X) such that (0, 1, −1)T = XT c. Now, Pδ = (0, 1, −1)(XT X)g XT = cT X(XT X)g XT = cT P. (4.4.13) Since c is unique, and P does not depend on the choice of (XT X)g , Chapter 4 140 BIOS 2083 Linear Models c Abdus S. Wahed from the above equation we see that Pδ , and hence δ̂ = Pδ Y is unique to the choice of a g-inverse. Summary • Not all linear functions of β may be estimated uniquely based on the LS method. • Linear functions λT β of β, where λ is a linear combination of the columns of XT allow unique estimators based on the LS estimator. Estimable functions Definition 4.4.1. θ̂(Y) is an unbiased estimator of θ if and only if E θ̂(Y) = θ, for all θ. Definition 4.4.2. θ̂(Y) is a linear estimator of θ if and only if θ̂(Y) = aT Y + b, for some constant (vector) b and vector (matrix) a. Chapter 4 141 BIOS 2083 Linear Models c Abdus S. Wahed Definition 4.4.3. A linear function θ = λT β is linearly estimable if and only if there exists a linear function cT Y such that E(cT Y) = λT β = θ, for all β. We will drop “linearly” from “linearly estimable” for simplicity. That means “estimable” will always refer to linearly estimable unless mentioned specifically. Example 4.4.1. 1. Components of the mean vector Xβ are estimable. 2. Components of the vector XT Xβ are estimable. Proposition 4.4.2. Linear combinations of estimable functions are estimable. Proof. Follows from the definition 4.4.3. Chapter 4 142 BIOS 2083 Linear Models c Abdus S. Wahed Proposition 4.4.3. A linear function θ = λT β is estimable if and only if λ ∈ C(XT ). Proof. Suppose θ = λT β is estimable. Then, by definition, there exists a vector c such that E(cT Y) = λT β, for all β =⇒ cT Xβ = λT β, for all β =⇒ cT X = λT , =⇒ λ = XT c =⇒ λ ∈ C(XT ). (4.4.14) Now, suppose λ ∈ C(XT ). This implies that λ = XT c for some c. Then, for all β, λT β = cT Xβ = cT E(Y) = E(cT Y). Chapter 4 (4.4.15) 143 BIOS 2083 Linear Models c Abdus S. Wahed Proposition 4.4.4. If θ = λT β is estimable then there exists a unique c∗ ∈ C(X) such that λ = XT c∗. Proof. Proposition 4.4.3 indicates that there exists a c such that λ = XT c. 
But any vector c can be written as a direct sum of two unique components belonging to two orthogonal complements. Thus, we can find c∗ ∈ C(X) and c∗∗ ∈ N (XT ) such that c = c∗ + c∗∗. (4.4.16) λ = XT c = XT c∗ + XT c∗∗ = XT c∗. (4.4.17) Now Hence, the proof. Proposition 4.4.5. Collection of all possible estimable functions constitutes a vector space of dimension r = rank(X). Proof. Hint: (i) Show that linear combinations of estimable functions are also estimable, and (ii) Use proposition 4.4.3. Chapter 4 144 BIOS 2083 Linear Models c Abdus S. Wahed Methods to determine estimability Method 1. λT β is estimable if and only if it can be expressed as a linear combinations of the rows of Xβ. Method 2. λT β is estimable if and only if λT e = 0 for all bassis vectors e of the null space of X. Method 3. λT β is estimable if and only if λ is a linear combination of the basis vectors of C(XT ). Example 4.4.6. Multiple linear regression (Example 1.1.5 continued..) In the case of multiple regression with p independent variables (which may include the intercept term) and n observations (n > p) , columns of X are all independent. Therefore, N (X) = {0}. By method 2, all linear functions of β are estimable. In particular, 1. Individual coefficients βj are estimable. 2. Differences between two coefficients are estimable. Chapter 4 145 BIOS 2083 c Abdus S. Wahed Linear Models Example 4.4.7. Example 4.2.12 continued. 1. Treatment-specific means μ + αi , i = 1, 2, . . . , a are estimable (using Method 1). 2. Difference between two treatment effects (αi − αi ) is estimable. (Follows from the above, or can be inferred by Method 2). a 3. In general, any linear combination λ β = λ0μ + i=1 λiαi is estimable if and only if λ0 = ai=1 λi. (Use Method 2). T Chapter 4 146 BIOS 2083 Linear Models c Abdus S. Wahed Example 4.4.8. Two-way nested design. Suppose ni patients are randomized to the ith level of treatment A, i = 1, 2, . . . , a and within the ith treatment group a second randomization is done to bi levels of treatment B which are unique to each level of treatment A. The linear model for this problem can be written as Yijk = μ + αi + βij + ijk , i = 1, 2, . . . , a; j = 1, 2, . . . , bi; k = 1, 2, . . . , nij(4.4.18) . Then the X−matrix for this problem is given by ⎡ ⎤ 1 1 0 1 0 0 0 ⎢ ⎥ ⎢ ⎥ ⎢1 1 0 1 0 0 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢1 1 0 0 1 0 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢1 1 0 0 1 0 0⎥ ⎢ ⎥ X=⎢ ⎥, ⎢1 0 1 0 0 1 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢1 0 1 0 0 1 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢1 0 1 0 0 0 1⎥ ⎣ ⎦ 1 0 1 0 0 0 1 Chapter 4 (4.4.19) 147 BIOS 2083 c Abdus S. Wahed Linear Models where we have simplified the problem by taking a = 2, b1 = b2 = 2, and n11 = n12 = n21 = n22 = 2. Clearly rank(X) = 4. Dimension of the null space of X is 7 - 4 = 3. A set of basis vectors for the null space of X can be written as: ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 1 0 0 ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ 0 ⎟ ⎜ 1 ⎟ ⎜ 0 ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ 0 ⎟ ⎜ 0 ⎟ ⎜ 1 ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ ⎟ e1 = ⎜ = = , e , e ⎜ −1 ⎟ 2 ⎜ −1 ⎟ 3 ⎜ 0 ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ −1 ⎟ ⎜ −1 ⎟ ⎜ 0 ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎟ ⎟ ⎟ ⎜ −1 ⎟ ⎜ 0 ⎟ ⎜ −1 ⎟ ⎝ ⎝ ⎝ ⎠ ⎠ ⎠ −1 0 −1 (4.4.20) Thus, using Method 2, λT β is estimable if λT ej = 0, j = 1, 2, 3. (4.4.21) Specifically, if λ = (λ0, λ1, λ2, λ11, λ12, λ21, λ22)T , then λT β is esChapter 4 148 BIOS 2083 c Abdus S. Wahed Linear Models timable if the following three conditions are satisfied: (1) λ0 = (2) λ1 = (3) λ2 = 2 2 λij , i=1 j=1 2 λ1j , j=1 2 λ2j . (4.4.22) j=1 Let us consider some special cases: 1. Is α1 estimable? 2. Is μ + α1 estimable? 3. Is α1 − α2 estimable? 4. 
Is α1 − α2 + (β11 + β12)/2 − (β21 + β22)/2 estimable? Chapter 4 149 BIOS 2083 Linear Models c Abdus S. Wahed Definition 4.4.4. Least squares estimator of an estimable function λT β is given by λT β̂, where β̂ is a solution to the normal equations (4.2.6). Properties of least squares estimator Proposition 4.4.9. Uniqueness. Least squares estimator (of an estimable function) is invariant to the choice of a solution to the normal equations. Proof. Let us consider the class of solutions from the normal equations β̂ = (XT X)g XT Y. Least squares estimator of a an estimable function λT β is then given by λT β̂ = λT (XT X)T XT Y. (4.4.23) From proposition 4.4.4, since λT β is estimable, there exists a unique c ∈ C(X) such that λ = XT c. Chapter 4 (4.4.24) 150 BIOS 2083 Linear Models c Abdus S. Wahed Therefore, Equation (4.4.23) combined with (4.4.24) leads to λT β̂ = cT X(XT X)g XT Y = cT PY. (4.4.25) Since both c and P are unique (does not depend on the choice of g-inverse), the result follows. Proposition 4.4.10. Linearity and Unbiasedness. LS estimator is linear and unbiased. Proof. Left as an exercise. Proposition 4.4.11. Variance. Under Assumption II, V ar(λT β̂) = σ 2λT (XT X)g λ. (4.4.26) Proof. V ar(λT β̂) = V ar λT (XT X)g XT Y T = λT (XT X)g XT cov(Y) λT (XT X)g XT T = σ 2λT (XT X)g XT X (XT X)g λ ? = σ 2λT (XT X)g λ. Chapter 4 (4.4.27) 151 BIOS 2083 Linear Models c Abdus S. Wahed Proposition 4.4.12. Characterization. If an estimator λT β̂ of a linear function λT β is invariant to the choice of the solutions β̂ to the normal equations, then λT β is estimable. Proof. For a given g-inverse G of XT X, consider the general form of the solutions to the normal equations: β̂ = GXT Y + (I − GXT X)c (4.4.28) for any vector c ∈ Rp. Then, T T T T λ β̂ = λ GX Y + (I − GX X)c = λT GXT Y + λT (I − GXT X)c. (4.4.29) Since G is given, in order for the above to be equal for all c, we must have λT (I − GXT X) = 0. (4.4.30) λT = λT GXT X. (4.4.31) Or, equivalently, This last equation implies that λ ∈ C(XT ). This completes the proof. Chapter 4 152 BIOS 2083 Linear Models c Abdus S. Wahed Theorem 4.4.13. Gauss-Markov Theorem. Under Assumptions I and II, if λT β is estimable, then the least squares estimator λT β̂ is the unique minimum variance linear unbiased estimator. In the econometric literature, minimum variance is referred to as best and along with the linearity and unbiasedness the least squares estimator becomes best linear unbiased estimator (BLUE). Proof. Uniqueness follows from the proposition 4.4.9. Linearity and unbiasedness follows from the proposition 4.4.10. The only thing remains to be shown is that no other linear unbiased estimator of λT β can have smaller variance than λT β̂. Since λT β is estimable, there exists a c such that λ = XT c. Let a + dT Y be any other linear unbiased estimator of λT β. Then, we Chapter 4 153 BIOS 2083 Linear Models c Abdus S. Wahed must have a = 0 and λT = dT X. Then, XT d = XT c =⇒ XT (c − d) = 0 =⇒ (c − d) ∈ N (XT ) =⇒ P(c − d) = 0 =⇒ Pc = Pd. (4.4.32) Now, by proposition 4.4.11, var(λT β̂) = σ 2cT X(XT X)g XT c = σ 2cT Pc. (4.4.33) and var(dT Y) = σ 2dT d. Chapter 4 (4.4.34) 154 BIOS 2083 Linear Models c Abdus S. Wahed Thus, var(dT Y) − var(λT β̂) = σ 2 dT d − cT Pc = σ 2 dT d − cT P2c = σ 2 dT d − dT P2d = σ 2dT (I − P)d (4.4.35) ≥ 0. Therefore the LS estimator has the minimum variance among all linear unbiased estimators. 
Equation (4.4.35) shows that var(dT Y) = var(λT β̂) if and only if (I−P)d = 0, or equivalently, d = Pd = Pc, leading to dT Y = cT PY = cT X(XT X)g XT Y = λT β̂. Chapter 4 155 BIOS 2083 c Abdus S. Wahed Linear Models Example 4.4.14. Example 4.4.8 continued. ⎤ ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ T X X=⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 8 4 4 2 2 2 2⎥ ⎥ 4 4 0 2 2 0 0⎥ ⎥ ⎥ 4 0 4 0 0 2 2⎥ ⎥ ⎥ 2 2 0 2 0 0 0⎥ ⎥, ⎥ 2 2 0 0 2 0 0⎥ ⎥ ⎥ ⎥ 2 0 2 0 0 2 0⎥ ⎦ 2 0 2 0 0 0 2 (4.4.36) a g-inverse of which is given by, ⎤ ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ T g (X X) = ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ Chapter 4 0 0 0 0 0 0 0 ⎥ ⎥ 0 0 0 0 0 0 0 ⎥ ⎥ ⎥ 0 0 0 0 0 0 0 ⎥ ⎥ ⎥ 0 0 0 1/2 0 0 0 ⎥ ⎥. ⎥ 0 0 0 0 1/2 0 0 ⎥ ⎥ ⎥ ⎥ 0 0 0 0 0 1/2 0 ⎥ ⎦ 0 0 0 0 0 0 1/2 (4.4.37) 156 BIOS 2083 Linear Models ⎤T ⎛ ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ T X Y=⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 1 1 0 1 0 0 0⎥ ⎥ 1 1 0 1 0 0 0⎥ ⎥ ⎥ 1 1 0 0 1 0 0⎥ ⎥ ⎥ 1 1 0 0 1 0 0⎥ ⎥ ⎥ 1 0 1 0 0 1 0⎥ ⎥ ⎥ 1 0 1 0 0 1 0⎥ ⎥ ⎥ ⎥ 1 0 1 0 0 0 1⎥ ⎦ 1 0 1 0 0 0 1 ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ c Abdus S. Wahed ⎞ Y111 ⎟ ⎛ ⎟ ⎜ Y112 ⎟ ⎟ ⎜ ⎟ ⎜ ⎜ Y121 ⎟ ⎟ ⎜ ⎟ ⎜ ⎜ Y122 ⎟ ⎟ ⎜ ⎟=⎜ ⎜ Y211 ⎟ ⎟ ⎜ ⎟ ⎜ ⎜ Y212 ⎟ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ Y221 ⎟ ⎝ ⎠ Y222 ⎞ Y... ⎟ ⎟ Y1.. ⎟ ⎟ ⎟ Y2.. ⎟ ⎟ ⎟ Y11. ⎟ ⎟ ⎟ Y12. ⎟ ⎟ ⎟ ⎟ Y21. ⎟ ⎠ Y22. (4.4.38) Thus, a solution to the normal equations is given by ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ ⎞ ⎛ μ̂ ⎟ ⎜ 0 ⎜ ⎟ ⎜ 0 α̂1 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ 0 α̂2 ⎟ ⎜ ⎟ ⎜ ⎟ T g T ⎜ Ȳ = (X X) X Y = β̂11 ⎟ ⎜ 11. ⎟ ⎜ ⎟ ⎜ Ȳ ⎟ β̂12 ⎟ ⎜ 12. ⎜ ⎟ ⎜ ⎟ β̂21 ⎟ ⎜ Ȳ21. ⎝ ⎠ Ȳ22. β̂22 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ (4.4.39) Therefore the linear MVUE (or BLUE) of the estimable function α1 − α2 + (β11 + β12)/2 − (β21 + β22)/2 is given by (Ȳ11. + Ȳ12.)/2 − (Ȳ21. + Ȳ22.)/2. Chapter 4 157 BIOS 2083 4.4.1 Linear Models c Abdus S. Wahed A comment on estimability and Missing data The concept of estimability is very important in drawing statistical inference from a linear model. What effects can be estimated from an experiment totally depends on how the experiment was designed. For instance, in a two-way nested model, difference between two main effects is not estimable, whereas difference between two nested effects within the same main effect is. In an over-parameterized oneway ANOVA model (One-way ANOVA with an intercept term), the treatment effects are not estimable while the difference between any two pair of treatments is estimated by the difference in corresponding cell means. When observations in some cells are missing, the problem of estimability becomes more acute. We illustrate the concept by using an example. Consider the two-way nested design considered in Example 4.4.8. Suppose after planning the experiment, the observation corresponding to the last two rows of X matrix could not be observed. Chapter 4 158 BIOS 2083 c Abdus S. Wahed Linear Models Thus the observed design matrix is given by ⎤ ⎡ XM ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ =⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 1 1 0 1 0 0 0⎥ ⎥ 1 1 0 1 0 0 0⎥ ⎥ ⎥ 1 1 0 0 1 0 0⎥ ⎥ ⎥. 1 1 0 0 1 0 0⎥ ⎥ ⎥ 1 0 1 0 0 1 0⎥ ⎥ ⎦ 1 0 1 0 0 1 0 (4.4.40) How does this effect the estimability of certain functions? Note that rank(XM ) = 3. A basis for the null space of XT is given by ⎧ ⎞ ⎞ ⎞ ⎛ ⎛ ⎛ ⎛ ⎞⎫ ⎪ ⎪ ⎪ 1 ⎟ 0 ⎟ 0 ⎟ 0 ⎟⎪ ⎪ ⎪ ⎪ ⎪ ⎜ ⎜ ⎜ ⎜ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎟ ⎪ ⎜ ⎜ ⎜ ⎜ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎟ ⎪ ⎜ ⎜ ⎜ ⎜ 0 1 0 0 ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎟ ⎪ ⎜ ⎜ ⎜ ⎜ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎟ ⎪ ⎜ ⎜ ⎜ ⎜ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎟ ⎪ ⎜ ⎜ ⎜ ⎜ ⎪ 0 0 1 0 ⎪ ⎪ ⎟ ⎟ ⎟ ⎟ ⎪ ⎜ ⎜ ⎜ ⎜ ⎨ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ ⎟⎬ , e2 = ⎜ , e3 = ⎜ , e4 = ⎜ e1 = ⎜ −1 ⎟ −1 ⎟ 0 ⎟ 0⎟ ⎟ ⎟ ⎟ ⎟⎪ . ⎜ ⎜ ⎜ ⎜ ⎪ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ ⎟⎪ ⎪ ⎪ ⎪ ⎜ −1 ⎟ ⎜ −1 ⎟ ⎜ 0 ⎟ ⎜ 0 ⎟⎪ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ ⎟⎪ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ ⎟⎪ ⎪ ⎪ ⎪ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ ⎟⎪ ⎪ ⎪ ⎪ −1 0 −1 0 ⎪ ⎟ ⎟ ⎟ ⎟ ⎜ ⎜ ⎜ ⎜ ⎪ ⎪ ⎪ ⎪ ⎪ ⎠ ⎠ ⎠ ⎠ ⎝ ⎝ ⎝ ⎝ ⎪ ⎪ ⎪ ⎪ ⎩ 1 1 1 1 ⎭ (4.4.41) 1. 
Is α1 estimable? α1 = (0, 1, 0, 0, 0, 0, 0)β = λT1 β. λT1 e1 = 0 = λT1 e2 → Not estimable. Chapter 4 159 BIOS 2083 Linear Models c Abdus S. Wahed 2. Is μ + α1 estimable? 3. Is α1 − α2 estimable? 4. Is α1 − α2 + (β11 + β12)/2 − (β21 + β22)/2 estimable? Here, λT4 = (0, 1, −1, 1/2, 1/2, −1/2, −1/2), and λT4 e1 = 0 → Not estimable. 5. Is α1 − α2 + (β11 + β12)/2 − β21 estimable? Here, λT5 = (0, 1, −1, 1/2, 1/2, −1, 0), and you can check that λT5 ej = 0, j = 1, 2, 3, 4 → Estimable. Chapter 4 160 BIOS 2083 4.5 Linear Models c Abdus S. Wahed Least squares estimation under linear constraints Often it is desirable to estimate the parameters from a linear model under certain linear constraints. Two possible scenarios where such constrained minimization of the error sum of squares (Y − Xβ2) becomes handy are as follows. 1. Converting a non-full rank model to a full rank model. A model of non-full rank can be transformed into a full rank model by imposing a linear constraint on the model. Let us take a simple example of a balanced one-way ANOVA with two treatments. The over-parameterized version of this model can be written as Yij = μ + αi + ij , i = 1, 2; j = 1, 2, . . . , n. (4.5.1) We know from our discussion that αi is not estimable in this model. We also know that the X-matrix is not of full rank, Chapter 4 161 BIOS 2083 Linear Models c Abdus S. Wahed leading to more than one solutions for the normal equations 2μ + α1 + α2 = 2Ȳ.. μ + α1 = Ȳ1. μ + α2 = Ȳ2. (4.5.2) One traditional way of obtaining a unique solution is to impose some restrictions on the parameters. A popular one is to treat one of the treatment effect as a reference by setting it equal to zero. Treating α2 = 0 leads to the solution α̂1 = Ȳ1. − Ȳ2. and μ̂ = Ȳ2.. Another restriction that is commonly applied is that the treatment effects are centered to zero. That is, α1 + α2 = 0. If we apply this last restriction to the above normal equations, we obtain a unique solution: μ̂ = Ȳ.., α̂1 = Ȳ1. − Ȳ.., and α̂2 = Ȳ2. − Ȳ... 2. Testing a linear hypothesis. One major goal in statistical analysis involving linear models is to test certain hypothesis regarding the parameters. A certain linear hypothesis can be Chapter 4 162 BIOS 2083 Linear Models c Abdus S. Wahed tested by comparing the residual sum of squares from the model under the null hypothesis to the same from unrestricted model (no hypothesis). Details will follow in Chapter 6. 4.5.1 Restricted Least Squares Suppose the linear model is of the form Y = Xβ + , (4.5.3) where a set of linear restrictions AT β = b (4.5.4) has been imposed on the parameters for given matrices A and b. We want to minimize the residual sum of squares Y − Xβ2 = (Y − Xβ)T (Y − Xβ) (4.5.5) for β to obtain the LS estimators under the constraints (4.5.4).The problem can easily be written as a Lagrangian optimization problem by constructing the objective function E = (Y − Xβ)T (Y − Xβ) + 2λT (AT β − b), Chapter 4 (4.5.6) 163 BIOS 2083 Linear Models c Abdus S. Wahed which needs to be minimized unconditionally with respect to β and λ. Taking the derivatives of (4.5.6) with respect to β and λ and setting them equal to zero, we obtain, XT Xβ + Aλ = XT Y AT β = b. (4.5.7) (4.5.8) The above equations will be referred to as restricted normal equations (RNE). We will consider two different scenarios. CASE I. AT β is estimable. A set of q linear constraints AT β is estimable if and only if each constraint is estimable. If we write A as (a1 a2 . . . aq ) and b = (b1, b2, . . . 
, bq ), then AT β is estimable iff each component aTi β is estimable. Although, the q constraints need not be independently estimable, but we assume that they are so that is rank(A) = q. If they are not, one can easily reduce them into a set of linearly independent constraints. Now, if (β̂ r , λ̂r ) is a solution to the restricted normal equations, Chapter 4 164 BIOS 2083 c Abdus S. Wahed Linear Models then from (4.5.7) we obtain, β̂ r = (XT X)g (XT Y − Aλ̂r ) = β̂ − (XT X)g Aλ̂r . (4.5.9) From (4.5.8), using (4.5.9), assuming the required inverse exists, T T g λ̂r = A (X X) A −1 (AT β̂ − b). (4.5.10) But we have not yet shown that AT (XT X)g A is invertible. The following proposition takes care of that. Proposition 4.5.1. In terms of the notations of this section, when AT β is estimable, rank(AT (XT X)g A) = rank(A) = q. (4.5.11) Proof. Chapter 4 165 BIOS 2083 c Abdus S. Wahed Linear Models Using (4.5.9) and (4.5.10), it is possible to express the restricted least square estimator β̂ r in terms of an unrestricted LS estimator β̂: −1 β̂ r = β̂ − (XT X)g A AT (XT X)g A (AT β̂ − b). (4.5.12) Example 4.5.2. Take the simple example of one-way balanced ANOVA from the beginning of this section. Consider the restriction α1 − α2 = 0, which can be written as AT β = 0, where ⎞ ⎛ ⎜ 0 ⎟ ⎟ ⎜ ⎜ A=⎜ 1 ⎟ (4.5.13) ⎟ ⎝ ⎠ −1 A g-inverse of the XT X-matrix is given by ⎛ ⎞ 0 0 0 ⎜ ⎜ ⎜ 0 1/n 0 ⎝ 0 0 1/n ⎟ ⎟ ⎟ ⎠ with corresponding unrestricted solution, ⎛ 0 ⎜ ⎜ β̂ = ⎜ Ȳ1. ⎝ Ȳ2. Chapter 4 (4.5.14) ⎞ ⎟ ⎟ ⎟. ⎠ (4.5.15) 166 BIOS 2083 Linear Models c Abdus S. Wahed AT β̂ = Ȳ1. − Ȳ2. . (4.5.16) AT (XT X)g A = 2/n. ⎛ ⎞ 0 ⎜ ⎟ ⎜ ⎟ T g (X X) A = ⎜ 1/n ⎟ . ⎝ ⎠ −1/n (4.5.17) Using these in equation 4.5.12, we obtain ⎛ ⎞ ⎛ 0 0 ⎜ ⎟ ⎜ ⎜ ⎟ ⎜ β̂ r = ⎜ Ȳ1. ⎟ − ⎜ 1/n ⎝ ⎠ ⎝ −1/n Ȳ2. ⎛ ⎞ 0 ⎜ ⎟ ⎜ ⎟ = ⎜ (Ȳ1. + Ȳ2. )/2 ⎟ ⎝ ⎠ (Ȳ1. + Ȳ2. )/2. (4.5.18) ⎞ ⎟ #n$ ⎟ (Ȳ1. − Ȳ2.). ⎟ ⎠ 2 (4.5.19) Is this restricted solution unique? Try with a different g-inverse. (Note you do not have to compute AT (XT X)g A, as it is invariant to the choice of a g-inverse.) Chapter 4 167 BIOS 2083 Linear Models c Abdus S. Wahed Properties of restricted LS estimator Proposition 4.5.3. 1. E β̂r = (XT X)g XT Xβ = Hβ = E β̂ . # $ T g T & −1 T 2. cov β̂ r = σ (X X) D (X X) A . , where D = I−A AT (XT X)g A T 3. E(RSSr ) = E (Y − Xβ̂r ) (Y − Xβ̂ r ) = (n − r + q)σ 2. 2 % T g Proof. We will leave the first two as exercises. For the third one, RSSr = (Y − Xβ̂ r )T (Y − Xβ̂r ) = (Y − Xβ̂ + Xβ̂ − Xβ̂ r )T (Y − Xβ̂ + Xβ̂ − Xβ̂ r ) ∈N (XT ) ∈C(X) T = (Y − Xβ̂) (Y − Xβ̂) + (β̂ − β̂ r )T XT X(β̂ − β̂ r ) −1 T T g T = RSS + (AT β̂ − b)T AT (XT X)g A A (X X) −1 T XT X(XT X)g A AT (XT X)g A (A β̂ − b) −1 T (A β̂ − b). = RSS + (AT β̂ − b)T AT (XT X)g A % & T T g −1 T T T (A β̂ − b) E(RSSr ) = E(RSS) + E (A β̂ − b) A (X X) A % & −1 2 T T g T cov(A β̂ − b) = (n − r)σ + trace A (X X) A % −1 2 T T g & 2 T T g σ A (X X) A = (n − r)σ + trace A (X X) A = (n − r + q)σ 2 . Chapter 4 (4.5.20) 168 BIOS 2083 Linear Models c Abdus S. Wahed CASE II. AT β is not estimable. A set of q linear constraints AT β is non-estimable if and only if each constraint is non-estimable and no linear combination of the linear constraints is estimable. Assume as before that columns of A are independent. That is, rank(A) = q. This means Ac ∈ / C(XT ), for all p × 1 vectors c (why?). This in turn implies that C(A) ∩ C(XT ) = {0} . (4.5.21) On the other hand, from the RNEs, Aλ̂r = XT (Y − Xβˆr ) ∈ C(XT ). (4.5.22) But by definition, Aλ̂r ∈ C(A). (4.5.23) Aλ̂r = 0. 
(4.5.24) Together we get, Since the columns of A are independent, this last equation implies that λ̂r = 0. The normal equation (4.5.7) then reduces to XT Xβ = XT Y, Chapter 4 (4.5.25) 169 BIOS 2083 Linear Models c Abdus S. Wahed which is the normal equation for the unrestricted LS problem. Thus RNEs in this case have a solution β̂ r = β̂ = (XT X)g XT Y, and (4.5.26) λ̂r = 0. (4.5.27) Therefore, in this case the residual sums of squares from restricted and unrestricted model are identical. i.e. RSSr = RSS. Chapter 4 170 BIOS 2083 4.6 Linear Models c Abdus S. Wahed Problems 1. The least squares estimator of β can be obtained by minimizing Y − Xβ2 . Use the derivative approach to derive the normal equations for estimating β. 2. For the linear model yi = μ + αxi + i , i = 1, 2, 3, where xi = (i − 1). (a) Find P and I − P. (b) Find a solution to the equation Xβ = PY. (c) Find a solution to the equation XT Xβ = XT Y. Is this solution same as the solution you found for the previous equation? (d) What is the null space of XT for this problem? 3. Show that, for any general linear model, the solutions to the system of linear equations Xβ = PY are the same as the solutions to the normal equations XT Xβ = XT Y. 4. Show that (a) I − P is a projection matrix onto the null space of XT , and (b) XT X(XT X)g is a projection onto the column space of XT . 5. (a) If Ag is a generalized inverse of A, then show that A− = Ag AAg + (I − Ag A)B + C(I − AAg ) is also a g-inverse of A for any conformable matrices B nd C. (b) In class, we have shown that β̂ = (XT X)g XT Y is a solution to the normal equations XT Xβ = XT Y for a given g-inverse(XT X)g of XT X. Show that β̃ is a solution to the normal equations if and only if there exists a vector z such that Chapter 4 171 BIOS 2083 Linear Models c Abdus S. Wahed β̃ = (XT X)g XT Y + (I − (XT X)g XT X)z. (Thus, by varying z, one can swipe out all possible solutions to the normal equations.) (c) In fact, β̃ = GXT Y generates all solutions to the normal equations, for all possible generalized inverses G of XT X. To show this, start with the general solution β̃ = (XT X)g XT Y + (I − (XT X)g XT X)z (from part (b)). Also take it as a fact that for a given non-zero vector Y and an arbitrary vector z, there exists an arbitrary matrix M such that z = MY. Use this fact, along with the result from part (a) to write β̃ as GXT Y where G is a g-inverse of XT X. 6. For the general one-way ANOVA model, yij = μ + αi + ij , i = 1, 2, . . . , a; j = 1, 2, . . . , ni , (a) What is the X matrix? (b) Find r(X). (c) Find a basis for the null space of X. (d) Give a basis for the set of all possible linearly independent estimable functions. (e) Give conditions under which c0 μ + ai=1 ci αi is estimable. In particular, is μ estimable? Is α1 − α2 estimable? (f) Obtain a solution to the normal equation for this problem and find the least square estimator of αa − α1 . 7. Consider the linear model Y = Xβ + , E() = 0, cov() = σ 2 In . (4.6.1) Follow the following steps to show that if λT β is estimable, then λT β̂ is the BLUE of λT β, where β̂ is a solution to the normal equations (XT X)β = XT Y. Chapter 4 172 BIOS 2083 c Abdus S. Wahed Linear Models (a) Consider another linear unbiased estimator c + dt Y of λT β. Show that c must be equal to zero and dT X = λT . (b) Now we will show that var(c + dT Y) can be written as the var(λT β̂) plus some non-negative quantity. To do this, write var(c + dt Y) = var(dT Y) = var(λT β̂ + dT Y − λT β̂ ). 
g(Y) Show that g(Y) defined in this manner is a linear function of Y. (c) Show that λT β̂ and g(Y) are uncorrelated. Hint: Use (i) cov(AY, BY) = Acov(Y)B T (ii) Result from part (b). (d) Hence var(c + dT Y) = var(dT Y) = var(λT β̂) + . . . . In other words, variance of any other linear unbiased estimator is greater than or equal to the variance of the least square estimator. (e) Show that var(c + dT Y) = var(λT β̂) only if c + dT Y = λT β̂. 8. One example of a simple two-way nested model is as follows. Suppose two instructors taught two classes using Teaching Method I, and three instructors taught two classes with Teaching Method II. Let Yijk is the average score for the kth class taught by jth instructor with ith teaching method. The model can be written as: Yijk = μ + αi + βij + ijk . Assume E(ijk ) = 0, and cov(ijk , i1 j1 k1 ) = σ 2 , if i = i1 , j = j1 , k = k1 ; 0, otherwise. (a) Write this model as Y = Xβ + , explicitly describing the X matrix and β. (b) Find r, the rank of X. Give a basis for the null space of X. Chapter 4 173 BIOS 2083 c Abdus S. Wahed Linear Models (c) Write out the normal equations and give a solution to the normal equations. (d) How many linearly independent estimable functions can you have in this problem? Provide a list of such estimable functions and give the least squares estimators for each one. (e) Show that the difference in the effect of two teaching methods is not estimable. 9. Consider the linear model Yij = i−1 βk + ij , i = 1, 2, 3; j = 1, 2; (4.6.2) k=0 with E(ij ) = 0; V ar(ij ) = σ 2 ; cov(ij , i j ) = 0 whenever i = i or j = j. 9(a) Write the above model in the form of a general linear model. Find rank(X). 9(b) Find β = (β0 , β1 , β2 )T such that the quantity ' (2 3 i−1 2 βk E= Yij − i=1 j=1 (4.6.3) k=0 is minimized. Call it β̂ = (βˆ0 , βˆ1 , βˆ2 ). 9(c) Find the mean and variance of β̂. For the rest of the parts of this question, assume that ij ’s are normally distributed. 9(d) What is the distribution of β̂? 9(e) What is the distribution of βˆ1 ? 9(f) What is the distribution of D = βˆ1 − βˆ2 ? 9(g) Find the distribution of Ê = 3 2 i=1 j=1 Chapter 4 ' Yij − i−1 (2 βˆk . (4.6.4) k=0 174 BIOS 2083 Linear Models c Abdus S. Wahed 9(h) Are D and Ê independent? ) 9(i) Find the distribution of D/ Ê. 10. Consider the analysis of covariance model Yij = μ + αi + γXij + ij , i = 1, 2; j = 1, 2, . . . , n, where Xij represents the value of a continuous explanatory variable. (a) Write this model as Y = Xβ + , explicitly describing the X matrix and β. (b) Find r, the rank of X. Give a basis for the null space of X. (c) Give a basis for the null space of X. (d) Is the regression coefficient γ estimable? (e) Give conditions under which a linear function aμ+bα1 +cα2 +dγ will be estimable. For the rest of the problem, assume n = 5, and Xi1 = −2, Xi2 = −1, Xi3 = 0, Xi4 = 1, and Xi5 = 2, i = 1, 2. (f) Give an expression for the LS estimator of γ and α1 − α2 , if exists. (g) Obtain the LS estimator of γ under the restriction that α1 = α2 . (h) Obtain the LS estimator of α1 − α2 under the restriction that γ = 0. (i) Obtain the LS estimator of γ under the restriction that α1 + α2 = 0. 11. Consider the two-way crossed ANOVA model with an additional continuous baseline covariate Xij : Yij = μi + αj + γXij + ij , i = 1, 2; j = 1, 2; k = 1, 2, Chapter 4 (4.6.5) 175 BIOS 2083 c Abdus S. Wahed Linear Models under usual assumptions (I and II from lecture note). Let the parameter vector be β = (μ1 , μ2 , α1 , α2 , γ)T and X be the corresponding X matrix. Define X̄i. 
= 2j=1 Xij /2, i = 1, 2 and X̄.j = 2i=1 Xij /2, j = 1, 2. (a) Find rank(X). (b) Give a basis for the null space of X. (c) Give conditions under which λT β will be estimable. In particular: i. Is γ estimable? ii. Is μ1 − μ2 estimable? iii. Is α1 − α2 + γ(X̄1. − X̄2. ) estimable? iv. Is μ1 − μ2 + γ(X̄.1 − X̄.2 ) estimable? v. Is μ1 + γ(X̄.1 + X̄.2 )/2 estimable? 12. Consider the linear model: Yijk = βi + βj + ijk , i, j = 1, 2, 3; i < j; k = 1, 2, (4.6.6) so that there are a total of 6 observations. (a) Write the model in matrix form and compute the X T X-matrix. (b) Write down the normal equations explicitly. (c) Give condition(s), if any, under which a linear function 3 i=1 λi βi is estimable, where λi , i = 1, 2, 3 are known constants. (d) If the observation corresponding to (i, j) = (2, 3) is missing, then the above model reduces to a familiar model. How would you respond to part (c) in this situation? Chapter 4 176 BIOS 2083 Linear Models c Abdus S. Wahed 13. I have come across a tiny dataset with 5 variables y, x1 , x2 , x3 , and x4 . I use SAS for most of my day-to-day data analysis work. Here are the data, program, and the result of an analysis to “regress” y on x1 , x2 , x3 , and x4 . data x; input y x1 x2 x3 x4; cards; 11 1 -3 0 4 21 1 -2 1 3 13 1 -1 0 2 45 1 0 1 1 50 1 1 0 0 ;run; proc glm; model y=x1 x2 x3 x4/noint solution; estimate "2b1+b2+b4" x1 2 x2 1 x3 0 x4 1; estimate "2b1-b2-b3" x1 2 x2 -1 x3 -1 x4 0; estimate "b1+b2" x1 1 x2 1 x3 0 x4 0; estimate "b4" x1 0 x2 0 x3 0 x4 1; estimate "b1+b4" x1 1 x2 0 x3 0 x4 1; run; quit; Output: ======================================== Chapter 4 177 BIOS 2083 Linear Models c Abdus S. Wahed /* Parameter Estimates*/ Parameter Estimate SE t Pr > |t| x1 34.86666667 B 6.78167465 5.14 0.0358 x2 10.20000000 B 3.25781113 3.13 0.0887 x3 8.33333333 9.40449065 0.89 0.4690 x4 0.00000000 B . . . /* Contrast Estimates*/ Parameter Estimate SE t Pr > |t| 2b1+b2+b4 79.9333333 15.3958147 5.19 0.0352 b1+b2 45.0666667 8.8221942 5.11 0.0363 b1+b4 34.8666667 6.7816747 5.14 0.0358 I am puzzled by several things I see in the output. (a) All the parameter estimates except the one corresponding to x3 has a letter ‘B’ next to it. What explanation can you provide for that? (b) What happens to the parameter estimates if you set-up the model as ‘model y=x2 x3 x4 x1’ or ‘y=x1 x2 x4 x3’ ? Can you explain the differences across these three sets of parameter estimates? (c) Although I set up 5 contrasts, the output only shows three of them. Why? Justify your answers using the techniques you have learned in Chapter 4. 14. Consider the simple linear model Yi = μ + α (−1)i , i = 1, 2, . . . , 2n − 1, 2n. Chapter 4 (4.6.7) 178 BIOS 2083 Linear Models c Abdus S. Wahed (a) Show that U = (Y2 + Y1 )/2 and V = (Y2 − Y1 )/2 are unbiased estimators of μ and α, respectively. What is the joint distribution of U and V under normality and independence assumptions for Yi ’s? You can make any assumptions regarding the variances of Yi ’s. (b) Find the least square estimators of μ and α, respectively. Obtain their joint distribution under the same assumption as above. Are they independently distributed? (c) Compare estimators in (a) and (b) and comment. 15. In an experiment where several treatments are compared with a control, it may be desirable to replicate the control more than the experimental treatments since the control enters into every difference investigated. Suppose each of t experimental treatment is replicated m times while the control is replicated n (> m) times. 
Let $Y_{ij}$ denote the $j$th observation on the $i$th experimental treatment, $j = 1, 2, \ldots, m$; $i = 1, 2, \ldots, t$, and let $Y_{0j}$ denote the $j$th observation on the control, $j = 1, 2, \ldots, n$. Assume the following linear model:

$Y_{ij} = \tau_i + \epsilon_{ij},$

where $i = 0, 1, 2, \ldots, t$; $j = 1, 2, \ldots, n$ for $i = 0$, and $j = 1, 2, \ldots, m$ for $i = 1, 2, \ldots, t$.

(a) Write the above linear model in the matrix form $Y = X\beta + \epsilon$ by explicitly specifying the random response vector $Y$, the design matrix $X$, and the parameter vector $\beta$.

(b) Show that the differences between the treatments and the control, $\theta_i = \tau_i - \tau_0$, $i = 1, 2, \ldots, t$, are estimable.

(c) Obtain the least squares estimator of $\theta_i$, $i = 1, 2, \ldots, t$.

BIOST 2083: Linear Models
Homework Assignment I
Due: September 9, 2010

1. Determine whether the following are linear models:

(a) $Y_i = \alpha + \beta x_i + \epsilon_i$, where $Y_i$, $i = 1, 2, \ldots, n$, are independent Bernoulli random variables, and the $x_i$'s are fixed values of a continuous covariate.

(b) $Y_i$, $i = 1, 2, \ldots, n$, are independent Poisson random variables with mean $e^{\alpha + \beta x_i}$, where the $x_i$'s are fixed values of a continuous covariate.

(c) $E[Y_{1i}/Y_{2i}] = \alpha + \beta x_i$, where $(Y_{1i}, Y_{2i})$, $i = 1, 2, \ldots, n$, are pre- and post-treatment levels of a certain cytokine from the $i$th individual, and the $x_i$'s are fixed values of a continuous covariate.

2. Solve problems 1, 4, and 8 from Chapter 1.

3. Assuming $i = 1, 2, \ldots, n$ and $j = 1, 2$, write model (1.1.8) in matrix form. What is rank$(X)$? Find a basis for the null space of $X$.
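For rank and null-space questions like the ones above, a numerical cross-check of hand calculations is sometimes handy. The sketch below is purely illustrative and uses an arbitrary overparameterized one-way ANOVA design, not the model asked about in the assignment; it simply shows how numpy and scipy can be used to verify a computed rank and null-space basis.

import numpy as np
from scipy.linalg import null_space

# An arbitrary overparameterized design (3 groups, 2 observations each):
# columns are (intercept, group-1, group-2, group-3 indicators).
X = np.array([[1, 1, 0, 0]] * 2 + [[1, 0, 1, 0]] * 2 + [[1, 0, 0, 1]] * 2, dtype=float)

print(np.linalg.matrix_rank(X))   # rank of X (here 3, one less than the number of columns)
print(null_space(X).ravel())      # orthonormal basis of N(X); here proportional to (1, -1, -1, -1)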