Stat 103 Homework 6 - Solutions
Spring 2014

Matrices in One-Way ANOVA

Although you will have plenty of opportunities to use software to perform an analysis of variance with one qualitative independent variable, you will go through the process once by hand for the "toy" data set:

Factor Level    y
A               11
A               19
B               32
C               22
B               42
C               30

The analysis of variance is a linear model because it can be written in the form $Y = X\beta + \varepsilon$, where $Y$ is the response vector, $X$ is the design matrix, $\beta$ is the coefficient vector, and $\varepsilon$ is the error vector. (The form of the design matrix and coefficient vector depends on the model being used. You will explore both the cell means and factor effects models in this assignment.) As in regression, the least squares estimates of the coefficients are given by $b = (X^T X)^{-1} X^T Y$, and the vector of fitted values is $\hat{Y} = Xb$.

1. The Cell Means Model. Create three dummy variables, one for each treatment (level), to make up the three columns of the design matrix. The resulting matrix is

$$X = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix},$$

where the least squares solution solves the equation $X^T X b = X^T Y$. The rows of $X$ are ordered by treatment (A, A, B, B, C, C), so the matching response vector is $Y = (11, 19, 32, 42, 22, 30)^T$. On a separate piece of paper,

a. Show that the columns of $X$ are independent by showing that they are orthogonal.

By direct computation, the inner product of any two distinct columns equals zero.

b. Find $X^T X$.

$$X^T X = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

c. Find $X^T Y$.

$$X^T Y = \begin{bmatrix} 11 + 19 \\ 32 + 42 \\ 22 + 30 \end{bmatrix} = \begin{bmatrix} 30 \\ 74 \\ 52 \end{bmatrix}$$

d. Because the columns of $X$ are independent, $X^T X$ will be full rank, i.e., invertible. Find $(X^T X)^{-1}$. (If you've computed $X^T X$ correctly, you shouldn't need a formula for this.)

$$(X^T X)^{-1} = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{bmatrix},$$

pretty much by inspection.

e. Find $b = (X^T X)^{-1} X^T Y$. What do the entries in $b$ represent?

$$b = (X^T X)^{-1} X^T Y = \begin{bmatrix} 15 \\ 37 \\ 26 \end{bmatrix}.$$

The coefficient vector $b$ is the vector of sample treatment means, $b = (\bar{Y}_1, \bar{Y}_2, \bar{Y}_3)^T$.

f. Find the vector of fitted values, $\hat{Y} = Xb$.

$$\hat{Y} = Xb = \begin{bmatrix} 15 \\ 15 \\ 37 \\ 37 \\ 26 \\ 26 \end{bmatrix}$$

2. The Factor Effects Model. This model is sometimes thought of as being more "regression-like".
The first column of the design matrix is a column of ones (similar to the first column in a regression analysis). Then, create two dummy variables, one each for treatments A and B (none is needed for treatment C in this set-up). Note: These dummy variables have the special form

$$X_i = \begin{cases} 1, & \text{if the } i\text{th factor is involved} \\ -1, & \text{if the } k\text{th factor is involved} \\ 0, & \text{if another factor is involved.} \end{cases}$$

For this example, the resulting design matrix takes the form

$$X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix},$$

where the least squares solution solves the equation $X^T X b = X^T Y$ as before.

a. Show that the first column of $X$ is orthogonal to the other two. Columns two and three aren't orthogonal, but they are independent. Thus the design matrix has rank 3.

Note: columns two and three are independent because neither is a multiple of the other.

b. Find $X^T X$.

$$X^T X = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 4 \end{bmatrix}$$

c. Find $X^T Y$.

$$X^T Y = \begin{bmatrix} 156 \\ -22 \\ 22 \end{bmatrix}$$

d. Because the columns of $X$ are independent, $X^T X$ will be full rank, i.e., invertible. Show that

$$(X^T X)^{-1} = \begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/3 & -1/6 \\ 0 & -1/6 & 1/3 \end{bmatrix}.$$

By direct computation we show that $(X^T X)(X^T X)^{-1} = I$, where $I$ is the 3x3 identity matrix.

e. Find $b = (X^T X)^{-1} X^T Y$. What do the entries in $b$ represent?

$$b = (X^T X)^{-1} X^T Y = \begin{bmatrix} 26 \\ -11 \\ 11 \end{bmatrix}.$$

The coefficient vector $b$ is the vector $b = (\bar{Y}, \hat{\tau}_1, \hat{\tau}_2)^T$, where $\hat{\tau}_1$ and $\hat{\tau}_2$ are the estimated treatment effects for treatments A and B. Note: $\hat{\tau}_3 = -\hat{\tau}_1 - \hat{\tau}_2 = 11 - 11 = 0$ because the treatment effects must sum to zero in this model.

f. Find the vector of fitted values, $\hat{Y} = Xb$.

$$\hat{Y} = Xb = \begin{bmatrix} 15 \\ 15 \\ 37 \\ 37 \\ 26 \\ 26 \end{bmatrix},$$

as for the Cell Means Model.
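
The hand computations for both models can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original assignment. It assumes the responses are ordered by treatment (A, A, B, B, C, C) to match the rows of the design matrices; the names `X_cell`, `X_fact`, and `least_squares` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Responses ordered by treatment: A, A, B, B, C, C.
# (Assumed ordering, chosen to match the rows of the design matrices.)
y = np.array([11, 19, 32, 42, 22, 30], dtype=float)

# Cell means model: one indicator column per treatment.
X_cell = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

# Factor effects model: a column of ones plus codes for A and B,
# with the last treatment (C) coded as -1 in both columns.
X_fact = np.array([
    [1,  1,  0],
    [1,  1,  0],
    [1,  0,  1],
    [1,  0,  1],
    [1, -1, -1],
    [1, -1, -1],
], dtype=float)


def least_squares(X, y):
    """Solve the normal equations X^T X b = X^T y; return (b, fitted values)."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return b, X @ b


b_cell, fit_cell = least_squares(X_cell, y)
b_fact, fit_fact = least_squares(X_fact, y)

# Effect for treatment C, implied by the sum-to-zero constraint.
tau3 = -(b_fact[1] + b_fact[2])

print("cell means b:     ", b_cell)
print("factor effects b: ", b_fact)
print("implied effect C: ", tau3)
print("fitted values agree:", np.allclose(fit_cell, fit_fact))
```

Up to floating-point rounding, the cell means coefficients should be the three treatment means, the factor effects coefficients should be the overall mean followed by the effects for A and B, and the two models should produce the same fitted values. Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, although for a 3x3 system like this one either approach works.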