3. Linear Models

A unified approach to
- regression analysis
- analysis of variance
- balanced factorial experiments

Extensions:
- analysis of unbalanced experiments
- models with correlated responses (split plot designs, repeated measures)
- models with fixed and random components (mixed models)

Example 3.1: Yield of a chemical process

  Yield (%)   Temperature ($^{\circ}$F)   Time (hr)
     $Y$             $X_1$                  $X_2$
     77              160                      1
     82              165                      3
     84              165                      2
     89              170                      1
     94              175                      2

Regression model:
$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \qquad i = 1, 2, 3, 4, 5. $$

Matrix formulation:
$$
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \end{bmatrix}
=
\begin{bmatrix}
\beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \varepsilon_1 \\
\beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \varepsilon_2 \\
\beta_0 + \beta_1 X_{13} + \beta_2 X_{23} + \varepsilon_3 \\
\beta_0 + \beta_1 X_{14} + \beta_2 X_{24} + \varepsilon_4 \\
\beta_0 + \beta_1 X_{15} + \beta_2 X_{25} + \varepsilon_5
\end{bmatrix}
=
\begin{bmatrix}
\beta_0 + \beta_1 X_{11} + \beta_2 X_{21} \\
\beta_0 + \beta_1 X_{12} + \beta_2 X_{22} \\
\beta_0 + \beta_1 X_{13} + \beta_2 X_{23} \\
\beta_0 + \beta_1 X_{14} + \beta_2 X_{24} \\
\beta_0 + \beta_1 X_{15} + \beta_2 X_{25}
\end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \end{bmatrix}
=
\begin{bmatrix}
1 & X_{11} & X_{21} \\
1 & X_{12} & X_{22} \\
1 & X_{13} & X_{23} \\
1 & X_{14} & X_{24} \\
1 & X_{15} & X_{25}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \end{bmatrix}
$$

Analysis of Variance (ANOVA):

  Source of Variation   d.f.   Sums of Squares                                       Mean Squares
  Model                  2     $\sum_{i=1}^{5} (\hat{Y}_i - \bar{Y}_{\cdot})^2$      $\frac{1}{2}\,SS_{model}$
  Error                  2     $\sum_{i=1}^{5} (Y_i - \hat{Y}_i)^2$                  $\frac{1}{2}\,SS_{error}$
  C. total               4     $\sum_{i=1}^{5} (Y_i - \bar{Y}_{\cdot})^2$

where $\bar{Y}_{\cdot} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$, and $n$ = total number of observations.

Example 3.2: Blood coagulation times (in seconds) for blood samples from six different rats. Each rat was fed one of three diets.

  Diet 1          Diet 2          Diet 3
  $Y_{11} = 62$   $Y_{21} = 71$   $Y_{31} = 72$
  $Y_{12} = 60$                   $Y_{32} = 68$
                                  $Y_{33} = 67$

A "means" model:
$$ Y_{ij} = \mu_i + \varepsilon_{ij}, $$
where $Y_{ij}$ is the observed time for the $j$-th rat fed the $i$-th diet, $\mu_i$ is the mean time for rats given the $i$-th diet, and $\varepsilon_{ij}$ is a random error with $E(\varepsilon_{ij}) = 0$.

You can express this model as
$$
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{33} \end{bmatrix},
\qquad \text{i.e., } Y = X\beta + \varepsilon.
$$
Assuming that $E(\varepsilon_{ij}) = 0$ for all $(i, j)$, this is a linear model with $E(Y) = X\beta$. You could add the assumptions of independent errors and homogeneous variance, i.e., $Var(\varepsilon_{ij}) = \sigma^2$, to obtain a linear model $Y = X\beta + \varepsilon$ with
$$ E(Y) = X\beta \qquad \text{and} \qquad Var(Y) = Var(\varepsilon) = \sigma^2 I. $$

An "effects" model:
$$ Y_{ij} = \mu + \tau_i + \varepsilon_{ij}. $$
This can be expressed as
$$
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{33} \end{bmatrix},
$$
or $Y = X\beta + \varepsilon$. This is a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$.

Analysis of Variance (ANOVA) for Example 3.2:

  Source of Variation   d.f.                              Sums of Squares                                                       Mean Squares
  Diets                 $3 - 1 = 2$                       $\sum_{i=1}^{3} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$       $\frac{1}{2}\,SS_{diets}$
  Error                 $\sum_{i=1}^{3}(n_i - 1) = 3$     $\sum_{i=1}^{3}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$         $\frac{1}{3}\,SS_{error}$
  C. total              $(\sum_{i=1}^{3} n_i) - 1 = 5$    $\sum_{i=1}^{3}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$

where $n_i$ = number of rats fed the $i$-th diet, $\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$, $\bar{Y}_{\cdot\cdot} = \frac{1}{n_{\cdot}}\sum_{i=1}^{3}\sum_{j=1}^{n_i} Y_{ij}$, and $n_{\cdot} = \sum_{i=1}^{3} n_i$ = total number of observations.
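As a quick numerical illustration of the least squares fit and the ANOVA decomposition for Example 3.1, here is a minimal NumPy sketch (variable names are illustrative only; the full-rank formula $b = (X^TX)^{-1}X^TY$ is used because this $X$ has full column rank):

```python
import numpy as np

# Data from Example 3.1: yield (%), temperature (deg F), time (hr)
y  = np.array([77, 82, 84, 89, 94], dtype=float)
x1 = np.array([160, 165, 165, 170, 175], dtype=float)
x2 = np.array([1, 3, 2, 1, 2], dtype=float)

# Model matrix with an intercept column: E(Y) = X beta
X = np.column_stack([np.ones(5), x1, x2])

# OLS estimate b = (X'X)^{-1} X'Y  (X has full column rank here)
b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b

# ANOVA quantities from the table above
ybar     = y.mean()
ss_model = np.sum((yhat - ybar) ** 2)   # 2 d.f.
ss_error = np.sum((y - yhat) ** 2)      # 2 d.f.
ss_total = np.sum((y - ybar) ** 2)      # 4 d.f. (corrected total)

print("b =", b)
print("SS(model) =", ss_model, " SS(error) =", ss_error)
print("partition holds:", np.isclose(ss_model + ss_error, ss_total))
```

The partition $SS_{model} + SS_{error} = SS_{total}$ holds here because the model contains an intercept.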
Example 3.3: A $2 \times 2$ factorial experiment

- Experimental units: 8 plots with 5 trees per plot.
- Factor 1: Variety (A or B)
- Factor 2: Fungicide use (new or old)
- Response: Percentage of apples with spots

  Variety   Fungicide use   Percentage of apples with spots
  A         new             $Y_{111} = 4.6$
  A         new             $Y_{112} = 7.4$
  A         old             $Y_{121} = 18.3$
  A         old             $Y_{122} = 15.7$
  B         new             $Y_{211} = 9.8$
  B         new             $Y_{212} = 14.2$
  B         old             $Y_{221} = 21.1$
  B         old             $Y_{222} = 18.9$

Linear model:
$$ Y_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + \varepsilon_{ijk}, $$
where $Y_{ijk}$ is the percent of apples with spots, $\alpha_i$ are variety effects ($i = 1, 2$), $\beta_j$ are fungicide use effects ($j = 1, 2$), $\delta_{ij}$ are interaction effects, and $\varepsilon_{ijk}$ is a random error.

Analysis of Variance (ANOVA):

  Source of Variation                          d.f.               Sums of Squares
  Varieties                                    $2 - 1 = 1$        $4\sum_{i=1}^{2} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$
  Fungicide use                                $2 - 1 = 1$        $4\sum_{j=1}^{2} (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$
  Variety $\times$ Fungicide use interaction   $(2-1)(2-1) = 1$   $2\sum_{i=1}^{2}\sum_{j=1}^{2} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$
  Error                                        $4(2-1) = 4$       $\sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{ij\cdot})^2$
  Corrected total                              $8 - 1 = 7$        $\sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2$

Here we are using 9 parameters,
$$ \beta^T = (\mu \;\; \alpha_1 \;\; \alpha_2 \;\; \beta_1 \;\; \beta_2 \;\; \delta_{11} \;\; \delta_{12} \;\; \delta_{21} \;\; \delta_{22}), $$
to represent the 4 response means,
$$ E(Y_{ijk}) = \mu_{ij}, \qquad i = 1, 2 \text{ and } j = 1, 2, $$
corresponding to the 4 combinations of levels of the two factors.

"Means" model:
$$ Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, $$
where $\mu_{ij} = E(Y_{ijk})$ is the mean percentage of apples with spots. Write this model in the form $Y = X\mu + \varepsilon$:
$$
\begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ Y_{211} \\ Y_{212} \\ Y_{221} \\ Y_{222} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111} \\ \varepsilon_{112} \\ \varepsilon_{121} \\ \varepsilon_{122} \\ \varepsilon_{211} \\ \varepsilon_{212} \\ \varepsilon_{221} \\ \varepsilon_{222} \end{bmatrix}
$$

The effects model can also be written in the form $Y = X\beta + \varepsilon$, that is,
$$
\begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ Y_{211} \\ Y_{212} \\ Y_{221} \\ Y_{222} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \beta_1 \\ \beta_2 \\ \delta_{11} \\ \delta_{12} \\ \delta_{21} \\ \delta_{22} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111} \\ \varepsilon_{112} \\ \vdots \\ \varepsilon_{222} \end{bmatrix}
$$

Any linear model can be written as $Y = X\beta + \varepsilon$:
$$
\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{\text{observed responses}}
=
\underbrace{\begin{bmatrix}
X_{11} & X_{21} & \cdots & X_{k1} \\
X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & & \vdots \\
X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix}}_{\text{elements of } X \text{ are known (non-random) values}}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_{\text{random errors are not observed}}
$$

General Linear Model:

$Y = (Y_1, Y_2, \ldots, Y_n)^T$ is a random vector with

(1) $E(Y) = X\beta$ for some known $n \times k$ matrix $X$ of constants and unknown $k \times 1$ parameter vector $\beta$;

(2) the model is completed by specifying a probability distribution for the possible values of $Y$ (or $\varepsilon$).

Sometimes we will only specify the covariance matrix $Var(Y) = \Sigma$. Since
$$ \varepsilon = Y - X\beta = Y - E(Y), $$
we have $E(\varepsilon) = 0$ and $Var(\varepsilon) = Var(Y) = \Sigma$. The distribution of $Y$ is not completely specified.

Gauss-Markov Model

Defn 3.1: The linear model $Y = X\beta + \varepsilon$ is a Gauss-Markov model if
$$ Var(Y) = Var(\varepsilon) = \sigma^2 I $$
for an unknown constant $\sigma^2$.

Notation: $Y \sim (X\beta, \; \sigma^2 I)$, i.e., $Y$ is distributed with mean $E(Y) = X\beta$ and variance $Var(Y) = \sigma^2 I$.
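Returning to Example 3.3, the point that the 9-parameter effects model and the 4-parameter means model describe the same four cell means can be checked numerically. A minimal NumPy sketch (the indicator encoding below is my own; lstsq is used only as a convenient way to obtain one least squares solution for the rank-deficient effects matrix):

```python
import numpy as np

# Responses from Example 3.3, ordered (A,new), (A,new), (A,old), (A,old),
# (B,new), (B,new), (B,old), (B,old)
y = np.array([4.6, 7.4, 18.3, 15.7, 9.8, 14.2, 21.1, 18.9])

# Means model: one indicator column per cell mean mu_11, mu_12, mu_21, mu_22
X_means = np.kron(np.eye(4), np.ones((2, 1)))

# Effects model: mu, alpha_1, alpha_2, beta_1, beta_2, delta_11, ..., delta_22
variety   = np.repeat([0, 1], 4)              # 0 = A, 1 = B
fungicide = np.tile(np.repeat([0, 1], 2), 2)  # 0 = new, 1 = old
X_eff = np.column_stack([
    np.ones(8),
    variety == 0, variety == 1,
    fungicide == 0, fungicide == 1,
    (variety == 0) & (fungicide == 0),
    (variety == 0) & (fungicide == 1),
    (variety == 1) & (fungicide == 0),
    (variety == 1) & (fungicide == 1),
]).astype(float)

print("rank of X_means:", np.linalg.matrix_rank(X_means))  # 4
print("rank of X_eff:  ", np.linalg.matrix_rank(X_eff))    # also 4, not 9

# Fitted values agree even though the effects-model solution is not unique
b_means, *_ = np.linalg.lstsq(X_means, y, rcond=None)
b_eff,   *_ = np.linalg.lstsq(X_eff, y, rcond=None)
print("cell means:", b_means)                              # 6.0, 17.0, 12.0, 20.0
print("same fitted values:", np.allclose(X_means @ b_means, X_eff @ b_eff))
```

Both model matrices span the same 4-dimensional column space, so they produce identical fitted mean vectors.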
Normal Theory Gauss-Markov Model

Defn 3.2: A normal theory Gauss-Markov model is a Gauss-Markov model in which $Y$ (or $\varepsilon$) has a multivariate normal distribution, written
$$ Y \sim N(X\beta, \; \sigma^2 I) $$
(multivariate normal with mean $E(Y) = X\beta$ and variance $Var(Y) = \sigma^2 I$).

The additional assumption of a normal distribution is
(1) not needed for some estimation results,
(2) useful in creating confidence intervals and tests of hypotheses.

Objectives:

(i) Develop estimation procedures
  - Estimate $\beta$? Estimate $E(Y) = X\beta$.
  - Estimable functions of $\beta$.
(ii) Quantify uncertainty in estimates
  - variances, standard deviations
  - distributions
  - confidence intervals
(iii) Analysis of Variance (ANOVA)
(iv) Tests of hypotheses
  - distributions of quadratic forms
  - F-tests
  - power
(v) Sample size determination

Least Squares Estimation

For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$ we have
$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
X_{11} & X_{21} & \cdots & X_{k1} \\
X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & & \vdots \\
X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
and
$$ Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i = X_i^T \beta + \varepsilon_i, $$
where $X_i^T = (X_{1i} \; X_{2i} \; \cdots \; X_{ki})$ is the $i$-th row of the model matrix $X$.

OLS Estimator

Defn 3.3: For a linear model with $E(Y) = X\beta$, any vector $b$ that minimizes the sum of squared residuals
$$ Q(b) = \sum_{i=1}^{n} (Y_i - X_i^T b)^2 = (Y - Xb)^T (Y - Xb) $$
is an ordinary least squares (OLS) estimator for $\beta$.

OLS Estimating Equations

For $j = 1, 2, \ldots, k$, solve
$$ 0 = \frac{\partial Q(b)}{\partial b_j} = -2 \sum_{i=1}^{n} (Y_i - X_i^T b)\, X_{ji}. $$
These equations are expressed in matrix form as
$$ 0 = X^T (Y - Xb) = X^T Y - X^T X b, \qquad \text{or} \qquad X^T X b = X^T Y. $$
These are called the "normal" equations.

If $X_{n \times k}$ has full column rank, i.e., $\text{rank}(X) = k$, then
(i) $X^T X$ is non-singular,
(ii) $(X^T X)^{-1}$ exists and is unique.
Consequently,
$$ (X^T X)^{-1}(X^T X)\, b = (X^T X)^{-1} X^T Y, $$
and $b = (X^T X)^{-1} X^T Y$ is the unique solution to the normal equations.

Generalized Inverse

If $\text{rank}(X) < k$, then
(i) there are infinitely many solutions to the normal equations;
(ii) if $(X^T X)^-$ is a generalized inverse of $X^T X$, then $b = (X^T X)^- X^T Y$ is a solution of the normal equations.

Defn 3.4: For a given $m \times n$ matrix $A$, any $n \times m$ matrix $G$ that satisfies
$$ A G A = A $$
is a generalized inverse of $A$.

Comments:
- We will often use $A^-$ to denote a generalized inverse of $A$.
- There may be infinitely many generalized inverses.
- If $A$ is an $m \times m$ nonsingular matrix, then $G = A^{-1}$ is the unique generalized inverse for $A$.

Example 3.5:
$$ A = \begin{bmatrix} 16 & 6 & 10 \\ 6 & 21 & 15 \\ 10 & 15 & 25 \end{bmatrix}, \qquad \text{rank}(A) = 2. $$
A generalized inverse, obtained by inverting the nonsingular upper-left $2 \times 2$ block and filling in zeros, is
$$ G = \frac{1}{300}\begin{bmatrix} 21 & -6 & 0 \\ -6 & 16 & 0 \\ 0 & 0 & 0 \end{bmatrix}. $$
Other generalized inverses are obtained by inverting other nonsingular $2 \times 2$ submatrices; generalized inverses are not unique. Check that $AGA = A$.

Example 3.2, "means" model:
$$
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{33} \end{bmatrix}
$$
For this model
$$ X^T X = \begin{bmatrix} n_1 & 0 & 0 \\ 0 & n_2 & 0 \\ 0 & 0 & n_3 \end{bmatrix}
\qquad \text{and} \qquad
X^T Y = \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} \\ Y_{31} + Y_{32} + Y_{33} \end{bmatrix}, $$
and the unique OLS estimator for $\mu = (\mu_1 \; \mu_2 \; \mu_3)^T$ is
$$ b = (X^T X)^{-1} X^T Y
= \begin{bmatrix} \frac{1}{n_1} & 0 & 0 \\ 0 & \frac{1}{n_2} & 0 \\ 0 & 0 & \frac{1}{n_3} \end{bmatrix} X^T Y
= \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix}. $$
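Because the normal equations for this means model are diagonal, they are easy to verify numerically. A small NumPy sketch with the Example 3.2 data (my own illustration), confirming that the unique solution is the vector of diet means:

```python
import numpy as np

# Example 3.2: coagulation times; diets 1, 2, 3 with n1 = 2, n2 = 1, n3 = 3
y = np.array([62.0, 60.0, 71.0, 72.0, 68.0, 67.0])
X = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)

XtX = X.T @ X          # diag(n1, n2, n3) = diag(2, 1, 3)
XtY = X.T @ y          # diet totals: Y11+Y12, Y21, Y31+Y32+Y33

# X has full column rank, so the normal equations have a unique solution
b = np.linalg.solve(XtX, XtY)
print(XtY)             # [122.  71. 207.]
print(b)               # diet means: [61. 71. 69.]
```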
\eects" model 2 and the unique OLS estimator for = (1 2 3)T is b = (X T X ) 1 n1 0 = 0 n12 0 0 2 6 6 6 6 6 6 6 6 6 6 6 6 4 1 XT Y 0 0 1 n3 3 7 7 7 7 7 7 7 7 7 7 7 7 5 2 6 6 6 6 6 6 6 6 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 Y11 1 1 0 Y12 1 1 0 Y21 = 1 0 1 Y31 1 0 1 Y32 1 0 0 Y33 1 0 0 Here Y1: Y1: Y2: = Y2: Y3: Y3: 3 2 3 7 7 7 7 7 7 7 7 5 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 5 154 3 2 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 0 0 0 1 1 1 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 2 6 6 6 6 6 6 6 6 6 6 4 n n1 X T X = nn12 n01 n3 0 Y:: X T Y = YY12:: Y3: 2 6 6 6 6 6 6 6 6 6 6 6 6 6 4 2 3 6 6 6 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 7 7 7 5 11 12 1 + 21 2 31 3 32 33 3 7 7 7 7 7 7 7 7 7 7 5 2 3 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 n2 n3 0 0 n2 0 0 n3 3 7 7 7 7 7 7 7 7 7 7 7 7 7 5 155 Solution B: Another generalized inverse Solution A: 0 0 0 0 0 1 0 0 (X T X ) = 0 n01 1 0 n2 0 0 0 n13 2 3 6 6 6 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 7 7 7 5 for X T X is (X T X ) and a solution to the normal equations is (X T X ) b = 0 0 0 n1 1 = 0 0 0 0 0 1: Y = Y2: Y3: XT Y 0 0 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 2 3 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 5 0 0 0 n2 1 0 n3 1 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 2 6 6 6 6 6 6 6 6 6 6 4 Let Y:: Y1: Y2: Y3: n n 1 n2 n1 n1 0 n2 0 n2 1 3 7 7 7 7 7 7 5 0 0 0 n n1 n2 A = n1 n1 0 n2 0 n 2 3 7 7 7 7 7 7 7 7 7 7 5 = 2 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 6 6 6 4 2 3 6 6 6 6 6 6 4 7 7 7 7 7 7 5 0 0 0 0 A 1 = n1 3 2 6 6 6 6 6 6 6 6 4 A 1 = = 1 1 1 1 n1+n3 n1 1 1 1 3 n2+n3 n2 1 jA11j jA12j jA13j 2 jAj 6 6 6 4 1 2 n1n2n3 6 6 6 4 jA21j jA22j jA23j jA31j jA32j jA33j 3 7 7 7 5 n1n2 n1n2 n1n2 n1n2 n2(n1 + n3) n1n2 n1n2 n1n2 n1(n2 + n3) b = (X T X ) X T Y 1 1 1 0 = n1 3 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 1 n1+n3 n1 1 0 1 0 1 0 n2+n3 0 n2 0 0 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 2 6 6 6 6 6 6 6 6 6 6 4 Y:: Y1: Y2: n1+n3 = n1 Y:: + ( n1 )nY2+1:n+3 Y2: Y:: + Y1: + ( n2 )Y2: 3 Y:: Y1: Y2: Y3: 2 3 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 0 3 7 7 7 7 7 7 7 7 7 7 5 3 7 7 7 5 157 Then 7 7 7 7 7 7 7 7 5 and a solution to the normal equations is 2 7 7 7 7 7 7 7 7 7 7 7 5 and use result 1.4(ii) to compute 156 Then 3 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 b = Y1: Y2: Y3: 0 3 Y3: Y3: 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 This is the OLS estimator for = 12 3 2 3 6 6 6 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 7 7 7 5 reported by PROC GLM in the SAS package, but it is not the only possible solution to the normal equations. 158 159 Solution C: Another generalized inverse for X T X is 2 (X T X ) = 1 2 n 1 n 1 n 1 n 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 1 n 1 n1 1 n n 0 0 0 0 0 1 n2 0 1 n3 Evaluating Generalized Inverses 3 Algorithm 3.1: 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 The corresponding solution to the normal equations is b = (X T X ) X T Y 2 = 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 2 1 n 1 n 1 n 1 n 1 n 1 n1 1 n n 0 0 0 0 0 1 n2 0 1 n3 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 Y:: Y:: Y1: = Y1: Y:: Y2: Y2: Y:: Y3: Y3: Y:: 2 6 6 6 6 6 6 6 6 6 6 4 3 2 3 7 7 7 7 7 7 7 7 7 7 5 6 6 6 6 6 6 6 6 6 6 4 7 7 7 7 7 7 7 7 7 7 5 (i) Find any r r nonsingular submatrix of A where r=rank(A). Call this matrix W. (ii) Invert and transpose W , ie., compute (W 1)T . (iii) Replace each element of W in A with the corresponding element of (W 1)T (iv) Replace all other elements in A with zeros. (v) Transpose the resulting matrix to obtain G, a generalized inverse for A. 161 160 Example 3.6. 
Example 3.6:
$$ A = \begin{bmatrix} 4 & 1 & 2 & 0 \\ 1 & 1 & 5 & 15 \\ 3 & 1 & 3 & 5 \end{bmatrix}, \qquad \text{rank}(A) = 2. $$
Take $W$ to be the nonsingular $2 \times 2$ submatrix formed by rows 2 and 3 and columns 2 and 3 of $A$:
$$ W = \begin{bmatrix} 1 & 5 \\ 1 & 3 \end{bmatrix}, \qquad (W^{-1})^T = \begin{bmatrix} -3/2 & 1/2 \\ 5/2 & -1/2 \end{bmatrix}. $$
Replacing the elements of $W$ in $A$ with the corresponding elements of $(W^{-1})^T$, setting all other elements to zero, and transposing gives
$$ G = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3/2 & 5/2 \\ 0 & 1/2 & -1/2 \\ 0 & 0 & 0 \end{bmatrix}. $$
Check: $AGA = A$.

Another solution: take $W$ to be the submatrix formed by rows 1 and 3 and columns 1 and 4,
$$ W = \begin{bmatrix} 4 & 0 \\ 3 & 5 \end{bmatrix}, \qquad (W^{-1})^T = \frac{1}{20}\begin{bmatrix} 5 & -3 \\ 0 & 4 \end{bmatrix}, $$
which gives
$$ G = \begin{bmatrix} \frac{5}{20} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ -\frac{3}{20} & 0 & \frac{4}{20} \end{bmatrix}. $$
Check: $AGA = A$.

Algorithm 3.2: For any $m \times n$ matrix $A$ with $\text{rank}(A) = r$,
(i) compute a singular value decomposition of $A$ (see Result 1.14) to obtain
$$ P A Q = \begin{bmatrix} D_{r \times r} & 0 \\ 0 & 0 \end{bmatrix}, $$
where $P$ is an $m \times m$ orthogonal matrix, $Q$ is an $n \times n$ orthogonal matrix, and $D$ is an $r \times r$ matrix of singular values;
(ii) then
$$ G = Q \begin{bmatrix} D^{-1} & F_1 \\ F_2 & F_3 \end{bmatrix} P $$
is a generalized inverse for $A$ for any choice of $F_1$, $F_2$, $F_3$.

Proof: Check if $AGA = A$. Since $P$ and $Q$ are orthogonal, $A = P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T$ and
$$
AGA = P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T \, Q \begin{bmatrix} D^{-1} & F_1 \\ F_2 & F_3 \end{bmatrix} P \, P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T
= P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} D^{-1} & F_1 \\ F_2 & F_3 \end{bmatrix} \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T
$$
(using $Q^T Q = I$, the $n \times n$ identity matrix, and $P P^T = I$, the $m \times m$ identity matrix)
$$
= P^T \begin{bmatrix} I & D F_1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T
= P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T = A.
$$

Moore-Penrose Inverse

Defn 3.5: For any matrix $A$ there is a unique matrix $M$, called the Moore-Penrose inverse, that satisfies
(i) $AMA = A$,
(ii) $MAM = M$,
(iii) $AM$ is symmetric,
(iv) $MA$ is symmetric.

Result 3.1:
$$ M = Q \begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix} P $$
is the Moore-Penrose inverse of $A$, where $P A Q = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}$ is a singular value decomposition of $A$.

Properties of generalized inverses of $X^T X$

Normal equations: $(X^T X)\, b = X^T Y$.

Result 3.3: If $G$ is a generalized inverse of $X^T X$, then
(i) $G^T$ is a generalized inverse of $X^T X$;
(ii) $X G X^T X = X$, i.e., $G X^T$ is a generalized inverse of $X$;
(iii) $X G X^T$ is invariant with respect to the choice of $G$;
(iv) $X G X^T$ is symmetric.

Proof:

(i) Since $G$ is a generalized inverse of $X^T X$, $(X^T X) G (X^T X) = X^T X$. Taking the transpose of both sides,
$$ [(X^T X) G (X^T X)]^T = (X^T X)^T G^T (X^T X)^T = X^T X. $$
But $(X^T X)^T = X^T (X^T)^T = X^T X$, hence $(X^T X) G^T (X^T X) = X^T X$.

(ii) From (i), $(X^T X) G^T (X^T X) = X^T X$. Call $B = X^T X G^T$, so $B X^T X - X^T X = 0$. Then
$$ 0 = (B X^T X - X^T X)(B^T - I) = B X^T X B^T - B X^T X - X^T X B^T + X^T X = (B X^T - X^T)(B X^T - X^T)^T. $$
Since the $(i, i)$ diagonal element of a matrix times its transpose is the sum of the squared entries in the $i$-th row, this forces $B X^T - X^T = 0$, i.e., $X^T X G^T X^T = X^T$. Taking the transpose,
$$ X = (X^T X G^T X^T)^T = X G X^T X. $$
Hence $G X^T$ is a generalized inverse for $X$.

(iii) Suppose $F$ and $G$ are generalized inverses for $X^T X$. Then, from (ii), $X G X^T X = X$ and $X F X^T X = X$. It follows that
$$ 0 = X - X = X G X^T X - X F X^T X, $$
and therefore
$$ 0 = (X G X^T X - X F X^T X)(G^T X^T - F^T X^T) = (X G X^T - X F X^T)\, X\, (G^T X^T - F^T X^T) = (X G X^T - X F X^T)(X G X^T - X F X^T)^T. $$
Since the $(i, i)$ diagonal element of the result of multiplying a matrix by its transpose is the sum of the squared entries in the $i$-th row of the matrix, the diagonal elements of the product are all zero only if all entries are zero in every row of the matrix. Consequently,
$$ X G X^T - X F X^T = 0. $$

(iv) For any generalized inverse $G$ of $X^T X$, $\tilde{T} = G X^T X G^T$ is a symmetric generalized inverse of $X^T X$. Then $X \tilde{T} X^T$ is symmetric and, from (iii), $X G X^T = X \tilde{T} X^T$, so $X G X^T$ is symmetric.
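Both Algorithm 3.2 and Defn 3.5 are easy to check numerically: np.linalg.pinv computes the Moore-Penrose inverse from the SVD, as in Result 3.1. A short sketch using the matrix $A$ of Example 3.6 and the first generalized inverse constructed above (my own verification code):

```python
import numpy as np

A = np.array([[4.0, 1.0, 2.0, 0.0],
              [1.0, 1.0, 5.0, 15.0],
              [3.0, 1.0, 3.0, 5.0]])
print("rank(A) =", np.linalg.matrix_rank(A))   # 2

# Moore-Penrose inverse (computed from the SVD, as in Result 3.1)
M = np.linalg.pinv(A)
print(np.allclose(A @ M @ A, A))      # (i)   AMA = A
print(np.allclose(M @ A @ M, M))      # (ii)  MAM = M
print(np.allclose((A @ M).T, A @ M))  # (iii) AM symmetric
print(np.allclose((M @ A).T, M @ A))  # (iv)  MA symmetric

# The generalized inverse from Algorithm 3.1 satisfies (i) but,
# in general, not (ii)-(iv)
G = np.array([[0.0,  0.0,  0.0],
              [0.0, -1.5,  2.5],
              [0.0,  0.5, -0.5],
              [0.0,  0.0,  0.0]])
print(np.allclose(A @ G @ A, A))      # True: G is a generalized inverse
```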
Estimation of the Mean Vector $E(Y) = X\beta$

For any solution to the normal equations, say $b = (X^T X)^- X^T Y$, the OLS estimator for $E(Y) = X\beta$ is
$$ \hat{Y} = X b = X (X^T X)^- X^T Y = P_X Y. $$
The matrix $P_X = X (X^T X)^- X^T$ is called an "orthogonal projection matrix". $\hat{Y} = P_X Y$ is the projection of $Y$ onto the space spanned by the columns of $X$.

Result 3.4 (properties of the projection matrix $P_X = X (X^T X)^- X^T$):
(i) $P_X$ is invariant to the choice of $(X^T X)^-$ (from Result 3.3(iii)): for any solution $b = (X^T X)^- X^T Y$ to the normal equations, $\hat{Y} = X b = P_X Y$ is the same.
(ii) $P_X$ is symmetric (from Result 3.3(iv)).
(iii) $P_X$ is idempotent: $P_X P_X = P_X$.
(iv) $P_X X = X$ (from Result 3.3(ii)).
(v) Partition $X$ as $X = [X_1 \,|\, X_2 \,|\, \cdots \,|\, X_k]$; then $P_X X_j = X_j$.

Residuals:
$$ e_i = Y_i - \hat{Y}_i, \qquad i = 1, \ldots, n. $$
The vector of residuals is
$$ e = Y - \hat{Y} = Y - X b = Y - P_X Y = (I - P_X) Y. $$

Result 3.5 (properties of $I - P_X$):
(i) $I - P_X$ is symmetric.
(ii) $I - P_X$ is idempotent: $(I - P_X)(I - P_X) = I - P_X$.
(iii) $(I - P_X) P_X = P_X - P_X P_X = P_X - P_X = 0$.
(iv) $(I - P_X) X = X - P_X X = X - X = 0$.
(v) Partition $X$ as $[X_1 \,|\, X_2 \,|\, \cdots \,|\, X_k]$; then $(I - P_X) X_j = 0$.
(vi) Residuals are invariant with respect to the choice of $(X^T X)^-$: the residual vector $e = Y - \hat{Y} = (I - P_X) Y$ is the same for any solution $b = (X^T X)^- X^T Y$ to the normal equations.

Comment: $I - P_X$ is a projection matrix that projects $Y$ onto the space orthogonal to the space spanned by the columns of $X$.

Geometrically, $Y \in R^n$; the fitted vector $\hat{Y} = X b = P_X Y$ lies in the vector space spanned by the columns of $X$, whose dimension is $\text{rank}(X)$; the residual vector $e = Y - \hat{Y} = (I - P_X) Y$ lies in the space orthogonal to the space spanned by the columns of $X$, which has dimension $n - \text{rank}(X)$.

Partition of a total sum of squares:

The squared length of $Y$ is
$$ \sum_{i=1}^{n} Y_i^2 = Y^T Y. $$
The squared length of the residual vector is
$$ \sum_{i=1}^{n} e_i^2 = e^T e = [(I - P_X) Y]^T (I - P_X) Y = Y^T (I - P_X) Y. $$
The squared length of $\hat{Y} = P_X Y$ is
$$ \sum_{i=1}^{n} \hat{Y}_i^2 = \hat{Y}^T \hat{Y} = (P_X Y)^T (P_X Y) = Y^T P_X^T P_X Y = Y^T P_X P_X Y \;\; (P_X \text{ is symmetric}) = Y^T P_X Y \;\; (P_X \text{ is idempotent}). $$
We have
$$ Y^T Y = Y^T (P_X + I - P_X) Y = Y^T P_X Y + Y^T (I - P_X) Y. $$

ANOVA:

  Source of Variation     Degrees of Freedom       Sums of Squares
  Model                   $\text{rank}(X)$         $\hat{Y}^T \hat{Y} = Y^T P_X Y$
  Residuals               $n - \text{rank}(X)$     $e^T e = Y^T (I - P_X) Y$
  Total (uncorrected)     $n$                      $Y^T Y = \sum_{i=1}^{n} Y_i^2$

Result 3.6: For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, the OLS estimator $\hat{Y} = X b = P_X Y$ for $X\beta$
(i) is unbiased, i.e., $E(\hat{Y}) = X\beta$,
(ii) is a linear function of $Y$,
(iii) has variance-covariance matrix $Var(\hat{Y}) = P_X \Sigma P_X$.
This is true for any solution $b = (X^T X)^- X^T Y$ to the normal equations.

Proof: (ii) is trivial, since $\hat{Y} = P_X Y$. (iii) follows from Result 2.1(ii). For (i),
$$ E(\hat{Y}) = E(P_X Y) = P_X E(Y) \;\; (\text{Result 2.1(i)}) = P_X X\beta = X\beta, \qquad \text{since } P_X X = X. $$

Comments:
- $\hat{Y} = X b = P_X Y$ is said to be a linear unbiased estimator for $E(Y) = X\beta$.
- For the Gauss-Markov model, $Var(Y) = \sigma^2 I$ and
$$ Var(\hat{Y}) = P_X (\sigma^2 I) P_X = \sigma^2 P_X P_X = \sigma^2 P_X = \sigma^2 X (X^T X)^- X^T. $$
($P_X$ is sometimes called the "hat" matrix.)

Questions:
- Is $\hat{Y} = X b = P_X Y$ the "best" estimator for $E(Y) = X\beta$?
- Is $\hat{Y} = X b = P_X Y$ the "best" estimator for $E(Y) = X\beta$ in the class of linear, unbiased estimators?
- What other linear functions of $\beta$, say $c^T \beta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k$, have OLS estimators that are invariant to the choice of $b = (X^T X)^- X^T Y$ that solves the normal equations?
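The projection-matrix properties and the uncorrected sum-of-squares partition can be checked directly for the Example 3.2 effects model. A minimal NumPy sketch (pinv is used only as one convenient choice of $(X^T X)^-$, since $P_X$ does not depend on that choice):

```python
import numpy as np

y = np.array([62.0, 60.0, 71.0, 72.0, 68.0, 67.0])
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

# P_X = X (X'X)^- X', here with the Moore-Penrose generalized inverse
P = X @ np.linalg.pinv(X.T @ X) @ X.T
I = np.eye(len(y))

print(np.allclose(P, P.T))          # symmetric
print(np.allclose(P @ P, P))        # idempotent
print(np.allclose(P @ X, X))        # P_X X = X
print(np.allclose((I - P) @ X, 0))  # (I - P_X) X = 0

yhat, e = P @ y, (I - P) @ y
# uncorrected partition: Y'Y = Y'P_X Y + Y'(I - P_X)Y
print(y @ y, yhat @ yhat + e @ e)
# degrees of freedom: rank(X) and n - rank(X)
r = np.linalg.matrix_rank(X)
print(r, len(y) - r)
```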
Estimable Functions

Some estimates of linear functions of the parameters have the same value regardless of which solution to the normal equations is used. These are called estimable functions. An example is $E(Y) = X\beta$: check that $X b$ has the same value for each solution to the normal equations obtained in Example 3.2, i.e.,
$$ X b = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \\ \bar{Y}_{3\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix}. $$

Defn 3.6: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, we will say that
$$ c^T \beta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k $$
is estimable if there exists a linear unbiased estimator $a^T Y$ for $c^T \beta$, i.e., for some non-random vector $a$ we have $E(a^T Y) = c^T \beta$.

Example 3.2 (blood coagulation times):

  Diet 1          Diet 2          Diet 3
  $Y_{11} = 62$   $Y_{21} = 71$   $Y_{31} = 72$
  $Y_{12} = 60$                   $Y_{32} = 68$
                                  $Y_{33} = 67$

The "effects" model $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ can be written as
$$
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{33} \end{bmatrix}
$$

Examples of estimable functions:

$\mu + \tau_1$: Choose $a^T = (\tfrac{1}{2} \;\; \tfrac{1}{2} \;\; 0 \;\; 0 \;\; 0 \;\; 0)$. Then
$$ E(a^T Y) = E(\tfrac{1}{2} Y_{11} + \tfrac{1}{2} Y_{12}) = \tfrac{1}{2} E(Y_{11}) + \tfrac{1}{2} E(Y_{12}) = \tfrac{1}{2}(\mu + \tau_1) + \tfrac{1}{2}(\mu + \tau_1) = \mu + \tau_1. $$
Alternatively, choose $a^T = (1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$ and note that $E(a^T Y) = E(Y_{11}) = \mu + \tau_1$.

$\mu + \tau_2$: Choose $a^T = (0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0)$. Then $a^T Y = Y_{21}$ and $E(a^T Y) = E(Y_{21}) = \mu + \tau_2$.

$\mu + \tau_3$: Choose $a^T = (0 \;\; 0 \;\; 0 \;\; 1 \;\; 0 \;\; 0)$. Then $E(a^T Y) = E(Y_{31}) = \mu + \tau_3$.

$\tau_1 - \tau_2$: Note that
$$ \tau_1 - \tau_2 = (\mu + \tau_1) - (\mu + \tau_2) = E(Y_{11}) - E(Y_{21}) = E(Y_{11} - Y_{21}) = E(a^T Y), \qquad \text{where } a^T = (1 \;\; 0 \;\; -1 \;\; 0 \;\; 0 \;\; 0). $$

$2\mu + 3\tau_1 - \tau_2$: Note that
$$ 2\mu + 3\tau_1 - \tau_2 = 3(\mu + \tau_1) - (\mu + \tau_2) = 3 E(Y_{11}) - E(Y_{21}) = E(3 Y_{11} - Y_{21}) = E(a^T Y), \qquad \text{where } a^T = (3 \;\; 0 \;\; -1 \;\; 0 \;\; 0 \;\; 0). $$

Quantities that are not estimable include
$$ \mu, \quad \tau_1, \quad \tau_2, \quad \tau_3, \quad 3\tau_1, \quad \tau_1 + \tau_2. $$
To show that a linear function of the parameters, $c_0 \mu + c_1 \tau_1 + c_2 \tau_2 + c_3 \tau_3$, is not estimable, one must show that there is no non-random vector $a = (a_1, \ldots, a_6)^T$ for which $E(a^T Y) = c_0 \mu + c_1 \tau_1 + c_2 \tau_2 + c_3 \tau_3$.

For $\tau_1$ to be estimable we would need to find an $a$ that satisfies
$$ \tau_1 = E(a^T Y) = a_1 E(Y_{11}) + a_2 E(Y_{12}) + a_3 E(Y_{21}) + a_4 E(Y_{31}) + a_5 E(Y_{32}) + a_6 E(Y_{33}) $$
$$ = (a_1 + a_2)(\mu + \tau_1) + a_3 (\mu + \tau_2) + (a_4 + a_5 + a_6)(\mu + \tau_3). $$
This implies $0 = a_3 = (a_4 + a_5 + a_6)$. Then $\tau_1 = (a_1 + a_2)(\mu + \tau_1)$ would have to hold for all values of $\mu$ and $\tau_1$, which is impossible.

Example 3.1 (yield of a chemical process):
$$
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \end{bmatrix}
=
\begin{bmatrix} 1 & 160 & 1 \\ 1 & 165 & 3 \\ 1 & 165 & 2 \\ 1 & 170 & 1 \\ 1 & 175 & 2 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \end{bmatrix}
$$
Since $X$ has full column rank, each element of $\beta$ is estimable. Consider $\beta_1 = c^T \beta$, where $c = (0 \;\; 1 \;\; 0)^T$. Since $X$ has full column rank, the unique least squares estimator for $\beta$ is $b = (X^T X)^{-1} X^T Y$, and an unbiased linear estimator for $c^T \beta$ is
$$ c^T b = c^T (X^T X)^{-1} X^T Y = a^T Y, \qquad \text{with } a^T = c^T (X^T X)^{-1} X^T. $$

Result 3.7: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$,
(i) the expectation of any observation is estimable;
(ii) a linear combination of estimable functions is estimable;
(iii) each element of $\beta$ is estimable if and only if $\text{rank}(X) = k$ = number of columns of $X$;
(iv) every $c^T \beta$ is estimable if and only if $\text{rank}(X) = k$ = number of columns in $X$.

Proof:

(i) For $Y = (Y_1, \ldots, Y_n)^T$ with $E(Y) = X\beta$, we have $Y_i = a_i^T Y$, where $a_i = (0 \; \cdots \; 0 \; 1 \; 0 \; \cdots \; 0)^T$ has a one in the $i$-th position. Then
$$ E(Y_i) = E(a_i^T Y) = a_i^T E(Y) = a_i^T X\beta = c_i^T \beta, $$
where $c_i^T = a_i^T X$ is the $i$-th row of $X$.

(ii) Suppose $c_i^T \beta$, $i = 1, \ldots, p$, are estimable. Then for each $i$ there is an $a_i$ such that $E(a_i^T Y) = c_i^T \beta$. Now consider a linear combination of estimable functions,
$$ w_1 c_1^T \beta + w_2 c_2^T \beta + \cdots + w_p c_p^T \beta. $$
Let $a = w_1 a_1 + w_2 a_2 + \cdots + w_p a_p$. Then
$$ E(a^T Y) = E(w_1 a_1^T Y + \cdots + w_p a_p^T Y) = w_1 E(a_1^T Y) + \cdots + w_p E(a_p^T Y) = w_1 c_1^T \beta + \cdots + w_p c_p^T \beta. $$

(iii) Previous argument (the Example 3.1 argument applies to each element of $\beta$ when $\text{rank}(X) = k$).

(iv) Follows from (ii) and (iii).
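Whether a given $c^T\beta$ is estimable amounts to asking whether $X^T a = c$ has a solution $a$, so that $E(a^T Y) = a^T X\beta = c^T\beta$ for every $\beta$ (this is formalized in the next result). A small NumPy sketch checking this for the Example 3.2 effects model (the helper name and the use of lstsq with a tolerance are my own conveniences):

```python
import numpy as np

# Example 3.2 effects model: beta = (mu, tau1, tau2, tau3)
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

def is_estimable(c, X, tol=1e-10):
    """c'beta is estimable iff X'a = c has a solution (c lies in the row space of X)."""
    a, *_ = np.linalg.lstsq(X.T, c, rcond=None)
    return np.allclose(X.T @ a, c, atol=tol)

print(is_estimable(np.array([1.0, 1, 0, 0]), X))   # mu + tau1    -> True
print(is_estimable(np.array([0.0, 1, -1, 0]), X))  # tau1 - tau2  -> True
print(is_estimable(np.array([1.0, 0, 0, 0]), X))   # mu           -> False
print(is_estimable(np.array([0.0, 1, 0, 0]), X))   # tau1         -> False
```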
Result 3.8: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, each of the following is true if and only if $c^T \beta$ is estimable:
(i) $c^T = a^T X$ for some $a$, i.e., $c$ is in the space spanned by the rows of $X$;
(ii) $c^T d = 0$ for every $d$ for which $X d = 0$;
(iii) $c^T b$ is the same for any solution to the normal equations $(X^T X) b = X^T Y$, i.e., there is a unique least squares estimator for $c^T \beta$.

Use Result 3.8(ii) to show that $\mu$ is not estimable in Example 3.2. In that case
$$ E(Y) = X\beta = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{bmatrix} $$
and $\mu = c^T \beta$ with $c^T = [1 \;\; 0 \;\; 0 \;\; 0]$. Let $d^T = [1 \;\; -1 \;\; -1 \;\; -1]$; then $X d = 0$, but $c^T d = 1 \neq 0$. Hence $\mu$ is not estimable.

Part (ii) of Result 3.8 sometimes provides a convenient way to identify all possible estimable functions of $\beta$. In Example 3.2, $X d = 0$ if and only if
$$ d = W \begin{bmatrix} 1 \\ -1 \\ -1 \\ -1 \end{bmatrix} \qquad \text{for some scalar } W. $$
Then $c^T \beta$ is estimable if and only if
$$ 0 = c^T d = W (c_1 - c_2 - c_3 - c_4) \iff c_1 = c_2 + c_3 + c_4. $$
Then
$$ (c_2 + c_3 + c_4)\,\mu + c_2 \tau_1 + c_3 \tau_2 + c_4 \tau_3 $$
is estimable for any $(c_2 \;\; c_3 \;\; c_4)$, and these are the only estimable functions of $\mu, \tau_1, \tau_2, \tau_3$. Some estimable functions are
$$ \mu + \tfrac{1}{3}(\tau_1 + \tau_2 + \tau_3) \quad (c_2 = c_3 = c_4 = \tfrac{1}{3}) \qquad \text{and} \qquad \mu + \tau_2 \quad (c_3 = 1, \; c_2 = c_4 = 0), $$
but $\mu + 2\tau_2$ is not estimable.

Defn 3.7: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, where $X$ is an $n \times k$ matrix, $C_{r \times k}\,\beta_{k \times 1}$ is said to be estimable if all of its elements
$$ C\beta = \begin{bmatrix} c_1^T \\ c_2^T \\ \vdots \\ c_r^T \end{bmatrix}\beta = \begin{bmatrix} c_1^T \beta \\ c_2^T \beta \\ \vdots \\ c_r^T \beta \end{bmatrix} $$
are estimable.

Result 3.9: For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, where $X$ is an $n \times k$ matrix, each of the following conditions holds if and only if $C\beta$ is estimable:
(i) $A X = C$ for some matrix $A$, i.e., each row of $C$ is in the space spanned by the rows of $X$;
(ii) $C d = 0$ for any $d$ for which $X d = 0$;
(iii) $C b$ is the same for any solution to the normal equations $(X^T X) b = X^T Y$.

Summary: For a linear model $Y = X\beta + \varepsilon$ with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, we have:
- Any estimable function has a unique interpretation.
- The OLS estimator for an estimable function $C\beta$ is unique: $C b = C (X^T X)^- X^T Y$.
- The OLS estimator for an estimable function $C\beta$ is a linear estimator and an unbiased estimator.
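The null-space characterization in Results 3.8(ii) and 3.9(ii) can be checked directly: compute a basis for $\{d : Xd = 0\}$ from the SVD and test $c^T d = 0$. For the Example 3.2 effects model this reproduces the condition $c_1 = c_2 + c_3 + c_4$. A sketch (my own; the rank tolerance is ad hoc):

```python
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

# Basis for the null space {d : Xd = 0} from the SVD of X
U, s, Vt = np.linalg.svd(X)
r = np.sum(s > 1e-10)            # rank(X) = 3
null_basis = Vt[r:].T            # 4 x (k - r); here a single column
d = null_basis[:, 0]
print(np.round(d / d[0], 6))     # proportional to (1, -1, -1, -1)

def is_estimable(c):
    # c'beta is estimable  <=>  c'd = 0 for every d with Xd = 0
    return np.allclose(null_basis.T @ c, 0)

print(is_estimable(np.array([1.0, 1/3, 1/3, 1/3])))  # mu + (tau1+tau2+tau3)/3 -> True
print(is_estimable(np.array([1.0, 0, 1, 0])))        # mu + tau2               -> True
print(is_estimable(np.array([1.0, 0, 2, 0])))        # mu + 2*tau2             -> False
```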
In the class of linear unbiased estimators for $c^T \beta$, is the OLS estimator the "best"? Here "best" means smallest expected squared error. Let $t(Y)$ denote an estimator for $c^T \beta$. Then the expected squared error is
$$ MSE = E[t(Y) - c^T \beta]^2 = E[t(Y) - E(t(Y)) + E(t(Y)) - c^T \beta]^2 $$
$$ = E[t(Y) - E(t(Y))]^2 + [E(t(Y)) - c^T \beta]^2 + 2[E(t(Y)) - c^T \beta]\,E[t(Y) - E(t(Y))] $$
$$ = E[t(Y) - E(t(Y))]^2 + [E(t(Y)) - c^T \beta]^2 = Var(t(Y)) + [\text{bias}]^2. $$

We are restricting our attention to linear unbiased estimators for $c^T \beta$:
$$ t(Y) = a^T Y \text{ for some } a, \qquad E(t(Y)) = c^T \beta. $$
Then $t(Y) = a^T Y$ is the best linear unbiased estimator (blue) for $c^T \beta$ if
$$ Var(a^T Y) \leq Var(d^T Y) $$
for any other linear unbiased estimator $d^T Y$ of $c^T \beta$ and any value of $\beta$.

Result 3.10 (Gauss-Markov Theorem): For the Gauss-Markov model, $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$, the OLS estimator of an estimable function $c^T \beta$ is the unique best linear unbiased estimator (blue) of $c^T \beta$.

Proof:

(i) For any solution $b = (X^T X)^- X^T Y$ to the normal equations, the OLS estimator for $c^T \beta$ is
$$ c^T b = c^T (X^T X)^- X^T Y, $$
which is a linear function of $Y$.

(ii) From Result 3.8(i), there exists a vector $a$ such that $c^T = a^T X$. Then
$$ E(c^T b) = E(c^T (X^T X)^- X^T Y) = c^T (X^T X)^- X^T E(Y) = c^T (X^T X)^- X^T X\beta = a^T X (X^T X)^- X^T X\beta = a^T X\beta = c^T \beta, $$
since $X (X^T X)^- X^T$ is the projection $P_X$ onto the column space of $X$ and $P_X X = X$. Hence $c^T b$ is an unbiased estimator.

(iii) Minimum variance in the class of linear unbiased estimators. Suppose $d^T Y$ is any other linear unbiased estimator for $c^T \beta$. Then
$$ E(d^T Y) = d^T E(Y) = d^T X\beta = c^T \beta \;\; \text{for every } \beta. $$
Hence $d^T X = c^T$ and $c = X^T d$. We must show that $Var(c^T b) \leq Var(d^T Y)$. First note that
$$ Var(d^T Y) = Var(c^T b + [d^T Y - c^T b]) = Var(c^T b) + Var(d^T Y - c^T b) + 2\,Cov(c^T b, \, d^T Y - c^T b), $$
so
$$ Var(d^T Y) \geq Var(c^T b) + 2\,Cov(c^T b, \, d^T Y - c^T b), \qquad \text{because } Var(d^T Y - c^T b) \geq 0. $$
It remains to show that $Cov(c^T b, \, d^T Y - c^T b) = 0$. Note that $c^T b = c^T (X^T X)^- X^T Y$ is invariant with respect to the choice of $(X^T X)^-$ (Result 3.8(iii)), so we can use the Moore-Penrose generalized inverse, which is symmetric and satisfies $(X^T X)^- (X^T X)(X^T X)^- = (X^T X)^-$ by definition. (Not every generalized inverse of $X^T X$ is symmetric; this is where the symmetry of $(X^T X)^-$ is needed.) Then
$$ Cov(c^T b, \, d^T Y - c^T b) = Cov\big(c^T (X^T X)^- X^T Y, \; [\,d^T - c^T (X^T X)^- X^T] Y\big)
= c^T (X^T X)^- X^T \, Var(Y)\, [\,d - X (X^T X)^- c\,] $$
$$ = \sigma^2\, c^T (X^T X)^- X^T [\,d - X (X^T X)^- c\,]
= \sigma^2 [\, c^T (X^T X)^- X^T d - c^T (X^T X)^- X^T X (X^T X)^- c \,]
= \sigma^2 [\, c^T (X^T X)^- c - c^T (X^T X)^- c \,] = 0, $$
since $X^T d = c$. Consequently, $Var(d^T Y) \geq Var(c^T b)$ and $c^T b$ is blue.

(iv) To show that the OLS estimator is the unique blue, note that
$$ Var(d^T Y) = Var(c^T b + [d^T Y - c^T b]) = Var(c^T b) + Var(d^T Y - c^T b), $$
because $Cov(c^T b, \, d^T Y - c^T b) = 0$. Then $d^T Y$ is blue if and only if $Var(d^T Y - c^T b) = 0$, which is equivalent to $d^T Y - c^T b$ being constant. Since both estimators are unbiased,
$$ E(d^T Y - c^T b) = E(d^T Y) - E(c^T b) = 0. $$
Consequently, $d^T Y - c^T b = 0$ for all $Y$, and $c^T b$ is the unique blue.

What if you have a linear model that is not a Gauss-Markov model,
$$ E(Y) = X\beta, \qquad Var(Y) = \Sigma \neq \sigma^2 I\,? $$
Parts (i) and (ii) of the proof of Result 3.10 do not require $Var(Y) = \sigma^2 I$. Consequently, the OLS estimator for $c^T \beta$,
$$ c^T b = c^T (X^T X)^- X^T Y, $$
is still a linear unbiased estimator. Result 3.8 does not require $Var(Y) = \sigma^2 I$ either, so the OLS estimator of any estimable quantity is invariant to the choice of $(X^T X)^-$. Note that
$$ Var(c^T b) = Var(c^T (X^T X)^- X^T Y) = c^T (X^T X)^- X^T \,\Sigma\, X [(X^T X)^-]^T c. $$
The OLS estimator $c^T b$ may not be blue: there may be other linear unbiased estimators with smaller variance. For the Gauss-Markov model, $Var(Y) = \Sigma = \sigma^2 I$ and
$$ Var(c^T b) = \sigma^2\, c^T (X^T X)^- X^T X [(X^T X)^-]^T c = \sigma^2\, c^T (X^T X)^- c. $$
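A small check of the Gauss-Markov theorem in Example 3.2: under $Var(Y) = \sigma^2 I$, compare $Var(c^Tb) = \sigma^2 c^T(X^TX)^- c$ for the OLS estimator of $c^T\beta = \mu + \tau_1$ with the variance of another linear unbiased estimator, $a^TY = Y_{11}$ (the single-observation estimator used earlier to show estimability). A sketch, with $\sigma^2$ set to 1 purely for the comparison:

```python
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
sigma2 = 1.0                        # Var(Y) = sigma^2 I; value chosen arbitrarily
c = np.array([1.0, 1.0, 0.0, 0.0])  # c'beta = mu + tau1 (estimable)

# OLS: Var(c'b) = sigma^2 c'(X'X)^- c, using the symmetric Moore-Penrose inverse
G = np.linalg.pinv(X.T @ X)
var_ols = sigma2 * c @ G @ c
print(var_ols)                      # 0.5 = sigma^2 / n1 (c'b is the diet-1 mean)

# Another linear unbiased estimator: a'Y = Y11 alone
a = np.array([1.0, 0, 0, 0, 0, 0])
print(np.allclose(X.T @ a, c))      # unbiased: E(a'Y) = c'beta
var_alt = sigma2 * a @ a            # Var(a'Y) = sigma^2 a'a
print(var_alt)                      # 1.0 >= 0.5, as the theorem guarantees
```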
Generalized Least Squares (GLS) Estimation

Defn 3.8: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, where $\Sigma$ is positive definite, a generalized least squares estimator for $\beta$ minimizes
$$ (Y - X b_{GLS})^T\, \Sigma^{-1}\, (Y - X b_{GLS}). $$

Strategy: Transform $Y$ to a random vector $Z$ for which the Gauss-Markov model applies. The spectral decomposition of $\Sigma$ yields
$$ \Sigma = \sum_{j=1}^{n} \lambda_j\, u_j u_j^T. $$
Define
$$ \Sigma^{-1/2} = \sum_{j=1}^{n} \lambda_j^{-1/2}\, u_j u_j^T $$
and create the random vector $Z = \Sigma^{-1/2} Y$. Then
$$ Var(Z) = Var(\Sigma^{-1/2} Y) = \Sigma^{-1/2}\, \Sigma\, \Sigma^{-1/2} = I $$
and
$$ E(Z) = E(\Sigma^{-1/2} Y) = \Sigma^{-1/2} E(Y) = \Sigma^{-1/2} X\beta = W\beta, $$
so we have a Gauss-Markov model for $Z$, where $W = \Sigma^{-1/2} X$ is the model matrix.

Note that
$$ (Z - W b)^T (Z - W b) = (\Sigma^{-1/2} Y - \Sigma^{-1/2} X b)^T (\Sigma^{-1/2} Y - \Sigma^{-1/2} X b) = (Y - X b)^T \Sigma^{-1/2} \Sigma^{-1/2} (Y - X b) = (Y - X b)^T \Sigma^{-1} (Y - X b). $$
Hence, any GLS estimator for the $Y$ model is an OLS estimator for the $Z$ model. It must be a solution to the normal equations for the $Z$ model,
$$ W^T W b = W^T Z \iff (X^T \Sigma^{-1/2} \Sigma^{-1/2} X)\, b = X^T \Sigma^{-1/2} \Sigma^{-1/2} Y \iff (X^T \Sigma^{-1} X)\, b = X^T \Sigma^{-1} Y. $$
These are the generalized least squares estimating equations. Any solution
$$ b_{GLS} = (W^T W)^- W^T Z = (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y $$
is called a generalized least squares (GLS) estimator for $\beta$.

Result 3.11: For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, the GLS estimator of an estimable function $c^T \beta$,
$$ c^T b_{GLS} = c^T (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y, $$
is the unique blue of $c^T \beta$.

Proof: Since $c^T \beta$ is estimable, there is an $a$ such that
$$ c^T \beta = E(a^T Y) = E(a^T \Sigma^{1/2} \Sigma^{-1/2} Y) = E(a^T \Sigma^{1/2} Z). $$
Consequently, $c^T \beta$ is estimable for the $Z$ model. Apply the Gauss-Markov theorem (Result 3.10) to the $Z$ model.

Comments:
- For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, both the OLS and GLS estimators of an estimable function $c^T \beta$ are linear unbiased estimators, with
$$ Var(c^T b_{OLS}) = c^T (X^T X)^- X^T \,\Sigma\, X [(X^T X)^-]^T c
\qquad \text{and} \qquad
Var(c^T b_{GLS}) = c^T (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} X (X^T \Sigma^{-1} X)^- c. $$
- $c^T b_{OLS}$ is not a "bad" estimator, but $Var(c^T b_{OLS}) \geq Var(c^T b_{GLS})$ because $c^T b_{GLS}$ is the unique blue for $c^T \beta$.
- For the Gauss-Markov model, $c^T b_{GLS} = c^T b_{OLS}$.
- The results for $b_{GLS}$ and $c^T b_{GLS}$ assume that $Var(Y) = \Sigma$ is known. The same results, including Result 3.11, hold for the Aitken model, where $E(Y) = X\beta$ and $Var(Y) = \sigma^2 V$ for some known matrix $V$.

In practice $Var(Y) = \Sigma$ is usually unknown. Then an approximation to $b_{GLS} = (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y$ is obtained by substituting a consistent estimator $\hat{\Sigma}$ for $\Sigma$:
- use method of moments or maximum likelihood estimation to obtain $\hat{\Sigma}$;
- the resulting estimator is not a linear estimator;
- it is consistent but not necessarily unbiased;
- it does not provide a blue for estimable functions;
- it may have larger mean squared error than the OLS estimator.

To create confidence intervals or test hypotheses about estimable functions for a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$ we must
(i) specify a probability distribution for $Y$ so we can derive a distribution for $c^T b = c^T (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y$;
(ii) estimate $\sigma^2$ when $Var(Y) = \sigma^2 I$ or $Var(Y) = \sigma^2 V$ for some known $V$;
(iii) estimate $\Sigma$ when $Var(Y) = \Sigma$.

The "effects" linear model and the "means" linear model are equivalent in the sense that the space of possible mean vectors $E(Y) = X\beta$ is the same for the two models:
- the model matrices differ,
- the parameter vectors differ,
- but the columns of the model matrices span the same vector space, since
$$ E(Y) = X\beta = [X_1 \,|\, X_2 \,|\, \cdots \,|\, X_k] \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k. $$
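To make the OLS/GLS contrast concrete, here is a final NumPy sketch comparing $b_{GLS} = (X^T\Sigma^{-1}X)^- X^T\Sigma^{-1}Y$ with OLS for the Example 3.1 data. The diagonal $\Sigma$ used here is invented purely for illustration (a hypothetical known heteroscedastic covariance); it is not part of the example:

```python
import numpy as np

# Example 3.1 data and model matrix (intercept, temperature, time)
y = np.array([77.0, 82, 84, 89, 94])
X = np.column_stack([np.ones(5),
                     [160.0, 165, 165, 170, 175],
                     [1.0, 3, 2, 1, 2]])

# A hypothetical known covariance matrix (heteroscedastic, uncorrelated errors)
Sigma = np.diag([1.0, 4.0, 1.0, 9.0, 1.0])
Sigma_inv = np.linalg.inv(Sigma)

# OLS and GLS estimates (X has full column rank, so plain solves suffice)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print("b_OLS =", np.round(b_ols, 4))
print("b_GLS =", np.round(b_gls, 4))

# Variances of c'b for c'beta = beta_1 (the temperature slope)
c = np.array([0.0, 1.0, 0.0])
A_ols = np.linalg.solve(X.T @ X, X.T)            # (X'X)^{-1} X'
var_ols = c @ A_ols @ Sigma @ A_ols.T @ c
var_gls = c @ np.linalg.inv(X.T @ Sigma_inv @ X) @ c
print(var_ols, ">=", var_gls)                    # GLS is blue, so its variance is no larger
```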