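DISTRIBUTIONS ARISING FROM THE SIMPLE REGRESSION MODEL
gs2011

The simple linear regression model is that

\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for } i = 1, 2, 3, \ldots, n \]

The $\varepsilon_i$ are assumed to be independent random variables, constituting a sample from a normal population with mean 0 and unknown standard deviation $\sigma$. It is further assumed that $x_1, x_2, \ldots, x_n$ are non-random.

Of course, in many situations the $x_i$'s are random. For those cases, we do the analysis conditional on the values that the $x_i$'s take; that makes them effectively non-random.

It's possible to make a lesser assumption on the $\varepsilon_i$'s by not requiring that they be normal. It would still be possible then to compute means, standard deviations, and the correlation for $\hat{\beta}_0$ and $\hat{\beta}_1$. It would not be possible to derive, as we do here, the actual distributions for $\hat{\beta}_0$ and $\hat{\beta}_1$.

If the $x_i$'s are non-random, then so is $S_{xx}$. We will be able to get the exact statistical properties under the normal model.

Let's get the distribution of $S_{xy}$. Note first that

\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})\, y_i \]

The $\bar{y}$ term drops out because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

At this point, we can invoke the normal distribution. It's clear that $S_{xy}$ is just a linear combination of independent normal random variables, so that $S_{xy}$ must have a normal distribution. Let's get the expected value.

\[
\begin{aligned}
E\, S_{xy} &= E \sum_{i=1}^{n} (x_i - \bar{x})\, y_i = \sum_{i=1}^{n} (x_i - \bar{x})\, E\, y_i \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i) = \beta_1 \sum_{i=1}^{n} (x_i - \bar{x})\, x_i \\
&= \beta_1 \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) = \beta_1 S_{xx}
\end{aligned}
\]

Both simplifications again use $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$: the $\beta_0$ term vanishes, and $x_i$ can be replaced by $x_i - \bar{x}$ in the last sum.

Since $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$, we have discovered that $E\, \hat{\beta}_1 = \beta_1$. This is good.

We can similarly get $\operatorname{Var}(S_{xy})$. Because the $y_i$ are independent, the variance of the sum is the sum of the variances:

\[
\operatorname{Var}(S_{xy}) = \operatorname{Var}\!\left( \sum_{i=1}^{n} (x_i - \bar{x})\, y_i \right) = \sum_{i=1}^{n} (x_i - \bar{x})^2 \operatorname{Var}(y_i) = \sigma^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sigma^2 S_{xx}
\]

Noting again that $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$, we decide that

\[
\operatorname{Var}(\hat{\beta}_1) = \operatorname{Var}\!\left( \frac{S_{xy}}{S_{xx}} \right) = \frac{1}{S_{xx}^2} \operatorname{Var}(S_{xy}) = \frac{1}{S_{xx}^2}\, \sigma^2 S_{xx} = \frac{\sigma^2}{S_{xx}}
\]

This is a big-deal result. We've discovered that

\[ \hat{\beta}_1 \sim N\!\left( \beta_1,\ \frac{\sigma^2}{S_{xx}} \right) \]

We can make similar statements for $\hat{\beta}_0$. Since $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, and since $\bar{y}$ and $\hat{\beta}_1$ are linear functions of $y_1, y_2, \ldots, y_n$, it certainly must happen that $\hat{\beta}_0$ follows a normal distribution. Let's develop this.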
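\[
\begin{aligned}
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\, \bar{x} = \bar{y} - \frac{\bar{x} \sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{S_{xx}} \\
&= \frac{1}{n} \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}}\, y_i = \sum_{i=1}^{n} y_i \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right]
\end{aligned}
\]

We can use this to find the mean and variance.

\[
\begin{aligned}
E\, \hat{\beta}_0 &= E \sum_{i=1}^{n} y_i \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right] = \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i) \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right] \\
&= \beta_0 \sum_{i=1}^{n} \frac{1}{n} + \beta_1 \frac{1}{n} \sum_{i=1}^{n} x_i - \beta_0 \frac{\bar{x}}{S_{xx}} \underbrace{\sum_{i=1}^{n} (x_i - \bar{x})}_{\text{This is zero.}} - \beta_1 \frac{\bar{x}}{S_{xx}} \underbrace{\sum_{i=1}^{n} x_i (x_i - \bar{x})}_{\text{This is } S_{xx}.} \\
&= \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0
\end{aligned}
\]

Thus, $E\, \hat{\beta}_0 = \beta_0$, so we have an unbiased estimate of the intercept!

We can get the variance also:

\[
\begin{aligned}
\operatorname{Var}\, \hat{\beta}_0 &= \operatorname{Var} \sum_{i=1}^{n} y_i \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right] = \sum_{i=1}^{n} \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right]^2 \operatorname{Var}\, y_i \\
&= \sigma^2 \sum_{i=1}^{n} \left[ \frac{1}{n^2} - \frac{2\, (x_i - \bar{x})\, \bar{x}}{n\, S_{xx}} + \frac{(x_i - \bar{x})^2\, \bar{x}^2}{S_{xx}^2} \right] \\
&= \sigma^2 \left[ \frac{n}{n^2} - \frac{2 \bar{x}}{n\, S_{xx}} \underbrace{\sum_{i=1}^{n} (x_i - \bar{x})}_{\text{This is zero.}} + \frac{\bar{x}^2}{S_{xx}^2} \underbrace{\sum_{i=1}^{n} (x_i - \bar{x})^2}_{\text{This is } S_{xx}.} \right] \\
&= \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]
\end{aligned}
\]

Thus we are able to summarize that

\[ \hat{\beta}_0 \sim N\!\left( \beta_0,\ \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right] \right) \]

One can find also the covariance between the slope and intercept estimates.

\[
\begin{aligned}
\operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) &= \operatorname{Cov}\!\left( \sum_{i=1}^{n} y_i \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right],\ \frac{S_{xy}}{S_{xx}} \right) \\
&= \operatorname{Cov}\!\left( \sum_{i=1}^{n} y_i \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right],\ \frac{1}{S_{xx}} \sum_{j=1}^{n} (x_j - \bar{x})\, y_j \right)
\end{aligned}
\]

Observe that the sum for $S_{xy}$ is written with counter $j$.

\[
= \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}\!\left( \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right] y_i,\ \frac{1}{S_{xx}} (x_j - \bar{x})\, y_j \right)
\]

This is the bilinear property of the covariance. Notice now that only the terms with $i = j$ are different from zero.

\[
\begin{aligned}
&= \sum_{i=1}^{n} \operatorname{Cov}\!\left( \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right] y_i,\ \frac{1}{S_{xx}} (x_i - \bar{x})\, y_i \right) \\
&= \sum_{i=1}^{n} \left[ \frac{1}{n} - \frac{(x_i - \bar{x})\, \bar{x}}{S_{xx}} \right] \frac{1}{S_{xx}} (x_i - \bar{x}) \operatorname{Var}\, y_i \\
&= \sigma^2 \left[ \frac{1}{n\, S_{xx}} \underbrace{\sum_{i=1}^{n} (x_i - \bar{x})}_{\text{This sum is zero.}} - \frac{\bar{x}}{S_{xx}^2} \underbrace{\sum_{i=1}^{n} (x_i - \bar{x})^2}_{\text{This sum is } S_{xx}.} \right] \\
&= -\,\sigma^2\, \frac{\bar{x}}{S_{xx}}
\end{aligned}
\]

You might note that the covariance is zero when $\bar{x} = 0$.

These derivations are much cleaner and easier when the work is done in matrix notation.

The regression variance estimate

\[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2 \]

has the distribution $\sigma^2\, \dfrac{\chi^2_{n-2}}{n-2}$.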
This fact is easily obtained in matrix notation, but it is very difficult to do in the notation used here. It is best to take this distribution fact from the matrix notation.
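
The distribution results above can be checked numerically. What follows is a minimal Monte Carlo sketch in Python with NumPy, offered only as an illustration and not as part of the derivation. The function names `simulate_estimates` and `theoretical_moments`, the x values, and the parameter values are all assumptions made for the example. It holds the $x_i$ fixed, draws many samples from the normal model, and returns the estimates so that their empirical means, variances, and covariance can be compared with the formulas for $\hat{\beta}_0$, $\hat{\beta}_1$, and $s^2$.

```python
import numpy as np


def simulate_estimates(x, beta0, beta1, sigma, reps, seed=0):
    """Simulate `reps` samples from y_i = beta0 + beta1*x_i + eps_i with
    x held fixed, and return arrays of b0, b1, and s^2 (one per sample)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)

    # Each row of y is one simulated sample y_1, ..., y_n.
    eps = rng.normal(0.0, sigma, size=(reps, n))
    y = beta0 + beta1 * x + eps

    sxy = y @ (x - xbar)              # S_xy = sum (x_i - xbar) * y_i, per row
    b1 = sxy / sxx                    # slope estimate
    b0 = y.mean(axis=1) - b1 * xbar   # intercept estimate
    resid = y - b0[:, None] - b1[:, None] * x
    s2 = np.sum(resid ** 2, axis=1) / (n - 2)
    return b0, b1, s2


def theoretical_moments(x, sigma):
    """Variances and covariance given by the formulas in the text."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    return {
        "var_b1": sigma**2 / sxx,
        "var_b0": sigma**2 * (1.0 / n + xbar**2 / sxx),
        "cov_b0_b1": -sigma**2 * xbar / sxx,
        # s^2 ~ sigma^2 * chi2_{n-2} / (n-2) has mean sigma^2
        # and variance 2 * sigma^4 / (n-2).
        "mean_s2": sigma**2,
        "var_s2": 2.0 * sigma**4 / (n - 2),
    }


if __name__ == "__main__":
    # Illustrative values only (assumed for this example).
    x = np.arange(1, 11)
    b0, b1, s2 = simulate_estimates(x, beta0=2.0, beta1=0.5, sigma=1.5,
                                    reps=100_000)
    theory = theoretical_moments(x, sigma=1.5)
    print("Var(b1):", b1.var(), "theory:", theory["var_b1"])
    print("Var(b0):", b0.var(), "theory:", theory["var_b0"])
    print("Cov(b0, b1):", np.cov(b0, b1)[0, 1], "theory:", theory["cov_b0_b1"])
    print("E(s^2):", s2.mean(), "theory:", theory["mean_s2"])
    print("Var(s^2):", s2.var(), "theory:", theory["var_s2"])
```

With a large number of replications the empirical values should sit close to the theoretical ones, up to simulation noise. Shifting the x values so that $\bar{x} = 0$ should drive the simulated covariance toward zero, matching the remark above.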