Lecture XXVIII

Most of the material for this lecture is from George Casella and Roger L. Berger, Statistical Inference (Belmont, California: Duxbury Press, 1990), Chapter 12, pp. 554-577.

The purpose of regression analysis is to explore the relationship between two variables. In this course, the relationship that we will be interested in can be expressed as

$$ y_i = a + b x_i + \epsilon_i $$

where $y_i$ is a random variable and $x_i$ is a variable hypothesized to affect, or drive, $y_i$. The coefficients $a$ and $b$ are the intercept and slope parameters, respectively. These parameters are assumed to be fixed but unknown. The residual $\epsilon_i$ is assumed to be an unobserved random error. Under typical assumptions $E[\epsilon_i] = 0$. Thus, the expected value of $y_i$ given $x_i$ becomes

$$ E[y_i] = a + b x_i. $$

The goal of regression analysis is to estimate $a$ and $b$ and to say something about the significance of the relationship. From a terminology standpoint, $y$ is typically referred to as the dependent variable and $x$ is referred to as the independent variable. Casella and Berger prefer the terminology of $y$ as the response variable and $x$ as the predictor variable.

This relationship is a linear regression in that the relationship is linear in the parameters $a$ and $b$. Abstracting for a moment, the traditional Cobb-Douglas production function can be written as

$$ y_i = a x_i^b. $$

Taking the natural log of both sides yields

$$ \ln y_i = \ln a + b \ln x_i, $$

which is again linear in the parameters.

The setup for simple linear regression is that we have a sample of $n$ pairs of variables $(x_1, y_1), \dots, (x_n, y_n)$. Further, we want to summarize this relationship by fitting a line through the data. Based on the sample data, we first describe the data as follows.

The sample means:

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i. $$

The sums of squares:

$$ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). $$

The most common estimators given this formulation are then

$$ b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b \bar{x}. $$

Following on our theme in the discussion of linear projections: "Our first derivation of estimates of a and b makes no statistical assumptions about the observations $(x_i, y_i)$…. Think of drawing through this cloud of points a straight line that comes 'as close as possible' to all the points." This definition involves minimizing the sum of squared errors in the choice of $a$ and $b$:

$$ \min_{a,b} \; RSS = \sum_{i=1}^{n} (y_i - a - b x_i)^2. $$

Focusing on $a$ first,

$$ \frac{\partial RSS}{\partial a} = -2 \sum_{i=1}^{n} (y_i - b x_i - a) = 0 \quad \Rightarrow \quad \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i = n a \quad \Rightarrow \quad \hat{a} = \bar{y} - \hat{b} \bar{x}. $$

Taking the first-order condition with respect to $b$ and substituting $a = \bar{y} - b \bar{x}$ yields

$$ \frac{\partial RSS}{\partial b} = -2 \sum_{i=1}^{n} (y_i - b x_i - \bar{y} + b \bar{x}) x_i = 0 $$

$$ \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b (x_i - \bar{x}) \right] x_i = 0 $$

$$ \sum_{i=1}^{n} (y_i - \bar{y}) x_i = b \sum_{i=1}^{n} (x_i - \bar{x}) x_i. $$

Going from this result to the traditional estimator requires the statement that

$$ S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} (y_i - \bar{y}) x_i - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} (y_i - \bar{y}) x_i, $$

since $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$; by the same argument, $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x}) x_i$. The least squares estimator then becomes

$$ \hat{b} = \frac{S_{xy}}{S_{xx}}. $$

Grape Yields and Number of Clusters

    Year    Yield (y)    Cluster Count (x)
    1971       5.6           116.37
    1973       3.2            82.77
    1974       4.5           110.68
    1975       4.2            97.50
    1976       5.2           115.88
    1977       2.7            80.19
    1978       4.8           125.24
    1979       4.9           116.15
    1980       4.7           117.36
    1981       4.1            93.31
    1982       4.4           107.46
    1983       5.4           122.30

Computing the simple least squares representation:

$$ \bar{x} = 107.101, \quad \bar{y} = 4.475, \quad S_{xx} = 2521.66, \quad S_{xy} = 129.564, $$

$$ \hat{b} = 0.0513806, \qquad \hat{a} = -1.0279. $$

In matrix form, the design matrix $X$ (a column of ones and the cluster counts) and the vector of yields $y$ are

$$ X = \begin{bmatrix} 1 & 116.37 \\ 1 & 82.77 \\ 1 & 110.68 \\ 1 & 97.50 \\ 1 & 115.88 \\ 1 & 80.19 \\ 1 & 125.24 \\ 1 & 116.15 \\ 1 & 117.36 \\ 1 & 93.31 \\ 1 & 107.46 \\ 1 & 122.30 \end{bmatrix}, \qquad y = \begin{bmatrix} 5.6 \\ 3.2 \\ 4.5 \\ 4.2 \\ 5.2 \\ 2.7 \\ 4.8 \\ 4.9 \\ 4.7 \\ 4.1 \\ 4.4 \\ 5.4 \end{bmatrix}. $$

First, we derive the projection matrix

$$ P_c = X (X'X)^{-1} X', $$

which is a $12 \times 12$ matrix.
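As a check on these hand calculations, here is a minimal numerical sketch, not part of the original lecture, assuming Python with numpy; the variable names are my own, while the data and the reported values ($\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, $\hat{a}$, $\hat{b}$, and the fitted values) come from the example above.

    import numpy as np

    # Grape yield example: cluster count x and yield y, in the order of the table.
    x = np.array([116.37, 82.77, 110.68, 97.50, 115.88, 80.19,
                  125.24, 116.15, 117.36, 93.31, 107.46, 122.30])
    y = np.array([5.6, 3.2, 4.5, 4.2, 5.2, 2.7,
                  4.8, 4.9, 4.7, 4.1, 4.4, 5.4])

    # Sample means and sums of squares
    x_bar, y_bar = x.mean(), y.mean()
    S_xx = np.sum((x - x_bar) ** 2)
    S_xy = np.sum((x - x_bar) * (y - y_bar))

    # Least squares estimates b_hat = S_xy / S_xx and a_hat = y_bar - b_hat * x_bar
    b_hat = S_xy / S_xx
    a_hat = y_bar - b_hat * x_bar

    # Design matrix (column of ones and x) and projection matrix P_c = X (X'X)^{-1} X'
    X = np.column_stack([np.ones_like(x), x])
    P_c = X @ np.linalg.inv(X.T @ X) @ X.T   # 12 x 12

    print(x_bar, y_bar, S_xx, S_xy)   # approx. 107.101, 4.475, 2521.66, 129.564
    print(a_hat, b_hat)               # approx. -1.0279, 0.0513806
    print(P_c @ y)                    # fitted values, as reported below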
The projection of $y$ onto the column space of $X$ can then be calculated as

$$ P_c y = X (X'X)^{-1} X' y = \begin{bmatrix} 4.95126 \\ 3.22487 \\ 4.6589 \\ 3.9817 \\ 4.92608 \\ 3.09231 \\ 5.407 \\ 4.93995 \\ 5.00212 \\ 3.76642 \\ 4.49345 \\ 5.25594 \end{bmatrix}. $$

Comparing these results with the estimated values of $y$ from the model yields

$$ \hat{a} + \hat{b}\,(116.37) = 4.95126, \quad \dots, \quad \hat{a} + \hat{b}\,(122.30) = 5.25594. $$

The linear relationship between the $x$s and $y$s is

$$ E[y_i] = a + b x_i, $$

and we assume that

$$ V[y_i] = \sigma^2. $$

The implications of this variance assumption are significant. Note that we assume that each observation has the same variance regardless of the value of the independent variable. In traditional regression terms, this implies that the errors are homoscedastic. One way to state these assumptions is

$$ y_i = a + b x_i + \epsilon_i, \qquad E[\epsilon_i] = 0, \qquad V[\epsilon_i] = \sigma^2. $$

This specification is consistent with our assumptions, since the model is homoscedastic and linear in the parameters.

Based on this formulation, we can define the linear estimators of $a$ and $b$ as estimators of the form

$$ \sum_{i=1}^{n} d_i y_i. $$

An unbiased estimator of $b$ can further be defined as a linear estimator whose expected value is the true value of the parameter:

$$ E\!\left[\sum_{i=1}^{n} d_i y_i\right] = b. $$

Expanding this condition,

$$ b = E\!\left[\sum_{i=1}^{n} d_i y_i\right] = \sum_{i=1}^{n} d_i E[y_i] = \sum_{i=1}^{n} d_i (a + b x_i) = a \sum_{i=1}^{n} d_i + b \sum_{i=1}^{n} d_i x_i, $$

which requires

$$ \sum_{i=1}^{n} d_i = 0, \qquad \sum_{i=1}^{n} d_i x_i = 1. $$

The linear estimator that satisfies these unbiasedness conditions and yields the smallest variance of the estimate is referred to as the best linear unbiased estimator (or BLUE). In this example, we need to show that the minimizing weights are

$$ d_i = \frac{x_i - \bar{x}}{S_{xx}}, \qquad \text{so that} \qquad \hat{b} = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{S_{xx}} \, y_i = \frac{S_{xy}}{S_{xx}}. $$

Given that the $y_i$ are uncorrelated, the variance of the linear estimator can be written as

$$ V\!\left[\sum_{i=1}^{n} d_i y_i\right] = \sum_{i=1}^{n} d_i^2 \, V[y_i] = \sigma^2 \sum_{i=1}^{n} d_i^2. $$

The problem of minimizing the variance then becomes choosing the $d_i$ to minimize this sum subject to the unbiasedness constraints:

$$ \min_{d_i} \; \sigma^2 \sum_{i=1}^{n} d_i^2 \quad \text{s.t.} \quad \sum_{i=1}^{n} d_i x_i = 1, \quad \sum_{i=1}^{n} d_i = 0. $$

The Lagrangian is

$$ L = \sigma^2 \sum_{i=1}^{n} d_i^2 + \lambda_1 \left(1 - \sum_{i=1}^{n} d_i x_i\right) + \lambda_2 \sum_{i=1}^{n} d_i, $$

with first-order conditions

$$ \frac{\partial L}{\partial d_i} = 2 \sigma^2 d_i - \lambda_1 x_i + \lambda_2 = 0 \quad \Rightarrow \quad d_i = \frac{\lambda_1 x_i - \lambda_2}{2 \sigma^2}, $$

$$ \frac{\partial L}{\partial \lambda_1} = 1 - \sum_{i=1}^{n} d_i x_i = 0, \qquad \frac{\partial L}{\partial \lambda_2} = \sum_{i=1}^{n} d_i = 0. $$

Using the results from the first $n$ first-order conditions and the second constraint first, we have

$$ \sum_{i=1}^{n} \frac{\lambda_1 x_i - \lambda_2}{2 \sigma^2} = 0 \quad \Rightarrow \quad \lambda_1 \sum_{i=1}^{n} x_i = n \lambda_2 \quad \Rightarrow \quad \lambda_2 = \lambda_1 \bar{x}. $$

Substituting this result into the first $n$ first-order conditions yields

$$ d_i = \frac{\lambda_1 x_i - \lambda_1 \bar{x}}{2 \sigma^2} = \frac{\lambda_1 (x_i - \bar{x})}{2 \sigma^2}. $$

Substituting these conditions into the first constraint, we get

$$ \sum_{i=1}^{n} \frac{\lambda_1 (x_i - \bar{x})}{2 \sigma^2} \, x_i = 1 \quad \Rightarrow \quad \frac{\lambda_1}{2 \sigma^2} = \frac{1}{\sum_{i=1}^{n} (x_i - \bar{x}) x_i}, $$

so that

$$ d_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x}) x_i} = \frac{x_i - \bar{x}}{S_{xx}}. $$

This proves that simple least squares is BLUE in a fairly general sense. Note that we did not assume normality in this proof. The only assumptions were that the errors have expectation zero and that they are uncorrelated with a common variance $\sigma^2$.
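To make the BLUE result concrete, the following sketch, again assuming Python with numpy and not part of the lecture, checks on the grape data that the weights $d_i = (x_i - \bar{x})/S_{xx}$ satisfy the two unbiasedness constraints, that $\sum_i d_i y_i$ reproduces $\hat{b}$, and that $\sum_i d_i^2 = 1/S_{xx}$, so the minimized variance is $\sigma^2 / S_{xx}$.

    import numpy as np

    # Grape data (cluster count x and yield y), as in the example above.
    x = np.array([116.37, 82.77, 110.68, 97.50, 115.88, 80.19,
                  125.24, 116.15, 117.36, 93.31, 107.46, 122.30])
    y = np.array([5.6, 3.2, 4.5, 4.2, 5.2, 2.7,
                  4.8, 4.9, 4.7, 4.1, 4.4, 5.4])

    x_bar = x.mean()
    S_xx = np.sum((x - x_bar) ** 2)
    d = (x - x_bar) / S_xx            # BLUE weights from the minimization

    print(np.isclose(d.sum(), 0.0))           # unbiasedness constraint: sum d_i = 0
    print(np.isclose(np.sum(d * x), 1.0))     # unbiasedness constraint: sum d_i x_i = 1
    print(np.dot(d, y))                       # reproduces b_hat, approx. 0.0513806
    print(np.sum(d ** 2), 1.0 / S_xx)         # sum d_i^2 = 1/S_xx, so V = sigma^2 / S_xx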