Lecture XXVIII

• Most of the material for this lecture is from George Casella and Roger L. Berger, Statistical Inference (Belmont, California: Duxbury Press, 1990), Chapter 12, pp. 554-577.
• The purpose of regression analysis is to explore the relationship between two variables.
• In this course, the relationship that we will be interested in can be expressed as:
$$ y_i = a + b x_i + \epsilon_i $$
where $y_i$ is a random variable and $x_i$ is a variable hypothesized to affect or drive $y_i$. The coefficients $a$ and $b$ are the intercept and slope parameters, respectively.
• These parameters are assumed to be fixed but unknown.
• The residual $\epsilon_i$ is assumed to be an unobserved, random error. Under typical assumptions, $E[\epsilon_i] = 0$.
• Thus, the expected value of $y_i$ given $x_i$ becomes:
$$ E[y_i] = a + b x_i $$
• The goal of regression analysis is to estimate $a$ and $b$ and to say something about the significance of the relationship.
• From a terminology standpoint, $y$ is typically referred to as the dependent variable and $x$ is referred to as the independent variable. Casella and Berger prefer the terminology of $y$ as the response variable and $x$ as the predictor variable.
• This relationship is a linear regression in that the relationship is linear in the parameters $a$ and $b$. Abstracting for a moment, the traditional Cobb-Douglas production function can be written as:
$$ y_i = a x_i^{b} $$
Taking the natural log of both sides yields:
$$ \ln(y_i) = \ln(a) + b \ln(x_i) $$
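To make the connection to the linear specification above explicit (a small added illustration, not from the original notes), relabel the transformed variables:
$$ y_i^{*} = \ln(y_i), \qquad a^{*} = \ln(a), \qquad x_i^{*} = \ln(x_i) \quad \Rightarrow \quad y_i^{*} = a^{*} + b x_i^{*} $$
so the log form is again linear in an intercept and slope parameter.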
• The setup for simple linear regression is that we have a sample of $n$ pairs of variables $(x_1, y_1), \dots, (x_n, y_n)$. Further, we want to summarize this relationship by fitting a line through the data.
• Based on the sample data, we first describe the data as follows:
• The sample means:
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i $$
• The sums of squares:
$$ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$
• Given this formulation, the most common estimators are then:
$$ b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b \bar{x} $$
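As a minimal computational sketch of these formulas (the function and variable names below are illustrative, not from the lecture), the estimators can be computed directly from a sample:

    # Sketch: least squares estimators b = S_xy/S_xx and a = y_bar - b*x_bar.
    def simple_ols(x, y):
        n = len(x)
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        s_xx = sum((xi - x_bar) ** 2 for xi in x)
        s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        b_hat = s_xy / s_xx              # slope estimate
        a_hat = y_bar - b_hat * x_bar    # intercept estimate
        return a_hat, b_hat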
• Following on our theme in the discussion of linear projections: “Our first derivation of estimates of a and b makes no statistical assumptions about the observations (xi, yi)…. Think of drawing through this cloud of points a straight line that comes ‘as close as possible’ to all the points.”
• This definition involves minimizing the sum of squared errors in the choice of $a$ and $b$:
$$ \min_{a,b} \; RSS = \sum_{i=1}^{n} (y_i - a - b x_i)^2 $$
• Focusing on $a$ first:
$$ \sum_{i=1}^{n} (y_i - a - b x_i)^2 = \sum_{i=1}^{n} \left[ (y_i - b x_i) - a \right]^2 $$
$$ \frac{\partial RSS}{\partial a} = -2 \sum_{i=1}^{n} \left[ (y_i - b x_i) - a \right] = 0 $$
$$ \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i = n a $$
$$ \bar{y} - \hat{b} \bar{x} = \hat{a} $$
• Taking the first-order condition with respect to $b$ (after substituting $\hat{a} = \bar{y} - b\bar{x}$ and setting the result to zero) yields:
$$ \frac{\partial RSS}{\partial b} \;\propto\; \sum_{i=1}^{n} \left[ y_i - b x_i - (\bar{y} - b \bar{x}) \right] x_i $$
$$ = \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b (x_i - \bar{x}) \right] x_i $$
$$ = \sum_{i=1}^{n} (y_i - \bar{y}) x_i - b \sum_{i=1}^{n} (x_i - \bar{x}) x_i = 0 $$
• Going from this result to the traditional estimator requires the statement that
$$ S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} \left[ (y_i - \bar{y}) x_i - (y_i - \bar{y}) \bar{x} \right] = \sum_{i=1}^{n} (y_i - \bar{y}) x_i $$
since $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$; an identical argument shows that $\sum_{i=1}^{n} (x_i - \bar{x}) x_i = S_{xx}$.
• The least squares estimator of $b$ then becomes:
$$ \hat{b} = \frac{S_{xy}}{S_{xx}} $$
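As a quick numerical cross-check of this derivation (a sketch with illustrative names and simulated data, not part of the lecture), one can minimize the RSS directly and compare the result with the closed-form estimators:

    # Sketch: minimize RSS numerically and compare with the closed-form solution.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = rng.uniform(80, 125, size=12)              # illustrative predictor values
    y = 1.0 + 0.05 * x + rng.normal(0, 0.3, 12)    # illustrative responses

    def rss(params):
        a, b = params
        return np.sum((y - a - b * x) ** 2)

    res = minimize(rss, x0=[0.0, 0.0])             # numerical minimizer of RSS

    s_xy = np.sum((x - x.mean()) * (y - y.mean()))
    s_xx = np.sum((x - x.mean()) ** 2)
    b_hat = s_xy / s_xx
    a_hat = y.mean() - b_hat * x.mean()
    print(res.x, (a_hat, b_hat))                   # the two answers should agree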
Grape Yields and Number of Clusters

Year    Yield (y)    Cluster Count (x)
1971      5.6            116.37
1973      3.2             82.77
1974      4.5            110.68
1975      4.2             97.50
1976      5.2            115.88
1977      2.7             80.19
1978      4.8            125.24
1979      4.9            116.15
1980      4.7            117.36
1981      4.1             93.31
1982      4.4            107.46
1983      5.4            122.30
• Computing the simple least squares representation:
$$ \bar{x} = 107.101, \qquad \bar{y} = 4.475 $$
$$ S_{xx} = 2521.66, \qquad S_{xy} = 129.564 $$
$$ \hat{b} = 0.0513806, \qquad \hat{a} = -1.0279 $$
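These numbers can be reproduced with the simple_ols sketch given earlier (again, the function name is my own, not from the lecture):

    # Apply the earlier sketch to the grape-yield data.
    x = [116.37, 82.77, 110.68, 97.50, 115.88, 80.19,
         125.24, 116.15, 117.36, 93.31, 107.46, 122.30]
    y = [5.6, 3.2, 4.5, 4.2, 5.2, 2.7, 4.8, 4.9, 4.7, 4.1, 4.4, 5.4]
    a_hat, b_hat = simple_ols(x, y)
    print(a_hat, b_hat)    # approximately -1.0279 and 0.0513806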
1
1

1

1
1

1

X 
1

1
1

1

1
1
116.37 
 5 .6 
 3 .2 
82.77 
 
 4 .5 
110.68

 
97.50 
 4 .2 
 5 .2 
115.88

 
80.19 
2 .7 

y 

125.24
4 .8

 
116.15
 4 .9 
 4 .7 
117.36

 
 4.1
93.31 

 
107.46
 4 .4 
5.4 
122.30
• First, we derive the projection matrix
$$ P_c = X (X'X)^{-1} X' $$
which is a 12 x 12 matrix. The projection of $y$ onto the column space of $X$ can then be calculated as:
$$ P_c y = X (X'X)^{-1} X' y $$
4.95126
3.22487 


 4.6589 


 3.9817 
 4.92608


3.09231

Pc y  
5.407 


 4.93995
5.00212 


3.76642 


 4.49345
5.25594 
• Comparing these results with the estimated values of $y$ from the model yields:
$$ \hat{a} + \hat{b}\,(116.37) = 4.95126 $$
$$ \vdots $$
$$ \hat{a} + \hat{b}\,(122.30) = 5.25594 $$
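A small numpy sketch of this projection calculation (variable names are mine) would reproduce the fitted values above:

    # Sketch: fitted values via the projection matrix P_c = X (X'X)^(-1) X'.
    import numpy as np

    x = np.array([116.37, 82.77, 110.68, 97.50, 115.88, 80.19,
                  125.24, 116.15, 117.36, 93.31, 107.46, 122.30])
    y = np.array([5.6, 3.2, 4.5, 4.2, 5.2, 2.7, 4.8, 4.9, 4.7, 4.1, 4.4, 5.4])

    X = np.column_stack([np.ones_like(x), x])     # 12 x 2 design matrix
    P_c = X @ np.linalg.inv(X.T @ X) @ X.T        # 12 x 12 projection matrix
    print(P_c @ y)                                # 4.95126, 3.22487, ..., 5.25594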
• The linear relationship between the x's and y's is
$$ E[y_i] = a + b x_i $$
and we assume that
$$ V(y_i) = \sigma^2 $$
• The implications of this variance assumption are significant. Note that we assume that each observation has the same variance regardless of the value of the independent variable. In traditional regression terms, this implies that the errors are homoscedastic.
• One way to state these assumptions is
$$ y_i = a + b x_i + \epsilon_i, \qquad E[\epsilon_i] = 0, \qquad V(\epsilon_i) = \sigma^2 $$
This specification is consistent with our assumptions, since the model is homoscedastic and linear in the parameters.
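To make the link explicit (a short added step, using only the fact that $a + b x_i$ is nonrandom):
$$ E[y_i] = a + b x_i + E[\epsilon_i] = a + b x_i, \qquad V(y_i) = V(a + b x_i + \epsilon_i) = V(\epsilon_i) = \sigma^2 $$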
• Based on this formulation, we can define a linear estimator of $a$ or $b$ as one of the form
$$ \sum_{i=1}^{n} d_i y_i $$
An unbiased estimator of $b$ can further be defined as a linear estimator whose expected value is the true value of the parameter:
$$ E\left[ \sum_{i=1}^{n} d_i y_i \right] = b $$
$$ b = E\left[ \sum_{i=1}^{n} d_i y_i \right] = \sum_{i=1}^{n} d_i E[y_i] = \sum_{i=1}^{n} d_i (a + b x_i) = a \left( \sum_{i=1}^{n} d_i \right) + b \left( \sum_{i=1}^{n} d_i x_i \right) $$
$$ \Rightarrow \quad \sum_{i=1}^{n} d_i = 0, \qquad \sum_{i=1}^{n} d_i x_i = 1 $$
• The linear estimator that satisfies these unbiasedness conditions and yields the smallest variance of the estimate is referred to as the best linear unbiased estimator (or BLUE). In this example, we need to show that
$$ d_i = \frac{x_i - \bar{x}}{S_{xx}} \quad \Rightarrow \quad \hat{b} = \sum_{i=1}^{n} \frac{(x_i - \bar{x})\, y_i}{S_{xx}} $$
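As a numerical illustration (a sketch with my own variable names, using the grape-yield data), these weights satisfy the two unbiasedness conditions and reproduce the least squares slope:

    # Sketch: check d_i = (x_i - x_bar)/S_xx against the unbiasedness conditions.
    import numpy as np

    x = np.array([116.37, 82.77, 110.68, 97.50, 115.88, 80.19,
                  125.24, 116.15, 117.36, 93.31, 107.46, 122.30])
    y = np.array([5.6, 3.2, 4.5, 4.2, 5.2, 2.7, 4.8, 4.9, 4.7, 4.1, 4.4, 5.4])

    s_xx = np.sum((x - x.mean()) ** 2)
    d = (x - x.mean()) / s_xx
    print(d.sum())         # approximately 0 (sum of d_i)
    print((d * x).sum())   # approximately 1 (sum of d_i * x_i)
    print((d * y).sum())   # approximately 0.0513806, the slope estimate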
• Given that the $y_i$'s are uncorrelated, the variance of the linear estimator can be written as:
$$ V\left( \sum_{i=1}^{n} d_i y_i \right) = \sum_{i=1}^{n} d_i^2 V(y_i) = \sigma^2 \sum_{i=1}^{n} d_i^2 $$
• The problem of minimizing the variance then becomes choosing the $d_i$'s to minimize this sum subject to the unbiasedness constraints:
$$ \min_{d_i} \; \sigma^2 \sum_{i=1}^{n} d_i^2 \quad \text{s.t.} \quad \sum_{i=1}^{n} d_i x_i = 1, \qquad \sum_{i=1}^{n} d_i = 0 $$
The Lagrangian for this problem is
$$ L = \sigma^2 \sum_{i=1}^{n} d_i^2 + \lambda_1 \left( 1 - \sum_{i=1}^{n} d_i x_i \right) + \lambda_2 \left( - \sum_{i=1}^{n} d_i \right) $$
with first-order conditions
$$ \frac{\partial L}{\partial d_i} = 2 \sigma^2 d_i - \lambda_1 x_i - \lambda_2 = 0 \quad \Rightarrow \quad d_i = \frac{\lambda_1}{2\sigma^2} x_i + \frac{\lambda_2}{2\sigma^2} $$
$$ \frac{\partial L}{\partial \lambda_1} = 1 - \sum_{i=1}^{n} d_i x_i = 0 $$
$$ \frac{\partial L}{\partial \lambda_2} = - \sum_{i=1}^{n} d_i = 0 $$
• Using the results from the first $n$ first-order conditions and the second constraint first, we have
$$ \sum_{i=1}^{n} \left[ \frac{\lambda_1}{2\sigma^2} x_i + \frac{\lambda_2}{2\sigma^2} \right] = 0 $$
$$ \frac{\lambda_1}{2\sigma^2} \sum_{i=1}^{n} x_i + \frac{n \lambda_2}{2\sigma^2} = 0 $$
$$ \lambda_2 = - \lambda_1 \frac{1}{n} \sum_{i=1}^{n} x_i = - \lambda_1 \bar{x} $$
• Substituting this result into the first $n$ first-order conditions yields:
$$ d_i = \frac{\lambda_1}{2\sigma^2} x_i + \frac{\lambda_2}{2\sigma^2} = \frac{\lambda_1}{2\sigma^2} x_i - \frac{\lambda_1}{2\sigma^2} \bar{x} = \frac{\lambda_1}{2\sigma^2} (x_i - \bar{x}) $$
• Substituting these conditions into the first constraint, we get
$$ 1 - \sum_{i=1}^{n} \frac{\lambda_1}{2\sigma^2} (x_i - \bar{x}) x_i = 0 \quad \Rightarrow \quad \frac{2\sigma^2}{\lambda_1} = \sum_{i=1}^{n} (x_i - \bar{x}) x_i $$
$$ \Rightarrow \quad d_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x}) x_i} = \frac{x_i - \bar{x}}{S_{xx}} $$
• This proves that simple least squares is BLUE in a fairly general setting. Note that we did not assume normality in this proof. The only assumptions were that the errors have mean zero, are uncorrelated across observations, and share a common variance.
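As a final illustration (my own Monte Carlo sketch, not from the lecture; the alternative estimator and all parameter values below are assumptions chosen for the demonstration), the BLUE result can be seen numerically by comparing the sampling variance of the least squares slope with that of another linear unbiased estimator, here a simple two-group slope:

    # Sketch: the least squares slope should show the smaller sampling variance.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([116.37, 82.77, 110.68, 97.50, 115.88, 80.19,
                  125.24, 116.15, 117.36, 93.31, 107.46, 122.30])
    a_true, b_true, sigma = -1.0, 0.05, 0.3          # assumed parameter values
    order = np.argsort(x)
    low, high = order[:6], order[6:]                 # split the sample by x

    ols, two_group = [], []
    for _ in range(5000):
        y = a_true + b_true * x + rng.normal(0.0, sigma, size=x.size)
        s_xy = np.sum((x - x.mean()) * (y - y.mean()))
        s_xx = np.sum((x - x.mean()) ** 2)
        ols.append(s_xy / s_xx)
        two_group.append((y[high].mean() - y[low].mean()) /
                         (x[high].mean() - x[low].mean()))

    print(np.var(ols), np.var(two_group))   # the first (OLS) variance is smaller

The two-group estimator is linear in the y's and satisfies the unbiasedness conditions, so the smaller simulated variance of the least squares slope is exactly what the BLUE argument predicts.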