Tsinghua Math Camp 2015

Project
Previously, we have learned two common estimation methods for obtaining
point estimators: (1) the maximum likelihood estimator (MLE), and (2) the
method of moments estimator (MOME). Today, we introduce the simple linear
regression model, and a third, commonly used estimation method: (3) the least
squares estimator (LSE).
Our first project is to derive the maximum likelihood estimator for the simple
linear regression model, assuming the regressor X is given (that is, not random -- this is also commonly referred to as conditioning on X). We must then check whether
the MLEs of the model parameters are the same as their LSEs.
Next, we introduce the concept of the errors-in-variables (EIV) regression model, and
its two special cases, orthogonal regression (OR) and geometric mean
regression (GMR). Our second project is to derive the maximum likelihood
estimators of the EIV model for simple linear regression, and to pin-point which
special cases correspond to the OR and the GMR. We also need to identify the two
boundary lines of the entire class of EIV regression lines.
The Least Squares Estimators
Approach: To minimize the sum of the squared vertical distances (or the
squared deviations/errors)
Example: Simple Linear Regression
The aim of simple linear regression is to find the linear relationship between two
variables. This is in turn translated into a mathematical problem of finding the
equation of the line that is closest to all points observed. Consider the scatter
plot below. The vertical distance of each point above or below the line has
been added to the diagram. These distances are called deviations or errors --
they are symbolised as $d_i = y_i - \hat{y}_i$, $i = 1, \dots, n$.
Figure 1. Illustration of the least squares regression method.
The least-squares regression line will minimize the sum of the squared
vertical distances from every point to the line, i.e. we minimise $\sum d_i^2$.
** The statistical equation of the simple linear regression line, when only the
response variable Y is random, is $Y = \beta_0 + \beta_1 x + \varepsilon$ (or in terms of each point:
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$). One notices that we used the lower case x for the regressor.
Here $\beta_0$ is called the intercept, $\beta_1$ the regression slope, $\varepsilon$ is the random error
with mean 0, x is the regressor (independent variable), and Y the response
variable (dependent variable).
** The least squares regression line is obtained by finding the values of $\beta_0$ and $\beta_1$
(denoted in the solutions as $\hat{\beta}_0$ and $\hat{\beta}_1$) that will minimize the sum of the
squared vertical distances from all points to the line:
$$\sum d_i^2 = \sum \left(y_i - \hat{y}_i\right)^2 = \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$
The solutions are found by solving the equations:
$$\frac{\partial}{\partial \beta_0} \sum d_i^2 = 0 \quad \text{and} \quad \frac{\partial}{\partial \beta_1} \sum d_i^2 = 0$$
** The equation of the fitted least squares regression line is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$ (or in
terms of each point: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$). ----- For simplicity of notation, many
books denote the fitted regression equation as $\hat{Y} = b_0 + b_1 x$ (* you can see that
for some examples, we will use this simpler notation), where
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$
Notations:
$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = \sum (x_i - \bar{x})(y_i - \bar{y}); \qquad
S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = \sum (x_i - \bar{x})^2;$$
$\bar{x}$ and $\bar{y}$ are the mean values of x and y respectively.
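To make the closed-form solution concrete, here is a minimal Python sketch (assuming NumPy is available; the variable names x, y and the helper least_squares_fit are illustrative, not part of the original handout) that computes $S_{xy}$, $S_{xx}$, $\hat{\beta}_1$ and $\hat{\beta}_0$ exactly as defined above.

```python
import numpy as np

def least_squares_fit(x, y):
    """Fit y = b0 + b1*x by ordinary least squares using Sxy / Sxx."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    Sxy = np.sum((x - x_bar) * (y - y_bar))   # sum of cross deviations
    Sxx = np.sum((x - x_bar) ** 2)            # sum of squared deviations of x
    b1 = Sxy / Sxx                            # slope estimate (beta1 hat)
    b0 = y_bar - b1 * x_bar                   # intercept estimate (beta0 hat)
    return b0, b1

# Illustrative usage with made-up data points
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = least_squares_fit(x, y)
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
```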
Note: Please notice that in finding the least squares regression line, we do
not need to assume any distribution for the random errors $\varepsilon_i$. However, for
statistical inference on the model parameters ($\beta_0$ and $\beta_1$), it is often
assumed that the errors have the following three properties:
(1) Normally distributed errors
(2) Homoscedasticity (constant error variance $\operatorname{var}(\varepsilon_i) = \sigma^2$ for Y at all levels of X)
(3) Independent errors (usually checked when data are collected over time or space)
***The above three properties can be summarized as: $\varepsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2)$, $i = 1, \dots, n$.
Project 1.
Assuming that the error terms are distributed as $\varepsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2)$, $i = 1, \dots, n$,
please derive the maximum likelihood estimator for the simple linear regression
model, assuming the regressor X is given (that is, not random -- this is also
commonly referred to as conditioning on X = x). One must check whether the
MLEs of the model parameters ($\beta_0$ and $\beta_1$) are the same as their LSEs.
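As a numerical sanity check on Project 1 (a sketch, not a substitute for the derivation; it assumes NumPy and SciPy are available, and the simulated data and function names are illustrative), the code below maximizes the normal log-likelihood in $(\beta_0, \beta_1, \sigma^2)$ and compares the result with the closed-form LSE.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.5 + 2.0 * x + rng.normal(0, 1.0, size=x.size)   # simulated responses

# Closed-form least squares estimates
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1_lse = Sxy / Sxx
b0_lse = y.mean() - b1_lse * x.mean()

# Negative log-likelihood under eps_i ~ iid N(0, sigma^2), conditioning on x
def neg_log_lik(theta):
    b0, b1, log_sigma2 = theta                 # sigma^2 parametrized on the log scale
    sigma2 = np.exp(log_sigma2)
    resid = y - b0 - b1 * x
    n = y.size
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / (2 * sigma2)

mle = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0]).x
print("LSE :", b0_lse, b1_lse)
print("MLE :", mle[0], mle[1])   # should agree with the LSE up to optimizer tolerance
```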
Finance Application: Market Model
• One of the most important applications of linear regression is the market model.
• It is assumed that the rate of return on a stock (R) is linearly related to the
rate of return on the overall market:
$$R = \beta_0 + \beta_1 R_m + \varepsilon$$
R: rate of return on a particular stock
$R_m$: rate of return on some major stock index
$\beta_1$: the beta coefficient, which measures how sensitive the stock's rate of return
is to changes in the level of the overall market.
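A minimal sketch of how the market model's beta could be estimated by least squares from two return series (assuming NumPy; the return data and the function name estimate_beta are illustrative, not from the handout):

```python
import numpy as np

def estimate_beta(stock_returns, market_returns):
    """Estimate beta0 and beta1 in R = beta0 + beta1 * Rm + eps by least squares."""
    r = np.asarray(stock_returns, dtype=float)
    rm = np.asarray(market_returns, dtype=float)
    beta1 = np.sum((rm - rm.mean()) * (r - r.mean())) / np.sum((rm - rm.mean()) ** 2)
    beta0 = r.mean() - beta1 * rm.mean()
    return beta0, beta1

# Illustrative (made-up) monthly returns
rm = [0.01, -0.02, 0.03, 0.015, -0.01]
r  = [0.015, -0.03, 0.04, 0.02, -0.012]
beta0, beta1 = estimate_beta(r, rm)
print(f"beta (market sensitivity) = {beta1:.2f}")
```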
Errors-in-Variables (EIV) Regression Model
Please notice that least squares regression is only suitable when the random
errors exist in the dependent variable Y only, or alternatively, when we estimate
the model conditioning on X being given. If the regressor X is also random, the model is
referred to as the errors-in-variables (EIV) regression. In the figures below,
we illustrate two commonly used EIV regression lines: the orthogonal regression
line (*obtained by minimizing the sum of the squared orthogonal distances) and
the geometric mean regression line (*obtained by minimizing the sum of the
right triangular areas).
Figure 2. Illustration of the orthogonal regression (top),
and the geometric mean regression (bottom).
As the renowned physicist E. T. Jaynes pointed out in his monograph, this is "the most
common problem of inference faced by experimental scientists: linear regression
with both variables subject to unknown error" (Jaynes 2004, "The Logic of
Science", Cambridge University Press, page 497). On the one hand, we recognize the
importance of the EIV models; on the other, we realize the danger of simply
applying the naive least squares method to such a regression problem.
The general parametric EIV structural model for a simple linear regression model
is as follows:
$$X = \xi + \delta, \qquad Y = \eta + \varepsilon, \qquad \eta = \beta_0 + \beta_1 \xi,$$
$$\delta \sim N(0, \sigma_\delta^2), \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2).$$
Here $\delta$ and $\varepsilon$ are independent random errors. Furthermore, $\xi$ is a random variable
following a normal distribution with mean $\mu$ and variance $\sigma^2$, independent of
both random errors. This implies that X and Y follow a bivariate normal distribution:
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu \\ \beta_0 + \beta_1 \mu \end{pmatrix},\;
\begin{pmatrix} \sigma^2 + \sigma_\delta^2 & \beta_1 \sigma^2 \\ \beta_1 \sigma^2 & \beta_1^2 \sigma^2 + \sigma_\varepsilon^2 \end{pmatrix} \right).$$
Given a random sample of observed X's and Y's, the maximum likelihood estimator
(MLE) of the regression slope is given by
$$\hat{\beta}_1 = \frac{S_{YY} - \lambda S_{XX} + \sqrt{\left(S_{YY} - \lambda S_{XX}\right)^2 + 4\lambda S_{XY}^2}}{2 S_{XY}}.$$
Its value depends on the ratio of the two error variances $\lambda = \sigma_\varepsilon^2 / \sigma_\delta^2$, which is
generally unknown and cannot be estimated from the data alone.
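A minimal sketch (assuming NumPy; the function name eiv_slope is illustrative) of how this slope formula could be evaluated for a chosen error-variance ratio $\lambda$:

```python
import numpy as np

def eiv_slope(x, y, lam):
    """MLE of the EIV regression slope for a given ratio lam = sigma_eps^2 / sigma_delta^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Sxx = np.sum((x - x.mean()) ** 2)
    Syy = np.sum((y - y.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    return (Syy - lam * Sxx + np.sqrt((Syy - lam * Sxx) ** 2 + 4 * lam * Sxy ** 2)) / (2 * Sxy)

# Illustrative data; try several values of lambda to see how the fitted slope changes
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
for lam in (0.5, 1.0, 2.0):
    print(lam, eiv_slope(x, y, lam))
```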
Project 2A.
Our second project is to derive the maximum likelihood estimators of the EIV
model for simple linear regression, and to pin-point which special cases
correspond to the OR and the GMR. We also need to identify the two boundary
lines of the entire class of EIV regression lines.
Hint: We start by constructing the likelihood function of the entire data set; in this
case, that means all n data points $(X_i, Y_i)$, $i = 1, \dots, n$. Please also note that
these n data points are independent of each other.
We provide an example below to demonstrate how to use the entire data set to
obtain the likelihood, although in the example we have two independent normal
random samples rather than observations from a jointly bivariate normal distribution.
Example. Suppose we have two independent random samples from two normal
populations, that is, $X_1, X_2, \dots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \dots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$.
Furthermore, we know that $3\mu_1 = \mu_2 = \mu$ and $\sigma_1^2 = 2\sigma_2^2 = \sigma^2$.
(1) Please derive the maximum likelihood estimators (MLEs) for $\mu$ and $\sigma^2$.
(2) Are the above MLEs for $\mu$ and $\sigma^2$ unbiased estimators for $\mu$ and $\sigma^2$,
respectively? Please show detailed derivations.
Solution:
(1) The likelihood function is
$$L = \prod_{i=1}^{n_1} f(x_i; \theta) \prod_{i=1}^{n_2} f(y_i; \theta)
= \prod_{i=1}^{n_1} \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\!\left[-\frac{(x_i - \mu_1)^2}{2\sigma_1^2}\right]
\prod_{i=1}^{n_2} \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\!\left[-\frac{(y_i - \mu_2)^2}{2\sigma_2^2}\right]$$
$$= \prod_{i=1}^{n_1} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x_i - \mu/3)^2}{2\sigma^2}\right]
\prod_{i=1}^{n_2} \frac{1}{\sqrt{\pi}\,\sigma} \exp\!\left[-\frac{(y_i - \mu)^2}{\sigma^2}\right]$$
$$= 2^{-n_1/2}\, \pi^{-(n_1+n_2)/2}\, \left(\sigma^2\right)^{-(n_1+n_2)/2}
\exp\!\left[-\frac{\sum_{i=1}^{n_1}(x_i - \mu/3)^2}{2\sigma^2} - \frac{\sum_{i=1}^{n_2}(y_i - \mu)^2}{\sigma^2}\right]$$
The log-likelihood function is
$$\ell = \ln L = \text{constant} - \frac{n_1 + n_2}{2}\ln(\sigma^2)
- \frac{\sum_{i=1}^{n_1}(x_i - \mu/3)^2}{2\sigma^2} - \frac{\sum_{i=1}^{n_2}(y_i - \mu)^2}{\sigma^2}$$
Solving
$$\frac{\partial \ell}{\partial \mu} = \frac{\sum_{i=1}^{n_1}(x_i - \mu/3)}{3\sigma^2} + \frac{2\sum_{i=1}^{n_2}(y_i - \mu)}{\sigma^2} = 0$$
and
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n_1 + n_2}{2\sigma^2}
+ \frac{\sum_{i=1}^{n_1}(x_i - \mu/3)^2}{2\sigma^4} + \frac{\sum_{i=1}^{n_2}(y_i - \mu)^2}{\sigma^4} = 0,$$
we obtain the MLEs for $\mu$ and $\sigma^2$:
$$\hat{\mu} = \frac{3 n_1 \bar{X} + 18 n_2 \bar{Y}}{n_1 + 18 n_2}, \qquad
\hat{\sigma}^2 = \frac{\sum_{i=1}^{n_1}\left(x_i - \hat{\mu}/3\right)^2 + 2\sum_{i=1}^{n_2}\left(y_i - \hat{\mu}\right)^2}{n_1 + n_2}.$$
(2) Since $E[\bar{X}] = \mu/3$ and $E[\bar{Y}] = \mu$, it is straightforward to verify that the MLE $\hat{\mu}$
is an unbiased estimator for $\mu$.
Since $3\mu_1 = \mu_2 = \mu$, we have $\hat{\mu}_1 = \hat{\mu}/3$ and $\hat{\mu}_2 = \hat{\mu}$. The MLE $\hat{\sigma}^2$ can be rewritten as
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n_1}\left(x_i - \mu_1 + \mu_1 - \hat{\mu}_1\right)^2 + 2\sum_{i=1}^{n_2}\left(y_i - \mu_2 + \mu_2 - \hat{\mu}_2\right)^2}{n_1 + n_2}.$$
Therefore, we calculate the expectation of the MLE as follows (from Shuanglong):
$$E[\hat{\sigma}^2] = \frac{\sum_{i=1}^{n_1} E(x_i - \mu_1)^2 - 2\sum_{i=1}^{n_1} E[(x_i - \mu_1)(\hat{\mu}_1 - \mu_1)] + n_1 E(\hat{\mu}_1 - \mu_1)^2}{n_1 + n_2}$$
$$\qquad + \frac{2\sum_{i=1}^{n_2} E(y_i - \mu_2)^2 - 4\sum_{i=1}^{n_2} E[(y_i - \mu_2)(\hat{\mu}_2 - \mu_2)] + 2 n_2 E(\hat{\mu}_2 - \mu_2)^2}{n_1 + n_2}$$
Note that the second and fourth terms above can be combined:
$$-\frac{\frac{2}{9}\sum_{i=1}^{n_1} E[(3x_i - \mu)(\hat{\mu} - \mu)] + 4\sum_{i=1}^{n_2} E[(y_i - \mu)(\hat{\mu} - \mu)]}{n_1 + n_2}
= -\frac{E\!\left[(\hat{\mu} - \mu)\left(4\sum_{i=1}^{n_2}(y_i - \mu) + \frac{2}{9}\sum_{i=1}^{n_1}(3x_i - \mu)\right)\right]}{n_1 + n_2}$$
$$= -\frac{2}{9}\cdot\frac{E\!\big[(\hat{\mu} - \mu)\big((3 n_1 \bar{X} + 18 n_2 \bar{Y}) - (n_1 + 18 n_2)\mu\big)\big]}{n_1 + n_2}
= \frac{-2\left(\frac{n_1}{9} + 2 n_2\right) E(\hat{\mu} - \mu)^2}{n_1 + n_2}.$$
$$\Rightarrow \; E[\hat{\sigma}^2] = \frac{\sum_{i=1}^{n_1} E(x_i - \mu_1)^2 + 2\sum_{i=1}^{n_2} E(y_i - \mu_2)^2 + n_1 E(\hat{\mu}_1 - \mu_1)^2 + 2 n_2 E(\hat{\mu}_2 - \mu_2)^2}{n_1 + n_2}
- \frac{2\left(\frac{n_1}{9} + 2 n_2\right) E(\hat{\mu} - \mu)^2}{n_1 + n_2}$$
Using $E(x_i - \mu_1)^2 = \sigma^2$, $E(y_i - \mu_2)^2 = \sigma^2/2$, $E(\hat{\mu}_1 - \mu_1)^2 = \operatorname{Var}(\hat{\mu})/9$ and $E(\hat{\mu}_2 - \mu_2)^2 = \operatorname{Var}(\hat{\mu})$, this becomes
$$= \frac{(n_1 + n_2)\sigma^2 - \left(\frac{n_1}{9} + 2 n_2\right)\operatorname{Var}(\hat{\mu})}{n_1 + n_2}
= \sigma^2 - \frac{\left(\frac{n_1}{9} + 2 n_2\right)\operatorname{Var}(\hat{\mu})}{n_1 + n_2}.$$
And
$$\operatorname{Var}(\hat{\mu}) = \operatorname{Var}\!\left(\frac{3 n_1 \bar{X} + 18 n_2 \bar{Y}}{n_1 + 18 n_2}\right)
= \frac{9 n_1^2 \operatorname{Var}(\bar{X}) + 324\, n_2^2 \operatorname{Var}(\bar{Y})}{(n_1 + 18 n_2)^2}
= \frac{9 n_1 \operatorname{Var}(X) + 324\, n_2 \operatorname{Var}(Y)}{(n_1 + 18 n_2)^2}
= \frac{9\sigma^2}{n_1 + 18 n_2},$$
since $\operatorname{Var}(X) = \sigma^2$ and $\operatorname{Var}(Y) = \sigma^2/2$.
Therefore,
$$E[\hat{\sigma}^2] = \frac{n_1 + n_2 - 1}{n_1 + n_2}\,\sigma^2,$$
so we know the MLE $\hat{\sigma}^2$ is not an unbiased estimator of $\sigma^2$.
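A quick Monte Carlo check of this result (a sketch, assuming NumPy; the sample sizes and parameter values below are arbitrary choices for illustration): repeatedly draw the two samples, compute $\hat{\mu}$ and $\hat{\sigma}^2$ as above, and compare the average of $\hat{\sigma}^2$ with $(n_1 + n_2 - 1)\sigma^2/(n_1 + n_2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, mu, sigma2 = 20, 30, 6.0, 4.0       # so mu1 = 2, mu2 = 6, var1 = 4, var2 = 2
mu1, mu2 = mu / 3, mu
var1, var2 = sigma2, sigma2 / 2

est = []
for _ in range(20000):
    x = rng.normal(mu1, np.sqrt(var1), n1)
    y = rng.normal(mu2, np.sqrt(var2), n2)
    mu_hat = (3 * n1 * x.mean() + 18 * n2 * y.mean()) / (n1 + 18 * n2)
    s2_hat = (np.sum((x - mu_hat / 3) ** 2) + 2 * np.sum((y - mu_hat) ** 2)) / (n1 + n2)
    est.append(s2_hat)

print("simulated  E[sigma2_hat]:", np.mean(est))
print("theoretical value       :", (n1 + n2 - 1) / (n1 + n2) * sigma2)
```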
Project 2B.
For those of us who have trouble with the original second project (*now called
2A: to derive the maximum likelihood estimators of the EIV model for simple
linear regression), you may derive the method of moments estimators (MOMEs)
instead. Note: your answer should be the same as the MLEs for the regression
slope and intercept. You can then pin-point which special cases correspond to the
OR and the GMR. We also need to identify the two boundary lines of the entire
class of EIV regression lines.
Hint: Given a fixed error variance ratio $\lambda$, you will be able to obtain the MOMEs
using the following equations based on the first and second moments:
$$E[X] = \bar{X}, \qquad E[Y] = \bar{Y},$$
$$\operatorname{Var}[X] = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}, \qquad
\operatorname{Var}[Y] = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n}, \qquad
\operatorname{Cov}[X, Y] = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n}.$$
Please note that the MOME does not depend on the bivariate normal distribution
assumptions, and it is easier to derive. However, unlike the MLE approach, it does
not directly give us other inference, such as confidence intervals or tests. Nor does it
have a straightforward geometric interpretation.
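A minimal sketch (assuming NumPy; the function name sample_moments is illustrative) of computing the sample moments listed in the hint above:

```python
import numpy as np

def sample_moments(x, y):
    """Return the sample versions of E[X], E[Y], Var[X], Var[Y], Cov[X, Y] (divisor n)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    var_x = np.sum((x - x_bar) ** 2) / n
    var_y = np.sum((y - y_bar) ** 2) / n
    cov_xy = np.sum((x - x_bar) * (y - y_bar)) / n
    return x_bar, y_bar, var_x, var_y, cov_xy
```

For a fixed $\lambda$, these five sample moments would then be equated to the corresponding model moments of the bivariate normal distribution given earlier to solve for the parameters.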
Project 3.
Our third project is to derive a class of non-parametric estimators of the EIV
model for simple linear regression based on minimizing the sum of the following
distance from each point to the line, as illustrated in the figure below:
$$\text{Distance} = c\, d_V^2 + (1 - c)\, d_H^2, \qquad 0 \le c \le 1,$$
where $d_V$ and $d_H$ denote the vertical and horizontal distances from the point to the line.
Please also show whether there is a 1-1 relationship between this class of
estimators and those in Project 2 (A/B). That is, try to ascertain whether there is
a 1-1 relationship between c and $\lambda$.
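A minimal sketch (assuming NumPy; the names are illustrative, and reading $d_V$ and $d_H$ as the vertical and horizontal distances is an assumption based on the project description) of the objective Project 3 asks you to minimize, evaluated for a candidate line $y = b_0 + b_1 x$:

```python
import numpy as np

def weighted_distance_objective(b0, b1, x, y, c):
    """Sum of c*d_V^2 + (1-c)*d_H^2 over all points, for the line y = b0 + b1*x.

    d_V is the vertical deviation, d_H the horizontal deviation (assumes b1 != 0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_v = y - (b0 + b1 * x)        # vertical deviation
    d_h = (y - b0) / b1 - x        # horizontal deviation
    return np.sum(c * d_v ** 2 + (1 - c) * d_h ** 2)
```

For a given c, this objective could then be minimized numerically over $(b_0, b_1)$, for example with a general-purpose optimizer such as scipy.optimize.minimize.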
Project 4.
For those who have finished Projects 2 & 3 above, you may also examine whether
there is a 1-1 relationship between the following class of regression lines based
on minimizing the sum of squared slant distances from the point to the line, to
those in Project 2 (A/B) or 3.
Project 5.
For those who have finished Projects 2 & 3 & 4 above, you may also examine how
to extend Projects 2/3/4 to the case where we have two random regressors X1
and X2 .
Project 6.
For those who have finished Projects 2 & 3 & 4 above, you may also examine how
to estimate the error variance ratio λ, when we have two repeated measures on
each sample.
Dear all,
I hope you will enjoy doing these projects. Please leave enough time for
yourselves to make a good PowerPoint presentation, and to practice your
presentation ahead of time.
Good luck!
Prof. Zhu