Project

Previously, we have learned two common estimation methods for obtaining point estimators: (1) the maximum likelihood estimator (MLE), and (2) the method of moments estimator (MOME). Today, we introduce the simple linear regression model, together with a third commonly used estimation method, (3) the least squares estimator (LSE).

Our first project is to derive the maximum likelihood estimators for the simple linear regression model, assuming the regressor X is given (that is, not random; this is also commonly referred to as conditioning on X). One must check whether the MLEs of the model parameters are the same as their LSEs.

Next, we introduce the errors-in-variables (EIV) regression model and its two special cases, orthogonal regression (OR) and geometric mean regression (GMR). Our second project is to derive the maximum likelihood estimators of the EIV model for simple linear regression, and to pinpoint which special cases correspond to the OR and the GMR. We also need to identify the two boundary lines of the entire class of EIV regression lines.

The Least Squares Estimators

Approach: to minimize the sum of the squared vertical distances (the squared deviations/errors).

Example: Simple Linear Regression

The aim of simple linear regression is to find the linear relationship between two variables. This is in turn translated into the mathematical problem of finding the equation of the line that is closest to all points observed. Consider the scatter plot below. The vertical distance each point is above or below the line has been added to the diagram. These distances are called deviations or errors; they are symbolised as $d_i = y_i - \hat{y}_i$, $i = 1, \ldots, n$.

Figure 1. Illustration of the least squares regression method.

The least squares regression line will minimize the sum of the squared vertical distances from every point to the line, i.e. we minimise $\sum d_i^2$.

** The statistical equation of the simple linear regression line, when only the response variable Y is random, is
$$Y = \beta_0 + \beta_1 x + \varepsilon \quad \text{(or in terms of each point: } Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\text{)}.$$
One notices that we used the lower case x for the regressor. Here $\beta_0$ is called the intercept, $\beta_1$ the regression slope, $\varepsilon$ is the random error with mean 0, x is the regressor (independent variable), and Y is the response variable (dependent variable).

** The least squares regression line is obtained by finding the values of $\beta_0$ and $\beta_1$ (denoted in the solutions as $\hat{\beta}_0$ and $\hat{\beta}_1$) that minimize the sum of the squared vertical distances from all points to the line:
$$\sum d_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - \beta_0 - \beta_1 x_i)^2.$$
The solutions are found by solving the equations
$$\frac{\partial \sum d_i^2}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial \sum d_i^2}{\partial \beta_1} = 0.$$

** The equation of the fitted least squares regression line is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$ (or in terms of each point: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$).

-----

For simplicity of notation, many books denote the fitted regression equation as $\hat{Y} = b_0 + b_1 x$ (you can see that for some examples, we will use this simpler notation), where
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Notations:
$$S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = \sum (x_i - \bar{x})(y_i - \bar{y}); \qquad S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum (x_i - \bar{x})^2;$$
$\bar{x}$ and $\bar{y}$ are the mean values of x and y respectively.

Note: Please notice that in finding the least squares regression line, we do not need to assume any distribution for the random errors $\varepsilon_i$.
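As a quick numerical illustration of the formulas above, here is a minimal Python sketch (the data values are made up) that computes $S_{xy}$, $S_{xx}$, and the fitted least squares line:

```python
import numpy as np

# Made-up illustrative data: x is treated as fixed, y as the observed response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()
S_xy = np.sum((x - x_bar) * (y - y_bar))  # S_xy = sum (x_i - x_bar)(y_i - y_bar)
S_xx = np.sum((x - x_bar) ** 2)           # S_xx = sum (x_i - x_bar)^2

b1 = S_xy / S_xx          # least squares slope
b0 = y_bar - b1 * x_bar   # least squares intercept

y_hat = b0 + b1 * x                 # fitted values
sse = np.sum((y - y_hat) ** 2)      # the minimized sum of squared vertical distances
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, sum of squared deviations = {sse:.4f}")
```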
However, for statistical inference on the model parameters ($\beta_0$ and $\beta_1$), it is often assumed that the errors have the following three properties:
(1) Normally distributed errors;
(2) Homoscedasticity (constant error variance, $\operatorname{Var}(\varepsilon_i) = \sigma^2$, for Y at all levels of X);
(3) Independent errors (usually checked when the data are collected over time or space).
*** The above three properties can be summarized as: $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, \ldots, n$.

Project 1. Assuming that the error terms are distributed as $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, \ldots, n$, please derive the maximum likelihood estimators for the simple linear regression model, assuming the regressor X is given (that is, not random; this is also commonly referred to as conditioning on X = x). One must check whether the MLEs of the model parameters ($\beta_0$ and $\beta_1$) are the same as their LSEs.

Finance Application: Market Model

One of the most important applications of linear regression is the market model. It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market:
$$R = \beta_0 + \beta_1 R_m + \varepsilon$$
R: rate of return on a particular stock;
R_m: rate of return on some major stock index;
$\beta_1$: the beta coefficient, which measures how sensitive the stock's rate of return is to changes in the level of the overall market.

Errors-in-Variables (EIV) Regression Model

Please notice that least squares regression is only suitable when the random errors exist in the dependent variable Y only, or alternatively, when we estimate the model conditioning on X being given. If the regressor X is also random, the model is referred to as the errors-in-variables (EIV) regression. In the figures below, we illustrate two commonly used EIV regression lines: the orthogonal regression line (obtained by minimizing the sum of the squared orthogonal distances) and the geometric mean regression line (obtained by minimizing the sum of the areas of the right triangles formed by the vertical and horizontal deviations from each point to the line).

Figure 2. Illustration of the orthogonal regression (top), and the geometric mean regression (bottom).

As the renowned physicist E. T. Jaynes pointed out in his monograph, this is "the most common problem of inference faced by experimental scientists: linear regression with both variables subject to unknown error" (Jaynes 2004, "The Logic of Science", Cambridge University Press, page 497). On the one hand, we appreciate the importance of the EIV models; on the other, we realize the danger of simply applying the naïve least squares method to such a regression problem.

The general parametric EIV structural model for simple linear regression is as follows:
$$X = \xi + \delta, \qquad Y = \beta_0 + \beta_1 \xi + \varepsilon, \qquad \delta \sim N(0, \sigma_\delta^2), \quad \varepsilon \sim N(0, \sigma_\varepsilon^2).$$
Here $\delta$ and $\varepsilon$ are independent random errors. Furthermore, $\xi$ is a random variable following a normal distribution with mean $\mu$ and variance $\sigma^2$, independent of both random errors. This implies that X and Y follow a bivariate normal distribution:
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu \\ \beta_0 + \beta_1 \mu \end{pmatrix}, \; \begin{pmatrix} \sigma^2 + \sigma_\delta^2 & \beta_1 \sigma^2 \\ \beta_1 \sigma^2 & \beta_1^2 \sigma^2 + \sigma_\varepsilon^2 \end{pmatrix} \right).$$
Given a random sample of observed X's and Y's, the maximum likelihood estimator (MLE) of the regression slope is given by
$$\hat{\beta}_1 = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2 S_{XY}}.$$
Its value depends on the ratio of the two error variances, $\lambda = \sigma_\varepsilon^2 / \sigma_\delta^2$, which is generally unknown and cannot be estimated from the data alone.

Project 2A. Our second project is to derive the maximum likelihood estimators of the EIV model for simple linear regression, and to pinpoint which special cases correspond to the OR and the GMR. We also need to identify the two boundary lines of the entire class of EIV regression lines.
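To make the role of the error variance ratio concrete, here is a minimal Python sketch (with made-up data; the function name eiv_slope_intercept is just an illustrative label, and the intercept is taken as $\bar{Y} - \hat{\beta}_1 \bar{X}$, mirroring the least squares case) that evaluates the slope formula above for a few chosen values of $\lambda$. Which particular values of $\lambda$ recover the OR and the GMR lines is exactly what Project 2A asks you to pin down, so the sketch leaves that open.

```python
import numpy as np

def eiv_slope_intercept(x, y, lam):
    """Evaluate the EIV slope formula above for a given error-variance ratio
    lam = sigma_eps^2 / sigma_delta^2, returning (intercept, slope)."""
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    s_yy = np.sum((y - y_bar) ** 2)
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    b1 = (s_yy - lam * s_xx
          + np.sqrt((s_yy - lam * s_xx) ** 2 + 4.0 * lam * s_xy ** 2)) / (2.0 * s_xy)
    b0 = y_bar - b1 * x_bar  # line through the sample means, as with the LSE
    return b0, b1

# Made-up data generated from the structural model X = xi + delta, Y = b0 + b1*xi + eps
rng = np.random.default_rng(0)
xi = rng.normal(5.0, 2.0, size=200)                  # latent regressor xi ~ N(mu, sigma^2)
x = xi + rng.normal(0.0, 0.5, size=200)              # X observed with error delta
y = 1.0 + 2.0 * xi + rng.normal(0.0, 1.0, size=200)  # Y observed with error eps

for lam in (0.5, 1.0, 4.0):
    b0, b1 = eiv_slope_intercept(x, y, lam)
    print(f"lambda = {lam}: slope = {b1:.3f}, intercept = {b0:.3f}")
```

One could also compare these fits with the ordinary least squares slope $S_{XY}/S_{XX}$ to see how the fitted line changes as $\lambda$ varies.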
Hint (Project 2A): We start by constructing the likelihood function of the entire data; in this case, that means the entire set of n data points $(X_i, Y_i)$, $i = 1, \ldots, n$. Please also note that these n data points are independent of each other. We provide an example below to demonstrate how to use the entire data set to obtain the likelihood, although in the example we have two independent normal random variables rather than two jointly bivariate normal random variables.

Example. Suppose we have two independent random samples from two normal populations, that is, $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$. Furthermore, we know that $3\mu_1 = \mu_2 = \mu$ and $\sigma_1^2 = 2\sigma_2^2 = \sigma^2$.
(1) Please derive the maximum likelihood estimators (MLEs) for $\mu$ and $\sigma^2$.
(2) Are the above MLEs for $\mu$ and $\sigma^2$ unbiased estimators for $\mu$ and $\sigma^2$, respectively? Please show detailed derivations.

Solution:
(1) The likelihood function is
$$L = \prod_{i=1}^{n_1} f(x_i; \theta) \prod_{i=1}^{n_2} f(y_i; \theta)
= \prod_{i=1}^{n_1} \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\!\left[-\frac{(x_i - \mu_1)^2}{2\sigma_1^2}\right] \prod_{i=1}^{n_2} \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\!\left[-\frac{(y_i - \mu_2)^2}{2\sigma_2^2}\right]$$
$$= \prod_{i=1}^{n_1} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x_i - \mu/3)^2}{2\sigma^2}\right] \prod_{i=1}^{n_2} \frac{1}{\sqrt{\pi}\,\sigma} \exp\!\left[-\frac{(y_i - \mu)^2}{\sigma^2}\right]$$
$$= 2^{-n_1/2}\, \pi^{-(n_1+n_2)/2}\, (\sigma^2)^{-(n_1+n_2)/2} \exp\!\left[-\frac{\sum_{i=1}^{n_1}(x_i - \mu/3)^2}{2\sigma^2} - \frac{\sum_{i=1}^{n_2}(y_i - \mu)^2}{\sigma^2}\right].$$
The log-likelihood function is
$$l = \ln L = \text{constant} - \frac{n_1 + n_2}{2}\ln(\sigma^2) - \frac{\sum_{i=1}^{n_1}(x_i - \mu/3)^2}{2\sigma^2} - \frac{\sum_{i=1}^{n_2}(y_i - \mu)^2}{\sigma^2}.$$
Solving
$$\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^{n_1}(x_i - \mu/3)}{3\sigma^2} + \frac{2\sum_{i=1}^{n_2}(y_i - \mu)}{\sigma^2} = 0$$
and
$$\frac{\partial l}{\partial \sigma^2} = -\frac{n_1 + n_2}{2\sigma^2} + \frac{\sum_{i=1}^{n_1}(x_i - \mu/3)^2}{2\sigma^4} + \frac{\sum_{i=1}^{n_2}(y_i - \mu)^2}{\sigma^4} = 0,$$
we obtain the MLEs for $\mu$ and $\sigma^2$:
$$\hat{\mu} = \frac{3 n_1 \bar{X} + 18 n_2 \bar{Y}}{n_1 + 18 n_2}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n_1}(x_i - \hat{\mu}/3)^2 + 2\sum_{i=1}^{n_2}(y_i - \hat{\mu})^2}{n_1 + n_2}.$$

(2) Since $E[\bar{X}] = \mu_1 = \mu/3$ and $E[\bar{Y}] = \mu_2 = \mu$, it is straightforward to verify that the MLE $\hat{\mu}$ is an unbiased estimator of $\mu$:
$$E[\hat{\mu}] = \frac{3 n_1 (\mu/3) + 18 n_2 \mu}{n_1 + 18 n_2} = \mu.$$
Since $3\mu_1 = \mu_2 = \mu$, we have $\hat{\mu}_1 = \hat{\mu}/3$ and $\hat{\mu}_2 = \hat{\mu}$. The MLE of $\sigma^2$ can be rewritten as
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n_1}(x_i - \mu_1 + \mu_1 - \hat{\mu}_1)^2 + 2\sum_{i=1}^{n_2}(y_i - \mu_2 + \mu_2 - \hat{\mu}_2)^2}{n_1 + n_2}.$$
Therefore, we calculate the expectation of the MLE as follows (from Shuanglong):
$$E[\hat{\sigma}^2] = \frac{\sum_{i=1}^{n_1} E(x_i - \mu_1)^2 - 2\sum_{i=1}^{n_1} E[(x_i - \mu_1)(\hat{\mu}_1 - \mu_1)] + n_1 E(\hat{\mu}_1 - \mu_1)^2}{n_1 + n_2} + \frac{2\sum_{i=1}^{n_2} E(y_i - \mu_2)^2 - 4\sum_{i=1}^{n_2} E[(y_i - \mu_2)(\hat{\mu}_2 - \mu_2)] + 2 n_2 E(\hat{\mu}_2 - \mu_2)^2}{n_1 + n_2}.$$
Note that the second and fourth terms above can be combined:
$$-\frac{\frac{2}{9}\sum_{i=1}^{n_1} E[(3x_i - \mu)(\hat{\mu} - \mu)] + 4\sum_{i=1}^{n_2} E[(y_i - \mu)(\hat{\mu} - \mu)]}{n_1 + n_2}
= -\frac{1}{n_1 + n_2} E\!\left[(\hat{\mu} - \mu)\left(4\sum_{i=1}^{n_2}(y_i - \mu) + \frac{2}{9}\sum_{i=1}^{n_1}(3x_i - \mu)\right)\right]$$
$$= -\frac{2}{9} \cdot \frac{E\!\left[(\hat{\mu} - \mu)\big((3 n_1 \bar{X} + 18 n_2 \bar{Y}) - (n_1 + 18 n_2)\mu\big)\right]}{n_1 + n_2}
= \frac{-2\left(\frac{n_1}{9} + 2 n_2\right) E(\hat{\mu} - \mu)^2}{n_1 + n_2}.$$
$$\Longrightarrow E[\hat{\sigma}^2] = \frac{\sum_{i=1}^{n_1} E(x_i - \mu_1)^2 + 2\sum_{i=1}^{n_2} E(y_i - \mu_2)^2 + n_1 E(\hat{\mu}_1 - \mu_1)^2 + 2 n_2 E(\hat{\mu}_2 - \mu_2)^2}{n_1 + n_2} - \frac{2\left(\frac{n_1}{9} + 2 n_2\right) E(\hat{\mu} - \mu)^2}{n_1 + n_2}$$
$$= \frac{(n_1 + n_2)\sigma^2 - \left(\frac{n_1}{9} + 2 n_2\right)\operatorname{Var}(\hat{\mu})}{n_1 + n_2} = \sigma^2 - \frac{\left(\frac{n_1}{9} + 2 n_2\right)\operatorname{Var}(\hat{\mu})}{n_1 + n_2},$$
using $E(x_i - \mu_1)^2 = \sigma^2$, $E(y_i - \mu_2)^2 = \sigma^2/2$, $E(\hat{\mu}_1 - \mu_1)^2 = \operatorname{Var}(\hat{\mu})/9$, and $E(\hat{\mu}_2 - \mu_2)^2 = \operatorname{Var}(\hat{\mu})$. And
$$\operatorname{Var}(\hat{\mu}) = \operatorname{Var}\!\left(\frac{3 n_1 \bar{X} + 18 n_2 \bar{Y}}{n_1 + 18 n_2}\right) = \frac{9 n_1^2 \operatorname{Var}(\bar{X}) + 324\, n_2^2 \operatorname{Var}(\bar{Y})}{(n_1 + 18 n_2)^2} = \frac{9 n_1 \operatorname{Var}(X) + 324\, n_2 \operatorname{Var}(Y)}{(n_1 + 18 n_2)^2} = \frac{9\sigma^2}{n_1 + 18 n_2}.$$
Therefore,
$$E[\hat{\sigma}^2] = \frac{n_1 + n_2 - 1}{n_1 + n_2}\,\sigma^2.$$
So we know the MLE $\hat{\sigma}^2$ is not an unbiased estimator of $\sigma^2$.
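As a sanity check on the closed-form answers in this example, here is a small Monte Carlo sketch (the sample sizes and parameter values are made up) that simulates the two samples under the constraints $3\mu_1 = \mu_2 = \mu$ and $\sigma_1^2 = 2\sigma_2^2 = \sigma^2$, computes the MLEs above, and compares the average of $\hat{\sigma}^2$ with the predicted factor $(n_1 + n_2 - 1)/(n_1 + n_2)$:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma2 = 6.0, 4.0      # made-up true values of mu and sigma^2
n1, n2 = 8, 5              # made-up sample sizes
reps = 100_000

mu_hats = np.empty(reps)
s2_hats = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu / 3, np.sqrt(sigma2), n1)      # X_i ~ N(mu/3, sigma^2)
    y = rng.normal(mu, np.sqrt(sigma2 / 2), n2)      # Y_i ~ N(mu, sigma^2/2)
    mu_hat = (3 * n1 * x.mean() + 18 * n2 * y.mean()) / (n1 + 18 * n2)
    s2_hat = (np.sum((x - mu_hat / 3) ** 2)
              + 2 * np.sum((y - mu_hat) ** 2)) / (n1 + n2)
    mu_hats[r], s2_hats[r] = mu_hat, s2_hat

print("average of mu_hat:", mu_hats.mean(), "(unbiased target:", mu, ")")
print("average of s2_hat / sigma^2:", s2_hats.mean() / sigma2,
      "(predicted bias factor:", (n1 + n2 - 1) / (n1 + n2), ")")
```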
Project 2B. For those of us who have trouble with our original second project (now called 2A: to derive the maximum likelihood estimators of the EIV model for simple linear regression), you may derive the method of moments estimators (MOMEs) instead. Note: your answer should be the same as the MLEs for the regression slope and intercept. You can then pinpoint which special cases correspond to the OR and the GMR. We also need to identify the two boundary lines of the entire class of EIV regression lines.

Hint: Given a fixed error variance ratio $\lambda$, you will be able to obtain the MOMEs using the following equations based on the first and second moments:
$$E[X] = \bar{X}, \qquad E[Y] = \bar{Y},$$
$$\operatorname{Var}[X] = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}, \qquad \operatorname{Var}[Y] = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n}, \qquad \operatorname{Cov}[X, Y] = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n}.$$
Please note that the MOME does not depend on the bivariate normal distribution assumptions, and it is easier to derive. However, unlike the MLE approach, it does not directly give us other inference results such as confidence intervals or tests, and it does not have a straightforward geometric interpretation either.

Project 3. Our third project is to derive a class of non-parametric estimators of the EIV model for simple linear regression based on minimizing the sum of the following distances from each point to the line, as illustrated in the figure below (a small numerical sketch of this criterion is provided at the end of this handout):
$$\text{Distance} = c\, d_V^2 + (1 - c)\, d_H^2, \qquad 0 \le c \le 1.$$
Please also show whether there is a 1-1 relationship between this class of estimators and those in Project 2 (A/B). That is, try to ascertain whether there is a 1-1 relationship between c and $\lambda$.

Project 4. For those who have finished Projects 2 & 3 above, you may also examine whether there is a 1-1 relationship between the class of regression lines obtained by minimizing the sum of squared slant distances from each point to the line, and those in Project 2 (A/B) or 3.

Project 5. For those who have finished Projects 2 & 3 & 4 above, you may also examine how to extend Projects 2/3/4 to the case where we have two random regressors $X_1$ and $X_2$.

Project 6. For those who have finished Projects 2 & 3 & 4 above, you may also examine how to estimate the error variance ratio $\lambda$ when we have two repeated measures on each sample.

Dear all, I hope you will enjoy doing these projects. Please leave enough time for yourselves to make a good PowerPoint presentation, and to practice your presentation, ahead of time. Good luck!

Prof. Zhu
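Below is the small numerical sketch of the Project 3 criterion mentioned above. It is only an illustration for checking your own derivations for a chosen value of c: the data are made up, scipy is assumed to be available, the helper name project3_objective is hypothetical, and a generic numerical minimizer is used instead of the closed-form estimators the project asks you to derive. It relies on the fact that, for a non-vertical line $y = b_0 + b_1 x$, the horizontal deviation satisfies $d_H^2 = d_V^2 / b_1^2$.

```python
import numpy as np
from scipy.optimize import minimize

def project3_objective(params, x, y, c):
    """Sum of c*d_V^2 + (1-c)*d_H^2 over all points, for the line y = b0 + b1*x.
    d_V is the vertical deviation; d_H^2 = d_V^2 / b1^2 for a non-vertical line."""
    b0, b1 = params
    d_v = y - (b0 + b1 * x)
    return np.sum(c * d_v**2 + (1.0 - c) * (d_v / b1) ** 2)

# Made-up illustrative data with errors in both variables
rng = np.random.default_rng(1)
xi = rng.normal(0.0, 2.0, size=100)
x = xi + rng.normal(0.0, 0.7, size=100)
y = 0.5 + 1.5 * xi + rng.normal(0.0, 0.7, size=100)

for c in (0.25, 0.5, 0.75):
    res = minimize(project3_objective, x0=[0.0, 1.0], args=(x, y, c),
                   method="Nelder-Mead")
    b0_hat, b1_hat = res.x
    print(f"c = {c}: slope = {b1_hat:.3f}, intercept = {b0_hat:.3f}")
```

You can compare these numerical minimizers with your closed-form estimators, and with the $\lambda$-family lines from Project 2, when exploring the possible 1-1 relationship between c and $\lambda$.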