CIS 2033, based on Dekking et al., A Modern Introduction to Probability and Statistics, 2007. Slides by Kier Heilman. Instructor: Longin Jan Latecki.

C22: The Method of Least Squares

22.1 – Least Squares
Consider the random variables
  Yi = α + βxi + Ui   for i = 1, 2, ..., n,
where the random variables U1, U2, ..., Un have zero expectation and variance σ².

Method of least squares: choose values for α and β such that
  S(α, β) = ∑_{i=1}^{n} (yi − α − βxi)²
is minimal.

22.1 – Regression
The observed value yi corresponds to xi; the value α + βxi lies on the regression line y = α + βx.

22.1 – Estimation
Setting the partial derivatives of S(α, β) to zero (the "calculus magic") yields two simultaneous equations, the normal equations, for estimating α and β:
  nα + β ∑_{i=1}^{n} xi = ∑_{i=1}^{n} yi
  α ∑_{i=1}^{n} xi + β ∑_{i=1}^{n} xi² = ∑_{i=1}^{n} xi yi

22.1 – Estimation
After some simple algebraic rearranging, we solve the equations for α and β:
  β̂ = ( n ∑ xi yi − (∑ xi)(∑ yi) ) / ( n ∑ xi² − (∑ xi)² )   (slope)
  α̂ = ȳn − β̂ x̄n   (intercept)

22.1 – Least Squares Estimators are Unbiased
Both estimators α̂ and β̂ are unbiased. For the simple linear regression model, the random variable
  σ̂² = 1/(n − 2) ∑_{i=1}^{n} (Yi − α̂ − β̂xi)²
is an unbiased estimator for σ².

22.2 – Residuals
Residual: the vertical distance between the ith point and the estimated regression line:
  ri = yi − α̂ − β̂xi
The sum of the residuals is zero:
  ∑_{i=1}^{n} ri = 0

22.2 – Heteroscedasticity
Homoscedasticity: the assumption that the Ui (and therefore the Yi) all have equal variance. Heteroscedasticity occurs, for instance, when Yi with a large expected value have a larger variance than those with small expected values.

22.3 – Relation with Maximum Likelihood
What are the maximum likelihood estimates for α and β? To apply the method of least squares, no assumption is needed about the type of distribution of the Ui. When the type of distribution of the Ui is known, the maximum likelihood principle can be applied.
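Before turning to the maximum likelihood connection, the closed-form estimators from 22.1 can be sketched in code. This is a minimal illustration on made-up data (the x and y values are hypothetical, not from the book); it computes β̂ and α̂ from the normal-equation solution, the residuals, and the unbiased variance estimator with divisor n − 2.

```python
import numpy as np

# Hypothetical toy data: points roughly on a line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.0])
n = len(x)

# Slope and intercept from the closed-form least squares formulas:
beta_hat = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
alpha_hat = np.mean(y) - beta_hat * np.mean(x)

# Residuals: vertical distances to the estimated regression line.
r = y - alpha_hat - beta_hat * x

# Unbiased estimator of sigma^2 (note the divisor n - 2):
sigma2_hat = np.sum(r**2) / (n - 2)

print(alpha_hat, beta_hat)   # intercept ≈ 2.09, slope ≈ 2.97
print(np.sum(r))             # sum of residuals is (numerically) zero
```

As a sanity check, the same estimates come out of any standard linear-fit routine, e.g. np.polyfit(x, y, 1).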
Consider, for instance, the classical situation where the Ui are independent with an N(0, σ²) distribution. Then Yi has an N(α + βxi, σ²) distribution, making the probability density function of Yi
  f(y) = 1/(σ√(2π)) · e^{−(y − α − βxi)² / (2σ²)}.

22.3 – Maximum Likelihood
For fixed σ > 0 the loglikelihood ℓ(α, β, σ) attains its maximum when
  ∑_{i=1}^{n} (yi − α − βxi)²
is minimal. Hence, when the random variables Ui are independent with an N(0, σ²) distribution, the maximum likelihood principle and the least squares method return the same estimators for α and β. The maximum likelihood estimator for σ², however, divides by n rather than n − 2:
  σ̂² = 1/n ∑_{i=1}^{n} (Yi − α̂ − β̂xi)²
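The agreement between the two principles can be checked numerically: for fixed σ, the loglikelihood evaluated at the least squares estimates is larger than at nearby perturbed values, and the ML variance estimate divides the residual sum of squares by n instead of n − 2. A sketch on the same hypothetical toy data as before:

```python
import numpy as np

# Hypothetical toy data (same as the earlier least squares sketch).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.0])
n = len(x)

# Least squares estimates from the closed-form formulas:
beta_hat = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
alpha_hat = np.mean(y) - beta_hat * np.mean(x)

def loglik(a, b, sigma):
    """Loglikelihood of y under the model Yi ~ N(a + b*xi, sigma^2)."""
    return -n * np.log(sigma * np.sqrt(2 * np.pi)) - np.sum((y - a - b * x)**2) / (2 * sigma**2)

# ML estimator of sigma^2: residual sum of squares divided by n (not n - 2).
sigma2_ml = np.sum((y - alpha_hat - beta_hat * x)**2) / n
s = np.sqrt(sigma2_ml)

# For fixed sigma, the loglikelihood is maximal at the least squares estimates:
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert loglik(alpha_hat, beta_hat, s) > loglik(alpha_hat + da, beta_hat + db, s)
```

The perturbation check works because, for fixed σ, the loglikelihood differs from −S(α, β)/(2σ²) only by a constant, so maximizing one is minimizing the other.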