MOOC Econometrics
Lecture 1.3 on Simple Regression: Estimation
Philip Hans Franses

Model and data

Simple regression model: $y_i = \alpha + \beta x_i + \varepsilon_i$
In econometrics, we do not know $\alpha$ and $\beta$ (and $\varepsilon_i$); we do have observations on $x_i$ and $y_i$.
Use observed data on $x_i$ and $y_i$ to find "optimal" values of $a$ and $b$ so that $y_i \approx a + b x_i$.
The line $y = a + bx$ is called the regression line.

Lecture 1.3, Slide 2 of 12, Erasmus School of Economics

Data and regression line

[Figure: data and fitted line; horizontal axis X, vertical axis Y]
Data: $n$ pairs of observations $(x_i, y_i)$ for $i = 1, 2, \ldots, n$
Fitted line: $y_i = a + b x_i$
a: intercept
b: slope
Residuals: $e_i = y_i - a - b x_i$
Choose fitted line such that $e_i$ are small.

Least squares

Least squares criterion: find $a$ and $b$ by minimizing
$$S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
Get $a$ and $b$ by solving $\frac{\partial S}{\partial a} = 0$ and $\frac{\partial S}{\partial b} = 0$.

Solving $\frac{\partial S}{\partial a} = 0$

We first analyze $\frac{\partial S}{\partial a} = 0$ (and later we consider $\frac{\partial S}{\partial b} = 0$):
$$0 = \frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = -2 \sum_{i=1}^{n} e_i$$
Note: One residual follows from the other $n-1$ residuals: $e_n = -(e_1 + e_2 + \ldots + e_{n-1})$
$$0 = \frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = -2 \sum_{i=1}^{n} y_i + 2na + 2b \sum_{i=1}^{n} x_i$$
Denote sample means by $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, then above equation gives:
$$-2\bar{y} + 2a + 2b\bar{x} = 0$$
So: $a = \bar{y} - b\bar{x}$

Test question

Test
Suppose we apply least squares on de-meaned data, with dependent variable $y_i^* = y_i - \bar{y}$ and explanatory factor $x_i^* = x_i - \bar{x}$. Which values do $a$ and/or $b$ take in this special case?

Answer: Check that $\bar{y}^* = \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{n} \sum_{i=1}^{n} \bar{y} = \bar{y} - \bar{y} = 0$, and likewise $\bar{x}^* = 0$.
So: $a = \bar{y}^* - b\bar{x}^* = 0 - b \times 0 = 0$.
Later we will see that $b$ is not affected by de-meaning.

Solving $\frac{\partial S}{\partial b} = 0$

$$S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
$$0 = \frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = -2 \sum_{i=1}^{n} x_i e_i$$
Note: if $x_1 \neq 0$, then $e_1 = -(x_2 e_2 + x_3 e_3 + \ldots + x_n e_n)/x_1$
$$\begin{aligned}
0 &= \sum_{i=1}^{n} x_i (y_i - a - b x_i) \\
  &= \sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^2 \\
  &= \sum_{i=1}^{n} x_i y_i - (\bar{y} - b\bar{x}) \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^2 \\
  &= \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \bar{y} + b \sum_{i=1}^{n} x_i \bar{x} - b \sum_{i=1}^{n} x_i^2 \\
  &= \sum_{i=1}^{n} x_i (y_i - \bar{y}) - b \sum_{i=1}^{n} x_i (x_i - \bar{x})
\end{aligned}$$
So:
$$b = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}$$

Solving $\frac{\partial S}{\partial b} = 0$

$$b = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}$$
Use that $\sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} y_i - n\bar{y} = 0$, and similarly $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$, hence
$\bar{x} \sum_{i=1}^{n} (y_i - \bar{y}) = 0$ and $\bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) = 0$
We get:
$$b = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Test
What value does $b$ take if all observations of $y_i$ are equal to 93?

Answer: $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ with $y_i - \bar{y} = 93 - 93 = 0$. So: $b = 0$.

R-squared

Data $(x_i, y_i)$ give numerical values of $a$ and $b$, with $a = \bar{y} - b\bar{x}$.
Then $y_i = a + b x_i + e_i = \bar{y} - b\bar{x} + b x_i + e_i$, so
$$y_i - \bar{y} = b(x_i - \bar{x}) + e_i \qquad (*)$$
Deviation $y_i - \bar{y}$ partly explained by $x_i - \bar{x}$ ($e_i$ is unexplained).
Seen before: $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$, hence
$\sum_{i=1}^{n} (x_i - \bar{x}) e_i = \sum_{i=1}^{n} x_i e_i - \bar{x} \sum_{i=1}^{n} e_i = 0$.
Squaring and Summing (SS) both sides of $(*)$ therefore gives:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} e_i^2$$
SSTotal = SSExplained + SSResidual
$$R^2 = \frac{\text{SSExplained}}{\text{SSTotal}} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

Estimate of error variance

$y_i = \alpha + \beta x_i + \varepsilon_i$ with $\varepsilon_i \sim \text{NID}(0, \sigma^2)$
Unknown $\sigma^2$ is estimated from residuals $e_i = y_i - a - b x_i$.
Residuals $e_i$, $i = 1, 2, \ldots, n$, have $n - 2$ free values (seen before).
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (e_i - \bar{e})^2$$
Seen before: $\sum_{i=1}^{n} e_i = 0$, so $\bar{e} = 0$.
Therefore: $s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$ (see Building Blocks for case $n - 1$)

TRAINING EXERCISE 1.3

Train yourself by making the training exercise (see the website). After making this exercise, check your answers by studying the webcast solution (also available on the website).
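
Numerical sketch of the formulas in this lecture. The block below is a minimal illustration, not part of the original slides: it assumes plain Python, the function name `simple_ols` is hypothetical, and the five data points are made up for illustration. It computes $b$ from the de-meaned cross products, $a = \bar{y} - b\bar{x}$, the residuals $e_i$, $R^2$, and $s^2$ with the $n - 2$ divisor.

```python
import math


def simple_ols(x, y):
    """Least squares fit of y_i = a + b*x_i + e_i (hypothetical helper)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 is positive")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # b = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x_i are equal, so the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx

    # a = y_bar - b * x_bar
    a = y_bar - b * x_bar

    # Residuals e_i = y_i - a - b*x_i
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]

    ss_residual = sum(ei ** 2 for ei in residuals)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)

    # R^2 = 1 - SSResidual / SSTotal; undefined when all y_i are equal
    r2 = 1 - ss_residual / ss_total if ss_total > 0 else math.nan

    # s^2 = SSResidual / (n - 2)
    s2 = ss_residual / (n - 2)

    return {"a": a, "b": b, "residuals": residuals, "r2": r2, "s2": s2}


# Made-up illustrative data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_ols(x, y)
print("a =", fit["a"], "b =", fit["b"])
print("R-squared =", fit["r2"], "s^2 =", fit["s2"])

# The two first-order conditions: sum e_i = 0 and sum x_i e_i = 0
print("sum of residuals:", sum(fit["residuals"]))
print("sum of x_i * e_i:", sum(xi * ei for xi, ei in zip(x, fit["residuals"])))

# Test question 1: de-meaned data give a = 0 and the same b
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
demeaned = simple_ols([xi - x_bar for xi in x], [yi - y_bar for yi in y])
print("de-meaned fit: a =", demeaned["a"], "b =", demeaned["b"])

# Test question 2: constant y_i = 93 gives b = 0
constant = simple_ols(x, [93.0] * len(x))
print("constant y fit: a =", constant["a"], "b =", constant["b"])
```

Up to floating-point rounding, the two printed sums should be zero, which is what leaves only $n - 2$ free residuals. When all $y_i$ are equal, SSTotal is zero, so this sketch reports $R^2$ as not-a-number rather than dividing by zero; that handling is a choice made here, not something the lecture specifies.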