SIMPLE REGRESSION MODEL Y Y b1 b 2 X b1 X1 X2 X3 X4 X Suppose that a variable Y is a linear function of another variable X, with unknown parameters b1 and b2 that we wish to estimate. 1 SIMPLE REGRESSION MODEL Y Y b1 b 2 X b1 X1 X2 X3 X4 X Suppose that we have a sample of 4 observations with X values as shown. 2 SIMPLE REGRESSION MODEL Y Y b1 b 2 X b1 Q1 X1 Q2 X2 Q3 X3 Q4 X4 X If the relationship were an exact one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of b1 and b2. 3 SIMPLE REGRESSION MODEL P4 Y Y b1 b 2 X P1 b1 Q1 X1 Q2 P2 X2 Q3 Q4 P3 X3 X4 X In practice, most economic relationships are not exact and the actual values of Y are different from those corresponding to the straight line. 4 SIMPLE REGRESSION MODEL P4 Y Y b1 b 2 X P1 b1 Q1 X1 Q2 P2 X2 Q3 Q4 P3 X3 X4 X To allow for such divergences, we will write the model as Y = b1 + b2X + u, where u is a disturbance term. 5 SIMPLE REGRESSION MODEL P4 Y Y b1 b 2 X u1 P1 b1 Q1 b1 b 2 X 1 X1 Q2 P2 X2 Q3 Q4 P3 u = disturbance term X3 X4 X Each value of Y thus has a nonrandom component, b1 + b2X, and a random component, u. The first observation has been decomposed into these two components. 6 SIMPLE REGRESSION MODEL P4 Y P1 P2 X1 X2 P3 X3 X4 X In practice we can see only the P points. 7 SIMPLE REGRESSION MODEL P4 Y Yˆ b1 b2 X P1 P2 b1 X1 X2 P3 X3 X4 X Obviously, we can use the P points to draw a line which is an approximation to the line Y = b1 + b2X. If we write this line Y^ = b1 + b2X, b1 is an estimate of b1 and b2 is an estimate of b2. 8 SIMPLE REGRESSION MODEL Y Y (actual value) Yˆ (fitted value) P4 Yˆ b1 b2 X R3 P1 R1 b1 X1 R2 P2 X2 R4 P3 X3 X4 X The line is called the fitted model and the values of Y predicted by it are called the fitted values of Y. They are given by the heights of the R points. 9 SIMPLE REGRESSION MODEL Y Y (actual value) Yˆ (fitted value) P4 Yˆ b1 b2 X e4 R3 e1 R2 P1 R1 b1 X1 e2 P2 X2 R4 e3 Y Yˆ e (residual) P3 X3 X4 X The discrepancies between the actual and fitted values of Y are known as the residuals. 10 SIMPLE REGRESSION MODEL Least squares criterion: Minimize RSS (residual sum of squares), where n RSS ei2 e12 ... en2 i 1 To begin with, we will draw the fitted line so as to minimize the sum of the squares of the residuals, RSS. This is described as the least squares criterion. 19 DERIVING LINEAR REGRESSION COEFFICIENTS True model Y b1 b 2 X u Y 6 Y3 Y2 5 4 3 Y1 2 1 0 0 1 2 3 X We will start with a numerical example with just three observations: (1,3), (2,5), and (3,6). 2 DERIVING LINEAR REGRESSION COEFFICIENTS True model Y b1 b 2 X u Y 6 Yˆ3 6.17 Fitted model Yˆ 1.67 1.50 X 5 Y3 Y2 Yˆ2 4.67 4 Yˆ1 3.17 3 Y1 2 b1 b2 1 0 0 1 2 3 X The fitted line and the fitted values of Y are as shown. 12 DERIVING LINEAR REGRESSION COEFFICIENTS Y True model Y b1 b 2 X u Fitted model Yˆ b1 b2 X Yˆn b1 b2 X n Yn Y1 b1 b2 Yˆ1 b1 b2 X 1 b1 Y b2 X b2 X X Y Y X X i i 2 i X1 Xn X We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for b1 and b2. 44 DERIVING LINEAR REGRESSION COEFFICIENTS Y True model Y b1 b 2 X u Fitted model Yˆ b1 b2 X Yˆn b1 b2 X n Yn Y1 b1 b2 Yˆ1 b1 b2 X 1 b1OLS Y b2OLS X OLS 2 b X X Y Y X X i i 2 i X1 Xn X Again, we should make the mathematical point discussed in the context of the numerical example. These are the particular values of b1 and b2 that minimize RSS, and we should differentiate them from the rest by giving them special names, for example b1OLS and b2OLS. 45 DERIVING LINEAR REGRESSION COEFFICIENTS True model Y b1 b 2 X u Yˆ b1 b2 X Using the matrix algebra bˆ OLS ( X T X ) 1 X T Y . 45 SIMPLE REGRESSION MODEL Example: Using the dataset from EAEF Subset 21 (Gretl) Estimate the model: Y b1 b 2 X u where: Y – weight of the respondent in 1985 measured in pounds X – his or her height, measured in inches 19 Copyright Christopher Dougherty 2012. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 1.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/. Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse. 2012.10.28