Econ 399 Chapter 2a

Part 1
Cross Sectional Data
•Simple Linear Regression Model
– Chapter 2
•Multiple Regression Analysis –
Chapters 3 and 4
•Advanced Regression Topics –
Chapter 6
•Dummy Variables – Chapter 7
•Note: Appendices A, B, and C provide
additional review if needed.
1. The Simple Regression Model
2.1 Definition of the Simple Regression
Model
2.2 Deriving the Ordinary Least Squares
Estimates
2.3 Properties of OLS on Any Sample of
Data
2.4 Units of Measurement and Functional
Form
2.5 Expected Values and Variances of the
OLS Estimators
2.6 Regression through the Origin
2.1 The Simple Regression Model
• Economics is built upon assumptions
-assume people are utility maximizers
-assume perfect information
-assume we have a can opener
• The Simple Regression Model is based on
assumptions
-more assumptions are required for
more analysis
-disproving assumptions leads to more
complicated models
2.1 The Simple Regression Model
• Recall the SIMPLE LINEAR REGRESSION MODEL:
$y = \beta_0 + \beta_1 x + u$    (2.1)
-relates two variables (x and y)
-also called the two-variable linear regression
model or bivariate linear regression model
y is the DEPENDENT or EXPLAINED variable
x is the INDEPENDENT or EXPLANATORY
variable
y is a function of x
2.1 The Simple Regression Model
• Recall the SIMPLE LINEAR REGRESSION MODEL:
$y = \beta_0 + \beta_1 x + u$    (2.1)
u is the ERROR TERM or DISTURBANCE
variable
-u takes into account all factors other than x
that affect y
-u accounts for all “unobserved” impacts on y
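As a purely illustrative sketch (not from the text), the model (2.1) can be simulated in Python; the parameter values B0 = 1 and B1 = 0.5, the sample size, and the error distribution are assumptions made only for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, chosen only for this illustration
beta0, beta1 = 1.0, 0.5

n = 100
x = rng.uniform(0, 10, size=n)    # observed explanatory variable
u = rng.normal(0, 1, size=n)      # unobserved error term with E(u) = 0
y = beta0 + beta1 * x + u         # the simple regression model (2.1)
```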
2.1 The Simple Regression Model
• Example of the SIMPLE LINEAR REGRESSION MODEL:
$\mathit{taste} = \beta_0 + \beta_1 \mathit{cookingtime} + u$    (ie)
-taste depends on cooking time
-taste is explained by cooking time
-taste is a function of cooking time
-u accounts for other factors affecting taste
(cooking skill, ingredients available, random
luck, differing taste buds, etc.)
2.1 The Simple Regression Model
• The SRM shows how y changes:
$\Delta y = \beta_1 \Delta x \;\text{ if }\; \Delta u = 0$    (2.2)
-for example, if B1=3, a 2 unit increase in x would
cause a 6 unit change in y (2 x 3 = 6)
-B1 is the SLOPE PARAMETER
-B0 is the INTERCEPT PARAMETER or
CONSTANT TERM
-not always useful in analysis
2.1 The Simple Regression Model
$y = \beta_0 + \beta_1 x + u$    (2.1)
-note that this equation implies CONSTANT
returns
-the first unit of x has the same impact on y as
the 100th unit
-to avoid this we can include powers or
change functional forms
2.1 The Simple Regression Model
-in order to achieve a ceteris paribus analysis of
x's effect on y, we need assumptions about u's
relationship with x
-in order to simplify our assumptions, we first
assume that the average of u in the population
is zero:
$E(u) = 0$    (2.5)
-if B0 is included in the equation, it can always be
adjusted to make (2.5) true
-ie: if E(u)>0, simply increase B0 (see the sketch below)
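A one-line sketch of why the intercept can always absorb a nonzero mean of u:

```latex
E(u) = \alpha_0 \neq 0
\;\Longrightarrow\;
y = \underbrace{(\beta_0 + \alpha_0)}_{\text{new intercept}}
  + \beta_1 x
  + \underbrace{(u - \alpha_0)}_{\text{new error, with mean } 0}
```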
2.1 x, u and Dependence
-we now need to assume that x and u are
unrelated
-if x and u are uncorrelated, u may still be
correlated with functions of x such as $x^2$
-we therefore need a stronger assumption:
$E(u \mid x) = E(u) = 0$    (2.6)
-the average value of u does not depend on x
-second equality comes from (2.5)
-called the ZERO CONDITIONAL MEAN
ASSUMPTION
2.1 Example
Take the regression:
$\mathit{Papermark} = \beta_0 + \beta_1 \mathit{Paperquality} + u$    (ie)
-where u takes into account other factors of the
submitted paper, in particular length exceeding 10 pages
-assumption (2.6) requires that a paper's excess
length does not depend on how good it is:
$E(\text{excess length} \mid \text{good paper}) = E(\text{excess length} \mid \text{bad paper}) = 0$
2.1 The Simple Regression Model
• Taking conditional expectations of (2.1) and using (2.6) gives us:
$E(y \mid x) = \beta_0 + \beta_1 x$    (2.8)
-2.8 is called the POPULATION REGRESSION
FUNCTION (PRF)
-a one unit increase in x increases the
expected value of y by B1
-B0+B1x is the systematic (explained) part of y
-u is the unsystematic (unexplained) part of y
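Writing out the step from (2.1) and (2.6) to (2.8):

```latex
E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x)
            = \beta_0 + \beta_1 x + E(u \mid x)
            = \beta_0 + \beta_1 x
```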
2.2 Deriving the OLS Estimates
• In order to estimate B0 and B1, we need sample data
-let $\{(x_i, y_i): i = 1, \ldots, n\}$ be a sample of n
observations from the population
$y_i = \beta_0 + \beta_1 x_i + u_i$    (2.9)
-here yi is explained by xi with error term ui
-y5 indicates the observation of y from the 5th
data point
-this regression plots a “best fit” line through
our data points:
2.2 Deriving the OLS Estimates
These OLS estimates create a straight line going
through the "middle" of the data points:
[Figure: "Studying and Marks" scatter plot, with Marks on the horizontal axis (0-120) and Studying on the vertical axis (0-8), and the fitted OLS line running through the middle of the points]
2.2 Deriving OLS Estimates
In order to derive OLS, we first need assumptions.
We must first assume that u has zero expected value:
$E(u) = 0$    (2.10)
-Secondly, we must assume that the covariance between
x and u is zero:
$\mathrm{Cov}(x, u) = E(xu) = 0$    (2.11)
-(2.10) and (2.11) can also be rewritten in terms of x
and y as:
$E(y - \beta_0 - \beta_1 x) = 0$    (2.12)
$E[x(y - \beta_0 - \beta_1 x)] = 0$    (2.13)
2.2 Deriving OLS Estimates
-(2.12) and (2.13) imply restrictions on the joint
probability distribution of (x, y) in the POPULATION
-given SAMPLE data, these equations become:
$\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$    (2.14)
$\frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$    (2.15)
-notice that the "hats" above B0 and B1 indicate we
are now dealing with estimates
-this is an example of “method of moments”
estimation (see Appendix C for a discussion)
2.2 Deriving OLS Estimates
Using summation properties, (2.14) simplifies to:
$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$    (2.16)
Which can be rewritten as:
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$    (2.17)
Which is our OLS estimate for the intercept
-therefore given data and an estimate of the slope,
the estimated intercept can be determined
2.2 Deriving OLS Estimates
By cancelling out 1/n and combining (2.17) and
(2.15) we get:
$\sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat\beta_1 \bar{x}) - \hat\beta_1 x_i] = 0$
Which can be rewritten as:
$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$
2.2 Deriving OLS Estimates
Recall the algebraic properties:
$\sum_{i=1}^{n} x_i [x_i - \bar{x}] = \sum_{i=1}^{n} [x_i - \bar{x}]^2$
And
$\sum_{i=1}^{n} x_i [y_i - \bar{y}] = \sum_{i=1}^{n} [x_i - \bar{x}][y_i - \bar{y}]$
2.2 Deriving OLS Estimates
We can make the simple assumption that:
$\sum_{i=1}^{n} [x_i - \bar{x}]^2 > 0$    (2.18)
Which essentially states that not all x’s are the same
-ie: you didn’t do a survey where one question is
“are you alive?”
-This is essentially the key assumption needed to
estimate B1hat
2.2 Deriving OLS Estimates
All this gives us the OLS estimate for B1:
n
ˆ1 
 [ x  x][ y
i
i 1
i
 y]
(2.19)
n
 [ x  x]
i 1
2
i
Note that assumption (2.18) basically ensured the
denominator is not zero.
-also note that if x and y are positively (negatively)
correlated, B1hat will be positive (negative)
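A minimal Python sketch of formulas (2.19) and (2.17); the sample data below are made up purely for illustration:

```python
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

xbar, ybar = x.mean(), y.mean()

# Slope estimate (2.19): cross-products over squared deviations of x
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Intercept estimate (2.17): forces the fitted line through (xbar, ybar)
beta0_hat = ybar - beta1_hat * xbar

print(beta0_hat, beta1_hat)
```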
2.2 Fitted Values
OLS estimates of B0 and B1 give us a FITTED
value for y when x=xi:
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$    (2.20)
-there is one fitted or predicted value of y for each
observation of x
-the predicted y’s can be greater than, less than or
(rarely) equal to the actual y’s
2.2 Residuals
The difference between the actual y values and
the estimates is the ESTIMATED error, or
residuals:
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$    (2.21)
-again, there is one residual for each observation
-these residuals ARE NOT the same as the actual
error term
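Continuing the hedged sketch above (same hypothetical data), fitted values (2.20) and residuals (2.21) follow directly; np.polyfit is used only as a shortcut that returns the same OLS estimates:

```python
import numpy as np

# Same hypothetical sample data as the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

# np.polyfit with degree 1 returns the OLS slope and intercept
beta1_hat, beta0_hat = np.polyfit(x, y, 1)

y_hat = beta0_hat + beta1_hat * x   # fitted values (2.20)
u_hat = y - y_hat                   # residuals (2.21), not the true errors u
```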
2.2 Residuals
The SUM OF SQUARED RESIDUALS can be
expressed as:
$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$    (2.22)
-if B0hat and B1hat are chosen to minimize (2.22),
(2.14) and (2.15) are our FIRST ORDER
CONDITIONS (FOC'S) and we are able to derive
the same OLS estimates as above in (2.17) and
(2.19)
-the term “OLS” comes from the fact that the
square of the residuals is minimized
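A quick numerical check, again with the hypothetical data from the earlier sketches, that the OLS estimates do minimize the sum of squared residuals (2.22): perturbing either estimate raises the SSR.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

def ssr(b0, b1):
    """Sum of squared residuals (2.22) for a candidate intercept and slope."""
    return np.sum((y - b0 - b1 * x) ** 2)

beta1_hat, beta0_hat = np.polyfit(x, y, 1)

# Moving away from the OLS estimates in any direction increases the SSR
assert ssr(beta0_hat, beta1_hat) <= ssr(beta0_hat + 0.1, beta1_hat)
assert ssr(beta0_hat, beta1_hat) <= ssr(beta0_hat, beta1_hat + 0.1)
assert ssr(beta0_hat, beta1_hat) <= ssr(beta0_hat - 0.1, beta1_hat - 0.1)
```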
2.2 Why OLS?
Why minimize the sum of the squared residuals?
-Why not minimize the residuals themselves?
-Why not minimize the cube of the residuals?
-not all minimization techniques can be expressed
as formulas
-OLS has the advantage that unbiasedness,
consistency, and other important statistical
properties can be derived.
2.2 Regression Line
Our OLS regression supplies us with an OLS
REGRESSION LINE:
$\hat{y} = \hat\beta_0 + \hat\beta_1 x$    (2.23)
-note that as this is an equation of a line, there are
no subscripts
-B0hat is the predicted value of y when x=0
-not always a valid value
-(2.23) is also called the SAMPLE REGRESSION
FUNCTION (SRF)
-different data sets will estimate different B’s
2.2 Deriving OLS Estimates
The slope estimate:
$\hat\beta_1 = \Delta\hat{y} / \Delta x$    (2.24)
Shows the change in yhat when x changes, or
alternatively,
$\Delta\hat{y} = \hat\beta_1 \Delta x$    (2.25)
The change in x can be multiplied by B1hat to
estimate the change in y.
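For example, with a hypothetical slope estimate of 3.2 (an assumption for illustration only), (2.25) predicts the change in y from a proposed change in x:

```python
beta1_hat = 3.2                     # hypothetical slope estimate
delta_x = 2.0                       # proposed change in x
delta_y_hat = beta1_hat * delta_x   # (2.25): predicted change in y of 6.4
```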
2.2 Deriving OLS Estimates
• Notes:
1) As the calculations required to
estimate OLS become tedious with more than a
few data points, econometrics software
(like Shazam) is used.
2) A successful regression cannot establish
causality; it can only describe the positive or
negative relation between x and y
3) We often use the terminology
“regress y on x” to estimate y=f(x)
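The course software is Shazam; purely as a generic illustration (not the course software, and with the same hypothetical data), the same "regress y on x" can be run in Python with the statsmodels library:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

X = sm.add_constant(x)          # add the intercept column for B0
results = sm.OLS(y, X).fit()    # "regress y on x"

print(results.params)           # [B0hat, B1hat]
print(results.rsquared)         # R-squared, discussed in section 2.3
```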
2.3 Properties of OLS on Any Sample of Data
•Review
-Once again, simple algebraic properties are
needed in order to build OLS’s foundation
-The OLS estimates (B0hat and B1hat) can be used
to calculate fitted values (yhat)
-the residual (uhat) is the difference between
the actual y values and the estimated y
values (yhat)
2.3 Properties of OLS
$\hat{u} = y - \hat{y}$; here yhat underpredicts y
[Figure: "Studying and Marks" scatter plot with the fitted line (yhat); the vertical distance between an actual point y and the line is the residual uhat. Marks on the horizontal axis (0-120), Studying on the vertical axis (0-8).]
2.3 Properties of OLS
1) From the FOC of OLS (2.14), the sum of all
residuals is zero:
$\sum_{i=1}^{n} \hat{u}_i = 0$    (2.30)
2) Also from the FOC of OLS (2.15), the sample
covariance between the regressors and the OLS
residuals is zero:
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$    (2.31)
From (2.30), the left side of (2.31) is proportional to
the required sample covariance
2.3 Properties of OLS
3) The point (xbar, ybar) is always on the OLS
regression line (from 2.16):
$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$    (2.16)
Further Algebraic Gymnastics:
1) From (2.30) we know that the sample average
of the fitted y values equals the sample
average of the actual y values:
$\bar{\hat{y}} = \bar{y}$
2.3 Properties of OLS
Further Algebraic Gymnastics:
2) (2.30) and (2.31) combine to prove that the
covariance between yhat and uhat is zero
Therefore OLS breaks down yi into two
uncorrelated parts – a fitted value and a
residual:
$y_i = \hat{y}_i + \hat{u}_i$    (2.32)
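A numerical check of properties (2.30), (2.31), and the decomposition (2.32), using the same hypothetical data; each printed quantity is zero (or a zero difference) up to rounding error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x
u_hat = y - y_hat

print(np.sum(u_hat))             # (2.30): residuals sum to zero
print(np.sum(x * u_hat))         # (2.31): regressors and residuals uncorrelated
print(y_hat.mean() - y.mean())   # average fitted value equals average y
print(np.sum(y_hat * u_hat))     # fitted values and residuals uncorrelated
```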
2.3 Sum of Squares
From the idea of fitted and residual components,
we can calculate the TOTAL SUM OF SQUARES
(SST), the EXPLAINED SUM OF SQUARES (SSE)
and the RESIDUAL SUM OF SQUARES (SSR)
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$    (2.33)
$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$    (2.34)
$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$    (2.35)
2.3 Sum of Squares
SST measures the sample variation in y.
SSE measures the sample variation in yhat (the
fitted component).
SSR measures the sample variation in uhat (the
residual component).
These relate to each other as follows:
$SST = SSE + SSR$    (2.36)
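A minimal check of the decomposition (2.33) to (2.36), again with the hypothetical data used in the earlier sketches:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x

SST = np.sum((y - y.mean()) ** 2)       # (2.33) total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)   # (2.34) explained sum of squares
SSR = np.sum((y - y_hat) ** 2)          # (2.35) residual sum of squares

print(np.isclose(SST, SSE + SSR))       # (2.36) holds up to rounding
```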
2.3 Proof of Squares
The proof of (2.36) is as follows:
$\sum (y_i - \bar{y})^2 = \sum [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$
$= \sum [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$
$= \sum \hat{u}_i^2 + 2\sum \hat{u}_i (\hat{y}_i - \bar{y}) + \sum (\hat{y}_i - \bar{y})^2$
$= SSR + 2\sum \hat{u}_i (\hat{y}_i - \bar{y}) + SSE$
Since we showed that the covariance between the
residuals and fitted values is zero,
$2\sum \hat{u}_i (\hat{y}_i - \bar{y}) = 0$    (2.37)
2.3 Properties of OLS on Any Sample of Data
•Notes
-An in-depth analysis of sample and inter-variable
covariance is available in Appendix C for
individual study
-SST, SSE and SSR have differing
interpretations and labels across econometric
software packages, so it is always important to
check the underlying formula
2.3 Goodness of Fit
-Once we've run a regression, a natural question
is, "How well does x explain y?"
-We can’t answer that yet, but we can ask, “How
well does the OLS regression line fit the data?”
-To measure this, we use R2, the COEFFICIENT
OF DETERMINATION:
$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$    (2.38)
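With the same hypothetical data, both expressions in (2.38) give the same R2:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 60.0, 57.0, 71.0, 78.0, 83.0])

beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum((y - y_hat) ** 2)

print(SSE / SST, 1 - SSR / SST)   # (2.38): both forms agree, between 0 and 1
```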
2.3 Goodness of Fit
-”R2 is the ratio of the explained variation
compared to the total variation”
-”the fraction of the sample variation in y
that is explained by x”
-R2 always lies between zero and 1
-if R2=1, all actual points lie on the
regression line (usually an error)
-if R2≈0, the regression explains very little;
OLS is a “poor fit”
2.3 Properties of OLS on Any Sample of Data
•Notes
-A low R2 is not uncommon in the social
sciences, especially in cross-sectional
analysis
-an econometric regression should not be judged
harshly solely because of a low R2
-for example, if R2=0.12, that means
12% of the variation is explained, which is
better than the 0% before the regression