Ch. 2 The Simple Regression Model

    y = \beta_0 + \beta_1 x + u

Outline:
1. Definition of the simple regression model
2. Deriving the OLS estimates
3. Mechanics of OLS
4. Units of measurement & functional form
5. Expected values & variances of the OLS estimators
6. Regression through the origin

2.1 Definition of the Model

Equation (2.1),

    y = \beta_0 + \beta_1 x + u,    (2.1)

defines the simple regression model. In the model, we typically refer to y as the dependent variable, x as the independent variable, the \beta's as the parameters, and u as the error term.

The Concept of the Error Term

u represents factors other than x that affect y. If the other factors in u are held fixed, so that \Delta u = 0, then x has a linear effect on y: \Delta y = \beta_1 \Delta x.

Ex. 2.1: yield = \beta_0 + \beta_1 fertilizer + u    (2.3)
Here u includes land quality, rainfall, etc.

Ex. 2.2: wage = \beta_0 + \beta_1 educ + u    (2.4)
Here u includes experience, ability, tenure, etc.

A Simple Assumption for u

The average value of the error term u in the population is 0; that is, E(u) = 0. This is not a restrictive assumption, since we can always use \beta_0 to normalize E(u) to 0. To draw ceteris paribus conclusions about how x affects y, we have to hold all other factors (in u) fixed.

Zero Conditional Mean

We need to make a crucial assumption about how u and x are related. We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is,

    E(u|x) = E(u) = 0,    (2.5, 2.6)

which implies the population regression function (PRF)

    E(y|x) = \beta_0 + \beta_1 x.    (2.8)

[Figure: E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x).]

2.2 Deriving the OLS Estimates

The basic idea of regression is to estimate the population parameters from a sample. Let {(x_i, y_i): i = 1, ..., n} denote a random sample of size n from the population. For each observation in this sample, it will be the case that

    y_i = \beta_0 + \beta_1 x_i + u_i.    (2.9)

[Figure: Population regression line, sample data points and the associated error terms.]

Deriving the OLS Estimates Using the Method of Moments

We now use two restrictions on the population moments to estimate the \beta's:

    E(u) = 0    (2.10)
    E(xu) = 0    (2.11)

To derive the OLS estimates, we need to realize that our main assumption, E(u|x) = E(u) = 0, also implies that Cov(x, u) = E(xu) = 0, because Cov(X, Y) = E(XY) - E(X)E(Y) (B.27). Since u = y - \beta_0 - \beta_1 x, we can rewrite the restrictions as

    E(y - \beta_0 - \beta_1 x) = 0    (2.12)
    E[x(y - \beta_0 - \beta_1 x)] = 0    (2.13)

These are called moment restrictions. The method-of-moments approach to estimation imposes the population moment restrictions on the sample moments; for example, the sample counterpart of E(X), the mean of a population distribution, is simply the arithmetic mean of the sample.

More Derivation of OLS

We want to choose values of the parameters that ensure the sample versions of our moment restrictions hold. The sample versions are

    n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0    (2.14)
    n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0    (2.15)

Given the definition of a sample mean and the properties of summation, we can rewrite the first condition as

    \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}    (2.16)

or

    \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.    (2.17)

Substituting (2.17) into the second condition and solving, the OLS estimated slope is

    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.    (2.19)

Summary of the OLS Slope Estimate

The slope estimate is the sample covariance between x and y divided by the sample variance of x. If x and y are positively (negatively) correlated, the slope will be positive (negative). Note that x needs to vary in our sample.
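A minimal numerical sketch of (2.17) and (2.19), not from the text: the small wage/educ data below are made up purely for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical sample: years of education (x) and hourly wage (y)
educ = np.array([8, 10, 12, 12, 14, 16, 16, 18], dtype=float)
wage = np.array([6.5, 7.0, 9.5, 10.0, 12.0, 15.5, 14.0, 19.0])

x_bar, y_bar = educ.mean(), wage.mean()

# Slope (2.19): sample covariance of x and y over the sample variance of x
beta1_hat = np.sum((educ - x_bar) * (wage - y_bar)) / np.sum((educ - x_bar) ** 2)

# Intercept (2.17): beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")
```

These two lines of algebra are exactly what the moment conditions deliver for the simple regression model; the point of the sketch is only to make them concrete.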
More OLS

Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares. The residual, \hat{u}, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (the sample regression function). See (2.18) and Figure (2.3).

[Figure: Sample regression line, sample data points and the associated estimated error terms, with fitted line \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.]

Alternate Approach to the Derivation

Given the intuitive idea of fitting a line, we can set up a formal minimization problem: choose \hat{\beta}_0 and \hat{\beta}_1 to minimize

    \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.    (2.22)

The first-order conditions, which are almost the same as (2.14) and (2.15), are

    \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.

2.3 Properties of OLS

Algebraic properties of OLS:

1. The sum of the OLS residuals is zero, and thus the sample average of the OLS residuals is zero as well:

    \sum_{i=1}^{n} \hat{u}_i = 0 and thus n^{-1} \sum_{i=1}^{n} \hat{u}_i = 0.    (2.30)

2. The sample covariance between the regressors and the OLS residuals is zero:

    \sum_{i=1}^{n} x_i \hat{u}_i = 0.    (2.31)

3. The OLS regression line always goes through the mean of the sample: \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}.

We can think of each observation as being made up of an explained part and an unexplained part,

    y_i = \hat{y}_i + \hat{u}_i.    (2.32)

Then we define the following:

    \sum (y_i - \bar{y})^2 = SST    (2.33)
    \sum (\hat{y}_i - \bar{y})^2 = SSE    (2.34)
    \sum \hat{u}_i^2 = SSR    (2.35)

Then SST = SSE + SSR.    (2.36)

Goodness-of-Fit

It is useful to think about how well the sample regression line fits the sample data. From (2.36),

    R^2 = SSE/SST = 1 - SSR/SST.    (2.38)

R^2 indicates the fraction of the sample variation in y_i that is explained by the model.

2.4 Measurement Units & Functional Form

If we use the model y^* = \beta_0^* + \beta_1^* x^* + u^*, where y^* = c y and x^* = d x, instead of y = \beta_0 + \beta_1 x + u, we get

    \hat{\beta}_0^* = c \hat{\beta}_0 and \hat{\beta}_1^* = (c/d) \hat{\beta}_1.

Similarly, when y^* = \ln y and x^* = \ln x, the slope measures an elasticity:

    \hat{\beta}_1^* \approx \frac{\Delta y / y}{\Delta x / x}.

2.5 Means & Variances of the OLS Estimators

Now we view the \hat{\beta}_i as estimators of the parameters \beta_i that appear in the population; that is, we study the properties of the distributions of the \hat{\beta}_i over different random samples from the population.

Unbiasedness of OLS

Unbiased estimator: an estimator whose expected value (the mean of its sampling distribution) equals the population value, regardless of what that population value is.

Assumptions for unbiasedness:
1. Linear in parameters: y = \beta_0 + \beta_1 x + u.
2. Random sampling: {(x_i, y_i): i = 1, 2, ..., n}, thus y_i = \beta_0 + \beta_1 x_i + u_i.
3. Sample variation in the x_i: \sum (x_i - \bar{x})^2 > 0.
4. Zero conditional mean: E(u|x) = 0.

Under these assumptions, E(\hat{\beta}_1) = \beta_1. To see this, we rewrite our estimator in terms of the population parameter:

    \hat{\beta}_1 = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} = \beta_1 + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}.    (2.49), (2.52)

Taking expectations conditional on the x_i,

    E(\hat{\beta}_1) = \beta_1 + \frac{\sum (x_i - \bar{x}) E(u_i | x)}{\sum (x_i - \bar{x})^2} = \beta_1.    (2.53)

We can also get E(\hat{\beta}_0) = \beta_0 in the same way.

Unbiasedness Summary

The OLS estimates of \beta_1 and \beta_0 are unbiased. The proof of unbiasedness depends on our four assumptions; if any assumption fails, OLS is not necessarily unbiased.
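A small simulation sketch (not from the text) of what unbiasedness means in practice: under assumptions 1-4, the average of \hat{\beta}_1 over many random samples should be close to the true \beta_1. The true parameters, sample size, and error distribution below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 0.5        # true population parameters (assumed for the demo)
n, reps = 100, 5000            # sample size and number of simulated samples

slope_estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, size=n)    # sample variation in x (assumption 3)
    u = rng.normal(0, 2, size=n)      # E(u|x) = 0 (assumption 4)
    y = beta0 + beta1 * x + u         # linear in parameters (assumption 1)
    x_bar = x.mean()
    slope_estimates[r] = (np.sum((x - x_bar) * (y - y.mean()))
                          / np.sum((x - x_bar) ** 2))

# Each individual estimate varies, but their mean is close to beta1 = 0.5
print(f"mean of beta1_hat over {reps} samples: {slope_estimates.mean():.4f}")
```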
Remember that unbiasedness is a description of the estimator: in any given sample, our estimate may be "near" or "far" from the true parameter.

Variances of the OLS Estimators

Now we know that the sampling distribution of our estimator is centered around the true parameter. We next want to think about how spread out this distribution is. It is much easier to think about this variance under an additional assumption, so assume:

5. Var(u|x) = \sigma^2 (homoskedasticity).

\sigma^2 is also the unconditional variance of u, called the error variance, since

    Var(u|x) = E(u^2|x) - [E(u|x)]^2

and E(u|x) = 0, so \sigma^2 = E(u^2|x) = E(u^2) = Var(u). \sigma, the square root of the error variance, is called the standard deviation of the error. We can then say

    E(y|x) = \beta_0 + \beta_1 x and Var(y|x) = \sigma^2.

[Figure: the homoskedastic case, f(y|x) with E(y|x) = \beta_0 + \beta_1 x and constant spread. Figure: the heteroskedastic case, where the spread of f(y|x) changes with x.]

Under assumptions 1-5, the variance of the slope estimator is

    Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}.    (2.57)

The larger the error variance \sigma^2, the larger the variance of the slope estimator. The larger the variability in the x_i, the smaller the variance of the slope estimator. As a result, a larger sample size should decrease the variance of the slope estimator, since \sum (x_i - \bar{x})^2 grows with n.

Estimating the Error Variance

We don't know the error variance \sigma^2, because we don't observe the errors u_i; what we observe are only the residuals \hat{u}_i. So we use the residuals to form an estimate of the error variance. Writing the residuals in terms of the errors,

    \hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i
              = (\beta_0 + \beta_1 x_i + u_i) - \hat{\beta}_0 - \hat{\beta}_1 x_i
              = u_i - (\hat{\beta}_0 - \beta_0) - (\hat{\beta}_1 - \beta_1) x_i.

Then an unbiased estimator of \sigma^2 is

    \hat{\sigma}^2 = \frac{1}{n-2} \sum \hat{u}_i^2.    (2.61)

\hat{\sigma} = \sqrt{\hat{\sigma}^2} is called the standard error of the regression. Recall that sd(\hat{\beta}_1) = \sqrt{Var(\hat{\beta}_1)} = \sigma / \sqrt{\sum (x_i - \bar{x})^2}. If we substitute \hat{\sigma} for \sigma, we have the standard error of \hat{\beta}_1:

    se(\hat{\beta}_1) = \hat{\sigma} / \sqrt{\sum (x_i - \bar{x})^2}.

2.6 Regression through the Origin

Now consider the model without an intercept:

    \tilde{y} = \tilde{\beta}_1 x.    (2.63)

Solving the first-order condition of the corresponding minimization problem, the OLS estimated slope is

    \tilde{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2}.    (2.66)

* Recall that an intercept can always be used to normalize E(u) to 0 in the model with \beta_0.
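A minimal sketch (illustrative only, on simulated data) of the estimates in (2.61), the standard error of \hat{\beta}_1, and the no-intercept slope in (2.66); NumPy and the arbitrary true parameters below are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 2, size=n)   # simulated data, true slope 0.5

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
resid = y - beta0_hat - beta1_hat * x

sigma2_hat = np.sum(resid ** 2) / (n - 2)                  # (2.61): unbiased for sigma^2
se_beta1 = np.sqrt(sigma2_hat / np.sum((x - x_bar) ** 2))  # standard error of the slope

beta1_tilde = np.sum(x * y) / np.sum(x ** 2)               # (2.66): regression through the origin

print(f"sigma_hat = {np.sqrt(sigma2_hat):.3f}, se(beta1_hat) = {se_beta1:.3f}, "
      f"through-origin slope = {beta1_tilde:.3f}")
```

Because the simulated data have a nonzero intercept, the through-origin slope will generally differ from \hat{\beta}_1, which illustrates why forcing the line through the origin is usually avoided unless theory demands it.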