The Simple Regression Model
y = b0 + b1x + u

FIN357 Li

Some Terminology
In the simple linear regression model, where y = b0 + b1x + u, we typically refer to y as the
Dependent Variable, or
Left-Hand Side Variable, or
Explained Variable

Some Terminology (continued)
We typically refer to x as the
Independent Variable, or
Right-Hand Side Variable, or
Explanatory Variable, or
Regressor, or
Control Variable

A Simple Assumption
The average value of u, the error term, in the population is 0. That is,
E(u) = 0

We also assume
E(u|x) = 0
which implies
E(y|x) = b0 + b1x

[Figure: E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x). Horizontal axis: x, with the points x1 and x2 marked; vertical axis: y; the density f(y) is drawn at each marked x around the line E(y|x) = b0 + b1x.]

Ordinary Least Squares (OLS)
Let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population.
For each observation in this sample, it will be the case that
yi = b0 + b1xi + ui

[Figure: Population regression line, sample data points and the associated error terms. Horizontal axis: x, with x1 to x4 marked; vertical axis: y, with y1 to y4 marked; the errors u1 to u4 are the vertical distances from each point to the line E(y|x) = b0 + b1x.]

The basic idea of regression is to estimate the population parameters from a sample.
Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible.
The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (the sample regression function).

[Figure: Sample regression line, sample data points and the associated estimated error terms (residuals). Horizontal axis: x, with x1 to x4 marked; vertical axis: y, with y1 to y4 marked; the residuals û1 to û4 are the vertical distances from each point to the fitted line $\hat{y} = \hat{b}_0 + \hat{b}_1 x$.]
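
The sampling setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name simulate_sample, the uniform distribution for x, the normal distribution for u, and the parameter values are all hypothetical choices, not part of the course material. Drawing u independently of x is one simple way to make E(u|x) = 0 hold.

```python
import random

def simulate_sample(n, b0, b1, sigma, seed=0):
    """Draw a random sample {(x_i, y_i)} from y = b0 + b1*x + u.

    Assumptions made for illustration only: x is uniform on [0, 10]
    and u is normal with mean 0 and standard deviation sigma.
    u is drawn independently of x, so E(u|x) = E(u) = 0.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    us = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Each observation satisfies y_i = b0 + b1*x_i + u_i.
    ys = [b0 + b1 * x + u for x, u in zip(xs, us)]
    return xs, ys, us

if __name__ == "__main__":
    xs, ys, us = simulate_sample(n=10000, b0=1.0, b1=0.5, sigma=1.0)
    # In a large sample the average error is close to, but not exactly, zero.
    print("sample mean of u:", sum(us) / len(us))
```

In real data only the xi and yi are observed; the errors ui are returned here solely because the population parameters are known in a simulation.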

One approach to estimate coefficients
Given the intuitive idea of fitting a line, we can set up a formal minimization problem.
That is, we want to choose our parameters such that we minimize the following:
$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{b}_0 - \hat{b}_1 x_i)^2$

It can be shown that the estimated slope coefficient is
$\hat{b}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Summary of OLS slope estimate
The slope estimate is the sample covariance between x and y divided by the sample variance of x.
If x and y are positively correlated, the slope will be positive.
If x and y are negatively correlated, the slope will be negative.

Algebraic Properties of OLS: in English
The sum of the OLS residuals is zero.
Thus, the sample average of the OLS residuals is zero as well.
The sample covariance between the regressors and the OLS residuals is zero.
The OLS regression line always goes through the mean of the sample.

Algebraic Properties of OLS: in mathematics
$\sum_{i=1}^{n} \hat{u}_i = 0$ and thus $\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$
$\bar{y} = \hat{b}_0 + \hat{b}_1 \bar{x}$

More terminology
We can think of each observation as being made up of an explained part and an unexplained part:
$y_i = \hat{y}_i + \hat{u}_i$
We then define the following:
$\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)
$\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)
$\sum \hat{u}_i^2$ is the residual sum of squares (SSR)
Then SST = SSE + SSR

Notations Alert
The notation SSR (Sum of Squared Residuals) in this handout and my other lecture slides = ESS (Error Sum of Squares) in our textbook.
The notation SSE (Sum of Squared Explained) in this handout and my other lecture slides = RSS (Regressed Sum of Squares) in our textbook.

Goodness-of-Fit
How do we think about how well our sample regression line fits our sample data?
We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression:
$R^2 = SSE/SST = 1 - SSR/SST$

OLS regressions
Now that we've derived the formula for calculating the OLS estimates of our parameters, you'll be happy to know you don't have to compute them by hand.
Regressions in GRETL are very simple. Have you installed the software yet?

Under some conditions, OLS estimated coefficients are unbiased.
Define $S_x^2 = \sum (x_i - \bar{x})^2$. It can be shown that
$\hat{b}_1 = b_1 + \frac{\sum (x_i - \bar{x}) u_i}{S_x^2}$
so that, conditional on the sample values of x,
$E(\hat{b}_1) = b_1 + \frac{1}{S_x^2} \sum (x_i - \bar{x}) E(u_i) = b_1$

Unbiasedness Summary
The OLS estimates of b1 and b0 are unbiased.
Remember that unbiasedness is a description of the estimator: in a given sample we may be "near" or "far" from the true parameter.

Variance of the OLS Estimators
Now we know that the sampling distribution of our estimated coefficient is centered around the true parameter.
We want to think about how spread out this distribution is.
Assume $Var(u|x) = Var(u) = \sigma^2$

Variance of OLS estimators
$\sigma^2$ is called the error variance.
$\sigma$, the square root of the error variance, is called the standard deviation of the error.
$E(y|x) = b_0 + b_1 x$ and $Var(y|x) = \sigma^2$

Variance of OLS estimator
$Var(\hat{b}_1) = \frac{\sigma^2}{S_x^2}$

Variance of OLS Summary
The larger the error variance, $\sigma^2$, the larger the variance of the slope estimate.
The larger the variability in the xi, the smaller the variance of the slope estimate.
The problem is that the error variance is unknown.

Estimating the Error Variance
We don't know what the error variance, $\sigma^2$, is, because we don't observe the errors, ui.
What we observe are the residuals, ûi.
We can use the residuals to form an estimate of the error variance.

Estimating the Error Variance
$\hat{u}_i = y_i - \hat{b}_0 - \hat{b}_1 x_i$
An unbiased estimator of $\sigma^2$ is
$\hat{\sigma}^2 = \frac{1}{n-2} \sum \hat{u}_i^2 = SSR/(n-2)$

Estimating Standard Error of coefficients
$se(\hat{b}_1) = \hat{\sigma} / \left( \sum (x_i - \bar{x})^2 \right)^{1/2}$
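
The formulas in this handout can be checked by hand on a tiny data set. The sketch below is an illustration only, not the course's GRETL workflow: the function name ols_simple and the five data points are hypothetical, and the code uses nothing beyond the Python standard library. It computes the slope, the intercept (from the property that the line passes through the sample means), the three sums of squares in this handout's notation, the R-squared, the error variance estimate, and the standard error of the slope.

```python
import math

def ols_simple(xs, ys):
    """Simple-regression OLS quantities, using this handout's notation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_x^2 = sum of squared deviations of x from its sample mean
    s_x2 = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1_hat = s_xy / s_x2
    # The fitted line passes through (x_bar, y_bar).
    b0_hat = y_bar - b1_hat * x_bar
    fitted = [b0_hat + b1_hat * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    sst = sum((y - y_bar) ** 2 for y in ys)    # total sum of squares
    sse = sum((f - y_bar) ** 2 for f in fitted)  # explained sum of squares
    ssr = sum(r ** 2 for r in resid)            # residual sum of squares
    sigma2_hat = ssr / (n - 2)                  # estimate of the error variance
    se_b1 = math.sqrt(sigma2_hat) / math.sqrt(s_x2)
    return {
        "b0_hat": b0_hat, "b1_hat": b1_hat, "resid": resid,
        "SST": sst, "SSE": sse, "SSR": ssr,
        "R2": sse / sst, "sigma2_hat": sigma2_hat, "se_b1": se_b1,
    }

if __name__ == "__main__":
    # Hypothetical data chosen only to keep the arithmetic easy to follow.
    result = ols_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    for name, value in result.items():
        print(name, value)
```

With these five points the deviations are small integers, so each quantity can be verified with pencil and paper and then compared with the output of a regression in GRETL.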