The Linear Regression Model: A Quick Review (Reference: Appendix to Ch. 1)

Consider the following forecast setting: we want to forecast the value of a variable y, given the value of a variable x. Denote that forecast yf│x. Think of y and x as random variables jointly drawn from some underlying population. It seems reasonable to construct the forecast of y based on x as the expected value of y conditional on x, i.e.,

    yf│x = E(y│x),

the average population value of y given that value of x. It turns out that in many reasonable forecasting settings this forecast has optimal properties (e.g., minimizing expected loss), and approximating this forecast guides our choice of forecast method.

Note that the forecast error will be y – E(y│x) and, by the law of iterated expectations,

    expected forecast error = E[y – E(y│x)] = E(y) – E(E(y│x)) = E(y) – E(y) = 0.

In other words, this forecast of y is an unbiased forecast. Note, too, that another name for E(y│x) is the population regression of y (on x).

In order to proceed in this direction, we need to make some additional assumptions about the underlying population and, in particular, about the form of E(y│x). The simplest assumption to make (recall the KISS principle) is that the conditional expectation is a linear function of x, i.e.,

    E(y│x) = β0 + β1x

If β0 and β1 are known, then the forecast problem is completed by setting

    yf│x = β0 + β1x

However, in practice, even if the conditional expectation is linear in x, the parameters β0 and β1 will be unknown. The next best thing is to estimate the values of β0 and β1 and use the estimated β's in place of their actual values to form the forecasts. This substitution will not provide as accurate a forecast, since we are introducing a new source of forecast error due to "estimation error" or "sampling error." However, under certain conditions the resulting forecast will still be unbiased and retain certain optimality properties.

Suppose we have access to a sample of T pairs of (x,y) drawn from the population from which the relevant value of y will be drawn: (x1,y1),(x2,y2),…,(xT,yT). We can represent these data graphically via a scatterplot, such as the scatterplot in Figure A.1 of the textbook. In this case, a natural estimator of β0 and β1 is the ordinary least squares (OLS) estimator, which is obtained by minimizing the sum of squared residuals

    Σ (yi – β0 – β1xi)², with the sum taken over i = 1,…,T,

with respect to β0 and β1. The solutions are the OLS estimates β̂0 and β̂1. Then, for a given value of x, we can forecast y according to

    yf = β̂0 + β̂1x

This estimation procedure, also called the sample regression of y on x, will provide us with a "good" estimate of the conditional expectation of y given x (i.e., the population regression of y on x) and, therefore, a "good" forecast of y given x, provided that certain additional assumptions apply to the relationship between y and x.

Let ε denote the difference between y and E(y│x). That is,

    ε = y – E(y│x), i.e., y = E(y│x) + ε,

and y = β0 + β1x + ε if E(y│x) = β0 + β1x. The assumptions that we need pertain to these ε's (the "other factors" that determine y) and their relationship to the x's. For instance, so long as E(εt│x1,…,xT) = 0 for t = 1,…,T, the OLS estimator of β0 and β1 based on the data (x1,y1),…,(xT,yT) will be unbiased and, as a result, the forecast constructed by replacing these "population parameters" with the OLS estimates will be unbiased. A standard set of assumptions that provides us with a lot of value: given x1,…,xT, the errors ε1,…,εT are i.i.d. N(0,σ²) random variables.
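To make the mechanics concrete, here is a minimal sketch in Python of the simple-regression case, using numpy and simulated data; the parameter values, sample size, and variable names below are illustrative assumptions, not taken from the textbook. It computes β̂0 and β̂1 from the usual closed-form OLS solution and forms the point forecast yf = β̂0 + β̂1x.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated sample of T pairs (x_i, y_i); the population regression
    # is E(y|x) = beta0 + beta1*x, with illustrative "true" values.
    T = 200
    beta0, beta1 = 1.0, 0.5            # assumed for the simulation only
    x = rng.normal(size=T)
    eps = rng.normal(size=T)           # i.i.d. N(0, sigma^2) errors, sigma = 1
    y = beta0 + beta1 * x + eps

    # OLS minimizes sum_i (y_i - b0 - b1*x_i)^2; the closed-form solution is
    # b1_hat = sample cov(x, y) / sample var(x), b0_hat = ybar - b1_hat*xbar.
    b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0_hat = y.mean() - b1_hat * x.mean()

    # Point forecast of y at a given value of x: yf = b0_hat + b1_hat * x.
    x_new = 0.8
    y_forecast = b0_hat + b1_hat * x_new
    print(b0_hat, b1_hat, y_forecast)

With T = 200, the estimates should land close to the assumed values 1.0 and 0.5, illustrating how the sample regression recovers the population regression.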
These ideas and procedures extend naturally to the setting where we want to forecast the value of y based on the values of k other variables, say, x1,…,xk. We begin by considering the conditional expectation, or population regression, of y on x1,…,xk to make our forecast. That is,

    yf│x1,…,xk = E(y│x1,…,xk)

To operationalize this forecast, we first assume that the conditional expectation is linear, i.e.,

    E(y│x1,…,xk) = β0 + β1x1 + … + βkxk

Since the β's are generally unknown, we consider replacing this linear population regression with the sample regression

    β̂0 + β̂1x1 + … + β̂kxk

where the β-hats are the OLS estimates obtained from the data set (y1,x11,…,xk1), (y2,x12,…,xk2), …, (yT,x1T,…,xkT) by minimizing the sum of squared residuals

    Σ (yt – β0 – β1x1t – … – βkxkt)², with the sum taken over t = 1,…,T.

As in the case of the simple regression model, this procedure for estimating the population regression function will have good properties provided that the regression errors

    εt = yt – E(yt│x1t,…,xkt), t = 1,…,T

have appropriate properties.

Density Forecasts and Interval Forecasts – The procedures we described above produce point forecasts of y. They can also be used to produce density and interval forecasts of y, provided that the x's and the regression errors, i.e., the ε's, meet certain conditions.
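For the multiple-regression case and an interval forecast, here is a second sketch under the same caveats: numpy, simulated data, and illustrative parameter values, with k = 2 regressors assumed. It forms an approximate 95% interval as yf ± 1.96σ̂, which leans on the i.i.d. N(0,σ²) error assumption and, for simplicity, ignores the estimation error in the β̂'s (so it somewhat understates the true forecast uncertainty).

    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated data with k = 2 regressors; coefficients are illustrative.
    T, k = 200, 2
    beta = np.array([1.0, 0.5, -0.3])    # (beta0, beta1, beta2), assumed
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k))])
    y = X @ beta + rng.normal(size=T)    # i.i.d. N(0,1) regression errors

    # OLS: minimize sum_t (y_t - b0 - b1*x1t - ... - bk*xkt)^2.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Estimate sigma from the residuals (degrees-of-freedom corrected).
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (T - k - 1))

    # Point forecast at a new regressor vector (leading 1 is the intercept),
    # plus an approximate 95% interval yf +/- 1.96*sigma_hat that treats
    # beta_hat as if it were the true beta.
    x_new = np.array([1.0, 0.2, -0.1])
    y_f = x_new @ beta_hat
    lo, hi = y_f - 1.96 * sigma_hat, y_f + 1.96 * sigma_hat
    print(y_f, (lo, hi))

Under the normality assumption, the same σ̂ also delivers a density forecast: y given x1,…,xk is approximately N(yf, σ̂²), again up to the estimation error in the β̂'s.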