The Linear Regression Model:
A Quick Review
(Reference: Appendix to Ch. 1)
Consider the following forecast setting –
We want to forecast the value of a variable
y, given the value of a variable x.
Denote that forecast yf│x.
Think of y and x as random variables
jointly drawn from some underlying
population.
It seems reasonable to consider constructing
the forecast of y based on x as the expected
value of y conditional on x, i.e.,
yf│x = E(y │x ),
the average population value of y given that
value of x.
It turns out that in many reasonable
forecasting settings this forecast has optimal
properties (e.g., it minimizes expected loss
under squared-error loss), and approximating
this forecast guides our choice of forecast
method.
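To make the optimality claim concrete, here is a minimal simulation sketch (purely illustrative; the population y = 2 + 3x + noise and all names are assumptions, not from the textbook). Under squared-error loss, the conditional-mean forecast beats a deliberately off-center alternative:

```python
# Minimal sketch: E(y|x) minimizes expected squared-error loss.
# The population here is y = 2 + 3x + N(0,1) noise, so E(y|x) = 2 + 3x.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # draws from the joint population

cond_mean = 2.0 + 3.0 * x                # E(y|x), known here by construction
off_center = cond_mean + 0.5             # a deliberately biased forecast

print(np.mean((y - cond_mean) ** 2))     # ~1.0 (just the noise variance)
print(np.mean((y - off_center) ** 2))    # ~1.25: bias adds 0.5^2 to the MSE
```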
Note that the forecast error will be
y – E(y │x )
and the expected forecast error is
E[y – E(y │x )] = E(y) – E[E(y │x )]
= E(y) – E(y) = 0,
where the last step uses the law of iterated
expectations, E[E(y │x )] = E(y).
In other words, this forecast of y is an
unbiased forecast.
Note, too, that another name for E(y │x ) is
the population regression of y (on x).
In order to proceed in this direction, we need
to make some additional assumptions about
the underlying population and, in particular,
the form of E(y │x ).
The simplest assumption to make (recall the
KISS principle) is to assume that the
conditional expectation is a linear function
of x, i.e., assume
E(y │x ) = β0 + β1x
If β0 and β1 are known, then the forecast
problem is completed by setting
yf│x = β0 + β1x
However, in practice, even if the conditional
expectation is linear in x, the parameters β0
and β1 will be unknown.
It seems that the next best thing for us to do
would be to estimate the values of β0 and β1
and use the estimated β’s in place of their
actual values to form the forecasts.
This substitution will not provide as accurate
a forecast, since we’re introducing a new
source of forecast error due to “estimation
error” or “sampling error.” However, under
certain conditions the resulting forecast will
still be unbiased and retain certain
optimality properties.
Suppose we have access to a sample of T
pairs of (x,y) drawn from the population
from which the relevant value of y will be
drawn: (x1,y1),(x2,y2),…,(xT,yT).
We can represent these data graphically via
a scatterplot, such as the scatterplot from
Figure A.1 in the textbook.
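For readers without the textbook at hand, a sketch along these lines produces a comparable scatterplot from simulated (x,y) pairs (Figure A.1's actual data are not reproduced here):

```python
# Sketch of a scatterplot like Figure A.1, using simulated (x, y) pairs.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
T = 50
x = rng.uniform(0, 10, size=T)
y = 2.0 + 3.0 * x + rng.normal(scale=2.0, size=T)

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Sample of T = 50 (x, y) pairs")
plt.show()
```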
In this case, a natural estimator of β0 and β1
is the ordinary least squares (OLS)
estimator, which is obtained by minimizing
the sum of squared residuals
Σᵢ₌₁ᵀ (yi – β0 – β1xi)²
with respect to β0 and β1. The solutions are
the OLS estimates β̂0 and β̂1.
Then, for a given value of x, we can forecast
y according to
yf│x = β̂0 + β̂1x
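As an illustration (simulated data; the true values β0 = 2 and β1 = 3 are assumptions for the example), the sketch below computes the OLS estimates "by hand" via the closed-form solutions β̂1 = cov(x,y)/var(x) and β̂0 = ȳ – β̂1x̄, and then forms the forecast:

```python
# Simple OLS "by hand" and the resulting point forecast.
import numpy as np

rng = np.random.default_rng(2)
T = 200
x = rng.normal(size=T)
y = 2.0 + 3.0 * x + rng.normal(size=T)       # simulated sample of T pairs

beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

x_new = 1.5                                  # the x at which we forecast y
y_forecast = beta0_hat + beta1_hat * x_new
print(beta0_hat, beta1_hat, y_forecast)      # roughly 2, 3, and 6.5
```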
This estimation procedure, also called the
sample regression of y on x, will provide
us with a “good” estimate of the conditional
expectation of y given x (i.e., the population
regression of y on x) and, therefore, a
“good” forecast of y given x, provided that
certain additional assumptions apply to the
relationship between y and x.
Let ε denote the difference between y and
E(y │x ). That is,
ε = y - E(y │x )
i.e.,
y = E(y │x ) + ε
and
y = β0 + β1x + ε, if E(y │x ) = β0 + β1x.
The assumptions that we need pertain to
these ε’s (the “other factors” that determine
y) and their relationship to the x’s.
For instance, so long as E(εt │x1,…,xT) = 0
for t = 1,…,T, the OLS estimator of β0 and
β1 based on the data (x1,y1),…,(xT,yT) will
be unbiased and, as a result, the forecast
constructed by replacing these “population
parameters” with the OLS estimates will be
unbiased.
A standard set of assumptions that provides
us with a lot of value –
Given x1,…,xT , ε1,…,εT are i.i.d. N(0,σ²)
random variables.
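A quick simulation sketch of the unbiasedness claim (the true values β0 = 2 and β1 = 3 are arbitrary choices for the example): holding the x's fixed and drawing fresh i.i.d. normal errors each replication, the OLS estimates average out to the true parameters.

```python
# Unbiasedness check: average the OLS estimates across repeated samples.
import numpy as np

rng = np.random.default_rng(3)
T, reps = 50, 5_000
x = rng.normal(size=T)                       # x's held fixed across samples

intercepts, slopes = [], []
for _ in range(reps):
    eps = rng.normal(size=T)                 # i.i.d. N(0, 1) errors
    y = 2.0 + 3.0 * x + eps
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercepts.append(y.mean() - b1 * x.mean())
    slopes.append(b1)

print(np.mean(intercepts), np.mean(slopes))  # very close to 2 and 3
```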
These ideas and procedures extend naturally
to the setting where we want to forecast the
value of y based on the values of k other
variables, say, x1,…,xk.
We begin by considering the conditional
expectation or population regression of y on
x1,…,xk to make our forecast. That is,
yf│x1,…,xk = E(y│x1,…,xk)
To operationalize this forecast, we first
assume that the conditional expectation is
linear, i.e.,
E(y│x1,…,xk) = β0 + β1x1 + … + βkxk
Since the β’s are generally unknown, we
consider replacing this linear population
regression with the sample regression –
β̂0 + β̂1x1 + … + β̂kxk
where the β-hats are the OLS estimates
obtained from the data set
(y1,x11,…,xk1)
(y2,x12,…,xk2)
…
(yT,x1T,…,xkT)
by minimizing the sum of squared residuals
Σₜ₌₁ᵀ (yt – β0 – β1x1t – … – βkxkt)².
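A sketch of this multiple-regression case with k = 2 regressors (simulated data; the true coefficients are assumptions for the example), estimated by OLS via NumPy's least-squares solver:

```python
# Multiple regression: OLS estimates via a least-squares solver.
import numpy as np

rng = np.random.default_rng(4)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x1, x2])         # design matrix w/ intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates

x_new = np.array([1.0, 0.8, -1.2])                # [1, x1, x2] to forecast at
print(beta_hat)                                   # close to [1, 2, -0.5]
print(x_new @ beta_hat)                           # point forecast of y
```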
As in the case of the simple regression
model, this procedure to estimate the
population regression function will have
good properties provided that the regression
errors
εt = yt – E(yt│x1t,…,xkt), t = 1,…,T
have appropriate properties.
Density Forecasts and Interval Forecasts –
The procedures we described above produce
point forecasts of y. They can also be used
to produce density and interval forecasts of
y, provided that the x’s and the regression
errors, i.e., the ε’s, meet certain conditions.
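For instance, under the i.i.d. normal-error assumptions above, an approximate 95% interval forecast is the point forecast plus or minus 1.96 times the estimated error standard deviation. The sketch below (simulated data; it deliberately ignores parameter-estimation uncertainty for simplicity) illustrates:

```python
# Approximate 95% interval forecast under normal errors.
import numpy as np

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
y = 2.0 + 3.0 * x + rng.normal(size=T)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma_hat = np.sqrt(resid @ resid / (T - 2))     # residual standard error

x_new = 1.5
point = b0 + b1 * x_new                          # point forecast
lo, hi = point - 1.96 * sigma_hat, point + 1.96 * sigma_hat
print(point, (lo, hi))                           # ~95% interval forecast
```

The density forecast follows similarly: under the same assumptions, the forecast density is approximately normal, centered at the point forecast with standard deviation σ̂.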