MAR 5621
Advanced Statistical Techniques
Summer 2003
Dr. Larry Winner
Chapter 11 – Simple linear regression
Types of Regression Models (Sec. 11-1)
Linear Regression: Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

• Y_i - Outcome of the Dependent Variable (response) for the ith experimental/sampling unit
• X_i - Level of the Independent (predictor) variable for the ith experimental/sampling unit
• \beta_0 + \beta_1 X_i - Linear (systematic) relation between Y_i and X_i (aka conditional mean)
• \beta_0 - Mean of Y when X=0 (Y-intercept)
• \beta_1 - Change in the mean of Y when X increases by 1 (slope)
• \varepsilon_i - Random error term
Note that 0 and 1 are unknown parameters. We estimate them by the least squares
method.
Polynomial (Nonlinear) Regression: Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i
This model allows for a curvilinear (as opposed to straight line) relation. Both linear and
polynomial regression are susceptible to problems when predictions of Y are made
outside the range of the X values used to fit the model. This is referred to as
extrapolation.
Least Squares Estimation (Sec. 11-2)
1. Obtain a sample of n pairs (X1,Y1)…(Xn,Yn).
2. Plot the Y values on the vertical (up/down) axis versus their corresponding X values
on the horizontal (left/right) axis.
3. Choose the line \hat{Y}_i = b_0 + b_1 X_i that minimizes the sum of squared vertical distances from the observed values (Y_i) to their fitted values (\hat{Y}_i). Note: SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
4. b_0 is the Y-intercept for the estimated regression equation
5. b_1 is the slope of the estimated regression equation
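As a concrete illustration (not from the text), a minimal Python/NumPy sketch of the least squares computations on made-up data:

    import numpy as np

    # Hypothetical sample of n pairs (X_i, Y_i); any numeric data would do.
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Least squares slope and intercept from the usual closed-form solution.
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    Y_hat = b0 + b1 * X                 # fitted values
    SSE = np.sum((Y - Y_hat) ** 2)      # minimized sum of squared vertical distances
    print(b0, b1, SSE)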
Measures of Variation (Sec. 11-3)
Sums of Squares
• Total sum of squares = Regression sum of squares + Error sum of squares
• Total variation = Explained variation + Unexplained variation

• Total sum of squares (Total Variation): SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,   df_T = n - 1
• Regression sum of squares (Explained Variation): SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2,   df_R = 1
• Error sum of squares (Unexplained Variation): SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,   df_E = n - 2
Coefficients of Determination and Correlation
Coefficient of Determination


• Proportion of variation in Y “explained” by the regression on X
• r^2 = explained variation / total variation = SSR / SST,   0 \le r^2 \le 1
Coefficient of Correlation

• Measure of the direction and strength of the linear association between Y and X
• r = sign(b_1) \sqrt{r^2},   -1 \le r \le 1
Standard Error of the Estimate (Residual Standard Deviation)
• Estimated standard deviation of the data: S_{YX} = \sqrt{ SSE / (n-2) } = \sqrt{ \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 / (n-2) }
  (Note: V(Y_i) = V(\varepsilon_i) = \sigma^2)
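A small Python sketch (hypothetical data, continuing the style of the earlier example) showing how the sums of squares, r^2, r, and S_YX are computed:

    import numpy as np

    # Hypothetical data and least squares fit, as in the earlier sketch.
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Y_hat = b0 + b1 * X

    n = len(Y)
    SST = np.sum((Y - Y.mean()) ** 2)      # total variation, df = n - 1
    SSR = np.sum((Y_hat - Y.mean()) ** 2)  # explained variation, df = 1
    SSE = np.sum((Y - Y_hat) ** 2)         # unexplained variation, df = n - 2

    r2 = SSR / SST                          # coefficient of determination
    r = np.sign(b1) * np.sqrt(r2)           # coefficient of correlation
    S_YX = np.sqrt(SSE / (n - 2))           # standard error of the estimate
    print(SST, SSR, SSE, r2, r, S_YX)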
Model Assumptions (Sec. 11-4)
• Normally distributed errors
• Homoscedasticity (constant error variance for Y at all levels of X)
• Independent errors (usually checked when data are collected over time or space)
Residual Analysis (Sec. 11-5)
Residuals: e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)
Plots (see prototype plots in book and in class):

• Plot of e_i vs \hat{Y}_i: can be used to check for linear relation, constant variance
  - If relation is nonlinear, a U-shaped pattern appears
  - If error variance is non-constant, a funnel-shaped pattern appears
  - If assumptions are met, a random cloud of points appears

• Plot of e_i vs X_i: can be used to check for linear relation, constant variance
  - If relation is nonlinear, a U-shaped pattern appears
  - If error variance is non-constant, a funnel-shaped pattern appears
  - If assumptions are met, a random cloud of points appears

• Plot of e_i vs i (time order): can be used to check for independence when data are collected over time (see next section)

• Histogram of e_i
  - If the error distribution is normal, the histogram of residuals will be mound-shaped, centered around 0
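A minimal plotting sketch of two of these diagnostics, assuming matplotlib is available; the fitted values and residuals here are simulated, not from any example in the text:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical fitted values and residuals; in practice use e = Y - Y_hat.
    rng = np.random.default_rng(0)
    Y_hat = np.linspace(2, 10, 50)
    e = rng.normal(0, 1, 50)

    fig, axes = plt.subplots(1, 2, figsize=(9, 3))
    axes[0].scatter(Y_hat, e)        # residuals vs fitted: look for U-shape or funnel
    axes[0].axhline(0, color="gray")
    axes[0].set_xlabel("fitted value")
    axes[0].set_ylabel("residual")
    axes[1].hist(e, bins=10)         # histogram: should be mound-shaped around 0
    axes[1].set_xlabel("residual")
    plt.tight_layout()
    plt.show()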
Measuring Autocorrelation – Durbin-Watson Test (Sec. 11-6)
Plot residuals versus time order
• If errors are independent there will be no pattern (random cloud centered at 0)
• If not independent (dependent), expect errors to be close together over time (a distinct curved form, centered at 0)
Durbin-Watson Test
• H0: Errors are independent (no autocorrelation among residuals)
• HA: Errors are dependent (positive autocorrelation among residuals)

Test Statistic: D = \sum_{i=2}^{n} (e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2

Decision Rule (values of d_L(k,n) and d_U(k,n) are given in Table E.10, where k is the number of independent variables; k=1 for simple regression):
• If D > d_U(k,n), conclude H0 (independent errors)
• If D < d_L(k,n), conclude HA (dependent errors)
• If d_L(k,n) \le D \le d_U(k,n), withhold judgment (possibly need a longer series)
NOTE: This is a test in which you typically “want” to conclude in favor of the null hypothesis. If you reject the null hypothesis in this test, a more complex model needs to be fit (discussed in Chapter 13).
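A short sketch of the Durbin-Watson statistic in Python; the residuals are simulated, and the critical values d_L and d_U would still come from Table E.10:

    import numpy as np

    def durbin_watson(e):
        """D = sum_{i=2}^n (e_i - e_{i-1})^2 / sum_{i=1}^n e_i^2."""
        e = np.asarray(e, dtype=float)
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    # Hypothetical time-ordered residuals; independent errors give D near 2,
    # positive autocorrelation pushes D toward 0.
    rng = np.random.default_rng(1)
    e_indep = rng.normal(size=30)
    print(durbin_watson(e_indep))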
Inferences Concerning the Slope (Sec. 11-7)
t-test
Test used to determine whether the population slope parameter (\beta_1) is equal to a pre-determined value (often, but not necessarily, 0). Tests can be one-sided (pre-determined direction) or two-sided (either direction).
2-sided t-test:
H_0: \beta_1 = \beta_{10}    H_A: \beta_1 \ne \beta_{10}
TS: t_{obs} = (b_1 - \beta_{10}) / S_{b_1},   where   S_{b_1} = S_{YX} / \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 }
RR: |t_{obs}| \ge t_{\alpha/2, n-2}

1-sided t-test (upper-tail; reverse signs for lower tail):
H_0: \beta_1 \le \beta_{10}    H_A: \beta_1 > \beta_{10}
TS: t_{obs} = (b_1 - \beta_{10}) / S_{b_1},   where   S_{b_1} = S_{YX} / \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 }
RR: t_{obs} \ge t_{\alpha, n-2}
F-test (based on k independent variables)
A test based directly on sums of squares that tests the specific hypothesis of whether the slope parameter is 0 (2-sided). The book describes the general case of k predictor variables; for simple linear regression, k=1.

H_0: \beta_1 = 0    H_A: \beta_1 \ne 0
TS: F_{obs} = MSR / MSE = (SSR / k) / ( SSE / (n - k - 1) )
RR: F_{obs} \ge F_{\alpha, k, n-k-1}
Analysis of Variance (based on k Predictor Variables)
Source       df       Sum of Squares   Mean Square          F
Regression   k        SSR              MSR = SSR/k          F_obs = MSR/MSE
Error        n-k-1    SSE              MSE = SSE/(n-k-1)
Total        n-1      SST
(1-\alpha)100% Confidence Interval for the slope parameter, \beta_1:
b_1 \pm t_{\alpha/2, n-2} S_{b_1}

• If the entire interval is positive, conclude \beta_1 > 0 (positive association)
• If the interval contains 0, conclude (do not reject) \beta_1 = 0 (no association)
• If the entire interval is negative, conclude \beta_1 < 0 (negative association)
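A Python sketch of the two-sided t-test and confidence interval for the slope, using SciPy's t distribution and made-up data:

    import numpy as np
    from scipy import stats

    # Hypothetical data; test H0: beta_1 = 0 and build a 95% CI for the slope.
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.3, 2.9, 4.2, 4.8, 6.1, 6.7])
    n = len(Y)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    SSE = np.sum((Y - (b0 + b1 * X)) ** 2)
    S_YX = np.sqrt(SSE / (n - 2))
    S_b1 = S_YX / np.sqrt(np.sum((X - X.mean()) ** 2))   # standard error of b1

    t_obs = (b1 - 0) / S_b1
    p_value = 2 * stats.t.sf(abs(t_obs), df=n - 2)       # two-sided P-value
    t_crit = stats.t.ppf(0.975, df=n - 2)
    ci = (b1 - t_crit * S_b1, b1 + t_crit * S_b1)        # 95% CI for the slope
    print(t_obs, p_value, ci)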
Estimation of Mean and Prediction of Individual Outcomes (Sec. 11-8)
Estimating the Population Mean Response when X=Xi
Parameter: E[Y | X = X_i] = \beta_0 + \beta_1 X_i
Point Estimate: \hat{Y}_i = b_0 + b_1 X_i
(1-\alpha)100% Confidence Interval for E[Y | X = X_i]:
\hat{Y}_i \pm t_{\alpha/2, n-2} S_{YX} \sqrt{h_i},   where   h_i = 1/n + (X_i - \bar{X})^2 / \sum_{i=1}^{n} (X_i - \bar{X})^2
Predicting a Future Outcome when X=Xi
Future Outcome: Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
Point Prediction: \hat{Y}_i = b_0 + b_1 X_i
(1-\alpha)100% Prediction Interval for Y_i:
\hat{Y}_i \pm t_{\alpha/2, n-2} S_{YX} \sqrt{1 + h_i},   where   h_i = 1/n + (X_i - \bar{X})^2 / \sum_{i=1}^{n} (X_i - \bar{X})^2
Problem Areas in Regression Analysis (Sec. 11-9)
• Lack of awareness of model assumptions
• Not knowing methods of checking model assumptions
• Not using alternative methods when model assumptions are not met
• Not understanding the theoretical background of the problem
The most important means of identifying problems with model assumptions is based on
plots of data and residuals. Often transformations of the dependent and/or independent
variables can solve the problems. Other, more complex methods may need to be used.
See Section 11-5 for plots used to check assumptions and Exhibit 11.3 on pages 555-556
of text.
Computational Formulas for Simple Linear Regression (Sec. 11-10)
Normal Equations (Based on Minimizing SSE by Calculus)
\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i

\sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2
Computational Formula for the Slope, b1
b_1 = SSXY / SSXX

SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - ( \sum_{i=1}^{n} X_i )( \sum_{i=1}^{n} Y_i ) / n

SSXX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - ( \sum_{i=1}^{n} X_i )^2 / n
Computational Formula for the Y-intercept, b0
b_0 = \bar{Y} - b_1 \bar{X}
Computational Formula for the Total Sum of Squares, SST
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - ( \sum_{i=1}^{n} Y_i )^2 / n
Computational Formula for the Regression Sum of Squares, SSR
SSR = \sum_{i=1}^{n} ( \hat{Y}_i - \bar{Y} )^2 = b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - ( \sum_{i=1}^{n} Y_i )^2 / n = (SSXY)^2 / SSXX
Computational Formula for Error Sum of Squares, SSE
SSE = \sum_{i=1}^{n} ( Y_i - \hat{Y}_i )^2 = SST - SSR = \sum_{i=1}^{n} Y_i^2 - b_0 \sum_{i=1}^{n} Y_i - b_1 \sum_{i=1}^{n} X_i Y_i
Computational Formula for the Standard Error of the Slope, S_{b_1}
S_{b_1} = S_{YX} / \sqrt{SSXX} = S_{YX} / \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 }
Chapter 12 – Multiple Linear Regression
Multiple Regression Model with k Independent Variables (Sec. 12-1)
General Case: We have k variables that we control, or know in advance of outcome, that
are used to predict Y, the response (dependent variable). The k independent variables are
labeled X1, X2,...,Xk. The levels of these variables for the ith case are labeled X1i ,..., Xki .
Note that simple linear regression is a special case where k=1, thus the methods used are
just basic extensions of what we have previously done.
Y_i = \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki} + \varepsilon_i
where \beta_j is the change in the mean of Y when variable X_j increases by 1 unit, while holding the k-1 remaining independent variables constant (partial regression coefficient). This is also referred to as the slope of Y with variable X_j, holding the other predictors constant.
Least Squares Fitted (Prediction) Equation (obtained by minimizing SSE):
\hat{Y}_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki}
Coefficient of Multiple Determination
Proportion of variation in Y “explained” by the regression on the k independent variables.
R^2 = r^2_{Y \cdot 1,...,k} = \sum_{i=1}^{n} ( \hat{Y}_i - \bar{Y} )^2 / \sum_{i=1}^{n} ( Y_i - \bar{Y} )^2 = SSR / SST = 1 - SSE / SST
Adjusted-R2
Used to compare models with different sets of independent variables in terms of
predictive capabilities. Penalizes models with unnecessary or redundant predictors.

Adj-R^2 = r^2_{adj} = 1 - [ SSE / (n-k-1) ] / [ SST / (n-1) ] = 1 - ( 1 - r^2_{Y \cdot 1,...,k} ) (n-1) / (n-k-1)
Residual Analysis for Multiple Regression (Sec. 12-2)
Very similar to case for Simple Regression. Only difference is to plot residuals versus
each independent variable. Similar interpretations of plots. A review of plots and
interpretations:
Residuals: e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_{1i} + ... + b_k X_{ki})
Plots (see prototype plots in book and in class):

• Plot of e_i vs \hat{Y}_i: can be used to check for linear relation, constant variance
  - If relation is nonlinear, a U-shaped pattern appears
  - If error variance is non-constant, a funnel-shaped pattern appears
  - If assumptions are met, a random cloud of points appears

• Plot of e_i vs X_{ji} for each j: can be used to check for a linear relation with respect to X_j
  - If relation is nonlinear, a U-shaped pattern appears
  - If assumptions are met, a random cloud of points appears

• Plot of e_i vs i (time order): can be used to check for independence when data are collected over time
  - If errors are dependent, a smooth pattern will appear
  - If errors are independent, a random cloud of points appears

• Histogram of e_i
  - If the error distribution is normal, the histogram of residuals will be mound-shaped, centered around 0
F-test for the Overall Model (Sec. 12-3)
Used to test whether any of the independent variables are linearly associated with Y
Analysis of Variance
Total Sum of Squares: SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,   df_T = n - 1
Regression Sum of Squares: SSR = \sum_{i=1}^{n} ( \hat{Y}_i - \bar{Y} )^2,   df_R = k
Error Sum of Squares: SSE = \sum_{i=1}^{n} ( Y_i - \hat{Y}_i )^2,   df_E = n - k - 1

Source       df       SS     MS                   F
Regression   k        SSR    MSR = SSR/k          F_obs = MSR/MSE
Error        n-k-1    SSE    MSE = SSE/(n-k-1)
Total        n-1      SST
F-test for Overall Model
H0: \beta_1 = \beta_2 = ... = \beta_k = 0   (Y is not linearly associated with any of the independent variables)
HA: Not all \beta_j = 0   (At least one of the independent variables is associated with Y)
TS: F_{obs} = MSR / MSE
RR: F_{obs} \ge F_{\alpha, k, n-k-1}
P-value: Area in the F-distribution to the right of Fobs
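For the multiple regression computations, a library such as statsmodels can carry out the fit; a minimal sketch with simulated data (the variable names are illustrative only). It reports the overall F-test as well as the R^2 and adjusted R^2 from Sec. 12-1 and the coefficient t-tests from Sec. 12-4:

    import numpy as np
    import statsmodels.api as sm

    # Simulated data with k = 2 independent variables.
    rng = np.random.default_rng(3)
    n = 40
    X = rng.normal(size=(n, 2))
    Y = 0.5 + 1.2 * X[:, 0] + rng.normal(size=n)

    model = sm.OLS(Y, sm.add_constant(X)).fit()
    print(model.fvalue, model.f_pvalue)          # F_obs = MSR/MSE and its P-value
    print(model.rsquared, model.rsquared_adj)    # R^2 and adjusted R^2
    print(model.tvalues, model.pvalues)          # t-tests for individual coefficients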
Inferences Concerning Individual Regression Coefficients (Sec. 12-4)
Used to test or estimate the slope of Y with respect to Xj, after controlling for all other
predictor variables.
t-test for \beta_j
H0: \beta_j = \beta_{j0}   (often, but not necessarily, 0)
HA: \beta_j \ne \beta_{j0}
TS: t_{obs} = ( b_j - \beta_{j0} ) / S_{b_j}
RR: |t_{obs}| \ge t_{\alpha/2, n-k-1}
P-value: Twice the area in the t-distribution to the right of | tobs |
(1-\alpha)100% Confidence Interval for \beta_j:
b_j \pm t_{\alpha/2, n-k-1} S_{b_j}

• If the entire interval is positive, conclude \beta_j > 0 (positive association)
• If the interval contains 0, conclude (do not reject) \beta_j = 0 (no association)
• If the entire interval is negative, conclude \beta_j < 0 (negative association)
Testing Portions of the Model (Sec. 12-5, But Different Notation)
Consider 2 blocks of independent variables:
• Block A containing k-q independent variables
• Block B containing q independent variables
We wish to test whether the set of independent variables in block B can be dropped from a model containing both blocks. That is, we test whether the variables in block B add no predictive value to the variables in block A.
Notation:
• SSR(A&B) is the regression sum of squares for the model containing both blocks
• SSR(A) is the regression sum of squares for the model containing block A
• MSE(A&B) is the error mean square for the model containing both blocks
• k is the number of predictors in the model containing both blocks
• q is the number of predictors in block B

H0: The \beta's in block B are all 0, given that the variables in block A have been included in the model
HA: Not all \beta's in block B are 0, given that the variables in block A have been included in the model
TS: F_{obs} = [ SSR(A&B) - SSR(A) ] / q / MSE(A&B) = [ SSR(B|A) / q ] / MSE(A&B) = MSR(B|A) / MSE(A&B)
RR: F_{obs} \ge F_{\alpha, q, n-k-1}
P-value: Area in the F-distribution to the right of F_{obs}
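A NumPy sketch of the partial F computation on simulated data, where block A = {X1} and block B = {X2, X3} (all names and values hypothetical):

    import numpy as np

    def ssr(design, Y):
        """Regression sum of squares for a least squares fit (design includes intercept)."""
        b, *_ = np.linalg.lstsq(design, Y, rcond=None)
        return np.sum((design @ b - Y.mean()) ** 2)

    # Hypothetical data: k = 3 predictors total, q = 2 in block B.
    rng = np.random.default_rng(4)
    n, k, q = 50, 3, 2
    X = rng.normal(size=(n, 3))
    Y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

    ones = np.ones((n, 1))
    full = np.hstack([ones, X])            # blocks A and B
    reduced = np.hstack([ones, X[:, :1]])  # block A only

    SST = np.sum((Y - Y.mean()) ** 2)
    SSR_full, SSR_red = ssr(full, Y), ssr(reduced, Y)
    MSE_full = (SST - SSR_full) / (n - k - 1)
    F_obs = (SSR_full - SSR_red) / q / MSE_full
    print(F_obs)    # compare to F_{alpha, q, n-k-1}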
Coefficient of Partial Determination (Notation differs from text)
Measures the fraction of variation in Y explained by X_j that is not explained by the other independent variables.

r^2_{Y2 \cdot 1} = [ SSR(X_1, X_2) - SSR(X_1) ] / [ SST - SSR(X_1) ] = SSR(X_2 | X_1) / [ SST - SSR(X_1) ]

r^2_{Y3 \cdot 12} = [ SSR(X_1, X_2, X_3) - SSR(X_1, X_2) ] / [ SST - SSR(X_1, X_2) ] = SSR(X_3 | X_1, X_2) / [ SST - SSR(X_1, X_2) ]

r^2_{Yk \cdot 12...k-1} = [ SSR(X_1, ..., X_k) - SSR(X_1, ..., X_{k-1}) ] / [ SST - SSR(X_1, ..., X_{k-1}) ] = SSR(X_k | X_1, ..., X_{k-1}) / [ SST - SSR(X_1, ..., X_{k-1}) ]
Note that since labels are arbitrary, these can be constructed for any of the independent
variables.
Quadratic Regression Models (Sec. 12-6)
When the relation between Y and X is not linear, it can often be approximated by a
quadratic model.
Population Model: Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i
Fitted Model: \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2

Testing for any association:   H0: \beta_1 = \beta_2 = 0   vs   HA: \beta_1 and/or \beta_2 \ne 0   (F-test)
Testing for a quadratic effect:   H0: \beta_2 = 0   vs   HA: \beta_2 \ne 0   (t-test)
Models with Dummy Variables (Sec. 12-7)
Dummy variables are used to include categorical variables in the model. If the variable has m levels, we include m-1 dummy variables. The simplest case is a binary variable with 2 levels, such as gender. The dummy variable takes on the value 1 if the characteristic of interest is present, 0 if it is absent.
Model with no interaction
Consider a model with one numeric independent variable (X1) and one dummy variable
(X2). Then the two groups (characteristic present and absent) have the following relationships with Y:
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i,    E(\varepsilon_i) = 0
Characteristic Present (X_{2i} = 1): E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2(1) = (\beta_0 + \beta_2) + \beta_1 X_{1i}
Characteristic Absent (X_{2i} = 0): E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2(0) = \beta_0 + \beta_1 X_{1i}
Note that the two groups have different Y-intercepts, but the slopes with respect to X1 are
equal.
Model with interaction
This model allows the slope of Y with respect to X1 to be different for the two groups with
respect to the categorical variable:
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i,    E(\varepsilon_i) = 0
X_{2i} = 1: E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2(1) + \beta_3 X_{1i}(1) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_{1i}
X_{2i} = 0: E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2(0) + \beta_3 X_{1i}(0) = \beta_0 + \beta_1 X_{1i}
Note that the two groups have different Y-intercepts and slopes. More complex models
can be fit with multiple numeric predictors and dummy variables. See examples.
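A small NumPy sketch of the interaction model with one numeric predictor and one dummy variable (simulated data); the fitted coefficients recover the two intercepts and slopes described above:

    import numpy as np

    # Hypothetical data: X1 numeric, X2 a 0/1 dummy (characteristic absent/present).
    rng = np.random.default_rng(5)
    n = 60
    X1 = rng.uniform(0, 10, n)
    X2 = rng.integers(0, 2, n).astype(float)
    Y = 2.0 + 1.5 * X1 + 3.0 * X2 + 0.8 * X1 * X2 + rng.normal(size=n)

    # Interaction model: separate intercepts (b0 vs b0+b2) and slopes (b1 vs b1+b3).
    design = np.column_stack([np.ones(n), X1, X2, X1 * X2])
    b, *_ = np.linalg.lstsq(design, Y, rcond=None)
    b0, b1, b2, b3 = b
    print("absent :", b0, b1)             # intercept and slope when X2 = 0
    print("present:", b0 + b2, b1 + b3)   # intercept and slope when X2 = 1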
Transformations to Meet Model Assumptions of Regression (Sec. 12-8)
There is a wide range of transformations that can improve the model when the assumptions on the errors are violated. We will consider two types of logarithmic transformations that work well in a wide range of problems. Other transformations can be useful in very specific situations.
Logarithmic Transformations
These transformations can often fix problems with nonlinearity, unequal variances, and
non-normality of errors. We will consider base 10 logarithms, and natural logarithms.
X' = \log_{10}(X),   X = 10^{X'}
X' = \ln(X),   X = e^{X'}
Case 1: Multiplicative Model (with multiplicative nonnegative errors)
Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i
Taking base 10 logarithms of both sides gives the model:
\log_{10}(Y_i) = \log_{10}( \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i ) = \log_{10}(\beta_0) + \beta_1 \log_{10}(X_{1i}) + \beta_2 \log_{10}(X_{2i}) + \log_{10}(\varepsilon_i)
Procedure: First, take base 10 logarithms of Y and X1 and X2, and fit the linear regression
model on the transformed variables. The relationship will be linear for the transformed
variables.
Case 2: Exponential Model (with multiplicative nonnegative errors)
Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i
Taking natural logarithms of the Y values gives the model:
\ln(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln(\varepsilon_i)
Multicollinearity (Sec. 12-9)
Problem: In observational studies and models with polynomial terms, the independent
variables may be highly correlated among themselves. Two classes of problems arise.
Intuitively, the variables explain the same part of the random variation in Y. This causes
the partial regression coefficients to not appear significant for individual terms (since
they are explaining the same variation). Mathematically, the standard errors of estimates
are inflated, causing wider confidence intervals for individual parameters, and smaller t-statistics for testing whether individual partial regression coefficients are 0.
Variance Inflation Factor (VIF)
VIF_j = 1 / (1 - r_j^2)

where r_j^2 is the coefficient of determination when variable X_j is regressed on the k-1 remaining independent variables. There is a VIF for each independent variable. A variable is considered to be problematic if its VIF is 10.0 or larger. When multicollinearity is present, theory and common sense should be used to choose the best variables to keep in the model. More complex methods such as principal components regression and ridge regression have been developed; we will not pursue them here, as they don't help in explaining which independent variables are associated with, and cause changes in, Y.
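A NumPy sketch of the VIF computation (simulated predictors; X3 is deliberately made nearly collinear with X1):

    import numpy as np

    def vif(X):
        """VIF_j = 1 / (1 - r_j^2), regressing each column of X on the others."""
        n, k = X.shape
        out = []
        for j in range(k):
            others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
            resid = X[:, j] - others @ b
            r2_j = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
            out.append(1.0 / (1.0 - r2_j))
        return np.array(out)

    # Hypothetical predictors where X3 is nearly a copy of X1 (strong collinearity).
    rng = np.random.default_rng(6)
    X1 = rng.normal(size=100)
    X2 = rng.normal(size=100)
    X3 = X1 + rng.normal(scale=0.05, size=100)
    print(vif(np.column_stack([X1, X2, X3])))   # VIFs for X1 and X3 will be large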
Model Building (Sec. 12-10)
A major goal of regression is to fit as simple a model as possible, in terms of the number of predictor (independent) variables, while still explaining the variation in Y. That is, we wish to choose a parsimonious model (think of Ockham's razor). We consider two methods.
Stepwise Regression
This is a computational method that uses the following algorithm:
1. Choose a significance level (P-value) to enter the model (SLE) and a significance level to stay in the model (SLS). Some computer packages require that SLE < SLS.
2. Fit all k simple regression models and choose the independent variable with the largest t-statistic (smallest P-value). Check that this P-value is less than SLE. If yes, keep this predictor in the model; otherwise stop (no model works).
3. Fit all k-1 two-variable models that include the winning variable from step 2. Choose the model whose new variable has the lowest P-value. Check that this new variable has a P-value less than SLE. If not, stop and keep the model with only the winning variable from step 2. If yes, check that the variable from step 2 is still significant (its P-value is less than SLS); if not, drop the variable from step 2.
4. Fit all k-2 three-variable models that include the winners from steps 2 and 3. Follow the process described in step 3, where both variables already in the model (from steps 2 and 3) are checked against SLS.
5. Continue until no new terms meet the SLE criterion for entering the model.
Best Subsets (Mallows' C_p Statistic)
Consider a situation with T-1 potential independent variables (candidates). There are 2^{T-1} possible models containing subsets of the T-1 candidates. Note that the full model containing all candidates has T parameters (including the intercept term).
1. Fit the full model and compute r_T^2, the coefficient of determination.
2. For each model with k predictors, compute r_k^2.
3. Compute C_p = (1 - r_k^2)(n - T) / (1 - r_T^2) - (n - 2(k+1))
4. Choose the model with the smallest k such that C_p is close to or below k+1.
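A NumPy sketch of the C_p computation for one candidate subset, with simulated data and T = 4 parameters in the full model (all names and values hypothetical):

    import numpy as np

    def r_squared(design, Y):
        b, *_ = np.linalg.lstsq(design, Y, rcond=None)
        resid = Y - design @ b
        return 1 - np.sum(resid ** 2) / np.sum((Y - Y.mean()) ** 2)

    # Hypothetical candidates: T-1 = 3 predictors, so T = 4 parameters in the full model.
    rng = np.random.default_rng(7)
    n, T = 40, 4
    X = rng.normal(size=(n, T - 1))
    Y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
    ones = np.ones((n, 1))

    r2_full = r_squared(np.hstack([ones, X]), Y)

    # C_p for the subset containing only X1 (k = 1 predictor).
    k = 1
    r2_k = r_squared(np.hstack([ones, X[:, :1]]), Y)
    Cp = (1 - r2_k) * (n - T) / (1 - r2_full) - (n - 2 * (k + 1))
    print(Cp)    # want C_p close to or below k + 1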
Chapter 13 – Time series & Forecasting
Classical Multiplicative Model (Sec. 13-2)
Time series can generally be decomposed into components representing overall level,
linear trend, cyclical patterns, seasonal shifts, and random irregularities. Series that are
measured at the annual level cannot demonstrate seasonal patterns. In the
multiplicative model, effects of each component enter in a multiplicative manner.
Multiplicative Model Observed at Annual Level
Y_i = T_i C_i I_i
where the components are trend, cycle, and irregularity effects

Multiplicative Model Observed at Seasonal Level
Y_i = T_i C_i S_i I_i,   where the components are trend, cycle, seasonal, and irregularity effects
Smoothing a Time Series (Sec. 13-3)
Time series data are usually smoothed to reduce the impact of the random irregularities in
individual data points.
Moving Averages
Centered moving averages are averages of the value at the current point and neighboring
values (above and below the point). Due to the centering, they are typically based on an odd number of periods. The 3- and 5-period moving averages at time point i would be:
MA(3)_i = ( Y_{i-1} + Y_i + Y_{i+1} ) / 3
MA(5)_i = ( Y_{i-2} + Y_{i-1} + Y_i + Y_{i+1} + Y_{i+2} ) / 5

Clearly, there can't be moving averages at the lower and upper ends of the series.
Exponential Smoothing
This method uses all previous data in a series to smooth it, with exponentially decreasing
weights on observations further in the past:
E_1 = Y_1
E_i = W Y_i + (1 - W) E_{i-1},    i = 2, 3, ...,    0 < W < 1

The exponentially smoothed value for the current time point can be used as the forecast for the next time point:
\hat{Y}_{i+1} = E_i = W Y_i + (1 - W) \hat{Y}_i
Least Squares Trend Fitting and Forecasting (Sec. 13-4)
We consider three trend models: linear, quadratic, and exponential. In each model the
independent variable Xi is the time period of observation i. If the first observation is
period 1, then Xi = i.
Linear Trend Model
Population Model: Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
Fitted Forecasting Equation: \hat{Y}_i = b_0 + b_1 X_i
Quadratic Trend Model
Population Model: Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i
Fitted Forecasting Equation: \hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2
Exponential Trend Model
Population Model: Y_i = \beta_0 \beta_1^{X_i} \varepsilon_i
\log_{10}(Y_i) = \log_{10}(\beta_0) + X_i \log_{10}(\beta_1) + \log_{10}(\varepsilon_i)
Fitted Forecasting Equation: \log_{10}(\hat{Y}_i) = b_0 + b_1 X_i,   so   \hat{Y}_i = (10^{b_0})(10^{b_1})^{X_i}

This exponential model's forecasting equation is obtained by first fitting a simple linear regression of the logarithm of Y on X, then substituting the least squares estimates of the Y-intercept and the slope into the second equation.
Model Selection Based on Differences
First Differences: Y_i - Y_{i-1},    i = 2, ..., n
Second Differences: (Y_i - Y_{i-1}) - (Y_{i-1} - Y_{i-2}) = Y_i - 2Y_{i-1} + Y_{i-2},    i = 3, ..., n
Percent Differences: [ (Y_i - Y_{i-1}) / Y_{i-1} ] \times 100%,    i = 2, ..., n
If first differences are approximately constant, then the linear trend model is appropriate.
If the second differences are approximately constant, then the quadratic trend model is
appropriate. If percent differences are approximately constant, then the exponential trend
model is appropriate.
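A quick NumPy sketch of the three kinds of differences on a hypothetical series that doubles each period, so only the percent differences are constant:

    import numpy as np

    # Hypothetical series; inspect which type of difference is roughly constant.
    y = np.array([3.0, 6.0, 12.0, 24.0, 48.0, 96.0])   # doubles each period

    first = np.diff(y)                                  # Y_i - Y_{i-1}
    second = np.diff(y, n=2)                            # Y_i - 2Y_{i-1} + Y_{i-2}
    percent = 100 * (y[1:] - y[:-1]) / y[:-1]           # percent differences

    print(first)    # not constant  -> linear trend not appropriate
    print(second)   # not constant  -> quadratic trend not appropriate
    print(percent)  # constant 100% -> exponential trend appropriate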
Autoregressive Model for Trend Fit and Forecasting (Sec. 13-5)
Autoregressive models allow for the possibility that the mean response is a linear
function of previous response(s).
1st Order Model
Population Model: Y_i = A_0 + A_1 Y_{i-1} + \varepsilon_i
Fitted Equation: \hat{Y}_i = a_0 + a_1 Y_{i-1}

Pth Order Model
Population Model: Y_i = A_0 + A_1 Y_{i-1} + ... + A_P Y_{i-P} + \varepsilon_i
Fitted Equation: \hat{Y}_i = a_0 + a_1 Y_{i-1} + ... + a_P Y_{i-P}
Significance Test for Highest Order Term
H_0: A_P = 0    H_A: A_P \ne 0
TS: t_{obs} = a_P / S_{a_P}
RR: |t_{obs}| \ge t_{\alpha/2, n-2P-1}
This test can be used repeatedly to determine the order of the autoregressive model. You
may start with a large value of P (say 4 or 5, depending on length of series), and then
keep fitting simpler models until the test for the last term is significant.
Fitted Pth Order Forecasting Equation
Start by fitting the Pth order autoregressive model to a dataset containing n observations. Then we can obtain forecasts into the future as follows:

\hat{Y}_{n+1} = a_0 + a_1 Y_n + a_2 Y_{n-1} + ... + a_P Y_{n-P+1}
\hat{Y}_{n+2} = a_0 + a_1 \hat{Y}_{n+1} + a_2 Y_n + ... + a_P Y_{n-P+2}
\hat{Y}_{n+j} = a_0 + a_1 \hat{Y}_{n+j-1} + a_2 \hat{Y}_{n+j-2} + ... + a_P \hat{Y}_{n+j-P}

In the third equation (the general case), the Y values on the right-hand side are observed values when j \le P and forecasted values when j > P.
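A NumPy sketch of fitting an AR(p) model by least squares and producing recursive forecasts; the function names and data are made up for illustration:

    import numpy as np

    def fit_ar(y, p):
        """Least squares fit of Y_i = a_0 + a_1 Y_{i-1} + ... + a_p Y_{i-p}."""
        y = np.asarray(y, dtype=float)
        n = len(y)
        # Each row of the design holds the p previous values (most recent lag first).
        design = np.column_stack(
            [np.ones(n - p)] + [y[p - j : n - j] for j in range(1, p + 1)]
        )
        coefs, *_ = np.linalg.lstsq(design, y[p:], rcond=None)
        return coefs   # [a_0, a_1, ..., a_p]

    def forecast_ar(y, coefs, horizon):
        """Recursive forecasts; forecasted values replace observations as needed."""
        history = list(y)
        p = len(coefs) - 1
        out = []
        for _ in range(horizon):
            lags = history[-1 : -p - 1 : -1]           # Y_n, Y_{n-1}, ..., Y_{n-p+1}
            pred = coefs[0] + np.dot(coefs[1:], lags)
            out.append(pred)
            history.append(pred)
        return out

    # Hypothetical series, a 2nd order fit, and 3 forecasts ahead.
    y = [5.0, 5.5, 6.1, 6.0, 6.8, 7.2, 7.1, 7.9, 8.3, 8.2]
    coefs = fit_ar(y, p=2)
    print(coefs, forecast_ar(y, coefs, horizon=3))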
Choosing a Forecasting Model (Sec. 13-6)
To determine which of a set of potential forecasting models to use, first obtain the
prediction errors (residuals). Two widely used measures are the mean absolute deviation
(MAD) and mean square error (MSE).
e_i = Y_i - \hat{Y}_i

MAD = \sum_{i=1}^{n} | Y_i - \hat{Y}_i | / n = \sum_{i=1}^{n} | e_i | / n

MSE = \sum_{i=1}^{n} ( Y_i - \hat{Y}_i )^2 / n = \sum_{i=1}^{n} e_i^2 / n
Plots of the residuals versus time should show a random cloud centered at 0. Linear, cyclical, and seasonal patterns may also appear, leading to fitting models that account for them. See the prototype plots on page 695 of the textbook.
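A small Python sketch of MAD and MSE for a set of hypothetical forecasts:

    import numpy as np

    def mad(y, y_hat):
        """Mean absolute deviation of the forecast errors."""
        return np.mean(np.abs(np.asarray(y) - np.asarray(y_hat)))

    def mse(y, y_hat):
        """Mean square error of the forecast errors."""
        return np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)

    # Hypothetical observed series and forecasts from some fitted model.
    y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
    y_hat = np.array([9.5, 12.5, 11.5, 14.0, 15.0])
    print(mad(y, y_hat), mse(y, y_hat))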
Time Series Forecasting of Quarterly or Monthly Series (Sec. 13-7)
Consider the exponential trend model with quarterly or monthly dummy variables, depending on the frequency of the series.
Exponential Model with Quarterly Data
Create three dummy variables: Q1, Q2, and Q3, where Qj = 1 if the observation occurred in quarter j, 0 otherwise. Let Xi be the coded quarterly value (if the first quarter of the data is labeled as 1, then Xi = i). The population model and its transformed linear model are:
Y_i = \beta_0 \beta_1^{X_i} \beta_2^{Q_1} \beta_3^{Q_2} \beta_4^{Q_3} \varepsilon_i
\log_{10}(Y_i) = \log_{10}(\beta_0) + X_i \log_{10}(\beta_1) + Q_1 \log_{10}(\beta_2) + Q_2 \log_{10}(\beta_3) + Q_3 \log_{10}(\beta_4) + \log_{10}(\varepsilon_i)
Fit the multiple regression model with the base 10 logarithm of Yi as the response and Xi
(which is usually i), Q1, Q2, and Q3 as the independent variables. The fitted model is:
\log_{10}(\hat{Y}_i) = b_0 + b_1 X_i + b_2 Q_1 + b_3 Q_2 + b_4 Q_3
\hat{Y}_i = 10^{\,b_0 + b_1 X_i + b_2 Q_1 + b_3 Q_2 + b_4 Q_3}
Exponential Model with Monthly Data
A similar approach is taken when data are monthly. We create 11 dummy variables, M1, ..., M11, that take on the value 1 if the month of the observation is the month corresponding to the dummy variable. That is, if the observation is for January, M1 = 1 and all other Mj terms are 0. The population model and its transformed linear model are:

Y_i = \beta_0 \beta_1^{X_i} \beta_2^{M_1} \beta_3^{M_2} \cdots \beta_{12}^{M_{11}} \varepsilon_i
\log_{10}(Y_i) = \log_{10}(\beta_0) + X_i \log_{10}(\beta_1) + M_1 \log_{10}(\beta_2) + M_2 \log_{10}(\beta_3) + ... + M_{11} \log_{10}(\beta_{12}) + \log_{10}(\varepsilon_i)

The fitted model is:
\log_{10}(\hat{Y}_i) = b_0 + b_1 X_i + b_2 M_1 + b_3 M_2 + ... + b_{12} M_{11}
\hat{Y}_i = 10^{\,b_0 + b_1 X_i + b_2 M_1 + b_3 M_2 + ... + b_{12} M_{11}}