Forecasting by Regression

Regression Method
Chapter Topics
• Multiple regression
• Autocorrelation
Slide 2
Regression Methods
• To forecast an outcome (response variable,
dependent variable) of a study based on a
certain number of factors (explanatory variables,
regressors).
• The outcome has to be quantitative, but the
factors can be either quantitative or categorical.
• Simple regression deals with situations with
one explanatory variable, whereas multiple
regression tackles cases with more than one
regressor.
Slide 3
Simple Linear Regression
– Collect data
– Population (unknown relationship): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
– Random sample: $Y = b_0 + b_1 X + e$
Slide 4
Multiple Regression
• Two or more explanatory variables
• Multiple linear regression model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
where $\varepsilon$ is the error term and $\varepsilon \sim N(0, \sigma^2)$
• Multiple Linear Regression Equation
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
• Estimated Multiple Linear Regression Equation
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$$
Slide 5
Multiple Regression
• Least Squares Criterion
$$\min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
• The formulae for the regression coefficients b0, b1, b2, . . .
bp involve the use of matrix algebra. We will rely on
computer software packages to perform the
calculations.
• bi represents an estimate of the change in Y
corresponding to a one-unit change in Xi when all other
independent variables are held constant.
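For reference, the matrix algebra mentioned above can be written compactly. This is the standard textbook least squares solution rather than anything specific to these slides; the symbols $\mathbf{X}$ (the $n \times (p+1)$ matrix of regressor values with a leading column of ones), $\mathbf{y}$ (the vector of observed responses) and $\mathbf{b}$ are notation introduced here for illustration only.

$$\mathbf{b} = (b_0, b_1, \dots, b_p)^{\top} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$$

The formula requires $\mathbf{X}^{\top}\mathbf{X}$ to be invertible, i.e. no regressor is an exact linear combination of the others.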
Slide 6
Multiple Regression
• $R^2 = SSR/SST = 1 - SSE/SST$
• Adjusted $R^2$ ($R_a^2$)
$$R_a^2 = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$
where n is the number of observations
and p is the number of independent variables
• The adjusted $R^2$ compensates for the number of
independent variables in the model. It may rise or fall
when a variable is added.
• It will fall if the increase in $R^2$ due to the inclusion of
additional variables is not enough to offset the reduction
in the degrees of freedom.
Slide 7
Test for Significance
• Test for Individual Significance: t test
– Hypothesis
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
– Test statistic
$$t = \frac{b_i}{s_{b_i}}$$
– Decision rule: reject the null hypothesis at the α level of
significance if
• $|t| > t_{(n-p-1;\ \alpha/2)}$, or
• p-value < α
Slide 8
Test for Significance
• Testing for Overall Significance: F test
– Test whether the multiple regression model as a
whole is useful to explain Y, i.e., at least one X–
variable in the regression model is useful to explain Y.
– Hypothesis
H0 : all slope coefficients are equal to zero
(i.e. β1 = β2 =…= βp =0)
Ha : not all slope coefficients are equal to zero
Slide 9
Test for Significance
• Testing for Overall Significance: F test
– Test statistic
$$F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)} = \frac{\sum (\hat{Y}_i - \bar{Y})^2 / p}{\sum (Y_i - \hat{Y}_i)^2 / (n-p-1)}$$
– Decision rule: reject the null hypothesis if
• F > Fα, where Fα is based on an F distribution with p degrees of
freedom in the numerator and n – p – 1 degrees of freedom in
the denominator, or
• p-value < α
Slide 10
Example: District Sales
• Use both target population and per capita discretionary
income to forecast district sales.
District i | Sales Yi (gross of jars; 1 gross = 12 dozen) | Target population X1i ('000 persons) | Per capita discretionary income X2i ($)
1 | 162 | 274 | 2450
2 | 120 | 180 | 3254
3 | 223 | 375 | 3802
4 | 131 | 205 | 2838
5 | 67 | 86 | 2347
6 | 169 | 265 | 3782
7 | 81 | 98 | 3008
8 | 192 | 330 | 2450
9 | 116 | 195 | 2137
10 | 55 | 53 | 2560
11 | 252 | 430 | 4020
12 | 232 | 372 | 4427
13 | 144 | 236 | 2660
14 | 103 | 157 | 2088
15 | 212 | 370 | 2605
Slide 11
Example: District Sales
• Excel output
Slide 12
Example: District Sales
• Multiple regression model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$
where
Y = district sales
X1 = target population
X2 = per capita discretionary income
• Multiple Regression Equation
Using the assumption $E(\varepsilon) = 0$, we obtain
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$
Slide 13
Example: District Sales
• Estimated Regression Equation
b0, b1, b2 are the least squares estimates of β0, β1, β2. Thus
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
• For this example,
$$\hat{Y} = 3.4526 + 0.4960 X_1 + 0.0092 X_2$$
– Predicted sales are expected to increase by 0.496 gross
when the target population increases by one thousand,
holding per capita discretionary income constant.
– Predicted sales are expected to increase by 0.0092 gross
when per capita discretionary income increases by one
dollar, holding population constant.
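The fit can be reproduced outside Excel. The following is a minimal sketch, assuming NumPy is available; it re-enters the 15 district observations from the data table and solves the least squares problem with `numpy.linalg.lstsq`. The variable names are illustrative, and the printed coefficients should be close to the estimated equation above.

```python
import numpy as np

# District data from the table: (sales Y, target population X1, income X2).
data = np.array([
    [162, 274, 2450], [120, 180, 3254], [223, 375, 3802],
    [131, 205, 2838], [67, 86, 2347], [169, 265, 3782],
    [81, 98, 3008], [192, 330, 2450], [116, 195, 2137],
    [55, 53, 2560], [252, 430, 4020], [232, 372, 4427],
    [144, 236, 2660], [103, 157, 2088], [212, 370, 2605],
], dtype=float)

y = data[:, 0]
# Design matrix: a column of ones for the intercept, then X1 and X2.
X = np.column_stack([np.ones(len(y)), data[:, 1], data[:, 2]])

# Least squares estimates b0, b1, b2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"Y_hat = {b0:.4f} + {b1:.4f} X1 + {b2:.4f} X2")

# Illustrative forecast for a hypothetical district with a target
# population of 220 thousand and discretionary income of $2500.
forecast = b0 + b1 * 220 + b2 * 2500
print(f"Forecast sales: {forecast:.1f} gross")
```

The hypothetical district in the last step is not part of the original example; it only shows how the fitted equation is used for forecasting.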
Slide 14
Example: District Sales
• t Test for Significance of Individual Parameters
– Hypothesis
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
– Decision rule
For α = .05 and d.f. = 15 – 2 – 1 = 12, t.025 = 2.179
Reject H0 if |t| > 2.179
– Test statistic
$$t = \frac{b_1}{s_{b_1}} = \frac{0.49600}{0.00605} = 81.92$$
$$t = \frac{b_2}{s_{b_2}} = \frac{0.00920}{0.000968} = 9.50$$
– Conclusions
Reject H0: β1 = 0
Reject H0: β2 = 0
Slide 15
Example: District Sales
• To test whether sales are related to population and per capita
discretionary income
– Hypothesis
H0 : β1 = β2 =0
Ha : not both β1 and β2 equal to zero
– Decision Rule
For α = .05 and d.f. = 2, 12: F.05 = 3.89
Reject H0 if F > 3.89.
– Test statistic
F = MSR/MSE = 26922/4.74 = 5679.47
– Conclusion
Reject H0; sales are related to population and per capita
discretionary income.
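The t and F statistics quoted in this example can be recomputed from the raw data. The sketch below assumes NumPy and uses the usual formula for the coefficient standard errors (square roots of the diagonal of MSE·(X'X)⁻¹), which the slides do not spell out. The critical values 2.179 and 3.89 are copied from the slides, not computed.

```python
import numpy as np

# District data: (sales Y, target population X1, income X2).
data = np.array([
    [162, 274, 2450], [120, 180, 3254], [223, 375, 3802],
    [131, 205, 2838], [67, 86, 2347], [169, 265, 3782],
    [81, 98, 3008], [192, 330, 2450], [116, 195, 2137],
    [55, 53, 2560], [252, 430, 4020], [232, 372, 4427],
    [144, 236, 2660], [103, 157, 2088], [212, 370, 2605],
], dtype=float)

y = data[:, 0]
n, p = len(y), 2
X = np.column_stack([np.ones(n), data[:, 1:]])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b

sse = float(((y - fitted) ** 2).sum())         # error sum of squares
ssr = float(((fitted - y.mean()) ** 2).sum())  # regression sum of squares
mse = sse / (n - p - 1)
msr = ssr / p

# Standard errors of b0, b1, b2 (assumed standard OLS formula).
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t_stats = b / se
f_stat = msr / mse

T_CRIT = 2.179  # t_.025 with 12 d.f., taken from the slide
F_CRIT = 3.89   # F_.05 with (2, 12) d.f., taken from the slide

for name, t in zip(["b1", "b2"], t_stats[1:]):
    verdict = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
print(f"F = {f_stat:.2f} -> {'reject H0' if f_stat > F_CRIT else 'do not reject H0'}")
```

Small differences from the slide values are expected because the slides divide rounded coefficients by rounded standard errors.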
Slide 16
Example: District Sales
• $R^2$ = 99.89% means that 99.89% of the total variation
in sales can be explained by its linear relation
with population and per capita discretionary
income.
• $R_a^2$ = 99.88%. Both $R^2$ and $R_a^2$ indicate that the model
fits the data very well.
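Both figures can be checked from the mean squares reported in the F test (MSR = 26922, MSE = 4.74). The sketch below recovers SSR and SSE by multiplying each mean square by its degrees of freedom and then applies the definitions of R² and adjusted R². The function names are illustrative, and because the inputs are rounded the results agree with the slide only to about two decimal places.

```python
def r_squared(ssr, sse):
    """R^2 = SSR / SST, with SST = SSR + SSE."""
    return ssr / (ssr + sse)


def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2 = 1 - [SSE/(n-p-1)] / [SST/(n-1)]."""
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))


n, p = 15, 2
msr, mse = 26922.0, 4.74      # mean squares quoted in the F test

ssr = msr * p                 # regression sum of squares
sse = mse * (n - p - 1)       # error sum of squares
sst = ssr + sse               # total sum of squares

r2 = r_squared(ssr, sse)
r2_adj = adjusted_r_squared(sse, sst, n, p)
print(f"R2 = {100 * r2:.2f}%, adjusted R2 = {100 * r2_adj:.2f}%")
```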
Slide 17
Regression Diagnostics
• Model assumptions about the error term ε
– The error ε is a random variable with mean of
zero, i.e., E(ε) = 0
– The variance of ε, denoted by σ², is the same
for all values of the independent variable(s),
i.e., Var(ε) = σ²
– The values of ε are independent.
– The error ε is a normally distributed random
variable.
Slide 18
Regression Diagnostics
• Residual analysis: validating model assumptions
• Calculate the residuals and check the following.
– Are the errors normally distributed?
• Normal probability plot
– Is the error variance constant?
• Plot of residuals against $\hat{y}$
– Are the errors uncorrelated (time series data)?
• Plot of residuals against time periods
– Are there observations that are inaccurately recorded
or do not belong to the target population?
• Double check the accuracy of outliers and influential
observations.
Slide 19
Autocorrelation
• Autocorrelation is present if the disturbance
terms are correlated. Three issues need to be
addressed.
– How does autocorrelation arise?
– How to detect autocorrelation?
– Alternative estimation strategies under
autocorrelation
Slide 20
Causes of Autocorrelation
1. Omitting relevant regressors
Suppose the true model is
$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t$$
But the model is mis-specified as
$$Y_t = \beta_0 + \beta_1 X_{1t} + \nu_t$$
That is,
$$\nu_t = \beta_2 X_{2t} + \varepsilon_t$$
If X2t is correlated with X2,t-1, νt is also correlated with
νt-1. This is particularly serious if X2t represents a
lagged dependent variable.
Slide 21
Causes of Autocorrelation
2. Specification errors in the functional form
Suppose the true model is
$$Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_t^2 + \varepsilon_t$$
But the model is mis-specified as
$$Y_t = \beta_0 + \beta_1 X_t + \nu_t$$
νt would tend to be positive
for X<A and X>B, and
negative for A<X<B.
Slide 22
Causes of Autocorrelation
3. Measurement errors in the variables
Suppose Yt = Yt* + νt
where Y is the observed value, Y* is the true value and
ν is the measurement error. Hence, the true model is
$$Y_t^* = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_p X_{pt} + \varepsilon_t$$
and the observed model is
$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_p X_{pt} + u_t, \qquad u_t = \varepsilon_t + \nu_t$$
Given a “common” measurement method, it is likely
that measurement errors in period t and t-1 are
correlated.
Slide 23
Causes of Autocorrelation
4. Pattern of business cycle
Time-series data relating to business and economics
often exhibit business-cycle patterns. Sluggishness
during a recession persists over a certain time period,
while prosperity during a boom continues for a certain
duration of time. As a result, successive
observations tend to be correlated.
Slide 24
Testing for First Order Autocorrelation
• First-order autocorrelation
– The error term in time period t is related to
the error term in time period t–1 by the
equation εt = ρεt-1 + at, where at ~ N(0, σa²).
– Use the Durbin-Watson test to test for the existence
of first-order autocorrelation.
Slide 25
Testing for First Order Autocorrelation
• Durbin-Watson test
– For positive autocorrelation
H0 : The error terms are not autocorrelated (ρ = 0)
Ha : The error terms are positively autocorrelated (ρ > 0)
– For negative autocorrelation
H0 : The error terms are not autocorrelated (ρ = 0)
Ha : The error terms are negatively autocorrelated (ρ < 0)
– For positive or negative autocorrelation
H0 : The error terms are not autocorrelated (ρ = 0)
Ha : The error terms are positively or negatively autocorrelated
(ρ ≠ 0)
– Test statistic
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Slide 26
Testing for First Order Autocorrelation
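$$
\begin{aligned}
DW &= \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
= \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2} \\
&= \frac{\left(\sum_{t=1}^{n} e_t^2 - e_1^2\right) + \left(\sum_{t=1}^{n} e_t^2 - e_n^2\right) - 2\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2} \\
&= \frac{2\sum_{t=1}^{n} e_t^2 - 2\sum_{t=2}^{n} e_t e_{t-1} - (e_1^2 + e_n^2)}{\sum_{t=1}^{n} e_t^2} \\
&= 2(1 - r) - \frac{e_1^2 + e_n^2}{\sum_{t=1}^{n} e_t^2},
\end{aligned}
$$
where r is the sample autocorrelation coefficient expressed as
$$r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}$$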
Slide 27
Testing for First Order Autocorrelation
• In “large samples”, DW ≈ 2(1–r)
– If the disturbances are uncorrelated, then r = 0
and DW ≈ 2
– If negative first order autocorrelation exists,
then r<0 and DW > 2
– If positive first order autocorrelation exists,
then r>0 and DW < 2
• Exact critical values of the Durbin-Watson test
depend on the values of the regressors and so cannot
be tabulated. Instead, Durbin and Watson
established upper (dU) and lower (dL) bounds for
the critical values. They are for testing first order
autocorrelation only.
Slide 28
Testing for First Order Autocorrelation
• Test for positive autocorrelation
H0 : ρ = 0
Ha : ρ > 0
• Decision rules
– If DW < dL,α, we reject H0.
– If DW > dU,α, we do not reject H0.
– If dL,α ≤ DW ≤ dU,α, the test is inconclusive.
Slide 29
Example: Company Sales
• The Blaisdell Company wished to predict its sales (Y) by using
industry sales (X) as a predictor variable.
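Year | Quarter | t | X | Y
1977 | 1 | 1 | 127.3 | 20.96
1977 | 2 | 2 | 130 | 21.4
1977 | 3 | 3 | 132.7 | 21.96
1977 | 4 | 4 | 129.4 | 21.52
1978 | 1 | 5 | 135 | 22.39
1978 | 2 | 6 | 137.1 | 22.76
1978 | 3 | 7 | 141.2 | 23.48
1978 | 4 | 8 | 142.8 | 23.66
1979 | 1 | 9 | 145.5 | 24.1
1979 | 2 | 10 | 145.3 | 24.01
1979 | 3 | 11 | 148.3 | 24.54
1979 | 4 | 12 | 146.4 | 24.3
1980 | 1 | 13 | 150.2 | 25
1980 | 2 | 14 | 153.1 | 25.64
1980 | 3 | 15 | 157.3 | 26.36
1980 | 4 | 16 | 160.7 | 26.98
1981 | 1 | 17 | 164.2 | 27.52
1981 | 2 | 18 | 165.6 | 27.78
1981 | 3 | 19 | 168.7 | 28.24
1981 | 4 | 20 | 171.7 | 28.78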
Slide 30
Example: Company Sales
• From the scatter plot, a linear regression model is
appropriate.
[Figure: scatter plot of Y (vertical axis, 20 to 29) against X (horizontal axis, 130 to 170)]
Slide 31
Example: Company Sales
• SAS output
Slide 32
Example: Company Sales
• Estimated regression equation
$$\hat{Y} = -1.45475 + 0.17628X$$
• The market research analyst was concerned with the
possibility of positively correlated errors. Using the
Durbin-Watson test:
H0 : ρ = 0
Ha : ρ > 0
Slide 33
Example: Company Sales
$$DW = \frac{\sum_{t=2}^{20} (e_t - e_{t-1})^2}{\sum_{t=1}^{20} e_t^2} = \frac{0.09794}{0.13330} = 0.735$$
Suppose α = 0.01. For n = 20 (n
denotes the number of
observations) and k’ = 1 (k’ denotes
the number of independent
variables),
dL = 0.95 and dU = 1.15.
Since DW < dL, we conclude that
the error terms are positively
autocorrelated.
t | et | et-1 | et-et-1 | (et-et-1)^2 | et^2
1 | -0.02605 | | | | 0.000679
2 | -0.06202 | -0.02605 | -0.03597 | 0.001294 | 0.003846
3 | 0.02202 | -0.06202 | 0.08404 | 0.007063 | 0.000485
4 | 0.16375 | 0.02202 | 0.14173 | 0.020087 | 0.026814
5 | 0.04657 | 0.16375 | -0.11718 | 0.013731 | 0.002169
6 | 0.04638 | 0.04657 | -0.00019 | 3.61E-08 | 0.002151
7 | 0.04362 | 0.04638 | -0.00276 | 7.62E-06 | 0.001903
8 | -0.05844 | 0.04362 | -0.10206 | 0.010416 | 0.003415
9 | -0.0944 | -0.05844 | -0.03596 | 0.001293 | 0.008911
10 | -0.14914 | -0.0944 | -0.05474 | 0.002996 | 0.022243
11 | -0.14799 | -0.14914 | 0.00115 | 1.32E-06 | 0.021901
12 | -0.05305 | -0.14799 | 0.09494 | 0.009014 | 0.002814
13 | -0.02293 | -0.05305 | 0.03012 | 0.000907 | 0.000526
14 | 0.10585 | -0.02293 | 0.12878 | 0.016584 | 0.011204
15 | 0.08546 | 0.10585 | -0.02039 | 0.000416 | 0.007303
16 | 0.1061 | 0.08546 | 0.02064 | 0.000426 | 0.011257
17 | 0.02911 | 0.1061 | -0.07699 | 0.005927 | 0.000847
18 | 0.04232 | 0.02911 | 0.01321 | 0.000175 | 0.001791
19 | -0.04416 | 0.04232 | -0.08648 | 0.007479 | 0.00195
20 | -0.03301 | -0.04416 | 0.01115 | 0.000124 | 0.00109
sum= | | | | 0.097942 | 0.1333
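The DW calculation above can be sketched in a few lines, assuming NumPy. The residuals are the 20 values of et listed in the table; `durbin_watson` is an illustrative helper name, not a function from the SAS output. The sketch also checks the identity DW = 2(1 – r) – (e1² + en²)/Σet² derived earlier.

```python
import numpy as np

# Residuals e_1 ... e_20 from the company sales regression.
e = np.array([
    -0.02605, -0.06202, 0.02202, 0.16375, 0.04657,
    0.04638, 0.04362, -0.05844, -0.0944, -0.14914,
    -0.14799, -0.05305, -0.02293, 0.10585, 0.08546,
    0.1061, 0.02911, 0.04232, -0.04416, -0.03301,
])


def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares."""
    resid = np.asarray(resid, dtype=float)
    diff = np.diff(resid)  # e_t - e_{t-1} for t = 2..n
    return float((diff @ diff) / (resid @ resid))


dw = durbin_watson(e)

# Sample autocorrelation r as defined in the derivation of DW.
r = float((e[1:] @ e[:-1]) / (e @ e))
end_term = float((e[0] ** 2 + e[-1] ** 2) / (e @ e))

print(f"DW = {dw:.3f}")
print(f"2(1 - r) = {2 * (1 - r):.3f}, end-point correction = {end_term:.3f}")
```

With only 20 observations the end-point correction is small but not negligible, which is why DW ≈ 2(1 – r) is described as a large-sample approximation.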
Slide 34
Testing for First Order Autocorrelation
• Remark
– In order to use the Durbin-Watson table, there
must be an intercept term in the model.
Slide 35
Testing for First Order Autocorrelation
• Test for negative autocorrelation
H0 : ρ = 0
Ha : ρ < 0
• Decision rules
– If 4 – DW < dL,α, we reject H0.
– If 4 – DW > dU,α, we do not reject H0.
– If dL,α ≤ 4 – DW ≤ dU,α, the test is inconclusive.
Slide 36
Testing for First Order Autocorrelation
• Test for positive or negative autocorrelation
H0 : ρ = 0
Ha : ρ ≠ 0
• Decision rules
– If DW < dL,α/2 or 4 – DW < dL,α/2, we reject H0.
– If DW > dU,α/2 and 4 – DW > dU,α/2 , we do not
reject H0.
– If dL,α/2 ≤ DW ≤ dU,α/2 or dL,α/2 ≤ 4 – DW ≤ dU,α/2 ,
the test is inconclusive.
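The three sets of decision rules (positive, negative, and two-sided) can be collected in one small function. This is a sketch with illustrative names; the bounds dL and dU must be looked up in a Durbin-Watson table by the caller, at level α for the one-sided tests and α/2 for the two-sided test.

```python
def durbin_watson_decision(dw, d_lower, d_upper, alternative="positive"):
    """Apply the Durbin-Watson bounds test.

    dw          -- the Durbin-Watson statistic
    d_lower     -- tabulated lower bound dL
    d_upper     -- tabulated upper bound dU
    alternative -- "positive", "negative" or "two-sided"
    """
    if alternative == "positive":
        stats = [dw]
    elif alternative == "negative":
        stats = [4.0 - dw]
    elif alternative == "two-sided":
        stats = [dw, 4.0 - dw]
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")

    # Reject if any relevant statistic falls below the lower bound.
    if any(s < d_lower for s in stats):
        return "reject H0"
    # Do not reject only if every relevant statistic exceeds the upper bound.
    if all(s > d_upper for s in stats):
        return "do not reject H0"
    return "inconclusive"


# Company sales example: DW = 0.735 with dL = 0.95 and dU = 1.15.
print(durbin_watson_decision(0.735, 0.95, 1.15, "positive"))
```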
Slide 37
Testing for First Order Autocorrelation
• Remarks
– The validity of the Durbin-Watson test
depends on the assumption that the
population of all possible residuals at any
time t has a normal distribution.
– Positive autocorrelation is found in practice
more commonly than negative
autocorrelation.
– First-order autocorrelation is not the only type
of autocorrelation.
Slide 38
Solutions to Autocorrelation (1)
1. Re-examine the model. The typical causes of autocorrelation are omitted
regressors or wrong functional forms.
2. Use an alternative estimation strategy. Several approaches are
commonly used. The approach considered here is the two-step
Cochrane-Orcutt procedure.
Consider the following model with AR(1) disturbances:
$$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t, \qquad (1)$$
with $\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$.
Slide 39
Solutions to Autocorrelation (2)
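Since equation (1) holds for every observation, for the (t-1)th
observation we have
$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + \varepsilon_{t-1}, \qquad (2)$$
where $\varepsilon_{t-1} = \rho\,\varepsilon_{t-2} + u_{t-1}$.
Multiplying (2) by ρ, we obtain
$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{t-1} + \rho\,\varepsilon_{t-1}. \qquad (3)$$
Subtracting (3) from (1), we get
$$(Y_t - \rho Y_{t-1}) = (1 - \rho)\beta_1 + \beta_2 (X_t - \rho X_{t-1}) + (\varepsilon_t - \rho\,\varepsilon_{t-1})$$
That is,
$$Y_t^* = \beta_1^* + \beta_2 X_t^* + u_t, \qquad (4)$$
where $Y_t^* = Y_t - \rho Y_{t-1}$, $X_t^* = X_t - \rho X_{t-1}$ and $\beta_1^* = (1 - \rho)\beta_1$.
Note that the ut’s are uncorrelated. However, ρ is unknown and needs to be
estimated.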
Slide 40
Two-step Cochrane-Orcutt
1. Estimate equation (1) by the least squares method and obtain the
resulting residuals et. Regress et = ρet-1 + ut and obtain
$$r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=2}^{n} e_{t-1}^2}$$
2. Substitute r for ρ in equation (4) and obtain OLS estimates of the coefficients based
on equation (4).
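The two steps can be sketched as follows, assuming NumPy. For illustration the procedure is applied to the company sales data (X = industry sales, Y = company sales) from the Durbin-Watson example; this pairing is a choice made for this sketch, not a worked example from the slides, and the helper names are hypothetical.

```python
import numpy as np


def ols(x, y):
    """Simple least squares fit; returns (intercept, slope)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])


def cochrane_orcutt_two_step(x, y):
    """Two-step Cochrane-Orcutt estimates (r, beta1, beta2) for model (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: OLS on the original model, then regress e_t on e_{t-1}
    # through the origin to estimate rho.
    b1, b2 = ols(x, y)
    e = y - (b1 + b2 * x)
    r = float((e[1:] @ e[:-1]) / (e[:-1] @ e[:-1]))

    # Step 2: OLS on the transformed model (4); one observation is lost.
    y_star = y[1:] - r * y[:-1]
    x_star = x[1:] - r * x[:-1]
    b1_star, beta2 = ols(x_star, y_star)

    # Recover the original intercept from beta1* = (1 - rho) * beta1.
    beta1 = b1_star / (1.0 - r)
    return r, beta1, beta2


x = [127.3, 130, 132.7, 129.4, 135, 137.1, 141.2, 142.8, 145.5, 145.3,
     148.3, 146.4, 150.2, 153.1, 157.3, 160.7, 164.2, 165.6, 168.7, 171.7]
y = [20.96, 21.4, 21.96, 21.52, 22.39, 22.76, 23.48, 23.66, 24.1, 24.01,
     24.54, 24.3, 25, 25.64, 26.36, 26.98, 27.52, 27.78, 28.24, 28.78]

r, beta1, beta2 = cochrane_orcutt_two_step(x, y)
print(f"r = {r:.3f}, beta1 = {beta1:.4f}, beta2 = {beta2:.5f}")
```

Repeating steps 1 and 2 with residuals recomputed from the new estimates until r stabilises gives the iterative version of the procedure.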
Slide 41
 The following table represents the annual U.S. personal consumption
expenditure (C) in billions of 1978 dollars from 1976 to 1990 inclusively :
Slide 42
 An OLS linear trend model has been fitted to the above data, and it gives
the following residuals :
Slide 43
 To test for positive first order autocorrelation in the error and hence
estimate a model for this error process, consider
H0 : ρ = 0
Ha : ρ > 0
 Using the Durbin-Watson test,
$$DW = \frac{\sum_{t=2}^{15} (e_t - e_{t-1})^2}{\sum_{t=1}^{15} e_t^2} = \frac{627.7213}{1514.1004} = 0.4146.$$
Slide 44
 When k’ =1 and n=15,
dL = 1.08 and dU = 1.36 (the bounds at α = 0.05).
Since DW = 0.4146 < dL, we reject H0.
 By regressing et on et-1, we obtain r = 0.79
Hence the error process is
$$e_t = 0.79\,e_{t-1} + u_t$$
Re-estimate the trend model for consumption using the two-step
Cochrane-Orcutt procedure.
Slide 45
 Using the transformed model
$$C_t - rC_{t-1} = \beta_1 (1 - r) + \beta_2 [t - r(t-1)] + u_t$$
with t=1 indicating year 1976, sequentially until t=15 representing year
1990, the transformed data are tabulated in the following table.
Slide 46

Applying OLS to the transformed data yields
$$\hat{C}_t^* = 41.415 + 18.688\,t^*$$
or
$$\hat{C}_t = 41.415 + 0.79\,C_{t-1} + 18.688\,t^*$$
That is,
$$\hat{\beta}_2 = 18.688, \qquad \hat{\beta}_1 = \frac{41.415}{1 - 0.79} = 197.21$$
are parameter estimates of the original model.
Slide 47
Note that
1. Because lagged values of Y and X had to be
formed, we are left with only n-1 observations.
2. The estimate r is obtained based on OLS
estimation assuming a standard linear
regression model satisfying all classical
assumptions. It may not be an efficient estimator
of ρ. This leads to the iterative Cochrane-Orcutt
estimator.
Slide 48
Chapter Summary
• Simple linear regression
• Multiple regression
• Regression on Dummy Variables
• Autocorrelation
– Durbin-Watson test
– Two-step Cochrane-Orcutt procedure
Slide 49