FORECASTING

King Abdulaziz University
Faculty of Engineering
Industrial Engineering Dept.
IE 436
Dynamic Forecasting
CHAPTER 3
Exploring Data Patterns and an Introduction to Forecasting Techniques
• Cross-sectional data: collected at a single point in time.
• Time series: data collected and recorded over successive increments of time.
(Page 62)
Exploring Time Series Data Patterns
• Horizontal (stationary).
• Trend.
• Cyclical.
• Seasonal.

A Stationary Series
Its mean and variance remain constant over time.
The Trend
The long-term component that represents the growth or decline in the time series.

The Cyclical Component
The wavelike fluctuation around the trend.

[Figure omitted: plot of cost versus year showing the trend line with a cyclical peak and a cyclical valley.]
FIGURE 3-2 Trend and Cyclical Components of an Annual Time Series Such as Housing Costs
(Page 63)
The Seasonal Component
A pattern of change that repeats itself year after year.

[Figure omitted: time series plot of seasonal electrical-usage data.]
FIGURE 3-3 Electrical Usage for Washington Water Power Company, 1980-1991
(Page 64)
Exploring Data Patterns with Autocorrelation Analysis
• Autocorrelation: the correlation between a variable lagged one or more periods and itself.

$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} \qquad k = 0, 1, 2, \ldots \qquad (3.1)$$

where:
r_k = autocorrelation coefficient for a lag of k periods
Ȳ = mean of the values of the series
Y_t = observation in time period t
Y_{t-k} = observation at time period t − k
(Pages 64-65)
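Equation 3.1 translates directly into code. The sketch below is a minimal pure-Python version (the function name `autocorr` is our own, not from the text):

```python
def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k, per Equation 3.1.

    The numerator sums products of deviations for the k-lagged pairs;
    the denominator is the total sum of squared deviations.
    """
    n = len(y)
    y_bar = sum(y) / n
    num = sum((y[t] - y_bar) * (y[t - k] - y_bar) for t in range(k, n))
    den = sum((v - y_bar) ** 2 for v in y)
    return num / den

# r_0 is 1 by definition; a strongly trended series has a
# lag-1 autocorrelation close to 1.
trended = list(range(20))
print(autocorr(trended, 0))  # 1.0
print(autocorr(trended, 1))  # about 0.85
```

Note that the denominator always sums over all n observations, while the numerator loses k pairs — this is what makes r_k shrink mechanically at long lags.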
Autocorrelation Function (Correlogram)
A graph of the autocorrelations for various lags.
Computation of the lag 1 autocorrelation coefficient: Table 3-1 (page 65).
Example 3.1
Data are presented in Table 3-1 (page 65).
• Table 3-2 shows the computations that lead to the calculation of the lag 1 autocorrelation coefficient.
• Figure 3-4 contains a scatter diagram of the pairs of observations (Y_t, Y_{t-1}).
• Using the totals from Table 3-2 and Equation 3.1:

$$r_1 = \frac{\sum_{t=2}^{n} (Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} = \frac{843}{1474} = 0.572$$
Autocorrelation Function (Correlogram) (Cont.)
Minitab instructions: Stat > Time Series > Autocorrelation

[Figure omitted: correlogram with autocorrelation on the vertical axis (-1.0 to 1.0) and lags 1-3 on the horizontal axis.]
FIGURE 3-5 Correlogram or Autocorrelation Function for the Data Used in Example 3.1
Questions to be Answered Using Autocorrelation Analysis
• Are the data random?
• Do the data have a trend?
• Are the data stationary?
• Are the data seasonal?
(Page 68)
Are the Data Random?
If a series is random:
• The successive values are not related to each other.
• Almost all the autocorrelation coefficients are not significantly different from zero.
Is an Autocorrelation Coefficient Significantly Different from Zero?
- The autocorrelation coefficients of random data have an approximately normal sampling distribution.
- At a specified confidence level, a series can be considered random if the calculated autocorrelation coefficients are within the interval [0 ± t SE(r_k)] (z instead of t for large samples).
- The following t statistic can be used:

$$t = \frac{r_k}{SE(r_k)}$$
- Standard error of the autocorrelation at lag k:

$$SE(r_k) = \sqrt{\frac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}} \qquad (3.2)$$

where:
r_i = the autocorrelation at time lag i
k = the time lag
n = the number of observations in the time series
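Equation 3.2 and the t statistic above can be sketched as follows (function names are our own; `acs` is assumed to hold r_1, r_2, … in order):

```python
import math

def se_rk(acs, k, n):
    """Standard error of the lag-k autocorrelation, Equation 3.2.

    acs[i] holds r_{i+1}, so the sum runs over r_1 ... r_{k-1}.
    """
    s = sum(acs[i] ** 2 for i in range(k - 1))
    return math.sqrt((1 + 2 * s) / n)

def t_stat(rk, k, acs, n):
    """t = r_k / SE(r_k), to be compared against the t critical values."""
    return rk / se_rk(acs, k, n)

# For lag 1 the sum is empty, so SE(r_1) = sqrt(1/n):
print(se_rk([], 1, 25))  # 0.2
```

Each earlier significant autocorrelation inflates the standard error at later lags, which is why the confidence bands on a correlogram widen as the lag increases.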
Example 3.2 (Page 69)
A hypothesis test: is a particular autocorrelation coefficient significantly different from zero?
At significance level α = 0.05, the critical values ±2.2 are the upper and lower t points for n − 1 = 11 degrees of freedom.
Decision Rule:
If t < −2.2 or t > 2.2, reject H₀: ρ_k = 0.
Note: t is given directly in the Minitab output under the heading T.
Is an Autocorrelation Coefficient Significantly Different from Zero? (Cont.)
The Modified Box-Pierce Q Statistic (developed by Ljung and Box), "LBQ"
A portmanteau test: it tests whether a whole set of autocorrelation coefficients is significantly different from zero at once.
$$Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \qquad (3.3)$$

where:
n = number of observations
k = the time lag
m = number of time lags to be considered
r_k = kth autocorrelation coefficient (lagged k time periods)

The value of Q can be compared with the chi-square distribution with m degrees of freedom.
Example 3.3 (Page 70)

t | Yt | t | Yt | t | Yt | t | Yt
1 | 343 | 11 | 946 | 21 | 704 | 31 | 555
2 | 574 | 12 | 142 | 22 | 291 | 32 | 476
3 | 879 | 13 | 477 | 23 | 43 | 33 | 612
4 | 728 | 14 | 452 | 24 | 118 | 34 | 574
5 | 37 | 15 | 727 | 25 | 682 | 35 | 518
6 | 227 | 16 | 147 | 26 | 577 | 36 | 296
7 | 613 | 17 | 199 | 27 | 834 | 37 | 970
8 | 157 | 18 | 744 | 28 | 981 | 38 | 204
9 | 571 | 19 | 627 | 29 | 263 | 39 | 616
10 | 72 | 20 | 122 | 30 | 424 | 40 | 97
[Figure omitted: correlogram for lags 1-10, all bars within the significance limits.]

Lag | Corr | T | LBQ
1 | -0.19 | -1.21 | 1.57
2 | -0.01 | -0.04 | 1.58
3 | -0.15 | -0.89 | 2.53
4 | 0.10 | 0.63 | 3.04
5 | -0.25 | -1.50 | 6.13
6 | 0.03 | 0.16 | 6.17
7 | 0.17 | 0.95 | 7.63
8 | -0.03 | -0.15 | 7.67
9 | -0.03 | -0.18 | 7.73
10 | 0.02 | 0.12 | 7.75

FIGURE 3-7 Autocorrelation Function for the Data Used in Example 3.3
• The Q statistic for m = 10 time lags is calculated as 7.75 (using Minitab).
• The chi-square value χ²(0.05) = 18.307 (tested at the 0.05 significance level, degrees of freedom df = m = 10); Table B-4 (Page 527).
• Since Q < χ²(0.05), the conclusion is that the series is random.
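The Q calculation can be reproduced from the data of Example 3.3. A minimal sketch (function names are our own; the critical value 18.307 is taken from Table B-4, not computed here):

```python
def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k (Equation 3.1)."""
    n = len(y)
    y_bar = sum(y) / n
    num = sum((y[t] - y_bar) * (y[t - k] - y_bar) for t in range(k, n))
    den = sum((v - y_bar) ** 2 for v in y)
    return num / den

def ljung_box_q(y, m):
    """Modified Box-Pierce (Ljung-Box) Q statistic, Equation 3.3."""
    n = len(y)
    return n * (n + 2) * sum(autocorr(y, k) ** 2 / (n - k)
                             for k in range(1, m + 1))

# The 40 observations of Example 3.3:
data = [343, 574, 879, 728, 37, 227, 613, 157, 571, 72,
        946, 142, 477, 452, 727, 147, 199, 744, 627, 122,
        704, 291, 43, 118, 682, 577, 834, 981, 263, 424,
        555, 476, 612, 574, 518, 296, 970, 204, 616, 97]

CHI2_CRIT = 18.307  # chi-square, 0.05 level, df = m = 10 (Table B-4)
q = ljung_box_q(data, 10)       # the text reports Q = 7.75 from Minitab
print(q < CHI2_CRIT)  # True: Q is below the critical value, so the series looks random
```

This mirrors the Minitab LBQ column: Q at lag m accumulates the squared autocorrelations up to lag m, so it only grows as lags are added.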
Do the Data Have a Trend?
• A significant relationship exists between successive time series values.
• The autocorrelation coefficients are large for the first several time lags and then gradually drop toward zero as the number of periods increases.
• The autocorrelation for time lag 1 is close to 1; for time lag 2 it is large but smaller than for time lag 1.
Example 3.4 (Page 72)
Data in Table 3-4 (Page 74):

Year | Yt | Year | Yt | Year | Yt | Year | Yt
1955 | 3307 | 1966 | 6769 | 1977 | 17224 | 1988 | 50251
1956 | 3556 | 1967 | 7296 | 1978 | 17946 | 1989 | 53794
1957 | 3601 | 1968 | 8178 | 1979 | 17514 | 1990 | 55972
1958 | 3721 | 1969 | 8844 | 1980 | 25195 | 1991 | 57242
1959 | 4036 | 1970 | 9251 | 1981 | 27357 | 1992 | 52345
1960 | 4134 | 1971 | 10006 | 1982 | 30020 | 1993 | 50838
1961 | 4268 | 1972 | 10991 | 1983 | 35883 | 1994 | 54559
1962 | 4578 | 1973 | 12306 | 1984 | 38828 | 1995 | 34925
1963 | 5093 | 1974 | 13101 | 1985 | 40715 | 1996 | 38236
1964 | 5716 | 1975 | 13639 | 1986 | 44282 | 1997 | 41296
1965 | 6357 | 1976 | 14950 | 1987 | 48440 | 1998 | …….
Data Differencing
• A time series can be differenced to remove the trend and to create a stationary series.
• See FIGURE 3-8 (Page 73) for differencing the data of Example 3.1.
• See FIGURES 3-12, 3-13 (Page 75).
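First differencing replaces each value with its change from the previous period. A minimal sketch, assuming the series is a plain Python list:

```python
def difference(y):
    """First differences: Y_t - Y_(t-1); the first observation is dropped."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

# Applied to the opening values of Table 3-4:
trend = [3307, 3556, 3601, 3721, 4036]
print(difference(trend))  # [249, 45, 120, 315]
```

Differencing a linearly trending series produces a constant series, which is why one round of differencing is often enough to make a trended series stationary.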
Are the Data Seasonal?
• For quarterly data: a significant autocorrelation coefficient will appear at time lag 4.
• For monthly data: a significant autocorrelation coefficient will appear at time lag 12.
Example 3.5 (Page 76)
See Figures 3-14, 3-15 (Page 77).
Table 3-5:

Year | December 31 | March 31 | June 30 | September 30
1994 | 147.6 | 251.8 | 273.1 | 249.1
1995 | 139.3 | 221.2 | 260.2 | 259.5
1996 | 140.5 | 245.5 | 298.8 | 287.0
1997 | 168.8 | 322.6 | 393.5 | 404.3
1998 | 259.7 | 401.1 | 464.6 | 497.7
1999 | 264.4 | 402.6 | 411.3 | 385.9
2000 | 232.7 | 309.2 | 310.7 | 293.0
2001 | 205.1 | 234.4 | 285.4 | 258.7
2002 | 193.2 | 263.7 | 292.5 | 315.2
2003 | 178.3 | 274.5 | 295.4 | 286.4
2004 | 190.8 | 263.5 | 318.8 | 305.5
2005 | 242.6 | 318.8 | 329.6 | 338.2
2006 | 232.1 | 285.6 | 291.0 | 281.4
[Figure omitted: time series plot of quarterly sales, 1995-2007.]
FIGURE 3-14 Time Series Plot of Quarterly Sales for Coastal Marine for Example 3.5
[Figure omitted: correlogram for lags 1-13.]

Lag | Corr | T | LBQ
1 | 0.39 | 2.83 | 8.49
2 | 0.16 | 1.03 | 10.00
3 | 0.29 | 1.81 | 14.91
4 | 0.74 | 4.30 | 46.79
5 | 0.15 | 0.67 | 48.14
6 | -0.15 | -0.64 | 49.44
7 | -0.05 | -0.23 | 49.60
8 | 0.34 | 1.48 | 56.92
9 | -0.18 | -0.77 | 59.10
10 | -0.43 | -1.79 | 71.46
11 | -0.32 | -1.24 | 78.32
12 | 0.09 | 0.32 | 78.83
13 | -0.35 | -1.34 | 87.77

FIGURE 3-15 Autocorrelation Function for Quarterly Sales for Coastal Marine for Example 3.5
The autocorrelation coefficients at time lags 1 and 4 are significantly different from zero; sales are seasonal on a quarterly basis.
Choosing a Forecasting Technique
Questions to be Considered:
• Why is a forecast needed?
• Who will use the forecast?
• What are the characteristics of the data?
• What time period is to be forecast?
• What are the minimum data requirements?
• How much accuracy is required?
• What will the forecast cost?
Choosing a Forecasting Technique (Cont.)
The Forecaster Should Accomplish the Following:
• Define the nature of the forecasting problem.
• Explain the nature of the data.
• Describe the properties of the techniques.
• Develop criteria for selection.
Choosing a Forecasting Technique (Cont.)
Factors Considered:
• Level of detail.
• Time horizon.
• Basis: judgment or data manipulation.
• Management acceptance.
• Cost.
General Considerations for Choosing the Appropriate Method

Method | Uses | Considerations
Judgment | Can be used in the absence of historical data (e.g., new product); most helpful in medium- and long-term forecasts. | Subjective estimates are subject to the biases and motives of estimators.
Causal | Sophisticated methods; very good for medium- and long-term forecasts. | Must have historical data; relationships can be difficult to specify.
Time series | Easy to implement; works well when the series is relatively stable. | Relies exclusively on past data; most useful for short-term estimates.
Method | Pattern of Data | Time Horizon | Type of Model | Minimal Data Requirements
Naïve | ST, T, S | S | TS | 1
Simple averages | ST | S | TS | 30
Moving averages | ST | S | TS | 4-20
Single exponential smoothing | ST | S | TS | 2
Linear (double) exponential smoothing (Holt's) | T | S | TS | 3
Quadratic exponential smoothing | T | S | TS | 4
Seasonal exponential smoothing (Winter's) | S | S | TS | 2×s
Adaptive filtering | S | S | TS | 5×s
Simple regression | T | I | C | 10
Multiple regression | C, S | I | C | 10×V
Classical decomposition | S | S | TS | 5×s
Exponential trend models | T | I, L | TS | 10
S-curve fitting | T | I, L | TS | 10
Gompertz models | T | I, L | TS | 10
Growth curves | T | I, L | TS | 10
Census X-12 | S | S | TS | 6×s
ARIMA (Box-Jenkins) | ST, T, C, S | S | TS | 24 (nonseasonal); 3×s (seasonal)
Leading indicators | C | S | C | 24
Econometric models | C | S | C | 30
Time series multiple regression | T, S | I, L | C | 6×s

Pattern of data: ST, stationary; T, trended; S, seasonal; C, cyclical.
Time horizon: S, short term (less than three months); I, intermediate; L, long term.
Type of model: TS, time series; C, causal. Seasonal: s, length of seasonality. V, number of variables.
Measuring Forecast Error
Basic Forecasting Notation
Y_t = actual value of the time series in period t
Ŷ_t = forecast value for time period t
e_t = Y_t − Ŷ_t = forecast error in period t (residual)
Measuring Forecasting Error (Cont.)

The Mean Absolute Deviation: $MAD = \frac{1}{n}\sum_{t=1}^{n} |Y_t - \hat{Y}_t|$

The Mean Squared Error: $MSE = \frac{1}{n}\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2$

The Root Mean Square Error: $RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}$

The Mean Absolute Percentage Error: $MAPE = \frac{1}{n}\sum_{t=1}^{n} \frac{|Y_t - \hat{Y}_t|}{|Y_t|}$

The Mean Percentage Error: $MPE = \frac{1}{n}\sum_{t=1}^{n} \frac{Y_t - \hat{Y}_t}{Y_t}$

Equations (3.7 - 3.11)
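Equations 3.7-3.11 can be computed in one pass over the residuals. A minimal sketch (the function name `error_measures` is our own):

```python
import math

def error_measures(actual, forecast):
    """MAD, MSE, RMSE, MAPE, and MPE per Equations 3.7-3.11."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = sum(abs(e) / abs(y) for e, y in zip(errors, actual)) / n
    mpe = sum(e / y for e, y in zip(errors, actual)) / n
    return {"MAD": mad, "MSE": mse, "RMSE": rmse, "MAPE": mape, "MPE": mpe}

m = error_measures([100, 200], [90, 210])
print(m["MAD"], m["RMSE"], m["MAPE"])  # 10.0 10.0 0.075
```

Note the sign behavior: MAD, MSE, RMSE, and MAPE penalize over- and under-forecasting alike, while MPE keeps the signs and so reveals systematic bias (here the two errors partially cancel, giving MPE = 0.025).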
Used for:
• The measurement of a technique's usefulness or reliability.
• Comparison of the accuracy of two different techniques.
• The search for an optimal technique.

Example 3.6 (Page 83)
• Evaluate the model using MAD, MSE, RMSE, MAPE, and MPE.
Empirical Evaluation of Forecasting Methods
Results of the forecast accuracy for a sample of 3003 time series (1997):
• Complex methods do not necessarily produce more accurate forecasts than simpler ones.
• Various accuracy measures (MAD, MSE, MAPE) produce consistent results.
• The performance of methods depends on the forecasting horizon and the kind of data analyzed (yearly, quarterly, monthly).
Determining the Adequacy of a Forecasting Technique
• Do the residuals indicate a random series? (Examine the autocorrelation coefficients of the residuals; there should be no significant ones.)
• Are they approximately normally distributed?
• Is the technique simple and understood by decision makers?