Applied Business Forecasting and Planning
The Forecast Process, Data Considerations, and Model Selection
Chapter Objectives

Learning Objectives

- Establish a framework for a successful forecasting system.
- Introduce the trend, cyclical, and seasonal components of a time series.
- Introduce the concept of autocorrelation and the estimation of the autocorrelation function.
The Forecast Process

The overall forecasting process can be outlined as follows:

1. Problem definition
   1. Specify the objectives
   2. Identify what to forecast
2. Gathering information
   1. Identify time dimensions
   2. Data considerations
3. Choosing and fitting models
   1. Model selection
   2. Model evaluation
4. Using and evaluating a forecasting model
   1. Forecast preparation
   2. Forecast presentation
   3. Tracking results
The Forecast Process

Problem definition
1. Specify the objectives
   - How the forecast will be used in a decision context.
2. Determine what to forecast
   - For example, to forecast sales one must decide whether to forecast unit sales or dollar sales, total sales, or sales by region or product line.
The Forecast Process

Gathering information
1. Identify time dimensions
   - The length and periodicity of the forecast: is the forecast needed on an annual, quarterly, monthly, or daily basis, and how much time is available to develop the forecast?
2. Data considerations
   - The quantity and type of data that are available.
   - Where to go to get the data.
The Forecast Process

Choosing and fitting models
1. Model selection
   - This phase depends on the following criteria:
     - The pattern exhibited by the data
     - The quantity of historical data available
     - The length of the forecast horizon
2. Model evaluation
   - Test the model on the specific series that we want to forecast.
   - Fit refers to how well the model works on the data set that was used to develop it.
   - Accuracy refers to how well the model works in the "holdout" period.
The Forecast Process

Using and evaluating a forecasting model
1. Forecast preparation
   - The result of having found a model or models that you believe will produce an acceptably accurate forecast.
2. Forecast presentation
   - It involves clear communication.
3. Tracking results
   - Over time, even the best of models are likely to deteriorate in accuracy and should be adjusted or replaced with alternative methods.
Explanatory versus Time Series Forecasting

Explanatory models
- Assume that the variable to be forecasted exhibits an explanatory relationship with one or more independent variables, for example:

  DCS = f(DPI, PR, Index, error)

  - DCS = domestic car sales
  - DPI = disposable income
  - PR = prime interest rate
  - Index = University of Michigan Index of Consumer Sentiment
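
To make the idea concrete, here is a minimal sketch of fitting an explanatory model of this form by ordinary least squares. The figures are made-up illustrative values, not actual car-sales data, and the plain NumPy fit stands in for whatever estimation method a forecaster would actually use.

```python
# A sketch of an explanatory (causal) model in the spirit of
# DCS = f(DPI, PR, Index, error).  All data values are hypothetical.
import numpy as np

dcs   = np.array([1.80, 1.95, 2.10, 1.85, 2.00, 2.25, 2.15, 1.90])  # domestic car sales (millions, assumed)
dpi   = np.array([5.1, 5.2, 5.4, 5.3, 5.5, 5.7, 5.6, 5.4])          # disposable income (assumed)
pr    = np.array([8.5, 8.2, 7.9, 8.4, 7.6, 7.2, 7.5, 8.1])          # prime interest rate, % (assumed)
index = np.array([88, 90, 94, 89, 95, 99, 96, 91])                  # consumer sentiment index (assumed)

# Fit DCS = b0 + b1*DPI + b2*PR + b3*Index by ordinary least squares
X = np.column_stack([np.ones_like(dpi), dpi, pr, index])
coeffs, *_ = np.linalg.lstsq(X, dcs, rcond=None)
print("estimated coefficients (b0, b1, b2, b3):", coeffs)

# Forecast DCS for assumed future values of the explanatory variables
x_future = np.array([1.0, 5.8, 7.0, 100])
print("forecast DCS:", x_future @ coeffs)
```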
Explanatory versus Time Series Forecasting

Time series forecasting
- Makes no attempt to discover the factors affecting the variable's behavior; prediction is based on past values of the variable itself.
- The objective is to discover the pattern in the historical data series and extrapolate that pattern into the future, for example:

  DCS_{t+1} = f(DCS_t, DCS_{t-1}, DCS_{t-2}, error)
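
By contrast, a time series model of this form uses only the series' own history. The sketch below fits an illustrative autoregression on three lags; the lag order, the series values, and the least-squares fit are all assumptions for illustration only.

```python
# A sketch of a time-series model in the spirit of
# DCS_{t+1} = f(DCS_t, DCS_{t-1}, DCS_{t-2}, error): a regression of the
# series on its own past values.  The series is made up.
import numpy as np

y = np.array([1.80, 1.95, 2.10, 1.85, 2.00, 2.25, 2.15, 1.90, 2.05, 2.30])

p = 3                      # number of lags used (an assumption)
n = len(y)

# Design matrix: rows are periods t = p..n-1, columns are [1, y_{t-1}, y_{t-2}, y_{t-3}]
X = np.column_stack([np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)])
target = y[p:]

coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)

# One-step-ahead forecast built from the last p observations
x_next = np.concatenate(([1.0], y[-1:-p - 1:-1]))
print("forecast for next period:", x_next @ coeffs)
```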
Trend, Seasonal, and Cyclical Data Patterns

- The data used most often in forecasting are time series.
- Time series data are collected over successive increments of time.
  - Examples: the monthly unemployment rate, the quarterly gross domestic product, the number of visitors to a national park every year for a 30-year period.
- Such time series data can display a variety of patterns when plotted over time.
Data Pattern

A time series is likely to contain some or all of the following components:
- Trend
- Seasonal
- Cyclical
- Irregular
Data Pattern

Trend in a time series is the long-term change in the level of the data, i.e. the observations grow or decline over an extended period of time.
- Positive trend: the series moves upward over an extended period of time.
- Negative trend: the series moves downward over an extended period of time.
- Stationary: there is neither a positive nor a negative trend.
Data Pattern

A seasonal pattern in a time series is a regular variation in the level of the data that repeats itself at the same time every year.
- Examples:
  - Retail sales for many products tend to peak in November and December.
  - Housing starts are stronger in spring and summer than in fall and winter.
Data Pattern

- Cyclical patterns in a time series are represented by wavelike upward and downward movements of the data around the long-term trend.
- They are of longer duration and are less regular than seasonal fluctuations.
- The causes of cyclical fluctuations are usually less apparent than those of seasonal variations.
Data Pattern

- Irregular patterns in time series data are the fluctuations that are not part of the other three components.
- These are the most difficult to capture in a forecasting model.
- Examples discussed later in this chapter: GDP in 1996 dollars, quarterly data on private housing starts, and U.S. billings of the Leo Burnett advertising agency.
Data Patterns and Model Selection

- The pattern that exists in the data is an important consideration in determining which forecasting techniques are appropriate.
- To forecast stationary data, use the available history to estimate its mean value; this estimate is the forecast for future periods.
- The estimate can be updated as new information becomes available.
- Updating techniques are useful when initial estimates are unreliable or the stability of the average is in question.
Data Patterns and Model Selection

Forecasting techniques used for stationary time series data include the following (a short sketch of the first and fourth methods follows this list):
- Naive methods
- Simple averaging methods
- Moving averages
- Simple exponential smoothing
- Autoregressive moving average (ARMA) models
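
Here is a minimal sketch of a naive forecast and of simple exponential smoothing written out directly with NumPy. The series and the smoothing constant alpha = 0.2 are assumed for illustration.

```python
# Two of the listed techniques for stationary data: a naive forecast and
# simple exponential smoothing (SES).  Data and alpha are illustrative.
import numpy as np

y = np.array([52, 48, 51, 49, 50, 53, 47, 50, 52, 49], dtype=float)

# Naive method: next period's forecast is the last observed value
naive_forecast = y[-1]

# Simple exponential smoothing: L_t = alpha*y_t + (1 - alpha)*L_{t-1}
alpha = 0.2          # smoothing constant (assumed)
level = y[0]         # initialize the level at the first observation
for obs in y[1:]:
    level = alpha * obs + (1 - alpha) * level

print("naive forecast:", naive_forecast)
print("SES forecast:  ", level)   # flat forecast for all future periods
```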
Data Patterns and Model Selection

Methods used for time series data with a trend include the following (a sketch of Holt's method follows this list):
- Moving averages
- Holt's linear exponential smoothing
- Simple regression
- Growth curves
- Exponential models
- Time series decomposition
- Autoregressive integrated moving average (ARIMA) models
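
As one concrete instance from this list, here is a minimal sketch of Holt's linear (two-parameter) exponential smoothing. The smoothing constants and the trending series are illustrative assumptions, and the simple initialization is only one of several common choices.

```python
# Holt's linear exponential smoothing for a trending series.
import numpy as np

y = np.array([100, 104, 109, 113, 118, 124, 129, 133, 139, 145], dtype=float)
alpha, beta = 0.3, 0.1             # level and trend smoothing constants (assumed)

level, trend = y[0], y[1] - y[0]   # simple initial values (assumed)
for obs in y[1:]:
    prev_level = level
    level = alpha * obs + (1 - alpha) * (prev_level + trend)
    trend = beta * (level - prev_level) + (1 - beta) * trend

# h-step-ahead forecast: level + h * trend
for h in (1, 2, 3):
    print(f"forecast {h} step(s) ahead:", level + h * trend)
```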
Data Patterns and Model Selection

- For time series data with a seasonal component, the goal is to estimate seasonal indexes from historical data (a sketch of this idea follows this list).
- These indexes are used to build seasonality into the forecast or to remove its effect from the observed values.
- Forecasting methods to be considered for this type of data include:
  - Winters' exponential smoothing
  - Time series multiple regression
  - Autoregressive integrated moving average (ARIMA) models
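
A minimal sketch of the seasonal-index idea, using a simple-average method (each quarter's average divided by the overall average) on three years of made-up quarterly data. This is one common way to compute the indexes, not necessarily the method used elsewhere in the text.

```python
# Estimating quarterly seasonal indexes and deseasonalizing a series.
# The three years of quarterly sales figures are illustrative.
import numpy as np

sales = np.array([120,  95, 100, 155,    # year 1, Q1..Q4
                  128, 102, 108, 165,    # year 2
                  135, 110, 115, 172])   # year 3

quarters = sales.reshape(3, 4)                     # rows = years, columns = quarters
seasonal_index = quarters.mean(axis=0) / sales.mean()
print("seasonal indexes (Q1..Q4):", np.round(seasonal_index, 3))

# Remove the seasonal effect from the observed values
deseasonalized = sales / np.tile(seasonal_index, 3)
print("deseasonalized series:", np.round(deseasonalized, 1))
```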
Data Patterns and Model Selection

- Cyclical time series data show wavelike fluctuations around the trend that tend to repeat.
- They are difficult to model because their patterns are not stable.
- Because of the irregular behavior of cycles, analyzing this type of data requires finding coincident or leading economic indicators.
Data Patterns and Model Selection

Forecasting methods to be considered for this type of data include:
- Classical decomposition methods
- Econometric models
- Multiple regression
- Autoregressive integrated moving average (ARIMA) models
Example: GDP in 1996 Dollars

For GDP, which has a trend and a cycle but no seasonality, the following might be appropriate:
- Holt's exponential smoothing
- Linear regression trend
- Causal regression
- Time series decomposition
Example: Quarterly Data on Private Housing Starts

Private housing starts have a trend, seasonality, and a cycle. The likely forecasting models are:
- Winters' exponential smoothing
- Linear regression trend with seasonal adjustment
- Causal regression
- Time series decomposition
Example: U.S. Billings of the Leo Burnett Advertising Agency

For U.S. billings of the Leo Burnett advertising agency, there is a non-linear trend, with no seasonality and no cycle; therefore the models appropriate for this data set are:
- Non-linear regression trend
- Causal regression
Autocorrelation

- The correlation coefficient is a summary statistic that measures the extent of the linear relationship between two variables. As such, it can be used to identify explanatory relationships.
- Autocorrelation is a comparable measure that serves the same purpose for a single variable measured over time.
Autocorrelation

In evaluating time series data, it is useful to look at the correlation between successive observations over time. This measure of correlation is called autocorrelation and may be calculated as follows:

r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}

where:
- r_k = autocorrelation coefficient for a k-period lag
- \bar{y} = mean of the time series
- y_t = value of the time series at period t
- y_{t-k} = value of the time series k periods before period t
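
The formula above translates directly into code. The sketch below computes r_k for an illustrative series; the helper name autocorr and the data values are assumptions for illustration.

```python
# Autocorrelation coefficient r_k computed directly from the formula above.
import numpy as np

def autocorr(y, k):
    """r_k for a lag k >= 1."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))   # pairs (y_t, y_{t-k}), t = k+1..n
    den = np.sum((y - ybar) ** 2)
    return num / den

y = [23, 25, 28, 27, 30, 33, 35, 34, 38, 41, 43, 42]   # illustrative series
for k in range(1, 5):
    print(f"r_{k} = {autocorr(y, k):.3f}")
```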
Autocorrelation

Autocorrelation coefficients for different time lags can be used to answer the following questions about a time series.

Are the data random?
- If so, the autocorrelations between y_t and y_{t-k} for any lag are close to zero. The successive values of the time series are not related to each other.
Correlograms: An Alternative Method of Data Exploration

Is there a trend?
- If the series has a trend, y_t and y_{t-k} are highly correlated.
- The autocorrelation coefficients are significantly different from zero for the first few lags and then gradually drop toward zero.
- The autocorrelation coefficient for lag 1 is often very large (close to 1).
- A series that contains a trend is said to be nonstationary.
Correlograms: An Alternative Method of Data Exploration

Is there a seasonal pattern?
- If a series has a seasonal pattern, there will be a significant autocorrelation coefficient at the seasonal time lag or at multiples of the seasonal lag.
- The seasonal lag is 4 for quarterly data and 12 for monthly data.
Correlograms: An Alternative Method of Data Exploration

Is it stationary?
- A stationary time series is one whose basic statistical properties, such as the mean and variance, remain constant over time.
- Autocorrelation coefficients for a stationary series decline to zero fairly rapidly, generally after the second or third time lag.
Correlograms: An Alternative Method of Data Exploration

To determine whether the autocorrelation at lag k is significantly different from zero, the following hypotheses and rule of thumb may be used:

- H_0: \rho_k = 0,  H_a: \rho_k \neq 0
- For any k, reject H_0 if |r_k| > \frac{2}{\sqrt{n}}, where n is the number of observations.
- This rule of thumb is for \alpha = 5%.
Correlograms: An Alternative Method of Data Exploration

The hypothesis test developed to determine whether a particular autocorrelation coefficient is significantly different from zero is:

- Hypotheses: H_0: \rho_k = 0,  H_a: \rho_k \neq 0
- Test statistic:

  t = \frac{r_k - 0}{\sqrt{1/(n-k)}}
Correlograms: An Alternative Method of Data Exploration

Reject H_0 if

t > t_{n-k, \alpha/2}  or  t < -t_{n-k, \alpha/2}
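
As a worked check of this test, here is a minimal sketch that computes the t statistic and the critical value with SciPy. The inputs (n = 24, k = 12, r_12 = -0.2595) are the lag-12 values from the Japanese exchange rate example discussed later in this chapter.

```python
# t test for a single autocorrelation coefficient.
from math import sqrt
from scipy.stats import t as t_dist

n, k, r_k, alpha = 24, 12, -0.2595, 0.05

t_stat = (r_k - 0) / sqrt(1 / (n - k))          # test statistic
t_crit = t_dist.ppf(1 - alpha / 2, n - k)       # t_{n-k, alpha/2}

print(f"t = {t_stat:.3f}, critical value = ±{t_crit:.3f}")
if abs(t_stat) > t_crit:
    print("Reject H0: significant autocorrelation at lag", k)
else:
    print("Do not reject H0: no significant autocorrelation at lag", k)
```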
Correlograms: An Alternative Method of Data Exploration

- The plot of the autocorrelation function (ACF) versus the time lag is called a correlogram.
- The horizontal axis is the time lag.
- The vertical axis is the autocorrelation coefficient.
- Patterns in a correlogram are used to analyze key features of the data.
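
A correlogram like those in the examples that follow can be drawn with a few lines of code. The sketch below uses an illustrative series and the approximate ±2/√n significance limits from the earlier rule of thumb.

```python
# A correlogram: bar plot of r_k against lag k with approximate 95% limits.
import numpy as np
import matplotlib.pyplot as plt

# Illustrative series (24 observations)
y = np.array([23, 25, 28, 27, 30, 33, 35, 34, 38, 41, 43, 42,
              45, 44, 47, 50, 52, 51, 55, 58, 60, 59, 63, 66], dtype=float)
n, max_lag = len(y), 12

ybar = y.mean()
den = np.sum((y - ybar) ** 2)
acf = [np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / den for k in range(1, max_lag + 1)]
limit = 2 / np.sqrt(n)     # approximate 95% significance limits

plt.bar(range(1, max_lag + 1), acf, width=0.4, label="ACF")
plt.axhline(limit, color="red", linestyle="--", label="Upper limit")
plt.axhline(-limit, color="red", linestyle="--", label="Lower limit")
plt.axhline(0, color="black", linewidth=0.8)
plt.xlabel("Time lag")
plt.ylabel("Autocorrelation coefficient")
plt.title("Correlogram")
plt.legend()
plt.show()
```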
Example: Mobile Home Shipments

Correlogram for the mobile home shipments series. Note that this is quarterly data.

[Figure: correlogram of the ACF for lags 1 through 12, with upper and lower significance limits shown.]
Example: Japanese Exchange Rate

As the world's economy becomes increasingly interdependent, exchange rates between currencies have become important in making business decisions. For many U.S. businesses, the Japanese exchange rate (in yen per U.S. dollar) is an important decision variable. A time series plot of the Japanese yen / U.S. dollar exchange rate is shown below. On the basis of this plot, would you say the data are stationary? Is there any seasonal component in this time series?
Example: Japanese Exchange Rate

[Figure: time series plot of the Japanese exchange rate, EXRJ (yen per U.S. dollar), by month.]
Example: Japanese Exchange Rate

Here is the autocorrelation structure for EXRJ. With a sample size of n = 24, the critical value is

2/\sqrt{n} = 2/\sqrt{24} \approx 0.408

This is the approximate 95% critical value for rejecting the null hypothesis of zero autocorrelation at lag k.

Lag   ACF
 1    .8157
 2    .5383
 3    .2733
 4    .0340
 5   -.1214
 6   -.1924
 7   -.2157
 8   -.1978
 9   -.1215
10   -.1217
11   -.1823
12   -.2593
Example: Japanese Exchange Rate

The correlogram for EXRJ is given below.

[Figure: correlogram of the ACF for lags 1 through 12, with upper and lower significance limits shown.]
Example: Japanese Exchange Rate

Since the autocorrelation coefficients fall below the critical value after just two periods, we can conclude that there is no trend in the data.
Example: Japanese Exchange Rate

To check for seasonality at \alpha = .05, the hypotheses are:

- H_0: \rho_{12} = 0,  H_a: \rho_{12} \neq 0

The test statistic is:

t = \frac{r_k - 0}{\sqrt{1/(n-k)}} = \frac{-.2595}{\sqrt{1/(24-12)}} = -0.899

Reject H_0 if t > t_{n-k, \alpha/2} or t < -t_{n-k, \alpha/2}, where t_{n-k, \alpha/2} = t_{12, 0.025} = 2.179.
Example: Japanese Exchange Rate

Since |t| = 0.899 < t_{12, 0.025} = 2.179, we do not reject H_0; therefore seasonality does not appear to be an attribute of the data.
ACF of Forecast Errors

- The autocorrelation function of the forecast errors is very useful in determining whether there is any remaining pattern in the errors (residuals) after a forecasting model has been applied.
- This is not a measure of accuracy, but it can be used to indicate whether the forecasting method could be improved.
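
A minimal sketch of this idea: apply a deliberately simple naive forecast to an illustrative series, then inspect the autocorrelations of the resulting errors against the approximate ±2/√n limits. Significant residual autocorrelation would suggest that the forecasting method could be improved.

```python
# Checking the ACF of forecast errors from a naive forecast.
import numpy as np

y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
              104, 118, 115, 126, 141, 135, 125, 149, 170, 170], dtype=float)

# Naive forecast: predict each value with the previous one
errors = y[1:] - y[:-1]
n = len(errors)

ebar = errors.mean()
den = np.sum((errors - ebar) ** 2)
limit = 2 / np.sqrt(n)                    # approximate 95% limits

for k in range(1, 7):
    r_k = np.sum((errors[k:] - ebar) * (errors[:-k] - ebar)) / den
    flag = "significant" if abs(r_k) > limit else "not significant"
    print(f"lag {k}: r_k = {r_k:+.3f} ({flag}, limit ±{limit:.3f})")
```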