Chapter 3
İŞL 276
Exploring Data Patterns & Choosing a Forecasting Technique
Fall 2014
- Collection of valid and reliable data is the most time-consuming and difficult part of forecasting.
- The difficult task facing most forecasters is how to find relevant data that will help solve their specific decision-making problems.
- GIGO: garbage in, garbage out.
- The following four criteria can be applied to determine whether data will be useful:
  1. Data should be reliable and accurate.
  2. Data should be relevant.
  3. Data should be consistent.
  4. Data should be timely.
There Are Two Types of Data:
- Cross-sectional data: observations collected at a single point in time.
- Time series data: observations collected over successive increments of time.
Exploring Time Series Data Patterns:
An important aspect of selecting an appropriate forecasting method for time series data is to consider the following types of data patterns:

1. Horizontal Pattern
- The observations fluctuate around a constant level or mean.
- This type of series is called stationary in its mean.
2. Trend Pattern
- The observations grow or decline over an extended period of time.
- This type of series is called nonstationary.
- The trend is the long-term component that represents the growth or decline in the time series over an extended period of time.
3. Cyclical Pattern
- The observations exhibit rises and falls that are not of a fixed period.
- The cyclical component is the wave-like fluctuation around the trend.
- Cyclical fluctuations are often influenced by changes in economic expansions and contractions (the business cycle).
4. Seasonal Pattern
- The observations are influenced by seasonal factors.
- The seasonal component refers to a pattern of change that repeats itself year after year.
- In a monthly series the seasonal component measures the variability of the series each month; in a quarterly series it represents the variability in each quarter, and so on.
Autocorrelation Analysis
- Autocorrelation analysis for different time lags of a variable is used to identify time series data patterns, including components such as trend and seasonality.
- Autocorrelation is the correlation between a variable lagged one or more periods and itself.
- It is measured by the autocorrelation coefficient at lag k, denoted ρ_k, which is estimated by the sample autocorrelation coefficient r_k at lag k; k = 0, 1, 2, …
where

  r_k = Σ_{t=k+1}^{n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)² ;  k = 0, 1, 2, …
- Y_t and Y_{t−k} are the observations at time periods t and t − k, respectively, and Ȳ is the mean of the series.
- Autocorrelation analysis can be carried out by examining the correlogram, or autocorrelation function.
- The correlogram (autocorrelation function) is a graph of the autocorrelations for various lags of a time series.
Example 3.1
Harry Vernon has collected data on the number of VCRs sold last year at Vernon's Music Store. We need the lag 1 and lag 2 autocorrelation coefficients (r1 and r2).
Month       # of VCRs
January         123
February        130
March           125
April           138
May             145
June            142
July            141
August          146
September       147
October         157
November        150
December        160
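As a sketch (plain Python, no libraries assumed), the sample autocorrelation formula above can be applied directly to these 12 observations:

```python
def autocorr(y, k):
    """Sample autocorrelation coefficient r_k at lag k."""
    n = len(y)
    ybar = sum(y) / n
    # numerator: sum over t = k+1..n (0-based: t = k..n-1)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    # denominator: total sum of squared deviations
    den = sum((yt - ybar) ** 2 for yt in y)
    return num / den

vcrs = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

r1 = autocorr(vcrs, 1)
r2 = autocorr(vcrs, 2)
print(round(r1, 3), round(r2, 3))  # 0.572 0.463
```

These match the values r1 = .572 and r2 = .463 used later in Example 3.2.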
Solution:
[The worked lag 1 and lag 2 calculation tables and the MINITAB correlogram output shown on the slides give r1 = .572 and r2 = .463.]
- The correlogram display can be used to study data patterns, including trend and seasonality.
- Autocorrelation coefficients at different time lags of a variable can be used to answer the following questions about a time series:
  1. Are the data random?
  2. Do the data have a trend?
  3. Are the data stationary?
  4. Are the data seasonal?
- If a series is random, the autocorrelations between Y_t and Y_{t−k} for any lag k are close to zero: the successive values of the time series are not related to each other.
- If a series has a trend, successive observations are highly correlated, and the autocorrelation coefficients are typically significantly different from zero for the first several time lags and then gradually drop toward zero as the number of lags increases.
- The autocorrelation coefficient for time lag 1 is often very large (close to 1).
- The autocorrelation coefficient for time lag 2 will also be large, though not as large as for time lag 1.
- If a series has a seasonal pattern, a significant autocorrelation coefficient will occur at the seasonal time lag or at multiples of the seasonal lag.
- The seasonal lag is 4 for quarterly data and 12 for monthly data.
- How does an analyst determine whether an autocorrelation coefficient is significantly different from zero?
- Statisticians have shown that the sampling distribution of the sample autocorrelation coefficient r_1 is approximately normal with mean zero and approximate standard deviation 1/√n.
- Knowing this, we can compare the sample autocorrelation coefficients with this theoretical sampling distribution and determine whether, for given time lags, they come from a population whose mean is zero.
Checking the Significance of the Autocorrelation Coefficients
- In this section we determine whether the autocorrelation coefficient ρ_k at lag k (k = 0, 1, 2, …) is different from zero for a given time series data set.
- The sampling distribution of the sample autocorrelation coefficient r_k at lag k (k = 2, 3, …) is approximately normal with mean zero and approximate standard deviation SE(r_k), given by:
  SE(r_k) = √( (1 + 2 Σ_{i=1}^{k−1} r_i²) / n ) ;  k = 2, 3, …

and, for lag 1,

  SE(r_1) = √(1/n)
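A minimal sketch of this standard-error computation in Python; the r values below are the lag 1 and lag 2 coefficients from Example 3.1 (n = 12), so the outputs are SE(r_1) = .289 and SE(r_2) = .371:

```python
import math

def se_rk(r, k, n):
    """Approximate standard error of the sample autocorrelation at lag k.

    r is the list [r_1, r_2, ...] of sample autocorrelations; for k = 1
    the sum below is empty and the formula reduces to SE(r_1) = sqrt(1/n).
    """
    s = sum(r[i] ** 2 for i in range(k - 1))  # r_1^2 + ... + r_{k-1}^2
    return math.sqrt((1 + 2 * s) / n)

r = [0.572, 0.463]  # r_1, r_2 for the VCR data of Example 3.1
se1 = se_rk(r, 1, 12)
se2 = se_rk(r, 2, 12)
print(round(se1, 3), round(se2, 3))  # 0.289 0.371
```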
1. Testing for randomness
- At significance level α, the time series data are random if, for every lag k,

  −t_{α/2, n−1} · SE(r_k) ≤ r_k ≤ +t_{α/2, n−1} · SE(r_k)

2. Testing an individual ρ_k
- Using significance level α, for k = 1, 2, … we test

  H0: ρ_k = 0
  H1: ρ_k ≠ 0

  using the test statistic

  t = r_k / SE(r_k)

  and reject H0 if |t| ≥ t_{α/2, n−1} or if the P-value < α.
3. Testing a subset of ρ_k ; k = 1, 2, …, m
- We use one of the common portmanteau tests, the modified Box-Pierce (Ljung-Box) statistic:

  Q = n(n + 2) Σ_{k=1}^{m} r_k² / (n − k)

- We reject the hypothesis that all m autocorrelations in the subset are zero if Q > χ²_{α,m} or if the p-value < α.
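As a sketch, the Q statistic can be computed in a few lines; here it is applied, purely for illustration, to the first two autocorrelations of the VCR data from Example 3.1:

```python
def ljung_box_q(r, n, m):
    """Modified Box-Pierce (Ljung-Box) statistic for the first m autocorrelations.

    r is the list [r_1, ..., r_m] of sample autocorrelations.
    """
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, m + 1))

# r_1, r_2 for the VCR data of Example 3.1 (n = 12)
q = ljung_box_q([0.572, 0.463], n=12, m=2)
print(round(q, 1))  # 8.6 -- compared with the chi-square critical value x^2_{alpha, m}
```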
Exploring Time Series Data Types
The autocorrelation coefficients are used to determine whether the time series data are:

1. Random data
- The time series is random (independent) if the autocorrelations between Y_t and Y_{t−k} for any lag k are close to zero.
- This implies that successive data values are not related to each other.
- We can construct a confidence interval to check that almost all sample autocorrelations lie within a range centered at zero.
- For example, at the 5% significance level, the time series data are random if about 95% of the sample autocorrelations lie within

  −2.2 · SE(r_k) ≤ r_k ≤ +2.2 · SE(r_k)  for all k = 1, 2, 3, …

- It is also possible to use the Q statistic to test whether a subset of autocorrelations is zero.
- For example, at the 5% significance level, the time series data are random if Q for a subset of 10 autocorrelations is less than χ²_{0.05,10} = 18.31.
Example 3.2
A hypothesis test is developed to determine whether a particular autocorrelation coefficient is significantly different from zero for the correlogram figure. The null and alternative hypotheses for testing the significance of the lag 1 population autocorrelation coefficient are

  H0: ρ_1 = 0
  H1: ρ_1 ≠ 0
If the null hypothesis is true, the test statistic

  t = (r_1 − ρ_1) / SE(r_1) = (r_1 − 0) / SE(r_1) = r_1 / SE(r_1)

has a t distribution with df = n − 1 = 12 − 1 = 11, so for a 5% significance level the decision rule is:

Decision Rule: if t < −2.2 or t > 2.2, reject H0 and conclude that the lag 1 autocorrelation is significantly different from 0.

The critical values ±2.2 are the upper and lower .025 points of a t distribution with 11 df. The standard error of r_1 is SE(r_1) = √(1/12) = .289, and the value of the test statistic becomes

  t = (r_1 − ρ_1) / SE(r_1) = .572 / .289 = 1.98

H0: ρ_1 = 0 cannot be rejected because −2.2 < 1.98 < 2.2.
Now for lag 2:

  H0: ρ_2 = 0
  H1: ρ_2 ≠ 0
  t = r_2 / SE(r_2)

Decision Rule: if t < −2.2 or t > 2.2, reject H0 and conclude that the lag 2 autocorrelation is significantly different from 0 (same critical values ±2.2 and 11 df).

The standard error of r_2 is

  SE(r_2) = √( (1 + 2 r_1²) / n ) = √( (1 + 2(.572)²) / 12 ) = .371

The value of the test statistic becomes

  t = r_2 / SE(r_2) = .463 / .371 = 1.25

H0: ρ_2 = 0 cannot be rejected because −2.2 < 1.25 < 2.2.
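The two t tests of Example 3.2 can be reproduced with a short script (a sketch; the values .572, .463, and n = 12 come from Example 3.1):

```python
import math

n = 12
r1, r2 = 0.572, 0.463

se1 = math.sqrt(1 / n)                  # SE(r_1) = .289
t1 = r1 / se1                           # lag 1 test statistic

se2 = math.sqrt((1 + 2 * r1 ** 2) / n)  # SE(r_2) = .371
t2 = r2 / se2                           # lag 2 test statistic

# Critical value t_{.025, 11} = 2.2: neither lag is significant.
print(round(t1, 2), round(t2, 2))  # 1.98 1.25
```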
- An alternative way to check for significant autocorrelation is to construct, say, 95% confidence limits centered at 0.
- For lags 1 and 2 these limits are 0 ± 2.2 · SE(r_1) and 0 ± 2.2 · SE(r_2).
- Autocorrelation significantly different from 0 is indicated whenever a value of r_k falls outside the corresponding confidence limits.
- The 95% confidence limits are shown in the correlogram by the dashed lines in the graphical display of the autocorrelation function.
2. Stationary and nonstationary data
- The time series is stationary if the observations fluctuate around a constant level or mean.
- For a stationary series, the sample autocorrelation coefficients decline to zero fairly rapidly, generally after the second or third time lag.
- The time series is nonstationary (has a trend) if successive observations are highly correlated.
- For a nonstationary series, the autocorrelation coefficients are significantly different from zero for the first several time lags and then gradually drop toward zero as the number of lags increases.
- For nonstationary data to be analyzed, the trend must be removed from the data before modeling.
- One possible technique for removing the trend is the differencing method.
- Differencing the data at order 1, ΔY_t = Y_t − Y_{t−1}, may remove the trend so that the time series becomes stationary.
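First differencing can be sketched in one line; with a purely linear trend (hypothetical data), the differenced series is constant, i.e. stationary:

```python
def difference(y):
    """First differences: delta_t = y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

trended = [10, 13, 16, 19, 22, 25]  # steady upward trend
print(difference(trended))          # [3, 3, 3, 3, 3]
```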
Example 3.4
An analyst for the Sears company is assigned the task of forecasting operating revenue for 2001. She gathers the data for the years 1955 to 2000, as shown in the table below.
The data are plotted as a time series in the figure. A 95% confidence interval for the autocorrelation coefficient at time lag 1 uses 0 ± Z.025 · √(1/46), since there are n = 46 observations. The autocorrelations at the first several time lags are significantly different from zero (.96, .92, and .87), and the values then gradually drop toward zero.
The data series was differenced using MINITAB to remove the trend and create a stationary series. The differenced series shows no evidence of a trend. One can notice that the autocorrelation coefficient at time lag 3 (0.32) is significantly different from zero, while the autocorrelations at lags other than lag 3 are small.
3. Seasonal data
- The time series is seasonal if a significant autocorrelation coefficient occurs at the seasonal time lag or at multiples of the seasonal lag.
- For example, for quarterly seasonal data a significant autocorrelation coefficient will appear at lag 4; for monthly seasonal data, at lag 12; and so forth.
- For the time series data to be analyzed, the seasonal component should be removed from the data before modeling.
- Different techniques that help remove the seasonal components will be studied later.
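As an illustration (synthetic quarterly data, not a series from the text), a strongly seasonal quarterly series produces a large autocorrelation at lag 4 and a small one at lag 1:

```python
def autocorr(y, k):
    """Sample autocorrelation coefficient r_k at lag k."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((yt - ybar) ** 2 for yt in y)
    return num / den

# Six years of hypothetical quarterly sales repeating the same seasonal shape
quarterly = [10, 20, 30, 40] * 6

r1 = autocorr(quarterly, 1)
r4 = autocorr(quarterly, 4)
print(round(r1, 3), round(r4, 3))  # -0.125 0.833 -- the seasonal lag 4 dominates
```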
Example 3.5
An analyst for Outboard Marine Corporation has always felt that sales were seasonal. He gathers the data shown in the table below for the quarterly sales of Outboard Marine Corporation from 1984 to 1996 and plots them as a time series graph.
By observing the time series plot, he notices a seasonal pattern. He computes the autocorrelation coefficients and notes that those at time lags 1 and 4 are significantly different from zero. He concludes that Outboard Marine sales are seasonal on a quarterly basis.
Choosing a Forecasting Technique

1. For Stationary Time Series Data
The suggested methods are as follows:
- Naïve Methods
- Simple Averaging Methods
- Moving Averages
- Autoregressive Moving Average (ARMA)
- Box-Jenkins

These methods are suggested whenever the following conditions exist:
- The forces generating the series have stabilized, and the environment in which the series exists is relatively unchanging.
- A very simple model is needed because of a lack of data or for ease of explanation or implementation.
- Stability may be obtained by making simple corrections for factors such as population growth or inflation.
- The series may be transformed into a stable one using logarithms, square roots, or differences.
- The series is a set of forecast errors from a forecasting technique that is considered adequate.
Examples:
- The unit sales of a product or service in the maturation stage of its life cycle.
- The number of sales resulting from a constant level of effort.
- The number of breakdowns per week on an assembly line with a uniform production rate.
- Income expressed as per capita income.
2. For Nonstationary (Trended) Time Series Data
The suggested methods are as follows:
- Moving Averages
- Holt's Linear Exponential Smoothing
- Simple Regression
- Growth Curves
- Exponential Models
- Autoregressive Integrated Moving Average (ARIMA)
- Box-Jenkins
These methods are suggested whenever the following conditions exist:
- Increased productivity and new technology lead to changes in lifestyle.
- An increasing population causes increases in demand for goods and services.
- Inflation affects the purchasing power of the dollar and hence economic variables.
- Market acceptance increases.
Examples:
- Demand for electronic components (increased with the use of computers).
- Railroad usage (decreased with the use of airplanes).
- Sales revenues of consumer goods.
- Salaries, production costs, and prices.
- The growth period in the life cycle of a new product.
3. For Seasonal Time Series Data
The suggested methods are as follows:
- Classical Decomposition
- Census X-12
- Winters' Exponential Smoothing
- Multiple Regression
- Autoregressive Integrated Moving Average (ARIMA)
- Box-Jenkins

These methods are suggested whenever the following conditions exist:
- Weather influences the variable of interest.
- The annual calendar influences the variable of interest.
Examples:
- Electricity consumption.
- Summer and winter activities (e.g., sports such as skiing).
- Clothing sales.
- Agricultural growing seasons.
- Retail sales influenced by holidays, three-day weekends, and the school calendar.
4. For Cyclical Time Series Data
The suggested methods are as follows:
- Classical Decomposition
- Economic Indicators
- Econometric Models
- Multiple Regression
- Autoregressive Integrated Moving Average (ARIMA)
- Box-Jenkins
These methods are suggested whenever the following conditions exist:
- The business cycle influences the variable of interest.
- Shifts in popular tastes occur.
- Shifts in population occur.
- Shifts in the product life cycle occur.

Examples:
- Economic, market, and competitive factors.
- Fashions, music, food.
- Wars, famines, epidemics, natural disasters.
- Introduction, growth, decline.
- Maturation and market saturation.
Remark
- The time horizon (short, intermediate, or long term) of a forecast is important in the selection of a forecasting technique.
- For short- and intermediate-term forecasts, a variety of quantitative techniques can be applied.
- As the forecasting horizon increases, a number of these techniques become less applicable.

The following table shows when to choose the appropriate forecasting technique.
Measuring Forecasting Error
- Let Y_t be the actual value of a time series at time t and Ŷ_t the forecast value at time t, where t = 1, 2, 3, …, n.
- The difference between the actual value and its forecast value is called the residual, or forecast error, and is usually denoted by e_t:

  e_t = Y_t − Ŷ_t
Remarks on Empirical Evaluation of Forecasting Methods
- Statistically sophisticated or complex methods do not necessarily produce more accurate forecasts than simpler methods.
- The various accuracy measures (MAD, MSE, MAPE, and MPE) produce consistent results when used to evaluate different forecasting methods.
- Combining the forecasts of the three smoothing methods does well, on average, in comparison with other methods.
- The performance of the various forecasting methods depends on the length of the forecasting horizon and the kind of data analyzed.
Types of Forecast Accuracy Measures

1. Mean Absolute Deviation (MAD)
It is useful when the analyst wants to measure forecast error in the same units as the original series.

  MAD = (1/n) Σ_{t=1}^{n} |Y_t − Ŷ_t|

2. Mean Squared Error (MSE)
It is useful because it penalizes large forecast errors: a technique that produces moderate errors is preferable to one that usually has small errors but occasionally yields extremely large ones.

  MSE = (1/n) Σ_{t=1}^{n} (Y_t − Ŷ_t)²
3. Mean Absolute Percentage Error (MAPE)
It is useful when the size or magnitude of the forecast variable is important in evaluating the accuracy of the forecast, and when the actual values of the series are large.
MAPE provides an indication of how large the forecast errors are in comparison to the actual values of the series.
It can also be used to compare the accuracy of the same or different techniques on two entirely different series.

  MAPE = (1/n) Σ_{t=1}^{n} |Y_t − Ŷ_t| / Y_t
4. Mean Percentage Error (MPE)
It is useful when the analyst wants to determine whether a forecasting method is biased (consistently forecasting low or high). Therefore:
- If MPE is very close to zero, the forecasting method is unbiased.
- If MPE is a large negative percentage, the forecasting method is consistently overestimating.
- If MPE is a large positive percentage, the forecasting method is consistently underestimating.

  MPE = (1/n) Σ_{t=1}^{n} (Y_t − Ŷ_t) / Y_t
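A minimal sketch computing all four measures for a naïve forecast (each period's forecast is the previous actual value, as in Example 3.6); the data here are the VCR series of Example 3.1, used purely for illustration:

```python
def accuracy(actual, forecast):
    """MAD, MSE, MAPE (%), and MPE (%) for paired actual/forecast values."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mape = 100 * sum(abs(e) / y for e, y in zip(errors, actual)) / n
    mpe = 100 * sum(e / y for e, y in zip(errors, actual)) / n
    return mad, mse, mape, mpe

vcrs = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
actual, naive = vcrs[1:], vcrs[:-1]  # naive forecast: previous period's value

mad, mse, mape, mpe = accuracy(actual, naive)
print(round(mad, 2), round(mse, 2), round(mape, 2), round(mpe, 2))
# 6.27 52.45 4.35 2.26
```

The positive MPE reflects the upward drift in the series: the naïve method consistently underestimates here.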
In general, the above four measures of forecast accuracy are used as follows:
- To compare the accuracy of two or more different techniques.
- To measure the usefulness and reliability of a particular technique.
- To help in the search for an optimal technique.
Determining the Accuracy of a Forecasting Technique
To evaluate the adequacy of a forecasting technique, we should check the following:
- Randomness of the residuals → use the autocorrelation function of the residuals.
- Normality of the residuals → use a histogram or normal probability plot of the residuals.
- Significance of the parameter estimates → use the t test for all parameter estimates.
- Simplicity and understandability of the technique for decision makers.
Example 3.6
The following table shows the data for the daily number of customers requiring repair work, Y_t, and a forecast of these data, Ŷ_t, for Gary's Chevron Station. The forecasting technique used the number of customers serviced in the previous period as the forecast for the current period. This simple technique will be discussed in Chapter 4. The following computations were employed to evaluate this model using MAD, MSE, MAPE, and MPE.
Application to Management
The following are a few examples of situations constantly arising in the business world for which a sound forecasting technique would help the decision-making process:
- A soft drink company wants to project the demand for its major product over the next two years, by month.
- A major telecommunications company wants to forecast the quarterly dividend payments of its chief rival for the next three years.
- A university needs to forecast student credit hours by quarter for the next four years in order to develop budget projections for the state legislature.
- A public accounting firm needs monthly forecasts of dollar billings so it can plan for additional accounting positions and begin recruiting.
- The quality control manager of a factory that makes aluminum ingots needs a weekly forecast of production defects for the company's top management.
- A banker wants to see the projected monthly revenue of a small bicycle manufacturer that is seeking a large loan to triple its output capacity.
- A federal government agency needs annual projections of the average miles per gallon of American-made cars over the next 10 years in order to make regulatory recommendations.
- A personnel manager needs a monthly forecast of absent days for the company workforce in order to plan overtime expenditures.
- A savings and loan company needs a forecast of delinquent loans over the next two years in an attempt to avoid bankruptcy.
- A company that makes computer chips needs an industry forecast for the number of personal computers sold over the next five years in order to plan its research and development budget.