Decomposition Method - City University of Hong Kong

Decomposition Method
Types of Data
• Time series data: a sequence of observations measured over time
  (usually at equally spaced intervals, e.g., weekly, monthly or annually).
  Examples of time series data include:
    - Gross Domestic Product each quarter;
    - annual rainfall;
    - daily stock market index.
• Cross-sectional data: data on one or more variables collected at the
  same point in time.
Time Series vs Causal Modeling
• Causal (regression) models: the investigator specifies some behavioural
  relationship and estimates the parameters using regression techniques.
• Time series models: the investigator uses the past data of the target
  variable to forecast the present and future values of the variable.
• On the other hand, there are many cases when one cannot, or prefers not
  to, build causal models:
  1. insufficient information is known about the behavioural relationship;
  2. lack of, or conflicting, theories;
  3. insufficient data on explanatory variables;
  4. expertise may be unavailable;
  5. time series models may be more accurate.
• Direct benefits of using time series models:
  1. little storage capacity is needed;
  2. some time series models are automatic, in that user intervention is
     not required to update the forecasts each period;
  3. some time series models are evolutionary, in that the models adapt as
     new information is received.
Classical Decomposition of Time Series
• Trend: does not necessarily imply a monotonically increasing or
  decreasing series, but simply a lack of constant mean; in practice, we
  often use a linear or quadratic function to predict the trend.
• Cycle: refers to patterns or waves in the data that are repeated after
  approximately equal intervals with approximately equal intensity. For
  example, some economists believe that “business cycles” repeat
  themselves every 4 or 5 years.
• Seasonal: refers to a cycle of one year's duration.
• Random (irregular): refers to the (unpredictable) variation not covered
  by the above.
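Decomposition Method
• Multiplicative model:
  $Y_t = TR_t \times SN_t \times CL_t \times IR_t$
• Additive model:
  $Y_t = TR_t + SN_t + CL_t + IR_t$
Here TR, SN, CL and IR denote the trend, seasonal, cyclical and irregular
components. The aim is to find estimates of these four components.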
Multiplicative Decomposition
• Examples:
  (1) US Retail and Food Services Sales from 1996 Q1 to 2008 Q1
      (Figure 2.1)
  (2) Quarterly Number of Visitor Arrivals in Hong Kong from 2002 Q1 to
      2008 Q1 (Figure 2.2)
Figure 2.1 US Retail Sales
[Chart: US Retail & Food Services Sales. Vertical axis: Sales Y(t) (in MN US$), 0 to 500,000. Horizontal axis: Time, quarterly from Q1 96 to Q1 08.]
Figure 2.2 Visitor Arrivals
[Chart: Number of Visitor Arrivals in Hong Kong. Vertical axis: Number of Visitors Y(t), 0 to 3,000,000. Horizontal axis: Time, quarterly from Q1 02 to Q1 08.]
• Cycles are often difficult to identify with a short time series.
• Classical decomposition typically combines cycle and trend into one
  entity, the trend-cycle component $TC_t$:
  $Y_t = TC_t \times SN_t \times IR_t$
Illustration: Consider the following 4-year quarterly time series on sales volume:

Period (t)  Year  Quarter  Sales
     1        1      1       72
     2        1      2      110
     3        1      3      117
     4        1      4      172
     5        2      1       76
     6        2      2      112
     7        2      3      130
     8        2      4      194
     9        3      1       78
    10        3      2      119
    11        3      3      128
    12        3      4      201
    13        4      1       81
    14        4      2      134
    15        4      3      141
    16        4      4      216
Figure 2.3
Step 1: Estimation of seasonal component ($SN_t$)
• Since $Y_t = TC_t \times SN_t \times IR_t$,
  $\hat{SN}_t = \dfrac{Y_t}{TC_t \times IR_t}$
• Moving average for periods 1 to 4 $= \dfrac{72 + 110 + 117 + 172}{4} = 117.75$
  Moving average for periods 2 to 5 $= \dfrac{110 + 117 + 172 + 76}{4} = 118.75$
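Period (t)  Year  Quarter  Sales   MA(t)
     1        1      1       72
     2        1      2      110    117.75
     3        1      3      117    118.75
     4        1      4      172    119.25
     5        2      1       76    122.5
     6        2      2      112    128
     7        2      3      130    128.5
     8        2      4      194    130.25
     9        3      1       78    129.75
    10        3      2      119    131.5
    11        3      3      128    132.25
    12        3      4      201    136
    13        4      1       81    139.25
    14        4      2      134    143
    15        4      3      141
    16        4      4      216
Note: the MA(t) value shown against period t is the 4-quarter average of periods t-1 to t+2, so it is centered at position t + 0.5 (e.g., 117.75 covers periods 1 to 4 and is centered at 2.5).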
• Assuming each average is located at the median (midpoint) of the periods
  it covers, the MAs for periods 1 to 4, 2 to 5 and 3 to 6 are centered at
  positions 2.5, 3.5 and 4.5 respectively.
• To get an average centered at periods 3, 4, 5, etc., the mean of each
  pair of consecutive moving averages is calculated:
  Centered moving average for period 3 $= \dfrac{117.75 + 118.75}{2} = 118.25$
  Centered moving average for period 4 $= \dfrac{118.75 + 119.25}{2} = 119$
Period (t)  Year  Quarter  Sales   MA(t)    CMA(t)
     1        1      1       72
     2        1      2      110    117.75
     3        1      3      117    118.75   118.25
     4        1      4      172    119.25   119
     5        2      1       76    122.5    120.875
     6        2      2      112    128      125.25
     7        2      3      130    128.5    128.25
     8        2      4      194    130.25   129.375
     9        3      1       78    129.75   130
    10        3      2      119    131.5    130.625
    11        3      3      128    132.25   131.875
    12        3      4      201    136      134.125
    13        4      1       81    139.25   137.625
    14        4      2      134    143      141.125
    15        4      3      141
    16        4      4      216
Note: the MA(t) value shown against period t is centered at position t + 0.5; CMA(t) is centered on period t itself.
• Because $CMA_t$ contains no seasonality and irregularity, the seasonal
  component may be estimated by
  $\tilde{SN}_t = \dfrac{Y_t}{CMA_t}$
  For example,
  $\tilde{SN}_3 = \dfrac{117}{118.25} = 0.989$
  $\tilde{SN}_4 = \dfrac{172}{119} = 1.445$
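The same ratios can be computed for every period that has a centered moving average. The sketch below takes the CMA values as given (periods 3 to 14); the variable names are illustrative.

```python
sales = [72, 110, 117, 172, 76, 112, 130, 194,
         78, 119, 128, 201, 81, 134, 141, 216]

# Centered moving averages for periods 3 to 14, taken from the worked example.
cma = [118.25, 119, 120.875, 125.25, 128.25, 129.375,
       130, 130.625, 131.875, 134.125, 137.625, 141.125]

# Ratio-to-moving-average: period t (1-based) pairs with sales[t - 1].
ratios = {t: sales[t - 1] / c for t, c in enumerate(cma, start=3)}
print(round(ratios[3], 3), round(ratios[4], 3))  # 0.989 1.445
```

A ratio above 1 marks a quarter that runs above the trend-cycle level, and a ratio below 1 marks one that runs below it.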
Period (t)  Year  Quarter  Sales   MA(t)    CMA(t)    $\tilde{SN}(t)$
     1        1      1       72
     2        1      2      110    117.75
     3        1      3      117    118.75   118.25    0.989429175
     4        1      4      172    119.25   119       1.445378151
     5        2      1       76    122.5    120.875   0.628748707
     6        2      2      112    128      125.25    0.894211577
     7        2      3      130    128.5    128.25    1.013645224
     8        2      4      194    130.25   129.375   1.499516908
     9        3      1       78    129.75   130       0.6
    10        3      2      119    131.5    130.625   0.911004785
    11        3      3      128    132.25   131.875   0.970616114
    12        3      4      201    136      134.125   1.49860205
    13        4      1       81    139.25   137.625   0.588555858
    14        4      2      134    143      141.125   0.949512843
    15        4      3      141
    16        4      4      216
Note: the MA(t) value shown against period t is centered at position t + 0.5; CMA(t) and $\tilde{SN}(t)$ refer to period t itself.
• After all the $\tilde{SN}_t$'s have been computed, they are further
  averaged, quarter by quarter, to eliminate irregularities in the series.
  We also adjust the seasonal indices so that they sum to the number of
  seasons in a year (i.e., 4 for quarterly data, 12 for monthly data).
  (Why?)
Quarter  Average
   1     (0.628748707 + 0.6 + 0.588555858)/3 = 0.6058
   2     (0.894211577 + 0.911004785 + 0.949512843)/3 = 0.9182
   3     (0.989429175 + 1.013645224 + 0.970616114)/3 = 0.9912
   4     (1.445378151 + 1.499516908 + 1.49860205)/3 = 1.4812
         Sum = 3.9964
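A minimal sketch of the averaging and adjustment step follows. It assumes the adjustment is a simple rescaling by (number of seasons) / (sum of the quarterly averages), which is one common way to make the indices sum to 4; the slides do not spell out the adjustment formula. One usual rationale is that the indices then average to 1, so removing seasonality does not shift the overall level of the series.

```python
# Initial seasonal estimates for periods 3 to 14, from the worked example.
ratios = [0.989429175, 1.445378151, 0.628748707, 0.894211577,
          1.013645224, 1.499516908, 0.6, 0.911004785,
          0.970616114, 1.49860205, 0.588555858, 0.949512843]

SEASONS = 4
by_quarter = {q: [] for q in range(1, SEASONS + 1)}
for period, ratio in enumerate(ratios, start=3):
    quarter = (period - 1) % SEASONS + 1  # period 3 is quarter 3, period 5 is quarter 1
    by_quarter[quarter].append(ratio)

averages = {q: sum(v) / len(v) for q, v in by_quarter.items()}
scale = SEASONS / sum(averages.values())  # assumed adjustment: rescale so the sum is 4
indices = {q: a * scale for q, a in averages.items()}

for q in sorted(indices):
    print(q, round(averages[q], 4), round(indices[q], 4))
```

Under this assumption the adjusted indices for quarters 1, 2 and 4 come out at about 0.6063, 0.9191 and 1.4825, which are the values used in the later steps of the example.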
Step 2: Estimation of Trend/Cycle
• Define the deseasonalized (or seasonally adjusted) series as
  $D_t = Y_t / \hat{SN}_t$
  For example, $D_1 = 72/0.6063 = 118.7506$
• $TC_t$ may be estimated by regression using a linear trend:
  $D_t = \beta_0 + \beta_1 t + \varepsilon_t, \quad t = 1, 2, 3, \ldots$
  $\hat{TC}_t = \hat{D}_t = b_0 + b_1 t,$
  where $b_0$ and $b_1$ are the least squares estimates of $\beta_0$ and
  $\beta_1$ respectively.
EXCEL regression output:
So,
  $\hat{TC}_t = 113.6997914 + 1.854638009\,t$
For example,
  $\hat{TC}_1 = 113.6997914 + 1.854638009(1) = 115.5544294$
  $\hat{TC}_2 = 113.6997914 + 1.854638009(2) = 117.4090674$
Step 3: Computation of fitted values and out-of-sample forecasts
  $\hat{Y}_t = \hat{TC}_t \times \hat{SN}_t$
In-sample fit:
  $\hat{Y}_1 = 115.5544 \times 0.6063 = 70.0621$
  $\vdots$
  $\hat{Y}_{16} = 143.3740 \times 1.4825 = 212.5516$
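Out-of-sample forecast:
  $\hat{Y}_{17} = \hat{TC}_{17} \times \hat{SN}_{17}$
  $= (113.6997914 + 1.854638009 \times 17) \times 0.6063$
  $= 145.2286 \times 0.6063$
  $= 88.054$
  $\hat{Y}_{18} = \hat{TC}_{18} \times \hat{SN}_{18}$
  $= (113.6997914 + 1.854638009 \times 18) \times 0.9191$
  $= 147.0833 \times 0.9191$
  $= 135.1796$
(The products use the unrounded seasonal indices; 0.6063 and 0.9191 are shown to four decimal places.)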
Figure 2.4
Measuring Forecast Accuracy:
Let $e_t = Y_t - \hat{Y}_t$ be the forecast errors.
1) Mean Squared Error
   $MSE = \sum_{t=1}^{n} e_t^2 / n$
   $RMSE = \sqrt{MSE}$
2) Mean Absolute Deviation
   $MAD = \sum_{t=1}^{n} |e_t| / n$
   $RMAD = \sqrt{MAD}$
$e_t$ =
  Method A:   -2    1.5   -1    2.1   0.7
  Method B:   -4    0.7   0.5   1.4   0.1

  Method A: MSE = 2.43    MAD = 1.46
  Method B: MSE = 3.742   MAD = 1.34
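The two measures can rank methods differently, as the figures above show: Method A has the smaller MSE while Method B has the smaller MAD, because squaring weights Method B's single large error (-4) heavily. A short sketch of the calculation, with illustrative function names:

```python
def mse(errors):
    """Mean squared error."""
    return sum(e ** 2 for e in errors) / len(errors)


def mad(errors):
    """Mean absolute deviation."""
    return sum(abs(e) for e in errors) / len(errors)


errors = {
    "A": [-2, 1.5, -1, 2.1, 0.7],
    "B": [-4, 0.7, 0.5, 1.4, 0.1],
}
for name, e in errors.items():
    print(name, round(mse(e), 3), round(mad(e), 3))
# A 2.43 1.46
# B 3.742 1.34
```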
Naive Prediction
  $\hat{Y}_t = Y_{t-1}$
Theil's U Statistic
  $U = \sqrt{\dfrac{\sum_{t=2}^{n} (Y_t - \hat{Y}_t)^2}{\sum_{t=2}^{n} (Y_t - Y_{t-1})^2}}$
If U = 1, the forecasts produced are no better than the naive forecast.
If U = 0, the forecasts produced fit perfectly.
The smaller the value of U, the better the forecasts.
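A small sketch of the statistic, assuming both sums run over t = 2, ..., n (the first period has no naive forecast). The function name `theil_u` and the short test series are illustrative.

```python
from math import sqrt


def theil_u(actual, forecast):
    """Theil's U: model errors relative to naive (previous value) errors, t = 2..n."""
    model = sum((y - f) ** 2 for y, f in zip(actual[1:], forecast[1:]))
    naive = sum((y - prev) ** 2 for y, prev in zip(actual[1:], actual))
    return sqrt(model / naive)


y = [72, 110, 117, 172, 76, 112, 130, 194]
naive_forecast = [y[0]] + y[:-1]   # the forecast for t is the value at t - 1
print(theil_u(y, naive_forecast))  # 1.0
print(theil_u(y, y))               # 0.0
```

The two calls show the benchmark cases: using the naive forecast itself gives U = 1, and a perfect fit gives U = 0.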
MSE = 11.932
MAD = 2.892
Theil’s U = 0.0546
Out-of-Sample Forecasts
1) Ex-post forecast: prediction for a period in which actual observations
   are available.
2) Ex-ante forecast: prediction for a period in which actual observations
   are not available.
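[Timeline diagram along the Time axis with markers T1, T2 and T3 (today): “back” casting before T1; estimation period (in-sample simulation) from T1 to T2; ex-post forecast from T2 to T3; ex-ante forecast after T3.]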
Additive Decomposition
  $Y_t = TC_t + SN_t + IR_t$
[Two sketches of $Y_t$ against Time, each drawn around a trend line: one labelled “Multiplicative Seasonality”, the other “Additive Seasonality”.]
Multiplicative decomposition is used when the time series exhibits
increasing or decreasing seasonal variation
($Y_t = TC_t \times SN_t \times IR_t$).

Year   Quarter   TC_t    SN_t    Y_t      Y_t - Y_{t-1}
Yr 1     Q1      11.5    1.5     17.25
         Q2      13      0.5      6.5       -10.75
         Q3      14.5    0.8     11.6         5.1
         Q4      16      1.2     19.2         7.6
Yr 2     Q1      17.5    1.5     26.25
         Q2      19      0.5      9.5       -16.75
         Q3      20.5    0.8     16.4         6.9
         Q4      22      1.2     26.4        10
Additive decomposition is used when the time series exhibits constant
seasonal variation ($Y_t = TC_t + SN_t + IR_t$).

Year   Quarter   TC_t    SN_t    Y_t      Y_t - Y_{t-1}
Yr 1     Q1      11.5     1.8    13.3
         Q2      13      -1      12          -1.3
         Q3      14.5    -1.5    13           1
         Q4      16       0.7    16.7         3.7
Yr 2     Q1      17.5     1.8    19.3
         Q2      19      -1      18          -1.3
         Q3      20.5    -1.5    19           1
         Q4      22       0.7    22.7         3.7
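The contrast between the two tables can be reproduced from the same trend-cycle values. This sketch ignores the irregular component, as the tables do; the variable and function names are illustrative.

```python
tc = [11.5, 13, 14.5, 16, 17.5, 19, 20.5, 22]  # trend-cycle, two years of quarters
sn_mult = [1.5, 0.5, 0.8, 1.2]                 # multiplicative seasonal factors
sn_add = [1.8, -1.0, -1.5, 0.7]                # additive seasonal effects

y_mult = [tc[i] * sn_mult[i % 4] for i in range(len(tc))]
y_add = [tc[i] + sn_add[i % 4] for i in range(len(tc))]


def within_year_changes(y, period=4):
    """Quarter-to-quarter changes inside each year (Q2-Q1, Q3-Q2, Q4-Q3)."""
    years = [y[i:i + period] for i in range(0, len(y), period)]
    return [[round(b - a, 2) for a, b in zip(year, year[1:])] for year in years]


print(within_year_changes(y_mult))  # [[-10.75, 5.1, 7.6], [-16.75, 6.9, 10.0]]
print(within_year_changes(y_add))   # [[-1.3, 1.0, 3.7], [-1.3, 1.0, 3.7]]
```

With a rising trend, the multiplicative swings widen from year 1 to year 2, while the additive swings stay the same size.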
Step 1: Estimation of seasonal component ($SN_t$)
• Calculation of $MA_t$ and $CMA_t$ is the same as for multiplicative
  decomposition.
• The initial seasonal component may be estimated by
  $\tilde{SN}_t = Y_t - CMA_t$
  For example,
  $\tilde{SN}_3 = 117 - 118.25 = -1.25$
  $\tilde{SN}_4 = 172 - 119 = 53$
• Seasonal indices are averaged and adjusted so that they sum to zero.
  (Why?)
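A sketch of the additive version of the averaging step follows. It assumes the adjustment subtracts the mean of the four quarterly averages from each of them, which is one common way to force a zero sum; the slides do not state the formula. A usual rationale is that additive seasonal effects should cancel over a full year, so deseasonalizing leaves the annual total unchanged.

```python
sales = [72, 110, 117, 172, 76, 112, 130, 194,
         78, 119, 128, 201, 81, 134, 141, 216]
PERIOD = 4

ma = [sum(sales[i:i + PERIOD]) / PERIOD for i in range(len(sales) - PERIOD + 1)]
cma = [(a + b) / 2 for a, b in zip(ma, ma[1:])]  # centered on periods 3 to 14
start = PERIOD // 2                               # 0-based position of period 3

# Initial additive estimates Y_t - CMA_t.
diffs = [sales[start + i] - c for i, c in enumerate(cma)]

means = []
for q in range(PERIOD):  # q = 0 is quarter 1 (series assumed to start in quarter 1)
    vals = [d for i, d in enumerate(diffs) if (start + i) % PERIOD == q]
    means.append(sum(vals) / len(vals))

shift = sum(means) / PERIOD           # assumed adjustment: remove the common offset
indices = [m - shift for m in means]  # now sums to zero
print([round(s, 4) for s in indices])
```

Under this assumption the quarter 1 index comes out at about -50.8021, matching the value used in the worked example.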
Step 2: Estimation of Trend/Cycle
• The deseasonalized series is defined as
  $D_t = Y_t - \hat{SN}_t$
• $TC_t$ may be estimated by regression as per multiplicative
  decomposition
i.e.,
  $D_t = \beta_0 + \beta_1 t + \varepsilon_t$
and
  $\hat{TC}_t = \hat{D}_t = b_0 + b_1 t$
as per multiplicative decomposition.
So,
  $\hat{TC}_t = 113.2270833 + 1.980637255\,t$
and
  $\hat{Y}_t = \hat{TC}_t + \hat{SN}_t$
For example,
  $\hat{TC}_1 = 113.2270833 + 1.980637255(1) = 115.2077206$
and
  $\hat{Y}_1 = 115.2077206 + (-50.80208333) = 64.40563725$
MSE = 27.911
MAD = 4.477
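The additive fit and its accuracy measures can be reproduced end to end. The sketch below assumes the seasonal indices are adjusted by subtracting their mean and that the trend is fitted by ordinary least squares on t = 1, ..., 16. The helper names are hypothetical.

```python
sales = [72, 110, 117, 172, 76, 112, 130, 194,
         78, 119, 128, 201, 81, 134, 141, 216]


def additive_indices(y, period=4):
    """Zero-sum additive seasonal indices; the series is assumed to start in quarter 1."""
    ma = [sum(y[i:i + period]) / period for i in range(len(y) - period + 1)]
    cma = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    start = period // 2
    diffs = [y[start + i] - c for i, c in enumerate(cma)]
    means = []
    for q in range(period):
        vals = [d for i, d in enumerate(diffs) if (start + i) % period == q]
        means.append(sum(vals) / len(vals))
    shift = sum(means) / period  # assumed adjustment: indices sum to zero
    return [m - shift for m in means]


def fit_linear_trend(d):
    """Ordinary least squares of d_t on t = 1, ..., n; returns (b0, b1)."""
    n = len(d)
    t = list(range(1, n + 1))
    t_bar, d_bar = sum(t) / n, sum(d) / n
    sxy = sum((ti - t_bar) * (di - d_bar) for ti, di in zip(t, d))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b1 = sxy / sxx
    return d_bar - b1 * t_bar, b1


sn = additive_indices(sales)
deseasonalized = [y - sn[i % 4] for i, y in enumerate(sales)]
b0, b1 = fit_linear_trend(deseasonalized)

fitted = [b0 + b1 * (i + 1) + sn[i % 4] for i in range(len(sales))]
errors = [y - f for y, f in zip(sales, fitted)]
mse = sum(e ** 2 for e in errors) / len(errors)
mad = sum(abs(e) for e in errors) / len(errors)
print(round(b0, 4), round(b1, 4), round(fitted[0], 4), round(mse, 3), round(mad, 3))
```

The printed values should land close to the quoted trend line, the first fitted value of about 64.4056, and MSE = 27.911 and MAD = 4.477. For this series both measures are larger than the multiplicative fit's MSE = 11.932 and MAD = 2.892, consistent with its seasonal swings growing with the level.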