[Figure: Sales figures, January 1998 - December 2001]
[Figure: Tot-P (µg/l), Råån, Helsingborg, 1980-2001]
• Non-independent observations (correlation structure)
• Systematic variation within a year (seasonal effects)
• Long-term increasing or decreasing level (trend)
• Irregular variation of small magnitude (noise)
• Economic indicators: sales figures, employment statistics, stock market indices, …
• Meteorological data: precipitation, temperature, …
• Environmental monitoring: concentrations of nutrients and pollutants in air masses, rivers, marine basins, …
• Purpose: Estimate different parts of a time series in order to
  – understand the historical pattern
  – judge the current status
  – make forecasts of the future development
• Methodologies:

  Method                                       This course?
  Time series regression                       No
  Classical decomposition                      No
  Exponential smoothing                        No
  ARIMA modelling (Box-Jenkins)                Yes
  Non-parametric tests                         Yes
  Transfer function and intervention models    Yes
  State space modelling                        Yes
  Spectral domain analysis                     No
Let y_t = (observed) value of the time series at time point t, and assume a year is divided into L seasons.

Regression model (with linear trend):

y_t = \beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j} x_{j,t} + \varepsilon_t

where x_{j,t} = 1 if y_t belongs to season j and 0 otherwise, j = 1, …, L-1, and the error terms {ε_t} are assumed to have zero mean and constant variance (σ²).

The parameters β_0, β_1, β_{s,1}, …, β_{s,L-1} are estimated by the Ordinary Least Squares method:

(b_0, b_1, b_{s,1}, …, b_{s,L-1}) = \arg\min \sum_t \left( y_t - \left( \beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j} x_{j,t} \right) \right)^2
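As an aside not found in the original slides, the OLS fit above can be sketched in a few lines of Python with NumPy; the function name and the assumption that the series starts in season 1 are illustrative choices, not part of the course material:

import numpy as np

def fit_seasonal_regression(y: np.ndarray, L: int) -> np.ndarray:
    """OLS fit of y_t = b0 + b1*t + sum_j b_sj*x_jt + e_t with L-1 seasonal indicators."""
    n = len(y)
    t = np.arange(1, n + 1)
    season = (t - 1) % L                       # 0, ..., L-1; assumes y starts in season 1
    X = np.column_stack([np.ones(n), t] +
                        [(season == j).astype(float) for j in range(L - 1)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)  # (b0, b1, b_s1, ..., b_s,L-1)
    return b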
Advantages:
• Simple and robust method
• Easily interpreted components
• Normal inference (confidence intervals, hypothesis testing) directly applicable
Drawbacks:
• Fixed components in the model (a mathematical trend function and constant seasonal components)
• No consideration of correlation between observations
Example: Sales figures
[Figure: Sales figures, January 1998 - December 2001, plotted by month]
Monthly sales figures, January 1998 - December 2001:

        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1998  20.33  20.96  23.06  24.48  25.47  28.81  30.32  29.56  30.01  26.78  23.75  24.06
1999  23.58  24.61  27.28  27.69  29.99  30.87  32.09  34.53  30.85  30.24  27.86  24.67
2000  26.09  26.66  29.61  32.12  34.01  32.98  36.38  35.90  36.42  34.04  31.29  28.50
2001  28.43  29.92  33.44  34.56  34.22  38.91  41.31  38.89  40.90  38.27  32.02  29.78
Construct seasonal indicators: x_1, x_2, …, x_12

January (1998-2001):  x_1 = 1, x_2 = 0, x_3 = 0, …, x_12 = 0
February (1998-2001): x_1 = 0, x_2 = 1, x_3 = 0, …, x_12 = 0, etc.
December (1998-2001): x_1 = 0, x_2 = 0, x_3 = 0, …, x_12 = 1

The data matrix then has the following layout (first four and last two rows shown):

sales   time   x1  x2  x3  x4  ...  x11  x12
20.33      1    1   0   0   0  ...    0    0
20.96      2    0   1   0   0  ...    0    0
23.06      3    0   0   1   0  ...    0    0
24.48      4    0   0   0   1  ...    0    0
...
32.02     47    0   0   0   0  ...    1    0
29.78     48    0   0   0   0  ...    0    1
Use 11 indicators, e.g. x_1, …, x_11, in the regression model.
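As a usage example of the fit_seasonal_regression sketch given earlier (an illustration only, not part of the slides), fitting the 48 sales values with December as baseline season should roughly reproduce the Minitab coefficients shown below:

import numpy as np

sales = np.array([
    20.33, 20.96, 23.06, 24.48, 25.47, 28.81, 30.32, 29.56, 30.01, 26.78, 23.75, 24.06,
    23.58, 24.61, 27.28, 27.69, 29.99, 30.87, 32.09, 34.53, 30.85, 30.24, 27.86, 24.67,
    26.09, 26.66, 29.61, 32.12, 34.01, 32.98, 36.38, 35.90, 36.42, 34.04, 31.29, 28.50,
    28.43, 29.92, 33.44, 34.56, 34.22, 38.91, 41.31, 38.89, 40.90, 38.27, 32.02, 29.78])
coef = fit_seasonal_regression(sales, L=12)   # season 12 (December) is the baseline
print(np.round(coef, 2))                      # expected roughly 18.86, 0.26, 0.75, 1.42, ...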
Regression Analysis: sales versus time, x1, ...
The regression equation is sales = 18.9 + 0.263 time + 0.750 x1 + 1.42 x2 + 3.96 x3 + 5.07 x4 + 6.01 x5
+ 7.72 x6 + 9.59 x7 + 9.02 x8 + 8.58 x9 + 6.11 x10 + 2.24 x11
Predictor Coef SE Coef T P
Constant 18.8583 0.6467 29.16 0.000
time 0.26314 0.01169 22.51 0.000
x1 0.7495 0.7791 0.96 0.343
x2 1.4164 0.7772 1.82 0.077
x3 3.9632 0.7756 5.11 0.000
x4 5.0651 0.7741 6.54 0.000
x5 6.0120 0.7728 7.78 0.000
x6 7.7188 0.7716 10.00 0.000
x7 9.5882 0.7706 12.44 0.000
x8 9.0201 0.7698 11.72 0.000
x9 8.5819 0.7692 11.16 0.000
x10 6.1063 0.7688 7.94 0.000
x11 2.2406 0.7685 2.92 0.006
S = 1.087 R-Sq = 96.6% R-Sq(adj) = 95.5%
Analysis of Variance
Source DF SS MS F P
Regression 12 1179.818 98.318 83.26 0.000
Residual Error 35 41.331 1.181
Total 47 1221.150
Source DF Seq SS
time 1 683.542
x1 1 79.515
x2 1 72.040
x3 1 16.541
x4 1 4.873
x5 1 0.204
x6 1 10.320
x7 1 63.284
x8 1 72.664
x9 1 100.570
x10 1 66.226
x11 1 10.039
Unusual Observations
Obs time sales Fit SE Fit Residual St Resid
12 12.0 24.060 22.016 0.583 2.044 2.23R
21 21.0 30.850 32.966 0.548 -2.116 -2.25R
R denotes an observation with a large standardized residual
Predicted Values for New Observations
New Obs Fit SE Fit 95.0% CI 95.0% PI
1 32.502 0.647 ( 31.189, 33.815) ( 29.934, 35.069)
Values of Predictors for New Observations
New Obs time x1 x2 x3 x4 x5 x6
1 49.0 1.00 0.000000 0.000000 0.000000 0.000000 0.000000
New Obs x7 x8 x9 x10 x11
1 0.000000 0.000000 0.000000 0.000000 0.000000
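The new observation corresponds to time = 49 with x_1 = 1 (January 2002) and all other indicators equal to 0, so the fit can be checked directly from the estimated coefficients:

\hat{y}_{49} = 18.8583 + 0.26314 \cdot 49 + 0.7495 \approx 32.50

which matches the predicted value 32.502 in the output above.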
[Figure: Sales figures with the predicted value, plotted by month]
What about serial correlation in the data?
• Positive serial correlation: values follow a smooth pattern
• Negative serial correlation: values show a "thorny" pattern
How can it be detected? Use the residuals:
e_t = y_t - \hat{y}_t = y_t - \left( \hat{\beta}_0 + \hat{\beta}_1 t + \sum_{j=1}^{11} \hat{\beta}_{s,j} x_{j,t} \right), \quad t = 1, \ldots, 48
[Figure: Residual plot from the regression analysis, residuals vs. month number (from January 1998)]
Smooth or thorny?
Durbin-Watson test on residuals:

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
Rule of thumb:
If d < 1 or d > 3, the conclusion is that the residuals (and the original data) are correlated. Use the shape of the residual plot (smooth or thorny) to decide whether the correlation is positive or negative. (More thorough rules for comparisons and for deciding between positive and negative correlation exist.)
Durbin-Watson statistic = 2.05 (given in the Minitab output)
Since 1 < d < 3: no significant serial correlation in the residuals.
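A minimal Python sketch of this calculation, assuming the residuals are held in a NumPy array e (statsmodels also ships a durbin_watson function that could be used instead):

import numpy as np

def durbin_watson(e: np.ndarray) -> float:
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2"""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Values of d near 2 indicate little serial correlation; d < 1 or d > 3 suggests correlation.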
• Decompose – analyse the observed time series into its different components:
  – Trend part (TR)
  – Seasonal part (SN)
  – Cyclical part (CL)
  – Irregular part (IR)
Cyclical part: the state of the market in economic time series; in environmental series it is usually estimated together with TR.
• Multiplicative model: y_t = TR_t · SN_t · CL_t · IR_t
  – Suitable for economic indicators
  – The level is present in TR_t or in TC_t = (TR·CL)_t, while SN_t, IR_t (and CL_t) work as indices
  – Seasonal variation increases with the level of y_t
• Additive model: y_t = TR_t + SN_t + CL_t + IR_t
  – More suitable for environmental data
  – Requires constant seasonal variation
  – SN_t, IR_t (and CL_t) vary around 0
Example 1: Sales data
[Figure: Sales figures, January 1998 - December 2001]
[Figure: Observed (blue) and deseasonalised (magenta) series]
[Figure: Observed (blue) and theoretical trend (magenta)]
[Figure: Observed (blue) with estimated trendline (black)]
Example 2:
Estimation of components, working scheme
1. Seasonal adjustment / deseasonalisation:
• SN_t usually has the largest amount of variation among the components.
• The time series is deseasonalised by calculating centred and weighted moving averages:

M_t = \frac{\tfrac{1}{2} y_{t-L/2} + y_{t-(L/2-1)} + \cdots + y_t + \cdots + y_{t+(L/2-1)} + \tfrac{1}{2} y_{t+L/2}}{L}

where L = number of seasons within a year (L = 2 for half-year data, 4 for quarterly data and 12 for monthly data).
– M_t becomes a rough estimate of (TR·CL)_t.
– Rough seasonal components are obtained by
  • y_t / M_t in a multiplicative model
  • y_t - M_t in an additive model
– Mean values of the rough seasonal components are calculated for each season separately, giving L means.
– The L means are adjusted to
  • have an exact average of 1 (i.e. their sum equals L) in a multiplicative model
  • have an exact average of 0 (i.e. their sum equals zero) in an additive model
– Final estimates of the seasonal components are set to these adjusted means and are denoted sn_1, …, sn_L.
– The time series is now deseasonalised by
  • y*_t = y_t / sn_t in a multiplicative model
  • y*_t = y_t - sn_t in an additive model
  where sn_t is the value among sn_1, …, sn_L corresponding to the season that time point t represents.
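For illustration only (not part of the original slides), step 1 for an additive model with monthly data could be sketched in Python as follows; the function name and the assumption that the series starts in season 1 are mine:

import numpy as np
import pandas as pd

def deseasonalise_additive(y: pd.Series, L: int = 12):
    """Step 1 of classical decomposition (additive model): centred MA, seasonal components, y*."""
    n = len(y)
    season = np.arange(n) % L                       # assumes the series starts in season 1
    # Centred, weighted moving average M_t (rough estimate of (TR + CL)_t)
    w = np.r_[0.5, np.ones(L - 1), 0.5] / L
    M = y.rolling(L + 1, center=True).apply(lambda v: np.sum(w * v))
    rough = y - M                                   # rough seasonal components (plus noise)
    means = np.array([rough[season == m].mean() for m in range(L)])
    sn = means - means.mean()                       # adjust the L means to sum to zero
    y_star = y - sn[season]                         # deseasonalised series
    return y_star, sn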
2. The seasonally adjusted values are used to estimate the trend component and occasionally the cyclical component.

If no cyclical component is present:
• Apply simple linear regression to the seasonally adjusted values ⇒ estimates tr_t of a linear or quadratic trend component.
• The residuals from the regression fit constitute estimates ir_t of the irregular component.

If a cyclical component is present:
• Estimate the trend and cyclical component as a whole (do not split them) by

tc_t = \frac{y^*_{t-m} + y^*_{t-(m-1)} + \cdots + y^*_{t-1} + y^*_t + y^*_{t+1} + \cdots + y^*_{t+m}}{2m+1}

  i.e. a non-weighted centred moving average of length 2m+1 calculated over the seasonally adjusted values.
– Common values for 2m+1: 3, 5, 7, 9, 11, 13.
– The choice of m is based on properties of the final estimate of IR_t, which is calculated as
  • ir_t = y*_t / tc_t in a multiplicative model
  • ir_t = y*_t - tc_t in an additive model
– m is chosen so as to minimise the serial correlation and the variance of ir_t.
– 2m+1 is called the (number of) points of the moving average.
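Continuing the illustrative additive sketch above (names are assumptions, not from the slides), step 2 with a cyclical component present might look like this:

def trend_cycle_and_irregular(y_star: pd.Series, m: int):
    """Non-weighted centred moving average of length 2m+1 over y*, plus the irregular part."""
    tc = y_star.rolling(2 * m + 1, center=True).mean()
    ir = y_star - tc                                # additive model: ir_t = y*_t - tc_t
    return tc, ir

# Inspect the variance and lag-1 serial correlation of ir_t to choose m:
# for m in (1, 2, 3, 4, 5, 6):                      # i.e. 2m+1 = 3, 5, ..., 13
#     tc, ir = trend_cycle_and_irregular(y_star, m)
#     print(2 * m + 1, ir.var(), ir.autocorr(lag=1))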
Example, cont.: Home sales data

Minitab can be used for decomposition via Stat → Time series → Decomposition.
Choice of model type: there is an option to choose between the two models (additive or multiplicative).
Time Series Decomposition
Data Sold
Length 47,0000
NMissing 0
Trend Line Equation
Yt = 5,77613 + 4,30E-02*t
Seasonal Indices
Period Index
1 -4,09028
2 -4,13194
3 0,909722
4 -1,09028
5 3,70139
6 0,618056
7 4,70139
8 4,70139
9 -1,96528
10 0,118056
11 -1,29861
12 -2,17361
Accuracy of Model
MAPE: 16,4122
MAD: 0,9025
MSD: 1,6902
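Outside Minitab, a comparable though not identical moving-average based decomposition is available in Python's statsmodels; a sketch assuming the monthly series is stored in a pandas Series called sold:

from statsmodels.tsa.seasonal import seasonal_decompose

# Additive, moving-average based decomposition with 12 seasons per year
result = seasonal_decompose(sold, model="additive", period=12)
trend, seasonal, resid = result.trend, result.seasonal, result.resid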
Deseasonalised data have been stored in a column with the heading DESE1. Moving averages of this column can be calculated via Stat → Time series → Moving average, where 2m+1 is chosen.
[Figure: TC component with 2m+1 = 3 (blue)]
The MSD should be kept as small as possible. By saving the residuals from the moving averages we can calculate the MSD and the serial correlation for each choice of 2m+1:
2m+1               3        5        7        9        11       13
MSD                1.817    1.577    1.564    1.602    1.542    1.612
Corr(e_t, e_t-1)  -0.444   -0.473   -0.424   -0.396   -0.431   -0.405
A 7-point or 9-point moving average seems most reasonable.
Serial correlations are easily calculated via Stat → Time series → Lag followed by Stat → Basic statistics → Correlation, or manually in the Session window:

MTB > lag 'RESI4' c50
MTB > corr 'RESI4' c50
Analysis with multiplicative model:
Time Series Decomposition
Data Sold
Length 47,0000
NMissing 0
Trend Line Equation
Yt = 5,77613 + 4,30E-02*t
Seasonal Indices
Period Index
1 0,425997
2 0,425278
3 1,14238
4 0,856404
5 1,52471
6 1,10138
7 1,65646
8 1,65053
9 0,670985
10 1,02048
11 0,825072
12 0,700325
Accuracy of Model
MAPE: 16,8643
MAD: 0,9057
MSD: 1,6388
Classical decomposition, summary
Multiplicative model: y_t = TR_t · SN_t · CL_t · IR_t
Additive model: y_t = TR_t + SN_t + CL_t + IR_t
Deseasonalisation
• Estimate the trend + cyclical component by a centred moving average:

CMA_t = \frac{\tfrac{1}{2} y_{t-L/2} + y_{t-(L/2-1)} + \cdots + y_t + \cdots + y_{t+(L/2-1)} + \tfrac{1}{2} y_{t+L/2}}{L}

where L is the number of seasons (e.g. 12, 4, 2).
• Filter out the seasonal and error (irregular) components:
  – Multiplicative model: sn_t · ir_t = y_t / CMA_t
  – Additive model: sn_t + ir_t = y_t - CMA_t
Calculate monthly averages
  – Multiplicative model: \bar{sn}_m = \frac{1}{n_m} \sum_{l \in \text{season } m} sn_l \cdot ir_l
  – Additive model: \bar{sn}_m = \frac{1}{n_m} \sum_{l \in \text{season } m} (sn_l + ir_l)
for seasons m = 1, …, L, where n_m is the number of observations belonging to season m.
Normalise the monthly means
  – Multiplicative model: sn_m = \bar{sn}_m \Big/ \left( \frac{1}{L} \sum_{l=1}^{L} \bar{sn}_l \right) = L\,\bar{sn}_m \Big/ \sum_{l=1}^{L} \bar{sn}_l
  – Additive model: sn_m = \bar{sn}_m - \frac{1}{L} \sum_{l=1}^{L} \bar{sn}_l
Deseasonalise
  – Multiplicative model: d_t = y_t / sn_t
  – Additive model: d_t = y_t - sn_t
where sn_t = sn_m for the month m that time point t belongs to.
Fit a trend function to the deseasonalised data, tr_t = f(t), and detrend:
  – Multiplicative model: cl_t · ir_t = d_t / tr_t
  – Additive model: cl_t + ir_t = d_t - tr_t
Estimate the cyclical component and separate it from the error component
  – Multiplicative model:
    cl_t = \frac{(cl \cdot ir)_{t-k} + (cl \cdot ir)_{t-(k-1)} + \cdots + (cl \cdot ir)_t + \cdots + (cl \cdot ir)_{t+k}}{2k+1}, \qquad ir_t = \frac{(cl \cdot ir)_t}{cl_t}
  – Additive model:
    cl_t = \frac{(cl + ir)_{t-k} + (cl + ir)_{t-(k-1)} + \cdots + (cl + ir)_t + \cdots + (cl + ir)_{t+k}}{2k+1}, \qquad ir_t = (cl + ir)_t - cl_t
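To tie the summary together, here is an illustrative sketch of the multiplicative scheme in Python (pandas assumed; the function name, the choice of a linear trend and the default k are assumptions mirroring the additive sketches earlier, not a prescribed implementation):

import numpy as np
import pandas as pd

def classical_decompose_multiplicative(y: pd.Series, L: int = 12, k: int = 3) -> pd.DataFrame:
    """Multiplicative classical decomposition: y_t = TR_t * SN_t * CL_t * IR_t (rough sketch)."""
    n = len(y)
    season = np.arange(n) % L                      # assumes the series starts in season 1
    # Centred, weighted moving average CMA_t (estimate of trend * cycle)
    w = np.r_[0.5, np.ones(L - 1), 0.5] / L
    cma = y.rolling(L + 1, center=True).apply(lambda v: np.sum(w * v))
    sn_ir = y / cma                                # sn_t * ir_t
    sn_bar = np.array([sn_ir[season == m].mean() for m in range(L)])
    sn = sn_bar * L / sn_bar.sum()                 # normalise: the L indices average to 1
    d = y / sn[season]                             # deseasonalised series
    # Linear trend tr_t = b0 + b1*t fitted to the deseasonalised data
    t = np.arange(1, n + 1)
    b1, b0 = np.polyfit(t, d, 1)
    tr = b0 + b1 * t
    cl_ir = d / tr                                 # cl_t * ir_t
    cl = cl_ir.rolling(2 * k + 1, center=True).mean()   # (2k+1)-point moving average
    ir = cl_ir / cl
    return pd.DataFrame({"trend": tr, "seasonal": sn[season], "cycle": cl, "irregular": ir},
                        index=y.index)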