Some more issues of time series analysis
Time series regression with modelling of error terms
In a time series regression model

$$ y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \sum_{j=1}^{L-1} \beta_{s,j}\, x_{s,j} + \varepsilon_t $$

the error terms are tentatively assumed to be independent and identically
distributed.
Is this wise?
By performing e.g. the Durbin-Watson test we can quite easily check whether the
error terms are serially correlated or not.
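As a minimal sketch, the Durbin-Watson statistic can be computed directly from the residuals as d = Σ(e_t – e_{t–1})² / Σe_t². Values near 2 are consistent with no first-order serial correlation, values well below 2 point to positive correlation, and values well above 2 to negative correlation. The residual series below are hypothetical, and the critical values (which come from Durbin-Watson tables) are not computed here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only
smooth = [1.0, 0.8, 0.6, 0.2, -0.3, -0.7, -0.9, -0.6, -0.1, 0.4]   # slowly varying
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]                     # sign flips each step

d_smooth = durbin_watson(smooth)            # well below 2: positive serial correlation
d_alternating = durbin_watson(alternating)  # well above 2: negative serial correlation
print(d_smooth, d_alternating)
```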
What if D-W gives evidence of serial correlation in the error terms?
Apply an AR(p) model to the error terms at the same time as the rest of the model
is fitted.
Standard procedure:
• Study the residuals from an ordinary regression fit
• Identify which order p of the AR model is the most appropriate
for the error terms.
• Make the fit of the combined regression-AR-model
Estimation can no longer be done using ordinary least-squares.
Instead the conditional least-squares method is used.
Procedures for this are not currently available in Minitab, but they are in more
comprehensive computer packages such as SAS and SPSS.
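To make the idea of conditional least squares concrete, here is a rough sketch for the simplest case: a linear trend y_t = b0 + b1·t + e_t with AR(1) errors e_t = φ·e_{t–1} + a_t. It minimises the conditional sum of squares Σ_{t≥2} a_t² by a grid search over φ (a Hildreth-Lu style search), with b0 and b1 obtained by least squares on the quasi-differenced data for each φ. This is not how SAS or SPSS do it internally; the function name, the grid and the artificial test series are all assumptions made for illustration.

```python
def fit_trend_ar1(y, steps=100):
    """Fit y_t = b0 + b1*t + e_t, e_t = phi*e_{t-1} + a_t (t = 1..n) by
    minimising the conditional sum of squares sum_{t>=2} a_t^2 over a phi grid."""
    n = len(y)
    best = None
    for k in range(-steps + 1, steps):       # phi = k/steps lies strictly inside (-1, 1)
        phi = k / steps
        # Quasi-differenced quantities for t = 2..n (y[0] is the value at t = 1)
        w = [y[i] - phi * y[i - 1] for i in range(1, n)]
        u = [1.0 - phi] * (n - 1)                      # transformed intercept column
        v = [(i + 1) - phi * i for i in range(1, n)]   # transformed time column
        # Normal equations of the no-intercept regression of w on (u, v)
        suu = sum(a * a for a in u)
        suv = sum(a * b for a, b in zip(u, v))
        svv = sum(b * b for b in v)
        suw = sum(a * c for a, c in zip(u, w))
        svw = sum(b * c for b, c in zip(v, w))
        det = suu * svv - suv * suv
        b0 = (suw * svv - suv * svw) / det
        b1 = (suu * svw - suv * suw) / det
        sse = sum((c - b0 * a - b1 * b) ** 2 for a, b, c in zip(u, v, w))
        if best is None or sse < best["sse"]:
            best = {"phi": phi, "b0": b0, "b1": b1, "sse": sse}
    return best

# Artificial series: the "error" 3*0.6**t satisfies e_t = 0.6*e_{t-1} exactly,
# so the conditional sum of squares is (numerically) zero at phi = 0.6.
y = [2 + 0.5 * t + 3 * 0.6 ** t for t in range(1, 31)]
fit = fit_trend_ar1(y)
print(fit["phi"], fit["b0"], fit["b1"])
```

A grid search is slow but transparent; a real fit with seasonal dummies would use a numerical optimiser and matrix algebra rather than the 2×2 normal equations written out here.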
Example
Consider again the Hjälmaren month data set (that is used in assignments for weeks
36, 39 and 41)
[Figure: Time Series Plot of Discharge.m (Discharge.m plotted against month and year)]
Minitab output from an ordinary time series regression:
The regression equation is
Discharge.m = 83.1 - 0.0300 Time.m + 2.79 Jan + 6.36 Feb + 7.89 Mar + 16.1 Apr
+ 12.2 May - 5.06 Jun - 10.9 Jul - 10.1 Aug - 10.3 Sep - 10.1 Oct
- 4.64 Nov
Predictor      Coef     SE Coef      T       P
Constant      83.13     33.60      2.47   0.013
Time.m        -0.03000   0.01727  -1.74   0.083
Jan            2.795     2.613     1.07   0.285
Feb            6.359     2.613     2.43   0.015
Mar            7.887     2.613     3.02   0.003
Apr           16.145     2.613     6.18   0.000
May           12.228     2.613     4.68   0.000
Jun           -5.059     2.613    -1.94   0.053
Jul          -10.938     2.613    -4.19   0.000
Aug          -10.144     2.613    -3.88   0.000
Sep          -10.278     2.613    -3.93   0.000
Oct          -10.138     2.613    -3.88   0.000
Nov           -4.638     2.613    -1.77   0.076

S = 19.1121   R-Sq = 18.8%   R-Sq(adj) = 18.1%
Residual plots
[Figure: Time Series Plot of RESI1]
[Figure: Autocorrelation Function for RESI1 (with 5% significance limits for the autocorrelations), lags 1-80]
[Figure: Partial Autocorrelation Function for RESI1 (with 5% significance limits for the partial autocorrelations), lags 1-80]
Residuals seem to follow an AR model of order 1 or 2.
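The order of the AR model is read off the sample autocorrelation and partial autocorrelation functions of the residuals: for an AR(p) process the partial autocorrelations are expected to cut off after lag p while the autocorrelations die out gradually. A minimal sketch of how the two functions can be computed follows; the partial autocorrelations use the Durbin-Levinson recursion. The function names and the short input series are hypothetical and stand in for the residual column RESI1.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 = 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag
    via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = []
    prev = []                      # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            cur = [phi_kk]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf

# Hypothetical series standing in for the residuals
resid = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_acf(resid, 2)
pacf = sample_pacf(resid, 2)
print(acf, pacf)
```

As a rough guide, values outside about ±1.96/√n are usually taken as significant at the 5% level; the exact limits drawn by a package such as Minitab may be computed differently.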
SPSS output of a regression analysis with error term modelled as AR(1)
FINAL PARAMETERS:
Number of residuals   1284
Standard error        15.210763
Log likelihood        -5310.1953
AIC                   10648.391
SBC                   10720.599

Variables in the Model:
                   B          SEB      T-RATIO   APPROX. PROB.
AR1           .605651      .022323    27.130704     .00000000
JAN          2.641382     1.644536     1.606156     .10848820
FEB          6.239922     2.077007     3.004285     .00271421
MAR          7.788472     2.295270     3.393270     .00071191
APR         16.059974     2.411374     6.660092     .00000000
MAY         12.151510     2.468703     4.922224     .00000097
JUN         -5.129816     2.485909    -2.063557     .03926251
JUL        -11.003682     2.468291    -4.458016     .00000900
AUG        -10.204025     2.410445    -4.233254     .00002469
SEP        -10.331080     2.293586    -4.504335     .00000727
OCT        -10.180902     2.074100    -4.908587     .00000104
NOV         -4.664558     1.639306    -2.845447     .00450615
TIME         -.031821      .034726     -.916338     .35966355
CONSTANT    86.726889    67.485642     1.285116     .19898601

Note: Variance of pure error term smaller than variance of error term in ordinary regression!
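The two outputs can be checked against each other with a little arithmetic. For a stationary AR(1) error term, Var(ε_t) = σ_a² / (1 – φ²), where σ_a² is the variance of the pure error term. The sketch below assumes that the SPSS "Standard error" is the estimated standard deviation of the pure error term; under that assumption the implied standard deviation of ε_t should land close to S from the ordinary regression.

```python
import math

phi = 0.605651        # AR1 coefficient from the SPSS output
sigma_a = 15.210763   # SPSS "Standard error" (assumed to be the sd of the pure error term)
s_ols = 19.1121       # S from the ordinary Minitab regression

# Stationary AR(1): Var(eps_t) = sigma_a^2 / (1 - phi^2)
implied_sd = sigma_a / math.sqrt(1.0 - phi ** 2)
print(round(implied_sd, 2), s_ols)
```

The implied value is about 19.12, close to S = 19.1121, which suggests that the AR(1) term accounts for the drop in error standard deviation from about 19.1 to about 15.2.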
Non-parametric tests for trend
All models so far taken up in the course are parametric models.
Parametric models assume that a specific probability distribution (i.e. the normal
distribution) governs the obtained observations, and that the population mean value
of each observation can be expressed in terms of the parameters of the model.
What if we cannot specify this probability distribution?
• Least-squares fitting of time series regression models can still be done, but none of
the significance tests are valid ⇒ We cannot test for the presence of a trend (nor for
the presence of seasonal variation)
• Classical decomposition is still possible, but it has no significance tests built in
(its methods are all descriptive analysis tools)
• Conditional least-squares estimation in ARIMA models is not valid, as it
emerges from the assumption that the observations are normally distributed. As a
consequence the significance tests are not valid.
The Mann-Kendall test for a monotonic trend
Example:
Look again at the data set of sales values from lecture 3, but with restriction to the
years 1985-1996
Year   Sales values
1985   151
1986   151
1987   147
1988   149
1989   146
1990   142
1991   143
1992   145
1993   141
1994   143
1995   145
1996   138
[Figure: Sales values plotted against year, 1985-1996]
Could there be a trend in data?
If there is a trend, we do not assume that it has a specific functional form, such as
linear or quadratic; we just assume it is monotonic, i.e. decreasing or increasing.
In this case it would be a decreasing trend.
The sign function:

$$ \operatorname{sgn}(x) = \begin{cases} 1 & x > 0 \\ -1 & x < 0 \\ 0 & x = 0 \end{cases} $$

Now define the Mann-Kendall test statistic as

$$ T = \sum_{i<j} \operatorname{sgn}(y_j - y_i) $$

i.e. the statistic is a sum of +1:s, –1:s and 0:s depending on whether yj is higher
than, lower than or equal to yi for each pair of time points (i, j : i < j) .
Large positive values of T would then be consistent with an upward trend
Large negative values of T would be consistent with a downward trend
Values around 0 of T would be consistent with no trend
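The definition translates almost directly into code. The sketch below (function names are our own) sums the signs of y_j – y_i over all pairs i < j and applies it to the twelve sales values listed in the table above.

```python
def sgn(x):
    """Sign function: 1 for x > 0, -1 for x < 0, 0 for x = 0."""
    return (x > 0) - (x < 0)

def mann_kendall_T(y):
    """Mann-Kendall statistic: sum of sgn(y_j - y_i) over all pairs i < j."""
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))

# Sales values for 1985-1996, taken from the table above
sales = [151, 151, 147, 149, 146, 142, 143, 145, 141, 143, 145, 138]
T = mann_kendall_T(sales)
print(T)  # -43
```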
For the current data set, T = –43.
Now, is T = –43 sufficiently large and negative to show evidence of a trend?
The original non-parametric procedure:
• Calculate all possible values of T by letting each difference yj – yi , i < j take in
turn the signs –1, 0 and 1.
• (Put these values in ascending order)
• For the test of H0: No trend vs. HA : Negative monotonic trend at the level of
significance $\alpha$, calculate the $(100\alpha)$th percentile of the (ordered) values ⇒ $T_\alpha$
• If the observed T is < $T_\alpha$, reject H0 , otherwise “accept” H0
For a fairly long time series this procedure is quite tedious.
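One common way to carry out such an exact test is to treat all orderings of the observed values as equally likely under H0 and enumerate them. The sketch below does this for a short hypothetical series and returns the lower-tail probability P(T ≤ observed T), which is compared with α; this is equivalent to comparing T with a percentile. It is an illustration under that assumption, not necessarily the exact enumeration described in the bullets above.

```python
from itertools import permutations

def sgn(x):
    return (x > 0) - (x < 0)

def mann_kendall_T(y):
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))

def exact_lower_p(y):
    """P(T <= observed T) when every ordering of the values is equally likely."""
    t_obs = mann_kendall_T(y)
    count = total = 0
    for perm in permutations(y):       # n! orderings: only feasible for small n
        total += 1
        if mann_kendall_T(perm) <= t_obs:
            count += 1
    return count / total

# Hypothetical short series (6 values, so 720 orderings)
short = [12, 9, 10, 7, 5, 6]
p_short = exact_lower_p(short)
print(mann_kendall_T(short), p_short)
```

With 12 observations there are already 12! = 479,001,600 orderings, which shows why the enumeration becomes tedious.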
Approximate solution:
The variance of T can be shown to be

$$ \mathrm{Var}(T) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2 t_p + 5) \right] $$

where n is the length of the time series, g is the number of so-called ties (a tie
is a value that occurs more than once) and tp is the number of observations sharing
the value of tie p.


Then for fairly large n

$$ \frac{T}{\sqrt{\mathrm{Var}(T)}} \ \text{is approx. } N(0,1) \text{ if } H_0 \text{ is true} $$

For the current time series of sales values:
n = 12 (the twelve years 1985-1996)
g = 3 (the values 143, 145 and 151 each occur twice)
⇒ t1 = t2 = t3 = 2
⇒ Var (T ) = (1/18)(12·11·29 – 3·(2·1·9)) = 3774/18 ≈ 209.67
⇒ T / √Var(T ) = –43 / √209.67 ≈ –2.970
⇒ P-value is approx. 0.0015
Thus H0 may be rejected at the conventional levels of significance (5% and 1%)
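The normal approximation can be sketched as follows. The code evaluates the variance formula with n taken as the number of listed observations (12 for the years 1985-1996) and the tie sizes counted from the data, then computes the standardised statistic and the one-sided (lower-tail) P-value. Function names are our own; the normal distribution function is built from math.erfc.

```python
import math
from collections import Counter

def sgn(x):
    return (x > 0) - (x < 0)

def mann_kendall_T(y):
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))

def mann_kendall_var(y):
    """Var(T) = [n(n-1)(2n+5) - sum over ties of t(t-1)(2t+5)] / 18."""
    n = len(y)
    tie_sizes = [c for c in Counter(y).values() if c > 1]
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

sales = [151, 151, 147, 149, 146, 142, 143, 145, 141, 143, 145, 138]
T = mann_kendall_T(sales)
var_T = mann_kendall_var(sales)
z = T / math.sqrt(var_T)
p_lower = normal_cdf(z)      # one-sided P-value for HA: negative monotonic trend
print(T, round(var_T, 2), round(z, 3), round(p_lower, 4))
```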
For time series with seasonal variation, Hirsch & Slack have developed a
modification of the Mann-Kendall test with test statistic

$$ T_S = \sum_{k=1}^{L} T_k $$

where Tk is the Mann-Kendall test statistic for the time series consisting of values
from season k only (e.g. for monthly data we consider the series of January values,
the series of February values etc.)
Expressions for the variance of TS can be derived and, analogously to the
Mann-Kendall test,

$$ \frac{T_S}{\sqrt{\mathrm{Var}(T_S)}} \ \text{is approx. } N(0,1) \text{ if } H_0 \text{ is true} $$
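A minimal sketch of the seasonal statistic follows. It splits the series into its seasonal sub-series, sums their Mann-Kendall statistics and, as a simplifying assumption that is not taken from the text above, sums the per-season variances as if the seasons were independent. The variance expression that accounts for dependence between seasons is more involved and is not reproduced here. The data are hypothetical, with two seasons per year.

```python
import math
from collections import Counter

def sgn(x):
    return (x > 0) - (x < 0)

def mann_kendall_T(y):
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))

def mann_kendall_var(y):
    n = len(y)
    tie_term = sum(t * (t - 1) * (2 * t + 5)
                   for t in Counter(y).values() if t > 1)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

def seasonal_mann_kendall(y, period):
    """T_S = sum of T_k over seasons k; the variance is the sum of the
    per-season variances (ASSUMES independence between seasons)."""
    T_S = 0
    var_S = 0.0
    for k in range(period):
        season = y[k::period]          # values from season k only
        T_S += mann_kendall_T(season)
        var_S += mann_kendall_var(season)
    z = T_S / math.sqrt(var_S)
    return T_S, var_S, z

# Hypothetical series with two seasons per year: both seasons increase over time
y = [1, 5, 2, 6, 3, 7]
T_S, var_S, z = seasonal_mann_kendall(y, period=2)
print(T_S, var_S, z)
```

Note that the ordinary statistic on the full series would mix the seasonal level shifts into the pair comparisons, whereas T_S only compares values within the same season.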