NCSU Statistics

Seasonal Unit Root Tests
in Long Periodicity Cases
D. A. Dickey
Ying Zhang
Natural gas (a colorless, odorless, gaseous hydrocarbon) may be stored in a number of different ways. It is most commonly held in inventory underground, under pressure, in three types of facilities: (1) depleted reservoirs in oil and/or gas fields, (2) aquifers, and (3) salt cavern formations. (Natural gas is also stored in liquid form in above-ground tanks.)
1. Regression with Time Series Errors
Y(t) = a + bt + seasonal effects + Z(t),
where Z(t) is a stationary time series
Seasonal effects:
Sinusoids
Seasonal dummy variables
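Both kinds of seasonal effect enter the regression as extra columns. A minimal sketch of how such columns could be built, assuming periods are numbered t = 1, 2, ... with season d as the dummy baseline; the helper names `seasonal_dummies` and `fourier_terms` are hypothetical (the slides name the natural gas sinusoid columns s1, c1, s2, c2 but do not show how they were constructed):

```python
import math

def seasonal_dummies(t, d):
    """Dummy columns X_1..X_{d-1} for period t = 1, 2, ...; season d is the baseline."""
    season = (t - 1) % d + 1
    return [1 if season == j else 0 for j in range(1, d)]

def fourier_terms(t, d, harmonics):
    """Columns sin(2*pi*j*t/d), cos(2*pi*j*t/d) for j = 1..harmonics (fundamental first)."""
    row = []
    for j in range(1, harmonics + 1):
        angle = 2 * math.pi * j * t / d
        row += [math.sin(angle), math.cos(angle)]
    return row

# Monthly dummies for March, and weekly sinusoids (fundamental + one harmonic) for week 5.
print(seasonal_dummies(3, 12))
print([round(v, 3) for v in fourier_terms(5, 52, 2)])
```

With d = 52 and two harmonics, `fourier_terms` returns four columns in the same sine/cosine, fundamental/harmonic pattern as s1, c1, s2, c2; whether the talk scaled t exactly this way is an assumption.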
2. Dynamic Seasonal Models
Y(t) = Y(t-d) + e(t)   (copy of last season)
Y(t) = Y(t-d) + e(t) – b e(t-d)   (EWMA of past seasons)
Y(t) = Y(t-1) + [Y(t-d)-Y(t-d-1)] + Z(t)
Z(t) = e(t)
PROC CUT&PASTE;
Z(t) = e(t) – a e(t-1) – b e(t-d) + ab e(t-d-1) “airline”
Z(t) = (1-aB)(1-bB^d) e(t)
Y(t) = Y(t-1) + [Y(t-d)-Y(t-d-1)] + e(t)
Y(t) = 10 + t + 8X3 – 8X5 – 5X8 – 5X9 – 5X10 + e(t)
Summary:
1. Both models can give the same predictions for pure trend + seasonal functions.
2. With data, the lag model looks back 1 year and ignores (or discounts) earlier years. Good for slowly changing seasonality.
3. With data, the dummy variable model weights all years equally. Good for very regular seasonality.
4. Forecast errors differ too!
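Summary point 1 can be checked directly: the operator (1 - B)(1 - B^d) annihilates a linear trend plus any period-d function, so the lag model Y(t) = Y(t-1) + [Y(t-d) - Y(t-d-1)] extends such a function exactly once future shocks are set to 0. A sketch using the deterministic part of the dummy-variable example above, assuming d = 12 (the slide does not state the period); the function names are hypothetical:

```python
D = 12  # assumed period (monthly data); not stated on the slide

def f(t):
    """Deterministic part 10 + t + 8X3 - 8X5 - 5X8 - 5X9 - 5X10 for period t = 1, 2, ..."""
    season = (t - 1) % D + 1
    effect = {3: 8, 5: -8, 8: -5, 9: -5, 10: -5}.get(season, 0)
    return 10 + t + effect

def lag_model_forecast(history, steps):
    """Extend history with Y(t) = Y(t-1) + [Y(t-D) - Y(t-D-1)], future shocks set to 0."""
    y = list(history)
    for _ in range(steps):
        y.append(y[-1] + y[-D] - y[-D - 1])
    return y[len(history):]

history = [f(t) for t in range(1, 3 * D + 1)]        # three years of exact values
forecasts = lag_model_forecast(history, 2 * D)        # two years ahead
exact = [f(t) for t in range(3 * D + 1, 5 * D + 1)]
print(forecasts == exact)  # True
```

With real data the two models part ways (points 2 and 3): the lag model only ever looks back through the most recent seasons, while the dummy fit averages all of them.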
[Figure: weekly natural gas data, unit root forecast]
[Figure: weekly natural gas data, seasonal dummy variable forecast]
A general seasonal model:
$Y_t - f(t) = \rho\,(Y_{t-d} - f(t-d)) + e_t$
f(t) = deterministic components
H0: $\rho = 1$
Under H0, period d functions are annihilated:
f periodic ⇒ $Y_t - Y_{t-d} = (\rho - 1)(Y_{t-d} - f(t-d)) + e_t$
$\rho = 1$ ⇒ $Y_t - Y_{t-d} = e_t$
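Use double subscripts:
Quarterly (d = 4) example, m years, f(t) = 0:
$Y_t = \rho Y_{t-4} + e_t$
$Y_1 = e_1$   ($Y_{1,1}$)
$Y_2 = e_2$   ($Y_{1,2}$)
$Y_3 = e_3$   ($Y_{1,3}$)
$Y_4 = e_4$   ($Y_{1,4}$)
$Y_5 = e_5 + \rho e_1$   ($Y_{2,1}$)
$Y_6 = e_6 + \rho e_2$   ($Y_{2,2}$)
$Y_7 = e_7 + \rho e_3$   ($Y_{2,3}$)
$Y_8 = e_8 + \rho e_4$   ($Y_{2,4}$)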
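The double-subscript bookkeeping is integer division: observation t (counting from 1) falls in year i and season s with t = d(i - 1) + s. A small sketch; the helper names are hypothetical:

```python
def year_season(t, d):
    """Map time index t = 1, 2, ... to (year i, season s) with t = d*(i - 1) + s."""
    i, s0 = divmod(t - 1, d)
    return i + 1, s0 + 1

def time_index(i, s, d):
    """Inverse map: (year i, season s) back to t."""
    return d * (i - 1) + s

# Quarterly example: Y5 is Y_{2,1} and Y8 is Y_{2,4}.
print(year_season(5, 4), year_season(8, 4))  # (2, 1) (2, 4)
```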
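$$m\sqrt{d}\,(\hat\rho - \rho) = \frac{d^{-1/2}\sum_{s=1}^{d}\left(m^{-1}\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s}\right)}{d^{-1}\sum_{s=1}^{d}\left(m^{-2}\sum_{i=1}^{m} Y_{i-1,s}^{2}\right)}$$
where $Y_{i,s} = Y_{d(i-1)+s}$ and $Y_{0,s} = 0$.
If $\rho = 1$:
$N_s = m^{-1}\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s} \xrightarrow{D} \sigma^2 \int_0^1 W(t)\,dW(t)$
$D_s = m^{-2}\sum_{i=1}^{m} Y_{i-1,s}^{2} \xrightarrow{D} \sigma^2 \int_0^1 W^2(t)\,dt$
Numerator is $d^{-1/2}\sum_{s=1}^{d} N_s = \sqrt{d}\,\bar N$
Denominator is $d^{-1}\sum_{s=1}^{d} D_s = \bar D$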
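The expression for m√d(ρ̂ - ρ) is an algebraic identity, not an approximation, so it can be checked on any simulated series. A minimal sketch, assuming a zero-start seasonal random walk (ρ = 1, σ² = 1) generated with Python's `random` module; all names are hypothetical:

```python
import math
import random

def seasonal_rw(m, d, rho=1.0, seed=1):
    """Return y[i][s] = Y_{i,s} for i = 0..m (Y_{0,s} = 0) and the shocks e[i][s]."""
    rng = random.Random(seed)
    y, e = [[0.0] * d], [[0.0] * d]
    for i in range(1, m + 1):
        shocks = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y.append([rho * y[i - 1][s] + shocks[s] for s in range(d)])
        e.append(shocks)
    return y, e

m, d = 20, 4
y, e = seasonal_rw(m, d)

# Direct least squares: regress Y_{i,s} on Y_{i-1,s} with no intercept.
sxy = sum(y[i - 1][s] * y[i][s] for i in range(1, m + 1) for s in range(d))
sxx = sum(y[i - 1][s] ** 2 for i in range(1, m + 1) for s in range(d))
rho_hat = sxy / sxx

# Channel pieces N_s and D_s as defined on the slide.
N = [sum(y[i - 1][s] * e[i][s] for i in range(1, m + 1)) / m for s in range(d)]
D = [sum(y[i - 1][s] ** 2 for i in range(1, m + 1)) / m ** 2 for s in range(d)]

lhs = m * math.sqrt(d) * (rho_hat - 1.0)
rhs = (sum(N) / math.sqrt(d)) / (sum(D) / d)
print(abs(lhs - rhs) < 1e-9)  # True
```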
Known unit root facts ($\sigma^2 = 1$):
(1) Moments (d = 1 case or individual terms):
$E\{N_s\} = 0$, $E\{D_s\} = (m-1)/(2m) \to 1/2$
$\mathrm{Var}\{N_s\} = (m-1)/(2m) \to 1/2$
$\mathrm{Var}\{D_s\} = (m-1)(m^2-m+1)/(3m^3) \to 1/3$
$\mathrm{Cov}\{N_s, D_s\} = (m-1)(m-2)/(3m^2) \to 1/3$
(2) Studentized statistic asymptotically equivalent to
(numerator sum) / (denominator sum)^{1/2}
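The finite-m moment formulas in (1) can be checked by simulation. A minimal sketch, assuming m = 10 and 20,000 replicates of a single channel (d = 1, σ² = 1); the names are hypothetical and agreement is only up to Monte Carlo error:

```python
import random
from statistics import fmean, pvariance

def n_and_d(m, rng):
    """One replicate of N = m^-1 sum Y_{i-1} e_i and D = m^-2 sum Y_{i-1}^2, random walk with Y_0 = 0."""
    y_prev, num, den = 0.0, 0.0, 0.0
    for _ in range(m):
        e = rng.gauss(0.0, 1.0)
        num += y_prev * e
        den += y_prev ** 2
        y_prev += e
    return num / m, den / m ** 2

m, reps = 10, 20_000
rng = random.Random(2024)
draws = [n_and_d(m, rng) for _ in range(reps)]
ns = [n for n, _ in draws]
ds = [dd for _, dd in draws]
mean_n, mean_d = fmean(ns), fmean(ds)

theory = {
    "E{D}": (m - 1) / (2 * m),
    "Var{N}": (m - 1) / (2 * m),
    "Var{D}": (m - 1) * (m * m - m + 1) / (3 * m ** 3),
    "Cov{N,D}": (m - 1) * (m - 2) / (3 * m ** 2),
}
simulated = {
    "E{D}": mean_d,
    "Var{N}": pvariance(ns),
    "Var{D}": pvariance(ds),
    "Cov{N,D}": fmean([(n - mean_n) * (dd - mean_d) for n, dd in draws]),
}
for key in theory:
    print(f"{key}: theory {theory[key]:.4f}, simulated {simulated[key]:.4f}")
```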
Basic idea is simple:
Large d ⇒ numerator approximately normal
Large d ⇒ denominator converges to E{denominator}
$$d^{-1/2}\sum_{s=1}^{d}\left[\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s}/m\right] \xrightarrow{D} N(0, 1/2)$$
$$d^{-1}\sum_{s=1}^{d}\left[\sum_{i=1}^{m} Y_{i-1,s}^{2}/m^{2}\right] \xrightarrow{P} 1/2$$
Ratio: $m\sqrt{d}\,(\hat\rho - 1) \xrightarrow{D} N(0, 2)$ (under H0)
t-statistic: numerator / (denominator)^{1/2} $\xrightarrow{D} N(0, 1)$
Nice proof, Grandpa!
As you see, I’m very excited.
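[Figure panels: d = 1, 2, 4, each compared with a normal distribution, N(Y, 1)]
[Figure: CDFs for d = 4, t statistic and N(0,1) (SAS); marks at -1.645, 0, 1.645]
[Figure: CDFs for d = 4, $m d^{1/2}(\hat\rho - 1)$ and N(0,2); marks at -2.386, 0, 2.386]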
Improving the Normal Approximation:
JASA paper (Dickey, Hasza, Fuller, 1984) gives the limit
distribution of the studentized statistic (d = 12):
5th %ile = -1.80
95th %ile = 1.52
50th %ile = -0.14 (Note: (1.52 - 1.80)/2 = -0.14 !!)
Difference: 1.52 + 1.80 = 3.32, vs. 2(1.645) = 3.29 (close !!)
Suggestion: shift by the median.
CLT ⇒ median of the limit distribution (d → ∞) is 0.
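The arithmetic behind "shift by the median" can be spelled out: if the studentized statistic is roughly N(0,1) shifted by its median, the DHF 5th and 95th percentiles should sit near median ∓ 1.645. A sketch using only the three DHF numbers quoted above and `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

p05, p50, p95 = -1.80, -0.14, 1.52   # DHF (1984) limit percentiles, d = 12
z95 = NormalDist().inv_cdf(0.95)      # about 1.645 for N(0, 1)

print(round((p95 + p05) / 2, 2))                  # midpoint of 5th and 95th: -0.14
print(round(p95 - p05, 2), round(2 * z95, 2))     # spread 3.32 vs. normal spread 3.29
print(round(p50 - z95, 3), round(p50 + z95, 3))   # shifted-normal 5th/95th: -1.785 1.505
```

The shifted normal misses the tabulated tails by about 0.015, which is the sense in which the median shift "improves the normal approximation".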
Median as function of seasonality d:
1. Get medians for d = 2, 4, 12 from DHF
2. Plot median vs. d^{-1/2} (d = 2, 4, 12, limit)
[Plot: median vs. 1/√d]
Regress median on d^{-1/2}:
Slope very close to -1/2,
Intercept very close to 0.
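The regression in step 2 is small enough to write out. A sketch using the DHF medians from the "med" column of the Median Shifts table (d = 2, 4, 12) plus the limiting value 0 at 1/√d = 0, with ordinary least squares computed explicitly:

```python
import math

d_values = [2, 4, 12, math.inf]
medians = [-0.35, -0.24, -0.14, 0.0]   # DHF medians; 0 is the CLT limit as d -> infinity

x = [0.0 if d == math.inf else 1 / math.sqrt(d) for d in d_values]
x_bar = sum(x) / len(x)
y_bar = sum(medians) / len(medians)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, medians))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
print(round(slope, 3), round(intercept, 3))  # -0.492 0.002
```

The fitted line is essentially median ≈ -1/(2√d), which is the shift column in the table that follows.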
Median Shifts and Tau Percentiles
(med = DHF median; p01, p025, p05, p10 = 1st, 2.5th, 5th, 10th percentiles)
d     med     -1/(2√d)   p01        p025       p05        p10
2     -0.35   -0.35355   -2.67990   -2.31352   -1.99841   -1.63510
4     -0.24   -0.25000   -2.57635   -2.20996   -1.89485   -1.53155
12    -0.14   -0.14434   -2.47069   -2.10430   -1.78919   -1.42589
inf    0.00    0         -2.32685   -1.96046   -1.64535   -1.28205
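For d = 2, 4, 12 each tabulated percentile equals the shift -1/(2√d) added to the corresponding N(0,1) percentile, so those rows can be regenerated in a few lines. A sketch using `statistics.NormalDist`; the function name is hypothetical:

```python
import math
from statistics import NormalDist

probs = [0.01, 0.025, 0.05, 0.10]
z = [NormalDist().inv_cdf(p) for p in probs]   # N(0,1) percentiles

def shifted_percentiles(d):
    """Approximate tau percentiles: N(0,1) percentile plus the median shift -1/(2*sqrt(d))."""
    shift = -1 / (2 * math.sqrt(d))
    return shift, [round(shift + zp, 5) for zp in z]

for d in (2, 4, 12):
    shift, pct = shifted_percentiles(d)
    print(d, round(shift, 5), pct)
```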
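Taylor:
$N_i$ = numerator terms, $D_i$ = denominator terms; $\bar N$, $\bar D$ = their means over the d channels
$\bar N \sim (\mu = 0,\ \sigma^2 = 1/(2d))$
$\bar D \sim (\mu = 1/2,\ \sigma^2 = 1/(3d))$
$\mathrm{Cov}(\bar N, \bar D) = 1/(3d)$
$t = \sqrt{d}\,\bar N/\sqrt{\bar D} = 0 + \sqrt{2d}\,\bar N + 0 - \sqrt{2d}\,\bar N(\bar D - 1/2) + \text{remainder}$
$E\{\sqrt{d}\,\bar N/\sqrt{\bar D}\} \approx 0 + 0 - \sqrt{2d}\,\mathrm{Cov}(\bar N, \bar D) = -\sqrt{2}/(3\sqrt{d}) = -0.4714/\sqrt{d}$
Could use $-1/(2\sqrt{d})$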
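Plugging the moments into the expansion gives E{t} ≈ -√2/(3√d). A short comparison of that value with the rounder -1/(2√d) and with the DHF medians from the Median Shifts table; the function name is hypothetical:

```python
import math

dhf_median = {2: -0.35, 4: -0.24, 12: -0.14}   # "med" column of the Median Shifts table

def taylor_mean(d):
    """Approximate E{t} = -sqrt(2d) * Cov(Nbar, Dbar), with Cov(Nbar, Dbar) = 1/(3d)."""
    return -math.sqrt(2 * d) / (3 * d)

for d, med in dhf_median.items():
    simple = -1 / (2 * math.sqrt(d))
    print(f"d={d:2d}  Taylor {taylor_mean(d):.3f}  -1/(2 sqrt d) {simple:.3f}  DHF median {med:.2f}")
```

The Taylor value sits slightly closer to 0 than the DHF medians; -1/(2√d) is the simpler choice and matches them at least as well.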
Simulation Evidence
• m = 100, various d values
• 2 sets of 40,000 t statistics at each (m, d)
• e.g. d = 365 and m = 100 (100 years of daily data):
– 36,500 × 40,000 = 1.46 billion generated data points
– SAS: only 10 minutes of run time!
– Overlay percentiles (adjusted t) on N(0,1)
– The duplicate sets are almost exactly the same
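The same experiment can be imitated at toy scale. A minimal sketch, not the authors' SAS program: it assumes m = 20 years, d = 4, 2,000 replicates, no deterministic terms, and a regression of Y_t on Y_{t-d} without intercept; the helper name is hypothetical. Adding 1/(2√d) to each t statistic should roughly center it, so its percentiles can be laid over those of N(0,1).

```python
import math
import random
from statistics import NormalDist, median, quantiles

def seasonal_tau(m, d, rng):
    """Studentized statistic for H0: rho = 1 in Y_t = rho*Y_{t-d} + e_t (no intercept, zero start)."""
    n = m * d
    y = [0.0] * (d + n)                  # the d leading zeros play the role of Y_{0,s}
    for t in range(d, d + n):
        y[t] = y[t - d] + rng.gauss(0.0, 1.0)
    lag, cur = y[:n], y[d:]              # Y_{t-d} and Y_t
    sxx = sum(x * x for x in lag)
    rho_hat = sum(x * z for x, z in zip(lag, cur)) / sxx
    sse = sum((z - rho_hat * x) ** 2 for x, z in zip(lag, cur))
    s2 = sse / (n - 1)                   # one estimated coefficient
    return (rho_hat - 1.0) / math.sqrt(s2 / sxx)

m, d, reps = 20, 4, 2000
rng = random.Random(7)
taus = [seasonal_tau(m, d, rng) for _ in range(reps)]
adjusted = [t + 1 / (2 * math.sqrt(d)) for t in taus]

cuts = quantiles(adjusted, n=20)         # 5%, 10%, ..., 95% points
print("median of raw tau:", round(median(taus), 3))
print("adjusted 5th / 95th:", round(cuts[0], 2), round(cuts[-1], 2))
print("N(0,1) 5th / 95th:", round(NormalDist().inv_cdf(0.05), 2), round(NormalDist().inv_cdf(0.95), 2))
```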
Simulation Evidence - Detrending
• m = 20, d = 4, 6, 12, 24, 52, 96, 168, 365
• d = 96: quarter hours per day; d = 168: hours per week
• Detrending:
– None
– Constant, linear, quadratic
– Period d sinusoids (fundamental & harmonic)
• Sets of 20,000 t statistics at each (m,d).
[Figures (six panels): 20 years of weekly data, 20,000 simulated series, TAU]
[Figure: standard tau percentiles for various adjustments; three replicates (of 20,000) per d value]
Conclusions:
Spread between percentiles is about constant
(and close to the N(0,1) spread)
Medians are a smooth function of 1/sqrt(d)
Degree of detrending matters
[Figure: cubic smoothing regression plotted with raw percentiles; marked d = 4, 1/√d = 0.5]
Claim: As d → ∞, tau → N(0,1)
for all of these forms of detrending
Seasonal random walk Z, data Y.
Y = Xb + Z
Detrend by OLS:
R = PY = (I - X(X'X)^{-1}X')Y = (I - X(X'X)^{-1}X')Z
Seasonal Random Walk has d “channels” of m values
Denominator is sum of d quadratic forms
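Without detrending, each has eigenvalues
$\lambda_j = \left[4\sin^2\!\left(\frac{(2j-1)\pi}{2(2m-1)}\right)\right]^{-1}$, j = 1, ..., m-1;
the largest is
$$\lambda_1 = \left[4\sin^2\!\left(\frac{\pi}{2(2m-1)}\right)\right]^{-1} = O(m^2)$$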
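The eigenvalue claim can be checked numerically. Without detrending, each channel's denominator is a quadratic form in the random walk values Y_{1,s}, ..., Y_{m-1,s}, whose covariance matrix (σ² = 1) has (i, j) entry min(i, j). A sketch that uses plain power iteration (a deliberately simple method chosen here, not one named in the talk) to compare the largest eigenvalue of that (m - 1) × (m - 1) matrix with the closed form; the names are hypothetical:

```python
import math

def min_matrix(n):
    """Covariance of Y_1..Y_n for a unit-variance random walk: Cov(Y_i, Y_j) = min(i, j)."""
    return [[float(min(i, j)) for j in range(1, n + 1)] for i in range(1, n + 1)]

def largest_eigenvalue(a, iters=300):
    """Power iteration with a Rayleigh quotient; the top eigenvalue here is well separated."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = sum(vi * wi for vi, wi in zip(v, w))   # v'Av with ||v|| = 1
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return lam

def closed_form(m):
    """[4 sin^2(pi / (2(2m - 1)))]^(-1): largest eigenvalue with m - 1 nonzero lagged values."""
    return 1.0 / (4.0 * math.sin(math.pi / (2 * (2 * m - 1))) ** 2)

for m in (5, 20, 40):
    lam = largest_eigenvalue(min_matrix(m - 1))
    print(m, round(lam, 6), round(closed_form(m), 6), round(closed_form(m) / m ** 2, 4))
```

The last column settles toward 4/π² ≈ 0.405, which is the O(m²) growth used in the argument.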
 
X ( X ' X )-1 X '
can be written as T T '
T 'T = I
T T '
k = rank of X matrix
Middle matrix is diagonal.
Projection =>
k diagonal entries 1
rest 0
Denominator quadratic form contains
Z ' X ( X ' X )-1 X ' Z = Z 'T T ' Z
k times maximum eigenvalue = O(km2)
Upper probability bound on unnormalized quadratic form.
Normalization is m2d so k/d0 suffices for
no limit effect of detrending.
Same for numerator, estimator, tau statistic.
Based on a Taylor series (for large m), the adjustment is
$\sqrt{2}\,(1/3 + k/2)/\sqrt{d}$
for regression adjustments with k columns
selected from the intercept and Fourier sines and
cosines.
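The Taylor-based shift is a one-line function of k and d; with k = 0 it reduces to √2/(3√d) = 0.4714/√d from the earlier expansion. A sketch (function names hypothetical) that adds the shift to a raw tau value so it can be compared with N(0,1) critical values:

```python
import math

def taylor_shift(k, d):
    """Large-m Taylor adjustment sqrt(2)*(1/3 + k/2)/sqrt(d) for k regression columns."""
    return math.sqrt(2) * (1 / 3 + k / 2) / math.sqrt(d)

def adjusted_tau(tau, k, d):
    """Shift a raw tau so it can be compared with N(0,1) critical values."""
    return tau + taylor_shift(k, d)

# Weekly data (d = 52): k = 0 (none), 1 (intercept), 3 (intercept + sine/cosine), 5 (+ harmonic).
for k in (0, 1, 3, 5):
    print(k, round(taylor_shift(k, 52), 3))
```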
Your talk seems better now, Grandpa!
[Figure: Focus on Medians; marked d = 4, 1/√d = 0.5]
Allowing for augmenting terms, as in the seasonal
multiplicative model, follows the same proof as
in DHF.
Natural gas data:
Procedure
(1) Compute residuals (trend + harmonics)
(2) Fit an AR(2) to the span 52 differences of the residuals
(3) Filter with the AR(2):
F(t) = filtered series
W(t) = span 52 differences F(t) – F(t-52)
(4) Regress W(t) on F(t-52), W(t-1), W(t-2)
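Steps (3) and (4) can be sketched in plain Python. This is an illustration under stated assumptions, not the SAS code behind the output below: the residuals from step (1) and the AR(2) coefficients from step (2) are taken as given inputs, the filter is applied to those residuals, the regression includes an intercept (as in the output), and the function names are hypothetical. On simulated white-noise residuals (no seasonal unit root) the coefficient on F(t-52) should be near -1.

```python
import random

def ar2_filter(r, phi1, phi2):
    """Step (3): F(t) = r(t) - phi1*r(t-1) - phi2*r(t-2), defined from the third value on."""
    return [r[t] - phi1 * r[t - 1] - phi2 * r[t - 2] for t in range(2, len(r))]

def design(f, d=52):
    """Step (4) data: response W(t) = F(t) - F(t-d); regressors 1, F(t-d), W(t-1), W(t-2)."""
    w = [f[t] - f[t - d] for t in range(d, len(f))]     # w[j] belongs to time j + d
    ys, xs = [], []
    for j in range(2, len(w)):
        ys.append(w[j])
        xs.append([1.0, f[j], w[j - 1], w[j - 2]])      # f[j] is F((j + d) - d)
    return ys, xs

def ols(ys, xs):
    """Least squares via the normal equations and Gauss-Jordan elimination (small p only)."""
    p = len(xs[0])
    a = [[sum(x[i] * x[j] for x in xs) for j in range(p)] + [sum(x[i] * y for x, y in zip(xs, ys))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))   # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [ar - factor * ac for ar, ac in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]

rng = random.Random(11)
resid = [rng.gauss(0.0, 1.0) for _ in range(20 * 52 + 2)]   # stand-in residuals, no unit root
ys, xs = design(ar2_filter(resid, 0.0, 0.0))
b = ols(ys, xs)
print("coefficient on F(t-52):", round(b[1], 2))
```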
The REG Procedure
Dependent Variable: Diff

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3           718362        239454    231.53   <.0001
Error            679           702233    1034.21632
Corrected Total  682          1420595

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -0.68125          1.23111     -0.55     0.5802
L52FY        1             -0.99746          0.03800    -26.25     <.0001
Diff1        1              0.01417          0.00777      1.82     0.0686
Diff2        1             -0.01152          0.00730     -1.58     0.1151
Follow up:
Lag 52 coefficient near -1 suggests a52 - 1 near -1, i.e., a52 near 0
Perhaps no lag 52 correlation in the presence of sinusoids
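The t value printed for L52FY is its estimate divided by its standard error, and the estimate itself is a52 - 1. A sketch of that arithmetic; the comparison value -2.95 is the smallest percentile in the d = 52 simulation table at the end of the talk:

```python
estimate, std_error = -0.99746, 0.03800   # L52FY row of the PROC REG output
tau = estimate / std_error
a52 = 1 + estimate                         # implied lag 52 coefficient
print(round(tau, 2), round(a52, 5))        # -26.25 0.00254
print(tau < -2.95)                         # True: far beyond every tabulated percentile
```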
Fit ARIMAX model as a check (AR(2), no seasonal lag):
Parameter   Estimate    Standard Error   t Value   Approx Pr > |t|   Lag   Variable
MU           727.58194       684.44164      1.06            0.2878     0   total
AR1,1          1.37442         0.03379     40.67            <.0001     1   total
AR1,2         -0.38964         0.03381    -11.53            <.0001     2   total
NUM1           0.09520         0.04525      2.10            0.0354     0   date
NUM2        -883.25146        23.18237    -38.10            <.0001     0   s1
NUM3         240.92573        23.05715     10.45            <.0001     0   c1
NUM4        -133.27021        11.51098    -11.58            <.0001     0   s2
NUM5         122.42419        11.53277     10.62            <.0001     0   c2
Lack of fit?
Box-Ljung test on residuals
Autocorrelation Check of Residuals
To Lag   Chi-Square    DF   Pr > ChiSq   ------------------Autocorrelations------------------
6              1.40     4       0.8449    0.008  -0.012   0.001  -0.000  -0.023   0.033
12            18.66    10       0.0448   -0.086   0.034   0.089  -0.009   0.017   0.077
18            23.67    16       0.0970    0.022   0.002   0.025   0.012   0.047   0.055
24            26.61    22       0.2263   -0.014  -0.037   0.022  -0.027  -0.028  -0.017
30            29.61    28       0.3821    0.010   0.036   0.042  -0.012  -0.021   0.012
36            33.03    34       0.5150    0.001   0.030  -0.027  -0.031   0.042  -0.010
42            46.84    40       0.2122   -0.026  -0.081  -0.035  -0.034   0.078  -0.042
48            51.65    46       0.2625    0.011   0.042  -0.044  -0.027   0.036   0.014
54            65.50    52       0.0989   -0.055   0.037  -0.024  -0.008   0.085  -0.070
60            75.05    58       0.0654   -0.096   0.023  -0.027  -0.002  -0.029   0.022
66            80.14    64       0.0838   -0.006  -0.035  -0.053  -0.030  -0.035  -0.009
72            85.28    70       0.1033   -0.060  -0.017   0.034   0.032  -0.007   0.011
78            87.52    76       0.1724   -0.034  -0.012  -0.026  -0.004  -0.027  -0.001
84            91.06    82       0.2312    0.018  -0.029  -0.011  -0.050   0.010   0.017
90            96.17    88       0.2586    0.000  -0.030  -0.048   0.049   0.006  -0.018
96           107.69    94       0.1582   -0.011  -0.053   0.006  -0.020  -0.066  -0.075
102          117.16   100       0.1158    0.082  -0.059  -0.013   0.018   0.016  -0.003
108          137.48   106       0.0215   -0.021  -0.058   0.044   0.021  -0.067  -0.112
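Each Chi-Square entry is a Ljung-Box statistic Q = n(n+2) Σ r_k²/(n-k), and the DF column is the lag minus the two fitted AR parameters. Because every DF here is even, the chi-square tail probability has an elementary closed form, which reproduces the printed p-values. A sketch; the residual count n is not shown in the output, so it is left as an argument, and the function names are hypothetical:

```python
import math

def ljung_box(r, n):
    """Q = n(n+2) * sum_{k=1}^{K} r_k^2 / (n - k) for autocorrelations r = [r_1, ..., r_K]."""
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

def chi2_sf_even(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df % 2:
        raise ValueError("closed form needs even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

print(round(chi2_sf_even(1.40, 4), 3))     # 0.844 (output shows 0.8449 from the unrounded Q)
print(round(chi2_sf_even(18.66, 10), 4))   # 0.0448
```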
Lag 104, 52
AR(2) characteristic polynomial
m² - 1.37442m + 0.38964
(m = 1/B)
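The roots of the characteristic polynomial follow from the quadratic formula; both lie inside the unit circle, so the fitted AR(2) is stationary, although the larger root is close to 1. A sketch using the AR1,1 and AR1,2 estimates from the ARIMAX output:

```python
import math

phi1, phi2 = 1.37442, -0.38964          # AR1,1 and AR1,2 estimates
# Characteristic polynomial in m = 1/B: m^2 - phi1*m - phi2 = m^2 - 1.37442m + 0.38964
disc = phi1 ** 2 + 4 * phi2
roots = sorted([(phi1 - math.sqrt(disc)) / 2, (phi1 + math.sqrt(disc)) / 2])
print([round(x, 4) for x in roots])     # [0.3998, 0.9746]
```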
QUESTIONS?
D. A. Dickey
ρ = 1
D.A.D.
OK, we’re
outa here!
D.A.D.
Following up:
No adjustment: add 1.0/(2d^{1/2}) = (1/2 + 0(2/3))/d^{1/2}
Polynomial: add 2.3/(2d^{1/2}) ≈ (1/2 + 1(2/3))/d^{1/2}
Sine (fund.): add 5.0/(2d^{1/2}) ≈ (1/2 + 3(2/3))/d^{1/2}
Sine + harmonic: add 7.6/(2d^{1/2}) ≈ (1/2 + 5(2/3))/d^{1/2}
Sine + linear: about the same as sine
Generated 3 sets of percentiles (20,000 reps) for both models
Sorted on d and 5th percentile
Result: percentiles interspersed (see below)
Moral: Use the same adjustment for sine and sine + linear.
d = 52, 20,000 reps per line
(t_p = pth percentile of tau, e.g. t_2_5 = 2.5th; n = number of observations, 1040 = 20 × 52)
trend       t_1    t_2_5  t_5    t_10   t_25   t_50   t_75   t_90   t_95   t_97_5  t_99   n
harmonic   -2.95  -2.58  -2.24  -1.86  -1.23  -0.53   0.16   0.78   1.14   1.48    1.85   1040
harmonic   -2.93  -2.54  -2.23  -1.84  -1.22  -0.54   0.15   0.78   1.15   1.46    1.86   1040
harmonic   -2.94  -2.55  -2.21  -1.85  -1.21  -0.52   0.17   0.77   1.16   1.50    1.88   1040
sine wave  -2.75  -2.36  -2.03  -1.66  -1.04  -0.35   0.34   0.95   1.34   1.65    2.03   1040
sine wave  -2.73  -2.34  -2.03  -1.65  -1.03  -0.34   0.34   0.96   1.34   1.67    2.05   1040
lin&sine   -2.73  -2.35  -2.03  -1.66  -1.03  -0.34   0.34   0.97   1.34   1.66    2.01   1040
sine wave  -2.69  -2.35  -2.01  -1.64  -1.03  -0.34   0.34   0.95   1.31   1.65    2.03   1040
lin&sine   -2.71  -2.33  -1.98  -1.62  -1.01  -0.33   0.35   0.98   1.35   1.65    2.04   1040
lin&sine   -2.71  -2.33  -1.98  -1.62  -1.01  -0.33   0.35   0.98   1.35   1.65    2.04   1040
mean       -2.49  -2.15  -1.83  -1.47  -0.84  -0.17   0.52   1.13   1.48   1.80    2.16   1040
mean       -2.52  -2.16  -1.83  -1.46  -0.84  -0.16   0.53   1.16   1.52   1.84    2.21   1040
linear     -2.51  -2.13  -1.82  -1.45  -0.81  -0.15   0.54   1.16   1.51   1.82    2.18   1040
quadratic  -2.49  -2.13  -1.82  -1.45  -0.84  -0.15   0.52   1.12   1.48   1.80    2.22   1040
linear     -2.53  -2.14  -1.81  -1.45  -0.83  -0.15   0.54   1.15   1.53   1.87    2.22   1040
quadratic  -2.53  -2.12  -1.80  -1.42  -0.82  -0.14   0.53   1.13   1.49   1.83    2.19   1040
mean       -2.44  -2.09  -1.79  -1.44  -0.83  -0.16   0.52   1.13   1.50   1.84    2.25   1040
quadratic  -2.50  -2.10  -1.78  -1.43  -0.84  -0.15   0.52   1.14   1.51   1.83    2.18   1040
linear     -2.52  -2.10  -1.78  -1.42  -0.83  -0.15   0.53   1.16   1.52   1.85    2.22   1040
none       -2.38  -2.05  -1.73  -1.36  -0.75  -0.07   0.62   1.23   1.60   1.90    2.25   1040
none       -2.46  -2.07  -1.73  -1.36  -0.75  -0.07   0.61   1.22   1.61   1.93    2.31   1040
none       -2.43  -2.04  -1.73  -1.37  -0.75  -0.07   0.62   1.23   1.59   1.90    2.27   1040
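The "add" rules can be checked against this table: at d = 52, adding (1/2 + k(2/3))/√d to each median should bring it close to 0, with k = 0 (none), 1 (polynomial: mean, linear, quadratic), 3 (sine, lin&sine) and 5 (harmonic), following the pairing of labels and "add" lines in the rules. A sketch using the t_50 column:

```python
import math

d = 52
t50 = {  # t_50 (median) column of the d = 52 table
    "none": [-0.07, -0.07, -0.07],
    "polynomial": [-0.17, -0.16, -0.15, -0.15, -0.15, -0.14, -0.16, -0.15, -0.15],
    "sine": [-0.35, -0.34, -0.34, -0.34, -0.33, -0.33],
    "harmonic": [-0.53, -0.54, -0.52],
}
k = {"none": 0, "polynomial": 1, "sine": 3, "harmonic": 5}

for name, meds in t50.items():
    shift = (0.5 + k[name] * 2 / 3) / math.sqrt(d)
    centered = [round(med + shift, 3) for med in meds]
    print(f"{name:10s} shift {shift:.3f} centered medians {centered}")

# The slide's numerators x in "add x/(2 d^(1/2))" versus 2*(1/2 + k*2/3).
slide_adds = {0: 1.0, 1: 2.3, 3: 5.0, 5: 7.6}
for kk, add in slide_adds.items():
    print(kk, add, round(2 * (0.5 + kk * 2 / 3), 2))
```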