Intervention models

The first example
Something’s happened around t = 200: the series seems generally stationary, but shifts level around t = 200.
Look separately at the parts before and after the level shift.
There are in total 400 time-points. Select the first 190 and the last 190
First 190 values
Could be an AR(1) or an MA(1) or an ARMA(1,1). Quite clearly stationary!
Last 190 values
Points more towards an ARMA(1,1)
The change in level would most probably be modelled using a step function

S_t^{(200)} = \begin{cases} 1, & t \ge 200 \\ 0, & t < 200 \end{cases}
A complete intervention model for the time series can therefore be

Y_t = \theta_0 + \omega \cdot S_t^{(200)} + \frac{1 - \theta_1 B}{1 - \phi_1 B}\, e_t

where the last term is the ARMA(1,1) part, and the term \omega \cdot S_t^{(200)} captures the permanent, immediate, constant change in level at t = 200.
How can this model be fitted using R?
strange.model <- arimax(strange, order=c(1,0,1),
                        xtransf=data.frame(step200=1*(seq(strange)>=200)),
                        transfer=list(c(0,0)))
The arimax command works like the arima command, but allows inclusion of
covariates.
The argument xtransf is followed by a data frame in which each column corresponds to a covariate time series (with the same number of observations as Yt).
Here this data frame is constructed with the command
1*(seq(strange)>=200)
The command seq(strange) returns the indices of the vector strange
The command seq(strange)>=200 returns a vector (with the same length as strange) in which a term is FALSE if the corresponding index of strange is less than 200 and TRUE otherwise.
Finally, the multiplication with 1 transforms FALSE into 0 and TRUE into 1, and the variable in the data frame is also given the name step200 (for convenience).
Hence, the resulting column is a step function of the kind we want.
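The construction can be sketched in isolation (a stand-in vector of length 400 is used here, not the actual strange series):

```r
# Build the step covariate for a hypothetical series of 400 observations
y <- numeric(400)                  # stand-in for the 'strange' series
step200 <- 1*(seq(y) >= 200)       # FALSE/TRUE coerced to 0/1
# step200 is 0 before t = 200 and 1 from t = 200 onward
```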
The argument transfer is followed by a list comprising one two-dimensional vector for each covariate specified by xtransf.
Here we have the argument list(c(0,0)) implying that the covariate shall be
included as it stands (no lagging, no filtering). Note that the argument must
always be followed by a list (even if there is only one covariate).
Giving an argument c(r,s) where both r and s are > 0 will enter the term

\frac{\omega_0 + \omega_1 B + \cdots + \omega_s B^s}{1 - \delta_1 B - \cdots - \delta_r B^r}\, X_t

into the model.
Since we have specified c(0,0) the term included will be

\frac{\omega_0}{1} \cdot S_t^{(200)} = \omega_0 \cdot S_t^{(200)}
print(strange.model)

Series: strange
ARIMA(1,0,1) with non-zero mean

Coefficients:
         ar1      ma1  intercept  step200-MA0
      0.9824  -1.0000    10.0026       1.9958
s.e.  0.0111   0.0064     0.0350       0.0606

sigma^2 estimated as 0.9826:  log likelihood=-564.82
AIC=1137.64   AICc=1137.79   BIC=1157.6
Thus, the estimated model is

Y_t = 10.0026 + 1.9958 \cdot S_t^{(200)} + \frac{1 - B}{1 - 0.9824\,B}\, e_t
tsdiag(strange.model)
Seems to be some autocorrelation left in the residuals. Try an ARMA(1,2)
strange.model2 <- arimax(strange, order=c(1,0,2),
                         xtransf=data.frame(step200=1*(seq(strange)>=200)),
                         transfer=list(c(0,0)))
print(strange.model2)

Series: strange
ARIMA(1,0,2) with non-zero mean

Coefficients:
         ar1      ma1      ma2  intercept  step200-MA0
      0.9730  -0.7781  -0.2219    10.0012       1.9972
s.e.  0.0133   0.0525   0.0521     0.0317       0.0557

sigma^2 estimated as 0.9406:  log likelihood=-556.28
AIC=1122.56   AICc=1122.77   BIC=1146.5
The coefficients seem to be significantly different from zero (each estimate divided by its s.e. exceeds 2 in absolute value).
The log-likelihood is slightly higher.
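The rough significance check can be sketched with the estimates and standard errors printed above:

```r
# Approximate t-ratios: estimate divided by standard error
est <- c(ar1 = 0.9730, ma1 = -0.7781, ma2 = -0.2219)
se  <- c(ar1 = 0.0133, ma1 =  0.0525, ma2 =  0.0521)
t_ratio <- est / se
# all |t-ratios| exceed 2, so each coefficient looks significant
```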
tsdiag(strange.model2)
Clear improvement!
plot(y=strange, x=seq(strange), type="l", xlab="Time")
lines(y=fitted(strange.model), x=seq(strange), col="blue", lwd=2)
lines(y=fitted(strange.model2), x=seq(strange), col="red", lwd=1)
legend("bottomright", legend=c("original","model1","model2"),
       col=c("black","blue","red"), lty=1, lwd=c(1,2,1))
Model 2 (the ARMA(1,2)) is less smooth, but may follow the correlation structure better. However, this cannot be clearly seen from the plot.
The second example
Seems like a series that is stationary from the beginning, but acquires a linear drift (upward trend) around t = 200.
Look at the part before the drift.
There are in total 400 time-points. Select the first 200.
First 200 values
Looks (again) like an ARMA(1,1)
eacf(strange[1:200])
AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o o o  o  o  o
1 o o o o o o o o o o o  o  o  o
2 x o o o o o o o o o o  o  o  o
3 x x x o o o o o o o o  o  o  o
4 o x x o o o o o o o o  o  o  o
5 o x x o o o o o o o o  o  o  o
6 x x o o o o o o o o o  o  o  o
7 x x o o o o o o o o o  o  o  o
The drift in level could be modelled using a linearly increasing step function

\frac{B}{1 - B}\, S_t^{(200)}
A complete intervention model for the time series can therefore be

Y_t = \theta_0 + \omega \cdot \frac{B}{1 - B}\, S_t^{(200)} + \frac{1 - \theta_1 B}{1 - \phi_1 B}\, e_t
The term

\frac{B}{1 - B}\, S_t^{(200)}

will be problematic to estimate. However, the following holds:

\frac{B}{1 - B}\, S_t^{(200)} = \begin{cases} 0, & t \le 200 \\ t - 200, & t > 200 \end{cases}
Hence, create a covariate that is 0 until t = 200 and then 1, 2, …, 200, and use it with transfer=list(c(0,0)).
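A minimal sketch of that covariate, checking that it agrees with the filter B/(1 − B) applied to the step (a series length of 400 is assumed):

```r
n <- 400
step <- 1*(seq_len(n) >= 200)       # the step function S_t^(200)
ramp <- c(rep(0, 200), 1:200)       # 0 until t = 200, then 1, 2, ..., 200
# B/(1 - B) * step: lag the step one unit, then accumulate
filtered <- cumsum(c(0, step[-n]))
```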
Alternatively, and more efficiently, this variable can be included as an ordinary explanatory variable (a regression predictor), using the argument xreg:
strange_b.model <- arimax(strange_b, order=c(1,0,1),
                          xreg=data.frame(x=c(rep(0,200),1:200)))
print(strange_b.model)

Call:
arimax(x = strange_b, order = c(1, 0, 1), xreg = data.frame(x = c(rep(0, 200), 1:200)))
Coefficients:
         ar1     ma1  intercept       x
      0.1219  0.0382     9.9993  0.0192
s.e.  0.3783  0.3827     0.0744  0.0009

sigma^2 estimated as 0.9884:  log likelihood = -565.25,  aic = 1138.5
Note! This can also be seen as a simple linear regression model with ARMA(1,1)
error terms.
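That view can be illustrated on simulated data (the series, the slope 0.02 and the ARMA(1,1) parameters below are arbitrary assumptions, not the strange_b data):

```r
# Linear regression with ARMA(1,1) errors, fitted with stats::arima via xreg
set.seed(123)
n <- 400
x <- c(rep(0, 200), 1:200)                         # the ramp covariate
e <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = n)
y <- 10 + 0.02*x + e
fit <- arima(y, order = c(1, 0, 1), xreg = x)
# coef(fit) holds ar1, ma1, intercept and the regression slope
```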
tsdiag(strange_b.model)
Satisfactory!
Transfer-function models
Consider the data set boardings referred to in Exercise 11.16
data(boardings)
summary(boardings)
 log.boardings     log.price
 Min.   :12.40   Min.   :4.649
 1st Qu.:12.49   1st Qu.:4.973
 Median :12.53   Median :5.038
 Mean   :12.53   Mean   :5.104
 3rd Qu.:12.57   3rd Qu.:5.241
 Max.   :12.70   Max.   :5.684
Two time-series, both with log-transformed values
plot.ts(boardings)
Could the price affect the boardings?
The cross-correlation function

Cross-covariance function:

\gamma_{t,s}(X, Y) = \mathrm{Cov}(X_t, Y_s)

If \{X_t\} and \{Y_t\} are both (weakly) stationary:

\gamma_k(X, Y) = \mathrm{Cov}(X_t, Y_{t-k}) = \mathrm{Cov}(X_{t+k}, Y_t)

Note! \mathrm{Cov}(X_t, Y_{t-k}) = \mathrm{Cov}(Y_{t-k}, X_t), but \mathrm{Cov}(X_t, Y_{t-k}) is not in general equal to \mathrm{Cov}(X_{t-k}, Y_t).
⇒ In general \gamma_k(X, Y) \ne \gamma_{-k}(X, Y).

For stationary series, the cross-correlation function is

\rho_k(X, Y) = \mathrm{Corr}(X_t, Y_{t-k}) = \frac{\gamma_k(X, Y)}{\sqrt{\mathrm{Var}(X_t) \cdot \mathrm{Var}(Y_t)}}

It measures the degree of linear dependence between the two series.
Sample cross-correlation function

r_k(X, Y) = \frac{\sum_t (X_t - \bar{X})(Y_{t-k} - \bar{Y})}{\sqrt{\sum_t (X_t - \bar{X})^2 \cdot \sum_t (Y_t - \bar{Y})^2}}
With R: ccf
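As a sketch of the lag convention (simulated data, not the boardings series): ccf(x, y) at lag k estimates Corr(X_{t+k}, Y_t), so a y that follows x with a delay shows its spike at a negative lag.

```r
set.seed(1)
x <- rnorm(80)
y <- c(0, 0, x[1:78]) + rnorm(80, sd = 0.1)   # y repeats x two steps later
cc <- ccf(x, y, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]    # should be -2
```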
For the boardings data set, we can try to calculate the cross-correlation function
between the two series
ccf(boardings[,1], boardings[,2], main="boardings & price", ylab="CCF")
Typical look when at least one of the time series is non-stationary.
Take first-order regular differences
diff_boardings<-diff(boardings[,1])
diff_price<-diff(boardings[,2])
ccf(diff_boardings, diff_price, ylab="CCF")
Still not satisfactory. Since we have monthly data, we should possibly try first-order seasonal differences as well.
diffs_boardings<-diff(diff_boardings,12)
diffs_price<-diff(diff_price,12)
ccf(diffs_boardings, diffs_price, ylab="CCF")
Better, but how do we interpret this plot?
The two significant spikes at negative lags say that the difference in price depends on the difference in boardings some months earlier.
The significant spike at lag 6 says that the difference in boardings depends on the difference in price some months earlier.
What explains what?
A problem: Since both series would show autocorrelations, these are inevitably part of the cross-correlations we are estimating (cf. autocorrelation and partial autocorrelation).
To solve this we need to “remove” the autocorrelation in the two series before we
investigate the cross-correlation.
⇒ We should estimate cross-correlations between residual series from modelling with ARMA models.
This procedure is known as pre-whitening
Normal procedure:
1. Find a suitable ARMA model for the (differenced) series that is assumed to constitute the covariate series.
2. Fit this model to both series.
3. Investigate the cross-correlations between the residual series.
Could be an ARMA(1,1)(1,0)12 or an ARMA(1,1)(1,1)12 model (seasonal period 12).
model1 <- arima(diffs_price, order=c(1,0,1),
                seasonal=list(order=c(1,0,0), period=12))
tsdiag(model1)
Could do!
model2 <- arima(diffs_price, order=c(1,0,1),
                seasonal=list(order=c(1,0,1), period=12))
tsdiag(model2)
Better!
The Ljung-Box test was not possible to perform here!
Applying the last model to the differenced boardings series:
model21 <- arima(diffs_boardings, order=c(1,0,1),
                 seasonal=list(order=c(1,0,1), period=12))
ccf(residuals(model2), residuals(model21), ylab="CCF")
Well, not that much cross-correlation left…
The TSA package provides the command prewhiten, with which pre-whitening is performed and the resulting CCF is plotted.
The default set-up is that an AR model is fitted to the covariate series (the first series specified).
The AR model that minimizes AIC is chosen.
The model can, however, be specified.
prewhiten(diffs_price, diffs_boardings, x.model=model2, ylab="CCF")
Should be the same as the manually developed CCF earlier.
With the default settings:
pw <- prewhiten(diffs_price, diffs_boardings, ylab="CCF")
The picture is clearer? No significant cross-correlations left.
What AR model has been used?
print(pw)

$ccf
Autocorrelations of series ‘X’, by lag

-1.0833 -1.0000 -0.9167 -0.8333 -0.7500 -0.6667 -0.5833 -0.5000 -0.4167
  0.131   0.009  -0.023  -0.097   0.057  -0.053  -0.167  -0.034   0.120
-0.3333 -0.2500 -0.1667 -0.0833  0.0000  0.0833  0.1667  0.2500  0.3333
  0.228  -0.129  -0.181   0.164   0.100   0.098   0.031  -0.065   0.019
 0.4167  0.5000  0.5833  0.6667  0.7500  0.8333  0.9167  1.0000  1.0833
 -0.078  -0.349   0.027  -0.155  -0.225   0.041  -0.027  -0.167   0.200
$model

Call:
ar.ols(x = x)

Coefficients:
      1        2        3        4        5        6        7
-0.2145   0.0361  -0.1226  -0.4786  -0.1827   0.1392  -0.0133
      8        9       10
 0.1616  -0.1462   0.1395

Intercept: 0.002233 (0.00302)

Order selected 10  sigma^2 estimated as 0.0004016
Check with a scatter plot.
Reasonable that there is no significant cross-correlation.
Another example
Observations of the input gas rate to a gas furnace and the percentage of carbon
dioxide (CO2) in the output from the same furnace
Are the two series stationary?
gasrate series
Not that far from stationary. In that case an AR(2) would be the first choice.
However, we also try first-order regular differences:
gasrate_diff <- diff(gasrate)
More stationary than before?
CO2 series
Stationary.
AR(2) ?
prewhiten(gasrate,CO2,ylab="CCF")