JD+

NATIONAL BANK OF BELGIUM
Towards a single RegArima modelling (Draft)
Jean Palate
7/2/2015
Introduction
Tramo-Seats and X12-ARIMA provide different RegArima modelling facilities. This can be confusing for
many users, especially when the two algorithms are integrated in a single software package, like JD+.
For many reasons (transparency, coherence, maintenance…), it seems desirable to offer a single
pre-processing module for the two methods.
X12-ARIMA is based on an old version of Tramo, so it lacks many recent improvements of Tramo
(especially those related to calendar effects, seasonality tests and over/under differencing). Moreover,
although the two routines follow roughly the same logic, they differ significantly in most details.
Tramo is significantly faster (up to 10 times) and more stable than X12-ARIMA (see the tests of the
SACE). On the other hand, X12-ARIMA offers some facilities that are not included in Tramo (handling of
the leap year effect, changes of regime, automatic detection of the length of Easter effect…).
A common RegArima module should take the best aspects of both solutions; it should be mainly
based on Tramo (logic, algorithms…), with the extensions provided by X12-ARIMA. Going in that
direction clearly implies that the software will move away from the original programs.
Below, we briefly compare some aspects of the current implementations, considering the regression
variables, the estimation methods and the main steps of the automatic model identification.
At the end, we propose a road map towards a common implementation.
Regression variables

| Variable | Tramo | X12-Arima | JD+ |
|---|---|---|---|
| Trading days | X | X | X |
| Working days | X | X | X |
| Leap year | X | Special treatment in multiplicative model (optional) | Tramo and X12 like |
| Stock trading days | X | X | X |
| Easter effect | X (several definitions for the last day of the Easter period) | X (different mean correction) | Tramo and X12 like; Julian Easter (no GUI, i.e. no graphical user interface) |
| Labor Day | | X | |
| Thanksgiving | | X | |
| Outliers | AO, TC, LS, SLS (seasonal level shift); LS and SLS outliers are 0-ending | AO, TC, LS, SO; LS and SO outliers are 1-ending | AO, TC, LS, SO; outliers are 0- or 1-ending |
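
| Variable | Tramo | X12-Arima | JD+ |
|---|---|---|---|
| Ramps | X | X | X |
| Mean | X | X | X |
| Fixed seasonal | | X | |
| User-defined calendar effects | X (no test) | X | X |
| User-defined variables | X (no test) | X | X |
| Change of regime | | X (no test) | X (no test) |
| Fixed coefficients | X | X (no GUI) | X (no GUI) |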
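To make the "0-ending" and "1-ending" conventions concrete, here is a minimal sketch of how the AO, TC and LS regressors could be generated. It assumes that "1-ending" means a step that is 0 before the break and 1 from the break onwards, that "0-ending" means the same step shifted down by one (so that it is 0 at the end of the sample), and that the transitory change decays at a rate of 0.7; the function name and its parameters are hypothetical and do not come from Tramo, X12 or JD+.

```python
import numpy as np

def outlier_regressor(kind, n, t0, zero_ended=True, delta=0.7):
    """Return a length-n regressor for an outlier at 0-based position t0.

    Illustrative conventions only (assumptions, not taken from any program):
    kind is "AO", "TC" or "LS"; delta is the decay rate of the TC.
    """
    t = np.arange(n)
    if kind == "AO":
        # Additive outlier: a single pulse at t0.
        return (t == t0).astype(float)
    if kind == "TC":
        # Transitory change: a pulse at t0 that decays geometrically.
        x = np.zeros(n)
        x[t0:] = delta ** np.arange(n - t0)
        return x
    if kind == "LS":
        # Level shift: 0 before t0 and 1 afterwards ("1-ending") ...
        step = (t >= t0).astype(float)
        # ... or -1 before t0 and 0 afterwards ("0-ending").
        return step - 1.0 if zero_ended else step
    raise ValueError(f"unknown outlier type: {kind}")

ls0 = outlier_regressor("LS", 6, 3, zero_ended=True)
ls1 = outlier_regressor("LS", 6, 3, zero_ended=False)
print(ls0, ls1)
```

Under these assumptions the two level-shift regressors differ only by a constant, so the choice changes the estimated mean of the model but not the estimated size of the shift.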
Estimation methods
Tramo
The estimation is based on the Kalman filter and the residuals are the one step-ahead forecast errors.
The optimization procedure is a specific version of the Levenberg-Marquardt algorithm; it uses the
Hannan-Rissanen algorithm to compute initial values of the parameters.
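As a minimal sketch of this idea, the code below runs a Kalman filter on a pure stationary ARMA(1,1) model and returns the one-step-ahead forecast errors together with their scaled variances. This is far simpler than a full RegArima model (no regression effects, no differencing, no missing values), and the function names are hypothetical rather than part of Tramo or JD+.

```python
import numpy as np

def arma11_innovations(y, phi, theta):
    """One-step-ahead forecast errors v[t] and scaled variances F[t]
    for y[t] = phi*y[t-1] + e[t] + theta*e[t-1], with |phi| < 1."""
    y = np.asarray(y, dtype=float)
    # State vector: (y[t], theta*e[t]); the innovation variance is scaled to 1.
    T = np.array([[phi, 1.0], [0.0, 0.0]])
    R = np.array([[1.0], [theta]])
    Z = np.array([1.0, 0.0])
    Q = R @ R.T
    # Stationary initial covariance: solve P = T P T' + Q.
    P = np.linalg.solve(np.eye(4) - np.kron(T, T), Q.ravel()).reshape(2, 2)
    a = np.zeros(2)
    v = np.empty(len(y))
    F = np.empty(len(y))
    for t, yt in enumerate(y):
        v[t] = yt - Z @ a                 # forecast error
        F[t] = Z @ P @ Z                  # its scaled variance
        K = T @ P @ Z / F[t]              # Kalman gain
        a = T @ a + K * v[t]              # predicted state for t+1
        P = T @ P @ T.T + Q - np.outer(K, K) * F[t]
    return v, F

def concentrated_loglik(v, F):
    """Gaussian log-likelihood with the innovation variance concentrated out."""
    n = len(v)
    sigma2 = np.mean(v ** 2 / F)
    return -0.5 * (n * (np.log(2.0 * np.pi * sigma2) + 1.0) + np.sum(np.log(F)))

v, F = arma11_innovations([1.0, 0.5, -0.2, 0.3], phi=0.5, theta=0.0)
print(v, F, concentrated_loglik(v, F))
```

An optimizer such as Levenberg-Marquardt would then search over the ARMA parameters using these standardized errors (v[t] divided by the square root of F[t]) as residuals.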
X12
The estimation of the RegArima model is based on a modified version of the Ljung-Box algorithm. That
solution is significantly slower than the Kalman filter. Moreover, it provides residuals that cannot
always be easily interpreted (2).
The optimization procedure is a slightly modified version of the Minpack routines, also based on the
Levenberg-Marquardt algorithm (a different implementation); it uses pre-defined initial values of the
parameters.
JD+
JD+ is very similar to Tramo. However, its optimization procedure differs slightly in some details. It
should be noted that, for the sake of comparability, JD+ uses in a few cases the same algorithm as X12
(computation of the residuals…).
Automatic model identification (AMI)
JD+ offers both implementations.

| Main steps | Tramo | X12-Arima |
|---|---|---|
| Preliminary seasonality test | X | |
| Log/level | BIC-based | AICC-based |
| Calendar effects | Automatic choice between WD, TD (F-test). No test for holidays or user-defined calendars | AIC test (pre-specified variables, holidays, user-defined variables) |
| Easter effect | T-test | AIC test. Possible automatic choice between different lengths |
| Outliers detection | Fast detection based on approximate estimations | Slow detection based on exact estimations |
| Other regression variables | No test | AIC test |
| Differencing | = | = |
| ARMA | Fast detection based on approximate estimations (Hannan-Rissanen) | Slow detection based on exact estimations |
| Over/under differencing, residual seasonality, other final tests | Rich | Very limited |
| Comparison with default model | | Optional |

(2) T. McElroy, from the US Census Bureau, also thinks that the current residuals of X12 may sometimes be
strange and that they should not be used for testing (they are not NIID).
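Several of the tests compared above rely on information criteria. The sketch below uses the standard textbook definitions of AICC and BIC; the exact variants implemented in Tramo and X12 may differ, and the function names are hypothetical. It also shows the Jacobian adjustment that is needed before the likelihood of a model fitted to the logs can be compared with that of a model fitted to the levels.

```python
import math

def aicc(loglik, n_params, n_obs):
    """Corrected AIC: -2*logL + 2*k*n / (n - k - 1)."""
    return -2.0 * loglik + 2.0 * n_params * n_obs / (n_obs - n_params - 1)

def bic(loglik, n_params, n_obs):
    """Schwarz criterion: -2*logL + k*log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def loglik_on_level_scale(loglik_of_log_model, y):
    """Express the log-likelihood of a model for log(y) on the scale of y.

    The density of y is the density of log(y) divided by y, so the
    log-likelihood is reduced by sum(log(y)). Requires y > 0.
    """
    return loglik_of_log_model - sum(math.log(v) for v in y)

# Illustrative numbers only: choose the transformation with the smaller criterion.
y = [102.0, 98.5, 110.2, 120.7]
level_fit = aicc(-12.4, n_params=3, n_obs=50)
log_fit = aicc(loglik_on_level_scale(5.1, y), n_params=3, n_obs=50)
print("log" if log_fit < level_fit else "level")
```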
Road map
We consider below the different tasks that should be carried out to arrive at a common pre-processing
module.
Step 1
Common implementation of the regression variables.
The regression model should encompass all the options of each program. A single definition should be
adopted for each variable (Easter, outliers…). For calendar effects, additional definitions could be
considered (for instance, Week Days+Saturdays+Sundays). Light development (1 month) and testing.
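As an illustration of what a single definition of a calendar regressor could look like, the sketch below computes two commonly used monthly regressors: a working-day contrast (number of weekdays minus 5/2 times the number of weekend days) and a leap-year regressor (0.75 in a leap-year February, -0.25 in other Februaries, 0 elsewhere). These are the usual textbook forms and are assumptions here; national holidays are ignored and the function names are hypothetical.

```python
import calendar

def working_day_regressor(year, month):
    """(# Monday..Friday) - 5/2 * (# Saturday, Sunday) in the given month."""
    n_days = calendar.monthrange(year, month)[1]
    weekdays = sum(
        1 for day in range(1, n_days + 1)
        if calendar.weekday(year, month, day) < 5   # 0 = Monday, ..., 4 = Friday
    )
    weekend_days = n_days - weekdays
    return weekdays - 2.5 * weekend_days

def leap_year_regressor(year, month):
    """Deviation of the length of February from its long-term average."""
    if month != 2:
        return 0.0
    return 0.75 if calendar.isleap(year) else -0.25

for month in (1, 2):
    print(month, working_day_regressor(2015, month), leap_year_regressor(2015, month))
```

A regressor with separate weekday, Saturday and Sunday groups, as suggested above, would follow the same pattern with two contrasts instead of one.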
Step 2
Common estimation procedure.
For the estimation of RegArima models, the current choice of JD+ (Kalman filter + optimization
procedure) must be checked. The comparison must be made according to several criteria: precision,
robustness, speed. Few new developments, more testing (1 month).
Step 3
Extension of Tramo with features of X12
Modification of the current implementation of Tramo to take into account the additional features of X12
(preliminary leap year correction, automatic detection of the length of the Easter effect, tests on any
regression variable). Light developments (2 months).
Step 4
Possible improvements of some sub-modules of Tramo
Any automatic routine can always be improved. Even if Tramo has been fine-tuned by A. Maravall, some
improvements are always possible (comparison with current X12 solutions…). Such research needs:
- The definition of criteria to compare models (see for instance what is currently used in Tramo:
  BIC, Ljung-Box of the residuals, number of outliers, stability tests…).
- The comparison of the current implementation against new modules (with simulated series and
  with real series); the impact should be measured for the sub-module and for the global
  algorithm (using the criteria mentioned above).
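One of the criteria mentioned above, the Ljung-Box statistic of the residuals, can be sketched as follows. The function name is hypothetical and only the statistic itself is computed; to obtain a p-value, it would be compared with a chi-square distribution whose degrees of freedom are the number of lags minus the number of estimated ARMA parameters.

```python
import numpy as np

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.sum(e ** 2)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.sum(e[k:] * e[:-k]) / denom   # lag-k sample autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# A strongly alternating series gives a large statistic even at lag 1.
print(ljung_box_q([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], max_lag=1))
```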
Some examples are given below:
- Some current seasonality tests seem too strict (QS significance level…) or not robust enough
  (spectral diagnostics); they could perhaps be improved.
- The current log/level algorithms are not robust against the presence of additive outliers: they
  systematically lead to log transformations.
- The choice of the ARMA model in Tramo is based on Hannan-Rissanen; the robustness of that
  solution is not clear for complex models (especially with MA polynomials); moreover, the
  current algorithm sometimes seems to lead to unnecessarily complex models.
- The current implementation of the outlier detection in Tramo is extremely efficient because it
  is based on simple approximations; however, the robustness of the method should be checked,
  especially at the beginning of the period.
- The calendar effects are sometimes removed too early in the processing; they could be
  reintroduced after the outlier detection (as in X12).
- More generally, the coherence of some tests (Easter…) should be improved.
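For readers unfamiliar with the Hannan-Rissanen approach mentioned in the examples above, here is a minimal two-step sketch for an ARMA(1,1) model: a long autoregression provides approximate innovations, and an ordinary regression on the lagged series and the lagged innovations then gives the ARMA parameters. This is a textbook simplification under stated assumptions (no seasonal part, no bias-correction step, an arbitrary long AR order of 20), not the routine implemented in Tramo; the function name is hypothetical.

```python
import numpy as np

def hannan_rissanen_arma11(y, long_ar_order=20):
    """Approximate (phi, theta) for y[t] = phi*y[t-1] + e[t] + theta*e[t-1]."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n, m = len(y), long_ar_order
    # Step 1: long autoregression by least squares; its residuals
    # serve as proxies for the unobserved innovations.
    X = np.column_stack([y[m - k:n - k] for k in range(1, m + 1)])
    ar_coef, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    e = y[m:] - X @ ar_coef                  # e[i] approximates the innovation at time m + i
    # Step 2: regress y[t] on y[t-1] and the estimated e[t-1].
    Z = np.column_stack([y[m:-1], e[:-1]])
    (phi, theta), *_ = np.linalg.lstsq(Z, y[m + 1:], rcond=None)
    return float(phi), float(theta)

# Simulated example with known parameters (phi = 0.6, theta = 0.3).
rng = np.random.default_rng(0)
eps = rng.standard_normal(5001)
series = np.zeros(5000)
series[0] = eps[1] + 0.3 * eps[0]
for t in range(1, 5000):
    series[t] = 0.6 * series[t - 1] + eps[t + 1] + 0.3 * eps[t]
print(hannan_rissanen_arma11(series))
```

Because both steps are linear regressions, the method is very fast, which is consistent with the "fast detection based on approximate estimations" noted for Tramo; the price is that the estimates are only approximations of the exact maximum likelihood values.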
Other remarks:
- Some steps may be processed in parallel (1 and 2 for instance).
- The training for hobby developers (September) could take some of the points discussed above
  as examples.
- The proposed investigations could greatly improve the understanding of the routines and the
  sharing of knowledge within the community.
- The tests/improvements may be a long process, which can be spread over several years.
Conclusions and final remarks
Main questions:
Developing a single RegArima module automatically implies changes in the current core engines and
more discrepancies in comparison with them. Do we accept such implications?
The cost of such a development is not negligible (but manageable with the current resources). Are
the benefits sufficient to undertake it? What is the priority of the project?
Remarks
In any case, the development of a new pre-processing module will constitute a major release of the tool.
It should be associated with other major modifications, like the change to Java 8. It could not be planned
before the end of 2016.
The current versions of Tramo and of X12 should not disappear from the software; however, they should
not evolve any more.
Bibliography
Gómez V., Maravall A. (1994), "Estimation, Prediction, and Interpolation for Nonstationary Series with the
Kalman Filter", Journal of the American Statistical Association, vol. 89, n° 426, 611-624.
Ljung G. M., Box G.E.P. (1979), "The Likelihood Function of Stationary Autoregressive-Moving Average Models",
Biometrika, 66, 2, 265-270.
Otto M. C., Bell W.R., Burman J.P. (1987), "An Iterative GLS Approach to Maximum Likelihood Estimation of
Regression Models with Arima Errors", Bureau of the Census, SRD Research Report CENSUS/SRD/RR_87/34.