NATIONAL BANK OF BELGIUM

JD+
Towards a single RegArima modelling (Draft)
Jean Palate
7/2/2015

Introduction

Tramo-Seats and X12-ARIMA provide different RegArima modelling facilities. This can be confusing for many users, especially when the two algorithms are integrated in a single software package, like JD+. For many reasons (transparency, coherence, maintenance…), it seems desirable to offer a single pre-processing module for the two methods.

X12-ARIMA is based on an old version of Tramo, so it lacks many recent improvements of Tramo (especially those related to calendar effects, seasonality tests and over/under differencing). Moreover, although the two routines follow roughly the same logic, they differ significantly in most details. Tramo is significantly faster (up to 10 times) and more stable than X12-ARIMA (see the tests of the SACE). On the other hand, X12-ARIMA offers some facilities that are not included in Tramo (handling of the leap year effect, changes of regime, automatic detection of the length of the Easter effect…).

A common RegArima modelling should take the best aspects of both solutions; it should be mainly based on Tramo (logic, algorithms…), with the extensions provided by X12-ARIMA. Going in that direction clearly implies that such software will move away from the original programs.

We briefly compare below some aspects of the current implementations, considering the regression variables, the estimation methods and the main steps of the automatic model identification.
At the end, we propose a road map towards a common implementation.

Regression variables

Variable | Tramo | X12-Arima | JD+
Trading days | X | X | X
Working days | X | X | X
Leap year | X | X; special treatment in multiplicative model (optional) | Tramo and X12 like
Stock trading days | | X | X
Easter effect | X (several definitions for the last day of the Easter period) | X (different mean correction) | Tramo and X12 like; Julian Easter (no GUI [1])
Labor Day | | X |
Thanksgiving | | X |
Outliers | AO, TC, LS, SLS; LS, SLS outliers are 0-ending | AO, TC, LS, SO (seasonal level shift); LS, SO outliers are 1-ending | AO, TC, LS, SO; outliers are 0- or 1-ending

Ramps, Mean, Fixed seasonal, User-defined calendar effects, User-defined variables, Change of regime, Fixed coefficients:
X X X (no test) X (no test) X X X X X X X (no test) X (no GUI) X X X X X (no test) X (no GUI)

[1] GUI: Graphical User Interface.

Estimation methods

Tramo
The estimation is based on the Kalman filter, and the residuals are the one-step-ahead forecast errors. The optimization procedure is a specific version of the Levenberg-Marquardt algorithm; it uses the Hannan-Rissanen algorithm to compute initial values of the parameters.

X12
The estimation of the RegArima model is based on a modified version of the Ljung-Box algorithm. That solution is significantly slower than the Kalman filter. Moreover, it provides residuals that cannot always be easily interpreted [2]. The optimization procedure is a slightly modified version of the Minpack routines, also based on the Levenberg-Marquardt algorithm (a different implementation); it uses pre-defined initial values of the parameters.

JD+
JD+ is very similar to Tramo. However, its optimization procedure differs slightly in some details. It should be noted that, for comparability reasons, JD+ uses in a few cases the same algorithm as X12 (computation of the residuals…).

[2] T. McElroy, from the US Census Bureau, also thinks that the current residuals of X12 may sometimes be strange and that they should not be used for testing (they are not NIID).

Automatic model identification (AMI)

JD+ offers both implementations.

Main steps | Tramo | X12-Arima
Preliminary seasonality test | X |
Log/level | BIC-based | AICC-based
Calendar effects | Automatic choice between WD, TD (F-test); no test for holidays or user-defined calendars | AIC test (pre-specified variables, holidays, user-defined variables)
Easter effect | T-test | AIC test; possible automatic choice between different lengths
Outliers detection | Fast detection based on approximate estimations | Slow detection based on exact estimations
Other regression variables | No test | AIC test
Differencing | = | =
ARMA | Fast detection based on approximate estimations (Hannan-Rissanen) | Slow detection based on exact estimations
Over/under differencing, residual seasonality, other final tests | Rich | Very limited
Comparison with default model | | Optional

Road map

We consider below the different tasks that should be completed to arrive at a common pre-processing module.

Step 1
Common implementation of the regression variables.
The regression model should encompass all the options of each program. A single definition should be adopted (Easter, outliers…). For calendar effects, additional definitions could be considered (for instance, Week Days + Saturdays + Sundays).
Light development (1 month) and testing.

Step 2
Common estimation procedure.
For the estimation of RegArima models, the current choice of JD+ must be checked (Kalman filter + optimization procedure). The comparison must be made according to several criteria: precision, robustness, speed.
Few new developments, more testing (1 month).

Step 3
Extension of Tramo with features of X12.
Modification of the current implementation of Tramo to take into account the additional features of X12 (preliminary leap year correction, automatic detection of the length of the Easter effect, tests on any regression variable).
Light developments (2 months).

Step 4
Possible improvements of some sub-modules of Tramo.
Any automatic routine can always be improved. Even if Tramo has been fine-tuned by A. Maravall, some improvements are always possible (comparison with current X12 solutions…). Such research needs:
- The definition of criteria to compare models (see for instance what is currently used in Tramo: BIC, Ljung-Box of the residuals, number of outliers, stability tests…).
- The comparison of the current implementation against new modules (with simulated series and with real series); the impact should be measured for the sub-module and for the global algorithm (using the criteria mentioned above).

Some examples are given below:
- Some current seasonality tests seem too strict (QS significance level…) or not robust enough (spectral diagnostics); they could perhaps be improved.
- The current log/level algorithms are not robust to the presence of additive outliers: they systematically lead to log transformations.
- The choice of the ARMA model in Tramo is based on Hannan-Rissanen; the robustness of that solution is not clear for complex models (especially with MA polynomials); moreover, the current algorithm sometimes seems to lead to unnecessarily complex models.
- The current implementation of the outliers detection in Tramo is extremely efficient because it is based on simple approximations; however, the robustness of the method should be checked, especially at the beginning of the period.
- The calendar effects are sometimes removed too early in the processing; they could be reintroduced after the outliers detection (as in X12).
- More generally, the coherence of some tests (Easter…) should be improved.
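As an illustration of two of the comparison criteria mentioned above (BIC and the Ljung-Box statistic of the residuals), the following minimal Python sketch computes both from a vector of residuals. It uses the textbook definitions, which may differ from the exact variants implemented in Tramo or JD+; all function names are hypothetical and are not part of either program.

```python
import math


def autocorrelations(residuals, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the (mean-corrected) residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(residuals, max_lag):
    """Textbook Ljung-Box statistic: Q = n(n+2) * sum_k r_k^2 / (n-k)."""
    n = len(residuals)
    acf = autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def gaussian_log_likelihood(residuals):
    """Concentrated Gaussian log-likelihood, with the variance estimated as mean(e^2)."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def bic(log_likelihood, n_params, n_obs):
    """Generic BIC (assumption: not necessarily the normalisation used in Tramo)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Purely illustrative residuals, not taken from any real series.
    residuals = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.25, -0.15, 0.05]
    ll = gaussian_log_likelihood(residuals)
    print("Ljung-Box Q(4):", ljung_box_q(residuals, 4))
    print("BIC with 2 parameters:", bic(ll, 2, len(residuals)))
```

When comparing a current sub-module with a candidate replacement, such statistics would be computed for both on the same set of simulated and real series; a lower BIC and a smaller Q are read as favourable, subject to the other criteria (number of outliers, stability tests…).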
Other remarks:
- Some steps may be processed in parallel (1 and 2 for instance).
- The training for hobby developers (September) could take some of the points discussed above as examples.
- The proposed investigations could greatly improve the understanding of the routines and the sharing of knowledge within the community.
- The tests/improvements may be a long process, which can be spread over several years.

Conclusions and final remarks

Main questions:
- Developing a single RegArima module automatically implies changes in the current core engines and more discrepancies with respect to the original programs. Do we accept such implications?
- The cost of such a development is not negligible (but manageable with the current resources). Are the benefits sufficient to undertake it?
- What is the priority of the project?

Remarks:
- In any case, the development of a new pre-processing module will constitute a major release of the tool. It should be associated with other major modifications, like the change to Java 8. It could not be planned before the end of 2016.
- The current versions of Tramo and of X12 should not disappear from the software; however, they should not evolve any more.

Bibliography

Gomez V., Maravall A. (1994), "Estimation, Prediction, and Interpolation for Nonstationary Series with the Kalman Filter", Journal of the American Statistical Association, 89, 426, 611-624.
Ljung G.M., Box G.E.P. (1979), "The Likelihood Function of Stationary Autoregressive-Moving Average Models", Biometrika, 66, 2, 265-270.
Otto M.C., Bell W.R., Burman J.P. (1987), "An Iterative GLS Approach to Maximum Likelihood Estimation of Regression Models with Arima Errors", Bureau of the Census, SRD Research Report CENSUS/SRD/RR_87/34.