JBench is a Java console application for multivariate benchmarking

JBench
JBench is a Java console application for multivariate benchmarking. It is able to benchmark small to
medium-sized sets of data, following temporal and/or contemporaneous constraints. The constraints
may be binding (fixed) or not. They may represent the margins of multi-way tables, hierarchical
structures or any other kind of relationship.
The underlying method is an extension of Cholette's method, which generalizes, amongst others,
the additive and multiplicative Denton procedures as well as simple proportional benchmarking.
The first aim of the software is to provide a reconciliation method when direct seasonal adjustment
is used on a set of related time series or when annual benchmarking is needed. That is why it works
on outputs generated by Demetra+. However, the software can be used for a larger set of
benchmarking problems.
Finally, it should be mentioned that the actual algorithms used by JBench are included in the package
jtstoolkit.jar, which is also the basic library of JDemetra+. The method can be executed by direct
function calls instead of by means of the command line described below.
Brief description of the algorithm
Contrary to usual implementations, which are based on expensive matrix computations, JBench uses
an approach based on state space forms (ssf) and on their related Kalman smoother. That solution
dramatically improves performance and allows the exact handling of complex relationships in
medium-sized data sets (up to several hundred monthly series). We briefly mention below,
without any technical details (see the technical document of jtstoolkit for further information), the
key points of the implementation.
- Cholette's method is put in state space form by considering that
  $\mu_{it} = \dfrac{y_{it} - \hat{y}_{it}}{|y_{it}|^{\lambda}}$ (where $y_{it}$ is an observation and $\hat{y}_{it}$ is its
  benchmarked value) follows an auto-regressive model of order 1:
  $\mu_{it} = \rho\,\mu_{i,t-1} + \varepsilon_{it}$, $0 \le \rho \le 1$. Parameters $\lambda = 0$, $\rho = 1$ correspond to an
  additive Denton, $\lambda = 1$, $\rho = 1$ to a multiplicative Denton and $\lambda = 0.5$, $\rho = 0$ to a
  proportional method.
- The ssf-form of the multivariate problem is achieved by "stacking" the individual ssf-forms.
  The constraints are handled as "pseudo-observations"; so, the multivariate ssf-form has one
  measurement equation for each constraint.
- The multivariate ssf problem is handled in its univariate form (see Durbin and Koopman (DK)
  for further details).
- When the problem is diffuse ($\rho = 1$), the solution proposed by DK for an exact initialization is
  followed. However, the usual implementation is slightly modified to get around the numerical
  instabilities that are often encountered: the diffuse part is handled by means of the array
  filter approach (see Kailath, Sayed...), which corresponds to a square root filter.
- The disturbance smoother of DK is used instead of the usual smoother; such a smoother
  needs much less memory, at the price of a (usually) small loss of stability.
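To make the first point concrete, the following sketch evaluates the scaled discrepancy (observation minus benchmarked value, divided by |observation| to the power lambda) and its AR(1) innovations on a toy series. It only illustrates the definitions of the model; it is not the state space algorithm used by JBench, and the function names and the data are hypothetical.

```python
def scaled_discrepancies(y, y_bench, lam):
    """mu_t = (y_t - yhat_t) / |y_t|**lam for each period t.

    Assumes no observation is zero when lam > 0 (division by zero otherwise).
    """
    return [(obs - b) / abs(obs) ** lam for obs, b in zip(y, y_bench)]


def ar1_innovations(mu, rho):
    """eps_t = mu_t - rho * mu_{t-1}, for every t that has a predecessor."""
    return [mu[t] - rho * mu[t - 1] for t in range(1, len(mu))]


# Hypothetical data: the benchmarked series is the original raised by 10 %.
y = [100.0, 200.0, 400.0]
y_bench = [110.0, 220.0, 440.0]

mu_mult = scaled_discrepancies(y, y_bench, lam=1.0)  # relative discrepancies
mu_add = scaled_discrepancies(y, y_bench, lam=0.0)   # level discrepancies
eps_mult = ar1_innovations(mu_mult, rho=1.0)         # changes in the relative discrepancy
```

With lambda = 1 and rho = 1, a uniform proportional adjustment leaves the innovations at zero, which is why that setting penalizes changes in the relative correction rather than the correction itself. With lambda = 0, the same adjustment produces level discrepancies that grow with the series.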
Description of the parameters
The program is launched by means of the following command line:
java [-XmxZZZZm] -jar [xxx/]jbench.jar -i inputFile [-r rho] [-l lambda] [-d dconstraintsFile] [-t tconstraintsFile] [-c cconstraintsFile] [-o outputFile]
The different parameters are described below.
Parameters
-XmxZZZZm
The optional -XmxZZZZm parameter (where ZZZZ stands for the actual size, in megabytes) defines the
maximum memory allocated to the Java runtime. For large sets of data, -Xmx1024m is usually a good
option. For small sets of data (or if the global parameters of Java are already set in that way), the
parameter can be omitted.
-i inputFile
The -i inputFile parameter is the only mandatory one (except of course the -jar option). It identifies
the file that will provide all the input time series for the processing (except the series included in the
-d option; see below).
The format of the input file corresponds to the default csv output (list presentation) produced by
Demetra+. It is recalled in the annex.
-r rho
Auto-regressive parameter of the model (1 for "Denton"). The default value is 1.
-l lambda
Power of the weighted observations (0 for additive, .5 for proportional, 1 for multiplicative). The
default value is 1.
-d dconstraintsFile
The -d option identifies temporal constraints corresponding to (some) input series. The temporal
constraints are provided in the same format as the input file. Input series and temporal constraints
are associated using their identifiers, which must be exactly the same. The temporal constraints may
be expressed in the aggregation frequency or in the same frequency as the original series. In the
latter case, annual constraints are applied by default. A typical use of that option corresponds to the
following scenario:
- Use Demetra+ to seasonally adjust a set of series.
- Generate csv files (you get for instance demetra_y.csv, demetra_ycal.csv, demetra_sa.csv...).
- Use JBench with the command ".... -i demetra_sa.csv -d demetra_ycal.csv" (or
  demetra_y.csv).
- The results contain the usual univariate benchmarked series.
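One way to check the outcome of such a run is to compare the annual totals of the benchmarked series with those of the constraint series. The sketch below assumes that the annual constraint is a sum over the periods of each complete year, which this manual does not state explicitly; the function names and the quarterly data are hypothetical.

```python
from collections import defaultdict


def annual_totals(freq, year0, period0, values):
    """Sum the observations of each complete year.

    freq is the number of periods per year and (year0, period0) the first
    observation, with period0 counted from 1.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, value in enumerate(values):
        position = (period0 - 1) + k
        year = year0 + position // freq
        sums[year] += value
        counts[year] += 1
    # Incomplete years carry no annual constraint in this sketch.
    return {year: total for year, total in sums.items() if counts[year] == freq}


def max_annual_gap(benchmarked, target):
    """Largest absolute difference between annual totals over the common years."""
    common = benchmarked.keys() & target.keys()
    return max((abs(benchmarked[y] - target[y]) for y in common), default=0.0)


# Hypothetical quarterly series starting in the third quarter of 2012.
sa = annual_totals(4, 2012, 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
ycal = annual_totals(4, 2012, 3, [1.5, 1.5, 4.0, 4.0, 5.0, 5.0, 7.0])
gap = max_annual_gap(sa, ycal)
```

Only 2013 is complete in this toy data, so it is the only year compared. A gap close to zero indicates that the annual constraint holds.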
-t tconstraintsFile
The optional -t file expresses the temporal constraints in another (more flexible) way: it defines the
mapping between an aggregated series and its disaggregated counterpart; both series must belong
to the input file; the mapping is defined in a csv file as follows (the identifiers correspond of course to
the identifiers of the input file):
aggregate1,details1
aggregate2,details2
...
-c cconstraintsFile
The optional -c file (csv format) defines the contemporaneous constraints. Contemporaneous
constraints may be binding (the constraint is fixed) or not. Each constraint corresponds to a line in
the csv file, using the following conventions:
Binding case
Equation: $y = a_1 x_1 + \dots + a_n x_n$
csv format: "y, , x1, a1, ..., xn, an"
Non-binding case
Equation: $c = a_1 x_1 + \dots + a_n x_n$ (usually $c = 0$ and some $a_i < 0$)
csv format: ", c, x1, a1, ..., xn, an"
The identifiers of the variables may contain the usual wild cards (? or *). In such a case, the same
coefficient is applied to each series of the input file that matches the criterion (except the series of
the binding constraint itself).
Example:
Equation: $total = s_1 + \dots + s_{100}$
csv format: "total, , s*, 1" (binding) or ", 0, total, 1, s*, -1" (non-binding)
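The expansion of wild cards described above can be sketched as follows. This is an illustration only: it relies on Python's `fnmatch` module, whose rules (case-sensitive matching, additional `[...]` patterns) may differ from those actually applied by JBench, and the function name is hypothetical. The identifiers are those of the two-way example given later in this document.

```python
from fnmatch import fnmatchcase


def expand_constraint(row, series_ids):
    """Expand one line of a -c file into (binding_id, constant, coefficients).

    row holds the csv fields [y, c, x1, a1, ..., xn, an]; y is filled in for a
    binding constraint and c for a non-binding one.
    """
    binding = row[0].strip() or None
    constant = float(row[1]) if row[1].strip() else None
    coefficients = {}
    for pattern, coeff in zip(row[2::2], row[3::2]):
        pattern = pattern.strip()
        for sid in series_ids:
            # The binding series never appears on its own right-hand side.
            if sid != binding and fnmatchcase(sid, pattern):
                coefficients[sid] = float(coeff)
    return binding, constant, coefficients


ids = ["s11", "s12", "s21", "s22", "r1", "r2", "c1", "c2"]
binding_form = expand_constraint(["r1", "", "s1?", "1"], ids)
non_binding_form = expand_constraint(["", "0", "r1", "1", "s1?", "-1"], ids)
```

Both forms select the same series (s11 and s12); they differ in whether r1 is kept fixed or adjusted together with its components.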
-o outputFile
By default the results are stored in the bench.csv file. However, the user can specify another file by
means of that option. The output file will only contain the endogenous series (binding constraints are
not included in the output).
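When JBench is driven from a script, the options described in this section can be assembled programmatically. The helper below is a hypothetical convenience, not part of JBench; it only uses the options listed in this section and assumes that jbench.jar is in the working directory unless another path is given.

```python
def jbench_command(input_file, jar="jbench.jar", rho=None, lam=None,
                   dconstraints=None, tconstraints=None, cconstraints=None,
                   output=None, max_heap_mb=None):
    """Build the argument list of a JBench run; only -i is mandatory."""
    cmd = ["java"]
    if max_heap_mb is not None:
        cmd.append(f"-Xmx{max_heap_mb}m")
    cmd += ["-jar", jar, "-i", input_file]
    options = [("-r", rho), ("-l", lam), ("-d", dconstraints),
               ("-t", tconstraints), ("-c", cconstraints), ("-o", output)]
    for flag, value in options:
        # Options left at None are omitted, so JBench applies its defaults.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Same options as the additive two-way example of this document.
cmd = jbench_command("test.csv", cconstraints="ctest.csv", lam=0)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids quoting problems with file names that contain spaces.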
Remarks:
When specifying constraints, the user has to verify that:
- the constraints are coherent;
- the constraints are not redundant.
Those points are essential for the success of the processing.
The current version of the software doesn't check the compatibility of the constraints. Moreover, it
doesn't remove unnecessary contemporaneous constraints (redundant temporal constraints are
automatically removed). Future releases will improve that point.
For a good understanding of some results, reading the reference book of Dagum and Cholette
("Benchmarking, Temporal Distribution, and Reconciliation Methods for Time Series") is strongly
recommended (see especially the chapters on the reconciliation of one-way and two-way tables
without temporal aggregation constraints).
Examples
1. Univariate multiplicative Denton benchmarking, using outputs of Demetra+
java -jar jbench.jar -i demetra_sa.csv -d demetra_ycal.csv
2. Additive two-way Denton-like benchmarking
java -jar jbench.jar -i test.csv -c ctest.csv -l 0
with the following files:
test.csv
s11 ...
s12 ...
s21 ...
s22 ...
r1 ...
r2 ...
c1 ...
c2 ... [c2=r1+r2-c1]
ctest.csv
r1,,s1?,1 [r1=s11+s12]
r2,,s2?,1 [r2=s21+s22]
c1,,s?1,1 [c1=s11+s21]
[c2,,s?2,1 (c2=s12+s22) is omitted]
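The redundancy of the omitted constraint can be verified directly. Assuming, as the annotation in test.csv indicates, that the input data satisfy c2 = r1 + r2 - c1, the three constraints kept in ctest.csv already imply the fourth one:

```latex
\begin{aligned}
s_{12} + s_{22} &= (s_{11} + s_{12}) + (s_{21} + s_{22}) - (s_{11} + s_{21}) \\
                &= r_1 + r_2 - c_1 \\
                &= c_2 .
\end{aligned}
```

Adding the line for c2 would therefore introduce a redundant contemporaneous constraint, which the current version of the software does not remove by itself.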
3. Denton benchmarking, using outputs of Demetra+, with additional constraint on the totals
(identified by the series all, which must be defined in the input file.)
java -jar jbench.jar -i demetra_sa.csv -d demetra_ycal.csv -c all.csv
all.csv
all,,*,1
Bibliography
DAGUM, E.B. and CHOLETTE, P.A. (2006). Benchmarking, Temporal Distribution, and Reconciliation
Methods for Time Series. Springer.
DURBIN, J. and KOOPMAN, S.J. (2001). Time Series Analysis by State Space Methods. Oxford
Statistical Science Series.
HARVEY, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge
University Press.
PIZZINGA, A. (2009). Diffuse Restricted Kalman Filtering. 31º Meeting of the Brazilian Econometric
Society (http://virtualbib.fgv.br/ocs/index.php/sbe/EBE09/paper/viewFile/938/296).
Annex
csv data format (options -i, -d and output)
Each series is described in a single row, composed of:
Identifier, frequency, first year, first period, number of observations, observations.
Example for the monthly series "Test", starting in January 2012 and containing 5 observations:
Id    Freq  Year0  Period0  N. data  d1   d2   d3   d4   d5
Test  12    2012   1        5        1.0  2.0  3.0  4.0  5.0
Content of the csv file:
"Test, 12, 2012, 1, 5, 1.0, 2.0, 3.0, 4.0, 5.0"
Be aware that the csv format depends on the regional settings.
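A row in this format can be read with a few lines of code. The sketch below assumes a comma as field separator and a dot as decimal separator (other regional settings would need a different `sep` and number parsing) and that the surrounding quotes shown above are not part of the file; the function name is hypothetical, and missing values are not handled.

```python
def parse_series_row(line, sep=","):
    """Parse 'Identifier, frequency, first year, first period, n, observations...'."""
    fields = [field.strip() for field in line.split(sep)]
    identifier = fields[0]
    freq, year0, period0, n = (int(field) for field in fields[1:5])
    values = [float(field) for field in fields[5:5 + n]]
    if len(values) != n:
        raise ValueError(f"{identifier}: expected {n} observations, found {len(values)}")
    return {"id": identifier, "freq": freq, "year0": year0,
            "period0": period0, "values": values}


series = parse_series_row("Test, 12, 2012, 1, 5, 1.0, 2.0, 3.0, 4.0, 5.0")
```

The declared number of observations is checked against the data actually present, so that a truncated row is reported instead of being silently accepted.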