Statistical Service Specification

Statistical Service Specification: Sample Selection
Protocol for Invoking the Service
This service is invoked by calling a function called "Sample Selection".
The protocol used to invoke this function should comply with the guidance that CSPA
provides for developing Statistical Services. The service will run for several minutes,
so an asynchronous calling mechanism will most likely be required.
In addition to the input parameters specified below, the caller will include a
RequestIdentifier in the input message; the service must return its value in the output
as a Correlation-Identifier.
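The identifier round trip can be illustrated with a minimal sketch. It is not part of the specification: the message classes, the handle function and every field other than RequestIdentifier and Correlation-Identifier are hypothetical names chosen for illustration, and the actual sample selection is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SampleSelectionRequest:
    # Hypothetical input message; only RequestIdentifier is named in the specification.
    request_identifier: str
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SampleSelectionResponse:
    # Hypothetical output message; carries the caller's identifier back unchanged.
    correlation_identifier: str
    result: Dict[str, Any] = field(default_factory=dict)


def handle(request: SampleSelectionRequest) -> SampleSelectionResponse:
    """Placeholder handler: echoes the identifier, omits the sample selection itself."""
    return SampleSelectionResponse(correlation_identifier=request.request_identifier)


response = handle(SampleSelectionRequest(request_identifier="req-001"))
print(response.correlation_identifier)
```

Because the call is expected to be asynchronous, the echoed identifier is what lets a caller match an output message to the request that produced it.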
Further requirements are:
• The service must be able to accept and queue requests even when processing previous
  requests
• The service must process requests in a timely manner, but not necessarily in order of
  arrival
• The service must have reasonable performance and throughput and should be
  scalable. This may be solved in the implementation (deployment infrastructure) rather
  than in the application.
• The service should be able to accept SurveyGroup data, Classifications related to the
  Base and SubStratification and the PopulationData "by value" as well as "by
  reference".
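One way to meet the queueing requirement is a front end that accepts requests immediately and hands them to a background worker. The following is only a sketch under assumptions: the class name SampleSelectionQueue, the dictionary-shaped messages and the single worker thread are illustrative choices, and the long-running selection is replaced by a placeholder.

```python
import queue
import threading
from typing import Any, Dict


class SampleSelectionQueue:
    """Hypothetical front end: accepts requests at any time, processes them in a worker."""

    def __init__(self) -> None:
        self._inbox: queue.Queue = queue.Queue()
        self.outputs: Dict[str, Dict[str, Any]] = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, request: Dict[str, Any]) -> None:
        # Returns immediately, even while an earlier request is still being processed.
        self._inbox.put(request)

    def _run(self) -> None:
        while True:
            request = self._inbox.get()
            if request is None:  # sentinel: stop the worker
                break
            # Placeholder for the long-running sample selection.
            self.outputs[request["RequestIdentifier"]] = {
                "CorrelationIdentifier": request["RequestIdentifier"],
                "Testrun": request.get("Testrun", False),
            }

    def close(self) -> None:
        # Queue the sentinel behind all pending requests and wait for the worker.
        self._inbox.put(None)
        self._worker.join()


service = SampleSelectionQueue()
service.submit({"RequestIdentifier": "req-001", "Testrun": True})
service.submit({"RequestIdentifier": "req-002"})
service.close()
print(sorted(service.outputs))
```

Outputs are keyed by identifier rather than by position, so a caller does not depend on requests being completed in order of arrival. Scaling out (several workers or several service instances) is left to the deployment infrastructure, as the requirement above allows.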
Parameters:
• Testrun (boolean): an indicator specifying whether the service should execute a
  normal production run or a test run in which normal output is produced, but no updates
  are made to the internal state (no accumulation of burden on population units, no
  changes to panels)
• SurveyGroup: the complete description of the Survey Group this survey belongs to,
  consisting of:
  o Code: identifier for the survey group
  o InitializationMethod: Id of the method to be used when a new Unit is found or a
    Unit changes stratum (methods predefined in the service)
  o BaseStratificationDimensions:
    ▪ For each dimension: identifier Code and data type
    ▪ For each Code value in the dimension: the complete mapping
      of lower-level code values (From= , To= )
• SurveyBurden (float): the burden this survey will cause for the respondents
  (population units selected); default that can be overridden at the level of the strata
• IsPanel (boolean): indicator specifying whether this survey uses a panel
• RotatePanel (boolean): in case of a Panel, indicates whether the panel should be
  rotated
• RotatePanelFraction (float): fraction of the panel that must be refreshed; default that
  can be overridden at the level of the strata
• SampleMethod (string): Id of the method to be used for selecting the sample (future use,
  methods predefined in the service)
• UnitIdMappingName (string): the name of the variable in the input data filling the
  role of Unit Id
• BaseStratVars (collection): the set of variables that map to the Base Stratification
  dimensions. Each variable has a refCode referring to the dimension in the Base
  Stratification and a MappingName (string) containing the name of the variable in the
  input data that maps to this dimension
• SubStratVars (collection): the set of variables for sub-stratification. Each variable has
  a MappingName (string) giving the name of the variable in the input data, a Code and
  a DataType. Each variable also has a list of RangeGroups specifying the
  mapping of the values in the input data to the aggregation level in this sub-strat
  dimension
• HelperVars (collection): a set of variables, each with a Code, a DataType and a
  MappingName (string) referring to the variable in the input data (future use, for new
  methods)
• CustomFractions (collection): a set of Fractions, where each Fraction specifies a set of
  Unit Ids in the input data with a Fraction (the probability of selecting these Units in
  the sample)
• Strata (collection): for each of the Base Strata and Sub-Strata, the set of Code values
  that identify the (sub)stratum, together with Fraction, NumUnits, MinNumUnits,
  RotationFraction and SurveyBurden
• PopulationDataDataStructure: the data structure of the PopulationData, a Unit Dataset
  with a single record type
• PopulationData: the Population Frame input data
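The From/To mappings used for the base stratification dimensions and for the RangeGroups of SubStratVars can be pictured with a small lookup. This is a sketch under stated assumptions, not the service's definition: the RangeGroup class and its field names are hypothetical, values are assumed to be numeric, both bounds are assumed to be inclusive, and the size classes are made-up example data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RangeGroup:
    # Hypothetical shape: one aggregated Code covering input values From..To.
    code: str
    from_value: float
    to_value: float


def map_to_code(value: float, groups: List[RangeGroup]) -> Optional[str]:
    """Return the Code of the first group containing value (bounds assumed inclusive)."""
    for group in groups:
        if group.from_value <= value <= group.to_value:
            return group.code
    return None  # value is not covered by any range group


# Illustrative size classes for a sub-stratification variable (made-up values).
size_classes = [
    RangeGroup(code="small", from_value=0, to_value=9),
    RangeGroup(code="medium", from_value=10, to_value=99),
    RangeGroup(code="large", from_value=100, to_value=float("inf")),
]

print(map_to_code(42, size_classes))
```

How the service treats values that fall outside every range, or ranges that overlap, is not stated in the parameter list; the sketch returns None in the first case and the first matching group in the second.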
Input Messages
In GSIM terms, the inputs to this service are …… (ref Service Definition)
Describe the specific inputs in terms of the GSIM implementation
Output Message
The outputs of the service are …… (ref Service Definition)
Describe the specific outputs in terms of the GSIM implementation
Applicable Methodologies
Describe the statistical methods that may be implemented in this Statistical Service
(ref Service Definition)
The basic methodologies used in this service are described in the attached document from
1993 and have been used by Statistics Netherlands since then. Since then, the following
improvements have been made; these will also be included in the service:
• Improved assignment of ED-values and the panel indicator for new units (births) and
  units changing stratum
• Possibility to deviate from the Base Stratification by means of sub-stratification,
  with the corresponding way of adapting ED-values