Statistical Service Specification: Sample Selection

Protocol for Invoking the Service

This service is invoked by calling a function called "Sample Selection". The protocol used to invoke this function should comply with the CSPA guidance for developing Statistical Services. Because a run takes several minutes, an asynchronous calling mechanism will most likely be required.

In addition to the input parameters specified below, the caller includes a RequestIdentifier in the input message. The service must return this value in the output as a Correlation-Identifier.

Further requirements are:
- The service must be able to accept and queue requests even while it is processing previous requests.
- The service must process requests in a timely manner, but not necessarily in order of arrival.
- The service must have reasonable performance and throughput and should be scalable. This may be solved in the implementation (deployment infrastructure) rather than in the application.
- The service should be able to accept the SurveyGroup data, the Classifications related to the Base Stratification and Sub-Stratification, and the PopulationData both "by value" and "by reference".
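The queuing and correlation requirements above (accept new requests while earlier ones are running, no guaranteed processing order, echo the RequestIdentifier back as the Correlation-Identifier) can be illustrated with a minimal in-process Python sketch. It assumes a thread pool stands in for the asynchronous calling mechanism; the names sample_selection, SampleSelectionEndpoint, submit and result are hypothetical and are not taken from the CSPA guidance. A real deployment would more likely rely on a message queue or an asynchronous web-service pattern.

```python
import concurrent.futures
import time
from typing import Any, Dict, Optional


def sample_selection(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the long-running "Sample Selection" function."""
    time.sleep(0.1)  # the real service runs for several minutes
    return {"selected_units": []}


class SampleSelectionEndpoint:
    """Accepts requests immediately and processes them in the background."""

    def __init__(self, max_workers: int = 2) -> None:
        # Submitted requests wait in the executor's internal queue until a
        # worker is free; with several workers, completion order is not
        # guaranteed to match arrival order.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Dict[str, concurrent.futures.Future] = {}

    def submit(self, message: Dict[str, Any]) -> str:
        """Queue a request and return at once with its identifier."""
        request_id = message["RequestIdentifier"]
        params = {k: v for k, v in message.items() if k != "RequestIdentifier"}
        self._pending[request_id] = self._executor.submit(self._run, request_id, params)
        return request_id

    @staticmethod
    def _run(request_id: str, params: Dict[str, Any]) -> Dict[str, Any]:
        output = sample_selection(params)
        # The caller's RequestIdentifier is returned as the Correlation-Identifier.
        output["Correlation-Identifier"] = request_id
        return output

    def result(self, request_id: str, timeout: Optional[float] = None) -> Dict[str, Any]:
        """Block until the given request has finished and return its output."""
        return self._pending.pop(request_id).result(timeout=timeout)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


# Usage: three requests are queued before any of them has finished.
endpoint = SampleSelectionEndpoint()
ids = [endpoint.submit({"RequestIdentifier": f"req-{i}", "Testrun": True}) for i in range(3)]
for rid in ids:
    print(endpoint.result(rid)["Correlation-Identifier"])
endpoint.shutdown()
```

Because submit returns before the work is done, the caller is free to poll or wait later, which matches the expectation that runs take several minutes.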
Parameters:
- Testrun (boolean): indicates whether the service should execute a normal production run or a test run. A test run produces the normal output but makes no updates to the internal state (no accumulation of burden on population units, no changes to panels).
- SurveyGroup: the complete description of the Survey Group this survey belongs to, consisting of:
  o Code: identifier for the survey group
  o InitializationMethod: Id of the method to be used when a new Unit is found or a Unit changes stratum (methods predefined in the service)
  o BaseStratificationDimensions: for each dimension:
      - the Identifier Code and data type
      - for each Code value in the dimension, the complete mapping of lower-level code values (From= , To= )
- SurveyBurden (float): the burden this survey will cause for the respondents (the population units selected); a default that can be overridden at the level of the strata
- IsPanel (boolean): indicates whether this survey uses a panel
- RotatePanel (boolean): for a panel survey, indicates whether the panel should be rotated
- RotatePanelFraction (float): the fraction of the panel that must be refreshed; a default that can be overridden at the level of the strata
- SampleMethod (string): Id of the method to be used for selecting the sample (future use; methods predefined in the service)
- UnitIdMappingName (string): the name of the variable in the input data that fills the role of Unit Id
- BaseStratVars (collection): the set of variables that map to the Base Stratification dimensions. Each variable has a refCode referring to the dimension in the Base Stratification and a MappingName (string) containing the name of the variable in the input data that maps to this dimension.
- SubStratVars (collection): the set of variables for sub-stratification. Each variable has a MappingName (string) giving the name of the variable in the input data, a Code and a DataType.
  With each variable there is also a list of RangeGroups specifying how the values in the input data map to the aggregation levels of this sub-stratification dimension.
- HelperVars (collection): a set of variables, each with a Code, a DataType and a MappingName (string) referring to the variable in the input data (future use, for new methods)
- CustomFractions (collection): a set of Fractions, where each Fraction specifies a set of Unit Ids in the input data together with a Fraction (the probability of selecting these Units in the sample)
- Strata (collection): for each of the Base Strata and Sub-Strata, the set of Code values that identify the (sub)stratum, together with Fraction, NumUnits, MinNumUnits, RotationFraction and SurveyBurden
- PopulationDataDataStructure: the data structure of the PopulationData, a Unit Dataset with a single record type
- PopulationData: the Population Frame input data

Input Messages

In GSIM terms, the inputs to this service are …… (ref Service Definition)
Describe specific inputs in terms of GSIM implementation

Output Message

The outputs of the service are …… (ref Service Definition)
Describe specific outputs in terms of GSIM implementation

Applicable Methodologies

Describe the statistical methods that may be implemented in this Statistical Service (ref Service Definition)

The basic methodologies used in this service are described in the attached document from 1993 and have been used by Statistics Netherlands since then. In the meantime, the following improvements have been made, which will also be included in the service:
- improved assignment of ED-values and the panel indicator for new units (births) and for units changing stratum
- the possibility to deviate from the Base Stratification by means of sub-stratification, with a corresponding way of adapting ED-values
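To show how the stratification parameters fit together, the following Python sketch classifies one PopulationData record into its (sub)stratum using BaseStratVars and SubStratVars with their RangeGroups, and applies the rule that a stratum-level SurveyBurden overrides the survey-level default. It is illustrative only: the RangeGroup fields (code, low, high) and all class and function names are hypothetical, and it assumes the base stratification variables in the input already hold the dimension's Code values (the From/To mapping of lower-level codes is not shown). It does not implement the sample selection methodology or the ED-value handling.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class RangeGroup:
    """Hypothetical shape: input values in [low, high] aggregate to `code`."""
    code: str
    low: float
    high: float


@dataclass
class BaseStratVar:
    ref_code: str       # refCode: the Base Stratification dimension
    mapping_name: str   # MappingName: variable name in the input data


@dataclass
class SubStratVar:
    code: str
    mapping_name: str
    range_groups: List[RangeGroup] = field(default_factory=list)


@dataclass
class Stratum:
    codes: Tuple[str, ...]                 # Code values identifying the (sub)stratum
    fraction: float
    survey_burden: Optional[float] = None  # None means "use the survey default"


def stratum_key(record: Dict[str, Any], base_vars: List[BaseStratVar],
                sub_vars: List[SubStratVar]) -> Tuple[str, ...]:
    """Classify one PopulationData record: base codes first, then sub-strat codes."""
    # Assumption: the input value is already a Code of the base dimension.
    key = [str(record[v.mapping_name]) for v in base_vars]
    for var in sub_vars:
        value = record[var.mapping_name]
        for group in var.range_groups:
            if group.low <= value <= group.high:
                key.append(group.code)
                break
        else:
            raise ValueError(f"{var.code}: no RangeGroup covers value {value!r}")
    return tuple(key)


def effective_burden(stratum: Stratum, default_burden: float) -> float:
    """A stratum-level SurveyBurden overrides the survey-level default."""
    return default_burden if stratum.survey_burden is None else stratum.survey_burden


# Usage with made-up variable names and codes.
base_vars = [BaseStratVar(ref_code="Activity", mapping_name="nace")]
sub_vars = [SubStratVar(code="Size", mapping_name="employees",
                        range_groups=[RangeGroup("small", 0, 9),
                                      RangeGroup("large", 10, float("inf"))])]
strata = {("47", "small"): Stratum(("47", "small"), fraction=0.1),
          ("47", "large"): Stratum(("47", "large"), fraction=1.0, survey_burden=2.5)}
record = {"unit_id": "U1", "nace": "47", "employees": 25}
key = stratum_key(record, base_vars, sub_vars)
print(key, effective_burden(strata[key], default_burden=1.0))  # ('47', 'large') 2.5
```

Representing "not overridden" as None keeps the survey-level default in one place, so changing it does not require touching every stratum.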