Statistical Service Specification
This document provides the specification of three services which implement the core
functionality of the confidentialisation-on-the-fly process.
The following assumptions apply to all three services:

A micro dataset must be prepared for external analysis before it is used by these
services. This may include removing existing variables from the dataset, adding
new variables that will not be visible to the user, or discretising existing continuous
variables, to name a few cases;
The dataset never leaves the service boundary - it exists only on the server-side;
All metadata associated with a dataset is stored on the back-end and is used by these
services. It is referred to in this document as dataset metadata;
The methodology of the underlying confidentialization is driven by an extensive set of
configuration parameters. These parameters are carefully chosen by those who are
familiar with both the methodology and the data being confidentialized. It is critical
to keep these configuration parameters confidential.
Links to XSD types are provided for those service parameters that have
comparable types in DDI.
1) Statistical Service Specification: ConfidentializedModelService
This service builds a confidentialized statistical model. Not all properties of the model are
returned to the user. For example, returning model parameters and residuals would allow one
to reproduce the actual microdata values (within the level of perturbation that may have been
applied before building the model). The service also allows the user to get a set of diagnostic
plots for a specified model.
Protocol for Invoking the Service
The service can be invoked as a SOAP webservice.
The service has two methods, and all service parameters are passed by value.
a) Service method: getStatisticalModel
This method builds and returns a confidentialized statistical model.
Input Messages
userID - user ID;
datasetID - ID of the micro dataset on which to build the model; and
modelDescriptor - an object of ModelDescriptor type.
Output Message
The output of the service method is:
StatisticalModel - confidentialized statistical model.
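As an illustration of the call described above, the following Python sketch assembles a SOAP 1.1 request for getStatisticalModel. It is not the service's published wire format: the service namespace, the sample identifier values and the formula field inside modelDescriptor are all hypothetical, since this specification does not define the fields of ModelDescriptor.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace; the real one would come from the service WSDL.
SERVICE_NS = "urn:example:ConfidentializedModelService"


def _append(parent, params):
    """Add one child element per parameter; nested dicts become nested elements."""
    for name, value in params.items():
        child = ET.SubElement(parent, name)
        if isinstance(value, dict):
            _append(child, value)
        else:
            child.text = str(value)


def build_request(method, params):
    """Return a SOAP envelope (as a string) that calls `method` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    _append(call, params)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_request(
    "getStatisticalModel",
    {
        "userID": "u123",        # sample value
        "datasetID": "ds42",     # sample value
        # Hypothetical descriptor content; ModelDescriptor is not defined here.
        "modelDescriptor": {"formula": "income ~ age + state"},
    },
)
```

Because all parameters are passed by value, the whole model description travels in the request; the microdata itself never appears in either direction.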
b) Service method: getModelPlots
This method builds and returns one or more diagnostic plots for the specified model.
Input Messages
userID - user ID;
datasetID - ID of the micro dataset on which to build the model;
modelDescriptor - an object of ModelDescriptor type;
width - plot width; and
height - plot height.
Output Message
The output of the service method is:
Plot - diagnostic plot of the model.
2) Statistical Service Specification: ConfidentializedEDAService
Exploratory Data Analysis (EDA) is a standard step in any statistical data analysis. Basically,
it allows a user to look at microdata and identify patterns that can then be more formally
explained in statistical models. However, in confidentialization-on-the-fly settings, the user
is not allowed to look at the microdata. The Confidentialized EDA Service provides three methods
that allow a user to get familiar with the data, at a level specified by configuration
parameters, without any respondent being identified.
Protocol for Invoking the Service
The service can be invoked as a SOAP webservice.
The service has three methods, and all service parameters are passed by value.
a) Service method: getDataSummary
This method returns a summary table.
Input Messages
userID - user ID;
datasetID - ID of the micro dataset to summarize;
byVar - list of IDs of (categorical) variables by which to group other (cell) variables
(references dataset metadata); and
cellVars - list of IDs of the variables to display in the table (references dataset metadata).
Output Message
The output of the service method is:
a UnitDataSet containing the actual table data.
b) Service method: getWeightedDataSummary
This method returns a summary table in which records are weighted by the specified weight variable.
Input Messages
userID - user ID;
datasetID - ID of the micro dataset to summarize;
byVar - list of IDs of (categorical) variables by which to group other (cell) variables
(references dataset metadata);
cellVars - list of IDs of the variables to display in the table (references dataset metadata); and
weightVarID - ID of the weight variable (references dataset metadata).
Output Message
The output of the service method is:
a UnitDataSet containing the actual table data.
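The roles of byVar, cellVars and weightVarID in getDataSummary and getWeightedDataSummary can be sketched in Python as follows. This is only a sketch of the grouping logic: it assumes the summary reports a per-group mean of each cell variable, which this specification does not state, and it applies no confidentialization at all. The function name and the sample records are hypothetical.

```python
from collections import defaultdict


def summarize(records, by_vars, cell_vars, weight_var=None):
    """Group records by `by_vars` and return the (weighted) mean of each cell variable.

    Assumption: the summary statistic is a mean. With weight_var=None every
    record has weight 1, which mirrors the unweighted method.
    """
    weight_sums = defaultdict(float)
    value_sums = defaultdict(lambda: defaultdict(float))
    for rec in records:
        key = tuple(rec[v] for v in by_vars)
        weight = float(rec[weight_var]) if weight_var is not None else 1.0
        weight_sums[key] += weight
        for var in cell_vars:
            value_sums[key][var] += weight * rec[var]
    return {
        key: {var: value_sums[key][var] / total for var in cell_vars}
        for key, total in weight_sums.items()
        if total > 0  # skip groups whose weights sum to zero
    }


# Hypothetical sample data.
records = [
    {"state": 10, "income": 100.0, "w": 1.0},
    {"state": 10, "income": 200.0, "w": 3.0},
    {"state": 20, "income": 50.0, "w": 2.0},
]
unweighted = summarize(records, ["state"], ["income"])
weighted = summarize(records, ["state"], ["income"], weight_var="w")
```

In the real service the returned table would additionally be subject to the confidentialization rules set by the configuration parameters.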
c) Service method: getHexBinPlot
This method creates a hex-bin plot for two continuous variables.
Input Messages
userID - user ID;
datasetID - ID of the micro dataset containing the variables to plot; and
plotDescriptor - an object of BoxPlotDescriptor type.
Output Message
The output of the service method is:
a plot.
3) Statistical Service Specification: ConfidentializedDataService
This service allows a user to customize an existing data set. For example, certain variables
may need to be added to the data set to accommodate building a particular statistical
model. It may seem that the service provides functionality that is independent of the
underlying confidentialization process, but it is not. For example, adding a variable to the
dataset involves also adding certain metadata associated with that variable, as specified by the
underlying configuration settings, and these are the key ingredients to preventing attacks (as
described in the methodology paper). This metadata is added to the dataset metadata.
Protocol for Invoking the Service
The service can be invoked as a SOAP webservice.
All method parameters are passed by value. The following are service methods:
a) Service method: createNewDataSet
Assigns one of the available datasets to the user. A user may modify his/her dataset through
one or more of the methods below. To maintain service statelessness, the data set is
serialized after every modification of its structure.
Input Messages
userID - user ID;
surveyID - survey ID;
datasetID - ID of the new micro dataset; and
variables - the list of variables with their existing and new names. The user may rename
variables through this parameter.
Output Message
The output of the service method is just the status of the operation.
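The statelessness described for createNewDataSet (serialize the data set after every structural change, reload it on the next call) can be sketched in Python as follows. Everything named here is hypothetical: the DatasetStore class, the JSON encoding, the "OK" status value and the representation of variables as (existing name, new name) pairs are assumptions made for illustration, not part of this specification.

```python
import json


class DatasetStore:
    """Hypothetical server-side store keyed by (userID, datasetID)."""

    def __init__(self):
        self._blobs = {}

    def save(self, user_id, dataset_id, records):
        # Serialize on every write so that no state lives in the service itself.
        self._blobs[(user_id, dataset_id)] = json.dumps(records)

    def load(self, user_id, dataset_id):
        return json.loads(self._blobs[(user_id, dataset_id)])


def create_new_dataset(store, user_id, dataset_id, source_records, variables):
    """Copy the listed variables, applying the requested renames, then persist."""
    renamed = [{new: rec[old] for old, new in variables} for rec in source_records]
    store.save(user_id, dataset_id, renamed)
    return "OK"  # hypothetical status value


def drop_variables(store, user_id, dataset_id, variable_ids):
    """A later structural change: load, modify, and serialize again."""
    drop = set(variable_ids)
    records = store.load(user_id, dataset_id)
    records = [{k: v for k, v in rec.items() if k not in drop} for rec in records]
    store.save(user_id, dataset_id, records)
    return "OK"


store = DatasetStore()
source = [{"a": 1, "b": 2, "c": 3}]  # hypothetical survey records
status = create_new_dataset(store, "u1", "d1", source, [("a", "age"), ("b", "income")])
after_create = store.load("u1", "d1")
drop_variables(store, "u1", "d1", ["income"])
after_drop = store.load("u1", "d1")
```

Each call reads the persisted copy and writes it back, so any service instance can handle the next request for the same user and dataset.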
b) Service method: deleteDataSet
This method deletes the user's dataset.
Input Messages
userID - user ID; and
datasetID - dataset ID.
Output Message
The output of the service method is just the status of the operation.
c) Service method: keepRecords
Keeps only those records in the data set that meet the specified criteria (expressed in terms of
categories of an existing variable). Records cannot currently be filtered based on continuous
criteria.
Input Messages
userID - user ID;
datasetID - dataset ID; and
keepCriteria - a logical expression specifying the criteria for keeping records (for
example: state=10,20,30,40).
Output Message
The output of the service method is just the status of the operation.
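The keepCriteria example above (state=10,20,30,40) can be read as "keep records whose state is one of these categories". The Python sketch below implements only that one form, a single variable followed by a comma-separated list of codes; the full expression grammar is not given in this specification, and the function names and sample records are hypothetical.

```python
def parse_keep_criteria(criteria):
    """Split 'variable=code,code,...' into (variable, set of codes as strings)."""
    variable, separator, codes = criteria.partition("=")
    variable = variable.strip()
    allowed = {code.strip() for code in codes.split(",") if code.strip()}
    if not separator or not variable or not allowed:
        raise ValueError(f"unsupported keepCriteria: {criteria!r}")
    return variable, allowed


def keep_records(records, criteria):
    """Return only the records whose category for the variable is listed."""
    variable, allowed = parse_keep_criteria(criteria)
    # Codes are compared as strings so that 10 and "10" are treated alike.
    return [rec for rec in records if str(rec[variable]) in allowed]


# Hypothetical sample data.
records = [
    {"state": 10, "age_group": "A"},
    {"state": 50, "age_group": "B"},
    {"state": 30, "age_group": "A"},
]
kept = keep_records(records, "state=10,20,30,40")
```

Matching on discrete codes only is consistent with the restriction that records cannot currently be filtered on continuous criteria.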
d) Service method: dropVariable
Drops a single variable from the dataset.
Input Messages
userID - user ID;
datasetID - dataset ID; and
variableID - ID of the variable to drop.
Output Message
The output of the service method is just the status of the operation.
e) Service method: dropVariables
Drops one or more variables from the dataset.
Input Messages
userID - user ID;
datasetID - dataset ID; and
variableIDs - IDs of variables to drop (reference dataset metadata).
Output Message
The output of the service method is just the status of the operation.
f) Service method: addContinuousVariable
Adds a new continuous variable to the dataset.
Input Messages
userID - user ID;
fromDatasetID - ID of the dataset to which to add a variable;
toDatasetID - ID of the new dataset;
variableID - ID of the new variable to add; and
expression - formula expression of the new variable (a function of existing variables
that reference dataset metadata).
Output Message
The output of the service method is just the status of the operation.
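One way to evaluate the expression parameter of addContinuousVariable without executing arbitrary code is to parse it and allow only arithmetic over existing variables. The Python sketch below does this with the standard ast module. The expression syntax (the four arithmetic operators, numeric constants and variable names) is an assumption, as this specification does not define it, and the function names and sample data are hypothetical.

```python
import ast
import operator

# Arithmetic operators permitted in an expression (an assumption of this sketch).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, record):
    """Evaluate an arithmetic expression over the variables of one record."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return record[node.id]  # reference to an existing variable
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Function calls, attribute access and anything else are rejected.
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


def add_continuous_variable(records, variable_id, expression):
    """Return new records (the 'to' dataset) with the derived variable added."""
    return [{**rec, variable_id: evaluate(expression, rec)} for rec in records]


# Hypothetical sample data.
source = [{"income": 100.0, "hours": 40.0}]
derived = add_continuous_variable(source, "hourly", "income / hours")
```

Returning new records rather than modifying the input mirrors the fromDatasetID/toDatasetID pair: the source dataset is left unchanged. In the real service the new variable would also receive the associated metadata required by the configuration settings, which this sketch omits.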
g) Service method: addCategoricalVariable
Adds a new categorical variable to the data set.
Input Messages
userID - user ID;
fromDatasetID - ID of the dataset to which to add a variable;
toDatasetID - ID of the new dataset;
variableID - ID of the new variable; and
variableCodes - the list of variable codes.
Output Message
The output of the service method is just the status of the operation.
Applicable Methodologies
Gwenda Thompson, Stephen Broadfoot and Daniel Elazar: “Methodology for the
Automatic Confidentialisation of Statistical Outputs from Remote Servers at the
Australian Bureau of Statistics”, Australian Bureau of Statistics.