Statistical Disclosure Control at SURS

Statistical Disclosure Control (SDC) at SURS
Andreja Smukavec
General Methodology and Standards Sector
Why is confidentiality protection needed?
• One of the fundamental principles of official
statistics is that statistical information of data
suppliers is strictly confidential, and is used only
for statistical purposes.
• Legislation places a legal obligation on NSIs to
protect data suppliers.
• Data suppliers should be confident that the NSI will
preserve the confidentiality of individual information;
this confidence leads to better quality of the collected data.
National legislation
• National Statistics Act
– Data published in aggregated form.
– Data may be published individually if
• written consent of reporting units is obtained;
• data are collected from public data collections;
• data are published in such a way that the reporting units cannot be
directly identified.
– The Office or authorized producers shall transmit individual
data to users on the basis of a written application.
• Other legislation
– Personal Data Protection Act;
–…
European legislation
• European Regulation (EC) No 223/2009
– General definitions;
– Chapter 5 – Statistical Confidentiality
• Access to confidential data for scientific purposes
• European Statistics Code of Practice
- Principle 5: The confidentiality of the
information the data providers provide and its
use only for statistical purposes are absolutely
guaranteed.
What does SDC cover at SURS?
• Tabular data protection
– Publication
– Eurostat and other institutions
– Users' requests
• Microdata protection
– Preparation of public-use files and scientific-use files
– Checking rules set up by Eurostat
• Output checking
Tabular data protection
• Tables – aggregated data
– Magnitude tables
Sum of quantitative variable of respondents, where
respondents are grouped by categorical variables.
– Frequency tables
Number of respondents, where respondents are
grouped by categorical variables.
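The difference between the two table types can be illustrated with a minimal Python sketch. The records, the variable names (region, activity, turnover) and the function names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical respondent records: (region, activity, turnover).
records = [
    ("East", "Retail", 120.0),
    ("East", "Retail", 80.0),
    ("East", "Mining", 500.0),
    ("West", "Retail", 60.0),
]

def frequency_table(rows):
    """Count the respondents in each (region, activity) cell."""
    table = defaultdict(int)
    for region, activity, _ in rows:
        table[(region, activity)] += 1
    return dict(table)

def magnitude_table(rows):
    """Sum the quantitative variable in each (region, activity) cell."""
    table = defaultdict(float)
    for region, activity, value in rows:
        table[(region, activity)] += value
    return dict(table)

freq = frequency_table(records)
magn = magnitude_table(records)
```

Both tables share the same cells; only the cell content differs (a count versus a sum).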
Tabular data protection at SURS
• Method: cell suppression
- Post-tabular method
- Non-perturbative method (less information
available)
- Implemented in Tau-Argus software (CASC
project)
- Ensures that the interval of possible values for
each sensitive cell is sufficiently wide
Tabular data protection
Cell Suppression
• Sensitivity rules – defining unsafe cells
– Threshold
The number of respondents in a cell is below a
certain threshold value.
– Concentration rules
One or two respondents are dominant.
– Group disclosure
All respondents in one cell have the same value for
a sensitive variable.
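The three sensitivity rules can be sketched in Python as follows. The parameter values (threshold of 3, one contributor holding more than 85% of the cell total) are assumptions chosen for illustration, not the values used at SURS, and the concentration rule is written here as an (n, k) dominance rule, which is one common form of such a rule.

```python
def below_threshold(contributions, threshold=3):
    """Threshold rule: too few respondents contribute to the cell."""
    return len(contributions) < threshold

def is_dominated(contributions, n=1, k=0.85):
    """(n, k) dominance rule: the n largest contributors account for
    more than the share k of the cell total (n and k are assumed values)."""
    total = sum(contributions)
    if total <= 0:
        return False
    top = sum(sorted(contributions, reverse=True)[:n])
    return top > k * total

def group_disclosure(categories):
    """Group disclosure: all respondents in the cell share one value
    of the sensitive variable."""
    return len(categories) > 0 and len(set(categories)) == 1

def is_unsafe(contributions, threshold=3, n=1, k=0.85):
    """A magnitude cell is unsafe if it fails the threshold or dominance rule."""
    return below_threshold(contributions, threshold) or is_dominated(contributions, n, k)
```

For example, a cell with contributions 900, 50 and 50 passes the threshold rule but fails the dominance rule, because one respondent holds 90% of the total.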
Cell Suppression
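• Secondary suppression
– Needed because of the sums (marginal totals) in the tables:
a single suppressed cell could otherwise be recalculated by subtraction.
– The feasibility interval for each unsafe cell has to be wide enough.
– This is an optimisation problem, solved with an LP solver (Xpress,
CPLEX).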
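The need for secondary suppression can be shown on a single table row with a published total. The sketch below is a simplified illustration, assuming non-negative cell values and only one additive constraint; it is not the algorithm used in Tau-Argus, which solves the problem over all sums in the table with an LP solver. The function name and the example values are hypothetical.

```python
def feasibility_interval(row, total, suppressed):
    """Bounds an attacker can derive for any suppressed cell of one row,
    using only the published cells, the row total and non-negativity.

    row        -- true cell values (used only to compute what is published)
    total      -- published row total
    suppressed -- set of indices of suppressed cells
    """
    published = sum(v for i, v in enumerate(row) if i not in suppressed)
    slack = total - published  # sum of all suppressed cells
    if len(suppressed) == 1:
        return (slack, slack)  # exact recovery by subtraction
    return (0, slack)          # the other suppressed cells may absorb the rest

row = [20, 50, 3, 27]
total = sum(row)

only_primary = feasibility_interval(row, total, {2})
with_secondary = feasibility_interval(row, total, {2, 0})
```

With only the unsafe cell (value 3) suppressed, the interval collapses to a single point; suppressing a second cell widens it to 0 through 23.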
Cell Suppression Publication
Microdata protection
• Microdata are de-identified records for
individual units
(enterprises, persons, households).
– no direct identifiers (ID numbers, tax
numbers, name + address…)
• Microdata files are available to
researchers in the secure room and via
remote access.
Microdata protection
Scientific-use file (SUF)
• Prepared for researchers
• Signed contract
• Usually sent on a CD with a password; has to be
destroyed after use
• More information (variables) available
• Protected only against unintentional
disclosure
Microdata protection
Public-use file (PUF)
• Available publicly or after registration
• Less information (variables) available
• Not all microdata protection methods are
usable (some are too complex for ordinary users)
• Protected against intentional disclosure as well
Microdata protection
• The goal of microdata protection is to
make a safe microdata file, where
– disclosure risk is low;
– analyses done on the safe file give
results that are close or equal to the results of
analyses done on the original data.
Microdata protection methods used at SURS
• The original microdata file is modified by
– non-perturbative methods:
• global recoding;
• top and bottom coding;
• local suppression (not very usable for PUFs).
– some perturbative methods:
• microaggregation;
• rounding.
• Software packages Mu-Argus and R.
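A minimal Python sketch of four of the listed methods is given below. It is an illustration only, not the Mu-Argus or R implementation; the function names, the band width, the coding limits and the group size k are assumptions.

```python
def global_recode(age, width=10):
    """Global recoding: replace an exact age with an age band."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def top_bottom_code(value, bottom, top):
    """Top and bottom coding: clamp extreme values to the given limits."""
    return max(bottom, min(value, top))

def microaggregate(values, k=3):
    """Univariate microaggregation: sort the values, form groups of at
    least k units, and replace each value with its group mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        last = groups.pop()       # a too-small last group is merged
        groups[-1].extend(last)   # into the previous one
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result

def round_to_base(value, base=5):
    """Rounding: replace a value with the nearest multiple of base."""
    return base * round(value / base)

aggregated = microaggregate([1, 2, 3, 10, 20, 30, 100], k=3)
```

Microaggregation keeps the overall total unchanged, which is one reason it tends to preserve the results of analyses better than suppression.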
Labour Force Survey - PUF
• Prepared for Social Data Archives (DwB
project).
• We used Eurostat's rules for creating the SUF
and created the PUF from it by sampling (one
third of the original sample).
• We didn't use local suppression.
• The quality of the statistics used as parameters
for the sampling method is ensured; other
statistics should be used with caution.
Output checking
1. Researchers fill out our form after finishing
work.
2. An e-mail is sent to our common e-mail
address zascas.surs@gov.si.
3. One of the SDC methodologists checks the
output. If the output is disclosive or the form is
filled in incorrectly, the researchers are contacted for
additional information or asked to correct the output.
4. After the SDC methodologist agrees with the
dissemination, the output is sent to the
researcher by e-mail.
Rules for output checking
• Rule-of-thumb model
– Threshold N – every cell of tabular and similar
output should be based on at least N units.
– Dominance rule – the analysis should not be
done on groups with a dominant unit.
– Maximum and minimum are usually not
released (exception if they refer to more than
one unit).
– The 100th percentile (the maximum) is usually
not released.
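Part of the rule-of-thumb model can be sketched as an automated pre-check. The sketch below covers only the threshold rule and the minimum/maximum rule; the dominance rule is omitted. The function name and the threshold value of 10 are assumptions, not the values used at SURS.

```python
def releasable_summary(values, n_min=10):
    """Return the summary statistics that may be released for one group,
    or None if the group fails the threshold rule.

    The minimum and maximum are included only when they refer to more
    than one unit.
    """
    if len(values) < n_min:
        return None
    summary = {"n": len(values), "mean": sum(values) / len(values)}
    for name, extreme in (("min", min(values)), ("max", max(values))):
        if values.count(extreme) > 1:
            summary[name] = extreme
    return summary

checked = releasable_summary([1, 1, 2, 3, 4, 5, 6, 7, 8, 9])
too_small = releasable_summary([1, 2, 3])
```

In this example the minimum is shared by two units and is kept, while the maximum refers to a single unit and is withheld. Such a pre-check would support, not replace, the review by the SDC methodologist.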
Thank you for your attention!
zascas.surs@gov.si