Overview of Archiving of
Microdata
Session 4
United Nations Statistics Division
Demographic Statistics Section
Regional Seminar on Census Data Archiving for Africa,
Addis Ababa, Ethiopia, 20-23 September 2011
Overview of Presentation
• What are microdata?
• Why disseminate microdata?
• Data files for archiving
• Preparing the data sets
• Data security
• Tools for archiving of microdata
• Risks of disseminating microdata
What are microdata?
Microdata:
• are electronic data files containing information about each unit of enumeration, such as a person, household or housing unit
• are organized data files in which each line (or record) contains information about one unit of observation
• contain information in the form of coded values
• contain different types of variables (numeric, alphanumeric, discrete or continuous), obtained from direct responses or derived by imputation or calculation
Why disseminate microdata?
• The main reason is to support research by offering flexibility:
  - to define variables and modify categories to meet the needs of researchers
  - to generate more interest, which facilitates wider use of census data
• A closer relationship between data providers and users can improve the reliability and relevance of data
Versions of data files for archiving
Data producers often create multiple versions of microdata files. These files:
• are created during different stages of the census operation
• differ in quality, content and number of records
• range from raw microdata files to cleaned and edited files for public use
What is sensitive in microdata?
• To ensure data confidentiality, census data sets usually do not contain variables that are direct identifiers
• Census data sets do include variables that are indirect identifiers:
  - Detailed geographic information
  - Detailed information on professional status
• Some variables in microdata sets can be sensitive because of the nature of the information they contain
  - Information on income, ethnicity, religion, etc.
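
One way to see why indirect identifiers matter is to count how many records share each combination of them; a combination held by very few records may point to a single respondent. The following is a minimal sketch only: the function name, the variable names, the codes and the threshold k are all assumptions made for illustration, not part of any census system.

```python
from collections import Counter


def risky_records(records, indirect_identifiers, k=3):
    """Return records whose combination of indirect identifiers
    is shared by fewer than k records in the file."""
    def key(rec):
        return tuple(rec[name] for name in indirect_identifiers)

    counts = Counter(key(rec) for rec in records)
    return [rec for rec in records if counts[key(rec)] < k]


# Hypothetical person records; variable names and codes are invented.
persons = [
    {"district": 101, "occupation": 12, "age_group": 3},
    {"district": 101, "occupation": 12, "age_group": 3},
    {"district": 101, "occupation": 12, "age_group": 3},
    {"district": 102, "occupation": 87, "age_group": 5},  # unique combination
]

flagged = risky_records(persons, ["district", "occupation"], k=3)
print(len(flagged))  # 1
```

Records flagged in this way are candidates for coarser geographic or occupational coding before release.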
Preparing the data set
Acquisition
• Microdata can be generated from various data sources: censuses, surveys and administrative registers
• A clear acquisition policy that describes the scope, source and mandate for acquiring microdata sets is necessary
• The NSO (national statistical office) can play an important role by expanding the scope of the data archive to official sources such as line ministries
Preparing the data set
• Data file
  - Hierarchical/relational files are easier to analyze and more efficient for data storage
  - The identification variables in all data files should provide a unique identifier
  - Unique identifiers used to merge data files should be composed of numeric variables, for more efficient sorting and filtering of records
  - A unique household identification should not be a compilation of geographic codes, since these codes are highly identifying
  - All unnecessary or temporary variables should be removed from the data files
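
The points on identifiers can be sketched as follows: a household key built from geographic codes is replaced by a sequential numeric identifier in both the household and person files, and temporary variables are dropped. This is an illustrative sketch under assumptions; the names `geo_hh_code`, `hh_id` and `tmp_flag` are hypothetical, and a real file might also need its record order changed so that the new numbers do not mirror geography.

```python
def assign_household_ids(households, persons, old_key="geo_hh_code", temp_vars=()):
    """Replace a geography-based household key with a sequential numeric id
    and drop temporary variables from both files."""
    id_map = {}
    for hh in households:
        old = hh[old_key]
        if old in id_map:
            raise ValueError(f"duplicate household key: {old}")
        id_map[old] = len(id_map) + 1  # 1, 2, 3, ...

    drop = {old_key, *temp_vars}

    def convert(rec):
        # The new numeric id replaces the old composite key.
        out = {"hh_id": id_map[rec[old_key]]}
        out.update({k: v for k, v in rec.items() if k not in drop})
        return out

    return [convert(h) for h in households], [convert(p) for p in persons]


# Hypothetical input files.
households = [
    {"geo_hh_code": "01-012-005", "rooms": 3, "tmp_flag": 0},
    {"geo_hh_code": "01-012-006", "rooms": 2, "tmp_flag": 1},
]
persons = [
    {"geo_hh_code": "01-012-005", "line": 1, "age": 34},
    {"geo_hh_code": "01-012-006", "line": 1, "age": 51},
]

hh_out, person_out = assign_household_ids(households, persons, temp_vars=("tmp_flag",))
```

Because both files receive the same `hh_id`, they can still be merged, while the identifier itself no longer reveals location.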
Preparing the data set
• Variables and codes
  - All variables should be labeled (variable labels), and the codes for all categorical variables should be labeled (value labels)
  - “Missing” codes should be standardized across all variables
  - The “not applicable” code should be distinct from other missing codes
  - If errors or missing data have been imputed, this should be indicated in the data set
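
The coding rules above might be applied along these lines. The sketch assumes one standard code for "missing" and a different one for "not applicable", and adds a flag variable when a value is imputed; the specific codes (-9, -8), the variable names and the `_imputed` suffix are assumptions chosen for illustration.

```python
MISSING = -9          # assumed standard code for "missing"
NOT_APPLICABLE = -8   # assumed distinct code for "not applicable"

# Hypothetical raw codes used for each variable in the input file.
RAW_CODES = {
    "age": {"missing": {99, 999}, "not_applicable": set()},
    "occupation": {"missing": {98}, "not_applicable": {0}},
}


def standardise(record):
    """Map each variable's raw missing / not-applicable codes to the standard ones."""
    out = dict(record)
    for var, codes in RAW_CODES.items():
        value = record.get(var)
        if value is None or value in codes["missing"]:
            out[var] = MISSING
        elif value in codes["not_applicable"]:
            out[var] = NOT_APPLICABLE
    return out


def impute(record, var, value):
    """Fill in a value and flag the variable as imputed."""
    out = dict(record)
    out[var] = value
    out[var + "_imputed"] = 1
    return out


clean = standardise({"age": 999, "occupation": 0})
fixed = impute(clean, "age", 30)
```

Keeping the imputation flag in the data set lets users exclude or separately analyse imputed values.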
Preparing the data set
• Verification operation
  - If a data set is hierarchical, every record in the individual-level file should have a corresponding household in the household-level file
  - The number of records in each file should be verified
  - Data from all sections of the questionnaire should be included in the data set
=> Set up verification rules to check the data sets
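
The verification rules listed above could be set up as a small set of automated checks. The sketch below is one possible arrangement, not a prescribed procedure; the function name, the `hh_id` variable and the expected counts are assumptions for illustration.

```python
def verify(households, persons, expected_hh=None, expected_persons=None,
           required_vars=()):
    """Run basic consistency checks and return a list of problems found."""
    problems = []

    hh_ids = [h["hh_id"] for h in households]
    if len(hh_ids) != len(set(hh_ids)):
        problems.append("duplicate household ids")

    # Every person record must link to a household record.
    known = set(hh_ids)
    orphans = [p for p in persons if p["hh_id"] not in known]
    if orphans:
        problems.append(f"{len(orphans)} person record(s) without a household")

    # Record counts should match the figures expected from processing.
    if expected_hh is not None and len(households) != expected_hh:
        problems.append(f"expected {expected_hh} households, found {len(households)}")
    if expected_persons is not None and len(persons) != expected_persons:
        problems.append(f"expected {expected_persons} persons, found {len(persons)}")

    # Variables from every questionnaire section should be present.
    for var in required_vars:
        if any(var not in p for p in persons):
            problems.append(f"variable {var!r} absent from person file")

    return problems


# Hypothetical files: one person points to a household that does not exist.
households = [{"hh_id": 1}, {"hh_id": 2}]
persons = [{"hh_id": 1, "age": 34}, {"hh_id": 3, "age": 7}]

report = verify(households, persons, expected_hh=2, expected_persons=2,
                required_vars=("age", "sex"))
```

An empty report means all checks passed; otherwise each entry names a rule that failed.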
Data security
• Physical security
  - Controlling access to rooms where data are held
  - Logging the removal of, and access to, media or hard-copy material in store rooms
• Network security
  - Not storing confidential data on servers or computers connected to an external network
  - Firewall protection and security-related upgrades to avoid viruses and malicious code
Data security
• Security of computer systems and files
  - Locking computer systems with a password and installing a firewall system
  - Implementing password protection of, and controlled access to, data files
  - Protecting servers against power surges with line-interactive uninterruptible power supply (UPS) systems
  - Imposing non-disclosure agreements on managers and users of confidential data
Data security
• Security of personal data
  - Anonymising or aggregating data
  - Separating data content according to security needs
  - Removing personal information from data files and storing it separately
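
Removing personal information and storing it separately can be pictured as splitting each record into two parts that share only an arbitrary link key. This is a minimal sketch assuming in-memory records; the function name, the `record_id` key and the example variables are hypothetical, and access controls on the restricted part are not shown.

```python
def split_identifiers(records, direct_identifiers, key="record_id"):
    """Split records into a working file without personal information and a
    restricted file holding it, linked only by an arbitrary key."""
    working, restricted = [], []
    for number, rec in enumerate(records, start=1):
        restricted.append(
            {key: number, **{n: rec[n] for n in direct_identifiers if n in rec}}
        )
        working.append(
            {key: number, **{n: v for n, v in rec.items() if n not in direct_identifiers}}
        )
    return working, restricted


# Hypothetical record containing direct identifiers.
records = [{"name": "A. Example", "address": "Plot 1", "age": 40, "sex": 2}]

working, restricted = split_identifiers(records, ("name", "address"))
```

The restricted part would then be held under tighter security than the working file.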
Tools for archiving microdata
• International Household Survey Network (IHSN)
  - A network of international agencies coordinated by the World Bank/PARIS21
  - Develops tools, guidelines and training materials
  - Advocates compliance with good practices and international standards
Tools for archiving microdata
• Redatam-based IMIS
  - Originally developed at CELADE to promote access to census microdata
  - A database management tool that handles large volumes of census data
  - Aims to promote access to, and analysis of, census and other data for informed decision making on sectoral and local development policies and programmes
Risks of disseminating microdata
• Maintaining respondents’ trust: confidentiality protection is the key element of trust
• Potential misuse and misunderstanding of data by users: there should be procedures to prevent misuse of microdata, and good documentation and technical support to prevent misunderstanding of microdata
• Exposure to criticism and contradiction: data quality may not be good enough for further dissemination; there may be inconsistencies between research results based on microdata and published aggregated data
Risks of disseminating microdata
• Legal issues: it is crucial for data producers to ensure that there is a sound legal and ethical basis (as well as the technical and methodological tools) for protecting confidentiality
• Costs: these include not only the costs of creating and documenting microdata files, but also the costs of creating access tools and safeguards, of supporting and authorizing enquiries made by the research community, and of training and supporting new users of microdata files
• Technical capacity: the files need to be well documented and preserved, and reviewed to identify the risk of disclosure of individual information so that the risk can be reduced using various techniques
Microdata are archived:
“to allow future users to retrieve, access, decipher, view, interpret, understand and experience documents, data and records in meaningful and valid ways” (Jeff Rothenberg)
“to create institutional memory for long-term research”
THANK YOU