Overview of Archiving of Microdata
Session 4

United Nations Statistics Division
Demographic Statistics Section

Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, 20-23 September 2011

Overview of Presentation
- What are microdata?
- Why disseminate microdata?
- Data files for archiving
- Preparing the data sets
- Data security
- Tools for archiving of microdata
- Risks of disseminating microdata

What are microdata?
Microdata:
- are electronic data files containing information about each unit of enumeration, such as a person, household or housing unit
- are organized data files in which each line (or record) contains information about one unit of observation
- contain information in the form of coded values
- contain different types of variables (numeric, alphanumeric, discrete or continuous) obtained from direct responses or derived by imputation or calculation

Why disseminate microdata?
The main reasons are:
- to support research by offering the flexibility to define variables and modify categories to meet the needs of researchers
- to generate more interest, which facilitates wider use of census data
A closer relationship between data providers and users can improve the reliability and relevance of data.

Versions of data files for archiving
Data producers often create multiple versions of microdata files.
These files:
- are created during different stages of the census operation
- differ in quality, content and number of records
- range from raw microdata files to cleaned and edited files for public use

What is sensitive in microdata?
- To ensure data confidentiality, census data usually do not contain variables that are direct identifiers
- Census data sets include variables that are indirect identifiers:
  - detailed geographic information
  - detailed information on professional status
- Some variables in microdata sets can be sensitive because of the nature of the information they contain:
  - information on income, ethnicity, religion, etc.

Preparing the data set
Acquisition
- Microdata can be generated from various data sources: censuses, surveys and administrative registers
- A clear acquisition policy that describes the scope, source and mandate for the acquisition of microdata sets is necessary
- The NSO can play an important role by expanding the scope of the data archive to official sources such as line ministries

Preparing the data set
Data file
- Hierarchical/relational files are easier to analyze and more efficient for data storage
- The identification variables in all data files should provide a unique identifier
- Unique identifiers used to merge data files should be composed of numeric variables for more efficient sorting and filtering of records
- A unique household identifier should not be a compilation of geographic codes, since these codes are highly identifying
- All unnecessary or temporary variables should be removed from the data files

Preparing the data set
Variables and codes
- All variables should be labeled (variable labels), and the codes for all categorical variables should be labeled (value labels)
- “Missing” codes should be standardized for all variables
- The “Not applicable” code should be distinct from other missing codes
- If errors or missing data have been imputed, this should be indicated in the data set

Preparing the data set
Verification operation
- If a data set is hierarchical, all records in the individual-level files should have a corresponding household in the household-level file
- The number of records in each file should be verified
- Data from all sections of the questionnaire should be included in the data set
===> Set up verification rules to check data sets

Data security
Physical security
- Controlling access to rooms where data are held
- Logging the removal of, and access to, media or hard copy material in store rooms
Network security
- Not storing confidential data on servers or computers connected to an external network
- Firewall protection and security-related upgrades to avoid viruses and malicious code

Data security
Security of computer systems and files
- Locking computer systems with a password and installing a firewall system
- Implementing password protection of, and controlled access to, data files
- Protecting servers against power surges with line-interactive uninterruptible power supply (UPS) systems
- Imposing non-disclosure agreements on managers and users of confidential data

Data security
Security of personal data
- Anonymising or aggregating data
- Separating data content according to security needs
- Removing personal information from data files and storing it separately

Tools for archiving microdata
International Household Survey Network (IHSN)
- A network of international agencies coordinated by the World Bank/PARIS21
- Develops tools, guidelines and training materials
- Advocates compliance with good practices and international standards

Tools for archiving microdata
Redatam-based IMIS
- Originally developed at CELADE to promote access to census microdata
- A database management tool that manages large volumes of census data
- Aims to promote access to and analysis of census and other data for informed decision making on sectoral and local development policies and programmes

Risks of disseminating microdata
- Maintaining respondents’ trust: confidentiality protection is the key element of trust
- Potential misuse and misunderstanding of data by users: there should be procedures to prevent misuse of microdata, and good documentation and technical support to prevent misunderstanding of microdata
- Exposure to criticism and contradiction: data quality may not be good enough for further dissemination; there may be inconsistency between research results based on microdata and published aggregated data

Risks of disseminating microdata
- Legal issues: it is crucial for data producers to ensure there is a sound legal and ethical base (as well as the technical and methodological tools) for protecting confidentiality
- Costs: these include not only the costs of creating and documenting microdata files, but also the costs of creating access tools and safeguards, of supporting and authorizing enquiries made by the research community, and of training and supporting new users of microdata files
- Technical capacity: the files need to be well documented and preserved, and reviewed to identify the risk of disclosure of individual information, with that risk reduced using various techniques

Microdata are archived:
- “to allow future users to retrieve, access, decipher, view, interpret, understand and experience documents, data and records in meaningful and valid ways” (Jeff Rothenberg)
- “to create institutional memory for long-term research”

THANK YOU
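
Illustration: verification rules for a hierarchical data set
The slide "Preparing the data set: Verification operation" asks for rules that check that every individual-level record has a corresponding household and that record counts are verified. The following is a minimal sketch of such rules, not a tool presented at the seminar. The function name verify_hierarchy, the key name hhid and the sample records are assumptions made only for this illustration.

```python
from collections import Counter


def verify_hierarchy(households, persons, hh_key="hhid"):
    """Run basic verification rules on a two-level (household/person) data set.

    `households` and `persons` are lists of dicts; `hh_key` is the
    (hypothetical) name of the numeric household identifier in both files.
    """
    hh_ids = [h[hh_key] for h in households]

    # Rule 1: the household identifier must be unique in the household file.
    id_counts = Counter(hh_ids)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

    # Rule 2: every person record must point to an existing household.
    known_ids = set(hh_ids)
    orphans = [p for p in persons if p[hh_key] not in known_ids]

    # Rule 3: report record counts so they can be compared with control totals.
    return {
        "household_records": len(households),
        "person_records": len(persons),
        "duplicate_household_ids": duplicate_ids,
        "orphan_person_records": orphans,
    }


# Made-up sample records, for illustration only.
households = [
    {"hhid": 1, "rooms": 3},
    {"hhid": 2, "rooms": 1},
    {"hhid": 2, "rooms": 4},  # duplicate identifier
]
persons = [
    {"hhid": 1, "pid": 1, "age": 34},
    {"hhid": 1, "pid": 2, "age": 7},
    {"hhid": 9, "pid": 1, "age": 51},  # no matching household
]

report = verify_hierarchy(households, persons)
for rule, finding in report.items():
    print(rule, finding)
```

In practice the record counts in the report would be compared with control totals from the census operation, and any duplicate identifiers or unmatched person records would be resolved before the files are archived.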