Introduction to Census Archiving
Session 3
United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, 20-23 September, 2011

What is data archiving?
Data archiving refers to the long-term storage of electronic data and their associated documentation (data collection instruments, study methods, descriptive information for variables and files).
Archiving is not merely making copies of data.
Archiving/preservation means keeping digital data alive and accessible for long-term use.
It involves adding value to the data and enhancing their usefulness for secondary analysis.
Therefore, archiving should be planned and undertaken with a view to the potential long-term uses and users of the data.

Why archive the data?
A: Responsibility of the data producer
NSOs generate vast amounts of data, which represent a significant investment by the country and have considerable value for present and future uses.
But data dissemination (and hence utilization) is generally the weakest part of the census operation.
o Dissemination of micro-data is even more limited.
Data producers have a responsibility not only to collect and disseminate data but also to ensure their long-term usability through proper archiving and preservation.
Archiving is not just the safe-keeping of data; it is also a means to promote and disseminate statistics as widely and effectively as possible.

Why archive the data? (contd.)
B: Need to preserve the data for long-term usage
Without proper archiving, valuable data could be lost over time through loss, technological obsolescence, or irreversible damage.
Note: maintaining digital content over successive generations of technology is a particular challenge.
Archiving makes it possible for researchers to conduct secondary analyses of the data and to extend the original findings.

What does archiving involve?
Archiving encompasses a broad range of activities beyond data collection; it covers the entire data lifecycle.
Data archiving is a process, not an end state.
It is therefore important to develop a plan.
Start the process early (before data collection) and include the preservation of accurate metadata in the plan.
For the long term, an archiving and preservation policy should be formulated as part of the overall strategy for data sharing and utilization.

Core attributes for a data archiving programme
An institutional strategy for archiving should encompass:
o Organizational infrastructure: defines how much data will be archived (and shared)
o Technological infrastructure: defines how the data will be archived
o Resources: defines how much it will cost to archive the data
The core attributes should be balanced and work in concert for the programme to succeed (e.g., archiving is not just IT-related; it also needs proper organizational arrangements and adequate resources).

Core attributes for a data archiving programme (contd.)
o Understand the process and the steps required to develop an archiving system.
o Develop the strategy taking into account the needs and requirements of individual data producers, while complying with standards of good practice within the digital preservation community.

Organizational infrastructure
An archiving programme exists within an organizational context and as such must fit the needs, priorities, and resources of that organization.
A strong, well-planned and sustainable organizational setting is crucial for the long-term success of an archiving programme.
Policies and procedures
o Policies reflect commitments and decisions regarding data archiving and sharing.
o A mission statement of the NSO that encompasses data archiving and sharing.
o The legal framework defines the mandate and may cover data archiving and documentation, sharing and dissemination, particularly of micro-data.
o Developing a clear policy and strategy that explicitly address archiving commitments and decisions is important for success.
o Written policies and procedures help to define responsibilities and requirements for archiving, sharing and access.

Technological infrastructure
Technology (technological and procedural suitability and system security) is a means of achieving archiving objectives, i.e., it is an enabler:
o Equipment
o Software
o Hardware
o Secure environment
o Staffing
Technology enables the organization to meet defined requirements for archiving and should be suitable for the purpose.
Decisions on technological infrastructure center on whether to build, extend, collaborate, or outsource.

Technological infrastructure (contd.)
Technology is not static: the organization should actively monitor technological developments (software and hardware) and systematically consider their potential implications for the archiving programme.
The infrastructure should plan for technological changes and their attendant costs.
As digital storage media (file formats and physical storage) will ultimately become obsolete, store multiple copies on different storage media and in different file formats.
Appropriate data migration procedures need to be developed.
It is important to maintain a flexible preservation system that evolves to meet the demands of changing technology and new and increasing user expectations.

Resource framework
Organizational and technological infrastructures are not sustainable without an ongoing commitment of resources.
The organization should identify appropriate resources to develop and maintain the archiving programme.
Resources should cover start-up, ongoing and contingency funding.
Sustainable funding should be designated for staff, equipment, software, and storage costs, to demonstrate the organization's commitment to the programme.

Major activities in archiving of micro-data
o Documentation
o Anonymization
o Dissemination
These are discussed briefly here, with details provided in subsequent sessions.

Role of documentation
Documentation explains how the data were collected, their content and structure, and any manipulation that may have taken place.
Documentation is required to understand and interpret the data, because it provides their context.
Without proper documentation, data are useless.
Some items for documentation:
o Context of data collection
o Data collection methods
o Data validation, cleaning and quality assurance procedures
o Descriptions of variables
o Definitions of codes
o Definitions of scientific terminology
o Treatment of missing values
Documentation also allows documents to be reused for future surveys.
Metadata are categorized according to international standards (ISO, DDI, Dublin Core).

Anonymization
Dissemination of micro-data poses challenges for the protection of data confidentiality.
Appropriate procedures should be developed to measure disclosure risks.
Appropriate procedures should also be developed for anonymizing the data, so as to reduce the risk of disclosing confidential information.

Dissemination
Disseminating micro-data has benefits (broadening the use of existing data and increasing the return on data collection investments).
But there are also associated costs and risks (financial cost, exposure to criticism, disclosure of confidential information).
The benefits have to be weighed against ethical and legal obligations to keep respondent information confidential.
Design a dissemination strategy that provides access to micro-data while implementing procedures to guard against breaches of data confidentiality.
The dissemination strategy should include determining the types of outputs, the access methods, procedures for the terms of use of the data, and possible penalties for breaching the terms of use, etc.
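The Anonymization slide calls for procedures to measure disclosure risk and then reduce it. As a minimal sketch, assuming the micro-data are held as a list of records and that age, sex and district (hypothetical variable names) are the key variables an intruder could match on, the Python code below counts how often each combination of key values occurs and flags records whose combination appears fewer than k times, a common k-anonymity style measure. The recode_age function is a hypothetical example of one anonymization technique (global recoding into bands); neither function is meant to represent the specific methods covered in subsequent sessions.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence


def risky_records(
    records: Sequence[Mapping[str, Any]],
    key_vars: Iterable[str],
    k: int = 3,
) -> list[int]:
    """Return indices of records whose combination of key variables
    occurs fewer than k times in the file (i.e. violates k-anonymity)."""
    keys = list(key_vars)
    # Combination of key-variable values for each record.
    combos = [tuple(rec[v] for v in keys) for rec in records]
    # Frequency of each combination across the whole micro-data file.
    freq = Counter(combos)
    return [i for i, combo in enumerate(combos) if freq[combo] < k]


def recode_age(
    records: Sequence[Mapping[str, Any]], width: int = 10
) -> list[dict[str, Any]]:
    """Hypothetical anonymization step: replace single-year age with an
    age band such as '30-39'. The input records are not modified."""
    out = []
    for rec in records:
        new = dict(rec)
        lo = (rec["age"] // width) * width
        new["age"] = f"{lo}-{lo + width - 1}"
        out.append(new)
    return out


if __name__ == "__main__":
    records = [
        {"age": 34, "sex": "F", "district": "D1"},
        {"age": 36, "sex": "F", "district": "D1"},
        {"age": 38, "sex": "F", "district": "D1"},
        {"age": 71, "sex": "M", "district": "D2"},
    ]
    key = ["age", "sex", "district"]
    print(risky_records(records, key, k=2))              # [0, 1, 2, 3]
    print(risky_records(recode_age(records), key, k=2))  # [3]
```

With single-year ages every record is unique on the key variables. After recoding into 10-year bands, the three women in district D1 share one combination, and only the record for the 71-year-old man in D2 remains flagged. In practice, the choice of key variables and of k is a policy decision for the data producer.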
Challenges to implementation of an archiving programme
o Resource constraints
o Lack of specific laws/regulations and policies for archiving
o Lack of institutional commitment
o Inadequate IT infrastructure
o Lack of appropriately skilled personnel
o Implementing data anonymization procedures
o Lost data in original census databases
o No general guidelines for choosing a suitable platform to archive census data
o Lack of a mechanism to cross-check the quality of archived micro-data
o Integration and harmonization of data from successive censuses

Thank You!