Introduction to Census Archiving Session 3

United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia,
20-23 September, 2011
What is data archiving?
• Data archiving refers to the long-term storage of
electronic data and its affiliated documentation (data
collection instruments, study methods, descriptive
information for variables and files)
• Archiving is not merely making copies of data
• Archiving/preservation is keeping digital data alive and
accessible for long-term use
• Involves adding value to and enhancing the usefulness of
the data for secondary analysis
• Therefore, archiving should be planned and undertaken
with a view to the potential long-term uses and users of the
data
Why archive the data?
A: Responsibility of data producer
• NSOs generate vast amounts of data which represent a
significant investment by the country and have considerable value
for present and future data uses
• But data dissemination (hence utilization) is generally the weakest
part of the census operation
o Dissemination of micro-data is even more limited
• Data producers have a responsibility not only to collect and
disseminate data but also to ensure their long-term usability
through proper archiving and preservation
• Archiving is not just safe-keeping of data; it also serves to promote and
disseminate statistics as widely and effectively as possible
Why archive the data? (contd.)
B: Need to preserve the data for long-term usage
• Without proper archiving, valuable data could become unusable over
time due to loss, obsolescence of technology, or
irreversible damage
• Note: challenges of maintaining digital content over
successive generations of technology
• Archiving makes it possible for researchers to conduct
secondary analyses of the data and to extend the original
findings
What does archiving involve?
• Archiving encompasses a broad range of activities beyond
data collection - it covers the entire data lifecycle
• Data archiving is a process and not an end state
• Important, therefore, to develop a plan
• Start the process early (before data collection) and include
preservation of accurate metadata in the plan
• For the long term, requires formulating an archiving and
preservation policy as part of the overall strategy for data
sharing and utilization
Core attributes for a data archiving programme
An institutional strategy for archiving should encompass:
• Organizational infrastructure: defines how much data will
be archived (and shared)
• Technological infrastructure: defines how the data will be
archived
• Resources: defines how much it will cost to archive the
data
• Core attributes should be balanced and work in concert in
order for the programme to succeed (e.g., archiving is not just
IT-related but also needs proper organizational
arrangements and adequate resources)
Core attributes for a data archiving programme (contd.)
o Understand process and what’s required in terms of steps
for developing an archiving system
o Strategy should be developed taking into account needs
and requirements of individual data producers but
complying with standards of good practice within the digital
preservation community
Organizational infrastructure
• Archiving programme exists within an organizational context and as
such must fit the needs, priorities, and resources of that
organization
• Strong, well-planned and sustainable organizational setting is
crucial for long-term success of archiving programme
• Policies and procedures
o Policies reflect commitments and decisions regarding data archiving
and sharing
o Mission statement of NSO that encompasses data archiving and sharing
o Legal framework defines mandate and may cover data archiving and
documentation, sharing and dissemination, particularly of micro-data
o Development of a clear policy and strategy that explicitly addresses
archiving commitments and decisions is important for success
o Written policy and procedures help to define responsibilities and
requirements for archiving, sharing and access
Technological infrastructure
• Technology (technological and procedural suitability and
system security) is a means to achieving archiving objectives,
i.e., it’s an enabler
o Equipment
o Software
o Hardware
o Secure environment
o Staffing
• Technology enables the organization to meet defined requirements
for archiving – should be suitable for the purpose
• Decisions on technological infrastructure center on whether to
build, extend, collaborate, or out-source
Technological infrastructure (contd.)
• Technology is not static - organization should actively monitor
technological developments and systematically consider
potential implications for archiving programme (software and
hardware)
• The infrastructure should plan for technological changes and
their attendant costs
• As digital storage media (formats and physical storage)
will ultimately become obsolete: store multiple copies on
different storage media and in different file formats
• Need to develop appropriate data migration procedures
• Important to maintain a flexible preservation system that
evolves to meet demands of changing technology and new and
increasing user expectations
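The advice to keep multiple copies on different media can be illustrated with a small fixity check. The following is only a minimal sketch, not a procedure prescribed by the seminar: the function names (sha256_of, replicate, verify) and the directory names standing in for separate storage media are hypothetical. It copies a file to several locations, records a SHA-256 checksum for each copy, and later reports any copy that is missing or has changed.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, targets: list[Path]) -> dict[str, str]:
    """Copy source into each target directory; map each copy to the source checksum."""
    expected = sha256_of(source)
    manifest = {}
    for target_dir in targets:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy_path = target_dir / source.name
        shutil.copy2(source, copy_path)
        # Fail immediately if the copy does not match the source.
        if sha256_of(copy_path) != expected:
            raise IOError(f"copy to {copy_path} does not match the source")
        manifest[str(copy_path)] = expected
    return manifest


def verify(manifest: dict[str, str]) -> list[str]:
    """Return the copies that are missing or whose checksum has changed."""
    damaged = []
    for path, expected in manifest.items():
        copy_path = Path(path)
        if not copy_path.exists() or sha256_of(copy_path) != expected:
            damaged.append(path)
    return damaged


if __name__ == "__main__":
    # Hypothetical demo: two temporary directories stand in for separate media.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "census_sample.csv"
        source.write_text("household_id,age\n1,34\n")
        manifest = replicate(source, [root / "disk_a", root / "disk_b"])
        print("damaged copies:", verify(manifest))
```

Running verify on a schedule, and restoring any damaged copy from an intact one, is one simple way to support the migration procedures mentioned above.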
Resource framework
• Organizational and technological infrastructures are not sustainable
without an on-going commitment of resources
• Organization should identify appropriate resources to develop and
maintain archiving programme
• Resources should cover start-up, on-going and contingency funding
• Sustainable funding should be designated for the costs of staff,
equipment, software, and storage - to demonstrate the
organization’s commitment to the programme
Major activities in archiving of micro-data
• Documentation
• Anonymization
• Dissemination
Discussed in brief, with details provided in subsequent sessions
Role of documentation
• Documentation explains how the data were collected, their content
and structure and any manipulation that may have taken place
• Documentation is required in order to understand and interpret the
data by providing a context
• Without proper documentation, data are useless
• Some items for documentation:
o Context of data collection
o Data collection methods
o Data validation, cleaning and quality assurance procedures
o Descriptions of variables
o Definitions of codes
o Definitions of scientific terminology
o Treatment of missing values
• Also allows reuse of documents for future surveys
• Metadata categorized according to international standards (ISO,
DDI, Dublin Core)
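Variable descriptions, code definitions and the treatment of missing values can be kept in a machine-readable form alongside the data. The sketch below is an illustration under stated assumptions, not a DDI or Dublin Core implementation: the VariableDoc class, its fields and the example variable "sex" with its codes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDoc:
    """Documentation for one variable: label, code definitions, missing-value codes."""
    name: str
    label: str
    codes: dict[int, str] = field(default_factory=dict)
    missing_codes: set[int] = field(default_factory=set)

    def decode(self, value: int) -> str:
        """Translate a stored code into its documented meaning."""
        return self.codes.get(value, f"undocumented code {value}")

    def is_missing(self, value: int) -> bool:
        """True when the code is documented as a missing-value code."""
        return value in self.missing_codes


def undocumented_values(variable: VariableDoc, values: list[int]) -> list[int]:
    """Return the distinct values in the data that have no code definition."""
    return sorted({v for v in values if v not in variable.codes})


# Hypothetical example variable; the codes are invented for illustration.
sex = VariableDoc(
    name="sex",
    label="Sex of respondent",
    codes={1: "Male", 2: "Female", 9: "Not stated"},
    missing_codes={9},
)

if __name__ == "__main__":
    print(sex.decode(2), sex.is_missing(9))
    print(undocumented_values(sex, [1, 2, 9, 3, 3]))
```

A check such as undocumented_values helps confirm, before archiving, that every code appearing in the data file is defined in the documentation.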
Anonymization
• Dissemination of micro-data poses challenges for protection of
data confidentiality
• Appropriate procedures should be developed to measure disclosure
risks
• Develop appropriate procedures for anonymization of data so as to
reduce risk of disclosure of confidential information
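One common way to measure disclosure risk is to count how many records share each combination of identifying characteristics, in the spirit of k-anonymity. The slides do not name a specific method, so the following is only a sketch of that one approach; the function name risky_records, the threshold k and the example fields (district, age_group, occupation) are assumptions made for illustration.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=3):
    """Return records whose quasi-identifier combination is shared by fewer than k records."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] < k]


# Hypothetical micro-data: three similar records and one rare combination.
sample = [
    {"district": "A", "age_group": "20-29", "occupation": "farmer"},
    {"district": "A", "age_group": "20-29", "occupation": "farmer"},
    {"district": "A", "age_group": "20-29", "occupation": "farmer"},
    {"district": "B", "age_group": "60-69", "occupation": "surgeon"},
]

if __name__ == "__main__":
    flagged = risky_records(sample, ["district", "age_group", "occupation"], k=3)
    print(f"{len(flagged)} of {len(sample)} records are at risk")
```

Records flagged in this way are candidates for anonymization steps such as recoding into broader categories or suppressing a value before the file is released.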
Dissemination
• Disseminating micro-data has benefits (broadening use of existing
data and increasing return on data collection investments)
• BUT there are also associated costs and risks (financial cost,
exposure to criticism, disclosure of confidential information)
• Benefits have to be weighed against ethical and legal obligations to
keep respondent information confidential
• Design dissemination strategy to provide access to micro-data
while implementing procedures to guard against breach of data
confidentiality
• Dissemination strategy should include determination of types of
outputs, access methods, procedures for terms of use of the data,
as well as possible penalties for breaching terms of use, etc.
Challenges to implementation of an archiving programme
• Resource constraints
• Lack of specific laws/regulations and policies for archiving
• Lack of institutional commitment
• Inadequate IT infrastructure
• Lack of appropriately skilled personnel
• Implementing data anonymization procedures
• Lost data in original census databases
• No general guidelines for choosing a suitable platform to archive
census data
• Lack of mechanism to cross-check the quality of archived microdata
• Integration and harmonization of data from successive censuses
Thank You!