Appraise and Select Ross Harvey Tuesday 7 October 2008

a centre of expertise in data curation and preservation
Appraise and Select
Tuesday 7 October 2008
Ross Harvey
Graduate School of Library & Information Science
Simmons College, Boston
ross.harvey@simmons.edu
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh
Topics
• Introduction
• Why appraise and select?
• Who does appraisal and selection?
• What data do we want to keep?
• Drivers for keeping data
• Appraisal and selection policies
• Appraisal tools
• Reappraisal and disposal
• Key activities in appraisal and selection
• The next stage: Ingest
Introduction
• Learning outcomes of lecture and exercise:
1. An awareness of appraisal techniques and why
they are needed
2. An appreciation of how appraisal techniques are
applied in digital curation
3. A practical understanding of how to align
selection criteria with organisational goals
Introduction: definitions
• Appraisal and selection: what are they?
• Appraisal (a term originating in archival science) is "the process of evaluating records to determine which are to be retained as archives, which are to be kept for specified periods and which are to be destroyed" (Ellis, J. (1993) (ed.). "Keeping Archives" 2nd edn (Melbourne: Australian Society of Archivists) p.461)
• Selection is a more general term, usually applied when deciding what will be added to a repository
Introduction: the Curation Lifecycle
• Appraise and Select and the Curation Lifecycle
• Sequential action
• Follows Create or Receive
• Activities:
• Evaluate data & select for long-term curation & preservation
• Adhere to documented guidance, policies, legal requirements
Introduction: key questions
1. What data do we want to keep for the future?
2. How do we decide what are likely to be
useful?
3. How long should we plan to keep them?
4. Do we want them to be fully functional (for
example, all linked data is also available), and
to what extent, in the future?
Why appraise and select?
• The argument for selection and appraisal:
• Exponential growth in data - too much to curate effectively
• Resources available for curation are limited
• High costs, limited effectiveness of solutions such as digital
archaeology or reliance on information retrieval
• Therefore, high-quality curation requires selection & appraisal
• The argument against:
• Bias cannot be avoided (appraisal as ‘an evil necessity’)
• We can’t know who the future users of the data will be
• We can keep everything
• Costs of storage are falling
• Our increasing ability to access large quantities of data
Who does appraisal & selection?
• Information professionals/data curators
• Develop selection policies, guidelines for appraisal
• Liaise with creators and depositors to ensure datasets are in
the best shape to ensure preservability
• Locate sufficient resources (funding, staff, technical
infrastructure) to ensure effective appraisal is possible
• Data creators
• Ensure that datasets created have sufficient metadata and
documentation
• Use ‘curation-ready’ formats (usually open-source)
• Have a clear idea of what data are vital/important/minor
Who does appraisal & selection? (2)
• Input from stakeholders to develop appraisal
policy, criteria
• Designated Community - the people who will
understand and use that data in the future
“An identified group of potential Consumers who should be able
to understand a particular set of information” (Source: OAIS
Reference Model)
• What will (might?) the designated community
consider sufficient in the future?
• For instance, will it be sufficient to keep just the
information content of a database, but not its functionality
(e.g. the ability to search and manipulate its contents)?
What data do we want to keep?
• Macro-level
• ‘Mission-critical’ data - what must you keep to
ensure your research project is successful?
• Guidance from appraisal guidelines, data audit
frameworks
• Micro-level
• The data (digital objects) identified by appraisal
guidelines or data audits PLUS relevant
Description and Representation Information - the
information about the data that is needed to make
it understandable in the future
Drivers for keeping data
• With reference to your research project
• Are these data necessary for the research project
to continue? (Data Audit Framework)
• Mandated drivers
• Does your funder require you to deposit data?
• Does your employer require you to deposit data in
an institutional repository?
• Longer-term drivers
• Are your data useful to other researchers? Will
they be used by others?
Appraisal and selection policies
Assist with answering questions such as:
• Does the data or record fit into a repository's selection policy?
• Who will or might use the data or record in the future? (Is there a defined 'designated community'?)
• Is it economically feasible to keep the data or record?
• Can acceptable legal and intellectual property rights, to keep and reuse the data, be negotiated?
• Is there a legal requirement to keep the data (and make it accessible) for a certain period of time?
• Does the data constitute the 'vital records' of a project or organisation and therefore need to be retained indefinitely?
• Is it both technically feasible and worthwhile in cost/benefit terms to preserve the data or record?
• Does sufficient documentation and metadata exist to explain the character, and enable the discovery of the data or record?
Appraisal and selection policies (2)
Data Audit Framework’s classification questions:
• Is this data central to your research?
• Will the data be useful in the future?
• Are you the intellectual owner?
• Is it documented and in a sustainable form?
• Is it already being preserved elsewhere?
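The five classification questions can be treated as a simple yes/no checklist. The sketch below is an illustration only: the class name DafAnswers, the function classify and the mapping from answers to the vital/important/minor labels are all assumptions made for this example and are not taken from the Data Audit Framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DafAnswers:
    """Yes/no answers to the five classification questions (hypothetical type)."""
    central_to_research: bool
    useful_in_future: bool
    intellectual_owner: bool
    documented_and_sustainable: bool
    preserved_elsewhere: bool


def classify(answers: DafAnswers) -> str:
    """Return 'vital', 'important' or 'minor'.

    The mapping below is an assumed rule of thumb, not a DAF rule.
    """
    if answers.preserved_elsewhere:
        # Another custodian already looks after the data.
        return "minor"
    if (answers.central_to_research and answers.useful_in_future
            and answers.intellectual_owner
            and answers.documented_and_sustainable):
        return "vital"
    if answers.central_to_research or answers.useful_in_future:
        return "important"
    return "minor"


example = DafAnswers(True, True, True, True, False)
print(classify(example))  # prints: vital
```

A real policy would replace the assumed mapping with rules agreed with stakeholders and the designated community.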
Appraisal and selection policies (3)
Example: DataPASS Appraisal Guidelines 2005 (for social science data) http://www.icpsr.umich.edu/DATAPASS/pdf/appraisal.pdf
• How significant are the data for research?
• How significant is the source and scientific progress and society?
• Is the information unique?
• How usable are the data?
• What is the timeframe covered by the information?
• Are the data related to other data in the archives?
• What are the cost considerations for long-term maintenance of the data?
• What is the volume of data?
Appraisal and selection policies (4)
• Epidemiological data sets: possible retention criteria (Lord & Macdonald JISC E-science Curation Report 2003, p.46)
• The nature of the questions being asked by the study
• Whether it addresses only one question or many
• Whether the question has been asked before
• The richness of the data set
• If it is a longitudinal study – ‘indicates an amber light’
• Sample-related studies
• Stability of the measures used
• Possibility to go back to the population (e.g. for consent, ethical committee access)
• Uniqueness; value for possible future comparisons
Appraisal tools
• DAF (Data Audit Framework) toolkit
• Will be described and applied in the exercise following this
lecture
• EROS (Earth Resources Observation & Science)
http://eros.usgs.gov/government/ratool/
• This tool assists the USGS in appraising record collections
that are offered to, or sought by, the USGS.
• ‘We should be expending our resources on the data we most value.
Determining that value requires us to make judgments, but utilizing a
repeatable and comprehensive scheme can allow us to judge data
responsibly. Documenting those judgments is essential, because future
generations will depend on the current scientists and records managers
to preserve the data that will “advance knowledge”’ (J. Faundeen,
USGS)
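The quotation asks for a repeatable scheme with documented judgments. One possible shape for such a scheme is a weighted score in which every score carries a written rationale. The sketch below is not the USGS tool: the 0 to 5 scale, the criterion names, the weights, the threshold and the names Judgment and appraise are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Judgment:
    score: int       # assumed scale: 0 (poor) to 5 (excellent)
    rationale: str   # the documented reason for the score


def appraise(judgments: Dict[str, Judgment],
             weights: Dict[str, float],
             threshold: float) -> Tuple[str, float, List[str]]:
    """Return (decision, weighted mean score, documentation lines)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * judgments[name].score
                   for name in weights) / total_weight
    decision = "keep" if weighted >= threshold else "do not keep"
    report = [f"{name}: {judgments[name].score}/5 ({judgments[name].rationale})"
              for name in weights]
    return decision, weighted, report


# Hypothetical criteria and weights, chosen only for this example.
judgments = {
    "uniqueness": Judgment(5, "no other copy is known"),
    "usability": Judgment(2, "documentation is incomplete"),
}
weights = {"uniqueness": 2.0, "usability": 1.0}

decision, score, report = appraise(judgments, weights, threshold=3.0)
print(decision, score)
for line in report:
    print(line)
```

Keeping the rationale next to each score means the record of the decision survives alongside the decision itself.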
How long do we keep the data?
• The Curation Lifecycle also includes two relevant Sequential Actions:
• Reappraise
• Dispose
Reappraisal and disposal
• Reappraise - appraisal outcomes change as
requirements and needs change
• Initial appraisal could result in most datasets being kept
• Criteria for reappraisal are developed
• Reappraisal tests the dataset at defined intervals against
these criteria to decide whether it still meets the conditions
for applying resources to its long-term retention
• Dispose - Outcome of appraisal or reappraisal may be a
decision not to commit further resources to curating a dataset
• In this case these data could be offered to another
repository, or destroyed
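Reappraisal at defined intervals can be sketched as a periodic check of each dataset against the current criteria. In the sketch below the one-year default interval, the Dataset fields and the function names is_due and reappraise are assumptions for illustration; "dispose" stands for the decision not to commit further resources, after which the data could be offered to another repository or destroyed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Dataset:
    name: str
    last_appraised: date


# A criterion is any yes/no test applied to a dataset (hypothetical type).
Criterion = Callable[[Dataset], bool]


def is_due(ds: Dataset, today: date, interval_days: int = 365) -> bool:
    """True once the assumed reappraisal interval has elapsed."""
    return today - ds.last_appraised >= timedelta(days=interval_days)


def reappraise(ds: Dataset, criteria: List[Criterion], today: date) -> str:
    """Return 'retain' if every criterion still holds, otherwise 'dispose'."""
    if all(criterion(ds) for criterion in criteria):
        ds.last_appraised = today  # restart the interval
        return "retain"
    return "dispose"


survey = Dataset("survey-2008", date(2008, 10, 7))
today = date(2009, 10, 7)
if is_due(survey, today):
    # Placeholder criterion; a real one would test funding, use, formats, etc.
    print(reappraise(survey, [lambda ds: True], today))
```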
Key activities in appraisal & selection
For data creators / researchers
• Have a clear idea of what data are
vital/important/minor
• Ensure that datasets created have sufficient
metadata and documentation
• Use ‘curation-ready’ formats (usually open-source)
Key activities in appraisal & selection (2)
For information professionals / data curators
• Develop, document and apply policies about
appraisal and selection, including:
• Defining the designated community – the people who will
use the data in the future
• Defining what significant properties of the data to preserve
• Deciding how long the data needs to be maintained
• Develop appraisal criteria
• Determine whether to keep data by evaluating them
against the appraisal criteria
The next stage: Ingest
• The next sequential action in the Curation Lifecycle (Ingest) investigates the transfer of data to an archive, repository, data centre or other custodian
Wrap-up
• Appraisal and selection are A Good Thing:
• Better management of resource limitations by reducing the
quantity of data and records maintained
• An increased likelihood of economical long-term viability of
data and records by reducing the cost of maintaining large
quantities of data and records (note that costs of digital
preservation are still unclear)
• Better curation; for example, creation of adequate metadata
for discovery and preservation is expensive
• More information:
• Digital Curation Centre’s Resource Centre
http://www.dcc.ac.uk/resource/