a centre of expertise in data curation and preservation

Appraise and Select
Tuesday 7 October 2008

Ross Harvey
Graduate School of Library & Information Science
Simmons College, Boston
ross.harvey@simmons.edu

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh

Topics
• Introduction
• Why appraise and select?
• Who does appraisal and selection?
• What data do we want to keep?
• Drivers for keeping data
• Appraisal and selection policies
• Appraisal tools
• Reappraisal and disposal
• Key activities in appraisal and selection
• The next stage: Ingest

Introduction
• Learning outcomes of lecture and exercise:
  1. An awareness of appraisal techniques and why they are needed
  2. An appreciation of how appraisal techniques are applied in digital curation
  3. A practical understanding of how to align selection criteria with organisational goals

Introduction: definitions
• Appraisal and selection: what are they?
• Appraisal (a term originating in archival science) is “the process of evaluating records to determine which are to be retained as archives, which are to be kept for specified periods and which are to be destroyed” (Ellis, J. (1993) (ed.). "Keeping Archives", 2nd edn (Melbourne: Australian Society of Archivists), p.461)
• Selection is a more general term, usually applied when deciding what will be added to a repository

Introduction: the Curation Lifecycle
• Appraise and Select and the Curation Lifecycle
  • Sequential action
  • Follows Create or Receive
  • Activities:
    • Evaluate data & select for long-term curation & preservation
    • Adhere to documented guidance, policies, legal requirements

Introduction: key questions
1. What data do we want to keep for the future?
2. How do we decide which data are likely to be useful?
3. How long should we plan to keep them?
4. Do we want them to be fully functional in the future (for example, with all linked data also available), and to what extent?

Why appraise and select?
• The argument for selection and appraisal
  • Exponential growth in data - too much to curate effectively
  • Resources available for curation are limited
  • High costs, limited effectiveness of solutions such as digital archaeology or reliance on information retrieval
  • Therefore, high-quality curation requires selection & appraisal
• The argument against:
  • Bias cannot be avoided (appraisal as ‘an evil necessity’)
  • We can’t know who the future users of the data will be
  • We can keep everything
    • Costs of storage are falling
    • Our increasing ability to access large quantities of data

Who does appraisal & selection?
• Information professionals/data curators
  • Develop selection policies, guidelines for appraisal
  • Liaise with creators and depositors so that datasets are in the best shape for preservation
  • Locate sufficient resources (funding, staff, technical infrastructure) to ensure effective appraisal is possible
• Data creators
  • Ensure that datasets created have sufficient metadata and documentation
  • Use ‘curation-ready’ formats (usually open-source)
  • Have a clear idea of what data are vital/important/minor

Who does appraisal & selection? (2)
• Input from stakeholders to develop appraisal policy, criteria
• Designated Community - the people who will understand and use that data in the future
  “An identified group of potential Consumers who should be able to understand a particular set of information” (Source: OAIS Reference Model)
• What will (might?) the designated community consider sufficient in the future?
  • For instance, will it be sufficient to keep just the information content of a database, but not its functionality (e.g. the ability to search and manipulate its contents)?

What data do we want to keep?
• Macro-level
  • ‘Mission-critical’ data - what must you keep to ensure your research project is successful?
  • Guidance from appraisal guidelines, data audit frameworks
• Micro-level
  • The data (digital objects) identified by appraisal guidelines or data audits PLUS relevant Description and Representation Information - the information about the data that is needed to make it understandable in the future

Drivers for keeping data
• With reference to your research project
  • Are these data necessary for the research project to continue? (Data Audit Framework)
• Mandated drivers
  • Does your funder require you to deposit data?
  • Does your employer require you to deposit data in an institutional repository?
• Longer-term drivers
  • Are your data useful to other researchers? Will they be used by others?

Appraisal and selection policies
Assist with answering questions such as:
• Does the data or record fit into a repository's selection policy?
• Who will or might use the data or record in the future? (Is there a defined ‘designated community’?)
• Is it economically feasible to keep the data or record?
• Can acceptable legal and intellectual property rights, to keep and reuse the data, be negotiated?
• Is there a legal requirement to keep the data (and make it accessible) for a certain period of time?
• Does the data constitute the ‘vital records’ of a project or organisation and therefore need to be retained indefinitely?
• Is it both technically feasible and worthwhile in cost/benefit terms to preserve the data or record?
• Does sufficient documentation and metadata exist to explain the character, and enable the discovery, of the data or record?

Appraisal and selection policies (2)
Data Audit Framework’s classification questions
• Is this data central to your research?
• Will the data be useful in the future?
• Are you the intellectual owner?
• Is it documented and in a sustainable form?
• Is it already being preserved elsewhere?

Appraisal and selection policies (3)
Example: DataPASS Appraisal Guidelines 2005 (for social science data)
http://www.icpsr.umich.edu/DATAPASS/pdf/appraisal.pdf
• How significant are the data for research?
• How significant is the source and scientific progress and society?
• Is the information unique?
• How usable are the data?
• What is the timeframe covered by the information?
• Are the data related to other data in the archives?
• What are the cost considerations for long-term maintenance of the data?
• What is the volume of data?

Appraisal and selection policies (4)
• Epidemiological data sets: possible retention criteria (Lord & Macdonald JISC E-science Curation Report 2003, p.46)
  • The nature of the questions being asked by the study
  • Whether it addresses only one question or many
  • Whether the question has been asked before
  • The richness of the data set
  • If it is a longitudinal study - ‘indicates an amber light’
  • Sample-related studies
  • Stability of the measures used
  • Possibility to go back to the population (e.g. for consent, ethical committee access)
  • Uniqueness; value for possible future comparisons

Appraisal tools
• DAF (Data Audit Framework) toolkit
  • Will be described and applied in the exercise following this lecture
• EROS (Earth Resources Observation & Science) http://eros.usgs.gov/government/ratool/
  • This tool assists the USGS in appraising record collections that are offered to, or sought by, the USGS.
  • ‘We should be expending our resources on the data we most value. Determining that value requires us to make judgments, but utilizing a repeatable and comprehensive scheme can allow us to judge data responsibly. Documenting those judgments is essential, because future generations will depend on the current scientists and records managers to preserve the data that will “advance knowledge”’ (J. Faundeen, USGS)

How long do we keep the data?
• The Curation Lifecycle also includes two relevant Sequential Actions
  • Reappraise
  • Dispose

Reappraisal and disposal
• Reappraise - appraisal outcomes change as requirements and needs change
  • Initial appraisal could result in most datasets being kept
  • Criteria for reappraisal are developed
  • Reappraisal tests the dataset at defined intervals against these criteria to decide whether it still meets the conditions for applying resources to its long-term retention
• Dispose - outcome of appraisal or reappraisal may be a decision not to commit further resources to curating a dataset
  • In this case these data could be offered to another repository, or destroyed

Key activities in appraisal & selection
For data creators / researchers
• Have a clear idea of what data are vital/important/minor
• Ensure that datasets created have sufficient metadata and documentation
• Use ‘curation-ready’ formats (usually open-source)

Key activities in appraisal & selection
For information professionals / data curators
• Develop, document and apply policies about appraisal and selection, including:
  • Defining the designated community - the people who will use the data in the future
  • Defining what significant properties of the data to preserve
  • Deciding how long the data needs to be maintained
• Develop appraisal criteria
• Determine whether to keep data by evaluating them against the appraisal criteria

The next stage: Ingest
• The next sequential action in the Curation Lifecycle (Ingest) investigates the transfer of data to an archive, repository, data centre or other custodian
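
Illustrative sketch: appraisal and reappraisal as a checklist

The two curator activities described in these slides, evaluating a dataset against appraisal criteria and retesting it at defined intervals, can be pictured as a simple checklist routine. The Python below is only a sketch under stated assumptions and is not part of the DAF toolkit or any DCC tool. The criterion names are loosely based on the Data Audit Framework classification questions, and the `Dataset` class, the `appraise` and `reappraise` functions and the 365-day interval are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical criteria, loosely based on the Data Audit Framework
# classification questions; a real policy would define its own.
CRITERIA = (
    "central_to_research",
    "useful_in_future",
    "documented_and_sustainable",
)


@dataclass
class Dataset:
    name: str
    answers: dict          # criterion name -> True/False, recorded by a curator
    last_appraised: date


def appraise(dataset, required=CRITERIA):
    """Return 'keep' only if every required criterion is met."""
    met = all(dataset.answers.get(criterion, False) for criterion in required)
    # 'dispose' here means: offer to another repository, or destroy.
    return "keep" if met else "dispose"


def reappraise(dataset, today, interval_days=365):
    """Re-test the dataset if the assumed reappraisal interval has elapsed."""
    if today - dataset.last_appraised < timedelta(days=interval_days):
        return "not due"
    dataset.last_appraised = today
    return appraise(dataset)


# Usage: a dataset kept at first appraisal, then reappraised a year later
# after the designated community no longer finds it useful.
survey = Dataset(
    name="survey-2008",
    answers={c: True for c in CRITERIA},
    last_appraised=date(2008, 10, 7),
)
first_outcome = appraise(survey)                       # "keep"
early_outcome = reappraise(survey, date(2009, 1, 1))   # "not due"
survey.answers["useful_in_future"] = False
later_outcome = reappraise(survey, date(2009, 10, 7))  # "dispose"
```

A strict all-criteria rule is used here only to keep the sketch short. In practice the judgments behind each answer are made and documented by people, and a policy might weight criteria rather than require all of them.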

Wrap-up
• Appraisal and selection are A Good Thing:
  • Better management of resource limitations by reducing the quantity of data and records maintained
  • An increased likelihood of economical long-term viability of data and records by reducing the cost of maintaining large quantities of data and records (note that costs of digital preservation are still unclear)
  • Better curation; for example, creating adequate metadata for discovery and preservation is expensive, so effort is best concentrated on selected data
• More information:
  • Digital Curation Centre’s Resource Centre http://www.dcc.ac.uk/resource/