RDA Interest Group Proposal: “Data in Context”

1. Background

Research is a global ecosystem involving multiple stakeholders. The UK Digital Curation Centre (DCC) developed a life-cycle approach to managing data that encompasses both a linear sequence of work stages and the cyclic repetition of specific tasks. It recognizes conceptualization, creation, access and usage, appraisal and selection, disposal, ingestion, preservation, reappraisal, storage, access and reuse, and transformation as the elements that constitute this cycle¹.

Behind each element, we anticipate that at least one data producer and one data consumer have a need or a problem to solve, e.g. a requirement, which in turn defines the boundaries of a particular context or request within the data management life-cycle. Furthermore, we anticipate a need to document the evolution of the data asset behind each element through re-usable contextual ‘profiles’. Such profiles can be applied to various datasets if they are standardized, i.e. constituted by (1) standardized open vocabularies, based on (2) standardized formal data profiles and (3) standardized formal semantics.

A profile will:
(a) assist in retrieval precision and recall (including disambiguation);
(b) provide information to assist the user in judging the relevance and quality of the data asset;
(c) provide additional information that may be used in processing the dataset;
(d) provide information that assists in explaining the dataset (purpose, relationship to persons, organisations, facilities, equipment, publications, etc.).

With this Interest Group we want to investigate, for each of the three dimensions:
- contextually or subcontextually aware standardization work²;
- re-usable priority requirements³;
in order to deliver:
- an overview of contextually aware standardization work;
- a priority list of data management requirements for contextual metadata, leading to a defined programme of work.
The results of the investigation will support preparations for the constitution of an RDA Working Group to implement the selected priority requirements, i.e. ‘profiles’, in a standardized way at all three levels. In a further step, this may aim at automating the transformation between available formal data standards. The idea was first presented at the first RDA plenary in Gothenburg (March 2013) with a view to setting up a Working Group, but it is now intended to start as an RDA Interest Group.

We intend to collaborate closely and exchange ideas, in a coordinated way, with related Working Groups, namely:
- RDA Data Foundation and Terminology WG: https://www.rd-alliance.org/working-groups/data-foundation-and-terminology-wg.html
- RDA Metadata Standards Directory WG: https://www.rd-alliance.org/working-groups/metadata-standards-directory-working-group.html
- RDA PID Information Types WG: https://www.rd-alliance.org/working-groups/pid-information-types-wg.html
- ICSU Open Metadata Catalogue and Knowledge Networks WG (as suggested in the Gothenburg plenary): http://www.icsu-wds.org/working-groups/metadatacatalogue-and-knowledge-network

Interest Group Chairs
Brigitte Jörg, JeiBee, UK, CASRAI, CA, euroCRIS, EU
Keith Jeffery, Consultant, UK

Initial Membership (Before 1st Plenary – Gothenburg March 2013): To be revisited
David Baker, CASRAI, CA
Brian Matthews, STFC, UK
Simon Hodson, Jisc, UK
Andrea Scharnhorst, DANS, NL
Daan Broeder, Max Planck Institute, NL
Angus Whyte, DCC, UK
Natalia Manola, MADIK, GR
Nikos Houssos, EKT, GR
Peter Mutschke, GESIS, DE
Jon Corson-Rikert, VIVO, US
Paolo Manghi, ISTI, IT
Jia Quio, Syracuse University, US
Miguel-Angel Sicilia, University of Alcala, ES

¹ The DCC’s Digital Curation Lifecycle: http://www.dcc.ac.uk/digital-curation/what-digital-curation
² e.g. CASRAI [1], CERIF [2], VIVO [3], PROV [4], PREMIS [5], MARC [6], CKAN [7], DCAT [8], etc.
³ e.g. in EU, US, ... South America, etc. (e.g. from Horizon2020, NIH, NSF, etc. funding programmes)
References:
[1] CASRAI – Consortia Advancing Standards in Research Administration Information. http://casrai.org/
[2] CERIF – Common European Research Information Format by euroCRIS. http://www.eurocris.org/Index.php?page=CERIFreleases&t=1
[3] VIVO – An interdisciplinary network – Enabling collaboration and discovery among scientists across all disciplines. http://www.vivoweb.org/
[4] Public Record Office Victoria. http://prov.vic.gov.au/government/vers/implementing-vers/standard-2
[5] Preservation Metadata – PREMIS (Library of Congress). http://www.loc.gov/standards/premis/
[6] MARC Standards (Library of Congress). http://www.loc.gov/marc/
[7] CKAN Case Studies. http://ckan.org/case-studies/ http://ckan.org/features/metadata/
[8] Data Catalog Vocabulary (DCAT). W3C Working Draft 05 April 2012. http://www.w3.org/TR/vocab-dcat/
[9] ISO – The International Organization for Standardization. http://www.iso.org/iso/home.html
[10] W3C – World Wide Web Consortium. http://www.w3.org/
[11] OMG – Object Management Group. http://www.omg.org/
[12] Joan E. Beaudoin. A Framework for Contextual Metadata Used in the Digital Preservation of Cultural Objects. D-Lib Magazine, Vol. 18, No. 11/12, November/December 2012. http://www.dlib.org/dlib/november12/beaudoin/11beaudoin2.html
[13] Sue McKemmish, Adrian Cunningham, Dagmar Parer. Metadata Mania. Monash University. http://www.infotech.monash.edu.au/research/groups/rcrg/publications/recordkeepingmetadata-sm01.html
[14] Horst Kremers. Context Spaces and Generalization. Proc. ISGI 2007, 2nd International Symposium on Generalization of Information, 2007, pp. 117-136, CODATA-Germany. http://www.horst-kremers.de/List_of_Papers_Horst_Kremers_.pdf
etc.