Enhanced support for eScience: the role of Digital Libraries
Digital Libraries Go eScience, ECDL, Alicante, September 2006
Rachel Heery, Deputy Director R&D, UKOLN
www.ukoln.ac.uk
A centre of expertise in digital information management

Summary
• New modes of scholarship
  – eScience service portfolio
  – Emerging eResearch ecology
• Infrastructural elements
• Data creation and capture
• Data curation and preservation
• Data citation, discovery and use
• Adding value and knowledge extraction

Vision 2010
• Richer scholarly communication based on open access to and re-use of scholarly materials
• Integrated life-cycle of knowledge from research to learning
• Access and re-use of scholarly materials
• Added value services on scholarly materials (involving HE and commercial sectors)

More repositories and more content!
• Working papers, primary data, audiovisual, images
• Hardware in research labs will automatically deposit experimental data
• Desktop tools will deposit content
• Rich data flow between networks of repositories
• Rich data flows between repositories and other components in information landscape
• National and institutional preservation strategies in place!

Repositories interworking with other eResearch components
(Diagram: a central repository interworking with experimental equipment; authoring tools; name authority services; field study capture tools; terminology services; repositories; content packaging tools)

Where are we now?

Scholarship today?

OA landscape
http://www.flickr.com/photos/97797311@N00/61648107/ 23 June 2006

Architecture of Participation?

Reference datasets as infrastructure?

Datacentric 2020 vision

New forms of publication: integration of data and journals

Emerging ecology

Defining workflows and dataflows
• Analyse roles and interactions within and between repositories
• What does the user want?
• Identify and define services
  – Potential for ‘shared services’, re-use of services
• Explore potential dataflows
  – Aggregation, data exchange, metadata extraction and enhancement

Dataflows and Workflows
• How is primary research data captured in faculty and academic departments?
• Where and how is primary research data stored? Made accessible?
• What are the processes for deriving further data, and how is this structured and stored? Made accessible?
• How is data curated for the long term?

Understanding the research process
• Project StORe: Source-to-Output Repositories (Edinburgh)
  – Primary data : research publications
  – Survey questionnaire
• RepoMMan: Repository Metadata and Management (Hull)
  – Survey questionnaire and interviews
  – Activity diagram and workflow
• DCC SCARP
  – Curation staff working within research teams

Repository ecology
(Diagram: experimental machine; authoring tool; departmental repository; laboratory repository; institutional repository; institutional research system; aggregators: OAIster, Google; regional, national data centres; research council repositories; text mining tools; terminology services; subject repositories; learned society repositories)

Digital libraries & eScience

Infrastructure

Data capture

Digital repositories, OA & preservation
• Long-term access: trust, responsibility, policy
• Trusted DR Audit Checklist for Certification, Draft, Research Libraries Group-NARA Taskforce 2005
• Defined criteria under 4 categories
  – Organisation
  – Functions, processes & procedures
  – Designated community & usability
  – Technologies & technical infrastructure
• UK Digital Curation Centre: advice, tools & services http://www.dcc.ac.uk/
• RepInfo Registry
• EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm
• Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/

Data, metadata and discovery
• Validation, publication & discovery of data models & schema
• Metadata packaging standards
  – METS, MPEG 21 DIDL
  – Complex object model?
• Semantic descriptions
  – Formal high-level and domain ontologies
  – Inter-disciplinary discovery
• ePrints DC Application Profile
• UK Intute IR search service (eprints)
• Informal social network approaches: “folksonomies”
• What data models and metadata schema are in place?
• Have librarians been involved in their development?

Persistent identifiers for data citation
• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?
• Schemes: DOI, Handle, ARK, PURL
• Publication & citation of scientific primary data project, National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de
• What persistent identifiers have been assigned to your data?
• Is there a data citation policy?
• Was the Library involved?

Adding value: repository services
• Tools: for deposit, normalisation, manipulation, transformation…
• Linking, annotation, visualisation
• Aggregators: generic, (sub-) disciplinary
Knowledge extraction:
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological…)
• Analysis (statistical, lexical, gene…)
Is your data OA? How is your data being used and re-used?

NaCTeM http://www.nactem.ac.uk/
Emerging tools: TerMine, GENIA, Cafetiere
Nature 23 March 2006
OTMI: Open Text Mining Interface

A Case Study in Crystallography

Data capture

R4L Deposit scenario (…part of….)
1. Produce strategy for synthesis (=idea)
2. Submit plan to SmartTea system (incl. identifiers)
3. Retrieve and follow instructions (sub-workflow?)
4. Experimental synthesis metadata automatically recorded on instruments (Smart Lab)
5. Create record for synthesised sample (+ proposed chemical identifier) in R4L laboratory data management system
6. Run spectral analyses on sample, capturing further analysis metadata (incl. time-stamp, analysis software version, researcher details etc.)
7. Save spectrum in native and common formats
8. Invoke R4L data capture service and deposit files + metadata in laboratory repository…
RAW DATA, DERIVED DATA, RESULTS DATA

eBank UK Project http://www.ukoln.ac.uk/projects/ebank-uk/
• Promote open access crystallography data
• Aggregator service harvests OAI metadata from institutional data repository (e-Crystals archive)
• Service linking from data to derived research publication
• Embedding eBank service in learning workflows: pedagogy
• Future federation plans for crystallography data repositories
UKOLN (lead), University of Southampton, University of Manchester

A data repository entry
ecrystals.chem.soton.ac.uk

Access to the underlying data: complex objects

eBank Metadata Publication
• Using simple Dublin Core
  – Crystal structure
  – Title (Systematic IUPAC Name)
  – Authors
  – Affiliation
  – Creation Date
• Additional chemical information through Qualified Dublin Core
  – Empirical formula
  – International Chemical Identifier (InChI)
  – Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• DOIs from TIB http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Data citation policy http://ecrystals.chem.soton.ac.uk/rights.html

Discovering data:
• Domain identifier: International Chemical Identifier (InChI) code
• Google molecule using InChI
Slide from Simon Coles
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

Adding value: eBank linking data to publications

Linking research to learning - embedding eBank aggregator service in a science portal for student learners

Integration into the curriculum and eLearning workflows
• MChem course
• Assess role in Undergraduate Chemical Informatics courses
• Pedagogic evaluation
• April – June 2006
• Report to follow.

Roles & responsibilities: new challenges?
Workforce development and capacity building
• NSF Draft Report 2005: “Data scientist” - hybrid skills
• Facilitate collaboration
  – Multidisciplinary teams: computer scientists, domain scientists, digital library experts, statisticians/modellers, e.g. eBank project
  – Lessons learnt: e-Science Human Factors Audit Report (to be published 2006), Roy Kalawsky, Loughborough
• CURL/SCONUL e-Research Taskforce
Has your (digital) library engaged with the e-Research agenda?

Repositories roadmap: vision 2010
• Richer scholarly communication based on open access to and re-use of scholarly materials
• Integrated life-cycle of knowledge from research to learning
• Available metadata about scholarly materials
• Added value services on scholarly materials (involving HE and commercial sectors)

Repository interworking with other components
(Diagram: a central repository interworking with a Virtual Learning Environment; authoring tool; name authority service; institutional research system; automated classification service; repository; packaging tool)

Defining workflows and dataflows
• Analyse roles and interactions within and between repositories
• What does the user want?
• Identify and define services
  – Potential for ‘shared services’, re-use of services
  – In context of JISC e-Framework
• Explore potential dataflows
  – Aggregation, data exchange, metadata extraction and enhancement

Deposit a priority!
• To enable users to populate repositories simply, effectively and preferably automatically
• To capture content from desktop applications, experimental equipment (smart labs), learning content development tools etc.
• To enable the repository of deposit to exchange data with further repositories in a predictable manner
• To hide complexity from the end-user
• To be compatible with follow-on added value services layered on repository content
• Deposit API Working group meeting, July 11/12, Warwick http://www.ukoln.ac.uk/repositories/digirep/

Thank you!
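
Backup: a deposit package sketch
The deposit requirements above (capture files together with their metadata, and hide the packaging from the end-user) might be sketched roughly as follows. This is an illustration only, not the Deposit API discussed by the working group: the function names, the `metadata` wrapper element, the package layout and the sample values are all hypothetical. Only the Dublin Core element namespace is a real identifier.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Serialise a dict of simple Dublin Core fields to an XML string.

    `fields` maps an element name (e.g. "title") to a list of values.
    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


def build_deposit_package(fields, files):
    """Bundle data files and their metadata record into one in-memory zip.

    The layout (metadata.xml plus a data/ folder) is an assumption made
    for illustration, not a published packaging standard.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("metadata.xml", build_dc_record(fields))
        for filename, content in files.items():
            package.writestr(f"data/{filename}", content)
    return buffer.getvalue()


# Hypothetical sample values, not taken from any real repository entry.
package_bytes = build_deposit_package(
    {"title": ["Example crystal structure"], "creator": ["A. Researcher"]},
    {"spectrum.txt": b"placeholder spectrum data"},
)
```

A desktop tool or instrument could call `build_deposit_package` behind a single "deposit" action, so the user never handles the metadata file directly. How the resulting bytes are sent to a repository is left open here.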