Enhanced support for eScience: the role of Digital Libraries

Digital Libraries Go eScience, ECDL, Alicante, September 2006
Rachel Heery
Deputy Director R&D, UKOLN
UKOLN is supported by:
www.ukoln.ac.uk
A centre of expertise in digital information management
Summary
• New modes of scholarship
– eScience service portfolio
– Emerging eResearch ecology
• Infrastructural elements
• Data creation and capture
• Data curation and preservation
• Data citation, discovery and use
• Adding value and knowledge extraction
Vision 2010
• Richer scholarly communication based on
open access to and re-use of scholarly
materials
• Integrated life-cycle of knowledge from
research to learning
• Access and re-use of scholarly materials
• Added value services on scholarly
materials (involving HE and commercial
sectors)
More repositories and more content!
• Working papers, primary data, audiovisual,
images
• Hardware in research labs will
automatically deposit experimental data
• Desktop tools will deposit content
• Rich data flow between networks of
repositories
• Rich data flows between repositories and
other components in information landscape
• National and institutional preservation
strategies in place!
Repositories interworking with other eResearch components
[Diagram components: Repositories, Experimental equipment, Authoring tools, Name authority services, Field study capture tools, Terminology services, Content packaging tools]
Where are we now?
Scholarship today?
OA landscape
Image: http://www.flickr.com/photos/97797311@N00/61648107/ (23 June 2006)
Architecture of Participation?
Reference datasets as infrastructure?
Datacentric 2020 vision
New forms of publication: integration of data and journals
Emerging ecology
Defining workflows and dataflows
• Analyse roles and interactions within and
between repositories
• What does the user want?
• Identify and define services
– Potential for ‘shared services’, re-use of
services
• Explore potential dataflows
– Aggregation, data exchange, metadata
extraction and enhancement
Dataflows and Workflows
• How is primary research data captured
in faculty and academic departments?
• Where and how is primary research data
stored? Made accessible?
• What are the processes for deriving further
data, and how is this data structured and
stored? Made accessible?
• How is data curated for the long term?
Understanding the research process
• Project StORe: Source-to-Output
Repositories (Edinburgh)
– Primary data: research publications
– Survey questionnaire
• RepoMMan: Repository Metadata and
Management (Hull)
– Survey questionnaire and interviews
– Activity diagram and workflow
• DCC SCARP
– Curation staff working within research teams
Repository ecology
[Diagram components: Experimental machine, Authoring tool, Departmental repository, Laboratory repository, Institutional Repository, Institutional research system, Aggregators (OAIster, Google), Regional and national Data Centres, Research council repositories, Text mining tools, Terminology services, Subject repositories, Learned society repositories]
Digital libraries & eScience Infrastructure
Data capture
Digital repositories, OA & preservation
• Long-term access: trust, responsibility, policy
• Trusted Digital Repository audit checklist for certification (draft), Research
Libraries Group (RLG)-NARA Task Force, 2005
• Defined criteria under 4 categories
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure
• UK Digital Curation Centre: advice, tools & services
http://www.dcc.ac.uk/
• RepInfo Registry
• EU CASPAR Integrated Project
http://www.casparpreserves.info/pages/1/index.htm
• Task Force on the Permanent Access to the Records of
Science
http://tfpa.kb.nl/
Data, metadata and discovery
• Validation, publication & discovery of
data models & schema
• Metadata packaging standards
– METS, MPEG-21 DIDL
– Complex object model?
• Semantic descriptions
– Formal high-level and domain ontologies
– Inter-disciplinary discovery
• ePrints DC Application Profile
• UK Intute IR search service (eprints)
• Informal social network approaches:
“folksonomies”
• What data models and metadata
schema are in place?
• Have librarians been involved in
their development?
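METS, listed above as a metadata packaging standard, wraps a file inventory and a structural map (plus optional descriptive and administrative sections) in one XML document. A minimal sketch using Python's standard library, assuming hypothetical file IDs and URLs; it emits only a fileSec and the mandatory structMap, so a real repository profile would add dmdSec/amdSec content:

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (ElementTree 'Clark' notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(files: list[tuple[str, str]]) -> str:
    """Return a minimal METS document for a list of (file_id, url) pairs.

    fileSec inventories the files; structMap (mandatory in METS) points at them.
    Descriptive (dmdSec) and administrative (amdSec) sections are omitted.
    """
    mets = ET.Element(m("mets"))
    # METS requires fileSec to precede structMap, so it is created first.
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"), USE="original")
    root_div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), TYPE="dataset")
    for file_id, url in files:
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id)
        # FLocat records where the file lives; xlink:href carries the URL.
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        # fptr links the structural division back to the file inventory.
        ET.SubElement(root_div, m("fptr"), FILEID=file_id)
    return ET.tostring(mets, encoding="unicode")


if __name__ == "__main__":
    print(build_mets([("FILE1", "http://example.org/data/sample.cif")]))
```

The structMap/fileSec split is what lets a complex object (several data files plus their relationships) travel as one unit between repositories.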
Persistent identifiers for data citation
• How will they be used? We need use cases: depositor,
author, service provider, researcher, publisher?
• Schemes: DOI, Handle, ARK, PURL
• Publication and citation of scientific primary data project,
National Library for Science & Technology (TIB), University
of Hanover, Germany: the STD-DOI project, a DOI registry for
datasets http://www.std-doi.de
• What persistent
identifiers have been
assigned to your data?
• Is there a data
citation policy?
• Was the Library
involved?
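For DOIs in particular, the same identifier turns up in several written forms, for example `doi:10.1039/b502828k` in a reference list versus a `http://dx.doi.org/...` link in a metadata record. A sketch of normalising these to the bare DOI name before matching or citing; the prefix list and the loose syntax check are assumptions of this example, not a full implementation of the DOI syntax rules:

```python
import re

# A DOI name is "10." + registrant code, then "/", then a suffix (loose check).
_DOI_RE = re.compile(r"^10\.\d+(\.\d+)*/\S+$")
# Common ways a DOI is written in citations and metadata (matched case-insensitively).
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Strip resolver/'doi:' prefixes and return the bare DOI name.

    Raises ValueError if what remains does not look like a DOI.
    """
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not _DOI_RE.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw: str, resolver: str = "https://doi.org/") -> str:
    """Build an actionable resolver link for a DOI.

    Suffix characters such as '#' or '?' would need percent-encoding;
    that is not handled in this sketch.
    """
    return resolver + normalise_doi(raw)
```

Storing the bare name and generating the link on demand keeps citations stable even if the preferred resolver address changes (dx.doi.org and doi.org both resolve DOIs).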
Adding value: repository services
• Tools: for deposit, normalisation,
manipulation, transformation…
• Linking, annotation, visualisation
• Aggregators: generic, (sub-)disciplinary
Knowledge extraction:
• Mining (data, text, structures)
• Modelling (economic, climate,
mathematical, biological…)
• Analysis (statistical, lexical, gene…)
Is your data OA?
How is your data being used
and re-used?
NaCTeM
http://www.nactem.ac.uk/
Emerging tools: TerMine,
GENIA, Cafetiere
Nature 23 March 2006
OTMI: Open Text Mining Interface
A Case Study in Crystallography
Data capture
R4L Deposit scenario (…part of…)
1. Produce strategy for synthesis (=idea)
2. Submit plan to SmartTea system (incl. identifiers)
3. Retrieve and follow instructions (sub-workflow?)
4. Experimental synthesis metadata automatically recorded on instruments (Smart Lab)
5. Create record for synthesised sample (+ proposed chemical identifier) in R4L laboratory data management system
6. Run spectral analyses on sample, capturing further analysis metadata (incl. time-stamp, analysis software version, researcher details etc.)
7. Save spectrum in native and common formats
8. Invoke R4L data capture service and deposit files + metadata in laboratory repository…
[Diagram: RAW DATA, DERIVED DATA, RESULTS DATA]
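The scenario attaches provenance metadata (time-stamp, analysis software version, researcher details) to each file and distinguishes raw, derived and results data. A hypothetical record structure for that capture step; the class and field names and the example format labels are assumptions, not part of the R4L or SmartTea systems:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum


class DataStage(Enum):
    """The three data stages named in the scenario."""
    RAW = "raw"
    DERIVED = "derived"
    RESULTS = "results"


@dataclass
class CapturedFile:
    path: str
    stage: DataStage
    file_format: str  # e.g. instrument-native vs. a common exchange format


@dataclass
class AnalysisRecord:
    """Metadata captured alongside one analysis run (hypothetical schema)."""
    sample_id: str
    researcher: str
    software: str
    software_version: str
    # Time-stamp recorded automatically at capture time (UTC, ISO 8601).
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    files: list[CapturedFile] = field(default_factory=list)

    def add_file(self, path: str, stage: DataStage, file_format: str) -> None:
        self.files.append(CapturedFile(path, stage, file_format))

    def paths_at(self, stage: DataStage) -> list[str]:
        """List the files recorded at a given data stage."""
        return [f.path for f in self.files if f.stage is stage]

    def to_json(self) -> str:
        """Serialise for deposit; Enum members are stored as their string values."""
        meta = asdict(self)
        for f in meta["files"]:
            f["stage"] = f["stage"].value
        return json.dumps(meta, indent=2)
```

Recording the metadata at the moment of capture, rather than asking the researcher for it at deposit time, is what allows the final deposit step to run without manual input.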
eBank UK Project
http://www.ukoln.ac.uk/projects/ebank-uk/
• Promote open access crystallography data
• Aggregator service harvests OAI metadata from institutional
data repository (e-Crystals archive)
• Service linking from data to derived research publication
• Embedding eBank service in learning workflows: pedagogy
• Future federation plans for crystallography data repositories
UKOLN (lead), University of Southampton, University of
Manchester
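The eBank aggregator harvests metadata over OAI-PMH. A minimal sketch of the harvesting side, assuming the simple `oai_dc` metadata format: it builds ListRecords request URLs and parses one response page, following resumptionToken paging. HTTP fetching is left out so the parsing can be tested offline; the base URL is a placeholder:

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request: first page, or a continuation via resumptionToken."""
    if token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        if header.get("status") == "deleted":
            continue  # deleted records carry a header but no metadata
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
        })
    # An empty or absent resumptionToken means the list is complete.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None
```

A harvester would loop: fetch `list_records_url(base)`, parse the page, then request `list_records_url(base, token)` until the returned token is `None`.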
A data repository entry
ecrystals.chem.soton.ac.uk
Access to the underlying data: complex objects
eBank Metadata Publication
• Using simple Dublin Core
– Crystal structure
– Title (Systematic IUPAC Name)
– Authors
– Affiliation
– Creation Date
• Additional chemical information through Qualified Dublin Core
– Empirical formula
– International Chemical Identifier (InChI)
– Compound Class & Keywords
– Specifies which ‘datasets’ are present in an entry
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• DOIs from TIB http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Data citation policy http://ecrystals.chem.soton.ac.uk/rights.html
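The simple Dublin Core elements listed above map onto an `oai_dc` record of the kind an OAI-PMH aggregator harvests. A sketch that serialises one entry; putting both the DOI and the InChI in repeated `dc:identifier` elements is an assumption of this example, not a rule of the eBank application profile, which carries chemical detail in its own qualified elements:

```python
import xml.etree.ElementTree as ET
from typing import Iterable

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def dc_record(title: str, creators: Iterable[str], date: str,
              identifiers: Iterable[str], subjects: Iterable[str] = ()) -> str:
    """Serialise one repository entry as an unqualified oai_dc record."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(element: str, value: str) -> None:
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value

    add("title", title)           # e.g. the systematic IUPAC name
    for creator in creators:      # dc:creator is repeatable, one per author
        add("creator", creator)
    add("date", date)             # creation date, ISO 8601
    for identifier in identifiers:
        add("identifier", identifier)
    for subject in subjects:      # compound class and keywords
        add("subject", subject)
    return ET.tostring(root, encoding="unicode")
```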
Discovering data:
• Domain identifier: International Chemical Identifier (InChI) code
• Google molecule using InChI
Slide from Simon Coles
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol.
Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k
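An InChI string is itself structured: after the `InChI=` prefix come slash-separated layers, starting with the version (`1S` marks a standard InChI) and the chemical formula, followed by layers tagged with a letter such as `c` (connections) or `h` (hydrogen atoms). A small sketch of splitting those layers, for instance so a discovery service can read the formula straight from the identifier; it does no chemical validation and is no substitute for the IUPAC InChI software:

```python
def parse_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and letter-tagged layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    layers: dict[str, list[str]] = {}
    for layer in parts[2:]:
        if layer:
            # A layer letter can recur (e.g. /h after a fixed-H /f layer), so keep a list.
            layers.setdefault(layer[0], []).append(layer[1:])
    return {
        "version": version,
        "standard": version.endswith("S"),  # "1S" = standard InChI
        "formula": formula,
        "layers": layers,
    }
```

For ethanol's standard InChI, `InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3`, this returns the formula `C2H6O` and the layers `{'c': ['1-2-3'], 'h': ['3H,2H2,1H3']}`.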
Adding value: eBank linking data to publications
Linking research to learning - embedding eBank aggregator
service in a science portal for student learners
Integration into the curriculum and eLearning workflows
• MChem course
• Assess role in
Undergraduate
Chemical Informatics
courses
• Pedagogic evaluation
• April – June 2006
• Report to follow.
Roles & responsibilities: new challenges?
Workforce development and capacity building
• NSF Draft Report 2005
“Data scientist” - hybrid skills
• Facilitate collaboration
– “Multidisciplinary teams: computer
scientists, domain scientists, digital
library experts, statisticians/modellers”,
e.g. eBank project
– Lessons learnt: e-Science Human
Factors Audit Report (to be published
2006), Roy Kalawsky, Loughborough
• CURL/SCONUL e-Research
Taskforce
Has your (digital) library engaged
with the e-Research agenda?
Repositories roadmap: vision 2010
• Richer scholarly communication based on
open access to and re-use of scholarly
materials
• Integrated life-cycle of knowledge from
research to learning
• Available metadata about scholarly
materials
• Added value services on scholarly
materials (involving HE and commercial
sectors)
More repositories and more content!
• Working papers, primary data, audiovisual,
images
• Hardware in research labs will
automatically deposit experimental data
• Desktop tools will deposit content
• Rich data flow between networks of
repositories
• Rich data flows between repositories and
other components in information landscape
• National and institutional preservation
strategies in place!
Repository interworking with other
components
[Diagram components: Repositories, Virtual Learning Environment, Authoring tool, Name authority service, Institutional research system, Automated classification service, Packaging tool]
Defining workflows and dataflows
• Analyse roles and interactions within and
between repositories
• What does the user want?
• Identify and define services
– Potential for ‘shared services’, re-use of
services
– In context of JISC e-Framework
• Explore potential dataflows
– Aggregation, data exchange, metadata
extraction and enhancement
Deposit a priority!
• To enable users to populate repositories simply,
effectively and preferably automatically
• To capture content from desktop applications,
experimental equipment (smart labs), learning
content development tools, etc.
• To enable repository of deposit to exchange data
with further repositories in predictable manner
• To hide complexity from end-user
• To be compatible with follow-on added value
services layered on repository content
• Deposit API Working group meeting July 11/12,
Warwick
http://www.ukoln.ac.uk/repositories/digirep/
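Depositing simply and automatically, and exchanging data between repositories "in a predictable manner", both suggest a package that carries its own integrity information. A sketch, assuming a hypothetical layout loosely modelled on BagIt (payload under `data/`, an MD5 manifest) plus a `metadata.json`; it is an illustration only, not the Deposit API the working group was defining:

```python
import hashlib
import io
import json
import zipfile


def build_deposit_package(files: dict[str, bytes], metadata: dict) -> bytes:
    """Bundle files and metadata into one in-memory ZIP archive.

    A manifest of MD5 checksums is written alongside the payload so the
    receiving repository can check fixity after transfer.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        manifest_lines = []
        for name, content in sorted(files.items()):
            arcname = f"data/{name}"
            zf.writestr(arcname, content)
            # One "checksum  path" line per payload file.
            manifest_lines.append(f"{hashlib.md5(content).hexdigest()}  {arcname}")
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        zf.writestr("manifest-md5.txt", "\n".join(manifest_lines) + "\n")
    return buf.getvalue()


def verify_package(package: bytes) -> bool:
    """Recompute checksums of the payload files and compare them with the manifest."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for line in zf.read("manifest-md5.txt").decode().splitlines():
            digest, arcname = line.split("  ", 1)
            if hashlib.md5(zf.read(arcname)).hexdigest() != digest:
                return False
    return True
```

Keeping checksums inside the package means each repository in a chain can verify what it received without contacting the repository of first deposit.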
Thank you!