Create or Receive Scientific data

a centre of expertise in data curation and preservation
Dr. Frank Gibson and Dr. Phillip Lord
Frank.Gibson@newcastle.ac.uk
Phillip.Lord@newcastle.ac.uk
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-ncsa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh
“In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.”
- Geoffrey Bowker
Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
If we have a paper, who cares about the data?
http://flickr.com/photos/nicmcphee/2756494307/
A paper = a claim (or claims)
The full record that supports that claim should be available for detailed examination and critique.
Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
Biocuration: Databases
Biocuration: Wiki
Funders
http://flickr.com/photos/luismimunoznajar/2093185804/
Curation aims
Amenable
Preservable
Ownable
Accessible
Citable
Significant Properties of Data
Content
Syntax
Semantics
Content
Publisher
Type
Title
Creator
Source
Identifier
Date
Rights
Simple Dublin Core
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
Title
Creator
Subject
Description
Publisher
Contributor
Date
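A minimal sketch of what a Simple Dublin Core record for a dataset might look like in XML. It uses Python's standard `xml.etree.ElementTree` and the Dublin Core 1.1 element namespace; the `<metadata>` wrapper element and all field values are hypothetical choices for illustration, not part of any particular repository's submission format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen Simple Dublin Core elements listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def dublin_core_record(fields):
    """Build a flat record of dc:* elements, rejecting unknown element names.

    The <metadata> wrapper element is an arbitrary choice for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root

# Hypothetical values for illustration only.
record = dublin_core_record({
    "title": "Example gel image set",
    "creator": "A. Researcher",
    "date": "2008-10-06",
    "type": "Dataset",
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting names outside the fixed element set is what makes the record "Simple" Dublin Core: every element is optional and repeatable, but no others are allowed.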
Content: Domain Specific
Syntax
Choosing a Syntax
• Openness
  - Is there an open, publicly available specification for the format? Are its specifications in the public domain? Is it unencrypted?
• Portability
  - Is the format independent of hardware, operating system or other software? Is it independent of particular institutions, groups or events? Is it in widespread current use? Does it contain little or no built-in functionality?
• Quality
  - Is it robust, simple, highly tested and loss-free?
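The three criteria above can be treated as a checklist. The sketch below assumes the answers for a candidate format are filled in by hand as booleans (the attribute names and "Format A"/"Format B" are hypothetical) and simply reports which questions a format fails under each criterion.

```python
from dataclasses import dataclass

@dataclass
class FormatProfile:
    """Hand-filled answers to the checklist questions for one candidate format."""
    name: str
    open_specification: bool
    unencrypted: bool
    platform_independent: bool
    widely_used: bool
    little_builtin_functionality: bool
    loss_free: bool

# Which profile attributes answer each criterion on the slide.
CRITERIA = {
    "Openness": ("open_specification", "unencrypted"),
    "Portability": ("platform_independent", "widely_used", "little_builtin_functionality"),
    "Quality": ("loss_free",),
}

def unmet_criteria(fmt):
    """Return {criterion: [failed questions]}; an empty dict means every check passed."""
    report = {}
    for criterion, attrs in CRITERIA.items():
        failed = [a for a in attrs if not getattr(fmt, a)]
        if failed:
            report[criterion] = failed
    return report

fmt_b = FormatProfile("Format B", False, True, False, True, True, True)
print(unmet_criteria(fmt_b))
# {'Openness': ['open_specification'], 'Portability': ['platform_independent']}
```

A real appraisal would weigh criteria rather than treat them as pass/fail; the flat checklist just makes the questions explicit.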
Semantics
Semantics can be complex
One meaning = many words
One word = many meanings
What is fly?
Four images, each captioned "Fly":
• http://en.wikipedia.org/wiki/Image:Air_india_b747-400_vt-esn_arp.jpg
• http://en.wikipedia.org/wiki/Image:MuscuDomestica.jpg
• http://en.wikipedia.org/wiki/Image:Green_Highlander_salmon_fly.jpg
• http://en.wikipedia.org/wiki/Image:Fly_poster.jpg
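One common way to handle "one word = many meanings" and "one meaning = many words" is to annotate data with identifiers for meanings rather than with words. The sketch below uses made-up `EX:` identifiers (not real ontology terms) for three senses of "fly" and inverts the concept-to-words table into a word-to-concepts index.

```python
from collections import defaultdict

# Hypothetical concept identifiers, each with the words used for it.
LABELS = {
    "EX:0001": ["fly", "travel by air"],
    "EX:0002": ["fly", "housefly", "Musca domestica"],
    "EX:0003": ["fly", "fishing fly", "artificial lure"],
}

def build_index(labels):
    """Invert concept -> words into word -> concepts (case-insensitive)."""
    index = defaultdict(set)
    for concept, words in labels.items():
        for word in words:
            index[word.lower()].add(concept)
    return index

INDEX = build_index(LABELS)

def meanings(word):
    """All concept identifiers a word could refer to, sorted."""
    return sorted(INDEX.get(word.lower(), set()))

print(meanings("fly"))              # ['EX:0001', 'EX:0002', 'EX:0003']
print(meanings("Musca domestica"))  # ['EX:0002']
```

"fly" is ambiguous (three identifiers), while "housefly" and "Musca domestica" are synonyms (the same identifier); recording `EX:0002` in the data removes the ambiguity.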
• Excel data example: spreadsheet software can silently convert gene symbols into dates (for example, SEPT2 becomes 2-Sep)
•Zeeberg et al. BMC Bioinformatics 2004 5:80 doi:10.1186/1471-2105-5-80
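A minimal sketch of a check a curator might run on received gene lists, assuming the corruption shows up as "day-month" text such as "2-Sep" or "1-DEC". The exact text a spreadsheet produces depends on its version and locale, so this pattern is an assumption, not a complete detector.

```python
import re

# Assumed shape of a date-converted gene symbol: 1-2 digits, a dash, a month abbreviation.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)

def suspicious_identifiers(values):
    """Return the values that look like dates rather than gene symbols."""
    return [v for v in values if DATE_LIKE.match(v.strip())]

print(suspicious_identifiers(["SEPT2", "2-Sep", "TP53", "1-DEC"]))  # ['2-Sep', '1-DEC']
```

Detection only flags the damage after the fact; the original symbol cannot be recovered reliably from "2-Sep" alone, which is why the syntax chosen for data exchange matters.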
Ontology
• A controlled vocabulary is an association
between formal names (identifiers) and their
definitions.
• An ontology is a controlled vocabulary
augmented with logical constraints that
describe their interrelationships.
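The difference between the two definitions can be sketched in a few lines. Below, each term is an identifier with a name and definition (the controlled vocabulary); adding `is_a` links between terms is the extra logical structure that makes it a very small ontology. The `EX:` identifiers and definitions are hypothetical and not taken from any registered ontology.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    identifier: str  # formal name
    label: str
    definition: str
    is_a: list = field(default_factory=list)  # identifiers of parent terms

# Without the is_a lists this is just a controlled vocabulary.
ONTOLOGY = {t.identifier: t for t in [
    Term("EX:1", "organism", "A living entity."),
    Term("EX:2", "insect", "A six-legged arthropod.", ["EX:1"]),
    Term("EX:3", "housefly", "The insect species Musca domestica.", ["EX:2"]),
]}

def ancestors(term_id, ontology):
    """All terms reachable by following is_a links upwards."""
    seen, stack = set(), list(ontology[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology[parent].is_a)
    return seen

def matches(query_id, annotation_id, ontology):
    """Does data annotated with annotation_id satisfy a query for query_id?"""
    return annotation_id == query_id or query_id in ancestors(annotation_id, ontology)

print(matches("EX:2", "EX:3", ONTOLOGY))  # True: a housefly is an insect
```

This is the computational payoff: a search for "insect" finds data annotated only as "housefly", which a plain list of names and definitions cannot do.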
Ontologies for Life science
• Bio-ontologies have emerged for two reasons:
  - Consistent annotation of data
  - To add meaning and understanding that can be interpreted computationally
• Bio-ontologies registered with the OBO Foundry
Application in Proteomics
Minimum Information about a Proteomics Experiment (MIAPE)
• Sufficiency.
  - The MIAPE guidelines should require sufficient information about a dataset and its experimental context to allow a reader to understand and critically evaluate the interpretation and conclusions, and to support their experimental corroboration.
• Practicability.
  - Achieving compliance with MIAPE should not be so burdensome as to prohibit its widespread use.
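A minimum reporting guideline is, in effect, a list of items a submission must contain. The sketch below checks a submission against such a list; the four items are hypothetical placeholders, since each real MIAPE module defines its own, much more detailed, list.

```python
# Hypothetical required items; real MIAPE modules define their own lists.
REQUIRED_ITEMS = {
    "sample_description": "what was analysed",
    "protocol": "how the experiment was performed",
    "instrument": "which equipment was used",
    "data_location": "where the resulting data can be found",
}

def missing_items(record):
    """Return the required items that are absent or empty in `record`."""
    return [item for item in REQUIRED_ITEMS if not record.get(item)]

def report(record):
    """Human-readable list of what a submitter still needs to provide."""
    return [f"{item}: {REQUIRED_ITEMS[item]}" for item in missing_items(record)]

submission = {"sample_description": "E. coli lysate", "protocol": "", "instrument": "gel tank"}
print(missing_items(submission))  # ['protocol', 'data_location']
```

Sufficiency decides which items go into the list; practicability argues for keeping it short enough that submitters actually complete it.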
Minimum reporting guidelines
• Describe content
• Implementation independent
• Impacts
  - Publication
  - Syntax
  - Semantics
Syntax for proteomics
• The content in MIAPE GE needs to be structured to facilitate
  - dissemination
  - transfer
  - storage
• A community development process to agree on a syntax
  - building upon the FuGE data model
  - A pre-existing community developed representation of scientific experiments
  - Interoperable
FuGE
• Model of common components in science investigations, such as materials, data, protocols, equipment and software.
• Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.
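To show why modelling these common components helps capture a whole workflow, here is a heavily simplified Python analogue. It is not the FuGE object model: the class names only echo the components listed above, and `Step` and `provenance` are hypothetical helpers. Linking protocol applications through their inputs and outputs is what lets a data file be traced back to the material it came from.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified classes; this is NOT the FuGE object model.
@dataclass
class Material:
    name: str

@dataclass
class Data:
    name: str
    uri: str

@dataclass
class Equipment:
    name: str

@dataclass
class Software:
    name: str
    version: str

@dataclass
class Protocol:
    name: str
    equipment: list = field(default_factory=list)
    software: list = field(default_factory=list)

@dataclass
class Step:
    """One application of a protocol: it consumes inputs and produces outputs."""
    protocol: Protocol
    inputs: list = field(default_factory=list)   # Material or Data objects
    outputs: list = field(default_factory=list)  # Material or Data objects

def provenance(item, steps):
    """Walk the workflow backwards, collecting everything `item` was derived from."""
    sources, frontier = [], [item]
    while frontier:
        current = frontier.pop()
        for step in steps:
            if any(out is current for out in step.outputs):
                for inp in step.inputs:
                    if not any(inp is seen for seen in sources):
                        sources.append(inp)
                        frontier.append(inp)
    return sources

# Usage with made-up names: lysate -> gel -> scanned image.
sample = Material("cell lysate")
gel = Material("2-D gel")
image = Data("gel image", "file:///data/gel1.tif")
run = Protocol("2-D electrophoresis", [Equipment("gel tank")])
scan = Protocol("gel scanning", [Equipment("scanner")], [Software("imaging tool", "1.0")])
steps = [Step(run, [sample], [gel]), Step(scan, [gel], [image])]
print([m.name for m in provenance(image, steps)])  # ['2-D gel', 'cell lysate']
```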
UML/XML/RDBMS
• UML gives structure (but not syntax)
  - Very abstract, very general
• XML provides a concrete syntax
  - The meta-language is interoperable, checkable and viable, and has basic metadata support (language, character encoding and so on).
  - Tends toward the verbose. Not (very) searchable in itself.
  - Therefore best suited as a transfer and archive format.
• RDBMS
  - SQL is (sort of) a standard
  - Highly computationally amenable form; very good for searching
  - Conversion from XML is possible, but can be done in a number of ways.
  - Hard work: nice to have an off-the-shelf implementation.
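The split between XML for transfer and a relational database for searching can be sketched with Python's standard `xml.etree.ElementTree` and `sqlite3`. The XML fragment, table layout and attribute names are hypothetical, and the one-row-per-`<gel>` mapping is only one of the many possible conversions the slide mentions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A hypothetical XML fragment; real GelML/FuGE documents are far richer.
DOC = """<experiment id="exp1">
  <gel id="gel1" acrylamide_percent="12"/>
  <gel id="gel2" acrylamide_percent="15"/>
</experiment>"""

def load(xml_text, conn):
    """One possible mapping: one row per <gel>, keyed by its id attribute."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gel "
        "(id TEXT PRIMARY KEY, experiment TEXT, acrylamide_percent REAL)"
    )
    rows = [
        (g.get("id"), root.get("id"), float(g.get("acrylamide_percent")))
        for g in root.iter("gel")
    ]
    conn.executemany("INSERT INTO gel VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(DOC, conn)
# Searching is now a SQL query rather than a traversal of the XML tree.
print(conn.execute("SELECT id FROM gel WHERE acrylamide_percent > 12").fetchall())  # [('gel2',)]
```

A different mapping (say, one generic attribute table) would be equally valid, which is why an agreed, off-the-shelf implementation is valuable.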
GelML
Semantics for Gels
Semantics for science
Curation of Gel experiments
[Figure: curation of gel experiments, from the Laboratory to Public repositories. Data entry and transfer routes: I) GelML data entry tools (GelML, MIAPE GE); II) Direct database submission; III) Automated export of GelInfoML (MIAPE GI); sepCV]
Discoverability and reuse
•Persistent Identifiers
•Rights management
Persistent Identifiers
• A name for a resource which will remain the same
regardless of where the resource is located
• In biology typically assigned to data upon publication
• Type of identifier dependent on publication method
• Description and Representation Information provides
more information about persistent identifiers
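The key property of a persistent identifier is indirection: the name stays fixed while a resolver maps it to wherever the resource currently lives. DOIs, such as the one cited earlier for Zeeberg et al., work on this principle via the Handle System. The class below is a hypothetical in-memory stand-in for such a resolver, and the `example:` identifier and URLs are made up.

```python
class Resolver:
    """Hypothetical in-memory resolver: persistent identifier -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # An identifier is assigned once and never reused for something else.
        if pid in self._locations:
            raise ValueError(f"{pid} is already assigned")
        self._locations[pid] = url

    def move(self, pid, new_url):
        # The resource moves: only the mapping changes, never the identifier.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        return self._locations[pid]

resolver = Resolver()
resolver.register("example:dataset/42", "http://old-server.example.org/data/42")
resolver.move("example:dataset/42", "http://new-server.example.org/archive/42")
print(resolver.resolve("example:dataset/42"))  # http://new-server.example.org/archive/42
```

Anyone who cited `example:dataset/42` before the move still reaches the data afterwards; only the resolver's record had to change.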
Rights management
• Difficult to determine
• Lots of legal issues
• In biology/bioinformatics, data tends to be open access
• Creative Commons
Receiving data for curation
Content
Syntax
Semantics
Who will receive it?
Route map
What are their policies on Content, Syntax and Semantics?
Plan your experiment to conform to their Content, Syntax and Semantics
Implement the experiment to:
Collect appropriate Content
Structure it in appropriate Syntax
Ensure Semantics are preserved
Curate…
Meta Route Map
• What do we do if content, syntax and semantics are not specified?
• Define the content yourself: this could grow into a community standard/collaboration
• Re-use or build a syntax that allows the content to be captured
• Re-use or contribute to existing semantics, e.g. OBI.
• Curate…
Appraise and Select
• Investigates the evaluation and selection of
data for long-term curation and preservation
Acknowledgments
• The CARMEN project
• www.carmen.org.uk
• The Proteomics Standards Initiative (PSI)
• http://psidev.info
• Colleagues at Newcastle University
• Phillip Lord, Anil Wipat, Allyson Lister