a centre of expertise in data curation and preservation

Create or Receive
Scientific data

Dr. Frank Gibson and Dr. Phillip Lord
Frank.Gibson@newcastle.ac.uk
Phillip.Lord@newcastle.ac.uk

Funded by:

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-ncsa/2.5/scotland/ ; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh

“In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.” - Geoffrey Bowker

Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon

If we have a paper, who cares about the data?
http://flickr.com/photos/nicmcphee/2756494307/

A paper = a claim (or claims)
The full record that supports that claim should be available for detailed examination and critique.
Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon

Biocuration: Databases

Biocuration: Wiki

Funders
http://flickr.com/photos/luismimunoznajar/2093185804/

Curation aims
• Amenable
• Preservable
• Ownable
• Accessible
• Citable

Significant Properties of Data
• Content
• Syntax
• Semantics

Content
• Publisher
• Type
• Title
• Creator
• Source
• Identifier
• Date
• Rights

Simple Dublin Core
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date

Content: Domain Specific
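
As an illustration of the Simple Dublin Core slide, the following is a minimal sketch of how a dataset description could be restricted to the fifteen Simple Dublin Core elements and written out as XML. The `record` wrapper element, the function name `make_dc_record` and all field values are assumptions made up for this example; only the element names and the `http://purl.org/dc/elements/1.1/` namespace come from Dublin Core itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen Simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_dc_record(fields):
    """Build an XML record from a dict of Dublin Core element -> value.

    Raises ValueError if a key is not a Simple Dublin Core element.
    """
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Simple Dublin Core elements: {unknown}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:  # fixed order keeps the output predictable
        if name in fields:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = fields[name]
    return record


# All values below are invented for illustration.
example = make_dc_record({
    "title": "Example gel electrophoresis dataset",
    "creator": "A. Researcher",
    "date": "2008-10-06",
    "type": "Dataset",
    "rights": "Creative Commons Attribution-NonCommercial-ShareAlike",
})
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Domain-specific content, such as experimental conditions, does not fit into these generic elements, which is why a domain-specific description is needed in addition.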
Syntax

Choosing a Syntax
• Openness
  - Is there an open, publicly available specification for the format? Are its specifications in the public domain? Is it unencrypted?
• Portability
  - Is the format independent of hardware, operating system, and other software? Is it independent of particular institutions, groups, or events? Is it in widespread current use? Does it contain little or no built-in functionality?
• Quality
  - Is it robust, simple, highly tested, and loss-free?

Semantics

Semantics can be complex
• One word = many semantics
• Many words = one semantic

What is fly?
• Fly: http://en.wikipedia.org/wiki/Image:Air_india_b747-400_vt-esn_arp.jpg
• Fly: http://en.wikipedia.org/wiki/Image:MuscuDomestica.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Green_Highlander_salmon_fly.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Fly_poster.jpg

Excel data example
• Zeeberg et al. BMC Bioinformatics 2004 5:80 doi:10.1186/1471-2105-5-80

Ontology
• A controlled vocabulary is an association between formal names (identifiers) and their definitions.
• An ontology is a controlled vocabulary augmented with logical constraints that describe the interrelationships between its terms.
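
The distinction between a controlled vocabulary and an ontology can be sketched in a few lines. In this minimal example the vocabulary is a mapping from identifiers to names and definitions, and the "ontology" adds a single kind of logical constraint, an is-a relationship, on top of it. The `EX:` identifiers, the terms and the definitions are invented for illustration and are not taken from any real ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    identifier: str
    name: str
    definition: str


# Controlled vocabulary: identifiers associated with definitions.
# Two different senses of "fly" get two different identifiers.
VOCABULARY = {
    "EX:0001": Term("EX:0001", "organism", "A living individual."),
    "EX:0002": Term("EX:0002", "insect", "An organism of the class Insecta."),
    "EX:0003": Term("EX:0003", "house fly", "An insect of the species Musca domestica."),
    "EX:0004": Term("EX:0004", "fishing fly", "An artificial lure used in fly fishing."),
}

# Ontology: the vocabulary plus relationships between its terms.
# Here only "is a" (child identifier -> parent identifier) is modelled.
IS_A = {
    "EX:0002": "EX:0001",
    "EX:0003": "EX:0002",
}


def ancestors(term_id):
    """Return all terms reachable from term_id via is-a, nearest first."""
    chain = []
    while term_id in IS_A:
        term_id = IS_A[term_id]
        if term_id in chain:
            raise ValueError("cycle in is-a relationships")
        chain.append(term_id)
    return chain


def is_a(term_id, ancestor_id):
    """True if term_id is a (direct or indirect) kind of ancestor_id."""
    return ancestor_id in ancestors(term_id)


print(is_a("EX:0003", "EX:0001"))  # house fly is an organism
print(is_a("EX:0004", "EX:0001"))  # fishing fly is not
```

With identifiers in place, data annotated with `EX:0003` can be found by a query for "organism", something that a plain list of words cannot support.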
Ontologies for Life science
• Emergence has occurred for two reasons
  - Consistent annotation of data
  - To add meaning and understanding that can be interpreted computationally
• Bio-ontologies registered on the OBO Foundry

Application in Proteomics

Minimum Information about a Proteomics Experiment (MIAPE)
• Sufficiency.
  - The MIAPE guidelines should require sufficient information about a dataset and its experimental context to allow a reader to understand and critically evaluate the interpretation and conclusions, and to support their experimental corroboration.
• Practicability.
  - Achieving compliance with MIAPE should not be so burdensome as to prohibit its widespread use.

Minimum reporting guidelines
• Describe content
• Implementation independent
• Impacts
  - Publication
  - Syntax
  - Semantics

Syntax for proteomics
• The content in MIAPE GE needs to be structured to facilitate
  - dissemination
  - transfer
  - storage
• A community development process to agree on a syntax
  - building upon the FuGE data model
  - a pre-existing, community-developed representation of scientific experiments
  - interoperable

FuGE
• Model of common components in science investigations, such as materials, data, protocols, equipment and software.
• Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.
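
To make the idea of "common components" and "complete laboratory workflows" concrete, here is a minimal sketch of a workflow represented as a sequence of protocol steps, each consuming and producing named materials or data. This is not the FuGE schema: the class `Step`, the function `unresolved_inputs` and the example gel workflow are all hypothetical, chosen only to show the kind of structure such a model captures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One application of a protocol (hypothetical, not a FuGE class)."""
    protocol: str
    inputs: List[str]   # names of materials or data consumed
    outputs: List[str]  # names of materials or data produced
    equipment: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)


def unresolved_inputs(steps, starting_materials):
    """Return inputs that are neither starting materials nor
    outputs of an earlier step, i.e. gaps in the recorded workflow."""
    available = set(starting_materials)
    missing = []
    for step in steps:
        missing.extend(name for name in step.inputs if name not in available)
        available.update(step.outputs)
    return missing


# Invented example workflow for a gel experiment.
workflow = [
    Step("sample preparation", ["tissue sample"], ["protein extract"]),
    Step("gel electrophoresis", ["protein extract"], ["gel"],
         equipment=["gel tank"]),
    Step("image acquisition", ["gel"], ["gel image"],
         equipment=["scanner"], software=["image capture software"]),
]

print(unresolved_inputs(workflow, ["tissue sample"]))
```

A record in which every input can be traced back to a starting material or an earlier step is one simple way of checking that the workflow has been captured completely.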
UML/XML/RDBMS
• UML gives structure (but not syntax)
  - Very abstract, very general
• XML provides a concrete syntax
  - Meta language is interoperable, checkable, viable and has basic metadata support (language, character coding and so on).
  - Tends toward the verbose. Not (very) searchable by itself.
  - Therefore, a transfer and archive format.
• RDBMS
  - SQL is (sort of) a standard
  - Highly computationally amenable form; very good for searching
  - Conversion from XML is possible, but in a number of ways.
  - Hard work; nice to have an off-the-shelf implementation.

GelML

Semantics for Gels

Semantics for science

Curation of Gel experiments
• Laboratory
• Public repositories
• Data entry and transfer
  - I) GelML data entry tools
  - II) Direct database submission
  - III) Automated export
• Standards shown: GelML (MIAPE GE), GelInfoML (MIAPE GI), sepCV

Discoverability and reuse
• Persistent Identifiers
• Rights management

Persistent Identifiers
• A name for a resource which will remain the same regardless of where the resource is located
• In biology, typically assigned to data upon publication
• Type of identifier dependent on publication method
• Description and Representation Information provides more information about persistent identifiers

Rights management
• Difficult to determine
• Lots of legal issues
• In biology/bioinformatics tends to be open access
• Creative Commons

Receiving data for curation
• Content
• Syntax
• Semantics

Route map
• Who will receive it?
• What are their policies on Content, Syntax, Semantics?
• Plan your experiment to conform to Content, Syntax, Semantics
• Implement experiment to:
  - Collect appropriate Content
  - Structure in appropriate Syntax
  - Ensure Semantics are preserved
• Curate…

Meta Route Map
• What do we do if content, syntax and semantics are not specified?
  - Define the content yourself; this could grow into a community standard/collaboration
  - Re-use or build a syntax that allows the content to be captured
  - Re-use or contribute to existing semantics, e.g. OBI
• Curate…

Appraise and Select
• Investigates the evaluation and selection of data for long-term curation and preservation

Acknowledgments
• The CARMEN project
  - www.carmen.org.uk
• The Proteomics Standards Initiative (PSI)
  - http://psidev.info
• Colleagues at Newcastle University
  - Phillip Lord, Anil Wipat, Allyson Lister