Annotating Experimental Records using Ontologies

Olga Giraldo, Unal de Colombia/CIAT
Jael Garcia, Universität der Bundeswehr
Alexander Garcia, UAMS

Motivation and Research Question
• Knowledge-based approach to managing laboratory information
  – It combines elements from the Semantic Web (SW), e.g. ontologies supporting organization and classification, with elements from Social Tagging Systems, e.g. collaboration and ad-hoc organization strategies.
• How can we semantically annotate laboratory records?
• How can we facilitate the coexistence of laboratory notebooks and electronic laboratory records?

Motivation and Research Question
• Easy to use, highly portable, easy to share, low cost…
• Great artifacts for supporting design
• Legal requirement
[Figure panels: Mutis, Marie Curie, da Vinci]

Research Question
• How can we facilitate the coexistence of laboratory notebooks and electronic laboratory records?
• How can we semantically annotate laboratory records?

Our Approach
• Documents should be able to “know about” their own content for automated processes to “know what to do” with them. Semantics…

Materials and Methods
• Our scenario: supporting the annotation of experimental data for some of the processes routinely run at the Center for International Tropical Agriculture (CIAT) biotechnology laboratory
• 15 laboratory notebooks together with their corresponding electronic records, e.g. XLS files, outputs from lab equipment, etc.
• 10 biologists
• Direct non-intrusive observation: 6 months
• Ontology and prototype development: iterative and collaborative process
• Existing ontologies

Results
• Data types
• Rhetorical structure
• Ontologies
• Orchestration of ontologies
• Tags and ontologies
• Lessons

Results
• Data Types
  – Manuscript
  – Digital
  – Digital data with manuscript annotations

Results
• Manuscript
  – Lists
  – To-dos
  – How-tos (protocols)
  – Incomplete results
  – Dates
  – Formulas
  – Electronic paths
  – Sources for information (URLs)
[Figure panels: A, B, C, D]

Results
• Digital
  – Photos
  – Lists
  – Incomplete results
  – Protocols
  – Figures
  – Sequences

Results
• Digital + Manuscript
  – Digital files and print-outs, tagged with manuscript information.
[Figure panels: A, B]

Results
• We identified the rhetorical structure implicit in the laboratory notebooks we studied
• And the metadata describing that structure

Rhetorical structure: Header, Body
[Diagram: Lab Notebook, with metadata elements and their source vocabularies in parentheses]
• Header: metadata describing a lab notebook
• Body: metadata describing an experimental activity
• Metadata elements shown in the diagram:
  – Title (DC)
  – Creator (DC/AgMes)
  – Notes (AgMes)
  – Date of creation (DC)
  – Date of finalization (M4L)
  – Laboratory notebook number (M4L)
  – Language (DC)
  – Project (OBI/AGROVOC)
  – Date (DC)
  – Laboratory procedure (M4L)
  – Page number (M4L)
  – Recorded by (M4L)
  – Protocol (OBI)
  – Comments (BioPortal, NCIt, SNOMED)
  – Purpose (M4L)
  – Security measurements (M4L)
  – Outcome (NCIt)
  – Experimental design (OBI, M4L)
• Materials & Methods: Samples, Reagents, Assays, Equipment and supplies
  – Samples: DNA, RNA, whole plant, etc. (OBI, CHEBI, PO)
  – Reagents: buffer, dNTP mix (CHEBI, M4L)
  – Assay: DNA extraction, PCR, gel electrophoresis (OBI, M4L)
  – Equipment & supplies: freezer, centrifuge, shaker, glove, etc. (OBI, PEO, SEP, SNOMED, BIRNLex, M4L)

DNA Extraction
• We focused on: DNA extraction, PCR and electrophoresis
[Diagram: the process “mechanical pulverization of plant material”, a typical process in a plant biotechnology laboratory, modeled with “is a”, “inheres in” and “bearer of” relations]

Results
• M4L: our ontology for the experimental processes we studied
  – Based on OBI.
  – Terms proposed to OBI: 197, including new terms plus terms from other ontologies
  – Other terms will be proposed to other ontologies, e.g. ChEBI, GO, PO

N. of concepts  Ontology
149             Metadata for Laboratory Notebook (M4L)
87              Chemical Entities of Biological Interest (CHEBI) (Degtyarenko et al., 2008)
59              Ontology for Biomedical Investigations (OBI) (Brinkman et al., 2010)
17              Medical Subject Headings ontology (MSH) (Moerchen et al., 2008)
14              Gene Ontology (GO) (Ashburner et al., 2000)
6               Sample Processing and Separation Techniques (SEP) (http://psidev.info/index.php?q=node/312)
6               BIRN Project lexicon (BIRNLex) (Bug et al., 2008)
5               Gene Regulation Ontology (GRO) (Beisswanger et al., 2008)
5               National Cancer Institute thesaurus (NCIt) (Ceusters et al., 2005)
5               Plant Ontology Consortium (POC) (Jaiswal et al., 2005)
5               SNOMED-CT (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html)
1               BioTop Ontology (Beisswanger et al., 2007)
1               Foundational Model of Anatomy (FMA) (Rosse and Mejino, 2003)
1               Ontology for Genetic Interval (OGI) (Lin et al., 2010)
1               Parasite Experiment Ontology (PEO) (http://wiki.knoesis.org/index.php/Parasite_Experiment_ontology)
1               Proteomics Data and Process Provenance (PDPP) (Sahoo et al., 2006)

Results
• We have structured the descriptive layers by reusing and extending existing ontologies.
• To support annotation within our scenario we have identified three main layers:
  – i) the layer related to the document itself,
  – ii) the annotation layer, and
  – iii) the layer related to the experiment.

Results
• Orchestration of ontologies: Annotation Ontology
  – The Annotation Ontology (AO) is a vocabulary for performing several types of annotation (comments, entity annotations or semantic tags, textual annotations or classic tags, notes, examples, errata...) on any kind of electronic document (text, images, audio, tables...) and on document parts.
  – AO does not provide any domain ontology; instead it fosters the reuse of existing ones, so as not to break the principle of scalability of the Semantic Web.

[Diagram: two example annotations, ANNOT1 and ANNOT2, on a Document
  Classes: Selector, ImageSelector, InitEndCornerSelector, Annotation, Qualifier, Definition, foaf:Person, moat:Tag, moat:Meaning
  Properties: rdf:type, rdfs:subClassOf, aos:init, aos:end, aof:onDocument, ao:context, aof:annotates, ao:hasTopic, moat:tagMeaning, moat:hasMeaning, aoex:hasMoatMeaning, ann:body, tags:name, pav:createdBy, pav:createdOn
  Values: (304,507); (360,618); GenBank: AB005238; “Partial sequence on psy promoter”; http://www.tags4lab.org/foaf.rdf#olga.giraldo; June 1, 2010; http://www.ncbi.nlm.nih.gov/pubmed/12520345
  Groupings: Topic, Provenance, MOAT]

Results
• The AO structures both the semantic annotations and the tags generated by users.
  – In this way we support complex SPARQL queries involving several ontologies, for instance:
    • “Retrieve from the eLabBook the pages tagged by Tim Andrews or Lisa Watson with the tags rice and iron for which there is a LIMS data entry”

Concluding Remarks
• Although several ELNs have been proposed and replacing paper-based records has been a consistent trend for several years, the technology has not yet been widely adopted. Laboratory Information Management Systems (LIMS) in combination with paper-based laboratory notebooks continue to be commonly used, particularly in academic environments.

Concluding Remarks
• Sharing and organizing information happens on a concept basis
  – Researchers studying genes involved in iron transport share information with those who undertake nutritional studies assessing the effects of iron intake in human populations
  – Clustering information based on concepts

Concluding Remarks
• Simple tagging mechanisms proved to be valuable resources for organizing information
  – Tag clouds were used as tables of contents (TOCs)
  – Tags were also used to support a quick view of laboratory pages
  – Tags tend to stabilize over time
  – Tags were a valuable source of terms and of evidence (use cases) for those terms

Concluding Remarks
• Time is difficult to model
• Incremental prototyping and participatory design were key: community engagement
• Limitations in the technology:
  – Tablets, electronic pen, first-generation iPad, now Motorola XOOM
  – Browser compatibility
• Laboratory notebooks look like specialized wikis

Future Work
• Focus on one technology: Android OS
• Semantic LIMS
• Support the whole cycle (LIMS record, notebook, machine-generated data)
• Automatic annotation of machine-generated data
• Adopt minimal amounts of information
• Adopt techniques from Personal Information Management approaches
• Look more like a wiki

Acknowledgments
• John Bateman, Oscar Corcho, Joe Tohme, Cesar Montana, Alberto Labarga
• The CIAT biotech lab
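
Supplementary sketch: the example query from the Results slides (pages tagged by Tim Andrews or Lisa Watson with the tags rice and iron for which there is a LIMS data entry) can be illustrated in plain Python over a handful of triples. This is only a sketch under stated assumptions, not the project's implementation. The properties aof:annotates, ao:hasTopic and pav:createdBy are taken from the Annotation Ontology slide; the lims:hasEntry predicate, the page and annotation identifiers, and all sample data are hypothetical. Set filtering stands in for a real SPARQL engine, and the query is read as "each required tag was applied by one of the named people".

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples.
# aof:annotates, ao:hasTopic and pav:createdBy appear in the slides;
# lims:hasEntry and every identifier and value below are assumptions.
TRIPLES = [
    ("annot1", "aof:annotates", "page12"),
    ("annot1", "ao:hasTopic", "rice"),
    ("annot1", "pav:createdBy", "Tim Andrews"),
    ("annot2", "aof:annotates", "page12"),
    ("annot2", "ao:hasTopic", "iron"),
    ("annot2", "pav:createdBy", "Lisa Watson"),
    ("annot3", "aof:annotates", "page30"),
    ("annot3", "ao:hasTopic", "rice"),
    ("annot3", "pav:createdBy", "Tim Andrews"),
    ("annot4", "aof:annotates", "page30"),
    ("annot4", "ao:hasTopic", "iron"),
    ("annot4", "pav:createdBy", "Someone Else"),
    ("annot5", "aof:annotates", "page41"),
    ("annot5", "ao:hasTopic", "rice"),
    ("annot5", "pav:createdBy", "Lisa Watson"),
    ("annot6", "aof:annotates", "page41"),
    ("annot6", "ao:hasTopic", "iron"),
    ("annot6", "pav:createdBy", "Lisa Watson"),
    ("page12", "lims:hasEntry", "LIMS-0001"),
    ("page30", "lims:hasEntry", "LIMS-0002"),
]


def pages_tagged(triples, taggers, required_tags):
    """Return the sorted pages that carry every required tag, each applied
    by one of the given taggers, and that also have a LIMS entry."""
    props = defaultdict(dict)   # annotation id -> {predicate: object}
    pages_with_lims = set()     # pages that have a LIMS data entry
    for subject, predicate, obj in triples:
        if predicate == "lims:hasEntry":
            pages_with_lims.add(subject)
        else:
            props[subject][predicate] = obj

    # Collect, per page, the tags applied by the people of interest.
    tags_by_page = defaultdict(set)
    for annotation in props.values():
        page = annotation.get("aof:annotates")
        tag = annotation.get("ao:hasTopic")
        creator = annotation.get("pav:createdBy")
        if page and tag and creator in taggers:
            tags_by_page[page].add(tag)

    # Keep pages that have all required tags and a LIMS entry.
    return sorted(
        page
        for page, tags in tags_by_page.items()
        if required_tags <= tags and page in pages_with_lims
    )


result = pages_tagged(TRIPLES, {"Tim Andrews", "Lisa Watson"}, {"rice", "iron"})
print(result)
```

In this toy data only page12 qualifies: page30 has its iron tag from a different person, and page41 has no LIMS entry. In a deployed system the same constraints would be expressed as graph patterns in a SPARQL query over the annotation, tagging and LIMS vocabularies.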