The Neuroscience Information Framework
Making Resources Discoverable for the Computational Neuroscience Community

Jeffrey S. Grethe, Ph.D.
Co-Principal Investigator, NIF
Center for Research in Biological Systems
University of California, San Diego

OCNS 2010 Workshop on Methods in Neuroinformatics

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
http://neuinfo.org
• A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”
UCSD, Yale, Cal Tech, George Mason, Washington Univ

Brief History of NIF
• Outgrowth of Society for Neuroscience Neuroinformatics Committee
  – Neuroscience Database Gateway: a catalog of neuroscience databases
• “Didn’t I fund this already?”
  – Over 2500 databases are on-line; no one can go to them all
• “Why can’t I have a Google for neuroscience?”
  – “Easy”, comprehensive, pervasive
• Phase I-II: Funded by a broad agency announcement from the NIH Neuroscience Blueprint
  – Feasibility
• Current phase: Started Sept 2008

How can we provide a consistent and easy-to-implement framework for those who are providing resources, e.g., data, and those looking for these data and resources?
➤ Both humans and machines

The Problem
• Over 2000 databases have been identified through NIF
  – Researchers can’t visit them all
  – Most content from these resources is not easily found through standard search engines
  – Even more structured content on the web
• Databases provide domain-specific views of data
  – NIF provides a snapshot of information in a simple-to-understand form that can be further explored in the native database
  – Providing a biomedical-science-based semantic framework for resource description and search

NIF uniquely provides access to the largest registry of neuroscience resources available on the web

Date                Data        Data Federation  Catalog  Web Index  Literature            NIF
                    Federation  Records                              Corpus                Vocabulary
9/2008              5           60,420*          388      113,458    67,000                18,884†
7/2009              18          4,393,744*       1,605    497,740    -                     17,086
5/2010              55          23,228,658       2,871    1,184,261  101,627 All (PubMed)  53,023
% yearly increase   205         429              79       138        -                     210
% overall increase  1,000       38,345           640      944        -                     181

* Numbers for initial sources were generated by examining current source content
† First year of NIF contract involved re-factoring of ontology

Guiding principles of NIF
• Builds heavily on existing technologies (open source tools and ontologies)
• Information resources come in many sizes and flavors
• Framework has to work with resources as they are, not as we wish them to be
  – Federated system; resources will be independently maintained
  – Developed for their own purpose with different levels of resources
• No single strategy will work for the current diversity of neuroscience resources
• Trying to design the framework so it will be as broadly applicable as possible to those who are trying to develop technologies
• Interface neuroscience to the broader life science community
• Take advantage of emerging conventions in search, semantic web, linked data and in building web communities

A Quick Tour of the NIF
http://neuinfo.org

Domain Enhanced Search for Neuroscience
NIF now searches more than 55 databases with information on neuronal descriptions, neuronal morphology, connectivity, chemical compounds…

Ontology Based Search Refinement

Diverse Database Content
NeuroMorpho.org
NeuronDB

Concept-based search
• Search Google: GABAergic neuron
• Search NIF: “GABAergic neuron”
  – NIF automatically searches for types of GABAergic neurons
[Figure: Types of GABAergic neurons]

Use of Ontologies within NIF
• Controlled vocabulary for describing type of resource and content
  – Database, Image, Parkinson’s disease
• Entity-mapping of database and data content
• Data integration across sources
• Search: Mixture of mapped content and string-based search
  – Different parts of NIF use the vocabularies in different ways
  – Utilize synonyms, parents, children to refine search
  – Increasing use of other relationships and logical inferencing
• Generation of semantic content (i.e., RDF, Linked Data)

Building the NIF Ontologies
http://neurolex.org

Modular Ontologies
• Set of expanded vocabularies largely imported from existing terminological resources
• Adhere to ontology best practices as we understood them
• Built from existing resources when possible
• Standardized to same upper ontology: BFO
• Encoded in OWL DL
• Provides mapping to source terminologies
• Provides synonyms, lexical variants, abbreviations
• Single inheritance trees with minimal cross-domain and intradomain properties
• Orthogonal: Neuroscientists didn’t like too many choices
• Human-readable definitions (not complete yet)

NIFSTD
[Figure: NIFSTD modules: Organism, Macroscopic Anatomy, Subcellular Anatomy, Molecule, Macromolecule, Molecule Descriptors, Gene, Cell, NS Dysfunction, Quality, NS Function, Resource, Investigation, Techniques, Instruments, Reagent, Protocols]

“Bridge files”
[Figure: cross-module relations ("Expressed in", "Located in") linking terms from Anatomy, Cell Type, Cellular Component and Small Molecule. Labels shown: CNS, Neuron, Neurotransmitter, Transmembrane Receptor, Purkinje Cell, Cytoarchitectural Part of Cerebellar Cortex, Purkinje Cell Layer, Dentate Nucleus Neuron, Collection of Deep Cerebellar Nuclei, Dentate Nucleus, GABA, GABA-R, Presynaptic density, Terminal Axon Bouton, Transmitter Vesicle]

NIF Cell
• NIF has made significant enhancements to its cell ontology
  – Expanded neuron list
  – Generated neuronal classifications based on neurotransmitter, brain region, molecules, morphology, circuit role
  – Recommended standard naming convention
  – Is working with the International Neuroinformatics Coordinating Facility through the PONS program (program in ontologies for neural structures)
• Creating a knowledge base for neuronal classification based on properties

Neurolex Wiki
• NIF has posted its vocabularies in Wiki form (Semantic MediaWiki)
• Simplified interface for ontology construction and refinement
• Custom forms for neurons and brain regions
• Semantic linking between category pages
• Significant knowledge base
• Curation
NIFSTD http://neurolex.org

NeuroLex and NeuroML
“There was further discussion of how to define specific types of morphological groups such as apical dendrites, basal dendrites, axons, etc. Several options include having predefined names for common types or linking to ontologies that define these types. We suggest adding tags or rdf for metadata that provide NeuroLex ontology ids to groups. We propose to begin with simple tags, and when a tag is present, one should assume it indicates “is a”. If more complicated semantic information is needed, we can use rdf in a way that is similar to SBML.”
NeuroML Development Workshop 2010
http://www.neuroml.org/files/NeuroMLWorkshop2010.pdf

Providing community access
http://neuinfo.org

Access at various levels…
• A search portal (link to NIF advanced search interface) for researchers, students, or anyone looking for neuroscience information, tools, data or materials.
• Access to content normally not indexed by search engines, i.e., the “hidden web”
• Tools for resource providers to make resources more discoverable, e.g., ontologies, data federation tools, vocabulary services
• Tools for promoting interoperability among databases
• Standards for data annotation
• The NIFSTD ontology covering the major domains of neuroscience, e.g., brain anatomy, cells, organisms, diseases, techniques
• Services for accessing the NIF vocabulary and NIF tools
• Best practices for creating discoverable and interoperable resources
• Data annotation services: NIF experts can enhance your resource through semantic tagging
• NIF cards: Easy links to neuroscience information from any web browser
• Ontology services: NIF knowledge engineers can help create or extend ontologies for neuroscience

Integration of NIF services and ontologies
http://wholebraincatalog.org

WBC and Simulation Visualization
Demonstrates the neurogenesis simulation driven by the model of Aimone et al., 2009 from the Gage lab at the Salk Institute within the Whole Brain Catalog
http://www.youtube.com/watch?v=1YzfXv4yNzg

WBC and NeuroConstruct
A network model of the cerebellar granule cell layer which can be fully expressed as a Level 3 NeuroML file. Visualised in the Whole Brain Catalog (left), and neuroConstruct (right)
http://wiki.wholebraincatalog.org/wiki/Running_Simulations
http://www.neuroml.org/tool_support.php

NIF cards
Simple tool for linking search results to other sources of information
http://nifcards.neuinfo.org/nifstd/anatomical_structure/birnlex_1489.html
NIF literature results display for “Cerebellum”; concepts in NIF ontologies highlighted and linked to more information through NIF knowledge base

Providing Semantic Content
RDF data / SPARQL Queries

The NIF Team
• Maryann Martone, UCSD-PI
• Jeff Grethe, UCSD-Co PI
• Amarnath Gupta, UCSD-Co-PI
• Ashraf Memon, UCSD, Project Manager
• Anita Bandrowski, UCSD, NIF Curator
• Fahim Imam, UCSD, Ontology Engineer
• David Van Essen, Wash U, Co-PI
• Erin Reid, Wash U
• Gordon Shepherd, Yale, Co-PI
• Perry Miller, Yale
• Luis Marenco, Yale
• Rixin Wang, Yale
• Paul Sternberg, Cal Tech, Co-PI
• Hans Michael-Muller, Cal Tech
• Arun Ragarajan, Cal Tech
• Giorgio Ascoli, George Mason, Co-PI
• Sridevi Polavaram, George Mason
• Vadim Astakhov, UCSD
• Andrea Arnaud-Stagg, UCSD
• Lee Hornbrook, UCSD
• Jennifer Lawrence, UCSD
• Irfan Baig, UCSD student
• Anusha Yelisetty, UCSD student
• Timothy Tsui, UCSD student
• Chris Condit, UCSD
• Xufei Qian, UCSD
• Larry Liu, UCSD
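
Backup sketch: concept-based search
The slides on concept-based search and on the use of ontologies within NIF say that a query such as “GABAergic neuron” is expanded with synonyms and child terms before matching. The Python sketch below illustrates that general idea only. The toy vocabulary, the function names `expand_query` and `search`, and the sample records are all hypothetical; none of this is NIF code, NIFSTD content, or a NIF service API.

```python
from collections import deque

# Hypothetical toy vocabulary: term -> synonyms and direct children.
# Entries are illustrative only and are not taken from NIFSTD.
TOY_ONTOLOGY = {
    "GABAergic neuron": {
        "synonyms": ["GABA neuron"],
        "children": ["Purkinje cell", "basket cell"],
    },
    "Purkinje cell": {"synonyms": ["Purkinje neuron"], "children": []},
    "basket cell": {"synonyms": [], "children": []},
}


def expand_query(term, ontology):
    """Return the term plus its synonyms and all descendant terms (breadth-first)."""
    expanded, seen, queue = [], set(), deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        entry = ontology.get(current, {})
        for synonym in entry.get("synonyms", []):
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
        queue.extend(entry.get("children", []))
    return expanded


def search(records, term, ontology):
    """Case-insensitive substring match of any expanded term against each record."""
    terms = [t.lower() for t in expand_query(term, ontology)]
    return [r for r in records if any(t in r.lower() for t in terms)]


# Hypothetical records standing in for federated database content.
records = [
    "Morphology of a cerebellar Purkinje cell",
    "Pyramidal cell firing rates in CA1",
    "GABA neuron density in cortex",
]
hits = search(records, "GABAergic neuron", TOY_ONTOLOGY)
for hit in hits:
    print(hit)
```

None of the sample records contains the literal string “GABAergic neuron”, so a plain string search would return nothing; the expanded query matches the first record through a child term and the third through a synonym. Parent terms and other relationships mentioned on the slide are left out to keep the sketch short.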