The Neuroscience Information Framework
Making Resources Discoverable for the Computational
Neuroscience Community
Jeffrey S. Grethe, Ph. D.
Co-Principal Investigator, NIF
Center for Research in Biological Systems
University of California, San Diego
OCNS 2010
Workshop on Methods in Neuroinformatics
The Neuroscience Information Framework: Discovery and
utilization of web-based resources for neuroscience





http://neuinfo.org
• A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”
UCSD, Yale, Cal Tech, George Mason, Washington Univ
Brief History of NIF
• Outgrowth of Society for Neuroscience Neuroinformatics
Committee
– Neuroscience Database Gateway: a catalog of neuroscience
databases
• “Didn’t I fund this already?”
– Over 2500 databases are on-line; no one can go to them all
• “Why can’t I have a Google for neuroscience?”
– “Easy”, comprehensive, pervasive
• Phase I-II: Funded by a broad agency announcement from the
NIH Neuroscience Blueprint
– Feasibility
• Current phase: Started Sept 2008
How can we provide a consistent and easy-to-implement
framework for those who are providing resources, e.g., data,
and those looking for these data and resources?
➤ Both humans and machines
The Problem
• Over 2000 databases have been identified
through NIF
– Researchers can’t visit them all
– Most content from these resources not easily found
through standard search engines
– Even more structured content on the web
• Databases provide domain specific views of data
– NIF provides a snapshot of information in a simple to
understand form that can be further explored in the
native database
– Providing a biomedical science based semantic
framework for resource description and search
NIF uniquely provides access to the
largest registry of neuroscience
resources available on the web
Date                Data Federation  Data Federation Records  Catalog  Web Index  Literature Corpus  NIF Vocabulary
9/2008              5                60,420*                  388      113,458    67,000             18,884†
7/2009              18               4,393,744*               1,605    497,740    101,627            17,086
5/2010              55               23,228,658               2,871    1,184,261  All (PubMed)       53,023
% yearly increase   205              429                      79       138        -                  210
% overall increase  1,000            38,345                   640      944        -                  181
* Numbers for initial sources were generated by examining current source content
† First year of NIF contract involved re-factoring of ontology
Guiding principles of NIF
• Builds heavily on existing technologies (open source tools and
ontologies)
• Information resources come in many sizes and flavors
• Framework has to work with resources as they are, not as we wish
them to be
– Federated system; resources will be independently maintained
– Developed for their own purpose with different levels of resources
• No single strategy will work for the current diversity of neuroscience
resources
• Trying to design the framework so it will be as broadly applicable as
possible to those who are trying to develop technologies
• Interface neuroscience to the broader life science community
• Take advantage of emerging conventions in search, semantic web,
linked data and in building web communities
A Quick Tour of the NIF
http://neuinfo.org
Domain Enhanced Search for Neuroscience
NIF now searches more than 55 databases with information on
neuronal descriptions, neuronal morphology, connectivity, chemical
compounds…
Ontology Based Search Refinement
Diverse Database Content
NeuroMorpho.org
NeuronDB
Concept-based search
• Search Google: GABAergic neuron
• Search NIF: “GABAergic neuron”
– NIF automatically searches for types of
GABAergic neurons
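A concept-based search of this kind can be sketched as a walk over an is-a hierarchy: the query term is expanded to all of its descendant terms before matching. The sketch below is only an illustration under stated assumptions. The hierarchy, the record titles, and the function names (`expand_concept`, `search`) are hypothetical and are not taken from NIFSTD or the NIF code base.

```python
from collections import deque

# Toy is-a hierarchy: parent term -> direct child terms.
# The terms are illustrative and are NOT copied from NIFSTD.
CHILDREN = {
    "GABAergic neuron": ["Purkinje cell", "Basket cell"],
    "Basket cell": ["Cerebellar basket cell"],
}

def expand_concept(term, children=CHILDREN):
    """Return the term followed by all of its descendants, breadth-first."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

def search(records, term):
    """Return records that mention the term or any of its descendants."""
    terms = [t.lower() for t in expand_concept(term)]
    return [r for r in records if any(t in r.lower() for t in terms)]

records = [
    "Purkinje cell morphology reconstruction",
    "Pyramidal cell firing rates",
    "Cerebellar basket cell connectivity",
]
hits = search(records, "GABAergic neuron")
```

Here neither matching record contains the words “GABAergic neuron”; both are found only because the query was expanded to the types of GABAergic neuron listed in the toy hierarchy.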
Use of Ontologies within NIF
• Controlled vocabulary for describing type of resource
and content
– Database, Image, Parkinson’s disease
• Entity-mapping of database and data content
• Data integration across sources
• Search: Mixture of mapped content and string-based
search
– Different parts of NIF use the vocabularies in different ways
– Utilize synonyms, parents, children to refine search
– Increasing use of other relationships and logical inferencing
• Generation of semantic content (i.e. RDF, Linked
Data)
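The mixture of mapped content and string-based search described above might look roughly like the following sketch. It assumes a small vocabulary with preferred labels and synonyms and records annotated with term IDs. All identifiers (`TERM:0001`, `VOCAB`, `build_index`, `search`) and the records are hypothetical placeholders, not NIF services or NIFSTD identifiers.

```python
# Hypothetical vocabulary: term ID -> preferred label and synonyms.
VOCAB = {
    "TERM:0001": {"label": "Cerebellum", "synonyms": ["cerebellar"]},
    "TERM:0002": {"label": "Parkinson's disease",
                  "synonyms": ["PD", "Parkinson disease"]},
}

def build_index(vocab):
    """Map every label and synonym (lower-cased) to its term ID."""
    index = {}
    for term_id, entry in vocab.items():
        for name in [entry["label"], *entry["synonyms"]]:
            index[name.lower()] = term_id
    return index

def search(records, query, vocab=VOCAB):
    """Use mapped content when the query is a known term, else plain strings."""
    term_id = build_index(vocab).get(query.lower())
    if term_id is not None:
        # Mapped search: match records annotated with the term ID.
        return [r["title"] for r in records if term_id in r["terms"]]
    # String-based fallback: case-insensitive substring match on the title.
    return [r["title"] for r in records if query.lower() in r["title"].lower()]

records = [
    {"title": "Dopamine neuron loss dataset", "terms": ["TERM:0002"]},
    {"title": "Cerebellar atlas images", "terms": ["TERM:0001"]},
    {"title": "Patch clamp protocols", "terms": []},
]
by_synonym = search(records, "PD")
by_string = search(records, "patch clamp")
```

The synonym “PD” retrieves a record whose title never mentions the disease, while an unmapped query such as “patch clamp” still works through the string fallback.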
Building the NIF Ontologies
http://neurolex.org
Modular Ontologies
• Set of expanded vocabularies largely imported from existing
terminological resources
• Adhere to ontology best practices as we understood them
• Built from existing resources when possible
• Standardized to same upper ontology: BFO
• Encoded in OWL DL
• Provides mapping to source terminologies
• Provides synonyms, lexical variants, abbreviations
• Single inheritance trees with minimal cross domain and intradomain properties
• Orthogonal: Neuroscientists didn’t like too many choices
• Human readable definitions (not complete yet)
NIFSTD
[Figure: NIFSTD modules: Organism, Macroscopic Anatomy, Subcellular Anatomy, Molecule, Macromolecule, Molecule Descriptors, Gene, Cell, NS Dysfunction, NS Function, Quality, Resource, Investigation, Techniques, Instruments, Reagent, Protocols]
[Figure: “Bridge files” relating terms across modules (Anatomy, Cell Type, Cellular Component, Small Molecule, Neurotransmitter, Transmembrane Receptor) through relations such as “Located in” and “Expressed in”. Terms shown: CNS, Neuron, Purkinje Cell, Purkinje Cell Layer, Cytoarchitectural Part of Cerebellar Cortex, Dentate Nucleus, Dentate Nucleus Neuron, Collection of Deep Cerebellar Nuclei, GABA, GABA-R, Presynaptic density, Terminal Axon Bouton, Transmitter Vesicle]
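One way to picture the separation between single-inheritance modules and bridge files is to keep cross-module relations as a list of triples apart from the modules themselves. This is a minimal sketch under assumptions: the module assignments, the relation names `located_in` and `expressed_in`, and the specific pairings are illustrative guesses and are not copied from the actual NIFSTD bridge files.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "relation", "object"])

# Each term belongs to exactly one module (assumed assignment for illustration).
MODULE_OF = {
    "Purkinje Cell": "Cell",
    "Purkinje Cell Layer": "Anatomy",
    "GABA": "Molecule",
}

# Hypothetical bridge: it holds cross-module relations only, so each
# module can stay a simple tree of its own terms.
BRIDGE = [
    Triple("Purkinje Cell", "located_in", "Purkinje Cell Layer"),
    Triple("GABA", "expressed_in", "Purkinje Cell"),
]

def crosses_modules(triple, module_of=MODULE_OF):
    """True when subject and object come from different modules."""
    return module_of[triple.subject] != module_of[triple.object]

def objects_of(subject, relation, bridge=BRIDGE):
    """Return every object linked to the subject by the given relation."""
    return [t.object for t in bridge
            if t.subject == subject and t.relation == relation]

located = objects_of("Purkinje Cell", "located_in")
```

Keeping the relations in a separate structure means a module can be replaced or updated without touching the statements that link it to other modules.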
NIF Cell
• NIF has made significant enhancements to its
cell ontology
– Expanded neuron list
– Generated neuronal classifications based on
neurotransmitter, brain region, molecules,
morphology, circuit role
– Recommended standard naming convention
– Is working with the International Neuroinformatics
Coordinating Facility through the PONS (Program on
Ontologies of Neural Structures) program
• Creating Knowledge base for neuronal classification based
on properties
NeuroLex Wiki
• NIF has posted its vocabularies in Wiki form (Semantic MediaWiki)
• Simplified interface for ontology construction and refinement
• Custom forms for neurons and brain regions
• Semantic linking between category pages
• Significant knowledge base
• Curation → NIFSTD
http://neurolex.org
NeuroLex and NeuroML
“There was further discussion of how to define specific
types of morphological groups such as apical dendrites,
basal dendrites, axons, etc. Several options include
having predefined names for common types or linking to
ontologies that define these types. We suggest adding
tags or rdf for metadata that provide NeuroLex ontology
ids to groups. We propose to begin with simple tags, and
when a tag is present, one should assume it indicates “is
a”. If more complicated semantic information is needed,
we can use rdf in a way that is similar to SBML.”
NeuroML Development Workshop 2010
http://www.neuroml.org/files/NeuroMLWorkshop2010.pdf
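The “simple tags” convention proposed in the quotation can be illustrated with a small sketch: each morphological group carries zero or more ontology IDs, and every tag is read as an “is a” statement. The group names, the placeholder IDs, and the function `tags_to_is_a` are hypothetical. They are neither real NeuroLex identifiers nor part of the NeuroML schema.

```python
# Morphological groups, each optionally tagged with ontology IDs.
# The IDs are placeholders, NOT real NeuroLex identifiers.
groups = {
    "apical_dendrites": ["nlx_placeholder_1"],
    "basal_dendrites": ["nlx_placeholder_2"],
    "soma_group": [],  # untagged group: no statement is implied
}

def tags_to_is_a(groups):
    """Read each tag on a group as an implicit (group, 'is_a', id) statement."""
    return [(name, "is_a", tag)
            for name, tags in groups.items()
            for tag in tags]

statements = tags_to_is_a(groups)
```

Richer relations than “is a” would need a more expressive encoding, which is the role the quotation assigns to RDF.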
Providing community
access
http://neuinfo.org
Access at various levels…
• A search portal (link to NIF advanced search interface) for researchers, students, or anyone looking for neuroscience information, tools, data or materials.
• Access to content normally not indexed by search engines, i.e., the “hidden web”
• Tools for resource providers to make resources more discoverable, e.g., ontologies, data federation tools, vocabulary services
• Tools for promoting interoperability among databases
• Standards for data annotation
• The NIFSTD ontology covering the major domains of neuroscience, e.g., brain anatomy, cells, organisms, diseases, techniques
• Services for accessing the NIF vocabulary and NIF tools
• Best practices for creating discoverable and interoperable resources
• Data annotation services: NIF experts can enhance your resource through semantic tagging
• NIF cards: Easy links to neuroscience information from any web browser
• Ontology services: NIF knowledge engineers can help create or extend ontologies for neuroscience
Integration of NIF
services and ontologies
http://wholebraincatalog.org
WBC and Simulation Visualization
Demonstrates the neurogenesis simulation driven by the model of
Aimone et al., 2009 from the Gage lab at the Salk Institute
within the Whole Brain Catalog
http://www.youtube.com/watch?v=1YzfXv4yNzg
WBC and NeuroConstruct
A network model of the cerebellar granule cell layer which can be fully
expressed as a Level 3 NeuroML file. Visualised in the Whole Brain Catalog
(left), and neuroConstruct (right)
http://wiki.wholebraincatalog.org/wiki/Running_Simulations
http://www.neuroml.org/tool_support.php
NIF cards
Simple tool for linking search results to other sources of
information
http://nifcards.neuinfo.org/nifstd/anatomical_structure/birnlex_1489.html
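The example address above suggests that a NIF card link combines a category and a term identifier. The sketch below builds such a link under that assumption. The URL pattern is inferred from this single example only, and the helper `nif_card_url` is hypothetical rather than a documented NIF service.

```python
from urllib.parse import quote

# Base path taken from the example URL; the pattern is an assumption.
NIFCARDS_BASE = "http://nifcards.neuinfo.org/nifstd"

def nif_card_url(category, term_id):
    """Build a card link as <base>/<category>/<term_id>.html."""
    return f"{NIFCARDS_BASE}/{quote(category)}/{quote(term_id)}.html"

url = nif_card_url("anatomical_structure", "birnlex_1489")
```

A page that already knows the category and identifier of a recognized term could use a helper like this to link that term to its card.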
NIF literature results display for “Cerebellum”; concepts in NIF ontologies highlighted and linked to more information through NIF
knowledge base
Providing Semantic Content
RDF data / SPARQL Queries
The NIF Team
• Maryann Martone, UCSD, PI
• Jeff Grethe, UCSD, Co-PI
• Amarnath Gupta, UCSD, Co-PI
• Ashraf Memon, UCSD, Project Manager
• Anita Bandrowski, UCSD, NIF Curator
• Fahim Imam, UCSD, Ontology Engineer
• David Van Essen, Wash U, Co-PI
• Erin Reid, Wash U
• Gordon Shepherd, Yale, Co-PI
• Perry Miller, Yale
• Luis Marenco, Yale
• Rixin Wang, Yale
• Paul Sternberg, Cal Tech, Co-PI
• Hans-Michael Muller, Cal Tech
• Arun Rangarajan, Cal Tech
• Giorgio Ascoli, George Mason, Co-PI
• Sridevi Polavaram, George Mason
• Vadim Astakhov, UCSD
• Andrea Arnaud-Stagg, UCSD
• Lee Hornbrook, UCSD
• Jennifer Lawrence, UCSD
• Irfan Baig, UCSD student
• Anusha Yelisetty, UCSD student
• Timothy Tsui, UCSD student
• Chris Condit, UCSD
• Xufei Qian, UCSD
• Larry Liu, UCSD