Mental Functioning and Semantic
Search in the Neuroscience
Information Framework
Maryann Martone
Fahim Imam
Funded in part by the NIH Neuroscience Blueprint
HHSN271200800035C via NIDA
Neuroscience Information Framework – http://neuinfo.org
The Neuroscience Information Framework: Discovery and utilization of
web-based resources for neuroscience
UCSD, Yale, Cal Tech, George Mason, Washington Univ
[Figure labels: Literature; Database Federation; Registry]
Supported by NIH Blueprint
• A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”
http://neuinfo.org
NIF takes a global view of resources
• NIF’s goal: Discover and use resources
– Data
– Databases
– Tools
– Materials
– Services
• Federated approach: Resources are developed and maintained by the community
– >150 data sources; 350M records
– Contract specifies 25 sources/year
• Agile approach: the NIF system is designed to be populated quickly and allow for incremental improvements to representation and search
NIF’s Rules for using digital resources
#1: YOU HAVE TO FIND THEM!!!!!!!
#2: You have to access/open them
#3: You have to understand them
Neuroscience is inherently interdisciplinary; no one technique
reveals all
What do you mean by data?
Databases come in many shapes and sizes
• Primary data
– Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data
– Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD); gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data
– Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task
• Registries
– Metadata
– Pointers to data sets or materials stored elsewhere
• Data aggregators
– Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SumsDB, Brede
• Single source
– Data acquired within a single context, e.g., Allen Brain Atlas
NIFSTD Ontologies
• Set of modular ontologies
– 86,000+ distinct concepts + synonyms
• Expressed in OWL-DL language
– Supported by common DL reasoners
– Currently supports OWL 2
• Closely follows OBO community best practices
• Avoids duplication of efforts
– Standardized to the same upper level ontologies, e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO)
– Relies on existing community ontologies, e.g., CHEBI, GO, PRO, DOID, OBI, etc.
• Modules cover orthogonal domains, e.g., Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, etc.
Bill Bug et al.
Importing into NIFSTD
• NIF converts to OWL and aligns to BFO, if not already
– Facilitates ingestion, but can have negative consequences for
search if model adds computational complexity
• Data sources do not make careful distinctions but use what is
customary for the domain
• Modularity: NIF seeks to have single coverage of a subdomain
– We are not UMLS or Bioportal
• NIF uses MIREOT to import individual classes or branches of
classes from large ontologies
– NIF retains identifier of source
• NIF uses IDs for names, not text strings
– Avoids collision
– Allows retiring of class without retiring the string
NIFSTD has evolved as the ontologies have evolved; had to make many
compromises based on ontologies and tools available
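The point about using IDs rather than text strings as class names can be illustrated with a minimal sketch. Everything here is hypothetical: the `ClassRegistry` and `OntologyClass` names, the `EX_` identifiers, and the source names are invented for illustration and are not NIFSTD's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    class_id: str          # stable identifier, used as the class name
    label: str             # human-readable string; may be shared by several classes
    source: str            # ontology the class was imported from (identifier retained)
    synonyms: list = field(default_factory=list)
    retired: bool = False


class ClassRegistry:
    """Toy registry keyed by identifier rather than by label."""

    def __init__(self):
        self._by_id = {}

    def add(self, cls):
        if cls.class_id in self._by_id:
            raise ValueError(f"duplicate identifier: {cls.class_id}")
        self._by_id[cls.class_id] = cls

    def retire(self, class_id):
        # Retiring marks one class; the label string stays usable elsewhere.
        self._by_id[class_id].retired = True

    def find(self, text):
        wanted = text.lower()
        return [
            c.class_id
            for c in self._by_id.values()
            if not c.retired
            and wanted in [c.label.lower(), *(s.lower() for s in c.synonyms)]
        ]


registry = ClassRegistry()
# Hypothetical identifiers and sources, for illustration only.
registry.add(OntologyClass("EX_0001", "Nucleus", source="example-cell-ontology"))
registry.add(OntologyClass("EX_0002", "Nucleus", source="example-anatomy-ontology"))

both = registry.find("nucleus")       # two classes share one label without colliding
registry.retire("EX_0001")
remaining = registry.find("nucleus")  # the string still resolves to the other class
```

Because the key is the identifier, two classes can carry the same label, and retiring one of them leaves the label available to the other.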
NIFSTD Modules and Sources
NIFSTD Module | External Source | Import/Adapt
Organismal taxonomy | NCBI Taxonomy, GBIF, ITIS, IMSR, Jackson Labs mouse catalog; the model organisms in common use by neuroscientists are extracted from NCBI Taxonomy and kept in a separate module with mappings | Adapt
Molecules, Chemicals | IUPHAR ion channels and receptors, Sequence Ontology (SO); NIDA drug lists from ChEBI, and imported Protein Ontology (PRO) | Adapt/Import
Sub-cellular anatomy | Sub-cellular Anatomy Ontology (SAO). Extracted cell parts and subcellular structures from SAO-CORE. Imported GO Cellular Component with mapping. | Adapt/Import
Cell | CCDB, NeuronDB, NeuroMorpho.org terminologies; OBO Cell Ontology was not considered as it did not contain region specific cell types | Adapt
Gross Anatomy | NeuroNames extended by including terms from BIRNLex, SumsDB, BrainMap.org, etc.; multi-scale representation of Nervous System, Macroscopic anatomy | Adapt
Nervous system function | BIRN, BrainMap.org, MeSH, and UMLS, GO Biological functions | Adapt
Nervous system dysfunction | Nervous system disease from MeSH, NINDS terminology; Imported Disease Ontology (DO) with mapping | Adapt/Import
Phenotypic qualities | Phenotypic Quality Ontology (PATO); Imported as part of the OBO foundry core | Import
Investigation: reagents | Overlaps with molecules above from ChEBI, SO, and PRO | Adapt/Import
Investigation: instruments, protocols, plans | CogPO, BIRNLex | Adapt
Investigation: resource type | NIF, OBI, NITRC, Biomedical Resource Ontology (BRO) | Adapt
Biological Process | Gene Ontology (GO) biological process | Import
What are the connections of the hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
• Data sources categorized by “data type” and level of nervous system
• Common views across multiple sources
• Link back to record in original source
• Query expansion: Synonyms and related concepts
• Boolean queries
• Tutorials for using full resource when getting there from NIF
Entity mapping
[Figure labels: BIRNLex_435; Brodmann.3]
Explicit mapping of database content helps disambiguate nonunique and custom terminology
NIF Concept-Based Search
• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
– NIF automatically searches for types of GABAergic neurons
– Defined by OWL axioms
[Figure label: Types of GABAergic neurons]
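The idea that a search for "GABAergic neuron" also retrieves its inferred types can be sketched as follows. This is a toy illustration under stated assumptions, not NIF's implementation: the dictionaries stand in for OWL axioms, the membership check stands in for a DL reasoner, and the three neuron types are a small hand-picked sample.

```python
# Toy asserted facts: each neuron type and the neurotransmitter it releases.
NEURON_TYPES = {
    "Purkinje cell": {"is_a": "neuron", "neurotransmitter": "GABA"},
    "Hippocampal basket cell": {"is_a": "neuron", "neurotransmitter": "GABA"},
    "Cerebellar granule cell": {"is_a": "neuron", "neurotransmitter": "glutamate"},
}

# A defined class: conditions that are sufficient for membership
# (a stand-in for an OWL equivalence axiom).
DEFINED_CLASSES = {
    "GABAergic neuron": {"is_a": "neuron", "neurotransmitter": "GABA"},
}


def inferred_types(defined_class):
    """Return every asserted type that satisfies all conditions of the defined class."""
    conditions = DEFINED_CLASSES[defined_class]
    return sorted(
        name
        for name, facts in NEURON_TYPES.items()
        if all(facts.get(key) == value for key, value in conditions.items())
    )


def expand_query(term):
    """Expand a term into an OR query over the term and its inferred types."""
    terms = [term]
    if term in DEFINED_CLASSES:
        terms += inferred_types(term)
    return " OR ".join(f'"{t}"' for t in terms)


expanded = expand_query("GABAergic neuron")
```

A plain keyword search would match only the literal string; the expanded query also reaches records that mention a specific GABAergic cell type without ever using the word "GABAergic".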
Ontological Query expansion through OntoQuest
Example Query Type | Ontological Expansion
A single term query for Hippocampus and its synonyms | synonyms(Hippocampus); expands to Hippocampus OR "Cornu ammonis" OR "Ammon's horn" OR "hippocampus proper".
A conjunctive query with 3 terms | transcription AND gene AND pathway
A 6-term AND/OR query with one term expanded into synonyms | (gene) AND (pathway) AND (regulation OR "biological regulation") AND (transcription) AND (recombinant)
A conjunctive query with 2 terms, where a user chooses to select the subclasses of the 2nd term | synonyms(zebrafish AND descendants(promoter,subclassOf)); zebrafish gets expanded by synonym search and the second term transitively expands to all subclasses of promoter as well as their synonyms.
A single term query for an anatomical structure where a user chooses to select all of the anatomical parts of the term along with synonyms | synonyms(descendants(Hippocampus,partOf)); expands to all parts of hippocampus and all their synonyms through the ontology. All parts are joined as an “OR” operation.
A conjunctive query with 2 terms, where a user chooses to select all the equivalent terms for the 2nd term | synonyms(Hippocampus) AND equivalent(synonyms(memory)); the second term uses the ontology to find all terms that are equivalent to the term memory by ontological assertion, along with synonyms.
A conjunctive query with 2 terms, where a user is interested in specific subclasses for both of the terms | synonyms(x:descendants(neuron,subclassOf) where x.neurotransmitter='GABA') AND synonyms(gene where gene.name='IGF'); x is an internal variable.
A query to seek all subclasses of neuron whose soma location is in any transitive part of the hippocampus | synonyms(x:descendants(neuron,subclassOf) where x.soma.location = descendants(Hippocampus,partOf))
A query to seek a conceptual term that is semantically equivalent to a collection of terms rather than a single term | 'GABAergic neuron' AND Equivalent('GABAergic neuron'); the term is recognized as ontologically equivalent to any neuron that has GABA as a neurotransmitter and therefore expands to a list of inferred neuron types.
OntoQuest – NIF’s ontology management system for NIFSTD ontologies
• Implements various graph search algorithms for ontological graphs
• Automated query expansion for NIFSTD terms, including the ones with defined logical restrictions.
Gupta et al., 2010
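The `synonyms(...)` and `descendants(...)` expansions described above can be approximated with a small graph traversal. This is a sketch under stated assumptions, not OntoQuest's implementation: the function names mirror the query notation on the slide, but the edge list, the breadth-first strategy, and the decision to exclude the starting term from `descendants` are illustrative choices, and the graph is toy data.

```python
from collections import deque

# Toy ontology graph as (child, relation, parent) edges.
EDGES = [
    ("CA1", "partOf", "Hippocampus"),
    ("CA3", "partOf", "Hippocampus"),
    ("CA1 stratum pyramidale", "partOf", "CA1"),
    ("Purkinje cell", "subclassOf", "neuron"),
]

# Synonyms for Hippocampus are the ones listed in the expansion example.
SYNONYMS = {
    "Hippocampus": ["Cornu ammonis", "Ammon's horn", "hippocampus proper"],
}


def descendants(term, relation):
    """Terms reachable by following `relation` edges toward `term`, transitively."""
    found, seen, queue = [], {term}, deque([term])
    while queue:
        current = queue.popleft()
        for child, rel, parent in EDGES:
            if rel == relation and parent == current and child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


def synonyms(terms):
    """Each term followed by its synonyms, without duplicates."""
    if isinstance(terms, str):
        terms = [terms]
    expanded = []
    for term in terms:
        for name in [term, *SYNONYMS.get(term, [])]:
            if name not in expanded:
                expanded.append(name)
    return expanded


def to_boolean(terms):
    """Join terms with OR, quoting multi-word terms."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


single_term = to_boolean(synonyms("Hippocampus"))
all_parts = to_boolean(synonyms(descendants("Hippocampus", "partOf")))
```

Here `single_term` reproduces the first row's expansion, and `all_parts` joins every transitive part of the hippocampus with OR, as in the anatomical-parts row.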
NIF information space
NIF developed a tiered system
• Domain knowledge
– What you would teach someone coming into your domain
– NIFSTD/OntoQuest
– All upper level BFO categories are suppressed
• Claims based on data
– Bridge files across domains (constructed by NIF), databases, triple stores
– Text
• Data
– Relational databases
– Spreadsheets
[Figure labels: Concepts; Knowledge Base; Data; Concepts, Entities + data summaries]
Scientists search via the terms they use, not what we would like them to use. NIF needs a broad net to find relevant resources
What genes are upregulated by drugs of abuse
in the adult mouse?
Gene upregulated mice illegal drug
When searching across broad information sources, need to search for what
people are looking for
NIF “translates” common concepts through
ontology and annotation standards
• What genes are upregulated by drugs of abuse in the adult mouse?
[Figure labels: Morphine; Increased expression; Adult Mouse]
Arbitrary but defensible
NIFSTD AND NEUROLEX WIKI
• Semantic wiki platform
• Provides simple forms for structured knowledge
• People can add concepts, properties, and annotations
• Generate hierarchies without having to learn complicated ontology tools
• Community can contribute
– Relax rules for NIFSTD so dedicated domain scientists can contribute their knowledge and review other contributions
– Teaches structuring of knowledge via red links/blue links
– Process is tracked and exposed
– Implemented versioning
Larson et al.
Readily indexed by Google; queries to NIF data via NIF navigator
NeuroLex Content Structure
Stephen D. Larson et al.
NeuroLex is becoming a significant knowledge base
Top Down Vs. Bottom up
Top-down ontology construction (NIFSTD)
• A select few authors have write privileges
• Maximizes consistency of terms with each other
• Making changes requires approval and re-publishing
• Works best when domain to be organized has: small corpus, formal categories, stable entities, restricted entities, clear edges
• Works best with participants who are: expert catalogers, coordinated users, expert users, people with authoritative source of judgment
Bottom-up ontology construction (NEUROLEX)
• Multiple participants can edit the ontology instantly
• Semantics are limited to what is convenient for the domain
• Not a replacement for top-down construction; sometimes necessary to increase flexibility
• Necessary when domain has: large corpus, no formal categories, no clear edges
• Necessary when participants are: uncoordinated users, amateur users, naïve catalogers
• Neuroscience is a domain that is less formal and neuroscientists are more uncoordinated
Larson et al.
Engaging domain scientists
[Figure labels: Planned process; Disposition; Continuant; Cognitive process; Mental Process; Mental state; Recall; Memory; Retrieval; Episodic; Nondeclarative; Encoding. Question marks in the figure mark uncertain placements.]
Mental functioning is difficult to define and dissect
• Very few behaviors are “pure”
– Operationally defined through experiments
• What is a mental function?
– Activity, state, function, process
• Subtypes are rarely disjoint
– Episodic memory
– Semantic memory
– Procedural memory
– Declarative memory
• Distinctions among paradigms, assessments, tests, rating scales, tasks are often subtle
Early work done in BIRN; later terms added by students and curators
NeuroLex does not adhere strictly to BFO
Concepts and things happily co-exist; content gets reconciled over time
Nevertheless...
• We do not allow
duplicates
• We do not allow
multiple inheritance
– Use “role” to shortcut
many relations
• We do try to re-factor
contributions so as to
avoid collisions across
our domains
• But...once they are in
the wiki, they will
move about and be
added to as necessary
Neuinfo.org/neurolex/wiki/COGPO_00123
Cognitive-related searches through NIF
• fear prefrontal arousal
• Attention and distraction
• Passive viewing
• stroop effect
• sequence learning
• studies done on the cognitive-behavioral model of addiction
• memory recall
• self-administration
• Visual oddball paradigm
• Sexual Orientation
• Face recognition
• neurophysiology of language
• Olfaction
• Consciousness
• Gustatory
Scientists tend to focus on tests and general concepts rather than deep considerations of cognitive processes
Mental Functioning: What NIF needs
• Computable taxonomies of test (assessments,
paradigms, tasks) types
– Test types should be related to the function they
purport to measure but will only be an approximation
– Not just human!!!
• Computable operational definitions of cognitive
concepts
– Translates tests into concepts used in search
– Dementia rating scale scores = Dementia
– Smoking assessment scores = smoker
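One way to read "translates tests into concepts used in search" is a lookup from test type to the concept it approximates, applied when records are indexed. The sketch below assumes that reading; the mapping table, record layout, record IDs, and function names are all hypothetical, and only the two test-to-concept pairs come from the slide.

```python
# Test type -> concept used in search (the two pairs named on the slide).
TEST_TO_CONCEPT = {
    "dementia rating scale": "Dementia",
    "smoking assessment": "Smoker",
}

# Hypothetical records, each annotated with the tests that were administered.
RECORDS = [
    {"id": "rec-1", "tests": ["Dementia rating scale"]},
    {"id": "rec-2", "tests": ["Smoking assessment", "Stroop task"]},
    {"id": "rec-3", "tests": ["Visual oddball paradigm"]},
]


def concepts_for(record):
    """Concepts implied by the tests on a record; unmapped tests are ignored."""
    return {
        TEST_TO_CONCEPT[test.lower()]
        for test in record["tests"]
        if test.lower() in TEST_TO_CONCEPT
    }


def search(concept, records=RECORDS):
    """IDs of records whose tests map to the requested concept."""
    return [r["id"] for r in records if concept in concepts_for(r)]


dementia_hits = search("Dementia")
smoker_hits = search("Smoker")
```

The mapping is only an approximation of the underlying function, as the slide notes; a fuller version would presumably also need score ranges and non-human test types.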
Concluding Remarks
• NIFSTD is used to provide a semantic index to heterogeneous data sources
– BFO allows us to promote broad semantic interoperability between biomedical ontologies
– The modularity principles allow us to limit the complexity of the base ontologies
• NIF defines a process for adding complex semantics to neuroscience concepts through the NIFSTD and NeuroLex collaborative environment
– NIF encourages the use of community ontologies
• Moving towards building a rich knowledge base for neuroscience that integrates with larger life science communities
Points of Discussion
CogPO/CogAT/NEMO/MHO Harmonization?
• What kind of interplay are we looking at?
• Is it about re-use of ontological vocabularies?
• What should be the best practice for reuse?
– Re-using URI vs Creating new class and Mapping
– Non-semantic reuse of classes as entities (e.g., MIREOT)
• Is it about building new relationships between the entities covered in all these four ontologies?
– What do we achieve through doing this?
• Are we trying to connect all the curated/annotated experimental data sets to a common semantic layer?
• All of the above?
What should be NIF's role?
• How can we help to expose your experiments and results to a broader audience through our
interface?
• What kind of involvement can people have in terms of re-using your ontological content or
contributing to your content?
• We want to be the 'host' of all the NS concepts and entities, but not necessarily the 'maintainer'.
What ontology isn’t
(or shouldn’t be)
• A rigid top-down fixed hierarchy for
limiting expression in the
neurosciences
– Not about restricting expression
but how to express meaning clearly
and in a machine readable form
• A bottomless resource-eating pit
that consumes dollars and returns
nothing
• A cure-all for all our problems
• A completely solved area
– Applied vs theoretical
• Easy to understand
Mike Bergman