HPCO-Project-Review-June2009.pptx

advertisement
Ontology supported Knowledge Discovery in
the field of Human Performance and Cognition
Kno.e.sis Center http://knoesis.org
Wright State University
C. Thomas, P. Mendes, D. Cameron, A. Sheth, TK Prasad
Thanks, C. Ramakrishnan
Knowledge Enabled Information and Services Science
Motivation
• Speed up the “search” part of research
• Develop a system that helps in focused
exploration of a domain
Knowledge Enabled Information and Services Science
Cognition-related concepts
Knowledge Enabled Information and Services Science
Subject  Predicate  Object search
Knowledge Enabled Information and Services Science
Subject  Predicate  ? search
Knowledge Enabled Information and Services Science
Demo
Knowledge Enabled Information and Services Science
Motivation
Knowledge Enabled Information and Services Science
Motivation
• For this reason we propose a largely
automatic domain model generation on the
basis of existing knowledge repositories
and text corpora, namely MeSH, MedLine
and Wikipedia.
Knowledge Enabled Information and Services Science
Results
Knowledge Enabled Information and Services Science
Full Class Hierarchy
Knowledge Enabled Information and Services Science
Top Levels by importance (#Concepts)
Knowledge Enabled Information and Services Science
Selective Hierarchy – CogSci/NeuroSci
Knowledge Enabled Information and Services Science
Zooming in on Cognition-related concepts
Knowledge Enabled Information and Services Science
Biomolecules-related concepts
Knowledge Enabled Information and Services Science
Technologies
Knowledge Enabled Information and Services Science
Steps
1. Carve a focused domain model out of Wikipedia
2. Identify mentions of entities and relationships in
the relevant scientific literature (Pubmed
abstracts) to support non-hierarchical guidance.
3. Map extracted entity mentions to concepts and
extracted predicates to relationships
4. Applications to access research literature guided
by the domain model.
Knowledge Enabled Information and Services Science
Steps
1. Carve a focused domain model out of Wikipedia
2. Identify mentions of entities and relationships in
the relevant scientific literature (Pubmed
abstracts) to support non-hierarchical guidance.
3. Map extracted entity mentions to concepts and
extracted predicates to relationships
4. Applications to access research literature guided
by the domain model.
Knowledge Enabled Information and Services Science
Knowledge Enabled Information and Services Science
Knowledge Enabled Information and Services Science
Wisdom of the Crowds
Knowledge Enabled Information and Services Science
Knowledge Enabled Information and Services Science
Overview - conceptual
Knowledge Enabled Information and Services Science
Expand - conceptually
Delete
Full
Graph-based
textresults
searchwith
on
lowArticle
expansion
Lucene
texts
score
Knowledge Enabled Information and Services Science
Node Similarity computation
|N (a )| ,|N (b )|
s(a,b) 
i, j
 avgwN (a),wN
N i (a ), N j (b )
M

i
j

(b)
|N (a )|

|N (b )|
avg w N i (a),
w N j (b)
j
 i

Knowledge Enabled Information and Services Science
Collecting Instances
Graph Search
Graph Search
(Denis)
Seed Query
Fulltext Concept
Search (Somnath)
Graph Search
(Denis)
Graph Search
(Denis)
B
Wikipedia
Knowledge Enabled Information and Services Science
Creating a Hierarchy
Graph Search
Graph Search
(Denis)
Seed Query
Fulltext Concept
Search (Somnath)
Graph Search
(Denis)
Graph Search
(Denis)
B
Wikipedia
Knowledge Enabled Information and Services Science
Creating a Hierarchy
Graph Search
Graph Search
(Denis)
Seed Query
Fulltext Concept
Search (Somnath)
Graph Search
(Denis)
Graph Search
(Denis)
B
Wikipedia
Knowledge Enabled Information and Services Science
Hierarchy Creation - summary
Graph Search
Graph Search
(Denis)
Seed Query
Fulltext Concept
Search (Somnath)
Graph Search
(Denis)
Graph Search
(Denis)
B
Wikipedia
Hierarchy Creation
Knowledge Enabled Information and Services Science
Reduce steps
• Remove all terms that have low
pertinence to the domain
• Intersect hierarchy with broader
focus domain
• Reduce hierarchy depth
Knowledge Enabled Information and Services Science
Remove unwanted individuals
Knowledge Enabled Information and Services Science
Remove unwanted categories
Knowledge Enabled Information and Services Science
Flatten categories
Knowledge Enabled Information and Services Science
Snapshot of final Topic Hierarchy
Knowledge Enabled Information and Services Science
Evaluation wrt. expert model
Evaluation wrt. MeSH versions of 2004 (04) and 2008
(08) for both the restricted and the full set of MeSH term
Knowledge Enabled Information and Services Science
Steps
1. Carve a focused domain model out of Wikipedia
2. Identify mentions of entities and relationships in
the relevant scientific literature (Pubmed
abstracts) to support non-hierarchical guidance.
3. Map extracted entity mentions to concepts and
extracted predicates to relationships
4. Applications to access research literature guided
by the domain model.
Knowledge Enabled Information and Services Science
Challenges and Opportunities
• Vocabularies, Thesauri, Ontologies, exist
in several fields
– How can we use them?
• lexicon: Fish Oils, Raynaud’s Disease, etc.
• types/labels: Fish Oils instance of Lipid
• relationships between types: Lipid affects Disease
• Identification of simple ontology terms
in text is not enough
But
– Compound Entities
– Complex Relationships
35
Knowledge Enabled Information and Services Science
Challenge: Compound Entities
MeSH terms
Modifier
• Entities (MeSH terms) in sentences occur in modified forms
• “adenomatous” modifies “hyperplasia”
• “An excessive endogenous or exogenous stimulation” modifies
“estrogen”
• Entities can also occur as composites of 2 or more other entities
• “adenomatous hyperplasia” and “endometrium” occur as
“adenomatous hyperplasia of the endometrium”
Knowledge Enabled Information and Services Science
36
Extraction Algorithm
Relationship head
Subject head
Object head
Knowledge Enabled Information and Services Science
Object head
37
Extraction Approach
●
●
●
Parse sentences with a dependency parser
Use a few domain-independent rules to
segment sentences into SubjPredObject
Subjects and Objects represent compound
entities
Knowledge Enabled Information and Services Science
Preliminary results
Subject types, inferred from the HEAD of the compound
Knowledge Enabled Information and Services Science
39
Extracted Triples
Knowledge Enabled Information and Services Science
40
Representation – Resulting RDF
hyperplasia
adenomatous
hasModifier
hasPart
modified_entity2
An excessive
endogenous or
exogenous stimulation
hasModifier
hasPart
modified_entity1
induces
composite_entity1
hasPart
hasPart
estrogen
Modifiers
Modified entities
Composite Entities
endometrium
41
Knowledge Enabled Information and Services Science
Using the generated RDF
stimulated
migraine
(D008881)
platelet
(D001792)
collagen
(D003094)
hasPart
hasPart
magnesium
(D008274)
stimulated
hasPart
caused_by
me_2286
_13%_and_17%_adp_and_collagen_induced_platelet_aggregation
me_3142
by_a_primary_abnormality_of_platelet_behavior
Knowledge Enabled Information and Services Science
Steps
1. Carve a focused domain model out of Wikipedia
2. Identify mentions of entities and relationships in
the research literature.
3. Map extracted entity mentions to concepts and
extracted predicates to relationships
4. Applications to access research literature guided
by the domain model.
Knowledge Enabled Information and Services Science
Step 3: Identifying Synonymy and Antonymy
between Verbs for Relationship Matching
Christopher Thomas, Wenbo
Wang
Knowledge Enabled Information and Services Science
Motivation
• An ontology has clearly demarcated, formal
relationships whereas natural language
uses multiple ways of representing these
relationships
• Focus on verbs expressing relationships
• NLP-extracted triples use verbs to indicate
relationships, but are not aligned with the
relationships yet.
• Align similar verbs  Align extracted
predicates to formal relationships
Knowledge Enabled Information and Services Science
Examples of relationships to merge
• Dilation of the fourth ventricle indicated atrophy of the
middle cerebellar peduncle.
– E.g. < Dilation of the fourth ventricle >
 indicate
<atrophy of the middle cerebellar peduncle>
GOOD
• The lesions in the rolipram-treated group also showed
increased astrogliosis and increased CREB
phosphorylation in the activated microglia and astrocytes.
– E.g. < lesions in the rolipram-treated group >

showed
indicate
< increased astrogliosis and increased CREB phosphorylation in
the activated microglia and astrocytes >
GOOD
BAD
Knowledge Enabled Information and Services Science
Short Review – training set creation
• Normalization
Remove phrases containing less than 2 verbs
• POS Tagging
Eating -> <VBG>, eats -> <VBZ>
Like <-> Hate
• Synonyms & antonyms extraction ANT:
SYN: calculate <-> compute
• Pattern extraction
He learns and memorizes materials
• Pattern2Relationship matrix
• VerbPair2Pattern matrix
What is the correlation of this pattern
and that relationship?
What is the correlation of this verb pair
and that pattern?
Knowledge Enabled Information and Services Science
Design
• Synonyms & antonyms extraction from Wordnet
– A single verb has multiple meanings
• contract, take, get (be stricken by an illness, fall victim to an
illness) "He got AIDS"; "She came down with pneumonia";
"She took a chill“
• film#1, shoot#4, take#16 (make a film or photograph of
something) "take a scene"; "shoot a movie"
• Take has 42 meanings in Wordnet
– Some meanings of a verb are less frequently used
– If the number of meanings of a verb exceeds
threshold, we will abandon this verb
– If the frequency of a specific meaning of a verb is low,
we will abandon this meaning
Knowledge Enabled Information and Services Science
Solution
• Do pattern-based text analysis to find
predicates that are used in similar contexts
and map them to upper-level relationships,
e.g. those found in UMLS
Knowledge Enabled Information and Services Science
Steps
1. Carve a focused domain model out of Wikipedia
2. Identify mentions of entities and relationships in
the research literature.
3. Map extracted entity mentions to concepts and
extracted predicates to relationships
4. Applications to access research literature guided
by the domain model.
Knowledge Enabled Information and Services Science
Step 4
Trellis:
Knowledge-Driven Web Browsing
Pablo N. Mendes, Cartic Ramakrishnan, Delroy Cameron and Amit P. Sheth
Knoesis Center, May 27 2009.
Introduction




Background
System Overview
Demo
Future Work
61
Background

Web Browsing

Search

Keyword search


Sift

Retrieve documents


62
Point of access to related content
Hyperlinks
Query Refinement
Search
and
Sift
Background

Observations
Human
Cognition
Keyword
Graduate
Student
Research
Assistant
Entity
Turkey
Model
San
Diego
Relationship
e1
e2
en
Information Need
(Documents)
63
e1
Pathway
System Overview

New Browsing Paradigm



Search
Navigate Context
Organize results
Search

Navigate/
Stitch
Share
Trellis


64
Intricate support structure for vines
Vines as information need/context through navigation
System Overview
Ontology
Index
{NLM hand annotated triples}
Spotter
Workbench
Fig.1 Trellis Architecture
65
Knowledge
Base
{Pubmed Abstracts}
Save/Publish
Magnesium -> isa -> Calcium Channel Blocker
Calcium Channel Blockers -> prevent -> Migraine
Migraine
Magnesium -> isa -> Calcium Channel Blocker
-> prevent -> Migraine
Magnesium -> prevent -> Migraine
Future Work

Ranking - Boosting Precision/Recall



Scalability


Information Extraction from natural language
Named Entity Recognition


User feedback
Corpus Statistics
User driven
Knowledge Discovery
69
Thank you
Knowledge Enabled Information and Services Science
Download