Ontology supported Knowledge Discovery in the field of Human Performance and Cognition Kno.e.sis Center http://knoesis.org Wright State University C. Thomas, P. Mendes, D. Cameron, A. Sheth, TK Prasad Thanks, C. Ramakrishnan Knowledge Enabled Information and Services Science Motivation • Speed up the “search” part of research • Develop a system that helps in focused exploration of a domain Knowledge Enabled Information and Services Science Cognition-related concepts Knowledge Enabled Information and Services Science Subject Predicate Object search Knowledge Enabled Information and Services Science Subject Predicate ? search Knowledge Enabled Information and Services Science Demo Knowledge Enabled Information and Services Science Motivation Knowledge Enabled Information and Services Science Motivation • For this reason we propose a largely automatic domain model generation on the basis of existing knowledge repositories and text corpora, namely MeSH, MedLine and Wikipedia. Knowledge Enabled Information and Services Science Results Knowledge Enabled Information and Services Science Full Class Hierarchy Knowledge Enabled Information and Services Science Top Levels by importance (#Concepts) Knowledge Enabled Information and Services Science Selective Hierarchy – CogSci/NeuroSci Knowledge Enabled Information and Services Science Zooming in on Cognition-related concepts Knowledge Enabled Information and Services Science Biomolecules-related concepts Knowledge Enabled Information and Services Science Technologies Knowledge Enabled Information and Services Science Steps 1. Carve a focused domain model out of Wikipedia 2. Identify mentions of entities and relationships in the relevant scientific literature (Pubmed abstracts) to support non-hierarchical guidance. 3. Map extracted entity mentions to concepts and extracted predicates to relationships 4. Applications to access research literature guided by the domain model. Knowledge Enabled Information and Services Science Steps 1. Carve a focused domain model out of Wikipedia 2. Identify mentions of entities and relationships in the relevant scientific literature (Pubmed abstracts) to support non-hierarchical guidance. 3. Map extracted entity mentions to concepts and extracted predicates to relationships 4. Applications to access research literature guided by the domain model. Knowledge Enabled Information and Services Science Knowledge Enabled Information and Services Science Knowledge Enabled Information and Services Science Wisdom of the Crowds Knowledge Enabled Information and Services Science Knowledge Enabled Information and Services Science Overview - conceptual Knowledge Enabled Information and Services Science Expand - conceptually Delete Full Graph-based textresults searchwith on lowArticle expansion Lucene texts score Knowledge Enabled Information and Services Science Node Similarity computation |N (a )| ,|N (b )| s(a,b) i, j avgwN (a),wN N i (a ), N j (b ) M i j (b) |N (a )| |N (b )| avg w N i (a), w N j (b) j i Knowledge Enabled Information and Services Science Collecting Instances Graph Search Graph Search (Denis) Seed Query Fulltext Concept Search (Somnath) Graph Search (Denis) Graph Search (Denis) B Wikipedia Knowledge Enabled Information and Services Science Creating a Hierarchy Graph Search Graph Search (Denis) Seed Query Fulltext Concept Search (Somnath) Graph Search (Denis) Graph Search (Denis) B Wikipedia Knowledge Enabled Information and Services Science Creating a Hierarchy Graph Search Graph Search (Denis) Seed Query Fulltext Concept Search (Somnath) Graph Search (Denis) Graph Search (Denis) B Wikipedia Knowledge Enabled Information and Services Science Hierarchy Creation - summary Graph Search Graph Search (Denis) Seed Query Fulltext Concept Search (Somnath) Graph Search (Denis) Graph Search (Denis) B Wikipedia Hierarchy Creation Knowledge Enabled Information and Services Science Reduce steps • Remove all terms that have low pertinence to the domain • Intersect hierarchy with broader focus domain • Reduce hierarchy depth Knowledge Enabled Information and Services Science Remove unwanted individuals Knowledge Enabled Information and Services Science Remove unwanted categories Knowledge Enabled Information and Services Science Flatten categories Knowledge Enabled Information and Services Science Snapshot of final Topic Hierarchy Knowledge Enabled Information and Services Science Evaluation wrt. expert model Evaluation wrt. MeSH versions of 2004 (04) and 2008 (08) for both the restricted and the full set of MeSH term Knowledge Enabled Information and Services Science Steps 1. Carve a focused domain model out of Wikipedia 2. Identify mentions of entities and relationships in the relevant scientific literature (Pubmed abstracts) to support non-hierarchical guidance. 3. Map extracted entity mentions to concepts and extracted predicates to relationships 4. Applications to access research literature guided by the domain model. Knowledge Enabled Information and Services Science Challenges and Opportunities • Vocabularies, Thesauri, Ontologies, exist in several fields – How can we use them? • lexicon: Fish Oils, Raynaud’s Disease, etc. • types/labels: Fish Oils instance of Lipid • relationships between types: Lipid affects Disease • Identification of simple ontology terms in text is not enough But – Compound Entities – Complex Relationships 35 Knowledge Enabled Information and Services Science Challenge: Compound Entities MeSH terms Modifier • Entities (MeSH terms) in sentences occur in modified forms • “adenomatous” modifies “hyperplasia” • “An excessive endogenous or exogenous stimulation” modifies “estrogen” • Entities can also occur as composites of 2 or more other entities • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium” Knowledge Enabled Information and Services Science 36 Extraction Algorithm Relationship head Subject head Object head Knowledge Enabled Information and Services Science Object head 37 Extraction Approach ● ● ● Parse sentences with a dependency parser Use a few domain-independent rules to segment sentences into SubjPredObject Subjects and Objects represent compound entities Knowledge Enabled Information and Services Science Preliminary results Subject types, inferred from the HEAD of the compound Knowledge Enabled Information and Services Science 39 Extracted Triples Knowledge Enabled Information and Services Science 40 Representation – Resulting RDF hyperplasia adenomatous hasModifier hasPart modified_entity2 An excessive endogenous or exogenous stimulation hasModifier hasPart modified_entity1 induces composite_entity1 hasPart hasPart estrogen Modifiers Modified entities Composite Entities endometrium 41 Knowledge Enabled Information and Services Science Using the generated RDF stimulated migraine (D008881) platelet (D001792) collagen (D003094) hasPart hasPart magnesium (D008274) stimulated hasPart caused_by me_2286 _13%_and_17%_adp_and_collagen_induced_platelet_aggregation me_3142 by_a_primary_abnormality_of_platelet_behavior Knowledge Enabled Information and Services Science Steps 1. Carve a focused domain model out of Wikipedia 2. Identify mentions of entities and relationships in the research literature. 3. Map extracted entity mentions to concepts and extracted predicates to relationships 4. Applications to access research literature guided by the domain model. Knowledge Enabled Information and Services Science Step 3: Identifying Synonymy and Antonymy between Verbs for Relationship Matching Christopher Thomas, Wenbo Wang Knowledge Enabled Information and Services Science Motivation • An ontology has clearly demarcated, formal relationships whereas natural language uses multiple ways of representing these relationships • Focus on verbs expressing relationships • NLP-extracted triples use verbs to indicate relationships, but are not aligned with the relationships yet. • Align similar verbs Align extracted predicates to formal relationships Knowledge Enabled Information and Services Science Examples of relationships to merge • Dilation of the fourth ventricle indicated atrophy of the middle cerebellar peduncle. – E.g. < Dilation of the fourth ventricle > indicate <atrophy of the middle cerebellar peduncle> GOOD • The lesions in the rolipram-treated group also showed increased astrogliosis and increased CREB phosphorylation in the activated microglia and astrocytes. – E.g. < lesions in the rolipram-treated group > showed indicate < increased astrogliosis and increased CREB phosphorylation in the activated microglia and astrocytes > GOOD BAD Knowledge Enabled Information and Services Science Short Review – training set creation • Normalization Remove phrases containing less than 2 verbs • POS Tagging Eating -> <VBG>, eats -> <VBZ> Like <-> Hate • Synonyms & antonyms extraction ANT: SYN: calculate <-> compute • Pattern extraction He learns and memorizes materials • Pattern2Relationship matrix • VerbPair2Pattern matrix What is the correlation of this pattern and that relationship? What is the correlation of this verb pair and that pattern? Knowledge Enabled Information and Services Science Design • Synonyms & antonyms extraction from Wordnet – A single verb has multiple meanings • contract, take, get (be stricken by an illness, fall victim to an illness) "He got AIDS"; "She came down with pneumonia"; "She took a chill“ • film#1, shoot#4, take#16 (make a film or photograph of something) "take a scene"; "shoot a movie" • Take has 42 meanings in Wordnet – Some meanings of a verb are less frequently used – If the number of meanings of a verb exceeds threshold, we will abandon this verb – If the frequency of a specific meaning of a verb is low, we will abandon this meaning Knowledge Enabled Information and Services Science Solution • Do pattern-based text analysis to find predicates that are used in similar contexts and map them to upper-level relationships, e.g. those found in UMLS Knowledge Enabled Information and Services Science Steps 1. Carve a focused domain model out of Wikipedia 2. Identify mentions of entities and relationships in the research literature. 3. Map extracted entity mentions to concepts and extracted predicates to relationships 4. Applications to access research literature guided by the domain model. Knowledge Enabled Information and Services Science Step 4 Trellis: Knowledge-Driven Web Browsing Pablo N. Mendes, Cartic Ramakrishnan, Delroy Cameron and Amit P. Sheth Knoesis Center, May 27 2009. Introduction Background System Overview Demo Future Work 61 Background Web Browsing Search Keyword search Sift Retrieve documents 62 Point of access to related content Hyperlinks Query Refinement Search and Sift Background Observations Human Cognition Keyword Graduate Student Research Assistant Entity Turkey Model San Diego Relationship e1 e2 en Information Need (Documents) 63 e1 Pathway System Overview New Browsing Paradigm Search Navigate Context Organize results Search Navigate/ Stitch Share Trellis 64 Intricate support structure for vines Vines as information need/context through navigation System Overview Ontology Index {NLM hand annotated triples} Spotter Workbench Fig.1 Trellis Architecture 65 Knowledge Base {Pubmed Abstracts} Save/Publish Magnesium -> isa -> Calcium Channel Blocker Calcium Channel Blockers -> prevent -> Migraine Migraine Magnesium -> isa -> Calcium Channel Blocker -> prevent -> Migraine Magnesium -> prevent -> Migraine Future Work Ranking - Boosting Precision/Recall Scalability Information Extraction from natural language Named Entity Recognition User feedback Corpus Statistics User driven Knowledge Discovery 69 Thank you Knowledge Enabled Information and Services Science