ICDL 2004, New Dehli, India Ontology-Driven Information Retrieval Nicola Guarino Laboratory for Applied Ontology Institute for Cognitive Sciences and Technology (ISTC-CNR) Trento-Roma, Italy Summary • • • • • • The role of ontologies in (generalized) IR Semantic Matching Increasing precision Increasing recall Problems with WordNet The role of Foundational Ontologies The key problem Semantic matching Simple queries: need more knowledge about what the user wants • Search for “Washington” (the person) – Google: 26,000,000 hits – 45th entry is the first relevant – Noise: places • Search for “George Washington” – Google: 2,200,00 hits – 3rd entry is relevant – Noise: institutions, other people, places Solution: ontology+semantic markup • Ontology – Person • George Washington • George Washington Carver – Place • Washington, D.C. – Artifact • George Washington Bridge – Organization • George Washington University • Semantic disambiguation/markup of questions and corpora – What Washington are you talking about? The role of taxonomy and lexical knowledge • Search for “Artificial Intelligence Research” – Misses subfields of the general field – Misses references to “AI” and “Machine Intelligence” (synonyms) – Noise: non-research pages, other fields… Solution: • Extra knowledge – Ontology: Sub-fields (of AI) • Knowledge Representation • Machine Vision etc. • Neural networks – Lexicon: Synonyms (for AI) • Artificial Intelligence • Machine Intelligence • Techniques – Query Expansion • Add disjuncted sub-fields to search • Add disjuncted synonyms to search – Semantic Markup of question and corpora • Add “general terms” (categories) • Add “synonyms” Ontology-driven search engines • Idealized view – Ontology-driven search engines act as virtual librarians • Determine what you “really mean” • Discover relevant sources • Find what you “really want” • Requires common knowledge on all ends – Semantic linkage between questioning agent, answering agent and knowledge sources • Hence the “Semantic Web”? Two main roles of ontologies in IT • Semantic Interoperability – – – – Generalized database integration Virtual Enterprises, Concurrent Engineering e-commerce Web services • Information Retrieval – – – – Documents Facts (query answering) Products Services [Not mentioning IS analysis and design…] The role of linguistic ontologies coupled with structured representation formalisms • Why not just simple thesauri based on fixed keyword hierarchies? • Linguistic ontologies with structured representation formalisms – Data’s intrinsic dynamics needs to keep track of new terms – Need of understanding a rigid set of terms – Heterogeneous descriptions need a broad-coverage vocabulary – – – – Decouple user vocabulary from data vocabulary Increase recall (synonyms, generalizations) Increase precision (disambiguation, ontology navigation) Further increase precision (by capturing the structure of queries and data) Using WordNet as an ontology • Unclear semantic interpretation of hyperonimy • Instantiation vs. subsumption • Object-level vs. meta-level • Hyperonymy used to account for polysemy •(law both a document and a rule) • Unclear taxonomic structure • Glosses not consistent with taxonomic structure • Heterogeneous leves of generality • Formal constraints violations (especially concerning roles) • Polysemous use of antonymy (child/parent vs. daughter/son) • Poor ontology of adjectives and qualities • Shallow taxonomy of verbs When subtle distinctions are important • “Trying to engage with too many partners too fast is one of the main reasons that so many online market makers have foundered. The transactions they had viewed as simple and routine actually involved many subtle distinctions in terminology and meaning” Harvard Business Review, October 2001 Ontologies and intended meaning Commitment: K = < C,I > Conceptualization Language L Interpretation Intended models IK(L) Models MD(L) Ontology Levels of Ontological Precision tennis football game field game court game athletic game outdoor game game(x) activity(x) athletic game(x) game(x) court game(x) athletic game(x) y. played_in(x,y) court(y) tennis(x) court game(x) double fault(x) fault(x) y. part_of(x,y) tennis(y) game athletic game court game tennis outdoor game field game football Taxonomy Glossary Catalog game NT athletic game NT court game RT court NT tennis RT double fault Thesaurus Ontological precision Axiomatized theory DB/OO scheme Ontology Quality: Precision and Coverage Good High precision, max coverage BAD Max precision, low coverage Less good Low precision, max coverage WORSE Low precision and coverage Why precision is important MD(L) IB(L) IA(L) False agreement! DOLCE a Descriptive Ontology for Linguistic and Cognitive Engineering • Strong cognitive bias: descriptive (as opposite to prescriptive) attitude • Emphasis on cognitive invariants • Categories as conceptual containers: no “deep” metaphysical implications wrt “true” reality • Clear branching points to allow easy comparison with different ontological options • Rich axiomatization DOLCE’s basic taxonomy Endurant Physical Amount of matter Physical object Feature Non-Physical Mental object Social object … Perdurant Static State Process Dynamic Achievement Accomplishment Quality Physical Spatial location … Temporal Temporal location … Abstract Abstract Quality region Time region Space region Color region … … Research priorities at LOA-CNR • Foundational ontologies and ontological analysis • Domain ontologies – Physical objects – Information and information processing – Social interaction – Ontology of legal and financial entities • Ontology, language, cognition • Ontology-driven information systems – Ontology-driven conceptual modeling – Ontology-driven information access – Ontology-driven information integration www.loa-cnr.it