Lexical Semantics and Ontologies
Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing

Paul Buitelaar
Language Technology Lab & Competence Center Semantic Web
DFKI GmbH, Saarbrücken, Germany

© Paul Buitelaar: Lexical Semantics and Ontologies Tutorial at ACL/HCSnet, July 2006, Melbourne, Australia

Overview
  Day 1: Words and Meanings
    Human language as a system
    How do words relate to each other
  Day 2: Words and Object Descriptions
    Human language as a means of representation
    How do words represent objects in the/a world

Day 1 - Introduction
  Words and Meanings
  Synsets and Senses
  Related Senses
  Lexical Semantics in WordNet
  Generative Lexicon and CoreLex
  Domains and Senses
  Tuning WordNet to a Domain

Words and Meanings
  Lexical Semantics in WordNet
  Generative Lexicon and CoreLex
  Tuning WordNet to a Domain

WordNet
  Lexical Semantic Resource
    Semantic Lexicon
    Lexical Database
  Maps words to meanings (senses)
  Machine readable (has a formal structure)
  Freely available: http://wordnet.princeton.edu/

WordNet - Origins
  In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database … The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically … WordNet … instantiates hypotheses based on results of psycholinguistic research … … expose such hypotheses to the full range of the common vocabulary

  In anomic aphasia, there is a specific inability to name objects.
  When confronted with an apple, say, patients may be unable to utter “apple,” even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)

  Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. “Introduction to WordNet: an on-line lexical database.” In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.

Synsets
  WordNet is organized around word meanings (not word forms, as in traditional lexicons)
  Word meaning is represented by “synsets”
  A synset is a “Set of Synonyms”
  Example
    {board, plank}: piece of lumber
    {board, committee}: group of people

Synset Hierarchy
  Synsets are organized in hierarchies
  The hierarchy defines:
    generalization (hypernymy)
    specialization (hyponymy)
  Example (hypernymy runs upward, hyponymy downward)
    {entity}
    …
    {whole, unit}
    {building material}
    {lumber, timber}
    {board, plank}

Hierarchies (WordNet 1.7)

Hierarchy Example (WordNet 2.1)

Synsets and Senses
  Synsets represent word meaning
  Words that occur in several synsets have a corresponding number of meanings (senses)
  Example: WordNet 2.1

(Other) WordNet Relations
  Synonymy: similar in meaning
  Hypernymy/Hyponymy: generalization and specialization
  Meronymy: part-of
    e.g. study, bathroom, ... meronym house
  Antonymy: opposite in meaning
    e.g. warm antonym cold

Words and Meanings
  Lexical Semantics in WordNet
  Generative Lexicon and CoreLex
  Tuning WordNet to a Domain

Systematic Polysemy
  Homonymy: bank
    embankment: We walked along the bank of the Charles river.
    institution: Did he have an account at the HBU bank?
  Systematic Polysemy: school
    group (of people): The school went for an outing.
    (learning) process: School starts at 8.30.
    organization: The school was founded in 1910.
    building: The school has a new roof.

Semantic or Pragmatic?
  [Figure: two diagrams, Semantic Analysis and Pragmatic Analysis, each relating the lexical item school (Lexical Items of the Language) to Obj1, Obj2, Obj3 and Obj4 (Objects in the World)]

Underspecified Discourse Referents
  Anaphora Resolution
    [A long book heavily weighted with military technicalities]NP:event•physical_object•content, in this edition it is neither so long (event) nor so technical (content) as it was originally.
  Metonymy
    The Boston office called
      office > person
      person part-of office
  Bridging
    Peter bought a car. The engine runs well.
      engine part-of car
    The Boston office called. They asked for a new price.
      office > person

Generative Lexicon Theory
  Type Coercion
    I began the book
      book > event
      event ‘has-relation-with’ book
      read is-a event
  Multifaceted representation of lexical semantics, reflecting systematic / regular / logical polysemy

Generative Lexicon Theory
  Qualia Structure (Pustejovsky 1995)
    Role | Meaning | Values for book
    Formal | inheritance (is-a / hyponymy) | artifact, communication, …
    Constitutive | modification (part-of / meronymy) | section, …
    Agentive | causality (“how did the object come about”) | write, …
    Telic | purpose (“what is the object used for”) | read, …

CoreLex (Buitelaar 1998)
  Automatic Qualia Structure Acquisition
    CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
    These representations can be viewed as shallow Qualia Structures
  Sense Distribution in WordNet
    Systematic polysemy can be studied empirically in WordNet by observing sense distributions
    >> If more than two words share the same sense distribution (i.e.
    have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)

Systematic Polysemous Classes
  book
    Sense | Synset | Basic type
    1 | {publication} | artifact
    2 | {product, production} | artifact
    3 | {fact} | communication
    4 | {dramatic_composition, dramatic_work} | communication
    5 | {record} | communication
    6 | {section, subdivision} | communication
    7 | {journal} | artifact
  Systematic Polysemous Class “artifact communication”
    amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick

From WordNet to CoreLex
  [Figure: nouns (Noun 1 … Noun n) are mapped to basic types, which are combined into systematic polysemous classes (Class 1 … Class n)]

Other Examples
  “animal natural_object”
    alligator broadtail chamois ermine lapin leopard muskrat ...
  “natural_object plant”
    algarroba almond anise baneberry butternut candlenut cardamon ...
  “action artifact group_social”
    artillery assembly band church concourse dance gathering institution ...
  “action attribute event psychological”
    appearance concentration decision deviation difference impulse outrage …
  “possession quantity_definite”
    cent centime dividend gross penny real shilling

CoreLex vs. WordNet

Representation and Interpretation
  “Dotted Types” (Pustejovsky)
    Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)
    Complex types can be represented with a “dotted type”, e.g. information•physical_object
    In (Cooper 2005) interpreted as a record type (a delicious lunch can take forever):

Related Work
  Apresjan 1973: Regular Polysemy.
  Nunberg & Zaenen 1992: Systematic polysemy in lexicology and lexicography.
  Bill Dolan 1994: Word Sense Ambiguation: Clustering Related Senses.
  Copestake & Briscoe 1996: Semi-productive polysemy and sense extension.
  Peters, Peters & Vossen 1998: Automatic Sense Clustering in EuroWordNet.
  Tomuro 1998: Semi-Automatic Induction of Systematic Polysemy from WordNet.

Words and Meanings
  Lexical Semantics in WordNet
  Generative Lexicon and CoreLex
  Tuning WordNet to a Domain

Reducing Ambiguity
  WordNet has too many senses …
  Reduce Ambiguity
    Cluster related senses (CoreLex)
    Tune WordNet to an application domain

Domains and Senses
  Domains determine Sense Selection, e.g.
    English: cell
      prison cell in the Politics/Law domain
      living cell in the Biomedical domain
    English: tissue
      living tissue in the Biomedical domain
      cloth in the Fashion domain
    German: Probe
      test in the Biomedical domain
      rehearsal in the Theater domain
  >> Compute Domain-Specific Sense

Approaches
  Subject Codes
    Domain codes are in the dictionary
  Topic Signatures
    Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources
  Tuning of WordNet to a domain
    Top Down: Cucchiarelli & Velardi, 1998
    Bottom Up: Buitelaar & Sacaleanu, 2001
    Related recent work: McCarthy et al, 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006

Subject Codes
  Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense
  Examples (2600 codes)
    high: SN (sounds), DG (drugs), ML (meteorology)
  Sub-Field Codes
    MDZP (Medicine:Physiology)
  Code Combinations
    MLCO (Meteorology+Building), e.g. lightning conductor
    MLUF (Meteorology+Europe+France), e.g. Mistral

Adding Subject Codes to WordNet
  Grouping synsets together across POS
    MEDICINE
      Nouns: doctor#1, hospital#1
      Verbs: operate#7
  Grouping synsets together across sub-hierarchies
    SPORT
      life_form#1: athlete#1
      physical_object#1: game_equipment#1
      act#2: sport#1
      location#1: playing_field#1
  Magnini B. & Cavaglià G.
  Integrating Subject Field Codes into WordNet. In: Proceedings LREC 2000

WordNet DOMAINS
  Sense | WordNet synset and gloss | Domains
  1 | Depository, financial institution, bank, banking concern, banking company (a financial institution) | Economy
  2 | Bank (sloping land) | Geography, Geology
  3 | Bank (a supply or stock held in reserve) | Economy
  4 | Bank, bank building (a building) | Architecture, Economy
  5 | Bank (an arrangement of similar objects) | Factotum
  6 | Savings bank, coin bank, money box, bank (a container) | Economy
  7 | Bank (a long ridge or pile) | Geography, Geology
  8 | Bank (the funds held by a gambling house) | Economy, Play
  9 | Bank, cant, camber (a slope in the turn of a road) | Architecture
  10 | Bank (a flight maneuver) | Transport
  Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.

WSD with Subject Codes
  Match the set of words in the context of the ambiguous word against the sets of words (“neighborhoods”) in the definitions and sample sentences of all senses that share a Subject Code
  Neighborhoods for bank: Economics and bank: Medicine and Biology (the two word columns are merged in this transcript):
    write safe sum medicine product hold account person put origin place take money order treatment blood keep pay supply use store paper draw cheque organ comb human hospital
  Guthrie J. A. & Guthrie I. & Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation. In: Proceedings of ACL 1991.

Topic Signatures from the Web
  Construct Topic Signatures for WordNet synsets/senses
  Retrieve document collections from the web using queries constructed for each WordNet sense, e.g.
    ( boy AND ( altar boy OR ball boy OR … OR male person ) AND NOT (man OR … OR broth of a boy OR son OR … OR mama’s boy OR black ) )
  Agirre E. & Ansa O. & Hovy E. & Martinez D. Enriching very large ontologies using the WWW. In: Proc. of the Ontology Learning Workshop ECAI 2000

Top Down Tuning – Cucchiarelli & Velardi
  Automatically find the best set of (WordNet) senses that:
    “… represent at best the semantics of the domain”
    “[has the] … ‘right’ level of abstraction, so as to mediate between over-ambiguity and generality”
    “… [is] balanced …, i.e. words should be evenly distributed among categories”
  Alessandro Cucchiarelli, Paola Velardi. Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, p.325-344, Dec. 1998.

Methods Used
  Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm
  Apply a scoring function to find the best set, with parameters:
    Generality: the highest possible level of generalization with a small number of categories is preferred
    Discrimination Power: different senses lead to different categories
    (Domain) Coverage: words in the domain corpus that are represented by the selected categories
    Average Ambiguity: ambiguity reduction is measured by the inverse of the average ambiguity of all words

Balanced Categories - Hearst/Schütze
  Reduce the WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible
  Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus
  Measure the
  frequency of categories on domain corpora
    United States Constitution | Genesis
    12.200 legal_system, ... | 26.459 religion, ...
    11.782 government, ... | 25.062 breads, ...
    7.859 politics, ... | 24.356 mythology, ...
  Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task. In: Proceedings ACL SIGLEX Workshop 1993

Generality
  Generality of category set $C_i$: $1/DM(C_i)$
  $DM(C_i)$ is the average distance between the categories of $C_i$ and the topmost synsets:
    $$DM(C_i) = \frac{1}{n} \sum_{j=1}^{n} dm(c_{ij})$$
  Example: $C_i = \{c_{i1}, c_{i2}\}$, with one category at average distance $(4+3)/2 = 3.5$ and the other at $3/1 = 3$ from the topmost synsets:
    $$DM(C_i) = (3.5 + 3)/2 = 3.25$$
  [Figure: hierarchy with the topmost synsets above the general synsets $c_{i1}$ and $c_{i2}$]

Discrimination Power
  Discrimination power of category set $C_i$:
    $$\frac{N_c(C_i) - N_{pc}(C_i)}{N_c(C_i)}$$
  where $N_c(C_i)$ is the number of words that reach at least one category of $C_i$, and $N_{pc}(C_i)$ is the number of words that have at least two senses that reach the same category $c_{ij}$ of $C_i$
  [Figure: domain words w1, w2 and w3 whose senses reach the general synsets of $C_i = \{c_{i1}, c_{i2}, c_{i3}, c_{i4}\}$]

Coverage & Average Ambiguity
  Coverage of category set $C_i$: $N_c(C_i)/W$
  where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$
  Inverse of the average ambiguity of category set $C_i$: $1/A(C_i)$
    $$A(C_i) = \frac{1}{N_c(C_i)} \sum_{j=1}^{N_c(C_i)} Cw_j(C_i)$$
  where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$, and for each word $w_j$ in this set, $Cw_j(C_i)$ is the number of categories in $C_i$ reached by $w_j$

Best Category Set (WSJ)
  Category | Higher-level synset
  C1 | person, individual, someone, mortal, human, soul
  C2 | instrumentality, instrumentation
  C3 | written communication, written language
  C4 | message, content, subject matter, substance
  C5 | measure,
       quantity, amount, quantum
  C6 | action
  C7 | activity
  C8 | group action
  C9 | organization
  C10 | psychological feature
  C11 | possession
  C12 | state
  C13 | location
  Top-down categories for the financial domain, based on the Wall Street Journal

Sense Selection with WSJ Set
  Senses for stock kept by domain tuning on the Wall Street Journal
    Sense | Synset hierarchy for sense | Top synset for sense
    1 | capital > asset | possession (C11)
    2 | support > device | instrumentality (C2)
    4 | document > writing | written communication (C3)
    5 | accumulation > asset | possession (C11)
    6 | ancestor > relative | person (C1)
  Senses for stock discarded by domain tuning on the Wall Street Journal
    Sense | Synset hierarchy for sense
    3 | stock, inventory > merchandise, wares > …
    7 | broth, stock > soup > …
    8 | stock, caudex > stalk, stem > …
    9 | stock > plant part > …
    10 | stock, gillyflower > flower > …
    11 | Malcolm stock, stock > flower …
    12 | lineage, line of descent > … > genealogy > …
    14 | lumber, timber > …

Bottom Up Tuning – Buitelaar & Sacaleanu
  Ranking of WordNet synsets according to a domain-specific corpus
    Compute term relevance against a reference corpus
    Compute synset relevance from term relevance (where term = synonym in synset)
  The ranking can be used in WSD (similar to the use of the ‘most frequent’ heuristic)
  Paul Buitelaar, Bogdan Sacaleanu. Ranking and Selecting Synsets by Domain Relevance. In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001

TFIDF
    $$\mathrm{tfidf}(w) = \mathrm{tf}(w) \cdot \log\left(\frac{N}{\mathrm{df}(w)}\right)$$
  The word is more important if it appears several times in a target document
  The word is more important if it appears in fewer documents
    tf(w): term frequency (number of word occurrences in a document)
    df(w): document frequency (number of documents containing the word)
    N: number of all documents
    tfidf(w): relative importance of the word in the document

Term and Synset Relevance
  Term Relevance: relevance score of synset members
    $$\mathrm{rlv}(t \mid d) = \log(\mathrm{tf}_{t,d}) \cdot \log\left(\frac{N}{\mathrm{df}_t}\right)$$
    where t represents the term, d the domain, and N is the total number of domains
  Synset Relevance: cumulated relevance score for a synset
    $$\mathrm{rlv}(c \mid d) = \sum_{t \in c} \mathrm{rlv}(t \mid d)$$

Extended Synset Relevance
  Lexical Coverage: take the length of the synset into account
    [Gefängniszelle, Zelle] ("prison cell")
    [Zelle] ("living cell")
    $$\mathrm{rlv}(c \mid d) = \frac{T}{|c|} \sum_{t \in c} \mathrm{rlv}(t \mid d)$$
    where $|c|$ is the number of terms in synset c
  Hyponyms: take hyponyms into account, so that c also covers the hyponyms of the synset
    [Zelle, Gefängniszelle, Todeszelle]
    [Zelle, Körperzelle, Pflanzenzelle]
    $$\mathrm{rlv}(c \mid d) = \frac{T}{|c|} \sum_{t \in c} \mathrm{rlv}(t \mid d)$$

Experiment – Medical Domain
  Ranked Terms (with English translations) and Ranked Concepts
  yes  Eingriff (operation, intervention)
         1. [Eingriff:c, Operation:c, Abtreibung, Biopsie, ...]
         2. [Eingreifen:c, Eingriff:c, Intervention:c]
  all  Infektion (infection)
         1. [Entzündung:c, Infektion:c, Infektionskrankheit:c, ...]
         2. [Ansteckung:c, Infektion:c, Übertragung:c]
  all  Studie (study, report)
         1. [Experiment:c, Studie:c, Test:c, Versuch:c, ...]
         2. [Abhandlung:c, Studie:c]
  all  Prophylaxe (prophylaxis)
         1. [Prophylaxe:c, Empfängnisverhütung, Impfung, Verhütung]
         2. [Prophylaxe:c, Vorbeugung:c, Vorsorge:c, ...]
  yes  Gewebe (tissue)
         1. [Gewebe:c, Körpergewebe:c, Bindegewebe, Tumor, ...]
         2. [Gewebe:c, Kleiderstoff:c, Stoff:c, Textilstoff:c, ...]
  all  Medizin (medicine)
         1. [Medizin:c, Chirurgie, Frauenheilkunde, Gynäkologie, ...]
         2. [Arznei:c, Arzneimittel:c, Heilmittel:c, Medikament:c, ...]
  yes  Gefäß (vascular, container)
         1. [Gefäß:c, Blutgefäß, Haargefäß, Herzkranzgefäß, Lymphgefäß]
         2. [Gefäß:c, Container, Form, Pokal, Schale, Schüssel, Tonne, ...]
  yes  Zelle (cell)
         1. [Zelle:c, Körperzelle, Pflanzenzelle]
         2. [Gefängniszelle:c, Zelle:c, Todeszelle]
  all  Einschränkung (constraint, restriction)
         1. [Beschränkung:c, Einschränkung:c, Vorbehalt:c]
         2. [Beschränkung:c, Degression:c, Drosselung:c, Einschränkung:c]
  all  Aufnahme (intake, reception)
         1. [Aufnahme:c, Aufzeichnung:c, Mitschnitt:c, Protokoll, ...]
         2. [Aufnahme:c, Beherbergung:c, Unterbringung:c, Notaufnahme, ...]
  yes  Sektion (section)
         1. [Autopsie:c, Leichenöffnung:c, Obduktion:c, Sektion:c]
         2. [Amtsbereich:c, Dezernat:c, Geschäftsbereich:c, Sektion:c, ...]
  all  Ausdehnung (spread, dimensions)
         1. [Ausdehnung:c, Rauminhalt:c, Volumen:c]
         2. [Ausdehnung:c, Ausweitung:c, Dehnung:c, Erweiterung:c, ...]
  yes  Geburt (birth, rebirth)
         1. [Geburt:c, Fehlgeburt, Frühgeburt]
         2. [Geburt:c, Wiedergeburt]
  yes  Abweichung (abnormality, divergence)
         1. [Abweichung:c, Differenz:c, Abnormität, Anomalie, ...]
         2. [Abweichung:c, Differenz:c, Meinungsverschiedenheit]
  yes  Probe (test, rehearsal)
         1. [Probe:c, Blutprobe, Gesteinsprobe, Urinprobe, Wasserprobe]
         2. [Bühnenprobe:c, Probe:c, Chorprobe, Generalprobe]

Related Recent Work
  Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant senses in untagged text. In Proc. of ACL 2004.
  Chan, Yee Seng and Ng, Hwee Tou (2005). Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI 2005.
  Mohammad, Saif and Hirst, Graeme. Determining word sense dominance using a thesaurus. Proc. of EACL 2006.

Day 2 - Introduction
  Words and Object Descriptions
  Semantics on the Semantic Web
  The Lexical Semantic Web
  Semantic Web, Ontologies and Natural Language Processing
  Knowledge Representation as Word Meaning
  A Lexicon Model for Ontologies
  Enriching Ontologies with Linguistic Information

Words and Object Descriptions
  Semantics on the Semantic Web
  The “Lexical Semantic Web”
  A Lexicon Model for Ontologies

Web Consists of Non-Interpreted Data
  [Figure: the Web as Text, Images, Tables, DBs]

Interpretation through Markup - Categories
  [Figure: Markup layer over the Web]

Interpretation through Markup – User Tags
  [Figure: Markup layer over the Web, “Web 2.0”]

Formal Interpretation - Knowledge Markup
  [Figure: Knowledge Markup based on Ontologies, Semantic Web]

Turns the Web into a Knowledge Base
  [Figure: Knowledge Markup, Ontologies]

Enables Semantic Web Services …
  [Figure: Semantic Web Services on top of Knowledge Markup and Ontologies]

… and Intelligent Man-Machine Interface
  [Figure: Intelligent Man-Machine Interface on top of Semantic Web Services, Knowledge Markup and Ontologies]

Semantic Web Layer cake

Resource Description Framework (RDF)
  [Figure: RDF graph in which node1 has the name DFKI GmbH, the location Kaiserslautern and a www link to http://www.dfki.de]

RDF : XML-based Representation
  <?xml version='1.0' ?>
  <rdf:RDF xmlns:rdf="… rdf-syntax-ns#"
           xmlns:rdfs="… rdf-schema#"
           xmlns="http://example.org">
    <rdf:Description rdf:nodeID="node1">
      <name>DFKI GmbH</name>
      <location>Kaiserslautern</location>
      <www rdf:resource="http://www.dfki.de" />
    </rdf:Description>
  </rdf:RDF>

RDF Schema (RDFS)
  Representation of classes and properties
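  A minimal sketch of what class representation adds on top of plain RDF statements, written in Python with statements held as (subject, predicate, object) tuples. It assumes that the is-a links of the class diagram run from Student and Teacher to Person. The instance name anna and the helper functions are hypothetical and only serve the illustration; rdf:type and rdfs:subClassOf are the standard RDF/RDFS vocabulary terms.

```python
# Statements are (subject, predicate, object) triples.
# Class names follow the slide; the instance "anna" is a hypothetical example.
TRIPLES = {
    ("Student", "rdfs:subClassOf", "Person"),
    ("Teacher", "rdfs:subClassOf", "Person"),
    ("anna", "rdf:type", "Student"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf (transitively)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "rdfs:subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def types_of(instance, triples):
    """Return the asserted types of an instance plus all inherited ones."""
    direct = {obj for subj, pred, obj in triples
              if subj == instance and pred == "rdf:type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


if __name__ == "__main__":
    # anna is asserted to be a Student and inferred to be a Person.
    print(sorted(types_of("anna", TRIPLES)))
```

  The point of the sketch is only that a schema lets a consumer derive statements that were never written down explicitly; a real RDFS reasoner covers more entailment rules than the subclass rule shown here.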
  [Figure: class diagram with Student, Person, Teacher and Course, is-a links, and rdf:Literal values]

RDFS : XML-based Representation

Web Ontology Language (OWL)
  OWL adds further modelling vocabulary on top of RDFS, e.g.
    Class equivalence
    Property types (data vs. object property)
  Based on Description Logics, three versions
    OWL Lite
    OWL DL
    OWL Full

OWL
  Extended knowledge representation
  [Figure: class diagram with Student, Person, Teacher and Course, is-a and disjoint links, and rdf:Literal values]

OWL : XML-based Representation

XML – RDF – RDFS - OWL
  [Figure: layered overview. Syntax: XML. Semantics: XML Schema and Namespaces (interpretation context, data types); RDF; RDF Schema (formalization: class definition, properties); OWL (formalization: extended class definition, properties, property types)]

Ontologies – What they are
  Ontology refers to an engineering artifact
    a specific vocabulary used to describe a certain reality
    a set of explicit assumptions regarding the intended meaning of the vocabulary
  An Ontology is
    an explicit specification of a conceptualization [Gruber 93]
    a shared understanding of a domain of interest [Uschold/Gruninger 96]

Ontologies – Why you need them
  Make domain assumptions explicit
    Easier to exchange domain assumptions
    Easier to understand and update legacy data
  Separate domain knowledge from operational knowledge
    Re-use domain and operational knowledge
    separately
  A community reference for applications
  Shared understanding of what particular information means

Applications of Ontologies
  NLP
    Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz
    Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)
    Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)
    Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight
  Other
    Business Process Modeling, e.g. Uschold et al. 98
    Digital Libraries, e.g. Amann & Fundulaki 99
    Information Integration, e.g. Kashyap 99; Wiederhold 92
    Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97
    Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99
    User Interfaces, e.g. Kesseler 96

Ontologies and Their Relatives
  [Figure: ontologies and related resources: Catalogs, Thesauri, Glossaries & Terminologies, Semantic Networks, Formal is-a, Formal Instance, General logical constraints, Axioms (Disjoint/Inverse…)]

Thesauri – Examples : EuroVoc
  EuroVoc covers terminology in all of the official EU languages for all fields (27) that concern the EU institutions, e.g.
  politics, trade, law, science, energy, agriculture
    MT | 3606 natural and applied sciences
    UF | gene pool, genetic resource, genetic stock, genotype, heredity
    BT1 | biology
    BT2 | life sciences
    NT1 | DNA
    NT1 | eugenics
    RT | genetic engineering (6411)

Thesauri – Examples : MeSH
  MeSH (Medical Subject Headings) is organized by terms (~ 250,000) that correspond to medical subjects
  For each term, syntactic, morphological or semantic variants are given
    MeSH Heading | Databases, Genetic
    Entry Term | Genetic Databases
    Entry Term | Genetic Sequence Databases
    Entry Term | OMIM
    Entry Term | Online Mendelian Inheritance in Man
    Entry Term | Genetic Data Banks
    Entry Term | Genetic Data Bases
    Entry Term | Genetic Databanks
    Entry Term | Genetic Information Databases
    See Also | Genetic Screening

Semantic Networks - Examples : UMLS
  The Unified Medical Language System integrates linguistic, terminological and semantic information
  The Semantic Network consists of 134 semantic types and 54 relations between types
    Pharmacologic Substance | affects | Pathologic Function
    Pharmacologic Substance | causes | Pathologic Function
    Pharmacologic Substance | complicates | Pathologic Function
    Pharmacologic Substance | diagnoses | Pathologic Function
    Pharmacologic Substance | prevents | Pathologic Function
    Pharmacologic Substance | treats | Pathologic Function

Semantic Networks - Examples : GO
  GO (Gene Ontology) aligns descriptions of gene products in different databases, including plant, animal and microbial genomes
  Organizing principles are molecular function, biological process and cellular component
    Accession: GO:0009292
    Ontology: biological process
    Synonyms: broad: genetic exchange
    Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
    Term Lineage
      all : all (164142)
      GO:0008150 : biological process (115947)
      GO:0007275 : development (11892)
      GO:0009292 : genetic transfer (69)

Ontologies – Example I

Ontologies – Example II
  [Figure: F-Logic ontology. Concepts: Geographical Entity (GE), Inhabited GE, Natural GE, city, country, mountain, river. Link labels: is-a, instance_of, capital_of, located_in, flow_through, similar. Instances: Zugspitze (height 2962 m), Neckar (length 367 km), Germany, Stuttgart, Berlin. Design: Philipp Cimiano]

Ontologies for NLP
  Information Retrieval
    Query Expansion
  Machine Translation
    Interlingua
  Information Extraction
    Template Definition
    Semantic Integration
  Question Answering
    Question Analysis
    Answer Selection

Information Extraction
  Class-based Template Definition
    Allows for reasoning over extracted templates with respect to the ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)
  Semantic Integration
    Extraction from heterogeneous sources (text, tables and other semi-structured data, image captions) – SmartWeb [Buitelaar et al. 06]
    Multi-document information extraction – ArtEquAKT [Alani et al. 2003]

Question Answering
  Question Analysis
    Ontology/WordNet-based semantic question interpretation (e.g.
[Pasca and Harabagiu 01])
Answer Selection
  Ontology/WordNet-based Reasoning for Answer Type-Checking
    Ontology of Events [Sinha and Narayanan 05]
    Geographical Ontology, WordNet [Schlobach & de Rijke 04]
    WordNet [Pasca and Harabagiu 01]
Ontology-based Question Answering
  Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])

Ontology Life Cycle
  Populate: Knowledge Base Generation
  Validate: Consistency Checks
  Create/Select: Development and/or Selection
  Evolve: Extension, Modification
  Deploy: Knowledge Retrieval
  Maintain: Usability Tests

NLP in the Ontology Life Cycle
[Figure labels: Ontology Population; Information Extraction; KB Retrieval; Ontology Learning; Question Answering; Text Mining]

Ontology Learning
  General Axioms: $\forall x\,(\text{country}(x) \rightarrow \exists y\;\text{capital\_of}(y, x) \land \forall z\,(\text{capital\_of}(z, x) \rightarrow y = z))$
  Axiom Schemata: $\text{disjoint}(\text{river}, \text{mountain})$
  Relation Hierarchy: $\text{capital\_of} \leq_R \text{located\_in}$
  Relations: $\text{flow\_through}(\text{dom}: \text{river}, \text{range}: \text{GE})$
  Concept Hierarchy: $\text{capital} \leq_C \text{city}$, $\text{city} \leq_C \text{Inhabited GE}$
  Concept Formation: $c := \text{country} := \langle i(c), \|c\|, \mathrm{Ref}_C(c) \rangle$
  (Multilingual) Synonyms: {country, nation, Land}
  Terms: river, country, nation, city, capital, ...
Design: Philipp Cimiano

Words and Object Descriptions
  Semantics on the Semantic Web
  The “Lexical Semantic Web”
  A Lexicon Model for Ontologies

Dictionary: Words and Senses
Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g. article
1. An individual thing or element of a class…
2.
A particular section or item of a series in a written document…
3. A non-fictional literary composition that forms an independent part of a publication…
4. The part of speech used to indicate nouns and to specify their application
5. A particular part or subject; a specific matter or point
(as provided by http://dictionary.reference.com/)

Ontology: Classes and Labels - I
Ontologies assign labels (i.e. words) to a given class
In the COMMA ontology on document management the class article corresponds to sense 2 (‘section of a written document’):
http://pauillac.inria.fr/cdrom/ftp/ocomma/comma.rdfs

Ontology: Classes and Labels - II
In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’):
http://emeld.org/gold

The Meaning of Director - I
The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g.
director
… as a ‘role’ (AgentCities ontology)
http://www-agentcities.doc.ic.ac.uk/ontology/shows.daml

The Meaning of Director - II
… as ‘head of a program’ (University Benchmark ontology)
http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl

Exploring the Lexical Semantic Web
Collect ontologies
  OntoSelect
Analyse the use of class/property labels
  Treat class/property labels as lexical entries
  Normalize
  Organize by language

Ontology Collection
OntoSelect
  Web Monitor on DAML, RDFS, OWL Files
  Download, Analyze and Store Included Information and Metadata
    Class and Property Labels
    Multilingual Information
    Included Ontologies
  Ontology Ranking and Selection Functionalities
  http://olp.dfki.de/OntoSelect

OntoSelect

Multilinguality on the Semantic Web

Multilingual Labels

“Lexical Semantic Ambiguity”

Words and Object Descriptions
  Semantics on the Semantic Web
  The “Lexical Semantic Web”
  A Lexicon Model for Ontologies

Ontologies – Example III

Ontologies – Example III (continued)
[Figure. Classes: Student, Staff, University, Campus, “Fakultät”. Relations: studies_at, works_at, located_at, is_part_of]

Ontologies – Example III (continued)
[Figure. Classes: Student, Staff, University, Campus, “Fakultät”. Relations: studies_at, works_at, located_at, is_part_of. Terms attached to “Fakultät”: has_German_term Fakultät; has_Dutch_term Faculteit; has_US_English_term School]

Ontologies – Example III (continued)
[Figure. Classes: University, “Fakultät” (is_part_of), Term. “Fakultät” has_term instances of Term (instance_of): Fakultät (language DE); faculteit (language NL); school (language EN-US)]

Semiotic Triangle
Ogden & Richards, 1923
  based on Structural Linguistics studies (de Saussure, 1916)
  adopted in Knowledge Representation (e.g. Sowa, 1984)

LingInfo Model – Simplified
[Figure. Legend: rdf:type, rdfs:subClassOf, property, URI. Meta-classes: feat:ClassWithFeats (rdfs:subClassOf rdfs:Class). Classes: o:FootballPlayer, o:Defender, o:Midfielder, lf:LingFeat, if:ImgFeat. Properties: feat:lingFeat, feat:imgFeat. Instances: lf:LingFeat (lf:lang “de”, lf:term “Abwehrspieler” …); lf:LingFeat (lf:lang “de”, lf:term “Mittelfeldspieler” …); if:ImgFeat (if:color “#111111”, lf:texture “&keypatchSet_223 …). Design: Michael Sintek]

LingInfo Model

LingInfo Instances - Example
Fußballspielers „of the football player“
  inst0 : LingInfo
    lang de; term Fußballspielers; morphSynDecomp
  inst1 : InflectedWordForm
    case genitive; gender male; number singular; orthographicForm Fußballspielers; partOfSpeech Noun; wordForm …
  inst2 : Stem
    case nominative; gender male; number singular; orthographicForm Fußballspieler; partOfSpeech Noun; isComposedOf …
  inst3 : Stem
    analysisIndex 1; orthographicForm Fußball ...; isComposedOf; function modifier; root; semantics
  inst8 : Stem
    analysisIndex 2; orthographicForm Spieler …; root …
  inst1 : Root
    orthographicForm Spieler …
  inst7 : Stem (Ball); inst4 : Root (Ball); inst5 : Stem (Fuß); inst6 : Root (Fuß)
  o:BallObject

LingInfo Predicate-Arg Structure
Design: Anette Frank

Conclusions
WordNet: Appropriate Use may include
  Introduction of underspecified senses (sense grouping)
  Tuning to a domain
The “Lexical Semantic Web”
  The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics
  Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) only just started
  Ontologies to be extended with linguistic/lexical information
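
The last point, extending ontologies with linguistic/lexical information, can be illustrated with a minimal Python sketch of the term model from Example III, in which terms are separate objects that carry a language tag and are attached to a class. This is only an illustration under stated assumptions: the names `Term`, `OntologyClass` and `terms_for` are hypothetical and are not part of LingInfo or of any ontology toolkit; only the class “Fakultät” and its three terms with the language tags DE, NL and EN-US are taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Term:
    """A lexical entry: a surface form plus its language tag (hypothetical type)."""
    form: str
    language: str


@dataclass
class OntologyClass:
    """An ontology class whose terms are kept as separate objects (hypothetical type)."""
    name: str
    terms: List[Term] = field(default_factory=list)

    def terms_for(self, language: str) -> List[str]:
        """Return the surface forms recorded for one language tag."""
        return [t.form for t in self.terms if t.language == language]


# Data from Example III: one class, three language-specific terms.
faculty = OntologyClass(
    name="Fakultät",
    terms=[
        Term("Fakultät", "DE"),
        Term("faculteit", "NL"),
        Term("school", "EN-US"),
    ],
)

if __name__ == "__main__":
    for tag in ("DE", "NL", "EN-US"):
        print(tag, faculty.terms_for(tag))
```

Keeping terms as objects of their own, rather than as one property per language on the class, means that adding a language only adds data, and that further linguistic attributes (for example part of speech or morphological decomposition, as in the LingInfo instances) would have a place to go.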