Ontologies
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

Ontologies: Outline
• WordNet (Miller et al. 90, Fellbaum 98)
• EuroWordNet (Vossen et al. 98)
• Spanish WordNet
• Combining Methods (Atserias et al. 97)
• Mapping hierarchies (Daudé et al. 01)
• Mikrokosmos (Viegas et al. 96)
• Cyc (Malesh et al. 96)
• WordNet 2 (Harabagiu 98)
• MindNet (Richardson et al. 97)
• ThoughtTreasure (Mueller 00)
• Meaning ...

WordNet & EuroWordNet

WordNet
• Princeton University (Miller et al. 1990)
• Lexicalized concepts (words, lexical units)
• Linked to one another by semantic relations
  – synonymy
  – antonymy
  – hypernymy / hyponymy
  – meronymy
  – entailment
  – cause
  – ...

Semantic Relations in WN1.5
• Synonymy
  – Lexicalized concepts (SYNSETS)
  – Weak notion of synonymy: synonymy in context
  – Synset: a set of words or lexical units that express one concept in a given context
• Hypernymy / Hyponymy
  – Class-to-subclass relation
• Meronymy
  – Component part: {hand} {arm}
  – Member of a collection: {person} {people}
  – Substance: {newspaper} {paper}
• Antonymy: {big} {small}
• Cause: {kill} {die}
• Entailment: {divorce} {marry}
• Derivation: {presidential} {president}
• Similarity: {good} {positive}

WordNet Example
[Figure: linked synsets <conveyance>, <vehicle>, <motor vehicle, automobile, ...>, <doorlock>, <car door>, <cruiser, squad car, patrol car, ...>, <cab, taxi, hack, ...>.]

EuroWordNet
• Project LE-2 4003
• Telematics Application Programme of the EU
• Semantic networks for several languages
• Integrated and interconnected
  – English: University of Sheffield
  – Dutch: University of Amsterdam
  – Italian: I.L.C., Pisa
  – Spanish: UB, UPC, UNED
• Computers and the Humanities (special issue, 1998)
• http://www.hum.uva.nl/~ewn/

EuroWordNet Extensions
• EWN2: German, French, Czech, Swedish, Estonian
• ITEM project: Spanish, Catalan, Basque
• CREL (Centre de Referència d’Enginyeria Lingüística): Catalan (UB, UPC)

Applications
• Development of basic resources
• Cross-lingual information processing
  – Multilingual information retrieval systems (e.g., Internet)
  – Lexical-semantic module of language engineering systems: information extraction, machine translation

Design Requirements
• Preserve the semantic relations specific to each language
• Maximum compatibility between the different resources
• Relative independence of the wordnets
  – in the construction process
  – in the final result

Components of EuroWordNet
• Core
  – The Interlingual Index (ILI)
  – The Top Concept Ontology (TCO)
  – The Domain Ontology (DO)
• Periphery
  – Language-specific wordnets

Interlingual Index of EuroWordNet
• Unstructured collection of elements
• Linked to
  – at least one synset of an EWN wordnet
  – an element of the TCO or DO
• Associated with WN 1.5 synsets

Top Concept Ontology of EuroWordNet
• Hierarchy of language-independent concepts
  – semantic distinctions: object, place, dynamic, …
  – abstract (not lexical)
• Superimposed on the ILI
• Three types of entities:
  – First order: concrete entities
  – Second order: static or dynamic situations
  – Third order: abstract propositions

Top Concept Ontology of EuroWordNet
[Figure: tree of top concepts, each node labelled with a count. Counts are given below only where they are attached to the node name in the extracted text.
 Top 0; 1stOrderEntity 1; 2ndOrderEntity 0; 3rdOrderEntity 33.
 Under 1stOrderEntity: Origin 0, Natural, Living, Plant 18, Human 106, Creature 2, Animal 23, Artifact 144, Form, Substance 32, Solid 63, Liquid 13, Gas 1, Object, Composition 0, Part 86, Group 63, Function 55, Vehicle 8.
 Under 2ndOrderEntity: SituationType 6, Dynamic 134, BoundedEvent 183, UnboundedEvent 48, Static, Property 61, Relation 38, SituationComponent 0, Cause 67, Agentive 170, Phenomenal 17, Stimulating 25, Communication 50, Condition 62, Existence 27, Experience 43, Location 76, Manner 21, Mental 90.]

Domain Ontology of EuroWordNet
• Hierarchy of domain labels
• Reduces polysemy
• Domains:
  – Traffic: road traffic, air traffic
  – International information
  – Mycology
  – Medicine

EuroWordNet Relations
• Richer than those of WN
• They hold between:
  – synsets (monolingual modules)
  – ILI records (multilingual): {actuar-1} EQ-SYNONYM {‘behave in a certain manner’}
  – ILI records and the TCO or DO

Interlingual Relations of EuroWordNet
[Figure: language-specific synsets (English: finger, toe, head; Italian: dito; Spanish: dedo, cabeza; Dutch: hoofd, kop) linked to ILI records (finger, toe, finger or toe, head, human head, animal head) by eq_synonym, has_eq_hyponym and has_eq_hypernym.]

EuroWordNet Relations
relation | applicable labels | example | description
HAS_XPOS_HYPERONYM | dcr | destrucción > cambiar (destruction > to change) | cross-category hypernymy
HAS_XPOS_HYPONYM | dr | cambio > destruir (change > to destroy) | cross-category hyponymy
NEAR_SYNONYM | r | aparato <> instrumento (apparatus <> instrument) | near-synonymy
NEAR_ANTONYM | r | construir <> destrozar (to build <> to wreck) | near-antonymy
XPOS_NEAR_ANTONYM | r | construcción > destrozar (construction > to wreck) | cross-category near-antonymy
INVOLVED | dcr | martillear > martillo (to hammer > hammer) | entity directly related to an event
ROLE | dcr | vino > beber (wine > to drink) | event directly related to an entity
involved_agent | dcr | educar > educador (to educate > educator) | involvement in which the entity plays an agentive role
role_agent | dcr | educador > educar (educator > to educate) | role in which the entity plays an agentive role
role_location | dcr | comedor > comer (dining room > to eat) | role in which the entity plays a locative role
HAS_MERONYM | dcrn | cara > nariz (face > nose) | meronymy (generic)
has_mero_portion | dcrn | pan > mendrugo (bread > crust) | inverse of the previous one
has_mero_location | dcrn | desierto > oasis (desert > oasis) | inverse of the previous one
BE_IN_STATE | dcrn | belleza > bello (beauty > beautiful) | state corresponding to the possession of a certain property
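As a rough illustration of how the Interlingual Index connects wordnets, the sketch below stores links from language-specific synsets to ILI records and looks up equivalents in another language through a shared record. The data, the `LINKS` layout, and the function names are all assumptions made for this example (the Dutch links follow one reading of the interlingual-relations figure); none of it comes from the EuroWordNet database or its tools.

```python
from collections import defaultdict

# Toy links, invented for illustration: (language, synset, relation, ILI record).
# The Dutch entries are one possible reading of the interlingual-relations figure.
LINKS = [
    ("es", "cabeza", "eq_synonym", "head"),
    ("en", "head", "eq_synonym", "head"),
    ("nl", "hoofd", "has_eq_hypernym", "head"),
    ("nl", "kop", "has_eq_hypernym", "head"),
    ("es", "actuar-1", "eq_synonym", "behave in a certain manner"),
]


def build_index(links):
    """Group links by ILI record; the ILI itself stays an unstructured collection."""
    index = defaultdict(list)
    for lang, synset, relation, ili in links:
        index[ili].append((lang, synset, relation))
    return index


def equivalents(index, lang, synset, target_lang):
    """Return (synset, relation) pairs in target_lang that share an ILI record
    with the given synset."""
    results = []
    for entries in index.values():
        if any(l == lang and s == synset for l, s, _ in entries):
            results.extend((s, r) for l, s, r in entries if l == target_lang)
    return results


index = build_index(LINKS)
print(equivalents(index, "es", "cabeza", "nl"))
print(equivalents(index, "es", "cabeza", "en"))
```

Because no wordnet points directly at another, adding a language in this sketch only requires adding its own links to ILI records, which mirrors the "relative independence" design requirement.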
Spanish WordNet: Building Process

General Methodology
1) Mapping to WN1.5
   – manual work
   – automatic derivation of equivalents, using bilingual dictionaries
2) Manual correction
3) Re-structuring

Main Steps: First Core (Manual Translation)
– Nouns:
  A) WN1.5’s Tops file plus the first level of hyponyms (about 800 synsets).
  B) The rest of EWN’s Common Base Concepts (those not already in our set).
  C) Manual translation of the synsets lying between (A) and (B) in the WN1.5 hierarchy, building a compact taxonomy equivalent to WN1.5 without gaps.
– Verbs: manual translation of EWN’s Base Concepts (about 150 synsets).

Main Steps: Subset 1 (Semi-automatic)
Nouns:
– Applying automatic methods using bilingual dictionaries
– Manually validating several subsets to check whether each link is correct
– Deriving a Confidence Score (CS) for every automatic method (heuristic)
– Selecting synset-word pairs with a CS above 85%
– Some manual correction of this Subset 1 (mainly filling gaps)
Verbs:
– 3600 English verbs connected to WN1.5 senses and ambiguously translated into Spanish are manually inspected and disambiguated

Main Steps: Subset 1 (Results 1)
 | Nouns | Verbs | Others | Total
Synsets | 18577 | 2602 | 0 | 21179
Number of senses (variants) | 39620 | 6795 | 0 | 46415
Mean variants per synset | 2.22 | 2.61 | 0 | 2.27
Corresponding number of entries (words) | 23216 | 2278 | 0 | 25494
Mean senses per word | 1.77 | 2.98 | 0 | 1.88
Language Internal Relations | 40559 | 3749 | 0 | 44308
Average per synset | 2.18 | 1.44 | 0 | 2.09
Equivalent Relations to ILI (WN1.5) | 18634 | 2602 | 0 | 21236
Average per synset | 1.00 | 1.00 | 0 | 1.00
Synsets without ILI | 0 | 0 | 0 | 0
Percentage of synsets without translation | 0% | 0% | – | 0%

Main Steps: Subset 1 (Results 2)
CS | Nouns | Verbs | Total
100% (manual) | 5041 | 6795 | 11836
> 97% | 403 | 0 | 403
> 95% | 304 | 0 | 304
> 93% | 1598 | 0 | 1598
> 86% | 27649 | 0 | 27649
> 85% | 4625 | 0 | 4625
Total | 39620 | 6795 | 46415

Main Steps: Subset 2
Main goals
– enhance the quality of Subset 1 by manual revision
– extend it by building synsets manually
4 sub-tasks:
1) Manually covering the gaps in the hyponymy chains that are covered by other languages
2) Manual cleaning of some automatically generated variants
   – (a) pairs of synsets that are adjacent in the hyponymy chain and share at least one variant: deleting redundant variants, or relocating them to either pre-existing or newly created synsets
   – (b) multi-word expressions present in synsets: deleting the non-lexicalized ones
3) Manual addition of new vocabulary considered relevant
   – It comes mainly from the Catalan WordNet: since both wordnets are built in parallel, we detected the synsets that had been built for Catalan but not for Spanish
4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets
   – This work is based mainly on noun-verb pairs obtained by morphological criteria (work carried out by UNED, Madrid)

Main Steps: Subset 2 (Results)
 | Nouns | Verbs | Others | Total
Synsets | 19663 | 3538 | 0 | 23201
Number of senses (variants) | 39782 | 8394 | 0 | 48176
Mean variants per synset | 2.02 | 2.37 | 0 | 2.08
Corresponding number of entries (words) | 22881 | 3324 | 0 | 26205
Mean senses per word | 1.74 | 2.53 | 0 | 1.84
Language Internal Relations | 43151 | 6756 | 2661 | 52568
Average per synset | 2.19 | 1.91 | ? | 2.27
Equivalent Relations to ILI (WN1.5) | 19534 | 3534 | 0 | 23068
Average per synset | 0.99 | 1.00 | 0 | 0.99
Synsets without ILI | 185 | 4 | 0 | 189
Percentage of synsets without translation | 1% | 0% | 0 | 1%

Main Steps: Subset 2 (Results)
Confidence (variants) | Nouns | Verbs | Total
100% (manual) | 7819 | 8394 | 16213
> 96% | 382 | 0 | 382
> 94% | 2948 | 0 | 2948
> 92% | 1364 | 0 | 1364
> 85% | 23113 | 0 | 23113
> 84% | 4156 | 0 | 4156
Total | 39782 | 8394 | 48176

Main Steps: Beyond Subset 2
Massive manual checking (from Nov’98)
– Using WEI
– Automatically generated variants
– Filling gaps in the hierarchy
– New vocabulary
– New adjectives

Main Steps: Beyond Subset 2
 | Noun | Verb | Others | Total
Synsets | 24215 | 4079 | 2191 | 30485
No. of senses | 40759 | 9317 | 2439 | 52515
Senses/synset | 1.68 | 2.28 | 1.11 | 1.72
Entries | 26485 | 3828 | 2439 | 32752
Senses/entry | 1.54 | 2.43 | 1.00 | 1.60
LIRels | 54832 | 7978 | 10855 | 73665
LIRels/synset | 2.26 | 1.96 | * | 2.42
EQRels-ILI | 24209 | 4074 | 0 | 28283
EQRels/synset | 1.00 | 1.00 | 0 | 0.93
Synsets without ILI | 62 | 5 | 2191 | 2258

Main Steps: Beyond Subset 2
CS | Nouns | Verbs | Adjectives | Total
99% (manual) | 16568 | 9317 | 2439 | 28324
97% | 310 | 0 | 0 | 310
95% | 2652 | 0 | 0 | 2652
93% | 1173 | 0 | 0 | 1173
90% | 6 | 0 | 0 | 6
86% | 16605 | 0 | 0 | 16605
85% | 3445 | 0 | 0 | 3445
Total | 40759 | 9317 | 2439 | 52515

Main Steps: Parole Coverage
Frequency | Nouns: Parole entries | Nouns: covered | Nouns: % coverage | Verbs: Parole entries | Verbs: covered | Verbs: % coverage
1001 or more | 147 | 143 | 97.28 | 110 | 107 | 97.27
501-1000 | 261 | 246 | 94.25 | 139 | 118 | 84.89
251-500 | 462 | 429 | 92.86 | 218 | 172 | 78.90
101-250 | 933 | 863 | 92.50 | 381 | 257 | 67.45
51-100 | 959 | 863 | 89.99 | 374 | 265 | 70.86
31-50 | 892 | 804 | 90.13 | 347 | 185 | 53.31
21-30 | 730 | 632 | 86.57 | 286 | 141 | 49.30
11-20 | 1202 | 978 | 81.36 | 469 | 175 | 37.31
6-10 | 1024 | 790 | 77.15 | 360 | 129 | 35.83
3-5 | 968 | 665 | 68.70 | 254 | 74 | 29.13
2 | 435 | 257 | 59.08 | 123 | 32 | 26.02
1 | 643 | 334 | 51.94 | 131 | 26 | 19.85
Overall | 8656 | 7004 | 80.91 | 3192 | 1681 | 52.66

Current Figures
– Spanish, Catalan, Basque, (English)
– http://nipadio.lsi.upc.es/wei2.html
 | Noun synsets | Noun words | Verb synsets | Verb words | Adj. synsets | Adj. words
English | 60556 | 87641 | 11363 | 14727 | 16428 | 19101
Spanish | 43522 | 47665 | 7934 | 5312 | 12481 | 8762
Catalan | 30701 | 32987 | 4505 | 4285 | 1444 | 1561

Combining
Multiple Methods for the Automatic Construction of Multilingual WordNets

Outline
Ten class methods
– Four monosemic criteria
– Four polysemic criteria
– Two hybrid criteria
Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations

Ten class methods
– Four classes
  [Figure: the four classes of links between Spanish words (SW) and English words (EW).]
– Four monosemic criteria
  [Figure: the four classes, with each EW attached to a single synset.]
– Four polysemic criteria
  [Figure: the four classes, with each EW attached to several synsets (Synset+).]
– Variant criterion: <..., EW, ..., EW, ...> SW
– Field criterion: <..., headword-EW, ..., Ind-EW, ...> SW

Ten class methods: Results
Criterion | #links | #synsets | #words | %ok
mono1 | 3697 | 3583 | 3697 | 92
mono2 | 935 | 929 | 661 | 89
mono3 | 1863 | 1158 | 1863 | 89
mono4 | 2688 | 1328 | 2063 | 85
poly1 | 5121 | 4887 | 1992 | 80
poly2 | 1450 | 1426 | 449 | 75
poly3 | 11687 | 6611 | 3165 | 58
poly4 | 40298 | 9400 | 3754 | 61
Variant | 3164 | 2195 | 2261 | 85
Field | 510 | 379 | 421 | 78

Conceptual Distance methods
Conceptual Distance (Agirre et al. 94)
– length of the shortest path
– specificity of the concepts

$$dist(w_1, w_2) = \min_{c_{1i} \in w_1,\; c_{2j} \in w_2} \; \sum_{c_k \in path(c_{1i}, c_{2j})} \frac{1}{depth(c_k)}$$

– using WordNet and a bilingual dictionary

Conceptual Distance methods
Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations
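To make the conceptual distance measure concrete, here is a minimal sketch over a toy taxonomy. The taxonomy, the word-to-concept table, and all function names are invented for illustration. Two further assumptions are made explicit in the comments: the root has depth 1, and the path between two concepts includes both endpoints, so a concept compared with itself gets distance 1/depth rather than 0.

```python
# Toy taxonomy (child -> parent), invented for illustration only.
PARENT = {
    "entity": None,
    "object": "entity",
    "artifact": "object",
    "building": "artifact",
    "church": "building",
    "monastery": "building",
    "animal": "entity",
    "bird": "animal",
}

# Hypothetical word -> candidate concepts table.
SENSES = {
    "abbey": ["church", "monastery"],
    "temple": ["church"],
    "fowl": ["bird"],
}


def ancestors(concept):
    """The concept followed by its ancestors, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth in the taxonomy; the root is assumed to have depth 1."""
    return len(ancestors(concept))


def path(c1, c2):
    """Concepts on the shortest path from c1 to c2, endpoints included."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(c for c in up1 if c in up2)  # lowest common ancestor
    return up1[: up1.index(common) + 1] + up2[: up2.index(common)][::-1]


def concept_distance(c1, c2):
    """Sum of 1/depth over the path: deeper (more specific) nodes cost less."""
    return sum(1.0 / depth(c) for c in path(c1, c2))


def word_distance(w1, w2):
    """Minimum concept distance over all pairs of candidate concepts."""
    return min(concept_distance(a, b) for a in SENSES[w1] for b in SENSES[w2])


print(word_distance("abbey", "temple"))
print(word_distance("abbey", "fowl"))
```

In this toy setting "abbey" comes out much closer to "temple" than to "fowl", because the path to "bird" has to climb through shallow, general nodes whose 1/depth terms are large.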
Conceptual Distance methods (Example CD2)
[Figure: WordNet fragment leading from <entity> to three <abbey> synsets, through <object, ...>, <artifact, artefact>, <structure, construction>, <building, edifice>, <house, lodging>, <place of worship, ...>, <church, church building>, <religious residence, cloiser>, <convent> and <monastery>; the fragment is labelled 06 ARTIFACT.]
abadía_1_2: Iglesia o monasterio regido por un abad o abadesa (abbey: a church or a monastery ruled by an abbot or an abbess)

Three CD methods: Results
Criterion | #links | #synsets | #words | %ok
CD-1 | 23,828 | 11,269 | 7,283 | 56
CD-2 | 24,739 | 12,709 | 10,300 | 61
CD-3 | 4,567 | 3,089 | 2,313 | 75

Combining methods: Results
Each cell gives the size of the intersection of method1 and method2, with %ok in parentheses; "–" marks combinations not reported.
method1 \ method2 | cd2 | cd3 | p1 | p2 | p3 | p4
cd1 | 15736 (79) | 1849 (85) | 2076 (86) | 556 (86) | 3146 (72) | 15105 (64)
cd2 | – | 2401 (86) | 2536 (88) | 592 (86) | 3777 (75) | 13246 (67)
cd3 | – | – | 205 (95) | 180 (95) | 215 (100) | 3114 (77)
p1 | – | – | – | – | 77 (100) | 178 (88)
p2 | – | – | – | – | 28 (77) | 78 (96)
Resulting Spanish WordNets
WNs | #links | #synsets | #words | CS | #poly links
SpWN v0.0 | 10,982 | 7,131 | 8,396 | 87.4 | 1,777
Combination | 7,244 | 5,852 | 3,939 | 85.6 | 2,075
SpWN v0.1 | 15,535 | 10,786 | 9,986 | 86.4 | 3,373

Mapping Conceptual Hierarchies Using Relaxation Labelling

Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

Setting
[Figure: concepts C1 to C6.]

Setting
Connecting already existing hierarchies
– Relaxation labelling algorithm
– Constraints
Between
– a Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
– WordNet
using a bilingual MRD

Setting
Spanish taxonomy senses and their candidate WordNet synsets:
– animal: (Tops <animal, animate_being, ...>), (person <beast, brute, ...>), (person <dunce, blockhead, ...>)
– ave: (animal <bird>), (artifact <bird, shuttle, ...>), (food <fowl, poultry, ...>), (person <dame, doll, ...>)
– faisán: (animal <pheasant>), (food <pheasant>)
– rapaz: (animal <bird>), (artifact <bird, shuttle, ...>), (food <fowl, poultry, ...>), (person <dame, doll, ...>)

Relaxation Labelling Algorithm
– Iterative algorithm for function optimization based on local information
– It can deal with any kind of constraints
  – variables (senses of the taxonomy)
  – labels (synsets)
– Finds a weight assignment for each possible label of each variable
  – the weights for the labels of the same variable add up to one
  – the weight assignment satisfies the set of constraints to the maximum possible extent

Relaxation Labelling Algorithm
1) Start with a random weight assignment.
2) Compute the support value for each label of each variable (according to the constraints).
3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels.
4) If a stopping/convergence criterion is satisfied, stop; otherwise go to step 2.

Constraints
– Rely on the taxonomy structure
– Coded with three characters XYZ
  X: Spanish taxonomy, I (immediate) or A (ancestor)
  Y: English taxonomy, I (immediate) or A (ancestor)
  Z: relation, E (hypernym), O (hyponym) or B (both)
– Examples: IIE, AAB
  [Figure: diagrams of the IIE and AAB constraints.]

Hierarchical Constraints (NAACL’2001)
– II constraints: IIE, IIO, IIB
– AI constraints: AIE, AIO, AIB
– IA constraints: IAE, IAO, IAB
– AA constraints: AAE, AAO, AAB
[Figures: one diagram per constraint, marking the supporting links with +.]

Experiments & Results I (multilingual)

Combining Multiple Methods ... RANLP’97
Eight class methods
– Four monosemic criteria
  Criterion | Prec. | Cov.
  mono1 | 92% | 5%
  mono2 | 89% | 1%
  mono3 | 89% | 2%
  mono4 | 85% | 4%
– Four polysemic criteria
  Criterion | Prec. | Cov.
  poly1 | 80% | 8%
  poly2 | 75% | 2%
  poly3 | 58% | 17%
  poly4 | 61% | 60%

Experiments & Results
Poly | TOK, FOK | TOK, FNOK | total
animal | 279 | 30 (91%) | 209
food | 166 | 3 (100%) | 169
cognition | 198 | 27 (90%) | 225
communication | 533 | 40 (97%) | 573

All | TOK, FOK | TOK, FNOK | total
animal | 424 | 62 (95%) | 486
food | 166 | 83 (100%) | 249
cognition | 200 | 245 (90%) | 445
communication | 536 | 234 (97%) | 760

Percentages for the TOK, FOK and total columns, in extraction order (cell positions lost): (90%) (94%) (67%) (77%); (93%) (94%) (67%) (77%); (90%) (94%) (69%) (78%); (90%) (96%) (82%) (81%)

Experiments & Results
– piel: (substance <skin, fur, peel>)
– marta: (substance <sable, marte, coal_back>)
– visón: (substance <mink, mink_coat>)

Experiments & Results II (monolingual)

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01

Generalized Constraints
All relationships
– also-see, similar-to, attribute, antonym, etc.
[Figure: constraint diagram for a generic relation R.]

Generalized Constraints
Non-structural constraints
– W: number of word coincidences
– G: word coincidences in glosses
– F: number of frame coincidences (verbs)

POS mapping dependencies
[Figure: dependencies between the mappings of Nouns, Adjectives, Verbs and Adverbs.]

Constraints for Verbs
Structural constraints
– hyper/hyponymy
– antonymy
– also-see
Non-structural constraints
– W, G and F
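The relaxation labelling loop described earlier in this section (initial weights, support computation, weight update, convergence test) can be sketched as follows. This is only an illustration under stated assumptions: the candidate labels loosely follow the animal/ave/faisán slide, the Spanish and WordNet hypernym links are toy data, the single constraint is a simplified immediate-immediate (II) check, the start is uniform rather than random so that the run is reproducible, and the update rule `w * (1 + support)` followed by normalisation is one common choice, not necessarily the one used in the cited work.

```python
# Toy problem, invented for illustration (labels loosely follow the slides).
CANDIDATES = {
    "animal": ["Tops:animal", "person:beast", "person:dunce"],
    "ave": ["animal:bird", "artifact:bird", "food:fowl", "person:dame"],
    "faisán": ["animal:pheasant", "food:pheasant"],
}
# Assumed hypernym links (child -> parent) in each hierarchy.
SPANISH_HYPERNYM = {"ave": "animal", "faisán": "ave"}
WN_HYPERNYM = {
    "animal:bird": "Tops:animal",
    "animal:pheasant": "animal:bird",
    "food:pheasant": "food:fowl",
}


def support(var, label, weights):
    """Simplified II constraint: a label is supported when its WordNet
    hypernym (or hyponym) is a candidate of the Spanish hypernym (or
    hyponym), in proportion to that candidate's current weight."""
    total = 0.0
    parent = SPANISH_HYPERNYM.get(var)
    if parent is not None:
        total += weights[parent].get(WN_HYPERNYM.get(label), 0.0)
    for child, child_parent in SPANISH_HYPERNYM.items():
        if child_parent == var:
            for child_label, w in weights[child].items():
                if WN_HYPERNYM.get(child_label) == label:
                    total += w
    return total


def relax(max_iterations=50, eps=1e-6):
    # Step 1: initial assignment (uniform here; the slides use a random one).
    weights = {v: {l: 1.0 / len(ls) for l in ls} for v, ls in CANDIDATES.items()}
    for _ in range(max_iterations):
        new_weights = {}
        for var, labels in weights.items():
            # Steps 2 and 3: compute support, then reward supported labels.
            raw = {l: w * (1.0 + support(var, l, weights)) for l, w in labels.items()}
            norm = sum(raw.values())
            new_weights[var] = {l: x / norm for l, x in raw.items()}
        change = max(
            abs(new_weights[v][l] - weights[v][l]) for v in weights for l in weights[v]
        )
        weights = new_weights
        if change < eps:  # Step 4: stop once the weights no longer move.
            break
    return weights


result = relax()
for var, labels in result.items():
    print(var, max(labels, key=labels.get))
```

Normalising after each update keeps the weights of each variable summing to one, as the algorithm slide requires; in this toy run the "animal" readings of ave and faisán reinforce each other across the two hierarchies.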
Constraints for Adjectives
Structural constraints
– Adj-to-Adj: antonymy, similar-to and also-see
– Adj-to-Verb: participle-of
– Adj-to-Noun: pertains and attribute
Non-structural constraints
– W and G

Constraints for Adverbs
Structural constraints
– Adv-to-Adv: antonymy
– Adv-to-Adj: derived
Non-structural constraints
– W and G

Example extra-POS
WN1.5: 02025107a <evangelical, evangelistic> --pertainym--> 04237485n <Gospel, Gospels, evangel>
WN1.6: 00843344a <evangelical, evangelistic> --similar to--> 00842521a <enthusiastic>
WN1.6: 02025107a <evangelical> --pertainym--> 04853575n <Gospel, Gospels, evangel>

Example extra-POS
WN1.5: 00057615r <impossibly, absurdly> --derived from--> 01393725a <impossible>
WN1.6: 00294844r <impossibly> --derived from--> 01752468a <impossible>
WN1.6: 00294844r <impossibly> --antonym--> 00294658a <possibly>

Results
Basic constraint set: structural constraints
– Nouns: AA hyper/hyponym
– Verbs: AA hyper/hyponym, II also-see
– Adjectives: II antonymy, similar-to, also-see
– Adverbs: II antonymy

Results: basic constraint set (structural constraints)
Ranges are precision - recall.
 | N | V | A | R
Coverage | 99.7% | 96.9% | 94.1% | 80.8%
Ambiguous | 94.9% - 99.6% | 93.5% - 99.2% | 82.8% - 98.9% | 97.5% - 100%
Overall | 97.6% - 99.8% | 94.6% - 99.2% | 89.5% - 99.4% | 99.0% - 100%

Results: basic constraint set + W, G and F for verbs
Ranges are precision - recall.
 | N | V | A | R
Coverage | 99.9% | 99.8% | 98.9% | 99.5%
Ambiguous | 97.5% - 97.7% | 99.4% - 99.7% | 96.5% - 98.8% | 97.5% - 100%
Overall | 98.8% - 98.9% | 99.3% - 99.6% | 97.9% - 99.3% | 99.0% - 100%

Results: basic + extra-POS relationships
Ranges are precision - recall. Rows: Coverage, Ambiguous, Overall. The N column is empty (–, –, –).
Values for the remaining columns, in extraction order (cell positions lost): 95.8% - 98.9%; 69.2% - 94.2%; 90.9% - 99.4%; 97.9% - 98.1%; 95.8%; 88.0%

Results: basic + extra-POS relationships + WGF
Ranges are precision - recall.
 | N | V | A | R
Coverage | 99.9% | 99.8% | 99.0% | 99.6%
Ambiguous | 97.5% - 97.7% | 99.4% - 99.7% | 96.5% - 99.1% | 98.3% - 100%
Overall | 98.8% - 98.9% | 99.3% - 99.6% | 97.9% - 99.5% | 99.3% - 100%

Conclusions
– First complete mapping between WordNet versions
– Combines structural and non-structural information
– Robust approach based on local information, but with global effects
– Incremental POS approach
– http://www.lsi.upc.es/~nlp
– 90 downloads (since November 2000)

Further Work
– Mapping other structures
  – WN-EDR, WN-LDOCE, etc.
  – Other language taxonomies to EuroWordNet
– Spanish EWN to WN1.6
– A symmetrical philosophy rather than source-target

Mikrokosmos

Outline
• Introduction
• Representational Issues
  • The Lexicon
  • The Ontology
• Acquisition Process
  • Lexicon Acquisition
  • Guidelines
  • Ontology/Lexicon Trade-off
• Semantics in Action

Introduction
• Knowledge-Based Machine Translation (KBMT)
• CRL, NMSU
• 5,000 concepts
  • Events
  • Objects
  • Properties
• 7,000 Spanish word senses
• 40,000 word senses
  • after expansion with productive Lexical Rules
  • comprar -> comprador, comprable, ... (to buy -> buyer, purchasable, ...)
• Text Meaning Representation

Representational Issues: The Lexicon
• Typed Feature Structures (Pollard and Sag 87)
• language-dependent
• 10 zones
  • phonology
  • orthography
  • morphology
  • syntactic (subcategorization)
  • semantic (Lexical Semantic Representation)
  • syntax-semantic linking
  • stylistics
  • paradigmatic
  • syntagmatic

Representational Issues: The Lexicon
Adquirir-V1
  syn: subj: cat: NP
       obj:  cat: NP
  sem: acquire
       agent: HUMAN
       theme: OBJECT
Adquirir-V2
  syn: subj: cat: NP
       obj:  cat: NP
  sem: acquire
       agent: HUMAN
       theme: INFORMATION

Representational Issues: The Ontology
• Taxonomic, multi-hierarchical
• 14 local or inherited links on average
• language-impartial
• EVENTS, OBJECTS, PROPERTIES
• Methodology & Guidelines

Representational Issues: The Ontology
• ACQUIRE
  DEFINITION: “The transfer of possession event where the agent transfers an object to its possession”
  IS-A: TRANSFER-POSSESSION
  SOURCE: HUMAN, PLACE
  THEME: OBJECT (NOT HUMAN)
  AGENT: ANIMAL (DEFAULT HUMAN)
  DESTINATION: ANIMAL, PLACE (DEFAULT HUMAN)
  INHERITED BENEFICIARY: HUMAN

Acquisition Process: The Lexicon
• Multi-lingual
  • French, English, Japanese, Russian, Spanish, etc.
• Multi-media
• Multi-process
  • Analysis
  • Generation (mono and multilingual)
  • MT
  • Summarization
  • IE
  • Speech Processing
• Tools
  • corpus search, dictionary lookup, ontology browser

Acquisition Process: The Ontology
• Guidelines
  1) Do not add instances as concepts
     • Instances do not have their own instances
     • Concepts do not have a fixed position in space/time
  2) Do not decompose concepts further
  3) Use close concepts
  4) Do not add EVENTs with particular arguments
  5) Do not add concepts with instance-specific aspects or temporal relations
  6) Do not add language-specific concepts
  7) Do not add ontological concepts for collections

Acquisition Process: Ontology/Lexicon Trade-off
• Daily negotiations between
  • lexicon acquirers
  • ontology acquirers
• Possibilities
  • one-to-one mapping
  • lexicon underspecification
  • lexicon-ontology balance

Acquisition Process: Ontology/Lexicon Trade-off
• One-to-one mapping
  [Figure: PREPARE-FOOD (INST: COOKING-EQUIPMENT) with the concepts COOK (INST: STOVE) and BAKE (INST: OVEN); lexicon entries cook : cuire sur le feu, bake : cuire au four.]
• Problems
  • Lexical: every word in a language is a concept
  • Conceptual: cuire in French is not ambiguous

Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon underspecification
  [Figure: PREPARE-FOOD (INST: COOKING-EQUIPMENT); lexicon entries cook : cuire sur le feu, bake : cuire au four (INST: OVEN).]
• Problems
  • BAKE is not in the ontology

Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon-ontology balance
  [Figure: PREPARE-FOOD (INST: COOKING-EQUIPMENT) with the concepts BAKE and FRY; instruments INST: STOVE, INST: FRYING-PAN, INST: OVEN; lexicon entries bake, cook : cuire.]

Semantics in Action
• El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu. (The Roche group, through its company in Spain, acquired Doctor Andreu.)
• El grupo Roche adquirió Doctor Andreu a través de su compañía en España. (The Roche group acquired Doctor Andreu through its company in Spain.)
• La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España. (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)
ACQUIRE-1
  Agent: ORGANIZATION-1
  Theme: ORGANIZATION-2
  Instrument: ORGANIZATION-3
ORGANIZATION-1
  Object-Name: Grupo Roche
ORGANIZATION-2
  Object-Name: Doctor Andreu
ORGANIZATION-3
  Location: España

Semantics in Action
• Onto-Search: ontological search mechanism to check constraints
  • check-onto(ACQUIRE, EVENT) = 1
    • since ACQUIRE is a type of EVENT
  • check-onto(ORGANIZATION, HUMAN) = 0.9
    • since ORGANIZATION HAS-MEMBER HUMAN

Semantics in Action
1) a-través-de: INSTRUMENT, LOCATION
   adquirir requires a PHYSICAL-OBJECT
2) en: LOCATION, TEMPORAL
   España is not a TEMPORAL-OBJECT
3) adquirir: ACQUIRE, LEARN
   Doctor Andreu is not INFORMATION
4) Doctor Andreu: ORGANIZATION, HUMAN
   the Theme of ACQUIRE is not HUMAN
5) compañía: CORPORATION, SOCIAL-EVENT
   ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts

Experiment: WSD
Text | words | words/sentence | open-class words | ambiguous words | syntax | correct | %
1 | 347 | 16.5 | 183 | 57 | 21 | 51 | 97
2 | 385 | 24.0 | 167 | 42 | 19 | 41 | 99
3 | 370 | 26.4 | 177 | 57 | 20 | 45 | 93
4 | 353 | 20.8 | 177 | 35 | 12 | 34 | 99
Mean | 364 | 21.4 | 176 | 48 | 18 | 43 | 97

Experiment: WSD
Text | words | words/sentence | open-class words | ambiguous words | syntax | correct | %
Mean | 364 | 21.4 | 176 | 48 | 18 | 43 | 97
Mean Unseen | 390 | 26 | 104 | 26 | 9 | 23 | 97

WordNet2

Outline
• Introduction
• Text Inferences
• Defining Features
• Plausible inferences
• Inference Rules
• Semantic Paths
• What WordNet cannot do

Introduction
• (Harabagiu 98)
• Commonsense reasoning requires extensive knowledge
  • ~100 million concepts and relations
• WordNet
  • represents almost all English words
  • 100,000 synsets
  • linked by semantic relations
• WordNet2
  • each synset has a gloss that, when disambiguated, may increase the number of relations
  • WordNet glosses turned into semantic networks
  • NEW RELATIONS

Text Inferences
German was hungry. He opened the refrigerator.
• hungry (feeling a need or desire to eat)
• eat (take in solid food)
• refrigerator (an appliance in which foods can be stored at low temperature)

Defining Features
• Transform each concept’s gloss into a graph where concepts are nodes and lexical relations are links
• <culture> (all the knowledge shared by society)
  <share> --AGENT--> <society>
• <doctor> (licensed medical practitioner)
  <medical practitioner> --ATTRIBUTE--> <licensed>

Defining Features
[Figure: defining-feature graph for <pilot>, with nodes pilot, person, qualified, guide, ship, water and difficult, and links GLOSS, ATTRIBUTE, PURPOSE, OBJECT and LOCATION.]

Inference Rules
• 16 + 1 rules; the four shown on the slide (labelled Rule 1, Rule 2, Rule 3 and a second Rule 2) are, in extraction order:
  VC1 IS-A VC2, VC2 IS-A VC3
  -------------------------
  VC1 IS-A VC3

  VC1 IS-A VC2, VC2 ENTAIL VC3
  -------------------------
  VC1 ENTAIL VC3

  VC1 IS-A VC2, VC2 R_IS-A VC3
  -------------------------
  VC1 PLAUSIBLE (not VC3)

  VC1 IS-A VC2, VC2 R_ENTAIL VC3
  -------------------------
  VC1 EXPLAINS VC3

Semantic Paths
0) Create and load the KB
1) Place markers on KB concepts
2) Propagate markers
   The algorithm avoids cycles
3) Detect collisions
   Each marker collision corresponds to a path
4) Extract inferences

Semantic Paths
Inference sequence
• German was hungry
• German felt a desire to eat
• German felt a desire to take in food
COLLISION: German=he felt a desire to take food, stored in an appliance, which he opened
• He opened an appliance where food is stored
• He opened the refrigerator

What WordNet cannot do
Major WordNet limitations:
1) The lack of compound concepts
2) The small number of causation and entailment relations
3) The lack of preconditions for verbs
4) The absence of case relations

ThoughtTreasure

Overview
• A comprehensive platform for
  • NLP (English, French)
  • commonsense reasoning
• A hotel room has a
bed, night table, ...
• People have fingernails
• Soda is a drink
• One hangs up at the end of a phone call
• The sky is blue
• Dogs bark
• Someone who is 16 years old is a teenager

Overview
• 25,000 concepts organized into a hierarchy
  EVIAN -> FLAT-WATER -> DRINKING-WATER
• 55,000 words (English, French)
  food <-> aliment <-> FOOD
• 50,000 assertions about concepts
  green-pea is green
• 100 scripts

Overview
• Text agents for recognizing names, phones, etc.
• Mechanisms for learning new words
  • X-phile is someone who likes X
• A syntactic parser
• An NL generator
• A semantic parser
• An anaphoric parser
• Planning agents for achieving goals
• Understanding agents

Example
• Who created Bugs Bunny?
  • 1.0 (create human-interrogative-pronoun Bugs-Bunny)
  • 0.9 (create rock-group-the-Who Bugs-Bunny)
• 1.0 (create Tex-Avery Bugs-Bunny)
• 0.1 (not (create rock-group-the-Who Bugs-Bunny))

Meaning

Overview
Knowledge bases
– Automatic enrichment of EWN (verbal models, etc.)
– Mixed approach (KB + ML)
– Q/A
Problem
– structural and lexical ambiguity
Approach
– automatically locating examples of senses (Leacock et al. 98, Mihalcea and Moldovan 99)
– large-scale WSD (Boosting, SVM, transductive methods, …)
– Knowledge acquisition (Ribas 95, McCarthy 01)

Exploiting EWN Semantic Relations
[Figure: EWN fragment with the synsets <evento> (event), <evento social> (social event), <competición, concurso> (competition, contest), <agrupación, grupo, colectivo> (grouping, group, collective), <grupo_social> (social group), <organización> (organization), <partido_1>, <partido_2, partido_político> (political party), <semifinal>, <cuartos_de_final> (quarter-finals) and <partido_laborista> (labour party).]

Exploiting EWN Semantic Relations
partido 1
– Todos los partidos piden reformas legales para TV3. (All parties call for legal reforms for TV3.)
– La derecha planea agruparse en un partido. (The right plans to group into a single party.)
– El diputado reiteró que ni él ni UDC, “como partido”, han recibido dinero de Pellerols. (The deputy repeated that neither he nor UDC, “as a party”, has received money from Pellerols.)
partido 2
– Pero España puso al partido intensidad, ritmo y coraje. (But Spain brought intensity, pace and courage to the match.)
– El seleccionador cree que el partido de hoy contra Italia dará la medida de España. (The coach believes that today's match against Italy will show what Spain is capable of.)
– El Racing no gana en su campo desde hace seis partidos. (Racing has not won at home for six matches.)

Exploiting EWN Semantic Relations
partido 1
– No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan. (We will never negotiate with a political party that favours Taiwan's independence.)
– Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político. (Once again, the diversion of funds earmarked for occupational training to the financing of a political party is in the news.)
– Estas leyes fueron votadas gracias a un consenso general de los partidos políticos. (These laws were passed thanks to a general consensus among the political parties.)
partido 2
– Rivera pide el soporte de la afición para encarrilar las semifinales. (Rivera asks for the fans' support to get the semifinals on track.)
– Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado. (Only Valero Ribera's team can settle a semifinal the way it did yesterday in a fully devoted Palau Blaugrana.)
– El Racing ganó los cuartos de final en su campo. (Racing won the quarter-finals at home.)

Architecture
[Figure: Meaning architecture. For each language (English, Spanish, Catalan, Italian, Basque), a web corpus and an EWN wordnet are connected by ACQ and WSD steps; each wordnet exchanges data with a Multilingual Central Repository through UPLOAD and PORT.]
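The idea of locating sense examples automatically through semantic relations, as illustrated by the partido slides, can be sketched roughly as follows. Everything here is an assumption made for illustration: the sense names, the lists of related expressions, the use of plain substring matching, and the tiny corpus (four sentences taken from the slides). A real system would draw the relatives from EWN and search a large web corpus.

```python
# Hypothetical relatives for two senses of "partido", loosely following the
# slides: unambiguous related expressions stand in for the ambiguous word.
RELATIVES = {
    "partido (organization)": ["partido político", "partido laborista"],
    "partido (competition)": ["semifinal", "cuartos de final"],
}

# Tiny corpus made of sentences quoted in the slides.
CORPUS = [
    "No negociaremos nunca con un partido político que sea partidario "
    "de la independencia de Taiwan.",
    "Rivera pide el soporte de la afición para encarrilar las semifinales.",
    "El Racing ganó los cuartos de final en su campo.",
    "Pero España puso al partido intensidad, ritmo y coraje.",
]


def collect_examples(corpus, relatives):
    """Attach a sentence to a sense when it mentions a relative of exactly
    one sense; sentences matching no sense or several senses are skipped."""
    examples = {sense: [] for sense in relatives}
    for sentence in corpus:
        lowered = sentence.lower()
        matched = [
            sense
            for sense, terms in relatives.items()
            if any(term in lowered for term in terms)  # crude substring match
        ]
        if len(matched) == 1:
            examples[matched[0]].append(sentence)
    return examples


for sense, sentences in collect_examples(CORPUS, RELATIVES).items():
    print(sense, len(sentences))
```

The last corpus sentence contains only the ambiguous word itself, so this sketch leaves it unlabelled; sentences of that kind are what a WSD system trained on the collected examples would then have to resolve.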