Where we are Yesterday: Working with Tree-Adjoining Grammar introducing XTAG analyses of extraction Day 4: Grammar implementation & metagrammars di�erent templates for di�erent constructions and valency frames rich feature sets in the nodes (for constraining substitution and adjunction) Kata Balogh & Timm Lichte University of Düsseldorf, Germany Today: How to factorize the set of templates? ESSLLI 2015, Barcelona ) express lexical generalizations, e.g. active-passive diathesis ) define tree families SFB 991 How to turn this into an electronic resource? How to plug it into a lexicon and use it? Balogh & Lichte (Düsseldorf) Outline 1 What is grammar implementation? 2 Two ways of tree template implementation Metarules Metagrammars 3 eXtensible Metagrammar (XMG) 4 LTAG as Construction Grammar 5 Lexicon and parser 6 Two kinds of grammar implementation grammar/ linguistic theory “implementation” specifications in accordance with a grammar formalism evaluation of the theory As is frequently pointed out but cannot be overemphasized, an important goal of formalization in linguistics is to enable subsequent researchers to see the defects of an analysis as clearly as its merits; only then can progress be made e�iciently. (Dowty 1979: 322) Summary Balogh & Lichte (Düsseldorf) 2 3 Balogh & Lichte (Düsseldorf) 4 Two kinds of grammar implementation What kind of grammar resource? grammar/ linguistic theory S “implementation” tree template specifications in accordance with a grammar formalism NP VP V⇧ NP “implementation” evaluation of the theory grammar resource lexical insertion anchor computational application Balogh & Lichte (Düsseldorf) 4 The implementation task for LTAG repairs Balogh & Lichte (Düsseldorf) 5 Two ways of grammar implementation with TAG Two existing toolkits: XTAG tools[27] 1 implementation tools General task Implement a large-coverage LTAG, i.e. based on the XTAG grammar! ) metarule approach Subtasks: 1 Generate unlexicalized trees (= tree templates)! 2 Generate a database of lexical anchors (= the lexicon)! 3 Connect the tree templates with the lexicon (= lexical insertion)! Balogh & Lichte (Düsseldorf) 2 editor/viewer for MorphDB and SynDB 3 parser XMG + lexConverter + TuLiPA 1 XMG: eXtensible MetaGrammar[9] ) metagrammar approach 6 2 lexConverter (LEX2ALL) 3 TuLiPA: Tübingen Linguistic Parsing Architecture[18] Balogh & Lichte (Düsseldorf) 7 Outline The situation 12 templates for intransitive verbs 1 What is grammar implementation? 2 Two ways of tree template implementation Metarules Metagrammars 3 eXtensible Metagrammar (XMG) 4 LTAG as Construction Grammar 5 Lexicon and parser 39 tree templates for transitive verbs S S NP VP NP VP V V S NP S NP S NP NP S VP NP VP V 6 V ... Summary NP ... Basically, XTAG defines a set of 1008 unrelated tree templates. Balogh & Lichte (Düsseldorf) 8 Metarules for LTAG Balogh & Lichte (Düsseldorf) 9 Metarules for LTAG: Example S S Idea from GPSG[12] , later applied to XTAG[2,3,21] extraction: =) NP NP S NP tree fragments metarules accumulation S core grammar (tree templates) metarules expanded grammar (tree templates) NP1 S passivization: NP0 VP VP =) VP metarules connect tree templates of a tree family VP V⇧ NP1 V⇧ PP P NP0 by Balogh & Lichte (Düsseldorf) 10 Balogh & Lichte (Düsseldorf) 11 Metarules for LTAG: Problems[2] Metarules for LTAG: Example Metarules are very powerful: nx0Vnx1 extraction deletion, copying, recursive application, metavariables over trees passivization order sensitive in the unrestricted case: undecidable[25] nx1Vbynx0 W0nx0Vnx1 extraction Restrictions (GPSG):[22] finite closure: apply every metarule at most once! W1nx1Vbynx0 ) still NP-complete Tnx0nx1 biclosure: apply at most two metarules in a row! ) insu�icient for LTAG metarules[2] explicit rule ordering (by means of finite state automata)[21] Balogh & Lichte (Düsseldorf) 12 Metagrammars for LTAG 13 Metagrammars for LTAG: Tree descriptions LD : Description language for trees Candito (1996)[8,9,26] Let n2 and n2 be node variables: accumulation of descriptions tree fragments Balogh & Lichte (Düsseldorf) n ! n2 | n1 !+ n2 | n1 !⇤ n2 | *. 1 + n n2 | n1 + n2 | n1 ⇤ n2 | // Description := .. 1 // . n1 = n2 | , Description ^ Description tree templates Example: S arbitrary disjunction NP corresponds to VP S tree families NP Balogh & Lichte (Düsseldorf) 14 Balogh & Lichte (Düsseldorf) ⇤ + NP corresponds to nS ! nNP nS ! nVP nNP nVP ^ ^ nS ! nNP1 ^ nS !+ nNP2 ^ nNP1 ⇤ nNP2 15 Metagrammars for LTAG: Example Metagrammars for LTAG: Example Minimal model of tree descriptions Minimal model of tree descriptions You may add edges but not nodes! You may add edges but not nodes! S S NP NP S S S S NP NP |= VP VP S NP VP V⇧ VP VP V⇧ NP NP Balogh & Lichte (Düsseldorf) 16 Metagrammars for LTAG: Example 16 You may add edges but not nodes! S Balogh & Lichte (Düsseldorf) |= NP NP S VP VP V⇧ NP NP S VP NP VP S NP S NP VP V⇧ VP S S NP NP S VP NP VP NP |= S VP NP Minimal model of tree descriptions S NP V⇧ Balogh & Lichte (Düsseldorf) You may add edges but not nodes! S VP Metagrammars for LTAG: Example Minimal model of tree descriptions NP VP |= NP Balogh & Lichte (Düsseldorf) ... S NP 16 VP VP 17 Metagrammars for LTAG: Properties Metagrammars for LTAG: Passivization S no deletion, no copying, no recursion NP declarative, order insensitive The number of minimal models is finite. ^ VP VP ^ V⇧ disjunction does the trick! BUT: the number of minimal models can grow exponentially (O(n!)) in terms of the number of described nodes. VP *. .. .. VP .. .. .. V⇧ NP .. . , V⇧ _ P S NP |= NP VP V⇧ VP V⇧ NP by S Does it su�ice? How to express passivization? PP +/ // // // // // // / - PP P NP NP by Tnx0nx1 Balogh & Lichte (Düsseldorf) 18 Metagrammar for LTAG: Classes Balogh & Lichte (Düsseldorf) 19 Metagrammar for LTAG: Classes Tree descriptions are bundled into so-called classes: LC : Description language for the combination of tree descriptions Tnx0Vnx1: Class := Name : Content Description | Name | * Content := .. Content _ Content | , Content ^ Content Upon instantiating/using a class: NP +/ / - VP Tnx0Vnx1: Subject Node variables are replaced by fresh ones. Tnx0Vnx1: Node variables are known to the instantiating class. ^ ^ VP V⇧ ^ VP V⇧ _ +/ // // PP // // P NP // // / by - VerbProjection ^ (Object _ by-Phrase) nx0V Subject The class name is replaced by the content in the instantiating class. VerbProjection ^ (Object _ by-Phrase) Object by-Phrase nx0V ) Classes can be reused! Balogh & Lichte (Düsseldorf) S *. .. .. VP .. .. .. V⇧ NP .. . , Tnx0Vnx1 20 Balogh & Lichte (Düsseldorf) 21 Metagrammar for LTAG: Class hierarchies Metagrammar for LTAG: Class hierarchies There are very many possible class hierarchies . . . Subject There are very many possible class hierarchies . . . WhNP+EmptyWord BaseSubject Subject Object WhSubject WhObject BaseObject Object nx0V VerbProjection alphanx0V VerbProjection alphanx0V WhNP+EmptyWord alphaW0nx0V nx0Vnx1 Wnx0Vnx1 alphanx0Vnx1 alphaW0nx0V alphaW0nx0Vnx1 Tnx0V alphaW0nx0Vnx1 alphaW1nx0Vnx1 alphanx0Vnx1 alphaW1nx0Vnx1 Tnx0V Tnx0Vnx1 Tnx0Vnx1 Balogh & Lichte (Düsseldorf) 22 Metagrammar for LTAG: Class hierarchies There are very many possible class hierarchies . . . WhNP+EmptyWord WhSubject Subject Subject WhObject VerbProjection 22 Metagrammar for LTAG: Class hierarchies There are very many possible class hierarchies . . . BaseSubject Balogh & Lichte (Düsseldorf) BaseSubject BaseObject WhNP+EmptyWord WhSubject Object WhObject BaseObject VerbProjection Object Tnx0V Tnx0V Balogh & Lichte (Düsseldorf) Tnx0Vnx1 Tnx0Vnx1 [1] 22 Balogh & Lichte (Düsseldorf) 22 Metagrammar for LTAG: Class hierarchies Outline . . . but not everything is possible: alphanx0Vnx1 alphaW0nx0Vnx1 1 What is grammar implementation? 2 Two ways of tree template implementation Metarules Metagrammars 3 eXtensible Metagrammar (XMG) 4 LTAG as Construction Grammar 5 Lexicon and parser 6 Summary alphanx1Vbynx0 alphaW1nx1Vbynx0 Tnx0nx1 Balogh & Lichte (Düsseldorf) 23 eXtensible Metagrammar (XMG): Background Balogh & Lichte (Düsseldorf) eXtensible Metagrammar (XMG): Example S developed at LORIA, Nancy, and LIFO, Orléans.[9] wri�en in Oz/Mozart YAP and Python (as of XMG2) NP# available at http://sourcesup.cru.fr/xmg Why “eXtensible” ? highly VP class Subject export ?S modularized[19] declare ?S ?NP ?VP dimensions with dedicated description languages and compilers (<syn>, <sem>, <frame>, <morph>, . . . ) { <syn>{ node ?S [cat=s]{ interface through shared variables node ?NP (mark=subst) [cat=np] node ?VP [cat=vp] Some existing implementations using XMG: } French: FrenchTAG[8] } English: XTAG with XMG[1] German: Balogh & Lichte (Düsseldorf) 24 } GerTT[15] 25 Balogh & Lichte (Düsseldorf) 26 eXtensible Metagrammar (XMG): Example eXtensible Metagrammar (XMG): Example S S NP# VP VP NP# VP V⇧ |= S NP# class Subject VP V⇧ export ?S declare ?S ?NP ?VP class alphanx0v { <syn>{ import VerbProjection[] node ?S [cat=s]; declare ?Subj node ?NP (mark=subst) [cat=np]; { node ?VP [cat=vp]; ?Subj = Subject[]; ?S -> ?NP; ?Subj.?VP = ?VP ?S -> ?VP; } ?NP >> ?VP } } Balogh & Lichte (Düsseldorf) 27 eXtensible Metagrammar (XMG): Example with <frame> (Lichte & Outline ... <syn>{ node ?S [cat=s]; node ?SUBJ [cat=np, top=[i=?1]]; S NP[I= 1 ] VP[E= 0 ] V⇧[E= 0 ] 26 ����� 0 66 64����� Balogh & Lichte (Düsseldorf) 37 77 17 5 28 Petitjean)[17] class Subj class Subj Balogh & Lichte (Düsseldorf) node ?VP [cat=vp,bot=[e=?0]]; node ?V (mark=anchor) [cat=v,top=[e=?0]]; ?S -> ?SUBJ; ?S -> ?VP; ?VP -> * ?V; ?SUBJ >> ?VP }; <frame>{ ?0[event, actor:?1] } ... 29 1 What is grammar implementation? 2 Two ways of tree template implementation Metarules Metagrammars 3 eXtensible Metagrammar (XMG) 4 LTAG as Construction Grammar 5 Lexicon and parser 6 Summary Balogh & Lichte (Düsseldorf) 30 LTAG as Construction Grammar LTAG as Construction Grammar By virtue of adjunction, cases of long-distance dependencies can be immediately captured: Claim (ongoing work) (1) Who does Mary say sometimes walks into the house. LTAG shares central ideas with (some versions of) Construction Grammar (CxG):[13] S 1 only surface structure: no transformational or derivational component AUX 2 grammatical constructions: “form-function pairings at varying levels of complexity and abstraction” (e.g. words, phrases) does 3 a network of constructions “which nodes are related by inheritance links ” S S NP NP S VP V NP S* VP VP say PP V walks only surface structure Balogh & Lichte (Düsseldorf) 31 LTAG as Construction Grammar X Balogh & Lichte (Düsseldorf) 32 LTAG as Construction Grammar S NP[�= 1 0 ] ] VP[�= 0 ] 0 VP[�= V⇧[�= 0 ] 0 ] NP[�= 2 ] PP[����=to, �= 3 , �= 4 ] VP VP (2) John gives/sends the book to Mary S[�= VP NP VP V⇧ 26��������� 37 66 77 2 3 66 77 66activity 77 66����� 6 77 7 17 ����� 6 66 7 4 5 26 3777 66 bounded-translocation 66 7777 66 7777 66������ 4 66����� 2 66���� 7777 66 3 4 575 4 V⇧ ⇤ Subj V⇧ VSpine DirObj DirPrepObj intransitive intransitive motion construction X transitive transitive motion construction inheritance network of constructions Balogh & Lichte (Düsseldorf) PP NP V⇧ (from Kallmeyer & Osswald (2013)) grammatical constructions VP 33 Balogh & Lichte (Düsseldorf) PO-to X 34 LTAG as Construction Grammar: Summary LTAG incorporates central ideas of CxG: only surface structure X X Outline 1 What is grammar implementation? 2 Two ways of tree template implementation Metarules Metagrammars 3 eXtensible Metagrammar (XMG) 4 LTAG as Construction Grammar 5 Lexicon and parser 6 Summary grammatical constructions inheritance network of constructions X LTAG di�ers substantially from other implementations of CxG (e.g. Sign-based CxG[23] , Fluid CxG[24] ). ) di�erent empirical predictions or theoretical ramifications? Balogh & Lichte (Düsseldorf) 35 Lexicon and parser Balogh & Lichte (Düsseldorf) 36 Lexicon and parser: A 2-layered lexicon loves love “morphological lexicon” metagrammar XMG compiler Tnx0Vnx1 “lemma lexicon” implementational cycle Morphological lexicon maps an (inflected) token to some base form (= lemma), while preserving morphological information in a feature structure. compiled metagrammar 2-layered lexicon TuLiPA parser loves Peter [pos=v; num=sing; pers=3;] [pos=n; num=sing; pers=3; case=nom|acc;] parsing result Interface with tree templates: Feature unification during lexical insertion input sentence Balogh & Lichte (Düsseldorf) love Peter 37 Balogh & Lichte (Düsseldorf) 38 Lexicon and parser: A 2-layered lexicon Lexicon and parser: The TuLiPA parser (Parmentier et al.)[18] loves love “morphological lexicon” Tnx0Vnx1 TuLiPA Tübingen Linguistic Parsing Architecture (TuLiPA) “lemma lexicon” uses Range Concatenation Grammar (RCG) as a pivot formalism. Lemma lexicon maps a lemma onto tree tuple families, while also containing selectional restrictions (e.g., case assignment). *ENTRY: love *CAT: v *SEM: *ACC: 1 *FAM: Tnx0Vnx1 *FILTERS: [] *EX: *EQUATIONS: NParg1 -> case = nom NParg2 -> case = acc *COANCHORS: Components: What is grammar implementation? 2 Two ways of tree template implementation Metarules Metagrammars 3 eXtensible Metagrammar (XMG) 4 LTAG as Construction Grammar 5 Lexicon and parser 6 Summary 2 RCG parser ! RCG derivation forest ! TAG derivation forest Parse viewer (derived tree, derivation tree, dependency view, semantic representation) Availability of TuLiPA: wri�en in Java and released under the GNU GPL (http://sourcesup.cru.fr/tulipa/) 38 Outline 1 TAG-to-RCG converter (on-line) 3 Interface with tree templates: EQUATIONS ! nodes of tree templates FILTERS ! selection of tree templates Balogh & Lichte (Düsseldorf) 1 Balogh & Lichte (Düsseldorf) 39 Summary A metagrammar contains descriptions of unanchored elementary trees. Metagrammar descriptions are declarative and multidimensional. Metagrammar descriptions make up an inheritance hierarchy. The metagrammar allows one to express and implement lexical generalizations, e.g. active-passive diathesis. TAG + metagrammars implements central ideas of CxG. Hot topics: parsing with metagrammars[7] use metagrammars for morphological descriptions[11,20] Adjacent topics: grammar induction from treebanks[5,6,14,26] Balogh & Lichte (Düsseldorf) 40 Balogh & Lichte (Düsseldorf) 41 [1] [2] [3] [4] [5] [6] [7] [8] [15] [16] [17] [18] [19] [20] [21] Alahverdzhieva, Katya. 2008. XTAG using XMG. A core Tree-Adjoining Grammar for English. University of Nancy 2 / University of Saarland Master’s Thesis. http://homepages.inf.ed.ac.uk/s0896251/pubs/msc-sb2008.pdf. Becker, Tilman. 1994. Hytag: a new type of tree adjoining grammars for hybrid syntactic representations of free word order languages. Universität des Saarlandes PhD thesis. http://www.dfki.de/~becker/becker.diss.ps.gz. Becker, Tilman. 2000. Pa�erns in metarules for TAG. In Anne Abeillé & Owen Rambow (eds.), Tree Adjoining Grammars: Formalisms, linguistic analyses and processing (CSLI Lecture Notes 107), 331–342. Stanford, CA: CSLI Publications. Candito, Marie-Hélène. 1996. A principle-based hierarchical representation of LTAGs. In Proceedings of the 16th international Conference on Computational Linguistics (COLING 96). Copenhagen. http://aclweb.org/anthology-new/C/C96/C96-1034.pdf. Chen, John, Srinivas Bangalore & K. Vijay-Shanker. 2006. Automated extraction of Tree-Adjoining Grammars from treebanks. Natural Language Engineering 12 (3). 251–299. Chiang, David. 2000. Statistical parsing with an automatically-extracted Tree Adjoining Grammar. In Proceedings of the 38th annual meeting of the Association for Computational Linguistics, 456–463. Hong Kong. de la Clergerie, Éric Villemonte. 2013. Exploring beam-based shi�-reduce dependency parsing with DyALog: results from the SPMRL 2013 shared task. In 4th workshop on statistical parsing of morphologically rich languages (SPMRL’2013). Sea�le. http://hal.inria.fr/docs/00/87/91/29/PDF/dyalogsr.pdf. Crabbé, Benoît. 2005. Représentation informatique de grammaires d’arbres fortement lexicalisées: Le cas de la grammaire d’arbres adjoints. Université Nancy 2 PhD thesis. Kallmeyer, Laura, Timm Lichte, Wolfgang Maier, Yannick Parmentier & Johannes Dellert. 2008. Developing a TT-MCTAG for German with an RCG-based parser. In European Language Resources Association (ELRA) (ed.), Proceedings of the sixth international Conference on Language Resources and Evaluation (LREC’08). Marrakech, Morocco. Kallmeyer, Laura & Rainer Osswald. 2013. Syntax-driven semantic frame composition in Lexicalized Tree Adjoining Grammar. Journal of Language Modelling 1. 267–330. Lichte, Timm & Simon Petitjean. 2015. Implementing semantic frames as typed feature structures with XMG. Journal of Language Modelling 3(1). 185–228. http://jlm.ipipan.waw.pl/index.php/JLM/article/view/96. Parmentier, Yannick, Laura Kallmeyer, Wolfgang Maier, Timm Lichte & Johannes Dellert. 2008. TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms. In Proceedings of the ninth international workshop on Tree Adjoining Grammars and related formalisms (TAG+9), 121–128. Tübingen, Germany. Petitjean, Simon. 2014. Génération Modulaire de Grammaires Formelles. Orléans, France: Université d’Orléans Thèse de Doctorat. https://tel.archives-ouvertes.fr/tel-01163150/. Petitjean, Simon, Younes Samih & Timm Lichte. 2015. Une métagrammaire de l’interface morpho-sémantique dans les verbes en arabe. In Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles, 473–479. Caen, France. http://www.atala.org/taln_archives/TALN/TALN-2015/taln-2015-court-024. Prolo, Carlos A. 2002. Generating the XTAG English grammar using metarules. In Proceedings of the 19th international Conference on Computational Linguistics (COLING 2002), 814–820. Taipei. Taiwan. [9] [10] [11] [12] [13] [14] [22] [23] [24] [25] [26] [27] Crabbé, Benoit, Denys Duchier, Claire Gardent, Joseph Le Roux & Yannick Parmentier. 2013. XMG: eXtensible MetaGrammar. Computational Linguistics 39(3). 1–66. http://hal.archives-ouvertes.fr/hal-00768224/en/. Dowty, David R. 1979. Word meaning and Montague Grammar. Reprinted 1991 by Kluwer Academic Publishers. Dordrecht: D. Reidel Publishing Company. Duchier, Denys, Brunelle Magnana Ekoukou, Yannick Parmentier, Simon Petitjean & Emmanuel Schang. 2012. Describing morphologically rich languages using metagrammars: A look at verbs in Ikota. In Workshop on language technology for normalisation of less-resourced languages (SALTMIL 8 – AfLaT 2012), 55–59. http://www.tshwanedje.com/publications/SaLTMiL8-AfLaT2012.pdf#page=67. Gazdar, Gerald. 1981. Unbounded dependencies and coordinated structure. Linguistic Inquiry 12. 155–182. Goldberg, Adele. 2013. Constructionist approaches. In Thomas Ho�mann & Graeme Trousdale (eds.), The Oxford handbook of Construction Grammar, 15–31. Oxford, UK: Oxford Univ. Press. Kaeshammer, Miriam & Vera Demberg. 2012. German and English treebanks and lexica for Tree-Adjoining Grammars. In Nicole�a Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the eighth international Conference on Language Resources and Evaluation (LREC’12). Istanbul, Turkey: European Language Resources Association (ELRA). Ristad, Eric Sven. 1987. Revised General Phrase Structure Grammar. In Proceedings of the 25th annual meeting of the Association for Computational Linguistics, 243–250. Stanford, CA. http://www.aclweb.org/anthology/P87-1034. Sag, Ivan A. 2012. Sign-Based Construction Grammar: An informal synopsis. In Hans C. Boas & Ivan A. Sag (eds.), Sign-Based Construction Grammar, vol. 193 (CSLI Lecture Notes), 69–202. Stanford, CA: CSLI Publications. van Trijp, Remi. 2013. A comparison between Fluid Construction Grammar and Sign-Based Construction Grammar. Constructions and Frames 5 (1). 88–116. Uszkoreit, Hans & Stanley Peters. 1987. On some formal properties of metarules. English. In Walter J. Savitch, Emmon Bach, William Marsh & Gila Safran-Naveh (eds.), The formal complexity of natural language (Studies in Linguistics and Philosophy 33), 227–250. Dordrecht, The Netherlands: D. Reidel Publishing. http://dx.doi.org/10.1007/978-94-009-3401-6_9. Xia, Fei. 2001. Automatic grammar generation from two di�erent perspectives. University of Pennsylvania PhD thesis. http://faculty.washington.edu/fxia/papers_from_penn/thesis.pdf. XTAG Research Group. 2001. A Lexicalized Tree Adjoining Grammar for English. Tech. rep. Philadelphia, PA: Institute for Research in Cognitive Science, University of Pennsylvania.