8th Intex/NooJ Workshop
Besançon, May 30-June 1, 2005

Building a lexicon-grammar of frozen sentences of Portuguese
The inheritance problem revisited

J. Baptista (1,2), G. Fernandes (1) and A. Correia (1)
(1) Universidade do Algarve – FCHS, Campus de Gambelas, P – 8005-139 Faro, Portugal. jbaptis@ualg.pt; w3.ualg.pt/~jbaptis
(2) L2F – Spoken Language Systems Laboratory, Inesc-ID Lisboa, R. Alves Redol, 9, 1000-029 Lisbon, Portugal.

Lexicon-grammar of Portuguese Frozen Sentences

Plan
• the lexicon-grammar of frozen sentences
• general definition of frozen sentences
• current status of the lexicon-grammar of frozen sentences
• main linguistic properties described in the lexicon-grammar
• the inheritance problem
• revisitation of the inheritance problem
• importance of inheritance for the processing of frozen sentences
• future perspectives

«De boas intenções está o Inferno cheio»
(Hell is full of good intentions)

theoretical and methodological framework
• lexicon-grammar (M. Gross 1975, 1982, 1989, 1996)
• transformational operator grammar (Zellig S. Harris 1976, 1982, 1991)
• the basic meaning unit is the elementary sentence (and not the word), typically considered as a verb, its subject and its essential complements

general definition of frozen sentences
• a frozen sentence is an elementary sentence, in the sense that it conveys a semantic predicate, but
• it differs from sentences with free, distributional verbs, since
  – the global meaning cannot be calculated from the meaning of its components when they are used separately
• frozen sentences follow the general syntactic rules for sentence building
• they show important combinatorial constraints, namely
  – on distributional variation in argument positions and
  – on the application of several transformations
O Pedro passou de cavalo para burro
(Peter went from horse to donkey)
‘Peter came to be in a worse situation than he was in’
*O Pedro passou de burro para cavalo
(Peter went from donkey to horse)
*O Pedro passou do cavalo castanho para o burro cinzento
(Peter went from the brown horse to the grey donkey)
*O Pedro passou de cavalos para burros
(Peter went from horses to donkeys)
*O Pedro passou para burro de cavalo
(Peter went to donkey from horse)

O Pedro passou de cavalo para burro
– De (onde + quê) para (onde + quê) passou o Pedro?
– *De cavalo para burro
(– From where/what to where/what did Peter go? – *From horse to donkey)

• completely frozen sentences are very rare:
A procissão ainda vai no adro
(The procession is still in the churchyard)
‘some process <known but not mentioned> is still at its beginning’
• usually one (often the subject) or more argument positions are distributionally ‘free’
• these positions are described as in free sentences
• distributional constraints on free argument positions depend not only on the verb but on the combination of the verb with its frozen arguments:
O Pedro amarinhou pelas paredes acima
(Peter climbed up the walls)
‘Peter became very irritated’
º(o macaco + a aranha) amarinhou pelas paredes acima
‘the ape/spider climbed up the walls’
(º = literal reading)
• ambiguous frozen sentences:
O Pedro amarinhou pelas paredes acima (ambiguous)
(O Pedro + o macaco + a aranha) amarinhou pela parede acima (literal)
• the total number of frozen sentences may be similar to that of free, ordinary, distributional verbs
• they appear often in discourse
• they include everyday vocabulary, technical terms, etc.
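The combinatorial constraints illustrated above (free subject, frozen complements that resist reordering, pluralisation and modification) can be sketched in a few lines of Python. This is only an illustration: the class `FrozenEntry`, the function `match` and the hand-listed verb forms are hypothetical names, exact token comparison stands in for the INTEX graphs actually used, and no inflectional dictionary is consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenEntry:
    """A frozen sentence with a free subject and frozen complements (hypothetical model)."""
    verb_forms: frozenset  # inflected verb forms, listed by hand for the example
    frozen_tail: tuple     # frozen complements, in their only admissible order


PASSAR = FrozenEntry(
    verb_forms=frozenset({"passa", "passou", "passaram"}),
    frozen_tail=("de", "cavalo", "para", "burro"),
)


def match(entry, sentence):
    """Return the free subject if the sentence instantiates the entry, else None."""
    tokens = sentence.split()
    n = len(entry.frozen_tail)
    if len(tokens) < n + 2:  # need at least one subject token and the verb
        return None
    *subject, verb = tokens[:-n]
    if verb not in entry.verb_forms:
        return None
    if tuple(tokens[-n:]) != entry.frozen_tail:  # no reordering, plural or modifier
        return None
    return " ".join(subject)


print(match(PASSAR, "O Pedro passou de cavalo para burro"))
print(match(PASSAR, "O Pedro passou de burro para cavalo"))
```

The first call yields the free subject, the second yields `None`, mirroring the starred examples. A literal reading (as with amarinhar) would still match, so such a matcher cannot by itself resolve ambiguity.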
current status of the lexicon-grammar of frozen sentences of European Portuguese
• collection from several sources
• over 4,000 frozen sentences
• formal classification based on M. Gross (1982, 1989)
• description by way of LG binary matrices
• examples in tables (testing)
• INTEX used to formalize master-graphs and apply them to corpora (Silberztein 1993, 2000)
• CetemPúblico corpus (www.linguateca.pt), fragments 1 & 2 (~20 M words)
• Portuguese lexical resources (delaf_v2) from the public domain (label.ist.utl.pt)
• on-going research (far from concluded!)
• current classes only include V-NPs combinations (see classification table)
• certain formal classes were left out for the moment:
  – frozen subjects (C0)
  – exclamations, interjections (C0E)
  – sentential arguments (CV, C5, etc.)
  – frozen verb-adverb combinations (CADV)
  – sentences with ‘support-verbs’ and ‘operator-verbs’
• frozen subjects:
A brincadeira saiu cara ao Pedro
(The game came out expensive to Peter)
‘something was prejudicial to Peter’
• exclamations, interjections:
Vai ver se está a chover!
(Go see if it is raining!)
‘Get lost, don’t bother me!’
• sentential arguments:
Vale a pena ler esse livro
(It is worth the sacrifice to read that book)
‘It is very useful to read that book’
• frozen verb-adverb combinations:
Parece mal fazer isso
(‘It looks bad to do that’)
O Pedro foi-se abaixo
(Peter went himself down)
‘Peter became depressed’
• ‘support-verbs’
  – difficulty in distinguishing support-verb constructions from frozen sentences
  – many sentences with elementary support-verbs and their main aspectual and stylistic variants
  – some with two frozen complements
  – the noun is not an obviously predicative noun (abstract, associated with a verb or adjective)
O Pedro fez trinta por uma linha
(Peter did thirty by one line)
‘Peter made much mischief’
O Pedro deu com os burros na água
(Peter gave with the donkeys in the water)
‘Peter lost’
• ‘operator-verbs’
  – sentences involving operator-verbs but otherwise not analyzable by syntactic decomposition into Vop + elementary sentence:
O Pedro_i pôs as barbas_i de molho
(Peter put the beard-fp in the water)
‘Peter is getting old/tired and retired to a quieter life’
Pedro_i pôs # As barbas_i do Pedro_i estão de molho
(Peter put # Peter’s beard is in the water)
NB: estar de molho (be in the water/sauce) is considered to be a Vsup_Npred combination (Ranchhod 1990)

linguistic properties
• absolute constructions (NP deletion)
A Maria chorou (lágrimas de crocodilo + *E)
(Mary cried crocodile tears)
‘Mary pretended to be sad’
A Maria chorou (rios de lágrimas + E)
(Mary cried rivers of tears)
‘Mary cried a lot’
• obligatory permutations:
O Pedro fez das fraquezas forças
(Peter did from weaknesses strengths)
‘Peter overcame his own difficulties’
*O Pedro fez forças das fraquezas
O Pedro fez das tripas coração
(Peter did from guts heart)
‘Peter overcame his own difficulties’
*O Pedro fez coração das tripas
• dative restructuring (Guillet & Leclère 1981; Leclère 1995)
N0 V (Na de Nb)1 [Rdat] = N0 V (Na)1 a (Nb)2
O Pedro lambeu as botas do chefe
= O Pedro lambeu as botas ao chefe
(Peter licked the boots of the boss = Peter licked the boots to the boss)
‘Peter was subservient to his boss (with the goal of personal gain)’
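The dative restructuring [Rdat] described above, N0 V (Na de Nb) = N0 V (Na) a (Nb), can be sketched as a simple rewrite of the preposition de (and its contractions with the definite article) into a. The function name `dative_restructure` and its argument layout are assumptions made for this illustration. The sketch does not check whether a given entry is actually marked [Rdat] in the LG table, which is what licenses the operation.

```python
# Contractions of "de" + definite article and their "a" + article counterparts.
DE_TO_A = {"de": "a", "do": "ao", "da": "à", "dos": "aos", "das": "às"}


def dative_restructure(n0, verb, na, de_nb):
    """Rewrite N0 V (Na de Nb) as N0 V (Na) a (Nb).

    de_nb is the complement introduced by "de", e.g. "do chefe".
    """
    prep, *nb = de_nb.split()
    if prep not in DE_TO_A:
        raise ValueError(f"not a 'de' complement: {de_nb!r}")
    return " ".join([n0, verb, na, DE_TO_A[prep], *nb])


print(dative_restructure("O Pedro", "lambeu", "as botas", "do chefe"))
```

Applied to the example from the slide, the call rebuilds the restructured sentence with ao chefe in place of do chefe.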
• Symmetry
O Pedro juntou os trapinhos com a Maria
(Peter put together the rags with Mary)
‘Peter and Mary got married/went to live together’
= A Maria juntou os trapinhos com o Pedro
= O Pedro e Maria juntaram os trapinhos
= A Maria e o Pedro juntaram os trapinhos
• Pronominalization
O Pedro apertou a mão_i do João_i
(Peter squeezed the hand of John)
‘Peter shook hands with John’
O Pedro apertou (a sua_i mão_i + a mão_i d_ele_i)
(Peter squeezed his hand + the hand of him)
O Pedro apertou os ossos_i do João_i
(Peter squeezed the bones of John)
‘Peter shook hands with John’
*O Pedro apertou (os seus_i ossos_i + os ossos_i d_ele_i)
(Peter squeezed his bones + the bones of him)
• Passive
  – usually possible whenever the direct object is a free NP
O Pedro lançou o João às feras
(Peter threw John to the beasts)
‘Peter put John in a difficult situation’
O João foi lançado às feras (E + pelo Pedro)
(John was thrown to the beasts E/by Peter)
  – sometimes, even a frozen object NP can undergo Passive
O Governo prometeu mundos e fundos ao povo
(The Government promised worlds and funds to the people)
‘to promise too much, impossible to deliver’
= Mundos e fundos foram prometidos ao povo (E + pelo Governo)
(Worlds and funds were promised to the people E/by the Government)
  – as far as the recognition of frozen sentences is concerned, Passives do not constitute an insurmountable problem
  – for each verb, the corresponding adjective is given in the LG
  – the master-graph describes such adjectival-like constructions, including the adnominal position of the V-a form next to C1
Os alunos, que foram lançados às feras pelos professores, revoltaram-se
(the students, who were thrown to the beasts by the teachers, rebelled)
Os mundos e fundos prometidos pelo Governo <...>
(the worlds and funds promised by the Government)
Uma santa mártir que morreu queimada, depois de ter sido lançada às feras e sobrevivido, porque <sic> estas se terem deitado e lambido os seus pés (CetemPúblico#1)
• Conversion-like operations (G. Gross 1989)
O Pedro deu no coco ao João
(Peter gave in the coconut to John)
‘Peter spanked John’
O João apanhou no coco do Pedro
(John got in the coconut from Peter)
‘John was spanked by Peter’
O Pedro foi aos cornos (a + de)_o João
(Peter went to the horns of John)
‘Peter spanked John’
O João apanhou nos cornos do Pedro
(John got on the horns from Peter)
‘John was spanked by Peter’
• Obligatory negation (NegObrig)
O Pedro não chega aos calcanhares do João
(Peter does not get to the heels of John)
‘Peter is not a match for/is much inferior to John’
*O Pedro chega aos calcanhares do João
• intrinsically pronominal constructions (Vse):
O Pedro fechou-se em copas
(Peter closed himself in hearts)
‘Peter kept silent’
*O Pedro fechou (E + o João) em copas
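The two properties just described, obligatory negation (NegObrig) and intrinsically pronominal verb (Vse), can be read as filters on candidate matches. The sketch below is a rough illustration under several assumptions: `LGEntry` and `satisfies` are hypothetical names, verb forms are listed by hand, only não is treated as a negation, and the clitic se is looked for either attached to the verb (fechou-se) or immediately before it (não se deu).

```python
from dataclasses import dataclass

NEGATIONS = {"não"}  # assumption: other negative words are ignored in this sketch


@dataclass(frozen=True)
class LGEntry:
    """One row of a lexicon-grammar table, reduced to two binary properties."""
    verb_forms: frozenset
    frozen_tail: tuple
    neg_obrig: bool = False  # NegObrig: negation is obligatory
    vse: bool = False        # Vse: the verb is intrinsically pronominal


CHEGAR = LGEntry(frozenset({"chega", "chegou"}), ("aos", "calcanhares"), neg_obrig=True)
FECHAR = LGEntry(frozenset({"fecha", "fechou"}), ("em", "copas"), vse=True)


def satisfies(entry, tokens):
    """True if tokens contain the verb + frozen tail and respect NegObrig/Vse."""
    for i, tok in enumerate(tokens):
        base, _, clitic = tok.partition("-")  # "fechou-se" -> ("fechou", "-", "se")
        if base not in entry.verb_forms:
            continue
        tail = tuple(tokens[i + 1 : i + 1 + len(entry.frozen_tail)])
        if tail != entry.frozen_tail:
            continue
        has_se = clitic == "se" or (i > 0 and tokens[i - 1] == "se")
        has_neg = any(t in NEGATIONS for t in tokens[:i])
        if entry.vse and not has_se:
            return False
        if entry.neg_obrig and not has_neg:
            return False
        return True
    return False


print(satisfies(CHEGAR, "O Pedro não chega aos calcanhares do João".split()))
print(satisfies(FECHAR, "O Pedro fechou em copas".split()))
```

The first call accepts the negated sentence; the second rejects the non-pronominal variant, as in the starred example.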
• interaction between NegObrig and Vse
O Pedro não se deu por achado
(Peter did not give himself by found_ms)
‘Peter did not restrain himself’ <from doing something>

LG Tables & Master-graphs

Extract from table CPN
Binary property columns of the table (their +/- values are not recoverable here): N0 =: Nhum; N0 =: N-hum; Vse; NegObrig; de N = PRO:Pos; de N = PRO:O [Rdat]

V          Prep  Det  C             Example
<atirar>   a     os   pés           O Zé atirou-se aos pés da Ana
<cair>     em    os   braços        O Zé caiu nos braços da Ana
<chegar>   a     os   calcanhares   O Zé não chega aos calcanhares da Ana
<chorar>   em    o    ombro         O Zé chora no ombro da Ana
<cortar>   em    a    casaca        O Zé corta na casaca da Ana
<crescer>  a     os   olhos         O Zé cresceu aos olhos do patrão
<cuspir>   em    a    cara          O Zé cuspiu-lhe na cara
<dar>      em    o    focinho       A Ana deu no focinho do Zé
<descer>   em    a    consideração  O Zé desceu na consideração da Ana
<entrar>   em    o    jogo          O Zé entrou no jogo da Ana
<falar>    em    as   costas        O Zé fala nas costas da Ana
<ir>       em    a    cantiga       O Zé foi na cantiga da Ana
<passar>   por   as   mãos          O projecto passou pelas mãos da Ana
<pegar>    em    a    palavra       O Zé pegou na palavra da Ana
<puxar>    por   a    língua        A Ana puxou pela língua do Zé
<reger>    por   a    cabeça        O Zé rege-se pela cabeça da Ana
<sair>     de    o    corpo         A riqueza do Zé saiu-lhe do corpo
<sentar>   em    a    cadeira       O Zé sentou-se na cadeira do Pedro
<subir>    a     a    cabeça        O dinheiro subiu à cabeça do Zé
<tocar>    em    a    ferida        O Zé tocou na ferida da Ana
<viver>    em    a    sombra        O Zé vive na sombra da Ana

• until now, priority was given to building the LG
• some tests over LG examples and corpora of different sizes
• problems on matching LG examples
  – e.g. CP1 (70% recall)
  – <CATEG> at lemma but <PRO+Pes:R> in M_grf ok
  – embedded graphs “:Graph”
  – unknown causes for mismatch
• waiting for new solutions under NooJ

Inheritance – no solutions, just talking about it...
– so far, under INTEX it is not possible to locate either compound words or frozen sentences by lemma
– M_grf do not allow strings matched by *cfg of delae to inherit the inflection values of the <V> element:
O Pedro <brincou,brincar.V:P3s> com o fogo
(Peter played with the fire)
‘Peter did something dangerous’
dle: O Pedro <brincou com o fogo,brincar com o fogo.V+CP1>
O Pedro <brincou com o fogo,brincar com o fogo.V+CP1:P3s>

Inheritance – a re-visitation
• the main reason for implementing inheritance in NLP systems, beyond morphology, is the processing of compound tenses
• compound tenses are very frequent in Portuguese
• about 100 auxiliary verbs (Vaux) have already been described in Portuguese (Pontes 1977, Gonçalves 2000)
• their combination with main verbs is complex (e.g. clitic positioning)
(graph: a very naïve approximation to Vaux-V combinations; clitic pronouns were ignored)
• their combination with other Vaux gives rise to complex syntactic patterns (M. Gross 1999; Ranchhod 2003)
• many frozen sentences appear in compound tenses, this being one of the main causes of low precision (only part of the V complex is matched)
• the calculation of compound tenses poses a serious challenge:
  – Vaux-V combinations cannot be predicted a priori from V
  – a Vaux-V combination may convey a highly specific meaning
  – several Vaux may combine in front of V (on average 1-3, but up to 4; the limited recursiveness of Vaux*-V and the limited patterns of combination may ‘ease’ the task of describing it)
dealing with compound tenses in the M_grf
• is linguistically inadequate
• mixes two distinct linguistic phenomena
• and unnecessarily complicates the graphs
• compound tense calculation cannot, at least a priori, be coded in the same way as simple tenses (the lexical form of Vaux plays an important role)
• it is controversial to decompose compound tenses into multiple tense/aspect/mode/x?
features, collapsed into a single word
• it is unclear how to derive the final attributes of a long string of Vaux from the features of each one, since they are not always cumulative
• except, perhaps, for commonly used tenses such as ter + Vpp (for example, the solution proposed by Silberztein 2000)
• Intex already deals in a limited way with inheritance in the morphological module (it stores the information associated with an entry and reuses it when calculating the information of the word being morphologically analyzed)
• it should be possible to tackle the problem in a similar way, in a two-stage process (possibly with a recursive first step):
1. processing of compound tenses (recursive)
2. lexical analysis of frozen sentences
Some exceptions, frozen sentences of class CV:
Esta estrada vai (ter + dar) a Paris
(This road goes have:w/give:w to Paris)
‘This road leads to Paris’
that should be analyzed as CV, in the present:
vai ter,ir ter.V+CV:P3s
vai dar,ir dar.V+CV:P3s
and not as a compound future tense of the last V:
vai ter,ter.V:F3s
vai dar,dar.V:F3s
• inheriting properties can apply to other cases
• frozen sentences with gender/number agreement with words other than V (note: this case is rather rare)
[O Pedro]NP_ms não se deu por <achado>Adj_ms
(Peter did not give himself found)
‘Peter didn’t wait to be asked in order to do something’
where a suitable analysis should be:
não se deu por achado,Neg dar-se por achado.V+CP1:J3ms
the ‘ms’ being derived from the Adj; the ‘s’ of deu <V:J3s> is dropped to avoid duplication.
• another case: Named Entity Recognition (of strings of proper names designating a single person)
[José_cn:ms Manuel_cn:ms Durão_fn Barroso_fn] Npr:ms

Inheritance (conclusion)
• alas!
no solution,
• but showing the problems may help in finding some solutions
• minimalist solution: pre-processing of common Vaux-V combinations

Future Perspectives
• continue building the LG of frozen sentences
• associate the UNAMB feature
• apply them to large-sized corpora
• associate FRQ information
• improve and integrate FS in the syntactic analysis (along with simple, distributional verbs)

References
ARAÚJO-VALE, Oto (2001): Expressões Cristalizadas do Português do Brasil: Uma proposta de Tipologia. PhD thesis, Brazil, Araraquara, UNESP.
BAPTISTA, Jorge (in press): Construções Simétricas: Complementos e Argumentos. In FIGUEIREDO, O., M.G. RIO-TORTO & F. SILVA (Eds.), [Volume de Homenagem ao Prof. Mário Vilela], Porto, Univ. Porto.
BAPTISTA, Jorge, CORREIA, Anabela & FERNANDES, Maria da Graça (2004): Frozen Sentences of Portuguese: Formal Descriptions for NLP. Workshop on Multiword Expressions: Integrating Processing, International Conference of the European Chapter of the Association for Computational Linguistics, Barcelona (Spain), July 26, 2004, ACL, pp. 72-79.
BOONS, Jean-Paul, GUILLET, Alain & LECLÈRE, Christian (1976a): La structure des phrases simples en français. Constructions Transitives. Paris, LADL – Univ. Paris 7.
BOONS, Jean-Paul, GUILLET, Alain & LECLÈRE, Christian (1976b): La structure des phrases simples en français. Classes de constructions transitives. Genève, Droz.
CHACOTO, Lucília (2005): O verbo FAZER em Construções Nominais Predicativas. PhD thesis, Faro, Universidade do Algarve.
CORREIA, Anabela (in preparation): Léxico-Gramática das Frases Fixas do Português Europeu – Construções Transitivas. MA thesis, Faro, Universidade do Algarve.
FERNANDES, Maria da Graça (in preparation): Léxico-Gramática das Frases Fixas do Português Europeu – Construções Intransitivas. MA thesis, Faro, Universidade do Algarve.
FOTOPOULOU, Aggeliki (1993): Une classification des phrases à compléments figés en grec moderne.
PhD thesis. Paris, LADL/Univ. Paris 7.
GROSS, Maurice (1975): Méthodes en Syntaxe. Régimes des constructions complétives. Paris, Hermann.
GROSS, Maurice (1981): «Les bases empiriques de la notion de prédicat sémantique». Langages 63, pp. 7-52, Paris, Larousse.
GROSS, Maurice (1982): «Une classification des phrases "figées" du français». Revue Québécoise de Linguistique, 11.2, pp. 151-185, Montréal, UQAM.
GROSS, Maurice (1989): Les Expressions Figées. Une description des expressions françaises et ses conséquences théoriques. Paris, Univ. Paris 7, LADL.
GROSS, Maurice (1996): «Lexicon Grammar». In K. BROWN & J. MILLER (Eds.), Concise Encyclopedia of Syntactic Theories, pp. 244-258, Cambridge, Pergamon.
GROSS, Maurice (1999): Lemmatization of compound tenses. Fairon, C. (Ed.), Lingvisticae Investigationes ??. Amsterdam, John Benjamins Pub. Co.
GROSS, Maurice (2000): «Verbes à trois compléments essentiels». BULAG, Lexique, syntaxe et sémantique, mélanges offerts à Gaston Gross, pp. 199-210, Univ. Franche-Comté/Centre Tesnière.
GUILLET, Alain & LECLÈRE, Christian (1981): «Restructuration du groupe nominal». Langages 63, Paris, Larousse.
GUILLET, Alain & LECLÈRE, Christian (1982): La structure des phrases simples en français. Les constructions locatives. Genève, Droz.
HARRIS, Zellig S. (1991): A Theory of Language and Information. A Mathematical Approach. Oxford, Clarendon Press.
LECLÈRE, Christian (1995): «Restructuration dative». Language Research, 31:1. Seoul, Language Research Institute – Seoul National University.
LECLÈRE, Christian (2000): «Expressions figées dans la francophonie: le projet BFQS». BULAG, Lexique, Syntaxe et Sémantique, Mélanges offerts à Gaston Gross. Pierre-André BUVET, Denis LE PESANT & Michel MATHIEU-COLAS (Eds.), n° Hors Série, pp.
321-331, Besançon, Centre Lucien Tesnière.
LECLÈRE, Christian (2002): «Organization of the lexicon-grammar of French verbs». Lingvisticae Investigationes, XXV I, USA, John Benjamins Publishing Company.
MOGÓRRON-HUERTA, Pedro Joaquín (2002): Estudio Contrastivo de las Expresiones Fijas en Ser / Estar + Prep X Y Être Prép X en Francés. PhD thesis in Romance Philology, Univ. Valência.
RANCHHOD, Elisabete (2003): Reconhecimento de sequências de verbos auxiliares por métodos de estados finitos. Lisboa, FLUL.
SANTOS, António Nogueira (1990): Novos Dicionários de expressões idiomáticas. Lisboa, Edições Sá da Costa.
SENELLART, Jean (1998): «Reconnaissance automatique des entrées du lexique-grammaire des phrases figées». In Béatrice LAMIROY (Ed.), Le Lexique-Grammaire. Travaux de Linguistique 37, Bruxelles, Duculot, 1999, pp. 109-125.
SILBERZTEIN, Max (1993): Dictionnaires électroniques et analyse automatique de textes. Le système Intex. Paris, Masson.
SILBERZTEIN, Max (2000): Intex (Manual). Paris, ASSTRIL.