8th Intex/NooJ Workshop
Besançon, May 30-June 1, 2005
Building a lexicon-grammar
of frozen sentences of Portuguese
The inheritance problem revisited
J. Baptista (1, 2), G. Fernandes (1) and A. Correia (1)
(1) Universidade do Algarve – FCHS
Campus de Gambelas, P – 8005-139 Faro, Portugal.
jbaptis@ualg.pt; w3.ualg.pt/~jbaptis
(2) L2F – Spoken Language Systems Laboratory, Inesc-ID Lisboa,
R. Alves Redol, 9, 1000-029 Lisbon, Portugal.
Lexicon-grammar
of Portuguese Frozen Sentences
Plan
• the lexicon-grammar of frozen sentences
– general definition of frozen sentences
– current status of the lexicon-grammar of frozen sentences
– main linguistic properties described in the lexicon-grammar
• the inheritance problem
– revisiting the inheritance problem
– importance of inheritance for the processing of frozen sentences
• future perspectives
«De boas intenções está o Inferno cheio»
(Of good intentions, Hell is full)
theoretical and methodological framework
• lexicon-grammar
(M. Gross 1975, 1982, 1989, 1996)
• transformational operator grammar
(Zellig S. Harris 1976, 1982, 1991)
• the basic meaning unit is the elementary sentence
(and not the word), typically consisting of a verb,
its subject and its essential complements
general definition of frozen sentences
• a frozen sentence is an elementary sentence, in the sense that it conveys a semantic predicate, but
• it differs from free, distributional verbs, since
• its global meaning cannot be calculated from the meaning of its components when they are used separately
• it follows general syntactic rules for sentence building
• it shows important combinatorial constraints, namely,
• on distributional variation in argument positions, and
• on the application of several transformations
general definition of frozen sentences (cont.)
O Pedro passou de cavalo para burro
(Peter went from horse to donkey)
‘Peter came to be in a worse situation than he was before’
*O Pedro passou de burro para cavalo
(Peter went from donkey to horse)
*O Pedro passou do cavalo castanho para o burro cinzento
(Peter went from the brown horse to the grey donkey)
general definition of frozen sentences (cont.)
*O Pedro passou de cavalos para burros
(Peter went from horses to donkeys)
*O Pedro passou para burro de cavalo
(Peter went to donkey from horse)
general definition of frozen sentences (cont.)
O Pedro passou de cavalo para burro
- De (onde + quê) para (onde + quê) passou o Pedro?
- *De cavalo para burro
- From where/what to where/what did Peter go?
- from horse to donkey
general definition of frozen sentences (cont.)
• completely frozen sentences are very rare:
A procissão ainda vai no adro
(The procession is still in the yard)
‘some process <known but not mentioned> is still in its beginning’
• usually one (often the subject) or more argument
positions are distributionally ‘free’
• these positions are described as in free sentences
general definition of frozen sentences (cont.)
• distributional constraints on free argument
positions depend not only on the verb but on the
verb-frozen arguments combination:
O Pedro amarinhou pelas paredes acima
(Peter climbed the walls up)
‘Peter became very irritated’
º(O macaco + A aranha) amarinhou pelas paredes acima
‘the monkey/spider climbed up the walls’
º = literal
general definition of frozen sentences (cont.)
• ambiguous frozen sentences:
O Pedro amarinhou pelas paredes acima (ambiguous)
(O Pedro + o macaco + a aranha) amarinhou pela parede acima (literal)
• the total number of frozen sentences may be similar to that of free, ordinary, distributional verbs
• they appear frequently in discourse,
• and include everyday vocabulary, technical terms, etc.
current status of the lexicon-grammar of frozen
sentences of European Portuguese
• collection from several sources
• over 4,000 frozen sentences
• formal classification based on M. Gross (1982, 1989)
• description by means of LG binary matrices
• examples in tables (for testing)
• INTEX used to formalize master-graphs and apply them to corpora (Silberztein 1993, 2000)
• CetemPúblico corpus (www.linguateca.pt), fragments 1 & 2 (~20 M words)
• Portuguese lexical resources (delaf_v2) from the public domain (label.ist.utl.pt)
current status (cont.)
• on-going research (far from concluded!)
• current classes only include V-NP combinations (see classification table)
• certain formal classes were left out for the moment:
– frozen subjects (C0)
– exclamations, interjections (C0E)
– sentential arguments (CV, C5, etc.)
– frozen verb-adverb combinations (CADV)
– sentences with ‘support-verbs’ and ‘operator-verbs’
current status (cont.)
• frozen subjects:
A brincadeira saiu cara ao Pedro
(The game came out expensive to Peter)
‘something was prejudicial to Peter’
• exclamations, interjections:
Vai ver se está a chover !
(go see if it is raining!)
‘Get lost, don’t bother me!’
current status (cont.)
• sentential arguments:
Vale a pena ler esse livro
(it is worth the sacrifice to read that book)
‘It is very useful to read that book’
• frozen verb-adverb combinations:
Parece mal fazer isso
(‘it looks bad to do that’)
O Pedro foi-se abaixo
(Peter went himself down)
‘Peter became depressed’
current status (cont.)
• ‘support-verbs’
– difficulty in distinguishing support-verbs from frozen
sentences
– many sentences with elementary support-verbs and their
main aspectual and stylistic variants
– some with two frozen complements
– the noun is not obviously a predicative noun (an abstract noun associated with a verb or an adjective)
O Pedro fez trinta por uma linha
(Peter did thirty by one line)
‘Peter made much mischief’
O Pedro deu com os burros na água
(Peter gave with the donkeys in the water)
‘Peter failed’
current status (cont.)
• ‘operator-verbs’
– sentences involving operator-verbs but otherwise not
analyzable by syntactic decomposition into
Vop + elementary sentence:
O Pedro_i pôs as barbas_i de molho
(Peter put the beard-fp in the water)
‘Peter is getting old/tired and retired to a quieter life’
Pedro_i pôs # As barbas_i do Pedro_i estão de molho
(Peter put # Peter’s beard is in the water)
NB: estar de molho (be in the water/sauce) is considered to be a
Vsup_Npred combination (Ranchhod 1990)
linguistic properties
• absolute constructions (NP deletion)
A Maria chorou (lágrimas de crocodilo + *E)
(Mary cried crocodile tears)
‘Mary pretended to be sad’
A Maria chorou (rios de lágrimas + E)
(Mary cried rivers of tears)
‘Mary cried a lot’
linguistic properties
• obligatory permutations:
O Pedro fez das fraquezas forças
(Peter did from weaknesses strengths)
‘Peter overcame his own difficulties’
*O Pedro fez forças das fraquezas
O Pedro fez das tripas coração
(Peter did from guts heart)
‘Peter overcame his own difficulties’
*O Pedro fez coração das tripas
linguistic properties (cont.)
• dative restructuring (Guillet & Leclère 1981; Leclère
1995)
N0 V (Na de Nb)1
[Rdat] = N0 V (Na)1 a (Nb)2
O Pedro lambeu as botas do chefe
= O Pedro lambeu as botas ao chefe
(Peter licked the boots of the boss
Peter licked the boots to the boss)
‘Peter was subservient to his boss
(with the goal of personal gain)’
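For matching, a recognizer may need to accept both surface variants related by [Rdat]. The Python sketch below only illustrates the rewrite under simplifying assumptions: the function name `dative_restructuring` and the `DE_TO_A` table are hypothetical, the position of the preposition is given by hand, and the variant with a dative clitic (lambeu-lhe as botas) is ignored.

```python
# Hypothetical mapping: forms of "de" (+ article) to the matching forms of "a" (+ article),
# European Portuguese spelling. Bare "de" before a proper noun maps to "a".
DE_TO_A = {"de": "a", "do": "ao", "da": "à", "dos": "aos", "das": "às"}


def dative_restructuring(words: list[str], de_index: int) -> list[str]:
    """Rewrite N0 V (Na de Nb)1 as N0 V (Na)1 a (Nb)2.

    `de_index` points at the (possibly contracted) preposition introducing Nb;
    only that token is replaced, the word order stays the same.
    """
    prep = words[de_index].lower()
    if prep not in DE_TO_A:
        raise ValueError(f"expected a form of 'de', got {words[de_index]!r}")
    return words[:de_index] + [DE_TO_A[prep]] + words[de_index + 1:]


sentence = "O Pedro lambeu as botas do chefe".split()
print(" ".join(dative_restructuring(sentence, 5)))  # O Pedro lambeu as botas ao chefe
```

Generating the variant from the dictionary form, rather than listing both forms in the LG, keeps [Rdat] as a single +/- property of the entry.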
linguistic properties (cont.)
• Symmetry
O Pedro juntou os trapinhos com a Maria
(Peter put together the rags with Mary)
‘Peter and Mary got married/went to live together’
= A Maria juntou os trapinhos com o Pedro
= O Pedro e a Maria juntaram os trapinhos
= A Maria e o Pedro juntaram os trapinhos
linguistic properties (cont.)
• Pronouning
O Pedro apertou a mão_i do João_i
(Peter squeezed the hand of John)
‘Peter shook hands with John’
O Pedro apertou (a sua_i mão_i + a mão_i d_ele_i)
(Peter squeezed his hand + the hand of him)
O Pedro apertou os ossos_i do João_i
(Peter squeezed the bones of John)
‘Peter shook hands with John’
*O Pedro apertou (os seus_i ossos_i + os ossos_i d_ele_i)
(Peter squeezed his bones + the bones of him)
linguistic properties (cont.)
• Passive
– usually possible whenever the direct object is a free NP
O Pedro lançou o João às feras
(Peter threw John to the beasts)
‘Peter put John in a difficult situation’
O João foi lançado às feras (E + pelo Pedro)
(John was thrown to the beasts E/by Peter)
linguistic properties (cont.)
• Passive (cont.)
– sometimes, even a frozen object NP can undergo Passive
O Governo prometeu mundos e fundos ao povo
(The Government promised worlds and funds to the people)
‘to promise too much, impossible to fulfil’
= Mundos e fundos foram prometidos ao povo
(E + pelo Governo)
(Worlds and funds were promised to the people E/by the
Government)
linguistic properties (cont.)
• Passive (cont.)
– as far as the recognition of frozen sentences is concerned, Passives do not constitute an insurmountable problem
– for each verb, the corresponding adjective is given in the LG
– the master-graph describes such adjectival-like constructions, including the adnominal position of the V-a form next to C1
Os alunos, que foram lançados às feras pelos professores, revoltaram-se
(the students, who were thrown to the beasts by the teachers, rebelled)
Os mundos e fundos prometidos pelo Governo <...>
(the worlds and funds promised by the Government)
Uma santa mártir que morreu queimada, depois de ter sido lançada às feras e sobrevivido, porque <sic>
estas se terem deitado e lambido os seus pés (CetemPúblico#1)
linguistic properties (cont.)
• Conversion-like operations (G. Gross 1989)
O Pedro deu no coco ao João
(Peter gave in the coconut to John)
‘Peter spanked John’
O João apanhou no coco do Pedro
(John got in the coconut from Peter)
‘John was spanked by Peter’
linguistic properties (cont.)
• Conversion-like operations (cont.)
O Pedro foi aos cornos (a + de)_o João
(Peter went to the horns of John)
‘Peter spanked John’
O João apanhou nos cornos do Pedro
(John got on the horns from Peter)
‘John was spanked by Peter’
linguistic properties (cont.)
• Obligatory negation (NegObrig)
O Pedro não chega aos calcanhares do João
(Peter does not get to the heels of John)
‘Peter is not a match for/is much inferior to John’
*O Pedro chega aos calcanhares do João
linguistic properties (cont.)
• intrinsically pronominal constructions (Vse):
O Pedro fechou-se em copas
(Peter closed himself in diamonds )
‘Peter kept himself silent’
*O Pedro fechou (E + o João) em copas
linguistic properties (cont.)
• interaction between NegObrig and Vse
O Pedro não se deu por achado
(Peter did not give himself by found_ms)
‘Peter did not restrain himself’ <from doing something>
LG Tables & Master-graphs
Extract from table CPN
V           Prep   Det   C              Example
<atirar>    a      os    pés            O Zé atirou-se aos pés da Ana
<cair>      em     os    braços         O Zé caiu nos braços da Ana
<chegar>    a      os    calcanhares    O Zé não chega aos calcanhares da Ana
<chorar>    em     o     ombro          O Zé chora no ombro da Ana
<cortar>    em     a     casaca         O Zé corta na casaca da Ana
<crescer>   a      os    olhos          O Zé cresceu aos olhos do patrão
<cuspir>    em     a     cara           O Zé cuspiu-lhe na cara
<dar>       em     o     focinho        A Ana deu no focinho do Zé
<descer>    em     a     consideração   O Zé desceu na consideração da Ana
<entrar>    em     o     jogo           O Zé entrou no jogo da Ana
<falar>     em     as    costas         O Zé fala nas costas da Ana
<ir>        em     a     cantiga        O Zé foi na cantiga da Ana
<passar>    por    as    mãos           O projecto passou pelas mãos da Ana
<pegar>     em     a     palavra        O Zé pegou na palavra da Ana
<puxar>     por    a     língua         A Ana puxou pela língua do Zé
<reger>     por    a     cabeça         O Zé rege-se pela cabeça da Ana
<sair>      de     o     corpo          A riqueza do Zé saiu-lhe do corpo
<sentar>    em     a     cadeira        O Zé sentou-se na cadeira do Pedro
<subir>     a      a     cabeça         O dinheiro subiu à cabeça do Zé
<tocar>     em     a     ferida         O Zé tocou na ferida da Ana
<viver>     em     a     sombra         O Zé vive na sombra da Ana
Property columns (+/-): N0 =: Nhum, N0 =: N-hum, Vse, NegObrig, de N = PRO:Pos, de N = PRO:O, [Rdat]
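Each row of the table fills the frozen part of the pattern shared by the whole class. A minimal Python sketch, assuming a hypothetical row format with only the V, Prep, Det and C columns (the +/- property columns and the contraction of de N with its own determiner are left out). It is not INTEX's table-to-graph mechanism, only an illustration of the idea.

```python
# Preposition + definite article contractions (European Portuguese).
CONTRACTIONS = {
    ("a", "o"): "ao", ("a", "a"): "à", ("a", "os"): "aos", ("a", "as"): "às",
    ("em", "o"): "no", ("em", "a"): "na", ("em", "os"): "nos", ("em", "as"): "nas",
    ("por", "o"): "pelo", ("por", "a"): "pela", ("por", "os"): "pelos", ("por", "as"): "pelas",
    ("de", "o"): "do", ("de", "a"): "da", ("de", "os"): "dos", ("de", "as"): "das",
}


def instantiate(row: dict[str, str]) -> str:
    """Fill the pattern N0 <V> Prep+Det C de N with one (hypothetical) table row."""
    prep_det = CONTRACTIONS[(row["Prep"], row["Det"])]
    return f"N0 <{row['V']}> {prep_det} {row['C']} de N"


rows = [
    {"V": "atirar", "Prep": "a", "Det": "os", "C": "pés"},
    {"V": "passar", "Prep": "por", "Det": "as", "C": "mãos"},
    {"V": "subir", "Prep": "a", "Det": "a", "C": "cabeça"},
]
for row in rows:
    print(instantiate(row))
# N0 <atirar> aos pés de N
# N0 <passar> pelas mãos de N
# N0 <subir> à cabeça de N
```

Keeping Prep and Det as separate columns, and contracting them only when the pattern is built, avoids listing 16 contracted forms in the table itself.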
LG Tables & Master-graphs
• until now, priority was given to building the LG
• some tests on LG examples and on corpora of different sizes
• problems in matching LG examples:
– e.g. CP1 (70% recall)
– <CATEG> in the lemma but <PRO+Pes:R> in the M_grf: ok
– embedded graphs “:Graph”
– unknown causes for mismatch
• waiting for new solutions under NooJ
Inheritance
– no solutions, just talking about it...
– so far, INTEX cannot locate either compound words or frozen sentences by lemma
– M_grf do not allow strings matched by *cfg of delae to inherit the inflection values of the <V> element:
O Pedro <brincou,brincar.V:P3s> com o fogo
(Peter played with the fire)
‘Peter did something dangerous’
dle: O Pedro <brincou com o fogo,brincar com o fogo.V+CP1>
→ O Pedro <brincou com o fogo,brincar com o fogo.V+CP1:P3s>
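The missing inheritance step can be sketched in a few lines of Python. This is an illustration under assumptions, not INTEX behaviour: the helper names `split_tag` and `inherit_inflection` are hypothetical, and the tag format is simplified to form,lemma.CODES:INFL, with no ',' in the form and no '.' in the lemma.

```python
def split_tag(tag: str) -> tuple[str, str, str, list[str]]:
    """Split a simplified 'form,lemma.CODES:INFL' tag into its parts."""
    form, rest = tag.split(",", 1)
    lemma, codes = rest.rsplit(".", 1)
    codes, *inflections = codes.split(":")
    return form, lemma, codes, inflections


def inherit_inflection(verb_tag: str, frozen_tag: str) -> str:
    """Copy the inflection codes of the <V> element onto the frozen-sentence tag."""
    *_, verb_infl = split_tag(verb_tag)
    form, lemma, codes, own_infl = split_tag(frozen_tag)
    infl = own_infl or verb_infl  # keep any inflection the entry already carries
    return f"{form},{lemma}.{codes}" + "".join(f":{i}" for i in infl)


print(inherit_inflection("brincou,brincar.V:P3s",
                         "brincou com o fogo,brincar com o fogo.V+CP1"))
# brincou com o fogo,brincar com o fogo.V+CP1:P3s
```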
Inheritance – a re-visitation
• main reason for implementing inheritance in NLP
systems, beyond morphology, is the processing of
compound tenses
• compound tenses are very frequent in Portuguese
• around 100 auxiliary verbs (Vaux) have already been described for Portuguese (Pontes 1977, Gonçalves 2000)
• their combination with main verbs is complex
(e.g. clitic positioning)
a very naïve approximation to Vaux-V combinations
(clitic pronouns were ignored)
Inheritance (cont.)
• their combination with other Vaux gives rise to
complex syntactic patterns (M. Gross 1999; Ranchhod
2003)
• many frozen sentences appear in compound
tenses, this being one of the main causes for low
precision (only part of the V complex is matched)
Inheritance (cont.)
• the calculation of compound tenses poses a serious challenge:
• Vaux-V combinations cannot be predicted a priori from V
• a Vaux-V combination may convey a highly specific meaning
• multiple Vaux may combine in front of V
(average 1-3, but up to 4; limited recursiveness of Vaux*-V
and limited patterns of combination may ‘ease’ the task of
describing it)
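Because the recursion is limited, a Vaux*-V chain can be matched by a bounded left-to-right scan. The Python sketch below assumes a tiny hypothetical auxiliary list (`AUX_LEMMAS`), an optional linking preposition (`LINK_PREPS`) between verbs, and already tagged tokens; it ignores clitics and does not check which Vaux may combine with which.

```python
# Toy inventory (hypothetical, far smaller than the ~100 Vaux described for Portuguese).
AUX_LEMMAS = {"ter", "estar", "ir", "andar", "acabar", "começar", "poder", "dever", "vir"}
# Prepositions that may link an auxiliary to the next verb (estar a, acabar de, ...).
LINK_PREPS = {"a", "de", "por"}
MAX_AUX = 4  # chains of up to 4 auxiliaries

Token = tuple[str, str, str]  # (form, lemma, POS)


def match_vaux_chain(tokens: list[Token], start: int):
    """Match Vaux{0,MAX_AUX} V at `start`; return (aux_indices, main_index) or None."""
    if start >= len(tokens) or tokens[start][2] != "V":
        return None
    chain = [start]
    i = start
    # Extend while the current verb can be an auxiliary and the bound allows it.
    while tokens[i][1] in AUX_LEMMAS and len(chain) <= MAX_AUX:
        j = i + 1
        if j < len(tokens) and tokens[j][0] in LINK_PREPS:
            j += 1  # skip an optional linking preposition
        if j >= len(tokens) or tokens[j][2] != "V":
            break  # no verb follows: the current verb is the main verb
        chain.append(j)
        i = j
    return chain[:-1], chain[-1]


tokens = [("O", "o", "DET"), ("Pedro", "Pedro", "N"), ("tinha", "ter", "V"),
          ("acabado", "acabar", "V"), ("de", "de", "PREP"), ("chegar", "chegar", "V")]
print(match_vaux_chain(tokens, 2))  # ([2, 3], 5)
```

A verb whose lemma is in `AUX_LEMMAS` but is not followed by another verb (O Zé tem razão) is returned as the main verb, so the same lemma can serve both uses.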
Inheritance (cont.)
dealing with compound tenses in the M_grf
• is linguistically inadequate
• mixes two distinct linguistic phenomena
• and unnecessarily complicates the graphs
Inheritance (cont.)
• compound tense calculation cannot, at least a priori, be coded in the same way as simple tenses (the lexical form of the Vaux plays an important role)
• decomposing compound tenses into multiple tense/aspect/mode/x? features, collapsed into a single word, is controversial
• it is unclear how to derive the final attributes of a long string of Vaux from the features of each one, since they are not always cumulative
• except, perhaps, for commonly used tenses such as ter + Vpp (for example, the solution proposed by Silberztein 2000)
Inheritance
• INTEX already deals, in a limited way, with inheritance in the morphological module
(it stores the information associated with an entry and reuses it when computing the information of the word being morphologically analyzed)
• it should be possible to tackle the problem in a similar way, in a two-stage process (possibly with a recursive first step):
1. processing of compound tenses (recursive)
2. lexical analysis of frozen sentences
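The two stages can be sketched in Python as follows. Everything here is hypothetical: stage 1 only handles ter + past participle (the commonly used tense case) and is not recursive, and the tag names (`V:K` for the participle, `V+TerPP` for the collapsed form) and the one-entry lexicon `FROZEN` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    lemma: str
    tag: str  # simplified INTEX-like codes, e.g. "V:P3s", "V:Kms" (past participle), "PREP"


# Hypothetical frozen-sentence lexicon: (verb lemma, fixed words) -> codes.
FROZEN = {("brincar", ("com", "o", "fogo")): "V+CP1"}


def stage1_compound_tenses(tokens: list[Token]) -> list[Token]:
    """Stage 1: collapse 'ter + past participle' into one verbal token.

    The lemma comes from the participle; the inflection comes from the auxiliary.
    """
    out, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if t.lemma == "ter" and t.tag.startswith("V:") and nxt is not None and nxt.tag.startswith("V:K"):
            infl = t.tag.split(":", 1)[1]
            out.append(Token(f"{t.form} {nxt.form}", nxt.lemma, f"V+TerPP:{infl}"))
            i += 2
        else:
            out.append(t)
            i += 1
    return out


def stage2_frozen(tokens: list[Token]) -> list[str]:
    """Stage 2: look up frozen sentences by verb lemma and copy the verb's features."""
    results = []
    for i, t in enumerate(tokens):
        if not t.tag.startswith("V"):
            continue
        cat, _, infl = t.tag.partition(":")
        for (lemma, fixed), codes in FROZEN.items():
            span = tokens[i + 1 : i + 1 + len(fixed)]
            if t.lemma == lemma and tuple(s.form for s in span) == fixed:
                form = " ".join([t.form, *fixed])
                suffix = f":{infl}" if infl else ""
                # cat[1:] keeps a compound-tense marker such as "+TerPP"
                results.append(f"{form},{' '.join([lemma, *fixed])}.{codes}{cat[1:]}{suffix}")
    return results


sentence = [Token("O", "o", "DET"), Token("Pedro", "Pedro", "N"), Token("tem", "ter", "V:P3s"),
            Token("brincado", "brincar", "V:Kms"), Token("com", "com", "PREP"),
            Token("o", "o", "DET"), Token("fogo", "fogo", "N:ms")]
print(stage2_frozen(stage1_compound_tenses(sentence)))
# ['tem brincado com o fogo,brincar com o fogo.V+CP1+TerPP:P3s']
```

Since stage 2 only sees one verbal token, the frozen-sentence lexicon needs no knowledge of auxiliaries.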
Inheritance (cont.)
Some exceptions: frozen sentences of class CV
Esta estrada vai (ter + dar) a Paris
(This road goes have:w/give:w to Paris)
‘This road leads to Paris’
which should be analyzed as CV, in the present tense:
→ vai ter,ir ter.V+CV:P3s, vai dar,ir dar.V+CV:P3s
and not as a compound future tense of the last V:
→ vai ter,ter.V:F3s, vai dar,dar.V:F3s
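One way to respect these exceptions is to try the frozen CV entries before the default compound-future reading. A toy Python sketch with hypothetical word lists; note that checking only the next word is not enough in real text, since a is also the feminine article (vai ter a casa arrumada).

```python
from typing import Optional

# Hypothetical toy data for the 'ir + infinitive' ambiguity.
IR_FORMS = {"vai": "P3s", "vão": "P3p"}        # a few present-tense forms of ir
INFINITIVES = {"ter", "dar", "ver", "comer"}   # toy infinitive list
FROZEN_CV = {("ter", "a"), ("dar", "a")}       # N0 ir (ter + dar) a N 'lead to'


def analyse_ir_infinitive(words: list[str]) -> Optional[str]:
    """Tag the first 'ir + infinitive' pair, trying frozen CV entries first."""
    for i in range(len(words) - 1):
        aux, inf = words[i], words[i + 1]
        if aux in IR_FORMS and inf in INFINITIVES:
            infl = IR_FORMS[aux]
            following = words[i + 2] if i + 2 < len(words) else None
            if (inf, following) in FROZEN_CV:
                # frozen CV reading: lemma 'ir V', tense stays present
                return f"{aux} {inf},ir {inf}.V+CV:{infl}"
            # default reading: periphrastic future of the infinitive
            return f"{aux} {inf},{inf}.V:F{infl[1:]}"
    return None


print(analyse_ir_infinitive("esta estrada vai dar a paris".split()))  # vai dar,ir dar.V+CV:P3s
print(analyse_ir_infinitive("o pedro vai ter um filho".split()))      # vai ter,ter.V:F3s
```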
Inheritance
• inheriting properties can apply to other cases
• frozen sentences with gender/number agreement on words other than V (note: this case is rather rare)
[O Pedro]NP_ms não se deu por <achado>Adj_ms
(Peter did not give himself found)
‘Peter didn’t wait to be asked in order to do something’
where a suitable analysis should be:
não se deu por achado,Neg dar-se por achado.V+CP1:J3ms
the ‘ms’ is derived from the Adj, and the ‘s’ of deu <V:J3s> is dropped to avoid duplication.
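The feature merging described above fits in a small Python function. This is an illustrative sketch that assumes one-character tense, person, gender and number codes; the function names `merge_agreement` and `tag_frozen` are hypothetical.

```python
def merge_agreement(verb_infl: str, adj_infl: str) -> str:
    """Combine e.g. 'J3s' (tense, person, number) with 'ms' (gender, number) into 'J3ms'."""
    tense, person, v_number = verb_infl
    gender, a_number = adj_infl
    if v_number != a_number:
        raise ValueError(f"number mismatch: {verb_infl} vs {adj_infl}")
    # gender and number come from the adjective; the verb's number would duplicate it
    return f"{tense}{person}{gender}{a_number}"


def tag_frozen(form: str, lemma: str, codes: str, verb_infl: str, adj_infl: str) -> str:
    """Build the full tag of the frozen sentence with the merged features."""
    return f"{form},{lemma}.{codes}:{merge_agreement(verb_infl, adj_infl)}"


print(tag_frozen("não se deu por achado", "Neg dar-se por achado", "V+CP1", "J3s", "ms"))
# não se deu por achado,Neg dar-se por achado.V+CP1:J3ms
```

The number check rejects a match such as deu por achados, where the adjective does not agree with the singular verb.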
Inheritance (cont.)
• Another case:
Named Entity Recognition (strings of proper names designating a single person)
[José_cn:ms Manuel_cn:ms Durão_fn Barroso_fn] Npr:ms
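A similar calculation gives the features of the whole name. Python sketch, assuming given names carry cn:<gender><number> and family names (fn) carry no features; the function name is hypothetical. A name like José Maria (masculine, with a feminine second given name) would make this rule return None, so a real system would need another policy, e.g. trusting the first given name.

```python
from typing import Optional


def person_name_features(parts: list[tuple[str, str]]) -> Optional[str]:
    """Derive Npr gender/number for a person name from its given names.

    Given names are tagged 'cn:<gender><number>', family names 'fn' (no features);
    the whole name gets the features shared by all its given names.
    """
    feats = {tag.split(":", 1)[1] for _, tag in parts if tag.startswith("cn:")}
    if len(feats) != 1:
        return None  # no given name, or given names that disagree
    return "Npr:" + feats.pop()


name = [("José", "cn:ms"), ("Manuel", "cn:ms"), ("Durão", "fn"), ("Barroso", "fn")]
print(person_name_features(name))  # Npr:ms
```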
Inheritance (conclusion)
• alas! no solution,
• but exposing the problems may help find solutions
• a minimalist solution: pre-processing of common Vaux-V combinations
Future Perspectives
• continue building the LG of frozen sentences
• associate the UNAMB feature
• apply them to large corpora
• associate FRQ information
• improve and integrate FS into syntactic analysis (along with simple, distributional verbs)
References
ARAÚJO-VALE, Oto (2001): Expressões Cristalizadas do Português do Brasil: Uma proposta de Tipologia. PhD thesis, Araraquara (Brazil), UNESP.
BAPTISTA, Jorge (in press): Construções Simétricas: Complementos e Argumentos. In FIGUEIREDO, O., M.G. RIO-TORTO and F. SILVA (eds.), [Volume de Homenagem ao Prof. Mário Vilela], Porto, Univ. Porto.
BAPTISTA, Jorge, CORREIA, Anabela & FERNANDES, Maria da Graça (2004): Frozen Sentences of Portuguese: Formal Descriptions for NLP. Workshop on Multiword Expressions: Integrating Processing, International Conference of the European Chapter of the Association for Computational Linguistics, Barcelona (Spain), July 26, 2004, ACL, pp. 72-79.
BOONS, Jean-Paul, GUILLET, Alain & LECLÈRE, Christian (1976a): La structure des phrases simples en français. Constructions Transitives. Paris, LADL – Univ. Paris 7.
BOONS, Jean-Paul, GUILLET, Alain & LECLÈRE, Christian (1976b): La structure des phrases simples en français. Classes de constructions transitives. Genève, Droz.
CHACOTO, Lucília (2005): O verbo FAZER em Construções Nominais Predicativas. PhD thesis, Faro, Universidade do Algarve.
CORREIA, Anabela (in preparation): Léxico-Gramática das Frases Fixas do Português Europeu – Construções Transitivas. MA thesis, Faro, Universidade do Algarve.
FERNANDES, Maria da Graça (in preparation): Léxico-Gramática das Frases Fixas do Português Europeu – Construções Intransitivas. MA thesis, Faro, Univ. Algarve.
FOTOPOULOU, Aggeliki (1993): Une classification des phrases à compléments figés en grec moderne. PhD thesis, Paris, LADL/Univ. Paris 7.
GROSS, Maurice (1975): Méthodes en Syntaxe. Régimes des constructions complétives. Paris, Hermann.
GROSS, Maurice (1981): « Les bases empiriques de la notion de prédicat sémantique ». Langages 63, pp. 7-52, Paris, Larousse.
GROSS, Maurice (1982): «Une classification des phrases "figées" du français ». Revue Québécoise de Linguistique, 11.2, 1982, 151-185, Montréal, UQAM.
GROSS, Maurice (1989): Les Expressions Figées. Une description des expressions françaises et ses conséquences théoriques. Paris, Univ. Paris 7, LADL.
GROSS, Maurice (1996): «Lexicon Grammar», in K. BROWN & J. MILLER (eds.), Concise Encyclopedia of Syntactic Theories, pp. 244-258, Cambridge, Pergamon.
GROSS, Maurice (1999): Lemmatization of compound tenses. In Fairon, C. (ed.), Lingvisticae Investigationes ??. Amsterdam, John Benjamins.
GROSS, Maurice (2000): «Verbes à trois compléments essentiels». BULAG. Lexique, syntaxe et sémantique, mélanges offerts à Gaston Gross, pp. 199-210, Univ. Franche-Comté/Centre Tesnière.
GUILLET, Alain, & LECLÈRE, Christian (1981): « Restructuration du groupe nominal». Langages 63, Paris, Larousse.
GUILLET, Alain & LECLÈRE, Christian (1982): La structure des phrases simples en français. Les constructions locatives. Genève, Droz.
HARRIS, Zellig S. (1991): A Theory of Language and Information. A Mathematical Approach. Oxford, Clarendon Press.
LECLÈRE, Christian (1995): «Restructuration dative». Language Research, 31:1. Language Research Institute – Seoul National University, Seoul.
LECLÈRE, Christian (2000): « Expressions figées dans la francophonie: le projet BFQS ». BULAG, Lexique, Syntaxe et Sémantique, Mélanges offerts à Gaston
Gross. Pierre-André BUVET, Denis LE PESANT et Michel MATHIEU-COLAS (Eds.), n° Hors Série, pp. 321-331, Besançon, Centre Lucien Tesnière.
LECLÈRE, Christian (2002): «Organization of the lexicon-grammar of French verbs». Lingvisticae Investigationes, XXV:1, USA, John Benjamins Publishing Company.
MOGÓRRON-HUERTA, Pedro Joaquín (2002): Estudio Contrastivo de las Expresiones Fijas en Ser / Estar + Prep X Y Être Prép X en Francés. PhD thesis in Romance Philology, Univ. Valência.
RANCHHOD, Elisabete (2003): Reconhecimento de sequências de verbos auxiliares por métodos de estados finitos. Lisboa, FLUL.
SANTOS, António Nogueira (1990): Novos Dicionários de Expressões Idiomáticas. Lisboa, Edições Sá da Costa.
SENELLART, Jean (1998): «Reconnaissance automatique des entrées du lexique-grammaire des phrases figées». In Béatrice LAMIROY (ed.), Le Lexique-Grammaire. Travaux de Linguistique 37, Bruxelles, Duculot, 1999, pp. 109-125.
SILBERZTEIN, Max (1993): Dictionnaires électroniques et analyse automatique de textes. Le système Intex. Paris, Masson.
SILBERZTEIN, Max (2000): Intex (Manual). Paris, ASSTRIL.