CBA 2010
Corpus-Based Approaches to Paraphrasing and Nominalization
Barcelona, 1-2 December 2010
Spanish verb and verb-noun collocation paraphrase pairs
María A. Barrios, auxiba@filol.ucm.es
Luz Rello, luzrello@gmail.com
Outline
1. Objectives
2. Introduction: Meaning-Text Theory
3. BADELE.3000, a linguistic resource
4. Paraphrase Rule 18 from Meaning-Text Theory
5. Paraphrase pairs of verb and verb-noun collocations
6. Verb-noun collocations that have no verbal counterpart
7. Verbs that have no verb-noun collocation counterpart
8. False paraphrases
1. Objectives (I)
1. To present a linguistic resource, BADELE.3000, useful for NLP applications.
2. To describe cases of paraphrase pairs composed of verbs which can be paraphrased with support verb+noun collocations, such as ducharse (to shower) and darse una ducha (to have a shower).
3. To describe cases of false paraphrase pairs: words with a morphological but not a semantic relationship.
1. Objectives (II)
4. To describe cases in which a verb+noun collocation has no verbal counterpart, such as:
*problemear
tener un problema (to have a problem)
5. To describe cases in which a verb has no verb+noun collocation counterpart, such as:
sincerarse (to open up to someone)
*hacer una sinceridad
2. Introduction (I): MTT
1. A lexical function (LF) F associates a given lexical expression L (such as sound), which is the argument or keyword of F, with a set of lexical expressions, the value of F (such as loud, strong, heavy, deafening, etc.), that express a specific meaning associated with F (for instance, ‘intense’ for Magn).
- sound = argument or keyword of the LF
- loud, strong, heavy, deafening, etc. = values of the LF
- ‘intense’ = specific meaning associated with Magn
- Magn(sound) = loud, strong, heavy, deafening
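As a rough illustration only, an LF can be pictured as a lookup from a pair (LF name, keyword) to a set of value expressions. The sketch below assumes that representation and a hypothetical helper `apply_lf`; the data are the Magn(sound) example above, and nothing here reflects how any particular MTT database stores LFs.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation: (LF name, keyword) -> set of value expressions.
LFTable = Dict[Tuple[str, str], FrozenSet[str]]

LF_TABLE: LFTable = {
    ("Magn", "sound"): frozenset({"loud", "strong", "heavy", "deafening"}),
}


def apply_lf(lf: str, keyword: str, table: LFTable = LF_TABLE) -> FrozenSet[str]:
    """Return the values of lf(keyword), or an empty set if none are recorded."""
    return table.get((lf, keyword), frozenset())


if __name__ == "__main__":
    # Sorted only to make the printed order predictable.
    print(sorted(apply_lf("Magn", "sound")))  # ['deafening', 'heavy', 'loud', 'strong']
```

Using a set for the value reflects the point above that an LF returns a group of expressions, not a single word.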
2. Introduction (II): MTT
More than 100 different universal LFs, for example:
- Oper1(ducha) = darse, Oper1(shower) = to have, Oper1(douche) = prendre, Oper1(doccia) = fare
- Si, with i = 1, 2, etc. (the name of the i-th participant in a situation): S1(school) = student; S2(school) = teacher
- V0(reception) = to receive; S0(to receive) = reception
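The examples above invite two simple checks on a machine-readable LF table: Oper1 values are language-specific even when the keywords are translation equivalents, and V0 and S0 map back onto each other in the reception example. The following is a minimal sketch assuming plain dictionaries and a hypothetical `round_trips` helper; a real lexicon may record several values per LF, which this toy version ignores.

```python
from typing import Dict, Tuple

# Oper1 keyed by (language code, keyword); values from the slide.
OPER1: Dict[Tuple[str, str], str] = {
    ("es", "ducha"): "darse",
    ("en", "shower"): "to have",
    ("fr", "douche"): "prendre",
    ("it", "doccia"): "fare",
}

V0: Dict[str, str] = {"reception": "to receive"}  # V0(noun) = verb
S0: Dict[str, str] = {"to receive": "reception"}  # S0(verb) = noun


def round_trips(v0: Dict[str, str], s0: Dict[str, str]) -> bool:
    """True if S0(V0(n)) == n for every noun and V0(S0(v)) == v for every verb."""
    nouns_ok = all(s0.get(verb) == noun for noun, verb in v0.items())
    verbs_ok = all(v0.get(noun) == verb for verb, noun in s0.items())
    return nouns_ok and verbs_ok


if __name__ == "__main__":
    print(OPER1[("fr", "douche")])  # prendre
    print(round_trips(V0, S0))      # True
```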
2. Introduction (III): MTT
2. A semantic label is the equivalent of the genus in traditional definitions by genus and differentia.
- Whale: ‘sea mammal that breathes air through a hole at the top of its head and is hunted for meat and for other purposes, as a source of other materials’
- Hierarchy of semantic labels: Living being > Animal > Vertebrate > Mammal > Sea Mammal
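A semantic-label hierarchy behaves like a simple tree of genus links. The sketch below, which assumes a child-to-parent dictionary and the hypothetical helpers `ancestors` and `is_a`, walks from the label of whale up to the root of the hierarchy shown above.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of the slide's hierarchy as child -> parent links.
PARENT: Dict[str, Optional[str]] = {
    "Living being": None,
    "Animal": "Living being",
    "Vertebrate": "Animal",
    "Mammal": "Vertebrate",
    "Sea Mammal": "Mammal",
}

# Each lexical unit carries one semantic label (its genus).
SEMANTIC_LABEL: Dict[str, str] = {"whale": "Sea Mammal"}


def ancestors(label: str) -> List[str]:
    """Return the label followed by all its ancestors, most specific first."""
    chain: List[str] = []
    current: Optional[str] = label
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_a(label: str, other: str) -> bool:
    """True if `other` is `label` itself or one of its ancestors."""
    return other in ancestors(label)


if __name__ == "__main__":
    print(" > ".join(reversed(ancestors(SEMANTIC_LABEL["whale"]))))
    # Living being > Animal > Vertebrate > Mammal > Sea Mammal
```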
2. Introduction (IV): MTT
3. Actants correspond to the beings or things that participate in the process expressed by a predicate. The MTT approach considers that all kinds of predicative words have a sort of argument structure: not only verbs have actants, but also adjectives, adverbs and predicative nouns.
- The actantial structure reflects the syntactic expression of the actants, for example:
- River [WHICH STARTS AT THE X place, FLOWS THROUGH THE Z places AND FINISHES AT THE Y area]
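To make the idea of actant slots concrete, the definition of river can be treated as a template whose variables X, Y and Z are filled by actants. This is only an illustrative sketch: the `Predicate` class, its methods and the sample actant values are hypothetical, chosen for the example.

```python
import string
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Predicate:
    """A predicative word whose definition contains actant slots such as {X}."""
    name: str
    template: str

    def slots(self) -> Set[str]:
        # Formatter().parse yields (literal, field, spec, conversion) tuples;
        # field is None for trailing literal text, so it is filtered out.
        return {field for _, field, _, _ in string.Formatter().parse(self.template) if field}

    def instantiate(self, actants: Dict[str, str]) -> str:
        missing = self.slots() - set(actants)
        if missing:
            raise ValueError(f"unfilled actant slots: {sorted(missing)}")
        return self.template.format(**actants)


# Illustrative encoding of the slide's definition of river.
RIVER = Predicate(
    name="river",
    template="river which starts at {X}, flows through {Z} and finishes at {Y}",
)

if __name__ == "__main__":
    print(RIVER.instantiate({"X": "Fontibre", "Z": "Zaragoza", "Y": "the Mediterranean Sea"}))
```

Raising an error for an empty slot is a design choice of this sketch; it mirrors the idea that every actant in the structure has a syntactic expression.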
2. Introduction (V): MTT
LFs have proved to be an especially helpful tool for lexicographic and NLP work, such as the French database DiCouèbe [1] (developed in Montreal by Polguère and Mel’čuk), the Spanish database DiCE [2] (developed in La Coruña by Alonso Ramos), the automatic translator ETAP3 [3] (developed in Moscow by Apresjan, Boguslavsky et al.) and multilingual generation and paraphrasing systems [4] (developed in Barcelona by Leo Wanner).
[1] http://olst.ling.umontreal.ca/dicouebe/
[2] http://www.dicesp.com/
[3] http://cl.iitp.ru/etap
[4] http://www.barcelonamedia.org/files/292.pdf

3. BADELE.3000, a linguistic resource (I)
BADELE.3000 (Barrios & Bernardos, 2007) is a database that contains the 3,300 most frequently used Spanish nouns and the 3,300 most frequently used Spanish verbs. In it, 20,700 relations were formalized by means of LFs.
BADELE.3000 is useful for natural language processing applications and ontologies (Barrios, Aguado de Cea and Ramos, 2009a; Barrios, Aguado de Cea and Ramos, 2009b; Barrios and Vilches, 2010).
9,000 lexical relations were obtained automatically by means of semantic labels and LFs.
21,700 lexical relations were added manually (Bosque,
3. BADELE.3000, a linguistic resource (II)
Inheritance Principle: lexical units that share a semantic label can automatically inherit the LF values of that label.
Fact0(‘means of transport’) = to work (‘to do what is
supposed to be done’)
‘means of transport’ = bus, ship, train, motorbike,
plane
Fact0(bus) = to work, to run, to operate
Fact0(ship) = to work, to sail, to navigate
Fact0(train) = to work, to run, to operate
Fact0(motorbike) = to work, to run
[Figure: BADELE.3000, the 3,300 most frequently used Spanish nouns; semantic label ‘means of transport’; El barco funciona = the ship works; inherited values: the bus works, runs, operates; the ship sails, navigates; the plane works, flies, glides, flies over]
Ontology Engineering Group
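The Inheritance Principle can be sketched as a union of two sources: values stated once for a semantic label and values recorded for the individual noun. This is a minimal sketch under stated assumptions: the dictionaries and the `lf_values` helper are hypothetical, the data come from the Fact0 examples above, and storing noun-specific values separately from inherited ones is an assumption of the sketch.

```python
from typing import Dict, Set

# Values stated once for the semantic label (from the slide).
LABEL_VALUES: Dict[str, Dict[str, Set[str]]] = {
    "means of transport": {"Fact0": {"to work"}},
}

# Semantic label of each noun.
NOUN_LABEL: Dict[str, str] = {
    noun: "means of transport" for noun in ("bus", "ship", "train", "motorbike", "plane")
}

# Assumption: noun-specific values kept on top of the inherited ones.
NOUN_VALUES: Dict[str, Dict[str, Set[str]]] = {
    "bus": {"Fact0": {"to run", "to operate"}},
    "ship": {"Fact0": {"to sail", "to navigate"}},
    "train": {"Fact0": {"to run", "to operate"}},
    "motorbike": {"Fact0": {"to run"}},
}


def lf_values(lf: str, noun: str) -> Set[str]:
    """Union of the values inherited from the noun's label and its own values."""
    label = NOUN_LABEL.get(noun)
    inherited = LABEL_VALUES.get(label, {}).get(lf, set()) if label else set()
    own = NOUN_VALUES.get(noun, {}).get(lf, set())
    return inherited | own  # new set; the stored tables are not modified


if __name__ == "__main__":
    for noun in ("bus", "plane"):
        print(noun, sorted(lf_values("Fact0", noun)))
    # bus ['to operate', 'to run', 'to work']
    # plane ['to work']
```

Here plane receives only the inherited value, which is the point of the principle: a noun with no values of its own still gets the label's values for free.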
3. BADELE.3000, a linguistic resource (III)
CausFunc0 means ‘to cause something to exist’:
CausFunc0(ropa) = confeccionar > una camiseta/pantalones, etc.
CausFunc0(clothes) = to make > to make a T-shirt/trousers/skirt, etc.
CausFunc0(obra artística) = componer > un poema/libro/argumento, etc.
CausFunc0(artistic work) = to compose > a poem/book/plot, etc.
CausFunc0(vivienda) = construir > una casa/rascacielos/apartamento, etc.
CausFunc0(accommodation) = to build > to build a house/skyscraper/apartment, etc.
CausFunc0(energía) = producir > luz/gas/petróleo, etc.
CausFunc0(energy) = to produce > to produce light/gas/petrol, etc.
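The same label-level mechanism yields CausFunc0 collocations: once a noun phrase is assigned a semantic label, the label's CausFunc0 value supplies the verb. The sketch below assumes hypothetical label assignments and a helper named `causfunc0_collocation`; it does not handle agreement or article choice.

```python
from typing import Dict, Optional

# CausFunc0 values stated at the level of the semantic label (from the slide).
CAUSFUNC0: Dict[str, str] = {
    "ropa": "confeccionar",
    "obra artística": "componer",
    "vivienda": "construir",
    "energía": "producir",
}

# Hypothetical label assignments for a few noun phrases from the slide.
LABEL_OF: Dict[str, str] = {
    "una camiseta": "ropa",
    "pantalones": "ropa",
    "un poema": "obra artística",
    "una casa": "vivienda",
    "un rascacielos": "vivienda",
    "luz": "energía",
}


def causfunc0_collocation(noun_phrase: str) -> Optional[str]:
    """Build 'verb + noun phrase' from the CausFunc0 value of the phrase's label."""
    label = LABEL_OF.get(noun_phrase)
    verb = CAUSFUNC0.get(label) if label else None
    return f"{verb} {noun_phrase}" if verb else None


if __name__ == "__main__":
    for np in ("una casa", "un poema", "luz"):
        print(causfunc0_collocation(np))
    # construir una casa / componer un poema / producir luz
```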
4. Paraphrase rule 18 (I)
1. Paraphrase rule 18, called “Fissions à verbe support” (support-verb fissions) (Mel’čuk et al., 1998), can be stated as follows:
“Given a verb (V0), such as to receive, paradigmatically related to a noun (S0), such as reception, if the noun appears in a collocation together with a support verb Oper, such as to give a reception, both verbal expressions (to receive, to give a reception) are interchangeable.”
V0 ⇔ Oper1(S0) + S0: to receive ⇔ Oper1(reception) + reception = to give a reception
2. Support verbs are values of the LFs Oper, Func and Labor (Mel’čuk, 1996, 68), such as to deal a blow, to receive a
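Read as a rewrite, rule 18 chains two lookups: S0 of the verb, then Oper1 of that noun. A minimal sketch, assuming toy dictionaries, a hypothetical `fission` function and an English determiner stored alongside each Oper1 value (a simplification of this example, not part of the rule).

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy lexicon; entries echo the examples in these slides.
S0: Dict[str, str] = {
    "to receive": "reception",
    "to resist": "resistance",
    "to order": "order",
}

# Oper1 value plus the English determiner it needs (an assumption of this sketch).
OPER1: Dict[str, Tuple[str, str]] = {
    "reception": ("to give", "a"),
    "resistance": ("to put up", ""),
    "order": ("to give", "an"),
}


def fission(verb: str) -> Optional[str]:
    """Rule 18 as a rewrite, V0 => Oper1(S0) + S0; None if a link is missing."""
    noun = S0.get(verb)
    if noun is None or noun not in OPER1:
        return None
    support, det = OPER1[noun]
    return " ".join(word for word in (support, det, noun) if word)


if __name__ == "__main__":
    for verb in ("to receive", "to resist", "to walk"):
        print(verb, "=>", fission(verb))
```

A verb with no recorded S0 or Oper1 value yields None, so the rule simply does not fire for it.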
5. Paraphrase pairs (I)
There are more than 700 nouns in BADELE.3000 whose meaning is equivalent to the meaning of a verb; each such noun and verb are related by the LFs V0 and S0.
We listed these nouns as candidate nouns for verb-noun collocations:
V0(blow) = to beat
S0(to beat) = blow
V0(resistance) = to resist
S0(to resist) = resistance
V0(order) = to order
S0(to order) = order
5. Paraphrase pairs (II)
Then we found the support-verb collocations of these nouns:
1. to deal a blow
2. to put up resistance
3. to give an order
Finally, we attached these support verb-noun collocations to the equivalent verb:
1. to bang/beat/hit = to deal a blow
2. to resist = to put up resistance
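The three steps above amount to a join between a noun-to-verbs table and a noun-to-collocation table, with one pair emitted for every equivalent verb. A minimal sketch with hypothetical names (`VERBS_FOR_NOUN`, `COLLOCATION`, `paraphrase_pairs`) and toy data; it is not the procedure actually used to build BADELE.3000.

```python
from typing import Dict, List, Tuple

# Step 1 (toy data): nouns whose meaning matches one or more verbs.
VERBS_FOR_NOUN: Dict[str, List[str]] = {
    "blow": ["to bang", "to beat", "to hit"],
    "resistance": ["to resist"],
    "order": ["to order"],
    "hope": ["to hope"],  # deliberately left out of COLLOCATION in this toy data
}

# Step 2 (toy data): support-verb collocations found for these nouns.
COLLOCATION: Dict[str, str] = {
    "blow": "to deal a blow",
    "resistance": "to put up resistance",
    "order": "to give an order",
}


def paraphrase_pairs(verbs_for_noun: Dict[str, List[str]],
                     collocation: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 3: attach each noun's collocation to every equivalent verb."""
    pairs: List[Tuple[str, str]] = []
    for noun, verbs in verbs_for_noun.items():
        coll = collocation.get(noun)
        if coll is None:
            continue  # no support-verb collocation recorded: no pair
        pairs.extend((verb, coll) for verb in verbs)
    return pairs


if __name__ == "__main__":
    for verb, coll in paraphrase_pairs(VERBS_FOR_NOUN, COLLOCATION):
        print(f"{verb} = {coll}")
```

Grouping several verbs under one noun is what lets to bang, to beat and to hit all pair with the single collocation to deal a blow.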
5. Paraphrase pairs (III)
777 paraphrase pairs of a verb and a verb-noun collocation were found in BADELE.3000, for example:
to select = to make a selection
to reject = to show rejection
to assist = to give assistance
to support = to give support
to research = to do research
to control = to subject to control
to spread = to make propaganda
to define = to formulate a definition
to remember = to have a memory
6. Verb-noun collocations that have no verbal counterpart (I)
A missing verb was found to be frequent with nouns that denote illnesses and feelings:
*gripear (*to flu) > tener gripe (to have flu)
*diabetear (*to diabetes) > tener diabetes (to have diabetes)
*soledear (*to lonely) > sentir soledad (to feel loneliness)
*felicidadear (*to happy) > sentir felicidad (to feel happiness)
It also occurs with nouns that denote physical and non-physical facts:
S0 | Collocation = Oper1(S0) | V0
vistazo (look) | echar un vistazo (to look at sth) | *vistacear (to look)
bocanada (mouthful) | dar una bocanada (*to do a mouthful) | *bocanear (*to mouthful)
calor (heat) | hacer calor (to be hot, of weather) | *calorar (to be hot)
injusticia (injustice) | hacer una injusticia (to make an injustice) | *injusticiar (*to injustice)
estrategia (strategy) | tener una estrategia (to have a strategy) | *estrategiar (*to strategy)
chiste (joke) | contar un chiste (to tell a joke) | *chistear (to joke)
incidente (incident) | sufrir un incidente (*to live an incident) | *incidentear (*to incident)
senderismo (hiking) | hacer senderismo (to go hiking) | *senderear (to hike)
turismo (tourism) | hacer turismo (to travel around) | *turistear (*to tourism)
conducta (behaviour) | tener una conducta (to have a behaviour) | *conductear (to behave)
realidad (reality) | hacerse realidad (to come true) | *realidacear (*to reality)
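Gaps of this kind could be surfaced mechanically by comparing two LF tables: any noun with a recorded support-verb collocation but no V0 value is a candidate. A minimal sketch, assuming dictionary-shaped tables and the hypothetical name `collocation_only`; it only finds gaps in the lexicon, so a missing entry there does not by itself prove that the verb is unattested in Spanish.

```python
from typing import Dict, List

# Toy data from the examples above: noun -> attested support-verb collocation.
COLLOCATION: Dict[str, str] = {
    "gripe": "tener gripe",
    "vistazo": "echar un vistazo",
    "chiste": "contar un chiste",
    "ducha": "darse una ducha",
}

# Noun -> verb (V0); absent when the lexicon records no such verb.
V0: Dict[str, str] = {"ducha": "ducharse"}


def collocation_only(collocation: Dict[str, str], v0: Dict[str, str]) -> List[str]:
    """Nouns that have a support-verb collocation but no V0 value."""
    return sorted(noun for noun in collocation if noun not in v0)


if __name__ == "__main__":
    for noun in collocation_only(COLLOCATION, V0):
        print(f"{COLLOCATION[noun]} has no verbal counterpart")
```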
7. Verbs that have no verb-noun collocation counterpart
Previous work (Barrios, 2010) pointed out that abstract nouns tend to form collocations with support verbs. There are, however, some exceptions to this generalization (Table 2).
S0 | V0 | Collocation
respiración (breath) | respirar (to breathe) | *hacer la respiración (*to do the breathing)
ovulación (ovulation) | ovular (to ovulate) | *hacer la ovulación (*to do an ovulation)
parpadeo (blink) | parpadear (to blink) | *hacer un parpadeo (*to do a blink)
clamor (clamour) | clamar (to clamour) | *soltar un clamor (*to do a clamour)
tarareo (humming) | tararear (to hum) | *lanzar un tarareo (*to do a humming)
financiación (financing) | financiar (to finance) | *hacer una financiación (*to do a financing)
cotización (quotation) | cotizar (to quote) | *hacer una cotización (*to do a quotation)
Table 2
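The Table 2 cases are the mirror image of section 6: the noun exists, but no support-verb collocation is attested for it. The sketch below, which is hypothetical throughout (toy data, an assumed default support verb hacer, and helper names `naive_candidate` and `verb_only`), shows why applying rule 18 with a default support verb would overgenerate the starred forms, and how checking against attested collocations filters them out.

```python
from typing import Dict, List, Tuple

# Toy data: verb -> (S0 noun, determiner used in a candidate collocation).
S0: Dict[str, Tuple[str, str]] = {
    "respirar": ("respiración", "la"),
    "parpadear": ("parpadeo", "un"),
    "cotizar": ("cotización", "una"),
    "responder": ("respuesta", "una"),
}

# Attested support-verb collocations per noun (absent: none attested).
ATTESTED: Dict[str, List[str]] = {"respuesta": ["dar una respuesta"]}


def naive_candidate(noun: str, det: str) -> str:
    """Hypothetical default generator that always picks 'hacer' as support verb."""
    return f"hacer {det} {noun}"


def verb_only(s0: Dict[str, Tuple[str, str]], attested: Dict[str, List[str]]) -> List[str]:
    """Verbs whose S0 has no attested support-verb collocation."""
    return sorted(verb for verb, (noun, _) in s0.items() if not attested.get(noun))


if __name__ == "__main__":
    for verb in verb_only(S0, ATTESTED):
        noun, det = S0[verb]
        print(f"{verb}: rejected *{naive_candidate(noun, det)}")
```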
8. False paraphrases (I)
There are cases where the verb and the noun occurring in the collocation have different meanings:
1. Tener frío (to be cold) ≠ enfriarse (to get cold)
2. Tener cansancio (to be tired) ≠ cansarse (to get tired)
3. Hacer una expedición (to go on an expedition) ≠ expedir (to issue)
4. Tener sueño (to be sleepy) ≠ soñar (to dream)
8. False paraphrases (II)
5. comprar (to buy)
- Hacer una compra (*to make a purchase)
- Hacer la compra (to do the shopping)
- Ir de compras (to go shopping)
6. responder (to answer) = dar una respuesta (to give an answer)
- Respondió que no lo sabía (He answered that he did not know)
- Dio una respuesta *que no lo sabía (He gave an answer *that he did not know)
8. False paraphrases (III)
7. preguntar (to ask) = hacer una pregunta (to ask a question)
- Le preguntó dónde vivía (He asked him where he lived)
- Le hizo una pregunta *dónde vivía (He asked him a question *where he lived)
- Le hizo una pregunta: ¿dónde vives? (He asked him a question: where do you live?)
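The false paraphrases above point to two filters that a rule-18 generator might apply before treating a verb and a collocation as interchangeable: a semantic check that the noun sense and the verb are linked by V0/S0 in the lexicon, not merely by a shared stem, and a syntactic check that both expressions accept the complement in question. The sketch below is hypothetical throughout: the sense labels, frame names and helper functions are illustrative, and the frames cover only the examples on these slides plus a direct quotation, which both preguntar and hacer una pregunta accept.

```python
from typing import Dict, Set, Tuple

# Hypothetical sense-tagged V0/S0 links recorded in a lexicon.
V0_LINKS: Set[Tuple[str, str]] = {
    ("respuesta", "responder"),
    ("pregunta", "preguntar"),
}

# Hypothetical complement frames, limited to the slide examples
# plus a direct quotation.
FRAMES: Dict[str, Set[str]] = {
    "responder": {"que-clause"},
    "dar una respuesta": set(),
    "preguntar": {"wh-clause", "direct-quote"},
    "hacer una pregunta": {"direct-quote"},
}


def semantically_linked(noun_sense: str, verb: str) -> bool:
    """True only if the lexicon links this noun sense to the verb; a shared stem is not enough."""
    return (noun_sense, verb) in V0_LINKS


def substitutable(verb: str, collocation: str, complement: str) -> bool:
    """True if both the verb and the collocation accept the given complement type."""
    return complement in FRAMES.get(verb, set()) and complement in FRAMES.get(collocation, set())


if __name__ == "__main__":
    print(semantically_linked("sueño (sleepiness)", "soñar"))                # False
    print(substitutable("responder", "dar una respuesta", "que-clause"))     # False
    print(substitutable("preguntar", "hacer una pregunta", "direct-quote"))  # True
```

Checking substitutability per complement type, rather than per pair, matches the observation that a pair can be interchangeable in one construction and not in another.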
9. Conclusions (I)
1. We have presented BADELE.3000, a linguistic resource useful for natural language processing applications and ontologies.
2. We have described paraphrase pairs composed of verbs that can be paraphrased with support verb+noun collocations; cases of false paraphrase pairs, i.e. words with a morphological but not a semantic relationship; cases in which a verb+noun collocation has no verbal counterpart; and cases in which a verb has no verb+noun collocation counterpart.
9. Conclusions (II)
3. We have argued that rule 18 is quite useful for obtaining collocations automatically, but paraphrase pairs of verbs and verb-noun collocations are sometimes not interchangeable semantically, and are frequently not interchangeable syntactically.
10. References
Barrios, M. A. 2010. El dominio de las funciones léxicas en el marco de la Teoría Sentido-Texto. ELIES, 30. http://elies.rediris.es/elies30/index30.html
Barrios, Aguado de Cea and Ramos. 2009a. Semantic labels and genus: improving specialized domain definitions. In M.-C. L’Homme and S. Szulman (eds.), Proceedings of the 8th International Conference on Terminology and Artificial Intelligence. ISSN 1613-0073. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-578/
Barrios, Aguado de Cea and Ramos. 2009b. Enriching a lexicographic tool with domain definitions: Problems and solutions. In G. Sierra, M. Pozzi and J. M. Torres Moreno (eds.), Proceedings of the 1st International Workshop on Definition Extraction. LPN. Borovets, 14-20.
Barrios, M. A. and Bernardos, S. 2007. BaDELE.3000: An implementation of the lexical inheritance principle. In Gerdes et al. (eds.), Proceedings of the Fourth International Conference on Meaning-Text Theory. Montreal: Observatoire de linguistique Sens-Texte (OLST), 97-106.
Barrios and Vilches. 2010. It is possible to enrich ontologies with a specialized domain linguistic resource? Workshop Establishing and using ontologies as a basis for terminological and knowledge engineering, TKE Conference. Dublin, 2010.
Mel’čuk, I. 1996. Lexical functions: A tool for the description of lexical relations in a lexicon. In L. Wanner (ed.), Lexical functions in lexicography and natural language processing. Amsterdam/Philadelphia: John Benjamins, 37-102.
Mel’čuk, I. et al. 1998. Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques II. Montreal: Les Presses de l’Université de Montréal.
Thank you!