conceptual coherence in the generation of referring expressions

Albert Gatt & Kees van Deemter
University of Aberdeen
{agatt, kvdeemte}@csd.abdn.ac.uk

Gatt and van Deemter (2007): “Lexical Choice and conceptual perspective in the generation of plural referring expressions”. Journal of Logic, Language and Information (JoLLI) 16(4), pp. 423-444.
some received wisdom…
Choice is ultimately dependent on the perspective you decide to take on the referent (...). Will it be more effective for me to refer to my sister as ‘my sister’ or as ‘that lady’ or as ‘the physicist’? (Levelt ’99, p. 226)
the rest of this talk…
1. Generation of Referring Expressions
2. Perspective and Conceptual Coherence
– reference to sets
– experimental work
3. An algorithm
– evaluation
4. Extensions:
– local (Conceptual) Coherence in discourse
Generation of Referring Expressions (GRE)

Part of micro-planning (Reiter & Dale ’00)

At this stage, the content of a message is being determined,
including descriptions of domain objects (Noun Phrases)

The task of GRE:
– given a set of intended referents, look up properties of these referents in a Knowledge Base that distinguish them from their distractors
Content determination strategies

entity | base type | occupation | specialisation | girth
e1 | woman | professor | physicist | plump
e2 | woman | lecturer | geologist | thin
e3 | man | lecturer | biologist | thin
e4 | man | postgraduate | | thin
Most algorithms are inspired by the Gricean maxims (Grice ’75)
– especially Brevity (Dale ’89, Gardent ’02)
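To make the brevity-driven strategy concrete, here is a minimal sketch of Full Brevity in the spirit of Dale ’89. It tries conjunctions of increasing size and returns the first one whose extension in the toy KB above is exactly the target. The dictionary encoding and the names `KB`, `extension` and `full_brevity` are hypothetical. The sketch handles only single referents, and because the table gives e4 no specialisation, none is listed for it.

```python
from itertools import combinations
from typing import Optional

# Toy knowledge base from the table above: entity -> set of true properties.
# e4 has no specialisation in the table, so none is listed (hypothetical encoding).
KB: dict[str, set[str]] = {
    "e1": {"woman", "professor", "physicist", "plump"},
    "e2": {"woman", "lecturer", "geologist", "thin"},
    "e3": {"man", "lecturer", "biologist", "thin"},
    "e4": {"man", "postgraduate", "thin"},
}


def extension(props: set[str]) -> set[str]:
    """Entities of which every property in `props` is true (a conjunction)."""
    return {e for e, true_props in KB.items() if props <= true_props}


def full_brevity(target: str) -> Optional[tuple[str, ...]]:
    """Return a shortest conjunction of properties that singles out `target`."""
    candidates = sorted(KB[target])  # only properties true of the target
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if extension(set(combo)) == {target}:
                return combo
    return None  # target cannot be distinguished in this KB


print(full_brevity("e4"))  # ('postgraduate',)
```

Exhaustive search over subsets is exponential in the number of properties. Dale’s later incremental algorithms avoid that cost by giving up guaranteed minimality.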

But compare:

?? λx: professor(x) ∨ plump(x)
?? λx: professor(x) ∨ [plump(x) & man(x)]
✓ λx: biologist(x) ∨ physicist(x)
Not all of these have an equally good ring to them.
the Conceptual Coherence constraint

Sets (and disjunction): λx: p(x) ∨ q(x) ‘the p and the q’
– reference to a plurality suggests to the listener that there is a relationship holding between the elements of the plurality
– p and q should therefore be related or “similar”
– semantic relatedness allows the listener to conceptualise the plurality more easily (Sanford and Moxey, ’95)
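Here is a minimal sketch of the disjunctive semantics, assuming the toy KB from the table earlier. The extension of λx: p(x) ∨ q(x) is the union of the extensions of p and q, and it can be realised as the coordinated plural ‘the p and the q’. The function names and the naive realiser are hypothetical. Note that nothing in this code checks whether p and q are related, which is exactly what the coherence constraint adds.

```python
# Toy knowledge base from the table: entity -> set of true properties (hypothetical encoding).
KB: dict[str, set[str]] = {
    "e1": {"woman", "professor", "physicist", "plump"},
    "e2": {"woman", "lecturer", "geologist", "thin"},
    "e3": {"man", "lecturer", "biologist", "thin"},
    "e4": {"man", "postgraduate", "thin"},
}


def disjunction_extension(props: list[str]) -> set[str]:
    """Extension of λx: p1(x) ∨ ... ∨ pn(x): entities with at least one of the properties."""
    return {e for e, true_props in KB.items() if true_props & set(props)}


def realise_plural(props: list[str]) -> str:
    """Naive (hypothetical) realisation as a coordinated plural: 'the p and the q'."""
    nps = [f"the {p}" for p in props]
    if len(nps) <= 1:
        return "".join(nps)
    return ", ".join(nps[:-1]) + " and " + nps[-1]


print(sorted(disjunction_extension(["biologist", "physicist"])))  # ['e1', 'e3']
print(realise_plural(["biologist", "physicist"]))  # the biologist and the physicist
```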
Gatt and van Deemter (’02):
– people’s preferences for descriptions of this form were highly correlated with the semantic similarity of the disjuncts
– the best results were achieved with a distributional definition of similarity (Lin ’98)
– sim(w, w’) is a function of how often w and w’ occur in the same grammatical relations in a corpus
Lin’s definition of distributional similarity

Let w1, w2 be two words of the same grammatical category (e.g. dog, cat).
GR contains information about a syntactic relation that w occurs in:
– GR = <w, R, x, p>
– w is the target word, R the relation, x the co-argument of w
– p measures how strongly w and x are associated in this construction (as mutual information)
– Example: <dog, modified-by, stray, 0.002>
sim(w1, w2) compares the weights of the <R, x> pairs that w1 and w2 share with the weights of all their <R, x> pairs.
We use SketchEngine, a large-scale implementation of this theory, based on the BNC (Kilgarriff, ’03)
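As a rough illustration, Lin’s measure divides the information carried by the features the two words share by the total information in both words’ features. The sketch assumes each word is a dictionary mapping (R, x) features to a mutual-information weight, as in the GR tuples above. The feature values for dog and cat are invented and are not taken from the BNC or SketchEngine.

```python
# A feature is a (relation, co-argument) pair; its weight is the mutual
# information of the word and the co-argument in that relation.
Features = dict[tuple[str, str], float]


def lin_similarity(f1: Features, f2: Features) -> float:
    """Lin-style similarity: shared feature weight over total feature weight."""
    # Lin (1998) only counts features with positive mutual information.
    f1 = {k: v for k, v in f1.items() if v > 0}
    f2 = {k: v for k, v in f2.items() if v > 0}
    shared = f1.keys() & f2.keys()
    numerator = sum(f1[k] + f2[k] for k in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0


# Invented toy weights, for illustration only.
dog = {("modified-by", "stray"): 2.0, ("object-of", "feed"): 1.5, ("subject-of", "bark"): 3.0}
cat = {("modified-by", "stray"): 1.8, ("object-of", "feed"): 1.2, ("subject-of", "purr"): 2.5}

print(round(lin_similarity(dog, cat), 3))  # 0.542
```

A word compared with itself scores 1.0. Words with no positively weighted features in common score 0.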
experiment 1: multimodal sentence completion

General idea:
– To refer to a set, people will prefer to use a plural that
respects the conceptual coherence constraint
– If this is impossible, then they will break down the set into manageable parts.
Experimental domains:
– 3 targets (a,b,c) + 1 distractor (d)
– sim(a,b) could be high or low
– sim(a,c) ≈ sim(b,c) = low
Expectation:
– if two of the targets have semantically similar types, they will be referred to in a single plural description
experiment 1: example domain
Experimental domain (pictures): a (£5), b (£5), c (£5), d (£20)
Complete the following by clicking on the pictures:
The _____________ and the _____________ cost £5.
The _____________ also costs £5.
1. Participants completed the sentences by clicking on the pictures.
2. Manipulation of the similarity of two of the objects (a,b).
3. Hypothesis: if {a,b} are similar, they are more likely to be referred to in the plural.
experiment 1: results
Proportion of plural references to the designated targets {a,b} when:
– {a,b} are semantically similar
– {a,b} are semantically dissimilar
experiment 2: sentence continuation

Does similarity play a role in content determination?
A university building was robbed last night. The police have detained
three suspects for questioning, all of whom work or study at the
university.
1. One of them is a postgraduate. He is a physicist.
2. Another is a Greek, an undergraduate.
3. Also among the suspects is a cleaner. He is an Italian.
Both ______________________ were held in custody, but the
physicist was released last night.


Distinguishing properties: nouns (12) or adjectives (12).
Expectation:
– participants will select similar properties in the plural description
experiment 2: results
Proportion of references using pairwise similar properties:
– Nouns: Friedman 45.89, p < .001; trend as expected
– Adjectives: Friedman 36.3, p < .001; trend in the opposite direction
summary of findings so far

In referential situations, people prefer to produce plural
descriptions if the entities can be conceptualised under the
same perspective.

This holds for types, but not for modifiers
– Types correspond to “concepts”, and are the way we carve
up the world and categorise objects
– Modifiers correspond to properties of those objects.

Results have been corroborated in other experiments


Aloni (2002): answers to questions “wh x?” must conceptualise the different x using one and the same perspective (relevant given the hearer’s information state and the context)
Our experiments confirm that this idea is on the right track …
The challenge for an algorithm:




Complete coherence is often not possible:
– “the Italian, the Greek and the Spaniard”: but what if there are 5 Spaniards?
– “the Italian, the Greek and ?”: what if you don’t know the person’s nationality?
– “the table, the chair and the plant”: what if you need to refer to an object that is of a different kind from the other two?
a GRE algorithm

The algorithm should try to find the most coherent description
possible. Assumption: this should be done even at the cost of
brevity!

Main knowledge source:
– The relation sim (Kilgarriff ’03)

Input:
– Knowledge Base
– Target referents (R)
step 1
1. Lexicalise properties in the KB
2. Identify types (nominal properties) and modifiers
The set of types and the similarity relation define a
semantic space S = <T, sim>
Definition 1: Perspective
A perspective P is a convex subset of S, i.e.:
$\forall t, t', t'' \in T:\; \big(t, t' \in P \wedge \mathit{sim}(t, t'') \geq \mathit{sim}(t, t')\big) \Rightarrow t'' \in P$
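Definition 1 can be checked mechanically. The sketch below assumes that sim is symmetric and that each type is more similar to itself than to any other type, so the case t = t′ can be skipped. The type names come from the running example, but the similarity values are invented.

```python
# Invented symmetric similarity values between types (illustration only).
SIM: dict[frozenset[str], float] = {
    frozenset({"biologist", "physicist"}): 0.60,
    frozenset({"physicist", "geologist"}): 0.55,
    frozenset({"biologist", "geologist"}): 0.50,
    frozenset({"biologist", "lecturer"}): 0.20,
    frozenset({"geologist", "lecturer"}): 0.15,
    frozenset({"physicist", "lecturer"}): 0.10,
}
T = {"biologist", "physicist", "geologist", "lecturer"}


def sim(a: str, b: str) -> float:
    return SIM[frozenset({a, b})]


def is_convex(P: set[str], types: set[str]) -> bool:
    """Definition 1: for t, t' in P, any t'' at least as similar to t as t' is must be in P."""
    for t in P:
        for t1 in P - {t}:  # t' (assumes self-similarity is maximal, so t' = t is skipped)
            for t2 in types - {t}:  # candidate t''
                if sim(t, t2) >= sim(t, t1) and t2 not in P:
                    return False
    return True


print(is_convex({"biologist", "physicist", "geologist"}, T))  # True
print(is_convex({"biologist", "geologist"}, T))  # False: physicist is closer to biologist
```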

Perspectives are computed using a clustering algorithm (Gatt ’06), which recursively groups together semantic nearest neighbours.
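The slides do not spell out the Gatt ’06 clustering algorithm, so what follows is only a stand-in. It uses plain single-link agglomerative clustering, which repeatedly merges the two clusters containing the most similar pair of types and stops when no pair reaches a threshold. The threshold and similarity values are invented. Unlike the original algorithm, this stand-in does not guarantee that its clusters satisfy the convexity condition of Definition 1.

```python
from itertools import combinations
from typing import Optional

# Invented symmetric similarity values between types (illustration only).
SIM: dict[frozenset[str], float] = {
    frozenset({"biologist", "physicist"}): 0.60,
    frozenset({"physicist", "geologist"}): 0.55,
    frozenset({"biologist", "geologist"}): 0.50,
    frozenset({"biologist", "lecturer"}): 0.20,
    frozenset({"geologist", "lecturer"}): 0.15,
    frozenset({"physicist", "lecturer"}): 0.10,
}


def sim(a: str, b: str) -> float:
    return SIM[frozenset({a, b})]


def single_link_clusters(types: set[str], threshold: float) -> list[set[str]]:
    """Stand-in clustering: merge nearest clusters until no pair reaches `threshold`."""
    clusters = [{t} for t in sorted(types)]
    while True:
        best: Optional[tuple[float, int, int]] = None  # (similarity, i, j)
        for i, j in combinations(range(len(clusters)), 2):
            # single link: similarity of the closest pair across the two clusters
            s = max(sim(a, b) for a in clusters[i] for b in clusters[j])
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected


clusters = single_link_clusters({"biologist", "physicist", "geologist", "lecturer"}, threshold=0.4)
print([sorted(c) for c in clusters])  # [['biologist', 'geologist', 'physicist'], ['lecturer']]
```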
perspective graph
[Figure: perspective graph with three nodes, T: {lecturer, professor, postgraduate}; T: {woman, man} with M: {plump, thin}; T: {geologist, physicist, biologist, chemist}; connected by weighted edges]
Aim: find a description for R that minimises the distance
between perspectives from which properties are selected.
Weight of a description, w(D): the sum of distances between
perspectives represented in D.
– w(‘the professor and the plump man’) = 1
– w(‘the biologist and the physicist’) = 0
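Here is a minimal sketch of w(D). It assumes that “the sum of distances between perspectives” means the sum over all pairs of distinct perspectives represented in D, and that distances are direct edge weights. The perspective names (`occupation`, `person`, `specialisation`) are hypothetical, and so are the distances. The only exception is the occupation-person distance of 1, which is chosen so that the first example above comes out as 1.

```python
from itertools import combinations

# Hypothetical mapping from properties to the perspective (graph node) containing them,
# following the three nodes of the perspective graph.
PERSPECTIVE_OF: dict[str, str] = {
    "lecturer": "occupation", "professor": "occupation", "postgraduate": "occupation",
    "woman": "person", "man": "person", "plump": "person", "thin": "person",
    "geologist": "specialisation", "physicist": "specialisation",
    "biologist": "specialisation", "chemist": "specialisation",
}

# Hypothetical distances between perspectives (occupation-person = 1 matches the example).
DIST: dict[frozenset[str], float] = {
    frozenset({"occupation", "person"}): 1.0,
    frozenset({"occupation", "specialisation"}): 2.0,
    frozenset({"person", "specialisation"}): 3.0,
}


def weight(description: list[str]) -> float:
    """w(D): sum of distances over all pairs of distinct perspectives used in D."""
    used = {PERSPECTIVE_OF[prop] for prop in description}
    return sum(DIST[frozenset(pair)] for pair in combinations(sorted(used), 2))


print(weight(["professor", "plump", "man"]))  # 1.0
print(weight(["biologist", "physicist"]))     # 0
```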
descriptive coherence
Definition 2: Maximal coherence
D is maximally coherent if there is no D’ coextensive with D
such that w(D’) < w(D)
– this implies finding a shortest connection network in the perspective graph (intractable!)
Definition 3: Local coherence
D is locally coherent if there is no D’ coextensive with D s.t.:
1. D’ is obtained by replacing a perspective in D
2. w(D’) < w(D)
search procedure



N ← ∅ // the perspectives represented in D
root ← the perspective with the most referents in its extension
starting from root, do:
– check its types and modifiers
– if a property excludes distractors:
   – add it to D
   – add the perspective to N
– if R is not yet distinguished, go to the next perspective, which is $\arg\min_{P \in V} \sum_{u \in N} w(u, P)$ (V is the set of perspectives).
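Below is a minimal, runnable reading of the search procedure. It rests on several assumptions that the slides do not state:
- the selected properties are combined disjunctively, as in ‘the p and the q’;
- a property is added only if it excludes all distractors and covers at least one referent not yet covered;
- ties for the root are broken in favour of the perspective with fewer distractors in its extension;
- the perspective names and distances are hypothetical.

Being greedy, the procedure is not guaranteed to find a maximally coherent description.

```python
from typing import Optional

# Toy KB from the table (hypothetical encoding).
KB: dict[str, set[str]] = {
    "e1": {"woman", "professor", "physicist", "plump"},
    "e2": {"woman", "lecturer", "geologist", "thin"},
    "e3": {"man", "lecturer", "biologist", "thin"},
    "e4": {"man", "postgraduate", "thin"},
}
# Hypothetical perspectives (nodes of the perspective graph) and distances.
PERSPECTIVES: dict[str, set[str]] = {
    "occupation": {"lecturer", "professor", "postgraduate"},
    "person": {"woman", "man", "plump", "thin"},
    "specialisation": {"geologist", "physicist", "biologist", "chemist"},
}
DIST: dict[frozenset[str], float] = {
    frozenset({"occupation", "person"}): 1.0,
    frozenset({"occupation", "specialisation"}): 2.0,
    frozenset({"person", "specialisation"}): 3.0,
}


def dist(u: str, v: str) -> float:
    return 0.0 if u == v else DIST[frozenset({u, v})]


def ext(prop: str) -> set[str]:
    return {e for e, props in KB.items() if prop in props}


def perspective_ext(name: str) -> set[str]:
    return set().union(*(ext(p) for p in PERSPECTIVES[name]))


def search(R: set[str]) -> Optional[list[str]]:
    distractors = set(KB) - R
    D: list[str] = []      # selected properties, read disjunctively
    N: set[str] = set()    # perspectives represented in D
    visited: set[str] = set()
    covered: set[str] = set()
    # Root: most referents in its extension; ties -> fewest distractors (assumption).
    current = max(PERSPECTIVES, key=lambda P: (len(R & perspective_ext(P)),
                                                -len(distractors & perspective_ext(P))))
    while True:
        visited.add(current)
        for prop in sorted(PERSPECTIVES[current]):  # check types and modifiers
            e = ext(prop)
            if e & distractors or not e - covered:
                continue  # includes a distractor, or adds no new referent
            D.append(prop)
            N.add(current)
            covered |= e
        if covered == R:
            return D
        remaining = [P for P in PERSPECTIVES if P not in visited]
        if not remaining:
            return None  # R cannot be distinguished this way
        # Next perspective: the unvisited P minimising sum_{u in N} w(u, P).
        current = min(remaining, key=lambda P: sum(dist(u, P) for u in N))


print(search({"e1", "e3"}))  # ['biologist', 'physicist']
print(search({"e2", "e4"}))  # ['postgraduate', 'geologist']
```

For {e2, e4}, the search starts at the occupation perspective. It then visits the person perspective, which is closer but contributes nothing, before it finds ‘geologist’ in the specialisation perspective.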
evaluation

Do people prefer coherence over brevity?
– (Two Gricean maxims: “Be brief” vs. “Be orderly”)
Method: subjects (N = 39) were shown 6 discourses.
– Each discourse introduces 3 entities
– and is followed by 2 possible continuations
– Subjects had to indicate their preferred continuation
Each of the 6 discourses represented a condition:
– Brevity: descriptions equally (in)coherent, but only one is brief
– Coherence: descriptions equally (non-)brief; only one is coherent
– Trade-off: the coherent description is non-brief
Example: the domain
Three old manuscripts were auctioned at Sotheby’s:
e1: One of them is a book, a biography of a composer
e2: The second, a sailor’s journal, was published in the
form of a pamphlet. It is a record of a voyage.
e3: The third, another pamphlet, is an essay by Hume

Intuitively, this is about texts
– of different genres (e.g., essay)
– published in different forms (e.g., pamphlet)
Of course, our corpus-based model doesn’t use these concepts …
Example: the continuations
(+c,-b) The biography, the journal and the
essay were sold to a collector
(+c,+b) The book and the pamphlets were sold
to a collector
(-c,+b) The biography and the pamphlets were
sold to a collector
(-c,-b) The book, the record and the essay
were sold to a collector
results: no preference for brevity
– both descriptions coherent: χ² = .023, p = .8
– both descriptions non-coherent: χ² = .64, p = .4
results: preference for coherence
– both descriptions minimal: χ² = 16.03, p < .001
– both descriptions non-minimal: χ² = 13.56, p < .001
results: trade-off

Finally, (+c,-b) was preferred over (-c,+b): χ² = 39.0, p < .001
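The slides do not say how the χ² values were computed. One possibility is a one-degree-of-freedom goodness-of-fit test of the 39 subjects’ binary choices against an even split, without continuity correction. Under that assumption, only a unanimous 39/0 split yields exactly 39.0. This is an arithmetic reading of the reported value, not a reported count. The function name is hypothetical.

```python
def chi_square_binary(a: int, b: int) -> float:
    """Goodness-of-fit chi-square (1 df) of an a/b choice split against a 50/50 split."""
    expected = (a + b) / 2
    return ((a - expected) ** 2 + (b - expected) ** 2) / expected


print(chi_square_binary(39, 0))  # 39.0
```

With SciPy, `scipy.stats.chisquare([39, 0])` computes the same statistic, because it assumes uniform expected frequencies by default.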
In other words:
– Coherence was more important than brevity
– In fact, brevity made no difference at all!
   – we did not confirm that +b is preferred over -b
Conclusion

When it’s impossible to use the same
perspective, use perspectives that are similar

A version of Grice’s maxim “be orderly”?
Methodology

Many experiments were done
– to find a suitable notion of similarity/coherence
– to discover how coherence and brevity relate
Different algorithmic interpretations would be possible
Algorithms are almost always under-determined by the empirical evidence
A limitation


Ambiguity/polysemy is not taken into account
For example, we might generate:
– “the river and the/its bank”
These issues are investigated in Imtiaz Khan’s PhD project
One remark: “river” might disambiguate “bank”
An open question

Why doesn’t coherence play the same role
for modifiers as for types?