CS4025 Practical 3 – hints and answers

CS4025 Practical 10 – hints and answers
Some students found it hard to get motivated for these exercises – maybe it’s the time
of year… In practice, few got as far as Part 2, because many found excuses to leave
early.
For Part 1, it is important to emphasise that for Q1 students only need to write down the
list of fact types – they don’t have to type in the sentences!
If students don’t use ‘clear’ appropriately, many unneeded rules will stay loaded
and the generator will take forever.
It’s important to emphasise that there isn’t necessarily a simple pattern that all these
texts follow. The thing to do is to identify any strong constraints that exist, for
instance:
- There is always an adult:fam fact at the end.
- A text usually starts with an adult:size or adult:appear fact.
- Once a text starts talking about the caterpillar (or pupa, or eggs), it usually
gives all facts about that stage before moving on.
To get general constraints, one may have to look beyond the two analysed texts.
Then one designs a rather specific schema that imposes a rigid ordering satisfying
these constraints. Then one sees if it produces reasonable texts for all examples. It is
better to start off with a rigid schema and then relax it, rather than start off with a very
flexible schema (because in the second case, the software may be overwhelmed by the
number of possibilities).
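Counting orderings shows why a flexible schema can swamp the generator. The sketch below is a rough Python illustration and is not part of the practical's Prolog toolkit. It takes the fact types from the American Snout text in Part 1 and treats the three tendencies listed above as hard constraints, which is an assumption because they only hold "usually". It then compares the number of distinct orderings of fact types with and without those constraints. The function names are hypothetical.

```python
from itertools import permutations

# Fact types of the American Snout text, in the order the original gives them.
SNOUT = [
    "adult:appear", "adult:size", "adult:other", "eggs:loc", "cater:eats",
    "adult:eats", "adult:appear", "adult:other", "adult:fam",
]


def stage(fact_type: str) -> str:
    """Return the part before the colon, e.g. 'cater' for 'cater:eats'."""
    return fact_type.split(":", 1)[0]


def satisfies_constraints(seq: list[str]) -> bool:
    """Check the three observed tendencies, treated here as hard constraints."""
    # The family fact comes last.
    if not seq or seq[-1] != "adult:fam":
        return False
    # The text opens with an appearance or size fact.
    if seq[0] not in ("adult:appear", "adult:size"):
        return False
    # All facts about eggs, pupa or caterpillar form one unbroken run.
    for s in ("eggs", "pupa", "cater"):
        positions = [i for i, t in enumerate(seq) if stage(t) == s]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def count_orderings(facts: list[str]) -> tuple[int, int]:
    """Return (all distinct orderings, orderings meeting the constraints)."""
    distinct = set(permutations(facts))  # repeated types such as adult:appear collapse
    allowed = sum(1 for p in distinct if satisfies_constraints(list(p)))
    return len(distinct), allowed


if __name__ == "__main__":
    print(satisfies_constraints(SNOUT))  # True
    print(count_orderings(SNOUT))        # (90720, 3780)
```

Even these loose constraints cut the space by a factor of 24. A fully rigid schema, by contrast, fixes a single ordering of fact types.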
I think that in this domain most orders of information will probably work OK. But it’s
important that the students think about which orders of sentences are successful, and
why…
Part 1
The ways I would split up the three examples into facts are shown in the three example
Prolog files. For example, here is the one for the American Snout:
adult:appear ---> ['The American Snout (Libytheana carinenta) is a butterfly that has long labial palps (mustache-like scaly mouthparts on either side of the proboscis) that look like a long snout.'].
adult:size ---> ['The butterfly has a 1 3/8 - 2 inch (3.5 - 5 cm) wingspan.'].
adult:other ---> ['The front pair of legs on the male (but not the female) are reduced in size.'].
eggs:loc ---> ['Eggs are laid in groups on the hackberry plant.'].
cater:eats ---> ['The caterpillar eats hackberry (celtis).'].
adult:eats ---> ['The adult sips nectar of the flowers from asters, dogbane, dogwood, goldenrod, sweet pepperbush, and more.'].
adult:appear ---> ['Adult American Snout butterflies look like dead leaves.'].
adult:other ---> ['They sometimes go on long migrations.'].
adult:fam ---> ['They are brush-footed butterflies (Family Libytheidae).'].
Here is a fairly plausible schema. Note that the family always comes last. Eggs, pupa
and caterpillar are generally described in that order.
:- multifile '--->'/2.
distinguished(s).
s ---> intro, males, females, eggs, pupa, cater, concl.
intro ---> adult:appear*, adult:size* .
concl ---> adult:eats*, adult:loc*, adult:other*, adult:fam.
males ---> male:appear* .
females ---> female:appear* .
eggs ---> eggs:loc*, eggs:appear*, eggs:size*, eggs:other* .
pupa ---> pupa:loc*, pupa:appear*, pupa:size*, pupa:other* .
cater ---> cater:loc*, cater:appear*, cater:size*, cater:eats*,
cater:other* .
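One way to read this schema is as a fixed sequence of slots. This reading assumes that `X*` means "zero or more facts of type X" and that an unstarred type means exactly one. The Python sketch below is a hypothetical re-implementation of that reading, not the generator used in the practical. It files each fact into its slot and reads the slots off in schema order, keeping facts of the same type in their original relative order. The American Snout facts are labelled s1 to s9 in their original order. Applied to them, the sketch moves the second adult:appear sentence up into the introduction and both adult:other sentences into the conclusion.

```python
from collections import defaultdict

# Leaf fact types of the schema in order; "*" = zero or more, "1" = exactly one.
# (Assumed reading of the notation, not taken from the toolkit's documentation.)
SCHEMA = [
    ("adult:appear", "*"), ("adult:size", "*"),                      # intro
    ("male:appear", "*"), ("female:appear", "*"),                     # males, females
    ("eggs:loc", "*"), ("eggs:appear", "*"), ("eggs:size", "*"), ("eggs:other", "*"),
    ("pupa:loc", "*"), ("pupa:appear", "*"), ("pupa:size", "*"), ("pupa:other", "*"),
    ("cater:loc", "*"), ("cater:appear", "*"), ("cater:size", "*"),
    ("cater:eats", "*"), ("cater:other", "*"),
    ("adult:eats", "*"), ("adult:loc", "*"), ("adult:other", "*"),   # concl
    ("adult:fam", "1"),
]


def order_by_schema(facts):
    """Order (fact_type, sentence) pairs as the rigid schema dictates.

    Facts of the same type keep their original relative order. Raises
    ValueError for a fact type the schema does not mention, or when an
    exactly-once slot is not filled exactly once.
    """
    by_type = defaultdict(list)
    for fact_type, sentence in facts:
        by_type[fact_type].append(sentence)

    unknown = set(by_type) - {t for t, _ in SCHEMA}
    if unknown:
        raise ValueError(f"fact types not covered by the schema: {sorted(unknown)}")

    ordered = []
    for fact_type, cardinality in SCHEMA:
        group = by_type.get(fact_type, [])
        if cardinality == "1" and len(group) != 1:
            raise ValueError(f"{fact_type} must occur exactly once, found {len(group)}")
        ordered.extend((fact_type, s) for s in group)
    return ordered


if __name__ == "__main__":
    types = ["adult:appear", "adult:size", "adult:other", "eggs:loc", "cater:eats",
             "adult:eats", "adult:appear", "adult:other", "adult:fam"]
    labels = [f"s{i}" for i in range(1, 10)]  # s1 to s9 = original sentence order
    print([s for _, s in order_by_schema(list(zip(types, labels)))])
    # ['s1', 's7', 's2', 's4', 's5', 's6', 's3', 's8', 's9']
```

Because every slot is starred except the family, this schema accepts any subset of fact types. That is roughly what "rigid but relaxable" means here.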
Issues that arise:
- A fact of the form “X is a Y” should probably always come first. Should this
kind of fact be given a different classification?
- I have not dealt with all aspects of linguistic reformulation. For instance, “the
caterpillar” only makes sense if the butterfly has already been introduced.
- In the American Snout example, one sentence gives the Latin name for
hackberry, whereas another sentence also mentions the plant. Clearly, in a real
text the Latin name should be given on the first occurrence. Is this something
that would be decided by later stages in the generation process (referring
expression generation, lexical choice, realisation)? If so, then it probably
doesn’t make sense to think of the Latin name as being part of either
of the facts.
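If the Latin name is treated as a property of the plant rather than of either fact, a later stage can decide where to show it. The following is a minimal Python sketch of that idea. It assumes a hypothetical `{plant:NAME}` placeholder in the fact sentences, and the two hackberry sentences are lightly adapted to use it. It also assumes a hand-made `LATIN` table. The realiser adds the Latin name on the first mention only, wherever that mention ends up after ordering.

```python
import re

# Hypothetical table of Latin names; the entry comes from the American Snout text.
LATIN = {"hackberry": "celtis"}

# Hypothetical placeholder marking a plant mention inside a fact's sentence.
PLANT = re.compile(r"\{plant:(\w+)\}")


def realise_mentions(sentences: list[str]) -> list[str]:
    """Fill plant placeholders, adding the Latin name on the first mention only."""
    seen: set[str] = set()

    def render(match: re.Match) -> str:
        name = match.group(1)
        if name in LATIN and name not in seen:
            seen.add(name)
            return f"{name} ({LATIN[name]})"
        return name

    # Sentences are processed in text order, so "first" means first in the output.
    return [PLANT.sub(render, s) for s in sentences]


if __name__ == "__main__":
    text = ["Eggs are laid in groups on {plant:hackberry}.",
            "The caterpillar eats {plant:hackberry}."]
    for line in realise_mentions(text):
        print(line)
    # Eggs are laid in groups on hackberry (celtis).
    # The caterpillar eats hackberry.
```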
Part 2
I would expect something like the following:
1. Accurate set = all hypernyms of “speller” = {speller, primer, textbook, book,
publication, work, product, …}
Conveys set = all hyponyms of “publication” = {publication, book, textbook,
primer, speller}
Intersection = {publication, book, …, speller}
Probably chooses “book”, as the basic-level term
2. Accurate set = all hypernyms of “stock car” = {stock car, racing car, car,
motor vehicle, …, vehicle, …}
Conveys set = all hyponyms of “racing car” = {racing car, stock car}
Intersection = {racing car, stock car}
Probably picks “racing car”, as it is closer to the basic level
In each case, one could arguably consider synonyms at each level. My basic strategy
was to find a word with roughly the right meaning (one that accurately describes the entity or
satisfies the communicative goal) and navigate up and down the hierarchy from there to find more and less
specific words that would belong to the two sets.
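The set-intersection procedure in the two answers can be sketched in Python. The sketch makes three assumptions:

- `PARENT` is a small hand-coded taxonomy (child to parent) that follows the chains above, not the full WordNet hierarchy. Intermediate levels, such as those between “motor vehicle” and “vehicle”, are left out.
- `BASIC` is a hypothetical set of basic-level terms.
- Preferring the candidate closest in the taxonomy to a basic-level term is my simplification of "closer to basic level".

```python
# Toy taxonomy, child -> parent, following the chains in the answers above.
# Hypothetical: intermediate WordNet levels are left out (e.g. above "motor vehicle").
PARENT = {
    "speller": "primer", "primer": "textbook", "textbook": "book",
    "book": "publication", "publication": "work", "work": "product",
    "stock car": "racing car", "racing car": "car", "car": "motor vehicle",
    "motor vehicle": "vehicle",
}
# Hypothetical set of basic-level terms.
BASIC = {"book", "car"}


def chain(word: str) -> list[str]:
    """The word followed by its ancestors, most specific first."""
    result = [word]
    while result[-1] in PARENT:
        result.append(PARENT[result[-1]])
    return result


def hypernyms(word: str) -> set[str]:
    """The word and every more general word above it."""
    return set(chain(word))


def hyponyms(word: str) -> set[str]:
    """The word and every more specific word below it."""
    vocabulary = set(PARENT) | set(PARENT.values())
    return {w for w in vocabulary if word in hypernyms(w)}


def distance_to_basic(word: str) -> float:
    """Taxonomic distance to the nearest basic-level term on the same chain."""
    related = [b for b in BASIC if b in hypernyms(word) or word in hypernyms(b)]
    if not related:
        return float("inf")
    return min(abs(len(chain(word)) - len(chain(b))) for b in related)


def choose_word(entity: str, goal: str) -> str:
    accurate = hypernyms(entity)   # words that truthfully describe the entity
    conveys = hyponyms(goal)       # words that satisfy the communicative goal
    candidates = accurate & conveys
    if not candidates:
        raise ValueError("no word is both accurate and conveys the goal")
    # Sort first so that ties are broken alphabetically rather than arbitrarily.
    return min(sorted(candidates), key=distance_to_basic)


if __name__ == "__main__":
    print(choose_word("speller", "publication"))  # book
    print(choose_word("stock car", "racing car"))  # racing car
```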