CS4025 Practical 10 – hints and answers

Some students found it hard to get motivated for these exercises – maybe it’s the time of year… In practice, few got on to Part 2, because many found excuses to leave early.

For Part 1, it is important to emphasise that for Q1 they only need to write down the list of fact types; they don’t have to type in the sentences! If students don’t use ‘clear’ appropriately, they will have many excess rules loaded and the generator will take forever.

It’s important to emphasise that there isn’t necessarily a simple pattern that all these texts follow. The thing to do is to identify any strong constraints that exist, for instance:

- There is always an adult:family fact at the end.
- A text usually starts with an adult:size or adult:appear fact.
- Once a text starts talking about the caterpillar (or pupa, or eggs), it usually gives all facts about that before moving on.

To get general constraints, one may have to look beyond the two analysed texts. Then one designs a rather specific schema that imposes a rigid ordering satisfying these constraints, and checks whether it produces reasonable texts for all examples. It is better to start off with a rigid schema and then relax it than to start off with a very flexible schema (because in the second case, the software may be overwhelmed by the number of possibilities).

I think that in this domain probably most orders of information will work OK. But it’s important that the students think about what orders of sentences are successful and why…

Part 1

The ways I would split up the three examples into facts are shown in the three example Prolog files. E.g. here is the one for American Snout:

adult:appear ---> ['The American Snout (Libytheana carinenta) is a butterfly that has long labial palps (mustache-like scaly mouthparts on either side of the proboscis) that look like a long snout.'].
adult:size ---> ['The butterfly has a 1 3/8 - 2 inch (3.5 - 5 cm) wingspan.'].
adult:other ---> ['The front pair of legs on the male (but not the female) are reduced in size.'].
eggs:loc ---> ['Eggs are laid in groups on the hackberry plant.'].
cater:eats ---> ['The caterpillar eats hackberry (celtis).'].
adult:eats ---> ['The adult sips nectar of the flowers from asters, dogbane, dogwood, goldenrod, sweet pepperbush, and more.'].
adult:appear ---> ['Adult American Snout butterflies look like dead leaves.'].
adult:other ---> ['They sometimes go on long migrations.'].
adult:fam ---> ['They are brush-footed butterflies (Family Libytheidae).'].

Here is a fairly plausible schema. Note that the family always comes last. Eggs, pupa and caterpillar are generally described in that order.

:- multifile '--->'/2.
distinguished(s).
s ---> intro, males, females, eggs, pupa, cater, concl.
intro ---> adult:appear*, adult:size* .
concl ---> adult:eats*, adult:loc*, adult:other*, adult:fam.
males ---> male:appear* .
females ---> female:appear* .
eggs ---> eggs:loc*, eggs:appear*, eggs:size*, eggs:other* .
pupa ---> pupa:loc*, pupa:appear*, pupa:size*, pupa:other* .
cater ---> cater:loc*, cater:appear*, cater:size*, cater:eats*, cater:other* .

Issues that arise:

- A fact of the form “X is a Y” should probably always come first. Should this kind of fact be given a different classification?
- I have not dealt with all aspects of linguistic reformulation. For instance, “the caterpillar” would only make sense if the butterfly has already been introduced.
- In the American Snout example, one sentence gives the Latin name for hackberry, whereas another sentence also mentions the plant. Clearly in a real text, the Latin name should be given on the first occurrence. Is this something that would be decided by later stages in the generation process (referring expression generation/lexical choice/realisation)? If so, then it probably doesn’t make sense to think of the Latin name information being part of either of the facts.
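To see what ordering the schema above imposes on a set of facts, it can help to try it outside the generator. The following is a minimal Python sketch, not the practical's Prolog software: it assumes that `*` means "zero or more facts of this type" and that an unstarred type (here only adult:fam) must occur exactly once. The names `SCHEMA_ORDER`, `REQUIRED_ONCE` and `order_facts` are made up for this illustration, and the sketch returns just one ordering (facts of the same type keep their input order) rather than exploring alternatives.

```python
# Fact types in the order the schema visits them:
# s ---> intro, males, females, eggs, pupa, cater, concl.
SCHEMA_ORDER = [
    "adult:appear", "adult:size",                                        # intro
    "male:appear",                                                       # males
    "female:appear",                                                     # females
    "eggs:loc", "eggs:appear", "eggs:size", "eggs:other",                # eggs
    "pupa:loc", "pupa:appear", "pupa:size", "pupa:other",                # pupa
    "cater:loc", "cater:appear", "cater:size", "cater:eats", "cater:other",  # cater
    "adult:eats", "adult:loc", "adult:other", "adult:fam",               # concl
]

# Assumption: a type written without '*' in the schema must occur exactly once.
REQUIRED_ONCE = {"adult:fam"}


def order_facts(facts):
    """Order (fact_type, sentence) pairs as the schema would visit them.

    Raises ValueError if a fact type is not covered by the schema or if a
    required type does not occur exactly once.
    """
    rank = {fact_type: i for i, fact_type in enumerate(SCHEMA_ORDER)}
    for fact_type, _ in facts:
        if fact_type not in rank:
            raise ValueError(f"fact type not covered by schema: {fact_type}")
    for fact_type in REQUIRED_ONCE:
        count = sum(1 for t, _ in facts if t == fact_type)
        if count != 1:
            raise ValueError(f"expected exactly one {fact_type}, found {count}")
    # sorted() is stable, so facts of the same type keep their input order.
    ordered = sorted(facts, key=lambda fact: rank[fact[0]])
    return [sentence for _, sentence in ordered]


if __name__ == "__main__":
    # Abbreviated stand-ins for some of the American Snout sentences.
    snout = [
        ("adult:size", "The butterfly has a 3.5 - 5 cm wingspan."),
        ("adult:fam", "They are brush-footed butterflies."),
        ("cater:eats", "The caterpillar eats hackberry."),
        ("eggs:loc", "Eggs are laid in groups on the hackberry plant."),
    ]
    for sentence in order_facts(snout):
        print(sentence)
```

Running a few of the example files through something like this is a quick way to check whether a rigid ordering reads acceptably before relaxing it.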
Part 2

I would expect something like the following:

1. Accurate set = all hypernyms of “speller” = {speller, primer, textbook, book, publication, work, product, …}
   Conveys set = all hyponyms of “publication” = {publication, book, textbook, primer, speller}
   Intersection = {publication, book, …, speller}
   Probably chooses “book” as being basic level.

2. Accurate set = all hypernyms of “stock car” = {stock car, racing car, car, motor vehicle, …, vehicle, …}
   Conveys set = all hyponyms of “racing car” = {racing car, stock car}
   Intersection = {racing car, stock car}
   Probably picks “racing car” as being closer to basic level.

In each case, one could arguably consider synonyms at each level. My basic strategy was to find a word with roughly the right meaning (to accurately describe the entity or satisfy the communicative goal) and navigate around there to find more and less specific words that would belong to the two sets.
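The set reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypernym chains are hand-copied from the sets listed above (levels elided there are simply left out), the conveys set is restricted to words on that one chain, and the basic-level words ("book", and "car" for the second example) are assumptions rather than something looked up. The names `CHAINS`, `BASIC_LEVEL` and `choose_word` are hypothetical; a real solution would navigate a lexical database instead.

```python
# Toy hypernym chains, most specific word first. Hand-copied from the sets
# in the notes; levels the notes elide are omitted, not guessed.
CHAINS = {
    "speller": ["speller", "primer", "textbook", "book",
                "publication", "work", "product"],
    "stock car": ["stock car", "racing car", "car", "motor vehicle", "vehicle"],
}

# Assumption: which word on each chain counts as basic level.
BASIC_LEVEL = {"speller": "book", "stock car": "car"}


def accurate_set(word):
    """All hypernyms of the word (including itself): any of them is accurate."""
    return set(CHAINS[word])


def conveys_set(word, goal):
    """Hyponyms of the goal word (including itself), restricted to the chain."""
    chain = CHAINS[word]
    return set(chain[: chain.index(goal) + 1])


def choose_word(word, goal):
    """Pick the candidate closest to basic level from the intersection."""
    chain = CHAINS[word]
    candidates = accurate_set(word) & conveys_set(word, goal)
    basic = chain.index(BASIC_LEVEL[word])
    # Walk the chain in order so ties go to the more specific word.
    in_order = [w for w in chain if w in candidates]
    return min(in_order, key=lambda w: abs(chain.index(w) - basic))


if __name__ == "__main__":
    print(choose_word("speller", "publication"))
    print(choose_word("stock car", "racing car"))
```

Preferring the candidate nearest the basic-level word is one possible selection rule; considering synonyms at each level, as suggested above, would need a richer data structure than a single chain.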