UNIVERSITY OF ABERDEEN
SESSION 2008-2009
Examination in CS4025 (Natural Language Processing)
January 2009 (9am-11am)

Answer question 1 and ONE other question. Each question is worth 25 marks and the marks for each part of a question are shown in brackets. The two questions you attempt count equally.

1. This question is compulsory. Answers to the parts of this question should be brief and succinct.

(a) Briefly define what is meant by the semantics of a natural language utterance, and how this differs from the pragmatics. (2.5)

(b) Write down one path that could be taken through the following Hidden Markov model that produces the output “C1 C2 C3 C4 C5” and the probability of this path being taken.

[Diagram: a three-state HMM with transition probabilities S1 -> S1 = 0.8, S1 -> S2 = 0.2, S2 -> S1 = 0.3, S2 -> S3 = 0.7, and the following output probabilities.]

State S1: Output      C1   C2   C3
          Probability 0.5  0.3  0.2

State S2: Output      C2   C3   C4
          Probability 0.8  0.1  0.1

State S3: Output      C4   C5
          Probability 0.5  0.5

You don’t have to calculate the actual answer as a number, as long as you show the formula that would be used to calculate it. (3.75)

(c) Write down an arithmetic expression for the contents of the cell marked “???” in the following partially complete table using the Viterbi algorithm to find the optimal sequence of part-of-speech tags for the sentence “Often handles march on”. You can assume that:

P(N|N) = 0.5
P(N|V) = 0.4
P(march|N) = 0.1
P(N|march) = 0.6

<start>    ……….
Often      ……….    ……….    ……….
handles    N: 0.3   V: 0.2
march      N: ???
on
<end>

You don’t have to calculate the value of the expression, but you should indicate how you have derived it. (3.75)

(d) Give an example of a sentence where knowledge of the syntactic structure is needed (rather than, say, just knowledge of the words in the sentence) in order to determine what is meant. Explain briefly how syntactic information helps in this case. (2.5)

(e) Write down two desirable properties for a formal language to be used for representing natural language meaning.
(2.5)

(f) What is the name of the mechanism used to implement the stages of the FASTUS information extraction engine (also used in the ANNIE system)? (1.25)

(g) Write down two ways in which humans can help a machine translation system produce better quality output. (2.5)

(h) What are the two main inputs assumed by Reiter’s model of lexical choice? Choose one of these and give an example where considering only this input could lead to the generation of an inappropriate word. (2.5)

(i) Give an example of a simple measurable feature that you could provide to a machine learning system learning a classifier for disambiguating the word “bat” (indicating either an animal or a tool used in cricket) in context. (1.25)

(j) Write down two ways in which a speech dialogue system can reduce the number of errors made because of incorrect speech recognition. (2.5)

Only answer ONE of the following questions.

2. One of the skills involved in engineering NLP systems is to find solutions that solve practical problems adequately whilst involving minimal complexity in the language models used.

(a) Give an example of an application type where it is possible to build systems with different types of language models (e.g. involving simple information about individual words, or involving complex information about how sentences convey meaning). Justify that both types of models could be appropriate. Indicate what criteria you might use to choose between these different possibilities for a given situation. (5.0)

(b) Give an example of an NLP system or approach that only knows about individual words (not about their context). Explain what knowledge of words it needs and give an overview of how it works. Give examples to illustrate your explanation. (7.5)

(c) Give an example of an NLP system or approach that models the structure of sentences and what they mean. Explain what knowledge of language it needs and give an overview of how it works. Give examples to illustrate your explanation.
(7.5)

(d) What does the system/approach in part (b) gain and lose through the simplicity of its approach? Similarly, what does the system/approach in part (c) gain and lose through its complexity? Would there be scope for implementing a system modelling sentence meaning for applications tackled by (b), or a system only using knowledge of individual words for applications tackled by (c)? (5.0)

3. A perfect non-statistical natural language understanding (NLU) or generation (NLG) system would need to have a huge amount of knowledge of the world.

(a) Outline what happens in the stages of a “classic” NLU architecture (morphology, syntax, semantics, pragmatics), showing the possible analysis of an example input for a hypothetical natural language interface to a database. At what points in this process might world knowledge be used? (7.5)

(b) Show why world knowledge is essential for NLU by naming two NLU tasks that in general require knowledge of the world to perform perfectly. For each task, give an example where world knowledge would be useful or essential, but where knowledge of language alone would not suffice. Justify your claims. (7.5)

(c) Name an example of a natural language generation (NLG) system and give two examples of places where knowledge of the world (not just language) could be crucial in getting appropriate performance from the system. You can base your answers on what you anticipate the particular application requires, and need not base them on knowledge of how the actual system works. (5.0)

(d) Nobody has yet captured a large amount of world knowledge in machine-processible form, but we still have real NLP applications. How can this be? Discuss this question. (5.0)

Answers

NB: Marks given below are out of 100. These will be divided by 4 to give the final marks.

1. (a) The semantics of an utterance is the meaning that can be computed from knowledge of the words and the syntactic structure, without regard to context.
The pragmatics considers how the sentence might actually function within some specific context. (Marking: 5 for some reference to meaning; 5 for context as the key difference, or some similar concept)

(b) Two possible routes. The following numbers need to be multiplied together; the ones in brackets are the transition probabilities and the others are the output probabilities.

S1 -> S2 -> S1 -> S2 -> S3:  0.5 (0.2) 0.8 (0.3) 0.2 (0.2) 0.1 (0.7) 0.5
S1 -> S1 -> S1 -> S2 -> S3:  0.5 (0.8) 0.3 (0.8) 0.2 (0.2) 0.1 (0.7) 0.5

(Marking: 5 for multiplying; 5 for having both transitions and outputs; 5 for a correct path and correct lookup of numbers)

(c) max(0.3 × 0.5 × 0.1, 0.2 × 0.4 × 0.1)

(Marking: 5 for getting triples of numbers; 5 for multiplying and taking the max; 5 for using P(march|N) rather than P(N|march), i.e. 0.1 rather than 0.6)

(d) Many examples are possible here; at the simplest, in “John saw Mary”, who saw whom? Here knowledge of word order (and the fact that it is an active verb) tells us who the underlying agent and patient are. Lecture 7 gave a number of examples where syntactic knowledge resolves ambiguity. (Marking: 5 for two plausible sentences; 5 for quality of explanation)

(e) Examples (from lecture 11): lack of ambiguity, coverage of all relevant shades of meaning, canonicality, support of inference. (Marking: 5 each; no further explanation needed beyond the above)

(f) Finite state automata (also accept finite state transducers). (Marking: 5 for one of these)

(g) Preparing the input carefully (e.g. using a controlled language); post-editing MT output. (Marking: 5 for each of these)

(h) Domain knowledge and communication aims. Ignoring the first means one could say something false; ignoring the latter could mean failing to achieve the desired effect. (Marking: 5 for each)

(i) Something like (a) the word coming two words before the occurrence of “bat”, or (b) the number of times the word “cricket” occurs within a window of 10 words on each side.
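As a numerical check on the arithmetic in answers 1(b) and 1(c) above, the following short script (a sketch only, not part of the model answers) multiplies out the path probabilities and the candidate Viterbi cell values, using the probability tables as given in the question:

```python
from math import prod  # Python 3.8+

# Answer 1(b): a path's probability is the product of its output
# (emission) probabilities and its transition probabilities.
def path_probability(emissions, transitions):
    return prod(emissions) * prod(transitions)

# Path S1 -> S2 -> S1 -> S2 -> S3 emitting C1 C2 C3 C4 C5:
# outputs P(C1|S1)=0.5, P(C2|S2)=0.8, P(C3|S1)=0.2, P(C4|S2)=0.1, P(C5|S3)=0.5;
# transitions S1->S2=0.2, S2->S1=0.3, S1->S2=0.2, S2->S3=0.7.
p1 = path_probability([0.5, 0.8, 0.2, 0.1, 0.5], [0.2, 0.3, 0.2, 0.7])

# Path S1 -> S1 -> S1 -> S2 -> S3:
p2 = path_probability([0.5, 0.3, 0.2, 0.1, 0.5], [0.8, 0.8, 0.2, 0.7])

# Answer 1(c): the Viterbi cell for tag N at "march" maximises
# (previous cell value) * (tag transition) * (emission P(march|N)).
cell = max(0.3 * 0.5 * 0.1,   # from N: 0.3 * P(N|N) * P(march|N)
           0.2 * 0.4 * 0.1)   # from V: 0.2 * P(N|V) * P(march|N)
```

Running this gives p1 = 3.36e-5, p2 = 1.344e-4, and cell = 0.015 (the N-from-N triple wins the max).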
(Marking: 5 for anything similar to one of these; it doesn’t have to be particularly sensible, but it has to involve measuring one of these types of things)

(j) Various possibilities, e.g. (a) designing questions so that different answers will be acoustically different; (b) repeating information back to the user; (c) going into clarification subdialogues if confidence is low. (Marking: 5 each for two things like one of these)

2. (a) A good example would be MT, where there is anything from direct translation to the use of an interlingua. But you could probably argue for just about any application, depending on what sort of performance you want. Criteria should include things like: (a) cost of creating and maintaining the system (human and machine cost); (b) portability; (c) understandability; (d) level of performance needed (e.g. is it safety-critical?).

(b) Examples could be the vector space approach to IR, unigram models for speech recognition, or template spotting for question answering. The answer needs to make clear essentially what an entry in the lexicon would contain (e.g. just frequency within a corpus). I would expect some explanation showing understanding of the mathematical principles involved. As it is worth 30%, this answer should be reasonably substantial.

(c) Examples could be question answering, summarisation, or an NL interface to a database. The answer needs to present something about grammar and semantic interpretation rules, with a simple worked example, e.g. to show the use of compositionality. As it is worth 30%, this answer should be reasonably substantial.

(d) The system in (b) is probably easy to port and develop (in human time), but will fail on complex examples (give marks for indicating where it will fail), producing maybe adequate but certainly not perfect performance across a wide area of application. The system in (c) will be complex to develop and will only work in a narrow domain, but is likely to produce high-quality results there.
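To illustrate what the word-only knowledge mentioned in answer 2(b) amounts to, here is a minimal sketch of the vector space approach to IR: documents and queries are reduced to bags of term frequencies and compared by cosine similarity. The documents and query are invented for illustration, and a real system would normally add weighting (e.g. tf-idf) and stemming:

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The "lexicon" here is just per-word frequencies; word order,
# syntax and sentence meaning are ignored entirely.
docs = ["the bat flew out of the cave",
        "he hit the ball with a cricket bat"]
query = "cricket bat"

# Rank documents by similarity to the query.
ranked = sorted(docs, key=lambda d: cosine(query, d), reverse=True)
```

Here the document sharing both query words ranks first, which is adequate behaviour despite the model knowing nothing about what either document means.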
The last part is really to test the ability to imagine perhaps rather crazy possibilities and what their properties would probably be. Mark this part fairly hard, as it should be the part that best distinguishes the good students.

3. (a) This is standard bookwork. Allow 25 marks for the presentation of the stages, leaving 5 for saying something appropriate about the use of knowledge (which could probably be used at all stages).

(b) Examples might include pronoun resolution, word sense disambiguation, parsing, and POS tagging. I'm looking for some understanding here of the places where there may be a problem (most likely, ambiguity), and a fairly clear case that the problem can't be solved (perfectly) by simpler methods.

(c) I am looking here for awareness that the key issue for NLG is choice, together with examples that show this (e.g. choice of the ordering of material, what to include, what words to use, how to refer to something).

(d) I would expect the answer to address issues such as the scope and functionality required of real applications: many systems only need to work in limited domains, or are not required to be perfect in order to be useful (e.g. the story of MT). This is supposed to be rather open-ended.