Second assessment for NLP, November 2012

The second half of Chapter 25 of Jurafsky & Martin (2nd edition) discusses statistical approaches to Machine Translation; these approaches are also discussed briefly in the lectures, which focus on a rule-based method based on syntactic transfer. We'd like you to try your best to answer the following questions, based on a thorough reading of relevant parts of the book (e.g., pages 895--912). Perfect answers are not expected: the idea is to get you to think about the different approaches, looking at some hard problems. Where you are unsure about the meaning of certain Chinese/Japanese/English expressions, please make some reasonable assumptions and state them clearly. Please note the maximum word limit on each question.

a. [max 250 words] Consider the Noisy Channel formula for MT, for example in the form Argmax_E [P(F | E) P(E)] on p.911 and p.912, where F is the source-language sentence and E is the English sentence. First, explain briefly and informally how this model works, then sketch briefly and informally how it might apply to the example on p.911, where a Japanese phrase containing four words is translated into English as (for example) "I apologize", explaining how the model handles the faithfulness and the fluency of a translation. (I assume that the literal translation of the Japanese phrase is something like "We are deeply reflecting".)

b. [max 250 words] Consider the example of translation between Chinese and English with which the chapter opens (see e.g. Fig.25.1). Focussing on the third sentence (C3 and E3 in the Figure), discuss how suitable, or unsuitable, the statistical method discussed in the second half of the chapter is for achieving this kind of translation (i.e., translating C3 into E3). What obstacles do you see?

c.
[max 250 words] Compare the statistical MT method discussed in sections 25.3-25.8 of the book with the transfer-based method, by first explaining in general terms how the transfer-based method works, then discussing whether you believe that a transfer-based method would be likely to produce better translations of the examples in (a) and (b) than a statistical method. Explain your answers briefly.

Answers (only intended as guidelines)

a. The noisy-channel method can be seen as balancing the two main requirements of faithfulness (i.e., fidelity) and fluency. The language model P(E) models fluency, because the more probable a string E of words is, the more fluent it (hopefully!) is. The translation model P(F|E) models faithfulness, because the more probable the source string F is, given the English string E, the more faithful E (hopefully) is as a translation of F. Applied to the Japanese-English example, the translation "We are deeply reflecting" has high faithfulness (as modelled by P(F|E)), because (we assume) it offers a literal translation; it must, however, have rather low fluency (as modelled by P(E)), because this turn of phrase is rare in English. In other words, the model as a whole suggests that this is only a mediocre translation. The alternative translation, "I apologize", is the mirror image of the previous one, because it has low faithfulness but high fluency; this translation is probably a bit mediocre as well. The task of translating this phrase is hard because it's difficult to find a good “compromise” translation, which has decent fluency and decent faithfulness. Perhaps no good translation exists.

b. The translator decided to clarify the expression "the curtains" by adding "... of her bed".
As explained in the book, these words are added because, whereas Chinese readers will understand from the context that it is bed curtains that are referred to, modern English readers are not likely to understand this unless they are told explicitly. The statistical approach to MT would be unlikely to come up with Hawkes' creative translation, because the addition "... of her bed" gives the English sentence both a lower faithfulness with respect to the Chinese wording (i.e., low P(F|E)) and a lower fluency -- precisely because English texts will rarely have occasion to talk about bed curtains (unless the texts are translated from Chinese).

Another unexpected feature of the translation is the omission of the word "clear". It is not obvious why the translator decided to do this. It reduces (MT-statistical) fluency, and perhaps faithfulness as well.

The English translation contains the definite article more often than the Chinese original. This is probably as it should be. If the English phrase "the coldness" is aligned with the Chinese word "cold" (and analogously for "curtains"), in the style of section 25.4, then this does not diminish the faithfulness score, while the fluency score benefits greatly. In other words, the statistical model could do this. The English translation also has more information about tense (i.e., time); it’s not easy to see how the model can get this right.

c. (Start with a sketch of the transfer-based method, see book or lecture slides. This is omitted here.)

Chinese example (focussing on the curtains): In the transfer model, it would be possible to make the Chinese word for curtains ambiguous between, say, curtains-1 ("window curtains") and curtains-2 ("bed curtains"). This move would allow two possible translations for the Chinese sentence, one for each version of the Chinese word. The challenge would be to choose between these two, because nothing in the text appears to decide between the two meanings.
This method might use human intervention, allowing a person to choose between possible translations offered by the system. It's hard to see how this could work in the statistical approach.

Japanese example (focussing on politeness): The problem is that no perfect English translation appears to exist. If one existed, it would be possible to "hard code" the Japanese phrase as translating into that perfect English translation, treating it as an idiom.

Both problems are typical of translation across a cultural divide: in one case Chinese readers in the 18th century versus modern English readers, who have different world knowledge; in the other case Japanese and English speakers, who do not express politeness in the same situations. Translation in other situations will often be much easier.
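
Illustration for answer (a): the trade-off between faithfulness and fluency can be made concrete with a minimal Python sketch of the ranking Argmax_E [P(F | E) P(E)]. This is only a sketch under stated assumptions: the function names are hypothetical, and all probabilities are invented for illustration (they are not estimates from any corpus or figures from the book). Which candidate wins depends entirely on those invented numbers.

```python
import math

def noisy_channel_score(log_p_f_given_e, log_p_e):
    """Return log P(F|E) + log P(E), the log of the noisy-channel product."""
    return log_p_f_given_e + log_p_e

def best_translation(candidates):
    """Pick the English candidate E that maximises P(F|E) * P(E).

    `candidates` maps each English string to a pair
    (log P(F|E), log P(E)).
    """
    return max(candidates, key=lambda e: noisy_channel_score(*candidates[e]))

# Invented, purely illustrative probabilities (assumptions, not real data):
# the literal candidate is faithful but not fluent; the idiomatic candidate
# is fluent but less faithful.
candidates = {
    "We are deeply reflecting": (math.log(0.2), math.log(1e-9)),
    "I apologize": (math.log(1e-4), math.log(1e-5)),
}

if __name__ == "__main__":
    for english, (log_tm, log_lm) in candidates.items():
        print(english, noisy_channel_score(log_tm, log_lm))
    print("chosen:", best_translation(candidates))
```

Working in log space turns the product P(F|E) P(E) into a sum, which avoids numerical underflow when the probabilities are very small; the ranking of the candidates is unchanged.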