Second assessment for NLP, November 2012
The second half of Chapter 25 of Jurafsky & Martin (2nd edition)
discusses statistical approaches to Machine Translation; these
approaches are also discussed briefly in the lectures, which focus on
a rule-based method based on syntactic transfer. We'd like you to try
your best to answer the following questions, based on a thorough
reading of relevant parts of the book (e.g., pages 895--912). Perfect
answers are not expected: the idea is to get you to think about the
different approaches, looking at some hard problems. Where you are
unsure about the meaning of certain Chinese/Japanese/English
expressions, please make some reasonable assumptions and state
them clearly. Please note the maximum word limits on each question.
a. [max 250 words] Consider the Noisy Channel formula for MT, for
example in the form Argmax_E [P(F | E) P(E)] on p.911 and p.912,
where F is a sentence in the source language and E is an English
sentence. First, explain briefly
and informally how this model works, then sketch briefly and
informally how it might apply to the example on p.911, where a
Japanese phrase containing four words is translated into English as
(for example) "I apologize", explaining how the model handles the
faithfulness and the fluency of a translation. (I assume that the literal
translation of the Japanese phrase is something like "We are deeply
reflecting".)
b. [max 250 words] Consider the example of translation between
Chinese and English with which the chapter opens (see e.g.
Fig.25.1). Focussing on the third sentence (C3 and E3 in the Figure),
discuss how suitable, or unsuitable, the statistical method discussed
in the second half of the chapter is for achieving this kind of
translation (i.e., translating C3 into E3). What obstacles do you see?
c. [max 250 words] Compare the statistical MT method discussed in
sections 25.3-25.8 of the book with the transfer-based method, by
first explaining in general terms how the transfer-based method
works, then discussing whether you believe that a transfer-based
method would be likely to produce better translations of the examples
in (a) and (b) than a statistical method. Explain your answers briefly.
Answers (only intended as guidelines)
a. The noisy-channel based method can be seen as balancing the
two main requirements of faithfulness (i.e., fidelity) and fluency. The
language model P(E) models fluency, because the more probable a
string E of words is, the more fluent it (hopefully!) is. The translation
model P(F|E) models faithfulness, because the more probable the
source string F is given the English string E, the more faithful E
(hopefully) is to F as a translation.
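For reference, the formula in the question follows from Bayes' rule. The derivation below is the standard one, written with the same symbols F and E; the hat notation for the chosen translation is an addition made here for readability. The denominator P(F) can be dropped because it is the same for every candidate E.

```latex
\begin{align*}
\hat{E} &= \operatorname*{argmax}_{E} P(E \mid F) \\
        &= \operatorname*{argmax}_{E} \frac{P(F \mid E)\, P(E)}{P(F)} \\
        &= \operatorname*{argmax}_{E} P(F \mid E)\, P(E)
\end{align*}
```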
Applied to the Japanese-English example, the translation "We are
deeply reflecting" has high faithfulness (as modelled by P(F|E)),
because (we assume) it offers a literal translation; it must, however,
have rather low fluency (as modelled by P(E)), because this turn of
phrase is rare in English. In other words, the model as a whole
suggests that this is only a mediocre translation. The alternative
translation, "I apologize", offers the mirror image of the previous
translation, because it has low faithfulness but high fluency; this
translation is probably a bit mediocre as well. The task of translating
this phrase is hard because it's difficult to find a good “compromise”
translation, which has decent fluency and decent faithfulness.
Perhaps no good translation exists.
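The trade-off described above can be made concrete with a minimal sketch. All probabilities below are hypothetical, invented only to illustrate the argument; they are not estimates from a corpus or from the book, and the names `CANDIDATES`, `noisy_channel_score` and `rank` are illustrative as well.

```python
import math

# Hypothetical probabilities, invented purely for illustration.
# They are NOT estimates from any corpus or from the book.
CANDIDATES = {
    # literal rendering: high P(F|E), low P(E)
    "We are deeply reflecting": {"p_f_given_e": 0.30, "p_e": 0.0001},
    # idiomatic rendering: low P(F|E), high P(E)
    "I apologize": {"p_f_given_e": 0.002, "p_e": 0.02},
}


def noisy_channel_score(p_f_given_e: float, p_e: float) -> float:
    """Return log P(F|E) + log P(E); logs avoid underflow for tiny products."""
    return math.log(p_f_given_e) + math.log(p_e)


def rank(candidates: dict) -> list:
    """Sort candidate English strings from best to worst combined score."""
    scored = [(noisy_channel_score(**probs), e) for e, probs in candidates.items()]
    return [(e, s) for s, e in sorted(scored, reverse=True)]


if __name__ == "__main__":
    for english, score in rank(CANDIDATES):
        print(f"{score:8.3f}  {english}")
```

With these made-up numbers the two candidates receive similar combined scores, one because of its translation-model term and the other because of its language-model term, which mirrors the point that neither is a clearly good translation.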
b. The translator decided to clarify the expression "the curtains" by
adding "... of her bed". As explained in the book, these words are
added because whereas Chinese readers will understand from the
context that it is bed curtains that are referred to, modern English
readers are not likely to understand this unless they are told explicitly.
The statistical approach to MT would be unlikely to come up with
Hawkes' creative translation because the addition "... of her bed"
gives the English sentence both a lower faithfulness with respect to the
Chinese wording (i.e., low P(F|E)) and a lower fluency -- precisely
because English texts will rarely have occasion to talk about bed
curtains (unless the texts are translated from Chinese).
Another unexpected feature of the translation is the omission of the
word "clear". It is not obvious why the translator decided to do this. It
reduces fluency in the statistical-MT sense, and perhaps faithfulness as well.
The English translation contains the definite article more often than
the Chinese original. This is probably as it should be. If the English
phrase "the coldness" is aligned with the Chinese word "cold" (and
analogously for "curtains"), in the style of section 25.4, then this does
not diminish the faithfulness score, while the fluency score benefits
greatly. In other words, the statistical model could do this. The
English translation also has more information about tense (i.e., time);
it’s not easy to see how the model can get this right.
c. (Start with a sketch of the transfer-based method, see book or
lecture slides. This is omitted here.)
Chinese example (focussing on the curtains): In the transfer model, it
would be possible to make the Chinese word for curtains ambiguous
between, say, curtains-1 ("window curtains") and curtains-2 ("bed
curtains"). This move would allow two possible translations for the
Chinese sentence, one for each version of the Chinese word. The
challenge would be to choose between these two, because nothing in
the text appears to decide between the two meanings. This method
might use human intervention, allowing a person to choose between
possible translations offered by the system. It's hard to see how this
could work in the statistical approach.
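The sense-splitting idea can be sketched as follows. This is only a toy illustration of the lexical part of transfer: it skips parsing and structural transfer entirely. English glosses stand in for the Chinese words, "penetrate" is an assumed gloss, and the lexicon entries, sense labels and function names are all hypothetical.

```python
from itertools import product

# Hypothetical sense-tagged transfer lexicon. English glosses stand in for
# the Chinese words; "penetrate" is an assumed gloss, not taken from the text.
LEXICON = {
    "cold": [("cold-1", "the coldness")],
    "penetrate": [("penetrate-1", "penetrated")],
    "curtain": [
        ("curtains-1", "the window curtains"),
        ("curtains-2", "the curtains of her bed"),
    ],
}


def candidate_translations(glosses):
    """Return one English string per combination of word senses."""
    options = [LEXICON[g] for g in glosses]
    return [" ".join(text for _sense, text in combo) for combo in product(*options)]


def choose(candidates, pick):
    """Let a caller (e.g. a human post-editor) pick one candidate by index."""
    return candidates[pick(candidates)]


if __name__ == "__main__":
    cands = candidate_translations(["cold", "penetrate", "curtain"])
    for i, c in enumerate(cands):
        print(i, c)
    # A person would inspect the list; here index 1 is picked for illustration.
    print(choose(cands, lambda cs: 1))
```

The `pick` callback is where human intervention would enter: the system enumerates one translation per sense, and the choice between them is left to a person because the text itself does not decide it.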
Japanese example (focussing on politeness): The problem is that no
perfect English translation appears to exist. If one existed, it
would be possible to "hard code" the Japanese phrase as
translating into that English translation, treating it as an idiom.
Both problems are typical of translation across a cultural divide: in
one case Chinese readers in the 18th century versus modern English
readers, who have different world knowledge; in the other case
Japanese and English speakers, who do not express politeness in
the same situations. Translation in other situations will often be much
easier.