CS60057
Autumn 2007
Lecture 14b
24 August 2007
Lecture 9: Machine Translation (I)
Dan Jurafsky, November 7, 2006
Thanks to Bonnie Dorr for some of these slides!!
Intro and a little history
Language Similarities and Divergences
Three classic MT Approaches
Transfer
Interlingua
Direct
Modern Statistical MT
Evaluation
Machine translation (MT): translating a text from one language to another automatically.
dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.
Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
As she lay there alone, Daiyu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.
The Story of the Stone (= The Dream of the Red Chamber, Cao Xueqin, 1792)
Issues:
Word segmentation
Sentence segmentation: 4 English sentences to 1 Chinese
Grammatical differences
Chinese rarely marks tense:
As, turned to, had begun
tou -> penetrated
Zero anaphora
No articles
Stylistic and cultural differences
Bamboo tip plantain leaf -> bamboos and plantains
mu ‘curtain’ -> curtains of her bed
Rain sound sigh drop -> insistent rustle of the rain
Hansards: Canadian parliamentary proceedings (bilingual French-English)
Really hard stuff
Literature
Spontaneous speech (meetings, court reporting)
Really important stuff
Medical translation in hospitals, 911
Tasks for which a rough translation is fine
Web pages, email
Tasks for which MT can be post-edited
MT as first pass
“Computer-aided human translation”
Tasks in sublanguage domains where high-quality MT is possible
FAHQT (Fully Automatic High-Quality Translation)
Weather forecasting
“Cloudy with a chance of showers today and Thursday”
“Low tonight 4”
Can be modeled completely enough to use raw MT output
Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT
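A rough Python sketch of sublanguage MT in this spirit: a handful of pattern rules mapping English forecast phrases to French. The patterns and French wording here are illustrative guesses, not the real Meteo system's rules.

    import re

    DAYS = {"Thursday": "jeudi", "Friday": "vendredi"}

    # Each rule pairs a sublanguage pattern with a French rendering.
    RULES = [
        (re.compile(r"Cloudy with a chance of showers today and (\w+)"),
         lambda m: "Nuageux avec possibilité d'averses aujourd'hui et " + DAYS[m.group(1)]),
        (re.compile(r"Low tonight (-?\d+)"),
         lambda m: "Minimum ce soir " + m.group(1)),
    ]

    def translate_forecast(sentence):
        # Try each pattern; sublanguage MT fails loudly outside its domain.
        for pattern, render in RULES:
            match = pattern.fullmatch(sentence)
            if match:
                return render(match)
        raise ValueError("sentence outside the weather sublanguage")

    print(translate_forecast("Low tonight 4"))  # Minimum ce soir 4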
1946 Booth and Weaver discuss MT at the Rockefeller Foundation in New York
1947-48 idea of dictionary-based direct translation
1949 Weaver memorandum popularized the idea
1952 all 18 MT researchers in the world meet at MIT
1954 IBM/Georgetown Demo Russian-English MT
1955-65 lots of labs take up MT
1959/1960: Bar-Hillel “Report on the state of MT in US and GB”
Argued FAHQT too hard (semantic ambiguity, etc.)
Should work on semi-automatic instead of automatic
His argument:
Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.
Only human knowledge lets us know that ‘playpens’ are bigger than boxes, while ‘writing pens’ are smaller
His claim: we would have to encode all of human knowledge
The ALPAC report
Headed by John R. Pierce of Bell Labs
Conclusions:
Supply of human translators exceeds demand
All the Soviet literature is already being translated
MT has been a failure: all current MT work had to be post-edited
Sponsored evaluations which showed that intelligibility and informativeness were worse than human translations
Results:
MT research suffered
Funding loss
Number of research labs declined
Association for Machine Translation and Computational Linguistics dropped MT from its name
1976 Meteo, weather forecasts from English to French
Systran (Babelfish) has been used for 40 years
1970’s:
European focus in MT; mainly ignored in US
1980’s
ideas of using AI techniques in MT (KBMT, CMU)
1990’s
Commercial MT systems
Statistical MT
Speech-to-speech translation
Language Similarities and Divergences
Some aspects of human language are universal or near-universal, others diverge greatly.
Typology: the study of systematic cross-linguistic similarities and differences
What are the dimensions along which human languages vary?
Isolating languages
Cantonese, Vietnamese: each word generally has one morpheme
Vs. Polysynthetic languages
Siberian Yupik (‘Eskimo’): a single word may have very many morphemes
Agglutinative languages
Turkish: morphemes have clean boundaries
Vs. Fusion languages
Russian: a single affix may conflate many morphemes
SVO (Subject-Verb-Object) languages
English, German, French, Mandarin
SOV Languages
Japanese, Hindi
VSO languages
Irish, Classical Arabic
SVO lgs generally have prepositions: to Yuriko
SOV lgs generally have postpositions: Yuriko ni
Not every writing system has word boundaries marked
Chinese, Japanese, Thai, Vietnamese
Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:
Modern Standard Arabic, Chinese
Some ‘cool’ languages require the hearer to do more “figuring out” of who the various actors in the various events are:
Japanese, Chinese
Other ‘hot’ languages are pretty explicit about saying who did what to whom:
English
The noun phrases shown in blue do not appear in the Chinese text …
But they are needed for a good translation
Word to phrases:
English “computer science” = French “informatique”
POS divergences
Eng. ‘she likes/VERB to sing’
Ger. ‘Sie singt gerne/ADV’
Eng. ‘I’m hungry/ADJ’
Sp. ‘tengo hambre/NOUN’ (‘I have hunger’)
Grammatical constraints
English marks gender on pronouns; Mandarin does not.
So when translating third-person pronouns from Chinese to English, we need to figure out the gender of the person!
Similarly from English “they” to French “ils/elles”
Semantic constraints
English `brother’
Mandarin ‘gege’ (older) versus ‘didi’ (younger)
English ‘wall’
German ‘Wand’ (inside) ‘Mauer’ (outside)
German ‘Berg’
English ‘hill’ or ‘mountain’
Japanese: no word for privacy
English: no word for Cantonese ‘haauseun’ or Japanese ‘oyakoko’ (something like ‘filial piety’)
English ‘cow’ versus ‘beef’, Cantonese ‘ngau’
English
The bottle floated out.
Spanish
La botella salió flotando.
The bottle exited floating
Verb-framed lg: mark direction of motion on verb
Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
Satellite-framed lg: mark direction of motion on satellite
Crawl out, float off, jump down, walk over to, run after
Rest of Indo-European, Hungarian, Finnish, Chinese
G: Wir treffen uns am Mittwoch
E: We’ll meet on Wednesday
E: X swim across Y
S: X cruzar Y nadando
E: I like to eat
G: Ich esse gern
E: I’d prefer vanilla
G: Mir wäre Vanille lieber
S: Y me gusta
E: I like Y
G: Mir fällt der Termin ein
E: I remember the date (literally ‘to-me occurs the date’)
32% of sentences in a UN Spanish/English corpus (5K sentences) contain divergences:
Categorial (98%):     X tener hambre       <-> Y have hunger
Conflational (83%):   X dar puñaladas a Z  <-> X stab Z
Structural (35%):     X entrar en Y        <-> X enter Y
Head swapping (8%):   X cruzar Y nadando   <-> X swim across Y
Thematic (6%):        X gustar a Y         <-> Y likes X
Babelfish:
http://babelfish.altavista.com/
Google:
http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search
Three classic MT approaches:
Direct
Transfer
Interlingua
Proceed word-by-word through the text, translating each word
No intermediate structures except morphology
Knowledge is in the form of a huge bilingual dictionary with word-to-word translation information
After word translation, can do simple reordering
Adjective ordering English -> French/Spanish (see the sketch below)
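A toy Python sketch of the direct approach, assuming a made-up English-to-Spanish dictionary: translate word by word, with one local reordering rule.

    # Made-up lexicon and word classes, for illustration only.
    LEXICON = {"the": "el", "red": "rojo", "car": "coche", "runs": "corre"}
    ADJECTIVES, NOUNS = {"red"}, {"car"}

    def direct_translate(sentence):
        words = sentence.lower().split()
        i = 0
        while i < len(words) - 1:
            # Simple local reordering: English 'Adj Noun' -> Spanish 'Noun Adj'
            if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
                words[i], words[i + 1] = words[i + 1], words[i]
                i += 1
            i += 1
        # Word-by-word dictionary lookup; unknown words pass through
        return " ".join(LEXICON.get(w, w) for w in words)

    print(direct_translate("the red car runs"))  # el coche rojo corre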
(Figures: direct MT examples for German and Chinese.)
Idea: apply contrastive knowledge, i.e., knowledge about the differences between two languages
Steps:
Analysis: syntactically parse the source language
Transfer: rules to turn this parse into a parse for the target language
Generation: generate the target sentence from the parse tree
Generally
English: Adjective Noun
French: Noun Adjective
Note: not always true
Route mauvaise ‘bad road, badly-paved road’
Mauvaise route ‘wrong road’
But is a reasonable first approximation
Rule: [NP Adj Noun] => [NP Noun Adj]
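A rough Python sketch of this transfer rule, with parse trees encoded as nested tuples; the tree encoding is illustrative, not from any particular system.

    def transfer_np(tree):
        # Recursively rewrite [NP Adj N] (English order) as [NP N Adj] (French order).
        if isinstance(tree, str):
            return tree                                   # leaf word
        label, children = tree[0], tree[1:]
        if (label == "NP" and len(children) == 2
                and children[0][0] == "Adj" and children[1][0] == "N"):
            children = (children[1], children[0])         # swap Adj and N
        return (label,) + tuple(transfer_np(c) for c in children)

    print(transfer_np(("NP", ("Adj", "green"), ("N", "witch"))))
    # ('NP', ('N', 'witch'), ('Adj', 'green'))

Lexical transfer (green -> verte, witch -> sorcière) would then apply to the reordered tree.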
Transfer-based systems also need lexical transfer rules
Bilingual dictionary (like for direct MT)
English home:
German
nach Hause (going home)
Heim (home game)
Heimat (homeland, home country)
zu Hause (at home)
Can list “at home <-> zu Hause”
Or do Word Sense Disambiguation
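A Python sketch of the listing strategy, assuming a hypothetical entry table: multiword entries listed in the dictionary are tried first, with a bare default otherwise. Real lexical transfer (or WSD) is far more involved.

    # Illustrative entries loosely following the slide's examples.
    HOME_ENTRIES = [
        ("at home", "zu Hause"),
        ("going home", "nach Hause"),
        ("home game", "Heimspiel"),
        ("homeland", "Heimat"),
    ]

    def translate_home(phrase):
        # Listed multiword entries are tried first; bare 'home' gets a default.
        for english, german in HOME_ENTRIES:
            if english in phrase:
                return phrase.replace(english, german)
        return phrase.replace("home", "Heim")

    print(translate_home("she stayed at home"))  # she stayed zu Hause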
Systran: combining direct and transfer
Analysis
Morphological analysis, POS tagging
Chunking of NPs, PPs, phrases
Shallow dependency parsing
Transfer
Translation of idioms
Word sense disambiguation
Assigning prepositions based on governing verbs
Synthesis
Apply rich bilingual dictionary
Deal with reordering
Morphological generation
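Schematically, the pipeline is three composed stages. In this Python sketch the stage names follow the slide, but the bodies are placeholder stubs, not real components.

    def analyze(text):
        # Morphological analysis, POS tagging, chunking, shallow dependency parsing
        return {"tokens": text.split()}

    def transfer(analysis):
        # Idiom translation, word sense disambiguation, preposition assignment
        return analysis

    def synthesize(analysis):
        # Bilingual dictionary application, reordering, morphological generation
        return " ".join(analysis["tokens"])

    def translate(text):
        return synthesize(transfer(analyze(text)))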
N² sets of transfer rules needed for N languages!
Grammar and lexicon full of language-specific stuff
Hard to build, hard to maintain
Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence to help
Steps:
1) translate source sentence into meaning representation
2) generate target sentence from meaning.
Idea is that some of the MT work that we need to do is part of other NLP tasks
E.g., disambiguating E: book = S: ‘libro’ (volume) from E: book = S: ‘reservar’ (reserve)
So we could have concepts like BOOKVOLUME and RESERVE and solve this disambiguation problem once for each language
Direct MT: pros and cons (Bonnie Dorr)
Pros
Fast
Simple
Cheap
No translation rules hidden in lexicon
Cons
Unreliable
Not powerful
Rule proliferation
Requires lots of context
Major restructuring after lexical substitution
Interlingua: pros and cons (B. Dorr)
Pros
Avoids the N² problem
Easier to write rules
Cons:
Semantics is HARD
Useful information lost (paraphrase)
Hebrew “adonoi roi” (‘the Lord is my shepherd’, Psalm 23) for a culture without sheep or shepherds
Something fluent and understandable, but not faithful:
“The Lord will look after me”
Something faithful, but not fluent and natural:
“The Lord is for me like somebody who looks after animals with cotton-like hair”
Translators often talk about two factors we want to maximize:
Faithfulness or fidelity
How close is the meaning of the translation to the meaning of the original?
(Even better: does the translation cause the reader to draw the same inferences as the original would have)
Fluency or naturalness
How natural the translation is, just considering its fluency in the target language
Statistical MT:
Faithfulness and Fluency formalized!
Best translation T-hat of a source sentence S:
T-hat = argmax_T fluency(T) · faithfulness(T, S)
Developed by researchers who were originally in speech recognition at IBM
Called the IBM model
Hmm, those two factors might look familiar…
T-hat = argmax_T fluency(T) · faithfulness(T, S)
Yup, it’s Bayes’ rule:
T-hat = argmax_T P(T) · P(S|T)
Assume we are translating from a foreign language sentence F to an English sentence E:
F = f_1, f_2, f_3, …, f_m
We want to find the best English sentence E-hat = e_1, e_2, e_3, …, e_n
E-hat = argmax_E P(E|F)
      = argmax_E P(F|E) · P(E) / P(F)
      = argmax_E P(F|E) · P(E)
where P(F|E) is the Translation Model and P(E) is the Language Model.
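In code form, the noisy-channel decision rule is just an argmax over candidate translations. In this Python sketch both scoring functions are invented stand-ins, not trained models.

    def decode(f_sentence, candidates, p_f_given_e, p_e):
        # Noisy channel: pick the E maximizing P(F|E) * P(E)
        return max(candidates, key=lambda e: p_f_given_e(f_sentence, e) * p_e(e))

    # Toy usage with stand-in models.
    best = decode("ça me plaît",
                  ["that pleases me", "I like it", "I'll take that one"],
                  p_f_given_e=lambda f, e: 1.0,          # uniform stand-in
                  p_e=lambda e: 1.0 / len(e.split()))    # shorter = 'more fluent'
    print(best)  # that pleases me (ties broken by candidate order)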
How to measure that this sentence
“That car was almost crash onto me”
is less fluent than this one:
“That car almost hit me”
Answer: language models (N-grams!)
For example P(hit|almost) > P(was|almost)
But can use any other more sophisticated model of grammar
Advantage: this is monolingual knowledge!
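A tiny Python sketch of the N-gram idea: estimate bigram probabilities by counting over a corpus (the two-sentence corpus below is made up), then compare P(hit|almost) and P(was|almost).

    from collections import Counter

    def train_bigram(corpus):
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus:
            words = ["<s>"] + sentence.split()
            unigrams.update(words[:-1])                  # context counts
            bigrams.update(zip(words[:-1], words[1:]))   # pair counts
        def prob(word, prev):
            return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        return prob

    p = train_bigram(["that car almost hit me", "the truck almost hit the wall"])
    print(p("hit", "almost"))   # 1.0 in this toy corpus
    print(p("was", "almost"))   # 0.0: 'almost was' never observed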
French: ça me plaît [that me pleases]
English:
that pleases me - most fluent
I like it
I’ll take that one
How to quantify this?
Intuition: degree to which words in one sentence are plausible translations of words in other sentence
Product of probabilities that each word in target sentence would generate each word in source sentence.
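As a rough Python sketch of that intuition (a simplification of real translation models): score each source word by its best translation probability against some target word, and multiply. The probability table here is invented.

    def faithfulness(source_words, target_words, t):
        # Product over source words of the best P(source word | target word)
        score = 1.0
        for s in source_words:
            score *= max(t.get((s, e), 1e-6) for e in target_words)
        return score

    # Made-up word-translation probabilities.
    t = {("ça", "that"): 0.8, ("me", "me"): 0.9, ("plaît", "pleases"): 0.7}
    print(faithfulness(["ça", "me", "plaît"], ["that", "pleases", "me"], t))  # 0.504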
Need to know, for every target language word, probability of it mapping to every source language word.
How do we learn these probabilities?
Parallel texts!
Lots of times we have two texts that are translations of each other
If we knew which word in the source text mapped to each word in the target text, we could just count!
Sentence alignment:
Figuring out which source language sentence maps to which target language sentence
Word alignment
Figuring out which source language word maps to which target language word
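A compact Python sketch of how word-translation probabilities can be learned from sentence-aligned parallel text with EM, in the style of IBM Model 1. The two-sentence corpus is invented; real training uses millions of aligned sentences.

    from collections import defaultdict

    def model1(pairs, iterations=10):
        t = defaultdict(lambda: 1.0)                 # uniform-ish start
        for _ in range(iterations):
            counts, totals = defaultdict(float), defaultdict(float)
            for f_sent, e_sent in pairs:
                for f in f_sent:
                    norm = sum(t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        frac = t[(f, e)] / norm      # expected alignment count
                        counts[(f, e)] += frac
                        totals[e] += frac
            for (f, e), c in counts.items():
                t[(f, e)] = c / totals[e]            # re-estimate P(f|e)
        return t

    pairs = [("la maison".split(), "the house".split()),
             ("la fleur".split(), "the flower".split())]
    t = model1(pairs)
    print(round(t[("maison", "house")], 2))   # converges close to 1.0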
The job of the faithfulness model P(S|T) is just to model a “bag of words”: which words come from, say, English to Spanish.
P(S|T) doesn’t have to worry about internal facts about Spanish word order: that’s the job of P(T)
P(T) can do Bag generation: put the following words in order (from Kevin Knight)
have programming a seen never I language better
-actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table
“Usually the actual capacity of the table is somewhat less, since the hashing is not perfectly collision-free”
How about:
loves Mary John
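A brute-force Python sketch of bag generation: score every ordering with a bigram scorer and keep the best (feasible only for tiny bags; the toy scores below are invented). Note that for “loves Mary John” a language model alone would happily accept both “John loves Mary” and “Mary loves John”: fluency cannot decide who loves whom.

    from itertools import permutations

    def bag_generate(bag, bigram_score):
        def sentence_score(order):
            words = ("<s>",) + order
            score = 1.0
            for prev, word in zip(words, words[1:]):
                score *= bigram_score(word, prev)
            return score
        return max(permutations(bag), key=sentence_score)

    # Invented bigram scores; unseen pairs get a small default.
    toy = {("<s>", "John"): 0.5, ("John", "loves"): 0.6, ("loves", "Mary"): 0.6}
    print(bag_generate(["loves", "Mary", "John"],
                       lambda w, prev: toy.get((prev, w), 0.01)))
    # ('John', 'loves', 'Mary')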