MT-1


CS60057

Speech & Natural Language Processing

Autumn 2007

Lecture 14b

24 August 2007


LING 180 SYMBSYS 138

Intro to Computer Speech and Language Processing

Lecture 9: Machine Translation (I)

November 7, 2006

Dan Jurafsky


Thanks to Bonnie Dorr for some of these slides!!


Outline for MT Week

 Intro and a little history

 Language Similarities and Divergences

 Three classic MT Approaches

 Transfer

 Interlingua

 Direct

 Modern Statistical MT

 Evaluation


What is MT?

 Translating a text from one language to another automatically.


Machine Translation

 dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.

Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come

As she lay there alone, Daiyu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.



Machine Translation

The Story of the Stone

= The Dream of the Red Chamber (Cao Xueqin, 1792)

Issues:

 Word segmentation

Sentence segmentation: 4 English sentences to 1 Chinese

Grammatical differences

Chinese rarely marks tense:

As, turned to, had begun,

 tou -> penetrated

Zero anaphora

No articles

Stylistic and cultural differences

Bamboo tip plantain leaf -> bamboos and plantains

Ma ‘curtain’ -> curtains of her bed

Rain sound sigh drop -> insistent rustle of the rain


Not just literature

 Hansards: Canadian parliamentary proceedings


What is MT not good for?

 Really hard stuff

 Literature

 Natural spontaneous speech (meetings, court reporting)

 Really important stuff

 Medical translation in hospitals, 911


What is MT good for?

 Tasks for which a rough translation is fine

 Web pages, email

 Tasks for which MT can be post-edited

MT as a first pass

“Computer-aided human translation”

 Tasks in sublanguage domains where high-quality MT is possible

 FAHQT (Fully Automatic High-Quality Translation)


Sublanguage domain

 Weather forecasting

“Cloudy with a chance of showers today and Thursday”

“Low tonight 4”

Can be modeled completely enough to use raw MT output

Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT


MT History

 1946 Booth and Weaver discuss MT at the Rockefeller Foundation in New York

1947-48 idea of dictionary-based direct translation

1949 Weaver memorandum popularized the idea

1952 all 18 MT researchers in the world meet at MIT

1954 IBM/Georgetown demo of Russian-English MT

1955-65 lots of labs take up MT


History of MT: Pessimism

 1959/1960: Bar-Hillel “Report on the state of MT in US and GB”

 Argued FAHQT too hard (semantic ambiguity, etc.)

Should work on semi-automatic instead of automatic

His argument:

Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.

Only human knowledge lets us know that ‘playpens’ are bigger than boxes, but ‘writing pens’ are smaller

His claim: we would have to encode all of human knowledge


History of MT: Pessimism

 The ALPAC report

 Headed by John R. Pierce of Bell Labs

Conclusions:

 Supply of human translators exceeds demand

 All the Soviet literature is already being translated

MT has been a failure: all current MT work had to be post-edited

Sponsored evaluations which showed that intelligibility and informativeness were worse than for human translations

Results:

 MT research suffered

 Funding loss

 Number of research labs declined

 Association for Machine Translation and Computational Linguistics dropped MT from its name


History of MT

1976 Meteo, weather forecasts from English to French

Systran (Babelfish) has been used for 40 years

1970’s:

 European focus in MT; mainly ignored in US

1980’s

 ideas of using AI techniques in MT (KBMT, CMU)

1990’s

Commercial MT systems

Statistical MT

Speech-to-speech translation


Language Similarities and Divergences

 Some aspects of human language are universal or near-universal; others diverge greatly.

 Typology: the study of systematic cross-linguistic similarities and differences

 What are the dimensions along which human languages vary?


Morphological Variation

Isolating languages

 Cantonese, Vietnamese: each word generally has one morpheme

Vs. Polysynthetic languages

Siberian Yupik (‘Eskimo’): a single word may have very many morphemes

Agglutinative languages

 Turkish: morphemes have clean boundaries

Vs. Fusion languages

 Russian: single affix may have many morphemes


Syntactic Variation

 SVO (Subject-Verb-Object) languages

 English, German, French, Mandarin

 SOV Languages

 Japanese, Hindi

 VSO languages

 Irish, Classical Arabic

 SVO lgs generally prepositions: to Yuriko

 SOV lgs generally postpositions: Yuriko ni

Segmentation Variation

 Not every writing system has word boundaries marked

 Chinese, Japanese, Thai, Vietnamese

 Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:

 Modern Standard Arabic, Chinese


Inferential Load: cold vs. hot lgs

Some ‘cold’ languages require the hearer to do more “figuring out” of who the various actors in the various events are:

 Japanese, Chinese

Other ‘hot’ languages are pretty explicit about saying who did what to whom.

 English


Inferential Load (2)

All the noun phrases highlighted (in blue on the original slide) do not appear in the Chinese text …

But they are needed for a good translation


Lexical Divergences

 Word to phrases:

English “computer science” = French “informatique”

 POS divergences

Eng. ‘she likes/VERB to sing’

Ger. ‘Sie singt gerne/ADV’

Eng. ‘I’m hungry/ADJ’

Sp. ‘tengo hambre/NOUN’


Lexical Divergences: Specificity

Grammatical constraints

 English has gender on pronouns; Mandarin does not.

So translating “3rd person” from Chinese to English, we need to figure out the gender of the person!

Similarly from English “they” to French “ils/elles”

Semantic constraints

English `brother’

Mandarin ‘gege’ (older) versus ‘didi’ (younger)

English ‘wall’

German ‘Wand’ (inside) ‘Mauer’ (outside)

German ‘Berg’

English ‘hill’ or ‘mountain’


Lexical Divergence: many-to-many


Lexical Divergence: lexical gaps

Japanese: no word for privacy

English: no word for Cantonese ‘haauseun’ or Japanese ‘oyakoko’ (something like ‘filial piety’)

English ‘cow’ versus ‘beef’, Cantonese ‘ngau’


Event-to-argument divergences

English

 The bottle floated out.

Spanish

La botella salió flotando.

 The bottle exited floating

Verb-framed lg: mark direction of motion on verb

 Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families

Satellite-framed lg: mark direction of motion on satellite

Crawl out, float off, jump down, walk over to, run after

Rest of Indo-European, Hungarian, Finnish, Chinese


Structural divergences

G: Wir treffen uns am Mittwoch

E: We’ll meet on Wednesday


Head Swapping

E: X swim across Y

S: X cruzar Y nadando

E: I like to eat

G: Ich esse gern

E: I’d prefer vanilla

G: Mir wäre Vanille lieber


Thematic divergence

 S: Me gusta Y

 E: I like Y

G: Mir fällt der Termin ein

 E: I remember the date


Divergence counts from Bonnie Dorr

 32% of sentences in a 5K-sentence UN Spanish/English corpus contain divergences

Divergence type    Spanish example        English example    Frequency
Categorial         X tener hambre         Y have hunger      98%
Conflational       X dar puñaladas a Z    X stab Z           83%
Structural         X entrar en Y          X enter Y          35%
Head swapping      X cruzar Y nadando     X swim across Y    8%
Thematic           X gustar a Y           Y likes X          6%

MT on the web

 Babelfish:

 http://babelfish.altavista.com/

 Google:

 http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search


3 methods for MT

 Direct

 Transfer

 Interlingua


Three MT Approaches: Direct, Transfer, Interlingual


Direct Translation

 Proceed word-by-word through text

 Translating each word

 No intermediate structures except morphology

 Knowledge is in the form of

 Huge bilingual dictionary

 word-to-word translation information

 After word translation, can do simple reordering

 Adjective ordering English -> French/Spanish
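A minimal sketch of this pipeline, assuming a toy bilingual dictionary and hand-tagged word lists (all entries are illustrative, not from a real system):

```python
# Direct MT sketch: word-by-word dictionary substitution plus one
# local reordering rule (English Adjective Noun -> Spanish Noun Adjective).
# Dictionary and POS sets are invented for illustration.

BILINGUAL_DICT = {"the": "el", "green": "verde", "book": "libro"}
ADJECTIVES = {"green"}
NOUNS = {"book"}

def direct_translate(words):
    # 1) Substitute each source word with its dictionary entry.
    target = [BILINGUAL_DICT.get(w, w) for w in words]
    # 2) Simple local reordering: swap adjective-noun pairs.
    for i in range(len(words) - 1):
        if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            target[i], target[i + 1] = target[i + 1], target[i]
    return target

print(direct_translate(["the", "green", "book"]))  # ['el', 'libro', 'verde']
```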


Direct MT Dictionary entry


Direct MT


Problems with direct MT

 German

 Chinese


The Transfer Model

 Idea: apply contrastive knowledge, i.e., knowledge about the differences between two languages

 Steps:

 Analysis: syntactically parse the source language

 Transfer: rules turn this parse into a parse for the target language

 Generation: generate the target sentence from the parse tree


English to French

 Generally

 English: Adjective Noun

 French: Noun Adjective

Note: not always true

Route mauvaise ‘bad road, badly-paved road’

Mauvaise route ‘wrong road’

 But it is a reasonable first approximation

Rule:
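The rule itself appeared as a figure on the original slide. A rough sketch of how such an Adjective-Noun reordering rule could be applied to a parse; the tuple-based tree encoding is an assumption for illustration:

```python
# Transfer-rule sketch: NP(Adj, N) in English becomes NP(N, Adj) in French.
# Trees are (label, child, ...) tuples; leaves are (POS, word) pairs.

def transfer_np(tree):
    label, *children = tree
    if label == "NP":
        tags = [child[0] for child in children]
        if tags == ["Adj", "N"]:                    # English order
            children = [children[1], children[0]]   # French order: N Adj
    return (label, *children)

print(transfer_np(("NP", ("Adj", "green"), ("N", "witch"))))
# ('NP', ('N', 'witch'), ('Adj', 'green'))
```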


Transfer rules


Lexical transfer

 Transfer-based systems also need lexical transfer rules

 Bilingual dictionary (like for direct MT)

 English home:

 German

 nach Hause (going home)

 Heim (home game)

 Heimat (homeland, home country)

 zu Hause (at home)

Can list “at home <-> zu Hause”

 Or do Word Sense Disambiguation
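A hedged sketch of such context-sensitive lexical transfer: longer patterns are matched before falling back to a default sense. The entries follow the slide; the matching scheme itself is illustrative:

```python
# Lexical transfer sketch for English "home" -> German.
# Context patterns are checked first; "Heim" is the fallback sense.

HOME_RULES = [
    ("going home", "nach Hause"),
    ("at home", "zu Hause"),
    ("homeland", "Heimat"),
]

def translate_home(phrase):
    for pattern, german in HOME_RULES:
        if pattern in phrase:
            return german
    return "Heim"  # default, e.g. "home game" contexts

print(translate_home("she stayed at home"))   # zu Hause
print(translate_home("we are going home"))    # nach Hause
```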


Systran: combining direct and transfer

Analysis

 Morphological analysis, POS tagging

 Chunking of NPs, PPs, phrases

 Shallow dependency parsing

Transfer

 Translation of idioms

 Word sense disambiguation

 Assigning prepositions based on governing verbs

Synthesis

 Apply rich bilingual dictionary

 Deal with reordering

 Morphological generation


Transfer: some problems

 N² sets of transfer rules!

 Grammar and lexicon full of language-specific stuff

 Hard to build, hard to maintain


Interlingua

 Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence to help

 Steps:

 1) translate source sentence into meaning representation

 2) generate target sentence from meaning.


Interlingua for “Mary did not slap the green witch”
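The interlingua itself appeared as a figure on the original slide. A sketch of what such a language-neutral meaning representation might look like; the frame format and feature names are illustrative, not from a specific system:

```python
# Interlingua sketch for "Mary did not slap the green witch".
# Generation into any target language would start from this structure.

interlingua = {
    "EVENT": "SLAPPING",
    "TENSE": "PAST",
    "POLARITY": "NEGATIVE",
    "AGENT": {"CONCEPT": "MARY"},
    "THEME": {
        "CONCEPT": "WITCH",
        "DEFINITENESS": "DEF",
        "ATTRIBUTES": [("HAS-COLOR", "GREEN")],
    },
}
```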


Interlingua

Idea is that some of the MT work that we need to do is part of other NLP tasks

E.g., disambiguating E: book = S: ‘libro’ from E: book = S: ‘reservar’

 So we could have concepts like BOOKVOLUME and RESERVE and solve this problem once for each language


Direct MT: pros and cons

(Bonnie Dorr)

Pros

 Fast

Simple

Cheap

 No translation rules hidden in lexicon

Cons

Unreliable

Not powerful

Rule proliferation

Requires lots of context

Major restructuring after lexical substitution


Interlingual MT: pros and cons

(B. Dorr)

 Pros

 Avoids the N² problem

 Easier to write rules

 Cons:

 Semantics is HARD

 Useful information lost (paraphrase)


The impossibility of translation

Hebrew “adonoi roi” (‘the Lord is my shepherd’) for a culture without sheep or shepherds

 Something fluent and understandable, but not faithful:

“The Lord will look after me”

 Something faithful, but not fluent and natural:

“The Lord is for me like somebody who looks after animals with cotton-like hair”


What makes a good translation

 Translators often talk about two factors we want to maximize:

 Faithfulness or fidelity

 How close is the meaning of the translation to the meaning of the original

 (Even better: does the translation cause the reader to draw the same inferences as the original would have)

 Fluency or naturalness

 How natural the translation is, just considering its fluency in the target language


Statistical MT: Faithfulness and Fluency formalized!

 Best translation T-hat of a source sentence S:

T-hat = argmax_T fluency(T) × faithfulness(T, S)

 Developed by researchers who were originally in speech recognition at IBM


 Called the IBM model


The IBM model


Hmm, those two factors might look familiar…

ˆ  argmax

T fluency ( T )faithfulness ( T , S )

Yup, it’s Bayes rule:

ˆ  argmax

T

P ( T ) P ( S | T )




More formally

Assume we are translating from a foreign language sentence F to an English sentence E:

 F = f1, f2, f3, …, fm

We want to find the best English sentence

 E-hat = e1, e2, e3, …, en

 E-hat = argmax_E P(E | F)

= argmax_E P(F | E) P(E) / P(F)

= argmax_E P(F | E) P(E)

P(F | E): Translation Model    P(E): Language Model
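A toy decoder that makes the argmax concrete, assuming translation-model and language-model scores are already available for a small candidate set (all probabilities are invented; a real decoder searches a huge space rather than a fixed list):

```python
# Noisy-channel decoding sketch: E-hat = argmax_E P(F|E) * P(E).

candidates = {
    "that pleases me": {"p_f_given_e": 0.30, "p_e": 0.002},
    "I like it":       {"p_f_given_e": 0.25, "p_e": 0.008},
    "me pleases that": {"p_f_given_e": 0.30, "p_e": 0.00001},
}

def decode(cands):
    # Pick the English sentence maximizing P(F|E) * P(E).
    return max(cands, key=lambda e: cands[e]["p_f_given_e"] * cands[e]["p_e"])

print(decode(candidates))  # 'I like it'
```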


The noisy channel model for MT


Fluency: P(T)

How do we measure that this sentence:

 That car was almost crash onto me

is less fluent than this one:

 That car almost hit me.

Answer: language models (N-grams!)

 For example P(hit|almost) > P(was|almost)

But we can use any more sophisticated model of grammar

Advantage: this is monolingual knowledge!
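A minimal sketch of bigram-based fluency scoring; the counts are invented, and a real model would be trained on a large corpus and smoothed:

```python
# Fluency sketch: score a sentence as a sum of log bigram probabilities.

from math import log

BIGRAM_P = {
    ("that", "car"): 0.1, ("car", "almost"): 0.05,
    ("almost", "hit"): 0.2, ("hit", "me"): 0.3,
    ("almost", "was"): 0.001, ("was", "crash"): 0.0001,
}

def fluency(words, floor=1e-6):
    # Unseen bigrams get a small floor probability (crude smoothing).
    return sum(log(BIGRAM_P.get(b, floor)) for b in zip(words, words[1:]))

print(fluency("that car almost hit me".split()))
print(fluency("that car was almost crash onto me".split()))  # lower score
```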


Faithfulness: P(S|T)

French: ça me plaît [that me pleases]

English:

 that pleases me - most fluent

I like it

I’ll take that one

How to quantify this?

Intuition: degree to which words in one sentence are plausible translations of words in other sentence

 Product of probabilities that each word in target sentence would generate each word in source sentence.
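A sketch of that product in the spirit of IBM Model 1, assuming a one-to-one word alignment is already given (table values are illustrative):

```python
# Faithfulness sketch: P(S|T) as a product of word translation probabilities.

TRANS_P = {("ça", "that"): 0.6, ("me", "me"): 0.8, ("plait", "pleases"): 0.7}

def faithfulness(source, target):
    p = 1.0
    for s, t in zip(source, target):    # assumes aligned word pairs
        p *= TRANS_P.get((s, t), 1e-4)  # small floor for unseen pairs
    return p

print(faithfulness(["ça", "me", "plait"], ["that", "me", "pleases"]))
# 0.6 * 0.8 * 0.7 = 0.336
```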


Faithfulness P(S|T)

Need to know, for every target language word, probability of it mapping to every source language word.

How do we learn these probabilities?

Parallel texts!

 Lots of times we have two texts that are translations of each other

 If we knew which word in Source Text mapped to each word in Target Text, we could just count!
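A sketch of that counting, assuming the word alignments are already given; in practice they must be induced (e.g., with EM), not read off the corpus:

```python
# Estimating P(f|e) by normalized counts over aligned word pairs.

from collections import Counter

aligned_pairs = [("house", "maison"), ("house", "maison"),
                 ("house", "domicile"), ("blue", "bleu")]

counts = Counter(aligned_pairs)                  # count(e, f)
totals = Counter(e for e, _ in aligned_pairs)    # count(e)

def p_f_given_e(f, e):
    return counts[(e, f)] / totals[e]

print(p_f_given_e("maison", "house"))  # 2/3
```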


Faithfulness P(S|T)

 Sentence alignment:

 Figuring out which source language sentence maps to which target language sentence

 Word alignment

 Figuring out which source language word maps to which target language word


Big Point about Faithfulness and Fluency

Job of the faithfulness model P(S|T) is just to model the “bag of words”: which words map from, say, English to Spanish.

P(S|T) doesn’t have to worry about internal facts about Spanish, like word order: that’s the job of P(T)

 P(T) can do Bag generation: put the following words in order (from Kevin Knight)

 have programming a seen never I language better

 actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table


P(T) and bag generation: the answer

“Usually the actual capacity of the table is somewhat less, since the hashing is not perfectly collision-free”

 How about:

 loves Mary John
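A brute-force sketch of bag generation: score every permutation with a stand-in language model and keep the best. This only works for tiny bags; real systems search approximately:

```python
# Bag generation sketch: P(T) alone can order a bag of words.

from itertools import permutations

def lm_score(words):
    # Stand-in language model: count known-good bigrams.
    good = {("John", "loves"), ("loves", "Mary")}
    return sum(1 for b in zip(words, words[1:]) if b in good)

bag = ["loves", "Mary", "John"]
best = max(permutations(bag), key=lm_score)
print(best)  # ('John', 'loves', 'Mary')
```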


Summary

 Intro and a little history

 Language Similarities and Divergences

 Three classic MT Approaches

 Transfer

 Interlingua

 Direct

 Modern Statistical MT

 Evaluation

