Tutorial to Statistical Machine Translation

Tutorial to
Statistical
Machine
Translation
Dr Khalil
Sima’an
Word-Based
Models
Alignment
Symmetrization
Phrase-Based
Models
Limitations of
PB Models
Statistical Machine Translation
Part I: Khalil Sima’an
Data and Models
Universiteit van Amsterdam
Syntax
Part II: Trevor Cohn
Decoding and efficiency
University of Sheffield
Statistical Machine Translation: PART I
Dr. Khalil Sima’an
Statistical Language Processing and Learning
Institute for Logic, Language and Computation
Universiteit van Amsterdam
Some slides use figures from Philipp Koehn, Barry Haddow and Sophie Arnoult
Data and Models: Structure of lecture
General statistical framework
Word-based models: word alignments
Phrase-based models: phrase-alignments
Tree-based models: tree-alignments
Statistical Approach: Parallel Corpora
Task: Translate a source sentence f to a target sentence e.
Data: Parallel corpus (source-target sentence pairs).
[Figure: the United Nations home page in its Arabic and English versions, as an example of parallel text.]
Source-Channel Approach: IBM Models (1990’s)
Parallel Corpus Example
Parallel corpus C = a collection of text-chunks and their
translations.
Parallel corpora are the by-product of human translation.
Every source chunk is paired with a target chunk.
Dutch: De prijs van het huis is gestegen.
English: The price of the house has risen.
Dutch: Het huis kan worden verkocht.
English: The house can be sold.
Dutch: Als de marktprijs daalt zullen sommige gezinnen een zware tijd doormaken.
English: If the market price goes down, some families will go through difficult times.
...
Hansards Canadian Parliament Proc. (English-French).
European Parliament Proc. (23 languages).
United Nations documents.
Newspapers: Chinese-English; Arabic-English; Urdu-English.
TAUS corpora.
Generative Source-Channel Framework
Given source sentence f, select target sentence e:
\arg\max_{e \in E(f)} P(e \mid f) = \arg\max_{e \in E(f)} \overbrace{P(e)}^{\text{L.M.}} \times \overbrace{P(f \mid e)}^{\text{T.M.}}
Set E(f) is the set of hypothesized translations of f.
P(f | e): accounts for divergence in . . .
word order
morphology
syntactic relations
idiomatic ways of expression
...
How to estimate P(e | f)? Sparse-data problem!
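A minimal sketch of the source-channel decision rule, assuming toy hand-made tables: the candidate set and the values in `lm` and `tm` are invented for illustration and come from no corpus.

```python
import math

# Hypothetical toy tables, invented for illustration only.
lm = {"the house": 0.6, "house the": 0.1}        # P(e), language model
tm = {("la maison", "the house"): 0.5,           # P(f | e), translation model
      ("la maison", "house the"): 0.5}

def best_translation(f, candidates):
    """Return the e in candidates maximising P(e) * P(f | e), in log space."""
    def score(e):
        return math.log(lm[e]) + math.log(tm[(f, e)])
    return max(candidates, key=score)

print(best_translation("la maison", ["the house", "house the"]))
```

Here the translation model cannot tell the two candidates apart, so the language model decides the word order.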
Inducing The Structure of Translation Data
e = Mary did not slap the green witch .
????
f = Maria no dio una bofetada a la bruja verde .
The latent structure of translation equivalence
Graphical representations ∆f and ∆e for f and e.
Relation a between ∆f and ∆e
\arg\max_{e \in E(f)} P(e \mid f) = \arg\max_{e \in E(f)} \sum_{\langle \Delta_f, a, \Delta_e \rangle} P(e, \Delta_f, \Delta_e, a \mid f)
The difficult question: Which ∆f/e and a fit data best?
Structure in current models
\Delta_f \xrightarrow{a} \Delta_e
In most current models, the structure is a reordering structure:
∆f/e are structures over word positions.
a is an alignment between groups of word positions in
∆f and ∆e .
Challenge: Number of permutations of n words is n!
Structure shows translation units composing together
What are the atomic translation units?
How do these compose together efficiently?
How to put probabilities on these structures?
Structure helps combat sparsity and complexity
Structure in Existing Models: Sketch
[Figure: sketch of the word-based, phrase-based and tree-based models. In each, a language model on e = e_1 ... e_l is combined with an alignment a (linking f_j to e_{a_j}) and a "translation dictionary" that generates f = f_1 ... f_m.]
Problem: No sufficient statistics to estimate P(e | f) from data
Word-Based Models: Word Alignments
Some History and References
Statistical models with word-alignments:
Brown, Cocke, Della Pietra, Della Pietra, Jelinek, Lafferty, Mercer and Roossin. A statistical approach to machine translation. Computational Linguistics, 1990.
Brown, Della Pietra, Della Pietra and Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 1993.
Och and Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 2003.
Word-Based Models and Word-Alignment
∆f and ∆e are sequences of word positions; a is a mapping between word positions.
e = e_1^l = e_1 \ldots e_l and f = f_1^m = f_1 \ldots f_m
A hidden word-alignment a:
P(f \mid e) = \sum_a P(a, f \mid e)
Assume that a target word-position e_i translates into zero or more source word-positions:
a : \{pos_f\} \to (\{pos_e\} \cup \{0\})
a_i or a(i), i.e., the word position in e with which f_i is aligned (0 stands for NULL).
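One way to picture the alignment function is as a list with one entry per source word. The sketch below uses the Maria/Mary sentence pair from the earlier slide; the alignment values are a hand-made assumption for illustration, and Python lists are 0-based whereas the slides count source positions from 1.

```python
from collections import Counter

# Position 0 of e is the NULL word.
e = ["NULL", "Mary", "did", "not", "slap", "the", "green", "witch"]
f = ["Maria", "no", "dio", "una", "bofetada", "a", "la", "bruja", "verde"]

# a[j] is the position in e aligned with the j-th word of f (0 = NULL).
# Hand-made alignment for illustration; it is not taken from the slides.
a = [1, 3, 4, 4, 4, 0, 5, 7, 6]

def links(e, f, a):
    """Return the (source word, target word) pairs induced by a."""
    return [(f[j], e[a[j]]) for j in range(len(f))]

# Number of source words linked to each target position.
fertility = Counter(a)

print(links(e, f, a))
print(fertility[4])   # number of source words linked to "slap"
```

Because a is a function of the source position, every source word has exactly one link, while a target word such as "did" may receive none.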
Word Alignment Example
Word Alignment Example
(Le reste appartenait aux autochtones |
Word Alignment Example: Not covered in this setting
Word Alignment Matrix Example
Translation model with word alignment
\arg\max_e P(e \mid f) = \arg\max_e P(e) \times P(f \mid e)
P(f \mid e) = \sum_a P(a, f \mid e) = \sum_a P(a \mid e) \times P(f \mid a, e)
Questions
How to parametrize the model?
How are e, f and a composed from basic units?
How to train the model?
How to acquire word alignment?
How to translate with this model?
Decoding and computational issues (for second part)
Word-Alignment As Hidden Structure
[Figure: word-based model sketch. A language model on e = e_1 ... e_l, an alignment a linking each f_j to e_{a_j}, and a "translation dictionary" generating f = f_1 ... f_m.]
We need to decompose
The alignment a and the length m: P(a | e)
“Translation dictionary” P(f | e, a)
Word Alignment Models: General Scheme
Alignment of positions in f with positions in e:
a = a_1^m = a_1 \ldots a_m
Markov process over a:
P(a_1^m, f_1^m \mid e_1^l) = P(m \mid e) \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times P(f_j \mid a_1^{j}, f_1^{j-1}, m, e)
In words: to generate alignment a and foreign sentence f
1. Choose a length m for f
2. Generate alignment a_j given the preceding alignments, words in f, m, and e
3. Generate word f_j conditioned on structure so far and e.
IBM models are obtained by simplifications of this formula.
IBM Model I
P(a_1^m, f_1^m \mid e_1 \ldots e_l) = P(m \mid e) \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times P(f_j \mid a_1^{j}, f_1^{j-1}, m, e)
IBM Model I:
Length: P(m \mid e) \approx P(m \mid l) \approx \epsilon, a fixed probability.
Alignment: align position j with uniform probability with any position a_j in e_1^l or with NULL:
P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \approx (l + 1)^{-1}
Note that a_j can be linked with l positions in e or with NULL.
Lexicon: lexicon parameters \pi_t(f \mid e):
P(f_j \mid a_1^{j}, f_1^{j-1}, m, e) \approx P(f_j \mid e_{a_j}) = \pi_t(f_j \mid e_{a_j})
Parameters: \epsilon and \{\pi_t(f \mid e) \mid \langle f, e \rangle \in C\}.
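As a sketch of how the three Model I simplifications combine, the function below multiplies a fixed length probability (called `eps` here), the uniform alignment term and the lexicon terms. The table `t` and its values are invented for illustration.

```python
def model1_joint(f, e, a, t, eps=1.0):
    """P(a, f | e) under IBM Model I: eps / (l+1)^m * prod_j t[(f_j, e_{a_j})].

    e[0] must be the NULL token; a[j] indexes into e.
    """
    l, m = len(e) - 1, len(f)
    p = eps / (l + 1) ** m          # length term times m uniform alignment choices
    for j in range(m):
        p *= t[(f[j], e[a[j]])]     # lexicon term for the j-th source word
    return p

# Hypothetical lexicon values, for illustration only.
t = {("la", "the"): 0.7, ("maison", "house"): 0.8}
e = ["NULL", "the", "house"]
f = ["la", "maison"]
print(model1_joint(f, e, [1, 2], t))
```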
Sketch IBM Model I
[Figure: sketch of IBM Model I. Given e = e_1 ... e_l (scored by the language model on e): choose length m given l, label every f position with an e position (a_j), and generate each f_j from e_{a_j} with the "translation dictionary".]
IBM Model I Parameters and Data Likelihood
Data Likelihood:
P(f \mid e) = \sum_{a_1^m} P(a_1^m, f_1^m \mid e_1 \ldots e_l) = \frac{\epsilon}{(l + 1)^m} \times \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} \pi_t(f_j \mid e_{a_j})
Parameters: \epsilon and \{\pi_t(f \mid e) \mid \langle f, e \rangle \in C\}.
Fix \epsilon, i.e., in practice put a uniform probability over a range [1..m], for some natural number m.
Dilemma
To estimate these parameters we need word-alignment
To get word-alignment we need these parameters.
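The Model I data likelihood sums over (l+1)^m alignments. A hedged sketch: the first function enumerates them all, the second uses the standard rearrangement of the sum of products into a product of per-word sums (a known identity from Brown et al. 1993, not stated on the slide). The table `t` is invented for illustration.

```python
from itertools import product

def likelihood_bruteforce(f, e, t, eps=1.0):
    """Sum P(a, f | e) over every alignment a; e[0] is NULL."""
    l, m = len(e) - 1, len(f)
    total = 0.0
    for a in product(range(l + 1), repeat=m):
        p = 1.0
        for j in range(m):
            p *= t[(f[j], e[a[j]])]
        total += p
    return eps / (l + 1) ** m * total

def likelihood_factorized(f, e, t, eps=1.0):
    """Same quantity, computed as a product over j of sums over positions i."""
    l, m = len(e) - 1, len(f)
    p = eps / (l + 1) ** m
    for fj in f:
        p *= sum(t[(fj, ei)] for ei in e)
    return p

# Hypothetical lexicon values, for illustration only.
t = {("la", "NULL"): 0.1, ("la", "the"): 0.7, ("la", "house"): 0.1,
     ("maison", "NULL"): 0.1, ("maison", "the"): 0.2, ("maison", "house"): 0.8}
e = ["NULL", "the", "house"]
f = ["la", "maison"]
print(likelihood_bruteforce(f, e, t), likelihood_factorized(f, e, t))
```

The factorized form costs on the order of l times m operations instead of (l+1)^m, which is what makes EM training of Model I practical.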
IBM Model II
Extends IBM Model I at alignment probs:
P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \epsilon \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times \pi_t(f_j \mid e_{a_j})
IBM Model II changes only one element in IBM Model I:
IBM Model I does not take into account the position of words in both strings
P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) = P(a_j \mid j, l, m) := \pi_A(a_j \mid j, l, m)
where \pi_A(. \mid .) are parameters to be learned from data.
IBM Models III, IV and V concentrate on more complex alignments allowing, e.g., 1-to-n (fertility)
IBM Model II Parameters
P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \epsilon \times \prod_{j=1}^{m} \pi_A(a_j \mid j, l, m) \times \pi_t(f_j \mid e_{a_j})
Parameters: \{\pi_A(a_j \mid j, l, m)\} and \{\pi_t(f_j \mid e_{a_j})\}
Dilemma
To estimate these parameters we need word-alignment
To get word-alignment we need these parameters.
Estimating Model Parameters
Maximum-Likelihood Estimation of model M on parallel corpus C
\arg\max_{m \in M} P(C \mid m) = \arg\max_{m \in M} \prod_{\langle e, f \rangle \in C} P_m(e \mid f)
Example IBM Model I:
Model M is defined by model parameters.
Data is incomplete: no closed form solution.
Expectation-Maximization (EM) sketch
Init: Set the parameters at some m_0 and let i = 0
Repeat until convergence (in perplexity)
EM_i(C) = C completed using estimate m_i
EM_i(C) contains m_i-expectations over \langle e, f, a \rangle: P(a \mid f, e)
m_{i+1} = Relative Frequency Estimates from EM_i(C).
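The EM sketch can be made concrete for the IBM Model I lexicon. This is a minimal illustration, assuming a three-sentence toy corpus, uniform initialisation and a fixed number of iterations instead of a perplexity-based stopping test; the function name `train_model1` is hypothetical.

```python
from collections import defaultdict

corpus = [("la maison".split(), "the house".split()),
          ("la maison bleu".split(), "the blue house".split()),
          ("la fleur".split(), "the flower".split())]

def train_model1(corpus, iterations=10):
    """Estimate t[(f_word, e_word)] = pi_t(f | e) with EM."""
    f_vocab = {w for f, _ in corpus for w in f}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)            # Init: uniform parameters
    for _ in range(iterations):
        count = defaultdict(float)              # expected link counts
        total = defaultdict(float)              # expected counts per e word
        for f, e in corpus:
            e_null = ["NULL"] + e
            for fj in f:
                z = sum(t[(fj, ei)] for ei in e_null)
                for ei in e_null:
                    post = t[(fj, ei)] / z      # E-step: P(a_j = i | f, e)
                    count[(fj, ei)] += post
                    total[ei] += post
        # M-step: relative frequency estimates from the expected counts
        t = defaultdict(float, {(fw, ew): c / total[ew]
                                for (fw, ew), c in count.items()})
    return t

t = train_model1(corpus, iterations=20)
for pair in [("maison", "house"), ("la", "house"), ("fleur", "flower")]:
    print(pair, round(t[pair], 3))
```

Although "la" and "maison" co-occur with "house" equally often, "la" is also explained by "the" in the third sentence pair, so the expected counts gradually shift "house" towards "maison".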
EM for Lexicon and Word Alignment Probs
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
Translation Using EM Estimates
Lexicon probability estimates: \{\hat{\pi}_t(f_j \mid e_{a_j})\}
Alignment probabilities: \{\hat{\pi}_A(a_j \mid j, l, m)\}
Translation Model + Language Model + Decoder
\arg\max_e P(e \mid f) = \arg\max_e P(e) \times \sum_a P(a, f \mid e)
[Figure: translation pipeline. Source Language Text → Preprocessing → Global search e* = argmax_e p(e|f), drawing on the language model p(e) and the translation model p(f|e) → Preprocessing → Target Language Text.]
Viterbi Word-Alignment using EM estimates
After EM has stabilized on estimates \{\hat{\pi}_t(f_j \mid e_{a_j})\} and \{\hat{\pi}_A(a_j \mid j, l, m)\}
For every \langle f, e \rangle in C apply the following
\arg\max_{a_1^m} P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \arg\max_{a_1^m} \epsilon \times \prod_{j=1}^{m} \hat{\pi}_A(a_j \mid j, l, m) \times \hat{\pi}_t(f_j \mid e_{a_j})
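Because the product above has one independent factor per source position, the best alignment can be found by choosing each a_j separately. A minimal sketch under that observation: `pi_a` is a hypothetical callable standing in for the alignment estimates, and the lexicon values are invented.

```python
def viterbi_alignment(f, e, t, pi_a=None):
    """Most probable alignment under Model I/II style estimates.

    e[0] is NULL. pi_a(i, j, l, m) gives the probability of linking source
    position j to target position i; None means uniform (Model I).
    """
    l, m = len(e) - 1, len(f)
    a = []
    for j in range(1, m + 1):
        def score(i):
            align = pi_a(i, j, l, m) if pi_a else 1.0
            return align * t.get((f[j - 1], e[i]), 0.0)
        a.append(max(range(l + 1), key=score))   # argmax over target positions
    return a

# Hypothetical lexicon estimates, for illustration only.
t = {("la", "NULL"): 0.1, ("la", "the"): 0.7, ("la", "house"): 0.1,
     ("maison", "NULL"): 0.1, ("maison", "the"): 0.2, ("maison", "house"): 0.8}
print(viterbi_alignment(["la", "maison"], ["NULL", "the", "house"], t))
```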
HMM Alignment Model: General Form
P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \epsilon \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times \pi_t(f_j \mid e_{a_j})
Words do not move independently of each other: condition word movement on previous word movement
P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \approx P(a_j \mid a_{j-1}, m)
IBM Model III (and IV): Example
A hidden word-alignment a: P(f \mid e) = \sum_a P(a, f \mid e)
Estimate alignment + lexicon + reordering + fertility parameters.
Word-based Models (Och & Ney 2003)
Word-Alignment As Hidden Structure:
Sufficient?
[Figure: word-based model sketch. A language model on e = e_1 ... e_l, an alignment a linking each f_j to e_{a_j}, and a "translation dictionary" generating f = f_1 ... f_m.]
We assumed alignment between words and dictionary:
Alignment a and the length m: P(a | e)
Dictionary P(f | e, a)
Limitations of Word-based Models
Limitations of word-based translation:
Many-to-one and many-to-many alignments are common:
“Makes more difficult”/“bemoeilijkt”; “Dat richtte (hen) ten gronde”/“That destroyed (them)”
Reordering takes place (often) by whole blocks.
Reordering individual words increases ambiguity.
“The (big heavy) cow/la vaca (pesada grande)”
Translation works by “fixed expressions” (idiomatic).
Concatenating word-translations increases ambiguity.
Estimates of P(f | e) by word-based models are inaccurate.
Instead of words as basic events: multi-word events in
corpus.
Obtaining Symmetrized Word Alignments
Asymmetric Alignments: Limitations
Word-based models presented so far are based on
asymmetric word alignment.
Each position i in f is aligned with at most one position
in e: ai
What about such word alignments?
Or when a word in f is translated into two or more words in e?
Symmetrization Heuristics
Obtain A_{f \to e} and A_{e \to f}
From Intersection A_{f \to e} \cap A_{e \to f} to Union A_{f \to e} \cup A_{e \to f}
Symmetrization Heuristics Algorithm
Obtain A_{f \to e} and A_{e \to f}
From Intersection A_{f \to e} \cap A_{e \to f} to Union A_{f \to e} \cup A_{e \to f}
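A hedged sketch of moving from the intersection towards the union: start from the links both directions agree on, then repeatedly add neighbouring union links that cover a still-unaligned word. This is a simplified variant in the spirit of the usual growing heuristics, not necessarily the exact algorithm shown on the slide; the example links are invented.

```python
def symmetrize(a_f2e, a_e2f):
    """Return (intersection, union, grown) for two directional alignments.

    Both inputs are sets of (f_position, e_position) links in the same
    orientation.
    """
    inter = a_f2e & a_e2f
    union = a_f2e | a_e2f
    grown = set(inter)
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]
    added = True
    while added:
        added = False
        for j, i in sorted(grown):
            for dj, di in neighbours:
                cand = (j + dj, i + di)
                if cand in union and cand not in grown:
                    f_free = all(jj != cand[0] for jj, _ in grown)
                    e_free = all(ii != cand[1] for _, ii in grown)
                    if f_free or e_free:     # candidate covers an unaligned word
                        grown.add(cand)
                        added = True
    return inter, union, grown

# Invented example: f-to-e links each f word once, e-to-f links each e word once.
a_f2e = {(1, 1), (2, 2), (3, 2), (5, 1)}
a_e2f = {(1, 1), (2, 2), (2, 3)}
inter, union, grown = symmetrize(a_f2e, a_e2f)
print(sorted(inter), sorted(grown), sorted(union))
```

The isolated link (5, 1) stays out of the grown alignment because it does not neighbour any accepted link, which illustrates how such heuristics trade recall against precision.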
Phrase-based Models: Alignment between Phrases
Statistical “Memory-based” Translation
Store arbitrary length source-target translation units
from training parallel corpus.
Translate new input by “covering” it with translation
units replayed from memory.
Idiomatic = Tiling: Phrase-Based SMT
Assume word-alignment a is given in parallel corpus.
Phrase-pair = contiguous source-target \langle n, m \rangle-grams that are translational equivalents under a.
Estimate phrase-pair probabilities \Theta(\bar{f}_i \mid \bar{e}_i)
Translate f by “tiling it with phrases with order
permutation”
PBSMT some references
Alignment-template Approach to Stat. Machine
Translation (RWTH Aachen 1999)
Phrase-based statistical machine translation (Zens,
Och and Ney 2002)
Phrase based SMT (Koehn, Och and Marcu 2003)
Joint Phrase-based SMT (Wang and Marcu 2005)
Statistical Machine Translation. (Ph. Koehn, Cambridge
University Press 2010)
Relation to EBMT 1984; Data-Oriented Translation (2000).
Phrase-Based Models: Conceptual
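\arg\max_e P(e \mid f) = \arg\max_e P(e) \times P(f \mid e)
P(f \mid e) = \sum_{\langle \bar{f}_1^I, \bar{e}_1^I \rangle} P(\bar{f}_1^I, \bar{e}_1^I \mid e)
Segment foreign sentence f into I phrases \bar{f}_1^I, e.g. [The president meets] [Saudi] [economic officials]
\arg\max_e P(f \mid e) \approx \arg\max_{\langle \bar{f}_1^I, \bar{e}_1^I \rangle} \prod_{i=1}^{I} P(\bar{f}_i \mid \bar{e}_i) \times d(start_i - end_{i-1} - 1)
with each factor supplied by the phrase table and by distance-based reordering:
\prod_{i=1}^{I} \overbrace{\Theta(\bar{f}_i \mid \bar{e}_i)}^{\text{ph. table}} \times \overbrace{d(start_i - end_{i-1} - 1)}^{\text{dist. reord.}}
start_i/end_i are positions of first/last words of \bar{f}_i (translating to \bar{e}_i).
d(x) = \alpha^{|x|}, exponentially decaying in words skipped (\alpha \in (0, 1]).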
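To see how the phrase-table and distance-based reordering terms combine, here is a small sketch that scores one segmentation. It assumes end_0 = 0 and takes the absolute value of the jump, and the phrase probabilities (all 0.5) are invented for illustration.

```python
import math

def distortion(x, alpha=0.5):
    """d(x) = alpha ** |x|; using the absolute value is an assumption here."""
    return alpha ** abs(x)

def derivation_logprob(segments, theta, alpha=0.5):
    """Log of prod_i theta(f_i | e_i) * d(start_i - end_{i-1} - 1).

    segments: (f_phrase, e_phrase, start, end) tuples in target order, where
    start/end are 1-based source positions; end_0 is taken to be 0.
    """
    logp, prev_end = 0.0, 0
    for f_ph, e_ph, start, end in segments:
        logp += math.log(theta[(f_ph, e_ph)])
        logp += math.log(distortion(start - prev_end - 1, alpha))
        prev_end = end
    return logp

# Source: Maria no dio una bofetada a la bruja verde (positions 1..9).
segments = [("Maria", "Mary", 1, 1), ("no", "did not", 2, 2),
            ("dio una bofetada a", "slap", 3, 6), ("la", "the", 7, 7),
            ("verde", "green", 9, 9), ("bruja", "witch", 8, 8)]
theta = {(fp, ep): 0.5 for fp, ep, _, _ in segments}   # invented values
print(derivation_logprob(segments, theta))
```

Only the last two phrases are out of order: translating "verde" first skips one word (x = 1) and going back to "bruja" gives x = -2, so the whole derivation pays alpha cubed in reordering cost.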
Phrase-Based Models: Log-linear interpolation
Segment foreign sentence f into I phrases \bar{f}_1^I, e.g. [The president meets] [Saudi] [economic officials]
Log-linear interpolation of factors:
score(e \mid f) = \sum_{f \in F} \lambda_f \times \log H_f(e, f)
Where set F consists of:
Bag of phrases translation = \prod_{i=1}^{I} \Theta(\bar{f}_i \mid \bar{e}_i)
d(): Phrases reordered with reordering model d(.)
lm: Language model (5-grams or even 7-grams).
other: Smoothing + length penalty terms.
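A minimal sketch of the log-linear score, assuming each factor has already been evaluated to a positive number for a given hypothesis. The feature names, values and weights are invented for illustration.

```python
import math

def loglinear_score(features, weights):
    """score(e | f) = sum over factors of lambda * log H(e, f)."""
    return sum(weights[name] * math.log(value)
               for name, value in features.items())

# Invented factor values for two competing hypotheses.
hyp_a = {"phrases": 0.02, "reordering": 0.125, "lm": 0.001}
hyp_b = {"phrases": 0.03, "reordering": 1.0, "lm": 0.00001}
weights = {"phrases": 1.0, "reordering": 0.5, "lm": 2.0}

print(loglinear_score(hyp_a, weights), loglinear_score(hyp_b, weights))
```

With all weights equal to 1 the score reduces to the log of the plain product of the factors; the weights let training (for example minimum error-rate training) rebalance the factors.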
Topics to discuss
Phrase table extraction
Estimating \{\Theta(\bar{f}_i \mid \bar{e}_i)\} and \{\Theta(\bar{e}_i \mid \bar{f}_i)\}
Lexicalized and hierarchical phrase reordering models
Other: phrase, length penalty . . .
Log-linear interpolation and minimum error-rate training
Decoding and optimization
Extracting phrase pairs
A phrase pair \langle \bar{f}, \bar{e} \rangle is consistent with alignment a iff
Non-empty: at least one alignment pair from a is in \langle \bar{f}, \bar{e} \rangle
No foreign positions inside \langle \bar{f}, \bar{e} \rangle aligned to positions outside it
No English positions inside \langle \bar{f}, \bar{e} \rangle aligned to positions outside it
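The three consistency conditions translate almost directly into code. The sketch below checks them for a candidate box of positions and enumerates all consistent phrase pairs up to a length limit; positions are 0-based, and the sentence pair, alignment and `max_len` default are assumptions for illustration.

```python
def consistent(links, f_lo, f_hi, e_lo, e_hi):
    """Check the consistency conditions for the box [f_lo..f_hi] x [e_lo..e_hi]."""
    inside = False
    for j, i in links:
        f_in = f_lo <= j <= f_hi
        e_in = e_lo <= i <= e_hi
        if f_in != e_in:            # a link leaves the box on one side
            return False
        inside = inside or f_in     # here f_in implies e_in
    return inside                   # non-empty: at least one link inside

def extract_phrases(f, e, links, max_len=4):
    """Enumerate all consistent phrase pairs with at most max_len words per side."""
    pairs = []
    for f_lo in range(len(f)):
        for f_hi in range(f_lo, min(f_lo + max_len, len(f))):
            for e_lo in range(len(e)):
                for e_hi in range(e_lo, min(e_lo + max_len, len(e))):
                    if consistent(links, f_lo, f_hi, e_lo, e_hi):
                        pairs.append((" ".join(f[f_lo:f_hi + 1]),
                                      " ".join(e[e_lo:e_hi + 1])))
    return pairs

f = ["la", "maison", "bleu"]
e = ["the", "blue", "house"]
links = {(0, 0), (1, 2), (2, 1)}    # (f position, e position), hand-made
for pair in extract_phrases(f, e, links):
    print(pair)
```

Note that "la maison" yields no phrase pair here: its links reach "the" and "house", and any contiguous English span covering both also contains "blue", which is aligned outside the source span.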
Extracting phrases from permutations
[Figure: phrase pairs extracted from the alignment between the permutation 1 3 4 2 5 and the sequence 1 2 3 4 5.]
Phrase pair weights
Extract phrase pairs from corpus into multiset Tab = \{\langle \bar{f}, \bar{e} \rangle, freq(\bar{f}, \bar{e})\}
Weights for \langle \bar{f}, \bar{e} \rangle:
\Theta(\bar{f} \mid \bar{e}) = \frac{freq(\bar{f}, \bar{e})}{\sum_{\langle \bar{f}', \bar{e} \rangle \in Tab} freq(\bar{f}', \bar{e})}
\Theta(\bar{e} \mid \bar{f}) = \frac{freq(\bar{f}, \bar{e})}{\sum_{\langle \bar{f}, \bar{e}' \rangle \in Tab} freq(\bar{f}, \bar{e}')}
Smoothing with lexical word alignment estimates from
IBM models
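A short sketch of the relative-frequency weights, leaving out the lexical smoothing step. The list of extracted phrase pairs is invented for illustration, and both returned tables are keyed by (f phrase, e phrase).

```python
from collections import Counter

def phrase_weights(extracted):
    """Relative-frequency estimates of Theta(f | e) and Theta(e | f)."""
    freq = Counter(extracted)                 # freq(f, e)
    f_total, e_total = Counter(), Counter()
    for (f_ph, e_ph), c in freq.items():
        f_total[f_ph] += c                    # sum over e' of freq(f, e')
        e_total[e_ph] += c                    # sum over f' of freq(f', e)
    theta_f_given_e = {pair: c / e_total[pair[1]] for pair, c in freq.items()}
    theta_e_given_f = {pair: c / f_total[pair[0]] for pair, c in freq.items()}
    return theta_f_given_e, theta_e_given_f

# Invented multiset of extracted phrase pairs.
extracted = [("maison", "house")] * 3 + [("maison", "home"), ("domicile", "home")]
f_given_e, e_given_f = phrase_weights(extracted)
print(f_given_e[("maison", "home")], e_given_f[("maison", "home")])
```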
Distance-Based Reordering Sketch
Lexicalized Reordering Sketch (Tillmann 2004)
Limited generalization over parallel data (1)
Non-productive Phrase Table: Phrase Variants?
Morphological: e.g., changing inflection, agreement
Al$arekat Alhindiyya = the-companies the-Indian = "the Indian companies"
$areka hindiyya = company Indian = "(an) Indian company"
Syntactic: e.g., adding adjective/preposition/adverbials
"the fish in the deep sea swims" vs. "the fish swims"
Reordering: minor reordering of same words not allowed
In Arabic V-S-O and S-V-O are allowed.
Semantic: e.g., synonyms, paraphrases
Non-productive Phrase Table = Data Sparseness
Limited generalization over parallel data (2)
Reordering
Local, monotone, almost non-lexicalized reordering.
What about long range reordering?
Five phrases need to be reversed: see Chiang 2007 (J. CL).
Reordering target phrases with a coarse “source road
map”?
Limitations: Data-Sparseness
Non-productive phrase table + Local, Uncharted reordering
⇓
Data-sparseness: Shorter phrases will apply down to word
level.
⇓
Shorter phrases combined assuming independence.
⇓
Target phrase selection hard due to large hypotheses lattice
Target Language Model = Only “GLUE” over target phrases.
The Shorter the Phrases, the Greater the Risk
Idiomatic Approach: GOOD, BAD and UGLY
Phrases as atomic units
Good: Less ambiguity in lexical choice and reordering.
Match-Retrieve exactly is largely safe.
Bad: Weak generalization over data.
No phrase variants, weak reordering
Ugly: Fall-back on shorter phrases down to word-to-word
LM as “glue” is insufficient.
Idiomatic approach does not alleviate data-sparseness
How Should We Translate Novel Phrases?
Towards the land of bi-trees
Alignments between Tree Pairs, ITG, Hierarchical
Models and Syntax
Hidden Structure of Translation: Tree Pairs
[Figure: sketch of the hidden structure as a pair of trees. A language model on e = e_1 ... e_l, with alignment a linking f_j to e_{a_j}, over f = f_1 ... f_m.]
Reordering n words
Permutations of n words: n!
[Figure: the permutation 1 3 4 2 5 of the sequence 1 2 3 4 5.]
Surely not all permutations are needed! (Wu 1995)
Use trees and allow permutations on the nodes?
There is an exponential number of trees in n
ITG hypothesis (Wu 1995)
Assume binary trees with two operations: [] and <>
[Figure: binary tree with [] and <> nodes over the permutation 1 3 4 2 5.]
Phrase-based forms of ITG (Chiang 2005; 2007): Hiero
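To make the ITG hypothesis concrete, the sketch below tries to build a binary tree over a permutation using only the two operations, reading [] as monotone order and <> as inverted order. It uses a greedy shift-reduce strategy known from the binarization literature, which is an assumption beyond the slides, and returns None when no such tree exists.

```python
from itertools import permutations

def itg_tree(perm):
    """Build a binary tree over perm with '[]' (monotone) and '<>' (inverted)
    nodes, or return None if the permutation admits no such tree."""
    stack = []                                   # items: (lo, hi, tree)
    for x in perm:
        stack.append((x, x, x))                  # shift
        while len(stack) >= 2:                   # reduce while spans are adjacent
            (lo1, hi1, t1), (lo2, hi2, t2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2:
                merged = (lo1, hi2, ("[]", t1, t2))
            elif hi2 + 1 == lo1:
                merged = (lo2, hi1, ("<>", t1, t2))
            else:
                break
            stack[-2:] = [merged]
    return stack[0][2] if len(stack) == 1 else None

print(itg_tree([1, 3, 4, 2, 5]))
print(itg_tree([2, 4, 1, 3]))
# How many of the 4! permutations of four words admit a tree:
print(sum(itg_tree(list(p)) is not None for p in permutations(range(1, 5))))
```

The permutation 1 3 4 2 5 receives a tree, while 2 4 1 3 does not: no two neighbouring items in it form a contiguous block, so no binary node can be built.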
Syntax-Driven Phrase Translation
Syntax-driven Re-Ordering
[Figure: two panels, Hierarchical (ITG) and Linguistic Syntax, showing trees with XP nodes relating the sequence 1 2 3 4 5 and the permutation 1 3 4 2 5.]
Is translation syntactically cohesive?
Reordering == Moving children in parse tree?
Binary: monotone or inverted order at every node.
Lexical elements can be phrase pairs.
Covers word-alignments in parallel corpora?
Word order difference and syntax: Impression
Phrase-based Hierarchical Model: Hiero
(Chiang 2005; 2007)
Extracting phrase-pairs with gaps (hierarchical trees):
X → X1 dar una bofetada a X2 / X1 slap X2
X → Maria no X la bruja verde / Mary did not X the green witch
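A small sketch of how a phrase pair with a gap is used: the nonterminal is replaced on both sides at once by a smaller phrase pair. The representation (token lists, a `fillers` dictionary) and the function name `apply_rule` are assumptions for illustration, not Hiero's actual data structures.

```python
def apply_rule(rule, fillers):
    """Substitute (source, target) fillers for the linked nonterminals of a rule.

    rule: (source_tokens, target_tokens); nonterminals are tokens such as 'X'.
    fillers: maps a nonterminal to a (source_string, target_string) pair.
    """
    src, tgt = rule
    src_out = [fillers[tok][0] if tok in fillers else tok for tok in src]
    tgt_out = [fillers[tok][1] if tok in fillers else tok for tok in tgt]
    return " ".join(src_out), " ".join(tgt_out)

rule = ("Maria no X la bruja verde".split(),
        "Mary did not X the green witch".split())
print(apply_rule(rule, {"X": ("dio una bofetada a", "slap")}))
```

Filling the gap with the pair "dio una bofetada a" / "slap" reproduces the full sentence pair used earlier in the lecture.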
ITG with syntactic labels
Part II: Trevor Cohn
Decoding algorithms and efficiency