Factored SMT Models

Q.Q
June 3, 2014
Standard phrase-based models
Limitations of phrase-based models:
• No explicit use of linguistic information
• Word = Token
• Words in different forms are treated independently of each other
  (e.g. eat, eating, ate, eaten)
• Unknown words cannot be translated, especially in morphologically rich languages
Integration of linguistic information into the translation model:
• Draw on richer statistics
• Overcome data-sparseness problems
• Direct modeling of linguistic aspects
• Reordering in the translation result
Word = Vector
Each input and output word is represented as a vector of factors:

Input          Output
Word           Word
Lemma          Lemma
POS            POS
Morphology     Morphology
Word class     Word class
...            ...
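As a minimal sketch (the class and field names are illustrative, not part of Moses), such a factored word could be held in code like this:

from dataclasses import dataclass
from typing import Optional

@dataclass
class FactoredWord:
    """One token represented as a vector of factors (illustrative field names)."""
    surface: str                      # surface form, e.g. "häuser"
    lemma: Optional[str] = None       # e.g. "haus"
    pos: Optional[str] = None         # e.g. "NN"
    morphology: Optional[str] = None  # e.g. "plural-nominative-neutral"
    word_class: Optional[str] = None  # automatically induced word class

# Example: the annotated token häuser|haus|NN|plural-nominative-neutral
haeuser = FactoredWord("häuser", "haus", "NN", "plural-nominative-neutral")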
Factored translation model
[Figure: mapping steps connect the input factors (word, lemma, POS, morphology,
word class) to the corresponding output factors]
Decomposition
1. Translate input lemmas into output lemmas
2. Translate morphological and POS factors
3. Generate surface forms given the lemma and the linguistic factors
Example: neue häuser werden gebaut → new houses are built

Factor analysis of "häuser":
• Surface form: häuser
• Lemma: haus
• POS: NN
• Count: plural
• Case: nominative
• Gender: neutral
Input phrase expansion
1. Translate input lemmas into output lemmas:
   haus → house, home, building, shell
2. Translate morphological and POS factors:
   NN|plural-nominative-neutral → NN|plural, NN|singular
3. Generate surface forms given the lemma and the linguistic factors:
   house|NN|plural → houses
   house|NN|singular → house
   home|NN|plural → homes
List of translation options
Annotated input: häuser|haus|NN|plural-nominative-neutral

1. Translate input lemmas into output lemmas:
   { ?|house|?|?, ?|home|?|?, ?|building|?|?, ?|shell|?|? }
2. Translate morphological and POS factors:
   { ?|house|NN|plural, ?|home|NN|plural, ?|building|NN|plural, ?|shell|NN|plural,
     ?|house|NN|singular, ... }
3. Generate surface forms given the lemma and the linguistic factors:
   { houses|house|NN|plural, homes|home|NN|plural, buildings|building|NN|plural,
     shells|shell|NN|plural, house|house|NN|singular, ... }
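As a rough sketch of how the three mapping steps combine into such a list, the toy Python code below chains lemma translation, POS/morphology translation, and surface-form generation for one input word; the tables and the function expand_options are made up for illustration and are not the Moses phrase tables:

# Toy mapping-step tables (illustrative only).
LEMMA_TABLE = {"haus": ["house", "home", "building", "shell"]}
POS_MORPH_TABLE = {("NN", "plural-nominative-neutral"): [("NN", "plural"), ("NN", "singular")]}
GENERATION_TABLE = {
    ("house", "NN", "plural"): "houses",
    ("house", "NN", "singular"): "house",
    ("home", "NN", "plural"): "homes",
    ("building", "NN", "plural"): "buildings",
    ("shell", "NN", "plural"): "shells",
}

def expand_options(lemma, pos, morph):
    """Enumerate output options surface|lemma|POS|morph by chaining the mapping steps."""
    options = []
    for out_lemma in LEMMA_TABLE.get(lemma, []):                          # step 1: lemma
        for out_pos, out_morph in POS_MORPH_TABLE.get((pos, morph), []):  # step 2: POS/morph
            surface = GENERATION_TABLE.get((out_lemma, out_pos, out_morph))  # step 3: generation
            if surface is not None:
                options.append(f"{surface}|{out_lemma}|{out_pos}|{out_morph}")
    return options

# Input word: häuser|haus|NN|plural-nominative-neutral
print(expand_options("haus", "NN", "plural-nominative-neutral"))
# ['houses|house|NN|plural', 'house|house|NN|singular', 'homes|home|NN|plural',
#  'buildings|building|NN|plural', 'shells|shell|NN|plural']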
Synchronous factored models
• Translation steps: applied at the phrase level
• Generation steps: applied at the word level
Training
1. Prepare the training data (run automatic tools over the corpus to add the factor information)
2. Establish the word alignment (symmetrized GIZA++ alignments)
3. Map the steps of the model onto components of the overall model
4. Extract phrase pairs that are consistent with the word alignment (sketched below)
5. Estimate the scoring functions (conditional phrase translation probabilities or
   lexical translation probabilities)
Word alignment
Extract phrase pairs over surface forms:
• natürlich hat john # naturally john has
Extract the same phrase pairs for the other factors:
• ADV V NNP # ADV NNP V
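A simplified Python sketch of extracting phrase pairs consistent with a word alignment, run on the toy sentence pair above; the alignment links and the function name are assumptions for illustration, not the actual Moses extraction code:

def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract all phrase pairs consistent with the word alignment.

    A pair (src[i1:i2+1], tgt[j1:j2+1]) is consistent if every alignment link
    touching the source span stays inside the target span and vice versa,
    and the pair contains at least one link.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(len(src), i1 + max_len)):
            for j1 in range(len(tgt)):
                for j2 in range(j1, min(len(tgt), j1 + max_len)):
                    links = [(i, j) for (i, j) in alignment
                             if i1 <= i <= i2 and j1 <= j <= j2]
                    violated = any((i1 <= i <= i2) != (j1 <= j <= j2)
                                   for (i, j) in alignment)
                    if links and not violated:
                        pairs.append((" ".join(src[i1:i2 + 1]),
                                      " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy aligned sentence pair from the slide (alignment links are assumed).
src = ["natürlich", "hat", "john"]
tgt = ["naturally", "john", "has"]
alignment = [(0, 0), (1, 2), (2, 1)]   # natürlich-naturally, hat-has, john-john
print(extract_phrases(src, tgt, alignment))
# Includes ('natürlich hat john', 'naturally john has'); the same spans can then
# be read off for the other factors, e.g. ADV V NNP # ADV NNP V.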
Training the generation model
Trained on the output side only:
• No word alignment is needed
• Additional monolingual data may be used
Learned on a word-for-word basis, mapping factor(s) to factor(s)
• Example: word → POS and POS → word
  – The/DET big/ADJ tree/NN
  – Count collection:
    count( the, DET )++
    count( big, ADJ )++
    count( tree, NN )++
  – Probability distributions (maximum likelihood estimates):
    p( the | DET ) and p( DET | the )
    p( big | ADJ ) and p( ADJ | big )
    p( tree | NN ) and p( NN | tree )
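The same count collection and maximum-likelihood estimation, sketched in Python on the toy tagged sentence above (variable and function names are illustrative):

from collections import Counter

# Output-side training data: (word, POS) pairs from "The/DET big/ADJ tree/NN".
tagged = [("the", "DET"), ("big", "ADJ"), ("tree", "NN")]

# Count collection.
pair_counts = Counter(tagged)
word_counts = Counter(w for w, _ in tagged)
pos_counts = Counter(p for _, p in tagged)

# Maximum-likelihood estimates in both directions.
def p_word_given_pos(word, pos):
    return pair_counts[(word, pos)] / pos_counts[pos]

def p_pos_given_word(pos, word):
    return pair_counts[(word, pos)] / word_counts[word]

print(p_word_given_pos("the", "DET"))   # p( the | DET ) = 1.0 on this toy data
print(p_pos_given_word("DET", "the"))   # p( DET | the ) = 1.0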
Combination of components
• Language model
• Reordering model
• Translation steps
• Generation steps
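These components are combined as feature functions of a log-linear model, as in standard phrase-based SMT; schematically:

p(\mathbf{e} \mid \mathbf{f}) = \frac{1}{Z} \exp \sum_{i=1}^{n} \lambda_i \, h_i(\mathbf{e}, \mathbf{f})

where each h_i is one component (language model, reordering model, translation steps, generation steps) and the weights \lambda_i are tuned on held-out data.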
Efficient decoding
Mapping steps introduce additional complexity:
• Single phrase table → multiple tables
Pre-computation
Prior to the heuristic beam search:
• The expansions of the mapping steps can be pre-computed and stored as translation options
• All possible translation options are computed before decoding
• No change to the fundamental search algorithm
Beam search
• Start with an empty hypothesis
• Create new hypotheses by applying all applicable translation options
• Generate further hypotheses in the same manner
• Continue until the full input sentence is covered
• Highest-scoring complete hypothesis = best translation according to the model
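A deliberately simplified sketch of this search in Python (monotone decoding over pre-computed translation options, per-stack pruning, no reordering or language model); all names, data, and probabilities are illustrative and this is not the Moses decoder:

import math
from dataclasses import dataclass

@dataclass
class Hypothesis:
    covered: int          # number of input words translated so far (monotone)
    output: tuple         # target words produced so far
    score: float          # accumulated log score

def beam_search(options, beam_size=3):
    """options[i] is a list of (target_word, log_prob) for input position i."""
    stacks = [[] for _ in range(len(options) + 1)]
    stacks[0].append(Hypothesis(0, (), 0.0))           # empty hypothesis
    for n in range(len(options)):
        for hyp in stacks[n]:
            for word, logp in options[n]:               # expand with all applicable options
                stacks[n + 1].append(
                    Hypothesis(n + 1, hyp.output + (word,), hyp.score + logp))
        # prune each stack to the beam size
        stacks[n + 1] = sorted(stacks[n + 1], key=lambda h: -h.score)[:beam_size]
    best = max(stacks[-1], key=lambda h: h.score)       # highest-scoring complete hypothesis
    return " ".join(best.output), best.score

# Toy pre-computed translation options for "neue häuser werden gebaut".
options = [
    [("new", math.log(0.9))],
    [("houses", math.log(0.7)), ("house", math.log(0.2))],
    [("are", math.log(0.8))],
    [("built", math.log(0.9))],
]
print(beam_search(options))   # ('new houses are built', ...)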
Problem
• Too many translation options to handle, caused by the vast increase in
  expansions produced by one or more mapping steps
Current solution
• Early pruning of expansions
• Limit on the number of translation options per input phrase (max. 50)
Experiments and results
Moses system: http://www.statmt.org/moses/
Syntactically enriched output
Setup: input word → output word + POS; trigram LM over words, 7-gram LM over POS
English → German (Europarl, 30 million words, 2006)

Model                    BLEU
best published result    18.15%
baseline (surface)       18.04%
surface + POS            18.15%
surface + POS + morph    18.22%
Morphological analysis and generation
Setup: word, lemma, POS, and morphology factors on both input and output
German → English (News Commentary data, 1 million words, 2007)

Model                        BLEU
baseline (surface)           18.19%
+ POS LM                     19.05%
pure lemma / morph model     14.46%
backoff lemma / morph model  19.47%
Use of automatic word classes
Setup: input word → output word + word class; trigram LM over words, 7-gram LM over word classes
English → Chinese (IWSLT, 39,953 sentences, 2006)

Model                  BLEU
baseline (surface)     19.54%
surface + word class   21.10%
Integrated recasing
Setup: lower-cased input → lower-cased + mixed-cased output
Chinese → English (IWSLT, 39,953 sentences, 2006)

Model                                   BLEU
standard two-pass: SMT + recase         20.65%
integrated factored model (optimized)   21.08%
References
P. Koehn and H. Hoang, "Factored translation models", Proceedings of the 2007 Joint
Conference on Empirical Methods in Natural Language Processing and Computational
Natural Language Learning (EMNLP-CoNLL), pp. 868-876, 2007.
P. Koehn, Statistical Machine Translation, Cambridge University Press, UK,
pp. 127-130, 2010.
P. Porkaew, A. Takhom and T. Supnithi, "Factored Translation Model in English-to-Thai
Translation", Eighth International Symposium on Natural Language Processing, 2009.
S. Li, D. Wong and L. Chao, "Korean-Chinese statistical translation model",
Proceedings of the 2012 International Conference on Machine Learning and Cybernetics,
Xi'an, 2012.