Factored SMT Models
Q.Q
June 3, 2014

Standard phrase-based models

Limitations of phrase-based models:
• No explicit use of linguistic information: word = token.
• Words in different forms (e.g. eat, eating, ate, eaten) are treated independently of each other.
• Unknown words cannot be translated; this is especially problematic in morphologically rich languages.

Integration of linguistic information into the translation model:
• Draw on richer statistics
• Overcome data sparseness problems
• Direct modeling of linguistic aspects
• Better reordering in the translation result

Word = Vector
Both input and output words are represented as vectors of factors: word, lemma, POS, morphology, word class, ...

Factored translation model
Input factors (word, lemma, POS, morphology, word class) are mapped to the corresponding output factors.
Decomposition:
• Translate input lemma to output lemma
• Translate morphological and POS factors
• Generate surface forms given the lemma and linguistic factors

Example: neue häuser werden gebaut → new houses are built
Factored representation of häuser:
  Surface form   häuser
  Lemma          haus
  POS            NN
  Count          plural
  Case           nominative
  Gender         neutral

Input phrase expansion
• Translate input lemma to output lemma:
  haus → house, home, building, shell
• Translate morphological and POS factors:
  NN|plural-nominative-neutral → NN|plural, NN|singular
• Generate surface forms given the lemma and linguistic factors:
  house|NN|plural → houses
  house|NN|singular → house
  home|NN|plural → homes

List of translation options for häuser|haus|NN|plural-nominative-neutral
• After translating the input lemma to the output lemma:
  { ?|house|?|?, ?|home|?|?, ?|building|?|?, ?|shell|?|? }
• After translating the morphological and POS factors:
  { ?|house|NN|plural, ?|home|NN|plural, ?|building|NN|plural, ?|shell|NN|plural, ?|house|NN|singular, ... }
• After generating surface forms given the lemma and linguistic factors:
  { houses|house|NN|plural, homes|home|NN|plural, buildings|building|NN|plural, shells|shell|NN|plural, house|house|NN|singular, ... }

Synchronous factored models
• Translation steps operate on the phrase level.
• Generation steps operate on the word level.

Training
• Prepare the training data: run automatic tools on the corpus to add the factor annotations.
• Establish word alignment (symmetrized GIZA++ alignments).
• The mapping steps form the components of the overall model.
• Extract phrase pairs that are consistent with the word alignment.
• Estimate scoring functions (conditional phrase translation probabilities and lexical translation probabilities).

Word alignment and phrase extraction
• Extract the surface phrase pair: natürlich hat john # naturally john has
• Extract the corresponding phrase pair for the other factors: ADV V NNP # ADV NNP V

Training the generation model
• Trained on the output side only:
  – No word alignment is needed.
  – Additional monolingual data may be used.
• Learned on a word-for-word basis, mapping factor(s) to factor(s).
• Example: word → POS and POS → word (a count-collection sketch in code appears below, after the pre-computation slides)
  – The/DET big/ADJ tree/NN
  – Count collection:
    count( the, DET )++
    count( big, ADJ )++
    count( tree, NN )++
  – Probability distributions (maximum likelihood estimates):
    p( the | DET ) and p( DET | the )
    p( big | ADJ ) and p( ADJ | big )
    p( tree | NN ) and p( NN | tree )

Combination of components
• Language model
• Reordering model
• Translation steps
• Generation steps
All components are combined as feature functions in a log-linear model.

Efficient decoding
• The mapping steps add complexity: a single phrase table becomes multiple tables.

Pre-computation
Prior to the heuristic beam search:
• The expansions of the mapping steps can be pre-computed
• and stored as translation options.
All possible translation options are computed before decoding; a sketch of this expansion follows below.
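As an illustration of how translation options can be pre-computed, the following Python sketch composes the three mapping steps over toy tables and expands the factored input word häuser|haus|NN|plural-nominative-neutral into fully specified output options. The table contents and the function name are illustrative assumptions, not the Moses implementation.

    def expand_translation_options(input_word, lemma_table, morph_table, generation_table):
        # Compose the three mapping steps; each option is (surface, lemma, POS, morphology).
        surface, lemma, pos, morph = input_word
        options = []
        for out_lemma in lemma_table.get(lemma, []):                      # step 1: lemma -> lemma
            for out_pos, out_morph in morph_table.get((pos, morph), []):  # step 2: POS/morphology factors
                for out_surface in generation_table.get((out_lemma, out_pos, out_morph), []):  # step 3: generate surface form
                    options.append((out_surface, out_lemma, out_pos, out_morph))
        return options

    # Toy tables reproducing the häuser example above
    lemma_table = {"haus": ["house", "home", "building", "shell"]}
    morph_table = {("NN", "plural-nominative-neutral"): [("NN", "plural"), ("NN", "singular")]}
    generation_table = {
        ("house", "NN", "plural"): ["houses"],
        ("house", "NN", "singular"): ["house"],
        ("home", "NN", "plural"): ["homes"],
    }

    print(expand_translation_options(
        ("häuser", "haus", "NN", "plural-nominative-neutral"),
        lemma_table, morph_table, generation_table))
    # [('houses', 'house', 'NN', 'plural'), ('house', 'house', 'NN', 'singular'), ('homes', 'home', 'NN', 'plural')]

With real tables this expansion grows combinatorially, which is why the decoder prunes early and limits the number of options per input phrase (see the Problem and Current solution slides below).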
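Returning to the generation-model training described earlier, here is a minimal Python sketch of the count collection and maximum-likelihood estimation for the word → POS and POS → word distributions. The corpus format and the function name are assumptions made for illustration.

    from collections import defaultdict

    def train_generation_distributions(tagged_sentences):
        # Collect (word, POS) counts on the output side only; no word alignment is needed.
        pair_count = defaultdict(int)
        word_count = defaultdict(int)
        pos_count = defaultdict(int)
        for sentence in tagged_sentences:
            for word, pos in sentence:
                pair_count[(word, pos)] += 1
                word_count[word] += 1
                pos_count[pos] += 1
        # Maximum-likelihood estimates: p(word | POS) and p(POS | word)
        p_word_given_pos = {(w, p): c / pos_count[p] for (w, p), c in pair_count.items()}
        p_pos_given_word = {(w, p): c / word_count[w] for (w, p), c in pair_count.items()}
        return p_word_given_pos, p_pos_given_word

    # Toy monolingual corpus in factored form: The/DET big/ADJ tree/NN
    corpus = [[("the", "DET"), ("big", "ADJ"), ("tree", "NN")]]
    p_word_given_pos, p_pos_given_word = train_generation_distributions(corpus)
    print(p_word_given_pos[("tree", "NN")], p_pos_given_word[("tree", "NN")])  # 1.0 1.0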
Beam search
• The fundamental search algorithm is unchanged.
• Start with the empty hypothesis.
• Create new hypotheses by applying all applicable translation options.
• Generate further hypotheses in the same manner.
• Continue until the full input sentence is covered.
• The highest-scoring complete hypothesis is the best translation according to the model.

Problem
• Too many translation options to handle, caused by a vast increase in expansions through one or more mapping steps.
Current solution
• Early pruning of expansions
• Limit on the number of translation options per input phrase (maximum: 50)

Experiments and results
Moses system: http://www.statmt.org/moses/

Syntactically enriched output
Setup: output factors are the surface word (trigram language model) and POS (7-gram language model).
English–German, Europarl, 30 million words, 2006

  Model                    BLEU
  best published result    18.15%
  baseline (surface)       18.04%
  surface + POS            18.15%
  surface + POS + morph    18.22%

Morphological analysis and generation
Setup: input and output factors are word, lemma, POS and morphology.
German–English, News Commentary data, 1 million words, 2007

  Model                        BLEU
  baseline (surface)           18.19%
  + POS LM                     19.05%
  pure lemma/morph model       14.46%
  backoff lemma/morph model    19.47%

Use of automatic word classes
Setup: output factors are the surface word (trigram language model) and an automatic word class (7-gram language model).
English–Chinese, IWSLT, 39,953 sentences, 2006

  Model                   BLEU
  baseline (surface)      19.54%
  surface + word class    21.10%

Integrated recasing
Setup: the input is lower-cased; the output factors are the lower-cased and the mixed-cased word.
Chinese–English, IWSLT, 39,953 sentences, 2006

  Model                                    BLEU
  standard two-pass: SMT + recase          20.65%
  integrated factored model (optimized)    21.08%

References
P. Koehn and H. Hoang, "Factored translation models", Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 868-876, 2007.
P. Koehn, Statistical Machine Translation, Cambridge University Press, UK, pp. 127-130, 2010.
P. Porkaew, A. Takhom and T. Supnithi, "Factored Translation Model in English-to-Thai Translation", Eighth International Symposium on Natural Language Processing, 2009.
S. Li, D. Wong and L. Chao, "Korean-Chinese statistical translation model", Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xi'an, 2012.