Meta-level Statistical Machine Translation System
Sajad Ebrahimi, Kourosh Meshgi, Shahram Khadivi
and Mohammad Ebrahim Shiri Ahmad Abady
Human Language Technology Lab
Amirkabir University of Technology
IJCNLP 2013, Nagoya, Japan
Outline
Introduction
Background
Stacking for classification
Adapting Stacking to SMT
Experiments and Results
Related Work
Conclusion and Future Work
1. Introduction

 Traditional approaches to system combination need multiple, structurally different SMT systems.
 In this research, we focus on a single SMT system.
 We try to introduce a meta-level SMT which can learn how to reduce or correct translation errors.
 To do this, we utilize an ensemble learning algorithm called Stacking.
 The basic idea:
   A collection of base-level SMTs is generated to obtain a meta-level corpus.
   Then a meta-level SMT is trained on this corpus.
 We address the issue of how to adapt Stacking to SMT.
2. Background
2.1 Log-linear model and statistical machine translation
 Given a source string $s$, the goal of SMT is to find the target string $\hat{t}$ among all possible translations:
$$\hat{t} = \operatorname*{argmax}_{t} \Pr(t \mid s)$$
 In meta-SMT, given a machine-translation output $\tilde{t}$, the goal is to find a target sentence $\hat{t}$:
$$\hat{t} = \operatorname*{argmax}_{t'} \Pr(t' \mid \tilde{t})$$
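For reference, the log-linear model named above can be written in the standard Och and Ney (2002) formulation used by phrase-based systems such as Moses, with feature functions $h_m$ and the weights $\lambda_m$ that MERT tunes later in the talk:
$$\hat{t} = \operatorname*{argmax}_{t} \Pr(t \mid s) = \operatorname*{argmax}_{t} \sum_{m=1}^{M} \lambda_m h_m(t, s)$$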
2.2 Stacking for Classification
Overview
 Proposed by Wolpert (1992).
 Learn a meta-level (or level-1) classifier based on the output of base-level (or level-0) classifiers, estimated via cross-validation as follows:
 Define the data set $D = \{(x_i, y_i),\ i = 1, \dots, K\}$, where $x_i$ is a feature vector and $y_i$ its class value.
 J-fold cross-validation: split $D$ into $J$ disjoint, almost equal parts $D_1, \dots, D_J$.
 $L_1, \dots, L_N$: a set of different learning algorithms.

 Define, at each step $j$, the test set $D_j$ and the training set $D \setminus D_j$.
 At the $j$-th step, $j = 1, \dots, J$, given the learning algorithms $L_1, \dots, L_N$, we invoke each of them on $D \setminus D_j$ to induce the classifiers $C_1^{j}, \dots, C_N^{j}$, and apply them to the test part $D_j$.
 The concatenated predictions, plus the original class value, form $MD_j$.
 At the end of the entire cross-validation, the full meta-level data set is $MD = \bigcup_{j=1}^{J} MD_j$.
 $MD$ is given to a learning algorithm $L_M$ to induce the meta-level classifier $C_M$.
 Finally, all the learning algorithms $L_1, \dots, L_N$ are applied to the entire data set $D$, inducing the final base-level classifiers $C_1, \dots, C_N$ to be used at runtime.
 To classify a new instance, the concatenated predictions of all base-level classifiers form a meta-level vector that is assigned a class value by the meta-level classifier $C_M$ (see the sketch below).
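Before moving to SMT, here is a minimal sketch of this procedure, assuming scikit-learn-style estimators; the function names and the five-fold default are illustrative, not part of the original slides.

```python
# A minimal sketch of stacked generalization (Wolpert, 1992) with
# scikit-learn-style estimators; base learners and data are assumptions.
import numpy as np
from sklearn.model_selection import KFold

def stack_train(X, y, base_learners, meta_learner, n_folds=5):
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    # Meta-level data set MD: one column of predictions per base learner.
    MD = np.zeros((len(y), len(base_learners)))
    for train_idx, test_idx in kf.split(X):            # fold j: D_j vs D \ D_j
        for n, L in enumerate(base_learners):          # learning algorithm L_n
            C = L.fit(X[train_idx], y[train_idx])      # induce C_n^j on D \ D_j
            MD[test_idx, n] = C.predict(X[test_idx])   # apply to test part D_j
    meta_learner.fit(MD, y)                            # induce C_M on MD
    # Finally, re-train each base learner on the entire data set D.
    final_bases = [L.fit(X, y) for L in base_learners]
    return final_bases, meta_learner

def stack_predict(final_bases, meta_learner, X_new):
    # Concatenated base-level predictions form the meta-level vector,
    # which is assigned a class value by the meta-level classifier C_M.
    meta_vec = np.column_stack([C.predict(X_new) for C in final_bases])
    return meta_learner.predict(meta_vec)
```

With, for example, a decision tree and naive Bayes as base learners and logistic regression as the meta-learner, `stack_predict` classifies a new instance exactly as described above.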
We adapt Stacking to SMT in a principled way…
3. Adapting Stacking to SMT

We adapt it to SMT as follows:
[Figure: adapting Stacking to SMT. For each fold $j$, the training part $D \setminus D_j$ trains a base-level $SMT_j$ under the chosen SMT paradigm, and decoding the test part $D_j$ yields the meta-level data $MD_n$; the combined meta-level corpus trains the meta-SMT. At runtime, new source sentences are translated by the base-level SMT and then corrected by the meta-SMT to produce target sentences.]
3.1 Training base-level SMTs

 We train five phrase-based SMT systems on the training parts and obtain the outputs of these systems on the corresponding test sets; we need these outputs for the next step.
3.2 Training meta-level SMTs

 We gather the n-best outputs of the base-level SMTs on the corresponding test sets in order to:
   build a meta-level corpus from these outputs paired with the correct human translations;
   then train a meta-SMT on this new corpus.
 We train our meta-SMT on 10 meta-level corpora, progressively created from the n-best outputs of the base-level systems, $n = 1, \dots, 10$.
 We call these systems meta-SMT (1-best), meta-SMT (2-best), and so on; a sketch of the corpus construction follows.
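A minimal sketch of this corpus construction, under the assumption that each fold's n-best file stores a fixed number of hypotheses per sentence, one per line; the file names and format are illustrative, not from the slides.

```python
# Sketch of building the meta-level corpus from base-level n-best outputs.
# File paths and the n-best list layout are assumptions: each n-best file
# holds one hypothesis per line, grouped by source sentence.

def build_meta_corpus(nbest_files, reference_files, n, src_out, tgt_out):
    """Pair the top-n base-level hypotheses of every fold's test set
    with the correct human translation of the source sentence."""
    with open(src_out, "w", encoding="utf-8") as src, \
         open(tgt_out, "w", encoding="utf-8") as tgt:
        for nbest_path, ref_path in zip(nbest_files, reference_files):
            with open(nbest_path, encoding="utf-8") as nb:
                hyps = [line.strip() for line in nb]
            with open(ref_path, encoding="utf-8") as rf:
                refs = [line.strip() for line in rf]
            # Assume a fixed n-best size per sentence for simplicity.
            for i, ref in enumerate(refs):
                for hyp in hyps[i * n:(i + 1) * n]:
                    src.write(hyp + "\n")   # machine output = "source" side
                    tgt.write(ref + "\n")   # human reference = "target" side

# e.g. build_meta_corpus(["fold%d.nbest" % j for j in range(1, 6)],
#                        ["fold%d.ref" % j for j in range(1, 6)],
#                        n=5, src_out="meta.src", tgt_out="meta.tgt")
```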
3.3 Tuning meta-level SMTs

 To build a meta-level development set, we tune the five base-level SMT systems on the tuning part and obtain the outputs of these systems on the corresponding test sets.
 Finally, a meta-level development set is created by pairing these outputs with the correct human translations; it is used to tune the meta-level SMTs.
4. Experiments
4.1 Data

 The corpus used for the training and cross-validation process is the Verbmobil project corpus:

            # of sentences   # of words
English     23K              249K
Persian     23K              216K
4.2 Experimental setup

 GIZA++ => bi-directional word alignment
 SRILM => language model training
 case-insensitive BLEU => translation quality measurement
 Moses decoder => phrase-based SMT (both base-level and meta-level)
 MERT => tuning the feature weights on the development data
4.3 Evaluation
BLEU (%) scores of the baseline SMT and meta-SMTs on the Verbmobil test set, which has 250 sentences with four reference translations:

Type of SMT          Test set
baseline SMT         30.47
meta-SMT (1-best)    31.20
meta-SMT (2-best)    31.00
meta-SMT (3-best)    31.37
meta-SMT (4-best)    31.49
meta-SMT (5-best)    31.41
meta-SMT (6-best)    31.05
meta-SMT (7-best)    31.19
meta-SMT (8-best)    31.40
meta-SMT (9-best)    31.30
meta-SMT (10-best)   31.54
• Some examples:
• Deleting a wrong word:
• EN: that is perfect . then we have talked about everything . goodbye .
• FA (main): آن عالی است . پس ما همه چیز درباره اش صحبت کردیم میبینم . خداحافظ .
• FA (meta): آن عالی است . پس ما همه چیز دیروز صحبت کردیم . خداحافظ .
• Translating an untranslated word:
• EN: I think we will take the Metropol hotel . could you reserve two single rooms ?
• FA (main): من فکر میکنم ما را هتل Metropol . میتوانیم شما دو رزرو rooms مجزا ؟
• FA (meta): من فکر میکنم ما را هتل Metropol . میتوانیم شما دو رزرو اتاقها بیندازم تک ؟
• EN: yes , I would suggest the flight at a quarter past seven .
• FA (main): بله ، من را پیشنهاد میکنم flight یک ربع بعد از ساعت هفت .
• FA (meta): بله ، من را پیشنهاد میکنم پرواز یک ربع بعد از ساعت هفت .
• Rephrasing and reordering:
• EN: the best thing would be for us to take the subway from our hotel to the station .
• FA (main): بهترین چیز برای ما خواهد بود تا را از مترو هتل ما تا ایستگاه .
• FA (meta): بهترین چیز برای ما خواهد بود تا از هتل ما تا ایستگاه مترو .


 Two factors possibly contribute to these results:
   performing cross-validation on the training set;
   re-optimizing the system with the new development set.
 We perform two experiments to investigate the effect of each factor:
   Straight1 => test the approach without any cross-validation process, but with the development set obtained from stacking.
   Straight2 => build meta-level SMTs tuned with a development set obtained directly from the baseline SMT (i.e., without performing cross-validation on it).
[Chart: comparison of Stacking, Straight1 and Straight2 — BLEU (%) against the n-best list size (baseline, 1-10).]
 After analyzing the results, it can be concluded that both factors, i.e., cross-validation and re-optimizing the system with the stacking-based development set, are important for outperforming the baseline SMT system, since using both factors consistently leads to the best results.
 We conducted statistical significance tests, using the paired bootstrap resampling method proposed by Koehn (2004), to measure the reliability of the conclusion that the meta-SMTs are really better than the baseline SMT. We observed that all stacking-based meta-SMTs are better than the baseline SMT at the 99% confidence level.
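A minimal sketch of this significance test, assuming tokenized hypotheses and using NLTK's corpus_bleu as a stand-in for the BLEU implementation used in the experiments:

```python
# Sketch of paired bootstrap resampling (Koehn, 2004): system A is judged
# better than system B if it wins on (almost) all resampled test sets.
# NLTK's corpus_bleu is a stand-in for the paper's BLEU scorer (assumption).
import random
from nltk.translate.bleu_score import corpus_bleu

def paired_bootstrap(refs, hyps_a, hyps_b, n_samples=1000, seed=0):
    """refs[i]: list of reference token-lists for sentence i;
    hyps_a[i] / hyps_b[i]: hypothesis token-list of each system."""
    rng = random.Random(seed)
    n = len(refs)
    wins = 0
    for _ in range(n_samples):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        r = [refs[i] for i in sample]
        a = [hyps_a[i] for i in sample]
        b = [hyps_b[i] for i in sample]
        if corpus_bleu(r, a) > corpus_bleu(r, b):
            wins += 1
    # Fraction of resampled test sets on which A beats B; >= 0.99 means
    # A is better than B at the 99% confidence level.
    return wins / n_samples
```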
5. Related Work
 Xiao et al. (2010) presented a general solution for the adaptation of bagging and boosting to SMT. Their results showed that ensemble learning algorithms are promising in SMT.
 Simard et al. (2007a) trained a "mono-lingual" phrase-based SMT system on the output of an RBMT system for the source side of the training set, with the corresponding human-translated (manually post-edited) references as the target side.
 Béchara et al. (2011) designed a full phrase-based SMT pipeline that included a translation step and a post-editing step, using a novel context-aware approach.
6. Conclusion and future work
 We have presented a simple and effective approach to translation error correction by building a meta-level SMT, using a meta-level corpus created from the original corpus by cross-validation.
 Experimental results showed that such a meta-SMT can fix many translation errors that occur in the baseline translations.
 As future work, we plan to develop a technique for combining multiple SMT systems using the stacked generalization algorithm.
 Moreover, we are running more tests with different language pairs and larger corpora.
 As another line of future work, we will apply our framework to different SMT paradigms such as hierarchical phrase-based SMT and syntax-based SMT.
7. References
1. Almut Silja Hildebrand and Stephan Vogel. 2008. Combination of machine translation systems via hypothesis selection from combined n-best lists. In Proc. of the 8th AMTA Conference, pages 254-261.
2. Evgeny Matusov, Nicola Ueffing and Hermann Ney. 2006. Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment. In Proc. of EACL 2006, pages 33-40.
3. Antti-Veikko Rosti, Spyros Matsoukas and Richard Schwartz. 2007. Improved word-level system combination for machine translation. In Proc. of the 45th Annual Meeting of the Association for Computational Linguistics, pages 312-319.
4. Michel Simard, Cyril Goutte, and Pierre Isabelle. 2007a. Statistical phrase-based post-editing. In Proc. of NAACL-HLT 2007, pages 508-515.
5. Hanna Béchara, Yanjun Ma, and Josef van Genabith. 2011. Post-editing for a statistical MT system. In Proc. of MT Summit XIII, pages 308-315.
6. David H. Wolpert. 1992. Stacked generalization. Neural Networks, 5(2):241-259.
7. Leo Breiman. 1996b. Bagging predictors. Machine Learning, 24(2):123-140.
THANK YOU