Tuning SMT

June 3, 2014

Overview
•  Brief recap of SMT
•  A new approach
•  Tuning based on MERT
•  Tuning based on PRO
•  Tuning based on MIRA

SMT and the generative/noisy channel model
•  We want to translate from f (source) to e (target)
•  Noisy channel model
   •  p(f | e) is the translation model
   •  p(e) is the language model
   •  Search for the best target: e* = argmax_e p(f | e) * p(e)
•  Assume independence assumptions
•  Use MLE to define the parameters for the models
•  Enhanced version where p(f | e) is decomposed into:
   •  Phrase-based model
   •  Distortion model

SMT and discriminative model
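As a recap, the noisy-channel search e* = argmax_e p(f | e) * p(e) can be sketched as below; the candidate list and all probability values are made up for illustration and do not come from real models:

```python
import math

def noisy_channel_decode(candidates, translation_model, language_model):
    """Return the candidate e maximizing p(f | e) * p(e).

    Probabilities are combined in log space to avoid underflow.
    """
    def log_score(e):
        return math.log(translation_model[e]) + math.log(language_model[e])
    return max(candidates, key=log_score)

# Hypothetical candidate translations for some source sentence f.
candidates = ["the house", "house the"]
translation_model = {"the house": 0.4, "house the": 0.4}   # p(f | e)
language_model = {"the house": 0.3, "house the": 0.01}     # p(e)

# The language model breaks the tie: fluent word order wins.
best = noisy_channel_decode(candidates, translation_model, language_model)
```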
Example of features
Log-linear model: an example
•  We have identified a set of "basic features" for each (f, e)
   •  Language model p(e) is feature h1 for (e, f)
   •  Phrase model p(pm) is feature h2 for (e, f)
   •  Distortion model p(d) is feature h3 for (e, f)
•  We assume that we have defined the weights
•  P(e_i | f) gets a score and a probability
   •  score(e_i) = h1_i*w1 + h2_i*w2 + h3_i*w3
   •  p(e_i | f) = (1/Z) * exp(h1_i*w1 + h2_i*w2 + h3_i*w3)
•  e* = argmax_e p(e | f)

Log-linear model: questions?
•  What is the meaning of the scores?
•  How do we define good features?
   •  Feature selection or feature engineering process
•  How do we define good weights (or parameters)?
   •  Parameter tuning / supervised machine learning

Parameter tuning: an overview
•  Optimize the weights in the log-linear model
•  Assume that we have defined features
   •  Examples of features are the phrase translation model, language model, reordering model, backward phrase translation probability, etc.
•  Metrics to evaluate translation quality automatically
•  Tuning set
•  Online or batch
   •  Online optimizes the weights after processing each sentence
   •  Batch optimizes the weights after processing the whole data set
•  Algorithms/methods to perform tuning
   •  Minimum Error Rate Training (MERT)
   •  Pairwise Ranking Optimization (PRO)
   •  Margin Infused Relaxed Algorithm (MIRA)

Automatic evaluation criteria of translation quality

Tuning Set
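The log-linear scoring and normalization described earlier can be sketched as follows; the three features h1-h3, their values, and the weights are assumed purely for illustration:

```python
import math

def loglinear_probs(nbest_features, weights):
    """nbest_features: one feature vector per hypothesis e_i in the n-best list.
    Returns p(e_i | f) = exp(w . h(e_i, f)) / Z, normalized over the list."""
    scores = [sum(w * h for w, h in zip(weights, feats))
              for feats in nbest_features]
    z = sum(math.exp(s) for s in scores)        # partition function Z
    return [math.exp(s) / z for s in scores]

weights = [0.5, 0.3, 0.2]                      # w1, w2, w3 (assumed values)
nbest = [[-2.0, -1.5, -0.5],                   # h1, h2, h3 for e_1 (log-probs)
         [-2.5, -1.0, -1.0]]                   # h1, h2, h3 for e_2
probs = loglinear_probs(nbest, weights)
best = max(range(len(probs)), key=probs.__getitem__)   # e* = argmax_e p(e | f)
```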
•  Limited number of sentences (1000-2000)
•  Each sentence has been translated into a corresponding n-best list

Minimum Error Rate Training (MERT)

MERT overview in Moses
[Diagram: the tuning set and the initial weights feed the decoder, which uses the models to produce an n-best list; a scorer/optimizer (the inner loop) computes new weights; the outer loop re-runs the decoder with those weights until it converges on the optimal weights]

MERT summary
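The MERT outer/inner loop can be sketched schematically as below; `decode` and `optimize` are hypothetical stand-ins for the decoder (returning an n-best list) and the metric optimizer (returning weights that maximize the automatic metric on the fixed n-best lists), and the toy stand-ins at the bottom exist only to make the sketch runnable:

```python
def mert(tuning_set, decode, optimize, initial_weights, max_iter=10):
    """tuning_set: list of (source_sentence, reference) pairs."""
    weights = initial_weights
    for _ in range(max_iter):                             # outer loop
        nbest_lists = [decode(src, weights) for src, _ in tuning_set]
        new_weights = optimize(nbest_lists,               # inner loop: optimize
                               [ref for _, ref in tuning_set])  # on fixed n-best
        if new_weights == weights:                        # n-best lists stable
            break
        weights = new_weights
    return weights

# Toy stand-ins, purely illustrative: a decoder that returns one hypothesis
# and an optimizer that always proposes the same weight vector.
def toy_decode(src, weights):
    return [((1.0,), src + " translated")]

def toy_optimize(nbest_lists, references):
    return (0.7,)

tuned = mert([("f1", "e1")], toy_decode, toy_optimize, (0.5,))
```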
•  Relatively simple and very established
•  Moses/MERT supports different automatic evaluation metrics
   •  BLEU
   •  Translation Edit Rate (TER)
   •  Position-independent Error Rate (PER)
   •  Cover Disjoint Error Rate (CDER)
•  MERT only optimizes based on the best hypothesis
•  Does not scale well (searches all directions)
•  Can support up to 15-30 features (Moses)
•  Batch optimization

Pairwise Ranking Optimization (PRO)
•  The purpose of PRO is to support scalability
•  Optimization by ranking pairs of translations
•  Feature space based on the difference of features for each translation pair
•  Linear binary classification
•  Define a gold scoring function to rank translations
   •  BLEU+1 is used to score each sentence
•  Only impacts the optimization function (the "inner loop")
•  Batch optimization

PRO definition (1)
•  Define the relation:
   •  g(e1) > g(e2) ⇔ h(e1) > h(e2)
   •  g(e) is the gold evaluation metric
   •  h(e) is the model score w^T·x(e, f)
      •  where w is the weight vector
      •  and x(e, f) is the feature vector
•  h(e1) - h(e2) > 0 ⇔ w^T·x(e1, f) - w^T·x(e2, f) > 0
   ⇔ w^T·(x(e1, f) - x(e2, f)) > 0

PRO definition (2)
•  Define a binary classifier using a new feature space
   •  w^T·x_d > 0 is a binary classifier
   •  the difference vector x_d = x(e_i, f) - x(e_k, f) is labeled "+" when g(e_i) > g(e_k)
   •  the difference vector x_d = x(e_l, f) - x(e_t, f) is labeled "-" when g(e_l) < g(e_t)
•  A sampling function reduces the number of difference vectors (otherwise there is too much training data)

Summary of PRO process
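The construction of PRO's labeled difference vectors can be sketched as follows; the gold scores and feature values are made up, and the sampler is a simplified stand-in for PRO's actual sampling scheme. A linear binary classifier trained on these (x_d, label) pairs then yields the weight vector w:

```python
import random

def pro_samples(hyps, n_samples, rng):
    """hyps: list of (gold_score, feature_vector) for one sentence's n-best.
    Returns labeled difference vectors [(x(e1) - x(e2), +1 or -1), ...]."""
    data = []
    for _ in range(n_samples):
        (g1, x1), (g2, x2) = rng.sample(hyps, 2)   # draw a hypothesis pair
        if g1 == g2:
            continue                               # no gold preference: skip
        diff = tuple(a - b for a, b in zip(x1, x2))
        data.append((diff, 1 if g1 > g2 else -1))  # "+" if e1 ranked higher
    return data

rng = random.Random(0)
# Two hypotheses: (gold score such as BLEU+1, feature vector).
hyps = [(0.9, (1.0, 0.2)), (0.4, (0.5, 0.8))]
samples = pro_samples(hyps, 5, rng)
```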
1.  Generate the n-best translation hypotheses for each sentence
2.  Calculate the gold scores for each translation
3.  Sample the score differences to define training data
4.  Feed the training data to a linear classifier
5.  The weights generated by the classifier are the optimal weights
6.  Go back to the decoder with the new weights
7.  Go to 1

Margin Infused Relaxed Algorithm (MIRA)
MIRA Summary
•  Tuning a translation system with a very large number of features
•  Basic MIRA is online
•  There is a version of MIRA that is batch-based
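A minimal sketch of one online MIRA-style (passive-aggressive) weight update: after decoding a sentence, move the weights just enough that the metric-preferred ("oracle") hypothesis outscores the model's current best by the loss, while changing the weights as little as possible. The feature vectors and loss value below are illustrative:

```python
def mira_update(weights, oracle_feats, best_feats, loss, C=1.0):
    """One update: w += tau * (h(oracle) - h(best)), tau clipped by C."""
    diff = [o - b for o, b in zip(oracle_feats, best_feats)]
    margin = sum(w * d for w, d in zip(weights, diff))   # current score gap
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0:
        return list(weights)                             # identical features
    # Step size: enforce margin >= loss, capped by aggressiveness C.
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    return [w + tau * d for w, d in zip(weights, diff)]

# Illustrative values: the oracle and model-best differ in two features.
w = mira_update([0.0, 0.0], oracle_feats=(1.0, 0.0),
                best_feats=(0.0, 1.0), loss=0.5)
```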