Tuning SMT
June 3, 2014

Overview
• Brief recap of SMT
• A new approach
• Tuning based on MERT
• Tuning based on PRO
• Tuning based on MIRA

SMT and the generative/noisy channel model
• We want to translate from f (source) to e (target)
• Noisy channel model
  • p(f | e) is the translation model
  • p(e) is the language model
• Search for the best target: e* = argmax_e p(f | e) * p(e)
• Assume independence assumptions and use MLE to define the parameters of the models
• Enhanced version where p(f | e) is decomposed into:
  • Phrase-based model
  • Distortion model

SMT and discriminative model

Example of features

Log-linear model, an example
• We have identified a set of "basic features" for each (f, e)
  • The language model p(e) is feature h1 for (e, f)
  • The phrase model p(pm) is feature h2 for (e, f)
  • The distortion model p(d) is feature h3 for (e, f)
• We assume that we have defined the weights
• P(e_i | f) gets a score and a probability
  • score = h1_i*w1 + h2_i*w2 + h3_i*w3
  • p(e_i | f) = (1/Z) * exp(h1_i*w1 + h2_i*w2 + h3_i*w3)
• e* = argmax_e p(e | f)

Log-linear model, questions?
• What is the meaning of the scores?
• How do we define good features?
  • Feature selection or feature engineering process
• How do we define good weights (or parameters)?
  • Parameter tuning / supervised machine learning

Parameter Tuning, an overview
• Optimize the weights in the log-linear model
• Assume that we have defined features
• Examples of features are the phrase translation model, language model, reordering model, backward phrase translation probability, etc.
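The log-linear scoring above can be sketched in a few lines of Python. The feature values and weights below are made-up numbers purely for illustration; in practice the weights are exactly what tuning has to find:

```python
import math

# Hypothetical feature values h(e, f) for three candidate translations,
# ordered [language model, phrase model, distortion], as log-probabilities.
hypotheses = {
    "e1": [-2.1, -1.3, -0.5],
    "e2": [-2.5, -0.9, -0.7],
    "e3": [-3.0, -1.1, -0.2],
}
weights = [0.5, 0.3, 0.2]  # assumed weights w1..w3 (these are what tuning sets)

def score(feats, w):
    # linear model score: h1*w1 + h2*w2 + h3*w3
    return sum(h_i * w_i for h_i, w_i in zip(feats, w))

scores = {e: score(h, weights) for e, h in hypotheses.items()}
Z = sum(math.exp(s) for s in scores.values())            # partition function
probs = {e: math.exp(s) / Z for e, s in scores.items()}  # p(e | f)
best = max(scores, key=scores.get)                       # e* = argmax_e p(e | f)
```

Note that the argmax only depends on the linear scores, not on the normalization by Z; tuning (MERT, PRO, MIRA below) is the search for the `weights` vector that makes this argmax agree with an automatic evaluation metric.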
• Metrics to evaluate translation quality automatically
• Tuning set
• Online or batch
  • Online optimizes the weights after processing each sentence
  • Batch optimizes the weights after processing the whole data set
• Algorithms/methods to perform tuning
  • Minimum Error Rate Training (MERT)
  • Pairwise Ranking Optimization (PRO)
  • Margin Infused Relaxed Algorithm (MIRA)

Automatic evaluation criteria of translation quality

Tuning Set
• Limited number of sentences (1000-2000)
• Each sentence has been translated into a corresponding n-best list

Minimum Error Rate Training (MERT)

MERT overview in Moses
[Diagram: models + tuning set → Decoder → n-best list → Scorer/Optimizer (inner loop); the outer loop starts from initial weights and iterates until it yields the optimal weights]

MERT summary
• Relatively simple and very established
• Moses/MERT supports different automatic evaluation metrics:
  • BLEU
  • Translation Edit Rate (TER)
  • Position-independent Error Rate (PER)
  • Cover Disjoint Error Rate (CDER)
• MERT only optimizes based on the best hypothesis
• Does not scale well (searches all directions)
• Can support up to 15-30 features (Moses)
• Batch optimization

Pairwise Ranking Optimization (PRO)
• The purpose of PRO is to support scalability
• Optimization by ranking pairs of translations
• Feature space based on the difference of features for each translation pair
• Linear binary classification
• Define a gold scoring function to rank translations
  • BLEU+1 is used to score each sentence
• Only impacts the optimization function ("inner loop")
• Batch optimization

PRO definition (1)
• Define the relation:
  • g(e1) > g(e2) ⇔ h(e1) > h(e2)
  • g(e) is the gold evaluation metric
  • h(e) is the model score w^T · x(e, f)
    • where w is the weight vector
    • x(e, f) is the feature vector
• h(e1) − h(e2) > 0 ⇔ w^T · x(e1, f) − w^T · x(e2, f) > 0
  ⇔ w^T · (x(e1, f) − x(e2, f)) > 0

PRO definition (2)
• Define a binary classifier using a new feature space
• w^T · x_d > 0 is a binary classifier
• The difference vector x(e_i, f) − x(e_k, f) corresponds to "+" (e_i is the better translation)
• The difference vector x(e_l, f) − x(e_t, f) corresponds to "−" (e_t is the better translation)
• A sampling function reduces the number of
difference vectors; otherwise there is too much training data

Summary of the PRO process
1. Generate the n translation hypotheses for each sentence
2. Calculate the gold scores for each translation
3. Sample the score differences to define training data
4. Feed the training data to a linear classifier
5. The weights generated by the classifier are the optimal weights
6. Go back to the decoder with the new weights
7. Go to 1

Margin Infused Relaxed Algorithm (MIRA)

MIRA Summary
• Tunes the translation system with a very large number of features
• Basic MIRA is online
• There is a version of MIRA that is batch based
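As a rough illustration of the online setting, here is a minimal sketch of a MIRA-style (passive-aggressive) weight update for one sentence, comparing an oracle hypothesis against the model's current best. The function name, the clipping constant `C`, and all numeric feature/score values are assumptions for illustration, not the Moses implementation:

```python
def mira_update(w, oracle_feats, pred_feats, oracle_score, pred_score, C=0.01):
    """One clipped passive-aggressive update: move w toward the oracle
    hypothesis just enough to separate it from the current best, capped by C."""
    delta = [o - p for o, p in zip(oracle_feats, pred_feats)]
    margin = sum(w_i * d for w_i, d in zip(w, delta))  # h(oracle) - h(prediction)
    loss = (oracle_score - pred_score) - margin        # metric gap minus model margin
    norm_sq = sum(d * d for d in delta)
    if loss <= 0 or norm_sq == 0:
        return w                                       # already separated: no update
    alpha = min(C, loss / norm_sq)                     # clipped step size
    return [w_i + alpha * d for w_i, d in zip(w, delta)]

# Made-up feature vectors and BLEU+1-style gold scores for one sentence:
w = [0.5, 0.3, 0.2]
w_new = mira_update(w,
                    oracle_feats=[-2.0, -1.0, -0.4],
                    pred_feats=[-2.4, -0.8, -0.9],
                    oracle_score=0.6, pred_score=0.3)
```

Each per-sentence update is cheap, which is what lets MIRA handle feature sets far larger than the 15-30 features MERT can cope with.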