Generation in the Context of MT: Final Report (8/22/2002)

The Team
– Senior members & affiliate members:
  • Jan Hajič, Charles Univ., Prague
  • Dragomir Radev, Univ. of Michigan
  • Gerald Penn, Univ. of Toronto
  • Jason Eisner, Johns Hopkins Univ.
  • Owen Rambow, Univ. of Pennsylvania
  • Dan Gildea, Univ. of Pennsylvania
  • Bonnie Dorr, Univ. of Maryland
– Students:
  • Yuan Ding, Univ. of Pennsylvania
  • Terry Koo, MIT
  • Jan Cuřín, Charles University
  • Martin Čmejrek, Charles Univ., Prague
  • Kristen Parton, Stanford Univ.
  • Ivona Kučerová, Charles University
– Pre-workshop work (Charles University):
  • Zdeněk Žabokrtský, Václav Honetschläger, Vladislav Kuboň, Petr Pajas, Alena Böhmová, Jiří Havelka

The Goal
• Generate English (linear surface form)
  – from a syntactic-semantic sentence representation (so-called "tectogrammatical" representation, or TR)
• Possible application setting:
  – machine translation
  – other uses: front end for QA systems, summarization
• Evaluate under various circumstances

Tectogrammatical Representation
• Example sentence: "According to his opinion UAL's executives were misinformed about the financing of the original transaction."
• The same sentence at the TR level, where nodes carry lemmas rather than inflected forms: "According to he opinion UAL's executive were misinform about the financing of the original transaction."

TR in Machine Translation
• [Tree-pair diagram omitted: the English TR above aligned with its Czech counterpart; a NULL node marks material left unexpressed on one side.]
• Czech source (the Czech counterpart of the example above): "Vedení UAL bylo podle jeho názoru o financování původní transakce nesprávně informováno."

The MT Framework
• [Pipeline diagram omitted.] The source (Czech) text is analyzed bottom-up: morphology/tagging (lemmatized, POS), then "surface" (analytical) syntax, then deep syntax (tectogrammatics, TR). The WS'02 transfer step maps Czech TR trees to English TR trees. Generation then descends on the English side: TR to AR trees, word order, punctuation, and morphology (generation), yielding the target-language text.

Tools and Data Resources
• Tools:
  – WS98 Czech parser + other Czech tools (tagger)
  – GIZA (WS99) + ISI decoder
• Data:
  – PTB (40k sentences)
  – PTB translation to Czech (11k sentences)
  – Prague Dependency Treebank 1.0 (90k sentences)
  – Prague Dependency Treebank 2.0 preliminary: 15k sentences manually annotated
  – Monolingual data

The Evaluation Metric: BLEU
• Plain English output (MT, generation) is difficult and/or expensive to evaluate subjectively
• BLEU (IBM):
  – automatic method, score 0..1
  – relative scores track subjective human evaluation
  – needs several reference "gold standards"
  – n-gram-based metric with a small-length penalty
• Different "local" evaluations are used throughout, too

Presentation Outline
• The Systems and Their Inputs
  – Getting the data & tools ready
• The Statistical Generation System
  – The channel model
  – Word order, punctuation, morphology
• The Hybrid Approach
• Evaluation Results
• Student Project Proposals
• Conclusions and Future Directions

The Systems and Their Inputs (Martin Čmejrek)

WS02GMT
• System 1: statistical; System 2: hybrid
• Output: English linear surface form
• Input 1: automatically created English TR
• Input 2: manually created English TR
• Input 3: improved automatic English TR (PropBank)
• Input 4: Czenglish TR (simple translation)

Input 1: Automatic English TR
• Penn Treebank v. 3
  + heads (Jason Eisner's code + modifications)
  + lemmatization + word IDs
  + rule-based transformation to English AR, TR (by Kučerová & Žabokrtský)
• English TR (I1), size: 40k sentences
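The head-assignment step above is essentially a head-percolation pass over PTB constituents. Below is a minimal sketch of the idea, assuming a toy tree encoding; the actual pipeline used Jason Eisner's head rules plus hand-written transformations, and the rule table here is an illustrative fragment, not the real one.

```python
# Sketch: convert a PTB-style constituency tree to unlabeled dependencies
# by percolating heads. HEAD_RULES is an illustrative fragment (assumption);
# the workshop pipeline used a full head-rule set.

HEAD_RULES = {
    "S":  ["VP", "S"],                       # search children in this order
    "VP": ["VBD", "VBZ", "VBP", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN", "TO"],
}

def head_index(label, children):
    """Index of the head child, by priority list; default to the last child."""
    for target in HEAD_RULES.get(label, []):
        for i, (child_label, _) in enumerate(children):
            if child_label == target:
                return i
    return len(children) - 1

def to_dependencies(tree, deps):
    """tree = (label, children) for phrases, (POS, word) for leaves.
    Appends (head_word, dependent_word) arcs to deps; returns the head word."""
    label, rest = tree
    if isinstance(rest, str):                # leaf: (POS, word)
        return rest
    heads = [to_dependencies(child, deps) for child in rest]
    h = head_index(label, rest)
    for i, w in enumerate(heads):
        if i != h:
            deps.append((heads[h], w))       # non-head children depend on head
    return heads[h]

deps = []
tree = ("S", [("NP", [("NNP", "John")]),
              ("VP", [("VBZ", "loads"), ("NP", [("NN", "hay")])])])
root = to_dependencies(tree, deps)
print(root, deps)    # loads [('loads', 'hay'), ('loads', 'John')]
```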
Input 2: Manual English TR
• Penn Treebank v. 3
• Input 1 + manual annotation (correction) by IK, including deep word order and conversion of grammatical codes
• English TR (I2), size: 1.5k sentences

Input 3: Enhanced Automatic English TR
• Penn Treebank v. 3
• Input 1 + PropBank + additional sources
• English TR (I3), size: 40k sentences

Input 4: Automatic Czenglish TR
• Linear surface Czech
  + Czech tagging & lemmatization
  + parsed to Czech AR, Czech TR
  + [simple] transfer (lemma translation):
    – lexical-replacement dictionary collected from the web and MRDs
    – trained on TR lemmas by GIZA
• "Czenglish" TR (I4): 11k sentences

Dictionary Filtering
• 4 Czech/English dictionary sources (WinGED, GNU/FDL, PCTrans, EuroWordNet)
• Merging and pruning, using Czech POS, English POS, the Czech/English parallel Penn Treebank corpus (GIZA++ training), and frequencies from an English monolingual corpus (North American News Text, 365M words)
• Result: a Czech/English dictionary for transfer

Word-by-Word Translation of TR Lemmas
• Word-by-word dictionary: 42,835 entries, 65,408 translations
• Format:
    <e>tečka<t>N
    <tr>spot<trt>N<prob>0.353598
    <tr>dot<trt>N<prob>0.28792
    <tr>full @stop<trt>N<prob>0.28729
• 1-1 and 1-2 translations (2-1 translations not yet implemented)
• Packed-forest representation for multiple translation choices
• Simplified version: choose the first best

Automatically Annotating a Tectogrammatical Corpus (Owen Rambow)

Goal
• Use PropBank annotations to
  – improve automatic construction of English TRs
  – allow generation from "generic" predicate-argument structures

Types of Corpus Annotation
• Surface syntax
• Deep syntax
• Local lexical semantics
• Global lexical semantics
• Hybrid: deep-syntactic/global-semantic = the tectogrammatical level used here

Surface Syntax
• E.g., Penn Treebank
• [Tree diagrams omitted.] "John loads hay into trucks": loads takes subj John, obj hay, and prepobj into with complement trucks. "Hay is loaded into trucks by John" has the same content but different surface arcs (subj hay, prepobj into/trucks, prepobj by/John).

Deep Syntax
• E.g., TAG
• [Tree diagrams omitted.] Both "John loads hay into trucks" and "Hay is loaded into trucks by John" get the same deep structure: load(subj John, obj hay, obj2 truck).

Local Semantics
• Penn PropBank (brand new)
• [Tree diagrams omitted.] "John loads hay into trucks" and "John loads trucks with hay" both get load(arg0 John, arg1 hay, arg2 truck).

Global Semantics
• LCS (U. Md.)
• [Tree diagrams omitted.] "John loads hay into trucks": load(agent John, theme hay, goal truck). "John throws hay into trucks": throw(agent John, theme hay, goal truck).

Tectogrammatical Representation
• First two syntactic arguments of the verb: deep-syntactic labels
• All other arguments: global-semantic labels
• [Tree diagrams omitted.] "John loads trucks with hay": load(ACT John, PAT hay, ACMP truck). "John loads hay into trucks": load(ACT John, PAT hay, DIR3 truck). "John throws hay into trucks": throw(ACT John, PAT hay, DIR3 truck).
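A toy rendering of this labeling convention: deep-syntactic functors for the first two arguments, globally motivated functors for the rest. The lookup tables below are illustrative assumptions covering only the slide's load/throw examples, not the workshop's actual PropBank/LCS dictionaries.

```python
# Sketch of the TR labeling convention on this slide. DEEP_SYNTACTIC and
# SEMANTIC_FUNCTOR are toy stand-ins (assumptions).

DEEP_SYNTACTIC = {0: "ACT", 1: "PAT"}         # first two arguments of the verb

SEMANTIC_FUNCTOR = {                           # remaining arguments, by verb
    ("load",  "with-alternation"): "ACMP",     # "loads trucks WITH hay"
    ("load",  "into-alternation"): "DIR3",     # "loads hay INTO trucks"
    ("throw", "into-alternation"): "DIR3",     # "throws hay INTO trucks"
}

def tr_functor(verb, arg_index, realization):
    if arg_index in DEEP_SYNTACTIC:            # deep-syntactic: verb-specific
        return DEEP_SYNTACTIC[arg_index]
    return SEMANTIC_FUNCTOR[(verb, realization)]   # global-semantic label

print(tr_functor("load", 0, None))                  # ACT
print(tr_functor("load", 2, "with-alternation"))    # ACMP
print(tr_functor("throw", 2, "into-alternation"))   # DIR3
```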
Why Use TR?
• Research hypothesis: replacing function words with TR arc labels makes transfer easier
• Choice of realization is target-language-dependent
• Deep-syntactic labels on the first two arguments: realization is more verb-specific
• Global-semantic labels on the remaining arguments: realization is just label-specific

Available Resources for Input 3
• Surface syntax: PTB corpus (hand-annotated, checked)
• Deep syntax: derived automatically from the PTB (Chen 01)
• Local semantics: PropBank corpus and frame lexicon (hand-annotated, checked)
• Global semantics: LCS lexicon (partially hand-built, partially checked)
• TR: PTB subset corpus (hand-annotated); PropBank TR dictionary (hand-built, not checked) (I. Kučerová)

Experiment: Machine Learning of TR Labels Using Ripper
• Ripper (Cohen 1996) = greedy symbolic rule learner with set- and bag-valued features
• Features:
  – surface and deep syntactic info
  – local and global semantic info
  – Kučerová's PropBank TR dictionary (handcrafted)
  – Input 1 (automatic English TR)

Results (TR Label Error Rates)

  Syntax \ Semantics   none    local   local-global   all     PB TR dict
  none                 58.8%   25.9%   23.7%          22.6%   37.7%
  Input 1              19.5%   17.7%   16.3%          15.9%   17.1%
  surface-deep         16.5%   17.1%   16.2%          15.9%   16.1%
  surface-deep-Inp1    16.4%   16.7%   15.5%          16.2%   14.4%

• Average error rate on 5-fold cross-validation (1,326 data points)

Conclusions
• Machine learning can improve on handwritten conversion rules (= Input 1)
• PropBank is useful
• Best results: all syntactic features + the PropBank TR dictionary
• Future work: use the PropBank LCS dictionary (developed during the workshop)

The MAGENTA System
• Statistically based
• The pipeline:
  – TR to AR by a channel model
  – word order by reordering on dependency trees
  – punctuation insertion
  – morphology

The Tree-to-Tree Transductions (Jason Eisner)

Translating Trees
• [Tree-pair diagram omitted.] E.g., "misinform" on one side corresponds to "wrongly inform" on the other: a 2:1 mapping, learned from data (or found in a dictionary). Also 1:2, 2:0, etc., and rearrangements; function words such as prepositions and determiners are 0:1 mappings.

Statistical: Need a Model of Tree Pairs
• Mainly interested in (TR, AR) pairs, but our techniques are quite general
• E.g., the running example below is not a (TR, AR) pair:
  "the girl kissed her kitty cat" / "the girl gave a kiss to her cat"
  [Tree-pair diagram omitted: the dependency tree of the first sentence (Pred kissed; Subj girl; Obj cat; Det the, her; modifier kitty) aligned with the tree of the second (S gave; NP girl; NP kiss; PP to; NP cat).]

Training: Our Team Has Many Tree Pairs
• Tree pairs should be nicer to model than string pairs (that's why we built them!)
• Which Czech trees went with which English trees in training? Learn the parameters θ of a joint model P(T1, T2).

Decoding: Complete a Tree Pair
• Training: given T1 and T2, find θ to maximize P(T1, T2)
• Decoding: given T1, find T2 to maximize P(T1, T2)
• Horrible sparse-data problem: we can't just do tree lookup.

How Should a Model of Tree Pairs Look?
• Joint model P(T1, T2). Wise to use the noisy-channel form P(T1 | T2) · P(T2), but any joint model will do:
  – P(T2) could be trained on zillions of individual English AR trees
  – P(T1 | T2) is trained on paired trees; it could also take advantage of English-Czech dictionaries
• Intuition: some kind of correspondence between words. Try to learn the correspondence using EM alignment (could seed with a dictionary).
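Written out in equations (one reading of what these slides state; θ denotes the model parameters, A a hidden alignment):

```latex
\begin{align*}
P_\theta(T_1, T_2) &= \sum_A P_\theta(T_1, T_2, A)
    && \text{(alignment marginalized out)}\\
\hat\theta &= \arg\max_\theta \prod_{(T_1,T_2)\in\text{training}} \sum_A P_\theta(T_1, T_2, A)
    && \text{(training, e.g.\ by EM)}\\
\hat T_2 &= \arg\max_{T_2,\,A}\; P_{\hat\theta}(T_1, T_2, A)
    && \text{(decoding: complete the pair given } T_1\text{)}
\end{align*}
```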
• [Alignment diagrams omitted.] The same tree pair admits many alignments; aligning "kitty" with "kiss" instead of with nothing, for instance, is a different, bad alignment.
• So the model must consider the alignment: P(T1, T2, A)
• Why A is complicated:
  – the correspondence isn't 1-to-1: "kiss" ~ "gave a kiss", "kitty cat" ~ "cat", "to" ~ nothing
  – we also need to model word order (indeed topology)

Solution: Use the Right Grammar Formalism
• Grammars can assemble words or phrases into trees. Let's work up to the "right" formalism.

Context-Free Grammar
• "the girl kissed her cat": S → NP VP, NP → Det N, VP → V NP, Det → the, N → girl, V → kissed, etc.

Augment CFG Nonterminals with Headwords
• Each nonterminal carries its headword: S,kissed → NP,girl VP,kissed; NP,girl → Det,the N,girl; VP,kissed → V,kissed NP,cat; etc.

Lexicalized Tree Substitution Grammar
• Now look at all the rules headed by "kissed" together: they form a natural chunk, an elementary tree anchored at "kissed".
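As a concrete picture of the elementary trees the next slides build on, here is a minimal sketch of such a "little tree" with open substitution sites. The class and the toy flattening rule are illustrative assumptions, not the workshop's data structures.

```python
# Sketch of an elementary (little) tree: a lexical anchor plus open
# substitution sites that other little trees can fill.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LittleTree:
    root: str                      # root nonterminal ("S", "Pred", ...)
    anchor: str                    # lexical anchor ("kissed", "gave", ...)
    sites: List[str]               # open substitution sites, in order
    children: Optional[List["LittleTree"]] = None

    def __post_init__(self):
        if self.children is None:
            self.children = [None] * len(self.sites)   # all roles open

    def substitute(self, i, tree):
        """Fill open site i; the tree's root must match the open role."""
        assert tree.root == self.sites[i], "root must match the open role"
        self.children[i] = tree
        return self

    def words(self):
        """Flatten to words. Placing the anchor after the first child is a
        toy convention; real word order is modeled by a separate step."""
        out = []
        for i, child in enumerate(self.children):
            if i == 1:
                out.append(self.anchor)
            out.extend(child.words() if child else ["<" + self.sites[i] + ">"])
        if len(self.children) < 2:
            out.append(self.anchor)
        return out

kissed = LittleTree("S", "kissed", ["NP", "NP"])
girl = LittleTree("NP", "girl", ["Det"]).substitute(0, LittleTree("Det", "the", []))
cat = LittleTree("NP", "cat", ["Det"]).substitute(0, LittleTree("Det", "her", []))
print(kissed.substitute(0, girl).substitute(1, cat).words())
# ['the', 'girl', 'kissed', 'her', 'cat']
```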
• In an elementary tree, a nonterminal leaf is an open role waiting to be filled, and the root can fill an open role higher up.
• [Derivation diagram omitted.] "the girl kissed her cat" decomposes into elementary subtrees anchored at "kissed", "girl", "cat", "the", "her": one "parse" of the tree into elementary subtrees.

Dependency-Style Lexicalized Tree Substitution Grammar
• Simplify the structure: eliminate the extra internal nodes, keeping just one node per word ("dependency style")
• This yields the kind of AR and TR trees we actually have: S,kissed with NP slots becomes Pred,kissed with Subj and Obj slots.
• [Diagram omitted: the dependency-style decomposition of "the girl kissed her kitty cat".]

Synchronous Dependency-Style Lexicalized Tree Substitution Grammar
• [Diagram omitted: the tree pair "the girl kissed her kitty cat" / "the girl gave a kiss to her cat", decomposed into aligned elementary ("little") tree pairs.]
• Condition the generation of t1, t2 on their joint root nonterminals n:
  P(T1, T2, A) = ∏ p(t1, t2, a | n), the product running over the aligned little tree pairs
• So any aligned BIG TREE PAIR is built from a set of aligned LITTLE TREE PAIRS

How This Simplifies Things
• Alignment: find A to maximize P(T1, T2, A)
• Decoding: find T2, A to maximize P(T1, T2, A)
• Training: find θ to maximize Σ_A P(T1, T2, A)
• Do everything on little trees instead! We only need to train and decode a model of p(t1, t2, a).
• But we are not sure how to break up a big tree correctly, so we try all possible little trees and all ways of combining them, by dynamic programming.
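The trainer's EM loop over little trees can be written compactly (a restatement under the factorization above, consistent with the inside-outside counts mentioned on the next slide; c(·) denotes expected counts):

```latex
\begin{align*}
c(t_1, t_2, a) &= \sum_{A \,\ni\, (t_1, t_2, a)} P_\theta(A \mid T_1, T_2)
    && \text{(E-step: expected counts over all decompositions)}\\
p_{\theta'}(t_1, t_2, a \mid n) &\propto \sum_{(T_1, T_2)} c(t_1, t_2, a)
    && \text{(M-step: renormalize per root pair } n\text{)}
\end{align*}
```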
System Architecture
• [Architecture diagram omitted.] Three components share a probability model p(t1, t2, a) of little trees and a dynamic-programming engine:
  – the proposer suggests little translations t2 for each possible t1, yielding various (t1, t2, a);
  – the decoder scores all alignments between a big tree T1 and a forest of big trees T2 (each proposed p(t1, t2, a)); the Viterbi alignment yields the output T2;
  – the trainer scores all alignments of two big trees T1, T2 (each possible p(t1, t2, a) in T1, T2) and updates the parameters from inside-outside estimated counts p(t1, t2, a | T1, T2).

Related Work
• Synchronous grammars (Shieber & Schabes 1990)
  – statistical work has allowed only 1:1 (isomorphic trees)
• Stochastic inversion transduction grammars (Wu 1995)
• Head transducer grammars (Alshawi et al. 2000)
• Statistical tree translation
  – noisy-channel model (Yamada & Knight 2000): infers the tree, training on (string, tree) pairs rather than (tree, tree) pairs; but again allows only 1:1, plus 1:0 at the leaves
• Statistical tree generation: find the most probable expression of a meaning
  – dynamic-programming search in a packed forest (Langkilde 2000)
  – stack decoder (Ratnaparkhi 2000)

The Little Trees (Jan Hajič)
• The object being modeled is p(t1, t2, a) for little trees, e.g., p(Pred,kissed with Subj and Obj slots; S,gave with NP, NP,kiss, and PP slots; their alignment)
• Data are still sparse, but better than for big trees
• No alignment search is needed inside a little tree: it is already hypothesized for us

Form of the Model for 1:1 (AR:TR)
• Base form:
  – p(cat, PL, PAT, cat, NNS, Obj, alignment)
• High-level backoff:
  – p(cat, cat) · p(PL, NNS) · p(PAT, Obj) · p(alignment)
• Low-level backoff:
  – p(align) · 1/(L·T·F), where L = the size of the <Tlemma, Alemma> inventory, etc.

Non-1:1 Correspondences
• Joint model:
  – 0:1: p(to, TO, AuxY, alignment) · k01
  – 1:0: p(&Gen;, NULL, ACT, align) · k10
  – 1:2: p(home, SG, LOC, in, IN, AuxP, home, NNS, Adv, alignment) · k12
  – etc.; plus a corresponding backoff scheme

Smoothing Issues
• Other backoff schemes? Too many to try them all.
• Graphical models? Derive them from (manual) alignments, especially for types of alignment the model cannot handle (1:4, for example).

The Proposer (Yuan Ding)

Map TR to AR
• [Diagram omitted.] E.g., the TR nodes "Suggest, Secretary, American" map to the AR "Suggest <VBG>, Secretary <NNP>, an <T> American <J>, of <IN>": generation inserts the determiner and the preposition.

Proposer for the Decoder
• Collect feature patterns on the TR side
• Construct the AR using observed possible TR→AR transforms
• For unobserved TR patterns, use a naive mapping onto the AR

Proposer: Observes During Training
• Observed TR→AR transforms go into a TR→AR transform hash table, keyed by a TR feature-pattern set

Proposer: During Decoding
• Given a TR, query the transform hash table with the TR features
• Found in the TR→AR transform hash table? Construct the AR using the observed transform. Not found? Construct the AR naively.

Example
• TR node "Secretary" with Root.Trlemma = Secretary and FirstChild.Functor = APP ("Secretary ... State"): looking it up in the TR→AR transform hash table inserts "of", yielding "Secretary of State".
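A minimal sketch of this decode-time lookup, assuming a toy feature pattern (TR lemma plus the first child's functor); the pattern encoding and table contents are illustrative, not the system's actual ones.

```python
# Sketch of the proposer at decode time: query a table of observed TR->AR
# transforms by a TR feature pattern, and fall back to a naive copy when
# the pattern was never seen in training.

observed = {
    # TR feature pattern -> observed AR transform (here: insert "of")
    ("Secretary", "APP"): ("Secretary", "of", "NP"),
}

def propose_ar(tr_lemma, first_child_functor):
    key = (tr_lemma, first_child_functor)
    if key in observed:
        return observed[key]        # use the observed TR->AR transform
    return (tr_lemma,)              # naive fallback: copy the node 1:1

print(propose_ar("Secretary", "APP"))   # ('Secretary', 'of', 'NP')
print(propose_ar("talk", "PAT"))        # ('talk',)
```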
The Classifier(s) (Terry Koo)

Tree Transduction Model
• [Diagram omitted.] The little-tree proposer turns a prelabeled full TR tree into TR + AR proposals for the tree transduction model's decoder
• Global information in labels can suppress proposals

Preposition Insertion Labeler
• C5.0 decision-tree classifier
• Labels: nothing, insert_of, insert_to, ... For "talk to Jane", the node "Jane" under governor "talk" gets the label insert_to, and "to" is the inserted preposition.
• Trained on Input 1 (automatic English TR)
• Features: TR lemma, TR functor, and POS of the node itself; TR functor and POS of its governor, left/right brothers, and left/right sons
• Some TR nodes should be ignored: in "fly to Baltimore and from Boston", the coordination node "and" stands between "fly" and its prepositional children, so the labeler must look through it.

Boosting Insertion Recall
• Overgenerating is better than undergenerating
• Use C5.0's misclassification costs to discourage "nothing"
• Train on preposition-only data
• Variants: N-best labels; a confidence threshold (with N = average number of labels); an aggressive confidence threshold
• [Plot omitted: insertion recall vs. N for the N-best and confidence-threshold variants; e.g., N = 5, R = 84.35%; N = 4, R = 80.59%; N = 3, R = 80.26% and 76.39% under the two threshold settings.]

What Should Be Done Next?
• Cluster TR lemmas into a tractable number of classes
• Try Ripper instead of C5.0

Word Order (Dan Gildea)
• Tree-based models:
  – analytical level: surface dependency, tree-based
  – Collins model
  – uses function information (Sb, Obj, Atr, ...), POS, and lemmas
• 94% of nodes have the correct ordering of their children (chance: 68%)
• No punctuation (inserted later)
• Input order is completely irrelevant

Punctuation & Morphology (Kristen Parton)

Punctuation Insertion: Motivation
• Important for sentence meaning and understanding
• BLEU uses n-gram statistics, and commas are the most frequent lemma in the WSJ
• Focusing on commas (~95% of intra-sentence punctuation)
• Difficulties:
  – English comma usage is very flexible
  – it varies with the style and meaning of the sentence
  – quotes are not marked in TR trees

Why Insert Commas Separately?
• Commas depend not only on the underlying syntax/semantics but also on the surface realization of the sentence:
  – Soon, she will realize her mistake.
  – ? Soon she will realize her mistake.
  – She will soon realize her mistake.
  – ?? She will, soon, realize her mistake.
  – * She will soon, realize her mistake.
  – * She will, soon realize her mistake.
• The channel model deals with unordered trees
• It is easier to do comma insertion after surface ordering

Commas in AR Trees
• TR trees contain autosemantic words only; commas are deleted
• In AR trees, commas are AuxX or Apos (= apposition; governors)
• Input data: an ordered, unpunctuated AR tree, with AR and TR functors and POS
• Task: insert AuxX nodes into the AR tree and link them in the correct surface order
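A sketch of comma insertion framed as per-node classification, in the spirit of the decision-tree model described next; the real system used C5.0, and the feature set and the toy rule below are illustrative stand-ins.

```python
# Sketch: comma insertion as a per-node labeling decision over an AR tree.
# Feature names and the single hand-written rule are assumptions standing
# in for a learned C5.0 tree.

def features(node):
    """Local features of an AR node: its own labels plus its neighbors'."""
    return {
        "afun": node["afun"], "tfun": node["tfun"], "pos": node["pos"],
        "parent_afun": node.get("parent_afun", "NONE"),
        "lbro_afun": node.get("lbro_afun", "NONE"),
    }

def classify(feats):
    """Stand-in for the learned tree: insert a comma to the right of a
    node that closes an apposition-like chunk."""
    if feats["afun"] == "Apos":
        return "INSERT-RIGHT"
    return "NO-ACTION"

node = {"afun": "Apos", "tfun": "APPS", "pos": "NN", "parent_afun": "Pred"}
print(classify(features(node)))   # INSERT-RIGHT
```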
Comma Insertion Model
• C5.0 decision-tree classifier
• Trained on English AR trees with TR functors (sections 0-19 of the WSJ) with punctuation stripped
• Node labels: NO-ACTION, INSERT-RIGHT
• Feature vectors:
  – local features (AFun, TFun, POS) for the node, its left/right brother, parent, and grandparent
  – global features (Zhang 02), e.g., position in the sentence

Decision Tree Model: Results

  System                   Implementation                         Data          Baseline  Final accuracy  Error-rate reduction
  Beeferman et al. (1998)  Trigram LM with gap insertion penalty  WSJ           32.90%    54.00%          31.45%
  Zhang et al. (2002)      Trigram LM with hidden tags            Tech manuals  56.44%    76.90%          46.97%
  Zhang et al. (2002)      Decision tree (Amalgam)                Tech manuals  56.35%    74.94%          42.59%
  Dependency tree*         Decision tree on AR tree               WSJ           41.61%    75.70%          58.38%

  * Preliminary results: still based on the hand-parsed WSJ
• The evaluation metric is sentence accuracy. What is the (human) upper bound?
• The systems are hard to compare; the models and data sets are very different

Results for Generation
• Comma insertion improves the BLEU score
• Possible improvements:
  – adding n-gram information to the insertion model
  – trying other punctuation marks

Surface Morphology
• Morphology dictionary built from 365M words (Cuřín), using morpha (a morphological analyzer) to lemmatize words and keep counts
• Each word yields a (POS, surface form, lemma, frequency) entry:

  POS   surface form    lemma           frequency
  NN    want-you-babe   want-you-babe   1
  VBD   wanted          want            45595
  VBD   wanting         want            1
  VBG   wanting         want            3708

• Task: lemma + POS → surface form, i.e., a reverse lookup; clashes are resolved by frequency
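A minimal sketch of that reverse lookup, using the frequency entries from the table above; resolving clashes by frequency falls out of one pass over the dictionary.

```python
# Sketch: invert a (POS, surface form, lemma, count) table into a
# (lemma, POS) -> surface form generator, keeping the most frequent form.

from collections import defaultdict

rows = [  # (POS, surface form, lemma, frequency), from the slide
    ("VBD", "wanted",  "want", 45595),
    ("VBD", "wanting", "want", 1),
    ("VBG", "wanting", "want", 3708),
]

generate = {}
best = defaultdict(int)
for pos, surface, lemma, freq in rows:
    if freq > best[(lemma, pos)]:          # clash: keep the more frequent form
        best[(lemma, pos)] = freq
        generate[(lemma, pos)] = surface

print(generate[("want", "VBD")])             # wanted (45595 beats 1)
print(generate[("want", "VBG")])             # wanting
print(generate.get(("want", "NN"), "want"))  # OOV: fall back to the lemma
```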
Surface Morphology Results
• OOV rate: 1.69%
  – for many OOV words, surface form = lemma, so they are correct by default
  – English morphology is not complex; most OOV items are proper nouns
• Non-OOV words: 99.74% accuracy
  – 86% of the mistakes were contractions: 'm ~ am, 've ~ have, etc. (actually correct); ignoring these, 99.96% correct
• Overall, ignoring contractions: error rate of 0.03%
• High accuracy, fast runtime, good coverage: unnecessary to improve further

Improving Czech Parsing (AR to TR) (Gerald Penn)
• Pre-workshop state:
  – Czech deep syntax: mapping AR to TR (Böhmová, Honetschläger, Žabokrtský)
  – two parts of the system:
    • rule-based: 19 transformations by order-dependent Perl code
    • statistical: C4.5-based labeling of TR functions; 84% accuracy
• New statistical system:
  – the tree transduction model has the same form as for generation
  – the little-tree model is reversed for parsing (AR to TR mapping)
  – the initial EM pass uses a simple model based on the PDT's (manual) node-ID alignment
  – the (reversed) proposer is not finished yet

The Hybrid Approach to Generation: The ARGENT System (Dragomir Radev)

Example
• Target sentence: "Alan Spoon, recently named Newsweek president, said Newsweek's ad rates would increase 5 pct in January."
• [TR tree omitted: PRED say; APPS ","; ACT Spoon (RSTR Alan); ACT name (TWHEN recently, PAT president, ACT &Gen;, APP Newsweek); PAT increase (ACT rate (RSTR Newsweek, RSTR ad), EXT pct (RSTR 5), TWHEN January). Stripping the functors leaves the bare lemma tree, which must be realized as the target sentence above.]

NLG Architectures
• Statistical approaches:
  – MAGENTA [Hajič et al. 02]
• Rule-based approaches:
  – FUF/Surge [Elhadad 93, Elhadad and Robin 98]
  – KPML [Bateman 97]
• Hybrid approaches:
  – NitroGen [Knight and Hatzivassiloglou 95]
  – HaloGen [Langkilde and Knight 00]
  – Fergus [Bangalore and Rambow 00]
  – ARGENT

FUF/Surge
• What FUF can do (given sufficient control information):
  – map FUF-style thematic structure onto syntactic roles
  – perform syntactic paraphrasing and alternations (e.g., dative move, passive)
  – provide defaults for syntactic features (e.g., present tense, third person)
  – propagate agreement features
  – select closed-class words
  – inflect words
  – provide linear-precedence constraints among syntactic constituents
• What FUF cannot do:
  – convert dependency to phrase structure
  – provide control for syntactic paraphrasing
  – provide control for lexical features (conditionals, past tense, ...)
  – choose determiners
  – provide a robust grammar
• Example FUF input for the target sentence:

    (setq r
      '((process ((lex "say") (tense past) (object-clause that)))
        (circum ((time ((cat pp)
                        (prep ((lex "in")))
                        (np ((lex "January") (determiner none)))))))
        (partic
         ((affected ((cat clause)
                     (process ((lex "increase") (tense past)))
                     (partic ((created ((cat measure)
                                        (quantity ((value 5)))
                                        (unit ((lex "pct")))))
                              (agent ((cat np)
                                      (head ((lex "rate") (number plural) (determiner none)))
                                      (classifier ((lex "ad") (determiner none)))
                                      (possessor ((lex "Newsweek") (determiner none)))))))))
          (agent ((complex apposition)
                  (punctuation ((after ",")))
                  (distinct ~(((lex "Spoon") (classifier ((lex "Alan"))) (determiner none))
                              ((lex "name") (classifier ((lex "president"))) (determiner none)))))))))

• [Diagram omitted: the TR tree (PRED say; APPS; ACT Spoon; PAT increase; ...) mapped onto the corresponding FUF attribute-value structure (PROCESS: lex "say", tense past, object-clause that; CIRCUM: time, cat pp, prep "in", np "January", determiner none; PARTIC: affected/agent, ...).]

Grammar Development
• Translating TG to FUF (a deterministic channel):
  – write high-coverage rules first
  – problem: no aligned training data
  – four types of rules [Langkilde-Geary 02]: recasting, ordering, filling, morphing
• Three modules: top-level, recursion, bottom-level

Evaluation
• Robustness:
  – ARGENT: 245/248 sentences = 98.7%
  – HaloGen: 80%
• Speed:
  – ARGENT: 1.4-2.9 sec/sentence
  – HaloGen: 28.9-55.5 sec/sentence
• BLEU score: later

Future Work
• Complete the grammar:
  – improve coverage
  – use other grammatemes: degree of comparison (comparative), sentmod (interrogative), verbmod (imperative)
• Better error recovery: inconsistent PTB markup, TR transformation, translation
• Grammar induction
• N-gram-based insertion of missing words
• Integrate with MAGENTA

The Implemented Systems: Creating Data for Generation
• Czech tagger & parser (WS98, pre-WS02)
• Czech-English transfer (WS99, pre-WS02)
• New statistical Czech parser to TR
• Input 3: improved English TR for training

The Generation Systems
• Aligner and decoder
• Little-tree joint model
• Proposer
• Preposition classifier
• Word order by tree LM
• Comma insertion
• Morphology
• The hybrid system (TR-to-FUF translation)

Evaluation
• Evaluation data for BLEU (1-4-grams):
  – devtest/evaltest: 248/249 sentences, 5 reference translations
• Inputs:
  – 1: automatic English TR
  – 2: manual English TR
  – 3: enhanced automatic English TR
  – 4: automatic Czenglish TR
• Systems: statistical (MAGENTA), hybrid (ARGENT, FUF-based)
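To make the metric concrete, here is a compact BLEU-style sketch (clipped n-gram precision over several references, plus a brevity penalty). Real BLEU differs in smoothing and corpus-level aggregation; the tiny floor on precisions below is just to avoid log 0.

```python
# Sketch of BLEU with multiple references: geometric mean of clipped
# n-gram precisions, times a brevity penalty.

import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(cand, refs, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        c = ngrams(cand, n)
        # clip each candidate n-gram count by its max count in any reference
        match = sum(min(cnt, max(ngrams(r, n)[g] for r in refs))
                    for g, cnt in c.items())
        total = max(sum(c.values()), 1)
        precisions.append(max(match, 1e-9) / total)   # floor avoids log 0
    best_ref = min(refs, key=lambda r: abs(len(r) - len(cand)))
    bp = min(1.0, math.exp(1 - len(best_ref) / len(cand)))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "the girl gave a kiss to her cat".split()
refs = ["the girl kissed her cat".split(),
        "the girl gave her cat a kiss".split()]
print(round(bleu(cand, refs), 4))
```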
Upper Estimate
• 5 reference translations:
  – 1 original WSJ text from the PTB
  – 4 retranslations from Czech to English (2 US, 2 Czech translators)
• Evaluate the human translations against each other: take one out and evaluate it against the remaining 4
  – Average BLEU score: 0.556

Results: Input 1 (Automatic English TR)

  BLEU score                 Devtest   Evaltest
  Baseline (lemmas, random)  0.040     0.042
  MAGENTA (best)             0.253     0.244
  ARGENT (best)              0.155     0.145

Results: Input 2 (Manual English TR)

  BLEU score                  Devtest   Evaltest
  Baseline (lemmas, deep WO)  0.190     0.160
  MAGENTA (best)              0.237     0.237
  ARGENT (best)               -         -

Results: Input 3 (Improved Automatic English TR)

  BLEU score                 Devtest   Evaltest
  Baseline (lemmas, random)  0.041     -
  MAGENTA                    0.243     -
  ARGENT (best)              -         -

Results: Input 4 ("Czenglish" Automatic TR)

  BLEU score                   Devtest   Evaltest   Evaltest 1-gram
  GIZA (large LM)              0.210     0.190      0.623
  Baseline (lemmas, Czech WO)  0.082     0.059      0.481
  MAGENTA (best)               0.064     0.042      0.601
  ARGENT (best)                0.015     -          -

  Unigram BLEU score for the reference set: 0.844

Conclusions and Future Work

The Good News and the Bad News
• Good news:
  – an end-to-end, tree-transformation system is running
    • written in 4 weeks, fully trainable from data
    • generates from "semantic" (TR) English significantly better than the baseline
  – datasets were developed for generation/MT and for evaluation
• Bad news:
  – not fully integrated (proposer, little-tree model)
  – on full MT, it cannot beat the baseline (and yes, GIZA)

Things To Do (1)
• Integrate the proposer
• Integrate the preposition classifier
• Write more classifiers and integrate them
  – classifiers running in parallel or sequentially?
• True EM smoothing (by adapting the aligner)
• Make the system more modular
  – e.g., a declarative specification of smoothing

Things To Do (2)
• The aligner/decoder:
  – pruning during aligning/decoding
  – better smoothing of the little-tree model
  – more dependence among little trees (through shared nonterminals or lexicalized nonterminals)
  – turn the little-tree joint model into a noisy-channel model (i.e., integrate Gildea's tree LM "directly")
  – a better initial model for EM: ML training off manual alignments
  – nondeterministic transfer

Things To Do (3)
• Make use of TR's deep (discourse) word order
• More experiments:
  – with new smoothing and the integrated proposer
  – different order of modules
  – other punctuation: a classifier, or inside the model?
• Different settings/applications:
  – AR to TR (parsing)
  – AR to AR (surface translation)
  – TR to TR (translation); other languages

The End (and the beginning!)