Linguistics 187/287 – Week 6
Generation, Term-rewrite System, Machine Translation
Martin Forst, Ron Kaplan, and Tracy King

Generation
Parsing: string to analysis
Generation: analysis to string
Issues: What type of input? How to generate? Why generate?

Why generate?
Machine translation: Lang1 string -> Lang1 f-structure -> Lang2 f-structure -> Lang2 string
Sentence condensation: long string -> f-structure -> smaller f-structure -> new string
Question answering
Production of NL reports
– state of machine or process
– explanation of logical deduction
Grammar debugging

F-structures as input
Use f-structures as input to the generator
May parse sentences that shouldn't be generated
May want to constrain the number of generated options
Input f-structure may be underspecified

XLE generator
Use the same grammar for parsing and generation
Advantages:
– maintainability
– write rules and lexicons once
But:
– special generation tokenizer
– different OT ranking

Generation tokenizer/morphology
White space (TB = token boundary)
– Parsing: multiple white space becomes a single TB
  John appears. -> John TB appears TB . TB
– Generation: a single TB becomes a single space (or nothing)
  John TB appears TB . TB -> John appears.   (*John appears .)
Suppress variant forms
– parse both favor and favour
– generate only one

Morphconfig for parsing and generation

  STANDARD ENGLISH MORPHOLOGY (1.0)
  TOKENIZE:
  P!eng.tok.parse.fst G!eng.tok.gen.fst
  ANALYZE:
  eng.infl-morph.fst G!amerbritfilter.fst G!amergen.fst
  ----

Reversing the parsing grammar
The parsing grammar can be used directly as a generator
Adapt the grammar with a special OT ranking: GENOPTIMALITYORDER
Why do this?
– parse ungrammatical input
– have too many options

Ungrammatical input
Linguistically ungrammatical
– They walks.
– They ate banana.
Stylistically ungrammatical
– no ending punctuation: They appear
– superfluous commas: John, and Mary appear.
– shallow markup: [NP John and Mary] appear.

Too many options
All the generated options can be linguistically valid, but too many for applications
Occurs when more than one string has the same, legitimate f-structure
PP placement:
– In the morning I left. / I left in the morning.

Using the Gen OT ranking
Generally much simpler than in the parsing direction
– usually only use standard marks and NOGOOD; no * marks, no STOPPOINT
– can have a few marks that are shared by several constructions: one or two for dispreferred, one or two for preferred
Example: prefer an initial PP

  S --> (PP: @ADJUNCT @(OT-MARK GenGood))
        NP: @SUBJ;
        VP.
  VP --> V (NP: @OBJ) (PP: @ADJUNCT).

  GENOPTIMALITYORDER NOGOOD +GenGood.

parse: they appear in the morning.
generate:
– without OT: In the morning they appear. / They appear in the morning.
– with OT: In the morning they appear.
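One way to exercise this ranking is with the regenerate command discussed under debugging below; a minimal xlerc sketch, assuming the toy rules above live in a grammar file pp-toy.lfg (a hypothetical name):

  # hypothetical xlerc fragment: round-trip the PP example
  create-parser pp-toy.lfg
  # parse the string, then generate from the resulting f-structure;
  # with the GenGood ranking active, only the PP-initial string comes back
  regenerate "they appear in the morning."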
Debugging the generator
When generating from an f-structure produced by the same grammar, XLE should always generate. Exceptions:
– OT marks block the only possible string
– something is wrong with the tokenizer/morphology
regenerate-morphemes: if this produces a string, the tokenizer/morphology is not the problem
Hard to debug: XLE has robustness features to help

Underspecified input
F-structures provided by applications are not perfect
– may be missing features
– may have extra features
– may simply not match the grammar coverage
Missing and extra features are often systematic
– specify in XLE which features can be added and deleted
Not matching the grammar is a more serious problem

Adding features
English to French translation:
– English nouns have no gender; French nouns need gender
– Solution: have XLE add gender; the French morphology will control the value
Specify additions in xlerc:
– set-gen-adds add "GEND"
– can add multiple features: set-gen-adds add "GEND CASE PCASE"
– XLE will optionally insert the feature
Note: unconstrained additions make generation undecidable

Example
The cat sleeps. -> Le chat dort.

  [ PRED 'dormir<SUBJ>'
    SUBJ [ PRED 'chat'
           NUM  sg
           SPEC def ]
    TENSE present ]

becomes

  [ PRED 'dormir<SUBJ>'
    SUBJ [ PRED 'chat'
           NUM  sg
           GEND masc
           SPEC def ]
    TENSE present ]

Deleting features
French to English translation: delete the GEND feature
Specify deletions in xlerc:
– set-gen-adds remove "GEND"
– can remove multiple features: set-gen-adds remove "GEND CASE PCASE"
– XLE obligatorily removes the features: no GEND feature will remain in the f-structure
– if a feature takes an f-structure value, that f-structure is also removed

Changing values
If the values of a feature do not match between the input f-structure and the grammar: delete the feature and then add it
Example: case assignment in translation
– set-gen-adds remove "CASE"
  set-gen-adds add "CASE"
– allows dative case in the input to become accusative
  e.g., exceptional case marking verb in the input language but regular case in the output language

Generation for debugging
Checking for grammar and lexicon errors
– create-generator english.lfg
– reports ill-formed rules, templates, feature declarations, lexical entries
Checking for ill-formed sentences that can be parsed
– parse a sentence
– see if all the results are legitimate strings
– regenerate "they appear."
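Pulling these commands together, a minimal xlerc sketch for the French-generation scenario above (the grammar file name is hypothetical; all commands are as introduced in this section):

  # hypothetical xlerc fragment for a French generator
  create-generator french.lfg
  # let the generator insert GEND (optional addition; the morphology fixes the value)
  set-gen-adds add "GEND"
  # strip CASE from the input and let the grammar re-add the right value
  set-gen-adds remove "CASE"
  set-gen-adds add "CASE"
  # sanity check: parse a sentence and generate from its f-structure
  regenerate "le chat dort."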
Rewriting/Transfer System

Why a rewrite system
Grammars produce c-/f-structure output
Applications may need to manipulate this
– remove features
– rearrange features
– continue linguistic analysis (semantics, knowledge representation – next week)
XLE has a general-purpose rewrite system (aka "transfer" or "xfr" system)

Sample uses of the rewrite system
Sentence condensation
Machine translation
Mapping to logic for knowledge representation and reasoning
Tutoring systems

What does the system do?
Input: a set of "facts"
Apply a set of ordered rules to the facts
– this gradually changes the set of input facts
Output: a new set of facts
The rewrite system uses the same ambiguity management as XLE
– can efficiently rewrite packed structures, maintaining the packing

Example f-structure facts

  PERS(var(1),3)
  PRED(var(1),girl)
  CASE(var(1),nom)
  NTYPE(var(1),common)
  NUM(var(1),pl)
  SUBJ(var(0),var(1))
  PRED(var(0),laugh)
  TNS-ASP(var(0),var(2))
  TENSE(var(2),pres)
  arg(var(0),1,var(1))
  lex_id(var(0),1)
  lex_id(var(1),0)

F-structures get var(#)
Special arg facts; a lex_id for each PRED
Facts have two arguments (except arg); the rewrite system allows for any number of arguments

Rule format
Obligatory rule: LHS ==> RHS.
Optional rule: LHS ?=> RHS.
Unresourced fact: |- clause.
LHS:
– clause : match and delete
– +clause : match and keep
– -clause : negation (don't have the fact)
– LHS, LHS : conjunction
– ( LHS | LHS ) : disjunction
– { ProcedureCall } : procedural attachment
RHS:
– clause : replacement facts
– 0 : empty set of replacement facts
– stop : abandon the analysis

Example rules

  "PRS (1.0)"
  grammar = toy_rules.

  "obligatorily add a determiner if there is a noun with no spec"
  +NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).

  "optionally make plural nouns singular; this will split the choice space"
  NUM(%F, pl) ?=> NUM(%F, sg).

(applied to the girl/laugh facts above)

Example obligatory rule
"obligatorily add a determiner if there is a noun with no spec"
  +NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).
Output facts: all the input facts plus SPEC(var(1),def)

Example optional rule
"optionally make plural nouns singular; this will split the choice space"
  NUM(%F, pl) ?=> NUM(%F, sg).
Output facts: all the input facts plus a choice split:
  A1: NUM(var(1),pl)
  A2: NUM(var(1),sg)

Output of the example rules
Output is a packed f-structure
Generation gives two sets of strings
– The girls {laugh.|laugh!|laugh}
– The girl {laughs.|laughs!|laughs}

Manipulating sets
Sets are represented with an in_set feature
– He laughs in the park with the telescope.

  ADJUNCT(var(0),var(2))
  in_set(var(4),var(2))
  in_set(var(5),var(2))
  PRED(var(4),in)
  PRED(var(5),with)

Might want to optionally remove adjuncts, but not negation

Example adjunct deletion rules

  "optionally remove a member of the adjunct set"
  +ADJUNCT(%%, %AdjSet), in_set(%Adj, %AdjSet), -PRED(%Adj, not) ?=> 0.

  "obligatorily remove an adjunct set with nothing in it"
  ADJUNCT(%%, %Adj), -in_set(%%,%Adj) ==> 0.

He laughs with the telescope in the park. ==>
– He laughs in the park with the telescope.
– He laughs with the telescope.
– He laughs in the park.
– He laughs.

Manipulating PREDs
Changing the value of a PRED is easy
– PRED(%F,girl) ==> PRED(%F,boy).
Changing the argument structure is trickier
– make any changes to the grammatical functions
– make the arg facts correlate with these

Example passive rule
"make actives passive: make the subject NULL; make the object the subject; put in features"

  SUBJ( %Verb, %Subj),
  arg( %Verb, %Num, %Subj),
  OBJ( %Verb, %Obj),
  CASE( %Obj, acc)
  ==>
  SUBJ( %Verb, %Obj),
  arg( %Verb, %Num, NULL),
  CASE( %Obj, nom),
  PASSIVE( %Verb, +),
  VFORM( %Verb, pass).

the girls saw the monkeys ==> The monkeys were seen.
in the park the girls saw the monkeys ==> In the park the monkeys were seen.

Templates and macros
Rules can be encoded as templates:

  n2n(%Eng,%Frn) ::
    PRED(%F,%Eng), +NTYPE(%F,%%) ==> PRED(%F,%Frn).
  @n2n(man, homme).
  @n2n(woman, femme).

Macros encode groups of clauses/facts:

  sg_noun(%F) := +NTYPE(%F,%%), +NUM(%F,sg).
  @sg_noun(%F), -SPEC(%F) ==> SPEC(%F,def).
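Combining the pieces above, a minimal sketch of a complete rule file in this format (header as in the example rules; the noun pairs are illustrative):

  "PRS (1.0)"
  grammar = en2fr_toy.

  "template: map English noun PREDs to French ones"
  n2n(%Eng,%Frn) ::
    PRED(%F,%Eng), +NTYPE(%F,%%) ==> PRED(%F,%Frn).

  @n2n(cat, chat).
  @n2n(dog, chien).

  "obligatorily add a default determiner to bare nouns"
  +NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).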
Unresourced facts
Facts can be stipulated in the rules and referred to
– often used as a lexicon of information not encoded in the f-structure
For example, a list of days and months for the manipulation of dates:

  |- day(Monday).
  |- day(Tuesday).
  etc.
  |- month(January).
  |- month(February).
  etc.

  +PRED(%F,%Pred), ( day(%Pred) | month(%Pred) ) ==> …

Rule ordering
Rewrite rules are ordered (unlike LFG syntax rules, but like finite-state rules)
– the output of rule1 is the input to rule2
– the output of rule2 is the input to rule3
This allows for feeding and bleeding
– feeding: insert facts used by later rules
– bleeding: remove facts needed by later rules
Can make debugging challenging

Example of rule feeding
Early rule: insert SPEC on nouns
  +NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F, def).
Later rule: allow plural nouns to become singular only if they have a specifier (to avoid bad count nouns)
  NUM(%F,pl), +SPEC(%F,%%) ==> NUM(%F,sg).

Example of rule bleeding
Early rule: turn actives into passives (simplified)
  SUBJ(%F,%S), OBJ(%F,%O) ==> SUBJ(%F,%O), PASSIVE(%F,+).
Later rule: impersonalize actives
  SUBJ(%F,%%), -PASSIVE(%F,+) ==> SUBJ(%F,%S), PRED(%S,they), PERS(%S,3), NUM(%S,pl).
– will apply to intransitives and verbs with (X)COMPs, but not transitives

Debugging
XLE command line: tdbg
– steps through the rules, stating how they apply (input here: girls laughed)

  ============================================
  Rule 1: +(NTYPE(%F,A)), -(SPEC(%F,B)) ==> SPEC(%F,def)
  File /tilde/thking/courses/ling187/hws/thk.pl, lines 4-10
  Rule 1 matches:
  [+(2)] NTYPE(var(1),common)
  1 --> SPEC(var(1),def)
  ============================================
  Rule 2: NUM(%F,pl) ?=> NUM(%F,sg)
  File /tilde/thking/courses/ling187/hws/thk.pl, lines 11-17
  Rule 2 matches:
  [3] NUM(var(1),pl)
  1 --> NUM(var(1),sg)
  ============================================
  Rule 5: SUBJ(%Verb,%Subj), arg(%Verb,%Num,%Subj), OBJ(%Verb,%Obj), CASE(%Obj,acc)
          ==> SUBJ(%Verb,%Obj), arg(%Verb,%Num,NULL), CASE(%Obj,nom),
              PASSIVE(%Verb,+), VFORM(%Verb,pass)
  File /tilde/thking/courses/ling187/hws/thk.pl, lines 28-37
  Rule does not apply

Running the rewrite system
create-transfer : adds menu items
load-transfer-rules FILE : loads rules from a file
The f-structure window's commands menu then has:
– transfer : prints the output of the rules in the XLE window
– translate : runs the output through the generator
Need to set (where the path is $XLEPATH/lib):
  setenv LD_LIBRARY_PATH /afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib
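A minimal xlerc sketch tying these commands together (grammar and rule file names hypothetical):

  # hypothetical xlerc fragment: parse, then rewrite
  create-parser english.lfg
  # add the transfer menu items and load a rule file
  create-transfer
  load-transfer-rules toy_rules.pl
  # now parse a sentence; "transfer" and "translate" become available
  # from the commands menu of the f-structure window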
Rewrite summary
The XLE rewrite system lets you manipulate the output of parsing
– creates versions of the output suitable for applications
– can involve significant reprocessing
Rules are ordered
Ambiguity management is as with parsing

Grammatical Machine Translation
Stefan Riezler & John Maxwell

Translation system
Source string -> XLE parsing (German LFG) -> f-structures -> transfer (transfer rules + lots of statistics) -> f-structures -> XLE generation (English LFG) -> target string

Transfer-rule induction from aligned bilingual corpora
1. Use standard techniques to find many-to-many candidate word alignments in source-target sentence pairs
2. Parse the source and target sentences using LFG grammars for German and English
3. Select the most similar f-structures in source and target
4. Define many-to-many correspondences between substructures of the f-structures, based on the many-to-many word alignment
5. Extract primitive transfer rules directly from the aligned f-structure units
6. Create the powerset of possible combinations of basic rules and filter according to contiguity and type-matching constraints

Induction example
Sentences: Dafür bin ich zutiefst dankbar. = I have a deep appreciation for that.
Many-to-many word alignment: Dafür{6 7} bin{2} ich{1} zutiefst{3 4 5} dankbar{5}
F-structure alignment: (figure showing the aligned German and English f-structures)

Extracting primitive transfer rules
Rule (1) maps lexical predicates. Rule (2) maps lexical predicates and interprets the subj-to-subj link as an indication to map the SUBJ of the source predicate into the SUBJ of the target, and the XCOMP of the source into the OBJ of the target. (%X1, %X2, %X3, … are variables over f-structures.)

  (1) PRED(%X1, ich) ==> PRED(%X1, I)

  (2) PRED(%X1, sein), SUBJ(%X1,%X2), XCOMP(%X1,%X3)
      ==> PRED(%X1, have), SUBJ(%X1,%X2), OBJ(%X1,%X3)

Extracting complex transfer rules
Complex rules are created by taking all combinations of primitive rules and filtering:

  (4) zutiefst dankbar sein ==> have a deep appreciation
  (5) zutiefst dankbar dafür sein ==> have a deep appreciation for that
  (6) ich bin zutiefst dankbar dafür ==> I have a deep appreciation for that
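Spelled out in the rewrite notation, complex rule (4) would look roughly like the following sketch, combining primitive rule (2) with primitives for the adjective and adverb; the exact feature geometry (ADJUNCT set, SPEC value) is illustrative, not taken from the paper:

  "(4) zutiefst dankbar sein ==> have a deep appreciation (sketch)"
  PRED(%X1, sein), SUBJ(%X1,%X2), XCOMP(%X1,%X3),
  PRED(%X3, dankbar), ADJUNCT(%X3,%A), in_set(%X4,%A), PRED(%X4, zutiefst)
  ==>
  PRED(%X1, have), SUBJ(%X1,%X2), OBJ(%X1,%X3),
  PRED(%X3, appreciation), SPEC(%X3, indef),
  ADJUNCT(%X3,%A), in_set(%X4,%A), PRED(%X4, deep).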
Transfer contiguity constraint
1. Source and target f-structures each have to be connected
2. F-structures in the transfer source can only be aligned with f-structures in the transfer target, and vice versa
Analogous to the constraint on contiguous and alignment-consistent phrases in phrase-based SMT
Prevents extraction of a rule that would translate dankbar directly into appreciation, since appreciation is also aligned to zutiefst
Transfer contiguity allows learning idioms like es gibt – there is from configurations that are local in the f-structure but non-local in the string, e.g., es scheint […] zu geben – there seems […] to be

Linguistic filters on transfer rules
Morphological stemming of PRED values
(Optional) filtering of f-structure snippets based on consistency of linguistic categories
– extraction of the snippet that translates zutiefst dankbar into a deep appreciation maps incompatible categories, adjectival and nominal; valid in a string-based world
– the translation of sein to have might be discarded because of the adjectival vs. nominal types of their arguments
– the larger rule mapping zutiefst dankbar sein to have a deep appreciation is fine, since the verbal types match

Transfer
Parallel application of transfer rules in a nondeterministic fashion
– unlike the XLE ordered-rule rewrite system
Each fact must be transferred by exactly one rule
A default rule transfers any fact as itself
Transfer works on a chart, using the parser's unification mechanism for consistency checking
Selection of the most probable transfer output is done by beam decoding on the transfer chart

Generation
Bi-directionality allows us to use the same grammar for parsing the training data and for generation in the translation application
The generator has to be fault-tolerant in cases where the transfer system operates on a FRAGMENT parse or produces non-valid f-structures from valid input f-structures
Robust generation from unknown (e.g., untranslated) predicates and from unknown f-structures

Robust generation
Generation from unknown predicates:
– the unknown German word "Hunde" is analyzed by the German grammar to extract the stem (e.g., PRED = Hund, NUM = pl), which is then inflected using English default morphology ("Hunds")
Generation from unknown constructions:
– a default grammar that allows any attribute to be generated in any order is mixed in as a suboptimal option in the standard English grammar; e.g., if a SUBJ cannot be generated as a sentence-initial NP, it will be generated in any position as any category
– an extension/combination of set-gen-adds and OT ranking

Statistical models
1. Log-probability of source-to-target transfer rules, where the probability $r(e \mid f)$ of a rule that transfers source snippet $f$ into target snippet $e$ is estimated by relative frequency:
$$ r(e \mid f) = \frac{\mathrm{count}(f \rightarrow e)}{\sum_{e'} \mathrm{count}(f \rightarrow e')} $$
2. Log-probability of target-to-source transfer rules, estimated analogously by relative frequency
3. Log-probability of lexical translations $l(e \mid f)$ from source to target snippets, estimated from Viterbi alignments $a^*$ between source word positions $i = 1, \ldots, n$ and target word positions $j = 1, \ldots, m$ for stems $f_i$ and $e_j$ in snippets $f$ and $e$, with relative word-translation frequencies $t(e_j \mid f_i)$:
$$ l(e \mid f) = \prod_{j} \frac{1}{\lvert \{ i \mid (i,j) \in a^* \} \rvert} \sum_{(i,j) \in a^*} t(e_j \mid f_i) $$
4. Log-probability of lexical translations from target to source snippets
5. Number of transfer rules
6. Number of transfer rules with frequency 1
7. Number of default transfer rules
8. Log-probability of strings of predicates from the root to the frontier of the target f-structure, estimated from predicate trigrams in English f-structures
9. Number of predicates in the target f-structure
10. Number of constituent movements during generation, based on the original order of the head predicates of the constituents
11. Number of generation repairs
12. Log-probability of the target string as computed by a trigram language model
13. Number of words in the target string
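As a toy illustration of features 1 and 3 (the counts and alignments are invented for the example): if the snippet f = zutiefst dankbar were extracted three times aligned to e = a deep appreciation and once to e' = deeply grateful, then
$$ r(e \mid f) = \frac{3}{3 + 1} = 0.75 $$
and in $l(e \mid f)$, a target word $e_j$ whose Viterbi alignment links it to two source words $f_{i_1}, f_{i_2}$ contributes the factor $\frac{1}{2}\bigl(t(e_j \mid f_{i_1}) + t(e_j \mid f_{i_2})\bigr)$ to the product.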
Experimental evaluation
Experimental setup:
– German-to-English on the Europarl parallel corpus (Koehn '02)
– training and evaluation on sentences of length 5-15, for quick experimental turnaround
– resulting in a training set of 163,141 sentences, a development set of 1,967 sentences, and a test set of 1,755 sentences (used in Koehn et al. HLT '03)
– improved bidirectional word alignment based on GIZA++ (Och et al. EMNLP '99)
– LFG grammars for German and English (Butt et al. COLING '02; Riezler et al. ACL '02)
– SRI trigram language model (Stolcke '02)
– comparison with PHARAOH (Koehn et al. HLT '03) and IBM Model 4 as produced by GIZA++ (Och et al. EMNLP '99)

Experimental evaluation, cont.
Around 700,000 transfer rules extracted from f-structures chosen by a dependency similarity measure
The system operates on n-best lists of parses (n=1), transferred f-structures (n=10), and generated strings (n=1,000)
Selection of the most probable translations in two steps:
– most probable f-structure by beam search (n=20) on the transfer chart, using features 1-10
– most probable string selected from the strings generated from the selected n-best f-structures, using features 11-13
Feature weights for the modules trained by MER on 750 in-coverage sentences of the development set

Automatic evaluation
NIST scores (ignoring punctuation), with Approximate Randomization for significance testing; asterisks as on the original slide, marking statistically significant differences:

                  M4      LFG     P
  in-coverage     5.13    *5.82   *5.99
  full test set   *5.57   *5.62   6.40

44% of the test set is in coverage of the grammars; 51% FRAGMENT parses and/or generation repair; 5% timeouts
– in-coverage: the difference between LFG and P is not significant
– suboptimal robustness techniques decrease overall quality

Manual evaluation
A closer look at in-coverage examples:
– random selection of 500 in-coverage examples
– two independent judges indicated a preference for LFG or PHARAOH, or equality, in a blind test
– separate evaluation under the criteria of grammaticality/fluency and translational/semantic adequacy
– significance assessed by Approximate Randomization via stratified shuffling of preference ratings between systems

Judgment matrix (judge 1's rating vs. judge 2's):

             adequacy              grammaticality
  j1\j2      P    LFG   eq         P    LFG   eq
  P          48   8     7          36   2     9
  LFG        10   105   18         6    113   17
  equal      53   60    192        51   44    223

Result differences on agreed-on ratings are statistically significant at p < 0.0001
The net improvement in translational adequacy on agreed-on examples is 11.4% on 500 sentences (57/500), amounting to a 5% overall improvement in the hybrid system (44% of 11.4%)
The net improvement in grammaticality on agreed-on examples is 15.4% on 500 sentences, amounting to a 6.7% overall improvement in the hybrid system
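The net-improvement figures follow from the agreed-on cells (the diagonal) of the matrix: for adequacy, the judges agreed on an LFG preference 105 times and on a PHARAOH preference 48 times, so
$$ \frac{105 - 48}{500} = \frac{57}{500} = 11.4\%, \qquad 0.44 \times 11.4\% \approx 5\% $$
and for grammaticality
$$ \frac{113 - 36}{500} = \frac{77}{500} = 15.4\%, \qquad 0.44 \times 15.4\% \approx 6.8\% \ \text{(reported as 6.7\%)} $$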
Examples: LFG > PHARAOH
src: in diesem fall werde ich meine verantwortung wahrnehmen
ref: then i will exercise my responsibility
LFG: in this case i accept my responsibility
P: in this case i shall my responsibilities

src: die politische stabilität hängt ab von der besserung der lebensbedingungen
ref: political stability depends upon the improvement of living conditions
LFG: the political stability hinges on the recovery the conditions
P: the political stability is rejects the recovery of the living conditions

Examples: PHARAOH > LFG
src: das ist schon eine seltsame vorstellung von gleichheit
ref: a strange notion of equality
LFG: equality that is even a strange idea
P: this is already a strange idea of equality

src: frau präsidentin ich beglückwünsche herrn nicholson zu seinem ausgezeichneten bericht
ref: madam president I congratulate mr nicholson on his excellent report
LFG: madam president I congratulate mister nicholson on his report excellented
P: madam president I congratulate mr nicholson for his excellent report

Discussion
High percentage of out-of-coverage examples
– accumulation of 2 x 20% error rates in parsing the training data
– errors in rule extraction
– together these result in ill-formed transfer rules, causing a high number of generation failures/repairs
Propagation of errors through the system also for in-coverage examples
– error analysis: 69% transfer errors, 10% due to parse errors
Discrepancy between NIST and manual evaluation
– suboptimal integration of the generator, making training and translation with large n-best lists infeasible
– language and distortion models applied after generation

Conclusion
Integration of a grammar-based generator into a dependency-based SMT system achieves state-of-the-art NIST scores and improved grammaticality and adequacy on in-coverage examples
A hybrid system is possible, since it is determinable when sentences are in coverage of the system

Grammatical Machine Translation II
Ji Fang, Martin Forst, John Maxwell, and Michael Tepper

Overview of different approaches to MT (level of transfer / transfer mechanism / disambiguation)
– "Traditional" MT (e.g. Systran): string (with minimal analysis); mainly hand-developed rules; heuristics
– Statistical MT (e.g. Google): string (morphological analysis, syntactic rearrangements); phrase correspondences with statistics acquired from bitexts; machine-learned (transfer probabilities, LM)
– Grammatical MT I (2006): f-structure; term-rewriting rules with statistics, induced from parsed bitexts; machine-learned (ME models, LM)
– Context-Based MT (Meaningful Machines): string; semi-automatically developed phrase pairs; machine-learned (LM)
– Grammatical MT II (2008): f-structure; term-rewriting rules without statistics, induced from semi-automatically developed phrase pairs and potentially bitexts; machine-learned (ME models, LM)

Limitations of string-based approaches
– transfer rules/correspondences of little generality
– problems with long-distance dependencies
– perform less well for morphologically rich (target) languages
– n-gram-LM-based disambiguation seems to have leveled out

Limitations of string-based approaches – little generality
From Europarl: Das tut mir leid. = I'm sorry [about that].
Google (SMT): I'm sorry. Perfect!
But as soon as the input changes a bit, we get garbage:
– Das tut ihr leid. 'She is sorry about that.' → It does their suffering.
– Der Tod deines Vaters tut mir leid. 'I am sorry about the death of your father.' → The death of your father I am sorry.
– Der Tod deines Vaters tut ihnen leid. 'They are sorry about the death of your father.' → The death of your father is doing them sorry.

Limitations of string-based approaches – problems with LDDs
From Europarl: Dies stellt eine der großen Herausforderungen für die französische Präsidentschaft dar. = This is one of the major issues of the French Presidency.
Google (SMT): This is one of the major challenges for the French presidency represents.
The particle verb (stellt … dar) is identified and translated correctly, but the two verbs together are ungrammatical; they seem to be too far apart to be filtered out by the LM.

Limitations of string-based approaches – rich morphology
Language pairs involving morphologically rich languages, e.g. Finnish, are hard. (Figure from Koehn (2005, MT Summit).)
Morphologically rich, free word order languages, e.g. German, are particularly hard as target languages. (Again from Koehn (2005, MT Summit).)

Limitations of string-based approaches – n-gram LMs
Even for morphologically poor languages, improving n-gram LMs becomes increasingly expensive
Adding data helps improve translation quality (BLEU scores), but not enough
Assuming the best improvement rate observed in Brants et al. (2007), ~400 million times the available data would be needed to attain human translation quality by LM improvement alone
From Brants et al. (2007): the best improvement rate is +0.7 BLEU points per doubling of the training data; 40 more doublings would be needed to reach human translation quality:
– 42 + 0.7 × 40 ≈ 70 BLEU (human translation quality)
– necessary training data in tokens: 1e10 × 2^40 ≈ 1e22
– about 4e8 times the current English web (estimate): 2.5e13 × 4e8 = 1e22

Limitations of bitext-based approaches
Generally available bitexts are limited in size and specialized in genre
– parliament proceedings
– UN texts
– judiciary texts (from multilingual countries)
This makes it hard to repurpose bitext-based systems to new genres
Induced transfer rules/correspondences are often of mediocre quality
– "loose" translations
– bad alignments

Limitations of bitext-based approaches – availability and quality
Readily available bitexts are limited in size and specialized in genre
Approaches to auto-extracting bitexts from the web exist; the additional data help to some degree, but then the effect levels out
– still a genre bias in bitexts, despite automatic acquisition?
– still more general problems with alignment quality etc.?
Much more data would be needed to attain human translation quality; gains from adding bitext data are logarithmic at best
(Figure from Munteanu & Marcu (2005); legend: Base Line: 100K – 95M English words; Mid Line (+auto): +90K / 2.1M; Top Line (+oracle): +90K / 2.1M.)

Context-Based MT / Meaningful Machines
Combines example-based MT (EBMT) and SMT
Very large (target) language model; a large amount of monolingual text is required
No transfer statistics, thus no parallel text required
The translation lexicon is developed semi-automatically (i.e. hand-validated)
The lexicon has slotted phrase pairs (like EBMT), e.g. "NP1 biss ins Gras." = "NP1 bit the dust."

Context-Based MT / Meaningful Machines – pros
A high-quality translation lexicon seems to allow for
– easier repurposing of the system(s) to new genres
– better translation quality
(Figure from Carbonell (2006).)

Context-Based MT / Meaningful Machines – cons
Works really well for English-Spanish; how about other language pairs?
Same problems with n-gram LMs as "traditional" SMT; this probably affects pairs involving a morphologically rich (target) language particularly badly
How much manual labor is involved in the development of the translation lexicon?
Computationally expensive

Grammatical Machine Translation
Syntactic transfer-based approach (the classic analysis-transfer-generation pyramid)
Parsing and generation are identical/similar between GMT I and GMT II
F-structure transfer rules transfer and score target f-structures; string-level statistical methods apply on top

GMT I vs. GMT II
GMT I:
– transfer rules induced from parsed bitexts
– target f-structures ranked using individual transfer rule statistics
GMT II:
– transfer rules induced from a manually/semi-automatically constructed phrase lexicon
– target f-structures ranked using monolingually trained bilexical dependency statistics and general transfer rule statistics

GMT II
Where do the transfer rules come from?
– induced from manually/semi-automatically compiled phrase pairs with "slots"; potentially, but not necessarily, from bitexts
Where do statistics/machine learning come in?
– parsing: log-linear model trained on a syntactically annotated monolingual corpus
– transfer (transfer and scoring of target f-structures): log-linear model trained on bitext data; includes the score from the parse ranking model and very general transfer features
– generation (string-level statistical methods): log-linear model trained on bitext data; includes the scores from the other two models and the features/score of a monolingually trained model for realization ranking
GMT II – The phrase dictionary
Contains phrase pairs with "slot" categories (Ddeff, Ddef, NP1nom, NP1, etc.) that allow for well-formed phrases without being included in the induced rules
Currently hand-written
Will hopefully be compiled (semi-)automatically from bilingual dictionaries
Bitexts might also be used; how exactly remains to be defined

GMT II – Rule induction from the phrase dictionary
Sub-f-structures of "slot" variables are not included
F-structure attributes can be defined as irrelevant for translation, e.g. CASE (in both en and de) and GEND (in de); attributes so defined are never included in induced rules:
  set-gen-adds remove CASE GEND
F-structure attributes can be defined as "remove_equal_features"; attributes defined as such are not included in induced rules when they are equal, yielding more general rules:
  set remove_equal_features NUM OBJ OBL-AG PASSIVE SUBJ TENSE

GMT II – Rule induction from the phrase dictionary (noun)
Ddeff Verfassung = Ddef constitution

  PRED(%X1, Verfassung), NTYPE(%X1, %Z2), NSEM(%Z2, %Z3),
  COMMON(%Z3, count), NSYN(%Z2, common)
  ==>
  PRED(%X1, constitution), NTYPE(%X1, %Z4), NSYN(%Z4, common).

GMT II – Rule induction from the phrase dictionary (adjective)
europäische = European

  PRED(%X1, europäisch) ==> PRED(%X1, European).

To accommodate a certain non-parallelism with respect to the SUBJs of adjectives etc., a special mechanism removes the SUBJs of non-verbs and makes them addable in generation.

GMT II – Rule induction from the phrase dictionary (verb)
NP1nom koordiniert NP2acc. = NP1 coordinates NP2.

  PRED(%X1, koordinieren), arg(%X1, 1, %A2), arg(%X1, 2, %A3), VTYPE(%X1, main)
  ==>
  PRED(%X1, coordinate), arg(%X1, 1, %A2), arg(%X1, 2, %A3), VTYPE(%X1, main).

GMT II – Rule induction (argument switching)
NP1nom tut NP2dat leid. = NP2 is sorry about NP1.

  PRED(%X1, leid#tun), SUBJ(%X1, %A2), OBJ-TH(%X1, %A3), VTYPE(%X1, main)
  ==>
  PRED(%X1, be), SUBJ(%X1, %A3), XCOMP-PRED(%X1, %Z1),
  PRED(%Z1, sorry), OBL(%Z1, %Z2), PRED(%Z2, about), OBJ(%Z2, %A2),
  VTYPE(%X1, copular).

GMT II – Rule induction (head switching)
Ich versuche nur, mich jeder Demagogie zu enthalten. = It is just that I am trying not to indulge in demagoguery.
NP1nom Vfin nur. = It is just that NP1 Vs.

  +ADJUNCT(%X1,%Z2), in_set(%X3,%Z2), PRED(%X3,nur), ADV-TYPE(%X3,unspec)
  ==>
  PRED(%Z4,be), SUBJ(%Z4,%X3),
  NTYPE(%X3,%Z5), NSYN(%Z5,pronoun), GEND-SEM(%Z5,nonhuman), HUMAN(%Z5,),
  NUM(%Z5,sg), PERS(%Z5,3), PRON-FORM(%Z5,it), PRON-TYPE(%Z5,expl_),
  arg(%Z4,1,%Z6), PRED(%Z6, just), SUBJ(%Z6,%Z7), arg(%Z6,1,%A1),
  COMP-FORM(%A1,that), COMP(%Z6,%A1), nonarg(%Z6,1,%Z7),
  ATYPE(%Z6,predicative), DEGREE(%Z6, positive), nonarg(%Z4,1,%X3),
  TNS-ASP(%Z4,%Z8), MOOD(%Z8,indicative), TENSE(%Z8, pres),
  XCOMP-PRED(%Z4,%Z6), CLAUSE-TYPE(%Z4,decl), PASSIVE(%Z4,-), VTYPE(%A2,copular).

GMT II – Rule induction (more on head switching)
In addition to rewriting terms, the system re-attaches the rewritten f-structure if necessary; here, this might be the case for %X1. (The slide repeats the rule above with the relevant terms highlighted.)
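For a further noun pair, the induction procedure would yield a rule exactly parallel to the Verfassung rule above; a sketch for the hypothetical dictionary entry Ddeff Regierung = Ddef government:

  PRED(%X1, Regierung), NTYPE(%X1, %Z2), NSEM(%Z2, %Z3),
  COMMON(%Z3, count), NSYN(%Z2, common)
  ==>
  PRED(%X1, government), NTYPE(%X1, %Z4), NSYN(%Z4, common).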
GMT II – Pros and cons of rule induction from a phrase dictionary
– Development of phrase pairs can be carried out by someone with little knowledge of the grammar and the transfer system; manual development of transfer rules would require experts (for boring, repetitive labor).
– Phrase pairs can remain stable while the grammars keep evolving; since transfer rules are induced fully automatically, they can easily be kept in sync with the grammars.
– Induced rules are of much higher quality than rules induced from parsed bitexts (GMT I).
– However, although there is hope that phrase pairs can be constructed semi-automatically from bilingual dictionaries, it is not yet clear to what extent this can be automated.
– If rule induction from parsed bitexts can be improved, the two approaches might well be complementary.

Lessons learned for parallel grammar development
The absence of a feature like PERF=+/- is not equivalent to PERF=-.
F-structure-internal features should not say anything about the function of the f-structure
– example: PRON-TYPE=poss instead of PRON-TYPE=pers
Compounds should be analyzed similarly, whether spelt together (de) or apart (en)
– possible with SMOR; very hard or even impossible with DMOR
(Slide examples: absence of PERF vs. PERF=-; no function info in FS-internal features; the phrase pair I think NP1 Vs. = In my opinion NP1 Vs.; parallel analysis of compounds.)

More lessons learned for parallel grammar development
ParGram needs to agree on a parallel PRED value for (personal) pronouns; otherwise the number of rules we have to learn for them explodes:
– de-en: pro/er → he, pro/er → it, pro/sie → she, pro/sie → it, pro/es → it, pro/es → he, pro/es → she
– en-de: he → pro/er, she → pro/sie, it → pro/es, it → pro/er, it → pro/sie, …
– also, a PRED-NUM-PERS combination may make no sense; the result is a lot of generator effort for nothing
We need an "interlingua" for numbers, clock times, dates, etc.; we cannot possibly learn transfer rules for all dates
Guessers should analyze (composite) names similarly; we cannot possibly learn transfer rules for all proper names in this world

And yet more lessons learned for grammar development
Reflexive pronouns: PERS and NUM agreement should be ensured via inside-out function application, e.g. ((SUBJ ^) PERS) = (^ PERS)
If reflexive pronouns instead introduce their own values for PERS and NUM:
– overgeneration: *Ich wasche sich.
– NUM ambiguity for the (frequent) "sich"
– less generalization possible in transfer rules for inherently reflexive verbs: 6 rules necessary instead of 1
Semantically relevant features should not be hidden in CHECK
– sie = they; Sie = you (formal)
– since CHECK features are not used for translation, the distinction between "sie" and "Sie" is lost
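To illustrate the reflexive-pronoun point above in the rewrite notation of the GMT II rules (the feature geometry for the reflexive is illustrative, as is the verb pair sich erholen = recover): if "sich" carries its own PERS/NUM facts, a transfer rule for an inherently reflexive verb has to be repeated for each PERS/NUM combination, e.g.

  "one of six rules needed when the reflexive has its own PERS/NUM (sketch)"
  PRED(%X1, erholen), OBJ(%X1, %X2), PRON-TYPE(%X2, refl),
  PERS(%X2, 3), NUM(%X2, sg)
  ==> PRED(%X1, recover).

with five more variants for the remaining PERS/NUM combinations; with inside-out agreement, the PERS/NUM facts can be dropped from the rule and a single rule suffices.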
Planned experiments – motivation
We do not have the resources to develop a "general purpose" phrase dictionary in the short or medium term
Nevertheless, we want to get an idea of how well our new approach may scale

Planned experiments 1
– Manually develop a phrase dictionary for a few hundred Europarl sentences
– Train the target f-structure ranking model and the realization ranking model on those sentences
– Evaluate the output in terms of BLEU, NIST, and manually
Can we make this new idea work under ideal conditions? It seems we can.

Planned experiments 2
– Manually develop a phrase dictionary for a few hundred Europarl sentences
– Use a bilingual dictionary to add possible phrase pairs that may distract the system
– Train the target f-structure ranking model and the realization ranking model on those sentences
– Evaluate the output in terms of BLEU, NIST, and manually
How well can our system deal with the "distractors"?

Planned experiments 3
– Manually develop a phrase dictionary for a few hundred Europarl sentences
– Use a bilingual dictionary to add possible phrase pairs that may distract the system
– Degrade the phrase dictionary at various levels of severity
  – take out a certain percentage of phrase pairs
  – shorter phrases may be penalized less than longer ones
– Train the target f-structure ranking model and the realization ranking model on those sentences
– Evaluate the output in terms of BLEU, NIST, and manually
How good or bad is the output of the system when the bilingual phrase dictionary lacks coverage?

Main remaining challenges
– Get a comprehensive, high-quality dictionary of phrase pairs
– Get more and better (i.e. more normalized and parallel) analyses from the grammars
– Improve the ranking models, in particular on the source side
– Improve the generation behavior of the grammars; so far, grammar development has mostly been "parsing-oriented"
– Efficiency, in particular on the generation side, e.g. packed transfer and generation