Generation

Linguistics 187/287 Week 6
Generation
Term-rewrite System
Machine Translation
Martin Forst, Ron Kaplan,
and Tracy King
Generation




Parsing: string to analysis
Generation: analysis to string
What type of input?
How to generate
Why generate?

Machine translation
Lang1 string -> Lang1 fstr -> Lang2 fstr -> Lang2 string

Sentence condensation
Long string -> fstr -> smaller fstr -> new string


Question answering
Production of NL reports
– State of machine or process
– Explanation of logical deduction

Grammar debugging
F-structures as input




Use f-structures as input to the generator
May parse sentences that shouldn’t be
generated
May want to constrain number of generated
options
Input f-structure may be underspecified
XLE generator


Use the same grammar for parsing and
generation
Advantages
– maintainability
– write rules and lexicons once

But
– special generation tokenizer
– different OT ranking
Generation tokenizer/morphology

White space
– Parsing: multiple white space becomes a single TB
John appears. -> John TB appears TB . TB
– Generation: single TB becomes a single space (or nothing)
John TB appears TB . TB -> John appears.
*John appears .

Suppress variant forms
– Parse both favor and favour
– Generate only one
Morphconfig for parsing & generation
STANDARD ENGLISH MORPHOLOGY (1.0)
TOKENIZE:
P!eng.tok.parse.fst G!eng.tok.gen.fst
ANALYZE:
eng.infl-morph.fst G!amerbritfilter.fst
G!amergen.fst
----
Reversing the parsing grammar



The parsing grammar can be used directly as
a generator
Adapt the grammar with a special OT ranking
GENOPTIMALITYORDER
Why do this?
– parse ungrammatical input
– have too many options
Ungrammatical input

Linguistically ungrammatical
– They walks.
– They ate banana.

Stylistically ungrammatical
– No ending punctuation: They appear
– Superfluous commas: John, and Mary appear.
– Shallow markup: [NP John and Mary] appear.
Too many options



All the generated options can be linguistically
valid, but too many for applications
Occurs when more than one string has the
same, legitimate f-structure
PP placement:
– In the morning I left.
I left in the morning.
Using the Gen OT ranking

Generally much simpler than in the parsing
direction
– Usually only use standard marks and NOGOOD
no * marks, no STOPPOINT
– Can have a few marks that are shared by several
constructions
one or two for dispreferred
one or two for preferred
Example: Prefer initial PP
S --> (PP: @ADJUNCT @(OT-MARK GenGood))
NP: @SUBJ;
VP.
VP --> V
(NP: @OBJ)
(PP: @ADJUNCT).
GENOPTIMALITYORDER NOGOOD +GenGood.
parse: they appear in the morning.
generate: without OT: In the morning they appear.
They appear in the morning.
with OT: In the morning they appear.
Debugging the generator


When generating from an f-structure
produced by the same grammar, XLE should
always generate
Unless:
– OT marks block the only possible string
– something is wrong with the tokenizer/morphology
regenerate-morphemes: if this gets a string
the tokenizer/morphology is not the problem

Hard to debug: XLE has robustness features
to help
Underspecified Input

F-structures provided by applications are not
perfect
– may be missing features
– may have extra features
– may simply not match the grammar coverage

Missing and extra features are often
systematic
– specify in XLE which features can be added and
deleted

Not matching the grammar is a more serious
problem
Adding features

English to French translation:
– English nouns have no gender
– French nouns need gender
– Soln: have XLE add gender
the French morphology will control the value

Specify additions in xlerc:
– set-gen-adds add "GEND"
– can add multiple features:
set-gen-adds add "GEND CASE PCASE"
– XLE will optionally insert the feature
Note: Unconstrained additions make generation undecidable
Example
The cat sleeps. -> Le chat dort.
[ PRED 'dormir<SUBJ>'
SUBJ [ PRED 'chat'
NUM sg
SPEC def ]
TENSE present ]
[ PRED 'dormir<SUBJ>'
SUBJ [ PRED 'chat'
NUM sg
GEND masc
SPEC def ]
TENSE present ]
Deleting features

French to English translation
– delete the GEND feature

Specify deletions in xlerc
– set-gen-adds remove "GEND"
– can remove multiple features
set-gen-adds remove "GEND CASE PCASE"
– XLE obligatorily removes the features
no GEND feature will remain in the f-structure
– if a feature takes an f-structure value, that f-structure is also removed
Changing values

If values of a feature do not match between
the input f-structure and the grammar:
– delete the feature and then add it

Example: case assignment in translation
– set-gen-adds remove "CASE"
set-gen-adds add "CASE"
– allows dative case in input to become accusative
e.g., exceptional case marking verb in input
language but regular case in output language
Generation for Debugging

Checking for grammar and lexicon errors
– create-generator english.lfg
– reports ill-formed rules, templates, feature
declarations, lexical entries

Checking for ill-formed sentences that can be
parsed
– parse a sentence
– see if all the results are legitimate strings
– regenerate “they appear.”
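A rough sketch of how these checks can be strung together in an xlerc session (the grammar file name is the one used above; the exact invocation of regenerate-morphemes is an assumption):
create-generator english.lfg      ;# reports ill-formed rules, templates, feature declarations, lexical entries
regenerate "they appear."         ;# parses the string, then generates from the resulting f-structure
regenerate-morphemes              ;# if this produces a string, the tokenizer/morphology is not the problem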
Rewriting/Transfer
System
Why a Rewrite System


Grammars produce c-/f-structure output
Applications may need to manipulate this
– Remove features
– Rearrange features
– Continue linguistic analysis (semantics,
knowledge representation – next week)

XLE has a general purpose rewrite system
(aka "transfer" or "xfr" system)
Sample Uses of Rewrite System




Sentence condensation
Machine translation
Mapping to logic for knowledge
representation and reasoning
Tutoring systems
What does the system do?


Input: set of "facts"
Apply a set of ordered rules to the facts
– this gradually changes the set of input facts

Output: new set of facts

Rewrite system uses the same ambiguity
management as XLE
– can efficiently rewrite packed structures,
maintaining the packing
Example F-structure Facts
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)





F-structures get var(#)
Special arg facts
lex_id for each PRED
Facts have two arguments (except arg)
Rewrite system allows for any number
of arguments
Rule format




Obligatory rule: LHS ==> RHS.
Optional rule: LHS ?=> RHS.
Unresourced fact: |- clause.
LHS
clause : match and delete
+clause : match and keep
-LHS : negation (don't have fact)
LHS, LHS : conjunction
( LHS | LHS ) : disjunction
{ ProcedureCall } : procedural attachment

RHS
clause : replacement facts
0 : empty set of replacement facts
stop : abandon the analysis
Example rules
"PRS (1.0)"
grammar = toy_rules.
"obligatorily add a determiner
if there is a noun with no spec"
+NTYPE(%F,%%), -SPEC(%F,%%)
==>
SPEC(%F,def).
"optionally make plural nouns singular
this will split the choice space"
NUM(%F, pl) ?=> NUM(%F, sg).
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)
Example Obligatory Rule
"obligatorily add a determiner
if there is a noun with no spec"
+NTYPE(%F,%%), -SPEC(%F,%%)
==>
SPEC(%F,def).
Output facts:
all the input facts
plus:
SPEC(var(1),def)
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)
Example Optional Rule
"optionally make plural nouns singular
this will split the choice space"
NUM(%F, pl) ?=> NUM(%F, sg).
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SPEC(var(1),def)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)
Output facts:
all the input facts
plus choice split:
A1: NUM(var(1),pl)
A2: NUM(var(1),sg)
Output of example rules


Output is a packed f-structure
Generation gives two sets of strings
– The girls {laugh.|laugh!|laugh}
– The girl {laughs.|laughs!|laughs}
Manipulating sets

Sets are represented with an in_set feature
– He laughs in the park with the telescope
ADJUNCT(var(0),var(2))
in_set(var(4),var(2))
in_set(var(5),var(2))
PRED(var(4),in)
PRED(var(5),with)

Might want to optionally remove
adjuncts
– but not negation
Example Adjunct Deletion Rules
"optionally remove member of adjunct set"
+ADJUNCT(%%, %AdjSet), in_set(%Adj, %AdjSet),
-PRED(%Adj, not)
?=> 0.
"obligatorily remove adjunct with nothing in it"
ADJUNCT(%%, %Adj), -in_set(%%,%Adj)
==> 0.
He laughs with the telescope in the park.
He laughs in the park with the telescope
He laughs with the telescope.
He laughs in the park.
He laughs.
Manipulating PREDs

Changing the value of a PRED is easy
– PRED(%F,girl) ==> PRED(%F,boy).

Changing the argument structure is trickier
– Make any changes to the grammatical functions
– Make the arg facts correlate with these
Example Passive Rule
"make actives passive
make the subject NULL; make the object the subject;
put in features"
SUBJ( %Verb, %Subj), arg( %Verb, %Num, %Subj),
OBJ( %Verb, %Obj), CASE( %Obj, acc)
==>
SUBJ( %Verb, %Obj), arg( %Verb, %Num, NULL), CASE( %Obj, nom),
PASSIVE( %Verb, +), VFORM( %Verb, pass).
the girls saw the monkeys ==>
The monkeys were seen.
in the park the girls saw the monkeys ==>
In the park the monkeys were seen.
Templates and Macros

Rules can be encoded as templates
n2n(%Eng,%Frn) ::
PRED(%F,%Eng), +NTYPE(%F,%%)
==> PRED(%F,%Frn).
@n2n(man, homme).
@n2n(woman, femme).

Macros encode groups of clauses/facts
sg_noun(%F) :=
+NTYPE(%F,%%), +NUM(%F,sg).
@sg_noun(%F), -SPEC(%F)
==> SPEC(%F,def).
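To make the shorthand concrete, the template and macro invocations above should expand roughly as follows (a sketch based on the rule syntax given earlier, not actual XLE output):
"@n2n(man, homme) expands to"
PRED(%F,man), +NTYPE(%F,%%)
==> PRED(%F,homme).
"@sg_noun(%F), -SPEC(%F) ==> SPEC(%F,def). expands to"
+NTYPE(%F,%%), +NUM(%F,sg), -SPEC(%F)
==> SPEC(%F,def).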
Unresourced Facts

Facts can be stipulated in the rules and referred to
– Often used as a lexicon of information not
encoded in the f-structure

For example, list of days and months for
manipulation of dates
|- day(Monday). |- day(Tuesday).
etc.
|- month(January). |- month(February). etc.
+PRED(%F,%Pred), ( day(%Pred) | month(%Pred) ) ==> …
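The right-hand side of this rule is left open on the slide; purely as an illustration (the DATE-WORD feature is invented for this sketch), a completed version might simply mark such predicates for later date-handling rules:
"illustrative completion only; not the original rule"
+PRED(%F,%Pred), ( day(%Pred) | month(%Pred) )
==> DATE-WORD(%F,+).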
Rule Ordering

Rewrite rules are ordered (unlike LFG syntax
rules but like finite-state rules)
– Output of rule1 is input to rule2
– Output of rule2 is input to rule3

This allows for feeding and bleeding
– Feeding: insert facts used by later rules
– Bleeding: remove facts needed by later rules

Can make debugging challenging
Example of Rule Feeding

Early Rule: Insert SPEC on nouns
+NTYPE(%F,%%), -SPEC(%F,%%) ==>
SPEC(%F, def).

Later Rule: Allow plural nouns to become singular only if they have a specifier (to avoid bad count nouns)
NUM(%F,pl), +SPEC(%F,%%) ==> NUM(%F,sg).
Example of Rule Bleeding

Early Rule: Turn actives into passives
(simplified)
SUBJ(%F,%S), OBJ(%F,%O) ==>
SUBJ(%F,%O), PASSIVE(%F,+).

Later Rule: Impersonalize actives
SUBJ(%F,%%), -PASSIVE(%F,+) ==>
SUBJ(%F,%S), PRED(%S,they), PERS(%S,3), NUM(%S,pl).
– will apply to intransitives and verbs with
(X)COMPs but not transitives
Debugging

XLE command line: tdbg
– steps through rules stating how they apply
============================================
Rule 1: +(NTYPE(%F,A)), -(SPEC(%F,B))
==>SPEC(%F,def)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 4-10
Rule 1 matches: [+(2)] NTYPE(var(1),common) 1
--> SPEC(var(1),def)
============================================
Rule 2: NUM(%F,pl)
?=>NUM(%F,sg)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 11-17
girls laughed
Rule 2 matches: [3] NUM(var(1),pl) 1
--> NUM(var(1),sg)
============================================
Rule 5: SUBJ(%Verb,%Subj), arg(%Verb,%Num,%Subj), OBJ(%Verb,%Obj),
CASE(%Obj,acc)
==>SUBJ(%Verb,%Obj), arg(%Verb,%Num,NULL), CASE(%Obj,nom),
PASSIVE(%Verb,+), VFORM(%Verb,pass)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 28-37
Rule does not apply
Running the Rewrite System



create-transfer : adds menu items
load-transfer-rules FILE : loads rules from file
f-str window under commands has:
– transfer : prints output of rules in XLE window
– translate : runs output through generator

Need to do (where path is $XLEPATH/lib):
setenv LD_LIBRARY_PATH
/afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib
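A minimal xlerc sketch tying these commands together (the rule file name is a placeholder; once create-transfer has run, transfer and translate appear on the f-structure window's commands menu):
create-transfer                    ;# adds the transfer menu items
load-transfer-rules toy_rules.pl   ;# load the rewrite rules from a file
parse "girls laughed."             ;# then run transfer or translate from the f-structure window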
Rewrite Summary

The XLE rewrite system lets you manipulate
the output of parsing
– Creates versions of output suitable for applications
– Can involve significant reprocessing


Rules are ordered
Ambiguity management is as with parsing
Grammatical Machine Translation
Stefan Riezler & John Maxwell
Translation System
Source (German) -> XLE Parsing with German LFG -> F-structures -> Transfer (translation rules + lots of statistics) -> F-structures -> XLE Generation with English LFG -> Target (English)
Transfer-Rule Induction from
aligned bilingual corpora
1. Use standard techniques to find many-to-many candidate
word-alignments in source-target sentence-pairs
2. Parse source and target sentences using LFG grammars
for German and English
3. Select most similar f-structures in source and target
4. Define many-to-many correspondences between
substructures of f-structures based on many-to-many word
alignment
5. Extract primitive transfer rules directly from aligned f-structure units
6. Create powerset of possible combinations of basic rules
and filter according to contiguity and type matching
constraints
Induction
Example sentences: Dafür bin ich zutiefst dankbar.
I have a deep appreciation for that.
Many-to-many word alignment:
Dafür{6 7} bin{2} ich{1} zutiefst{3 4 5} dankbar{5}
F-structure alignment:
Extracting Primitive Transfer Rules



Rule (1) maps lexical predicates
Rule (2) maps lexical predicates and interprets the subj-to-subj link as an
indication to map the subj of the source with this predicate into the subject of the target
and the xcomp of the source into the object of the target
%X1, %X2, %X3, … are variables for f-structures
(1) PRED(%X1, ich)
==>
PRED(%X1, I)
(2) PRED(%X1, sein),
SUBJ(%X1,%X2),
XCOMP(%X1,%X3)
==>
PRED(%X1, have),
SUBJ(%X1,%X2),
OBJ(%X1,%X3)
Extracting Complex Transfer Rules

Complex rules are created by taking all
combinations of primitive rules, and filtering
(4) zutiefst dankbar sein
==>
have a deep appreciation
(5) zutiefst dankbar dafür sein
==>
have a deep appreciation for that
(6) ich bin zutiefst dankbar dafür
==>
I have a deep appreciation for that
Transfer Contiguity constraint

Transfer contiguity constraint:
1. Source and target f-structures each have to be connected
2. F-structures in the transfer source can only be aligned with
f-structures in the transfer target, and vice versa



Analogous to constraint on contiguous and
alignment-consistent phrases in phrase-based SMT
Prevents extraction of rule that would translate
dankbar directly into appreciation since appreciation
is aligned also to zutiefst
Transfer contiguity allows learning idioms like es
gibt - there is from configurations that are local in f-structure
but non-local in string, e.g., es scheint […] zu geben - there seems […] to be
Linguistic Filters on Transfer Rules


Morphological stemming of PRED values
(Optional) filtering of f-structure snippets based on
consistency of linguistic categories
– Extraction of snippet that translates zutiefst dankbar into a
deep appreciation maps incompatible categories adjectival
and nominal; valid in string-based world
– Translation of sein to have might be discarded because of
adjectival vs. nominal types of their arguments
– Larger rule mapping zutiefst dankbar sein to have a deep
appreciation is ok since verbal types match
Transfer

Parallel application of transfer rules in nondeterministic fashion
– Unlike XLE ordered-rule rewrite system




Each fact must be transferred by exactly one rule
Default rule transfers any fact as itself
Transfer works on chart using parser’s unification
mechanism for consistency checking
Selection of most probable transfer output is done by
beam-decoding on transfer chart
Generation



Bi-directionality allows us to use same grammar for
parsing training data and for generation in translation
application
Generator has to be fault-tolerant in cases where
transfer-system operates on FRAGMENT parse or
produces non-valid f-structures from valid input f-structures
Robust generation from unknown (e.g., untranslated)
predicates and from unknown f-structures
Robust Generation

Generation from unknown predicates:
– Unknown German word “Hunde” is analyzed by German
grammar to extract stem (e.g., PRED = Hund, NUM = pl)
and then inflected using English default morphology
(“Hunds”)

Generation from unknown constructions:
– Default grammar that allows any attribute to be generated in
any order is mixed in as a suboptimal option in the standard English
grammar, e.g. if SUBJ cannot be generated as a sentence-initial NP, it will be generated in any position as any category
» extension/combination of set-gen-adds and OT ranking
Statistical Models
1. Log-probability of source-to-target transfer rules, where the
probability r(e|f) of a rule that transfers source snippet f into
target snippet e is estimated by relative frequency:

   r(e \mid f) = \frac{\mathrm{count}(f \rightarrow e)}{\sum_{e'} \mathrm{count}(f \rightarrow e')}

2. Log-probability of target-to-source transfer rules, estimated by
relative frequency
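A purely illustrative calculation (the counts are invented, not from the slides): if a source snippet f occurs in four extracted rule instances and is rewritten as target snippet e in three of them, then (natural log)

   r(e \mid f) = \frac{3}{3+1} = 0.75, \qquad \log r(e \mid f) \approx -0.29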
Statistical Models, cont.
3. Log-probability of lexical translations l(e|f) from source to
target snippets, estimated from Viterbi alignments a* between source
word positions i = 1,…,n and target word positions j = 1,…,m for stems
f_i and e_j in snippets f and e, with relative word translation
frequencies t(e_j|f_i):

   l(e \mid f) = \prod_{j=1}^{m} \frac{1}{|\{i \mid (i,j) \in a^*\}|} \sum_{(i,j) \in a^*} t(e_j \mid f_i)

4. Log-probability of lexical translations from target to source
snippets
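Again purely for illustration (invented alignment and word-translation values): for a two-word target snippet e = e1 e2 whose Viterbi alignment links e1 to f1 and e2 to both f1 and f2, with t(e1|f1) = 0.5, t(e2|f1) = 0.2 and t(e2|f2) = 0.6,

   l(e \mid f) = \frac{0.5}{1} \cdot \frac{0.2 + 0.6}{2} = 0.5 \cdot 0.4 = 0.2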
Statistical Model, cont.
5. Number of transfer rules
6. Number of transfer rules with frequency 1
7. Number of default transfer rules
8. Log-probability of strings of predicates from root to frontier of
target f-structure, estimated from predicate trigrams in English
f-structures
9. Number of predicates in target f-structure
10. Number of constituent movements during generation based on
original order of head predicates of the constituents
Statistical Models, cont.
11. Number of generation repairs
12. Log-probability of target string as computed by trigram language
model
13. Number of words in target string
Experimental Evaluation

Experimental setup
– German-to-English on Europarl parallel corpus (Koehn ‘02)
– Training and evaluation on sentences of length 5-15, for quick
experimental turnaround
– Resulting in training set of 163,141 sentences, development
set of 1,967 sentences, test of 1,755 sentences (used in
Koehn et al. HLT’03)
– Improved bidirectional word alignment based on GIZA++ (Och
et al. EMNLP’99)
– LFG grammars for German and English (Butt et al.
COLING’02; Riezler et al. ACL’02)
– SRI trigram language model (Stolcke ’02)
– Comparison with PHARAOH (Koehn et al. HLT’03) and IBM
Model 4 as produced by GIZA++ (Och et al. EMNLP’99)
Experimental Evaluation, cont.



Around 700,000 transfer rules extracted from f-structures chosen by dependency similarity measure
System operates on n-best lists of parses (n=1),
transferred f-structures (n=10), and generated strings
(n=1,000)
Selection of most probable translations in two steps:
– Most probable f-structure by beam search (n=20) on transfer
chart using features 1-10
– Most probable string selected from strings generated from
selected n-best f-structures using features 11-13

Feature weights for modules trained by MER on 750
in-coverage sentences of development set
Automatic Evaluation
              M4       LFG      P
in-coverage   5.13     *5.82    *5.99
full test set *5.57    *5.62    6.40
NIST scores (ignoring punctuation) & Approximate
Randomization for significance testing (see above)
44% in-coverage of grammars; 51% FRAGMENT
parses and/or generation repair; 5% timeouts
– In-coverage: Difference between LFG and P not significant
– Suboptimal robustness techniques decrease overall quality
Manual Evaluation

Closer look at in-coverage examples:
– Random selection of 500 in-coverage examples
– Two independent judges indicated preference for
LFG or PHARAOH, or equality, in blind test
– Separate evaluation under criteria of
grammaticality/fluency and translational/semantic
adequacy
– Significance assessed by Approximate
Randomization via stratified shuffling of
preference ratings between systems
Manual Evaluation
              adequacy                grammaticality
j1\j2         P     LFG    eq         P     LFG    eq
P             48    8      7          36    2      9
LFG           10    105    18         6     113    17
equal         53    60     192        51    44     223
Result differences on agreed-on ratings are statistically
significant at p < 0.0001
Net improvement in translational adequacy on agreed-on
examples is 11.4% on 500 sentences (57/500), amounting to
5% overall improvement in hybrid system (44% of 11.4%)
Net improvement in grammaticality on agreed-on examples is
15.4% on 500 sentences, amounting to 6.7% overall
improvement in hybrid system
Examples: LFG > PHARAOH
src: in diesem fall werde ich meine verantwortung wahrnehmen
ref: then i will exercise my responsibility
LFG: in this case i accept my responsibility
P: in this case i shall my responsibilities
src: die politische stabilität hängt ab von der besserung der
lebensbedingungen
ref: political stability depends upon the improvement of living conditions
LFG: the political stability hinges on the recovery the conditions
P: the political stability is rejects the recovery of the living conditions
Examples: PHARAOH > LFG
src: das ist schon eine seltsame vorstellung von gleichheit
ref: a strange notion of equality
LFG: equality that is even a strange idea
P: this is already a strange idea of equality
src: frau präsidentin ich beglückwünsche herrn nicholson zu seinem
ausgezeichneten bericht
ref: madam president I congratulate mr nicholson on his excellent
report
LFG: madam president I congratulate mister nicholson on his report
excellented
P: madam president I congratulate mr nicholson for his excellent report
Discussion

High percentage of out-of-coverage examples
– Accumulation of 2 x 20% error-rates in parsing training data
– Errors in rule extraction
– Together result in ill-formed transfer rules causing high
number of generation failures/repairs

Propagation of errors through the system also for in-coverage examples
– Error analysis: 69% transfer errors, 10% due to parse errors

Discrepancy between NIST and manual evaluation
– Suboptimal integration of generator, making training and
translation with large n-best lists infeasible
– Language and distortion models applied after generation
Conclusion


Integration of grammar-based generator into
dependency-based SMT system achieves state-of-the-art NIST and improved grammaticality and
adequacy on in-coverage examples
Possibility of hybrid system since it is determinable
when sentences are in coverage of system
Grammatical Machine
Translation II
Ji Fang, Martin Forst,
John Maxwell, and
Michael Tepper
Overview of different approaches to MT

“Traditional” MT (e.g. Systran)
– Level of transfer: String (with minimal analysis)
– Transfer: Mainly hand-developed rules
– Disambiguation: Heuristics

Statistical MT (e.g. Google)
– Level of transfer: String (morphological analysis, synt. rearrangements)
– Transfer: Phrase correspondences with statistics acquired on bitexts
– Disambiguation: Machine-Learned (transfer probabilities, LM)

Grammatical MT I (2006)
– Level of transfer: F-structure
– Transfer: Term-rewriting rules with statistics induced from parsed bitexts
– Disambiguation: Machine-Learned (ME models, LM)

Context-Based MT (Meaningful Machines)
– Level of transfer: String
– Transfer: Semi-automatically developed phrase pairs
– Disambiguation: Machine-Learned (LM)

Grammatical MT II (2008)
– Level of transfer: F-structure
– Transfer: Term-rewriting rules without statistics, induced from semi-automatically developed phrase pairs, potentially bitexts
– Disambiguation: Machine-Learned (ME models, LM)
Limitations of string-based
approaches




Transfer rules/correspondences of little
generality
Problems with long-distance dependencies
Perform less well for morphologically rich
(target) languages
N-gram LM-based disambiguation seems to
have leveled out
Limitations of string-based
approaches - little generality
From Europarl: Das tut mir leid. = I’m sorry [about that].
Google (SMT): I’m sorry. Perfect!
But: As soon as input changes a bit, we get garbage.
– Das tut ihr leid. ‘She is sorry about that.’ → It does their suffering.
– Der Tod deines Vaters tut mir leid. ‘I am sorry about the death of your father.’ → The death of your father I am sorry.
– Der Tod deines Vaters tut ihnen leid. ‘They are sorry about the death of your father.’ → The death of your father is doing them sorry.
Limitations of string-based
approaches - problems with LDDs
From Europarl: Dies stellt eine der großen
Herausforderungen für die französische
Präsidentschaft dar . =
This is one of the major issues of the French
Presidency .

Google (SMT): This is one of the major challenges for
the French presidency represents.

Particle verb is identified and translated correctly
But: two verbs → ungrammatical; seem to be too far
apart to be filtered by LM
Limitations of string-based
approaches - rich morphology

Language pairs involving morphologically rich
languages, e.g., Finnish, are hard
From Koehn (2005, MT Summit)
Limitations of string-based
approaches - rich morphology

Morphologically rich,
free word order
languages, e.g.
German, are
particularly hard as
target languages.
Again from Koehn
(2005, MT Summit)
Limitations of string-based
approaches - n-gram LMs



Even for morphologically poor languages,
improving n-gram LMs becomes increasingly
expensive.
Adding data helps improve translation quality
(BLEU scores), but not enough.
Assuming best improvement rate observed
in Brants et al. (2007), ~400 million times
available data needed to attain human
translation quality by LM improvement.
Limitations of string-based
approaches - n-gram LMs




From Brants et al. (2007)
– Best improvement rate: +0.7 BP per doubling of data (+0.7 BP/x2)
– Would need 40 more doublings to obtain human translation quality (42 + 0.7*40 ≈ 70)
– Necessary training data in tokens: 1e22 (1e10*2^40 ≈ 1e22)
– 4e8 times current English Web (estimate) (2.5e13*4e8 = 1e22)
Limitations of bitext-based
approaches

Generally available bitexts are limited in size and
specialized in genre
– Parliament proceedings
– UN texts
– Judiciary texts (from multilingual countries)
⇒ Makes it hard to repurpose bitext-based systems
to new genres

Induced transfer rules/correspondences often of
mediocre quality
– “Loose” translations
– Bad alignments
Limitations of bitext-based
approaches - availability and quality



Readily available bitexts are limited in size
and specialized in genre
Approaches to auto-extracting bitexts from
the web exist.
Additional data help to some degree, but
then effect levels out.
– Still a genre bias in bitexts, despite automatic
acquisition?
– Still more general problems with alignment quality
etc.?
Limitations of bitext-based
approaches - availability and quality


Much more data needed to attain human
translation quality
Logarithmic gains (at best) by adding bitext
data
From Munteanu & Marcu (2005)
– Base Line: 100K - 95M English Words
– Mid Line (+auto): + 90K - 2.1M
– Top Line (+oracle): + 90K - 2.1M
Context-Based MT /
Meaningful Machines





Combines example-based MT (EBMT) and
SMT
Very large (target) language model, large
amount of monolingual text required
No transfer statistics, thus no parallel text
required
Translation lexicon is developed semi-automatically (i.e. hand-validated)
Lexicon has slotted phrase pairs (like EBMT), i.e. “NP1 biss ins Gras.” = “NP1 bit the dust.”
Context-Based MT /
Meaningful Machines - pros

High-quality translation lexicon seems to
allow for
– Easier repurposing of system(s) to new genres
– Better translation quality
From Carbonell (2006)
Context-Based MT /
Meaningful Machines - cons




Works really well for English-Spanish. How
about other language pairs?
Same problems with n-gram LMs as
“traditional” SMT; probably affects pairs
involving morphologically rich (target)
language particularly badly.
How much manual labor involved in
development of translation lexicon?
Computationally expensive
Grammatical Machine Translation


Syntactic transfer-based approach
Parsing and generation identical/similar between GMT I and GMT II
Transfer pyramid:
– F-structure transfer rules (transfer, score target FSs)
– String-level statistical methods
Grammatical Machine Translation
GMT I vs. GMT II
GMT I
– Transfer rules induced from parsed bitexts
– Target f-structures ranked using individual transfer rule statistics

GMT II
– Transfer rules induced from manually/semi-automatically constructed phrase lexicon
– Target f-structures ranked using monolingually trained bilexical dependency statistics and general transfer rule statistics
GMT II


Where do the transfer rules come from?
– Induced from manually/semi-automatically compiled phrase pairs with ``slots''; potentially, but not necessarily, from bitexts
Where do statistics/machine learning come in?
– Log-linear model trained on synt. annotated monolingual corpus
– F-structure transfer rules (transfer, score target FSs): log-linear model trained on bitext data; includes score from parse ranking model and very general transfer features
– String-level statistical methods: log-linear model trained on bitext data; includes scores from other two models and features/score of monolingually trained model for realization ranking
GMT II - The phrase dictionary




Contains phrase pairs with ``slot’’ categories
(Ddeff, Ddef, NP1nom, NP1, etc.) that allow
for well-formed phrases without being
included in induced rules
Currently hand-written
Will hopefully be compiled (semi-)automatically from bilingual dictionaries
Bitexts might also be used; how exactly
remains to be defined.
GMT II - Rule induction from the
phrase dictionary



Sub-FSs of “slot” variables are not included
FS attributes can be defined as irrelevant for
translation, e.g. CASE (in both en and de), GEND (in
de). Attributes so defined are never included in
induced rules.
set-gen-adds remove CASE GEND
FS attributes can be defined as
“remove_equal_features”. Attributes defined as such
are not included in induced rules when they are
equal.
set remove_equal_features NUM OBJ
OBL-AG PASSIVE SUBJ TENSE
⇒ more general rules
GMT II - Rule induction from the
phrase dictionary (noun)

Ddeff Verfassung = Ddef constitution

PRED(%X1, Verfassung),
NTYPE(%X1, %Z2),
NSEM(%Z2, %Z3),
COMMON(%Z3, count),
NSYN(%Z2, common)
==>
PRED(%X1, constitution),
NTYPE(%X1, %Z4),
NSYN(%Z4, common).
GMT II - Rule induction from the
phrase dictionary (adjective)

europäische = European

PRED(%X1, europäisch)
==>
PRED(%X1, European).

To accommodate certain non-parallelism with respect to SUBJs
of adjectives etc., a special mechanism removes SUBJs of non-verbs and makes them addable in generation.
GMT II - Rule induction from the
phrase dictionary (verb)

NP1nom koordiniert NP2acc. =
NP1 coordinates NP2.

PRED(%X1, koordinieren),
arg(%X1, 1, %A2),
arg(%X1, 2, %A3),
VTYPE(%X1, main)
==>
PRED(%X1, coordinate),
arg(%X1, 1, %A2),
arg(%X1, 2, %A3),
VTYPE(%X1, main).
GMT II - Rule induction
(argument switching)

NP1nom tut NP2dat leid. =
NP2 is sorry about NP1.

PRED(%X1, leid#tun),
SUBJ(%X1, %A2),
OBJ-TH(%X1, %A3),
VTYPE(%X1, main)
==>
PRED(%X1,be),
SUBJ(%X1,%A3),
XCOMP-PRED(%X1,%Z1),
PRED(%Z1, sorry),
OBL(%Z1,%Z2),
PRED(%Z2,about),
OBJ(%Z2,%A2),
VTYPE(%X1,copular).
GMT II - Rule induction
(head switching)

Ich versuche nur, mich jeder Demagogie zu enthalten. =
It is just that I am trying not to indulge in demagoguery.
NP1nom Vfin nur. = It is just that NP1 Vs.

+ADJUNCT(%X1,%Z2), in_set(%X3,%Z2), PRED(%X3,nur),
ADV-TYPE(%X3,unspec)
==>
PRED(%Z4,be), SUBJ(%Z4,%X3), NTYPE(%X3,%Z5),
NSYN(%Z5,pronoun), GEND-SEM(%Z5,nonhuman), HUMAN(%Z5,-),
NUM(%Z5,sg), PERS(%Z5,3), PRON-FORM(%Z5,it),
PRON-TYPE(%Z5,expl_), arg(%Z4,1,%Z6), PRED(%Z6, just),
SUBJ(%Z6,%Z7), arg(%Z6,1,%A1), COMP-FORM(%A1,that),
COMP(%Z6,%A1), nonarg(%Z6,1,%Z7), ATYPE(%Z6,predicative),
DEGREE(%Z6, positive), nonarg(%Z4,1,%X3),
TNS-ASP(%Z4,%Z8), MOOD(%Z8,indicative), TENSE(%Z8,pres),
XCOMP-PRED(%Z4,%Z6), CLAUSE-TYPE(%Z4,decl),
PASSIVE(%Z4,-), VTYPE(%A2,copular).
GMT II - Rule induction
(more on head switching)

In addition to rewriting terms, the system re-attaches the
rewritten FS if necessary. Here, this might be the
case for %X1.

+ADJUNCT(%X1,%Z2), in_set(%X3,%Z2), PRED(%X3,nur),
ADV-TYPE(%X3,unspec)
==>
PRED(%Z4,be), SUBJ(%Z4,%X3), NTYPE(%X3,%Z5),
NSYN(%Z5,pronoun), GEND-SEM(%Z5,nonhuman), HUMAN(%Z5,-),
NUM(%Z5,sg), PERS(%Z5,3), PRON-FORM(%Z5,it),
PRON-TYPE(%Z5,expl_), arg(%Z4,1,%Z6), PRED(%Z6, just),
SUBJ(%Z6,%Z7), arg(%Z6,1,%A1), COMP-FORM(%A1,that),
COMP(%Z6,%A1), nonarg(%Z6,1,%Z7), ATYPE(%Z6,predicative),
DEGREE(%Z6, positive), nonarg(%Z4,1,%X3),
TNS-ASP(%Z4,%Z8), MOOD(%Z8,indicative), TENSE(%Z8,pres),
XCOMP-PRED(%Z4,%Z6), CLAUSE-TYPE(%Z4,decl),
PASSIVE(%Z4,-), VTYPE(%A2,copular).
GMT II - Pros and cons of rule
induction from a phrase dictionary

Development of phrase pairs can be carried out by someone
with little knowledge of grammar and transfer system; manual
development of transfer rules would require experts (for boring,
repetitive labor).

Phrase pairs can remain stable while grammars keep evolving.
Since transfer rules are induced fully automatically, they can
easily be kept in sync with grammars.

Induced rules are of much higher quality than rules induced from
parsed bitexts (GMT I).

Although there is hope that phrase pairs can be constructed
semi-automatically from bilingual dictionaries, it is not yet clear
to what extent this can be automated.

If rule induction from parsed bitexts can be improved, the two
approaches might well be complementary.
Lessons Learned for Parallel
Grammar Development


Absence of a feature like PERF=+/- is not
equivalent to PERF=-.
FS-internal features should not say anything
about the function of the FS
– Example: PRON-TYPE=poss instead of PRON-TYPE=pers

Compounds should be analyzed similarly,
whether spelt together (de) or apart (en)
– Possible with SMOR
– Very hard or even impossible with DMOR
Absence of PERF ≠ PERF=-
No function info in FS-internal
features

I think NP1 Vs. = In my opinion NP1 Vs.
Parallel analysis of compounds
More Lessons Learned for Parallel
Grammar Development

ParGram needs to agree on a parallel PRED
value for (personal) pronouns

We need an “interlingua” for numbers, clock
times, dates etc.

Guessers should analyze (composite) names
similarly
Parallel PRED values for
(personal) pronouns

Otherwise the number of rules we have to learn for
them explodes.
de-en: pro/er → he, pro/er → it, pro/sie → she, pro/sie → it,
pro/es → it, pro/es → he, pro/es → she
Also: PRED-NUM-PERS combination may make no
sense!!! Result: A lot of generator effort for nothing…
en-de: he → pro/er, she → pro/sie, it → pro/es, it → pro/er,
it → pro/sie, …
Interlingua for numbers, clock
times, dates, etc.

We cannot possibly
learn transfer rules
for all dates.
Guessed (composite) names
We cannot possibly
learn transfer rules
for all proper names
in this world.
And Yet More Lessons Learned for
Grammar Development

Reflexive pronouns - PERS and NUM
agreement should be ensured via inside-out
function application, e.g. ((SUBJ ^) PERS) = (^ PERS).

Semantically relevant features should not be
hidden in CHECK
Reflexive pronouns

Introduce their own values for PERS and
NUM
– Overgeneration: *Ich wasche sich.
– NUM ambiguity for (frequent) “sich”
– Less generalization possible in transfer rules for
inherently reflexive verbs - 6 rules necessary
instead of 1.
Reflexive pronouns
Semantically relevant features in
CHECK


sie = they
Sie = you (formal)
Since CHECK features are
not used for
translations, the
distinction between “sie”
and “Sie” is lost.
Planned experiments - Motivation


We do not have the resources to develop a
“general purpose” phrase dictionary in the
short or medium term.
Nevertheless, we want to get an idea about
how well our new approach may scale.
Planned Experiments 1



Manually develop phrase dictionary for a few
hundred Europarl sentences
Train target FS ranking model and realization
ranking model on those sentences
Evaluate output in terms of BLEU, NIST and
manually
Can we make this new idea work under ideal
conditions? It seems we can.
Planned Experiments 2

Manually develop phrase dictionary for a few hundred Europarl
sentences

Use bilingual dictionary to add possible phrase
pairs that may distract the system

Train target FS ranking model and realization ranking model on
those sentences
Evaluate output in terms of BLEU, NIST and manually

How well can our system deal with the “distractors”?
Planned Experiments 3





Manually develop phrase dictionary for a few hundred Europarl
sentences
Use bilingual dictionary to add possible phrase pairs that may
distract the system
Degrade the phrase dictionary at various levels of
severity
– Take out a certain percentage of phrase pairs
– Shorter phrases may be penalized less than longer ones
Train target FS ranking model and realization ranking model on
those sentences
Evaluate output in terms of BLEU, NIST and manually
How good or bad is the output of the system when the
bilingual phrase dictionary lacks coverage?
Main Remaining Challenges





Get comprehensive and high-quality dictionary of
phrase pairs
Get more and better (i.e. more normalized and
parallel) analyses from grammars
Improve ranking models, in particular on source side
Improve generation behavior of grammars - So far,
grammar development has mostly been “parsing-oriented”.
Efficiency, in particular on the generation side, i.a.
packed transfer and generation