Generation in the Context of MT
Final Report
8/22/2002
The Team
– Senior members & affiliate members
• Jan Hajič, Charles Univ., Prague
• Drago Radev, Univ. of Michigan
• Gerald Penn, Univ. of Toronto
• Jason Eisner, Johns Hopkins Univ.
• Owen Rambow, Univ. of Pennsylvania
• Dan Gildea, Univ. of Pennsylvania
• Bonnie Dorr, Univ. of Maryland
– Students:
• Yuan Ding, Univ. of Pennsylvania
• Terry Koo, MIT
• Jan Cuřín, Charles University
• Martin Čmejrek, Charles Univ., Prague
• Kristen Parton, Stanford Univ.
• Ivona Kučerová, Charles University
– Pre-workshop work (Charles University):
• Zdeněk Žabokrtský
• Václav Honetschläger
• Vladislav Kuboň
• Petr Pajas
• Alena Böhmová
• Jiří Havelka
The Goal
• Generate English (linear surface form)
– from syntactic-semantic sentence representation
(so-called “tectogrammatical”, or TR)
• Possible application setting:
– machine translation
– other uses:
• Front-end for QA systems, summarization
• Evaluate under various circumstances
Tectogrammatical Representation
[TR tree for:] “According to his opinion UAL’s executives were misinformed about the financing of the original transaction”

Tectogrammatical Representation
[The same TR tree with node lemmas:] “According to he opinion UAL’s executive were misinform about the financing of the original transaction”
TR in Machine Translation
[TR tree, including a NULL node, for the Czech source:] “Vedení UAL bylo podle jeho názoru o financování původní transakce nesprávně informováno.” (“According to his opinion, UAL’s management was misinformed about the financing of the original transaction.”)
The MT Framework
[Pyramid diagram of the WS’02 MT pipeline:]
CZECH source language text → morphology/tagging → lemmatized, POS → “surface” syntax → deep syntax → TR trees
→ transfer (tectogrammatics, TR) →
English TR trees → deep syntax to surface syntax → word order → punctuation → lemmatized, POS → morphology (gen.) → ENGLISH target language text

The MT Framework
[Diagram: TR trees and AR trees on the CZECH and ENGLISH sides of the transfer]
Tools and Data Resources
• Tools:
  • WS98 Czech parser + other Czech tools (tagger)
  • GIZA (WS99) + ISI decoder
• Data:
  • PTB (40k sentences)
  • PTB translation to Czech (11k sentences)
  • Prague Dependency Treebank 1.0 (90k sentences)
  • Prague Dependency Treebank 2.0 preliminary
    – 15k sentences manually annotated
  • Monolingual data
The Evaluation Metric: BLEU
• Plain English output (MT, Generation):
  – difficult and/or expensive to evaluate subjectively
• BLEU (IBM):
  – automatic method, score 0..1
  – relative scores ≈ subjective human evaluation
  – needs several reference “gold standards”
  – n-gram-based metric w/ small-length penalty
• Different “local” evaluations throughout, too
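For orientation, a rough sentence-level sketch of the BLEU idea (clipped n-gram precision against several references, geometric mean, brevity penalty), assuming already-tokenized input; the IBM metric is computed at corpus level, so this illustrates the idea rather than reproducing the workshop's evaluation script.

import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: clipped n-gram precision, geometric mean, brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # clip each candidate n-gram count by its maximum count in any reference
        max_ref = Counter()
        for ref in references:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        overlap = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)   # avoid log(0)
    # brevity penalty: compare to the reference closest in length
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

if __name__ == "__main__":
    hyp = "the executives were misinformed about the financing".split()
    refs = ["UAL 's executives were misinformed about the financing".split()]
    print(round(bleu(hyp, refs), 3))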
Presentation Outline
• The Systems and Their Inputs
  – Getting the data & tools ready
• The Statistical Generation System
  – The channel model
  – Word order, Punctuation, Morphology
• The Hybrid Approach
• Evaluation Results
• Student Project Proposals
• Conclusions and Future Directions
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
The Systems and Their Inputs
Martin Čmejrek
WS02GMT
System 1: statistical
System 2: hybrid
Output: English linear surface form
Input 1: automatically created English TR
Input 2: manually created English TR
Input 3: improved automatic English TR (PropBank)
Input 4: Czenglish TR (simple translation)
Input 1: Automatic English TR
Penn Treebank v. 3
+ heads (Jason Eisner’s code + modifications)
+ lemmatization
+ word IDs
+ rule-based transformation to English AR, TR
(by Kučerová & Žabokrtský)
→ English TR (I1), size: 40k sentences
Input 2: Manual English TR
Penn Treebank v. 3
Input 1
+ manual annotation (correction) (IK)
including:
deep word order, conversion of grammatical codes
→ English TR (I2), size: 1.5k sentences
Input 3: Enhanced Automatic English TR
Penn Treebank v. 3
Input 1
+ PropBank
+ additional sources
→ English TR (I3), size: 40k sentences
Input 4: Automatic Czenglish TR
Linear Surface Czech
+ Czech tagging & lemmatization
+ Parsed to Czech AR, Czech TR
+ [Simple] Transfer (Lemma translation)
- lexical replacement: dictionary collected from web, MRDs + trained on TR lemmas by GIZA
→ “Czenglish” TR (I4): 11k sentences
Dictionary Filtering
[Flowchart combining:]
• 4 Czech/English dictionary sources (WinGED, GNU/FDL, PCTrans, EuroWordNet)
• Czech POS / English POS information
• GIZA++ training on the Czech/English parallel Penn TreeBank corpus
• frequencies on an English monolingual corpus (North American News Text, 365 M words)
→ merging, pruning → Czech/English dictionary for transfer
Word-by-word translation of TR lemmas
- word-by-word dictionary: 42,835 entries, 65,408 translations
- format:
  <e>tečka<t>N
  <tr>spot<trt>N<prob>0.353598
  <tr>dot<trt>N<prob>0.28792
  <tr>full @stop<trt>N<prob>0.28729
- 1-1, 1-2 (2-1 translations not yet implemented)
- packed forest representation for multiple translation choices
- simplified version – choose the first best
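A minimal sketch of the first-best lexical replacement step; the entry layout mirrors the <e>/<t>/<tr>/<trt>/<prob> fields shown above, but the parsing code and data structures are illustrative, not the workshop implementation.

import re
from collections import defaultdict

HEAD = re.compile(r"<e>(.*?)<t>(\S+)")
ENTRY = re.compile(r"<tr>(.*?)<trt>(.*?)<prob>([\d.]+)")

def load_dictionary(lines):
    """Map (Czech lemma, POS) -> list of (English lemma, POS, prob), best first."""
    dictionary, key = defaultdict(list), None
    for line in lines:
        head = HEAD.match(line.strip())
        if head:
            key = (head.group(1), head.group(2))
            continue
        entry = ENTRY.match(line.strip())
        if entry and key is not None:
            dictionary[key].append((entry.group(1), entry.group(2), float(entry.group(3))))
    for translations in dictionary.values():
        translations.sort(key=lambda t: -t[2])   # highest probability first
    return dictionary

def translate_lemma(dictionary, lemma, pos):
    """Simplified version: choose the first-best translation, else keep the lemma."""
    candidates = dictionary.get((lemma, pos))
    return candidates[0][0] if candidates else lemma

sample = ["<e>tečka<t>N",
          "<tr>spot<trt>N<prob>0.353598",
          "<tr>dot<trt>N<prob>0.28792",
          "<tr>full @stop<trt>N<prob>0.28729"]
print(translate_lemma(load_dictionary(sample), "tečka", "N"))   # -> "spot"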
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer w/additional info → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
Automatically Annotating a
Tectogrammatical Corpus
Owen Rambow
Goal
• Use PropBank annotations to
– Improve automatic construction of English TRs
– Allow generation from “generic” pred-arg
structures
Types of Corpus Annotation
• Surface Syntax
• Deep Syntax
• Local Lexical Semantics
• Global Lexical Semantics
• Hybrid: Deep Syntactic/Global Semantic
  = Tectogrammatical level used here
Surface Syntax
E.g., Penn Treebank
[Surface dependency trees, with labels such as subj, obj, prepobj, comp, for “John loads hay into trucks” and “Hay is loaded into trucks by John”]

Deep Syntax
E.g., TAG
[One deep-syntactic structure (load: subj John, obj hay, obj2 truck) for both “John loads hay into trucks” and “Hay is loaded into trucks by John”]

Local Semantics
Penn PropBank (brand new)
[One predicate-argument structure (load: arg0 John, arg1 hay, arg2 truck) for both “John loads hay into trucks” and “John loads trucks with hay”]

Global Semantics
LCS (U. Md.)
[Thematic structures: load (agent John, theme hay, goal truck) for “John loads hay into trucks”, and throw (agent John, theme hay, goal truck) for “John throws hay into trucks”]
Tectogrammatical Representation
• First two syntactic arguments of verb: deep-syntactic
• All other arguments: global semantic
[TR trees, with arc labels act, pat, acmp, dir3, for “John loads trucks with hay”, “John loads hay into trucks”, and “John throws hay into trucks”]
Why Use TR? Research Hypothesis:
• Replacing function words by TR arc labels
makes transfer easier
• Choice of realization: target language-dependent
• Deep-syntactic labels for first two arguments:
realization more verb-specific
• Global semantic labels on remaining arguments:
realization just label-specific
Available Resources for Input 3
• Surface syntax: PTB corpus (hand, checked)
• Deep syntax: derived automatically from PTB
(Chen01)
• Local semantics: PropBank corpus and frame lexicon
(hand, checked)
• Global semantics: LCS lexicon (partially hand,
partially checked)
• TR: PTB subset corpus (hand), PropBank → TR dictionary (hand, not checked) (I. Kučerová)
Experiment: Machine Learning
of TR Labels Using Ripper
• Ripper (Cohen 1996) = greedy symbolic
rule learner, set- and bag-valued features
• Features:
– Surface, deep syntactic info
– Local, global semantic info
– Kučerová’s PropBank → TR dictionary (handcrafted)
– Input 1 (Automatic English TR)
Results (TR Label Error Rates)

Syntax \ Semantics        none    local   local+global   all     PB→TR dict
none                      58.8%   25.9%   23.7%          22.6%   37.7%
Input 1                   19.5%   17.7%   16.3%          15.9%   17.1%
surface+deep              16.5%   16.4%   17.1%          16.7%   16.2%
surface+deep+Inp1         15.5%   15.9%   16.2%          16.1%   14.4%

Average error rate over 5-fold cross-validation (1326 data points)
Conclusions
• Machine learning can improve on handwritten conversion rules (= Input 1)
• PropBank is useful
• Best results:
  – All syntactic features + PropBank → TR dictionary
• Future work: use PropBank → LCS dictionary (developed during workshop)
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
The MAGENTA System
• Statistically based
• The pipeline:
  – TR to AR by a channel model
  – Word order by reordering on dep. trees
  – Punctuation insertion
  – Morphology
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
The Tree-to-Tree Transductions
Jason Eisner
[Diagram: a source tree with nodes a, b, c, d, e, f mapped to a target tree with nodes A, B, C+D, E, F, plus prep and det nodes]
Translating trees
[Diagram: node-to-node mappings between the two trees, e.g. the pair (“inform”, “wrongly”) ↔ “misinform”]
• learn this 2:1 mapping (or in dictionary)
• Also 1:2, 2:0, 0:1 mappings, etc., & rearrangements ...
[Diagram: the aligned source and target trees, with the prep and det nodes arising from 0:1 mappings]
Statistical: Need a model of tree pairs
Mainly interested in (TR,AR) pairs
But our techniques are quite general
E.g., example below is not a (TR,AR) pair
[Tree pair: “the girl kissed her kitty cat” (Pred kissed; Subj girl; Det the; Obj cat; kitty; Det her) aligned with “the girl gave a kiss to her cat” (S gave; NP girl; Det the; NP kiss; Det a; PP to; NP cat; Det her)]
Training: Our team has many tree pairs
Should be nicer to model than string pairs - why we built them!
What Czech trees went with what English trees in training?
... Learn parameters θ of a joint model P(T1,T2).
[The same tree pair: “the girl kissed her kitty cat” / “the girl gave a kiss to her cat”]
Decoding: Complete a tree pair
Training: given T1 and T2, find θ to maximize P(T1,T2)
Decoding: given T1 and θ, find T2 to maximize P(T1,T2)
Horrible sparse data problem - can’t just do tree lookup.
[Diagram: the tree for “the girl kissed her kitty cat” paired with an unknown target tree “??”]
How should a model of tree pairs look?
Joint model P(T1,T2).
Wise to use noisy-channel form: P(T1 | T2) * P(T2)
But any joint model will do.
• P(T2) could be trained on zillions of individual English AR trees
• P(T1 | T2) is trained on paired trees; it could also take advantage of English-Czech dictionaries
How should a model Pθ(T1,T2) of tree pairs look?
Intuition: some kind of correspondence between words.
Try to learn correspondence using EM alignment (could seed with a dictionary).
[Diagrams: the tree pair “the girl kissed her kitty cat” / “the girl gave a kiss to her cat” with one candidate word alignment, and then with a different, bad alignment]
So the model must consider the alignment: Pθ(T1,T2,A)
Why A is complicated:
• The correspondence isn’t 1 to 1: kiss ↔ gave a kiss, kitty cat ↔ cat, ∅ ↔ to
• Also need to model word order (indeed topology)

Solution: use the right grammar formalism
Grammars can assemble words or phrases into trees.
Let’s work up to the “right” formalism.
Context-Free Grammar
“the girl kissed her cat”
[Phrase-structure tree: S → NP VP, NP → Det N, VP → V NP, etc.]

Augment CFG nonterminals with headwords
“the girl kissed her cat”
[The same tree with head-lexicalized nonterminals: S,kissed; NP,girl; VP,kissed; NP,cat; N,girl; N,cat; Det,the; Det,her]
[Callout: look at all the rules headed by kissed ...]

Lexicalized Tree Substitution Grammar
“the girl kissed her cat”
[Diagram: the elementary tree headed by kissed is a natural chunk; its open NP slots are roles waiting to be filled; its root can fill open roles higher up]
[Diagram: one “parse” of the tree into elementary subtrees for the, girl, kissed, her, cat]
Dependency-Style Lexicalized Tree Substitution Grammar
Simplify structure:
Eliminate extra internal nodes
Just one node per word (“dependency style”)
Yields the kind of AR and TR trees we actually have
[Diagram: the phrase-structure elementary tree for kissed collapsed into a dependency-style elementary tree Pred,kissed with Subj and Obj slots]

Dependency-Style Lexicalized Tree Substitution Grammar
“the girl kissed her kitty cat”
[Diagram: the dependency tree (Pred kissed; Subj girl; Det the; Obj cat; kitty; Det her) decomposed into dependency-style elementary little trees]
Synchronous Dependency-Style Lexicalized Tree Substitution Grammar
“the girl kissed her kitty cat” / “the girl gave a kiss to her cat”
[Diagram: the aligned tree pair]

Synchronous Dependency-Style Lexicalized Tree Substitution Grammar
“the girl kissed her kitty cat” / “the girl gave a kiss to her cat”
[Diagram: the tree pair decomposed into aligned little tree pairs, e.g. (Pred,kissed with Subj and Obj slots) ↔ (S,gave with NP, NP,kiss and PP slots), (Subj,girl with Det) ↔ (NP,girl with Det), (Det,the) ↔ (Det,the), (Obj,cat) ↔ (NP,cat), (Det,her) ↔ (Det,her)]
Condition generation of t1, t2 on their joint root nonterminals
P(T1, T2, A) = ∏ p(t1, t2, a | n)
So any aligned BIG TREE PAIR is built from a set of aligned LITTLE TREE PAIRS
[Diagram: the little tree pairs from the previous slide, each generated conditioned on its pair of root nonterminals n]
How This Simplifies Things
• Alignment: find A to max P(T1,T2,A)
• Decoding: find T2, A to max P(T1,T2,A)
• Training: find θ to max ΣA P(T1,T2,A)
• Do everything on little trees instead!
• Only need to train & decode a model of p(t1,t2,a)
• But not sure how to break up big tree correctly
  – So try all possible little trees & all ways of combining them, by dynamic prog.
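As a rough illustration of the factorization P(T1,T2,A) = ∏ p(t1,t2,a | n), the sketch below estimates p(t1,t2 | root) by relative frequency from already-decomposed aligned little-tree pairs and scores one fixed decomposition as a product of those probabilities; the real system sums over all possible decompositions with inside-outside dynamic programming, so this is an assumption-laden toy, not the workshop code.

import math
from collections import Counter, defaultdict

def train(decompositions):
    """decompositions: list of decompositions, each a list of (root, t1, t2) little-tree pairs."""
    pair_counts = defaultdict(Counter)
    for decomposition in decompositions:
        for root, t1, t2 in decomposition:
            pair_counts[root][(t1, t2)] += 1
    model = {}
    for root, counter in pair_counts.items():
        total = sum(counter.values())
        model[root] = {pair: count / total for pair, count in counter.items()}
    return model

def log_prob(model, decomposition):
    """log P(T1,T2,A) for one fixed decomposition into aligned little trees."""
    lp = 0.0
    for root, t1, t2 in decomposition:
        lp += math.log(model.get(root, {}).get((t1, t2), 1e-9))   # crude smoothing for unseen pairs
    return lp

training = [[("Pred", "kissed(Subj,Obj)", "gave(NP,NP:kiss,PP:to)"),
             ("Subj", "girl(Det)", "girl(Det)"),
             ("Det",  "the", "the")]]
model = train(training)
print(log_prob(model, training[0]))   # 0.0: every little-tree pair has probability 1 here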
System Architecture
[Diagram:]
• Probability Model p(t1,t2,a) of Little Trees: proposes little translations t2, scores little trees, and updates parameters θ to make p(...) big
• Decoder: for each possible t1 proposes various (t1,t2,a), finds p(...) for each, and scores all alignments between a big tree T1 & a forest of big trees T2; the Viterbi alignment yields the output T2
• Trainer: scores all alignments of two big trees T1,T2; inside-outside estimated counts p(t1,t2,a | T1,T2) are used to raise p(...)
• Decoder and Trainer share a dynamic programming engine
System Architecture
[Diagram: the probability model p(t1,t2,a) of little trees (propose little translations t2, score little trees, update parameters θ) sits on top of the Decoder and Trainer, which share a dynamic programming engine; the Decoder produces the output]
Related Work
• Synchronous grammars (Shieber & Schabes 1990)
– Statistical work has allowed only 1:1 (isomorphic trees)
• Stochastic inversion transduction grammars (Wu 1995)
• Head transducer grammars (Alshawi et al. 2000)
• Statistical tree translation
– Noisy channel model (Yamada & Knight 2000)
• Infers tree: trains on (string, tree) pair, not (tree, tree) pair
• But again, allows only 1:1, plus 1:0 at leaves
• Statistical tree generation - find most prob. expressing meaning
– Dynamic prog. search in packed forest (Langkilde 2000)
– Stack decoder (Ratnaparkhi 2000)
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
The Little Trees
Jan Hajič
[Example little tree pair: p( (Pred,kissed with Subj and Obj slots) , (S,gave with NP, NP,kiss, Det and PP slots) )]
Probability Model p(t1,t2,a) of Little Trees
[Diagram: propose little translations t2; score little trees; update parameters θ; p( (Pred,kissed with Subj and Obj slots) , (S,gave with NP, NP,kiss, Det and PP slots) )]
• Data still sparse, but better than for big trees
• No alignment needed - already hypothesized for us
Form of the model for 1:1 (AR:TR)
• Base form
  – p(cat,PL,PAT,cat,NNS,Obj,alignment)
• High-level backoff
  – p(cat,cat) * p(PL,NNS) * p(PAT,Obj) * p(alignment)
• Low-level backoff
  – p(align) * 1/(L*T*F), where
    • L = size of <Tlemma,Alemma>, etc.
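A hedged sketch of how such a backed-off estimate for 1:1 little-tree pairs could be wired up; for brevity it interpolates the joint, per-attribute and uniform estimates with illustrative weights rather than reproducing the exact backoff scheme above, and the vocabulary sizes L, T, F are assumed inputs.

from collections import Counter

class LittleTreePairModel:
    def __init__(self, pairs, lemma_vocab, tag_vocab, functor_vocab):
        # pairs: list of ((tr_lemma, tr_number, tr_functor), (ar_lemma, ar_pos, ar_afun))
        self.joint = Counter(pairs)
        self.total = len(pairs)
        self.parts = [Counter(), Counter(), Counter()]     # per-attribute pair counts
        for tr, ar in pairs:
            for i in range(3):
                self.parts[i][(tr[i], ar[i])] += 1
        self.uniform = 1.0 / (lemma_vocab * tag_vocab * functor_vocab)   # low-level backoff

    def prob(self, tr, ar, lam=(0.6, 0.3, 0.1)):
        """Interpolate the full joint estimate, the product of per-attribute estimates,
        and the uniform low-level estimate (lambdas are illustrative)."""
        p_joint = self.joint[(tr, ar)] / self.total if self.total else 0.0
        p_indep = 1.0
        for i in range(3):
            p_indep *= (self.parts[i][(tr[i], ar[i])] / self.total) if self.total else 0.0
        return lam[0] * p_joint + lam[1] * p_indep + lam[2] * self.uniform

pairs = [(("cat", "PL", "PAT"), ("cat", "NNS", "Obj"))] * 3
m = LittleTreePairModel(pairs, lemma_vocab=50000, tag_vocab=40, functor_vocab=60)
print(m.prob(("cat", "PL", "PAT"), ("cat", "NNS", "Obj")))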
Non-1:1 Correspondences
• Joint model
– 0:1
• p(to,TO,AuxY,alignment)k01
– 1:0
• p(&Gen;NULL,ACT,align)k10
– 1:2
• p(home,SG,LOC,in,IN,AuxP,home,NNS,Adv,alignment)k12
– etc.; + corresponding backoff scheme
Smoothing issues
• Other backoff schemes?
– Too many to do all
• Graphical models?
– Derive from (manual) alignments
• esp. for types of alignment the model cannot handle
(1:4, for example)
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
The Proposer
Yuan Ding
Map TR to AR
[Example: TR nodes (Suggest, Secretary, American) mapped to AR nodes with POS tags and inserted function words: Suggest <VBG>, Secretary <NNP>, an <T>, American <J>, of <IN>]
Proposer for Decoder
• Collecting feature patterns on TR
• Construct AR using observed possible TR→AR transforms
• For unobserved TR, use a naïve mapping onto AR
Proposer: Observes during Training
[Diagram: each observed TR-AR transform is stored in a TR-AR transform hash table, keyed by its TR feature pattern set]
Proposer: During Decoding
[Flowchart: for a filled TR (TR-AR pair), query the TR-AR transform hash table with the TR features; if the pattern is found, construct the AR using the observed transform; if not, construct the AR naively]
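A minimal sketch of that lookup-with-fallback logic; the feature pattern (root TR lemma plus the first child's functor, as in the example on the next slide), the node representation and the naive fallback are assumptions made for the illustration, not the workshop data structures.

from collections import defaultdict

class Proposer:
    def __init__(self):
        self.transforms = defaultdict(list)   # TR feature pattern -> observed AR proposals

    @staticmethod
    def pattern(tr_node):
        # illustrative pattern: root TR lemma + functor of the first child
        children = tr_node.get("children", [])
        return (tr_node["lemma"], children[0]["functor"] if children else None)

    def observe(self, tr_node, ar_subtree):
        self.transforms[self.pattern(tr_node)].append(ar_subtree)

    def propose(self, tr_node):
        proposals = self.transforms.get(self.pattern(tr_node))
        if proposals:
            return proposals                  # construct AR using observed transforms
        # naive fallback for unseen patterns: copy the lemma, guess a POS, keep children
        return [{"lemma": tr_node["lemma"], "pos": "NN",
                 "children": tr_node.get("children", [])}]

p = Proposer()
secretary = {"lemma": "Secretary", "children": [{"lemma": "State", "functor": "APP"}]}
p.observe(secretary, {"lemma": "Secretary", "pos": "NNP",
                      "children": [{"lemma": "of", "pos": "IN"},
                                   {"lemma": "State", "pos": "NNP"}]})
print(p.propose(secretary)[0]["children"][0]["lemma"])   # -> "of"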
Example
[Query features for the TR tree “Secretary (APP State)”: Root.Trlemma = Secretary, FirstChild.Functor = APP; look up in the TR-AR transform hash table → AR “Secretary of State”]
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH → Evaluation]
The Classifier(s)
Terry Koo
Tree Transduction Model
[Diagram: a prelabeled full TR tree supplies TR little trees to the Proposer, which passes TR + AR proposals to the tree transduction model’s decoder]
• Global information in labels → suppress proposals
Preposition Insertion Labeler
• C5.0 decision tree classifier
• Labels: nothing, insert_of, …
[Example: in the tree with Gov = talk and Self = Jane, the label insert_to on the Jane node yields the inserted preposition “to” between talk and Jane]
Preposition Insertion Labeler
• Trained on Input 1 (Automatic English TR)
[Feature diagram: TR lemma, TR functor and POS features collected from the node itself (Self), its governor (Gov), its left/right brothers (LBro, RBro) and its left/right sons (LSon, RSon)]
Preposition Insertion Labeler
• Some TR nodes should be ignored:
  “fly to Baltimore and from Boston”
[Tree diagrams: fly with a coordinating “and” node above “to Baltimore” and “from Boston”; the coordination node is skipped when collecting features]
Boosting Insertion Recall
• Overgenerating better than undergenerating
• Using C5.0’s misclassification costs to discourage “nothing”
• Training on preposition-only data
Boosting Insertion Recall
• N Best Labels
• Confidence Threshold
– N = Average of # Labels
• Aggressive Confidence Threshold
– N = Average of # Labels
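A hedged sketch of the N-best / confidence-threshold idea: keep every label the classifier is confident about, then pad with next-best labels up to N so that “nothing” rarely wins alone; the confidence values and the padding rule are illustrative stand-ins for C5.0’s output, not the workshop code.

def select_labels(label_confidences, n, threshold):
    """label_confidences: dict label -> confidence in [0,1]; returns up to n labels."""
    ranked = sorted(label_confidences.items(), key=lambda kv: -kv[1])
    selected = [label for label, conf in ranked if conf >= threshold]   # confidence threshold
    for label, _ in ranked:                                            # pad to N best
        if len(selected) >= n:
            break
        if label not in selected:
            selected.append(label)
    return selected[:n]

confidences = {"nothing": 0.55, "insert_of": 0.25, "insert_to": 0.15, "insert_in": 0.05}
print(select_labels(confidences, n=3, threshold=0.10))
# -> ['nothing', 'insert_of', 'insert_to']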
Insertion Recall vs N
[Plot of insertion recall against N for the N-best, confidence-threshold, and aggressive-confidence-threshold strategies; sample points: N = 5, R = 84.35%; N = 4, R = 80.59%; N = 3, R = 80.26%; N = 3, R = 76.39%]
What should be done next?
• Clustering TR Lemmas into a tractable
number of classes
• “Ripper” instead of C5.0
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
Word Order
Dan Gildea
Word order
• Tree-based models:
• Analytical level (surface dependency), tree-based
• Collins model
• Uses function information (Sb, Obj, Atr, ...), POS,
lemmas
• 94% of nodes have correct ordering of children
(chance: 68%)
• No punctuation (inserted later)
• Input order completely irrelevant
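A toy sketch of what tree-based ordering can look like: candidate orderings of a head and its dependents are scored with a simple bigram model over analytical functions and the best one is kept. The real system is a Collins-style model over functions, POS and lemmas, so the scoring function and the numbers here are purely illustrative.

from itertools import permutations

def order_local_tree(head, children, bigram_logprob):
    """head: (functor, lemma); children: list of (functor, lemma).
    Returns the best-scoring left-to-right ordering of the head and its dependents."""
    items = [head] + list(children)
    def score(order):
        seq = [functor for functor, _ in order]
        return sum(bigram_logprob(a, b) for a, b in zip(seq, seq[1:]))
    return max(permutations(items), key=score)

# toy bigram scores over analytical functions (illustrative numbers only)
def toy_bigram(a, b):
    table = {("Sb", "Pred"): -0.2, ("Pred", "Obj"): -0.3}
    return table.get((a, b), -2.0)

print(order_local_tree(("Pred", "kissed"),
                       [("Obj", "cat"), ("Sb", "girl")], toy_bigram))
# -> (('Sb', 'girl'), ('Pred', 'kissed'), ('Obj', 'cat'))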
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
Punctuation & Morphology
Kristen Parton
Punctuation Insertion: Motivation
• Important for sentence meaning, understanding
• BLEU - n-gram statistics
– commas are most frequent lemma in WSJ
• Focusing on commas (~95% of intra-sentence punctuation)
• Difficulties:
– English comma usage very flexible
– varies with style, meaning of sentence
– quotes not marked in TR trees
Why insert commas separately?
• Commas depend not only on underlying syntax/semantics but also on the surface realization of the sentence.
  • Soon, she will realize her mistake.
  • ? Soon she will realize her mistake.
  • She will soon realize her mistake.
  • ?? She will, soon, realize her mistake.
  • * She will soon, realize her mistake.
  • * She will, soon realize her mistake.
• Channel model deals with unordered trees
• Easier to do comma insertion after surface ordering
Commas in AR Trees
• TR tree - autosemantic
words - commas deleted
• AR tree - commas are
AuxX or Apos
(=apposition; governors)
• Input Data: ordered,
unpunctuated AR tree, with
AR and TR functors, POS
• Task: insert AuxX nodes
into AR tree, and link them
in correct surface order.
Comma Insertion Model
• C5.0 decision tree classifier
• Trained on English AR trees with TR functors (sect. 0-19 WSJ) with punctuation stripped
• Node labels: NO-ACTION, INSERT-RIGHT
• Feature vectors:
  • Local features (AFun, TFun, POS)
    – For node, left/right brother, parent, grandparent
  • Global features (Zhang 02) (position in sentence, …)
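A rough sketch of what those feature vectors could look like as input to a decision-tree learner; the field names, the sibling lookup and the two global features are assumptions made for the example, not the actual WS'02 feature set.

def node_features(node):
    """Local features of one AR node (analytical function, TR functor, POS)."""
    return {"afun": node.get("afun"), "tfun": node.get("tfun"), "pos": node.get("pos")}

def comma_features(nodes, i, parent_index):
    """nodes: surface-ordered AR node dicts; parent_index[i]: position of node i's parent (-1 = root).
    Returns one feature dict; the label to predict is NO-ACTION or INSERT-RIGHT."""
    p = parent_index[i]
    gp = parent_index[p] if p >= 0 else -1
    siblings = [j for j, pj in enumerate(parent_index) if pj == p]
    k = siblings.index(i)
    neighbours = {
        "self": nodes[i],
        "lbro": nodes[siblings[k - 1]] if k > 0 else {},
        "rbro": nodes[siblings[k + 1]] if k + 1 < len(siblings) else {},
        "parent": nodes[p] if p >= 0 else {},
        "grandparent": nodes[gp] if gp >= 0 else {},
    }
    feats = {}
    for role, node in neighbours.items():
        for name, value in node_features(node).items():
            feats[role + "_" + name] = value
    # global features in the spirit of Zhang et al. (2002)
    feats["position_in_sentence"] = i / max(len(nodes) - 1, 1)
    feats["sentence_length"] = len(nodes)
    return feats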
Decision Tree Model: Results

System                   Implementation                          Data          Baseline   Final Accuracy   Reduction in error rate
Beeferman et al (1998)   Trigram LM with gap insertion penalty   WSJ           32.90%     54.00%           31.45%
Zhang et al (2002)       Trigram LM with hidden tags             Tech manuals  56.44%     76.90%           46.97%
Zhang et al (2002)       Decision tree on Amalgam                Tech manuals  56.35%     74.94%           42.59%
Dependency tree*         Decision tree on AR tree                WSJ           41.61%     75.70%           58.38%

*Preliminary results - still based on hand-parsed WSJ
• Evaluation metric is sentence accuracy
  – What is the (human) upper bound?
• Systems are hard to compare; models and data sets very different
Results for Generation
• Comma insertion improves BLEU score
• Possible improvements:
– Adding n-gram information to insertion model
– Trying with other punctuation marks
Surface Morphology
• Morphology dictionary - 365 M words (Cuřín)
• morpha (morph analyzer) - lemmatize words, keep counts
• Word -> POS, surface_form, lemma, frequency
  – NN   want-you-babe   want-you-babe   1
  – VBD  wanted          want            45595
  – VBD  wanting         want            1
  – VBG  wanting         want            3708
• Task: Lemma + POS -> surface form = reverse lookup
  – Clashes resolved by frequency
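A minimal sketch of the reverse lookup, assuming counted (POS, surface form, lemma, frequency) entries like those above; the class and method names are made up for the illustration.

from collections import defaultdict, Counter

class MorphGenerator:
    def __init__(self):
        self.forms = defaultdict(Counter)      # (lemma, POS) -> surface form counts

    def add(self, pos, surface, lemma, count):
        self.forms[(lemma, pos)][surface] += count

    def generate(self, lemma, pos):
        candidates = self.forms.get((lemma, pos))
        if not candidates:
            return lemma                        # OOV: surface form = lemma
        return candidates.most_common(1)[0][0]  # clashes resolved by frequency

gen = MorphGenerator()
gen.add("VBD", "wanted", "want", 45595)
gen.add("VBD", "wanting", "want", 1)
gen.add("VBG", "wanting", "want", 3708)
print(gen.generate("want", "VBD"))   # -> "wanted"
print(gen.generate("dog", "NNS"))    # OOV -> "dog"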
Surface Morphology Results
• OOV rate: 1.69 %
– For many, surface form = lemma, so correct by default
– English morphology not complex; most OOV are proper nouns
• Non-OOV words: 99.74% accuracy
– 86% of mistakes were contractions: 'm ~ am, 've ~ have, etc.
(Actually correct.) Ignoring these: 99.96% correct rate
• Overall, ignoring contractions: error rate of 0.03%
• High accuracy, fast runtime, good coverage; unnecessary to improve further
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
Improving Czech Parsing (AR→TR)
Gerald Penn
Improving Czech TR Parsing
• Pre-workshop state:
– Czech Deep Syntax: mapping AR to TR
(Boehmova, Honetschlaeger, Zabokrtsky)
– Two parts of the system
• rule-based
– 19 transformations by order-dependent perl code
• statistical
– C4.5-based labeling of TR functions; 84% accuracy
Czech Deep Syntax:
mapping AR to TR
• New statistical system:
– tree transduction model has same form as for
generation
– little-tree model reversed for parsing
(AR to TR mapping)
– initial EM pass uses simple model based on
PDT (manual) node ID alignment
– (reversed) proposer not finished yet
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH]
The Hybrid Approach to
Generation: The ARGENT System
Dragomir Radev
Example
Target sentence:
Alan Spoon, recently named Newsweek president, said Newsweek’s ad rates would increase 5 pct in January.
[TR trees for the sentence, with functors PRED say, APPS, ACT Spoon, RSTR Alan, PAT increase, ACT name, TWHEN recently, PAT president, ACT &Gen;, ACT rate, APP Newsweek, EXT pct, RSTR ad, RSTR 5, TWHEN January, RSTR Newsweek]
Say , increase Spoon name rate pct January Alan recently president Newsweek ad 5 Newsweek
Alan Spoon, recently named Newsweek president, said Newsweek ’s ad rates would increase 5 pct in January.
NLG architectures
• Statistical approaches
  – MAGENTA [Hajič et al. 02]
• Rule-based approaches
  – FUF/Surge [Elhadad 93, Elhadad and Robin 98]
  – KPML [Bateman 97]
• Hybrid approaches
  – NitroGen [Knight and Hatzivassiloglou 95]
  – HaloGen [Langkilde and Knight 00]
  – Fergus [Bangalore and Rambow 00]
  – ARGENT
FUF/Surge
• What FUF can do (given sufficient control information)
  – Maps FUF-style thematic structure onto syntactic roles
  – Performs syntactic paraphrasing and alternations (e.g., dative move, passive)
  – Provides defaults for syntactic features (e.g., present tense, third person)
  – Propagates agreement features
  – Selects closed class words
  – Inflects words
  – Provides linear precedence constraints among syntactic constituents
• What FUF cannot do
  – convert dependency to phrase-structure
  – provide control for syntactic paraphrasing
  – provide control for lexical features (conditionals, past tense, …)
  – choose determiners
  – provide a robust grammar
(setq r '((process ((lex "say")
(tense past)
(object-clause that)))
(circum ((time ((cat pp)
(prep ((lex "in")))
(np ((lex "January")
(determiner none)))))))
(partic ((affected ((cat clause)
(process ((lex "increase")
(tense past)))
(partic ((created ((cat measure)
(quantity ((value 5)))
(unit ((lex "pct")))))
(agent ((cat np)
(head ((lex "rate")
(number plural)
(determiner none)))
(classifier ((lex "ad")
(determiner none)))
(possessor ((lex "Newsweek")
(determiner none)))))))))
(agent ((complex apposition)
(punctuation ((after ",")))
(distinct ~(((lex "Spoon")
(classifier ((lex "Alan")))
(determiner none))
((lex "name")
(classifier ((lex "president")))
(determiner none))))))))))
[Diagrams: the TR tree for the example sentence (functors PRED say, APPS, ACT, RSTR, PAT, TWHEN, APP, EXT) and the corresponding FUF functional description (PROCESS “say”, past tense, object-clause that; PARTIC with AGENT, AFFECTED, CREATED; CIRCUM with the time PP “in January”, determiner none)]
Grammar development
• translating TG → FUF (deterministic channel)
– write high coverage rules first
– problem: no aligned training data
– four types of rules [Langkilde-Geary 02] - recasting,
ordering, filling, morphing
• Three modules
– Top-level
– Recursion
– Bottom-level
Evaluation
• Robustness
– ARGENT: 245/248 sentences = 98.7%
– HaloGen: 80%
• Speed
– ARGENT: 1.4-2.9 sec/sentence
– HaloGen: 28.9-55.5 sec/sentence
• BLEU score -- later
Future work
• Complete grammar
– improve coverage
– use other grammatemes
• degree of comparison (comparative) , sentmod (interrogative),
verbmod (imperative)
• Better error recovery
– inconsistent PTB markup, TR transformation, translation
• Grammar induction
• N-gram based insertion of missing words
• Integrate with MAGENTA
Where are we?
[Pipeline: CZECH → Deep syntax (Czech) → Transfer → English TR to AR → Word Order → Punctuation → Morphology → ENGLISH → Evaluation]
The Implemented Systems:
Creating Data for Generation
✓ Czech Tagger & Parser (WS98, pre-WS02)
✓ Czech-English Transfer (WS99, pre-WS02)
✓ New Statistical Czech Parser to TR
✓ Input3: Improved English TR for training
The Generation Systems
✓ Aligner and Decoder
✓ Little Tree Joint Model
✓ Proposer
✓ Preposition Classifier
✓ Word Order by Tree LM
✓ Comma insertion
✓ Morphology
✓ The Hybrid System (TR to FUF translation)
Evaluation
• Evaluation data for BLEU (1-4grams)
  – devtest/evaltest: 248/249 sentences, 5 ref. translations
• Inputs
  – 1: Automatic English TR
  – 2: Manual English TR
  – 3: Enhanced Automatic English TR
  – 4: Automatic Czenglish TR
• Systems: Statistical, Hybrid (FUF-based)
Upper estimate
• 5 reference translations
– 1 original WSJ text from PTB
– 4 retranslations from Czech to English
• 2 US, 2 Czech
• Evaluate the translations:
– take one out
– evaluate against remaining 4
– Average BLEU score: 0.556
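A small sketch of that leave-one-out computation, assuming a corpus-level bleu(candidates, reference_lists) function (the sentence-level sketch given earlier would need to be aggregated over the corpus first); it simply holds out each of the 5 reference translations in turn and averages the scores.

def upper_estimate(reference_sets, bleu):
    """reference_sets: per sentence, a list of 5 reference token lists.
    Scores each reference against the remaining four and averages."""
    scores = []
    n_refs = len(reference_sets[0])
    for held_out in range(n_refs):
        candidates = [refs[held_out] for refs in reference_sets]
        remaining = [[r for j, r in enumerate(refs) if j != held_out]
                     for refs in reference_sets]
        scores.append(bleu(candidates, remaining))
    return sum(scores) / n_refs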
Results
• Input 1 (Automatic English TR)

BLEU score                    Devtest   Evaltest
Baseline (Lemmas, rand.)      0.040     0.042
MAGENTA (best)                0.253     0.244
ARGENT (best)                 0.155     0.145

Results
• Input 2 (Manual English TR)

BLEU score                    Devtest   Evaltest
Baseline (Lemmas, Deep WO)    0.190     0.160
MAGENTA (best)                0.237     0.237
ARGENT (best)

Results
• Input 3 (Improved Auto English TR)

BLEU score                    Devtest   Evaltest
Baseline (Lemmas, rand.)      0.041
MAGENTA                       0.243
ARGENT (best)

Results
• Input 4 (“Czenglish” Automatic TR)

BLEU score                    Devtest   Evaltest   Evaltest 1g
GIZA (large LM)               0.210     0.190      0.623
Baseline (Lemmas, Czech WO)   0.082     0.059      0.481
MAGENTA (best)                0.064     0.042      0.601
ARGENT (best)                 0.015

Unigram BLEU score for the reference set: 0.844
Conclusions and Future Work
The Good News and the Bad News
• Good news:
– End-to-end, tree-transformation system running
• Written in 4 weeks, fully trainable from data
• Generates from “semantic” (TR) English significantly
better than the baseline
– Datasets developed for generation/MT, evaluation
• Bad news:
– not fully integrated (proposer, little tree model)
– on full MT, cannot beat baseline (and yes, GIZA)
Things To Do (1)
• Integrate the proposer
• Integrate the preposition classifier
• Write more classifiers, integrate
– classifiers running in parallel/sequentially?
• True EM smoothing (by adaptation of aligner)
• Make the system more modular
– e.g., declarative specification of smoothing
Things To Do (2)
• The aligner/decoder:
– Pruning during aligning/decoding
– Better smoothing of the little tree model
– More dependence among little trees
• through shared nonterminals or lexicalized nonterminals
– Little-tree joint model → noisy channel model
• i.e., integrate Gildea’s tree LM “directly”
– Better initial model for EM
• ML training off manual alignments
• Nondeterministic transfer
Things To Do (3)
• Make use of TR’s deep (discourse) word order
• More experiments
• with new smoothing, integrated proposer
• different order of modules
• other punctuation: classifier or inside the model?
• Different settings/applications:
• AR to TR (parsing)
• AR to AR (surface translation)
• TR to TR (translation); other languages
The End
The beginning!