Learning Accurate, Compact, and
Interpretable Tree Annotation
Slav Petrov, Leon Barrett, Romain Thibaux,
Dan Klein
The Game of Designing a Grammar
 Annotation refines base treebank symbols to
improve statistical fit of the grammar
 Parent annotation [Johnson ’98]
 Head lexicalization [Collins ’99, Charniak ’00]
 Automatic clustering?
Previous Work:
Manual Annotation
[Klein & Manning ’03]
 Manually split categories
 NP: subject vs object
 DT: determiners vs demonstratives
 IN: sentential vs prepositional
 Advantages:
 Fairly compact grammar
 Linguistic motivations
 Disadvantages:
 Performance leveled out
 Manually annotated

Model                    F1
Naïve Treebank Grammar   72.6
Klein & Manning ’03      86.3
Previous Work: Automatic Annotation Induction
[Matsuzaki et al. ’05, Prescher ’05]
 Advantages:
 Automatically learned: label all nodes with latent variables;
same number k of subcategories for all categories
 Disadvantages:
 Grammar gets too large
 Most categories are oversplit while others are undersplit

Model                  F1
Klein & Manning ’03    86.3
Matsuzaki et al. ’05   86.7
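To see why the grammar gets too large, here is a minimal sketch (hypothetical function name) of how uniform splitting multiplies the rule set: a single binary treebank rule expands into k³ refined rules when every category gets k latent subcategories.

```python
from itertools import product

def split_binary_rule(parent, left, right, k):
    """Enumerate all refined variants of a binary rule parent -> left right
    when every category is split into k latent subcategories."""
    return [(f"{parent}-{i}", f"{left}-{j}", f"{right}-{m}")
            for i, j, m in product(range(k), repeat=3)]

# One treebank rule with k = 8 subcategories per category:
refined = split_binary_rule("NP", "DT", "NN", k=8)
print(len(refined))  # 512 refined rules from a single original rule
```

With thousands of treebank rules, this uniform blow-up is what makes the flat k-split grammars large.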
Previous work is complementary

Manual Annotation               Automatic Annotation
Allocates splits where needed   Splits uniformly
Very tedious                    Automatically learned
Compact Grammar                 Large Grammar
Misses Features                 Captures many features

This Work: combines the advantages of both
Learning Latent Annotations
EM algorithm:
 Brackets are known
 Base categories are known
 Only induce subcategories
[Figure: binary parse tree over "He was right" with latent subcategory variables X1–X7 at each node, and forward/backward passes over the tree.]
Just like Forward-Backward for HMMs.
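Since the brackets and base categories are observed, the E-step is just sum-product over the latent subcategories of a fixed tree, directly analogous to Forward-Backward. Below is a minimal sketch under simplifying assumptions (binary trees, dense NumPy rule tables, an unsplit ROOT, hypothetical names); it computes inside (backward) and outside (forward) scores and the posterior over each node's subcategory.

```python
import numpy as np

class Node:
    def __init__(self, category, children=(), word=None):
        self.category, self.children, self.word = category, children, word

def inside(node, lex_probs, rule_probs):
    """Backward (inside) pass: inside[x] = P(words below node | node has subcategory x)."""
    if node.word is not None:                      # preterminal: P(word | tag subcategory)
        node.inside = lex_probs[node.category][node.word]               # shape (k,)
    else:
        left, right = node.children
        inside(left, lex_probs, rule_probs)
        inside(right, lex_probs, rule_probs)
        R = rule_probs[(node.category, left.category, right.category)]  # shape (k, k, k)
        node.inside = np.einsum('xyz,y,z->x', R, left.inside, right.inside)
    return node.inside

def outside(node, rule_probs, is_root=True):
    """Forward (outside) pass: outside[x] = P(rest of the tree, node has subcategory x)."""
    if is_root:                                    # assumption: the ROOT symbol is never split
        node.outside = np.zeros_like(node.inside)
        node.outside[0] = 1.0
    if node.word is not None:
        return
    left, right = node.children
    R = rule_probs[(node.category, left.category, right.category)]
    left.outside  = np.einsum('xyz,x,z->y', R, node.outside, right.inside)
    right.outside = np.einsum('xyz,x,y->z', R, node.outside, left.inside)
    outside(left, rule_probs, is_root=False)
    outside(right, rule_probs, is_root=False)

def subcategory_posterior(node):
    """Posterior over the node's latent subcategory, computed from inside * outside."""
    p = node.inside * node.outside
    return p / p.sum()
```

The M-step then re-estimates the refined rule probabilities by normalizing expected rule counts accumulated from the same inside/outside quantities.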
Overview
[Chart: parsing accuracy (F1) vs. total number of grammar symbols for k = 1, 2, 4, 8, 16 subcategories per category; accuracy rises with k, from roughly 63 F1 at k=1 toward 90 at k=16, while the grammar size approaches the limit of computational resources.]
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Refinement of the DT tag
[Figure: the DT tag split into subcategories DT-1 … DT-4.]

Hierarchical refinement of the DT tag
[Figure: the DT tag refined hierarchically by repeated binary splits.]
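Hierarchical refinement can be sketched as a split-then-re-estimate loop (hypothetical names; `run_em` stands in for EM over the treebank, e.g. the inside/outside sketch above): each round splits every subcategory in two, divides its probability mass between the new child pairs, and adds a little noise so EM can tell the halves apart.

```python
import numpy as np

def split_in_two(rule_probs, noise=0.01, rng=np.random.default_rng(0)):
    """Split every latent subcategory in two, copying its parameters and
    perturbing them slightly to break the symmetry between the two halves."""
    new = {}
    for rule, table in rule_probs.items():                 # table shape (k, k, k)
        # duplicate along parent and both child axes; /4 splits the mass
        # over the 2x2 new child pairs (EM renormalizes afterwards)
        doubled = np.repeat(np.repeat(np.repeat(table, 2, 0), 2, 1), 2, 2) / 4.0
        doubled *= 1.0 + noise * rng.standard_normal(doubled.shape)
        new[rule] = doubled
    return new

def hierarchical_training(rule_probs, run_em, rounds=4):
    """Alternate splitting with EM re-estimation: k = 1, 2, 4, 8, 16, ..."""
    for _ in range(rounds):
        rule_probs = run_em(split_in_two(rule_probs))
    return rule_probs
```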
Hierarchical Estimation Results
[Chart: parsing accuracy (F1) vs. total number of grammar symbols.]

Model                   F1
Baseline                87.3
Hierarchical Training   88.4
Refinement of the , tag
 Splitting all categories the same amount is wasteful.

The DT tag revisited
 Oversplit?
Adaptive Splitting
 Want to split complex categories more
 Idea: split everything, roll back splits which were least useful
Adaptive Splitting
 Evaluate the loss in likelihood from removing each split:

    loss = (data likelihood with split reversed) / (data likelihood with split)

 No loss in accuracy when 50% of the splits are reversed.
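One way to score this without re-parsing is to approximate the likelihood ratio locally from the inside/outside scores already computed during EM. A minimal sketch, assuming each node labelled with the split category stores the inside/outside scores of the two candidate subcategories, and p1, p2 are their relative frequencies (hypothetical names):

```python
import numpy as np

def merge_log_loss(nodes, p1, p2):
    """Approximate log loss in data likelihood from reversing one split.

    Each entry of `nodes` is (in1, in2, out1, out2): inside/outside scores of
    the two subcategories at one tree node labelled with the split category.
    """
    loss = 0.0
    for in1, in2, out1, out2 in nodes:
        with_split = in1 * out1 + in2 * out2                    # contribution as trained
        merged     = (p1 * in1 + p2 * in2) * (out1 + out2)      # contribution if merged
        loss += np.log(with_split) - np.log(merged)
    return loss     # small loss => the split buys little; a candidate to roll back
```

Splits are ranked by this loss, and the least useful half are rolled back.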
Adaptive Splitting Results
[Chart: parsing accuracy (F1) vs. total number of grammar symbols for flat training, hierarchical training, and 50% merging.]

Model              F1
Previous           88.4
With 50% Merging   89.5
Number of Phrasal Subcategories
[Bar chart: number of subcategories learned for each phrasal category; NP, VP, and PP receive the most subcategories, while rare categories such as NAC, X, ROOT, and LST receive very few.]
Number of Lexical Subcategories
[Bar chart: number of subcategories learned for each part-of-speech tag; NNP, JJ, NNS, NN, and the verb tags receive many subcategories, while closed-class tags such as POS, TO, and the punctuation tags receive very few.]
Smoothing
 Heavy splitting can lead to overfitting
 Idea: smoothing allows us to pool statistics across the subcategories of each original category

Linear Smoothing
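A minimal sketch of linear smoothing, reusing the dense rule tables from the earlier sketches (hypothetical names; the interpolation weight alpha is an assumption): each parent subcategory's rule probabilities are interpolated with the average over all subcategories of the same original category.

```python
import numpy as np

def linear_smooth(rule_probs, alpha=0.01):
    """Shrink each parent subcategory's rule distribution toward the mean
    distribution over all subcategories of that original category."""
    smoothed = {}
    for rule, table in rule_probs.items():           # table shape (k, k, k)
        pooled = table.mean(axis=0, keepdims=True)   # average over parent subcategories
        smoothed[rule] = (1.0 - alpha) * table + alpha * pooled
    return smoothed
```

Because rarely observed subcategories are pulled toward the pooled estimate, heavy splitting overfits less.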
Result Overview
[Chart: parsing accuracy (F1) vs. total number of grammar symbols for flat training, hierarchical training, 50% merging, and 50% merging with smoothing.]

Model            F1
Previous         89.5
With Smoothing   90.7
Final Results

Parser                   F1 (≤ 40 words)   F1 (all words)
Klein & Manning ’03      86.3              85.7
Matsuzaki et al. ’05     86.7              86.1
Collins ’99              88.6              88.2
Charniak & Johnson ’05   90.1              89.6
This Work                90.2              89.7
Linguistic Candy
 Proper Nouns (NNP):
  NNP-14: Oct., Nov., Sept.
  NNP-12: John, Robert, James
  NNP-2:  J., E., L.
  NNP-1:  Bush, Noriega, Peters
  NNP-15: New, San, Wall
  NNP-3:  York, Francisco, Street
 Personal pronouns (PRP):
  PRP-0: It, He, I
  PRP-1: it, he, they
  PRP-2: it, them, him
Linguistic Candy
 Relative adverbs (RBR):
  RBR-0: further, lower, higher
  RBR-1: more, less, More
  RBR-2: earlier, Earlier, later
 Cardinal Numbers (CD):
  CD-7:  one, two, Three
  CD-4:  1989, 1990, 1988
  CD-11: million, billion, trillion
  CD-0:  1, 50, 100
  CD-3:  1, 30, 31
  CD-9:  78, 58, 34
Conclusions
 New Ideas:
 Hierarchical Training
 Adaptive Splitting
 Parameter Smoothing
 State-of-the-art parsing performance:
 Improves from the X-Bar initializer (63.4 F1) to 90.2
 Linguistically interesting grammars to sift through
Thank You!
petrov@eecs.berkeley.edu
Other things we tried
 X-Bar vs structurally annotated grammar:
 X-Bar grammar starts at lower performance, but provides
more flexibility
 Better Smoothing:
 Tried different (hierarchical) smoothing methods, all
worked about the same
 (Linguistically) constraining rewrite
possibilities between subcategories:
 Hurts performance
 EM automatically learns that most subcategory
combinations are meaningless: ≥ 90% of the possible
rewrites have 0 probability