Factors Affecting the Accuracy of Korean Parsing
Tagyoung Chung, Matt Post, Daniel Gildea
Department of Computer Science, University of Rochester
Korean
Complicating factors
• Rich morphology
– 㥋㟃㨋㣨㥽㐹㣥㖦㒫? = negative + see + passive + honorific + past + future + formal + indicative + interrogative
• Scrambling
– 㩕㨋 㜔㛽㦂㐳 㫾㧺 㩳㥽㖰 .
  John-NOM Mary-DAT book-ACC give-PAST-DEC .
  'John gave Mary a book.'
• Null anaphora
– 㩞㥕㥱? = 'Did you like it?'
Korean treebank
• Newswire text (LDC2000T45)
• 5K sentences, 132K words, 14K unique morphemes
• Tokenized & allomorph neutralized
• Penn Treebank-like annotations (Han et al., 2001)
Parsing with probabilistic context-free grammars
Initial experiments
                    F1      F1≤40   types   tokens
Korean              52.78   56.55   6.6K    194K
English (§02–04)    72.20   73.29   7.5K    147K
English (§02–21)    71.61   72.74   23K     950K
• Standard PCFG
• All NULL elements and function tags removed
• Nonterminal-to-terminal ratio is similar
• Data sparsity may not be the main problem
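The baseline above is a plain PCFG parsed with Viterbi CKY. A minimal sketch over a toy grammar (the rules, words, and probabilities below are invented for illustration, not taken from the KTB):

```python
from collections import defaultdict

def viterbi_cky(words, lexicon, rules):
    """Viterbi CKY for a PCFG in Chomsky normal form.
    lexicon: {(tag, word): prob}; rules: {(A, B, C): prob} for A -> B C."""
    n = len(words)
    best = defaultdict(float)      # (i, j, A) -> best inside probability
    back = {}                      # backpointers for recovering the tree
    for i, w in enumerate(words):
        for (tag, word), p in lexicon.items():
            if word == w:
                best[i, i + 1, tag] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (A, B, C), p in rules.items():
                    score = p * best[i, k, B] * best[k, j, C]
                    if score > best[i, j, A]:
                        best[i, j, A] = score
                        back[i, j, A] = (k, B, C)
    return best, back

# Toy grammar, invented for illustration.
lexicon = {("NP", "John"): 0.5, ("NP", "Mary"): 0.5, ("V", "saw"): 1.0}
rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
best, back = viterbi_cky("John saw Mary".split(), lexicon, rules)
```

`best[0, 3, "S"]` then holds the probability of the best full parse, and `back` lets one reconstruct it top-down.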
Initial observations
• KTB sentences are much longer
  Mean length of 48 words, vs. 23 for the PTB
• Ambiguous rules
NP → NP NP NP NP (6)
NP → NP NP NP NP NP (3)
NP → NP NP NP NP NP NP (2)
NP → NP NP NP NP NP NP NP (2)
NP → NP NP NP NP occurs only once in PTB
• Coarse nonterminal set (43 vs. 72) and preterminal set (33 vs. 44)
Function tags
SBJ    subject with nominative case marker
OBJ    complement with accusative case marker
COMP   complement with adverbial postposition
ADV    NP that functions as an adverbial phrase
VOC    noun with vocative case marker
LV     NP coupled with a "light" verb construction
(Example tree: NP-SBJ dominating NPR NNC PAU, 㣙㧗㭐 㞇㡐 㧹, 'Schwartz doctor-TOPIC')
• Mark grammatical functions of NP and S nodes
• Show child nodes’ morphological information
• (S → NP-SBJ S) is twice as common as (S → NP-OBJ S)
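Stripping or keeping function tags is a simple label rewrite over bracketed trees. A sketch (the regex is illustrative and ignores KTB details such as coindices; null-element labels like -NONE- are deliberately untouched):

```python
import re

def strip_function_tags(tree):
    """Remove '-TAG' function-tag suffixes from nonterminals in a
    bracketed tree string, e.g. '(NP-SBJ ...' -> '(NP ...'.
    Assumes tags are uppercase suffixes after '-'; labels that begin
    with '-' (such as -NONE-) never match, so nulls are preserved."""
    return re.sub(r"\((\w+)(?:-[A-Z]+)+", lambda m: "(" + m.group(1), tree)
```

For example, `strip_function_tags("(S (NP-SBJ ...) ...)")` yields `"(S (NP ...) ...)"`, which is how the "w/o function tags" condition can be derived from the annotated treebank.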
Parsing with function tags
                    w/o function tags    w/ function tags
                    F1      F1≤40        F1      F1≤40
Korean              52.78   56.55        56.18   60.21
English (§02–04)    72.20   73.29        70.50   71.78
English (§02–21)    71.61   72.74        72.82   74.05
• Evaluated against the same test set (without function tags)
• Nonterminals are too coarse without function tags
• Further improvements?
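The F1 numbers throughout are labeled bracketing scores. A rough evalb-style sketch (ignoring duplicate-span bookkeeping and the usual preterminal exclusions beyond width-1 spans):

```python
def labeled_spans(tree):
    """Labeled constituent spans from a bracketed tree string.
    Width-1 spans (preterminals) are not scored, as in evalb."""
    toks = tree.replace("(", " ( ").replace(")", " ) ").split()
    stack, out, pos, k = [], [], 0, 0
    while k < len(toks):
        if toks[k] == "(":
            stack.append((toks[k + 1], pos))   # (label, start position)
            k += 2
        elif toks[k] == ")":
            label, start = stack.pop()
            if pos - start > 1:
                out.append((label, start, pos))
            k += 1
        else:                                   # a word token
            pos += 1
            k += 1
    return out

def bracket_f1(gold, guess):
    """Labeled bracketing F1 over two bracketed tree strings (sketch)."""
    g, t = labeled_spans(gold), labeled_spans(guess)
    match = sum(1 for s in t if s in g)
    p = match / len(t) if t else 0.0
    r = match / len(g) if g else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

This is only a scoring sketch; the reported numbers come from the standard evalb tool.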
Latent annotations
• Learning a refined grammar improves English parsing
  Parent annotation (Johnson, 1998), lexicalization (Collins, 1999)
• Petrov et al. (2006) introduced automatic learning of latent annotations using split-merge EM
  1. Split a symbol into two subcategories using EM
  2. Merge it back if the loss in likelihood from merging is small
  3. Additive smoothing
  4. Repeat
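Step 1 can be sketched on a toy grammar of unary rules only (a deliberate simplification of Petrov et al.'s procedure): every occurrence of the symbol is duplicated, right-hand-side mass is halved, and a little noise breaks the symmetry so EM can specialize the two subsymbols.

```python
import random

def split_symbol(rules, sym, noise=0.01, rng=random):
    """One 'split' step, sketched for a grammar of unary rules
    {(lhs, rhs): prob}: replace `sym` with sym_0 / sym_1 everywhere,
    halving probability mass where `sym` appears on the right-hand
    side, and add small noise so the subsymbols can diverge in EM."""
    new = {}
    for (lhs, rhs), p in rules.items():
        lhs_versions = [f"{sym}_0", f"{sym}_1"] if lhs == sym else [lhs]
        rhs_versions = [f"{sym}_0", f"{sym}_1"] if rhs == sym else [rhs]
        for l in lhs_versions:
            for r in rhs_versions:
                eps = rng.uniform(-noise, noise)
                new[(l, r)] = p / len(rhs_versions) + eps
    return new
```

With `noise=0.0`, splitting NP in {S → NP (1.0), NP → NN (1.0)} yields S → NP_0 (0.5), S → NP_1 (0.5), NP_0 → NN (1.0), NP_1 → NN (1.0).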
Parsing with latent annotations
                    w/ function tags     latent annotation
                    F1      F1≤40        F1      F1≤40
Korean              56.18   60.21        79.93   82.04
English (§02–04)    70.50   71.78        85.21
English (§02–21)    72.82   74.05        89.21
• After five cycles
• Numbers are not directly comparable, but the improvement is clear
NULL elements
• NULL elements are prevalent in the KTB
• Zero pronouns are especially common (1.8 per sentence)
  Dropped wherever pragmatically inferable
• Do NULL elements affect parsing?
  Train the parser with NULL elements
  Parse text with NULL elements
  Evaluate against the reference without NULL elements
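Producing the NULL-free reference trees for this setup amounts to deleting null subtrees and pruning emptied parents. A sketch over simple (label, children) tuples (the tuple encoding is invented here):

```python
def remove_nulls(tree):
    """Drop null-element subtrees (label '-NONE-', PTB/KTB style) and
    prune any parent left without children. Trees are (label, [children])
    tuples with string leaves; returns None if the whole tree is null."""
    label, children = tree
    if label == "-NONE-":
        return None
    kept = []
    for child in children:
        if isinstance(child, str):
            kept.append(child)
        else:
            pruned = remove_nulls(child)
            if pruned is not None:
                kept.append(pruned)
    return (label, kept) if kept else None
```

For instance, an NP-SBJ that dominates only a zero pronoun disappears entirely, which is exactly why nulls change the bracketing structure the parser is scored on.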
Parsing with NULL elements

                                             F1      F1≤40
English (§02–21)  coarse                     71.61   72.74
                  w/ NULLs                   73.29   74.38
Korean            w/ verb ellipses           52.85   56.52
                  w/ traces                  55.88   59.42
                  w/ relative construction markers
                                             56.74   59.87
                  w/ zero pronouns           57.56   61.17
                  latent (5) w/ NULLs        89.56   91.03
• Adding relative construction markers and zero pronouns has the largest impact
• Latent annotation with NULLs produces the best result
Parsing with tree substitution grammars
Tree substitution grammars
(Example: the tree (SBARQ (WHNP (WP who)) SQ (. ?)) is decomposed into the fragments (SBARQ WHNP SQ (. ?)) and (WHNP (WP who)).)
• Nonterminals can be rewritten as tree fragments of any size
• Learning a TSG can be a challenge
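Tree substitution itself is easy to sketch: replace the leftmost frontier nonterminal with a fragment rooted in the same label. The encoding below is invented for this sketch: internal nodes are (label, [children]), lexical leaves are strings, and a frontier nonterminal (substitution site) is a childless node (label,).

```python
def substitute(fragment, sub):
    """Substitute fragment `sub` at the leftmost frontier nonterminal
    of `fragment` whose label matches sub's root label."""
    if len(fragment) == 1:                       # frontier nonterminal
        return sub if fragment[0] == sub[0] else fragment
    label, children = fragment
    out, done = [], False
    for child in children:
        if not done and not isinstance(child, str):
            new_child = substitute(child, sub)
            if new_child is not child:           # substitution happened
                done = True
            out.append(new_child)
        else:
            out.append(child)
    return (label, out) if done else fragment

# The slide's example: attach (WHNP (WP who)) into the larger fragment.
frag = ("SBARQ", [("WHNP",), ("SQ",), (".", ["?"])])
filler = ("WHNP", [("WP", ["who"])])
```

`substitute(frag, filler)` rebuilds the original SBARQ tree from its two fragments.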
Spinal grammar
• Chiang (2000) proposed a heuristic for learning a TAG
  A node with a different head from its parent becomes a new root
• Learning a spinal TSG requires head rules
  We created head rules that maximally project important morphemes
(Example tree: S → NP-SBJ VP SFN. NP-SBJ = NPR NNC PAU, 㣙㧗㭐 㞇㡐 㧹, 'Schwartz doctor-TOPIC'; inside the VP, NP-ADV = DAN NNC, 㒔 㘏, 'that afterwards', and the verb 㲩㑌 㖾㲠 㥽 㖰 is glossed discharge-PASSIVE-PAST-DEC (NNC XSV EPF EFN under VV).)
Bayesian Learning of TSG
• Similar techniques were independently proposed by
  Cohn et al. (2009), O'Donnell et al. (2009), Post and Gildea (2009)
  – A DP prior prevents learning unnecessarily large rules
  – Gibbs sampling keeps space complexity manageable
  – The algorithm visits every node and decides to join or split
    according to the probability given by the sampler
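The sampler's join/split choice is driven by the posterior predictive probability of each candidate fragment under the DP; a one-line sketch (alpha and P0 are model hyperparameters, named here for illustration):

```python
def dp_prob(count, total, alpha, p0):
    """Posterior predictive probability of reusing a tree fragment
    under a Dirichlet process prior: (count + alpha*P0) / (total + alpha).
    `count` is how often this fragment was used before, `total` the
    number of fragment tokens so far, and P0 a base distribution that
    assigns large fragments very small probability, which is what
    discourages unnecessarily large rules."""
    return (count + alpha * p0) / (total + alpha)
```

Frequent small fragments get high reuse probability from their counts; rare large fragments must rely on the tiny alpha*P0 term.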
Parsing with TSG
model                 F1      F1≤40   size
CFG (coarse)          52.78   56.55   5.4K
spinal (head left)    59.49   63.33   49K
spinal (head right)   66.05   69.96   29K
spinal (head rules)   66.28   70.61   29K
induced               68.93   73.79   16K

(Example induced fragment: an NP over NPR NNC NNU NNX, with 㨆㧙 and 㜼 as lexical anchors.)
• The TSG shows improvement over the CFG
• Head rules show modest improvement over the simple baselines
• The induced TSG is best and lends itself to further analysis
• English parsing experiments show a similar trend
Word order
• Korean shows long-distance scrambling (Rambow and Lee, 1994)
  It is permissible, but is it common?
• The KTB is newswire text, which maintains a more rigid word order
  – SOV is the most common word order, but OSV is also permitted
  – Analysis of the KTB shows SOV sentences are 63.5 times more numerous
  – However, order is not completely fixed, even in formal writing
• Free word order does not apply to all constituents
  Morphemes always agglutinate in a fixed order
Conclusion
• KTB nonterminals are underspecified
  Refined annotations bring considerable improvement
• The prevalence of NULL elements is a challenge
  Possible solutions include:
  – Automatically inserting NULL elements
  – Special annotation for parents/siblings of deleted nodes
• Word order may not be a huge problem, but further investigation is needed
• Potential implications for machine translation
Questions?