
EMPIRICAL INVESTIGATIONS OF
ANAPHORA AND SALIENCE
Massimo Poesio
Università di Trento and
University of Essex
Vilem Mathesius Lectures
Praha, 2007
Plan of the series
- Wednesday: Annotating context dependence, and particularly anaphora
- Yesterday: Using anaphorically annotated corpora to investigate local & global salience
- Today: Using anaphorically annotated corpora to investigate anaphora resolution
Today's lecture
- The Vieira / Poesio work on robust definite description resolution
- Bridging references
- Discourse-new
- (If time allows:) Task-oriented evaluation
Preliminary corpus study (Poesio
and Vieira, 1998)
Annotators asked to classify about 1,000 definite descriptions from the ACL/DCI corpus (Wall Street Journal texts) into three classes:
- DIRECT ANAPHORA: a house … the house
- DISCOURSE-NEW: the belief that ginseng tastes like spinach is more widespread than one would expect
- BRIDGING DESCRIPTIONS: the flat … the living room; the car … the vehicle
Poesio and Vieira, 1998
Results:
- More than half of the definite descriptions are first-mention
- Subjects didn't always agree on the classification or the antecedent (bridging descriptions: ~8%)
The Vieira / Poesio system for robust
definite description resolution
Follows a SHALLOW PROCESSING approach (Carter, 1987; Mitkov, 1998): it only uses
- Structural information (extracted from the Penn Treebank)
- Existing lexical resources (WordNet)
- (Very little) hand-coded information
(Vieira & Poesio, 1996 / Vieira, 1998 /
Vieira & Poesio, 2001)
Methods for resolving direct anaphors
DIRECT ANAPHORA: the red car, the car, the blue car
- premodification heuristics
- segmentation: approximated with 'loose' windows
Methods for resolving discourse-new definite descriptions
DISCOURSE-NEW DEFINITES: the first man on the Moon, the fact that Ginseng tastes of spinach
- a list of the most common functional predicates (fact, result, belief) and modifiers (first, last, only …)
- heuristics based on structural information (e.g., establishing relative clauses)
A 'knowledge-based' classification of bridging descriptions (Vieira, 1998)
- Based on LEXICAL RELATIONS such as synonymy, hyponymy, and meronymy, available from a lexical resource such as WordNet:
  the flat … the living room
- The antecedent is introduced by a PROPER NAME:
  Bach … the composer
- The anchor is a NOMINAL MODIFIER introduced as part of the description of a discourse entity:
  selling discount packages … the discounts
… continued (cases NOT attempted by our system)
- The anchor is introduced by a VP:
  Kadane oil is currently drilling two oil wells. The activity …
- The anchor is not explicitly mentioned in the text, but is a 'discourse topic':
  the industry (in a text about oil companies)
- The resolution depends on more general commonsense knowledge:
  last week's earthquake … the suffering people
Distribution of bridging descriptions

Class            Total      Percentage
Syn/Hyp/Mer      12/14/12   19%
Names            49         24%
Compound Nouns   25         12%
Events           40         20%
Discourse Topic  15         7%
Inference        37         18%
Total            204        100%
The (hand-coded) decision tree
1. Apply 'safe' discourse-new recognition heuristics
2. Attempt to resolve as same-head anaphora
3. Attempt to classify as discourse-new
4. Attempt to resolve as bridging description: search backward 1 sentence at a time and apply heuristics in the following order:
   1. Named entity recognition heuristics – R=.66, P=.95
   2. Heuristics for identifying compound nouns acting as anchors – R=.36
   3. Access WordNet – R, P about .28
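The four-step procedure can be sketched as follows. This is a toy illustration, not the original system: the NP representation and the two word lists stand in for the hand-coded heuristics, and the bridging step is only indicated.

```python
# Toy sketch of the Vieira / Poesio hand-coded decision tree.
# An NP is modelled as (head, set_of_premodifiers); SPECIAL_PREDICATES
# and SPECIAL_MODIFIERS are illustrative stand-ins for the hand-coded lists.

SPECIAL_PREDICATES = {"fact", "result", "belief"}
SPECIAL_MODIFIERS = {"first", "last", "only"}

def classify_dd(dd, antecedents):
    head, premods = dd
    # 1. 'Safe' discourse-new recognition heuristics
    if head in SPECIAL_PREDICATES:
        return "discourse-new"
    # 2. Attempt to resolve as same-head (direct) anaphora
    for ante_head, _ in reversed(antecedents):
        if ante_head == head:
            return "direct-anaphora"
    # 3. Attempt to classify as discourse-new (weaker heuristics)
    if premods & SPECIAL_MODIFIERS:
        return "discourse-new"
    # 4. Remaining case: try the bridging heuristics (NE recognition,
    #    compound nouns, WordNet), searching back one sentence at a time
    return "try-bridging"

seen = [("house", set()), ("car", {"red"})]
print(classify_dd(("car", {"blue"}), seen))   # -> direct-anaphora
```

Note how the ordering encodes the precision of the heuristics: the 'safe' discourse-new tests fire before same-head matching, and the noisier tests only afterwards.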
Overall Results
- Evaluated on a 'test corpus' of 464 definite descriptions
- Overall results:

            R     P     F
Version 1   53%   76%   62%
Version 2   57%   70%   62%
D-N def     77%   77%   77%
Overall Results
- Results for each type of definite description:

                 R     P      F
Direct anaphora  62%   83%    71%
Disc new         72%   70%    69%
Bridging         38%   32.9%  29%
Questions raised by the Vieira / Poesio work
- Do these results hold for larger datasets?
- Do discourse-new detectors help?
- Bridging:
  - How to define the phenomenon?
  - Where to get the information?
  - How to combine salience with lexical & commonsense knowledge?
- Can such a system be helpful for applications?
Mereological bridging references
Cartonnier (Filing Cabinet) with Clock
This piece of mid-eighteenth-century furniture was meant to be used like a modern filing cabinet; papers were placed in leather-fronted cardboard boxes (now missing) that were fitted into the open shelves.
A large table decorated in the same manner would have been
placed in front for working with those papers.
Access to the cartonnier's lower half can only be gained by the
doors at the sides, because the table would have blocked the
front.
PREVIOUS RESULTS
- A series of experiments using the Poesio / Vieira dataset, containing 204 bridging references, including 39 'WordNet' bridges
- (Vieira and Poesio, 2000, but also Carter 1985, Hobbs in a number of papers, etc.): bridging references need lexical knowledge
- But: even large lexical resources such as WordNet are not enough, particularly for mereological references (Poesio et al, 1997; Vieira and Poesio, 2000; Poesio, 2003; Garcia-Almanza, 2003)
- Partial solution: use lexical acquisition (HAL, Hearst-style construction method). Best results (for mereology): construction-style
FINDING MERONYMICAL RELATIONS USING SYNTACTIC INFORMATION
- Some syntactic constructions suggest semantic relations
  - (Cf. Hearst 1992, 1998 for hyponyms)
- Ishikawa 1998, Poesio et al 2002: use syntactic constructions to extract mereological information from corpora
  - The WINDOW of the CAR
  - The CAR's WINDOW
  - The CAR WINDOW
- See also Berland & Charniak 1999, Girju et al 2002
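A minimal extractor for the first of these constructions can be written with a regular expression. The pattern and example sentence are illustrative only; extraction from real corpora needs POS filtering and frequency thresholds to keep the noise down.

```python
# Minimal Hearst-style extractor for the "the X of the Y" construction,
# which suggests X is a part of Y.  The other two constructions
# ("the Y's X", "the Y X") would need similar patterns plus POS tagging.
import re

OF_PATTERN = re.compile(r"\bthe (\w+) of the (\w+)\b", re.IGNORECASE)

def extract_part_of(text):
    """Return (part, whole) pairs suggested by 'the X of the Y'."""
    return [(part.lower(), whole.lower())
            for part, whole in OF_PATTERN.findall(text)]

pairs = extract_part_of("He broke the window of the car and the door of the house.")
print(pairs)  # [('window', 'car'), ('door', 'house')]
```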
LEXICAL RESOURCES FOR BRIDGING: A SUMMARY

               Syn        Hyp        Mer        Total WN
Total          12         14         12         38
WordNet        4 (33.3%)  8 (57.1%)  3 (33.3%)  15 (39%)
HAL            4 (33.3%)  2 (14.3%)  2 (16.7%)  8 (22.2%)
Constructions  1 (8.3%)   0          8 (66.7%)  9 (23.7%)

(All using the Vieira / Poesio dataset.)
FOCUSING AND MEREOLOGICAL
BRIDGES
Cartonnier (Filing Cabinet) with Clock
This piece of mid-eighteenth-century furniture was meant to be used like a modern filing cabinet; papers were placed in leather-fronted cardboard boxes (now missing) that were fitted into the open shelves.
A large table decorated in the same manner would have been
placed in front for working with those papers.
Access to the cartonnier's lower half can only be gained by the
doors at the sides, because the table would have blocked the
front.
(See Sidner, 1979; Markert et al, 1995.)
FOCUS (CB) TRACKING + GOOGLE SEARCH (POESIO, 2003)
- Analyzed 169 associative BDs in the GNOME corpus (58 mereology)
- Correlation between distance and focusing (Poesio et al, 2004) and choice of anchor:
  - 77.5% of anchors in the same or previous sentence; 95.8% in the last five sentences
  - CB(U-1) anchor for only 33.6% of BDs, but 89% of anchors had been CB or CP
- Using 'Google distance' to choose among salient anchor candidates
FINDING MEREOLOGICAL RELATIONS USING GOOGLE
- Lexical vicinity measure (for MERONYMS) between N_BD and N_PA
  - Search in Google for "the N_BD of the N_PA" (cf. Ishikawa, 1998; Poesio et al, 2002)
    - E.g., "the drawer of the cabinet"
  - Choose as anchor the PA whose N_PA results in the greater number of hits
- Preliminary results for associative BDs: around 70% P/R (by hand)
- See also: Markert et al, 2003, 2005; Modjeska et al, 2003
NEW EXPERIMENTS (Poesio et al, 2004)
- Using the GNOME corpus
  - 58 mereological bridging refs realized by the-NPs
  - 153 mereological bridging references in total
  - Reliably annotated
- Completely automatic feature extraction
  - Google & WordNet for lexical distance
  - Using (an approximation of) salience
- Using machine learning to combine the features
More (and reliably annotated) data: the GNOME corpus
- Texts from 3 genres (museum descriptions, pharmaceutical leaflets, tutorial dialogues)
- Reliably annotated syntactic, semantic and discourse information:
  - grammatical function, agreement features
  - anaphoric relations
  - uniqueness, ontological information, animacy, genericity, …
- Reliable annotation of bridging references
- http://cswww.essex.ac.uk/Research/NLE/corpora/GNOME
METHODS
- Salience features:
  - Utterance distance
  - First mention
  - 'Global first mention' (approximate CB)
- Lexical distance:
  - WordNet (using a pure hypernym-based search strategy)
  - Google
  - Tried both separately and together
- Statistical classifiers: MLP, Naïve Bayes
  - (MatLab / Weka ML library)
Lexical Distance 1 (WordNet)
Computing WordNet distance:
1. Get the head noun of the anaphor and find all the (noun) senses for the head noun.
2. Get all the noun senses for the head noun of the potential antecedent under consideration.
3. Retrieve the hypernym trees from WordNet for each sense of the anaphor and the antecedent.
4. Traverse each unique path in these trees and find a common parent for the anaphor and the antecedent; count the number of nodes they are apart.
5. Select the least-distance path across all combinations.
6. If no common parent is found, assign a hypothetical distance (30).
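The procedure can be illustrated with a toy hypernym table standing in for WordNet. The entries and chains below are invented for the example (each word maps to one chain per sense, most specific concept first); the fallback distance of 30 is the one given above.

```python
# Toy version of the WordNet distance computation, using a small
# hand-coded hypernym table instead of real WordNet.
# word -> list of hypernym chains (one per sense), most specific first.
HYPERNYMS = {
    "car":     [["car", "motor_vehicle", "vehicle", "artifact"]],
    "vehicle": [["vehicle", "artifact"]],
    "flat":    [["flat", "housing", "artifact"]],
}
NO_PATH = 30  # hypothetical distance when no common parent exists

def wn_distance(anaphor, antecedent):
    best = NO_PATH
    for path_a in HYPERNYMS.get(anaphor, []):
        for path_b in HYPERNYMS.get(antecedent, []):
            for parent in set(path_a) & set(path_b):
                # nodes apart = steps from each noun up to the common parent
                d = path_a.index(parent) + path_b.index(parent)
                best = min(best, d)
    return best

print(wn_distance("car", "vehicle"))  # 2: car -> motor_vehicle -> vehicle
print(wn_distance("car", "flat"))     # 5: only 'artifact' in common
```

With the real WordNet, steps 1-3 would enumerate all noun senses and their hypernym paths; the double loop above then corresponds to steps 4-5.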
Lexical Distance 2 (Google)
- As in (Poesio, 2003)
- But use the Google API to access the Google search engine
- Computing Google hits:
  - Get the head noun for the BR and the potential candidate.
  - Check whether the potential candidate is a mass or count noun.
  - If count, build the query as "the body of the person" and search for the pattern.
  - Retrieve the number of Google hits.
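The anchor-selection step can be sketched as follows. `hit_count()` is a stub standing in for the (long-deprecated) Google search API, and the hit counts are invented for illustration.

```python
# Sketch of Google-based anchor selection: build "the N_BD of the N_PA"
# queries and pick the candidate anchor with the most hits.
FAKE_HITS = {  # made-up counts, for illustration only
    '"the drawer of the cabinet"': 1200,
    '"the drawer of the table"': 300,
}

def hit_count(query):
    # Stand-in for a real search-engine hit count.
    return FAKE_HITS.get(query, 0)

def best_anchor(bd_head, candidate_heads):
    """Return the candidate whose 'of'-query gets the most hits."""
    def hits(cand):
        return hit_count('"the %s of the %s"' % (bd_head, cand))
    return max(candidate_heads, key=hits)

print(best_anchor("drawer", ["cabinet", "table"]))  # cabinet
```

In the experiments described here, the candidate set was restricted to salient potential anchors rather than all previous NPs, which is what makes the hit counts usable as a tie-breaker.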
WN vs GOOGLE

Description                                        Results
No path in WordNet                                 503/1720
No path in WordNet between BD and correct anchor   10/58
Anchor with min WN distance correct                8/58
Zero Google hits                                   1089/1720
Zero Google hits for correct anchor                24/58
Max Google hits identify correct candidate         8/58
BASELINES

BASELINE                         ACCURACY
Random choice (previous 5)       4%
Random choice (previous)         19%
Random choice among FM           21.3%
Min Google distance              13.8%
Min WN distance                  13.8%
FM entity in previous sentence   31%
Min Google in previous sentence  17.2%
Min WN in previous sentence      25.9%
Min Google among FM              12%
Min WN among FM                  24.1%
RESULTS (58 THE-NPs, 50:50)

                       WN DISTANCE   GOOGLE DISTANCE
MatLab NN, self-tuned  92 (79.3%)    89 (76.7%)
Weka NN algorithm      91 (78.4%)    86 (74.1%)
Weka Naïve Bayes       88 (75.9%)    85 (73.3%)

                 Prec    Recall   F
WN distance      75.4%   84.5%    79.6%
Google distance  70.6%   86.2%    77.6%
MORE RESULTS
1:3 dataset:
                 Accuracy   F
WN distance      80.6%      55.7%
Google distance  82%        56.7%

All 153 mereological BRs:
                 Accuracy     F
WN distance      224 (74.2%)  76.3%
Google distance  230 (75.2%)  75.8%
MEREOLOGICAL BDS REALIZED
WITH BARE-NPS
The combination of rare and expensive materials used on this cabinet
indicates that it was a particularly expensive commission. The four
Japanese lacquer panels date from the mid- to late 1600s and were
created with a technique known as kijimaki-e. For this type of lacquer,
artisans sanded plain wood to heighten its strong grain and used it as
the background of each panel. They then added the scenic elements of
landscape, plants, and animals in raised lacquer. Although this
technique was common in Japan, such large panels were rarely
incorporated into French eighteenth-century furniture.
Heavy Ionic pilasters, whose copper-filled flutes give an added rich
color and contrast to the gilt-bronze mounts, flank the panels. Yellow
jasper, a semiprecious stone, rather than the usual marble, forms the
top.
HARDER TEST
Using classifiers trained on balanced / slightly unbalanced data (the-NPs) on unbalanced ones (10-fold cross-validation)

Distance     Balance  Acc (bal)  F (bal)  Acc (unbal)  F (unbal)
WN           1:1      70.2%      .7       80.2%        .2
WN           1:3      75.9%      .4       91.7%        0
Google       1:1      64.4%      .7       63.6%        .1
Google       1:3      79.8%      .5       88.4%        .3
WN + Google  1:1      66.3%      .6       65.3%        .2
WN + Google  1:3      77.9%      .4       92.5%        .5
DISCUSSION
- Previous results:
  - Construction-based techniques provide adequate lexical resources, particularly when using the Web as corpus
  - But need to combine lexical knowledge and salience modeling
- This work:
  - Combining (simple) salience with lexical resources results in significant improvements
- Future work:
  - Larger dataset
  - Better approximation of focusing
Back to discourse-new detection
- The GUITAR system
- Recent results
GUITAR (Kabadjov, to appear)
- A robust, usable anaphora resolution system designed to work as part of an XML pipeline
- Incorporates:
  - Pronouns: the Mitkov algorithm
  - Definite descriptions: the Vieira / Poesio algorithm
  - Proper nouns: the Bontcheva alg.
- Several versions:
  - Version 1 (Poesio & Kabadjov, 2004): direct anaphora
  - Version 2: DN detection
  - Version 3: proper name resolution
- Freely available from http://privatewww.essex.ac.uk/~malexa/GuiTAR/
DISCOURSE-NEW DEFINITE
DESCRIPTIONS
(1) Toni Johnson pulls a tape measure across the front of what was once a stately Victorian home.
(2) The Federal Communications Commission allowed American Telephone & Telegraph Co. to continue offering discount phone services for large-business customers and said it would soon re-examine its regulation of the long-distance market.
Poesio and Vieira (1998): about 66% of definite descriptions in their texts (WSJ) are discourse-new
WOULD DNEW RECOGNITION HELP?
First version of GUITAR without DN detection on a subset of DDs in the GNOME corpus: 574 DDs, of which
- 184 anaphoric (32%)
- 390 discourse-new (67.9%)

Total      Sys Ana  Corr       NM  WM  SM  P            R            F
574 (184)  198      457 (119)  38  27  52  79.6 (60.1)  79.6 (64.7)  79.6 (62.3)

Spurious matches: 52/198 = 26.3%
(Figures in parentheses: anaphoric DDs only.)
SPURIOUS MATCHES
If your doctor has told you in detail HOW MUCH to use and HOW OFTEN then keep to this advice.
…
If you are not sure then follow the advice on the back of this leaflet.
GOALS OF THE WORK
- Vieira and Poesio's (2000) system incorporated DISCOURSE-NEW DD DETECTORS (P=69, R=72, F=70.5)
- Two subsequent strands of work:
  - Bean and Riloff (1999) and Uryupina (2003) developed improved detectors (e.g., Uryupina: F=86.9)
  - Ng and Cardie (2002) questioned whether such detectors improve results
- Our project: systematic investigation of whether DN detectors actually help
  - ACL 04 ref res: features, preliminary results
  - THIS WORK: results of further experiments
DN CLASSIFIER: THE UPPER BOUND
- Current number of SMs: 52/198 (26.3%)
- If SM = 0, P=R=F overall = 509/574 = 88.7
  - (P=R=F on anaphora only: 119/146 = 81.5)
VIEIRA AND POESIO'S DN DETECTORS
Recognize SEMANTICALLY FUNCTIONAL descriptions:
- SPECIAL PREDICATES / PREDICATE MODIFIERS (HAND-CODED):
  the front of what was once a stately Victorian home
  the best chance of saving the youngest children
- PROPER NAMES:
  the Federal Communications Commission …
- LARGER SITUATION descriptions (HAND-CODED):
  the City, the sun, …
VIEIRA AND POESIO'S DN DETECTORS, II
PREDICATIVE descriptions:
- COPULAR CLAUSES:
  he is the hardworking son of a Church of Scotland minister …
- APPOSITIONS:
  Peter Kenyon, the Chelsea chief executive …
Descriptions ESTABLISHED by modification:
  The warlords and private militias who were once regarded as the West's staunchest allies are now a greater threat to the country's security than the Taliban … (Guardian, July 13th 2004, p.10)
VIEIRA AND POESIO'S DECISION TREES
Tried both hand-coded and ML
Hand-coded decision tree:
1. Try the DN detectors with the highest accuracy (attempt to classify as functional using special predicates, and as predicative by looking for apposition)
2. Attempt to resolve the DD as direct anaphora
3. Try the other DN detectors in order: proper name, establishing clauses, proper name modification …
ML DT: swap 1. and 2.
VIEIRA AND POESIO'S RESULTS

                         P     R    F
Baseline                 50.8  100  67.4
DN detection             69    72   70
Hand-coded DT (partial)  62    85   71.7
Hand-coded DT (total)    77    77   77
ID3                      75    75   75
BEAN AND RILOFF (1999)
Developed a system for identifying DN definites. Adopted syntactic heuristics from Vieira and Poesio, and developed several new techniques:
- SENTENCE-ONE (S1) EXTRACTION: identify as discourse-new every description found in the first sentence of a text.
- DEFINITE PROBABILITY: create a list of nominal groups encountered at least 5 times with the definite article, but never with an indefinite.
- VACCINES: block heuristics when the probability is too low.
BEAN AND RILOFF'S ALGORITHM
1. If the head noun appeared earlier, classify as anaphoric
2. If the DD occurs in the S1 list, classify as DN unless a vaccine applies
3. Classify the DD as DN if one of the following applies:
   (a) high definite probability;
   (b) matches an EHP pattern;
   (c) matches one of the syntactic heuristics
4. Classify as anaphoric
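The classification order can be sketched as follows. The resources passed in (S1 list, vaccine set, definite-probability table, EHP and syntactic-heuristic tests) and the 0.7 threshold are placeholders, not Bean and Riloff's actual learned resources.

```python
# Sketch of Bean and Riloff's classification order; all resources and
# the definite-probability threshold are illustrative placeholders.
def classify(dd, seen_heads, s1_list, vaccines, def_prob,
             ehp_match, syn_heuristic):
    head = dd["head"]
    # 1. Head noun appeared earlier -> anaphoric
    if head in seen_heads:
        return "anaphoric"
    # 2. In the sentence-one (S1) list -> DN, unless a vaccine blocks it
    if dd["text"] in s1_list and dd["text"] not in vaccines:
        return "discourse-new"
    # 3. DN if high definite probability, an EHP pattern,
    #    or a syntactic heuristic applies
    if def_prob.get(head, 0.0) > 0.7 or ehp_match(dd) or syn_heuristic(dd):
        return "discourse-new"
    # 4. Default: anaphoric
    return "anaphoric"

dd = {"head": "FBI", "text": "the FBI"}
print(classify(dd, seen_heads=set(), s1_list={"the FBI"}, vaccines=set(),
               def_prob={}, ehp_match=lambda d: False,
               syn_heuristic=lambda d: False))   # -> discourse-new
```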
BEAN AND RILOFF'S RESULTS

                                     P     R
Baseline                             100   72.2
Syn heuristics                       43    93.1
Syn Heuristics + S1                  66.3  60.7
EHP                                  69.2  84.3
DO                                   87.3  83.9
Syn Heuristics + S1 + EHP + DO       81.7  82.2
Syn Heuristics + S1 + EHP + DO + V   79.1  84.5
NG AND CARDIE (2002)
- Directly investigate the question of whether discourse-new detectors improve the performance of an anaphora resolution system
- Dealing with ALL types of anaphoric expressions
NG AND CARDIE'S METHODS
- DN detectors:
  - statistical classifiers trained using C4.5 and RIPPER
  - Features: predicate & superlative detection / head match / position in text of NP
  - Tested over MUC-6 (F=86) and MUC-7 (F=84)
- 2 architectures for integration of detectors and AR:
  1. Run the DN detector first; apply AR to NPs classified as anaphoric
  2. Run AR if str_match or alias=Y; otherwise, as in 1.
NG AND CARDIE'S RESULTS

                            MUC-6              MUC-7
                            P     R     F      P     R     F
Baseline (no DN detection)  70.3  58.3  63.8   65.5  58.2  61.6
DN detection runs first     57.4  71.6  63.7   47.0  77.1  58.4
Same head runs first        63.4  68.3  65.8   59.7  69.3  64.2
URYUPINA'S METHODS
- A DN statistical classifier trained using RIPPER
- Trained / tested over Ng and Cardie's MUC-7 data
URYUPINA'S FEATURES: WEB-BASED DEFINITE PROBABILITY
Four ratios computed from Web counts, for the full NP (Y) and for its head noun (H):

  #("the Y") / #("Y")    #("the Y") / #("a Y")
  #("the H") / #("H")    #("the H") / #("a H")
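Assuming the reconstruction above, the four features can be computed from raw hit counts as follows. `count()` is a stub for a search-engine query, and the counts in `FAKE_COUNTS` are invented for illustration.

```python
# Sketch of Uryupina-style definite probabilities from Web hit counts.
FAKE_COUNTS = {  # invented counts, for illustration only
    '"the national anthem"': 900, '"a national anthem"': 50,
    '"national anthem"': 1000,
    '"the anthem"': 400, '"a anthem"': 5, '"anthem"': 2000,
}

def count(query):
    # Stand-in for a real search-engine hit count.
    return FAKE_COUNTS.get(query, 0)

def definite_probabilities(np, head):
    """Four ratio features for an NP string Y and its head noun H."""
    def ratios(x):
        the = count('"the %s"' % x)
        a = count('"a %s"' % x)
        bare = count('"%s"' % x)
        return (the / bare if bare else 0.0,
                the / a if a else 0.0)
    return ratios(np) + ratios(head)

print(definite_probabilities("national anthem", "anthem"))
# (0.9, 18.0, 0.2, 80.0)
```

The intuition: NPs that occur almost exclusively with "the" (high ratios) tend to be functional, and hence discourse-new.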
URYUPINA'S RESULTS (DNEW CLASSIFIER)

                       P     R     F
All NPs   No Def Prob  87.9  86.0  86.9
          Def Prob     88.5  84.3  86.3
Def NPs   No Def Prob  82.5  79.3  80.8
          Def Prob     84.8  82.3  83.5

(On MUC-7)
PRELIMINARY CONCLUSIONS
- Quite a lot of agreement on features for DN recognition:
  - Recognizing predicative NPs
  - Recognizing establishing relatives
  - Recognizing DNEW proper names
  - Identifying functional DDs
- Automatic detection of these is better
- Using the Web is best
- All these systems integrate DN detection with some form of AR
  - See Ng's results concerning how 'globally optimized' classifiers are better than 'locally optimized' ones (ACL 2004)
PRELIMINARY CONCLUSIONS, II
- Ng and Cardie's results are not the last word:
  - Performance of their DN detector not as high as Uryupina's (F=84 vs. F=87 on the same dataset, MUC-7)
  - Overall performance of their resolution system not that high:
    - best performance: F=65.8 on ALL NPs
    - but on full NPs (i.e., excluding PNs and pronouns): F=31.7 (GUITAR on DDs, unparsed text: F=56.4)
- Room for improvement
A NEW SET OF EXPERIMENTS
- Incorporate the improvements in DN detection technology into
  - the Vieira / Poesio algorithm, as reimplemented in a state-of-the-art 'specialized' AR system, GUITAR
  - a statistical 'general purpose' AR resolver (Uryupina, in progress)
- Test over a large variety of data:
  - New: GNOME corpus (623 DDs)
  - Original Vieira and Poesio dataset (1400 DDs)
  - MUC-7, for comparison with Ng and Cardie, Uryupina (3000 DDs)
ARCHITECTURE
- A two-level system:
  - Run GUITAR's direct anaphora resolution
  - Results used as one of the features of a statistical discourse-new classifier
  - A 'globally optimized' system (Ng, ACL 2004)
- Trained / tested over:
  - GNOME corpus
  - Vieira / Poesio dataset, converted to MMAX, then converted to MAS-XML (still correcting the annotation)
A NEW SET OF FEATURES
- DIRECT ANAPHORA: run the Vieira / Poesio algorithm; -1 if no result, else distance
- PREDICATIVE NP DETECTOR:
  - DD occurs in apposition
  - DD occurs in copular construction
- PROPER NAMES:
  - c-head
  - c-premod
- Bean and Riloff's S1
A REVISED SET OF FEATURES (II)
- FUNCTIONALITY:
  - Uryupina's four definite probabilities (computed off the Web)
  - superlative
- ESTABLISHING RELATIVE (a single feature)
- POSITION IN TEXT OF NP (Ng and Cardie): header / first sentence / first para
LEARNING A DN CLASSIFIER
- Use of the data:
  - 8% for parameter tuning
  - 10-fold cross-validation over the rest
- Classifiers: from the Weka package
  - Decision Tree (C4.5), NN (MLP), SVM
- 3 evaluations (overall, DN, DA)
- Performance comparison: t-test (cf. Dietterich, 1998)
3 EVALUATIONS

OVERALL:  P = (DNcorr + DAcorr) / (DNsys + DAsys),  R = (DNcorr + DAcorr) / (DN + DA)
DN:       P = DNcorr / DNsys,  R = DNcorr / DN
DA:       P = DAcorr / DAsys,  R = DAcorr / DA

(DN, DA: gold discourse-new / direct-anaphoric DDs; DNsys, DAsys: DDs so classified by the system; DNcorr, DAcorr: correct system classifications.)
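These definitions can be checked against the figures from the 'would DNEW recognition help?' slide (574 DDs, 457 correct overall, 198 resolved as anaphoric with 119 correct). DNsys = 574 − 198 and DNcorr = 457 − 119 are derived here, not stated on that slide.

```python
# The three evaluations, as functions of the raw counts.
def evaluations(DN, DA, DNsys, DAsys, DNcorr, DAcorr):
    def pr(corr, sys, gold):
        p = corr / sys if sys else 0.0
        r = corr / gold if gold else 0.0
        return p, r
    return {"overall": pr(DNcorr + DAcorr, DNsys + DAsys, DN + DA),
            "DN": pr(DNcorr, DNsys, DN),
            "DA": pr(DAcorr, DAsys, DA)}

# Counts from the GUITAR-without-DN-detection run; DNsys and DNcorr
# are derived (574 - 198 and 457 - 119).
scores = evaluations(DN=390, DA=184, DNsys=376, DAsys=198,
                     DNcorr=338, DAcorr=119)
print(round(100 * scores["overall"][0], 1))  # 79.6
print(round(100 * scores["DA"][0], 1),       # P = 60.1
      round(100 * scores["DA"][1], 1))       # R = 64.7
```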
RESULTS: OVERALL

               T    Res  C    P=R=F
GuiTAR         574  574  457  79.6
GuiTAR + MLP   574  574  473  82.4   (p ≤ .1)
GuiTAR + C4.5  574  574  466  81.18  (not sig.)
RESULTS: DNEW CLASSIFICATION

          P     R     F     A
DN-C4.5   86.9  92.3  89.3  85.04
DN-MLP    86.4  94.6  90.2  85.89
DN-SVM    90.0  86.4  88.1  84.15
BASELINE  67.5  100   80.6  67.5

(baseline: all DDs classified as DN)
RESULTS: DIRECT ANAPHORA RESOLUTION

               T    Res  C    NM  WM  SM  P     R     F
GuiTAR         184  198  119  38  27  52  60.1  64.7  62.3
GuiTAR + MLP   184  142  104  60  20  18  74.1  56.5  63.4
GuiTAR + C4.5  184  158  106  56  22  30  68.9  57.7  62.1
GuiTAR + SVM   184  198  119  38  27  52  60.1  64.7  62.3
ERROR ANALYSIS
- A 65% reduction in spurious matches:
  - "the answer to any of these questions"
  - "the title of cabinet maker and sculptor to Louis XIV, King of France"
  - "the other half of the plastic"
- But: a 58% increase in no matches
  - "the palm of the hand"
THE DECISION TREE
[Tree diagram, approximately reconstructed.] The learned tree tests DirectAna first: if DirectAna <= -1 (no same-head antecedent found), classify as DNEW (339/36); otherwise it branches on DirectAna <= 20, the definite probability "the Y"/"a Y" (<= 201.2), Relative = 0, DirectAna <= 12, and 1stPar = 0, with leaves DNEW (11/1), DNEW, ANAPH, and DNEW (12/1).
RESULTS: THE VIEIRA/POESIO CORPUS
- Tested on 400 DDs (the 'test' corpus)
- Initial results at DN detection very poor
- Problem: the two conversions resulted in the loss of much information about modification, particularly relatives
- Currently correcting the annotation by hand
RESULTS: AUTOMATIC PARSING
- GUITAR without DN detection over the same texts, but using a chunker: 10% less accuracy
- Main problem: many DDs not detected (particularly possessives)
- Currently experimenting with full parsers (tried several, settled on Charniak's)
CONCLUSIONS AND DISCUSSION
- All results so far support the idea that DN detectors improve the performance of AR with DDs (if perhaps by only a few percent)
- Some agreement on what features are useful
- One clear lesson: interleave AR and DN detection!
- But: will need to test on larger corpora (also to improve the performance of the classifier)
- Current work:
  - Test on unparsed text
  - Test on MUC-7 data
Task-based evaluation

Conclusions
URLs
- Massimo Poesio: http://cswww.essex.ac.uk/staff/poesio
- GUITAR: http://privatewww.essex.ac.uk/~malexa/GuiTAR/
- WEKA: http://www.cs.waikato.ac.nz/~ml