Modelling Human Thematic Fit Judgments
IGK Colloquium
3/2/2005
Ulrike Padó
Overview
• (Very) quick introduction to my framework
• Testing the Semantic Module
 Different input corpora
 Smoothing
• Comparing the Semantic Module to
standard selectional preference methods
Modelling Semantic Processing
• General idea: Build a
 probabilistic
 large-scale
 broad-coverage
model of syntactic and semantic sentence
processing
Semantic Processing
• Assign thematic roles on the basis of co-occurrence statistics from semantically
annotated corpora
• Corpus-based frequency estimates of:
 Semantic Subcategorisation (Probability of
seeing the role with the verb)
 Selectional Preferences (Probability of seeing the argument head in a role given the verb frame); both estimates are sketched below
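As a concrete illustration, a minimal Python sketch of the two estimates just listed, computed by maximum likelihood from (verb, frame, role, head) annotation tuples. The tuple format, the function name and the multiplicative combination are my assumptions, not necessarily the model's actual implementation:

    def thematic_fit(tuples, verb, frame, role, head):
        """MLE estimates from (verb, frame, role, head) annotations:
        P(role | verb)              -- semantic subcategorisation
        P(head | role, verb frame)  -- selectional preference
        Combining them multiplicatively is one plausible reading."""
        n_verb = sum(1 for v, f, r, h in tuples if v == verb)
        n_role = sum(1 for v, f, r, h in tuples if v == verb and r == role)
        n_vfr  = sum(1 for v, f, r, h in tuples
                     if (v, f, r) == (verb, frame, role))
        n_head = sum(1 for v, f, r, h in tuples
                     if (v, f, r, h) == (verb, frame, role, head))
        p_subcat  = n_role / n_verb if n_verb else 0.0
        p_selpref = n_head / n_vfr if n_vfr else 0.0
        return p_subcat * p_selpref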
Testing the Semantic Module
• Evaluate just the thematic fit of verbs and argument phrases
• Evaluation:
1. Correlate predictions with human judgments (sketched below)
2. Role labelling (prefer the correct role)
• Try:
 Different input corpora
 Smoothing
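A sketch of evaluation step 1, assuming model scores and mean human ratings are paired per verb-argument item. The slides report ρ values, so a rank correlation such as Spearman's is likely, though the exact test is not stated; scipy's spearmanr is used here as one standard choice:

    from scipy.stats import spearmanr

    # Hypothetical paired data: model thematic-fit scores vs. mean
    # human plausibility ratings for the same verb-argument items.
    model_scores  = [0.12, 0.03, 0.27, 0.08]
    human_ratings = [5.1, 2.3, 6.4, 3.0]

    rho, p = spearmanr(model_scores, human_ratings)
    print(f"rho={rho:.2f}, p={p:.3f}")  # reliable if p < 0.05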
Training Data
Frequency counts from
• the PropBank (ca. 3000 verb types)
 Very specific domain
 Relatively flat, syntax-based annotation
• FrameNet (ca. 1500 verb types)
 Deep semantic annotation: Frames code situations, group verbs
that describe similar events and their arguments
 Extracted from balanced corpus
 Skewed sample through frame-wise annotation
Development/Test Data
• Development: 60 verb-argument pairs from
McRae et al. (1998)
 Two judgments for each data point: Agent/Patient
 Use to determine optimal parameters of clustering
(number of clusters, smoothing)
• Test: 50 verb-argument pairs, 100 data points
Sparse Data
• Raw frequencies are sparse:
 1 (Dev)/2 (Test) pairs seen in PropBank
 0 (Dev)/2 (Test) pairs seen in FrameNet
• Use semantic classes as level of
abstraction: Class-based smoothing
Smoothing
Reconstruct probabilities for unseen data
• Smoothing by verb and noun classes
 Count class members instead of word tokens (sketched below)
• Compare two alternatives:
 Hand-constructed classes
 Induced verb classes (clustering)
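A minimal sketch of the class-based counting idea just described, assuming a fixed word-to-class mapping; the lexicon and the words here are illustrative, not taken from the actual resources. Counts are aggregated over class members, so an unseen word inherits the counts of its class mates:

    # Hypothetical noun-to-class lexicon.
    noun_class = {"cop": "PERSON", "detective": "PERSON",
                  "crook": "PERSON", "evidence": "ARTIFACT"}

    def class_count(pairs, verb, role, noun):
        """Count (verb, role, class) instead of (verb, role, noun):
        a noun unseen with this verb and role still gets a non-zero
        count if a class mate was observed there."""
        target = noun_class[noun]
        return sum(1 for v, r, n in pairs
                   if v == verb and r == role
                   and noun_class.get(n) == target)

    # "detective" was never seen as Agent of "arrest", but its class
    # mate "cop" was, so the smoothed count is 1 rather than 0.
    seen = [("arrest", "Agent", "cop")]
    print(class_count(seen, "arrest", "Agent", "detective"))  # -> 1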
Hand-Constructed Verb and Noun Classes
• WordNet: Use top-level ontology and
synsets as noun classes
• VerbNet: Use top-level classes for verbs
• Presumably correct and reliable
• Result: No significant correlations with
human data for either training corpus
Induced Verb Classes
• Automatically cluster verbs (sketched below)
 Group by similarities of argument heads,
paths from argument to verb, frame, role
labels
 Determine optimal number of clusters and
parameters of the clustering algorithm on the
development set
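One way to realise the clustering step above, sketched with scikit-learn; the vectorisation, feature names and library choice are my assumptions, since the slide only fixes the feature types (argument heads, paths, frames, role labels) and that the number of clusters is tuned on the development set:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction import DictVectorizer

    # Hypothetical per-verb feature counts.
    verb_features = {
        "arrest": {"head=cop": 5, "frame=Arrest": 5, "role=Agent": 5},
        "detain": {"head=cop": 3, "frame=Arrest": 3, "role=Agent": 3},
        "eat":    {"head=soup": 4, "frame=Ingestion": 4, "role=Patient": 4},
    }

    vec = DictVectorizer()
    X = vec.fit_transform(verb_features.values())

    # The number of clusters is a free parameter, tuned on the dev set.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(dict(zip(verb_features, labels)))  # e.g. arrest/detain together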
Induced Classes, PB/FN

                    Data points covered   ρ / Significance
Raw data (PB)       2                     -/-
Raw data (FN)       2                     -/-
All arguments (PB)  59                    ns
All arguments (FN)  12                    ρ=0.55 / p<0.05
Just NPs (PB)       48                    ns
Just NPs (FN)       16                    ρ=0.56 / p<0.05
Results
• Hand-built classes do not work (with this
amount of data)
• Module achieves reliable correlations with
FN data:
 Important result for the overall feasibility of my
model
Adding Noun Classes (PB/FN)

                              Data points covered   ρ / Significance
Raw data (PB)                 2                     -/-
Raw data (FN)                 2                     -/-
PB, all args, noun classes    4                     ρ=1 / p<0.01
FN, just NPs, noun classes    18                    ρ=0.63 / p<0.01
Results
• Hand-built classes do not work (with this
amount of data)
• Module achieves reliable correlations with
FN data
• Adding noun classes helps a little more
Comparison with Selectional
Preference Methods
• Have established that our system reliably
predicts human data
• How do we do in comparison to standard
computational linguistics methods?
Selectional Preference Methods
• Clark & Weir (2002)
 Add data points by finding the topmost class
in WN that still reliably mirrors the target word
frequency
• Resnik (1996)
 Quantify the contribution of each WN class to the overall selectional preference strength of the verb (sketched below)
• Both rely on WN noun classes, no verb
class smoothing
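For reference, a sketch of Resnik's measure as usually stated (toy distributions, not real corpus numbers): the verb's preference strength is the KL divergence of P(c|v) from the prior P(c) over WordNet classes, and each class's selectional association is its share of that divergence.

    import math

    def selectional_association(p_c_given_v, p_c):
        """Resnik (1996): preference strength S(v) is the KL divergence
        of P(c|v) from P(c); the association of class c is its
        contribution to S(v), normalised by S(v)."""
        contrib = {c: p * math.log(p / p_c[c])
                   for c, p in p_c_given_v.items() if p > 0}
        strength = sum(contrib.values())
        return {c: x / strength for c, x in contrib.items()}

    # Toy distributions over two WordNet classes.
    print(selectional_association({"person": 0.9, "artifact": 0.1},
                                  {"person": 0.5, "artifact": 0.5}))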
Selectional Preference Methods (PB/FN)

                     Data points covered   ρ / Significance   Labelling (Cov/Acc)
Sem. Module 1        18                    ρ=0.63 / p<0.01    38% / 47.4%
Sem. Module 2        16                    ρ=0.56 / p<0.05    30% / 60%
Clark & Weir (PB)    72                    ns                 84% / 50%
Clark & Weir (FN)    23                    ns                 36% / 50%
Resnik (PB)          75                    ns                 74% / 48.6%
Resnik (FN)          46                    ns                 50% / 48%

(Sem. Module 1 and 2 match the FN noun-class and FN just-NPs settings from the earlier tables: 18, ρ=0.63 and 16, ρ=0.56.)
Results
• Too little input data
 No results for selectional preference models
 Small coverage for Semantic Module
• The Semantic Module manages to make predictions all the same
 Relies on verb clusters: Verbs are less sparse
than nouns in small corpora
• Next step: annotate a larger corpus with FN roles
Annotating the BNC
• Annotate large, balanced corpus: BNC
 More data points for verbs covered in FN
 More verb coverage (though purely syntactic
annotation for unknown verbs)
• Results:
 Annotation relatively sensible and reliable for non-FN
verbs
 Frame-wise annotation in FN causes problems for FN
verbs