Exploiting Subjective Annotations
Dennis Reidsma and Rieks op den Akker
Human Media Interaction
University of Twente
http://hmi.ewi.utwente.nl
Types of content

Annotation as a task of subjective judgments?
Manifest content
Pattern latent content
Projective latent content
Cf. Potter and Levine-Donnerstein 1999
Projective latent content

Why annotate data as projective latent content?
Because it cannot be defined exhaustively, whereas annotators have good 'mental schemas' for it
Because the data should be annotated in terms that fit the understanding of 'naïve users'
Inter-annotator agreement and projective content

Disagreements may be caused by:
Errors by annotators
Invalid scheme (no true label exists)
Different annotators having different 'truths' in interpretation of behavior (subjectivity)
Subjective annotation

People communicate in different ways, and therefore, as observers, they may also judge the behavior of others differently.
Projective content may be especially vulnerable to this problem.
How to work with subjectively annotated data?
Subjective annotation

How to work with subjectively annotated data? Unfortunately, subjective annotation leads to low levels of agreement, so such data would usually be set aside as 'unproductive material'.
I. Predicting agreement

One way to work with subjective data is to find out in which contexts annotators would agree, and to focus on those situations.
Result: a classifier that does not always classify every instance, but when it does, it does so with greater accuracy (a sketch follows below).
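A minimal sketch of this 'cautious classifier' idea, assuming a small multiply annotated subset is available to learn where annotators agree; the class name, the data arrays, and the scikit-learn decision trees are illustrative assumptions, not the setup used in the talk.

```python
from sklearn.tree import DecisionTreeClassifier


class CautiousClassifier:
    """Answers only on instances where annotators are predicted to agree."""

    def __init__(self):
        self.label_clf = DecisionTreeClassifier()      # predicts the annotation label
        self.agreement_clf = DecisionTreeClassifier()  # predicts annotator agreement

    def fit(self, X, y, X_multi, agreed):
        # X, y: singly annotated training data.
        # X_multi, agreed: a (smaller) multiply annotated subset with a
        # boolean target saying whether the annotators agreed on the instance.
        self.label_clf.fit(X, y)
        self.agreement_clf.fit(X_multi, agreed)
        return self

    def predict(self, X):
        # Return a label only where agreement is expected; None elsewhere.
        labels = self.label_clf.predict(X)
        expected_agreement = self.agreement_clf.predict(X)
        return [lab if ok else None for lab, ok in zip(labels, expected_agreement)]
```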
II. Explicitly modeling intersubjectivity

A second way: model each annotator separately, then find the cases where the models agree, and assume that those are the cases where the annotators would have agreed, too.
Result: a classifier that tells you for which instances other annotators would most probably agree with its classification.
Advantages

Both solutions lead to 'cautious classifiers' that only render a judgment in those cases where annotators would have been expected to agree.
This may carry over to users, too…
Neither solution requires all data to be multiply annotated.

Time?

Pressing questions so far?
(The remainder of the talk will give two
case studies.)
Case studies

I. Predicting agreement from information in other (easier) modalities: the case of contextual addressing
II. Explicitly modeling intersubjectivity in dialog markup: the case of voting classifiers
Data used: The AMI Corpus

100 hours of recorded meetings, annotated with dialog acts, focus of attention, gestures, addressing, decision points, and other layers.
I. Contextual addressing

Addressing, and focus of attention (FOA).
Agreement is highest for certain FOA contexts.
In those contexts, the classifier also performed better.
… more in the paper
II. Modeling intersubjectivity

Modeling single annotators, for 'yeah' utterances
Data annotated non-overlapping, 3 annotators

All data (training / test instances per annotator):

        Trn      Tst
  d    (3585)   (2289)
  s    (1753)    (528)
  v    (3500)   (1362)
II. Modeling intersubjectivity

Cross-annotator training and testing:

          TST_d   TST_s   TST_v   TST_all
  C_d      69      64      52      63
  C_s      59      68      48      57
  C_v      63      57      66      63
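Purely as an illustration of how such a cross-annotator table can be computed, the sketch below trains one classifier per annotator's training split and scores it on every annotator's test split. The data layout, the decision-tree model, and the use of accuracy as the score are assumptions for the example, not necessarily the measure behind the numbers above.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def cross_annotator_table(splits):
    """splits: {"d": (X_trn, y_trn, X_tst, y_tst), "s": (...), "v": (...)}"""
    table = {}
    for trainer, (X_trn, y_trn, _, _) in splits.items():
        # Train an "expert" classifier on one annotator's training data ...
        clf = DecisionTreeClassifier().fit(X_trn, y_trn)
        # ... and evaluate it against every annotator's test data.
        table[f"C_{trainer}"] = {
            f"TST_{tester}": round(100 * accuracy_score(y_tst, clf.predict(X_tst)))
            for tester, (_, _, X_tst, y_tst) in splits.items()
        }
    return table
```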
II. Modeling intersubjectivity

Building a voting classifier: only classify an instance when all three annotator-specific expert classifiers agree (a sketch follows below).
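A minimal sketch of such a unanimous-voting classifier, assuming one training set per annotator; the class and the underlying decision-tree experts are hypothetical stand-ins for whatever models are actually used.

```python
from sklearn.tree import DecisionTreeClassifier


class UnanimousVotingClassifier:
    """Labels an instance only when all annotator-specific experts agree."""

    def __init__(self, n_experts=3):
        self.experts = [DecisionTreeClassifier() for _ in range(n_experts)]

    def fit(self, datasets):
        # datasets: one (X, y) pair per annotator, e.g. [(X_d, y_d), (X_s, y_s), (X_v, y_v)]
        for expert, (X, y) in zip(self.experts, datasets):
            expert.fit(X, y)
        return self

    def predict(self, X):
        votes = [expert.predict(X) for expert in self.experts]
        # Emit a label only when every expert casts the same vote; None otherwise.
        return [v[0] if len(set(v)) == 1 else None for v in zip(*votes)]
```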
II. Modeling intersubjectivity

In the unanimous-voting context, performance is higher due to increased precision (avg. 6%).
Conclusions

Possible subjective aspects of annotation should be taken into account.
Agreement metrics are not designed to handle this.
We proposed two methods designed to cope with subjective data.
Thank you!

Questions?