rocling07wiebe - University of Pittsburgh

advertisement
Subjectivity and Sentiment Analysis
Jan Wiebe
Department of Computer Science
CERATOPS: Center for the Extraction and Summarization of Events
and Opinions in Text
University of Pittsburgh
What is Subjectivity?
• The linguistic expression of somebody’s
opinions, sentiments, emotions, evaluations,
beliefs, speculations (private states)
Wow, this is my 4th Olympus camera
Staley declared it to be “one wicked song”
Most voters believe he won‘t raise their taxes
One Motivation
• Automatic question answering…
Fact-Based Question Answering
Q: When is the first day of spring in 2007?
Q: Does the US have a tax treaty with Cuba?
Fact-Based Question Answering
Q: When is the first day of spring in 2007?
A: March 21
Q: Does the US have a tax treaty with Cuba?
A: Thus, the U.S. has no tax treaties with nations
like Iraq and Cuba.
Opinion Question Answering
Q: What is the international reaction to the reelection
of Robert Mugabe as President of Zimbabwe?
Opinion Question Answering
Q: What is the international reaction to the reelection
of Robert Mugabe as President of Zimbabwe?
A: African observers generally approved of his
victory while Western Governments strongly
denounced it.
More Motivations
•
•
•
•
•
Product review mining: What features of the
ThinkPad T43 do customers like and which do they
dislike?
Review classification: Is a review positive or negative
toward the movie?
Information Extraction: Is “killing two birds with one
stone” a terrorist event?
Tracking sentiments toward topics over time: Is
anger ratcheting up or cooling down?
Etcetera!
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
1
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
2
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
3
QA
Review
Mining
Opinion
Tracking
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
4
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from
unannotated texts
• Contextual polarity of sentiment
expressions
• Word sense and subjectivity
• Conclusions and pointers to related work
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
1
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
Corpus Annotation
Wiebe, Wilson, Cardie 2005
Wilson & Wiebe 2005
Somasundaran, Wiebe, Hoffmann, Litman 2006
Wilson 2007
Outline for Section 1
•
•
•
•
Overview
Frame types
Nested Sources
Extensions
Overview
• Fine-grained: expression level rather
than sentence or document level
• Annotate
– expressions of opinions, sentiments,
emotions, evaluations, speculations, …
– material attributed to a source, but presented
objectively
Overview
• Opinions, evaluations, emotions,
speculations are private states
• They are expressed in language by
subjective expressions
Private state: state that is not open to objective
observation or verification
Quirk, Greenbaum, Leech, Svartvik (1985). A
Comprehensive Grammar of the English Language.
Overview
• Focus on three ways private states are
expressed in language
– Direct subjective expressions
– Expressive subjective elements
– Objective speech events
Direct Subjective Expressions
• Direct mentions of private states
The United States fears a spill-over from the
anti-terrorist campaign.
• Private states expressed in speech events
“We foresaw electoral fraud but not daylight
robbery,” Tsvangirai said.
Expressive Subjective Elements
Banfield 1982
• “We foresaw electoral fraud but not daylight
robbery,” Tsvangirai said
• The part of the US human rights report about
China is full of absurdities and fabrications
Objective Speech Events
• Material attributed to a source, but
presented as objective fact (without
evaluation…)
The government, it added, has amended the Pakistan
Citizenship Act 10 of 1951 to enable women of Pakistani
descent to claim Pakistani nationality for their children
born to foreign husbands.
Nested Sources
“The report is full of absurdities,’’ Xirao-Nima said the next day.
Nested Sources
(Writer)
“The report is full of absurdities,’’ Xirao-Nima said the next day.
Nested Sources
(Writer, Xirao-Nima)
“The report is full of absurdities,’’ Xirao-Nima said the next day.
Nested Sources
(Writer Xirao-Nima)
(Writer Xirao-Nima)
“The report is full of absurdities,’’ Xirao-Nima said the next day.
Nested Sources
(Writer)
(Writer Xirao-Nima)
(Writer Xirao-Nima)
“The report is full of absurdities,’’ Xirao-Nima said the next day.
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
“The US fears a spill-over’’, said Xirao-Nima, a professor of
foreign affairs at the Central University for Nationalities.
(Writer)
“The US fears a spill-over’’, said Xirao-Nima, a professor of
foreign affairs at the Central University for Nationalities.
(writer, Xirao-Nima)
“The US fears a spill-over’’, said Xirao-Nima, a professor of
foreign affairs at the Central University for Nationalities.
(writer, Xirao-Nima, US)
“The US fears a spill-over’’, said Xirao-Nima, a professor of
foreign affairs at the Central University for Nationalities.
(Writer)
(writer, Xirao-Nima, US)
(writer, Xirao-Nima)
“The US fears a spill-over’’, said Xirao-Nima, a professor of
foreign affairs at the Central University for Nationalities.
“The US fears a spill-over’’, said Xirao-Nima, a professor of
foreign affairs at the Central University for Nationalities.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Objective speech event
anchor: said
source: <writer, Xirao-Nima>
Direct subjective
anchor: fears
source: <writer, Xirao-Nima, US>
intensity: medium
expression intensity: medium
attitude type: negative
target: spill-over
Corpus
• @ www.cs.pitt.edu/mpqa
• English language versions of articles from the world
press (187 news sources)
• Also includes contextual polarity annotations
• Themes of the instructions:
– No rules about how particular words should be annotated.
– Don’t take expressions out of context and think about what
they could mean, but judge them as they are used in that
sentence.
Extensions
Wilson 2007
Extensions
Wilson 2007
I think people are happy because Chavez has fallen.
direct subjective
span: think
source: <writer, I>
attitude:
attitude
span: think
type: positive arguing
intensity: medium
target:
target
span: people are happy because
Chavez has fallen
direct subjective
span: are happy
source: <writer, I, People>
attitude:
attitude
span: are happy
type: pos sentiment
intensity: medium
target:
target
span: Chavez has fallen
inferred attitude
span: are happy because
Chavez has fallen
type: neg sentiment
intensity: medium
target:
target
span: Chavez
Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from
unannotated texts
• Contextual polarity of sentiment
expressions
• Word sense and subjectivity
• Conclusions and pointers to related work
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
2
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
Outline for Section 2
• Learning subjective nouns with extraction
pattern bootstrapping
• Automatically generating training data with
high-precision classifiers
• Learning subjective and objective
expressions (not simply words or n-grams)
• Riloff, Wiebe, Wilson 2003; Riloff & Wiebe 2003;
Wiebe & Riloff 2005; Riloff, Patwardhan, Wiebe
2006.
Outline for Section 2
• Learning subjective nouns with
extraction pattern bootstrapping
• Automatically generating training data
with high-precision classifiers
• Learning subjective and objective
expressions
Information Extraction
• Information extraction (IE) systems identify
facts related to a domain of interest.
• Extraction patterns are lexico-syntactic
expressions that identify the role of an object.
For example:
<subject> was killed
assassinated <dobj>
murder of <np>
Learning Subjective Nouns
• Goal: to learn subjective nouns from
unannotated text
• Method: applying IE-based bootstrapping
algorithms that were designed to learn
semantic categories
• Hypothesis: extraction patterns can identify
subjective contexts that co-occur with
subjective nouns
Example: “expressed <dobj>”
concern, hope, support
Extraction Examples
expressed <dobj>
condolences, hope, grief, views,
worries
indicative of <np>
compromise, desire, thinking
inject <dobj>
vitality, hatred
reaffirmed <dobj>
resolve, position, commitment
voiced <dobj>
outrage, support, skepticism,
opposition, gratitude, indignation
show of <np>
support, strength, goodwill, solidarity
<subj> was shared
anxiety, view, niceties, feeling
Meta-Bootstrapping Riloff & Jones 1999
Unannotated Texts
Ex: hope, grief, joy,
concern, worries
Best Extraction Pattern
Extractions (Nouns)
Ex:
expressed <DOBJ>
Ex: happiness, relief,
condolences
Subjective Seed Words
cowardice
crap
delight
disdain
dismay
embarrassment
fool
gloom
grievance
happiness
hatred
hell
hypocrisy
love
nonsense
outrage
slander
sigh
twit
virtue
Subjective Noun Results
• Bootstrapping corpus: 950 unannotated news articles
• We ran both bootstrapping algorithms for several
iterations
• We manually reviewed the words and labeled them as
strong, weak, or not subjective
• 1052 subjective nouns were learned (454 strong, 598
weak)
– included in our subjectivity lexicon @ www.cs.pitt.edu/mpqa
Examples of Strong Subjective
Nouns
anguish
antagonism
apologist
atrocities
barbarian
belligerence
bully
condemnation
denunciation
devil
diatribe
exaggeration
exploitation
evil
fallacies
genius
goodwill
humiliation
ill-treatment
injustice
innuendo
insinuation
liar
mockery
pariah
repudiation
revenge
rogue
sanctimonious
scum
smokescreen
sympathy
tyranny
venom
Examples of Weak Subjective Nouns
aberration
allusion
apprehensions
assault
beneficiary
benefit
blood
controversy
credence
distortion
drama
eternity
eyebrows
failures
inclination
intrigue
liability
likelihood
peaceful
persistent
plague
pressure
promise
rejection
resistant
risk
sincerity
slump
spirit
success
tolerance
trick
trust
unity
Outline for Section 2
• Learning subjective nouns with
extraction pattern bootstrapping
• Automatically generating training data
with high-precision classifiers
• Learning subjective and objective
expressions
Training Data Creation
unlabeled texts
subjective clues
rule-based
subjective
sentence
classifier
rule-based
objective
sentence
classifier
subjective
& objective
sentences
Subjective Clues
• Subjectivity lexicon
– @ www.cs.pitt.edu/mpqa
– Entries from manually developed resources
(e.g., General Inquirer, Framenet) +
– Entries automatically identified (E.g., nouns
just described)
Creating High-Precision RuleBased Classifiers
GOAL: use subjectivity clues from previous research to
build a high-precision (low-recall) rule-based classifier
• a sentence is subjective if it contains  2 strong
subjective clues
• a sentence is objective if:
– it contains no strong subjective clues
– the previous and next sentence contain  1 strong subjective
clue
– the current, previous, and next sentence together contain
 2 weak subjective clues
Accuracy of Rule-Based Classifiers
(measured against MPQA annotations)
Subj RBC
SubjRec SubjPrec SubjF
34.2
90.4
49.6
Obj RBC
ObjRec
30.7
ObjPrec
82.4
ObjF
44.7
Generated Data
We applied the rule-based classifiers to ~300,000
sentences from (unannotated) new articles
~53,000 were labeled subjective
~48,000 were labeled objective
training set of over 100,000 labeled
sentences!
Generated Data
• The generated data may serve as training
data for supervised learning, and initial
data for self training Wiebe & Riloff 2005
• The rule-based classifiers are part of
OpinionFinder @ www.cs.pitt.edu/mpqa
• Here, we use the data to learn new
subjective expressions…
Outline for Section 2
• Learning subjective nouns with
extraction pattern bootstrapping
• Automatically generating training data
with high-precision classifiers
• Learning subjective and objective
expressions
Representing Subjective Expressions
with Extraction Patterns
• EPs can represent expressions that are not
fixed word sequences
drove [NP] up the wall
- drove him up the wall
- drove George Bush up the wall
- drove George Herbert Walker Bush up the wall
step on [modifiers] toes
- step on her toes
- step on the mayor’s toes
- step on the newly elected mayor’s toes
gave [NP] a [modifiers] look
- gave his annoying sister a really really mean look
The Extraction Pattern Learner
• Used AutoSlog-TS [Riloff 96] to learn EPs
• AutoSlog-TS needs relevant and irrelevant
texts as input
• Statistics are generated measuring each
pattern’s association with the relevant texts
• The subjective sentences are the relevant
texts, and the objective sentences are the
irrelevant texts
<subject> passive-vp
<subject> active-vp
<subject> active-vp dobj
<subject> active-vp infinitive
<subject> passive-vp infinitive
<subject> auxiliary dobj
<subj> was satisfied
<subj> complained
<subj> dealt blow
<subj> appears to be
<subj> was thought to be
<subj> has position
active-vp <dobj>
infinitive <dobj>
active-vp infinitive <dobj>
passive-vp infinitive <dobj>
subject auxiliary <dobj>
endorsed <dobj>
to condemn <dobj>
get to know <dobj>
was meant to show <dobj>
fact is <dobj>
passive-vp prep <np>
active-vp prep <np>
infinitive prep <np>
noun prep <np>
opinion on <np>
agrees with <np>
was worried about <np>
to resort to <np>
Relevant
Irrelevant
AutoSlog-TS
(Step 1)
[The World Trade Center], [an icon] of [New York City], was
intentionally attacked very early on [September 11, 2001].
Parser
Syntactic Templates
Extraction Patterns:
<subj> was attacked
icon of <np>
was attacked on <np>
AutoSlog-TS (Step 2)
Relevant
Irrelevant
Extraction Patterns:
<subj> was attacked
icon of <np>
was attacked on <np>
Extraction Patterns
Freq Prob
<subj> was attacked
100 .90
icon of <np>
5 .20
was attacked on <np> 80 .79
Identifying Subjective Patterns
• Subjective patterns:
– Freq > X
– Probability > Y
– ~6,400 learned on 1st iteration
• Evaluation against the MPQA corpus:
– direct evaluation of performance as simple classifiers
– evaluation of classifiers using patterns as additional
features
– Does the system learn new knowledge?
• Add patterns to the strong subjective set and re-run the
rule-based classifier
• recall += 20 while precision -+ 7 on 1st iteration
Patterns with Interesting
Behavior
PATTERN
<subj> asked
<subj> was asked
<subj> was expected
was expected from <np>
FREQ
128
11
P(Subj | Pattern)
.63
1.0
45
5
.42
1.0
187
10
.67
.90
<subj> talk
talk of <np>
<subj> is talk
28
10
5
.71
.90
1.0
<subj> is fact
fact is <dobj>
38
12
1.0
1.0
<subj> put
<subj> put end
Conclusions for Section 2
• Extraction pattern bootstrapping can learn subjective nouns
(unannotated data)
• Extraction patterns can represent richer subjective expressions
• Learning methods can discover subtle distinctions that are
important for subjectivity (unannotated data)
• Ongoing work:
– lexicon representation integrating different types of entries,
enabling comparisons (e.g., subsumption relationships)
– learning and bootstrapping processes applied to much larger
unannotated corpora
– processes applied to learning positive and negative subjective
expressions
Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from
unannotated texts
• Contextual polarity of sentiment
expressions (briefly)
– Wilson, Wiebe, Hoffmann 2005; Wilson 2007; Wilson
& Wiebe forthcoming
• Word sense and subjectivity
• Conclusions and pointers to related work
<ppol=“neg”>condemn</ppol>
<ppol=“pos”>great</ppol>
<ppol=“neg”>wicked</ppol>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
3
QA
Review
Mining
Opinion
Tracking
Prior Polarity versus Contextual
Polarity
• Most approaches use a lexicon of positive
and negative words
Prior polarity: out of context, positive or negative
beautiful  positive
horrid  negative
• A word may appear in a phrase that
expresses a different polarity in context
“Cheers to Timothy Whitfield for the wonderfully
horrid visuals.”
Contextual polarity
Goal of This Work
• Automatically distinguish prior and
contextual polarity
Approach
Lexicon
All
Instances
Corpus
Step 1
Neutral
or
Polar?
Step 2
Polar
Instances
Contextual
Polarity?
•
Use machine learning and variety of
features
•
Achieve significant results for a large subset
of sentiment expressions
Manual Annotations
Subjective expressions of the MPQA corpus
annotated with contextual polarity
Annotation Scheme
• Mark polarity of subjective expressions as
positive, negative, both, or neutral
positive
African observers generally approved of his victory while
Western governments denounced it.
negative
Besides, politicians refer to good and evil …
both
Jerome says the hospital feels no different than a
hospital in the states.
neutral
Annotation Scheme
• Judge the contextual polarity of sentiment
ultimately being conveyed
They have not succeeded, and will never
succeed, in breaking the will of this valiant
people.
Annotation Scheme
• Judge the contextual polarity of sentiment
ultimately being conveyed
They have not succeeded, and will never
succeed, in breaking the will of this valiant
people.
Annotation Scheme
• Judge the contextual polarity of sentiment
ultimately being conveyed
They have not succeeded, and will never
succeed, in breaking the will of this valiant
people.
Annotation Scheme
• Judge the contextual polarity of sentiment
ultimately being conveyed
They have not succeeded, and will never
succeed, in breaking the will of this valiant
people.
Features
• Many inspired by Polanyi & Zaenen
(2004): Contextual Valence Shifters
Example:
little threat
little truth
• Others capture dependency relationships
between words
Example:
pos
wonderfully horrid
mod
Lexicon
All
Instances
Corpus
1.
2.
3.
4.
5.
Step 1
Neutral
or
Polar?
Word features
Modification features
Structure features
Sentence features
Document feature
Step 2
Polar
Instances Contextual
Polarity?
• Word token
terrifies
• Word part-of-speech
VB
• Context
that terrifies me
• Prior Polarity
negative
• Reliability
strong subjective
Lexicon
Step 1
All
Instances
Corpus
1.
2.
3.
4.
5.
Neutral
or
Polar?
Word features
Modification features
Structure features
Sentence features
Document feature
poses
subj
obj
report
det
adj
The human rights
det
adj
Polar
Instances Contextual
Polarity?
Binary features:
• Preceded by
– adjective
– adverb (other than not)
– intensifier
• Self intensifier
• Modifies
– strong clue
– weak clue
challenge
mod
Step 2
p
a substantial …
• Modified by
– strong clue
– weak clue
Dependency
Parse Tree
Lexicon
Step 1
All
Instances
Corpus
1.
2.
3.
4.
5.
Neutral
or
Polar?
Word features
Modification features
Structure features
Sentence features
Document feature
obj
report
det
adj
The human rights
Polarity?
Binary features:
• In subject
[The human rights report]
poses
I am confident
challenge
mod
Polar
Instances Contextual
• In copular
poses
subj
Step 2
det
adj
p
a substantial …
• In passive voice
must be regarded
Lexicon
All
Instances
Corpus
1.
2.
3.
4.
5.
Step 1
Neutral
or
Polar?
Word features
Modification features
Structure features
Sentence features
Document feature
Step 2
Polar
Instances Contextual
Polarity?
• Count of strong clues in
previous, current, next
sentence
• Count of weak clues in
previous, current, next
sentence
• Counts of various parts of
speech
Lexicon
All
Instances
Corpus
Neutral
or
Polar?
Word features
Modification features
Structure features
Sentence features
Document feature
Step 2
Polar
Instances Contextual
Polarity?
• Document topic (15)
– economics
– health
…
1.
2.
3.
4.
5.
Step 1
– Kyoto protocol
– presidential election in
Zimbabwe
Example: The disease can be contracted if a
person is bitten by a certain tick or if a person comes
into contact with the blood of a congo fever sufferer.
Lexicon
All
Instances
Corpus
Step 1
Neutral
or
Polar?
• Word token
• Word prior polarity
•
•
•
•
•
•
•
•
Negated
Negated subject
Modifies polarity
Modified by polarity
Conjunction polarity
General polarity shifter
Negative polarity shifter
Positive polarity shifter
Step 2
Polar
Instances Contextual
Polarity?
Word token
terrifies
Word prior polarity
negative
Lexicon
All
Instances
Corpus
Step 1
Neutral
or
Polar?
• Word token
• Word prior polarity
• Negated
• Negated subject
•
•
•
•
•
•
Step 2
Polar
Instances Contextual
Polarity?
Binary features:
• Negated
For example:
– not good
– does not look very good
 not only good but amazing
Modifies polarity
Modified by polarity
Conjunction polarity
• Negated subject
General polarity shifter
No politically prudent Israeli
Negative polarity shifter
could support either of them.
Positive polarity shifter
Lexicon
All
Instances
Corpus
•
•
•
•
Step 1
Neutral
or
Polar?
Word token
Word prior polarity
Negated
Negated subject
Step 2
Polar
Instances Contextual
Polarity?
• Modifies polarity
5 values: positive, negative,
neutral, both, not mod
substantial: negative
• Modifies polarity
• Modified by polarity • Modified by polarity
•
•
•
•
Conjunction polarity
General polarity shifter
Negative polarity shifter
Positive polarity shifter
5 values: positive, negative,
neutral, both, not mod
challenge: positive
substantial (pos) challenge (neg)
Lexicon
All
Instances
Corpus
•
•
•
•
•
•
Step 1
Neutral
or
Polar?
Word token
Word prior polarity
Negated
Negated subject
Modifies polarity
Modified by polarity
Step 2
Polar
Instances Contextual
Polarity?
• Conjunction polarity
5 values: positive, negative,
neutral, both, not mod
good: negative
• Conjunction polarity
• General polarity shifter
• Negative polarity shifter
• Positive polarity shifter
good (pos) and evil (neg)
Lexicon
All
Instances
Corpus
•
•
•
•
•
•
•
•
•
•
Step 1
Neutral
or
Polar?
Word token
Word prior polarity
Negated
Negated subject
Modifies polarity
Modified by polarity
Conjunction polarity
General polarity shifter
Negative polarity shifter
Positive polarity shifter
Step 2
Polar
Instances Contextual
Polarity?
• General polarity
shifter
have few risks/rewards
• Negative polarity
shifter
lack of understanding
• Positive polarity
shifter
abate the damage
Findings
• Statistically significant improvements can
be gained
• Require combining all feature types
• Ongoing work:
– richer lexicon entries
– compositional contextual processing
S/O and Pos/Neg: both Important
Subjective
Positive
Objective
Negative Neutral Both
• Several studies have found two steps
beneficial
– Yu & Hatzivassiloglou 2003; Pang & Lee 2004;
Wilson et al. 2005; Kim & Hovy 2006
S/O and Pos/Neg: both Important
Subjective
Positive
Objective
Negative Neutral Both
• S/O can be more difficult
–
–
–
–
–
Manual annotation of phrases Takamura et al. 2006
Contextual polarity Wilson et al. 2005
Sentiment tagging of words Andreevskaia & Bergler 2006
Sentiment tagging of word senses Esuli & Sabastiani 2006
Later: evidence that S/O more significant for senses
S/O and Pos/Neg: both Important
• Desirable for NLP systems to find a wide range
of private states, including motivations, thoughts,
and speculations, not just positive and negative
sentiments
S/O and Pos/Neg: both Important
Sentiment
Pos
Neg
Other Subjectivity
Both
Pos Neg Neu Both
Objective
Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from
unannotated texts
• Contextual polarity of sentiment
expressions
• Word sense and subjectivity
– Wiebe & Mihalcea 2006
• Conclusions and pointers to related work
<ppol=“neg”>condemn></ppol>
<subjl=“subj”>condemn#1</subj>
<subj=“obj”>condemn#2</subj>
<subj=“obj”>condemn#3</subj>
<> </>
<> </>
<> </>
<> </>
<> </>
<> </>
4
<cpol=“pos”>wicked
</cpol> visuals
<cpol=“neg”>loudly
condemned</cpol>
The building has been
<subjectivity=“obj”>
condemned
</subjectivity>
<> </>
<> </>
QA
Review
Mining
Opinion
Tracking
Introduction
• Continuing interest in word sense
– Sense annotated resources being
developed for many languages
• www.globalwordnet.org
– Active participation in evaluations such as
SENSEVAL
Word Sense and Subjectivity
• Though both are concerned with text
meaning, until recently they have been
investigated independently
Subjectivity Labels on Senses
S
Alarm, dismay, consternation – (fear
resulting from the awareness of danger)
O
Alarm, warning device, alarm system – (a
device that signals the occurrence of some
undesirable event)
Subjectivity Labels on Senses
S
O
Interest, involvement -- (a sense of concern with
and curiosity about someone or something; "an
interest in music")
Interest -- (a fixed charge for borrowing money;
usually a percentage of the amount borrowed;
"how much interest do you pay on your
mortgage?")
WSD using Subjectivity Tagging
He spins a riveting plot which
grabs and holds the reader’s interest.
Sense 4 “a sense of concern
with and curiosity about
someone or something” S
Sense 1 “a fixed charge
for borrowing money”
Sense 4
Sense 1?
WSD
System
O
Sense 1
Sense 4?
The notes do not pay interest.
WSD using Subjectivity Tagging
He spins a riveting plot which
grabs and holds the reader’s interest.
S
Subjectivity
Classifier
Sense 4 “a sense of concern
with and curiosity about
someone or something” S
Sense 1 “a fixed charge
for borrowing money”
Sense 4
Sense 1?
WSD
System
O
Sense 1
Sense 4?
O
The notes do not pay interest.
WSD using Subjectivity Tagging
He spins a riveting plot which
grabs and holds the reader’s interest.
S
Subjectivity
Classifier
Sense 4 “a sense of concern
with and curiosity about
someone or something” S
Sense 1 “a fixed charge
for borrowing money”
Sense 4
Sense 1?
WSD
System
O
Sense 1
Sense 4?
O
The notes do not pay interest.
Subjectivity Tagging using WSD
S O?
He spins a riveting plot which
grabs and holds the reader’s interest.
Subjectivity
Classifier
O S?
The notes do not pay interest.
Subjectivity Tagging using WSD
S O?
He spins a riveting plot which
grabs and holds the reader’s interest.
Sense 4
S Sense 4 “a sense of
Subjectivity
Classifier
concern with and curiosity
about someone or something”
O Sense 1 “a fixed charge
WSD
System
for borrowing money”
O S?
Sense 1
The notes do not pay interest.
Subjectivity Tagging using WSD
S O?
He spins a riveting plot which
grabs and holds the reader’s interest.
Sense 4
S Sense 4 “a sense of
Subjectivity
Classifier
concern with and curiosity
about someone or something”
O Sense 1 “a fixed charge
WSD
System
for borrowing money”
O S?
Sense 1
The notes do not pay interest
Refining WordNet
• Semantic Richness
• Find inconsistencies and gaps
– Verb assault – attack, round, assail, last out,
snipe, assault (attack in speech or writing)
“The editors of the left-leaning paper attacked
the new House Speaker”
– But no sense for the noun as in “His verbal
assault was vicious”
Goals
• Explore interactions between word sense
and subjectivity
– Can subjectivity labels be assigned to word
senses?
• Manually
• Automatically
– Can subjectivity analysis improve word
sense disambiguation?
– Can word sense disambiguation improve
subjectivity analysis? Current work
Outline for Section 4
• Motivation and Goals
• Assigning Subjectivity Labels to Word
Senses
– Manually
– Automatically
• Word Sense Disambiguation using
Automatic Subjectivity Analysis
• Conclusions
Annotation Scheme
• Assigning subjectivity labels to WordNet
senses
– S: subjective
– O: objective
– B: both
Examples
S
O
Alarm, dismay, consternation – (fear resulting form the
awareness of danger)
– Fear, fearfulness, fright – (an emotion experiences in anticipation
of some specific pain or danger (usually accompanied by a
desire to flee or fight))
Alarm, warning device, alarm system – (a device that
signals the occurrence of some undesirable event)
– Device – (an instrumentality invented for a particular purpose;
“the device is small enough to wear on your wrist”; “a device
intended to conserve water”
Subjective Sense Definition
• When the sense is used in a text or
conversation, we expect it to express
subjectivity, and we expect the
phrase/sentence containing it to be
subjective.
Subjective Sense Examples
• His alarm grew
Alarm, dismay, consternation – (fear resulting form the
awareness of danger)
– Fear, fearfulness, fright – (an emotion experiences in anticipation
of some specific pain or danger (usually accompanied by a
desire to flee or fight))
• He was boiling with anger
Seethe, boil – (be in an agitated emotional state; “The
customer was seething with anger”)
– Be – (have the quality of being; (copula, used with an adjective
or a predicate noun); “John is rich”; “This is not a good answer”)
Subjective Sense Examples
• What’s the catch?
Catch – (a hidden drawback; “it sounds good
but what’s the catch?”)
Drawback – (the quality of being a hindrance; “he
pointed out all the drawbacks to my plan”)
• That doctor is a quack.
Quack – (an untrained person who pretends to
be a physician and who dispenses medical
advice)
– Doctor, doc, physician, MD, Dr., medico
Objective Sense Examples
• The alarm went off
Alarm, warning device, alarm system – (a device that
signals the occurrence of some undesirable event)
– Device – (an instrumentality invented for a particular purpose;
“the device is small enough to wear on your wrist”; “a device
intended to conserve water”
• The water boiled
Boil – (come to the boiling point and change from a
liquid to vapor; “Water boils at 100 degrees Celsius”)
– Change state, turn – (undergo a transformation or a change of
position or action)
Objective Sense Examples
• He sold his catch at the market
Catch, haul – (the quantity that was
caught; “the catch was only 10 fish”)
– Indefinite quantity – (an estimated quantity)
• The duck’s quack was loud and brief
Quack – (the harsh sound of a duck)
– Sound – (the sudden occurrence of an
audible event)
Objective Senses: Observation
• We don’t necessarily expect
phrases/sentences containing objective
senses to be objective
– Will someone shut that darn alarm off?
– Can’t you even boil water?
• Subjective, but not due to alarm and boil
Objective Sense Definition
• When the sense is used in a text or
conversation, we don’t expect it to express
subjectivity and, if the phrase/sentence
containing it is subjective, the subjectivity
is due to something else.
Inter-Annotator Agreement Results
• Overall:
– Kappa=0.74
– Percent Agreement=85.5%
• Without the 12.3% cases when a judge is
uncertain:
– Kappa=0.90
– Percent Agreement=95.0%
S/O and Pos/Neg
• Hypothesis: moving from word to word
sense is more important for S versus O
than it is for Positive versus Negative
• Pilot study with the nouns of the
SENSEVAL-3 English lexical sample task
– ½ have both subjective and objective senses
– Only 1 has both positive and negative
subjective senses
Outline for Section 4
• Motivation and Goals
• Assigning Subjectivity Labels to Word
Senses
– Manually
– Automatically
• Word Sense Disambiguation using
Automatic Subjectivity Analysis
• Conclusions
Overview
• Main idea: assess the subjectivity of a word
sense based on information about the
subjectivity of
– a set of distributionally similar words
– in a corpus annotated with subjective expressions
Preliminaries
• Suppose the goal were to assess the
subjectivity of a word w, given an
annotated corpus
• We could consider how often w appears in
subjective expressions
• Or, we could take into account more
evidence – the subjectivity of a set of
distributionally similar words
Lin’s Distributional Similarity
R2
R3
I
have
a
brown
R1
R4
Word
I
have
brown
R
R1
R2
R3
...
W
have
dog
dog
Lin 1998
dog
Lin’s Distributional Similarity
Word1
R W
R W
Word2
RW
RW
RW
RW
RW
RW
RW
RW
RW
RW
Subjectivity of word w
Unannotated
Corpus
(BNC)
Annotated
Corpus
(MPQA)
subj(w) = #insts(DSW) in SE - #insts(DSW) not in SE
#insts (DSW)
DSW = {dsw1, …, dswj}
Subjectivity of word w
Unannotated
Corpus
(BNC)
Annotated
Corpus
(MPQA)
subj(w) = #insts(DSW) in SE - #insts(DSW) not in SE
#insts (DSW)
[-1, 1] [highly objective, highly subjective]
DSW = {dsw1, …, dswj}
Subjectivity of word w
Unannotated
Corpus
(BNC)
Annotated
Corpus
(MPQA)
dsw1inst1
+1
dsw1inst2
-1
dsw2inst1
+1
subj(w) =
+1 -1 +1
3
DSW = {dsw1,dsw2}
= 1/3
Subjectivity of word sense wi
Annotated
Corpus
(MPQA)
Rather than 1, add or subtract
sim(wi,dswj)
dsw1inst1
+sim(wi,dsw1)
dsw1inst2
-sim(wi,dsw1)
dsw2inst1
+sim(wi,dsw2)
+sim(wi,dsw1) - sim(wi,dsw1) + sim(wi,dsw2)
subj(wi) =
2 * sim(wi,dsw1) + sim(wi,dsw2)
Method –Step 1
• Given word w
• Find distributionally similar words
– DSW = {dswj | j = 1 .. n}
Method –Step 2
DSW1
word w = Alarm Panic
DSW2
Detector
Sense w1 “fear
sim(w1,panic)
sim(w1,detector)
sim(w2,panic)
sim(w2,detector)
resulting from the
awareness of
danger”
Sense w2 “a
device that signals
the occurrence of
some undesirable
event”
Method – Step 2
• Find the similarity between each word
sense and each distributionally similar word
wnss1( wsi , dsw j )
sim ( wsi , dsw j ) 
 wnss1(ws , dsw )
i 'senses( w )
wnss1(wsi , dsw j ) 
 wnss
max
ksenses( dswj )
i'
j
wnss(wsi , dswkj )
can be any concept-based similarity measure
between word senses
 we use Jiang & Conrath 1997
Method –Step 3
Input: word sense wi of word w
DSW = {dswj | j = 1..n}
sim(wi,dswj)
MPQA Opinion Corpus
Output: subjectivity score subj(wi)
Method –Step 3
j
totalsim =

#insts(dswj) * sim(wi,dswj)
1
evi_subj = 0
for each dswj in DSW:
for each instance k in insts(dswj):
if k is in a subjective expression:
evi_subj += sim(wi,dswj)
else:
evi_subj -= sim(wi,dswj)
subj(wi) = evi_subj / totalsim
Evaluation
• Calculate subj scores for each word
sense, and sort them
• While 0 is a natural candidate for division
between S and O, we perform the
evaluation for different thresholds in [1,+1]
• Determine the precision of the algorithm at
different points of recall
Evaluation: precision/recall curves
1
baseline
0.9
selected
0.8
all
Precision
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Recall
1
Number of distributionally similar
words = 160
Outline for Section 4
• Motivation and Goals
• Assigning Subjectivity Labels to Word
Senses
– Manually
– Automatically
• Word Sense Disambiguation using
Automatic Subjectivity Analysis
• Conclusions
Overview
• Augment an existing data-driven WSD
system with a feature reflecting the
subjectivity of the context of the
ambiguous word
• Compare the performance of original and
subjectivity-aware WSD systems
• The ambiguous nouns of the SENSEVAL3 English Lexical Task
• SENSEVAL-3 data
Original WSD System
• Integrates local and topical features:
• Local: context of three words to the left and right,
their part-of-speech
• Topical: top five words occurring at least three
times in the context of a word sense
• [Ng & Lee, 1996], [Mihalcea, 2002]
• Naïve Bayes classifier
• [Lee & Ng, 2003]
Subjectivity Classifier
• Rule-based automatic sentence classifier
from Wiebe & Riloff 2005
• Harvests subjective and objective
sentences it is certain about from
unannotated data
• Part of OpinionFinder @ www.cs.pitt.edu/mpqa/
Subjectivity Tagging for WSD
Sentences of the SENSEVAL-3data that contain
a target noun are tagged with O, S, or B:
Sentencej
Sentencek
“interest”
“interest”
“atmosphere”
S
Subjectivity
Classifier
O
S
…
… … …
Sentencei
Tags are fed as input to the Subjectivity Aware WSD System
WSD using Subjectivity Tagging
Sentencei
Original
WSD System
“interest”
Sense 4 Sense 1
S
Subjectivity
Classifier
S, O, or B
Subjectivity
Aware WSD
System
Sense 4 “a fixed charge for
borrowing money”
Sense 1 “a sense of
concern with and curiosity
about someone or
something”
WSD using Subjectivity Tagging
Hypothesis: instances of subjective senses are more likely
to be in subjective sentences, so sentence subjectivity is an
informative feature for WSD of words with both
subjective and objective senses
Words with S and O Senses
Word
argument
atmosphere
difference
difficulty
image
interest
judgment
plan
sort
source
Average
Classifier
Senses Baseline basic
+ subj
5 49.4% 51.4% 54.1%
6 65.4% 65.4% 66.7%
5 40.4% 54.4% 57.0%
4 17.4% 47.8% 52.2%
7 36.5% 41.2% 43.2%
7 41.9% 67.7% 68.8%
7 28.1% 40.6% 43.8%
3 81.0% 81.0% 81.0%
4 65.6% 66.7% 67.7%
9 40.6% 40.6% 40.6% S sense not in data
46.6% 55.6% 57.5%
4.3% error reduction; significant (p < 0.05 paired t-test)
Words with Only O Senses
Word
arm
audience
bank
degree
disc
organization
paper
party
performance
shelter
Average
Classifier
Senses Baseline basic
+ subj
6 82.0% 85.0% 84.2%
4 67.0% 74.0% 74.0%
10 62.6% 62.6% 62.6%
7 60.9% 71.1% 71.1%
4 38.0% 65.6% 66.4%
7 64.3% 64.3% 64.3%
7 25.6% 49.6% 48.0%
5 62.1% 62.9% 62.9%
5 26.4% 34.5% 34.5%
5 44.9% 65.3% 65.3%
53.3% 63.5% 63.3%
>
=
=
=
< often target
=
>
=
=
=
Overall 2.2% error reduction; significant (p < 0.1)
Conclusions for Section 4
• Can subjectivity labels be assigned to
word senses?
– Manually
• Good agreement; Kappa=0.74
• Very good when uncertain cases removed;
Kappa=0.90
– Automatically
• Method substantially outperforms baseline
• Showed feasibility of assigning subjectivity labels
to the fine-grained sense level of word senses
Conclusions for Section 4
• Can subjectivity analysis improve word sense
disambiguation?
– Quality of a WSD system can be improved with
subjectivity information
• Improves performance, but mainly for words with both S and
O senses
– 4.3% error reduction; significant (p < 0.05)
• Performance largely remains the same or degrades for words
that don’t
• Once senses have been assigned subjectivity labels, a WSD
system could consult them to decide whether to consider the
subjectivity feature
Pointers To Related Work
• Tutorial held at EUROLAN 2007
“Semantics, Opinion, and Sentiment in
Text”, Iaşi, Romania, August
• Slides, bibliographies, …
– www.cs.pitt.edu/~wiebe/EUROLAN07
Conclusions
• Subjectivity is common in language
• Recognizing it is useful in many NLP tasks
• It comes in many forms and often is contextdependent
• A wide variety of features seem to be necessary
for opinion and polarity recognition
• Subjectivity may be assigned to word senses,
promising improved performance for both
subjectivity analysis and WSD
– Promising as well for multi-lingual subjectivity analysis
Mihalcea, Banea, Wiebe 2007
Acknowledgements
• CERATOPS Center for the Extraction and
Summarization of Events and Opinions in Text
– Pittsburgh: Paul Hoffmann, Josef Ruppenhofer, Swapna
Somasundaran, Theresa Wilson
– Cornell: Claire Cardie, Eric Breck, Yejin Choi, Ves Stoyanov
– Utah: Ellen Riloff, Sidd Patwardhan, Bill Phillips
• UNT: Rada Mihalcea, Carmen Banea
• [email protected]: Wendy Chapman, Rebecca Hwa, Diane
Litman, …
Download