09-18-12-sentiment

Sentiment and Opinion
Sep 18, 2012
Analysis of Social Media Seminar
William Cohen
First assignment: due Friday
• Go to http://malt/mw
• Create an account for yourself
– use andrew id
• Go to your user page
– Your real name & a link to your home page
– Preferably a picture
– Who you are and what you hope to get out of the class (Let me
know if you’re just auditing)
– Any special skills you have, research interests that you have,
related projects you have been or might be working on, etc.
Outline
• Announcements
• Recap
– With a little more on word senses
• More discussion: what exactly is
subjectivity, sentiment and polarity?
– Annotating a corpus for subjectivity
– Fine-grained sentiment for reviews
• More distinctions:
– Agreement and discourse
In our previous episode…
Motivations: sentiment common…
[Diagram labels: Analysis: modeling & learning; People; Communication, Language; Social Media; Networks]
…and important…
• Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike?
• Review classification: Is a review positive or negative toward the movie?
• Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down?
• Etc.
[These are all ways to summarize one sort of
content that is common on blogs, bboards,
newsgroups, etc. –W]
…and non-trivial
What units do we attach
sentiment to?
• Individual words (“nice”, “comfortable”)
• Phrases (“slow service”)
• Sentences?
• Documents?
• …?
Hatzivassiloglou & McKeown 1997
Build a graph of adjectives linked by the same or
different semantic orientation (determined by
conjunctions)…
[Figure: graph whose nodes are the adjectives scenic, nice, terrible, painful, handsome, fun, expensive, comfortable]
ICWSM 2008
Hatzivassiloglou & McKeown 1997
…and a clustering algorithm partitions the adjectives
into two subsets
[Figure: the adjectives slow, scenic, nice, terrible, handsome, painful, fun, expensive, comfortable split into two clusters, one marked “+”]
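A minimal sketch of the partitioning idea, assuming the links have already been extracted from conjunctions and are free of conflicts. The paper uses a clustering algorithm; this sketch swaps in a plain breadth-first two-coloring, and the example links are made up for illustration. It does not decide which of the two groups is the positive one.

```python
from collections import deque

def partition(links):
    """Split words into two groups (0 and 1).

    links: (word_a, word_b, same) triples; same=True means the two
    adjectives were linked as having the same orientation, False means
    different orientation. Hypothetical input format.
    """
    graph = {}
    for a, b, same in links:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))

    side = {}
    for start in graph:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            word = queue.popleft()
            for neighbor, same in graph[word]:
                if neighbor not in side:
                    # Same-orientation links keep the group, others flip it.
                    side[neighbor] = side[word] if same else 1 - side[word]
                    queue.append(neighbor)
                # Links that conflict with an earlier assignment are ignored here.
    return side

# Made-up links over the adjectives from the slide.
links = [
    ("nice", "comfortable", True),
    ("nice", "scenic", True),
    ("terrible", "painful", True),
    ("scenic", "expensive", False),
    ("expensive", "painful", True),
    ("fun", "terrible", False),
]
groups = partition(links)
```

Real conjunction data is noisy, which is why a clustering objective is a better fit than strict two-coloring.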
Word senses
Senses
Jan - ICWSM 2008
Senses
Is this polar?
Non-subjective senses of brilliant
1. Method for identifying brilliant material in
paint - US Patent 7035464
2. In a classic pasodoble, an opening section
in the minor mode features a brilliant
trumpet melody, while the second section
in the relative major begins with the violins.
Subjective Sense Examples
S
N
• His alarm grew
Alarm, dismay, consternation – (fear resulting from the
awareness of danger)
– Fear, fearfulness, fright – (an emotion experienced in anticipation
of some specific pain or danger (usually accompanied by a
desire to flee or fight))
• He was boiling with anger
Seethe, boil – (be in an agitated emotional state; “The
customer was seething with anger”)
S
N
– Be – (have the quality of being; (copula, used with an adjective
or a predicate noun); “John is rich”; “This is not a good
answer”)
Objective Sense Examples
• The alarm went off
Alarm, warning device, alarm system – (a device that
signals the occurrence of some undesirable event)
– Device – (an instrumentality invented for a particular purpose;
“the device is small enough to wear on your wrist”; “a device
intended to conserve water”)
• The water boiled
Boil – (come to the boiling point and change from a
liquid to vapor; “Water boils at 100 degrees Celsius”)
– Change state, turn – (undergo a transformation or a change of
position or action)
Objective Senses: Observation
• We don’t necessarily expect
phrases/sentences containing objective
senses to be objective
– Will someone shut that darn alarm off?
– Can’t you even boil water?
• Subjective, but not due to alarm and boil
Objective Sense Definition
• When the sense is used in a text or
conversation, we don’t expect it to
express subjectivity and, if the
phrase/sentence containing it is
subjective, the subjectivity is due to
something else.
Hatzivassiloglou & McKeown 1997
• Later/related work:
– LIWC, General Inquirer, other hand-built lexicons
– Turney & Littman, TOIS 2003: Similar performance with 100M
word corpus and PMI – higher accuracy if you allow
abstention on 25% of the “hard” cases.
– Kamps et al, LREC 04: Determine orientation by graph analysis
of WordNet (distance to “good”, “bad” in graph determined by
synonymy relation)
– SentiWordNet, Esuli and Sebastiani, LREC 06: Similar to Kamps
et al, also using a BOW classifier and WordNet glosses
(definitions).
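A rough sketch of the graph-distance idea from the Kamps et al bullet. It assumes a synonymy graph is already available; the tiny graph below is made up and is not WordNet. The score compares a word's shortest-path distance to “bad” and to “good”, normalized by the distance between the two reference words, which is one way to turn the two distances into a single number.

```python
from collections import deque

def build_graph(pairs):
    """Undirected synonymy graph from (word, word) pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distance(graph, start, goal):
    """Shortest path length by breadth-first search, or None if unreachable."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return seen[word]
        for neighbor in graph.get(word, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[word] + 1
                queue.append(neighbor)
    return None

def orientation(graph, word, pos="good", neg="bad"):
    """Positive score: closer to `pos`; negative score: closer to `neg`."""
    d_pos = distance(graph, word, pos)
    d_neg = distance(graph, word, neg)
    d_ref = distance(graph, pos, neg)
    if d_pos is None or d_neg is None or not d_ref:
        return None  # word is not connected to both reference words
    return (d_neg - d_pos) / d_ref

# Made-up synonymy links, for illustration only.
toy = build_graph([
    ("good", "nice"), ("good", "fair"), ("fair", "mediocre"),
    ("mediocre", "poor"), ("poor", "bad"), ("bad", "awful"),
])
```

With this toy graph, orientation(toy, "nice") is positive and orientation(toy, "awful") is negative.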
What units do we attach
sentiment to?
• Individual words (“nice”, “comfortable”)
• Phrases (“slow service”)
• Sentences?
• Documents?
• …?
Turney 2002
• Goal: classify reviews as “positive” or
“negative”.
– Epinions “[not] recommended” as given by authors.
• Method:
– Find (possibly) meaningful phrases from review (e.g.,
“bright display”, “inspiring lecture”, …),
• based on POS patterns, like ADJ NOUN
– Estimate “semantic orientation” of each candidate
phrase
• Based on pointwise mutual information: Altavista counts of
phrase’s cooccurrence with “excellent”, “poor”
– Assign overall orientation of review by averaging
orientation of the phrases in the review
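A minimal sketch of the scoring step, assuming the co-occurrence counts have already been collected. The semantic orientation of a phrase is PMI(phrase, “excellent”) minus PMI(phrase, “poor”), which reduces to a single log ratio of hit counts. All counts below are made up, and the small smoothing constant is an assumption added to avoid division by zero.

```python
import math

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smooth=0.01):
    """PMI(phrase, 'excellent') - PMI(phrase, 'poor') from raw hit counts.

    near_excellent / near_poor: co-occurrence counts of the phrase with
    each reference word. hits_excellent / hits_poor: counts of the
    reference words alone. `smooth` is an assumed smoothing constant.
    """
    return math.log2(((near_excellent + smooth) * hits_poor) /
                     ((near_poor + smooth) * hits_excellent))

def classify_review(phrase_counts, hits_excellent, hits_poor):
    """Average the orientation of the phrases found in one review."""
    scores = [semantic_orientation(ne, npoor, hits_excellent, hits_poor)
              for ne, npoor in phrase_counts.values()]
    average = sum(scores) / len(scores)
    label = "recommended" if average > 0 else "not recommended"
    return label, average

# Hypothetical counts: phrase -> (hits near "excellent", hits near "poor").
review = {"bright display": (80, 20), "inspiring lecture": (60, 30)}
label, average = classify_review(review, hits_excellent=1000, hits_poor=1000)
```

A phrase that co-occurs more with “excellent” than with “poor”, relative to how common the two reference words are, gets a positive score.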
Pang et al EMNLP 2002
Pang & Lee EMNLP 2004
Methods: 2002
• Movie review classification as pos/neg.
• Method one: count human-provided polar words
(sort of like Turney):
– E.g., “love, wonderful, best, great, superb, still,
beautiful” vs “bad, worst, stupid, waste, boring, ?, !”
gives 69% accuracy on 700+/700- movie reviews
• Method two: plain ol’ text classification
– E.g., Naïve Bayes bag of words: 78.7%; SVM-light “set of
words”: 82.9% was the best result
– Adding bigrams and/or POS tags doesn’t change
things much.
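Method one can be sketched in a few lines, using the two word lists from the slide. The tokenizer and the handling of ties are assumptions made for this illustration, not details taken from the paper.

```python
import re

POSITIVE = {"love", "wonderful", "best", "great", "superb", "still", "beautiful"}
NEGATIVE = {"bad", "worst", "stupid", "waste", "boring", "?", "!"}

def classify(review):
    """Label a review by counting hits from each polar word list."""
    # Keep lowercase words plus '?' and '!', which appear in the negative list.
    tokens = re.findall(r"[a-z]+|[?!]", review.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos == neg:
        return "tie"  # assumed behaviour when the counts are equal
    return "pos" if pos > neg else "neg"
```

For example, classify("A wonderful, beautiful film. I love it.") counts three positive hits and no negative ones.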
Pang & Lee EMNLP 2004
• Can you capture the discourse in the document?
– Expect longish runs of subjective text and longish
runs of objective text.
– Can you tell which is which?
• Idea:
– Classify sentences as subjective/objective, based on
two corpora: short biased reviews, and IMDB plot
summaries.
– Smooth classifications to promote longish
homogeneous sections.
– Classify polarity based on the K “most subjective”
sentences
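A sketch of the smoothing step only. The paper casts it as a minimum-cut problem; this example swaps in a simpler dynamic program over a chain, where each sentence pays a cost for disagreeing with its classifier score and each change of label between neighbouring sentences pays a fixed penalty. The probabilities and the penalty value are made up.

```python
def smooth_labels(subj_probs, switch_cost=0.5):
    """Return one label per sentence (1 = subjective, 0 = objective).

    Minimises: sum of per-sentence costs + switch_cost * (number of
    label changes between adjacent sentences). subj_probs are assumed
    per-sentence probabilities of being subjective.
    """
    if not subj_probs:
        return []
    # cost[i][label]: calling sentence i objective costs p, subjective costs 1 - p.
    cost = [(p, 1.0 - p) for p in subj_probs]
    best = [list(cost[0])]
    back = []
    for i in range(1, len(cost)):
        row, pointers = [], []
        for label in (0, 1):
            options = [best[-1][prev] + (switch_cost if prev != label else 0.0)
                       for prev in (0, 1)]
            prev = 0 if options[0] <= options[1] else 1
            row.append(options[prev] + cost[i][label])
            pointers.append(prev)
        best.append(row)
        back.append(pointers)
    # Trace the cheapest labelling backwards from the last sentence.
    label = 0 if best[-1][0] <= best[-1][1] else 1
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    return labels[::-1]

probs = [0.9, 0.8, 0.4, 0.9, 0.1, 0.2, 0.6, 0.1]
smoothed = smooth_labels(probs, switch_cost=0.5)
```

With a penalty of zero the result is plain thresholding at 0.5; raising the penalty produces longer homogeneous runs.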
What units do we attach
sentiment to?
• Individual words (“nice”, “comfortable”)
• Phrases (“slow service”)
• Sentences?
• Documents?
• …?
Outline
• Announcements
• Recap
– With a little more on word senses
• More discussion: what exactly is
subjectivity, sentiment and polarity?
– Annotating a corpus for subjectivity
– Fine-grained sentiment for reviews
• More distinctions:
– Agreement and discourse
Manual and Automatic
Subjectivity and Sentiment
Analysis
Jan Wiebe
Josef Ruppenhofer
Swapna Somasundaran
University of Pittsburgh
Everyone knows that dragons don't exist. But
while this simplistic formulation may satisfy the
layman, it does not suffice for the scientific mind.
The School of Higher Neantical Nillity is in fact
wholly unconcerned with what does exist. Indeed,
the banality of existence has been so amply
demonstrated, there is no need for us to discuss it
any further here. The brilliant Cerebron, attacking
the problem analytically, discovered three distinct
kinds of dragon: the mythical, the chimerical, and
the purely hypothetical. They were all, one might
say, nonexistent, but each nonexisted in an
entirely different way...
- Stanislaw Lem, “The Cyberiad”
Preliminaries
• What do we mean by subjectivity?
• The linguistic expression of somebody’s
emotions, sentiments, evaluations,
opinions, beliefs, speculations, etc.
– Wow, this is my 4th Olympus camera.
– Staley declared it to be “one hell of a
collection”.
– Most voters believe that he's not going to
raise their taxes
Corpus Annotation
Wiebe, Wilson, Cardie 2005
Annotating Expressions of Opinions and Emotions in
Language
Leaving aside what’s
possible, what sort of
inferences about
sentiment, opinion, etc
would we like to be able
to make?
Overview
• Fine-grained: expression-level rather than
sentence or document level
– The photo quality was the best that I have seen in a
camera.
• Annotate
– expressions of opinions, evaluations, emotions,
beliefs
– material attributed to a source, but presented
objectively
Overview
• Opinions, evaluations, emotions,
speculations are private states.
• They are expressed in language by
subjective expressions.
Private state: state that is not open to objective
observation or verification.
Quirk, Greenbaum, Leech, Svartvik (1985). A
Comprehensive Grammar of the English Language.
Overview
• Focus on three ways private states are
expressed in language
– Direct subjective expressions
– Expressive subjective elements
– Objective speech events
Direct Subjective Expressions
• Direct mentions of private states
The United States fears a spill-over from the
anti-terrorist campaign.
Fear is a private state
• Private states expressed in speech events
“I fear electoral fraud,” Tsvangirai said.
Fear is a private state
but not of the author
Direct Subjective Expressions
• Direct mentions of private states
The United States fears a spill-over from the
anti-terrorist campaign.
Fear is a private state
• Private states expressed in speech events
“We foresaw electoral fraud but not daylight
robbery,” Tsvangirai said.
This implies a private
state, so it’s not direct.
Expressive Subjective Elements
[Banfield 1982]
• “We foresaw electoral fraud but not daylight
robbery,” Tsvangirai said
Understood as implying certain mental state
• The part of the US human rights report about
China is full of absurdities and fabrications
Compare:
• “We foresaw difficulties with the electoral process but not to this
extent”, Tsvangirai said.
• The part of the US human rights report about China contains many
statements that we were unable to verify.
Objective Speech Events
• Material attributed to a source, but
presented as objective fact
The government, it added, has amended the Pakistan
Citizenship Act 10 of 1951 to enable women of Pakistani
descent to claim Pakistani nationality for their children
born to foreign husbands.
[What does this have to do with opinion? You
need it to sort out who has opinions about
what… -W]
An example…
Nested Sources
“The report is full of absurdities,” Xirao-Nima said the next day.
(Writer)
(Writer, Xirao-Nima)
“The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Direct subjective
anchor: said
source: <writer, Xirao-Nima>
intensity: high
expression intensity: neutral
attitude type: negative
target: report
Expressive subjective element
anchor: full of absurdities
source: <writer, Xirao-Nima>
intensity: high
attitude type: negative
Attributes:
The anchor is the linguistic expression—the stretch
of text—that tells us that there is a private state.
[Where to ‘hang’ the annotation -W]
The source is the person to whom the private state
is attributed. Note that this can be a chain of
people.
The target is the content of the private state or
what the private state is about.
Attitude type: If not specified, it is to be understood
as neutral but can be set to positive or negative as
required.
Intensity records the intensity of “the private state
as a whole.”
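One way to picture the annotation frames is as small records. The sketch below is an illustrative data structure, not the corpus's actual file format; the class and field names are assumptions that mirror the attributes listed above, and the values come from the Xirao-Nima example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Frame:
    """One annotation frame (hypothetical representation)."""
    kind: str                      # e.g. "direct subjective"
    anchor: str                    # stretch of text the frame hangs on
    source: Tuple[str, ...]        # nested source chain, outermost first
    intensity: Optional[str] = None
    expression_intensity: Optional[str] = None
    attitude_type: str = "neutral" # understood as neutral if not specified
    target: Optional[str] = None
    implicit: bool = False

def opinion_holder(frame):
    """The innermost source: the one the private state is attributed to."""
    return frame.source[-1]

frames = [
    Frame("objective speech event", "the entire sentence",
          ("writer",), implicit=True),
    Frame("direct subjective", "said", ("writer", "Xirao-Nima"),
          intensity="high", expression_intensity="neutral",
          attitude_type="negative", target="report"),
    Frame("expressive subjective element", "full of absurdities",
          ("writer", "Xirao-Nima"), intensity="high",
          attitude_type="negative"),
]
```

Storing the source as an ordered chain keeps the nesting explicit: the writer reports what Xirao-Nima said, so the writer always comes first.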
Another example…
“The US fears a spill-over”, said Xirao-Nima, a professor
of foreign affairs at the Central University for Nationalities.
(Writer)
(writer, Xirao-Nima)
(writer, Xirao-Nima, US)
“The US fears a spill-over”, said Xirao-Nima, a professor
of foreign affairs at the Central University for Nationalities.
Objective speech event
anchor: the entire sentence
source: <writer>
implicit: true
Objective speech event
anchor: said
source: <writer, Xirao-Nima>
Direct subjective
anchor: fears
source: <writer, Xirao-Nima, US>
intensity: medium
expression intensity: medium
…
Corpus
• www.cs.pitt.edu/mpqa/databaserelease (version 2)
• English language versions of articles from the world
press (187 news sources)
• Themes of the instructions:
– No rules about how particular words should be annotated.
– Don’t take expressions out of context and think about
what they could mean, but judge them as they are used in
that sentence.
• Kappa around 0.7 – 0.8.
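For reference, a small sketch of how an agreement figure like this can be computed. The slide does not say which variant of kappa was used; the code assumes Cohen's kappa for two annotators, and the labels are made up.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Undefined (division by zero) when chance agreement is already 1.
    return (observed - expected) / (1 - expected)

# Made-up labels: S = subjective, O = objective.
annotator_1 = ["S", "S", "O", "O"]
annotator_2 = ["S", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items, chance agreement is 0.5, and kappa works out to 0.5.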
Reasons for fine-grain annotation
and analysis
• Turney, Pang et al: document D is about a
known product P_D, sentiment refers to P_D. Life is
more complicated:
– “The part of the US human rights report about China is full
of absurdities and fabrications”:
• What is “absurd & fabricated”? The part, the US, the report,
or China?
• For sentiment about products we want to know
what is good or bad: there are usually tradeoffs
– Huge screen → very heavy
– Very fast → really expensive
Outline
• Announcements
• Recap
– With a little more on word senses
• More discussion: what exactly is
subjectivity, sentiment and polarity?
– Annotating a corpus for subjectivity
– Fine-grained sentiment for reviews
• More distinctions:
– Agreement and discourse
Hu & Liu 2004
Mining Opinion Features in Customer Reviews
Sample: one of many papers
• Here: explicit product features only, expressed
as nouns or compound nouns
• Use association rule mining technique rather
than symbolic or statistical approach to
terminology
• Extract associated items (item-sets) based on
support (>1%)
[I think this technique basically amounts to taking
frequent ngrams, after they do the pruning -W]
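In the spirit of the comment above, a sketch that treats feature mining as counting frequent candidates. It assumes noun and noun-phrase candidates have already been extracted per sentence, and it keeps those whose support exceeds the 1% threshold. This is a simplification of association rule mining, and the data is made up.

```python
from collections import Counter

def frequent_features(sentence_candidates, min_support=0.01):
    """Map each kept candidate to its support.

    sentence_candidates: one list of candidate nouns / noun phrases per
    sentence (assumed to come from an earlier POS-tagging step).
    Support = fraction of sentences that mention the candidate.
    """
    total = len(sentence_candidates)
    counts = Counter()
    for candidates in sentence_candidates:
        counts.update(set(candidates))  # count each candidate once per sentence
    return {feature: count / total
            for feature, count in counts.items()
            if count / total > min_support}

# Made-up data: 200 sentences, most without any candidate.
sentences = ([["picture quality"]] * 10 + [["battery", "zoom"]] * 4 +
             [["salesman"]] + [[]] * 185)
features = frequent_features(sentences)
```

“salesman” appears in only 1 of 200 sentences (0.5%), so it is dropped at this stage; infrequent features are picked up later via opinion words.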
Hu & Liu 2004
• Feature pruning
– compactness
• “I had searched for a digital camera for 3 months.”
• “This is the best digital camera on the market”
• “The camera does not have a digital zoom”
– redundancy/overlap
• manual ; manual mode; manual setting
• Feature expansion
– For sentences with opinion words and no
features, add NP closest to each opinion word
Hu & Liu 2004
• For sentences with a frequent feature,
extract the nearby adjective as the “effective
opinion” for that feature
• Based on opinion words, gather
infrequent features (N, NP nearest to an
opinion adjective)
– The salesman was easy going and let me try
all the models on display.
Hu & Liu 2004
• Semantic orientation of words
– Propagate labels for a set of 30 seeds through
WordNet using synonymy and antonymy
• Opinion sentences: opinion word + feature
• Semantic orientation of sentences
– Flip word polarity if there are nearby negations
– Go with the majority of opinion words
– Break ties with majority of words that are part of
“effective opinions” (i.e., the adjective closest
to a feature)
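The sentence-level rules can be sketched as follows. The lexicon, the negation list and the three-token negation window are all assumptions made for illustration; the slide says only that polarity flips on nearby negations.

```python
NEGATIONS = {"not", "no", "never", "n't"}  # assumed negation cues

def sentence_orientation(tokens, lexicon, effective=(), window=3):
    """Majority vote over opinion words, with negation flips.

    lexicon: word -> +1 or -1 (hypothetical; the paper grows it from
    seed words through WordNet). effective: opinion words that are the
    closest adjective to a feature, used only to break ties.
    """
    def signed(index):
        polarity = lexicon[tokens[index]]
        preceding = tokens[max(0, index - window):index]
        negated = any(token in NEGATIONS for token in preceding)
        return -polarity if negated else polarity

    opinion_positions = [i for i, token in enumerate(tokens) if token in lexicon]
    total = sum(signed(i) for i in opinion_positions)
    if total == 0:
        # Tie: fall back to the effective opinions only.
        total = sum(signed(i) for i in opinion_positions
                    if tokens[i] in effective)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

lexicon = {"good": 1, "great": 1, "poor": -1, "heavy": -1}
```

For instance, in “the pictures are not good” the word “good” is flipped by the preceding “not”, so the sentence comes out negative.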
Hu & Liu 2004
• Summary:
– Feature identification: 72-80% recall/precision
• on 500 reviews from five domains.
– Opinion sentence extraction (opinion word +
feature): 60-80% recall/precision
– Sentence-level orientation accuracy: 73-95%
Comment: 80% on each step does not mean you’re
done… -W
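A quick worked number behind the comment above, under the simplifying assumptions that the three steps run as a pipeline and that their errors are independent. The reported figures mix recall, precision and accuracy, so this is only a rough illustration.

```python
from math import prod

def end_to_end_rate(step_rates):
    """Chance that every step succeeds, assuming independent errors."""
    return prod(step_rates)

# Three chained steps at 80% each leave roughly 51% end to end.
rate = end_to_end_rate([0.8, 0.8, 0.8])
```

So a pipeline of three 80% steps gets an item right only about half the time.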
Outline
• Announcements
• Recap
– With a little more on word senses
• More discussion: what exactly is
subjectivity, sentiment and polarity?
– Annotating a corpus for subjectivity
– Fine-grained sentiment for reviews
• More distinctions:
– Agreement and discourse
Everyone knows that dragons don't exist. But...
- Stanislaw Lem, “The Cyberiad”
(General) Subjectivity Types
[Wilson 2008]
Other (including cognitive)
Note: similar ideas:
polarity, semantic orientation, sentiment
PDTB
[In that suit, the SEC accused Mr. Antar of
engaging in a "massive financial fraud" to
overstate the earnings of Crazy Eddie, Edison,
N.J., over a three-year period. ARG1]
IMPLICIT_CONTRAST [ Through his lawyers,
Mr. Antar has denied allegations in the SEC suit
and in civil suits previously filed by shareholders
against Mr. Antar and others. ARG2]
Contrast between the SEC accusing Mr. Antar of
something, and his denying the accusation
Subjectivity
In that suit, the SEC [[accused SENTIMENT-NEG] Mr.
Antar of engaging in a "massive financial fraud" to
overstate the earnings of Crazy Eddie, Edison, N.J.
ARGUING-POS], over a three-year period.
Through his lawyers, Mr. Antar [has denied
AGREE-NEG] allegations in the SEC suit and in
civil suits previously filed by shareholders
against Mr. Antar and others.
Two attitudes combined into one large
disagreement between two parties
Subjectivity
In that suit, the SEC [[accused SENTIMENT-NEG] Mr.
Antar of engaging in a "massive financial fraud" to
overstate the earnings of Crazy Eddie, Edison, N.J.
ARGUING-POS], over a three-year period.
Through his lawyers, Mr. Antar [has denied AGREE-NEG]
allegations in the SEC suit and in civil suits
previously filed by shareholders against Mr. Antar and
others.
Subjectivity: arguing-pos and agree-neg with
different sources; Hypothesis: common with
contrast. Help recognize the implicit contrast.
George Orwell
Where do we look for…?
Sentiment/Subjectivity
• Individual words (“nice”,
“comfortable”)
• Phrases (“slow service”)
• Sentences
• Documents
• Genres
– Rotten Tomatoes vs IMDB
plot summaries
Coherence
• Between words
– Co-occurrence
– Relations in WordNet
• Between sentences
– Proximity
– Discourse structure
• Between documents
– Hyperlinks, references to
entities
– Agreement/disagreement