Subjectivity and Sentiment Analysis
Jan Wiebe
Department of Computer Science
CERATOPS: Center for the Extraction and Summarization of Events and Opinions in Text
University of Pittsburgh

What is Subjectivity?
• The linguistic expression of somebody's opinions, sentiments, emotions, evaluations, beliefs, and speculations (private states)
  – Wow, this is my 4th Olympus camera
  – Staley declared it to be "one wicked song"
  – Most voters believe he won't raise their taxes

One Motivation
• Automatic question answering…

Fact-Based Question Answering
Q: When is the first day of spring in 2007?
A: March 21
Q: Does the US have a tax treaty with Cuba?
A: Thus, the U.S. has no tax treaties with nations like Iraq and Cuba.

Opinion Question Answering
Q: What is the international reaction to the reelection of Robert Mugabe as President of Zimbabwe?
A: African observers generally approved of his victory while Western Governments strongly denounced it.

More Motivations
• Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike?
• Review classification: Is a review positive or negative toward the movie?
• Information extraction: Is "killing two birds with one stone" a terrorist event?
• Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down?
• Etcetera!

[Figure: annotated examples feeding four applications – prior polarity tags such as <ppol="neg">condemn</ppol> and <ppol="pos">great</ppol>; contextual polarity tags such as <cpol="pos">wicked</cpol> visuals and <cpol="neg">loudly condemned</cpol>; and subjectivity tags such as "The building has been <subjectivity="obj">condemned</subjectivity>" – used for QA, review mining, and opinion tracking. This figure recurs before each section of the talk with the current section highlighted.]

Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from unannotated texts
• Contextual polarity of sentiment expressions
• Word sense and subjectivity
• Conclusions and pointers to related work

The building has been
<subjectivity="obj">condemned</subjectivity> – an objective use of a word that elsewhere carries sentiment.

Corpus Annotation
• Wiebe, Wilson, Cardie 2005; Wilson & Wiebe 2005; Somasundaran, Wiebe, Hoffmann, Litman 2006; Wilson 2007

Outline for Section 1
• Overview
• Frame types
• Nested sources
• Extensions

Overview
• Fine-grained: expression level rather than sentence or document level
• Annotate:
  – expressions of opinions, sentiments, emotions, evaluations, speculations, …
  – material attributed to a source, but presented objectively

Overview
• Opinions, evaluations, emotions, and speculations are private states
• They are expressed in language by subjective expressions
• Private state: a state that is not open to objective observation or verification
  Quirk, Greenbaum, Leech, Svartvik (1985). A Comprehensive Grammar of the English Language.

Overview
• Focus on three ways private states are expressed in language:
  – Direct subjective expressions
  – Expressive subjective elements
  – Objective speech events

Direct Subjective Expressions
• Direct mentions of private states:
  The United States fears a spill-over from the anti-terrorist campaign.
• Private states expressed in speech events:
  "We foresaw electoral fraud but not daylight robbery," Tsvangirai said.

Expressive Subjective Elements (Banfield 1982)
• "We foresaw electoral fraud but not daylight robbery," Tsvangirai said.
• The part of the US human rights report about China is full of absurdities and fabrications.

Objective Speech Events
• Material attributed to a source, but presented as objective fact (without evaluation):
  The government, it added, has amended the Pakistan Citizenship Act 10 of 1951 to enable women of Pakistani descent to claim Pakistani nationality for their children born to foreign husbands.

Nested Sources
"The report is full of absurdities," Xirao-Nima said the next day.
• The writer is the outermost source; the quoted speech introduces the nested source (writer, Xirao-Nima), to which both the speech event and the expressive element are attributed.

Annotation frames for the sentence (a data-structure sketch follows below):
  Objective speech event
    anchor: the entire sentence
    source: <writer>
    implicit: true
  Direct subjective
    anchor: said
    source: <writer, Xirao-Nima>
    intensity: high
    expression intensity: neutral
    attitude type: negative
    target: report
  Expressive subjective element
    anchor: full of absurdities
    source: <writer, Xirao-Nima>
    intensity: high
    attitude type: negative
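The frame types above map naturally onto a small data structure. A minimal sketch in Python; the class and field names are hypothetical (the MPQA corpus itself distributes standoff annotations, not code):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PrivateStateFrame:
    """One MPQA-style annotation frame (hypothetical field names)."""
    frame_type: str          # "direct_subjective", "expressive_subjective",
                             # or "objective_speech_event"
    anchor: str              # text span the frame is attached to
    source: List[str]        # nested source, outermost first
    intensity: Optional[str] = None       # "low" | "medium" | "high"
    attitude_type: Optional[str] = None   # e.g., "negative"
    target: Optional[str] = None

# The three frames for:
#   "The report is full of absurdities," Xirao-Nima said the next day.
frames = [
    PrivateStateFrame("objective_speech_event", anchor="<entire sentence>",
                      source=["writer"]),
    PrivateStateFrame("direct_subjective", anchor="said",
                      source=["writer", "Xirao-Nima"], intensity="high",
                      attitude_type="negative", target="report"),
    PrivateStateFrame("expressive_subjective", anchor="full of absurdities",
                      source=["writer", "Xirao-Nima"], intensity="high",
                      attitude_type="negative"),
]
```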
A deeper nesting:
"The US fears a spill-over," said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.
• Sources nest three deep: the writer quotes Xirao-Nima, who reports a private state of the US – (writer, Xirao-Nima, US).

Annotation frames for the sentence:
  Objective speech event
    anchor: the entire sentence
    source: <writer>
    implicit: true
  Objective speech event
    anchor: said
    source: <writer, Xirao-Nima>
  Direct subjective
    anchor: fears
    source: <writer, Xirao-Nima, US>
    intensity: medium
    expression intensity: medium
    attitude type: negative
    target: spill-over

Corpus
• @ www.cs.pitt.edu/mpqa
• English-language versions of articles from the world press (187 news sources)
• Also includes contextual polarity annotations
• Themes of the annotation instructions:
  – No rules about how particular words should be annotated.
  – Don't take expressions out of context and think about what they could mean; judge them as they are used in that sentence.

Extensions (Wilson 2007)
I think people are happy because Chavez has fallen.
Annotation frames for the sentence:
  direct subjective
    span: think
    source: <writer, I>
    attitude:
      attitude span: think
      type: positive arguing
      intensity: medium
      target:
        target span: people are happy because Chavez has fallen
  direct subjective
    span: are happy
    source: <writer, I, people>
    attitude:
      attitude span: are happy
      type: positive sentiment
      intensity: medium
      target:
        target span: Chavez has fallen
    inferred attitude:
      attitude span: are happy because Chavez has fallen
      type: negative sentiment
      intensity: medium
      target:
        target span: Chavez

Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from unannotated texts
• Contextual polarity of sentiment expressions
• Word sense and subjectivity
• Conclusions and pointers to related work

Outline for Section 2
• Learning subjective nouns with extraction pattern bootstrapping
• Automatically generating training data with high-precision classifiers
• Learning subjective and objective expressions (not simply words or n-grams)
• Riloff, Wiebe, Wilson 2003; Riloff & Wiebe 2003; Wiebe & Riloff 2005; Riloff, Patwardhan, Wiebe 2006

Information Extraction
• Information extraction (IE) systems identify facts related to a domain of interest.
• Extraction patterns are lexico-syntactic expressions that identify the role of an object. For example:
  <subject> was killed
  assassinated <dobj>
  murder of <np>

Learning Subjective Nouns
• Goal: learn subjective nouns from unannotated text
• Method: apply IE-based bootstrapping algorithms that were designed to learn semantic categories
• Hypothesis: extraction patterns can identify subjective contexts that co-occur with subjective nouns
  Example: "expressed <dobj>" → concern, hope, support

Extraction Examples
  expressed <dobj>: condolences, hope, grief, views, worries
  indicative of <np>: compromise, desire, thinking
  inject <dobj>: vitality, hatred
  reaffirmed <dobj>: resolve, position, commitment
  voiced <dobj>: outrage, support, skepticism, opposition, gratitude, indignation
  show of <np>: support, strength, goodwill, solidarity
  <subj> was shared: anxiety, view, niceties, feeling

Meta-Bootstrapping (Riloff & Jones 1999)
• Loop over unannotated texts: select the best extraction pattern (e.g., expressed <DOBJ>), collect the nouns it extracts (e.g., happiness, relief, condolences), and add them to the growing seed set (e.g., hope, grief, joy, concern, worries). A sketch of the loop follows the results below.

Subjective Seed Words
cowardice, crap, delight, disdain, dismay, embarrassment, fool, gloom, grievance, happiness, hatred, hell, hypocrisy, love, nonsense, outrage, slander, sigh, twit, virtue

Subjective Noun Results
• Bootstrapping corpus: 950 unannotated news articles
• We ran both bootstrapping algorithms for several iterations
• We manually reviewed the words and labeled them as strong, weak, or not subjective
• 1052 subjective nouns were learned (454 strong, 598 weak), included in our subjectivity lexicon @ www.cs.pitt.edu/mpqa
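The bootstrapping loop above is easy to sketch. This is a simplified illustration, not Riloff & Jones's implementation: the string-template matcher stands in for a real syntactic pattern matcher, and new extractions are promoted alphabetically rather than by the reliability score a real system would use.

```python
import re

def extract_with_pattern(pattern, texts):
    """Toy matcher: for 'expressed <dobj>' capture the word after
    'expressed'.  A real system matches against a syntactic parse."""
    head = pattern.split(" <")[0]
    found = set()
    for t in texts:
        found.update(re.findall(rf"\b{re.escape(head)}\s+(\w+)", t.lower()))
    return found

def meta_bootstrap(texts, seeds, patterns, iterations=3, per_iter=5):
    """Simplified meta-bootstrapping (after Riloff & Jones 1999):
    pick the pattern whose extractions best overlap the current lexicon,
    then add a handful of its new extractions to the lexicon."""
    lexicon = set(seeds)
    for _ in range(iterations):
        best = max(patterns,
                   key=lambda p: len(extract_with_pattern(p, texts) & lexicon))
        new = extract_with_pattern(best, texts) - lexicon
        lexicon.update(sorted(new)[:per_iter])
    return lexicon
```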
Examples of Strong Subjective Nouns
anguish, antagonism, apologist, atrocities, barbarian, belligerence, bully, condemnation, denunciation, devil, diatribe, exaggeration, exploitation, evil, fallacies, genius, goodwill, humiliation, ill-treatment, injustice, innuendo, insinuation, liar, mockery, pariah, repudiation, revenge, rogue, sanctimonious, scum, smokescreen, sympathy, tyranny, venom

Examples of Weak Subjective Nouns
aberration, allusion, apprehensions, assault, beneficiary, benefit, blood, controversy, credence, distortion, drama, eternity, eyebrows, failures, inclination, intrigue, liability, likelihood, peaceful, persistent, plague, pressure, promise, rejection, resistant, risk, sincerity, slump, spirit, success, tolerance, trick, trust, unity

Outline for Section 2
• Learning subjective nouns with extraction pattern bootstrapping
• Automatically generating training data with high-precision classifiers
• Learning subjective and objective expressions

Training Data Creation
• unlabeled texts + subjective clues → rule-based subjective sentence classifier and rule-based objective sentence classifier → subjective & objective sentences

Subjective Clues
• Subjectivity lexicon @ www.cs.pitt.edu/mpqa
  – entries from manually developed resources (e.g., General Inquirer, FrameNet), plus
  – entries automatically identified (e.g., the nouns just described)

Creating High-Precision Rule-Based Classifiers
GOAL: use subjectivity clues from previous research to build a high-precision (low-recall) rule-based classifier
• A sentence is subjective if it contains 2 or more strong subjective clues
• A sentence is objective if:
  – it contains no strong subjective clues,
  – the previous and next sentences combined contain at most 1 strong subjective clue, and
  – the current, previous, and next sentences together contain at most 2 weak subjective clues
(A code sketch of these rules appears at the end of this subsection.)

Accuracy of Rule-Based Classifiers (measured against MPQA annotations)
  Subjective RBC:  recall 34.2,  precision 90.4,  F 49.6
  Objective RBC:   recall 30.7,  precision 82.4,  F 44.7

Generated Data
• We applied the rule-based classifiers to ~300,000 sentences from (unannotated) news articles
• ~53,000 were labeled subjective
• ~48,000 were labeled objective
• A training set of over 100,000 labeled sentences!
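The two rules translate directly into code. A minimal sketch, assuming sentences arrive as lists of lowercase tokens and the lexicon as two sets of single-word clues (the real OpinionFinder classifier also handles multi-word clues):

```python
def count_clues(tokens, clues):
    """Number of tokens in the sentence that are lexicon clues."""
    return sum(1 for tok in tokens if tok in clues)

def label_sentences(sentences, strong, weak):
    """High-precision rule-based labeling (after Wiebe & Riloff 2005).
    sentences: list of token lists.  Returns 'subj', 'obj', or None
    when neither rule fires (the sentence is left unlabeled)."""
    labels = []
    for i, sent in enumerate(sentences):
        prev = sentences[i - 1] if i > 0 else []
        nxt = sentences[i + 1] if i + 1 < len(sentences) else []
        if count_clues(sent, strong) >= 2:
            labels.append("subj")          # >= 2 strong clues
        elif (count_clues(sent, strong) == 0
              and count_clues(prev, strong) + count_clues(nxt, strong) <= 1
              and count_clues(prev + sent + nxt, weak) <= 2):
            labels.append("obj")           # all three objective tests pass
        else:
            labels.append(None)
    return labels
```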
Generated Data
• The generated data may serve as training data for supervised learning and as initial data for self-training (Wiebe & Riloff 2005)
• The rule-based classifiers are part of OpinionFinder @ www.cs.pitt.edu/mpqa
• Here, we use the data to learn new subjective expressions…

Outline for Section 2
• Learning subjective nouns with extraction pattern bootstrapping
• Automatically generating training data with high-precision classifiers
• Learning subjective and objective expressions

Representing Subjective Expressions with Extraction Patterns
• EPs can represent expressions that are not fixed word sequences:
  drove [NP] up the wall
    – drove him up the wall
    – drove George Bush up the wall
    – drove George Herbert Walker Bush up the wall
  step on [modifiers] toes
    – step on her toes
    – step on the mayor's toes
    – step on the newly elected mayor's toes
  gave [NP] a [modifiers] look
    – gave his annoying sister a really really mean look

The Extraction Pattern Learner
• Used AutoSlog-TS [Riloff 1996] to learn EPs
• AutoSlog-TS needs relevant and irrelevant texts as input
• Statistics are generated measuring each pattern's association with the relevant texts
• Here, the subjective sentences are the relevant texts, and the objective sentences are the irrelevant texts

Syntactic templates and example patterns:
  <subj> passive-vp             → <subj> was satisfied
  <subj> active-vp              → <subj> complained
  <subj> active-vp dobj         → <subj> dealt blow
  <subj> active-vp infinitive   → <subj> appears to be
  <subj> passive-vp infinitive  → <subj> was thought to be
  <subj> auxiliary dobj         → <subj> has position
  active-vp <dobj>              → endorsed <dobj>
  infinitive <dobj>             → to condemn <dobj>
  active-vp infinitive <dobj>   → get to know <dobj>
  passive-vp infinitive <dobj>  → was meant to show <dobj>
  subject auxiliary <dobj>      → fact is <dobj>
  passive-vp prep <np>          → was worried about <np>
  active-vp prep <np>           → agrees with <np>
  infinitive prep <np>          → to resort to <np>
  noun prep <np>                → opinion on <np>

AutoSlog-TS (Step 1)
[The World Trade Center], [an icon] of [New York City], was intentionally attacked very early on [September 11, 2001].
• The parser and the syntactic templates yield extraction patterns:
  <subj> was attacked
  icon of <np>
  was attacked on <np>

AutoSlog-TS (Step 2)
• Apply the patterns to the relevant and irrelevant texts and compute each pattern's frequency and probability:
  Extraction Pattern      Freq   Prob
  <subj> was attacked      100    .90
  icon of <np>               5    .20
  was attacked on <np>      80    .79

Identifying Subjective Patterns
• Subjective patterns: Freq > X and Probability > Y; ~6,400 learned on the 1st iteration (a code sketch of this filter follows below)
• Evaluation against the MPQA corpus:
  – direct evaluation of performance as simple classifiers
  – evaluation of classifiers using the patterns as additional features
  – Does the system learn new knowledge? Adding the patterns to the strong subjective set and re-running the rule-based classifier raises recall by 20 points while precision drops by 7 points on the 1st iteration.
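The Step-2 statistics and the Freq/Prob filter reduce to a few lines of counting. A sketch under the assumption that pattern matching has already been done; the thresholds stand in for the X and Y of the slide:

```python
from collections import Counter

def subjective_patterns(matches_subj, matches_obj, freq_min=5, prob_min=0.7):
    """Rank extraction patterns in the style of AutoSlog-TS (Riloff 1996).

    matches_subj / matches_obj: iterables of pattern ids, one entry per
    match in a subjective (relevant) / objective (irrelevant) sentence.
    Keeps patterns with Freq >= freq_min and P(subj | pattern) >= prob_min.
    """
    subj, obj = Counter(matches_subj), Counter(matches_obj)
    kept = {}
    for pattern in set(subj) | set(obj):
        freq = subj[pattern] + obj[pattern]
        prob = subj[pattern] / freq
        if freq >= freq_min and prob >= prob_min:
            kept[pattern] = (freq, round(prob, 2))
    return kept

# e.g. "<subj> was attacked" with 90 subjective and 10 objective matches
# gets Freq = 100, Prob = .90 and survives the filter.
```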
Patterns with Interesting Behavior
  PATTERN                  FREQ   P(Subj | Pattern)
  <subj> asked              128    .63
  <subj> was asked           11   1.0
  <subj> was expected        45    .42
  was expected from <np>      5   1.0
  <subj> put                187    .67
  <subj> put end             10    .90
  <subj> talk                28    .71
  talk of <np>               10    .90
  <subj> is talk              5   1.0
  <subj> is fact             38   1.0
  fact is <dobj>             12   1.0

Conclusions for Section 2
• Extraction pattern bootstrapping can learn subjective nouns (from unannotated data)
• Extraction patterns can represent richer subjective expressions
• Learning methods can discover subtle distinctions that are important for subjectivity (from unannotated data)
• Ongoing work:
  – a lexicon representation integrating different types of entries, enabling comparisons (e.g., subsumption relationships)
  – learning and bootstrapping processes applied to much larger unannotated corpora
  – processes applied to learning positive and negative subjective expressions

Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from unannotated texts
• Contextual polarity of sentiment expressions (briefly)
  – Wilson, Wiebe, Hoffmann 2005; Wilson 2007; Wilson & Wiebe forthcoming
• Word sense and subjectivity
• Conclusions and pointers to related work

Prior Polarity versus Contextual Polarity
• Most approaches use a lexicon of positive and negative words
  Prior polarity: out of context, positive or negative
    beautiful → positive
    horrid → negative
• A word may appear in a phrase that expresses a different polarity in context:
  "Cheers to Timothy Whitfield for the wonderfully horrid visuals." ← contextual polarity

Goal of This Work
• Automatically distinguish prior and contextual polarity

Approach
• Lexicon → all instances in the corpus → Step 1: Neutral or Polar? → polar instances → Step 2: Contextual Polarity? (a sketch of this pipeline appears below)
• Use machine learning and a variety of features
• Achieve significant results for a large subset of sentiment expressions

Manual Annotations
• Subjective expressions of the MPQA corpus annotated with contextual polarity

Annotation Scheme
• Mark the polarity of subjective expressions as positive, negative, both, or neutral:
  – African observers generally approved [positive] of his victory while Western governments denounced [negative] it.
  – Besides, politicians refer to good and evil [both] …
  – Jerome says the hospital feels [neutral] no different than a hospital in the states.
• Judge the contextual polarity of the sentiment ultimately being conveyed:
  They have not succeeded, and will never succeed, in breaking the will of this valiant people.
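Before turning to the features, here is the two-step approach above as a pipeline skeleton. The classifier functions are placeholders for the learned models and are assumptions of this sketch:

```python
def classify_instances(instances, neutral_or_polar, contextual_polarity):
    """Two-step contextual polarity (after Wilson, Wiebe, Hoffmann 2005).

    instances: clue instances found by lexicon lookup in the corpus.
    neutral_or_polar(inst) -> "neutral" | "polar"
    contextual_polarity(inst) -> "positive" | "negative" | "both"
    Both classifiers are passed in as functions; training them is
    outside this sketch.
    """
    results = {}
    for inst in instances:
        if neutral_or_polar(inst) == "neutral":
            results[inst] = "neutral"          # Step 1 filters neutral uses
        else:
            results[inst] = contextual_polarity(inst)  # Step 2 decides polarity
    return results
```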
Features
• Many are inspired by Polanyi & Zaenen (2004), Contextual Valence Shifters
  Example: little threat, little truth
• Others capture dependency relationships between words
  Example: wonderfully →(mod) horrid, yielding a positive phrase

Step 1 (Neutral or Polar?) uses five groups of features: word, modification, structure, sentence, and document features.

Word features
• Word token: terrifies
• Word part-of-speech: VB
• Context: that terrifies me
• Prior polarity: negative
• Reliability: strong subjective

Modification features (binary, read off the dependency parse tree, e.g. of "The human rights report poses a substantial challenge …")
• Preceded by an adjective, an adverb (other than not), or an intensifier
• Is itself an intensifier
• Modifies a strong clue or a weak clue
• Is modified by a strong clue or a weak clue

Structure features (binary, also from the dependency parse)
• In subject: [The human rights report] poses
• In copular: I am confident
• In passive voice: must be regarded

Sentence features
• Count of strong clues in the previous, current, and next sentence
• Count of weak clues in the previous, current, and next sentence
• Counts of various parts of speech

Document feature
• Document topic (15 topics), e.g., economics, health, the Kyoto protocol, the presidential election in Zimbabwe
  Example: The disease can be contracted if a person is bitten by a certain tick or if a person comes into contact with the blood of a congo fever sufferer.

Step 2 (Contextual Polarity?) features: word token, word prior polarity, negated, negated subject, modifies polarity, modified by polarity, conjunction polarity, and general/negative/positive polarity shifters.

• Word token: terrifies
• Word prior polarity: negative
• Negated (binary), for example:
  – not good
  – does not look very good
  – but not: not only good but amazing
• Negated subject:
  No politically prudent Israeli could support either of them.
• Modifies polarity (5 values: positive, negative, neutral, both, not mod):
  substantial (pos) modifies challenge (neg) → value for substantial: negative
• Modified by polarity (5 values: positive, negative, neutral, both, not mod):
  challenge (neg) is modified by substantial (pos) → value for challenge: positive
• Conjunction polarity (5 values: positive, negative, neutral, both, not mod):
  good (pos) and evil (neg) → value for good: negative
• General polarity shifter: have few risks/rewards
• Negative polarity shifter: lack of understanding
• Positive polarity shifter: abate the damage
(A code sketch of a few of these features follows the findings below.)

Findings
• Statistically significant improvements can be gained
• They require combining all feature types
• Ongoing work: richer lexicon entries; compositional contextual processing
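A few of the Step-2 features are straightforward to compute from a token window. This sketch covers only negation and the general polarity shifters, with small illustrative word lists, and deliberately ignores the parse-based features:

```python
NEGATIONS = {"not", "never", "no", "n't"}              # illustrative list
GENERAL_SHIFTERS = {"little", "few", "lack", "abate"}  # illustrative list

def polarity_features(tokens, i, prior_polarity):
    """Token-level features for the clue instance at position i
    (a small subset of the features in Wilson et al. 2005)."""
    window = tokens[max(0, i - 4):i]       # four words to the left
    return {
        "token": tokens[i],
        "prior_polarity": prior_polarity,  # looked up in the lexicon
        "negated": any(w in NEGATIONS for w in window),
        "general_shifter": any(w in GENERAL_SHIFTERS for w in window),
    }

# "does not look very good": 'good' has positive prior polarity, but the
# negated feature fires, signaling a possible flip.  (As the slide notes,
# "not only good but amazing" is why negation is a feature, not a rule.)
print(polarity_features("does not look very good".split(), 4, "positive"))
```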
S/O and Pos/Neg: both Important
• Several studies have found the two steps (subjective/objective, then positive/negative) beneficial
  – Yu & Hatzivassiloglou 2003; Pang & Lee 2004; Wilson et al. 2005; Kim & Hovy 2006
• S/O can be the more difficult step
  – Manual annotation of phrases: Takamura et al. 2006
  – Contextual polarity: Wilson et al. 2005
  – Sentiment tagging of words: Andreevskaia & Bergler 2006
  – Sentiment tagging of word senses: Esuli & Sebastiani 2006
  – Later: evidence that S/O is more significant for senses
• It is desirable for NLP systems to find a wide range of private states, including motivations, thoughts, and speculations, not just positive and negative sentiments

[Figure: sentiment (positive/negative/other) cross-cut with subjectivity (positive/negative/neutral/both) versus objective.]

Outline
• Subjectivity annotation scheme (MPQA)
• Learning subjective expressions from unannotated texts
• Contextual polarity of sentiment expressions
• Word sense and subjectivity
  – Wiebe & Mihalcea 2006
• Conclusions and pointers to related work

[Figure: the recurring applications figure, now with sense-level subjectivity tags: <subj="subj">condemn#1</subj>, <subj="obj">condemn#2</subj>, <subj="obj">condemn#3</subj>.]

Introduction
• Continuing interest in word sense
  – Sense-annotated resources are being developed for many languages (www.globalwordnet.org)
  – Active participation in evaluations such as SENSEVAL

Word Sense and Subjectivity
• Though both are concerned with text meaning, until recently they have been investigated independently

Subjectivity Labels on Senses
S: Alarm, dismay, consternation – (fear resulting from the awareness of danger)
O: Alarm, warning device, alarm system – (a device that signals the occurrence of some undesirable event)

S: Interest, involvement – (a sense of concern with and curiosity about someone or something; "an interest in music")
O: Interest – (a fixed charge for borrowing money; usually a percentage of the amount borrowed; "how much interest do you pay on your mortgage?")

WSD using Subjectivity Tagging
• A subjectivity classifier tags each sentence S or O; the tag helps the WSD system choose between the subjective and objective senses of the target word (a toy sketch follows below):
  He spins a riveting plot which grabs and holds the reader's interest. → S → sense 4 ("a sense of concern with and curiosity about someone or something")
  The notes do not pay interest. → O → sense 1 ("a fixed charge for borrowing money")

Subjectivity Tagging using WSD
• Conversely, the WSD system's sense choice can check the subjectivity classifier's uncertain tags:
  He spins a riveting plot which grabs and holds the reader's interest. → sense 4 (S) supports the S tag
  The notes do not pay interest. → sense 1 (O) supports the O tag
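Once senses carry S/O labels, the interaction above is a lookup in either direction. A toy illustration; the two sense labels for "interest" follow the slides, and everything else (names, sense numbering as dictionary keys) is hypothetical:

```python
SENSE_SUBJECTIVITY = {
    ("interest", 1): "O",   # "a fixed charge for borrowing money"
    ("interest", 4): "S",   # "a sense of concern with and curiosity about ..."
}

def filter_senses(word, senses, sentence_tag):
    """Keep the senses compatible with the sentence's S/O tag;
    fall back to all senses if none match."""
    compatible = [s for s in senses
                  if SENSE_SUBJECTIVITY.get((word, s)) == sentence_tag]
    return compatible or senses

# A subjectivity classifier tags "... holds the reader's interest" as S,
# which favors sense 4; "The notes do not pay interest" is tagged O,
# favoring sense 1.
print(filter_senses("interest", [1, 4], "S"))   # -> [4]
print(filter_senses("interest", [1, 4], "O"))   # -> [1]
```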
Refining WordNet
• Semantic richness
• Find inconsistencies and gaps
  – The verb assault has the sense: attack, round, assail, lash out, snipe, assault (attack in speech or writing) – "The editors of the left-leaning paper attacked the new House Speaker"
  – But there is no corresponding sense for the noun, as in "His verbal assault was vicious"

Goals
• Explore interactions between word sense and subjectivity
  – Can subjectivity labels be assigned to word senses? Manually? Automatically?
  – Can subjectivity analysis improve word sense disambiguation?
  – Can word sense disambiguation improve subjectivity analysis? (current work)

Outline for Section 4
• Motivation and goals
• Assigning subjectivity labels to word senses: manually, automatically
• Word sense disambiguation using automatic subjectivity analysis
• Conclusions

Annotation Scheme
• Assigning subjectivity labels to WordNet senses:
  – S: subjective
  – O: objective
  – B: both

Examples
S: Alarm, dismay, consternation – (fear resulting from the awareness of danger)
  – Fear, fearfulness, fright – (an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight))
O: Alarm, warning device, alarm system – (a device that signals the occurrence of some undesirable event)
  – Device – (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water")

Subjective Sense Definition
• When the sense is used in a text or conversation, we expect it to express subjectivity, and we expect the phrase/sentence containing it to be subjective.

Subjective Sense Examples
• His alarm grew
  Alarm, dismay, consternation – (fear resulting from the awareness of danger)
  – Fear, fearfulness, fright – (an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight))
• He was boiling with anger
  Seethe, boil – (be in an agitated emotional state; "The customer was seething with anger")
  – Be – (have the quality of being; (copula, used with an adjective or a predicate noun); "John is rich"; "This is not a good answer")
• What's the catch?
  Catch – (a hidden drawback; "it sounds good but what's the catch?")
  – Drawback – (the quality of being a hindrance; "he pointed out all the drawbacks to my plan")
• That doctor is a quack.
  Quack – (an untrained person who pretends to be a physician and who dispenses medical advice)
  – Doctor, doc, physician, MD, Dr., medico
Objective Sense Examples
• The alarm went off
  Alarm, warning device, alarm system – (a device that signals the occurrence of some undesirable event)
  – Device – (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water")
• The water boiled
  Boil – (come to the boiling point and change from a liquid to vapor; "Water boils at 100 degrees Celsius")
  – Change state, turn – (undergo a transformation or a change of position or action)
• He sold his catch at the market
  Catch, haul – (the quantity that was caught; "the catch was only 10 fish")
  – Indefinite quantity – (an estimated quantity)
• The duck's quack was loud and brief
  Quack – (the harsh sound of a duck)
  – Sound – (the sudden occurrence of an audible event)

Objective Senses: Observation
• We don't necessarily expect phrases/sentences containing objective senses to be objective:
  – Will someone shut that darn alarm off?
  – Can't you even boil water?
• These are subjective, but not due to alarm and boil

Objective Sense Definition
• When the sense is used in a text or conversation, we don't expect it to express subjectivity, and, if the phrase/sentence containing it is subjective, the subjectivity is due to something else.

Inter-Annotator Agreement Results
• Overall: Kappa = 0.74, percent agreement = 85.5%
• Without the 12.3% of cases where a judge is uncertain: Kappa = 0.90, percent agreement = 95.0%

S/O and Pos/Neg
• Hypothesis: moving from words to word senses is more important for S versus O than it is for positive versus negative
• Pilot study with the nouns of the SENSEVAL-3 English lexical sample task:
  – half have both subjective and objective senses
  – only 1 has both positive and negative subjective senses

Outline for Section 4
• Motivation and goals
• Assigning subjectivity labels to word senses: manually, automatically
• Word sense disambiguation using automatic subjectivity analysis
• Conclusions

Overview
• Main idea: assess the subjectivity of a word sense based on information about the subjectivity of a set of distributionally similar words in a corpus annotated with subjective expressions

Preliminaries
• Suppose the goal were to assess the subjectivity of a word w, given an annotated corpus
• We could consider how often w appears in subjective expressions
• Or, we could take into account more evidence: the subjectivity of a set of distributionally similar words
Lin's Distributional Similarity (Lin 1998)
[Figure: words are represented by their dependency triples (word, relation R, word); e.g., "I have a brown dog" yields triples linking have, brown, and dog. Two words are similar to the extent that they share (R, word) features.]

Subjectivity of word w
• Use an unannotated corpus (BNC) to find the distributionally similar words DSW = {dsw_1, …, dsw_j}, and the annotated corpus (MPQA) to check their instances:

  subj(w) = (#insts(DSW) in subjective expressions − #insts(DSW) not in subjective expressions) / #insts(DSW)

• subj(w) ranges over [−1, +1], from highly objective to highly subjective
• Example: with DSW = {dsw1, dsw2} and instances dsw1-inst1 (+1), dsw1-inst2 (−1), dsw2-inst1 (+1):
  subj(w) = (+1 − 1 + 1) / 3 = 1/3

Subjectivity of word sense w_i
• Rather than adding or subtracting 1 per instance, add or subtract sim(w_i, dsw_j):
  subj(w_i) = (+sim(w_i,dsw1) − sim(w_i,dsw1) + sim(w_i,dsw2)) / (2·sim(w_i,dsw1) + sim(w_i,dsw2))

Method – Step 1
• Given word w, find its distributionally similar words DSW = {dsw_j | j = 1 .. n}

Method – Step 2
• Find the similarity between each word sense and each distributionally similar word. For example, for w = alarm with DSW = {panic, detector}, compute sim(w1, panic) and sim(w1, detector) for sense w1 ("fear resulting from the awareness of danger"), and sim(w2, panic) and sim(w2, detector) for sense w2 ("a device that signals the occurrence of some undesirable event").
• Word-sense-to-word similarity normalizes over the senses of w and takes the maximally similar sense of the distributionally similar word:

  sim(ws_i, dsw_j) = wnss1(ws_i, dsw_j) / Σ_{i′ ∈ senses(w)} wnss1(ws_i′, dsw_j)

  where wnss1(ws_i, dsw_j) = max_{k ∈ senses(dsw_j)} wnss(ws_i, dsw_jk)

  and wnss can be any concept-based similarity measure between word senses; we use Jiang & Conrath 1997.

Method – Step 3
• Input: word sense w_i of word w; DSW = {dsw_j | j = 1..n}; sim(w_i, dsw_j); the MPQA opinion corpus
• Output: subjectivity score subj(w_i)

  totalsim = Σ_j #insts(dsw_j) * sim(w_i, dsw_j)
  evi_subj = 0
  for each dsw_j in DSW:
      for each instance k in insts(dsw_j):
          if k is in a subjective expression:
              evi_subj += sim(w_i, dsw_j)
          else:
              evi_subj -= sim(w_i, dsw_j)
  subj(w_i) = evi_subj / totalsim

(A runnable version of this procedure appears after the evaluation below.)

Evaluation
• Calculate subj scores for each word sense and sort them
• While 0 is a natural candidate for the division between S and O, we perform the evaluation at different thresholds in [−1, +1]
• Determine the precision of the algorithm at different points of recall

[Figure: precision versus recall curves for the baseline, "selected", and "all" configurations; number of distributionally similar words = 160.]
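A runnable version of the Step-3 pseudocode above, a minimal sketch: `insts`, `sim`, and `in_subj_expr` are assumed to be supplied from the BNC-derived similar-word instances, the Jiang & Conrath similarity, and the MPQA annotations, respectively.

```python
def sense_subjectivity(sense, dsws, insts, sim, in_subj_expr):
    """subj(w_i) in [-1, +1]: a similarity-weighted vote over the
    instances of w's distributionally similar words
    (after Wiebe & Mihalcea 2006).

    dsws: distributionally similar words of w
    insts(dsw): instances of dsw in the annotated corpus
    sim(sense, dsw): word-sense-to-word similarity
    in_subj_expr(instance): True if inside a subjective expression
    """
    totalsim = sum(len(insts(d)) * sim(sense, d) for d in dsws)
    evi_subj = 0.0
    for d in dsws:
        for k in insts(d):
            # each instance votes +sim (subjective) or -sim (objective)
            evi_subj += sim(sense, d) if in_subj_expr(k) else -sim(sense, d)
    return evi_subj / totalsim if totalsim else 0.0
```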
Outline for Section 4
• Motivation and goals
• Assigning subjectivity labels to word senses: manually, automatically
• Word sense disambiguation using automatic subjectivity analysis
• Conclusions

Overview
• Augment an existing data-driven WSD system with a feature reflecting the subjectivity of the context of the ambiguous word
• Compare the performance of the original and subjectivity-aware WSD systems
• Test data: the ambiguous nouns of the SENSEVAL-3 English lexical sample task

Original WSD System
• Integrates local and topical features:
  – Local: a context of three words to the left and right, and their parts of speech
  – Topical: the top five words occurring at least three times in the context of a word sense
  – [Ng & Lee 1996], [Mihalcea 2002]
• Naïve Bayes classifier [Lee & Ng 2003]

Subjectivity Classifier
• Rule-based automatic sentence classifier from Wiebe & Riloff 2005
• Harvests subjective and objective sentences it is certain about from unannotated data
• Part of OpinionFinder @ www.cs.pitt.edu/mpqa

Subjectivity Tagging for WSD
• Sentences of the SENSEVAL-3 data that contain a target noun (e.g., "interest", "atmosphere") are tagged S, O, or B by the subjectivity classifier; the tags are fed as input to the subjectivity-aware WSD system.

WSD using Subjectivity Tagging
• The original WSD system sees only the sentence; the subjectivity-aware system also sees the S/O/B tag, e.g. for "interest": sense 1 ("a fixed charge for borrowing money") versus sense 4 ("a sense of concern with and curiosity about someone or something")
• Hypothesis: instances of subjective senses are more likely to appear in subjective sentences, so sentence subjectivity is an informative feature for WSD of words with both subjective and objective senses

Words with S and O Senses (classifier accuracy)
  Word        Senses   Baseline   basic    + subj
  argument       5      49.4%     51.4%    54.1%
  atmosphere     6      65.4%     65.4%    66.7%
  difference     5      40.4%     54.4%    57.0%
  difficulty     4      17.4%     47.8%    52.2%
  image          7      36.5%     41.2%    43.2%
  interest       7      41.9%     67.7%    68.8%
  judgment       7      28.1%     40.6%    43.8%
  plan           3      81.0%     81.0%    81.0%
  sort           4      65.6%     66.7%    67.7%
  source         9      40.6%     40.6%    40.6%   (S sense not in the data)
  Average               46.6%     55.6%    57.5%
4.3% error reduction; significant (p < 0.05, paired t-test)

Words with Only O Senses (classifier accuracy; last column compares + subj to basic)
  Word          Senses   Baseline   basic    + subj
  arm              6      82.0%     85.0%    84.2%   <
  audience         4      67.0%     74.0%    74.0%   =
  bank            10      62.6%     62.6%    62.6%   =
  degree           7      60.9%     71.1%    71.1%   =
  disc             4      38.0%     65.6%    66.4%   >
  organization     7      64.3%     64.3%    64.3%   =
  paper            7      25.6%     49.6%    48.0%   <
  party            5      62.1%     62.9%    62.9%   =
  performance      5      26.4%     34.5%    34.5%   =
  shelter          5      44.9%     65.3%    65.3%   =
  Average                 53.3%     63.5%    63.3%
Overall 2.2% error reduction; significant (p < 0.1)

Conclusions for Section 4
• Can subjectivity labels be assigned to word senses?
  – Manually: good agreement (Kappa = 0.74); very good when uncertain cases are removed (Kappa = 0.90)
  – Automatically: the method substantially outperforms the baseline, showing the feasibility of assigning subjectivity labels at the fine-grained level of word senses
• Can subjectivity analysis improve word sense disambiguation?
  – The quality of a WSD system can be improved with subjectivity information
  – Performance improves mainly for words with both S and O senses (4.3% error reduction; significant, p < 0.05); it largely remains the same or degrades for words that don't have both
  – Once senses have been assigned subjectivity labels, a WSD system could consult them to decide whether to use the subjectivity feature

Pointers to Related Work
• Tutorial held at EUROLAN 2007, "Semantics, Opinion, and Sentiment in Text", Iaşi, Romania, August
• Slides, bibliographies, …: www.cs.pitt.edu/~wiebe/EUROLAN07

Conclusions
• Subjectivity is common in language
• Recognizing it is useful in many NLP tasks
• It comes in many forms and is often context-dependent
• A wide variety of features seem to be necessary for opinion and polarity recognition
• Subjectivity may be assigned to word senses, promising improved performance for both subjectivity analysis and WSD
  – Promising as well for multilingual subjectivity analysis (Mihalcea, Banea, Wiebe 2007)

Acknowledgements
• CERATOPS, Center for the Extraction and Summarization of Events and Opinions in Text
  – Pittsburgh: Paul Hoffmann, Josef Ruppenhofer, Swapna Somasundaran, Theresa Wilson
  – Cornell: Claire Cardie, Eric Breck, Yejin Choi, Ves Stoyanov
  – Utah: Ellen Riloff, Sidd Patwardhan, Bill Phillips
• UNT: Rada Mihalcea, Carmen Banea
• NLP@Pitt: Wendy Chapman, Rebecca Hwa, Diane Litman, …