
Computational Extraction of Social and
Interactional Meaning from Speech
Dan Jurafsky and Mari Ostendorf
Lecture 5:
Agreement, Citation, Propositional Attitude
Mari Ostendorf
Agreement, Citation, Propositional Attitude
 Agreement vs. disagreement with propositions (and
people)
 How to make friends & influence people…
 Tool for affiliation, indicator of influence
 Tool for distancing, indicator of factions or rifts in groups
 Important component of group problem solving
Speech Examples Revisited
A: This’s probably what the LDC uses. I mean they do a lot of transcription at the LDC.
B: OK.
A: I could ask my contacts at the LDC what it is they actually use.
B: Oh! Good idea, great idea.
A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
B: but he never stops talking about it.
A: but ok
B: Aren’t you supposed to y- I mean
A: well that’s a little- the Lord says
B: Does charity mean something if you’re constantly using it as a cudgel to beat your enemies over the- I’m better than you. I give money to charity.
A: Well look, now I…
Subgroups Example: Wikipedia Talk Page
By including the "Haditha Massacre" in the Human Rights Abuse section, we are effectively convicting the
Marines that are currently on trial. I think we need to wait until the trial is over. – UnregisteredUser1
Disagree. All I see is the listing "Haditha killings (Under investigation)." Is the word Massacre used? If not,
I believe it should be because this word fits every version of the story presented in the public, including
Time, the US Marines, and the Iraqi Government. – RegisteredUser1
I agree with RegisteredUser1, this is about (current) history, not law. Just because something hasn't been
decided by a court doesn't mean it didn't happen. It should be enough in the article to just mention that
the marines charged/suspected of the massacre have not yet been convicted. –RegisteredUser2
I disagree, you cannot call it a human rights violation if it’s not stated what happened there. Also your
statement "have not yet been convicted" is kind of the thing we are attempting to avoid. Without guilt or
a better understanding of the situation I think it’s premature to put it in the human rights violation
section. – RegisteredUser3
Actually, as long as NPOV, WP:Verifiability are maintained you can call it a human rights violation even if
it is untrue. As Wikipedia says "As counterintuitive as it may seem, the threshold for inclusion in
Wikipedia is verifiability, not truth." Like it or not, as long as there are reputable sources calling it a
massacre and/or a human rights violation then it can be included in the article. —RegisteredUser4
Calling it a human rights violation in itself is POV. I also do not think anyone would appreciate you
attempting to manipulate wiki policy for the sake of adding POV into an article. – RegisteredUser3
Influencing Example
There is a guideline that we shouldn't semi-protect articles linked from front page, so as to
allow new editors a chance to edit articles they are most likely to read. But in this case all
we are doing is enabling a swarm of socks. Semi-protection is definitely needed in this
instance, with an apology should a new, well-intentioned editor actually show up amidst
the swarm and be prevented from editing. Semi-protect this sucker, or we'll never
determine the appropriate course of action for this article. RegUser2
Even though semi-protection is defidentally good for what is nominally "my" side …
it's against policy and not appropriate. Please take it off. RegUser3
Is is absolutely not against policy. Wikipedia:Protection policy is very clear: … For
this article at this time, it's necessary. That's in perfect compliance with policy. RegUser2
Removing the image without discussion is aggressively bad editing (which I am
often guilty of). It's not vandalism. sprotect is only for vandalism. RegUser3
Repeated violations of 3RR and using sockpuppets, together with admitting
that the purpose of removing the image is to curry favour with one's god and not to
improve Wikipedia, doesn't so much cross the line from bad editing to vandalism as pole
vault it. – RegUser4
Ok, my WP:AGF is falling. I still think sprotect is agressive, but not as badly
as I did before. RegUser3
Influenced participant: alignment change
Online Political Discussion Forum
Q: Gavin Newsom- I expected more from him when I supported him in
the 2003 election. He showed himself as a family-man/Catholic, but
he ended up being the exact oppisate, supporting abortion, and
giving homosexuals marriage licenses. I love San Francisco, but I
hate the people. Sometimes, the people make me want to move to
Sacramento or DC to fix things up.
R: And what is wrong with giving homosexuals the right to settle down
with the person they love? What is it to you if a few limp-wrists get
married in San Francisco? Homosexuals are people, too, who take
out their garbage, pay their taxes, go to work, take care of their
dogs, and what they do in their bedroom is none of your business.
Citations (from Teufel et al., 2006)
 Following Pereira et al. ‘93, we measure word
similarity by the relative entropy or Kullback-Leibler
(KL) distance, between the corresponding conditional
distributions.
 His [Hindle’s] notion of similarity seems to agree with
our intuitions in many cases, but it is not clear how it
can be used directly to construct word classes and
corresponding models of association.
Overview
 Common threads
 Examples:
 Agreements & disagreements in meetings
 Agreements & disagreements in online discussions
 Citation function
 More common threads
(Plus examples from unpublished UW studies on Wikipedia discussions.)
Overview
 Common threads
 Examples:
 Agreements & disagreements in meetings
 Agreements & disagreements in online discussions
 Citation function
 More common threads
Common Threads
 Sentiment detection (sort of)
 Discussions: agreement/disagreement/neutral
 Citations: positive/negative/neutral (opt. contrast)
 Most studies detect person/paper as target, not the
proposition per se
 Challenges
 Cultural bias & infrequent negatives
 Bag of words is not enough
 Identifying person/paper target of agreement (context
can extend beyond the “sentiment” sentence)
 Computational modeling
Challenge: Cultural Bias
 English meetings: many more agreements than
disagreements
 Mandarin wiki discussions: fewer explicit
disagreements than in English
 Citations: several studies find that negative citations
are rare (presumably because they are politically
dangerous)
 People use positive words to soften the blow:
 “right but….”, “yeah” with negative intonation
Challenge: Polarity Words in BOW
 Need to account for negation (see the sketch after this slide)
 “agree” vs. “don’t agree”, “absolutely” vs. “absolutely not”
 BUT fewer than half the positive words in negative turns are
lexically negated
 Some part-of-speech issues, e.g. “well”
 People include positive words to soften the blow
 dissenting turns have more positive words than negative
 “right” occurs 75 times in dissenting turns, 162 times in
neutral turns & only 33 times in supporting turns
Polarity Word Trickiness (cont.)
 Positive negatives
 “yeah larry i i want to correct something randi said of
course”
 “right but but you you can't say that punching him in the
back of the head is justified”
 Negative positives
 “Steph- vent away – that sucks –”
 “no you stick with what you're doing”
Challenge: Identifying the Target
 Baseline: The target is the most recent speaker (sketched after this slide):
 67% accurate for Wiki discussions
 80% accurate for meetings
 Adding names doesn’t help much (70% accurate for
Wiki discussions)
 Target can be more than one person
 In political discussion forum (Abbott et al. 11), 82% of
posts with quotes have quotes that can be linked to
previous post
 Citation information often not in the same sentence
as the citation (Teufel et al. 06).
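The most-recent-speaker baseline is trivial to implement; a sketch, with an invented turn representation, is:

```python
# Baseline: the target of turn i is the most recent earlier turn by a
# different speaker. The turn data structure here is illustrative only.
def baseline_target(turns, i):
    for j in range(i - 1, -1, -1):
        if turns[j]["speaker"] != turns[i]["speaker"]:
            return j
    return None  # no earlier turn by another speaker

turns = [{"speaker": "A", "text": "I could ask my contacts at the LDC."},
         {"speaker": "B", "text": "Oh! Good idea, great idea."}]
print(baseline_target(turns, 1))  # 0: B's agreement targets A's turn
```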
Chat: complication of asynchrony
PubCoord: Are we agreed on about 60 for soda?
Acct: yeah, only ourselves are set apart, I think
Secty: They can't take a bottle.
Secty: Okay, I agree on 60 for soda
PubCoord: Vote
PubCoord: agreed
ProjMgr: Yeah, agree
Secty: How much does ice cost?
PubCoord: 2.50 per pack
Acct: how about 50, because project manager won't drink that much soda probably
PubCoord: What is he a camel?
PubCoord: and some folks won't drink any?
Acct: lol
Secty: no, some people dont like flavor, carbonation
Acct: Shut up! Soda can be harsh
ProjMgr: or, OMG calories
Acct: please stay on topic
Secty: yeah, i don’t like the carbonation
Acct: Alright, I've identified two of you
PubCoord: I was just going to say that...
ProjMgr: me too!
Acct: so was that $50 for ice?
Secty: actually, I guess I know who everyone is then
Acct: What?
PubCoord: ?
Acct: no, 50 for pop
Secty: oh
PubCoord: No, 50 for soda is fine I guess
Secty: please vote between 50 or 60
PubCoord: I think maybe 10 for ice
ProjMgr: Yeah :/
Acct: and someone already volunteered their cooler?
PubCoord: Yessir
Secty: *please vote between 50 or 60 for soda
Secty: I vote 60
PubCoord: 60
ProjMgr: 50
Acct: i vote 50
ProjMgr: TIE!
PubCoord: then?
Secty: 50 it is
Acct: g d it
Acct: yeah, 55
Secty: okay, 55
Secty: so how much is left, accountant?
Computational Modeling -- Review
 Standard text classification problem
 Extract feature vector → apply model → score classes (sketched after this slide)
 Choose class with best score
 Popular models
 Naïve Bayes
 Decision trees/forests vs. boostexter/icsiboost
 Maximum entropy
 SVMs (new since Lec 5)
 K-nearest neighbor (lazy learning or memory-based) (new since Lec 5)
 Feature selection or regularization
 Evaluation:
 Classification accuracy or Macro F (mean of F measures)
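A minimal sketch of this pipeline using scikit-learn (an assumption on my part; the cited studies used other toolkits), with logistic regression standing in for maximum entropy and invented toy data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression  # = maximum entropy
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

train_texts = ["yeah good idea", "no i do not think so", "ok then", "absolutely not"]
train_labels = ["agree", "disagree", "other", "disagree"]

# Extract feature vector -> apply model -> score classes.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

test_texts, test_labels = ["good idea", "i do not think so"], ["agree", "disagree"]
pred = clf.predict(test_texts)
print(f1_score(test_labels, pred, average="macro"))  # macro F: mean of per-class F
```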
Feature Extraction – Noise Issue
 Both speech and text have “noise” challenges
 Speech: speech recognition errors (especially when
there is overlapping speech)
 Online discussions: typos and funny spellings
 defidentally good
 the exact oppisate
 Not a big issue for edited text (e.g. most articles that
would have citations)
Challenge: Skewed Priors
 Large percentage of sentences are neutral, and standard training algorithms emphasize the frequent classes
 Some solutions:
 Use development set to tune detection thresholds
 Random sampling using biased priors and bagging (classifier combination; sketched below)
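A sketch of the second remedy: train several classifiers, each on a class-balanced random subsample, and average their votes. The data shapes and the choice of Naïve Bayes are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def balanced_bagging(X, y, n_models=10, seed=0):
    """Fit n_models classifiers, each on a subsample with equal class counts."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()               # downsample to the rarest class
    models = []
    for _ in range(n_models):
        idx = np.concatenate([rng.choice(np.flatnonzero(y == c),
                                         n_per_class, replace=False)
                              for c in classes])
        models.append(MultinomialNB().fit(X[idx], y[idx]))
    return models

def predict_vote(models, X):
    """Average the class posteriors of the bagged models."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)
    return models[0].classes_[probs.argmax(axis=1)]

X = np.array([[2, 0], [0, 3], [1, 1], [0, 1], [3, 0], [2, 1]])  # toy counts
y = np.array(["neutral", "neutral", "neutral", "neutral", "agree", "disagree"])
print(predict_vote(balanced_bagging(X, y), X))
```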
Overview
 Common threads
 Examples:
 Agreements & disagreements in meetings
 Agreements & disagreements in online discussions
 Citation function
 More common threads
Detecting (Dis)Agreements in Meetings
A: I could ask my contacts at the LDC what it is they
actually use.
B: Oh! Good idea, great idea.
 Adjacency pair speaker detection (given B, find A)
 Target detection for agreements & disagreements
 Also includes question/answer, offer/acceptance, etc.
 Classify B as agreement/disagreement/other
(Backchannels are modeled separately, but included in “other” for scoring.)
Galley et al. 2004
Meeting Data
 ICSI Meeting corpus
 75 1-hour meetings, average of 6.5 participants/meeting
 Hand transcribed, audio automatically time aligned
 Hand labeled for adjacency pairs
 7 meetings pause-segmented into “spurts”
 Class distribution:
 Agree: 12%
 Disagree: 7%
 Other: 81%
Adjacency Pair – Speaker Ranking
 Features (B given, A is candidate target)
 Structural: +/- overlap, # of speakers/spurts between A & B, etc.
 Duration: duration of overlap, duration of A, time
between A & B, overlap with others, speaking rate
 Lexical: word counts, counts of shared words, cue word
indicators, name indicator, …
 Dialog acts (oracle)
 Feature selection: incremental
 Classifier: Maximum entropy
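One way to realize this ranking step is to train a binary maxent model on true vs. distractor (A, B) pairs and pick the top-scoring candidate. This is a sketch under assumptions: the feature names and toy data are invented, not Galley et al.'s exact setup.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression  # maxent

def pair_features(a, b):
    # Structural/duration/lexical features, loosely in the spirit of the list above.
    return {"overlaps": float(a["end"] > b["start"]),
            "gap_sec": max(0.0, b["start"] - a["end"]),
            "shared_words": len(set(a["words"]) & set(b["words"])),
            "spurts_between": b["idx"] - a["idx"] - 1}

# Toy data: a1 is the true antecedent of b, a2 a distractor.
a1 = {"idx": 0, "start": 0.0, "end": 2.0, "words": ["contacts", "ldc"]}
a2 = {"idx": 1, "start": 2.0, "end": 3.0, "words": ["okay"]}
b  = {"idx": 2, "start": 3.1, "end": 4.0, "words": ["good", "idea"]}

vec = DictVectorizer()
X = vec.fit_transform([pair_features(a1, b), pair_features(a2, b)])
model = LogisticRegression().fit(X, [1, 0])

# Rank candidates for B by P(true pair) and take the argmax.
scores = model.predict_proba(X)[:, 1]
print("predicted A:", [a1, a2][scores.argmax()]["idx"])
```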
Adjacency Pair Results
[Results table not shown.] Only a small gain from oracle DA information: 91.3%.
Agreement/Disagreement Classifier
 Features
 Structural: previous/next spurt same/diff
 Duration: spurt, silence & overlap duration, speech rate
 Lexical: similar to adjacency pairs, plus polarity word
counts
 Label dependency: contextual tags (a speaker is likely to
disagree with someone who disagrees with them)
 Classifier
 Conditional Markov model (Max Entropy Markov Model)
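A sketch of the MEMM idea: a maxent classifier whose features include the previous tag, decoded greedily left to right (Viterbi decoding is also possible). Features and data are invented placeholders, not Galley et al.'s exact setup.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def spurt_features(spurt, prev_tag):
    feats = {"prev=" + prev_tag: 1.0}      # label-dependency feature
    for w in spurt.split():
        feats["w=" + w] = 1.0              # toy lexical features
    return feats

train = [("yeah exactly", "agree"), ("no that is wrong", "disagree"),
         ("um okay", "other"), ("i agree", "agree")]
prev_tags = ["START"] + [t for _, t in train[:-1]]
vec = DictVectorizer()
X = vec.fit_transform(spurt_features(s, p) for (s, _), p in zip(train, prev_tags))
model = LogisticRegression(max_iter=1000).fit(X, [t for _, t in train])

def decode(spurts):
    tags, prev = [], "START"
    for s in spurts:                        # greedy decoding
        prev = model.predict(vec.transform([spurt_features(s, prev)]))[0]
        tags.append(prev)
    return tags

print(decode(["no that is wrong", "yeah exactly"]))
```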
Agreement/Disagreement Results
Overview
 Common threads
 Examples:
 Agreements & disagreements in meetings
 Agreements & disagreements in online discussions
 Citation function
 More common threads
Detecting (Dis)Agreement in Online Discussions
Task: label R in a Q-R (quote-response) pair as agreement/disagreement.
Abbott et al., 2011
ARGUE Data
 110k forum posts (11k discussion threads, 2764
authors) from website 4forums.com
 Forums include: evolution, gun control, abortion, gay
marriage, healthcare, death penalty, …
 Annotations by Mechanical Turkers on a [-5,5] scale
 Disagree-agree (Krippendorff’s α = 0.62)
 Other annotations had α < 0.5: attack, fact/emotion, sarcasm, nice/nasty
 8k “good” Q-R pairs annotated → sampling with a (-1,1) threshold gives 682 pairs for testing
 Class distribution: resampled to be balanced
(Dis)Agree Classifier
 Features
 MetaPost: author info, time between posts, # other quotes
 Unigram & Bigram counts, initial unigram/bigram/trigram
 Repeated punctuation (collapsed to ??, !!, ?!)
 LIWC measures
 Parse dependencies <relation, w_i, w_j>, POS-polarity opinion dependencies
 Tf-idf cosine distance to previous post
 Classifier: Naïve Bayes & JRip (WEKA toolkit)
 Chi-squared feature selection, plus feature selection
implicit in JRip (rule learner)
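A sketch of two of the surface features above plus chi-squared selection; the feature details are illustrative, not Abbott et al.'s code, and the toy texts are invented.

```python
import re
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2

def response_features(text):
    feats = {}
    for run in re.findall(r"[?!]{2,}", text):        # repeated punctuation
        key = "punct=?!" if "?" in run and "!" in run else "punct=" + run[:2]
        feats[key] = feats.get(key, 0) + 1
    toks = text.lower().split()
    for n in (1, 2, 3):                              # initial n-grams
        if len(toks) >= n:
            feats["init%d=%s" % (n, "_".join(toks[:n]))] = 1
    return feats

texts = ["really?? no way!!", "yes exactly right", "oh come on?!"]
labels = ["disagree", "agree", "disagree"]
X = DictVectorizer().fit_transform(response_features(t) for t in texts)
X_sel = SelectKBest(chi2, k=5).fit_transform(X, labels)  # keep top-5 features
```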
Sample (Dis)Agree Classifier
(Dis)Agree Classification Results
• JRip beats NB
• JRip accuracy: local features 68%; other annotations 81%
• Caveat: optimistic, since neutral cases are removed.
Overview
 Common threads
 Examples:
 Agreements & disagreements in meetings
 Agreements & disagreements in online discussions
 Citation function
 More common threads
Classification of Citation Function
Teufel et al., 2006
 Agreement, usage,
compatibility (6)
 Weakness
 Contrast (4)
 Neutral
Citation Study Data
 26 articles w/ 548 citations
 Kappa = 0.72 for 12 categories
 Class distribution: >67% neutral + neutral contrast, 4%
negative, 19% usage
Citation Classifier
 Features
 Grammar of 1762 cue phrases, e.g. “as far as we are
aware” from other work + 892 from this corpus
 185 POS patterns for recognizing agents (self-cites vs.
others) w/ 20 manually acquired verb clusters
 Verb tense, voice, modality
 Sentence location in paragraph & section
 Classifier: K-nearest neighbor (WEKA toolkit)
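A sketch of the cue-phrase feature idea, with a tiny invented phrase list standing in for the 1762-phrase grammar; counts like these would feed the K-NN classifier.

```python
# Toy cue-phrase lists (stand-ins; Teufel et al.'s grammar is far larger).
CUES = {"usage":    ["following", "we use", "based on"],
        "weakness": ["however", "fails to", "it is not clear"],
        "contrast": ["in contrast to", "unlike"]}

def cue_features(sentence):
    s = sentence.lower()
    return {cat: sum(phrase in s for phrase in phrases)
            for cat, phrases in CUES.items()}

print(cue_features("Following Pereira et al., we measure word similarity..."))
# {'usage': 1, 'weakness': 0, 'contrast': 0}
```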
Citation Classification Results
[Results table not shown.] Kappa = 0.75 for humans for these categories.
Overview
 Common threads
 Examples:
 Agreements & disagreements in meetings
 Agreements & disagreements in online discussions
 Citation function
 More common threads
Collected Observations re Features
 Phrase patterns and location-based n-grams are useful
 Structural features are useful
 Location of turn relative to other authors/speakers
 Location of sentence in turn & document
 Broader context (beyond target sentence) is useful
 Sequential patterns of disagreement
 Emotion context
 Simple cosine similarity is not so useful
 Prosodic features are not yet being taken advantage of
More Challenges
 Explicit agreement & disagreement do not capture all the
phenomena associated with alignment & distancing
 Implicit (dis)agreement via stating an opposite opinion
A: The video is still an allegation
B: The video is hard evidence
 … or a rhetorical question
A: Such a topic is far more broad than the current article but should certainly
contain a link back to this one.
B: How is the [[Iraq invasion controversy]] suggestion more broad?
 Support vs. attack
Well, you have proven yoruself [sic] to be a man with no brain
Steph- vent away – that sucks
 These phenomena are hard for human annotators to annotate consistently (exception: citation labels?)
 Different studies may group or distinguish them
Example Wikipedia Talk Page
The victims were teenagers, not children. Furthermore, the teenagers were throwing rocks and makeshift
grenades at the soldiers. Second, the video is still an allegation. We should wait until the investigation is completed
before putting it up. – RegisteredUser1
The video is hard evidence. If this was 1945, you'd be telling us not to include any footage of the Nazi
concentration camps until the Germans had concluded that they committed war crimes. As for your suggestions
that those children *deserved* what happened because they allegedly throw rocks at soldiers carrying assault
rifles, I find that as offensive as suggesting that America deserved the 9/11 attack because of its foreign policies. –
AnonymousUser1
THEY WEREN'T CHILDREN! The article makes NO mention of children whatsoever. So before you all let your
emotions run wild over this: a) they weren't children b) they had hand grenades. – RegisteredUser1
YES THEY WERE CHILDREN! Watch the video. The soldiers are clearly acting in hatred and blood-lust, not self-defense. Defending them is like defending a child molester or serial murderer. The video SHOWS children being assaulted. – AnonymousUser2
A 14 year old is definitely a child. There's a reason we don't let 14 year-olds drink, vote, drive, "consent" to sex
with adults, or sign legal agreements without a guardian. – RegisteredUser2
At 14 you are definitely a teenager, not a child. 14 year olds can throw a grenade and shoot a rifle, and know the
consequences of their actions. Furthermore 18 isn't the age of majority in Iraq so far as I know. In much of the
world the drinking and driving ages are 14 and 16. The world is not centered upon our American beliefs, and it's
high time that we started accepting that in ALL situations, not just the ones we deem acceptable. I'm absolutely
sickened by the brainwashed vehemence and anti-US hatred expressed by so many so called "liberals" on
Wikipedia. - RegisteredUser1
In the English language the word adult is generally not used for people under the age of 18. If you want to use it
differently you need to explain it in the article in order not to be misleading. Please calm down and do not
personally attack others as "brainwashed" or spreading "hatred". – RegisteredUser4
Summary
 Why look for (dis)agreement, support, etc?
 Dissecting discussions for influence, subgroups, affiliation, successful problem solving, etc.
 Understanding citation impact
 These tasks are very related to sentiment detection,
except that the target is often part of the problem
 Different ways of handling agreement vs. support
 The neutral class is huge – don’t ignore it
 Computational advice:
 Many better alternatives to Naïve Bayes
 Consider features beyond n-grams