Computational Extraction of Social and Interactional Meaning from Speech
Dan Jurafsky and Mari Ostendorf

Lecture 5: Agreement, Citation, Propositional Attitude
Mari Ostendorf

Agreement, Citation, Propositional Attitude
• Agreement vs. disagreement with propositions (and people)
• How to make friends & influence people…
• Tool for affiliation, indicator of influence
• Tool for distancing, indicator of factions or rifts in groups
• Important component of group problem solving

Speech Examples Revisited

A: This’s probably what the LDC uses. I mean they do a lot of transcription at the LDC.
B: OK.
A: I could ask my contacts at the LDC what it is they actually use.
B: Oh! Good idea, great idea.

A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
B: but he never stops talking about it.
A: but ok
B: Aren’t you supposed to y- I mean
A: well that’s a little- the Lord says
B: Does charity mean something if you’re constantly using it as a cudgel to beat your enemies over the- I’m better than you. I give money to charity.
A: Well look, now I…

Subgroups Example: Wikipedia Talk Page

By including the "Haditha Massacre" in the Human Rights Abuse section, we are effectively convicting the Marines that are currently on trial. I think we need to wait until the trial is over. – UnregisteredUser1

Disagree. All I see is the listing "Haditha killings (Under investigation)." Is the word Massacre used? If not, I believe it should be because this word fits every version of the story presented in the public, including Time, the US Marines, and the Iraqi Government. – RegisteredUser1

I agree with RegisteredUser1, this is about (current) history, not law. Just because something hasn't been decided by a court doesn't mean it didn't happen. It should be enough in the article to just mention that the marines charged/suspected of the massacre have not yet been convicted.
– RegisteredUser2

I disagree, you cannot call it a human rights violation if it’s not stated what happened there. Also your statement "have not yet been convicted" is kind of the thing we are attempting to avoid. Without guilt or a better understanding of the situation I think it’s premature to put it in the human rights violation section. – RegisteredUser3

Actually, as long as NPOV, WP:Verifiability are maintained you can call it a human rights violation even if it is untrue. As Wikipedia says "As counterintuitive as it may seem, the threshold for inclusion in Wikipedia is verifiability, not truth." Like it or not, as long as there are reputable sources calling it a massacre and/or a human rights violation then it can be included in the article. – RegisteredUser4

Calling it a human rights violation in itself is POV. I also do not think anyone would appreciate you attempting to manipulate wiki policy for the sake of adding POV into an article. – RegisteredUser3

Influencing Example

There is a guideline that we shouldn't semi-protect articles linked from front page, so as to allow new editors a chance to edit articles they are most likely to read. But in this case all we are doing is enabling a swarm of socks. Semi-protection is definitely needed in this instance, with an apology should a new, well-intentioned editor actually show up amidst the swarm and be prevented from editing. Semi-protect this sucker, or we'll never determine the appropriate course of action for this article. – RegUser2

Even though semi-protection is defidentally good for what is nominally "my" side … it's against policy and not appropriate. Please take it off. – RegUser3

Is is absolutely not against policy. Wikipedia:Protection policy is very clear: … For this article at this time, it's necessary. That's in perfect compliance with policy. – RegUser2

Removing the image without discussion is aggressively bad editing (which I am often guilty of). It's not vandalism. sprotect is only for vandalism.
– RegUser3

Repeated violations of 3RR and using sockpuppets, together with admitting that the purpose of removing the image is to curry favour with one's god and not to improve Wikipedia, doesn't so much cross the line from bad editing to vandalism as pole vault it. – RegUser4

Ok, my WP:AGF is falling. I still think sprotect is agressive, but not as badly as I did before. – RegUser3

Influenced participant: alignment change

Online Political Discussion Forum

Q: Gavin Newsom- I expected more from him when I supported him in the 2003 election. He showed himself as a family-man/Catholic, but he ended up being the exact oppisate, supporting abortion, and giving homosexuals marriage licenses. I love San Francisco, but I hate the people. Sometimes, the people make me want to move to Sacramento or DC to fix things up.

R: And what is wrong with giving homosexuals the right to settle down with the person they love? What is it to you if a few limp-wrists get married in San Francisco? Homosexuals are people, too, who take out their garbage, pay their taxes, go to work, take care of their dogs, and what they do in their bedroom is none of your business.

Citations (from Teufel et al., 2006)

Following Pereira et al. ‘93, we measure word similarity by the relative entropy or Kullback-Leibler (KL) distance, between the corresponding conditional distributions.

His [Hindle’s] notion of similarity seems to agree with our intuitions in many cases, but it is not clear how it can be used directly to construct word classes and corresponding models of association.

Overview
• Common threads
• Examples:
  • Agreements & disagreements in meetings
  • Agreements & disagreements in online discussions
  • Citation function
• More common threads
(Plus examples from unpublished UW studies on Wikipedia discussions.)

Common Threads
• Sentiment detection (sort of)
  • Discussions: agreement/disagreement/neutral
  • Citations: positive/negative/neutral (opt. contrast)
  • Most studies detect person/paper as target, not the proposition per se
• Challenges
  • Cultural bias & infrequent negatives
  • Bag of words is not enough
  • Identifying person/paper target of agreement (context can extend beyond the “sentiment” sentence)
• Computational modeling

Challenge: Cultural Bias
• English meetings: many more agreements than disagreements
• Mandarin wiki discussions: fewer explicit disagreements than in English
• Citations: several studies find that negative citations are rare (presumably because they are politically dangerous)
• People use positive words to soften the blow: “right but….”, “yeah” with negative intonation

Challenge: Polarity Words in BOW
• Need to account for negation
  • “agree” vs. “don’t agree”, “absolutely” vs. “absolutely not”
  • BUT fewer than half the positive words in negative turns are lexically negated
• Some part-of-speech issues, e.g. “well”
• People include positive words to soften the blow
  • Dissenting turns have more positive words than negative
  • “right” occurs 75 times in dissenting turns, 162 times in neutral turns & only 33 times in supporting turns

Polarity Word Trickiness (cont.)
• Positive negatives
  “yeah larry i i want to correct something randi said of course”
  “right but but you you can't say that punching him in the back of the head is justified”
• Negative positives
  “Steph- vent away – that sucks –”
  “no you stick with what you're doing”

Challenge: Identifying the Target
• Baseline: The target is the most recent speaker:
  • 67% accurate for Wiki discussions
  • 80% accurate for meetings
• Adding names doesn’t help much (70% accurate for Wiki discussions)
• Target can be more than one person
• In political discussion forum (Abbott et al.
11), 82% of posts with quotes have quotes that can be linked to previous post
• Citation information often not in the same sentence as the citation (Teufel et al. 06).

Chat: complication of asynchrony

PubCoord Acct Secty Secty PubCoord PubCoord ProjMgr Secty PubCoord Acct PubCoord PubCoord Acct Secty Acct ProjMgr Acct Secty Acct PubCoord ProjMgr Acct Secty Acct PubCoord
Are we agreed on about 60 for soda? yeah, only ourselves are set apart, I think They can't take a bottle. Okay, I agree on 60 for soda Vote agreed Yeah, agree How much does ice cost? 2.50 per pack how about 50, because project manager won't drink that much soda probably What is he a camel? and some folks won't drink any? lol no, some people dont like flavor, carbonation Shut up! Soda can be harsh or, OMG calories please stay on topic yeah, i don’t like the carbonation Alright, I've identified two of you I was just going to say that... me too! so was that $50 for ice? actually, I guess I know who everyone is then What? ?

Acct Secty PubCoord Secty PubCoord ProjMgr Acct
no, 50 for pop oh No, 50 for soda is fine I guess please vote between 50 or 60 I think maybe 10 for ice Yeah :/ and someone already volunteered their cooler?

PubCoord: Yessir
Secty: *please vote between 50 or 60 for soda
Secty: I vote 60
PubCoord: 60
ProjMgr: 50
Acct: i vote 50
ProjMgr: TIE!
PubCoord: then?
Secty: 50 it is
Acct: g d it
Acct: yeah, 55
Secty: okay, 55
Secty: so how much is left, accountant? ?

Computational Modeling -- Review
• Standard text classification problem
  • Extract feature vector, apply model, score classes
  • Choose class with best score
• Popular models
  • Naïve Bayes
  • Decision trees/forests vs.
boostexter/icsiboost
  • Maximum entropy
• New since Lec 5
  • SVMs
  • K-nearest neighbor (lazy learning or memory-based)
• Feature selection or regularization
• Evaluation: Classification accuracy or Macro F (mean of F measures)

Feature Extraction – Noise Issue
• Both speech and text have “noise” challenges
  • Speech: speech recognition errors (especially when there is overlapping speech)
  • Online discussions: typos and funny spellings
    defidentally good
    the exact oppisate
• Not a big issue for edited text (e.g. most articles that would have citations)

Challenge: Skewed Priors
• A large percentage of sentences are neutral, and standard training algorithms emphasize the frequent classes
• Some solutions:
  • Use development set to tune detection thresholds
  • Random sampling using biased priors and bagging (classifier combination)

Detecting (Dis)Agreements in Meetings

A: I could ask my contacts at the LDC what it is they actually use.
B: Oh! Good idea, great idea.

• Adjacency pair speaker detection (given B, find A)
  • Target detection for agreements & disagreements
  • Also includes question/answer, offer/acceptance, etc.
• Classify B as agreement/disagreement/other
  (Backchannels modeled separately, but included in “other” for scoring.)
Galley et al.
2004

Meeting Data
• ICSI Meeting corpus
  • 75 1-hour meetings, average of 6.5 participants/meeting
  • Hand transcribed, audio automatically time aligned
  • Hand labeled for adjacency pairs
  • 7 meetings pause-segmented into “spurts”
• Class distribution:
  • Agree: 12%
  • Disagree: 7%
  • Other: 81%

Adjacency Pair – Speaker Ranking
• Features (B given, A is candidate target)
  • Structural: +/- overlap, # of speakers/spurts between A & B, etc.
  • Duration: duration of overlap, duration of A, time between A & B, overlap with others, speaking rate
  • Lexical: word counts, counts of shared words, cue word indicators, name indicator, …
  • Dialog acts (oracle)
• Feature selection: incremental
• Classifier: Maximum entropy

Adjacency Pair Results
• Only small gain from oracle DA information: 91.3%

Agreement/Disagreement Classifier
• Features
  • Structural: previous/next spurt same/diff
  • Duration: spurt, silence & overlap duration, speech rate
  • Lexical: similar to adjacency pairs, plus polarity word counts
  • Label dependency: contextual tags (a speaker is likely to disagree with someone who disagrees with them)
• Classifier
  • Conditional Markov model (Max Entropy Markov Model)

Agreement/Disagreement Results

Detecting (Dis)Agreement in Online Discussions
• Task: label R in a Q-R (quote-response) pair as agreement/disagreement.
Abbott et al., 2011

ARGUE Data
• 110k forum posts (11k discussion threads, 2764 authors) from website 4forums.com
• Forums include: evolution, gun control, abortion, gay marriage, healthcare, death penalty, …
• Annotations by Mechanical Turkers with [-5,5] scale
  • Disagree-agree (Krippendorff’s α = 0.62)
  • Other annotations had α < 0.5: attack, fact/emotion, sarcasm, nice/nasty
• 8k “good” Q-R pairs annotated
  • Sample & use (-1,1) threshold gives 682 pairs for testing
• Class distribution: resampled to be balanced

(Dis)Agree Classifier
• Features
  • MetaPost: author info, time between posts, # other quotes
  • Unigram & Bigram counts, initial unigram/bigram/trigram
  • Repeated punctuation (collapsed to ??, !!, ?!)
  • LIWC measures
  • Parse dependencies <relation, w_i, w_j>, POS-polarity opinion dependencies
  • Tf-idf cosine distance to previous post
• Classifier: Naïve Bayes & JRip (WEKA toolkit)
  • Chi-squared feature selection, plus feature selection implicit in JRip (rule learner)

Sample (Dis)Agree Classifier

(Dis)Agree Classification Results
• JRip beats NB
• JRip Accuracy:
  • Local features: 68%
  • Other annotations: 81%
• Caveat: optimistic, since neutral cases are removed.

Classification of Citation Function
Teufel et al., 2006
Agreement, usage, compatibility (6) Weakness (4) Contrast neutral

Citation Study Data
• 26 articles w/ 548 citations
• Kappa = 0.72 for 12 categories
• Class distribution: >67% neutral + neutral contrast, 4% negative, 19% usage

Citation Classifier
• Features
  • Grammar of 1762 cue phrases, e.g. “as far as we are aware” from other work + 892 from this corpus
  • 185 POS patterns for recognizing agents (self-cites vs.
others) w/ 20 manually acquired verb clusters
  • Verb tense, voice, modality
  • Sentence location in paragraph & section
• Classifier: K-nearest neighbor (WEKA toolkit)

Citation Classification Results
• K=0.75 for humans for these categories

Collected Observations re Features
• Phrase patterns and location-based n-grams are useful
• Structural features are useful
  • Location of turn relative to other authors/speakers
  • Location of sentence in turn & document
• Broader context (beyond target sentence) is useful
  • Sequential patterns of disagreement
  • Emotion context
• Simple cosine similarity is not so useful
• Prosodic features are not being taken advantage of

More Challenges
• Explicit agreement & disagreement do not capture all the phenomena associated with alignment & distancing
  • Implicit (dis)agreement via stating an opposite opinion
    A: The video is still an allegation
    B: The video is hard evidence
  • … or a rhetorical question
    A: Such a topic is far more broad than the current article but should certainly contain a link back to this one.
    B: How is the [[Iraq invasion controversy]] suggestion more broad?
  • Support vs. attack
    Well, you have proven yoruself [sic] to be a man with no brain
    Steph- vent away – that sucks
• These phenomena are hard for human annotators to label consistently (exception: citation labels?)
• Different studies may group or distinguish them

Example Wikipedia Talk Page

The victims were teenagers, not children. Furthermore, the teenagers were throwing rocks and makeshift grenades at the soldiers. Second, the video is still an allegation. We should wait until the investigation is completed before putting it up. – RegisteredUser1

The video is hard evidence.
If this was 1945, you'd be telling us not to include any footage of the Nazi concentration camps until the Germans had concluded that they committed war crimes. As for your suggestions that those children *deserved* what happened because they allegedly throw rocks at soldiers carrying assault rifles, I find that as offensive as suggesting that America deserved the 9/11 attack because of its foreign policies. – AnonymousUser1

THEY WEREN'T CHILDREN! The article makes NO mention of children whatsoever. So before you all let your emotions run wild over this: a) they weren't children b) they had hand grenades. – RegisteredUser1

YES THEY WERE CHILDREN! Watch the video. The soldiers are clearly acting in hatred and blood-lust, not selfdefense. Defending them is like defending a child molester or serial murderer. The video SHOWS children being assaulted. – AnonymousUser2

A 14 year old is definitely a child. There's a reason we don't let 14 year-olds drink, vote, drive, "consent" to sex with adults, or sign legal agreements without a guardian. – RegisteredUser2

At 14 you are definitely a teenager, not a child. 14 year olds can throw a grenade and shoot a rifle, and know the consequences of their actions. Furthermore 18 isn't the age of majority in Iraq so far as I know. In much of the world the drinking and driving ages are 14 and 16. The world is not centered upon our American beliefs, and it's high time that we started accepting that in ALL situations, not just the ones we deem acceptable. I'm absolutely sickened by the brainwashed vehemence and anti-US hatred expressed by so many so called "liberals" on Wikipedia. – RegisteredUser1

In the English language the word adult is generally not used for people under the age of 18. If you want to use it differently you need to explain it in the article in order not to be misleading. Please calm down and do not personally attack others as "brainwashed" or spreading "hatred".
– RegisteredUser4

Summary
• Why look for (dis)agreement, support, etc?
  • Dissecting discussions for influence, subgroups, affiliation, successful problem solving, etc.
  • Understanding citation impact
• These tasks are closely related to sentiment detection, except that the target is often part of the problem
• Different ways of handling agreement vs. support
• The neutral class is huge – don’t ignore it
• Computational advice:
  • Many better alternatives to Naïve Bayes
  • Consider features beyond n-grams
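
Worked example: why the neutral class matters for evaluation

The sketch below illustrates the point about the large neutral class using the two evaluation measures named in the modeling review, classification accuracy and Macro F (the mean of the per-class F measures). It assumes a toy set of 100 spurts that simply mirrors the meeting class distribution quoted earlier (Agree 12%, Disagree 7%, Other 81%). The function names and the always-"other" baseline are illustrative assumptions and do not come from any of the cited systems.

```python
def per_class_f1(gold, pred):
    """Return {label: F1} for every label that appears in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label is never hit
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy data (assumption): 100 spurts in the proportions quoted for the meeting data.
gold = ["agree"] * 12 + ["disagree"] * 7 + ["other"] * 81

# Degenerate baseline (assumption): label every spurt as "other".
always_other = ["other"] * len(gold)

print("accuracy:", accuracy(gold, always_other))
print("macro F: ", round(macro_f(gold, always_other), 3))
```

On this toy data the always-"other" baseline reaches 0.81 accuracy, but its Macro F is only about 0.30, because the F measures for "agree" and "disagree" are both zero. This is one reason to report Macro F alongside accuracy when the priors are skewed.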