Computational Extraction of Social and
Interactional Meaning from Speech
Dan Jurafsky and Mari Ostendorf
Lecture 5: Register & Genre
Mari Ostendorf
Register and Genre
 Variation in language associated with the social
context/situation
 Formal vs. casual, status and familiarity
 Spoken vs. written
 Audience size
 Broadcast vs. private (performance vs. personal?)
 Reading level or audience age
 Rhetorical form or purpose
 Reporting, editorial, review, entertainment
Example:
In which case is the speaker assuming that a human
(vs. a computer) will be listening to them?
Yeah. Yeah. I’ve noticed that that that’s one of the first things I do when
I go home is I either turn on the TV or the radio. It’s really weird.
I want to go from Denver to Seattle on January 15.
Example:
Raw transcript:
o- ohio state’s pretty big isn’t it yeah yeah I mean oh it’s you know we’re about to do like the the uh fiesta bowl there oh yeah
Cleaned-up rendition (same structure and content; the register/genre cues differ):
A: Ohio State’s pretty big, isn’t it?
B: Yeah. We’re about to do the Fiesta Bowl there.
A: Oh, yeah.
More Speech Examples
A: This’s probably what the LDC uses. I mean they do a lot of transcription at the LDC.
B: OK.
A: I could ask my contacts at the LDC what it is they actually use.
B: Oh! Good idea, great idea.
A: Ok, so what do you think?
B: Well that’s a pretty loaded topic.
A: Absolutely.
B: Well, here in – Hang on just a minute, the dog is barking -- Ok, here in Oklahoma, we
just went through a major educational reform…
A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
B: but he never stops talking about it.
A: but ok
B: Aren’t you supposed to y- I mean
A: well that’s a little- the Lord says
B: Does charity mean something if you’re constantly using it as a cudgel to beat your
enemies over the- I’m better than you. I give money to charity.
A: Well look, now I…
What can you tell about these people?
Text Examples
 WSJ treebank
Fujitsu Ltd.'s top executive took the unusual step of publicly apologizing for his
company's making bids of just one yen for several local government projects, while
computer rival NEC Corp. made a written apology for indulging in the same practice.
 Amazon book review
By tradition, I have to say SPOILERS ALERT here. By opinion, I have to say the book is
spoiled already. I don't think I've ever seen a worse case of missed opportunity than
Breaking Dawn.
 Weblogs
I realise I'm probably about the last person in the blogosphere to post on this stuff, but I was away so have some catching up to do. When I read John Reid's knuckleheaded pronouncements on Thursday, my first thought was that the one who "just doesn't get it" is Reid.
 Newsgroups
Many disagreements almost charge the unique committee. How will we neglect after
Rahavan merges the hon tour's match? Some entrances project, cast, and terminate.
Others wickedly set. Somebody going perhaps, unless Edwina strokes flowers without
Mohammar's killer.
Text or Speech?
Interestingly, four Republicans, including the Senate Majority Leader, joined all the
Democrats on the losing end of a 17-12 vote. Welcome to the coven of secularists and
atheists. Not that this has anything to do with religion, as the upright Senator ___ swears,
… I guess no one told this neanderthal that lying is considered a sin by most religions.
Brian: we should get a RC person if we can
Sidney: ah, nice connection
Sidney: Mary ___
Brian: she's coming to our mini-conference too
Sidney: great. we can cite her :)
Brian: copiously
Alan: ha
…
Sidney: Brian, check out ch 2 because it relates directly to some of the task
modeling issues we're discussing. the lit review can be leveraged
Word Usage in Different Genres
UW ‘03:
                 mtgs   swbd   email   papers   rand. web   conv. web
nouns             17     13     29      31        32          19
pronouns          10     14      3       2         2          13
adjectives         6      4     11      10        10           9
uh                 3      3      0       0         0          .04

Biber ‘93:
                                       conversations   press reports
relative clauses                            2.9             4.6
causative adverbial subord. clauses         3.5              .5
that complement clauses                     4.1             3.4
Why should you care about genre?
 Document retrieval:
 Genre specification improves IR
 Junk filtering
 Automatic detection of register characteristics
provides cues to social context
 Social role, group affinity, etc.
 Training computational models for ASR, NLP or text
classification: word usage varies as a function of genre
 Impacts utility of different data sources in training,
strategies for mixing data
 Impacts strategy for domain transfer
Overview
 Dimensions of register and genre
 Genre classification
 Computational considerations
 Examples
 Cues to social context: accommodation examples
 Impact on NLP: system engineering examples
Biber’s 5 Main Dimensions of Register
 Informational vs Involved Production
 Narrative vs Nonnarrative Concerns
 Elaborated vs Situation-Dependent Reference
 Overt Expression of Persuasion
 Abstract vs Non-abstract Style
[Figure, from Biber, 1993, Computational Linguistics: genres plotted along Dimension 1 (informational vs. involved) and Dimension 3 (elaborated vs. situated). News reportage, broadcasts, academic prose, and news editorials fall toward the informational/elaborated end; fiction, professional letters, personal letters, spontaneous speeches, and conversations fall toward the involved/situated end.]
Examples of Other Dimensions
 Narrative vs Nonnarrative
  Fiction vs. exposition, professional letters, telephone conversations
 Overt argumentation and persuasion
  Editorials vs. news reports
 Abstract vs Nonabstract Style
  Academic prose vs. conversations, public speeches, fiction
Register as Formality
 Brown & Levinson (1987) model of politeness
  Factors that influence communication techniques:
   Symmetric social distance between participants
   Asymmetric power/status difference between participants
   Weight of an imposition
   Size of audience
 Petersen et al. (2010) mapping for Enron email
  Symmetric social distance → person vs. business, frequency of social contact
  Asymmetric power/status → rank difference (CEO > pres > VP > director …)
  Weight of an imposition → automatic request classifier
Overview
 Dimensions of register and genre
 Genre classification
 Computational considerations
 Examples
 Cues to social context: accommodation examples
 Impact on NLP: system engineering examples
Features for Genre Classification
 Layout (generally used for web pages; we won’t consider these)
  Inclusion of graphics, links, etc.
  Line spacing, tabulation, …
 Features of text or transcriptions
 Lexical
  Structural
 Acoustic features (if speech)
NOTE: Typically you need to normalize for doc length.
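To make this note concrete, here is a minimal sketch (the word lists and function names are our own, purely illustrative) of turning raw word-class counts into length-normalized rates:

```python
from collections import Counter

# Hypothetical word-class lists; a real system would use POS tags, LIWC, etc.
FILLERS = {"uh", "um", "like"}
PRONOUNS = {"i", "you", "he", "she", "we", "they", "it"}

def genre_features(tokens):
    """Word-class rates per 100 tokens, normalized for document length."""
    counts = Counter(t.lower() for t in tokens)
    n = max(len(tokens), 1)  # guard against empty documents
    return {
        "filler_rate": 100.0 * sum(counts[w] for w in FILLERS) / n,
        "pronoun_rate": 100.0 * sum(counts[w] for w in PRONOUNS) / n,
        "avg_word_len": sum(len(t) for t in tokens) / n,
    }

print(genre_features("yeah i mean uh it s you know pretty big".split()))
```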
Features for Genre Classification
 Features of text or transcriptions
 Words, n-grams, phrases
 Word classes (POS, LIWC, slang, fillers, …)
 Punctuation, emoticons, case
 Sentence complexity, verb tense
 Disfluencies
 Acoustic features
 Speaker turn-taking
 Speaking rate
Feature Selection Methods
 Information filtering (information theoretic)
 Max MI between word & class label
 Max information gain (MI of word indicator & class label)
 Max KL distance: D[p(c|w) || p(c)]
   where D(p || q) = Σ_i p(i) log[p(i)/q(i)]
 Decision tree learning
 Regularization in learning (see later slide)
MI = mutual information (see lecture 1)
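A toy sketch of the KL criterion above, scoring each word by D[p(c|w) || p(c)] from empirical counts (the smoothing constant and all names are our own):

```python
import math
from collections import Counter, defaultdict

def kl_feature_scores(docs, labels, alpha=0.5):
    """Score each word w by D[p(c|w) || p(c)]; higher = more class-indicative.
    docs: list of token lists; labels: parallel list of class labels."""
    classes = sorted(set(labels))
    p_c = {c: n / len(labels) for c, n in Counter(labels).items()}
    word_class = defaultdict(Counter)          # word -> per-class doc counts
    for toks, c in zip(docs, labels):
        for w in set(toks):                    # document frequency
            word_class[w][c] += 1
    scores = {}
    for w, cc in word_class.items():
        total = sum(cc.values()) + alpha * len(classes)   # smoothed p(c|w)
        scores[w] = sum((cc[c] + alpha) / total
                        * math.log(((cc[c] + alpha) / total) / p_c[c])
                        for c in classes)
    return scores

docs = [["yeah", "uh", "i", "mean"], ["shares", "fell", "today"],
        ["um", "you", "know"], ["the", "committee", "approved"]]
labels = ["conv", "news", "conv", "news"]
print(sorted(kl_feature_scores(docs, labels).items(), key=lambda x: -x[1])[:3])
```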
Popular Classifiers
 Naïve Bayes (see lecture 1)
 Assumes features are independent (e.g. bag of words)
 Different variations in Rainbow toolkit for weighting
word features, feature selection, smoothing
 Decision tree (in Mallet)
 Greedy rule learner, good for mixed continuous &
discrete features (can be high variance)
 Implicit feature selection in learning
 Adaboost (ICSIboost = version that’s good for text)
 Weighted combination of little trees, progressively
trained to minimize errors from previous iterations
Popular Classifiers (cont.)
 Maximum entropy (in Mallet)
 Loglinear model: exp(weighted comb. of features)
 Used with regularization (penalty on feature weights)
provides feature selection mechanism
 Support vector machine (SVM in svmlight)
 2-class linear classifier: weighted sum of similarity to
important examples
 Can use kernel functions to compute similarity and
increase complexity
 For multi-class problems, use a collection of binary
classifiers
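For illustration, the maxent-with-regularization recipe might look like this in scikit-learn, where the L1 penalty drives weak feature weights to zero and so acts as implicit feature selection; the toy documents and labels are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for genre-labeled documents.
docs = ["yeah uh i mean you know", "the committee approved the budget",
        "um so like that was weird", "shares fell after the report"]
genres = ["conversation", "news", "conversation", "news"]

# Maxent = (multinomial) logistic regression; the L1 penalty zeroes out
# weak feature weights, giving implicit feature selection.
clf = make_pipeline(
    CountVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
clf.fit(docs, genres)
print(clf.predict(["uh you know the budget"]))
```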
Genre Classification
 Standard text classification problem
 Extract feature vector → apply model → score classes
 Choose class with best score
 Possible variation
 Threshold test for “unknown genre”
 Evaluation:
 Classification accuracy
 Precision/recall (if allowing unknown genre)
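The “unknown genre” variation can be realized with a simple posterior threshold, assuming any classifier exposing predict_proba, such as the pipeline sketched above (the threshold value is illustrative):

```python
def classify_with_reject(clf, doc, threshold=0.7):
    """Return the best-scoring genre, or 'unknown' if the model is unsure.
    Works with any scikit-learn classifier exposing predict_proba."""
    probs = clf.predict_proba([doc])[0]
    best = probs.argmax()
    return clf.classes_[best] if probs[best] >= threshold else "unknown"

# e.g., with the pipeline from the previous sketch:
# classify_with_reject(clf, "uh you know the budget")  -> a genre or "unknown"
```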
Genre as Text Types
 IR-motivated text types (Dewdney et al., 2001)
 7 types: ads, bulletin board, FAQ, message board, Reuters news, radio news, TV news (ASR audio transcripts)
 Forced decision: 92% recall
 Best results with SVM & multiple feature types
 Most confusable categories:
  1. radio vs. TV transcripts (5-10%)
  2. ads vs. bulletin board (2-8%)
Genre as Text Types (II)
 British National Corpus
 4 text & 6 speech genres
 Results:
 Santini et al., 2004
 POS trigrams & Naïve Bayes, truncated documents
 85.8% accuracy 10-way, 99.3% speech vs. text
 Unpublished UW replication
 Similar results with full documents
 Slight improvement for POS histogram approach
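For reference, POS-trigram features of the kind used above can be computed along these lines with NLTK (assumes the NLTK tokenizer and tagger models are installed):

```python
from collections import Counter
import nltk  # assumes punkt + averaged_perceptron_tagger data are downloaded

def pos_trigram_features(text):
    """Map a document to length-normalized POS-trigram counts."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    trigrams = Counter(zip(tags, tags[1:], tags[2:]))
    n = max(sum(trigrams.values()), 1)
    return {" ".join(tg): c / n for tg, c in trigrams.items()}

print(pos_trigram_features("Ohio State is pretty big, isn't it?"))
```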
Genre as Text Types (III)
 Pipeline (Feldman et al. 09): document → POS tagging → collect windowed histograms → compute histogram statistics → Z-norm + PCA → Gaussian classifier
 Data sources (from LDC):
  Speech: broadcast news (bn), broadcast conversations (bc), meetings (mt), switchboard (sb)
  Text: newswire (nw), weblogs (wl)
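A compact sketch of that pipeline’s overall shape in scikit-learn; the POS-histogram statistics are stood in by placeholder data, and the Gaussian classifier is realized here as QDA (as on the next slide):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder features standing in for per-document statistics of
# windowed POS histograms; labels are the six LDC genres.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = np.repeat(["bn", "bc", "mt", "sb", "nw", "wl"], 10)

model = make_pipeline(
    StandardScaler(),                    # Z-norm
    PCA(n_components=5),                 # dimensionality reduction
    QuadraticDiscriminantAnalysis(),     # one Gaussian per genre
)
model.fit(X, y)
print(model.predict(X[:3]))
```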
Open Set Challenges (Feldman et al. ’09)
 Test on matched sources:
                                  % correct
  QDA w/ POS histograms              98%
  Naïve Bayes w/ bag-of-words        95%
 Test on “BC-like” web text collected with frequent n-grams:
  Very little of the BC-like web data is actually classified as BC!
  Main confusions: BC → BN; WL → BN, NW
  Possible fixes: improve the classifier with higher-order moments & more genres for training, or consider formality filtering instead.
Genre as Formality (Peterson et al. 2011)
 Features:
 Informal words, including: interjections, misspellings and
words classified as informal, vulgar or offensive by
Wordnik
 Punctuation: !, …, absence of sentence-final punctuation
 Case: various measures of lower casing
 Classifier: maximum entropy
 Results: 81% acc, 72% F
 Punctuation is the single most useful feature; informal words and case have lower recall
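A sketch of the kinds of surface features this classifier uses; the informal-word list below is a tiny stand-in for the Wordnik-derived lexicon, and the exact measures are our own approximations:

```python
import re

INFORMAL = {"lol", "gonna", "wanna", "dude", "sucks"}  # stand-in lexicon

def formality_features(text):
    """Surface cues to (in)formality: informal words, punctuation, case."""
    tokens = re.findall(r"\w+", text)
    n = max(len(tokens), 1)
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "informal_rate": sum(t.lower() in INFORMAL for t in tokens) / n,
        "exclamations": text.count("!"),
        "ellipses": text.count("..."),
        "no_final_punct": float(not text.rstrip().endswith((".", "!", "?"))),
        "lowercase_starts": sum(s[0].islower() for s in sents) / max(len(sents), 1),
    }

print(formality_features("dude that sucks... lol"))
```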
Overview
 Dimensions of register and genre
 Genre classification
 Computational considerations
 Examples
 Cues to social context: accommodation examples
 Impact on NLP: system engineering examples
Group Accommodation
 Language use & socialization (Nguyen & Rosé, 2011)
 Data: online health forum (re breast cancer) Jan 2011
crawl, <8 year span, only long-term users (2+ yrs)
 Analysis variables:
 Distribution change of high frequency words
 Questions:
 What are characteristic language features of the group?
 How does language change for long-term participants?
Language Change for Long-Term Poster
 Early post:
I am also new to the form, but not new to bc, diagnosed
last yr, [..] My follow-up with surgeon for reports is not
until 8/9 over a week later. My husband too is so
wonderful, only married a yr in May, 1 month before bc
diagnosed, I could not get through this if it weren’t for him,
[…] I wish everyone well. We will all survive.
 2-4 years later:
Oh Kim- sorry you have so much going on – and an idiot DH
on top of it all. [..] Steph- vent away – that sucks – [..]
XOXOXOXXOXOXOX [..] quiet weekend kids went to DD’s &
SIL o Friday evening, [..] mad an AM pop in as I am
supposed to, SIL is an idiot but then you all know that
Short- vs. Long-time Members
Predicting long vs. short-time users:
• 88 LIWC categories better than 1258 POS
• Best single type is unigrams+bigrams
[Figure omitted: K-L divergence between …]
Gender Accommodation/Differences
 Language use & gender-pairs (Boulis & Ostendorf, 2005)
 Data: Switchboard telephone conversations
 Mostly strangers
 Prescribed topics, 5 min conversations
 Analysis variables: MM, MF, FM, FF
 Questions:
 Can you detect gender or gender pair?
 What words matter?
 Classification features = unigrams or unigrams+bigrams
 Feature selection: KL distance
Accommodation??

Distinguishing same/different gender pairs (accuracy):
           unigrams   bigrams
FF-MM        98.9       99.5
FM-MF        69.2       78.9

Detecting gender pair from one side of conversation (unigrams, F-measure):
FF   .78
FM   .07
MF   .21
MM   .64
People change styles more with matched
genders… OR
The matched-gender is an affiliation group.
Gender-Indicative Language Use
 Men:
  Swear words
  Wife
  Names of men
  Bass, dude
  Filled pauses (uh) – floor holding
 Women:
  Family relation terms
  Husband, boyfriend
  Names of women
  Cute
  Laughter, backchannels (uh-huh) – acknowledging
Gender-dependent Language Models
 Big matched/mismatched differences in perplexity for
pair-dependent LMs (FF vs. MM biggest difference)
 Significant F/M difference
 BUT, best results are from combining all data, since
more data trumps gender differences
Overview
 Dimensions of register and genre
 Genre classification
 Computational considerations
 Examples
 Cues to social context: accommodation examples
 Impact on NLP: system engineering examples
Design Issues in HLT for New Genres
 Text normalization
 101  one hundred and one (text to speech)
 lol  laugh out loud (messaging, twitter, etc.)
 Lexicon differences
 New words/symbols
 Same words but different senses
 Feature engineering
 Model retraining or adaptation
Sentiment Detection on Twitter
 Text normalization
  Abbreviations: gr8 → great, rotf → rolling on the floor (http://www.noslang.com)
  Mapping targets and urls to generic tokens (||T||, ||U||)
  Spelling variants: coooool → coool
 Punctuation and other symbols
  Emoticons → emoticon polarity dictionary
  Emphasis punctuation: !!!!, ????, !*?#!!
 Only 30% of tokens are found in WordNet
Agarwal et al., 2011
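The normalization steps might be sketched as follows; the slang table is a tiny stand-in for the noslang.com dictionary, and the regexes are illustrative:

```python
import re

SLANG = {"gr8": "great", "rotf": "rolling on the floor"}  # tiny stand-in

def normalize_tweet(text):
    text = re.sub(r"https?://\S+", "||U||", text)   # urls -> generic token
    text = re.sub(r"@\w+", "||T||", text)           # targets -> generic token
    text = re.sub(r"(\w)\1{2,}", r"\1\1\1", text)   # coooool -> coool
    return " ".join(SLANG.get(w.lower(), w) for w in text.split())

print(normalize_tweet("gr8 game @espn coooool http://t.co/xyz !!!!"))
```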
Sentiment in Twitter (Agarwal et al. cont.)
 Feature engineering:
 Unigrams
 Sentiment features (counts & polarity scores of pos/neg
sentiment words from dictionary; punctuation,
capitalization)
 Tree kernel
 Model: SVM
 Observations
 100 senti-features have similar performance to 10k
unigrams alone, tree kernel is better, combo is best
 Pos/neg acc = 75.4%, Pos/Neg/Neutral acc = 60.6%
 Most important features are prior word polarity & POS
N-gram Language Modeling
 Conventional wisdom:
 Mercer: There’s no data like more data.
 Banko & Brill: Getting more data has more impact than algorithm
tuning.
 Manning & Schütze: Having more training data is generally more
useful than any concern of balance.
 Since the 70’s, the amount of data used in language model
training has grown by an order of magnitude every decade
 Problem: Genre mismatch
 Mismatched data can actually hurt performance (e.g. using
newswire to train air travel information system)
 General web n-gram statistics ≠ general English (bias of advertising
& pornography)
More Text/Transcript Examples
 Meeting transcript
A: okay. so there are certain cues that are very strong either lexical or topic-based
um concept cues
B: from the discourse that – yeah.
A: for one of those. and then in that second row or whatever that row of time of day
through that – so all of those – some of them come from the utterance and some
of them are sort of either world knowledge or situational things. right? so that
you have no distinction between those and okay
B: right. one uh – uh. um, anything else you want to say Bhaskara?
C: um
A: time of day
C: yeah i m- i mean –
B: one thing – uh –
D: yeah. they’re – they’re are a couple of more things. i mean uh. I would actually
suggest we go through this one more time so we – we all uh agree on what – what
the meaning of these things is at the moment and maybe what changes....
 WSJ:
Fujitsu Ltd.'s top executive took the unusual step of publicly apologizing for his
company's making bids of just one yen for several local government projects, while
computer rival NEC Corp. made a written apology for indulging in the same practice.
Examples (cont.)
 Lecture transcript
right okay so the expectation operator is actually a functional which
means it is a function of a function so we describe it as such here
is e. which means expectation then we may or may not indicate the
actual random variable over which we are taking the expectation
then we have an open bracket we have the function of which we
are taking the expectation and a closing bracket and this is in fact
equal to the integral over all x minus infinity infinity of f. at x. times
the probability of x. d. x. and this is actually the probability density
function of x. okay so us there are two expectations that are far more
important than all the rest the first one is ...
N-gram Language Modeling (cont.)
 Standard approach to dealing with this: mixture modeling
 Train separate language models on each data source
 Learn weights of the different components from target data
P(w_t | w_{t-1}) = Σ_i λ_i(w_{t-1}) P_i(w_t | w_{t-1})
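To make the interpolation concrete, here is a minimal sketch with two toy sources and fixed mixture weights; in practice the λ’s are estimated from target data (e.g., by EM) and may depend on the history, as in the equation above. All names and data are illustrative:

```python
from collections import Counter

class BigramLM:
    """Add-one-smoothed bigram model trained on one data source."""
    def __init__(self, sentences):
        self.bi, self.uni = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split()
            self.uni.update(toks)
            self.bi.update(zip(toks, toks[1:]))
        self.vocab_size = len(self.uni) + 1          # +1 for unseen words

    def prob(self, w, prev):
        return (self.bi[(prev, w)] + 1) / (self.uni[prev] + self.vocab_size)

def mixture_prob(w, prev, lms, lambdas):
    """P(w | prev) = sum_i lambda_i * P_i(w | prev).  The lambdas are fixed
    here, but can be functions of the history, as in the equation above."""
    return sum(l * lm.prob(w, prev) for l, lm in zip(lambdas, lms))

lm_conv = BigramLM(["yeah i mean you know", "uh that was really weird"])
lm_news = BigramLM(["shares fell sharply today", "the committee approved it"])
print(mixture_prob("know", "you", [lm_conv, lm_news], [0.7, 0.3]))
```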
Class-dependent Mixture Weights (Bulyko et al. 03)
[Figure: mixture weights in a conversational telephone speech (CTS) language model, by data source (ch_en, swbd-p2, swbd-cell, swbd, BN, web), for 1-gram/2-gram/3-gram components and for no-class, noun-class, and backchannel-class histories.]
• Weights for web data are higher for content words, lower for conversational speech phenomena
• Higher order n-grams have higher weight on web data
N-gram Language Modeling (cont.)
 Some text sources hurt WER unless weight is very low:
 Newswire for telephone speech (Iyer & Ostendorf 99)
 Newswire for lectures (Fuegen et al. 06)
 General web data for talk shows (Marin et al. 09, even with
weight = .001)
 Small, query-based topic language model outperforms
large, static topic mixture (UW unpublished)
 Question: Can we get BETTER data from the web?
 Genre-specific web queries
 Genre filtering
N-gram Language Models (cont.)
 Bulyko et al., 2007 [results figure not recovered]
Register/Genre Take-Aways
 Our choice of wording depends on the social context: the
event, the audience, and our relationship to them
 Detecting different genres
 Is useful for information retrieval
 Is fairly reliable with just word-POS features and standard
classifiers
 Genre variations reflect social phenomena, so genre cues
are also useful for detecting social role, affiliation, etc.
 Genre variations in language impact the design of human
language technology in terms of: text processing, feature
engineering, and how we leverage different data sources