Why Watching Movie Tweets Won’t Tell the Whole Story?
Felix Ming-Fai Wong, Soumya Sen, Mung Chiang
EE, Princeton University
WOSN’12. August 17, Helsinki.
Twitter: A Gold Mine?
Apps: Mombo, TwitCritics, Fflick
Or Too Good to be True?
Why Representativeness?
• Selection bias
  • Gender
  • Age
  • Education
  • Computer skills
  • Ethnicity
  • Interests
• The silent majority
Why Movies?
• Large online interest
• Relatively unambiguous
• Good timing: the Oscars
Key Questions
• Is there a bias in Twitter movie ratings?
• How do Twitter ratings compare to other online sites?
• Is there a quantifiable difference across types of movies?
  • Oscar-nominated vs. non-nominated commercial movies
• Can hype ensure box-office gains?
Data Stats
• 12 million tweets
• 1.77M valid movie tweets
• February 2 – March 12, 2012
• 34 movie titles
  • New releases (20)
  • Oscar-nominees (14)
Example Movies
Commercial movies (non-nominees)   | Oscar-nominees*
-----------------------------------|-------------------
The Grey                           | The Descendants
Underworld: Awakening              | The Artist
Contraband                         | The Help
Haywire                            | Hugo
The Woman in Black                 | Midnight in Paris
Chronicle                          | Moneyball
The Vow                            | The Tree of Life
Journey 2: The Mysterious Island   | War Horse

*Nominated for Best Picture or Best Animated Film
Rating Comparison
• Twitter movie ratings
  • Binary classification: positive/negative
• IMDb and Rotten Tomatoes ratings
  • Rating scales: 0–10 (IMDb), 0–5 (RT)
• Benchmark definition (see the sketch below)
  • Rotten Tomatoes: positive if ≥ 3.5/5
  • IMDb: positive if ≥ 7/10
  • Movie score: weighted average
  • High mutual information between the two benchmark sites
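A minimal sketch of this benchmark mapping, using the thresholds from the slide; the function names are illustrative, and weighting by vote count is an assumption (the slide only says "weighted average"):

```python
# Map numeric IMDb / Rotten Tomatoes ratings onto the binary
# positive/negative scale used for Twitter sentiment.
# Thresholds from the slide: RT >= 3.5/5, IMDb >= 7/10.

def is_positive(source: str, rating: float) -> bool:
    """Return True if a rating counts as 'positive' under the benchmark."""
    if source == "rt":        # Rotten Tomatoes, 0-5 scale
        return rating >= 3.5
    if source == "imdb":      # IMDb, 0-10 scale
        return rating >= 7.0
    raise ValueError(f"unknown rating source: {source}")

def movie_score(ratings: list[tuple[str, float, int]]) -> float:
    """Weighted fraction of positive ratings for one movie.

    Each entry is (source, rating, weight); weighting by vote count
    is an assumption -- the slide only says 'weighted average'.
    """
    total = sum(w for _, _, w in ratings)
    positive = sum(w for s, r, w in ratings if is_positive(s, r))
    return positive / total if total else 0.0
```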
Challenges
• Common words in titles
  • e.g., “Thanks for the help”, “the grey sky”
• No API support for exact matching (see the filtering sketch below)
  • e.g., “a grey cat in the box”
• Misrepresentations of titles
  • e.g., “underworld awakening”
• Non-standard vocabulary
  • e.g., “sick movie” (slang for a good movie)
• Non-English tweets
• Retweets: a retweet endorses the original opinion
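Since the search API offers no exact-phrase guarantee, matching can be re-done client-side. A minimal sketch under that assumption; the title list and regex policy are illustrative, and the last example shows why common-word titles stay ambiguous even with exact matching:

```python
import re

# Titles with common words ("The Grey", "The Help") need exact,
# word-boundary phrase matching to avoid spurious hits.
TITLES = ["The Grey", "The Help", "Underworld: Awakening"]

def title_pattern(title: str) -> re.Pattern:
    # Escape each word, then require the words to appear adjacent
    # (separated only by non-word characters) with word boundaries,
    # so "a grey cat in the box" does not match "The Grey".
    words = [re.escape(w) for w in re.findall(r"\w+", title)]
    return re.compile(r"\b" + r"\W+".join(words) + r"\b", re.IGNORECASE)

PATTERNS = {t: title_pattern(t) for t in TITLES}

def matching_titles(tweet: str) -> list[str]:
    return [t for t, p in PATTERNS.items() if p.search(tweet)]

print(matching_titles("Just saw The Grey, loved it"))  # ['The Grey']
print(matching_titles("a grey cat in the box"))        # []
print(matching_titles("the grey sky over Helsinki"))   # ['The Grey'] -- still ambiguous!
```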
Tweet Classification
Steps
• Preprocessing (see the sketch below)
  • Remove usernames
  • Conversion:
    • Punctuation marks (e.g., “!” to the meta-word “exclmark”)
    • Emoticons (e.g., “:)”), URLs, “@” mentions
• Feature vector conversion
  • Preprocessed tweet to binary feature vector (using MALLET)
• Training & classification
  • 11K non-repeated, randomly sampled tweets manually labeled
  • Trained three classifiers
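A minimal sketch of the preprocessing step (the feature-vector step itself used MALLET). Only “exclmark” is named on the slide; the other meta-words and the emoticon lists are illustrative assumptions:

```python
import re

# Emoticon lists are illustrative; the original used a fuller lexicon.
POSITIVE_EMOTICONS = [":)", ":-)", ":D"]
NEGATIVE_EMOTICONS = [":(", ":-("]

def preprocess(tweet: str) -> str:
    t = tweet.lower()
    t = re.sub(r"@\w+", "", t)                   # remove usernames
    t = re.sub(r"https?://\S+", " urllink ", t)  # URLs -> meta-word (assumed name)
    for e in POSITIVE_EMOTICONS:
        t = t.replace(e, " posemoticon ")        # assumed meta-word
    for e in NEGATIVE_EMOTICONS:
        t = t.replace(e, " negemoticon ")        # assumed meta-word
    t = t.replace("!", " exclmark ")             # meta-word from the slide
    t = t.replace("?", " questmark ")            # assumed meta-word
    return re.sub(r"\s+", " ", t).strip()

print(preprocess("@bob The Vow was AMAZING!! :) http://t.co/x"))
# -> "the vow was amazing exclmark exclmark posemoticon urllink"
```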
Classifier Comparison
• SVM-based classifier performance is significantly better
[Table: per-classifier performance; numbers in brackets are balanced accuracy rates]
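The slide's table did not survive extraction, so which two classifiers the SVM beat is not stated here; a sketch of such a comparison on balanced accuracy, with Naive Bayes and logistic regression as assumed stand-ins:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import LinearSVC

def compare_classifiers(tweets, labels):
    """tweets: preprocessed strings; labels: manual positive/negative tags."""
    # Binary (presence/absence) features, matching the slides' feature vectors.
    X = CountVectorizer(binary=True).fit_transform(tweets)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, random_state=0)
    for name, clf in [("NaiveBayes", BernoulliNB()),
                      ("LogReg", LogisticRegression(max_iter=1000)),
                      ("SVM", LinearSVC())]:
        clf.fit(X_tr, y_tr)
        acc = balanced_accuracy_score(y_te, clf.predict(X_te))
        print(f"{name}: balanced accuracy = {acc:.3f}")
```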
Temporal Context
Fraction of tweets in joint (temporal, sentiment) classes:
• Most “current” tweets are “mentions”
  • Many come from location-based services (LBS)
• “Before” and “after” tweets are much more positive
[Figure: joint-class fractions for The Woman in Black, Chronicle, The Vow, and Safe House]
Sentiment: Positive Bias
[Figure: positive-tweet fractions for The Woman in Black, Chronicle, The Vow, Safe House, Haywire, and Star Wars I]
• Tweeters are overwhelmingly more positive for almost all movies
Rotten Tomatoes vs. IMDb?
• Good match between RT and IMDb (the benchmark)
Twitter vs. IMDb vs. RT Ratings (1)
[Scatter plots: IMDb vs. Twitter; RT vs. Twitter]
• New (non-nominated) movies score more positively with Twitter users
Twitter vs. IMDb vs. RT Ratings (2)
[Scatter plots: IMDb vs. Twitter; RT vs. Twitter]
• Oscar-nominees are rated more positively on IMDb and RT
Quantification Metrics (1)
Positiveness (P):
• (x*, y*) are the medians of the proportion of positive ratings on Twitter and on IMDb/RT
Bias (B):
• Deviation of the median point from agreement between Twitter and IMDb/RT
[Figure: unit square with Twitter score (0–1) on the x-axis and IMDb/RT score (0–1) on the y-axis; the median point (x*, y*) is shown with the quantities P and B marked]
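The slides leave the formulas to the figure; one reading consistent with the marked geometry (an assumption, not quoted from the slides) treats P as the projection onto, and B as the offset from, the diagonal of the unit square:

```latex
% Assumed reconstruction from the figure's geometry,
% with (x^*, y^*) the median point in [0,1]^2:
P = \frac{x^* + y^*}{\sqrt{2}}   % position projected onto the diagonal y = x
\qquad
B = \frac{y^* - x^*}{\sqrt{2}}   % signed perpendicular offset from that diagonal
```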
Quantification Metrics (2)
Inferrability (I):
• Given the average rating on Twitter, how well can one predict the ratings on IMDb/RT?
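The slide phrases inferrability as a prediction question without giving a formula; as an illustrative stand-in (not the paper's definition), one could fit a simple regression from Twitter scores to IMDb/RT scores and report its fit quality:

```python
import numpy as np

def inferrability_r2(twitter_scores, site_scores):
    """Illustrative stand-in: R^2 of a least-squares fit predicting
    IMDb/RT scores from Twitter scores (one point per movie)."""
    x = np.asarray(twitter_scores, dtype=float)
    y = np.asarray(site_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit
    resid = y - (slope * x + intercept)
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot
```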
Summary Metrics
• Oscar-nominees have higher positiveness
• RT and IMDb have high mutual inferrability and low bias
• Twitter users are more positively biased toward non-nominated new releases
Hype-approval vs. Ratings
[Figure: hype-approval vs. IMDb/RT ratings, with This Means War and The Vow highlighted in both panels]
• Higher (lower) hype-approval ≠ higher (lower) IMDb/RT ratings
Hype-approval vs. Box-office
[Figure: hype-approval vs. box-office revenue, with Journey 2 highlighted]
• High hype + high IMDb rating: box-office success
• Other combinations: unpredictable
Summary
• Tweeters aren’t representative enough
  • Be careful about treating tweets as online polls
• Tweeters tend to appear more positive, and have specific interests and tastes
• Need for quantitative metrics
Thanks!