watching - Idiap Research Institute

computational social media
lecture 05: watching
daniel gatica-perez
13.05.2015
this lecture
1. the rise of online video
2. conceptualizing YouTube
3. conversational social video
uses of conversational video
verbal and nonverbal behavioral analysis
personality and mood impressions
“Everything exists to end up in a book”
Stéphane Mallarmé, 1842-1898
“Everything exists to end up in a photograph”
Susan Sontag, On Photography, 1977
“Everything exists to end up in YouTube”
(this one is probably mine)
1. the rise of online video
http://www.pewinternet.org/2013/10/28/photo-and-video-sharing-grow-online/
http://www.pewinternet.org/2013/10/28/additional-analysis/
watch:
http://www.pewinternet.org/2013/10/10/video-the-rise-of-online-video/
2. conceptualizing YouTube
http://youtube-trends.blogspot.ch/
YouTube statistics
(accessed may 2014, slightly updated in 2015)
founded in February 2005
most popular online video community
1B unique users visit each month
6B hours of video watched each month
300h of video uploaded per minute (100h in 2014)
more US 18-34yo than any cable network
80% traffic from outside US
mobile: 40% of global watch time
top YouTube creators more popular than
mainstream celebrities among US teens
http://www.youtube.com/yt/press/statistics.html
image (cc): alan klim @ flickr
synchronous
few-to-many
passive
centralized
asynchronous
many-to-many
interactive
decentralized
https://variety.com/2014/digital/news/survey-youtube-stars-more-popular-than-mainstream-celebs-among-u-s-teens-1201275245/
View #1: YouTube is a weird place
silliness
YouTube and the rudeness of crowds
absurdity
http://stupid-youtube-comments.blogspot.com/
J. Stossel, ABC 20/20:
"Do you like watching kids doing stupid and reckless things?
Beauty queens falling down?
Or a thousand prisoners dancing to the music of Thriller?
It’s all in YouTube"
Burgess and Green:
"Rather than video about nothing, this could be situated in the
much longer history of vernacular creativity – the wide range
of everyday creative practices ... practiced outside the cultural
value systems of high culture or commercial practice"
J. Burgess and J. Green, YouTube. Online Video and Participatory Culture, Polity, 2009
View #2: YouTube is a video high-school
YouTube and popular videos
analysis of video popularity distributions (power-law with truncated tails)
large-scale analysis (10^6 videos)
no content was analyzed
shifts in video popularity over time
video popularity distribution of YouTube:
power-law in waist, sharp decay in tail
M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn and S. Moon, "I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest
User Generated Content Video System," ACM Internet Measurement Conference, Oct. 2007
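The "power-law in waist, sharp decay in tail" shape can be illustrated with a small sketch. It assumes one common parameterization of a truncated power law, a power law with an exponential cutoff, p(x) ∝ x^(-alpha) · exp(-x / cutoff). The function names and the values of alpha and cutoff are illustrative choices, not figures from Cha et al.

```python
import math

def truncated_power_law(x, alpha=1.5, cutoff=1e5):
    """Unnormalized density x^(-alpha) * exp(-x / cutoff) (assumed form)."""
    return x ** (-alpha) * math.exp(-x / cutoff)

def log_log_slope(f, x, factor=10.0):
    """Slope of f on log-log axes between x and factor * x."""
    return (math.log(f(factor * x)) - math.log(f(x))) / math.log(factor)

# In the waist (views far below the cutoff) the slope is close to -alpha,
# i.e. the curve looks like a straight power-law line on log-log axes.
waist_slope = log_log_slope(truncated_power_law, 1e2)

# Near and beyond the cutoff the exponential term dominates and the
# curve drops much faster than the power law alone would.
tail_slope = log_log_slope(truncated_power_law, 1e5)

print(f"waist slope: {waist_slope:.2f}, tail slope: {tail_slope:.2f}")
```

On a log-log plot the waist appears as a straight line, and the tail bends downward once view counts approach the cutoff.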
View #3: YouTube is a place of pirates
YouTube and copyright infringement
near-duplicate (ND) detection:
popular research topic
YouTube partners with TV and
movie companies to remove
unauthorized copies
NDs are typically seen as
redundant and deemed to be
eliminated...
...NDs are not always uploaded
with bad intent but to enrich the
original material (e.g. subtitles)
M. Cherubini, R. Oliveira, and N. Oliver, "Understanding Near-Duplicate Videos: A User-Centric Approach",
ACM Multimedia, Oct. 2009
View #4: YouTube is a dangerous place
YouTube and political radicalization
analysis of online supporters of
jihad-promoting video content
people posting and commenting
martyr-promoting material (Iraq)
small-scale analysis (50 videos, 30
users, 940-user network)
85% of users under 35 yo
biggest supporters located in the
US (42%), UK (15%), Canada
(8%), Germany (7%)
http://www.smugnews.com/page/3/
M. Conway and L. McInerney, "Jihadi video and autoradicalisation: Evidence from an exploratory YouTube Study," First European Conference on Intelligence and Security Informatics, Dec. 2008
View #5: YouTube is a place for marketing
http://www.youtube.com/yt/advertise/index.html
View #6: YouTube is a place for expression
conversational social video: provide a rich
communication experience
reading session
Mirjam Wattenhofer, Roger Wattenhofer, and Zack Zhu
The YouTube Social Network
Proc. ICWSM 2012
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37738.pdf
3. conversational social video
video advice
http://www.isasaweis.com/
video testimonials
https://www.lafitness.com/Pages/VidTestimonialPlay.aspx
video college applications
https://www.youtube.com/watch?v=KRDMYDCd0fk
political participation: #YoSoy132
La Jornada, Mexico, 15.05.2012, http://www.jornada.unam.mx/2012/05/15/
two communication channels
https://www.lafitness.com/Pages/VidTestimonialPlay.aspx
sentiment
topic
language style
opinion
beyond
https://www.lafitness.com/Pages/VidTestimonialPlay.aspx
nonverbal communication
gaze
gestures
alternative to spoken words
indicate states, traits, relationships
unconscious, hard to fake
proxemics
accurate judgments
prosody and speaking activity
body posture
nonverbal communication in vlogs
nonverbal cues are all there
data in the wild
context
personal: individual differences
social: relationship types
temporal: relationships evolve
cultural: regional differences
http://www.coverpop.com
research framework
1. video crowdsourcing
study social perception
2. nonverbal behavioral analysis
automatically characterize vloggers
1. video crowdsourcing
study social perception
can we crowdsource reliable human impressions from vlogs?
J.-I. Biel and D. Gatica-Perez, “The Good, the Bad, and the Angry: Analyzing Crowdsourced Impressions of
Vloggers,” in Proc. AAAI Int. Conf. on Weblogs and Social Media, Dublin, Jun. 2012
a basic model for nonverbal behavior and
interpersonal perception
self-reports
impressions
R. Gifford, “A Lens-Mapping Framework for Understanding the Encoding and Decoding of Interpersonal Dispositions
in Nonverbal Behavior,” Journal of Personality and Social Psychology, 1994. Vol. 66. No. 2, 398-412
vlogger impressions
attractiveness
beautiful
smart
sexy
mood
happy
relaxed
bored
stressed
angry
big-five traits
extraverted
agreeable
conscientious
stable
open
vlog data
shot-based analysis
conversational data
442 YouTube vloggers
53% female / 47% male
first conversational minute
crowdsourcing to collect impressions
source: cooltownstudios
Wikipedia:
“Crowdsourcing is a process that involves outsourcing tasks to a
distributed group of people... an undefined public rather than a specific one.”
crowdsourcing impressions
watch one-minute slices
answer questionnaires
big-five (Gosling ‘03)
attractiveness
mood
5 annotators/vlog
113 workers (US & India)
crowdsourced vlogger demographics (majority vote)
groundtruth:
47% M; 53% F
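A minimal sketch of how five annotations per vlog might be combined into one label or score. It assumes numeric questionnaire ratings are averaged and categorical labels (such as gender) are decided by majority vote. The function names and the example ratings are hypothetical, not data from the study.

```python
from collections import Counter
from statistics import mean

def aggregate_scores(scores):
    """Average the numeric ratings given by the annotators of one vlog."""
    return mean(scores)

def majority_vote(labels):
    """Return the label chosen by most annotators of one vlog."""
    label, _count = Counter(labels).most_common(1)[0]
    return label

# Hypothetical annotations for a single vlog from 5 workers.
extraversion_ratings = [5, 6, 4, 6, 5]
gender_labels = ["F", "F", "M", "F", "F"]

print(aggregate_scores(extraversion_ratings))
print(majority_vote(gender_labels))
```

With an odd number of annotators and two possible labels, a majority vote never ties.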
descriptive stats of impressions
impression reliability
Intraclass correlation (ICC): agreement achieved with aggregated scores across annotators*

Trait          ICC
Extr           .77
Agr            .65
Open           .47
Cons           .45
Emot           .42

Mood           ICC
Happy          .76
Excited        .74
Angry          .67
Disappointed   .61
Sad            .58
Relaxed        .54
Bored          .52
Stressed       .50
Surprised      .48
Nervous        .25
Over. mood     .75

Attractiveness ICC
Beautiful      .69
Sexy           .60
Friendly       .51
Likable        .44
Smart          .35
Over. attr     .61

*Shrout & Fleiss (1979). "Intraclass Correlations: Uses in Assessing Rater Reliability". Psychological Bulletin 86 (2): 420–428
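A sketch of how an ICC on aggregated scores can be computed. It assumes the one-way random-effects, average-measures form ICC(1,k) = (BMS − WMS) / BMS from Shrout & Fleiss; the slides do not state which ICC variant was used. The rating matrix is illustrative only.

```python
def icc_1k(ratings):
    """ICC(1,k) for a list of targets, each with k ratings.

    BMS: between-targets mean square, WMS: within-target mean square,
    both taken from a one-way ANOVA over targets.
    """
    n = len(ratings)          # number of targets (e.g. vlogs)
    k = len(ratings[0])       # ratings per target (e.g. 5 annotators)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    bms = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (bms - wms) / bms

# Illustrative ratings: 6 targets, 4 ratings each.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_1k(ratings), 2))
```

Values close to 1 mean the averaged score mostly reflects differences between targets rather than disagreement among raters. The function is undefined when all target means are equal (BMS = 0).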
connections among impressions
halo effect: more attractive people are
attributed more positive traits
Dion et al., Stereotyping physical attractiveness: A
sociocultural perspective. Journal of Cross-Cultural
Psychology, 21(2):158–179, 1990.
2. nonverbal behavioral analysis
automatically characterize vloggers
are verbal & nonverbal cues linked with personality and mood impressions in vlogging?
J.-I. Biel and D. Gatica-Perez, "The YouTube Lens: Crowdsourced Personality Impressions and Audiovisual Analysis of Vlogs," IEEE Trans. on Multimedia, Jan. 2013
D. Sanchez-Cortes, J.-I. Biel, S. Kumano, J. Yamato, K. Otsuka, and D. Gatica-Perez, "Inferring Mood in Ubiquitous Conversational Video," in Proc. Int. Conf. on Mobile and Ubiquitous Multimedia (MUM), Lulea, Dec. 2013
nonverbal cue extraction (1)
from activity segmentations
AUDIO
• Prosody: energy, pitch, voice rate
• Speaking time
• Num turns
VISUAL
• Looking time
• Num turns
• Proximity to camera
• Framing
• Accumulated motion
MULTIMODAL
• Looking & speaking
• Looking & not-speaking
nonverbal cues (2)
Computer Expression Recognition Toolbox (CERT)
• basic facial expressions
• smile
raw statistical aggregates from segmentations
• Active time
• Num segments
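Cues such as active time, number of segments, and the multimodal "looking & speaking" cue can be derived from binary activity segmentations. The sketch below assumes a per-frame 0/1 sequence as input. The function names, the frame rate, and the example sequences are hypothetical, and the actual extraction pipeline used in the lecture's studies may differ.

```python
def segments(activity):
    """Return (start, end) frame pairs for each run of active frames (end exclusive)."""
    runs, start = [], None
    for i, active in enumerate(activity):
        if active and start is None:
            start = i
        elif not active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(activity)))
    return runs

def activity_cues(activity, frame_rate):
    """Aggregate a binary segmentation into simple statistics."""
    runs = segments(activity)
    active_frames = sum(end - start for start, end in runs)
    return {
        "active_time_fraction": active_frames / len(activity),
        "num_segments": len(runs),
        "avg_segment_seconds": active_frames / frame_rate / len(runs) if runs else 0.0,
    }

def joint_fraction(a, b):
    """Fraction of frames where both activities are on (e.g. looking & speaking)."""
    return sum(1 for x, y in zip(a, b) if x and y) / len(a)

# Hypothetical 10-frame segmentations.
speaking = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
looking = [1, 1, 0, 0, 1, 1, 1, 0, 0, 1]

speaking_cues = activity_cues(speaking, frame_rate=10)
looking_and_speaking = joint_fraction(looking, speaking)
print(speaking_cues, looking_and_speaking)
```

The same aggregation applies to any binary stream, so speaking turns, looking segments, and smile segments can share one code path.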
verbal cues
Linguistic Inquiry and Word Count (LIWC)
65 categories related to psychological constructs and personal concerns
word count per category
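A sketch of dictionary-based word counting per category. The LIWC dictionary itself is proprietary, so the tiny `TOY_CATEGORIES` mapping below is a made-up stand-in. It borrows a few LIWC-style category names that appear in these slides (posemo, negemo, work, i), but the word lists are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for a category dictionary; NOT the real LIWC lexicon.
TOY_CATEGORIES = {
    "posemo": {"happy", "love", "great"},
    "negemo": {"hate", "sad", "awful"},
    "work": {"job", "office", "project"},
    "i": {"i", "me", "my"},
}

def category_counts(transcript, categories=TOY_CATEGORIES):
    """Count how many words of a transcript fall into each category."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter({name: 0 for name in categories})
    for word in words:
        for name, vocabulary in categories.items():
            if word in vocabulary:
                counts[name] += 1
    return dict(counts), len(words)

counts, total_words = category_counts("I love my job but I hate the office")
print(counts, total_words)
```

Dividing each count by the total number of words gives a length-normalized feature, which makes transcripts of different lengths comparable.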
big-five impressions & audio/visual activity cues
Cue utilization (# of significant correlations)
Trait    Extr  Cons  Open  Agr  Emot
# cues   24    16    12    10   3
Selected effects (p<0.05)
Trait   +                                                                     -
Extr    speaking time (talkative), visual activity (dynamic), dominance ratio speaking turns (fluency)
Cons    looking time (persistent gaze)                                        visual activity (quiet)
Agr     vertical framing (upper body)
• Extr: highest cue utilization
• Agr: low cue utilization
• Effects backed up by social psych literature
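Cue utilization, as used here, counts how many cues correlate significantly with an impression score. The sketch below assumes Pearson correlation and a two-sided test based on the Fisher z-transform with a normal approximation. The slides do not specify the exact test, and the data and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, n, alpha=0.05):
    """Two-sided test of r != 0 via the Fisher z-transform (normal approximation).

    Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def cue_utilization(cues, impressions, alpha=0.05):
    """Number of cues whose correlation with the impressions is significant."""
    n = len(impressions)
    return sum(
        is_significant(pearson_r(values, impressions), n, alpha)
        for values in cues.values()
    )

# Hypothetical impression scores and two cues for 8 vloggers.
impressions = [1, 2, 3, 4, 5, 6, 7, 8]
cues = {
    "speaking_time": [2, 1, 4, 3, 6, 5, 8, 7],   # tracks the impressions
    "unrelated_cue": [4, 5, 3, 6, 4, 5, 3, 6],   # mostly noise
}
print(cue_utilization(cues, impressions))
```

With many cues tested per trait, some correlations will pass p<0.05 by chance, so counts like these are best read as a relative comparison across traits.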
big-five impressions & facial expression cues
Cue utilization (# of significant correlations)
Trait    Extr  Open  Agr  Cons  Emot
# cues   75    37    27   17    9
• Extr: highest cue utilization
• FE favor impressions of Open and Agr
Selected effects (p<0.05)
Trait   +            -
Extr    joy, smile   anger, disgust
Open    surprise     anger
Agr     joy, smile   anger
• Joy, anger and smile show consistent effects
big-five impressions & verbal cues
Cue utilization (# of significant correlations)
Trait    Cons  Agr  Extr  Emot  Open
# cues   24    15   12    12    10
• Higher cue utilization for Cons and Agr
Selected effects (p<0.05)
Trait   +                            -
Cons    work, achieve (long words)   negate, swear
Agr     posemo, i, friend            anger, negemo
Emot    leisure, work                negemo, affect
Extr    you, social, sexual          tentative, exclusive
• Many effects backed up by previous research
personality inference
5 regression tasks on Big-Five scores
machine learning: SVM and RF
performance measured with R2:
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
numerator: sum of squared prediction errors
denominator: sum of squared prediction errors using the mean predictor
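A minimal sketch of the R2 measure: one minus the ratio between the model's sum of squared prediction errors and that of a predictor that always outputs the mean score. The example scores are hypothetical. For simplicity the mean is taken over the evaluated scores themselves, whereas a cross-validated setup would typically use the training-set mean.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - SS_res / SS_mean, where SS_mean uses the mean predictor."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_mean = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_mean

# Hypothetical crowdsourced trait scores and model predictions.
scores = [3.0, 4.0, 5.0, 6.0]
predictions = [3.5, 4.0, 4.5, 6.0]

model_r2 = r_squared(scores, predictions)
baseline_r2 = r_squared(scores, [4.5] * 4)  # always predict the mean
print(model_r2, baseline_r2)
```

An R2 of 0 means the model is no better than the mean predictor, and negative values mean it is worse.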
prediction results (random forest), R2

Trait   AV    FE
Extr    .39   .24
Cons    .10   .06
Open    .10   .11
Agr     .06   .08
Emot    .06   .07
AV = audiovisual activity cues
FE = facial expression cues

Trait   Manual transcripts   Automatic transcripts
Agr     .31                  .10
Cons    .18                  .08
Emot    .17                  .05
Extr    .13                  .02
Open    .04                  .02
feature fusion for personality impression inference
• Extraversion: AV + FE, R2 = .48
• Openness to Experience: AV + FE, R2 = .17
• Agreeableness: FE + VB, R2 = .39
mood classification
● classifiers: SVM (Gaussian kernel), random forest
● two classes per mood (divided by median value)
● 10-fold cross validation
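The classification setup can be sketched in two steps: binarize each mood score at the median, then split the data into 10 folds. The sketch below covers only label creation and fold generation, not the SVM or random forest themselves. Assigning scores equal to the median to the low class, shuffling with a fixed seed, and the example scores are all assumptions.

```python
import random
from statistics import median

def median_split(scores):
    """Label 1 for scores above the median, 0 otherwise (ties go to class 0)."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical aggregated "happy" scores for 6 vloggers.
happy_scores = [2.0, 5.5, 3.0, 6.0, 4.0, 4.5]
labels = median_split(happy_scores)

splits = list(k_fold_indices(20, k=10))
print(labels, len(splits))
```

Splitting at the median yields roughly balanced classes, so chance accuracy is close to 50%, which gives a reference point for the reported accuracies.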
[figure: performance for happy mood (random forest) by feature set; A: audio, V: visual, F: facial expressions, L: verbal; the baseline and the results significantly better than baseline are marked]
[figure: relevant word categories for happy and angry moods]
best results per mood
Mood           Accuracy (%)   Features
Overall        69.0 (RF)      Verbal+Facial
Happy          64.0 (RF)      Verbal+Facial
Excited        68.3 (RF)      Audio+Visual+Facial
Angry          65.1 (SVM)     Verbal+Facial
Disappointed   66.0 (RF)      Verbal+Visual
Sad            64.9 (SVM)     All
Relaxed        66.0 (SVM)     All
Bored          64.1 (RF)      Audio+Visual+Facial
(all statistically significantly better than random)
conclusions
online video: is different from text & images; is rising
YouTube: is multifaceted and complex to understand
conversational video: multimedia behavioral analysis; mood & big-5; personal & cultural differences; many applications
questions?
gatica@idiap.ch
daniel.gatica-perez@epfl.ch
@dgaticaperez