computational social media
lecture 05: watching
daniel gatica-perez, 13.05.2015

this lecture
1. the rise of online video
2. conceptualizing YouTube
3. conversational social video
   • uses of conversational video
   • verbal and nonverbal behavioral analysis
   • personality and mood impressions

"Everything exists to end up in a book" (Stéphane Mallarmé, 1842-1898)
"Everything exists to end up in a photograph" (Susan Sontag, On Photography, 1977)
"Everything exists to end up in YouTube" (this one is probably mine)

1. the rise of online video
http://www.pewinternet.org/2013/10/28/photo-and-video-sharing-grow-online/
http://www.pewinternet.org/2013/10/28/additional-analysis/
watch: http://www.pewinternet.org/2013/10/10/video-the-rise-of-online-video/

2. conceptualizing YouTube
http://youtube-trends.blogspot.ch/

YouTube statistics (accessed May 2014, slightly updated in 2015)
• founded in February 2005
• most popular online video community
• 1B unique users visit each month
• 6B hours of video watched each month
• 300h of video uploaded per minute (100h in 2014)
• reaches more US 18-34 year-olds than any cable network
• 80% of traffic comes from outside the US
• mobile: 40% of global watch time
• top YouTube creators are more popular than mainstream celebrities among US teens
  (https://variety.com/2014/digital/news/survey-youtube-stars-more-popular-than-mainstream-celebs-among-u-s-teens-1201275245/)
source: http://www.youtube.com/yt/press/statistics.html

from synchronous, few-to-many, passive, centralized
to asynchronous, many-to-many, interactive, decentralized
(image (cc): alan klim @ flickr)

View #1: YouTube is a weird place
silliness · absurdity · YouTube and the rudeness of crowds
http://stupid-youtube-comments.blogspot.com/
J. Stossel, ABC 20/20: "Do you like watching kids doing stupid and reckless things? Beauty queens falling down?
Or a thousand prisoners dancing to the music of Thriller?
It’s all in YouTube."
Burgess and Green: "Rather than video about nothing, this could be situated in the much longer history of vernacular creativity – the wide range of everyday creative practices ... practiced outside the cultural value systems of high culture or commercial practice."
J. Burgess and J. Green, YouTube: Online Video and Participatory Culture, Polity, 2009.

View #2: YouTube is a video high school
YouTube and popular videos
• analysis of video popularity distributions (power law with truncated tails)
• large-scale analysis (10^6 videos); no video content was analyzed
• shifts in video popularity over time
[figure: video popularity distribution of YouTube: power law in the waist, sharp decay in the tail]
M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn and S. Moon, "I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System," ACM Internet Measurement Conference, Oct. 2007.

View #3: YouTube is a place of pirates
YouTube and copyright infringement
• near-duplicate (ND) detection is a popular research topic
• YouTube partners with TV and movie companies to remove unauthorized copies
• NDs are typically seen as redundant and as candidates for removal...
• ...but NDs are not always uploaded with bad intent; some enrich the original material (e.g., subtitles)
M. Cherubini, R. Oliveira, and N. Oliver, "Understanding Near-Duplicate Videos: A User-Centric Approach," ACM Multimedia, Oct. 2009.

View #4: YouTube is a dangerous place
YouTube and political radicalization
• analysis of online supporters of jihad-promoting video content
• people posting and commenting on martyr-promoting material (Iraq)
• small-scale analysis (50 videos, 30 users, 940-user network)
• 85% of users under 35 years old
• biggest supporters located in the US (42%), UK (15%), Canada (8%), Germany (7%)
(image: http://www.smugnews.com/page/3/)
M. Conway and L. McInerney, "Jihadi video and autoradicalisation: Evidence from an exploratory YouTube Study," First European Conference on Intelligence and Security Informatics, Dec.
2008.

View #5: YouTube is a place for marketing
http://www.youtube.com/yt/advertise/index.html

View #6: YouTube is a place for expression
conversational social videos provide a rich communication experience

reading session
Mirjam Wattenhofer, Roger Wattenhofer, and Zack Zhu, "The YouTube Social Network," Proc. ICWSM 2012
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37738.pdf

3. conversational social video

video advice
http://www.isasaweis.com/

video testimonials
https://www.lafitness.com/Pages/VidTestimonialPlay.aspx

video college applications
https://www.youtube.com/watch?v=KRDMYDCd0fk

political participation: #YoSoy132
La Jornada, Mexico, 15.05.2012, http://www.jornada.unam.mx/2012/05/15/

two communication channels
• verbal: sentiment, topic, language style, opinion
• beyond the words

nonverbal communication
• channels: gaze, gestures, proxemics, prosody and speaking activity, body posture
• an alternative to spoken words
• indicates states, traits, relationships
• unconscious, hard to fake
• supports accurate judgments

nonverbal communication in vlogs
• nonverbal cues are all there
• data in the wild
• context:
  personal: individual differences
  social: relationship types
  temporal: relationships evolve
  cultural: regional differences
(image: http://www.coverpop.com)

research framework
1. video crowdsourcing: study social perception
2. nonverbal behavioral analysis: automatically characterize vloggers

1. video crowdsourcing: study social perception
can we crowdsource reliable human impressions from vlogs?
J.-I. Biel and D. Gatica-Perez, "The Good, the Bad, and the Angry: Analyzing Crowdsourced Impressions of Vloggers," in Proc. AAAI Int. Conf. on Weblogs and Social Media, Dublin, Jun. 2012

a basic model for nonverbal behavior and interpersonal perception: self-reports and impressions; see R.
Gifford, "A Lens-Mapping Framework for Understanding the Encoding and Decoding of Interpersonal Dispositions in Nonverbal Behavior," Journal of Personality and Social Psychology, vol. 66, no. 2, pp. 398-412, 1994.

vlogger impressions
• attractiveness: beautiful, smart, sexy
• mood: happy, relaxed, bored, stressed, angry
• big-five traits: extraverted, agreeable, conscientious, stable, open

vlog data
• 442 YouTube vloggers (53% female / 47% male)
• shot-based analysis to obtain the conversational data
• first conversational minute of each vlog
• crowdsourcing to collect impressions
(image source: cooltownstudios)
Wikipedia: "Crowdsourcing is a process that involves outsourcing tasks to a distributed group of people... an undefined public rather than a specific one."

crowdsourcing impressions
• workers watch one-minute slices and answer questionnaires: big-five (Gosling '03), attractiveness, mood
• 5 annotators per vlog
• 113 workers (US & India)
• crowdsourced vlogger demographics (majority vote) vs. ground truth: 47% M, 53% F

descriptive stats of impressions
impression reliability: intraclass correlation (ICC), the agreement achieved with scores aggregated across annotators*
• big-five traits: Extr .77, Agr .65, Open .47, Cons .45, Emot .42
• attractiveness: Beautiful .69, Sexy .60, Friendly .51, Likable .44, Smart .35; overall attractiveness .61
• mood: Happy .76, Excited .74, Angry .67, Disappointed .61, Sad .58, Relaxed .54, Bored .52, Stressed .50, Surprised .48, Nervous .25; overall mood .75
*Shrout & Fleiss (1979). "Intraclass Correlations: Uses in Assessing Rater Reliability". Psychological Bulletin 86 (2): 420–428

connections among impressions
halo effect: more attractive people are attributed more positive traits
Dion et al., Stereotyping physical attractiveness: A sociocultural perspective. Journal of Cross-Cultural Psychology, 21(2):158–179, 1990.

2. nonverbal behavioral analysis: automatically characterize vloggers
are verbal & nonverbal cues linked with personality and mood impressions in vlogging?
J.-I. Biel and D.
Gatica-Perez, "The YouTube Lens: Crowdsourced Personality Impressions and Audiovisual Analysis of Vlogs," IEEE Trans. on Multimedia, Jan. 2013
D. Sanchez-Cortes, J.-I. Biel, S. Kumano, J. Yamato, K. Otsuka, and D. Gatica-Perez, "Inferring Mood in Ubiquitous Conversational Video," in Proc. Int. Conf. on Mobile and Ubiquitous Multimedia (MUM), Luleå, Dec. 2013

nonverbal cue extraction (1): from activity segmentations
• AUDIO: speaking time, number of turns
  prosody: energy, pitch, voice rate
• VISUAL: looking time, number of turns, proximity to camera, framing, accumulated motion
• MULTIMODAL: looking & speaking, looking & not speaking

nonverbal cues (2)
• Computer Expression Recognition Toolbox (CERT): basic facial expressions, smile
• raw statistical aggregates from segmentations: active time, number of segments

verbal cues
• Linguistic Inquiry and Word Count (LIWC): 65 categories related to psychological constructs and personal concerns
• word count per category

big-five impressions & audio/visual activity cues
cue utilization (# of significant correlations): Extr 24, Cons 16, Open 12, Agr 10, Emot 3
selected effects (p<0.05), involving Extr, Cons, and Agr:
  (+) speaking time (talkative), visual activity (dynamic), dominance ratio
  (-) speaking turns (fluency)
  also: looking time (persistent gaze), visual activity (quiet), vertical framing (upper body)
• Extr: highest cue utilization
• Agr: low cue utilization
• effects backed up by the social psychology literature

big-five impressions & facial expression cues
cue utilization (# of significant correlations): Extr 75, Open 37, Agr 27, Cons 17, Emot 9
selected effects (p<0.05):
  Extr: (+) joy, smile; (-) anger, disgust
  Open: (+) surprise; (-) anger
  Agr: (+) joy, smile; (-) anger
• Extr: highest cue utilization
• facial expressions favor impressions of Open and Agr
• joy, anger, and smile show consistent effects

big-five impressions & verbal cues
cue utilization (# of significant correlations): Cons 24, Agr 15, Extr 12, Emot 12, Open 10
• higher cue utilization for Cons and Agr
selected effects (p<0.05):
  Cons: (+) work, achieve (long words); (-) negate, swear
  Agr: (+) posemo, i, friend; (-) anger, negemo
  Emot: (+) leisure, work; (-) negemo, affect
  Extr: (+) you, social, sexual; (-) tentative, exclusive
• many effects backed up by previous research

personality inference
• 5 regression tasks, one per Big-Five score
• machine learning: SVM and random forest (RF)
• performance measured with $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$: one minus the ratio between the squared sum of prediction errors and the squared sum of prediction errors of the mean predictor

prediction results (random forest), R^2 per cue type
trait   AV    FE
Extr    .39   .24
Cons    .10   .06
Open    .10   .11
Agr     .06   .08
Emot    .06   .07

prediction results (random forest), R^2 per transcript type
trait   manual transcripts   automatic transcripts
Agr     .31                  .10
Cons    .18                  .08
Emot    .17                  .05
Extr    .13                  .02
Open    .04                  .02
AV = audiovisual activity cues; FE = facial expression cues

feature fusion for personality impression inference
• Extraversion: AV + FE, R^2 = .48
• Openness to Experience: AV + FE, R^2 = .17
• Agreeableness: FE + VB (verbal cues), R^2 = .39

mood classification
• classifiers: SVM (Gaussian kernel), random forest
• two classes per mood (split at the median value)
• 10-fold cross-validation

[figure: performance for happy mood (random forest) per feature combination; A: audio, V: visual, F: facial expressions, L: verbal; baseline and significantly-better-than-baseline markers]
[figure: relevant word categories for happy and angry]

best results per mood
mood           accuracy (%)   classifier   features
Overall        69.0           RF           Verbal+Facial
Happy          64.0           RF           Verbal+Facial
Excited        68.3           RF           Audio+Visual+Facial
Angry          65.1           SVM          Verbal+Facial
Disappointed   66.0           RF           Verbal+Visual
Sad            64.9           SVM          All
Relaxed        66.0           SVM          All
Bored          64.1           RF           Audio+Visual+Facial
(all significantly better than random)

conclusions
• online video is different from text & images, and it is rising
• YouTube is multifaceted and complex to understand
• conversational video:
  multimedia behavioral analysis
  mood & big-five impressions
  personal & cultural differences
  many applications

questions?
gatica@idiap.ch
daniel.gatica-perez@epfl.ch
@dgaticaperez
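Sketch: impression reliability. The reliability slide reports ICC values for scores aggregated across the 5 annotators per vlog, but it does not say which Shrout & Fleiss form was used. The sketch below assumes the one-way random-effects, average-measures form ICC(1,k), a common choice when different crowd workers rate different targets. The helper name `icc_1k` and the input layout (one row per vlog, one column per annotator) are illustrative assumptions.

```python
from statistics import fmean
from typing import Sequence


def icc_1k(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(1,k) (Shrout & Fleiss 1979): reliability of the mean of k ratings.

    Hypothetical helper. ratings[i][j] is the score the j-th annotator gave to
    target i (e.g. one vlog); every target needs the same number k of ratings.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 targets, each rated by the same k >= 2 annotators")

    grand_mean = fmean([x for row in ratings for x in row])
    row_means = [fmean(row) for row in ratings]

    # Between-targets mean square: how much the targets' average scores differ.
    bms = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: disagreement among the k ratings of each target.
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    if bms == 0:
        raise ValueError("all targets have the same mean score; ICC is undefined")
    return (bms - wms) / bms
```

For example, `icc_1k([[1, 2], [3, 4], [5, 6]])` returns 0.9375 (BMS = 8, WMS = 0.5): the three targets differ far more than their two annotators disagree, so the averaged scores are reliable.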
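Sketch: the R^2 measure. The personality-inference slide defines R^2 through two squared sums: the prediction errors of the model and those of a predictor that always outputs the mean. A minimal sketch of that definition follows; the helper name `r2_vs_mean` is hypothetical, and the baseline here is the mean of the evaluated scores themselves (the original evaluation may instead have used, e.g., the training-fold mean).

```python
from typing import Sequence


def r2_vs_mean(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SSE(model) / SSE(mean predictor). Hypothetical helper."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with >= 2 scores")
    mean = sum(y_true) / len(y_true)
    # Squared sum of prediction errors of the model.
    sse_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Squared sum of prediction errors of a predictor that always outputs the mean.
    sse_mean = sum((y - mean) ** 2 for y in y_true)
    if sse_mean == 0:
        raise ValueError("all true scores are equal; R^2 is undefined")
    return 1 - sse_model / sse_mean
```

Perfect predictions give 1.0, always predicting the mean gives 0.0, and a model worse than the mean predictor gets a negative value. Read this way, an R^2 of .39 for Extr with AV cues means the model removes about 39% of the mean predictor's squared error.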
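Sketch: the mood-classification protocol. The mood-classification slide turns each mood score into two classes split at the median and evaluates SVM and random-forest classifiers with 10-fold cross-validation. The sketch below shows only that evaluation protocol in plain Python. The helper names, the tie rule (scores equal to the median go to the low class), the random fold assignment, and the majority-class stand-in for the actual classifiers are all assumptions for illustration.

```python
import random
from statistics import median
from typing import Callable, List, Sequence

# A classifier is any function (train_x, train_y, test_x) -> predicted labels.
Classifier = Callable[[Sequence, Sequence[int], Sequence], List[int]]


def median_split(scores: Sequence[float]) -> List[int]:
    """Label 1 (high) if a score is above the median, else 0 (low)."""
    m = median(scores)
    return [1 if s > m else 0 for s in scores]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(features: Sequence, labels: Sequence[int],
                       fit_predict: Classifier, k: int = 10, seed: int = 0) -> float:
    """Accuracy over all samples, each predicted once while held out."""
    correct = 0
    for test in kfold_indices(len(labels), k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        preds = fit_predict([features[i] for i in train],
                            [labels[i] for i in train],
                            [features[i] for i in test])
        correct += sum(int(p == labels[i]) for p, i in zip(preds, test))
    return correct / len(labels)


def majority_baseline(train_x: Sequence, train_y: Sequence[int],
                      test_x: Sequence) -> List[int]:
    """Stand-in classifier: always predict the majority class of the training fold."""
    majority = 1 if 2 * sum(train_y) > len(train_y) else 0
    return [majority] * len(test_x)
```

Usage would look like `cross_val_accuracy(cue_vectors, median_split(happy_scores), majority_baseline)`, where `cue_vectors` and `happy_scores` are hypothetical per-vlog data; swapping `majority_baseline` for a function that trains an SVM or random forest on each training fold gives the kind of setup the slide describes. Because a median split yields roughly balanced classes, a majority-class predictor lands near 50% accuracy, which puts the reported 64-69% in context.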