Social Media Intelligence: Measuring Brand Sentiment from Online Conversations
David A. Schweidel, Goizueta Business School, Emory University
October 2012

What's Trending on Social Media?

Agenda
• Social Dynamics in Social Media Behavior
– Why do people post a product opinion? What influences their posting behavior? (Moe and Schweidel 2012)
– What is the impact of these social dynamics on product sales? (Moe and Trusov 2011)
• Social Media Intelligence (work in progress with W. Moe)
– What factors influence social media metrics?
– How can we adjust our metrics for different sources of data?

Why do people post?
• Opinion formation
– Pre/post purchase (Kuksov and Xie 2008)
– Customer satisfaction and word-of-mouth (Anderson and Sullivan 1993, Anderson 1998)
• Opinion expression
– Opinion dynamics (Godes and Silva 2009, Li and Hitt 2008, Schlosser 2005)
– Opinion polls and voter turnout (see, for example, McAllister and Studlar 1991)

LATENT EXPERIENCE MODEL
[Diagram: Pre-Purchase Evaluation E[u_ij] → Purchase Decision and Product Experience → Post-Purchase Evaluation V_ij = f(u_ij, E[u_ij])]
• Opinion formation versus opinion expression (Berinsky 2005)
[Diagram: an Incidence Decision (selection effect) and an Evaluation Decision (adjustment effect) jointly produce Posted Product Ratings]

INCIDENCE & EVALUATION MODELS
Selection effects: What influences participation?
• Extremely dissatisfied customers are more likely to engage in offline word-of-mouth (Anderson 1998)
• Online word-of-mouth is predominantly positive (e.g., Chevalier and Mayzlin 2006, Dellarocas and Narayan 2006)
• Participation is subject to the opinions of others
– Bandwagon effects (McAllister and Studlar 1991, Marsh 1984)
– Underdog effects (Gartner 1976, Straffin 1977)
– Effects of consensus (Epstein and Strom 1981, Dubois 1983, Jackson 1983, Delli Carpini 1984, Sudman 1986)

Adjustment effects: What influences posted ratings?
• Empirical evidence of opinion dynamics
– Online opinions decline as a product matures (Li and Hitt 2008)
– Online opinions decline with the ordinality of the rating (Godes and Silva 2009)
• Other behavioral explanations of opinion dynamics
– "Experts" differentiate themselves from the crowd by being more negative (Schlosser 2005)
– "Multiple audience" effects when opinion variance is high (Fleming et al. 1990)

Modeling Overview
• Online product ratings for a bath, fragrance and home retailer over 6 months in 2007 (sample of 200 products with 3,681 ratings)
• Two-component model structure (Ying, Feinberg and Wedel 2006):
– Incidence (probit)
– Evaluation (ordered probit)
• Product utility links the incidence and evaluation models (non-linear)
• Bandwagon versus differentiation effects
– Covariates of the ratings environment
– Separate but correlated effects in each model component
• Product and individual heterogeneity

Role of post-purchase evaluation (V_ij) in rating incidence
[Chart: utility effect on posting incidence (y-axis, 0 to -0.7) plotted against individual-product utility value (x-axis, 1.5 to 5)]

Classifying Opinion Contributors
• Groups based on frequency of posts (β0)

Empirical Trends
• Posted ratings
– Average rating decreases over time
– Variance increases over time
• Poster composition
– Community-builders are overrepresented in the posting population
– As the forum evolves, participation from community-builders increases while that of LI and BW posters decreases
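The two-component structure from the Modeling Overview (a probit incidence decision and an ordered-probit evaluation, linked through a shared latent product utility) can be sketched with a small simulation. All distributions, coefficients, and cutpoints below are hypothetical illustrations, not the paper's estimates; the point is that when extreme opinions are more likely to be posted, posted ratings are more dispersed than the opinions of the full customer base.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

def probit(x):
    """Standard normal CDF, used as the incidence link."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

# Latent individual-product utility u_ij (hypothetical distribution).
u = rng.normal(loc=3.8, scale=1.0, size=n)

# Incidence component (probit): assume extreme post-purchase evaluations
# are more likely to be posted (hypothetical coefficients).
post_prob = probit(-1.0 + 0.6 * np.abs(u - 3.8))
posts = rng.random(n) < post_prob

# Evaluation component (ordered probit): latent utility plus noise is
# mapped to a 1-5 star rating through fixed cutpoints (hypothetical).
cutpoints = np.array([1.5, 2.5, 3.5, 4.5])
stars = 1 + np.searchsorted(cutpoints, u + rng.normal(scale=0.3, size=n))

print("share of customers who post:", round(posts.mean(), 3))
print("rating std, posters vs. everyone:",
      round(stars[posts].std(), 2), round(stars.std(), 2))
```

Because the selection effect here is driven by opinion extremity, the standard deviation of posted ratings exceeds that of all customers' ratings, echoing the point that the ratings environment need not reflect the customer base.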
[Chart: average rating (left axis, ~3.0-5.0) declines and variance (right axis, ~0-3.5) rises over the sequence of ratings 1-49]
[Chart: proportion of posters (0-0.7) over the sequence of ratings 1-49 — % activists rises while % low-involvement falls]

Effect of Opinion Variance
[Charts: average rating and variance over the sequence of ratings 1-49, for a median customer base versus a polarized customer base]
• The customer bases have the same mean but different variance in opinions
• With a polarized customer base, ratings exhibit:
– A lower average with a significant decreasing trend
– Greater variance
• Negative ratings do not necessarily signal a lower average opinion among customers

Conclusions: Individual-level Analysis
• Empirical findings
– Heterogeneity in posting incidence and evaluation
– Incidence and evaluation behavior are related and can result in a systematic evolution of the posting population
• General trends in the evolution of product forums:
– Dominated by "activists"
– Participation by activists tends to increase as the forum evolves, while participation by low-involvement individuals tends to decrease
• Implications:
– The ratings environment does not necessarily reflect the opinions of the entire customer base, or even the socially unbiased opinions of the posters
– Posting behavior is subject to venue effects…

Posting Decisions
[Diagram: Opinion / Brand Evaluation feeds three posting decisions — Do I post?, What do I post? (sentiment, product, attribute), Where do I post? (venue, format, domain)]

SOCIAL MEDIA METRICS
The value of social media as a research tool
• Does it matter where (i.e., blogs, microblogs, forums, ratings/reviews, etc.) we listen?
– Product/topic differences across venues?
– Systematic sentiment differences across venues?
– Venue-specific trends and dynamics?
• How do social media metrics compare to other available measures?

Social media monitoring
IN PRACTICE
• Early warning system: Kraft removed trans fats from Oreos in 2003 after monitoring blogs
• Customer feedback: Land of Nod monitors reviews to help with product modifications and redesigns
• Measuring sentiment: social media listening platforms collect comments across venues
IN RESEARCH
• Twitter to predict stock prices (Bollen, Mao and Zeng 2011)
• Twitter to predict movie sales (Rui, Whinston and Winkler 2009)
• Discussion forums to predict TV ratings (Godes and Mayzlin 2004)
• Ratings and reviews to predict sales (Chevalier and Mayzlin 2006)

Does the source of data matter?
• Online venues (e.g., blogs, forums, social networks, micro-blogs) differ in:
– Extent of social interaction
– Amount of information
– Audience attracted
– Focal product/attribute
• Venue is a choice
– Consumers seek out brand communities (Muniz and O'Guinn 2001)
– Venue depends on posting motivation (Chen and Kirmani 2012)
• Social dynamics affect posting (Moe and Schweidel 2012, Moe and Trusov 2011)

Research Objective
• Assess online social media as a listening tool
• Disentangle the following factors that can systematically influence posted sentiment:
– Venue differences
– Product and attribute differences
– Within-venue trends and dynamics
• Examine differences across venue types
– Sentiment
– Product and attribute differences
– Implications for social media monitoring and metrics

Social Media Data
• Provided by Converseon (a leading online social media listening platform and agency)
• Sample of approximately 500 postings per month pertaining to a target brand
• Comments manually coded for:
– Sentiment (positive, neutral, negative)
– Venue and venue format
– Focal product/attribute
• Categories: (1) enterprise software, (2) telecommunications, (3) credit card
services, and (4) automobile manufacturing

Data for Enterprise Software Brand
• 140 products within the brand portfolio
• 59 brand attributes (e.g., compatibility, price, service, etc.)
• Social media data spanned a 15-month period
– June 2009 – August 2010
– 7,565 posted comments
– Across 800+ domains

Venue Format       Illustrative Website   Frequency   Direct Experience
Discussion Forum   forums.adobe.com       2728        93%
Micro-blog         twitter.com            2333        37%
Blog               wordpress.com          2274        23%
Social Network     linkedin.com           155         40%
Mainstream News    cnbc.com               36          3%
Social News        digg.com               19          47%
Wiki               adobe.wikia.com        10          50%
Video              vimeo.com              6           0%
Review Sites       epinions.com           4           25%

Modeling Social Media Sentiment
• Comments coded as "negative", "neutral", or "positive"
• Ordered probit regression:

Pr(y_i = r) = Pr(τ_{r-1,v(i)} ≤ U_i + ε_i < τ_{r,v(i)})
U_i = VS_i + π_{p(i),1} + α_{a(i),1}

where VS_i is venue-specific brand sentiment, π_{p(i),1} is a product effect, and α_{a(i),1} is an attribute effect.

What affects venue-specific brand sentiment?
• General brand impression (GBI)
• Domain and venue effects (including dynamics):

VS_i = GBI_{t(i)} + δ_{d(i)} + φ_{v(i)} + γ_{v(i),t(i)}

where δ_{d(i)} is a domain effect, φ_{v(i)} a venue-format effect, and γ_{v(i),t(i)} captures venue-specific dynamics.

Venue Attractiveness
• Model venue format as a choice made by the poster
• Multinomial logit model:

V_ij = g_j + l_j·w_i + k_j·GBI_{t(i)}
w_i = π_{p(i),2} + α_{a(i),2}

where l_j is the effect of content on venue choice, and w_i combines a product effect π_{p(i),2} and an attribute effect α_{a(i),2}.

Model Comparisons
• Baseline model
– Independent sentiment and venue decisions
– Controlling for product, topic and domain effects

Model                        DIC       Sentiment hit rate   Venue choice hit rate
Baseline                     32588.0   0.454                0.316
+ Sentiment link             30040.7   0.451                0.424
+ Venue-specific sentiment   29677.2   0.465                0.424
+ Venue-specific dynamics    29529.1   0.473                0.424

Sentiment metrics vary depending on what you are measuring.
[Chart: GBI and observed average sentiment by month, 1-15]
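The ordered probit specification above maps the latent index U_i = VS_i + π_{p(i),1} + α_{a(i),1} to sentiment-category probabilities through venue-specific cutoffs. A minimal sketch, with all numeric values hypothetical:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sentiment_probs(U, cutoffs):
    """Ordered probit: Pr(y = r) = Phi(tau_r - U) - Phi(tau_{r-1} - U),
    with the outer cutoffs fixed at -inf and +inf."""
    taus = [-math.inf, *cutoffs, math.inf]
    return [norm_cdf(taus[r + 1] - U) - norm_cdf(taus[r] - U)
            for r in range(len(taus) - 1)]

# Hypothetical latent sentiment index: venue-specific brand sentiment
# plus product and attribute effects.
VS_i, product_eff, attribute_eff = 0.4, 0.1, -0.3
U_i = VS_i + product_eff + attribute_eff

# Hypothetical venue-specific cutoffs separating negative/neutral/positive.
p_neg, p_neu, p_pos = sentiment_probs(U_i, cutoffs=(-0.5, 0.5))
print(round(p_neg, 3), round(p_neu, 3), round(p_pos, 3))
```

Because the cutoffs are venue-specific, the same latent brand sentiment can produce different observed sentiment mixes in different venue formats, which is the motivation for adjusting raw metrics by source.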
Correlations (monthly sentiment series):

                      GBI       Obs. Avg.   Blog      Forum    Microblogs
GBI                   1
Obs. Avg. Sentiment   -0.0346   1
Blog                  0.678     0.496       1
Forum                 0.00409   0.263       -0.0870   1
Microblogs            0.751     0.503       0.8196    -0.139   1

Sentiment Differences across Venues (φ_v(i))
[Chart: venue-format sentiment effects, roughly 0 to 3.5, by venue]
Divergent Trends across Venues (γ_v(i),t(i))
[Chart: venue-specific sentiment dynamics for blogs, forums and micro-blogs over months 1-15, ranging from about -0.5 to 0.4]
* Blogs, forums and microblogs are the 3 most common venues
• Sentiment varies across venues
• Venue-specific sentiment is subject to venue-specific dynamics

Attribute Effects on Sentiment (α_a(i),1)
[Chart: attribute-specific sentiment effects, about -0.6 to 0.4, for Application Compatibility, Brand Reputation, Documentation/Support, Ease of Installation, Hardware Compatibility, Mobile Compatibility, Price, Quality of Product, Security, and Size of Company]
• Provides attribute-specific sentiment metrics
• Empirical measures are problematic due to data sparsity
• Correlation between model-based effects and observed attribute-sentiment metrics = -.276

Venue Attractiveness Results

Venue                    Intercept (g)   Product/Attrib. Effects (l)   Effect of GBI (k)
Blog                     5.012           1.000                         -1.610
Forum                    5.655           -2.713                        5.002
Mainstream Media         0.246           2.474                         -2.632
Microblogs               5.378           1.005                         2.978
Photoshare               -4.509          -0.469                        0.321
Review site              -1.229          -2.427                        0.767
Social network           2.349           1.771                         1.086
Social news aggregator   0.607           -0.058                        1.848
Video share              -1.024          0.784                         -1.105
Wiki                     --              --                            --

• Posters with positive sentiments toward the brand are attracted to forums and microblogs.
• Forums attract comments that focus on different products and attributes than microblogs.
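The venue-attractiveness coefficients in the results table above can be turned into choice probabilities with the multinomial logit V_ij = g_j + l_j·w_i + k_j·GBI_{t(i)}. The sketch below treats Wiki as the zero baseline (per the table's dashes) and picks an arbitrary, hypothetical content index w_i; it illustrates the slide's takeaway that posters with more positive brand impressions gravitate toward forums and microblogs (positive k).

```python
import math

# Venue attractiveness coefficients from the results table:
# (intercept g_j, content effect l_j, effect of GBI k_j); Wiki is the baseline.
venues = {
    "Blog":                   ( 5.012,  1.000, -1.610),
    "Forum":                  ( 5.655, -2.713,  5.002),
    "Mainstream Media":       ( 0.246,  2.474, -2.632),
    "Microblogs":             ( 5.378,  1.005,  2.978),
    "Photoshare":             (-4.509, -0.469,  0.321),
    "Review site":            (-1.229, -2.427,  0.767),
    "Social network":         ( 2.349,  1.771,  1.086),
    "Social news aggregator": ( 0.607, -0.058,  1.848),
    "Video share":            (-1.024,  0.784, -1.105),
    "Wiki":                   ( 0.0,    0.0,    0.0),
}

def venue_choice_probs(w_i, gbi):
    """Multinomial logit: Pr(j) proportional to exp(g_j + l_j*w_i + k_j*GBI)."""
    utils = {j: g + l * w_i + k * gbi for j, (g, l, k) in venues.items()}
    m = max(utils.values())                      # stabilize the softmax
    exps = {j: math.exp(v - m) for j, v in utils.items()}
    z = sum(exps.values())
    return {j: e / z for j, e in exps.items()}

# Hypothetical content index w_i = 0; vary the general brand impression.
low = venue_choice_probs(0.0, gbi=-0.2)
high = venue_choice_probs(0.0, gbi=+0.2)
# Forums and microblogs (positive k) gain share as GBI rises.
print(high["Forum"] > low["Forum"], high["Microblogs"] > low["Microblogs"])
```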
Attribute effects on venue choice (α_a(i),2)
[Chart: attribute effects on venue choice, roughly -0.6 to 0.8, for products with >5 mentions only. Top 10: Protocol Compatibility, Application Compat., Hardware Compatibility, Graphics support, Resource optimization, Ease of…, Customization, I/O Performance, Technical Issue. Bottom 10: Server Virtualization, Infrastructure Cost, Private/Internal Cloud, Future of Virtualization, Market Share, Microsoft Partnership, Cloud Computing, Marketing, Marketing/Campaign, Corporate Partnership, Industry News]

Predictive Value of GBI
• Offline brand tracking survey
– Satisfaction surveys conducted from Nov 2009 to Aug 2010 in waves (overlapping with the last 10 months of social media data)
– Approximately 100 surveys conducted per month
– 7 questions re: overall sentiment toward the brand
• Company stock price
– Weekly and monthly closing prices for the firm
– Weekly and monthly closing S&P
– June 2009 to September 2010 (extra month for the lag)

[Charts: survey sentiment (about 8.75-9.1) plotted against GBI (about -0.3 to 0.4), for GBI at t and GBI at t-1, over the survey months]

GBI vs.
Offline Survey
• Potential for GBI as a lead indicator
• Correlation with the survey:
– GBI(t) = .375 [.277, .469]*
– GBI(t-1) = .875 [.824, .919]*
– Avg. sentiment = -.0346
– Blogs = .678
– Forums = .00410
– Microblogs = .751

GBI and Stock Price (DV = monthly close)

Posterior Means
           Coeff     StdErr   p-val
Constant   -69.045   34.044   0.070
S&P*       0.104     0.031    0.008
GBI(t)     -16.695   10.324   0.137
GBI(t-1)   30.693    10.375   0.014
Adj R-sq   .475

Iteration level
           Median Coeff   % p-val < .05
Constant   -67.096        0.138
S&P*       0.102          1
GBI(t)     -15.872        0.012
GBI(t-1)   29.921         0.9892
Adj R-sq   .4635

* Closing price in month

Observed Social Media Metrics (DV = monthly close; cells are Coeff (StdErr) p-val)

           Average                  Blogs                    Forums              Microblogs
Constant   -44.550 (30.776) 0.178   -93.215 (31.675) 0.015   222.605 (52.477) —  —
S&P**      0.073 (0.026) 0.018      0.071 (0.021) 0.007      -0.014 (0.021) —    —
SM(t)      34.995 (17.181) 0.069    6.059 (8.055) 0.469      -42.767 (10.838) —  —
SM(t-1)    5.078 (18.008) 0.784     18.267 (8.289) 0.052     -39.252 (10.840) —  —
Adj R-sq   0.404                    0.547                    0.753               0.390

** Closing price in month

Conclusions
• Social media behavior varies across venue formats
– Need to account for the source of SM data
• Potential to use social media as market research
– Adjusted measure (GBI) can serve as a lead indicator
• Implications for academic research that uses social media measures, and for practitioners who monitor social media sentiment
• Next steps
– Additional data sets
– Simulations of different GBI scenarios and the resulting metrics
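The stock-price regressions above (monthly close on the S&P, GBI(t), and GBI(t-1)) can be sketched as follows. The series here are simulated stand-ins for the proprietary data, with a one-month GBI lead effect built in by construction; only the regression setup mirrors the slides, not the reported estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 16  # June 2009 - September 2010: one extra month so GBI(t-1) exists

# Simulated stand-ins for the proprietary monthly series.
sp500 = 1000.0 + np.cumsum(rng.normal(0.0, 20.0, T))
gbi = rng.normal(0.0, 0.15, T)
# Monthly close responds to *last* month's GBI (the hypothesized lead effect).
gbi_lagged = np.concatenate(([0.0], gbi[:-1]))
close = -60.0 + 0.10 * sp500 + 30.0 * gbi_lagged + rng.normal(0.0, 1.0, T)

# Regress close(t) on S&P(t), GBI(t) and GBI(t-1), dropping the first month.
y = close[1:]
X = np.column_stack([np.ones(T - 1), sp500[1:], gbi[1:], gbi[:-1]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["const", "sp", "gbi_t", "gbi_lag"], beta.round(2))))
```

As in the slides, the lagged-GBI coefficient carries the signal while contemporaneous GBI does not, which is what makes the adjusted measure usable as a lead indicator.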