Wisdom In The Social Crowd: An Analysis Of Quora
Gang Wang, Konark Gill, Manish Mohanlal, Haitao Zheng and Ben Y. Zhao
University of California at Santa Barbara
gangw@cs.ucsb.edu

Asking Questions on the Internet
• Systems to answer user questions on the Internet
  • Google - general information
  • Wikipedia - factual knowledge
  Q: What is the population of Rio?
• But we often have questions that require…
  • Domain-specific knowledge
  • First-hand life experiences
  Q: What is the most interesting souvenir you can buy in Rio?

Online Q&A Services Today
• Question and Answer (Q&A) sites
  • Web services where people ask and answer questions
  • A crowd-sourced way to search for information
• Large online knowledge repositories
  • 300+ million questions, 1+ billion answers
  • 3.5+ million questions, 6.8+ million answers
• As the Q&A systems grow to massive scales…
  • More difficult for users to locate useful answers or interesting questions
  • Low-value questions (spam) overwhelm the system

Quora - Social Q&A
• “Hottest” (most successful) Q&A site today
• First social network based Q&A
• 350% traffic growth in 2012
• Many answers are returned as top answers to Google queries
• Quora’s advantages
  • High-quality questions and answers
  • Participation of true domain experts: politicians, actors, startup founders, etc.

How do Quora’s internal structures contribute to its success?

A Measurement Study of Quora
• Limited understanding of Quora
  • Size of site (questions, users), growth rate
  • Mechanisms for content discovery, quality control
• Questions we asked in our study
  • How does Quora grow over time?
  • What’s the impact of the social graph on Q&A activities?
  • How does Quora direct users to the valuable content?
Match experts w/ questions, and seekers w/ answers

Outline
• Introduction
• Characterizing Quora
• Analyzing Graph Structures
• Implications

A Typical Question Page
[Screenshot of a question page with annotated regions: Topics, Related Questions, Question, Votes, Answer]

Graphs, Graphs, More Graphs
• User-topic graph: user following topics
• Social graph: user following other users
• Related question graph: connecting related questions
[Diagram: topics, questions (Q) and answers (A) linked by the three graphs]

Data Collection
Website        Data Since  Total Questions  Total Topics  Total Users  Total Answers  Question Coverage
Quora          Oct. 2009   437K             56K           264K         979K           58%
StackOverflow  Jun. 2008   3.45M            22K           1.3M         6.86M          100%
• Crawling Quora
  • Snowball-crawled related question graph (August 2012)
  • Obtained the largest connected component
  • Slow speed, minor impact to the site
• Using the dataset of StackOverflow as a comparison

Growth Over Time
[Figure: number of questions (log scale) over time, 2008/7 to 2012/7, for StackOverflow, Quora (Total) and Quora (Crawled)]
• Total # of questions estimated by Qid: 761K
• Crawled: 437K (58%)
• Similar growth trend with StackOverflow

Outline
• Introduction
• Characterizing Quora
• Analyzing Graph Structures
  • Social Graph
  • Related Question Graph
  Details on User-Topic Graph in the paper!
• Implications

How do social ties impact Q&A activities?

Social Graph Structure
• Users can follow other users to build social connections
  • Asymmetric social graph
  • Users receive items in their newsfeed from people they follow
[Figure: CCDF (%) of social degree for followers and followees, log-log axes]
• Social degree has power-law distribution

Is the Social Graph Meaningful?
• Correlation between user’s # of followers and
  • # of total answers the user wrote
  • # of votes the user ever received
[Figure: followers per user (average) versus user answers (left panel) and user received votes (right panel), log-log axes]
• More or higher-quality answers == more followers
• Social structure could indicate content quality

Using Social Ties to Attract Answers
• Would social ties help to attract answers?
• Defining “super-users”
  • Top 5% users sorted by # of followers
[Figure: CDF (%) of questions versus answers per question, and versus answers per question normalized by # followers, for normal users and super users]
• Social ties have no effect on attracting answers

How does Quora direct users to “interesting” questions?

Related Question Graph
• Related question feature
  • Allows users to browse a series of related questions
• Related question graph
  • Questions as nodes, edges indicate “related” relationships
• Graph properties
  • Power-law structure
  • A small set of “core” questions inside each topic
[Figure: CCDF (%) of question degree, log-log axes]

Impact of Question Degree
[Figure: average number of views and average number of answers, questions bucketized by degree]
• Strong correlation between question degree and user’s attention on the question
• Question graph drives users to “core” questions

User Attention on Similar Questions
• Similar questions in Quora
  • Questions around very close (same) subjects
  • Redundant questions asked by different users
• Do users pay equal attention to similar questions?
• Locating similar questions by partitioning question graph
  • METIS produces clusters, each containing similar questions
[Diagram: question graph partitioned into clusters of similar questions]

Equal Attention on Similar Questions?
• Is user attention evenly distributed in each cluster?
• Gini coefficient (G): evaluates the uniformity of a distribution
  • G=0: perfect equality
  • G~1: extremely skewed distribution
[Figure: example curves of % of total views versus % of questions for G=0, G=0.4 and G=0.9]
[Figure: CDF of clusters (%) versus Gini coefficient (per cluster), for views and answers]
• User attention is highly skewed in each cluster
• Excellent! Users are not distracted by similar questions

Implications and Conclusion
• Implications for crowdsourcing content sites
  • Q&A sites
    • User attention is “skewed” to top questions
    • Avoid distraction, encourage contribution
  • Other sites such as Yelp, TripAdvisor
    • Drive enough reviews to key venues
    • Ensure reliable rating
• The first large-scale measurement study on Quora
  • Graph structures contribute to effective content discovery
  • Social graph indicates content quality
  • Question graph focuses user attention

Thank you!
Questions?
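
Backup: computing the Gini coefficient
The per-cluster Gini coefficient used in the similar-question analysis can be sketched as below. This is a minimal sketch using the standard sorted-rank formula for G; the slides do not state which exact estimator the authors used, so treat the formula choice as an assumption. The function name `gini` and the view counts in `cluster_views` are hypothetical and only illustrate the two extremes (G=0 and a highly skewed cluster).

```python
def gini(values):
    """Gini coefficient of non-negative values (e.g. views per question).

    Uses the rank formula over ascending values x_1..x_n:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    Returns 0.0 for an empty or all-zero input (assumption: treat as equal).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical view counts for questions in two clusters (not from the dataset).
cluster_views = {
    "even_cluster": [10, 10, 10, 10],   # every question gets the same attention
    "skewed_cluster": [0, 0, 0, 100],   # one question gets all the attention
}

gini_per_cluster = {name: gini(v) for name, v in cluster_views.items()}
# even_cluster   -> 0.0
# skewed_cluster -> 0.75 (the maximum for 4 items is (n - 1) / n)
```

With this estimator the maximum value for a cluster of n questions is (n - 1) / n, so G approaches 1 only for large, highly skewed clusters.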