Wisdom in the Social Crowd: an Analysis of Quora

Gang Wang, Konark Gill, Manish Mohanlal,
Haitao Zheng and Ben Y. Zhao
University of California at Santa Barbara
gangw@cs.ucsb.edu
Asking Questions on the Internet
• Systems to answer user questions on the Internet
• Google - general information
• Wikipedia - factual knowledge
Q: What is the population of Rio?
• But we often have questions that require…
• Domain-specific knowledge
• First-hand life experiences
Q: What is the most interesting souvenir you can buy in Rio?
Online Q&A Services Today
• Question and Answer (Q&A) sites
• Web services where people ask and answer questions
• A crowd-sourced way to search information
• Large online knowledge repositories
• 300+ Million questions, 1+ Billion answers
• 3.5+ Million questions, 6.8+ Million answers (StackOverflow)
• As the Q&A systems grow to massive scales…
• More difficult for users to locate useful answers or interesting questions
• Low-value questions (spam) overwhelm the system
Quora - Social Q&A
• “Hottest” (most successful) Q&A site today
• First social network based Q&A
• 350% traffic growth in 2012
• Many answers are returned as top answers to Google queries
• Quora’s advantages
• High-quality questions and answers
• Participation by true domain experts: politicians, actors, startup founders, etc.
How do Quora’s internal structures contribute to its success?
A Measurement Study of Quora
• Limited understanding of Quora
• Size of site (questions, users), growth rate
• Mechanisms for content discovery, quality control
• Questions we asked in our study
• How does Quora grow over time?
• What’s the impact of social graph on Q&A activities?
• How does Quora direct users to the valuable content?
Match experts w/ questions, and seekers w/ answers
Outline
• Introduction
• Characterizing Quora
• Analyzing Graph Structures
• Implications
A Typical Question Page
• Page elements: Topics, Related Questions, Question, Votes, Answer
Graphs, Graphs, More Graphs
• User-topic graph: user following topics
• Social graph: user following other users
• Related question graph: connecting related questions
Data Collection
Website        Data Since  Total Questions  Total Topics  Total Users  Total Answers  Question Coverage
Quora          Oct. 2009   437K             56K           264K         979K           58%
StackOverflow  Jun. 2008   3.45M            22K           1.3M         6.86M          100%
• Crawling Quora
• Snowball-crawled related question graph (August 2012)
• Obtained the largest connected component
• Crawled at a slow rate to keep the impact on the site minor
• StackOverflow dataset used for comparison
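The snowball crawl above can be pictured as a breadth-first traversal of the related question graph. The following is a minimal sketch only: `fetch_related` is a hypothetical callable standing in for whatever page fetcher a crawler would use, and the in-memory `toy_site` dictionary replaces real pages. A real crawl would also pause between requests to stay slow.

```python
from collections import deque

def snowball_crawl(seed, fetch_related):
    """Breadth-first snowball crawl from one seed question.

    fetch_related is a hypothetical callable: question id -> list of related ids.
    Returns the set of reached questions and the set of directed edges seen.
    """
    seen = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        question = queue.popleft()
        for related in fetch_related(question):
            edges.add((question, related))
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return seen, edges

# Toy stand-in for the site: q9 links to q1, but nothing links to q9.
toy_site = {"q1": ["q2", "q3"], "q2": ["q1"], "q3": ["q4"], "q4": [], "q9": ["q1"]}
questions, edges = snowball_crawl("q1", lambda q: toy_site.get(q, []))
print(sorted(questions))
```

Questions that are never listed as related to a reached question (here `q9`) are missed, which is one reason a crawl of this kind covers only part of a site.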
Growth Over Time
[Figure: Number of Questions (log scale) over time, 2008/7 to 2012/7, for Stack Overflow, Quora (Total), and Quora (Crawled)]
• Total # of Quora questions estimated by Qid: 761K
• Crawled Quora questions: 437K (58%)
• Similar growth trend with StackOverflow
Outline
• Introduction
• Characterizing Quora
• Analyzing Graph Structures
• Social Graph
• Related Question Graph
Details on User-Topic Graph in the paper!
• Implications
How do social ties impact Q&A activities?
Social Graph Structure
• Users can follow other users to build social connections
• Asymmetric social graph
• Users receive items in their newsfeed from people they follow
[Figure: CCDF (%) of Social Degree for Followers and Followees, log-log axes]
• Social degree has power-law distribution
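A degree CCDF like the one plotted here can be computed directly from a list of follow edges. This is a small sketch under assumptions: the `(follower, followee)` edge format and the toy edges are illustrative, and the CCDF is taken as the percentage of users whose degree is at least d.

```python
from collections import Counter

def degree_ccdf(degrees):
    """Return [(d, percent of users with degree >= d)] sorted by d."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    ccdf = []
    for d in sorted(counts):
        ccdf.append((d, 100.0 * remaining / n))
        remaining -= counts[d]
    return ccdf

# Toy asymmetric follow edges: (follower, followee).
follow_edges = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
followers = Counter(followee for _, followee in follow_edges)
users = {user for edge in follow_edges for user in edge}
follower_degrees = [followers.get(user, 0) for user in users]
print(degree_ccdf(follower_degrees))
```

On log-log axes, a power-law distribution shows up as a roughly straight line in this CCDF.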
Is the Social Graph Meaningful?
[Figure: Followers Per User (Average) vs. User Answers, and vs. User Received Votes, log-log axes]
• More or high-quality answers == more followers
• Social structure could indicate content quality
• Correlation between user’s # of followers and
• # of total answers the user wrote
• # of votes the user ever received
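One way to look at the relationship described above is to bucket users by how many answers they wrote and average the follower counts in each bucket. The sketch below assumes order-of-magnitude buckets (1, 10, 100, ...) and uses made-up toy numbers; neither is taken from the study.

```python
from collections import defaultdict

def avg_followers_by_bucket(users):
    """users: iterable of (answer_count, follower_count) with integer counts.

    Buckets users by the order of magnitude of their answer count
    and returns {bucket: average followers}.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [user count, follower sum]
    for answers, followers in users:
        if answers < 1:
            continue  # users with no answers have no bucket on a log axis
        bucket = 10 ** (len(str(answers)) - 1)
        totals[bucket][0] += 1
        totals[bucket][1] += followers
    return {b: s / n for b, (n, s) in sorted(totals.items())}

# Toy data only: (answers written, followers).
toy_users = [(1, 2), (3, 4), (12, 30), (50, 50), (400, 900), (0, 7)]
print(avg_followers_by_bucket(toy_users))
```

The same function works for votes received by passing `(vote_count, follower_count)` pairs instead.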
Using Social Ties to Attract Answers
• Would social ties help to attract answers?
• Defining “super-users”
• Top 5% users sorted by # of followers
[Figure: CDF (%) of Questions vs. Answers Per Question, and vs. Answers Per Question (Normalized by #Followers), for Normal Users and Super Users]
• Social ties have no effect on attracting answers
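The super-user definition can be sketched as a simple ranking step, followed by a per-group comparison of answers per question. Function names, the data layout, and all numbers below are illustrative assumptions, not the study's code or data.

```python
def split_super_users(follower_counts, top_fraction=0.05):
    """Split users into (super, normal) by taking the top fraction by followers."""
    ranked = sorted(follower_counts, key=follower_counts.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:k]), set(ranked[k:])

def mean_answers_per_question(questions, group):
    """questions: list of (asker, answer_count); averages over askers in group."""
    counts = [answers for asker, answers in questions if asker in group]
    return sum(counts) / len(counts) if counts else 0.0

# Toy data: 20 users, where user i has 10 * i followers.
follower_counts = {f"u{i}": 10 * i for i in range(20)}
super_users, normal_users = split_super_users(follower_counts)

toy_questions = [("u19", 2), ("u19", 4), ("u3", 3), ("u5", 5)]
print(mean_answers_per_question(toy_questions, super_users))
print(mean_answers_per_question(toy_questions, normal_users))
```

Comparing the full distributions for the two groups, rather than only the means, is what the CDF plots on this slide do.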
How does Quora direct users to
“interesting” questions?
Related Question Graph
• Related question feature
• Allows users to browse a series of related questions
• Related question graph
• Questions as nodes, edges indicate “related” relationships
• Graph properties
• Power-law structure
• A small set of “core” questions inside each topic
[Figure: CCDF (%) of Question Degree, log-log axes]
Impact of Question Degree
[Figure: Average Number of Views and Average Number of Answers vs. Questions Bucketized By Degree]
• Strong correlation between question degree and user’s attention on the question
• Question graph drives users to “core” questions
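The bucketized view above can be reproduced in outline by grouping questions by degree and averaging views and answers per group. The fixed bucket width and the toy records are assumptions for illustration; the slide does not state how the buckets were formed.

```python
from collections import defaultdict

def averages_by_degree_bucket(questions, width=5):
    """questions: iterable of (degree, views, answers).

    Groups questions into fixed-width degree buckets (an assumed scheme)
    and returns {bucket_start: (average views, average answers)}.
    """
    acc = defaultdict(lambda: [0, 0, 0])  # bucket -> [count, view sum, answer sum]
    for degree, views, answers in questions:
        bucket = (degree // width) * width
        acc[bucket][0] += 1
        acc[bucket][1] += views
        acc[bucket][2] += answers
    return {b: (v / n, a / n) for b, (n, v, a) in sorted(acc.items())}

# Toy records only: (degree in related question graph, views, answers).
toy_questions = [(1, 100, 1), (3, 300, 3), (7, 1000, 4), (12, 2000, 6)]
print(averages_by_degree_bucket(toy_questions))
```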
User Attention on Similar Questions
• Similar questions in Quora
• Questions around very close (same) subjects
• Redundant questions asked by different users
• Do users pay equal attention to similar questions?
• Locating similar questions by partitioning question graph
• METIS produces clusters, each containing similar questions
Equal Attention on Similar Questions?
• Is user attention evenly distributed in each cluster?
• Gini coefficient (G): evaluate the uniformity of distribution
• G=0: perfect equality
• G~1: extremely skewed distribution
[Figure: % of Total Views vs. % of Questions, example curves for G=0, G=0.4, and G=0.9]
[Figure: CDF of Clusters (%) vs. Gini Coefficient (per Cluster), for Answers and Views]
• User attention is highly skewed in each cluster
• Excellent! Users are not distracted by similar questions
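For reference, a per-cluster Gini coefficient can be computed from the view counts of the questions in that cluster. This sketch uses the standard sorted-values formula; the view counts are toy numbers, and the function name is illustrative.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n over sorted x
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Toy view counts for the questions of two clusters.
even_cluster = [5, 5, 5, 5]      # every question gets the same attention
skewed_cluster = [0, 0, 0, 10]   # one question gets all the views
print(gini(even_cluster), gini(skewed_cluster))
```

With n questions the largest possible value is (n - 1) / n, so a small cluster cannot reach exactly 1 even when a single question receives all the views.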
Implication and Conclusion
• Implication for crowdsourcing content sites
• Q&A sites
• User attention is “skewed” toward top questions
• Avoid distraction, encourage contribution
• Other sites such as Yelp, TripAdvisor
• Drive enough reviews to key venues
• Ensure reliable ratings
• The first large-scale measurement study on Quora
• Graph structures contribute to effective content discovery
• Social graph indicates content quality
• Question graph focuses user attention
Thank you!
Questions?