Georgia Institute of Technology
College of Computing
Swathi Bhat
Ganesh Bhat
MY VIRTUAL WORLD
Questions in Social Networks
MOTIVATION AND IDEA
We propose question routing as an alternative to the question-and-answer approach used by Yahoo
Answers, Reddit, and Hunch, among several others. We wanted to explore this question-and-answer
approach within a social network. A social network service focuses on building online
communities of people who share interests and/or activities, or who are interested in exploring
the interests and activities of others.
Most social network services are web based and provide a variety of ways for users to interact,
such as email and instant messaging services. Their principal strength has been an ability to
gather tens of millions of unique users. Their main shortcoming, however, has been the inability
of several social networks to monetize their volumes of signed-up users.
Social networking has encouraged new ways to communicate and share information. Social
networking websites are used regularly by millions of people. The main types of social
networking services are those which contain category divisions (such as former school-year or
classmates), means to connect with friends (usually with self-description pages), and a
recommendation system linked to trust.
We seek to explore the idea of routing questions in this framework of a trusted network, where
friends ask questions among friends only. We also try several routing algorithms and compare
them in terms of the number of answers that each obtains in the social network.
In this simulation, we compare three approaches to routing questions through a social network,
evaluating each approach in terms of the extent to which these questions are answered by
experts. In general, we want to simulate question routing on social networks, the process by
which questions are answered or passed along to other members of the social network. By
evaluating these approaches, we seek to explore the increasingly popular use of question/answer
systems on social networks.
INTRODUCTION
The study of large-scale networks has emerged over the past several years as a theme that spans
many disciplines, ranging from computing and information science to the social and biological
sciences. Indeed, a shared interest in network structure is arguably one of the forces that is
helping draw many of these disciplines closer together. As one aspect of this broader theme, we
consider a convergence of ideas taking place at the boundary between distributed computer
networks and human social networks, the former consisting of computing devices linked by an
underlying communication medium, and the latter consisting of people and organizations in
society connected by ties that represent friendship, interaction, and influence. Distributed
computing systems have long been intertwined with the social networks that link their user
populations.
Recent developments, however, have added further dimensions to this relationship: the growth
of blogging, social networking services, and other forms of social media on the Internet have
made large-scale social networks more transparent to the general public than ever before. We
discuss three related areas that illustrate the issues at this interface. The first is centered around
the small-world phenomenon, the premise that most pairs of individuals in a social network are
linked by very short paths (or "six degrees of separation"). In earlier work, we proposed that the
social-psychology experiments providing the first empirical evidence for the phenomenon were
related in fundamental ways to the problem of decentralized routing, and this theme has been
pursued in a number of subsequent papers.
In the process, close connections have been developed to research in the design of decentralized
peer-to-peer systems, and some of the patterns suggested by the basic models of small-world
networks have been borne out to a striking extent by empirical studies of social network
structure. As a second area, we consider cascading behavior and the diffusion of information in
networks. Rumors, fads, innovations, social movements, and diseases spread through human
social networks in much the way that information propagates through a distributed system. And
as with small-world networks, the analogies between the computational and social versions of
these phenomena turn out to be deep rather than superficial. The third area is the analysis of
online communities and social networking sites, and of information cascades among weblogs.
APPROACHES
To test the efficacy of these approaches, we simulated question routing across a virtual social
network, using ExpertRank, FriendRank, or RandomRank to decide who should receive the
question.
RandomRank
RandomRank is the baseline for comparing the various approaches. With this approach, person
P randomly selects one of her friends and routes the question to that friend.
FriendRank
FriendRank extends RandomRank: instead of choosing at random, the sender passes the question to
the most knowledgeable friend. FriendRank assumes that every person knows how knowledgeable his
or her friends are. FriendRank only maintains the local perspective of the sender. In particular,
the sender has no information about the knowledge of his or her friends' friends.
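The two selection rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary layout of `knowledge`, and the choice to measure "most knowledgeable" as the sum of a friend's scores over the question's tags are our assumptions, not part of the original implementation.

```python
import random

def random_rank(friends, rng=random):
    """RandomRank: pick one friend uniformly at random."""
    return rng.choice(friends)

def friend_rank(friends, knowledge, tags):
    """FriendRank: pick the friend with the highest knowledge of the question's tags.

    Assumption: knowledge of a question is the sum of the per-tag scores.
    """
    return max(friends, key=lambda f: sum(knowledge[f][t] for t in tags))

# Hypothetical data: knowledge[person][tag] is a score between 0 and 1.
knowledge = {
    "bob": {"python": 0.2, "cooking": 0.9},
    "carol": {"python": 0.8, "cooking": 0.1},
}
friends_of_alice = ["bob", "carol"]
print(friend_rank(friends_of_alice, knowledge, ["python"]))  # carol
```

Note that `friend_rank` only looks at the sender's direct friends, which matches the local perspective described above.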
ExpertRank
ExpertRank revises FriendRank and passes the question to the friend with the highest centrality
rating, or ExpertRank, in answering and receiving questions. This approach takes into account the
expertise of a person's friends' friends. In particular, we use the PageRank algorithm, a
modification of eigenvector centrality, to assign centrality scores to each person in a question
map. The question map of a question Q, M(Q), is a directed graph of the persons who have sent or
received Q. Each question map is a tree, rooted at the author of the question. Since question maps
are trees, there are no cycles in routing; hence, if person A has already sent or received
question Q, then A cannot receive Q again. Each person P has a single, global ExpertRank for each
question Q, which relies on the sum of P's expertise at each tag T of Q. P's expertise at tag T
is calculated by taking the union of all question maps whose questions contain T, Um(T), and
then returning the centrality score of P in that union. In a more robust account of expertise, we
would take into account accuracy and timeliness, but for this initial analysis, ExpertRank
depends only on centrality.
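As a rough illustration of the computation described above, the following Python sketch runs a plain power-iteration PageRank over the union of question maps for each tag and sums the results. The damping factor of 0.85, the fixed iteration count, the handling of persons with no outgoing edges, and the names `pagerank`, `expert_rank`, and `maps_by_tag` are all assumptions made for this example; the original implementation may differ.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (sender, receiver) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for sender, receiver in edges:
        out_links[sender].append(receiver)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank

def expert_rank(person, question_tags, maps_by_tag):
    """Sum, over the question's tags, of the person's centrality in Um(T)."""
    total = 0.0
    for tag in question_tags:
        union_edges = set().union(*maps_by_tag.get(tag, []))
        if union_edges:
            total += pagerank(sorted(union_edges)).get(person, 0.0)
    return total

# Hypothetical question maps: each map is a set of (sender, receiver) edges.
maps_by_tag = {
    "python": [{("alice", "bob"), ("bob", "carol")}, {("dave", "carol")}],
}
scores = {p: expert_rank(p, ["python"], maps_by_tag) for p in ("alice", "carol")}
print(scores["carol"] > scores["alice"])  # True: carol receives questions from two people
```

Because the score depends only on who passed questions to whom, a person can rank highly without having any knowledge of the tag.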
MODELING SOCIAL NETWORK
Our virtual social network is composed of Persons, who can be friends with other persons in
the network, and who have knowledge. To generate our social network, we randomly
created a set of persons and randomly selected friendships between those persons. We
assume:
1. Friendships are randomly distributed across the people in the social network.
2. Everyone has the same # of friends.
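One way to satisfy both assumptions is to draw a random graph in which every person has exactly the same number of friends. The sketch below does this by shuffling "friendship stubs" and pairing them, retrying whenever a pairing would create a self-friendship or a duplicate friendship. Treating friendship as symmetric, and the names `random_regular_friendships` and `max_attempts`, are assumptions for this example rather than details taken from the original simulation.

```python
import random

def random_regular_friendships(num_persons, friends_per_person, rng, max_attempts=100_000):
    """Return {person: set of friends} where everyone has exactly friends_per_person friends."""
    if (num_persons * friends_per_person) % 2 or friends_per_person >= num_persons:
        raise ValueError("no such friendship graph exists")
    for _ in range(max_attempts):
        # Each person contributes one stub per friendship; pairing stubs creates friendships.
        stubs = [p for p in range(num_persons) for _ in range(friends_per_person)]
        rng.shuffle(stubs)
        friends = {p: set() for p in range(num_persons)}
        valid = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in friends[a]:
                valid = False  # self-friendship or duplicate: discard and retry
                break
            friends[a].add(b)
            friends[b].add(a)
        if valid:
            return friends
    raise RuntimeError("could not generate a valid network; try again")

network = random_regular_friendships(20, 3, random.Random(1))
print(all(len(f) == 3 for f in network.values()))  # True
```

The retry loop is simple but can need many attempts as the number of friends per person grows.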
Modelling Knowledge
Knowledge is modeled through the use of Tags. There is a finite dictionary of tags which
represent all topics of knowledge. Each person has a knowledge score between 0 and 1 for
every tag T. We assume:
1. Knowledge scores are randomly distributed across all people and tags.
2. Knowledge scores remain constant, despite exposure to questions and answers.
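A minimal sketch of this knowledge model follows. The source only says that scores are random and lie between 0 and 1; drawing them uniformly, and the names `generate_knowledge` and `tag_dictionary`, are assumptions made here.

```python
import random

def generate_knowledge(num_persons, tag_dictionary, rng):
    """Return knowledge[person][tag], one fixed score in [0, 1) per person and tag."""
    return {p: {t: rng.random() for t in tag_dictionary} for p in range(num_persons)}

tag_dictionary = [f"tag{i}" for i in range(20)]
knowledge = generate_knowledge(200, tag_dictionary, random.Random(42))
print(len(knowledge), len(knowledge[0]))  # 200 20
```

The table is generated once and never updated, which reflects the assumption that knowledge stays constant during the simulation.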
Modelling Question Generation
Questions are modelled as subsets of tags. Answers are stub objects appended to
questions. Each answer can only have one question, but a question can have multiple
answers.
At the beginning of the simulation, we randomly generate a set of questions.
To generate each question, we:
1. Randomly select a person to ask a question.
2. Randomly select a set of tags to constitute the question.
We assume:
1. Each question Q has a set of tags from the finite universal dictionary D.
2. Every tag Ti of question Q is equally relevant to any other tag Tj of Q.
3. Tags are randomly distributed across the questions in the social network.
4. Every question has the same # of tags.
5. Questions are asked by a random person P in the social network, independent of that P's
expertise in the tags.
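The generation steps and assumptions above might be sketched as follows. The `Question` class and the function and parameter names are hypothetical; answers are kept as a plain list to mirror the "stub objects appended to questions" described earlier.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Question:
    author: int                                   # index of the person asking
    tags: frozenset                               # subset of the tag dictionary
    answers: list = field(default_factory=list)   # answer stubs appended later

def generate_questions(num_questions, num_persons, tag_dictionary, tags_per_question, rng):
    """Create questions with a random author and a random, fixed-size set of tags."""
    questions = []
    for _ in range(num_questions):
        author = rng.randrange(num_persons)       # chosen independently of expertise
        tags = frozenset(rng.sample(tag_dictionary, tags_per_question))
        questions.append(Question(author, tags))
    return questions

tag_dictionary = [f"tag{i}" for i in range(20)]
questions = generate_questions(10, 200, tag_dictionary, 1, random.Random(7))
print(len(questions), len(questions[0].tags))  # 10 1
```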
Modelling Question Routing And Answer Generation
After we have generated our questions, we simulate the routing and answering of them.
To do this, we use the concept of an active question. An active question is one which can
either be routed to another person or answered. A question becomes inactive once every
person who could pass or answer it has decided to answer it, pass it, or do neither. Here is the
general algorithm for routing and answering questions:
Initially, only the author of each question possesses it, and each question is active for its
author.
While (there is an active question)
    Select a random active question Q.
    Select a random person P who has the question Q and for whom Q is still active.
    P decides whether to respond to Q.
    If P decides to respond to Q,
        P decides whether to answer Q or pass Q to a friend.
        If P decides to answer Q,
            P provides a random answer A for Q.
        If P decides to pass Q to a friend,
            P uses the specified routing approach (either RandomRank, FriendRank, or
            ExpertRank) to select a friend F.
            P passes Q to F.
    Make the question inactive for P.
We assume:
1. A question can be routed or answered only if a person has a copy of the question.
2. A person can either route or answer a question, but not both.
3. A person can route or answer a question only once.
4. A person can only route questions to friends.
5. When deciding whether to respond to (answer or route) a question authored by a friend, all
friends have the same probability of responding, FriendResponseRate.
6. When deciding whether to respond to (answer or route) a question authored by a non-friend, all
non-friends have the same probability of responding, NonFriendResponseRate.
7. A person will answer a question if they have a certain level of knowledge,
minKnowledgeToAnswer, about the question's tags, otherwise they will pass it.
8. The minimum knowledge to answer a question is the same for all people, and remains
constant through the simulation.
9. The ranking algorithms do NOT decide whether a person will route or answer, but only
decide to whom a person should route a question if the question is to be routed.
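The routing loop and the assumptions above can be put together in a short Python sketch. Several details are not specified in the text and are assumptions here: the author always passes the question on and never answers it, knowledge of a question is the mean of the per-tag scores, the default `min_knowledge_to_answer` of 0.7 is an arbitrary placeholder, and a question is not passed to anyone who has already held it. All function and parameter names are hypothetical.

```python
import random

def route_questions(questions, friends, knowledge, choose_friend, rng,
                    friend_response_rate=0.5, non_friend_response_rate=0.5,
                    min_knowledge_to_answer=0.7, max_passes=10):
    """questions: list of (author, tags). Returns one record per question."""
    records = [{"received": {author}, "answered": [], "passed": []}
               for author, _ in questions]
    # (question index, holder) pairs for which the question is still active.
    active = [(i, author) for i, (author, _) in enumerate(questions)]
    while active:
        # Picking a random pair and removing it makes the question inactive for that person.
        q, person = active.pop(rng.randrange(len(active)))
        author, tags = questions[q]
        record = records[q]
        if person != author:  # assumption: the author always routes the question
            rate = (friend_response_rate if author in friends[person]
                    else non_friend_response_rate)
            if rng.random() >= rate:
                continue  # person does not respond
            mean_knowledge = sum(knowledge[person][t] for t in tags) / len(tags)
            if mean_knowledge >= min_knowledge_to_answer:
                record["answered"].append(person)
                continue  # a person answers or routes, never both
        # Route only to friends who have not held the question, so the map stays a tree.
        candidates = [f for f in sorted(friends[person]) if f not in record["received"]]
        if candidates and len(record["passed"]) < max_passes:
            target = choose_friend(candidates, knowledge, tags)
            record["received"].add(target)
            record["passed"].append(person)
            active.append((q, target))
    return records

def friend_rank(candidates, knowledge, tags):
    return max(candidates, key=lambda f: sum(knowledge[f][t] for t in tags))

friends = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
knowledge = {0: {"t": 0.1}, 1: {"t": 0.9}, 2: {"t": 0.2}}
records = route_questions([(0, ["t"])], friends, knowledge, friend_rank,
                          random.Random(0), friend_response_rate=1.0)
print(records[0]["answered"])  # [1]
```

The routing approach is passed in as `choose_friend`, which keeps the decision of whether to answer or route separate from the decision of whom to route to, as assumption 9 requires.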
MY VIRTUAL WORLD API
Front-end
Interactive Interface to enter the statistics for simulation:
PERFORMANCE METRICS
To evaluate the efficacy of each routing approach, we will consider 3 dependent variables,
and 9 independent variables:
Dependent Variables
1. Answer Ratio = (the number of people who answered the question) / (the # of people
who received the question)
2. Pass Ratio = (the number of people who passed the question to another person) / (the #
of people who received the question)
3. Average Knowledge Per Question = (for each question answered, sum of the answerers'
knowledge of that question) / (the # of questions)
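As a worked example, the three dependent variables can be computed from per-question counts as in the sketch below. The definitions above do not say whether the ratios are taken per question or over the whole run; this sketch pools all questions, which is an assumption, as are the record layout and the name `dependent_variables`.

```python
def dependent_variables(records):
    """records: one dict per question with counts of recipients and passers,
    plus the knowledge score of each person who answered."""
    received = sum(r["received"] for r in records)
    passed = sum(r["passed"] for r in records)
    answered = sum(len(r["answerer_knowledge"]) for r in records)
    knowledge = sum(sum(r["answerer_knowledge"]) for r in records)
    return {
        "answer_ratio": answered / received,
        "pass_ratio": passed / received,
        "avg_knowledge_per_question": knowledge / len(records),
    }

# Hypothetical run with two questions: one answered once, one never answered.
records = [
    {"received": 4, "passed": 3, "answerer_knowledge": [0.8]},
    {"received": 2, "passed": 1, "answerer_knowledge": []},
]
metrics = dependent_variables(records)
print(round(metrics["avg_knowledge_per_question"], 3))  # 0.4
```

In this example the answer ratio is 1/6 and the pass ratio is 4/6; the unanswered question lowers the average knowledge per question because it still counts in the denominator.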
Independent Variables
1. # of Persons = size of social network = (200)
2. # of Friendships Per Person. = (5)
3. # of Tags = size of the tag dictionary = (20)
4. # of Questions = the number of questions asked = (from 10 to 100, incrementing by 10)
5. # of Tags Per Question = (1)
6. Friend Response Rate = the probability that a person will respond to a friend's question by
either answering the question or passing it to another friend. = (.5)
7. Non-friend Response Rate = the probability that a person will respond to a non-friend's
question by either answering the question or passing it to another friend. = (.5)
8. Maximum # of Passes Per Question = (10)
9. RouterApproachType = algorithm which decides which friend to pass a question to =
(RandomRank, FriendRank, ExpertRank)
We assume that a routing approach is effective to the extent that it maximizes the Average
Knowledge Per Question.
EXPERIMENTAL EVALUATION
We have run the simulation
We successfully designed and implemented an abstract framework for testing question routing,
and used this framework to compare the performance of three approaches to routing questions,
RandomRank, FriendRank, and ExpertRank. While the results of our experiment did not provide
strong evidence that the algorithms have significant performance differences, we cannot yet
conclude that these approaches lack significant performance differences. We believe that we
may be able to discern significant performance differences by tweaking the independent variables
and/or changing the initial topology of the question maps. We hypothesize that the near
uniformity of performance across the router approach types may be due to the random
distribution of knowledge across the social network.
CONCLUSION
We would like to revise ExpertRank, which currently uses only centrality to discern
expertise. This centrality measure does not take into account the knowledge scores of
friends, but relies only on the structure of the question maps, that is, on how people have
historically passed questions. When we test ExpertRank, however, we generate these question
maps without reference to the knowledge scores of any of the people, and only with reference
to the existing structural centrality of the question maps. This means that the structure of the
question maps will not reflect the knowledge scores of any of the people, which may explain
why ExpertRank performs similarly to RandomRank. To overcome this, we would like to
revise ExpertRank to take into account FriendRank or some other method that uses accurate
knowledge scores of friends.
Also, we would like to revise and test certain assumptions, which we think would make the
question routing model more realistic. For example, not everyone has the same number of
friends. We would like to find out the actual distribution of friends in social networks in
order to test our model. Similarly, we would like to test our model on actual distributions of
tags based on the frequencies of words in the English language. We would also like to find
out how our model works when we allow users to both answer a question and route it.
Finally, we would like to find out the likelihood that a question is posed by an expert, so
that we can more accurately model the seeding of questions in our network.