Georgia Institute of Technology
College of Computing
Swathi Bhat Ganesh Bhat

MY VIRTUAL WORLD
Questions in Social Networks

MOTIVATION AND IDEA

We consider question routing as an alternative to the question-and-answer approach used in Yahoo Answers, Reddit, and Hunch, among several others. We wanted to explore this question-and-answer approach in a social network schema.

A social network service focuses on building online communities of people who share interests and/or activities, or who are interested in exploring the interests and activities of others. Most social network services are web based and provide a variety of ways for users to interact, such as email and instant messaging services. Their principal strength has been an ability to gather tens of millions of unique users. Their main shortcoming, however, has been the inability of several social networks to monetize their volumes of signed-up users. Social networking has encouraged new ways to communicate and share information, and social networking websites are used regularly by millions of people. The main types of social networking services are those which contain category divisions (such as former school-year or classmates), means to connect with friends (usually with self-description pages), and a recommendation system linked to trust.

We seek to explore the idea of routing questions in this framework of a trusted network, where friends ask questions among friends only. We also try several algorithms as routing approaches and analyze them in terms of the answer count that each of them obtains in the social network.

In this simulation, we compare three approaches to routing questions through a social network, evaluating each approach in terms of the extent to which these questions are answered by experts. In general, we want to simulate question routing on social networks, the process by which questions are answered or passed along to other members of the social network.
By evaluating these approaches, we seek to explore the increasingly popular use of question/answer systems on social networks.

INTRODUCTION

The study of large-scale networks has emerged over the past several years as a theme that spans many disciplines, ranging from computing and information science to the social and biological sciences. Indeed, a shared interest in network structure is arguably one of the forces that is helping draw many of these disciplines closer together. As one aspect of this broader theme, we consider a convergence of ideas taking place at the boundary between distributed computer networks and human social networks: the former consisting of computing devices linked by an underlying communication medium, and the latter consisting of people and organizations in society connected by ties that represent friendship, interaction, and influence.

Distributed computing systems have long been intertwined with the social networks that link their user populations. Recent developments, however, have added further dimensions to this relationship: the growth of blogging, social networking services, and other forms of social media on the Internet has made large-scale social networks more transparent to the general public than ever before. We discuss three related areas that illustrate the issues at this interface.
The first is centered around the small-world phenomenon, the premise that most pairs of individuals in a social network are linked by very short paths (or "six degrees of separation"). In earlier work, we proposed that the social-psychology experiments providing the first empirical evidence for the phenomenon were related in fundamental ways to the problem of decentralized routing, and this theme has been pursued in a number of subsequent papers. In the process, close connections have been developed to research in the design of decentralized peer-to-peer systems, and some of the patterns suggested by the basic models of small-world networks have been borne out to a striking extent by empirical studies of social network structure.

As a second area, we consider cascading behavior and the diffusion of information in networks. Rumors, fads, innovations, social movements, and diseases spread through human social networks in much the way that information propagates through a distributed system. And as with small-world networks, the analogies between the computational and social versions of these phenomena turn out to be deep rather than superficial.

Communities and social networking sites, and in the analysis of information cascades among weblog.

APPROACHES

To test the efficacy of these approaches, we simulated question routing across a virtual social network, using ExpertRank, FriendRank, or RandomRank to decide who should receive the question.

RandomRank
RandomRank is the baseline for comparing the various approaches. With this approach, person P randomly selects one of her friends and routes the question to that friend.

FriendRank
FriendRank extends RandomRank by passing the question to the most knowledgeable friend. It assumes that every person knows how knowledgeable his or her friends are. FriendRank only maintains the local perspective of the sender.
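The two local selection rules can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the simulation: the function names `random_rank` and `friend_rank`, the `knowledge` dictionary layout (person to tag to score), and the choice to sum a friend's scores over the question's tags are all assumptions made here.

```python
import random


def random_rank(friends, rng=random):
    """RandomRank: route to one friend chosen uniformly at random."""
    return rng.choice(friends)


def friend_rank(friends, knowledge, tags):
    """FriendRank: route to the friend who knows most about the question.

    knowledge maps person -> {tag: score in [0, 1]}. Summing the scores
    over the question's tags is an assumed aggregation rule.
    """
    return max(friends, key=lambda f: sum(knowledge[f].get(t, 0.0) for t in tags))


# Hypothetical data: three friends with different knowledge of one tag.
friends = ["ann", "bob", "cy"]
knowledge = {"ann": {"python": 0.2}, "bob": {"python": 0.9}, "cy": {"python": 0.5}}
best = friend_rank(friends, knowledge, ["python"])
```

Both functions look only at the sender's direct friends, which is the local perspective that RandomRank and FriendRank share.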
In particular, the sender has no information about the knowledge of his friends' friends.

ExpertRank
ExpertRank revises FriendRank, and passes the question to the friend with the highest centrality rating, or ExpertRank, in answering and receiving questions. This approach takes into account the expertise of a person's friends' friends. In particular, we use the PageRank algorithm, a modification of eigenvector centrality, to assign centrality scores to each person in a question map of Q. A question map of Q, M(Q), is a directed graph of persons who have sent or received a question. Each question map is a tree, rooted on the author of the question. Since question maps are trees, there are no cycles in routing; hence, if person A has sent or received a question Q, then person A cannot receive question Q again. Each person P has a single, global ExpertRank for each question Q, which relies on the sum of P's expertise at each tag T of question Q. A person's expertise at tag T is calculated by taking the union of all question maps whose questions contain T, Um(T), and then returning the centrality score for the person P. In a more robust account of expertise, we would take into account accuracy and timeliness, but for this initial analysis, ExpertRank only depends on centrality.

MODELING SOCIAL NETWORK

Our virtual social network is composed of Persons, who can be friends with other persons in the network, and who have knowledge. To generate our social network, we randomly created a set of persons and randomly selected friendships between those persons. We assume:
1. Friendships are randomly distributed across the people in the social network.
2. Everyone has the same number of friends.

Modelling Knowledge
Knowledge is modeled through the use of Tags. There is a finite dictionary of tags which represent all topics of knowledge. Each person has a knowledge score between 0 and 1 for every tag T. We assume:
1. Knowledge scores are randomly distributed across all people and tags.
2. Knowledge scores remain constant, despite exposure to questions and answers.

Modelling Question Generation
Questions are modelled as subsets of tags. Answers are stub objects appended to questions. Each answer can only have one question, but a question can have multiple answers. At the beginning of the simulation, we randomly generate a set of questions. To generate each question, we:
1. Randomly select a person to ask a question.
2. Randomly select a set of tags to constitute the question.
We assume:
1. Each question Q has a set of tags from the finite universal dictionary D.
2. Every tag Ti of question Q is equally relevant to any other tag Tj of Q.
3. Tags are randomly distributed across the questions in the social network.
4. Every question has the same number of tags.
5. Questions are asked by a random person P in the social network, independent of P's expertise in the tags.

Modelling Question Routing And Answer Generation
After we have generated our questions, we simulate the routing and answering of them. To do this, we use the concept of an active question. An active question is one which can either be routed to another person or answered. A question becomes inactive once every person who could pass or answer it has decided to answer it, pass it, or do neither. Here is the general algorithm for routing and answering questions:

Initially, only the author of each question possesses it, and each question is active for its author.
While (there is an active question)
    Select a random active question Q.
    Select a random person P who has the question Q.
    P decides whether to respond to Q.
    If P decides to respond to Q
        P decides whether to answer Q or pass Q to a friend.
        If P decides to answer Q,
            P provides a random answer A for Q.
        If P decides to pass Q to a friend,
            P uses the specified routing approach (either RandomRank, FriendRank, or ExpertRank) to select a friend F.
            P passes Q to F.
    Make the question inactive for P.

We assume:
1. A question can be routed or answered only if a person has a copy of the question.
2. A person can either route or answer a question, but not both.
3. A person can route or answer a question only once.
4. A person can only route questions to friends.
5. When deciding whether to answer or route a question authored by a friend, all friends have the same probability of responding, FriendResponseRate.
6. When deciding whether to answer or route a question authored by a non-friend, all non-friends have the same probability of responding, NonFriendResponseRate.
7. A person will answer a question if they have a certain level of knowledge, minKnowledgeToAnswer, about the question's tags; otherwise they will pass it.
8. The minimum knowledge to answer a question is the same for all people, and remains constant through the simulation.
9. The ranking algorithms do NOT decide whether a person will route or answer, but only decide to whom a person should route a question if the question is to be routed.

MY VIRTUAL WORLD API

Front-end interactive interface to enter the statistics for simulation.

PERFORMANCE METRICS

To evaluate the efficacy of each routing approach, we will consider 3 dependent variables and 9 independent variables.

Dependent Variables
1. Answer Ratio = (the number of people who answered the question) / (the number of people who received the question)
2. Pass Ratio = (the number of people who passed the question to another person) / (the number of people who received the question)
3. Average Knowledge Per Question = (for each question answered, sum of the answerers' knowledge of that question) / (the number of questions)

Independent Variables
1. # of Persons = size of social network = (200)
2. # of Friendships Per Person = (5)
3. # of Tags = size of the tag dictionary = (20)
4. # of Questions = the number of questions asked = (from 10 to 100, incrementing by 10)
5. # of Tags Per Question = (1)
6. Friend Response Rate = the probability that a person will respond to a friend's question by either answering the question or passing it to another friend = (0.5)
7. Non-friend Response Rate = the probability that a person will respond to a non-friend's question by either answering the question or passing it to another friend = (0.5)
8. Maximum # of Passes Per Question = (10)
9. RouterApproachType = algorithm which decides which friend to pass a question to = (RandomRank, FriendRank, ExpertRank)

We assume that a routing approach is effective to the extent that it maximizes the Average Knowledge Per Question.

EXPERIMENTAL EVALUATION

We have run the simulation. We successfully designed and implemented an abstract framework for testing question routing, and used this framework to compare the performance of three approaches to routing questions: RandomRank, FriendRank, and ExpertRank. While the results of our experiment did not provide strong evidence that the algorithms have significant performance differences, we cannot yet conclude that these approaches lack significant performance differences. We believe that we may be able to discern significant performance differences by tweaking the independent variables and/or changing the initial topology of the question maps. We hypothesize that the near uniformity of performance across the router approach types may be due to the random distribution of knowledge across the social network.

CONCLUSION

We would like to revise ExpertRank, which currently uses only centrality to discern expertise. This centrality measure does not take into account the knowledge scores of friends, but rather relies only on the structure of the question maps, that is, on how people have historically passed questions. But when we test ExpertRank, we generate these question maps without reference to the knowledge scores of any of the people, but only with reference to the existing structural centrality of the question maps.
This means that the structure of the question maps will not reflect the knowledge scores of any of the people, which may explain why ExpertRank performs similarly to RandomRank. To overcome this, we would like to revise ExpertRank to take into account FriendRank or some other method that uses accurate knowledge scores of friends.

We would also like to revise and test certain assumptions, which we think would make the question routing model more realistic. For example, not everyone has the same number of friends, so we would like to find out the actual distribution of friends in social networks in order to test our model. Similarly, we would like to test our model on actual distributions of tags based on the frequencies of words in the English language. We would like to find out how our model works when we allow users to both answer a question and route it. Finally, we would like to find out the likelihood that a question is posed by an expert, so that we can more accurately model the seeding of questions in our network.
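For reference, the routing and answering loop described under "Modelling Question Routing And Answer Generation", together with the Answer Ratio and Pass Ratio, can be sketched in Python as follows. This is a hedged sketch under stated assumptions, not the code behind the reported experiment. The names `simulate`, `ratios`, and `choose_friend` are hypothetical; the default `min_knowledge` of 0.5, averaging knowledge over a question's tags, having the author always route rather than answer, and counting the author among the people who received the question are all assumptions that the text above does not specify.

```python
import random


def simulate(friends, knowledge, questions, choose_friend,
             friend_rate=0.5, non_friend_rate=0.5,
             min_knowledge=0.5, max_passes=10, seed=0):
    """Route and answer questions; return per-question counts.

    friends:   dict person -> list of that person's friends
    knowledge: dict person -> dict tag -> score in [0, 1]
    questions: list of (author, tags) pairs
    choose_friend(candidates, tags, rng) -> friend who receives the question
    """
    rng = random.Random(seed)
    stats = [{"received": {author}, "answered": 0, "passed": 0}
             for author, _ in questions]
    # active[q] lists persons who hold question q and have not yet acted on it.
    active = {q: [author] for q, (author, _) in enumerate(questions)}
    while active:
        q = rng.choice(list(active))                  # random active question
        holders = active[q]
        p = holders.pop(rng.randrange(len(holders)))  # Q is now inactive for p
        author, tags = questions[q]
        if p == author:
            rate = 1.0                                # assumption: authors always act
        elif author in friends[p]:
            rate = friend_rate
        else:
            rate = non_friend_rate
        if rng.random() < rate:                       # p decides to respond
            score = sum(knowledge[p][t] for t in tags) / len(tags)
            if p != author and score >= min_knowledge:
                stats[q]["answered"] += 1             # stub answer
            elif stats[q]["passed"] < max_passes:
                # Question maps are trees: never route to a previous holder.
                candidates = [f for f in friends[p]
                              if f not in stats[q]["received"]]
                if candidates:
                    f = choose_friend(candidates, tags, rng)
                    stats[q]["received"].add(f)
                    holders.append(f)
                    stats[q]["passed"] += 1
        if not holders:
            del active[q]
    return stats


def ratios(stats):
    """Return (Answer Ratio, Pass Ratio) pooled over all questions."""
    received = sum(len(s["received"]) for s in stats)
    if received == 0:
        return 0.0, 0.0
    return (sum(s["answered"] for s in stats) / received,
            sum(s["passed"] for s in stats) / received)


# Hypothetical six-person ring in which only person 3 knows tag "t".
people = list(range(6))
friends = {p: [(p - 1) % 6, (p + 1) % 6] for p in people}
knowledge = {p: {"t": 1.0 if p == 3 else 0.0} for p in people}
questions = [(0, ["t"])]


def random_rank(candidates, tags, rng):
    return rng.choice(candidates)


stats = simulate(friends, knowledge, questions, random_rank,
                 friend_rate=1.0, non_friend_rate=1.0)
answer_ratio, pass_ratio = ratios(stats)
```

Passing the routing rule in as `choose_friend` mirrors assumption 9: the rule only selects a recipient, while the decision to answer or route is made before it is called.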