>> Scott Counts: We're going to get started this morning. So today we have Aditya Pal from the University of Minnesota, where he's advised by Joe Konstan. Aditya is quite accomplished. In not even four years of graduate school he already has 12 publications to his name, which are first-authored papers at the top venues that we recognize: WWW, CIKM, ICWSM, CSCW, WSDM and others. So he's very accomplished on the publication front. I think his research is very relevant to all of us, because he works in this sort of social information space, and so much of the information we encounter these days is embedded in social spaces: our news feeds, our Twitter streams, our Q&A sites and so forth. He's going to tell us about his work in this area today. So let's welcome Aditya. [applause] >> Aditya Pal: All right. Hi everyone. Thanks for joining in and thanks for the kind words. So I'll actually talk about user expertise in online domains. The emergence of the social Web has given the opportunity to billions of people across the globe to create and share content online. For example, if you look at question-and-answering sites, they allow users to exchange knowledge in the form of questions and answers. And the kind of value that you get from these sites can exceed the kind of value that you would get by going to information specialists. Blogs and micro blogs allow users to share their thoughts and ideas, and these mediums have turned out to be extremely useful for information dissemination and mass mobilization. In fact, several of the great revolutions and uprisings around the globe, such as the Egyptian revolt and the London riots, have actually been facilitated by these mediums. So if we look at the user-generated content online, some small facts: we see that there are more than one billion answers on Yahoo! Answers alone, there are more than two million blog posts every single day, and there are more than 250 million photos on Facebook per day. So this is a huge amount of content that is being generated online. As a result, it becomes more important and more challenging to find the top experts in these systems. And we need to find these experts for several different reasons. For example, we want to find high quality content for search and information retrieval. We need to find top users for recommendations. We need to do topic summarization, and we need to do several other things such as e-commerce and viral marketing. So my focus has been on the identification of these experts in online domains, especially in micro blogs and question and answering systems. So let me begin with my work in micro blogs, which was done with Scott Counts while I was interning here at MSR. So in order to motivate why we need to find experts in micro blogs, let us consider a very specific example. Assume that I am trying to look for updates about hurricanes, and this is the typical search result I get when I fire a query at Bing Social. And if you look more closely at it, say at the second tweet or second update, it doesn't look highly informative about the topic hurricane. But on the other hand, if you look at the tweet by the [inaudible] association, it looks highly promising, because it looks like this association is a top authority on this topic. As a result, an interface which actually demarcates these top authorities and provides a score for how authoritative this person is can be very beneficial for end users, and it would enable them to find the top content.
So the prior approach to finding these top experts in the micro blog space was proposed by Weng et al. What they actually do is they compute a user topic distribution using latent Dirichlet allocation. The idea here is they try to identify how much a person posts within each topic, and this is a probabilistic model. And once they have found the topic distribution, they compute a weighted graph of users, where the edge weight indicates how much similarity two people have over topics. And then they run PageRank and try to find the top authorities. So there are actually several issues with this approach. For example, one of them is: how does this approach handle the millions of authors that are posting in the micro blog space? Because it is graph based and it uses PageRank and latent Dirichlet allocation, it's really hard for it to scale to these millions of authors and handle them in anything like a near real-time manner. The second big issue is that for several of the topics that we look at these days, authors might not even exist prior to that topic. Say, for example, consider the topic Haiti earthquake. The author Haiti Relief Fund didn't even exist before the earthquake actually happened. As a result, we would not expect Haiti Relief Fund to have a large number of followers. An algorithm which looks at how many followers you have and looks at the graph properties might not be able to surface Haiti Relief Fund. A similar case holds for iPhone reviews. But the bigger problem with this approach is that it is actually sensitive to surfacing general authorities, say, for example, [inaudible] authorities [inaudible]. These people need not be highly topical or necessarily belong to one topic, but since they are followed by millions of people, most of these algorithms are sensitive to surfacing them. So we need to avoid these celebrities and these general authorities. Last, but most importantly, we need to find these authorities in near real time, because remember that most of these topics are so fast paced that things change at a minute-by-minute level, and an algorithm which might take, say, days to run will not be able to keep up with the pace with which the data is coming into the system. So in order to handle these challenges, we propose a very simple approach, and this approach can be understood as a simple pipeline. The first step is feature extraction. Then we do clustering, and then we do ranking. So let me go into these in more depth. In the feature extraction phase we try to extract several different features for a user such that they reflect the expertise of that person. And we have 11 to 12 features that we extract for each user. I'll just mention a few of them. For the text features we had, say, the topic signal. The idea here is that we try to estimate how much this person is posting within one topic versus posting in different topics. So imagine for a celebrity: the topic signal would actually be low, because they would occasionally post one or two tweets or messages in one topic, but they need not be highly topical. Similarly, we have signal strength, which tries to estimate how many original posts they make versus how many posts they actually take from the network and forward. The idea here is that for a highly topical person the signal strength should be high. And similarly we have several graph features, such as mention impact, which tries to estimate how much this user is mentioned on a topic and how much of the topic is about this user.
And then we have retweet impact, which actually tries to estimate how many of his or her tweets have been retweeted by other people and how many unique users have retweeted them. The reason we use this log scoring over here is that we observed that several of the spammers had multiple accounts. From one account they will post an update, and from the other account they will actually retweet it. So you will see that these people have a large number of retweets, but the number of unique users who retweet is very low. As a result this score will become zero for these people, since the log of a single unique retweeter is zero. >>: How do you distinguish a topical tweet -- what distinguishes a tweet as a topical tweet? >> Aditya Pal: Basically, if that topic term occurs in the tweet, then that tweet is topical. >>: So possibly a hashtag use or a -- >> Aditya Pal: If it appears anywhere in the tweet. It need not be in a hashtag; it can appear anywhere in the text of the tweet. >>: How do you determine the list of topics? >> Aditya Pal: So you can actually search for a topic. So basically let's say you pass in hurricane. Hurricane is a search term, and we look at all the tweets that contain hurricane. I'll describe in a moment what we actually do; we do more than that. And we have a network score, which tries to see how many followers this person has versus how many friends this person has. And we had a bunch of other features. For example, one of the features was self-similarity: how much is this person retweeting his or her own tweets, or posting the same message again and again? Many times we see that people who are trying to spread some message or post an ad post it again and again. Their self-similarity would be very high, and they might be candidates for spammers. So once we extract these features, we are actually in a high dimensional space, and for each user we have a bunch of features. And what we do with that is we then propose a probabilistic clustering. We use a Gaussian mixture model to cluster users. The idea here is that we wish to partition these people into naturally-occurring clusters for a given topic. And the reason this is done is because we would like to eliminate all the ordinary people, all the spammers and celebrities, from the true set of authorities. And to do this successfully, what we do is we first pick core users that belong to a cluster with a probability greater than 0.9. For example, the user that I have highlighted is a core user and it belongs to the second cluster. And this allows us to discard several of the anomalies -- users who could be part of several different clusters. So we discard such people, and we hope that many of the celebrities and the spammers actually get eliminated because of this: they might be high on some features but low on other features. And then we actually try to find out which is the right authority cluster. And this authority cluster we select based on how much topic signal, on average, the people in the cluster have, how much network impact and mention impact, and so on. Now we have got a bunch of -- >>: Could you go back, what are the dots in the axes? >> Aditya Pal: It's just an illustration. Basically it's very high dimensional; it would be an 11-dimensional space right now. So this is just for illustration. >>: How do you pick the core users -- so these are essentially [inaudible] of that cluster, right? >> Aditya Pal: Core users will be basically those that belong to that cluster with probability greater than 0.9.
So basically if you use a Gaussian mixture model, it gives you a likelihood of how much this person belongs to this cluster versus how much it belongs to other clusters. >>: How do you initialize your priors for that? >> Aditya Pal: I'm sorry? >>: How do you initialize your priors for the first iteration? >> Aditya Pal: The prior is uniform in this case, or you can use some basic methods -- run K-means first to initialize the clusters and use that. I mean, this is because we're just trying to explore this high dimensional space. We do not know anything about it; we don't even know how many clusters it should have. So we can use something like the Bayesian information criterion to find out how many clusters it should have. So that's decided that way. So the final step is ranking, and the idea in the ranking is that we wish to estimate how much importance this person has for this topic. For this part we actually use the area under a Gaussian distribution, and this measures the importance of the user A in comparison to other users for a feature F. And we take a weighted aggregate of this importance over all the features. So essentially a high value of S_A is good. And we sort people based on their scores, and then we pick the top people. For the dataset, we use 90 million tweets from [inaudible] to June 10, 2010. In order to extract the topics, we consider keyword-based extraction to get all the tweets that contain that particular topic term. And then what we do is we look at how many hashtags a tweet has, how many URLs it has and what kind of URLs it has, and we also bring in all the other related tweets that may not have mentioned the same keyword but have the same hashtags or same URLs. We experimented with several different topics, and the popular topics during the time of data extraction were iPhone, oil spill and World Cup [phonetic]. If you look at the results of the algorithm, for example for oil spill, we see that it returns NWF as a top authority. And apart from NWF you also see NolaNews as one of the top authorities. NWF stands for the National Wildlife Federation, and NolaNews is a New Orleans news agency. These two agencies had been extremely active around, or expert on, the disaster when it happened. And at the same time it also returned several of the top reporting agencies such as Huffington Post, Reuters [phonetic] and Time. In a sense it is able to return a mix of really topical people and at the same time discard some of the celebrities who occasionally posted one or two tweets but didn't appear to be topical at all. So it's able to return a mix of reporters and people devoted to the topic. But this list is still very subjective, and anyone can interpret it in a different manner. So in order to extensively evaluate how good this list of people is, we then consider several other models. We use a graph-based model, which just considers the graph properties along with PageRank and tries to see if it can find the top authorities, a text-based model, and randomly selected people. And for all of these models, we pick the top 10 authors per model, and we pick the four most recent topical tweets by these people.
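To make the clustering and ranking steps just described concrete, here is a minimal sketch in Python. The 0.9 core-user cutoff, the BIC-based choice of the number of clusters, and the "area under a Gaussian" form of the per-feature importance come from the description above; the specific feature weights, the candidate-relative mean and standard deviation used inside the CDF, and the scikit-learn/SciPy implementation details are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def rank_topical_authorities(features, weights, max_clusters=8, core_prob=0.9):
    """Sketch of the cluster-then-rank step for one topic.

    features : (n_users, n_features) numpy array of per-user topical
               features (topic signal, signal strength, mention impact,
               retweet impact, network score, self-similarity, ...).
    weights  : (n_features,) weights for the final aggregate score
               (the talk only says "weighted aggregate"; values assumed).
    """
    # 1. Choose the number of clusters with BIC, as mentioned in the talk.
    best_gmm, best_bic = None, np.inf
    for k in range(2, max_clusters + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(features)
        bic = gmm.bic(features)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic

    # 2. Keep only "core" users: cluster membership probability > 0.9.
    resp = best_gmm.predict_proba(features)            # (n_users, k)
    labels = resp.argmax(axis=1)
    core = resp.max(axis=1) > core_prob

    # 3. Pick the authority cluster: the one whose core members have the
    #    highest weighted average feature values (topic signal, mention
    #    impact, and so on).
    cluster_scores = []
    for c in range(best_gmm.n_components):
        members = features[core & (labels == c)]
        cluster_scores.append(members.mean(axis=0).dot(weights)
                              if len(members) else -np.inf)
    candidates = np.where(core & (labels == int(np.argmax(cluster_scores))))[0]

    # 4. Rank candidates: per feature, score a user by the area under a
    #    Gaussian (its CDF) relative to the other candidates, then take
    #    the weighted aggregate S_A and sort.
    mu = features[candidates].mean(axis=0)
    sigma = features[candidates].std(axis=0) + 1e-9
    importance = norm.cdf(features[candidates], loc=mu, scale=sigma)
    scores = importance.dot(weights)
    order = np.argsort(-scores)
    return candidates[order], scores[order]
```

Every step here is just per-user feature statistics plus a small mixture-model fit, which is one reason the approach can, as noted later in the talk, run in near real time and map naturally onto a MapReduce-style framework.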
And if you just look at the average number of followers of the users that are returned by these models, what we see is that the authors returned by the graph model have roughly one million followers on average, which indicates that most of these people are celebrities, whereas in our case it is much less than one million. So in some sense it is an indicator that ours is less sensitive to the number of followers of an author, and it is actually trying to avoid returning some of the celebrities that have posted occasionally on a topic. And to further validate, we considered two screens: one, an anonymous screen to evaluate these authors, in which we showed the tweets of these people and asked users to rate how interesting they found the tweets to be and how authoritative they found the user to be; and then a non-anonymous screen, in which the difference is that we showed the user name of the person. The main idea here is to capture how good the content is on its own versus whether the user name has any influence on the value of the content and the perception of the readers. And in most of the cases we found that the ratings based on the graph plus topical method are statistically significantly better than all the other methods, which is an indicator that we are able to find truly relevant people, and at the same time we are able to find people who are generating really high value content. But one important question that it leaves with us is: how much bias is there when we actually see the name of a person versus when we do not see the name of the person? In order to motivate this, let me go back to the hurricane example. I initially said that this tweet is probably not very interesting or not very useful, and one of the reasons why I said that is because it has been posted by someone unknown. But what if this tweet had been posted by someone very well known, say Anderson Cooper? Probably the perception might have changed, and we might think this looks like a motivational tweet -- that after the hurricane comes a rainbow or something. And we also had the anonymous and non-anonymous evaluations with us. So we actually tried to estimate how much of a bias creeps in because of the authority value of the person. And when we did this, what we found out is that in general the ratings actually decrease when author names are shown. This is not what we were expecting. We were expecting that the ratings should increase because we picked top authorities. So we then tried a deeper analysis of it. We clustered authors on their anonymous ratings into good, average and bad. The good people had very high anonymous ratings, the average were in between, and the bad had low anonymous ratings. And now when we measure these effects, what we see is that for the good authors, when the names are shown, the ratings actually increase. So this probably is an indicator that their content is anyway very good, the content they're producing, and at the same time the name is also reflective of some recognition by the readers. As a result, when the names are shown, there is a positive boost in the ratings in the non-anonymous condition. >>: How do you define good, average and bad for this? >> Aditya Pal: We actually clustered people based on the anonymous ratings they got. So the anonymous ratings try to estimate how good the content is, irrespective of who is actually producing the content.
>>: Greater than [inaudible] the content rating. >> Aditya Pal: Yes, content rating. That's the anonymous one, just on the content. And the average people didn't appear to be very bad. I mean, they're slightly worse than the good, but they're not very, very bad, I would say. But when the names are shown, their ratings actually decrease significantly. So this is, again, an indicator that probably the readers are not able to recognize these people. As a result their ratings get hurt significantly even though the content is not very bad. >>: These other people, you said you showed only top authorities in this particular -- and what was the filtering? Was that number of followers? >> Aditya Pal: So these are the models I just illustrated previously. One uses just the graph properties, the other uses graph plus topical properties, another just uses topical properties, and the fourth one is just a random one, which picks people randomly. So we use these four models to compute this. >>: That's not what I'm asking. I'm asking about the set of people -- not the subjects that look at this, but the tweeters whose tweets you judged with this method: were they already preselected to be famous people and authorities, or were they just taken from all over the set? >> Aditya Pal: They were not preselected. >>: So they were just from all over the set. So the average tweeter is probably not very recognizable. >> Aditya Pal: Probably, yeah. So, all right. For the bad authors, the name hurts a lot. So overall what this indicates is that if -- >>: Can I add one more question? For each category, good, average and bad, how many people are in each? >> Aditya Pal: So, yes, they're roughly equal. We had 40 authors and three groups, so it's less than 15 for each of them. But they were almost equal. So the key -- >>: So the 4.2 and 4.3 for the authors, was that a significant difference? >> Aditya Pal: Yes, it was. >>: What was the range like when you went from anonymous to non-anonymous? Did everyone get a boost? Did some people go down? What was the shape of the curve after you -- >> Aditya Pal: For them, I mean, all of them -- so most of them went up. >>: Most -- >> Aditya Pal: Yes, statistically significant. >>: When you revealed the names of the authors of the tweets, those were truthful names? >> Aditya Pal: Yeah. >>: Did you try seeing if a name that people seemed to like could be applied to a bad tweet and boost its popularity, to see whether it's the name versus the content? >> Aditya Pal: I think [inaudible] -- she has done some extensive work on this. And I think they have found that it can be done that way. So you can positively swing the ratings if you actually show a different name on the content. >>: Especially names that seem topically relevant -- like name your spam account iPhone Expert, names that are related to the topic of your tweet -- people will consider your tweet more -- well, so [inaudible]. >> Aditya Pal: Right. Anyway, I don't remember what I wanted to say here. Anyway, there were other factors of bias. For example, what does that name reflect? If that name reflects a gender, say male, then in this case we found that they got higher ratings than female names. And if the name reflects an organization, such as CNN, they had higher ratings than a name that just reflects that this is a human being who is posting these tweets. And we also saw that topical names sometimes give a boost to the ratings.
For example, a name like iPhone Reviews will get a higher rating for the topic iPhone versus a name that has nothing to do with iPhone. And this is just the name. There are several other signals: you can show the complete about-me of the person, you can show the URL and complete data about that human being, and you can bias and shift and boost these ratings in several different ways. Anyway, to summarize thus far, what we found is that the algorithm was able to discover more interesting and authoritative authors in general. And we saw that name value can be used to bias and positively shift the perception of the end users, and there are several other signals that Morris and Counts [phonetic] have tried to find out. So an algorithm that can take all of these features into account and at the same time estimate the expected chance that the end user would recognize a person in the list of authorities it has found can actually positively boost the results from an end user perspective. And this can be quite useful in scenarios where you have a lot of information about the end user. You have the social graph of the person, and then you can use that to compute which authority is closest to this person, such that the chance that he or she would be recognized is highest. In that case, you can influence the perception of the end user. And last, but importantly, what we found is that the algorithm runs in near real time for large scale datasets, and it is very implementable on a MapReduce kind of framework. So it makes it quite an attractive choice to use in situations where you have a humongous amount of data, from a production perspective. So let me now move on to the concept of expertise in question and answering systems. This is the work that I did with my advisor at the University of Minnesota. In order to motivate expertise in question and answering systems, let me take a very simple example. Here a mom is concerned about her son stealing from her. So basically you see that people actually depend on question answering systems for all kinds of things. This question is one serious instance, and there are several instances where people ask for medical advice or immigration help. In fact, I have also done that. You would actually want to know whether the advice that you are getting is good or not. So if you look at the answers by A and B, it doesn't look like they're willing to help this person or are very serious about the problem. But if you look at the answer by C, it looks like this person is actually quite serious about solving the problem, and he might also have seen a similar situation. But other than that, we also see that this person is a top contributor, as labeled by Yahoo! Answers. So this is taken from Yahoo! Answers, and this person is a retired bill collector. So our confidence in the answer that this person has given goes up significantly. As a result, it's actually quite important to identify these experts in these systems and also to demarcate them in the interface with their expertise, so that, from the perspective of an end user, the confidence in the advice actually increases. And this is very useful when you ask for medical advice or immigration help or some of the serious topics. So, regarding the prior approaches to finding experts in these domains:
They actually look at the number of answers, number of questions, number of best answers -- all the direct measures. And so, for example, [inaudible] actually found that this very simple measure is better than a fancy PageRank kind of method or a [inaudible] kind of method at finding these experts in these systems. On the other hand, we were more inspired to address a question, which is: why do people select questions for answering? And we formulated this concept of question selection bias, and the idea here is that for a person to give an answer to a question there are several different factors. One is that the person should have the knowledge to answer this question -- this is actually more true for experts in general -- and the availability of that question, and so on. But one important factor is that there should be scope of contribution on the question. For example, in the previous example that I just showed, since C already answered the question, if some other expert knew the answer, there is no point for him to answer it again. It would be considered a duplicate answer. So scope of contribution is actually quite important. We modeled the scope of contribution with a concept called existing value, which is: when a person selects a question for answering, how much value existed on the question already? And this value is just determined by the previous answers on the question. It's just a function of the words of the previous answers and the status of those answers. The status could be good answer, best answer, helpful answer, and so on. And we use this existing value to compute a probabilistic selection preference for a user. The idea is that this probabilistic preference would say, okay, this person actually preferred selecting questions which have low existing value on them, versus this person prefers selecting questions which have high existing value on them. And the hypothesis here is that an expert, who would try to maximize the contribution they are making and the value they get out of it, would prefer selecting questions with lower existing value on them disproportionately more than all the other users. So we used this, and we had two datasets. We used the Intuit dataset, which is a Q&A service for tax related questions. We used the complete [inaudible] dataset, and the good thing with this dataset is that it came with a gold labeling of experts. This labeling was done by Intuit based on the tax experience of the people and several other measures to find these experts and label them. And these people were actually marked as experts in the system to other users as well. And then we used a Stack Overflow dataset, which is a popular Q&A service for computer science questions. It didn't come with a labeling of experts, so we used a synthetic labeling of these experts. We also tried manual labeling of these people. So if we look at the average selection bias of experts and ordinary users in this labeled dataset -- >>: What is the synthetic labeling? >> Aditya Pal: For the synthetic labeling we actually used the prior set of models that say that the number of best answers is a good indicator of expertise. We used the number of best answers to label experts. >>: A baseline against which you can -- >> Aditya Pal: Yes. So the synthetic labeling is basically just for robustness.
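As a rough illustration of the existing value and selection preference ideas described above, here is a minimal sketch. It assumes a simple status-weighted form for a question's existing value and defines a user's preference as the fraction of their selected questions that were low-EV at selection time; the talk only says EV is a function of the previous answers' words and status, so the specific status weights and the low/high threshold below are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed status weights: how much value an existing answer of a given
# status contributes to a question's "existing value" (EV).
STATUS_WEIGHT = {"best": 3.0, "helpful": 2.0, "good": 1.0, "plain": 0.5}

def existing_value(prior_answer_statuses):
    """EV of a question at selection time: a function of the answers
    already posted and their status (sketch of the idea in the talk)."""
    return sum(STATUS_WEIGHT.get(s, 0.5) for s in prior_answer_statuses)

def selection_preference(selections, low_ev_threshold=1.0):
    """For each user, the fraction of their selections that were low-EV.

    selections: iterable of (user_id, prior_answer_statuses_at_selection).
    Experts are hypothesized to pick low-EV questions disproportionately
    often, since those leave more scope for contribution.
    """
    low, total = defaultdict(int), defaultdict(int)
    for user, prior in selections:
        total[user] += 1
        if existing_value(prior) <= low_ev_threshold:
            low[user] += 1
    return {u: low[u] / total[u] for u in total}

# Toy usage: user "a" mostly answers untouched questions, user "b" piles on.
selections = [
    ("a", []), ("a", []), ("a", ["plain"]),
    ("b", ["best", "good"]), ("b", ["helpful"]), ("b", []),
]
print(selection_preference(selections))   # {'a': 1.0, 'b': 0.33...}
```

These per-user preference values can then be used as features alongside the direct baseline measures (number of answers, best answers, and so on) in a boosted classifier, which is the kind of combination behind the improvement over the best baseline discussed next.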
We also tried manual labeling: we sampled 100 users and then tried to label who the experts among them were, using an expert person. All right. So if we look at the average selection bias for users in the Intuit dataset, what we see is that there is a high propensity for experts, on average, to select low existing value questions, and again, it is low for ordinary users. Ordinary users, you would see, are even more inclined to answer a high EV question, whereas an expert is actually less inclined to answer a high EV question. So, okay, this sounds good, but this is just on average. Can we actually use this to classify and find out who is an expert in the system? And now remember that we are just looking at how and when a person selects a question for answering. We do not even look explicitly at the contribution that person is making. We are just looking at: this person selected this question at this instance. We do not even look at what kind of answer that person gave. So just using this existing value concept, we tried to find out whether we can find experts in the system or not. Okay. And we used several classification models, and boosting actually performed quite nicely for these datasets. So if we look at the performance of question selection bias, we see that in general it has a high accuracy, 76 percent, versus the baseline, and I think this is true for both the datasets. In general, for example, in this case it has low recall, and in this case it actually beats the best baseline, which is the prior state-of-the-art model that we had. But, on the other hand, one interesting result that we have here is that if we combine the best baseline, the best prior state-of-the-art model, with question selection bias, we're actually able to improve the performance quite significantly, in both conditions. And the kind of improvement that we get over the best prior state-of-the-art model is at least a 42 percent improvement. So this sounds good. So we can actually find experts in question and answering spaces. >>: What's the total number of experts you're finding? >> Aditya Pal: So in this case, for the Intuit dataset, it's less than 100. For Stack Overflow it's around 2,500. Probably I can go back. So 83 experts here, and this is the total number of users; and in this case this is the total number of users, and this is around 2,900 experts. So yes, we can find experts in these systems, but the important question here is: can we find these experts early on, when they have just joined the system? Within a few weeks of their joining, can we find that, yes, this person has the potential of becoming an expert -- maybe not now, maybe sometime in the future? Can we find them right now and provide the measures to nurture and foster their contributions? Again, to motivate this, let's go back to that example. We had this confidence in C's answer because C looked to be a top contributor. But what if C was not a top contributor? Say this was the first or second answer that C had posted in the community. In this case, probably my confidence in this answer might be slightly reduced if I had to take some serious advice, such as to call the [inaudible]. So basically, in order to find out whether this person has expertise or potential, it could be quite beneficial if you could demarcate that this person looks like he has some potential -- maybe not an expert right now, but maybe sometime in the future.
And there are several benefits to this, such as helping retain these key drivers of the community. Understand that these are the key answer people in the system, and it's because of them that the answering actually survives in these systems, so it's quite important to retain them and provide them some measures so that they can be nurtured. So we said that the qualities a potential expert should have are: he should have high motivation, that is, willingness to help others, and he should have ability, that is, the quality of help that person provides. And to measure these two qualities, we had several abstract indicators, such as the quantity of contribution, the frequency of contribution, and so on. For example, to measure ability, we tried to measure the domain knowledge the person has: we looked at the number of best answers that person is giving and the trustworthiness of the user's answers. And for politeness and clarity we had things like how many typos this person is making, spelling mistakes and so on and so forth. So the approach to finding these potential experts is that we looked at the labeled experts and we looked at their performance when they were new in the system. Basically these experts now serve as a benchmark for the new people who have joined, and we say that if these new people match up to the potential of these experts when they were new, then probably these people have the potential of becoming experts in the future. So we model the behavior of experts when they were new in the community and we find out how many new users have a similar match-up with these experts. And we use a classification and a ranking model. >>: When you say new -- almost all the other things you showed on the previous page look like attributes that you would expect someone to have been around to have. You count how long they've been around, and you counted the number of best answers, number of questions answered and frequency of answering, right? >> Aditya Pal: Yes. But if we look at some of these attributes, such as how many answers the person gives in a session, it need not necessarily be related to how long this person has been in the system. So, say, for example, how many times this person is logging into the system or how long their login span is. So probably even for someone who is new but highly motivated, these values should be higher. Of course, I agree that the number of questions and answers that this person has given would be low. But then remember, we are trying to measure the experts when they were new in the system and not looking at their contribution as of now. >>: Can you distinguish between experts who stay and experts who leave or quit? Some experts come into the system, they show up for a month and then disappear for a while. Some experts come into the system and stick around for three years. Can you distinguish between the two types of experts? >> Aditya Pal: In this case, no, because we're actually looking at the first few days of their participation. So we're not looking at what happened after, say, one month. We're not bucketing based on first month, second month, third month like that; we're just looking at the first few weeks of their participation and trying to estimate whether they have the potential of becoming experts in the future or not. So, yeah. All right. So we called the new joinees false positives if they match up to the potential of the experts in the system.
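A minimal sketch of this "benchmark new joinees against the experts as they looked when new" idea: compute motivation and ability indicators over only the first two weeks of each user's activity, train on the experts' early windows versus ordinary users' early windows, and then score the new joinees. The 14-day window and the kinds of indicators come from the talk; the particular classifier (gradient boosting), the event representation, and the feature helper are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def early_window_features(events, window_days=14):
    """Motivation/ability indicators from a user's first `window_days`
    of activity. `events` is a list of dicts with a 'day' offset from
    the user's join date plus per-event attributes (assumed schema)."""
    early = [e for e in events if e["day"] < window_days]
    n_answers = sum(e.get("answers", 0) for e in early)
    n_best = sum(e.get("best_answers", 0) for e in early)
    n_sessions = len({e["day"] for e in early})
    typo_rate = np.mean([e.get("typo_rate", 0.0) for e in early]) if early else 0.0
    return [n_answers, n_best, n_sessions, typo_rate]

def find_potential_experts(expert_histories, ordinary_histories, new_joinees):
    """Train on experts vs. ordinary users *as they looked when new*,
    then flag new joinees whose early window matches the expert profile."""
    X = [early_window_features(h) for h in expert_histories + ordinary_histories]
    y = [1] * len(expert_histories) + [0] * len(ordinary_histories)
    clf = GradientBoostingClassifier(random_state=0).fit(X, y)

    X_new = [early_window_features(h) for h in new_joinees]
    scores = clf.predict_proba(X_new)[:, 1]
    # Users flagged here are the "false positives" discussed in the talk:
    # not labeled experts (yet), but matching the experts' early behavior.
    return np.argsort(-scores), scores
```

Sorting new joinees by the resulting score gives one simple ranking; the actual ranking model used in the study may differ from this sketch.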
And we actually validated these false positives by asking Intuit to validate them. We used a classification model, a ranking model and a random model to do this validation, and Intuit came back and said that almost 77 percent of the new joinees found by the classification model have expertise -- I mean, they have the potential of becoming experts in the future. And then using the ranking model we found that 51 percent of the people found outside the classification model had the potential to become experts in the near future. Similarly, for the random model, among people who had given more than nine answers in the community, 25 percent of them had the potential of becoming experts. And then a longitudinal analysis showed that these people were roughly nine to ten times more active after one year. So this is just looking at 14 days of their participation in the system, and from that we were able to estimate people who turned out to be roughly nine to ten times more active, whereas several of the other people actually left the system altogether. So to begin with, these people had high motivation and ability and they continued showing that in the system, and probably some encouragement to these people can go a long way in improving their participation and invigorating them. And so we found these potential experts. What do we do with them? >>: So here in this case you identified some new people, and you had expert judges grade them. But you do have a label set, too, so you could do all the experiments as well, right? >> Aditya Pal: In this case we don't need that, because these are not the people who are labeled as experts. These are the people who are not labeled as experts at all. These are the new joinees. What we did is we took all the experts that are labeled and looked at their participation when they were new in the community, and then we looked at these new joinees who had spent two weeks in the system and tried to see how many of the false positives we could find. So these people would be called false positives in this case. So we don't need to do cross validation in this case. >>: What I'm saying is you could take a large number of people and their first few weeks of time, including some who are experts, hide the labels, run your algorithm on it, and see the precision and recall. >> Aditya Pal: Exactly. Exactly. >>: You could do that. So these aren't separate identifications of expertise by -- >> Aditya Pal: No, these are just those people whom the model thinks of as experts but who are not labeled as experts. We call them false positives. There are false positives and false negatives also; false negatives are the ones labeled experts but called nonexperts by the model. So we're just looking at the false positives here. And those are also not very interesting to look at, because, for example, what Intuit has is several labeled experts who have not given many answers. The way they estimate their expertise is that they look at their prior experience. So many of these people have worked as tax consultants. We do not have that data; they anonymize everything and don't give us their about-me, so we don't know that. Surprisingly, a few of the experts gave, say, one answer and they were experts. We were wondering why these people became experts to begin with. Anyway, the key thing is that, yes, we have found these potential experts. What can we do with them? We actually ran one socialization experiment.
And the idea here is: can we actually engage them, and can we provide them a sense of responsibility, so that their participation can improve? So what we did is that they were now allowed to answer questions -- prior questions that had been posted on the site. They would give answers for the same question, and they would grade each of those answers for tax accuracy, clarity and politeness of the response. And based on how much participation they had, they were given a silver badge or a gold badge and so on. So these were mechanisms by which they could see, okay, how you can improve by looking at other people's answers -- okay, this person made this mistake -- and judge their own errors. So what we found is that, if you just look at the activity after one year, the people who received this treatment, their activity actually almost doubled compared to people in the control group. But the quality of the answers they gave didn't increase. Still, what we found out is that their activity actually doubled, which is in some sense -- >>: I was going to ask: is this an experiment you ran where Intuit said we'll change our website for them? >> Aditya Pal: It's an offline thing. Intuit didn't change their website. We invited these people -- Intuit contacted them and asked whether they wished to participate in the study, and they participated in that study. So this website was at our end. We pulled in the whole dataset from Intuit, we actually had a database, and we showed this to these people offline. It was not integrated into the Intuit system. >>: But this rating said that [inaudible] -- the thing they did over the course of the year: so the subjects that you got from Intuit, that they contacted, did they just go through this kind of thing once, or did they continue? >> Aditya Pal: Yeah, actually, yes, they used it for a few days; it was open for a week or so. And then after that it wasn't. It's a surrogate site; you can't use the site as an alternative to Intuit, because the answers you post will not be visible there. So we just -- >>: Was your control group people you didn't contact, or people you contacted but didn't put through this Web treatment? Is this like a self-selecting experiment where people who are more active are more likely to respond to your request? >> Aditya Pal: So the control group in this case would be those people who were found by the classification or the ranking model but whom we did not contact, or who did not respond to the invitation to participate in it. >>: The latter group could be problematic. That's the difference. >>: Yeah, was there any kind of way you could reason about whether people were more active because they're the kind of people who are willing to take the time to respond to your -- >> Aditya Pal: It could be the case. But then we also did an offline evaluation of the whole experiment that we did, and many people actually gave us feedback that they found it to be quite useful. What they found is that it gave them a sense of responsibility in the system. Initially they were like, okay, I'm giving an answer because I don't have anything better to do right now, but now they're like, this seems like serious business going on, and I think I should be more responsible in providing these answers. So we saw that some sense of responsibility got instilled in those people. But how scientific that is -- it's hard to validate that.
>>: Was there a change in their activity over time? >> Aditya Pal: We haven't looked at that. For change in activity, we just looked after one year and then saw this, because it's a tax-based system and it happens every year. So we have to wait for the tax cycle to find out how much it will increase or improve. >>: Can you explain again where the one year comes in? Because it sounded like people just did the experiment; it's not like they were using the system continuously. >> Aditya Pal: By one year, I mean one year on the original site, the Intuit website. >>: So looking at one year's worth of data. >> Aditya Pal: No, no, after one year. We have to wait for one year because the tax cycle happens yearly, and it's only when the tax cycle happens that people ask questions on the site. Otherwise, for the rest of the time period, the site is probably not going that much. So there is less traffic on the website for, say, nine or ten months, and then there is this traffic within the first few months and then the next year. >>: So the participants in the study weren't using the website throughout the entire year. They just -- >> Aditya Pal: That would not happen, because there is no volume of questions on the site. >>: Were they using it throughout the two or three months of tax season? >> Aditya Pal: Yes, exactly. >>: So it wasn't just one time. >> Aditya Pal: Yeah. >>: I wanted to ask exactly when they used the system. Was it after April 15th of the previous year -- like, after tax season is over, now use our experimental system and rate yourself -- or was it before April 15th, when the site was currently active? >> Aditya Pal: It was before. I would say it was around, I think, February or something. >>: So right before. >> Aditya Pal: Yes, right before that. >>: That's probably a good time to do it. >>: So for that February to April of that year, that's not this data. You have February of the next year. >> Aditya Pal: Next year. >>: Wait. So if they used your system in February, then you're not measuring their change for that year but for the year afterwards? >> Aditya Pal: Yes, because it got mixed up with it. Because when you do it in February, the volume already starts happening; basically people start asking questions in February, around that time. >>: Can you imagine it would affect that year's -- is it differential -- let's call the year in question the experimental year. Are you looking at the experimental year, or the experimental year plus one? >> Aditya Pal: Experimental year plus one. >>: Even the experimental year is going to be affected to some degree, because you're doing this at the beginning of it. >> Aditya Pal: But one strong motivation to do plus one is that we wanted to see how many people actually return, come back. >>: It seems like, rather than plus one, maybe you should look at experimental year minus one versus experimental year plus one. >> Aditya Pal: The problem with experimental year minus one is that many of these people didn't even exist, because we're just looking at the new joinees, so these people are new in the system. They've just opened their accounts, 14 days old or something of that kind. So experimental year minus one would be zero for these people. All right. So I think I can skip this a little bit.
So to conclude this part: what we found is that experts can be identified in these online spaces -- I showed we can do this in Q&A and we can do this in micro blogs -- and we can find these people in near real time for large scale datasets. More importantly, we can find the users who have expertise quite early on. I also showed that with one very simple measure, one very simple socialization experiment, their participation can be improved and they can be nurtured and retained in the community. And this is actually the key focus for me going forward as well. I wish to understand the modern information systems that we have, and since users are at the core of these systems, I wish to understand what their intrinsic properties are, what biases they exhibit, what kind of selection preferences these people have, what roles they play in the community, what kind of facets they have, and how they evolve over time. And at the same time, I want to understand their social dynamics, such as how they interact with one another and how they influence each other. The biggest challenge that we have is that we have a huge amount of data to handle. At the same time the diversity in the kind of people we have is increasing, and it makes user modeling a very hard task. We have people from different geographic locations, with different languages, different mentalities and several other differences. And one more important challenge is that there's cross-domain synergy. Nowadays people have multiple accounts: they have an account on a micro blog site, they have one on social networks. As a result, anything they do in different domains links these domains together. So given this online footprint of a user, how do we actually understand and model the whole data? And this adds to the complexity of the whole problem. With this, I have a few concrete goals listed over here. What I wish to do in the future is to build tools and algorithms that would enable identifying users who fill a specific role within a community and across different communities. The kind of role that I have so far looked at is expertise, but a user can play several different roles. A user can be an evangelist, a user can be just an ordinary contributor, a random surfer, and so on. I wish to build these tools and mechanisms so these people can be identified quite effectively and early on, and so we can understand and adapt interfaces based on their taste, based on their style, based on what they actually desire from the community -- in a sense, just trying to improve the Web experience for these people. And one additional goal that I have is that I wish to improve the social search experience for these people. We have a huge amount of user-generated content with us, but still, in the search space, we do not see how well we are able to integrate it with search. That integration is not there. And the big challenge here is: how do we find really high quality user-generated content and infuse it into the search engines we have as of now? One interrelated aspect of it is that we have the complete footprint of a user. We have the user graph and we have the user's activity online in several different domains.
So can we use this to model the intent of a user better and actually improve retrieval for the search engines and improve the relevancy of the documents that we can retrieve for these people, and at the same time provide a unifying experience for the person? From a search engine standpoint, we have moved from the blue links to this much richer feature page. We now have images on this page, we have videos, we have weather information and so on. So whether the search page looks like this, fused in this manner, or in an aggregated-search manner in which we put modules together, or in some other fashion, the kinds of challenges that we have to handle are that the user is searching from different devices -- how do we adapt for different devices? -- and how do we do it based on geography, demographics, intent and several other features. And this would be the key focus for me in the future. So to summarize, in this talk I talked about my work in expertise discovery, but I have worked in online domains using machine learning and data mining to solve several other interesting problems, such as blog recommendation and integrating information on the Web from underlying sources, and I wish to keep doing machine learning and data mining in these interesting domains in the near future. And, finally, I'll acknowledge my advisor Joseph Konstan and, most importantly, Scott for giving me an opportunity to intern here and again for hosting me here. And with this, thanks a lot. [applause] Do you guys have any questions? >>: Going back to some of your Stack Overflow notes. Stack Overflow and Server Fault, where you were doing this, are very heavily gamified already. So as a user you're not allowed to respond to questions until you've developed some degree of reputation. The place where you put your answers varies depending on your degree of reputation and badges and all sorts of things. So how independent were your independent variables, I guess, is what I'm asking. In a less gamified system, would that have affected the way that -- >> Aditya Pal: In this case it's a highly gated system. So what we see is that many of these people get scared along the way. I've done some work on this, where we just looked at how people evolve over time. In that we saw that when Stack Overflow got announced, when it got released, during that time it was not that kind of system. And now, over time, what we see is that the ordinary users are actually scared of giving answers, and there are actually several questions posted over there on the site which say, okay, there's an expert here, I'm scared of this guy, because he's answering everything in my topic. What do I do? Should I answer or not? Will I look stupid or something? I mean, this is an open ended question, and probably a good answer to this -- what I thought of -- is that a system which allows you to make an anonymous contribution could probably be beneficial. And if many people think that your contribution is actually good, probably you can then reveal your name. So in this case it would benefit a new joinee, save them from the embarrassment of giving a bad answer or something. And probably it can help them in the longer run in boosting their confidence and so on. That's one way to look at it, yeah. Did I answer your question to some extent?
>>: Part of what I'm asking is, as you're measuring the development of experts in the system, were you able to take into account that at, say, your tenth answer that gets at least one upvote you get a badge? So I might expect an uptick in the number of people who get to that tenth answer, and then maybe a drop-off in the number of people who bother to do an 11th. So your curve of number of answers isn't quite smooth; it's got some funny bumps in it because of the way the reward system works. >> Aditya Pal: We haven't looked at that per se, but that would actually be interesting to look at. And I expect, yeah, as you said, there would be a bump; obviously that would be the case. I think some similar studies have found on Yahoo! Answers, looking at the way people become top contributors, if you look at the overall distribution, that it's a power-law distribution; it fades off. So people who reach level five expertise are very few; people who reach level four are slightly more. So I don't know. >>: There's research on eBay that finds the same kind of threshold. >> Aditya Pal: Probably. >>: Star changes are especially where people change their behavior. >> Aditya Pal: Probably. I think something similar is there in Wikipedia also. As you grow in experience and you actually own a page, you do not welcome any edits on that page either. So I don't know. >>: I wanted to ask if you could go a bit more in depth into some of the questions on the slide about modern information systems, the one right after conclusions. In particular, in the context of expertise and authorities, I was wondering if you could say a little bit about, across the systems that you've studied, how much of expertise -- in your definition of expertise -- is an intrinsic property of the user, and how much of it is about the role that someone plays in the community. And how much of it also, I guess to add on to this, is about what the person asking the question is looking for. How do you define expertise across these kinds of -- like how is -- >> Aditya Pal: So, for example, if I just talk about intrinsic properties, the thing called question selection bias is actually, in some sense, an intrinsic property of a person. Basically you're just trying to maximize the reward that you get for the effort that you are putting in, and this varies from one individual to another. So as a result, selection bias is an intrinsic property. We're not even looking at what the actual contribution is or how valued that contribution is, so in some sense this intrinsic property is, in this case, an indicator of the expertise of the person. Now, if you look at the social dynamics, the interaction: when you actually give an answer in a system, many people like your answer, they actually upvote it. So that is in some sense an influencing factor and also an indicator of how expert you could be. Similarly for micro blogs: if you posted a tweet and someone retweets it, it gives you a boost that at least a few people in the network are reading your tweets, and probably you would get more encouragement to post more tweets. But in any case that's an indicator of your expertise for that given topic, as per the judgment of other people in that community. So I guess -- did I miss any point anywhere?
So -- >>: I guess the last third of my question was about: when you say I'm looking for an expert, how much of that is the kind of expert that I'm looking for, how much of it is what the person is asking, like why I'm asking -- if I want to find the best results, for example the best micro blog answers to a question, or if I want to find the experts that are going to influence, as an advertiser. >> Aditya Pal: Exactly. >>: How does the definition change? >> Aditya Pal: It totally depends on the objective that we're trying to maximize. Say the objective is to find a person who knows this topic. In some sense this comes close to the actual true definition of expertise in the conventional domains we look at, and probably the most direct measures would be looking at how many best answers this person has given, or how many best answers over the number of answers this person has given. For example, if I assume that Albert Einstein is answering something, probably all his answers should be best answers, right? So in this case this ratio is actually important. But in cases where you have to find interesting content, you're not looking for expertise. This person might be wrong, he may be making jokes about it, but people are finding it interesting. For example, for oil spill we found out that there was some fake account that was posting actually humorous messages, and many people were liking it even though that person was posting fake news. That person didn't have domain expertise but turned out to be an extremely interesting person. So in this case you would actually measure something more extrinsic, which is basically how many people are liking this person's tweets. So basically different dimensions convey different meanings, and it depends on what we are actually trying to solve. If, for example, you have to measure influence, then we just look at, say, what is the likelihood that my influence would lead to some positive outcome in my network versus someone else's. And if you say we have to build a business strategy, then we have to see what the core set of people is that I should pick so that adoption is maximized in some sense. Then I would look for the most influential people, such that the outbreak of that influence would lead to mass adoption. So basically different strategies have to be deployed for different scenarios. >>: So Microsoft has several different kinds of social question and answer type forums, like social.msdn.microsoft.com. How would you fix those sites to increase participation and increase the quality of the answers? Or, given what you've learned about some of these other systems, what should Microsoft do to change their systems? >> Aditya Pal: So, for example, for MSDN -- does MSDN allow people to answer? No? >>: There's like a whole question and answer section. >> Aditya Pal: I haven't seen that. Anyway, probably one way is to look at where it is not doing things right. We have to first understand, okay, it is failing in this way or that way -- first understand what the key variables are because of which it is failing, and try to fix them. One very simple way to fix it is basically to advertise more about it. Probably give some incentive to people. And maybe if you turn out to be a good contributor, maybe you'll get a badge or some goody from Microsoft, or something of that kind.
You might get a one-year free subscription if you give lots of good answers -- something like that can be taken up. Other than that, I'm not sure, but maybe the interface is not very nice, and the interface could be improved; and maybe you could provide users with different applications, such as an application on a phone or something, so that they can participate more vigorously. And also look at what the needs of the people are. So, for example, I imagine for MSDN, maybe a tighter integration with Visual Studio might be beneficial. I'm not sure if it is there or not. But integration with Visual Studio, where you can actually post an answer, and if you don't find anything good you can fix things there. So something of that kind can be used, and there are several other small, small steps that can be taken to improve it. But first we have to try to understand why it is not doing very well. So, yeah, that has to be the first step. [applause]