>> Scott Counts: We're going to get started this morning. So today we have Aditya Pal
from the University of Minnesota, where he's advised by Joe Konstan.
Aditya is quite accomplished. In not even four years of graduate school he already has 12
publications to his name, with first-authored papers at the top venues that we
recognize: WWW, CIKM, ICWSM, CSCW, WSDM and others. So he is very
accomplished on the publication front.
I think his research is very relevant to all of us, because he works in this sort of
social information space. So much of the information we encounter these days is
embedded in social spaces: our news feeds, our Twitter streams, our Q&A sites and so
forth.
He's going to tell us about his work in this area today. So let's welcome Aditya.
[applause]
>> Aditya Pal: All right. Hi everyone. Thanks for joining in and thanks for the kind
words. So I'll actually talk about user expertise in online domains.
So the emergence of the social Web has given billions of people across
the globe the opportunity to create and share content online. For example, if you look at question
answering sites, they allow users to exchange knowledge in the form of questions and
answers.
And the kind of value that you get from these sites exceeds the kind of value that you
would get by going to an information specialist.
Blogs and micro blogs allow users to share their thoughts and ideas. And these mediums
have turned out to be extremely useful for information dissemination and mass
mobilization.
In fact, several of the great revolutions around the globe, such as the Egyptian revolt and the
London riots, have actually been facilitated by these mediums.
So if we look at the user-generated content online, some small facts: we see that there are
more than one billion answers on Yahoo! Answers alone, there are more than two billion
blog posts made every single day, and there are more than 250 million photos posted on
Facebook per day. So this is a huge amount of content that has been generated online.
As a result, it becomes more important and more challenging to find the top experts in
these systems. And we need to find these experts for several different reasons.
For example, we want to find high quality content for search and
information retrieval. We need to find top users for recommendations. We need to do topic summarization, and we need to do several other things such as
e-commerce and viral marketing. So my focus has been on identification of these experts
in online domains, especially in micro blogs and question answering systems.
So let me begin with my work in micro blogs, which was done with Scott Counts while I
was interning here at MSR. So in order to motivate why we need to find experts in micro
blogs, let us consider a very specific example.
Assume that I am trying to look for updates about a hurricane, and this is the typical search
result I get when I fire it at Bing Social. And if you look more closely at it, say at the
second update, it doesn't look highly informative about the topic
hurricane.
But on the other hand, if you look at the [inaudible] by the association, it looks highly
promising, because it looks like the association is a top authority on this
topic.
As a result, an interface which actually demarcates these top authorities and provides a score
for how authoritative this person is can be very beneficial for the end user, and it would
enable them to find the top content.
So the prior approach to finding these top experts in the micro blog space was proposed
by Weng, et al. What they actually do is they compute a user topic distribution
using latent Dirichlet allocation. So the idea here is they try to identify how topical a person is
within each topic, and this is a probabilistic model.
And once they have found the topic distribution, then they actually compute a weighted
graph of users, where the edge weight indicates how similar two users are over
topics.
And then they actually run PageRank and try to find top authorities. So there are
actually several issues with this approach. For example, one of them is how does this
approach handle the millions of authors that are posting in the micro blog space?
Because it's graph based, it uses PageRank and latent Dirichlet allocation, it's really hard for it
to scale to these millions of authors and handle them in any near real-time manner.
The second big issue with this is that for several of the topics that we look at these days,
authors might not even exist prior to that topic.
Say, for example, consider the topic Haiti earthquake. The author Haiti Relief Fund
didn't even exist before the earthquake actually happened. As a result we would not
expect Haiti Relief Fund to have a large number of followers. An algorithm which looks at how many
followers you have and looks at the graph properties might not be able to surface Haiti Relief
Fund. It's a similar case for iPhone reviews.
But the bigger problem with this approach is that it is actually prone to surfacing general
authorities, say, for example, [inaudible] authorities [inaudible]. These people need not
be highly topical or necessarily belong to one topic, but since they are followed by
millions of people, most of these algorithms end up surfacing
them.
So we need to avoid these celebrities and these general authorities. Last but most
important, we need to find these authorities in near real time, because remember that
most of these topics are so fast paced that things change at a minute-by-minute level.
And an algorithm which might take, say, days to run will not be able to keep up with the
pace with which the data is coming into the system.
So in order to handle these challenges, we propose a very simple approach. And this
approach can be understood through a simple pipeline. The first step is feature extraction.
Then we do clustering and then we do ranking.
So let me actually go into these steps in more depth. In the feature extraction phase we
try to extract several different features for a user such that they actually reflect the expertise
of that person.
And we have 11 to 12 features that we extracted for each user, and I'll just mention
a few of them. For text features we had, say, the topic signal. The idea here is that
we try to estimate how much this person posts within one topic versus posting in
different topics.
So imagine for a celebrity, the topic signal would actually be low, because they would
occasionally post one or two tweets or messages in one topic, but they need not be highly
topical.
Similarly for signal strength, which actually tries to estimate how many original posts they make
versus how many posts they actually take from the network and forward.
So the idea here is that for a highly topical person the signal strength should be high. And
similarly we have several graph features such as mention impact, which actually tries to
estimate how much this user is about a topic and how much the topic is about this user.
And then we have retweet impact, which actually tries to estimate how much of his or her
tweets have been retweeted by other people and how many unique users have
retweeted them.
So the idea of using this log scoring over here is because we
observed that several of the spammers had multiple accounts.
From one account they would post an update, and from the other account they would actually
retweet it. So you will see that these people have a large number of retweets, but the
number of unique users who retweet is very low. As a result this score will become zero
for these people.
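To make a few of these features concrete, here is a minimal sketch, assuming hypothetical tweet records with text, is_retweet and retweeted_by fields (illustrative names, not the actual schema, and the exact formulas in the talk may differ):

```python
import math
from collections import Counter

def author_features(author_tweets, topic):
    # author_tweets: list of dicts for one author.
    topical = [t for t in author_tweets
               if topic.lower() in t["text"].lower()]
    original = [t for t in topical if not t["is_retweet"]]

    # Topic signal: how much of the author's posting falls within this topic.
    topic_signal = len(topical) / max(len(author_tweets), 1)

    # Signal strength: original topical posts versus posts forwarded
    # from the network.
    signal_strength = len(original) / max(len(topical), 1)

    # Retweet impact: total retweets damped by the log of the number of
    # *unique* retweeters, so an author retweeted many times by a single
    # (possibly sockpuppet) account scores zero.
    retweeters = Counter()
    for t in original:
        retweeters.update(t.get("retweeted_by", []))
    unique = len(retweeters)
    retweet_impact = sum(retweeters.values()) * math.log(unique) if unique > 1 else 0.0

    return {"topic_signal": topic_signal,
            "signal_strength": signal_strength,
            "retweet_impact": retweet_impact}
```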
>>: How do you distinguish a topical tweet -- what distinguishes a tweet as a topical
tweet?
>> Aditya Pal: Basically if that topic term occurs in the tweet, then that tweet is topical.
>>: So possibly a hashtag use is a --
>> Aditya Pal: If it appears anywhere in the tweet. It need not be in the hashtag; it can
appear anywhere in the text of the tweet.
>>: Can you determine the list of topics?
>> Aditya Pal: So you can actually search for a topic. So basically let's say you pass in
hurricane. So hurricane is a search term. We look at all the tweets that contain
hurricane. I'll describe later what we actually do; we do more than that.
And we have a network score which tries to see how many followers this person has
versus how many friends this person has.
And we had a bunch of other features. For example, one of the features was
self-similarity: how much this person retweets his or her own tweets or posts the same
message again and again. Many times we see that people who are trying to spread
some message or post an ad post it again and again. Their self-similarity would be very
high, and they might be candidates for spammers.
So once we extract these features, we are actually in a high dimensional space, and
for each user we have a bunch of features. And what we do with that is we then
propose a probabilistic clustering. We use a Gaussian mixture model to cluster users. And
the idea here is that we wish to partition these people into naturally-occurring clusters for a
given topic.
And the reason why this is done is because we would like to eliminate all the ordinary
people, all the spammers and celebrities, from the true set of authorities.
And to do this successfully, what we do is we first pick core users that belong to a cluster
with a probability greater than .9. For example, the user that I have highlighted is a core
user and it belongs to the second cluster.
And this allows us to discard several of the anomalies: users who could be part of several
different clusters. So we discard such people, and we hope that many of the celebrities
and the spammers actually get eliminated because of this. They might be high on some
features but low on other features.
And then we actually try to find out which is the right authority cluster. And this authority
cluster we select based on how much topic signal the people in that cluster have on
average, how much network impact and mention impact, and so on.
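A rough illustration of this clustering step, using scikit-learn's Gaussian mixture model: the .9 core-user threshold and the idea of picking the cluster with the strongest average feature values follow the description above, while the rest (standardized features, the simple mean) is an assumption of the sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_authority_candidates(X, n_clusters=2, core_threshold=0.9):
    # X: one row per author, one column per (standardized) feature
    # such as topic signal, signal strength, retweet impact, mention impact.
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X)
    resp = gmm.predict_proba(X)          # soft cluster membership per author
    labels = resp.argmax(axis=1)

    # Core users: authors who belong to some cluster with probability > 0.9.
    core = resp.max(axis=1) > core_threshold

    # Pick the authority cluster as the one whose core members score
    # highest on average across the features.
    means = [X[core & (labels == k)].mean() if (core & (labels == k)).any() else -np.inf
             for k in range(n_clusters)]
    authority_cluster = int(np.argmax(means))
    return np.where(core & (labels == authority_cluster))[0]
```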
Now we have got a bunch of --
>>: Could you go back, what are the dots in the axis?
>> Aditya Pal: It's just to illustrate. Basically it's very high dimensional; we would be
in an 11-dimensional space right now. So this is just for illustration.
>>: How do you pick the core users -- so these are essentially [inaudible] of that cluster,
right?
>> Aditya Pal: Core users will basically be those that belong to that cluster with
probability greater than .9. So basically if you use a Gaussian mixture model, it would give you a
likelihood of how much this person belongs to this cluster versus how much it belongs to other
clusters.
>>: How do you initialize your priors for that?
>> Aditya Pal: I'm sorry?
>>: How do you initialize your priors for the iteration?
>> Aditya Pal: The prior is uniform in this case, or you can use some basic method, say
k-means first, to initialize the clusters and go from there. I mean, this is
because we're just trying to explore this high dimensional space. We do not know
anything about it; we don't even know how many clusters it should have. So we can use
something like the Bayesian information criterion to find out how many clusters it should have.
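A sketch of that model selection step, fitting mixtures of increasing size and keeping the one with the lowest BIC (the max_k bound and the scikit-learn API are assumptions, not part of the talk):

```python
from sklearn.mixture import GaussianMixture

def pick_n_clusters(X, max_k=10):
    # Fit GMMs of increasing size (k-means initialization, as mentioned
    # above) and keep the one with the lowest BIC.
    best_k, best_bic = 1, float("inf")
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, init_params="kmeans",
                              random_state=0).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```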
All right. So the final step is ranking, and the idea in the ranking is that we
wish to estimate how important this person is for this topic. For this part we
use the Gaussian cumulative distribution, and this measures the importance of a user A in
comparison to other users for a feature F.
And we actually take a weighted aggregate of this importance over all the features. So
essentially a high value of S_A is good. And we actually sort people based on their
scores and then pick the top people.
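Here is a minimal sketch of that ranking step, interpreting the Gaussian cumulative distribution as fitting a normal per feature and scoring each author by how far up that distribution they sit; the equal weighting is an assumption of this sketch:

```python
import numpy as np
from scipy.stats import norm

def rank_authors(X, weights=None):
    # X: authors x features. For each feature, fit a normal distribution
    # over all authors and take its CDF at each author's value: the
    # fraction of the modeled crowd the author lies above on that feature.
    n_features = X.shape[1]
    if weights is None:
        weights = np.ones(n_features) / n_features   # assumed equal weights
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-9                      # avoid division by zero
    importance = norm.cdf(X, loc=mu, scale=sigma)
    scores = importance @ weights
    return np.argsort(-scores)                        # author indices, best first
```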
So the dataset we use is all the tweets; we use 90 million tweets from 6/10/2010
[inaudible], that is June 10, 2010. In order to extract the topics, we considered
keyword-based extraction to get all the tweets that contain that particular key topic term.
And what we do is we look at how many hashtags a tweet has, how many URLs it has and what
kind of URLs it has, and we also bring in all the other related tweets that may not have
mentioned the same keyword but have the same hashtags or same URLs.
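A small sketch of that expansion step, assuming tweet records with text, hashtags and urls fields (illustrative names, not the actual pipeline):

```python
def extract_topic_tweets(all_tweets, keyword):
    # Seed set: tweets that mention the keyword directly.
    seed_idx = {i for i, t in enumerate(all_tweets)
                if keyword.lower() in t["text"].lower()}
    # Hashtags and URLs that occur in the seed set.
    tags = {h for i in seed_idx for h in all_tweets[i]["hashtags"]}
    urls = {u for i in seed_idx for u in all_tweets[i]["urls"]}
    # Related tweets: no keyword match, but share a hashtag or URL.
    related_idx = {i for i, t in enumerate(all_tweets)
                   if i not in seed_idx
                   and (tags & set(t["hashtags"]) or urls & set(t["urls"]))}
    return [all_tweets[i] for i in sorted(seed_idx | related_idx)]
```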
We experimented with several different topics, and some of the popular topics during the
time of data extraction were iPhone, oil spill and World Cup.
If you look at the results of the algorithm, what we see, for example for oil spill,
is that it returns NWF as a top authority. And apart from NWF you also see NOLAnews
as one of the top authorities. NWF stands for the National Wildlife
Federation, and NOLAnews is a New Orleans news agency. And these two
agencies had been extremely active around, or expert on, the disaster when it happened.
And at the same time it also returned several of the top reporting agencies such as
Huffington Post, Reuters and Time. In a sense it is able to return a mix of
highly topical people and at the same time discard some of the celebrities who
occasionally posted one or two tweets but didn't appear to be topical at all.
So it's able to return a mix of reporting agencies and people devoted to the topic. But this list is still
very subjective, and anyone can interpret it in a different manner.
So in order to evaluate more extensively how good this list of people is, we then
considered several other models. We used a graph-based model, which just
considers the graph properties along with PageRank and tries to see if it can find the
top authorities, a text-based model, and randomly selected people.
And for each of these models, we pick the top 10 authors by the model, and we pick the four
most recent topical tweets by these people.
And if you just look at the average number of followers of the users that are returned by
these models, what we see is that the authors returned by the graph model have roughly
one million followers on average, which actually indicates that most of these people
are celebrities, whereas in our case it is much less than one million.
So in some sense this is an indicator that ours is less sensitive to the number of followers of
an author, and it is actually trying to avoid returning some of the celebrities that have
posted occasionally on a topic.
And to further validate, we considered two screens. One is an anonymous screen to
evaluate these authors, where we showed the tweets of these people and asked users to
rate how interesting they find the tweets to be and how authoritative they find the user to
be. And then we have a non-anonymous screen in which the difference is that we showed the
user name of the person. The main idea here is to capture how good the content is on its
own versus whether the user name has any influence on the value of the content and the perception
of the raters.
And in most of the cases we found that the ratings based on the graph plus topical method are
statistically significantly better than all the other methods, which is an
indicator that we are able to find truly relevant people, and at the same time we are
able to find people who are generating really high value content.
But one important question that it leaves with us is how much of a bias there is when
we actually see the name of a person versus when we do not see the name of the person.
And in order to motivate this let me go back to the hurricane example. I initially said that
this tweet is probably not very interesting or not very useful.
And one of the reasons why I said that is because it has been posted by someone
unknown. But what if this tweet had been posted by someone very well known, say
Anderson Cooper? The perception might have changed, and we might think this
looks like a motivational tweet, that after a hurricane comes a rainbow or something.
And we also had the anonymous and non-anonymous evaluations with us. So we
actually tried to estimate how much of a bias creeps in because of the authority value of
the person. And when we actually did this, what we found is that in general the
ratings actually decrease when author names are shown.
This is not what we were expecting. We were expecting that the ratings should increase,
because we picked top authorities. So we then tried a deeper analysis of it. We
clustered authors on the anonymous ratings into good, average and bad. The
good people had very high anonymous ratings, the average were slightly in between, and the
bad had low anonymous ratings.
And now when we actually measure these effects, what we see is that for the good authors,
when the names are shown, the ratings actually increase. So this
probably is an indicator that their content is in any case very good, the content they are
producing, and at the same time the name is also reflective of some recognition by the
readers. As a result, when the names are shown, there is a positive boost in the ratings
in the non-anonymous condition.
>>: How do you define good, average and bad for this?
>> Aditya Pal: We actually clustered people based on the ratings they got, and we
clustered on the anonymous ratings. So the anonymous ratings try to estimate how good the
content is, irrespective of who is actually producing the content.
>>: Greater than [inaudible] the content rating.
>> Aditya Pal: Yes, content rating. That's the anonymous one, just on the content. And
the average people didn't appear to be very bad. I mean, they're slightly worse than good,
but they're not very, very bad, I would say.
But when the names are shown, their ratings actually decrease significantly. So this is,
again, an indicator that probably the readers are not able to recognize these people. As
a result their ratings get hurt significantly even though the content is not very bad.
>>: These other people -- you said you showed only top authorities in this particular -- and
what was the filtering? Was that number of followers?
>> Aditya Pal: So these are the four models I illustrated previously. One uses just
the graph properties, the other uses graph plus topical properties, another just uses topical
properties, and the fourth one is a random one, which picks people randomly.
So we use these four models to actually compute this.
>>: That's not what I'm asking. I'm asking about the set of people -- not
the subjects that look at this, but the tweeters whose tweets you judged with this
method. Were they already preselected to be famous people and
authorities, or were they just taken from all over the set?
>> Aditya Pal: They were not given to me.
>>: So they were just from all over the set. So the average tweeter is probably not very
recognizable.
>> Aditya Pal: Probably, yeah.
So all right. For the bad authors, the name hurts a lot. So overall what this indicates is
that if --
>>: Can I add one more question? The people for each category, good, average and bad, how many were in each?
>> Aditya Pal: So, yes, they're roughly equal. We had 40 authors. We had three groups, so
it's less than 15 for each of them. But they were almost equal. So the key --
>>: So the 4.2 and 4.3 for the authors, was that a significant difference?
>> Aditya Pal: Yes, it was.
>>: What was the range like when you went from the anonymous to the non-anonymous --
did everyone get a boost? Did some people go down? What was the shape of the curve
after you --
>> Aditya Pal: For them, I mean, all of them -- most of them went up.
>>: Most --
>> Aditya Pal: Yes, statistically significant.
>>: When you revealed the names of the authors of the tweets, those were truthful
names?
>> Aditya Pal: Yeah.
>>: Did you try seeing if a name that people seemed to like could be applied to a bad
tweet and make it, boost its popularity to see whether it's the name versus the content?
>> Aditya Pal: I think [inaudible] has done some work on this, some extensive work
on this. And I think they have found that it can be done that way.
So you can positively swing it if you actually show a different name on the content.
>>: Especially names that seem topically relevant -- like if you name your spam account
iPhone Expert, names that are related to the topic of your tweet, people will regard your
tweet more -- well, so [inaudible].
>> Aditya Pal: Right. Anyway, I don't remember what I wanted to say here. Anyway,
there were other factors of bias. For example, what does that name reflect? If the name
reflects a gender, say a man, then in this case we found that they got higher ratings
than names reflecting a woman.
And if the name reflects an organization, such as CNN, they had higher ratings
than a name that just reflects that this is a human being who is posting these tweets.
And we also saw that topical names sometimes get a boost in the ratings. For
example, a name like iPhone Reviews will have a higher rating for the topic iPhone versus
a name that has nothing to do with iPhone. And this is just the name. There are several
other signals: you can show the complete "about me" of the person, you can show the
URL and the complete data about that human being.
And you can actually bias, shift and boost these ratings in several different ways.
Anyway, to summarize thus far, what we found is that the algorithm was able to discover
more interesting and authoritative authors in general.
And we saw that name value can be used to bias and positively shift the perception of the
end users, and there are several other signals that Morris and Counts
have tried to find out. So an algorithm that can take all of these features into
account, and at the same time estimate the expected chance that the end user would
recognize the people in the list of authorities it has found, can actually positively boost
the results from an end user's perspective. And this can be quite useful in scenarios where you have a lot of
information about the end user. You have the social graph of the person, and then
you can actually use that to compute which authority is closest
to this person, so that the chance that he or she would recognize them is highest.
In that case, you can influence the perception of the end user. And last, but
importantly, what we found is that the algorithm runs in near real time for large scale
datasets, and it is easily implementable on a MapReduce kind of framework. So it makes
it quite an attractive choice to use in situations where you have a humongous amount of
data, from a production perspective.
So let me now move on to the concept of expertise in question answering systems.
This is the work that I did with my advisor at the University of Minnesota.
So in order to motivate expertise in question answering systems, let me take a very
simple example. Here a mom is concerned about her son stealing from her. So basically
you see that people actually depend on question answering systems for all kinds of things.
This question is one serious instance, and there are several instances where people ask
for medical advice or immigration help. In fact, I have also done that. You would actually
want to estimate whether the advice that you are getting is good or not.
So if you look at the answers by A and B, it doesn't look like they're willing to help this
person or are very serious about the problem. But if you look at the answer by C, it
looks like this person is actually quite serious about solving the problem, and he might also
have seen a similar situation.
But other than that, we also see that this person is a top contributor as labeled by Yahoo!
Answers -- this is taken from Yahoo! Answers -- and this person is a retired bill collector.
So our confidence in the answer that this person has given boosts significantly.
As a result it's actually quite important to identify these experts in these systems and also
demarcate them in the interface with their expertise, so that from the
perspective of an end user the confidence in the advice actually increases.
And this is actually very useful when you ask for medical advice or immigration help or
some other serious topics. So the prior approaches to finding experts in these domains
actually look at the number of answers, the number of questions, the number of best
answers, all the direct measures.
And so, for example, [inaudible] actually found that this very simple measure is
actually better than a fancy PageRank kind of method or a [inaudible] kind of method at
finding experts in these systems.
On the other hand, we were actually more inspired to address a question, which is: why
do people select questions for answering? And we formulated this concept of question
selection bias. The idea here is that for a person to give an answer on a question
there are several different factors. One is that the person should have the knowledge to answer
this question -- this is actually more true for experts in general -- and the availability of that
question, and so on.
But one important factor is that there should be scope of
contribution on the question. For example, in the example that I just
showed, since C has already answered the question, if some other expert knew the answer,
there is no point for him to answer it again. It would be considered a duplicate answer.
So scope of contribution is actually quite important.
So we modeled the scope of contribution with a concept called existing value, which is:
when a person selects a question for answering, how much value existed on the question
already?
And this value is actually just estimated from the previous answers on the
question. It's just a function of the words of the previous answers and the status of those
answers. The status could be that it's a good answer, a best answer, a helpful answer, and so on.
And we use this existing value to compute a probabilistic selection preference for a user.
The idea is that this probabilistic preference would say, okay, this person actually
prefers selecting questions which have low existing value on them, versus this person prefers
selecting questions which have high existing value on them.
And the hypothesis here is that an expert, trying to maximize the contribution they
are making and the value they get out of it, would prefer selecting questions with lower
existing value on them, disproportionately more than all the other users.
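A sketch of how existing value and the selection preference could be computed, with made-up field names and status weights (the talk only says it's a function of the words and status of the earlier answers):

```python
import numpy as np

# Status weights are hypothetical, not the values used in the talk.
STATUS_WEIGHT = {"best": 3.0, "helpful": 2.0, "good": 1.5, "plain": 1.0}

def existing_value(prior_answers):
    # Value already sitting on a question at the moment a user selects it.
    return sum(len(a["text"].split()) * STATUS_WEIGHT.get(a["status"], 1.0)
               for a in prior_answers)

def selection_preference(user_selected_evs, all_question_evs, n_bins=5):
    # Probabilistic selection preference: how the user's selected questions
    # distribute over existing-value bins, with bin edges taken from the
    # overall question pool. The hypothesis is that experts are skewed
    # toward the low-EV bins.
    edges = np.quantile(all_question_evs, np.linspace(0, 1, n_bins + 1))
    counts, _ = np.histogram(user_selected_evs, bins=edges)
    return counts / max(counts.sum(), 1)
```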
So we used this, and we had two datasets. We used the Intuit dataset, which is a Q&A service
for tax related questions. We used the complete [inaudible] dataset, and the good thing with
this dataset is that it came with a golden labeling of experts.
And this labeling was done by Intuit based on the tax experience of the people and
several other measures to find these experts and label them. And these people
were actually also shown as experts in the system to other users.
And then we used the Stack Overflow dataset, which is a popular Q&A service for computer
science questions. It didn't come with a labeling of experts, so we used a synthetic
labeling of these experts. We also tried manual labeling of these people.
So if we look at the average selection bias of experts and ordinary users in this labeled
dataset --
>>: What is the synthetic labeling?
>> Aditya Pal: So for the synthetic labeling we actually tried using the prior set of models that
say the number of best answers is a good indicator of expertise. We used the number of
best answers to label experts.
>>: A baseline against which you can --
>> Aditya Pal: Yes. So the synthetic labeling is basically just for robustness. Then we
also tried manual labeling: we sampled 100 users and then tried to label who the experts
among them are, using an expert person.
All right. So if you look at the average selection bias for users in the Intuit dataset, what we see
is that there is a high propensity for experts on average to select low existing value
questions, and again it is low for ordinary users. Ordinary users, you would
see, are even more inclined to answer a high EV question, whereas an
expert is actually less inclined to answer a high EV question.
So, okay, this sounds good. But this is just on average. Can we actually use this to
classify and find out who is an expert in the system? And remember that we are just
looking at how and when a person selects a question for answering.
We do not even look explicitly at the contribution that person is making. We are just
looking at the fact that this person selected this question at this instant. We do not even look at
what kind of answer that person gave.
So just using this existing value concept, we tried to find out whether we can find experts in
the system or not. Okay. We used several classification models, and boosting
actually performed quite nicely for both datasets. So if we look at the performance of
question selection bias, we see that in general it has a high accuracy, 76 percent, versus the
baseline, and I think this is true for both datasets.
And in general, for example, in this case it has low recall, and in this case it actually
beats the best baseline, which is the prior state-of-the-art model that we had.
But, on the other hand, one interesting result that we have here is that if we combine the
best baseline, the best prior state-of-the-art model, with question selection bias, we're actually
able to improve the performance quite significantly in both conditions.
And the kind of improvement that we get over the best prior state-of-the-art model is at least
42 percent. So this sounds good. So we can actually find
experts in question answering spaces.
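One plausible way to set up the combined model just described, concatenating baseline features with the selection-bias features before a boosted classifier; this is an illustrative setup, not the exact protocol behind the reported numbers:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def evaluate_expert_classifier(baseline_features, bias_features, is_expert):
    # baseline_features: e.g. per-user counts of answers / best answers.
    # bias_features: the per-user selection-preference histogram from above.
    # The 5-fold F1 evaluation is an assumption of this sketch.
    X = np.hstack([baseline_features, bias_features])
    clf = GradientBoostingClassifier(random_state=0)
    return cross_val_score(clf, X, is_expert, cv=5, scoring="f1").mean()
```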
>>: What's the total number of experts you're finding?
>> Aditya Pal: So in this case, for the Intuit dataset it's less than 100. For Stack Overflow
it's about 2,500. So probably I can go back. So 83 experts here, and this is the total number of users.
And in this case this is the total number of users and this is 2,900 experts.
So yes, we can find experts in these systems. But the important question here is: can
we find these experts early on, when they have just joined the system? Within a few weeks of
their joining, can we find that, yes, this person has the potential of becoming an expert,
maybe not now, maybe sometime in the future?
So can we find them right now and provide the measures to nurture and foster their
contributions? Again, to motivate this, let's go back to that example.
So we had this confidence in C's answer because C looked to be a top contributor. But what
if C was not a top contributor? Say this was the first or second answer that C had posted
in the community.
In this case, my confidence in this answer might be slightly reduced
if I have to take some serious advice, such as to call the [inaudible]. So basically, it could be quite beneficial if you could demarcate that this person looks like they have
some potential, maybe not an expert right now, maybe sometime in the future.
And there are several benefits of this, such as helping to retain these key drivers of the
community. Remember, these are the key answer people in the system, and it's
because of them that the answering actually survives in these systems, so it's actually
quite important to retain them and provide them some measures so that they can
be nurtured.
So we said that the qualities a potential expert should have are high
motivation, that is, a willingness to help others, and ability, that is, the
quality of help that person provides.
And to measure these two qualities, we had several abstract indicators such as the
quantity of contribution, the frequency of contribution, and so on.
And, for example, to measure ability, we just tried to measure the domain knowledge the
person has. We looked at the number of best answers that person is giving
and the trustworthiness of the user's answers. And for politeness and clarity we had things like how
many typos this person is making, spelling mistakes, and so on and so forth.
So the approach to finding these potential experts is that we looked at the labeled experts
and we looked at their performance when they were new in the system.
So basically these experts now serve as a benchmark for the new people who have joined,
and we just say that if these new people match up to the potential of these experts when
they were new, then probably these people have the potential of becoming experts in the
future.
So we model the behavior of experts when they were new in the community and we find
out how many new users have a similar match-up with these experts. And we use
classification and ranking models.
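A rough sketch of this benchmarking idea: train a classifier on what labeled experts versus ordinary users looked like in their first weeks and apply it to newcomers. The specific classifier and the 0.5 cutoff are assumptions; a ranking variant would instead sort newcomers by predicted probability.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def find_potential_experts(expert_early, ordinary_early, newcomer, newcomer_ids):
    # expert_early / ordinary_early: feature rows describing labeled experts
    # and ordinary users during their first weeks in the community.
    # newcomer: the same features for users who joined about two weeks ago.
    X = np.vstack([expert_early, ordinary_early])
    y = np.concatenate([np.ones(len(expert_early)), np.zeros(len(ordinary_early))])
    clf = GradientBoostingClassifier(random_state=0).fit(X, y)

    # Classification variant: flag newcomers whose early behavior matches
    # that of the experts when they were new.
    probs = clf.predict_proba(newcomer)[:, 1]
    return [uid for uid, p in zip(newcomer_ids, probs) if p > 0.5]
```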
>>: When you say new -- almost all the other things you showed on the previous page look
like attributes that you would expect someone who has been around to have. You
count how long they've been around, and you counted the number of best answers,
number of questions answered and frequency of answering, right?
>> Aditya Pal: Yes. So if we look at some of these attributes, such as how many
answers that person gives in a session, it need not necessarily be related to how long
this person has been in the system.
So say, for example, how many times this person logs into the system or how long
their login span is. Probably even for someone who is new but highly motivated,
these values should be higher. Of course, I agree that the number of questions and
answers that this person is giving would be low.
But then remember we are trying to measure the experts when they were new in
the system and not looking at their contribution as of now.
>>: Can you distinguish between experts who stay and experts who leave or quit?
Some experts come into the system, show up for a month, then disappear for a while.
Some experts come into the system and stick around for three years. Can you
distinguish between the two types of experts?
>> Aditya Pal: In this case, no, because we're actually looking at the first few days of
their participation. So we're not looking at what happened after, say, one month. We're not
bucketing based on first month, second month, third month; we're just looking at
the first few weeks of their participation and trying to estimate whether they have the potential
of becoming experts in the future or not.
So, yeah. All right. So we call the new joinees false positives if they match up to
the potential of the experts in the system. And we actually validated these false positives
by asking Intuit to validate them.
We used the classification model, the ranking model, and a random model to do this
validation, and Intuit came back and said that almost 77 percent of the
new joinees found by the classification model have expertise -- I mean, they
have the potential of becoming experts in the future.
And then using the ranking model we found that 51 percent of the people found outside the
classification model had the potential to become experts in the
near future.
Similarly, for the random users who had given more than nine answers in the community,
25 percent of them had the potential of becoming experts.
And then a longitudinal analysis showed that these people were roughly nine to
ten times more active after one year. So this is just looking at 14 days of their
participation in the system, and from that we were able to see that they were roughly nine
to ten times more active.
Several of these people actually left the system altogether, though. So to begin with,
these people had high motivation and ability and they actually continued showing that in
the system, and probably some encouragement to these people could
go a long way in improving their participation and invigorating them.
And so we found these potential experts. What do we do with them?
>>: So here in this case you identified some new guys, new people, and you had judges,
expert judges, grade them. But you do have a label set, too; you could do all the
experiments as well, right?
>> Aditya Pal: In this case we don't need that. Because, see, these are not the people who are
labeled as experts. These are the people who are not labeled as experts at all. These
are the new joinees. What we did is we took all the experts that are labeled and looked at
their participation when they were new in the community, and then we looked at these
new joinees who had spent two weeks in the system and tried to see how many of these
false positives we can find. So these people would be called false positives in this case.
So we don't need to do cross validation in this case.
>>: What I'm saying is you could take a large number of people and their first few
weeks of time, including some who are experts, hide the labels, run your algorithm on it,
and see the precision and recall.
>> Aditya Pal: Exactly. Exactly.
>>: You could do that. So these aren't separate identifications of expertise by --
>> Aditya Pal: No, so these are just those people who the model thinks are experts but are not labeled
as experts. We call them false positives. There are false positives and false negatives
also; false negatives are the ones labeled as experts but called nonexperts by the model.
So we're just looking at the false positives here. And it's also not very interesting to
look at the false negatives, because, for example, what Intuit does is they have several
labeled experts who have not given many answers.
And the way they actually estimate their expertise is that they look at their prior
experience. So many of these people have worked as tax consultants. We do not have
that data; they anonymize everything, they don't give us their "about me", so we don't know
that.
Surprisingly, a few of the experts gave, say, one answer and they were experts. We were
wondering why these people became experts to begin with.
Anyway, so the key thing is that, yes, we have found these potential experts. What can we
do with them? We actually ran one socialization experiment. And the idea
here is: can we actually engage them and can we provide them a sense of
responsibility so that their participation improves?
So what we did is that they were now allowed to answer questions, a few of the prior
questions that had been posted on the site.
And they would give answers for the same question and they would grade each of those
answers for tax accuracy, clarity and politeness of the response.
And based on how much participation they had, they were given a silver badge or a gold badge and
so on.
So these were actually mechanisms to try to see, okay, how you can improve by looking at
other people's answers -- okay, this person made this mistake -- and to judge your own answers.
So what we found is that if you just look at the activity after one year, the people who
received this treatment, their activity actually almost doubled compared to people
in the control group.
But the quality of the answers they gave didn't increase. So still, what we found is that
their activity actually doubled, which is in some sense --
>>: I was going to ask. Is this an experiment you ran where Intuit said we'll change our
website for them?
>> Aditya Pal: It's an offline thing. Intuit didn't change their website. We invited
these people -- Intuit contacted them and asked if they wished to participate in the study --
and they participated in that study, so this website is at our end. So we pulled in the
whole dataset from Intuit, and we actually had a database and we showed this to these
people offline. It was not integrated into the Intuit system.
>>: But this rating said that [inaudible] -- the thing they did over the course of the year,
so the subjects that you got from Intuit, that they contacted, did they use your site once to
go through this kind of thing, or did they continue?
>> Aditya Pal: Yeah, actually, yes, they used it for a few days; it was open for a week or
so. And then after that it wasn't. It's a surrogate site; you couldn't use the site as an
alternative to Intuit, because the answers you posted would not be visible there.
So we just --
>>: Was your control group people you didn't contact, or people you contacted but didn't
put through this Web treatment? Is this like a self-selecting experiment where people
who are more active are more likely to respond to your request?
>> Aditya Pal: So the control group in this case would be those people who were also
found by the classification and ranking models, but whom we did not contact, or who did
not respond to the invitation to participate in it.
>>: The latter group would be problematic. That's the difference.
>>: Yeah, was there any kind of way you could reason about whether people were more
active because they're the kind of people who responded, who are willing to take the
time to respond to your --
>> Aditya Pal: It could be the case. But then we also did an offline evaluation of the
whole experiment that we did. And many people actually gave us feedback that they
found it to be quite useful. What they found was that this gives them a sense of
responsibility in the system.
They were like, okay, I'm giving an answer because I don't have anything better to do right now,
but now this seems like serious business going on, and I think I should be
more responsible in providing these answers.
So we saw that some sense of responsibility got instilled in those people. But how
scientific that is -- it's hard to validate.
>>: Was there a change in their activity over time?
>> Aditya Pal: We haven't looked at that. For the change in activity, we just looked after one
year and then saw this. Because it's a tax-based system, it happens every year.
So we have to wait for the tax cycle to find out how much it will actually increase or
improve.
>>: Can you explain again where the one-year comes in, because it sounded like people
just did the experiment, it's not like they were using the system continuously.
>> Aditya Pal: By one year I mean one year on the original site, the Intuit website.
>>: So looking at one year's worth of data.
>> Aditya Pal: No, no, after one year. So we have to wait for one year because the
tax cycle happens yearly. And it's only when the tax cycle happens that people ask questions
on the site.
So otherwise, for the rest of the time period, the site is probably not getting that much traffic. So
there is less traffic on the website for, say, nine or ten months, and then there is this traffic
within the first few months of the next year.
>>: But the participants in the study weren't using the website throughout the entire
year. They just --
>> Aditya Pal: That would not happen, because there is no volume of questions on
the site.
>>: Were they using it throughout the two or three months of tax season.
>> Aditya Pal: Yes, exactly.
>>: So it wasn't just one time.
>> Aditya Pal: Yeah.
>>: I wanted to ask exactly when did they use the system? Was it after April 15th of the
previous year like after tax season is over now use our experimental system and rate
yourself, or was it before April 15th, like when the site was currently active?
>> Aditya Pal: It was before I think I would say let me say it was around, I think, February
or something.
>>: So right before.
>> Aditya Pal: Yes, right before that.
>>: That's probably a good time to do it.
>>: So for that February to April of that year, that's not this data. You have February -- the next year?
>> Aditya Pal: Next year.
>>: Wait. So if they used your system in February, then you're not measuring their
change for that year but for the year afterwards?
>> Aditya Pal: Yes, because it would get mixed up with it. Because when you do it in
February, the volume already starts happening. So basically people start asking
questions in February, around that time.
>>: Can you imagine it would affect that year's -- is it differential -- let's call the year in
question the experimental year. For the experimental year plus or minus one,
are you looking at the experimental year or experimental year plus one?
>> Aditya Pal: Experimental year plus one.
>>: Even the experimental year is going to be affected to some degree, because you're doing
this at the beginning of it.
>> Aditya Pal: But one strong motivation to do plus one is that we wanted to see how
many people actually return, come back.
>>: It seems like for plus one, maybe you should look at experimental year minus one
versus experimental year plus one.
>> Aditya Pal: The problem with experimental year minus one is that many of these people
didn't even exist, because we're just looking at new joinees, so these people are new
in that system. They've just opened their accounts, 14 days old or something of that kind.
So experimental year minus one would be zero for these people.
All right. So I think I can skip this a little bit. So to conclude this part, what we found
is that experts can be identified in these online spaces -- I showed we can do this
in Q&A and we can do this in micro blogs -- and we can find these people in near real time
for large scale datasets.
More importantly, we can find these users who have expertise quite early on. I also
showed that with one very simple measure, one very simple socialization experiment,
their participation can be improved and they can be nurtured and retained in the
community.
And this is actually the key focus for me going forward. I wish to understand the
modern information systems that we have, and since users are at the core of these
systems, I wish to understand what their intrinsic properties are, what biases they
exhibit, what kind of selection preferences these people have, what roles they play in
the community, what kind of facets they have, and how they evolve over time.
And at the same time, understand their social dynamics, such as how they interact with
one another and how they influence each other. And the biggest challenge that we have
is that there is a huge amount of data that we have to handle.
And at the same time the diversity in the kind of people we have is increasing, and it
makes user modeling a very hard task. We have people from different geographic
locations, different languages, different mentalities and several other
issues. And one more important challenge is that there is a cross-domain synergy:
nowadays people have multiple accounts. They have an account on a micro
blog site, they have one on social networks. As a result, anything they do in different
domains links these domains together.
So given this online footprint of a user, how do we actually understand and model the
whole data? This actually adds to the complexity of the whole problem. And with
this I have a few concrete goals listed over here.
So what I wish to do in the future is build tools and algorithms that would enable
identifying users who fill a specific role within a community and across different
communities.
The kind of role I have looked at so far is expertise, but a user can play several
different roles. A user can be an evangelist, an ordinary contributor, a random
surfer and so on.
And I wish to build these tools and mechanisms so these people can be identified quite
effectively and early on, and we can actually understand and adapt
interfaces accordingly, based on their taste, based on their style, based on what they actually desire
from the community.
And so in a sense it's just trying to improve the Web experience for these people. And one
additional goal that I have is that I wish to improve the social search experience for these
people.
So we have a huge amount of user generated content with us, but we still do not see
in the search space how well we are able to integrate it with search.
That integration is not there.
And the big challenge here is how do we find really high quality user generated content
and infuse it into the search engines we have as of now.
One interrelated aspect of it is that we have the complete footprint of a user. We
have the user graph and we have the user's activity online in several different domains.
So can we use this to model the intent of a user better and actually improve retrieval
for the search engines, improve the relevancy of the documents that we retrieve
for these people, and at the same time provide a unifying experience for the person?
So from a search engine standpoint we have moved from the blue links to this
much richer page. We now have images on this page. We have videos. We have
weather information and so on.
So whether the search page looks like this, fused in this manner, or in an aggregated
search manner in which we put modules together, or in some other fashion, the kind of
challenges that we have to handle are that users are searching from different devices.
How do we adapt this for different devices, and how do we do it based on geography,
demographics, intent and several other features?
And this would be the key focus for me in the future. So to summarize, so far in this
talk I talked about my work in expertise discovery, but I have worked in online domains
using machine learning and data mining to solve several other interesting problems, such
as blog recommendation and integrating information on the Web
from underlying sources, and I wish to continue doing machine learning and data
mining in these interesting domains in the near future.
And, finally, I'll acknowledge my advisor Joseph Konstan and, most importantly, Scott for
giving me an opportunity to intern here and again for hosting me here.
And with this, thanks a lot.
[applause]
Do you guys have any questions?
>>: Going back to some of your Stack Overflow notes. Stack Overflow and Server Fault,
where you were doing this, are very heavily gamified already. So as a user you're not allowed to
respond to questions until you've developed some degree of
reputation.
The place where you put your answers varies depending on your degree of reputation
and badges and all sorts of things. So how independent your measurements, your
independent variables, were, I guess, is what I'm asking. In a less gamified system would that
have affected the way that --
>> Aditya Pal: In this case it's a highly gated system. So what we see is that many of
these people get scared away. I've done some work on it, where we just looked at
how people evolve over time. In that we saw that when Stack Overflow actually got
announced, when it got released, during that time it was not that kind of system.
And now over time what we see is that the ordinary users are actually scared of giving
answers, and there are actually several different questions posted there on
the site which say, okay, there's an expert here, I'm scared of this guy because he's
answering everything in my topic. What do I do? Should I answer or not? I'll look stupid
or something.
I mean, this is an open ended question, and probably a good answer to this -- what I
thought of -- is that a system which allows you to make an anonymous contribution
could probably be beneficial. And if many people think that your contribution is actually
good, you can then reveal your name. So in this case it would benefit a new
joinee, save them from the embarrassment of giving a bad answer or something.
And probably it can help them in the longer run in boosting their confidence and so on.
That's one way to look at it, yeah. Did I answer your question to some extent?
>>: Part of what I'm asking is, as you're measuring the development of experts in the
system, were you able to take into account that at, say, your tenth answer that
gets at least one up vote you get a badge? So I might expect a tick in the count of the
number of people who get to that tenth and then maybe a drop-off in the number of
people who bother to do an 11th. So your curve of number of answers isn't quite
smooth. It's got some funny bumps in it because of the way the reward system works.
>> Aditya Pal: We haven't looked at that per se, but that would actually be interesting to look at.
And I expect, yeah, as you said, there would be a bump; obviously that would be
the case. I think some similar studies on Yahoo! Answers have looked at the way
people become top contributors: if you look at the overall distribution, it's a
power law kind of distribution, it fades off. So people who reach level five expertise are very
few; people who reach level four are slightly more.
So I don't know.
>>: There's research on eBay that finds the same kind of threshold.
>> Aditya Pal: Probably.
>>: Star changes are special points where people change their behavior.
>> Aditya Pal: Probably. I think something similar is there in Wikipedia also. As you
grow more experienced and you actually own a page, you do not welcome any edits on
that page. So I don't know.
>>: I wanted to ask if you could go a bit more in depth into some of the questions on
the slide about modern information systems, the one right after the conclusions. In particular,
in the context of expertise and authorities, I was wondering if you could say a little bit about,
across the systems that you've studied, how much of expertise, in your definition of
expertise, is an intrinsic property of the user, and how much of it is about the role that
someone plays in the community. And how much of it also, I guess to add on to this, is
about what the person asking the question is looking for. How do you define
expertise across these kinds of -- like how is --
>> Aditya Pal: So, for example, if I just talk about intrinsic properties, the thing called question
selection bias is actually in some sense an intrinsic property of a person. So
basically you're just trying to maximize the reward that you get for the effort that you are
putting in.
And this actually varies from one individual to another. So as a result, selection bias is
an intrinsic property. We're not even looking at what the actual contribution is or how valued
that contribution is, so in some sense this intrinsic property is, in
this case, an indicator of the expertise of the person. Now, if you look at the social
dynamics, the interaction: when you actually give an answer in a system, many people
like your answer.
They actually upvote it. So that actually is in some sense an influencing factor and also
an indicator of how expert you could be.
Similarly for micro blogs. If you posted a tweet and someone retweets it, it gives you a boost
that at least a few people in the network are noticing your tweets. Probably you
would get more encouragement to post more tweets. But in any case that's an indicator of your
expertise on that given topic, as per the judgment of the other people in that community.
So I guess, did I miss any point anywhere? So --
>>: I guess the last third of my question was about how -- when you say I'm looking for
an expert, how much of that is the kind of expert that I'm looking for, how much of it is
what the person is asking, like why I'm asking? If I want to find the best results, for
example, in a micro blog the best answers to a question, or if I want to find the experts that are
going to influence, as an advertiser.
>> Aditya Pal: Exactly.
>>: How is the definition change?
>> Aditya Pal: It totally depends on the objective that we're trying to maximize.
So say if the objective is to find a person who knows this topic, in some sense this
comes close to the actual, true definition of expertise in the conventional domains we
look at. And probably the most direct measures would be looking at how many best
answers this person has given, or how many best answers over the number of answers this
person has given. For example, if I assume that Albert Einstein is answering something,
probably all his answers should be best answers, right?
So in this case this ratio is actually important. But in cases where you have to find
interesting content, in that case you're not looking for expertise. This person might be
wrong. He may be making jokes about it, but people are finding it interesting.
For example, for oil spill we found that there was some fake account that
was posting humorous messages. And many people were liking it, even though
that person was posting fake news, many people liked it.
That person didn't have domain expertise but turned out to be an extremely interesting
person. So in this case you would actually measure something more
extrinsic, which is basically how many people are liking this person's tweets.
And so basically different dimensions convey different meanings, and it
depends on what we are actually trying to solve. So, for example, if you have
to measure influence, then we just look at, say, what is the likelihood that my influence
would lead to some positive outcome in my network versus someone else's. Basically, if
we have to build a business strategy, then we have to see what is the core set
of people I should pick so that adoption is maximized in some sense. Then I would look for
the most influential people such that the outbreak of that influence would lead to mass
adoption.
So basically different strategies have to be deployed for different scenarios.
>>: So Microsoft has several different kinds of social question and answer type forums, like
social.msdn.microsoft.com. How would you fix those sites to increase participation
and increase the quality of the answers? Or, given what you've learned about some of
these other systems, what should Microsoft do to change their systems?
>> Aditya Pal: So, for example, for MSDN -- does MSDN allow people to answer? No?
>>: There's a whole question and answer section.
>> Aditya Pal: I haven't seen that. Anyway, probably one way to look at it is where it is
not doing things right. So we have to first understand, okay, it is actually failing in this way or
the other way. So first understand what the key variables are because of which it is failing and try
to fix them. One very simple way to fix it is basically to advertise more about it. Probably
give some incentive to people.
And maybe if you turn out to be a good contributor, maybe you'll get a badge or some
goody from Microsoft or something of that kind. You might get a one-year free
subscription if you give lots of good answers; something like that can be tried.
Other than that, I'm not sure, maybe the interface is not very nice and probably the
interface could be improved, and maybe you could provide users with different
applications, such as an application on a phone, so that they can participate
more vigorously. And also looking at what the needs of the people are. So, for example,
I imagine for MSDN, maybe a tighter integration with Visual Studio might be beneficial.
I'm not sure if it is there or not.
But integration with Visual Studio, where you can actually post an answer, or if you
don't find anything good you can fix things there.
So something of that kind can be used, and there are several other small, small
steps that can be taken to improve it. But first we have to try to understand why it is not
doing very well. So, yeah, that has to be the first step.
[applause]