>> Jitu Padhye: Welcome everybody. We're going to have two talks today. It's a great pleasure to
welcome Professor Sue Moon from KAIST, and she's going to talk about Twitter and her recent analysis of
Twitter. It's been picked up by a number of news outlets, PC World, BBC and MI Direct Reports. And
Sanjin, her student, is going to give a talk on building routers out of graphics
processors, right? Okay. Thanks.
>> Sue Moon: All right. So I'm going to talk about my Twitter analysis. It's joint work with Haewoon
Kwak, Changhyun and Hosung Park, all my grad students.
So I think most of you have heard about Twitter, but I don't know how many of you actually use Twitter. Could
you please raise your hand if you use Twitter? I'll give a brief introduction to Twitter: it's micro-blogging, short
messages, with a 140-character limit per message. And you also read other people's tweets. Here I'm following New York
Times dining, so I get to read what the New York Times writes about dining, and then some other people.
Also, the Twitter home page, the typical Web page actually has a list of trending topics. Basically, it's top
10, top 20 words that appear in all the tweets that Twitter receives.
So you can also check out what other people are interested in, what other people talk
about. And I think you've heard about lots of events that made Twitter very popular. One of them is the US
Airways jet crashing into the Hudson River. The first photograph of the accident actually appeared on
Twitter before even any local news media got there.
One striking thing on Twitter is that in most online social networks, the friendship relationship, online
friendship, declaration of online friendship is actually mutual. I ask you to be a friend on Facebook and you
have to accept my invitation.
On Twitter, that's no longer the case. It doesn't have to be reciprocated. On Twitter, people use the term following.
So I can follow anybody and anybody can follow me. Basically, the relationship
does not have to be reciprocated.
>>: Can you stop somebody from following you?
>> Sue Moon: Yeah, actually Jitu did that yesterday.
>>: First time I [indiscernible] random in the first few minutes.
>>: You found her tweet?
>> Sue Moon: No -- yeah.
>>: Come sit for the talk. I said sign up. I want to make sure she didn't block me yesterday. I was one
of -- [laughter].
>> Sue Moon: He didn't know that was you, probably. I was there when he was blocking. We didn't know
it was you. Okay. So weird things happen on Twitter. But typically following on Twitter, it doesn't have to
be mutual.
And one important thing is people complain that people are not following me back. And if you expect
people to follow you back on Twitter, you are on the wrong medium. You should pick another medium,
because it's a free choice. And I really like that.
So following is basically subscribing to tweets. And I think you're aware of Twitter's
role in the Iran election. A lot of people on the street would tweet about what was happening on the street
when all the major news media in Iran were not reporting about it.
So the Iranian election and its aftermath were reported through Twitter. I have three
goals in this talk. Actually, it's a combination of two papers. The first two bullets are from our World
Wide Web paper. First, we analyze the relations on Twitter and see what sets it apart from
existing online social networks. Then we show why we consider Twitter as news media. And lastly we
propose a new ranking algorithm to identify influentials.
>>: Seems the differences lie in what the defaults are rather than something fundamental. Because, I don't
know, I hide like a bunch of my friends on Facebook, and that potentially also makes it unidirectional
communication. So it seems it's just the default that's different here.
>> Sue Moon: Right. Right. So I mean, we say Twitter is different in this aspect. But we're not saying that
Twitter is not a social network. We say Twitter has these traits or characteristics of news media, but we
cannot make a statement that Twitter is just an online social network or just news media. It has traits of
both.
So media is a means of communication. And when we say news media, we think of radio, TV,
newspapers, magazines. And basically the most prevalent feature is that they reach or
influence people widely. So for these three goals, we're going to show the following three
things. First, that following is mostly not reciprocated on Twitter. Just as Ratul pointed out, it
can be reciprocated.
It's basically on how people use it. And on Twitter we show that it's not used in the typical way as social
relationships would work.
And then three features of news media: users talk about timely topics, a few users reach a
very large audience directly, and you can see the impact that word-of-mouth
spreading of news has. And then we're going to present a new ranking algorithm based on effective
readership, and I'm going to explain effective readership later. So the data that we used for this work was
collected last year, between June and September.
At that time there were 41 million users. We got all the profiles that we could, so it's near complete. And
there were 1.47 billion following relations, and we have this dataset publicly available.
And in between June and September of last year, we basically followed all the trending topics and the
related tweets. So there were about 4200 trending topics in that time period, and 106 million tweets
mentioning the trending topics. And we removed spam tweets using Clean Tweets.
And Twitter is becoming a hotbed of all sorts of research. Lots of
conferences now have papers covering some phenomenon that we observe on Twitter: how to rank users,
how to predict movie profits based on tweets, how to recommend users based on
similarities in tweets, or how to detect real-time events. For example, there was a paper by Japanese scientists
showing that you could actually give a warning about an impending earthquake wave coming to you.
And there were like four papers with Twitter in their titles at the World Wide Web conference, but all of those papers
use a sample, a small subset of the Twitter data. So our paper is basically the first paper that looked at
the entire Twitter-sphere connectivity graph.
And I think Twitter is very interesting, because it is the first time that the word of mouth spreading is made
so easy to collect data about and analyze. And that's why the Library of Congress has decided to basically
put all the tweets on their archive. And I think that act itself represents the importance of Twitter data.
So part one: following is mostly not reciprocated. So why do people follow others? It's a reflection of online
social relationships. When we say social, it means some interaction. So it's got to be mutual.
Otherwise it's just one person listening to the other, as if we were listening to the news media of today:
one-way communication.
And sociologists look at reciprocity as an important feature of human life. So is following reciprocal? On
Twitter we see only 22 percent of user pairs follow each other, which is much, much lower than any
reciprocity reported for other services. On Flickr, it's 68 percent. On Yahoo! 360, 84 percent. And 77 percent for Cyworld
guestbook messages. Cyworld is the number one online social networking service in Korea. So on all these online
sites that we consider to have some sense of online social networking interaction going on,
the ratio of reciprocation is much higher than 50 percent. And on Twitter it's lower, 22
percent. And this agrees with my understanding of how I use Twitter and how most people use
Twitter, actually.
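The reciprocity figure quoted here can be illustrated with a minimal sketch. It assumes reciprocity means the fraction of connected user pairs (pairs with at least one following link) in which both users follow each other; the talk does not spell out the exact definition. The function name `reciprocity` and the toy edge list are hypothetical and are not taken from the speakers' code or data.

```python
def reciprocity(follow_edges):
    """Fraction of connected user pairs whose following is mutual.

    follow_edges: iterable of (follower, followee) tuples.
    Assumption: a "pair" is any two distinct users joined by at least one link.
    """
    edges = {(a, b) for a, b in follow_edges if a != b}
    # Every unordered pair with at least one link counts once.
    pairs = {frozenset(e) for e in edges}
    # A pair is mutual when the reverse link is also present.
    mutual = {frozenset((a, b)) for a, b in edges if (b, a) in edges}
    return len(mutual) / len(pairs) if pairs else 0.0


# Toy data: only a and b follow each other.
toy = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a"), ("c", "e")]
print(reciprocity(toy))  # 1 mutual pair out of 4 connected pairs
```

On the real graph the same ratio would be computed over the 1.47 billion following relations, which needs a more memory-conscious representation than Python sets.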
And people come to Twitter not to interact with people, but mostly for information.
>>: Do the numbers change if you take out all the celebrities from Twitter? Is this number
skewed because some people have thousands of followers?
>> Sue Moon: That's interesting. Because Twitter is one of the first online social networking sites that
these established news media had presence. So Facebook, New York Times, how many people followed
New York Times? So Cyworld, there were celebrities. So Cyworld, we do have -- this is an online service
in Korea. So they do have celebrities. But no news media, actually.
So it does play this role of news media.
>>: I guess we would just say -- celebrities too. You can interpret it as nonpersonal sites. Sites managed
by professionals not individual accounts, right.
>> Sue Moon: Right.
>>: You build those out.
>> Sue Moon: Uh-huh.
>>: It seems like the model is more like nobody from [indiscernible] visits my Web page ever. I go visit
[indiscernible] all the time that seems to be the more correct model here. [indiscernible] probably
[indiscernible].
>> Sue Moon: Right. Right. Right. So that's a good question. I don't think we took the data out. But we
can do it. But still the reciprocation probably is below 50 percent, but no guarantee.
>>: I just wanted to point out these are not just celebrities. It's actually many dental practices have now
Twit pages that people follow what they're doing. They're not celebrities.
>>: Like professional sites versus [indiscernible].
>>: Yeah. And you have to be careful in that --
>> Sue Moon: I mean, professors are also like I have 4 to 1 ratio following followers because I don't follow
every student who follows me.
And second, let's talk about the news-media-like traits of Twitter. Users talk about timely topics.
There are two graphs I'm going to show here. So this is the trending topics, how they evolve over time.
The dark green part is the new, and the lighter the color gets, the older it gets. Meaning that it maintains
some level of memory. So what used to be a trending topic last week or yesterday can remain a trending topic,
but more than 50 percent of the trending topics are new.
>>: Why is it a condition of [indiscernible].
>> Sue Moon: Older than last week.
>>: Older than last week?
>> Sue Moon: Yes. So some topics remain. Meaning that people talk about new things, but they still
hold onto some memory. The unfortunate part is we could not actually do the comparison with news media,
how much of yesterday's news carries over to today's news, because we couldn't really get
data to compare with and had trouble classifying the CNN news. But here's another way
of looking at the data. So we looked at the traffic pattern. There was a study of YouTube downloading
patterns.
And based on the downloading pattern, you could actually do this categorization: if the traffic was bursty but
heavy and lasted long enough, they basically considered it headline news. If the volume was not as
significant as the headline news but it persisted, it was persistent news. Or if
there was just a hiccup, it would be ephemeral: you would see it for a short time, more or less as debris.
So actually 54 percent of the trending topics could be considered headline news. And
seven percent persistent news. For example, Red Sox. People talk about that stuff all the time, the sports team.
Or "back in the day," blah, blah, blah: somebody starts a trending topic and people would comment,
"yeah, back in the day I used to be a soccer player." And it would go on for a short time and then
the topic would dissolve.
So from here we can see that more than 60 percent basically are headline news type. So people talk about
timely things. And then the next thing is: a few users reach a large audience directly.
So this is the graph that you probably wanted to see. The blue line is the number of followers a user
has. The red line is the number of followings a user has. So we hear a lot about power law.
In a social network, you expect to see a power law. So basically it should be a straight line,
but beyond the 100,000 point, the tail is lifted. And in the other one, the followings, the red line, the tail is
dropped.
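A plot of this kind is commonly drawn as a complementary cumulative distribution (CCDF) on log-log axes, where a power law shows up as a straight line. The sketch below is an assumption about how such a curve could be computed, not the speakers' plotting code; the function name `ccdf` and the toy counts are hypothetical.

```python
from collections import Counter


def ccdf(counts):
    """Return sorted (x, P[X >= x]) points for a list of per-user counts."""
    total = len(counts)
    freq = Counter(counts)
    remaining = total  # number of users with a count >= the current x
    points = []
    for x in sorted(freq):
        points.append((x, remaining / total))
        remaining -= freq[x]
    return points


# Toy follower counts for four users.
for x, p in ccdf([1, 1, 2, 5]):
    print(x, p)
```

Read on such a curve, a y-value of 10^-5 over 41 million users corresponds to roughly 410 accounts, since 41,000,000 × 10^-5 = 410.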
>>: How many points are there in that million plus area? Or --
>> Sue Moon: About 400. About 400 because we have 41 million. So this is about an order like 10 to the
minus fifth. So about 400.
>>: People 100,000, a million followers.
>> Sue Moon: I'll show you some names later on. But at this time, number one when the measurement
took place, it was Ashton Kutcher with over a million and Barack Obama was No. 7. The Ellen DeGeneres
show and CNN Breaking News they were top 10 by the number of followers.
So most of the top 400 are basically celebrities. It wouldn't be a simple human being; it would be
celebrities or politicians or major news media.
What's interesting is when you flip it and look at those with more followings: the number drops
drastically, meaning that not all the people who have a large following reciprocate. Who
would reciprocate? Ashton Kutcher doesn't reciprocate. Paris Hilton doesn't reciprocate. But Barack
Obama reciprocates because he's a politician. He wants to give the attention back to them.
So the number of people who actually follow more than a million is smaller than the number of those who have
more than a million followers.
So going back to the question of reciprocation. So we just cut the tail part off and still there's this gap. So
because of this gap, I think the reciprocity is going to be fairly low.
>>: So who follows more than a million people?
>> Sue Moon: Oh, politicians, Barack Obama. Yeah.
>>: I guess you did say that you [indiscernible] but in my experience I think towards the [indiscernible] I
think they could be spammers.
>> Sue Moon: Right. There are these two hiccup points. So Twitter at the beginning let everybody follow
as many people, or be followed by as many people, as they wanted. Still a person can be followed by as many people as
possible, but you cannot follow more than 2,000 people. Here's a hiccup point at 2,000, and Twitter
introduced this restriction less than two years ago.
If you want to follow more than 2,000 people you have to have a good number of followers. So
a spammer cannot come in and follow as many people as they want and write about them.
So you can still mention people that you don't follow.
>>: The spammers cannot do that without creating 2,000 accounts.
>> Sue Moon: Right. And getting approval from Twitter. So beyond 2,000 followings, you
have to basically talk to Twitter, and I think it's a deterrent for people.
>>: So what advantage does a spammer get by following lots of people?
>> Sue Moon: You can actually -- I mean, I guess mentioning -- if you follow those people then you can
create lists and basically have convenient features on Twitter to categorize people, group people based
on -- but nowadays actually --
>>: I can just go to that user's Twitter site anyway without having to follow them?
>> Sue Moon: Right. But you get a lot of tweet spam these days. A lot of people would say "Sue,
welcome to Raleigh" when I landed in Raleigh for the World Wide Web conference,
and I don't know this person. So this kind of spam happens. They call it a mention, or you can
basically reply. These features you can use without following.
So there are these spams, yes. And I'm saying that we did not remove all of the spam tweets, but this is
followings and followers. So these do not include the -- they may include spammers, early spammers, yes.
And another question we raised is: are those with many followers more active because they have
more followers? So what I plot here is the number of followings, and for each number of followings, how
many tweets they've done. So what I'm plotting here is basically the average number of tweets and the
median number of tweets for each number of followings.
And as you can see, the median increases, and this is log scale. So here also log scale up to 100 points.
The number of tweets increases as you have more followings. And even up to like 10,000, it's not
decreasing.
And then beyond this, for this blue line we've actually done log-scale binning and plotted the median point.
And so it keeps increasing beyond a certain point up to 10,000 -- up to 100,000. And then it drops down to
some reasonable level.
What I'm trying to show here is that, yes, the more followings you have, up to a certain point, definitely up to
100, the more active you are.
But there is no way for us to get the date of the user joining the Twitter service. So we could
not actually average it over the period. But as a general trend, you are more active. And the fact that
the average is way above the median means that there are outliers. Maybe there are outliers because we do
not filter the spam tweets very well.
But, yeah.
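The log-scale binning and the average-versus-median comparison described here can be sketched as follows. This is an illustration under assumptions: the bin width (one bin per decade by default), the function name `log_binned_stats`, and the toy samples are all hypothetical, and the speakers' actual binning is not given in the talk.

```python
import math
from collections import defaultdict
from statistics import mean, median


def log_binned_stats(samples, bins_per_decade=1):
    """Group (x, tweets) samples into logarithmic bins of x.

    Returns a list of (bin_lower_edge, mean_tweets, median_tweets),
    sorted by bin. x is whatever sits on the horizontal axis
    (followers or followings).
    """
    buckets = defaultdict(list)
    for x, tweets in samples:
        if x < 1:
            continue  # log10 is undefined for 0; such users are skipped here
        buckets[math.floor(math.log10(x) * bins_per_decade)].append(tweets)
    return [
        (10 ** (idx / bins_per_decade), mean(vals), median(vals))
        for idx, vals in sorted(buckets.items())
    ]


# Toy data: one heavy tweeter (400 tweets) pulls the mean above the median.
samples = [(2, 10), (5, 30), (20, 100), (30, 400), (50, 100)]
for lower, avg, med in log_binned_stats(samples):
    print(f"x >= {lower:g}: mean={avg:g} median={med:g}")
```

In the second bin the mean (200) sits well above the median (100), which is the outlier effect mentioned in the talk.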
>>: So why is like 25 and like 215 like those seem to be like special things in the graph?
>> Sue Moon: Right. There are these data points that dip, right? What was it?
>>: Is it because the data that you were scraping, you can only scrape a screen full of stuff and there's a
limit to the number of things that --
>> Sue Moon: No, probably not. So these few data points I think align with the hiccup point at 20 and
hiccup point at 2,000.
But the other ones --
>>: Hiccup at 2,000?
>> Sue Moon: What?
>>: That's a --
>> Sue Moon: Yeah, but it goes to the side. But this is like ending part. So hmmm. Ah, actually, I don't
have a good clue.
>>: The following. So that paper does not apply to those.
>> Sue Moon: Yeah. I have to double-check. Yeah. And the other part here is that there are people who
are followed by a very large number but never tweet. So there are these celebrities who join Twitter but
never tweet.
And so I met László Barabási, the author of "Linked," and he said, "I'm on Twitter but I never
tweet, so why do people follow me?" Like, duh, because you're Barabási. So there are people who open
the account. He's a -- anyway.
>>: Isn't the X axis number of followings not followers.
>> Sue Moon: Oh, yeah, yeah. I think actually -- I think I labeled it wrong. It's number of followers. Oh, I
think it's number of followers. Sorry. Yeah, yeah.
And then the fourth one is: most users can reach a large audience by word of mouth. This is the interesting
part on Twitter.
So for the following direction, we use this right-pointing arrow. But actually if I follow you, what you write, the
information, flows back to me. So the direction of information flow is the opposite.
The average path length is 4.1, which is very, very short. For the average path length, we basically took a
thousand random samples of seeds and then calculated the distance from each seed to all the rest of the
network. And then basically plotted this.
And with about 8,000 samples, the graph is about the same. The CDF shows that about 70 percent of
pairs have distance 4 or shorter, and 95 percent distance 5 or shorter. It's extremely short. There's an
MSN Messenger network analysis that was done by Jure Leskovec and some other person at MSR here,
and they showed an average path length of six point something, so it's longer. And this is a directed graph.
So we expect it to be longer because it's directed. But maybe because there are people like Barack
Obama who have bidirectional relations with millions of people, still the average path length is very, very
short.
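The seed-sampling procedure described here can be sketched with a breadth-first search per seed. This is a minimal illustration, not the speakers' implementation. It assumes distances are measured along the direction of information flow (from a followee to its followers) and that unreachable pairs are simply left out of the average; the function names and the three-user toy graph are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adjacency, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_average_path_length(follow_edges, num_seeds, rng=None):
    """Average shortest-path length from randomly sampled seed users.

    follow_edges: iterable of (follower, followee) tuples.
    Information flows from followee to follower, so edges are reversed.
    Assumption: pairs with no path are ignored.
    """
    rng = rng or random.Random(0)
    flow = {}
    nodes = set()
    for follower, followee in follow_edges:
        flow.setdefault(followee, []).append(follower)
        nodes.update((follower, followee))
    seeds = rng.sample(sorted(nodes), min(num_seeds, len(nodes)))
    total = count = 0
    for seed in seeds:
        for node, d in bfs_distances(flow, seed).items():
            if node != seed:
                total += d
                count += 1
    return total / count if count else float("nan")


# "me" follows Jitu and Jitu follows Ming, so Ming's tweets reach "me" in 2 hops.
edges = [("me", "jitu"), ("jitu", "ming")]
print(sampled_average_path_length(edges, num_seeds=3))
```

Increasing `num_seeds` and checking that the resulting distribution stops changing mirrors the comparison between a thousand and about 8,000 samples mentioned in the talk.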
>>: Wanting to path --
>> Sue Moon: From one user to another user, the graph, shortest path.
>>: Follow him that is the path two.
>> Sue Moon: One. One hop. So I follow Jitu and Jitu follows, say, Ming, the information path from Ming
to me is 2.
>>: This doesn't take into account that many people aren't connected at all, right? Because Twitter has
lots of sort of islands which would be short path lengths and not connected to the graph.
>> Sue Moon: I think we probably looked at the giant connected component, GCC, and I think it covers
more than 90 percent of the users. Yeah. So I forgot the number, but the islands not connected to the
GCC are very --
>>: The other version of that question is what did you do for pairs that you did not see any information flow
at all? Like if [indiscernible] I don't follow anybody, how is that? It's connected in a graphic sense but not
an information flow sense.
>> Sue Moon: That's a good question. I don't think we looked at it, actually.
>>: I don't follow anybody. So you can't like --
>> Sue Moon: Right, right. So I don't think we took -- so how many pairs actually have no path, right? No
path. I don't think we have that data. Yeah.
>>: This is on the [indiscernible] following links, right?
>> Sue Moon: Yes.
>>: So how [indiscernible] be different looking at the following or if I'm looking at the [indiscernible] only the
bidirectional links?
>> Sue Moon: I'm expecting it to be a lot longer. Even segmented, because the reciprocation was only 22
percent. So we will have just one-fifth of the links.
>>: But then I would imagine the path length would be shorter.
>> Sue Moon: It won't be shorter. I'm almost sure it won't be shorter, because we looked at -- the MSN
Messenger network has an average path length of five or six. We had the Cyworld network, which basically every
Korean joined, and the average path length there is more than 4.1. It's 4.7, about 5. I mean
the CDF is slightly to the right side of this graph.
So I think if online social network, I don't think there's anything that has path lengths shorter than 4.5.
Average path length of 4.5, at least reported.
>>: Slightly obnoxious question. What are we supposed to take away from this graph? What good is this
number? Had this number been 5.3, what would it change?
>> Sue Moon: Hm, that's a good question. To a computer scientist, right?
>>: Why should I care that it's 4.4. Why are you asking this question about this graph --
>>: I'm being general, all the graphs so far. The accuracy of 4.1 was right there.
>> Sue Moon: Right. So --
>>: Basically questioning why is she doing that work.
>> Sue Moon: Yeah.
>>: Can I answer?
>>: Some is reasonable [laughter].
>> Sue Moon: Some of it, thank you.
>>: Sorry. I didn't mean --
>> Sue Moon: That's okay. I'm enjoying it.
>>: It's a measure of the connectedness of the graph, right.
>> Sue Moon: Yeah.
>>: It's a measure of the cliquishness of the lemma.
>>: I know but what am I supposed to do with it.
>>: So now that you know how cliquish the graph is, it will help you do a lot of provisioning of the service
itself. If I have N places that I can put my 5,000 servers, right, which user should go in which servers and
where.
>>: I see. Good point.
>>: I also have another comment on this: short path lengths, usually less than 6, are properties of
small-world graphs. [indiscernible] information travels very fast in graphs with these properties.
>> Sue Moon: Right. So small world. People say small world, and in social science it translates to six
degrees of separation. Here it's shorter. So one thing is that information actually flows over a much shorter
network than the human social network, which is true, actually. And now we're going to look at this unique
feature on Twitter, that you can actually relay somebody's tweet. An example here: node 0 says
"hey, there's free coffee," and nodes 1, 2 and 4 are followers of node 0. So they see this free coffee tweet
coming from node 0. Node 4 finds it particularly useful, so it decides to retweet, meaning relay. So it
retweets, as its own message, what node 0 wrote. RT, which stands for retweet, is the general convention on
Twitter, and it quotes the originator and the message.
So what happens is, if node 4 retweets somebody else's original tweet, then node 4's followers, in this case
node 5 and node 6, see this retweet and learn about the availability of free coffee.
So node 0 would be the original writer and node 4 would be the retweeter.
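The free-coffee example can be written down as a small sketch that counts the readers a retweet adds beyond the original writer's own followers. The follower map mirrors the node numbers in the example; the function name `additional_readers` and the dictionary layout are hypothetical choices for illustration only.

```python
def additional_readers(followers, source, retweeters):
    """Users who see the tweet only because somebody retweeted it.

    followers: dict mapping a user to the list of users who follow them.
    source: the original writer.
    retweeters: users who relayed the source's tweet.
    """
    # The source and its direct followers already have the tweet.
    direct = set(followers.get(source, ())) | {source}
    reached = set()
    for user in retweeters:
        reached |= set(followers.get(user, ()))
    return reached - direct


# Nodes 1, 2 and 4 follow node 0; nodes 5 and 6 follow node 4.
followers = {0: [1, 2, 4], 4: [5, 6]}
print(additional_readers(followers, source=0, retweeters=[4]))
```

Here the retweet by node 4 adds nodes 5 and 6 to the audience, while nodes 1, 2 and 4 are not counted again.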
>>: What's the convention if node 6 were to retweet that message, would it credit node 4 or node 0?
>> Sue Moon: It used to be like a chain reaction, but you have to do it manually,
because retweet is a bottom-up convention: users started to use it, and now
Twitter supports it as a key option. And nowadays you can see who the original writer is, in this case node 0,
and at the bottom it just shows who it came through. So you actually have the original and the
hop just before you as information.
We used to be able to see the whole chain. But you know the writer and you have the time of the tweet, and
you can kind of follow it. But it's actually gotten a lot harder with this new feature on Twitter to track
retweets.
So anyway, one hop neighbor, two hop neighbors. So how we construct the retweet tree is here is the
writer/reader one, one multiple reader still one or we can have two. So this is the empirical RT trees. So
this is very interesting data, because we talk about word-of-mouth spreading. We talk about information
spreading, but how often did we have data to actually analyze how information spreads? Does it spread like
a virus? How do people hear about news? How do they relay news? And this is, I think, one of the first times that
we actually have data to talk about how information spreads in social science. So this is
one trending topic over time: how any tweet that mentions the trending topic
gets retweeted. And this is a tree.
And 98 percent of the retweet trees are height one: I say something, somebody else retweets, and that's it. But there are
some cases where the chain would go up even to ten, meaning that it would keep spreading.
And the number of users who participate in retweets would almost reach a few thousand. Meaning that
a single tweet would be retweeted by thousands of people in a rare case.
And it's the additional readers that we pay attention to. Because if you retweet, you
actually bring in additional readers to the tweet. And what's interesting is how you boost the audience
by retweeting. So this is the number of followers of a source on the X axis, and the Y axis is the additional recipients of
the retweet. No matter what the number of followers of the original source is, when your tweet gets
retweeted it buys an additional 100, at least at the median.
And on average, it can buy a few hundred. So this is the power of retweets. If someone bothers to retweet, on
the average it would bring -- be retweeted by an extra 100 retweeters. As for the time lag between retweets, 35
percent of retweets take place in the first ten minutes. More than 50 percent within an hour. So it's
very fast. This kind of information spreading is only possible in today's online, connected Internet age.
So the last part is ranking users by effective readership. I'm just going to give the key idea of the new ranking.
The key idea is that your retweet is not always the first to reach all of your followers. Let's say I find some
interesting article in yesterday's New York Times, and I tweet about it. Jitu retweets my tweet, because
he finds the New York Times article interesting. But Sanjin also retweets my tweet. What if Ratul
follows both Jitu and Sanjin? Then Ratul would hear about it from Jitu first. So the second retweet is really useless, or it's
redundant.
So we're going to count only those who matter, effective readers. Also, not all your followers read all your
tweets. I follow more than 100 people. So I cannot read everybody's tweets. I just bother to read like the
first few pages and that's it at any time I log on to Twitter. Also, people forget what they have read. This
does happen. You know, after a week or two did I read this article? Or maybe stories. Especially those
tiny URLs, is it the same story or not?
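The first part of this idea, counting a follower only for whoever delivered the story to them first, can be sketched as below. This is a simplified illustration and not the ranking algorithm from the talk: it assumes every follower reads every tweet immediately, and it leaves out both the memory decay and the reading-likelihood factors. The function name `effective_readers`, the timestamps, and the extra user "ming" as a reader are hypothetical.

```python
def effective_readers(followers, posts):
    """Credit each poster only with followers who had not yet seen the story.

    followers: dict mapping a user to the list of users who follow them.
    posts: list of (timestamp, user) for the original tweet and its retweets.
    Returns a dict mapping each poster to the set of newly reached followers.
    """
    exposed = set()
    credit = {}
    for _, user in sorted(posts):  # process in time order
        exposed.add(user)  # the poster has obviously seen the story
        fresh = set(followers.get(user, ())) - exposed
        credit[user] = fresh
        exposed |= fresh
    return credit


# Sue tweets, Jitu retweets first, Sanjin retweets later.
# Ratul follows both Jitu and Sanjin, so only Jitu gets credit for Ratul.
followers = {
    "sue": ["jitu", "sanjin"],
    "jitu": ["ratul"],
    "sanjin": ["ratul", "ming"],
}
posts = [(0, "sue"), (1, "jitu"), (2, "sanjin")]
for user, readers in effective_readers(followers, posts).items():
    print(user, sorted(readers))
```

Summing the sizes of these sets per user over many stories would give one rough count of effective readers; the weights for forgetting and for skipped tweets would then be layered on top.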
So people do read the same story multiple times. So there are different ways to gauge whether your
retweet is really having any impact on the readers. Based on these, we came up with a new
ranking algorithm, and I'm just going to show the user rankings. So followers -- oh no, this one. So this is
last year, June, the ranking by the number of followers: number one Ashton Kutcher, No. 7 Barack Obama.
You can use the following-follower graph and run PageRank, because it's a directed graph. And actually
the green shows the overlap between the ranking by the number of followers and the
ranking by PageRank. They're almost the same, because PageRank basically counts the number of in-links as
much.
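To illustrate why the two rankings tend to agree, here is a textbook power-iteration PageRank run on a tiny following graph and compared with a plain follower count. It is a generic sketch, not the speakers' computation; the damping factor of 0.85, the iteration count, and the toy users are assumptions.

```python
def pagerank(follow_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; a follow edge (a, b) passes rank from a to b."""
    nodes = sorted({n for edge in follow_edges for n in edge})
    if not nodes:
        return {}
    out = {n: [] for n in nodes}
    for follower, followee in follow_edges:
        out[follower].append(followee)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def follower_counts(follow_edges):
    """Number of followers (in-links) per user."""
    counts = {}
    for follower, followee in follow_edges:
        counts.setdefault(follower, 0)
        counts[followee] = counts.get(followee, 0) + 1
    return counts


edges = [("a", "star"), ("b", "star"), ("c", "star")]
ranks = pagerank(edges)
counts = follower_counts(edges)
print(max(ranks, key=ranks.get), max(counts, key=counts.get))
```

On this toy graph both measures put the same user on top, which is the overlap the slide highlights; a retweet-based ranking would use different input and can order users differently.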
But if you count the number of retweets, then you actually see different rankings. So of course
people who have -- people who have a large following, they are influential. But there is also another way to
consider influence.
One is not -- I don't know which one is more important. But there is a quantitative way to measure that your
story gets spread by word of mouth, and we propose a new mechanism to quantify that. So summary: I
think low reciprocity distinguishes Twitter from other online social networks and has these characteristics.
And we are proposing a new ranking algorithm to quantify this word of mouth spreading. Not worm
spreading but it's word of mouth spreading. Other questions? Yes.
>>: How do you measure the effective readership?
>> Sue Moon: So we have a human memory decay model, and we also model, basically, the
likelihood of people reading a tweet, depending on the number of people a user follows. So the more
people a user follows, the less likely they would read a given tweet. We take it all into the model and
then calculate the influence based on the tweet data.
>>: Can you validate this model?
>> Sue Moon: Oh, that's the toughest part. That's where we're stuck.
>>: [indiscernible] one of the influentials, versus one of the noninfluentials?
>> Sue Moon: What, plant a story? That's a good idea. I'll add GS and --
>>: On that note, have you removed spammers?
>> Sue Moon: We basically use Clean Tweets, which is a plug-in for -- I forgot, a plug-in for one of the
browsers. But we don't have a good mechanism.
>>: So if you were --
>> Sue Moon: It's getting harder.
>>: If you were to compare the spammers on some of these metrics, do they look very different?
>> Sue Moon: So we haven't paid a lot of -- there are other proposals to get rid of Twitter spammers. So
we can probably use their mechanisms. And spammers don't have a large following. Spammers mention
a lot of people but their tweets are not retweeted. I'm just assuming. So if we study the retweet
patterns, probably they don't matter. But --
>>: So these metrics might also be useful in helping you identify spammers?
>> Sue Moon: Yeah, yeah. I mean, these would be helpful for the top ranking influentials. I don't know, if
I'm like the ten millionth of the Twitter users, who would care about the ranking? I don't --
>>: I guess you know influence with the Twitter management itself, but this is an interesting experiment to
do. Some of the influentials could say, "I charge you $2 a month to follow me."
>> Sue Moon: Wow, you try that. You try that. I don't know.
>>: That would show how much people actually care.
>> Sue Moon: I don't think --
>>: I followed [indiscernible] would I pay him $5 a month? Probably not.
>> Sue Moon: Right. So that's the emergence of new media, who would bother to pay to follow you.
>>: Doesn't have to be money. Like --
>> Sue Moon: Doesn't have to be. Some --
>>: Coupons from Bill Gates.
>> Sue Moon: Lunch coupons?
>>: He doesn't have to pay me, if I have to give him something if I have to give him coffee once, do I follow
him?
>>: He might introduce advertising to -- he might have to then --
>>: I see.
>>: So --
>> Sue Moon: Last one?
>>: So you made somewhat of a big deal of being able to see the entire [indiscernible]. As
you were going through the analysis, I was wondering what would you not have been able to do with a properly
sampled graph.
>> Sue Moon: Average path lengths. Oh, yeah, average path lengths. Actually, you can do samples,
right? You did 8,000 samples, right?
>>: Because [indiscernible] except for Twitter nobody is going to be able to study the complete graph.
>> Sue Moon: Right.
>>: So in terms of just developing methodology --
>> Sue Moon: Right. So when we originally started the work, we were thinking
about looking at degree correlation, clustering coefficient, and these metrics that typically apply to
undirected graphs, right? And then we found out that we could not apply them to the Twitter graph. And there are
other metrics that actually require the entire graph, that we don't have a good sampling methodology for.
>>: What kinds of metrics can you not compute on sampled graphs?
>> Sue Moon: Clustering?
>>: Clustering coefficient.
>> Sue Moon: Clustering coefficient.
>>: Looking at [indiscernible] closures, if you had two a third connected to one what's the likelihood you're
going to get a triangle the whole graph.
>> Sue Moon: Basically the network motif analysis you cannot do on sampled graphs, and there are other
analyses that I didn't present today for which there's no good known
sampling methodology.
>>: Like if you have some with very, very large degrees, your sampling algorithm somehow requires --?
>> Sue Moon: Well, there are sampling methodologies that take the degree into account.
>>: But you have to know that a priori, right?
>> Sue Moon: No, no, you arrive at the node and then look at the node degree and then --
>>: As work.
>> Sue Moon: As work, right. Metro --
>>: If you pretend to be a sociologist. Sociologists do this all the time.
>> Sue Moon: So just to wrap up: from the computer scientist's point of view, why am I interested?
I'd like to find systems questions. One is, just as Sherad said, how would you actually support
the underlying network system? For example, Twitter is extremely slow. I don't know what Twitter's
underlying system is like, but I hear that their original design was just so bad that they were having trouble
scaling up.
So they're doing a much worse job than Facebook; apparently Facebook is doing a lot better. But this
kind of transaction, I think, is not something that -- I'm not a data person. Sorry, I don't know. But this
power law workload, with a much heavier tail than expected from the power law: is this kind of workload
something that database people commonly deal with, or not? So there's this workload aspect of it. And I
think there are other computer system problems that come from social networking services. That's
one. And there are other algorithmic challenges that we, or computational geometry people, or
complex systems people, statistical physicists, all tackle; I think we should all tackle them together. So there are lots of
problems. I should stop. Thank you.
[applause]