>> Scott Counts: We're excited to have Professor Noshir Contractor here with us today. Professor Contractor is the Jane S. & William J. White Professor of Behavioral Sciences in the School of Engineering, School of Communication and at the Kellogg School of Management at Northwestern. Prior to being at Northwestern, Nosh spent many years at UIUC affiliated with the NCSA, as well as a variety of departments there. Professor Contractor has published or presented over 250 papers and also coauthored a very influential book. And today he's going to be talking about sort of knowledge networks in the 21st century. And with that I welcome Professor Contractor. >> Noshir Contractor: Thank you. Thanks, Scott. It's a pleasure to be here. I was here for the first time at Microsoft for the faculty summit a few months ago. I guess it was June, July. It was around July that I was here. And so I'm glad to have a chance to come back a second time. In all these years, 20 years that I was at Illinois and then having after that at Northwestern for a couple of years I've had interactions with folks here, but actually never physically visited the place. So thank you, Scott, for inviting me and giving me this chance. And I'm looking forward to conversations with many of you here. I was going to start with that thing on the left there. I don't know whether it's visible to folks in the room and/or elsewhere, but it says SNiF, Social Networking in Fur. And it was -- this is an article that was written in Wired magazine, and the idea was to develop a collar. And it was at The Media Lab they came up with this technology. It was a collar for a dog, and the collar had like some kind of a USB drive and also like an IR infrared beam. 
And so you put it on the dog's collar and the dog would go play around and when it came across other dogs that had the same kind of collar, then through the IR it would exchange business card information about which dog the other person -- owner and name and all of that kind of good stuff. At the end of the day you take the dog's collar off and you would take the USB drive and put it into your computer and go on the Web and you would be able to see the network of your dog, all the dogs that your dog likes to hang out with. And you could click on them and say, gee, the dog really likes to hang out with this one, click on it and it will tell you all the information about it, plus it basically tells you -- it's a digital trace of your dog's social network. And so they call it Social Networking in Fur, and you could use it to set up dog dates and who knows where it would go from there. Wired magazine referred to it as moving from social networking to social petworking and, of course, sort of the geeky way of thinking about these things. And you may ask yourself why on earth am I starting out my talk about this. This actually was in the April issue of Wired magazine, and for a while I was convinced it was an April Fool's joke, and it was not. It was really technology that was built at The Media Lab. But before I even get to that, I wanted to also mention the second gizmo that some of you may have seen called the Lovegety. How many of you here are familiar with the Lovegety? You've heard of this? Okay. So this is an audience that actually gets the Lovegety. Probably most other places there's normally one or no one who's heard of it. So this was very popular in Japan. It's kind of like the Tamagotchi for those of you who may not have heard of it. It's kind of like the Tamagotchi virtual pet. But this is one that you would program with the kind of food, music, and movies that you like and the kind of food, music, and movies that you would like in a potential love partner. 
The text is directly taken from the Japanese Web site that was advertising this product. So take it for what it's worth. And essentially what happens is that anytime you walk around -- and I saw this in Osaka and Kyoto when I was there for a couple of conferences, all the teenage kids and so on with all the holsters of all the different devices that they have. This is one of them. And anytime they would look around to see whenever this thing was either beeping or flashing or buzzing. And if it was, then they look around to see who around them was also beeping or flashing or buzzing, and then that would be a way in which you could get a potential love match that would generate on this. You may be seeing a trend here in what I'm trying to get at here. When I talk about this, I teach this class on social networks at Northwestern and I have engineering students with my appointment there as well as students in communication. And all the engineering students, they just think this is a wonderful, wonderful idea. This is such a great application to be able to sort of network effectively and efficiently. And all of the communication students in class just don't get it. They just think why can't we just talk to one another and find out about it, etc. And so what it does is it begins to address an issue that is at the heart of what I'm talking about, and there's two sort of takeaway messages right here. One is that while we have used technologies to, you know, communicate, collaborate, present through video and so on and so forth with anyone, anytime, anyplace, we're only now beginning to see how we would use technologies to be able to use the same technology to identify who it is that we may want to communicate, collaborate and connect with. So the example, the common theme between the Lovegety and the SNiF, were both instances where you're thinking of how you would use the technology not to do the collaboration, but to identify who it is.
The second takeaway message touches on the fact that when you have technology that allows you to connect with anyone anytime, anyplace, more so than ever before you now need to think about the social motivations for why you may choose to communicate with one person as opposed to another person. And so what you have now is a greater interest in that, and what you saw in the Lovegety story that I mentioned was exactly that. So the communication students would have the same technology but they didn't necessarily want to use it. And so what would be the incentives that one needs to think about in order to use these kinds of tools in order to make the kinds of connections, etc. So of course I'm not going to talk any more about dogs or about romantic relationships; instead I'm going to focus more on social and knowledge networks in more general context in terms of knowledge work more generally. And the general question that I'm going to be asking in all these cases is why do we create, maintain, dissolve and reconstitute our communication and our knowledge network links. So what are the social technical incentives, costs associated with it, motivations associated with it that help us understand it. And the reason we want to do this is because when we want to design technologies such as recommender systems, such as social networking, collaborative filtering and so on and so forth, right now I would argue it's being done by really smart people who have really smart insights but not really well founded in social science theory which actually has a lot to offer in terms of making these recommendations much more effective as well as recommendations -- I say two levels: one is recommendations that people will actually use, and then once they use it, that these recommendations are effective. So those are sort of two of the end-goal reasons why we may want to think about this.
And I'm going to make an argument today that there are -- that this is a perfect-storm situation developed to do these kinds of things. We have really good social science theories that allow us to identify why someone may want to talk to somebody else. We have really good network analytic methods, in particular a set of statistical tools, some of which is actually being developed here at the University of Washington, set of tools called exponential random graph models, or p * techniques, that are really good in terms of being able to statistically analyze, make inferences and model these kinds of networks. And the third, of course, is it is nice to have these tools and it's nice to have all this theory, but where's the data. And obviously, as you know well out here, that more and more now we have the digital traces that allow us to be able to capture the kind of data that is necessary to do the modeling, to apply the theories, to be able to build smarter and better and more pervasive Lovegetys and SNiFs, etc. So that's sort of the overview of what I'm going to try to cover out here. I know this is fairly informal, so I can keep talking, but you should feel free to chime in anytime and ask questions along the way or challenges, etc., and I'm happy to field them as we go along. So in the book that Scott mentioned, my collaborator, Peter Monge, who is at the University of Southern California, and myself essentially did a review of all the literature on looking at communication networks from a social networks perspective. And we came up with a series of theories from the empirical literature as well as from the theoretical literature on why it is that we create, maintain, and dissolve communication and knowledge network links. And this is a one-slide summary of the book. So this is the CliffsNotes of the book. And I'll take a few minutes to go through these different theories, because it set the stage for some of the empirical examples that I'll talk about later. 
The title, if you'll notice, was called From Disasters to WoW. And so disasters, I'll talk about some examples of work we've done looking at Katrina, but also some other projects that we have looking at networks in these communities. Whoops -- is there a way -- someone out there should tell me there's a way to suppress those for a short period of time when I'm in the middle of a presentation. There you go. I can see someone who's ready to give me an answer on that. This would be the place to ask that question, wouldn't it? And the WoW -- how many of you are familiar with World of Warcraft? Of course. Of course everyone out here. So we'll have an example where we've done some analysis looking at networks with the World of Warcraft, but this example that I'll give really is a much smaller one. We are doing a lot more work now with -- we have a big grant from the National Science Foundation and one from the Army Research Institute to look at networks of interactions, transactions and -- actions, interactions, and transactions of all the digital data from EverQuest, which is like EverQuest II, which is like World of Warcraft. It's a Sony product -- how many of you are familiar with EverQuest II? Okay. So we have about 60 terabytes of data from them, and we are working on doing -- on doing that, so analyzing that we have some results from there as well, as well as we have data from Second Life about friending and grouping as well, and so we'll talk a little bit about some of those. So this is [inaudible] the digital day that we have here. Okay. So before that, the theory of [inaudible]. So why is it that I may want to create a link with someone? Well, I may want to create a link with Danielle because I have something that -- I want something that Danielle has, right, so that's the self-interest. It's the economic theory for networks, and simplified obviously, but the idea that I'm going to maximize my individual utility function. 
And if I'm going to create a link, it is because I'm getting something out of it that I would not get otherwise. So there's a cost-benefit analysis that I do, and it's based entirely on the fact that he has something that I want and therefore I want to get it from him. So that's the theory of self-interest, which is obviously, as you can imagine, a theory that doesn't always work because there are several instances where I may -- that person may not return my calls, that person may not want to respond to me, etc. So it works in some cases. There's good evidence that it works, but it works along with other kinds of issues, and so by itself the theory of self-interest is not a valid theory that always operates in these contexts. A second theory that is a little more robust is that I want something from Scott and Scott wants something from me. So we set it up as a social exchange. So I'm trying to create a network link with Scott because I want something from him, but given that he wants something from me, there's a greater likelihood that that link will actually develop. So we have what would be all market interactions based on this idea, right, that there's an exchange happening or a social exchange or resource dependency. This applies not only at the individual level, it also applies at organizational levels and at international levels that you could have different entities that set up social exchange relationships. A third reason may be that I want to create a link with Danny, and not because I want anything from him or he wants anything from me, but together we have a better shot of getting something from a third party. So that's the idea of collective action or of mutual interest. It's the way lobbies work, it's how you set up standards, it's why companies that may be competing in the software industry will engage in a precompetitive alliance where they'll come together and say we need to set some standards, we want to create a new market.
It applies, as I said, at the individual level all the way up to, again, at higher levels of aggregation as well. And so that's sort of what is called collective action. We come together not because I'm benefiting from the other person, but together we have a greater chance of benefitting from a third party. Theories of contagion. Now I'm on the right-hand side column. Theories of contagion say that the reason I want to create a link with someone is because everyone else is connected to that person. Right? So this is you getting infected by the idea. I'm looking around and I go, I'm connected to that person because everyone is connected. And this is actually something that has gotten a lot of currency lately. If you think about this notion of scale-free networks, how many of you are familiar with a concept of a scale-free network? There are a couple of you that are familiar. The basic idea of scale-free networks is -- think of it as the 80/20 rule. It says that 80 percent of the connections go to 20 percent of the nodes. Right? So some proportion like that. So you have some hubs in the network and, in fact, a lot of the work that Barabási and subsequently several generations of network researchers have done has shown that you see these kinds of scale-free networks everywhere; that you see this a lot. Certainly the transportation network is one of those that you think of O'Hare and Heathrow and Atlanta as big hubs that most of the flights go to a few nodes. And that's what happened. Now, in the case of individuals, also these things happen, that -- and in some ways there's some logic to it. If I'm a newcomer here at Microsoft, Christy -- is that -- am I pronouncing right? So Christy is starting as an intern here today.
Now, she -- if she wants to build connections here and build a network here, it would be in her interest that if she's going to spend time talking to one person, that person should be a hub in the network because then she's two steps removed from everyone else in the network. So there is some sort of efficiency argument also that you could say that, you know, you may want to build a tie with the rich getting richer. Rich in terms of nodes and network connections will get richer as a result of this. Theories of balance say that the reason -- and, again, a reminder here, we're talking about why we create links with people, right? The theory of balance says -- and this comes out of the old social psychology literature that says that I may want to create a link with Christy because I know Scott and Scott knows Christy. Right? So we like to be friends with friends of our friends. And there's a -- I mean, if you just look at the research -- I don't know if you had this experience, but on occasion I work with two people very closely who hate each other's guts, and that creates a big problem, right, because there's a tension that builds into it and you try to reduce that kind of dissonance. And this is the reverse of that. This is saying you like to be friends with friends of your friends, because it reduces dissonance, it makes things easy, etc. That's the good news. That's also the bad news. Because you don't learn a lot from friends of your friends. There's a lot of literature in social network analysis that says new ideas, new insights, new job opportunities -- that was one of the studies that was done -- in all of these areas you're more likely to find them from people who are not friends with your friends -- that is, people who are connected to you but are separate from your friends -- and that these often tend to be weaker ties, but that's where novel information comes from.
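As a rough illustration of the hub and "rich getting richer" argument above, the sketch below grows a network in which each newcomer forms one tie, choosing an existing node with probability proportional to how many ties that node already has. This is a minimal toy model under stated assumptions, not the analysis used in the talk: the function names, the single tie per newcomer, and the network size are all hypothetical choices made for illustration.

```python
import random


def preferential_attachment(n_nodes, seed=0):
    """Grow an undirected network one node at a time.

    Each newcomer links to ONE existing node, chosen with probability
    proportional to that node's current number of ties (an assumption
    made for this sketch).
    """
    rng = random.Random(seed)
    degree = [1, 1]      # start with nodes 0 and 1 joined by a single tie
    endpoints = [0, 1]   # every node appears once per tie it holds
    for new_node in range(2, n_nodes):
        # Picking uniformly from the endpoint list is a degree-proportional pick.
        target = rng.choice(endpoints)
        degree.append(1)
        degree[target] += 1
        endpoints.extend([new_node, target])
    return degree


def share_held_by_top(degree, fraction=0.2):
    """Share of all tie endpoints held by the best-connected `fraction` of nodes."""
    top_k = max(1, int(len(degree) * fraction))
    ranked = sorted(degree, reverse=True)
    return sum(ranked[:top_k]) / sum(degree)


degree = preferential_attachment(5000)
print(f"top 20% of nodes hold {share_held_by_top(degree):.0%} of tie endpoints")
print("largest hub degree:", max(degree))
```

The exact split depends on the growth rule and the network size, so the toy model should not be expected to reproduce an 80/20 figure; it only shows how a few hubs emerge when newcomers favor already well-connected nodes.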
So if you think of theory of self-interest, that actually runs counter to theories of balance because it says that in theory of self-interest I'm much better off not being -- I need to sort of fill structural holes, and by that I mean you connect to people who are not connected to each other. And, again, there's good research evidence that shows that people who fill structural holes tend to be upwardly mobile in organizations, they tend to do well. The same thing is true at the organizational level as well. Theories of homophily reflect another aphorism that you've probably heard. It's birds of a feather flock together. You like to be with people like yourself. You know, so it's often the case that you'll go somewhere else and you'll end up sitting either with people who -- even when you're at a conference, right, you'll end up sitting with people who are doing the same thing that you do, maybe fellow employees from your own department, even from your own company, sometimes especially -- and I give talks in Asia and Beijing and so on, and in some talks I've given it's just remarkable the amount of gender segregation that you see, and it's not required, it's just -- it is emergent in a sense, but you see clusters of women sitting together and men sitting together in these events, etc. And, again, the reason you have this homophily, the sense of wanting to be with someone like yourself, is because there is a comfort level, you don't have to challenge your assumptions, you don't have to explain yourself. You get things done more effectively. So there is a certain efficiency argument why it works. Again, here, that's the good news and the bad news, because there's pretty good literature especially done in the organizational context that shows that heterogeneous groups tend to be -- have the ability to be far more creative than homogeneous groups.
And so what you find, and it's a troubling finding because it looks like you are trying to trade off here, that homogeneous groups tend to be more satisfied, just not more creative, while heterogeneous groups have the ability, as I said, to be more creative. They also have a higher variance. So sometimes if they -- they can fall apart and fail spectacularly, but when they do succeed, they can be much more creative than homogeneous groups. So there's, again, a tendency left to our own devices that we would want to group up with people who are like ourselves, but then the creativity goes out because you can't get challenge of assumptions and all of the kinds of things that allows for creativity. Proximity is another big issue. The basic premise here is -- goes back to again literature that was done in the 1970s, the research that showed that if your offices move from being, say, 15 feet apart to, say, 30 feet apart -- that is, twice the distance -- the communication actually falls to a quarter of what it was. So there's an exponential sort of decline in communication as you move away from one another. Now, I know you're in an interesting situation because just within the last year now you have all sort of been brought into one physical collocation, etc., so you have greater opportunities now for communication, and there's lots of people who have actually done studies of groups that were disparate and that came together and how it changed the dynamic of communication networks that happens within those contexts. You can make an argument that today in this day of the Web and so on that this is not quite as important any longer. That, you know, The Economist magazine, one of my favorite magazines, had this cover story a few years ago, and it was also a book subsequently.
It was written by Cairncross, Frances Cairncross, called The Death of Distance, saying that distance doesn't matter any longer now that we have all these great technologies; it's something that we can do just as easily through electronic means, etc. Well, it turns out, even though I like The Economist, this is one place where I think The Economist got it wrong. Today technologies continue to be important in part because they enable face-to-face communication. They are not always a substitute. The assumption we make is that it's a substitute. The single biggest application of e-mail today used, not abused, is to set up face-to-face meetings. And so we don't think about those kinds of issues. We did a study at Illinois looking at the AOL Buddy Lists that people had there, and we were, through a lucky accident, actually able to find out the geocoding of about 3,000 students, of their own address but also the geocoded data for all of their buddies on those lists, etc. And what we found was if the death of distance hypothesis is true, then you would find essentially a uniform distribution based on distance, that anyone is likely to be on that buddy list. But in fact what we found is a very strong bimodal distribution, where, on the one hand, you have people who are really far away from you who you are more likely to connect through Skype or any of these buddy lists, but you also have a large number of people who are very proximate to you. These are people in the same buildings, on the same floor, sometimes in the same room that you use these tools to be able to communicate with, etc. So there is a strong proximity argument that I think continues to be important.
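The office-distance figures quoted earlier (offices moving from 15 to 30 feet apart, communication falling to a quarter) can be turned into a small worked example. The sketch below assumes, purely for illustration, that communication falls off as a power of distance and asks which exponent matches those two numbers; the function names are hypothetical, and a single pair of figures cannot by itself distinguish a power-law decline from an exponential one.

```python
import math


def implied_exponent(distance_ratio, communication_ratio):
    """Exponent a such that communication ~ distance ** -a reproduces the two ratios."""
    return -math.log(communication_ratio) / math.log(distance_ratio)


def relative_communication(distance, reference_distance, exponent):
    """Communication at `distance`, relative to the level at `reference_distance`."""
    return (distance / reference_distance) ** -exponent


# Figures quoted in the talk: twice the distance, a quarter of the communication.
alpha = implied_exponent(distance_ratio=30 / 15, communication_ratio=0.25)
print("implied exponent:", alpha)

# Extrapolation under the power-law assumption only; not data from the study.
for feet in (15, 30, 60, 120):
    print(feet, "ft ->", relative_communication(feet, 15, alpha))
```

Under this assumption the quoted figures correspond to an inverse-square relationship, so each further doubling of distance would again cut communication to a quarter.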
In fact, in some of the results that I will skim through here, we have strong evidence of this both in Second Life as well as in EverQuest; that you have the opportunity to play and friend and trade and mail with people all over the world, and yet the likelihood of you actually engaging in these interactions is very, very related to proximity, and that even moving -- and I have the exact numbers [inaudible], but moving from, say, 25, 50 kilometers to 500 kilometers apart, the likelihood, the probability of having those network links falls precipitously. This is a paper that I'm presenting later this week actually at the AAAS meetings in Chicago on digital traces of networks. And then finally the last one is theories of coevolution. The idea of theories of coevolution is basically a mix of these ideas here. What it says is if you look at what happened in two industries historically, the Hollywood industry and the engineering construction industry, these are two industries that have been doing things that now we in the information sector are doing increasingly but still need to learn from that. And that is they typically come together to make a movie or to build a house. And it's a -- you know, it's a primordial soup of people that they deal with, they come together, they decide who you want to come together for that particular one, how do they choose the people. Well, obviously you want to make sure you have a screenwriter, an actor, and a costume designer and gaffer and all of those, but you decide which one of those you need, and you make this on the basis of a social network. And then at the end of that movie, depending on how the movie did and depending on how the dynamics work, you dissolve at the end of that movie, the network dissolves, only to be recreated now with a different configuration but clearly shaped and influenced by what the network was previously. So there are logics associated with why these people do these things.
Now, where does the coevolution part come there? Different people are doing this all the time. Some movies are more successful than others. What happens is that people in any one of these groups that are coming together are looking at the logics that others are using. So if you have a successful movie, then you're looking to see how did that team get assembled: was it based on people from the same school, did they go internationally to find someone, etc. So you're looking at these logics and so what you see is a coevolution of the logics that are happening here. Some of the logics that are being applied in one place then get incorporated so you have these multiple motivations and the logics we use themselves are influenced by what is happening with other people's logics. Is this making sense so far? So this is really the one theory slide I appreciate you indulging me on, but, as you can see, there's a lot of different motivations for why we begin to create these network links, etc. This would [inaudible] great as a story as I've done it so far, but what is nice about this is that actually each of these theories has a unique structural signature. That is to say that if people are operating on the basis of self-interest, then we are likely to see certain kinds of local configurations in the network more so than you would expect by random. So in the simplest example in theories of self-interest, if there is a tie from A to B and a tie from B to D, there's a lower likelihood of a tie from A to D because it's not in your self-interest, as we discussed earlier, to connect with people who you're already connecting with, etc. Likewise, in the theory of exchange, it says that if there is already a tie that goes from A to F, then there is a positive likelihood that you'll also see a tie from F to A because of the social exchange issue that we talked about earlier. 
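The two structural signatures just described can be counted directly on a small directed network. The sketch below is illustrative only: the tie list is invented, the function names are hypothetical, and a real analysis would compare these rates against what random networks of the same size produce rather than reading the raw numbers on their own.

```python
from itertools import permutations


def reciprocity(edges):
    """Share of directed ties A->B that are returned by a tie B->A (exchange signature)."""
    edges = set(edges)
    if not edges:
        return 0.0
    return sum((b, a) in edges for a, b in edges) / len(edges)


def transitive_closure_rate(edges):
    """Among two-step paths A->B->D with A, B, D distinct, the share closed by A->D.

    A low rate is the pattern the self-interest argument predicts; a high
    rate is what a balance argument would predict.
    """
    edges = set(edges)
    nodes = {node for edge in edges for node in edge}
    two_paths = 0
    closed = 0
    for a, b, d in permutations(nodes, 3):
        if (a, b) in edges and (b, d) in edges:
            two_paths += 1
            closed += (a, d) in edges
    return closed / two_paths if two_paths else 0.0


# Invented example network; the node labels echo the A, B, D, F used in the talk.
ties = [("A", "B"), ("B", "D"), ("A", "D"), ("A", "F"), ("F", "A"), ("D", "E")]

print("reciprocity:", reciprocity(ties))
print("transitive closure rate:", transitive_closure_rate(ties))
```

Checking every ordered triple of nodes is fine for a toy network like this one but grows with the cube of the number of nodes, so larger networks would need an adjacency-based count.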
So bottom line is that you can think about each of the theories that we talked about and each of them has embodied in it certain local configurations that you would expect to see more than by random chance in any network that you're studying. So I'm happy -- if there are others in the room who wanted me to talk more statistically about this issue, I'll be happy to go into it. But in a simple way, what you are trying to do is you can take a network as you observe it and say okay, to what extent is this network -- you could say -- you can estimate parameters that in essence tell you this network is driven 10 percent by self-interest, 22 percent by social exchange, 15 percent by homophily and so on and so forth. It's a bit of a caricature, but that's the idea here. And so once you are able to estimate that in a network, you also have the ability to use that information to shape the network, to turn the network, to rewire it and to think of the incentives that are necessary to make the network do something more effectively than what it is currently doing. So, again, as I said, these are some statistical techniques that allow you to do this. And, as I said, there's a group here at University of Washington that is actually on the forefront of developing many of these techniques, as well as people in Melbourne, Australia, and there's another group in Groningen in The Netherlands and Oxford that are doing a lot of work in this area. So what you have is essentially a statistical macroscope. It's not a microscope because you're looking at the macro network that is being -- the emergent network that is being driven by these local logics that you may see within the network. Now, you may say, okay, you have all these different theories, and I've put them now on the left-hand side here, these are the rows of this table, but not all of them are always going to be good for what you want to do.
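For readers who want the statistical form behind the parameter estimates described above, the exponential random graph models mentioned earlier in the talk are usually written as follows. The notation is the standard textbook one and is an assumption here, since the talk itself gives no formulas: $y$ is the observed network, $g_k(y)$ counts one kind of local configuration (for example reciprocated ties or closed triads), and $\theta_k$ is the weight estimated for that configuration.

```latex
% General form of an exponential random graph model (standard notation, not from the talk)
P_\theta(Y = y) = \frac{\exp\bigl(\sum_k \theta_k \, g_k(y)\bigr)}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y'} \exp\Bigl(\sum_k \theta_k \, g_k(y')\Bigr)

% Equivalent statement for a single tie, holding the rest of the network fixed:
% \delta_k^{(ij)}(y) is the change in g_k when the tie from i to j is switched on.
\operatorname{logit} P_\theta\bigl(Y_{ij} = 1 \mid y_{-ij}\bigr)
  = \sum_k \theta_k \, \delta_k^{(ij)}(y)
```

In this form a positive $\theta_k$ means the corresponding configuration appears more often than chance would suggest and a negative one means it appears less often. The estimates are weights on a log-odds scale rather than literal percentages, which fits the speaker's own remark that the percentage description is a caricature.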
And so part of what we've been trying to figure out is which ensemble of theories may be more useful in explaining networks in certain cases. So amongst the different groups that we've been studying in our research group, we've discovered at least five, and this is by no means a complete list, but there are at least five different motivations that communities may have. They are not mutually exclusive, and certainly not exhaustive. One is exploring. Sometimes a group likes to explore new ideas, so the goal is any group that's involved with innovation, for example, is trying to create new ideas and so exploration becomes a very important part of what they're doing. As you can imagine, if your goal is exploration, then the theory of self-interest becomes very important because you're trying to be able to build links with people who are not connected to you because they're more likely to give you novel information, balance is less important, homophily and proximity are less important because people who are around you and who are similar to you are probably going to have the same information that you have. And if you're looking for novel information, you may need to reach outside those things. So what we're saying is these different theories may be more or less likely given certain kinds of goals; in this case, exploration. Another goal could be exploitation. Exploitation here means you want to exploit your existing resources as opposed to explore new resources. So you already have a team that has all the different talents that you need. And what you want to know is what kinds of networks, what kind of organizational structures will most effectively exploit the resources you already have in order to accomplish something. This is often the case where what you're looking for is not a disruptive innovation, which is something that comes out of exploration, but for incremental innovation.
We are trying to -- you already got an idea that you want to build a particular kind of solid-state drive and you're trying to just find ways to do it cheaper, more accurately, faster, all of those kinds of incremental notions, and you want to exploit the existing resources. The third category is called -- is what we call mobilizing. And this is to say, you know, you're trying to set up something to mobilize for some big effort, so it's a planning effort, etc., how do you bring people together, what are the kinds of networks that are important to bring people together to do that. So certainly lobbying efforts would fall into this category. But also other examples of mobilizing may be when you're trying to agree on some kinds of standards. So obviously collective action is very strongly tied into this, but there are other things that you may also want. Yes. >>: [inaudible] so that the double dashes here I'm assuming you mean that, for example, it's minus. >> Noshir Contractor: Yeah. Yeah. >>: So the question I'm really asking is so mobilizing -- I wonder if, for example, if you wanted to set up a structure for mobilizing people but you knew, for example, almost by definition networks that are geared towards mobilizing are rotten or less good for those who are self-interested. >> Noshir Contractor: Right, right. >>: But there's an interesting tuning, then, to have the opportunity to do, because you're like, oh, well, I want to have people do my thing, and so if I support them in their efforts of self-interest, then they might be that much more likely to do my mobilization. >> Noshir Contractor: That's right. So that's really the next one, the collective action is the mutual interest. In other words, we are trying to align their self-interest with something that I want to do. So that's why you see the positive is for collective action and the negative is for self-interest. Self-interest by itself would not be useful. 
But if there is a way to align their self-interest with the mutual interest, then you have the positive thing. And that's exactly -- that's exactly the kind of tuning that one is thinking about in crafting these kinds of things so this is -- I just call this a meta theory because it's a theory about theories, which theories are more important in certain cases. Good question there. Bonding is another interesting one. And this is basically saying that the goal of the community is really not necessarily just to explore, exploit, mobilize, etc., but it's to build trust. It's to do -- it's why people play golf and why Joe [inaudible] wrote this article I think it was in Wired a few years ago that said World of Warcraft is the 21st century version of golf, that people play World of Warcraft not really to go around killing monsters and going on quests, but it's really just to build bonding, to build trust, etc. And so that's an example of a community where one of the goals at least is bonding by itself. And then finally swarming. The notion of swarming really came to us because we've been doing a lot of work in the area of disaster response and emergency response, etc. And in those kinds of situations, one thing unique about it, and you can think of it as a mobilizing effort, but it's mobilizing with a twist, because rapid response is a very important part of it. So most of the time you have a network that is pretty latent, which is pretty dormant, and then when you need to get into action very rapidly that network has to swing into action to take care of certain things, and then very quickly after that it does have to dissolve and go back into that latent state. So if you think of emergency response, there's sort of three aspects to it: there's a preparedness, there's a response, and then there's a recovery. 
And so you need to look at how all of these aspects of the networks that go into that get done differently. And it's not just in disaster response that you have swarming. Another place where you have swarming is actually in World of Warcraft. For those of you who have played these games, or MMOs generally, there's a lot of planning that goes into going on some kind of -- in your guild to go in and attack. And then when it actually happens, you know, there are people everywhere who will say, Show up exactly at 7 a.m. or 7 p.m., depending on whatever time zone they're in, and then they all very quickly swing into attack. And very often so they themselves have to do a rapid attack, and then the response to that also has to be very rapid. So the kind of logic that people use, you may know somebody really well, but if you're being attacked, you may be willing to trade off on certain things that you know because you need somebody who is close by, so proximity becomes important, or people who are experts in a particular area or people who will have the interest to save themselves, the right to protect themselves. And so collective action, mutual interest becomes more important in those cases. So this is sort of the theory on theories about this. Now, we are very fortunate in our lab that we have a lot of different projects that actually give us an opportunity to look at these phenomena across these different exploration, exploitation, mobilizing, bonding, etc. So a lot of work we have is in the area of science and engineering applications. This is just a handful of some of the applications we're looking at which is, you know, understanding and enabling virtual communities in emergency response [inaudible] a lot of work with NIH looking at tobacco research networks. These are networks of people who are involved in tobacco research. I'll end with a demo of that. We're also working with people -- with NIH looking at obesity networks. 
So this is not networks of people who are obese or get obese by talking to other obese people, this is the researchers who are studying it. NIH and NSF and a lot of our funding agencies are spending huge amounts of investment in these research communities. But because research communities are not well networked with each other, they're not getting good return on those investments. So they have a real vested interest to do that. In the business context, we're doing a lot of work with companies like Procter & Gamble, with Kraft, where, for example, in Kraft -- this is no trade secret, but over the last few years they have had an obscenely large number of startup items that they came up, food products that they designed that fell flat. And many of them actually say that having up to 50 or 60 percent failure rate is not a bad thing because it encourages a culture of trying something risky and having it fail, but when that number starts getting into the 80 percent range, then it becomes a little more troubling. And part of it is because the people who are coming up with these ideas are working with -- are not well networked with people after the product fails. Everyone says, well, if you'd only talked to those kinds of people before, both within Kraft and outside Kraft, you could have pre-empted the failure or you could have actually tuned the product so that it was more likely to succeed in the first place. So those are examples that we are talking about in that context. 
I already mentioned in the entertainment context it's a terrific way to understand the motivations for creating links in places like World of Warcraft and places like EverQuest and places like Second Life, but there's also an interest on their part, on the people who are developing these networks, which is one of the reasons they give us the data in the first place, is because they want to see can the networks tell us, the network patterns of an individual tell us how likely it is that we retain a user. Because, remember, most of these are for-fee subscribers. People are subscribing to this and they're paying a certain amount of money to the companies that own these things. So if there are network signatures that say here is an early detection that this guy is about to stop being a subscriber, then that's a good way for them to be able to do retention. It's also a good way for them to see if there's someone else who is well positioned in the network who they can recommend talk to this person and incentivize them to have a connection in order to have the retention. Another part of this is that there's also an increasing amount of sort of nefarious network behavior that happens within these contexts. So in Second Life, for example, you may have heard the stories that it's being used as a way of laundering money which doesn't go through the regular financial system but instead what happens is that people are using Second Life Linden dollars as a way of sending money from one person to another but not sending it through the regular financial system. You buy some -- you take money, you buy Linden dollars, then in the game you may sell a cap or a virtual cap or something to someone else for an exorbitant amount of money, and now that person can take those Linden dollars and turn it back again into cash at their end. 
So you have these techniques and if there are structural signatures associated with those kinds of networks, then you can look at the trade networks and identify potential problems within this context. And then finally we've been doing a lot of work with societal justice applications. As you can imagine, there are large numbers that -- two that come to mind here. One is in the area of immigrant communities and the kinds of networks they have and why the networks that they have are often not what the government assumes these networks are. The government puts a lot of money to help in terms of education and medical and job opportunities for immigrants, but they focus it on the assumption that the immigrant community is going to go to certain organizations and connect with them. It turns out that for a variety of reasons, including their immigrant status, that's not where they go. And so understanding these networks and being able to see what would be a wise investment of funds in order to really help the immigrant community, that's a project that we are doing in collaboration with the Field Museum in Chicago. And then the same thing applies, the MacArthur Foundation has given us a grant to look at -- they are funding a lot of work in the area of digital media and learning, especially activist kind of work within digital media and advocacy kind of work in that area. And they want to see what is the network of people who are involved in this and how could they make this network more effective. So that's sort of that aspect of it. And in all of these, obviously, the core research area is what are the social drivers for creating and sustaining these communities. So we take many of these communities that I talked about, now I've made those as the rows on this. The columns are the same ones we talked about before, exploring, exploiting, mobilizing, bonding and swarming. 
We find that each of these communities have some of these goals, and so our goal here is then to be able to say okay, now we can start modeling and seeing whether the communities that we know which theories are associated with the goals. So we have three things here going on. We have the actual context. We have what the goals of the context are, whether it's World of Warcraft or emergency response. We know what the goals of the community are; we know for certain goals of the community what theories are more important. And so taking that together we are now able to go in there and see the extent to which those theories are really driving the community given what the goals of the community are, etc. All of this is great in principle until you hit the point that you need to have some data to be able to do any of this kind of stuff, which of course is where a lot of what we are now talking about comes in. I like to think of this is as -- I like to think about what we have here is a multidimensional network. We have multiple types of nodes and multiple types of relationships. You think of this as a fishnet. You can pick up any node. It could be a person, it could be a document, it could be a dataset, it could be a concept, and you want to see all the other nodes that it's connected to and how it may be connected to them. You may also want to see given where you are, you are one of these nodes in this network, what are the best three ways in which you can get to it. So think of this as -- and this is demo that I'll close with here, is that if I'm interested in a particular topic, show me the ten most important people, the ten most relevant documents, the ten most relevant related concepts, and not only show those to me, but then rank order them weighting it both on that person's relevance or that document's relevance, etc., but also the likelihood that I may be able to -- how I'm connected to that person. 
So, you know, what degrees of separation or even more sophisticated ways of looking at social incentives of how I may connect to that person. And so that's -- to me, that's the end goal of a lot of the work we're doing. So understanding these social incentives will allow us to rank order. There are three experts on a topic. The person that would be recommended most highly to Scott may not be the same person who would be recommended highly to me because it's taking into account not only the expertise of the person, but Scott's network and how Scott may be directly or indirectly connected to this person. And so that's why the weighting depends on both the identification of the expert and then the selection through the network. So obviously it comes down to being all about relational metadata. So I think about social networks as really relational metadata. Right? Who is connected to what. And you can think about all the different ways in which we are already harvesting this information. We have technologies to capture this, we have technologies to tag these communities, we have technologies to manifest it, visualization is a big part of it, but analytics is again a very large part of it as well in being able to look at the visualization. In some cases what we do is we actually do the analytics first and they tell us, okay, this is interesting, then I can drill down into looking at that so that that visualization would help me understand that better. And so we obviously have many, many ways in which we've been doing this, some of the ways in which we do it is we just look at text mining tools that allow us to extract entities and to extract relations amongst entities. So inferring relations amongst entities is one way to do this. We have Web crawling tools that allow us to go on the Web and be able to see who's linked to what, who are linked to by common other Web sites, etc., so there's all kinds of tools that allow us to do that. 
And then finally the Web of Science citation data is a wonderful opportunity, as are also similar things from ACM [inaudible], from the ACM Portals, etc., but that has a lot of information of who's coauthoring with whom, who's citing whom, who is -- and so on. And co-citations, etc. So there are all these different ways that we have taken, and one of the things that we've been doing in our lab is we built a cyberinfrastructure for inquiring knowledge networks on the Web, so it's CIKNOW, and the "I know" of course is trying to answer questions like who knows who, who knows what, who knows who knows who, and who knows who knows what, which is at the heart of a lot of what we're trying to do in this context. So we've already talked about this. I'm going to start with one on emergency response to show how these automated tools work. And that's -- I'll start with an example from Hurricane Katrina. And so really the rest of it is all little results and demos. And we'll keep an eye on time and we can move things around depending on how much time we have here. So what happened in Katrina is that the day Katrina happened one of my colleagues at University of California Irvine, Carter Butts, he did something very smart. He was -- he'd already been working in an emergency response project funded by the NSF; so was I on a separate project. But he had learned through the project that he was working on that whenever there is an emergency response, there are these things called situation reports that have to be generated by any local, state and federal agency on a daily basis during an emergency. So what he did was he went and began to download every day all the local, state and federal sit reps as they're called. They're called sit reps for short, situation reports. And it's a basic format. It's unstructured text. This is what it looks like for the Colorado Division of Emergency Management. This was what the sit rep looked like on the 30th of August in 2005. 
Tells, you know, which organizations you're working with, what kinds of things you're planning to do, what kinds of things you have done, etc. What he did and what he was doing was he had undergraduate students at UC Irvine look at this and code which organizations were connected to which other organizations. Because especially during an emergency, but even elsewhere, getting people to tell you about their social network data, trust me, is really, really challenging. So obviously there's a real enticement. In that case, you know, he had a bunch of undergraduates who could do it. Meanwhile, what we were doing was at NCSA, so this was a human coding procedure, you know, putting basically HTML tags, but at the same time at NCSA, National Center For Supercomputing Applications, in -- at Illinois, we were using a tool called D2K, which is an automated -- essentially it's an entity extraction tool along with relationships amongst entities. And it needed some training. It doesn't work right out of the box. You needed to train it about what different acronyms mean in this business. And so fortunately on the FEMA homepage they actually have a list of all the acronyms. They call it TLA, three-letter acronyms. And it's a list of all the acronyms that are in the emergency response. But after it was trained, it actually works quite well. So what we did was we were able to generate just networks based on that. So on the first day we did time slices. This was done automatically, right? So it could be in principle be done in real time once we develop these technologies and it's a way of mapping what's happening within the network. Well, how well did it work? Well, as you can see from this, this is a multidimensional network. 
Some of the nodes are blue because they represent people, like Governor Bush of Florida; some of them are red, they represent organizations, Salvation Army and the American Red Cross on the top; some of them are keywords or concepts like shelter; some of them are physical locations like Louisiana, New Orleans, Alabama, Texas in green. And what you see is that Florida is a topic of conversation because we don't -- may not remember this now, but really what happened was that Katrina first hit Florida before moving up to New Orleans. And so as you can imagine at that point Florida was quite central within this network. The other thing that we noticed here, and it's not very clear from this picture, is that there was a very strong network that existed amongst the people who were involved in the petroleum issue. So that group was really well networked together. They were exchanging a lot of information, they were well prepared for it. I gave a talk similar to this one to a group of executives, high-level executives at Exxon. And one of them who was sitting there, just he went into his bag and he took out this big binder. And he says, this is why. This is one of the things all of us keep with us at all times is a set of procedures to anticipate in times of any kind of disaster, manmade or natural. And so they were really well prepared. And they worked well with the government agencies within that area. Not so well prepared were in the top was around the shelter, both human shelters and animal shelters. The networks were really sparse and really disconnected. So in retrospect we know that that part of the network that -- you know, we didn't do a good job in that area. We did a much better job in terms of energy issues, and it can probably make an argument, a financial argument about why that was the case. But this was a good way in almost close to real time to see a dashboard of where the challenges would be within the network. 
I'm just going to run through the slices here. As you change slices, different organizations move to different places and some become more important than others within the context, etc. The one I want to touch on here is a score -- whoops. Something went wrong with the image there. But essentially -- yes. >>: [inaudible] maybe I missed it, but how did you get -- what kind of data was that? How did you -- >> Noshir Contractor: So the data was the sit reps, the situation reports, right, so each of those -- that was the data. So it was digitally available, it was on the Web, Carter had downloaded it, so that became the data, and we had it time stamped by organization name. And so what we did is we piped that data into the D2K and extracted the entities from that. So what you'll notice here is that we just tracked a very simple network metric called betweenness, which is the extent to which an organization is brokering or connecting other organizations that are not connected together. And what we found is that over the time slices we track two organizations, the American Red Cross in red and FEMA in green. And the lower number means it's a higher rank in terms of betweenness. So obviously you would want FEMA to be between a lot of other organizations to serve as the center. And so it had a pretty low rank, just as you would expect, and that's why on the lower left corner, green, you see FEMA down there in terms of the network. But suddenly something happened. As you got to time slice 5, which was about ten days into it, all of a sudden the American Red Cross moved higher in rank and FEMA was actually drifting lower in rank, a higher value for the rank. Now, in retrospect -- yes. >>: This is from the sit reps taken from the FEMA Web site, right? >> Noshir Contractor: From all the Web sites. So yeah, yeah. So not just from FEMA. So each organization is supposed to go and put it up there. Right? So it's not just FEMA. Right? It's every organization. Yes. 
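For readers who want to see the betweenness measure described above in concrete form, here is a minimal Python sketch. It is not the D2K pipeline from the talk. It assumes an unweighted, undirected network supplied as a list of edges, it uses Brandes' algorithm (a standard technique, swapped in purely for illustration), and the function names and the node names in the toy example are hypothetical. As in the talk, rank 1 means the highest betweenness, so a lower rank number means a more central broker.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected (u, v) pairs into an adjacency dict."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm.

    For each node, counts the (fractional) number of shortest paths
    between other pairs of nodes that pass through it.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dep = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                dep[v] += n_paths[v] / n_paths[w] * (1.0 + dep[w])
            if w != source:
                score[w] += dep[w]
    # In an undirected graph every pair is visited from both ends.
    return {v: s / 2.0 for v, s in score.items()}


def betweenness_rank(scores):
    """Rank 1 = highest betweenness; ties are broken by node name."""
    ordered = sorted(scores, key=lambda v: (-scores[v], str(v)))
    return {v: i + 1 for i, v in enumerate(ordered)}


# Toy example with hypothetical nodes: "hub" brokers between a, b and c,
# and "c" is the only route to "d".
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
toy_scores = betweenness(build_adjacency(toy_edges))
toy_ranks = betweenness_rank(toy_scores)
```

Recomputing the ranks for each time slice of a network and plotting them over time would give the kind of rank trajectory discussed here; how the slices were actually built in the study is not specified beyond what the speaker says.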
>>: And so that's a really -- that's a major drop or increase to [inaudible]. >> Noshir Contractor: Yeah. >>: If you want to look at it from 4 to 5. >> Noshir Contractor: Yeah. >>: Something happened I assume. >> Noshir Contractor: Yeah. Yes. And so we know, because we've seen the studies that were done in Katrina, we know that there was a major crisis in leadership that FEMA faced at that point, and that's when the American Red Cross began to step in there and decide, you know, we can figure out how to do these things later on. And so if you talk to people who were involved in this, they all say that's spot on. And it was really -- it was really interesting to see that we now have the tools that allow us to be able to track these in close to real time and be able to do this -- it becomes an enabling tool for people to look at a network. And these are not even any fancy theoretical models. This is just looking at the basis of just the betweenness, so it's a descriptive measure right here that gives us something very revealing about what happened within this network. Is this making sense here? Okay. How are we doing on time? So I have other examples that I can talk about. I'll talk about this one. It's focusing on knowledge networks more generally, and then it sort of sets the stage for a couple of -- the WoW example as well. So we did a study that was funded by the National Science Foundation, which is now -- the study is concluded. But we looked at about three dozen different organizations around the world, some in government, some private sector, some international, some domestic, and we -- one of our companies where we got data from was Boeing here in Seattle. And so what we were doing there was looking at small knowledge networks, small groups that existed within these organizations, and asking them a very simple question: When you need information on topic X, assuming topic X was relevant to the group, if you need information on topic X, who do you go to. 
And separately we also asked a bunch of other questions about their colleagues including who's knowledgeable about topic X. And as you can imagine, you don't always go to the person you think is knowledgeable for a variety of reasons. Right? There's an article -- if you're interested more in this, there's an article in Harvard Business Review from a couple of years ago titled Competent Jerks and Lovable Fools, and it tells you everything you want to know just by the title about how you sometimes go to people who may not be the expert on a particular topic [inaudible]. So that's what we were trying to do. So the rational argument is what is called in social psychology as theory of transactive memory. It says I go to someone because that person is knowledgeable, makes a lot of sense, but I also grow that person's expertise. So in your own workgroup, you say I'm not going to learn everything, but I know somebody in my group is really an expert on info [inaudible] or somebody in my group is an expert in social network analysis. I will go to that person, but when I find an article on this topic, I will feed it to that person so that they can grow -- I can grow their expertise on this topic, etc. So that's why you communicate, to allocate information; that is, when you come across something, you send it off to them to allocate it so that they become the expert. You're growing their expertise. Another one is social exchange. I go to somebody -- again, these are the theories that we talked about. I go to somebody because I -- they came to me for something else, so they owe me one. So it's an "owe me one" sort of exchange argument. Another is proximity. The office was next door, that's why I went to that person. Not necessarily because that person is an expert. Inertia, I worked with them in the past, I've collaborated with them in the past, it's easy, I have a good bond and relationship with them, that's why I'm going to go to them. 
And then finally we have this one called public goods and transactive memory, but the idea here is focusing on the intranet. Remember, the nodes of the network don't have to be people, they could be knowledge repositories. And so the argument for knowledge management has been for a long time if people all put this stuff on the intranet, I don't have to go bug them. So the more a knowledgeable person is putting things on the intranet, the less likely I am to go to that person. And we ask them questions about who was -- you know, how often do you go to the intranet for each of these topics, who do you think puts stuff on the intranet on these topics, how much do you put on it, etc. So we had those network questions about the intranet. Well, what did we find? The significant findings here, these were the only significant configurations that explain why you retrieve information. To look at these numbers, .5 is your default. So it's random chance, like the flip of a coin, that you would go to someone. So a value higher than .5 is positive, the value below .5 is negative. Just because you socially communicate with someone, that doesn't mean you go to them, because that's .144, well below .5. The second one says that transactive memory does work. If I think someone is knowledgeable and I'm allocating information to them, then the coefficient was .995. A third one says that if I think someone is knowledgeable and provision means they are putting stuff on the intranet in this case, the coefficient was positive which says that I would go more to them not less to them. Right? So this of course when we -- and this was a consistent robust finding across many -- all of the groups that we had. And in retrospect, when you went and talked to the groups, they said, well, of course, because what happens is that the more someone puts on the Web, the more likely it is that I know what she or he is knowledgeable about. 
And the second thing is that the more this person puts on the Web, the more this person is signaling to me that they're willing to share their information with others. But at the same time, I'm not going to be able to go talk to them -- I'll still want to go talk to them personally about it, because otherwise I don't know where on their Web site to go find the stuff that I want. Yes. >>: I just sort of channel Jonathan Grudin who couldn't be here today. >> Noshir Contractor: Uh-huh. >>: So he's been doing a lot of work in enterprises and sort of around that topic of, you know, how do you get that tacit knowledge in, you know, on a wiki or whatever, whatever it is. And actually one of the things that's come out of a lot of the qualitative work he's done is that people go, well, I have the tacit knowledge, why would I give it up, that makes me valuable. >> Noshir Contractor: Right. Sure. >>: And so this -- if you can show them this number that's kind of the opposite, right, it's like saying, no, no, no, I actually put it out there, and that will make you even more valuable. >> Noshir Contractor: That's true. That's a good point. And actually I'll come back to that in a second. Because I think there is something interesting here that I'm going to touch on in a nonsignificant finding that I think is significant substantively given what you just said about Jonathan's studies. So bookmark that just for a second here. So just to wrap up the rest of these, the perception of knowledge, if I think someone is knowledgeable and that person had come to me for help on a topic and I have a social tie, then I'm very likely to retrieve information from that person. And then the final one is if I think someone is knowledgeable, their office is next door and I go do social things with them, then I'm more likely to retrieve information. 
Now, to come back to the point you made a couple of minutes ago here, two nonfindings that are quite interesting, perception of knowledge by itself is not one of the significant predictors. So people who set up expertise directories, right, and say if only Microsoft knows what Microsoft knows, right, by itself that does not necessarily work here, because unless you give people some other incentive or some other mechanism, my knowing that somebody is an expert is not more likely to make me go to that person. And that's for the same reasons that you were talking about, that, you know, people are not willing to give it. But there are other ways, other incentives that make it possible for me to get the value, the tacit knowledge that resides in your head. The other one is that social communication by itself was a negative tie, right, since you don't -- it says how does that make sense that you don't go to somebody who you're socially well connected with? Yes, but notice these ones here. Social communication shows up here as a -- essentially like an interaction effect. So what it's saying is that social communication by itself may not be enough, but it serves as a very key lubricant, especially when you are trying to get to people who are knowledgeable. And I certainly know this, that I -- there are friends of mine in social network analysis who are very, very knowledgeable people and a lot of the other people that I know in my network don't want to go to them directly and ask them for help. And the reason I feel very comfortable going to these people is because I have built over the last several years, decades now, a really strong social tie with at least one particular person who is very knowledgeable in this area. But because I built that social tie, that makes it a lot easier. Plus, I mean, this really resonates because yes, this person comes to me for other things, and this person was proximate to me when I was at Illinois. 
And so all of these explanations helps -- it did resonate in terms of my own experiences even though in this case it was nice to have solid, quantitative data on this topic. So which one should I go to next. Okay. Let me touch on -- I'll touch very quickly on the Second Life data. And the idea here again is we get this data from Second Life and we say, you know, who -- so we got all the data from Second Life, which we'll talk about here, and we want to test three theories: To what extent are friendship networks within this based on transitivity, being friends with friends of your friends; homophily, that is friends with the same age and same gender; and friends who live nearby, which is the proximity. So this was sort of the basic idea. Friendship in Second Life, as you can imagine, is quite volatile. This is 6.4 million users, 8 million friendships from April 2001 to September 2007. But a vast, vast majority of those friendships don't last very long. Right? So the number of days in Second Life falls very precipitously if you think about longevity of friendships within it. So this is just [inaudible] this is another way to look at that. I'm not going to have time to go into that. But this study, what we looked in particular, is something called a teen grid. I don't know how many of you are familiar with it, but within Second Life there is a teen grid where the only people allowed there are teens. And other people -- in fact, some people joke there is more researchers in teen grid than there are teens in teen grid because there's so many people who are interested in looking at that. So it's an international gathering place. The active players in the second quarter in 2007, so we just took a slice of data, there were about 2,500 users and about 22,000 friendships amongst these users, etc. So that became the data that I'm going to talk about here. 
We had a bunch of hypotheses about adolescents, online friendships are not just random networks, that real world geographically is positively associated with it, time spent online is positively associated. We call that digital proximity. If you are there, you're more likely to talk to other people. Geographic proximity was the first one. And the last one we call temporal proximity. And this was interesting. Because it says if you joined at about the same time, you're more likely to be friends. For those of you who are in Second Life or have been on Second Life, you'll know that they have this sort of orientation island where when you join you essentially get put with a cohort of people, so there is a cohort effect that comes in. We call that a temporal proximity then. Basically this is the data. So we had the users, they came from a thousand cities, 48 countries. The mean age was 15. They had different types of accounts in it. And this is what the network looks like. This is sort of based on geography. The big cluster on the left is the U.S. obviously, and then you have other countries like Puerto Rico and Canada and the U.K. and Australia and South Korea. And those are the some of the peripheral nodes that you see within the network. So this is a very high-level picture of what was going on there. That sort of touches on the geography. This is also colored by male and female, so you see some of that stuff. So I'm just going to give you the basic results that we found out there. And I'm not going to do the numbers here. But the basic results was friendships were not random. So all the hype that we have about these worlds are so scary because people are creating random links with other people, etc., we didn't find evidence of that. There's a lot of systemicity to the kinds of friendship patterns that you see out there. The second one is geographic proximity was very positively associated with friendship formation. 
Digital proximity was positively associated. Temporal proximity, people who joined at the same time, was positively associated with friendship formation. Age was not. I'm sorry, age -- go ahead. You want to ask me a question? >>: Yeah. So geographic [inaudible] at what scale, like my neighborhood or my country? >> Noshir Contractor: I'll come to that. We have some slides that talk about that. No. This was based on latitude and longitude. Did I -- maybe in my hurry to skip slides I didn't show that. But this was based -- and I have another slide that I'll talk about. These were the measures that we used. So what we did was digital proximity, we had two measures based on latitude and longitude, but we also did a log scale of that so that -- okay. So that's how we did that. And I have another one that I'll touch on there. So basically the one that was French -- higher-status individuals are more likely to form friendships, so that was sort of the scale-free idea that the rich get richer within this. They tended to be balanced. And age homophily was there but was not very strong. So, in other words, there wasn't this strong tendency of people wanting to be with others of the same age as them. But they were still there. It was significant, just not very strong. Somebody had a question? Yes. >>: [inaudible] for clarification [inaudible] so by higher status, you define that as people with a lot of friends. >> Noshir Contractor: Yes. In this case. Yes. Right. >>: [inaudible] more outgoing people tend to make more friends because presumably everyone starts Second Life with zero friends? >> Noshir Contractor: Right. That's exactly right. Yeah. And so you have to aggressively make that decision. That's exactly right. Yeah. That's right. Thank you. 
So basically, you know, the kinds of friendship characteristics [inaudible] virtual world was actually very similar to what we see in real-world studies that have been done on friendship networks, especially amongst -- there's a lot of studies that look at friendship networks in high schools. So the same kind of demographic. And you found actually very similar patterns. And so in some ways, the good news and bad news here is that if you -- this becomes -- once you get these kinds of findings, it becomes a good exploratory area to do a lot of studies that you want to look at that you want to try to understand in the real world but you may not be able to manipulate quite as carefully as you can in this world, so this becomes a good environment. The bad news is you don't want to necessarily take these kinds of studies and then try to use it to understand disaster responses. Because I know there are a lot of people who are trying to develop similar kinds of efforts to study how people would respond to a pandemic, and it's not clear so far at least that how people respond in these worlds would be similar to what may happen in a real-world situation. So that's a word of caution there. I'm trying to rush through these examples here. I'll give you one from World of Warcraft since all of you are familiar with it. You don't need to see all these slides. Here's what we did in World of Warcraft. In World of Warcraft we studied groups, and this was not digitally collected data. This was before that. This was actually we surveyed people and got them, got this data from them. And what we did was -- these are the different groups that we went to, different guilds that we went to. 
And it was the same question, it was a replication of what we did in places like Boeing and the other companies and the government agencies that I was mentioning earlier, and that is we said, you know, for your group, what are the five or six areas of expertise that you need and who do you go to and same -- essentially the same study. But here because we were doing it online and we had a group of people that were quite willing to participate in this study, we were actually able to get longitudinal data, so we had data at time 1, time 2 and time 3. And this is what the retrieval looks like at each of these areas. The size of the nodes refers to who was the expert, so you'd expect a large blue node there, for example, to be getting a lot of links which say yeah, you're going to the expert. But you also have situations where you may have a lot of links that may not go to a red node, not quite as many going to this red node, so that's telling you that there's an expert and many people are not going to this expert. The reason why you have -- so that's the difference between the -- the reason why you have some are red and some are blue is we did a control condition while the blue was just networks of people who played World of Warcraft without using audio, the red was people who played World of Warcraft and were using audio. So we were trying to do a study of the difference between network retrieval patterns between blue and red. I'm not going to talk about that part of the study here. But, again, we did the same -- you know, we contextualized the goals of the network -- yes. >>: [inaudible] can you actually just talk briefly about that? Because that's actually pretty interesting [inaudible]. >> Noshir Contractor: Sure. So what we did in that case -- let me give you the results, and then I will tell you how that will change. So this was the results across the two. 
So first of all there were not significant differences -- substantially significant differences between the audio and the nonaudio group. Okay. So what I'm going to show you here is across all groups, audio and nonaudio groups. But then I will talk about one thing that was -- after giving you [inaudible] I'll tell you what was a difference in the audio versus nonaudio group. So basically what we are doing here is saying we can now -- because we have longitudinal data there is a set of tools that have been developed at Groningen in The Netherlands by Tom Snijders' group that allows us to longitudinally see the likelihood that you will create a link with someone based upon the different theories that we had here. So one says that, again, like in Second Life, there's no likelihood of you creating a random link with someone. So minus 1.5 is the like -- or the cost associated with just randomly creating a tie or going randomly to somebody for help in World of Warcraft. The second one says but there is a .55 benefit of reciprocating. So if that person came to you for help, the social exchange hypothesis, you are indeed more likely to go to that person. So there is -- and remember that -- so reciprocity is going to incentivize you going to that person. The third one says there's again a positive benefit of being a friend of a friend of that person. So that I'm more likely to go to Scott if I'm friends with Danielle and Danielle is friends with Scott or if I'm friends with someone else who is friends with Scott, etc. The more of those that I have, each of them will account for .89. And then, finally, in this case we found -- remember the previous studies that we did we said there was no significant effect for expertise by itself. In this case we found a significant effect, statistically significant effect, but substantively that value is .04. 
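The four coefficients just quoted can be read as additive contributions to the attractiveness of one help-seeking tie. The following is only a minimal sketch of that reading, not the estimation procedure of the Groningen tools: the function name, the dictionary keys, the data layout, and the 0/1 coding of perceived expertise are all assumptions made for illustration.

```python
# Coefficients as quoted in the talk; the key names are illustrative labels.
COEF = {
    "tie": -1.5,               # cost of creating a tie with no other reason
    "reciprocity": 0.55,       # the target has previously come to the seeker
    "friend_of_friend": 0.89,  # per intermediary linking seeker and target
    "expertise": 0.04,         # seeker perceives the target as an expert
}


def tie_score(ties, seeker, target, expertise):
    """Sum the contributions for a hypothetical tie seeker -> target.

    ties      -- set of directed (source, destination) pairs already present
    expertise -- dict mapping a person to an assumed 0/1 expertise rating
    """
    reciprocated = 1 if (target, seeker) in ties else 0
    # Intermediaries k such that seeker -> k and k -> target both exist.
    intermediaries = {
        k for (i, k) in ties
        if i == seeker and k != target and (k, target) in ties
    }
    return (
        COEF["tie"]
        + COEF["reciprocity"] * reciprocated
        + COEF["friend_of_friend"] * len(intermediaries)
        + COEF["expertise"] * expertise.get(target, 0)
    )


# The example from the talk: I am friends with Danielle, Danielle with Scott,
# and Scott has previously come to me for help.
ties = {("me", "danielle"), ("danielle", "scott"), ("scott", "me")}
score = tie_score(ties, "me", "scott", {"scott": 1})
```

With these inputs the network terms (.55 and .89) nearly cancel the -1.5 cost on their own, while the expertise term moves the total by only .04, which is the contrast the talk draws.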
So that is the extent that you're going to somebody who's an expert, .04 is the contribution of you thinking that person is the expert. Most of it is based on network ties on friends of friends and exchange, etc. So that's quite interesting. Now how was this different in the audio group? In the audio group, these became even more important, that these values became even more important than this one. This remained pretty much the same. But if you did the same, they got the same kind of results, but those coefficients became much more important, which basically meant that audio was allowing you to -- was facilitating the friendship ties with -- the exchange ties and the transitive [inaudible], etc. So these two became more important, this remained about the same, maybe slightly less in importance, etc. >>: So back on the graphic where you had the networks, there was one node in particular that looked kind of weird to me. So down in the lower right-hand corner. >> Noshir Contractor: Uh-huh. This one. >>: All the -- it looks like -- and maybe this is just an issue of the size of that circle makes those areas look like there's a hell of a lot of them. >> Noshir Contractor: Yes. >>: It looks like that person isn't expert really at all and yet everyone wants to speak to them. >> Noshir Contractor: Yes. >>: What's the hypothesis there about what's happening? >> Noshir Contractor: Well, this happens a lot, by the way, when we've looked at these kinds of networks a lot. One of the ways in which this kind of happens a lot -- >>: [inaudible] >> Noshir Contractor: Well, maybe. But also what happens is sometimes all of these things can be explained by one tie between this person and an expert. So what happens is this division of labor. >>: [inaudible] >> Noshir Contractor: Beg your pardon? >>: Is there admin basically? >> Noshir Contractor: Well, it could be an admin or it could be somebody who is much more accessible and she or he is tying to the expert. 
But then they become the one who then spreads the word around to other people a lot. So we see these dyadic situations quite often where you have -- another version of this, this is only showing retrieval, but another kind of these binary stars that we see is one person -- everyone allocates information to one person. But then they go retrieve information from a different person. And the reason is that the person they allocate to is obviously inaccessible, but they're still [inaudible] that person's expertise and they know that the person they're retrieving information from is really serving as a channel to that person. So those are the kinds of things that we have seen that may explain this kind of. >>: Do you track -- and, you know, we don't see it reflected here, but do you track like the scale or the frequency or something of the relationships -- >> Noshir Contractor: Yes. >>: -- between -- so like you might be able to then hypothesize or then prove that yeah, that little bitty dot is in fact empirically connected. >> Noshir Contractor: That's right. In this graphic, you can't, but again in the study that I'm not going to talk about today, we actually are able to show the strength of the tie. It's been a very strong -- and this is why I call it the binary star, because you have two of them that really have a strong tie to each other, everyone allocates to one and retrieves from the other. That's exactly right. Yep. >>: [inaudible] high level here, I mean curious what you think. So in these sort of expertise finding systems, you're continually showing just the expertise is really not much of a predictor. Is -- you know, but we've never had the tools before to really know who the experts are. We're just doing the same things we've always done. >> Noshir Contractor: Right. >>: Is it fundamental to who we are or is it just that, you know, over time once we have the tools [inaudible]? 
>> Noshir Contractor: Well, that's a good question, and there is some evidence that that by itself, that it's not just -- it is fundamental to us, is what I'm saying. And here is the thing. There are a lot of companies that have created these expertise directories. And, again, I've talked to people at Raytheon at some length about the expertise directory that they have there and there are Accenture and others, IBM has created these. And many of them will look at these findings and say of course, that explains why these things by themselves don't work. Right? At least I don't have sort of solid empirical evidence, but at least having spoken to these people, many of them have not succeeded. And I think this is part of the reason why. So there is a way to fix that problem. And so I know that I'm really low on time, so I'm just going to go ahead and -- let me see if there's something in particular. This is the EverQuest. I don't think I'm going to have much time. I think there's a slide that talks about the distance between players. So this is how we did the log scale. Right? So the log scale idea here is the same kind of data we collected from EverQuest that we did from Second Life, but here we look at the scale -- the amount -- this is just a -- this is a log scale just to show what is a probability that we hypothesize the communication as you move. So, you know, most of it is really close by. So it's like within 15, 20 kilometers as opposed to not -- that's the effect that we wanted to focus on. And I think on one of these slides we had -- so this is the same kind of data -- essentially the same kind of results. This is how we computed the distance, the geographical distance using latitude and longitude. So that's the cosine function that we did there. And the short was 50 kilometers, the intermediate was up to 800, and the long was beyond 800. So this was in response to the question, Scott, that you asked earlier. 
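One common formula that fits the description "geographical distance using latitude and longitude" with a "cosine function" is the spherical law of cosines. Whether the study used exactly this form is an assumption, as are the function names, the mean Earth radius of 6371 km, and the treatment of the 50 and 800 kilometer cut-offs as inclusive upper bounds. A minimal sketch:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees,
    computed with the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lon = math.radians(lon2 - lon1)
    cos_angle = (
        math.sin(phi1) * math.sin(phi2)
        + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lon)
    )
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def distance_band(km):
    """Bucket a distance into the three bands mentioned in the talk."""
    if km <= 50:
        return "short"
    if km <= 800:
        return "intermediate"
    return "long"


def log_distance(km):
    """Log-scaled distance; the +1 offset keeps zero distance defined
    and is an illustrative choice, not taken from the talk."""
    return math.log(km + 1.0)
```

For a pair of players, `distance_band(distance_km(lat_a, lon_a, lat_b, lon_b))` gives the band, and `log_distance` gives a log-scaled version of the same quantity of the kind the talk describes.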
And again we basically found the results -- I think there is one particular that I wanted to focus on here. This one. That in this case we also had time zone difference and that individuals in the same time zone are 2.5 times more likely to be partners than individuals with one hour difference. But this is the one I wanted to focus on. Individuals 10 kilometers away from each other are five times more likely to be partners than the ones who were a hundred kilometers away from each other. So you see a pretty strong effect of distance within this particular context. >>: [inaudible] which is the tail and which is the dog? Like, for example, among my Facebook friends, the vast majority of those people are not people that I met from the Internet, they're people that I know in real life and I just carried them into that [inaudible]. >> Noshir Contractor: That's it. >>: [inaudible] >> Noshir Contractor: That's it. And so this is exactly what we are hypothesizing here. We don't have -- you know, you can embellish this with qualitative studies, etc., but the reason we are seeing this proximity effect, we argue, is because you are dragging your offline world into your online world. Which really defies the sort of hype around the fact that, you know, you're creating these scary networks along the way, that when in fact what you're doing is you're really dragging your offline into your online world, etc. And in some cases it still explains the distance things, but those are people that you were previously connected with in some ways, and so you do see a lot of geographic distribution. But certainly in my case, I know that most of the people I have on Facebook who are not living in Chicago are people who at some point I had some connection with and that's why we built those kinds of things. So I just want to end with a demo very quickly of a couple of tools. So what are we doing with all this stuff. 
We have all these really interesting theories and insights that we've been developing, and on the basis of that what we are doing now is developing recommender systems that incorporate this kind of information both from the statistical modeling side, from the social theory side, etc. So one that I will show you is something that we did for tobacco. This is a research community that we now know that the light cigarette is no less carcinogenic than the regular cigarette. But we had all the information. All the research for this existed in the public domain for about ten years before we now know that this was the case. And the reason for that is because there in the tobacco research community, all of them mostly funded by the National Cancer Institute as well as a few other major agencies like that in the private sector, we know that epidemiologists have found that shortly after the introduction of the light cigarette that people -- there's a new, more aggressive, more pernicious form of cancer that came on. But they didn't connect it to the light cigarette or the low-tar cigarette. People who smoke in laboratories -- I didn't know these things existed, but you have smoking laboratories where they wire you up and make you smoke cigarettes and see how you react to it physiologically had found that you inhale five times as much smoke from light cigarettes than you do with regular cigarettes. Five times as much smoke. So thereby exposing the exposure to carcinogens is so much higher there. A third group that does reverse engineering of cigarettes because the industry won't tell you what additives they put in it had found that light cigarettes had additives with unknown carcinogenic consequences, but they had not made the tie into that. 
So there was this one guy in UC San Francisco who went to NCI, National Cancer Institute, and said, you know, I've been reading these studies, not many people look at these bodies of literature because they're not -- there are poor networks amongst these researchers, but I think that if NCI were to just do a one-year initiative to test this hypothesis and fund people to do that, to make connections amongst results already there, you may find this. And that's exactly what they found, and it took less than a year to do it. So NCI in response said we need to build better networks to help get better return of investments in what it is that we're doing, and so they came to -- they created the network and said, well, it's nice you create a network, but, you know, we create a portal, we have places we put documents and so on, but if I have a question on something, how do I then actually use the network. And so one of the things we did is the rest of the portal is -- it's pretty standard. But we put this thing in called networking. And by networking what we did is we took all of these researchers and, again, digitally went to PubMed and were able to extract who knows -- we took the abstracts of all the articles, and we did all the authoring, coauthoring, citation, co-citation and entered the extraction from the concepts from the abstract. So they already have these things called MeSH, which is the subject header titles, but we also did just text mining on the abstracts to get it. And essentially that became a multidimensional network. And so if you look at it and say I want to look at the list of terms here, you see the people as well as some of the terms that are associated with it. One that is pretty important these days, this thing called smokeless tobacco, because that's the big controversial issue now. 
And so I'm logged in here as Scott, you can see, and Scott knows that I've logged in as him, so I have his permission to log in, but I wanted to show this as somebody who's in the network and how this could work in terms of making recommendations. So the one thing -- the first thing you could do is just say show me the network of smokeless. So this is a multidimensional network with smokeless in the center and all the other things that are associated with it. So this is what that network looks like, right? So you can visualize this network and then you can do some kind of clustering, and you find smokeless is the center. There are three people who are associated with it, these are all the articles that they've written. You click on the article, it goes to that article. And clearly you see Dorothy Hatsukami, Christian and Tomar are the three experts on smokeless because their articles contain these keywords a lot, and those articles were most cited. And so you have a whole set of algorithms that work on the basis of that. Now, but if I'm saying I'm not just interested in it but if I, Scott [inaudible], I'm logged in as Scott, if I want to search this term, can you make recommendations to me on who are the ten best documents, ten people, and ten other keywords that I may want to look at. So that's what you see out here. As you scroll down, you see ten people, ten keywords, and ten articles. If I open persons, you now see all of them are still listed, Hatsukami, Christian and Tomar. But on a scale of zero to a hundred they vary in how strong that recommendation is. And you can click on the Y and it will begin to say I can take someone even lower down, I can take Gary Giovino. If I click on Gary and click on the Y, what it does is it both text based as well as visually tells me how I'm connected to that, the shortest way in which I'm connected. So I'm logged in here as Scott [inaudible]. I'm interested in smokeless, which is the bottom here. 
I'm recommending you go talk to Gary and the reason is because Gary wrote an article called "Trends in Smokeless Tobacco Use Among Adults and Adolescents in the U.S." Now, I may already know Gary as Scott, but if I want to see all the things I have in common with him, I can click on that link. And what it does is it shows me that I have six keywords in common with him; that is, our articles have six keywords in common. So this is the homophily theory that we talked about. The second is we have three people that we have both coauthored with, so this is the friend of a friend idea that comes into it, etc. And then these are two articles that are cited that cite both of us. Right? So it's giving you things that you have in common. So now all of a sudden both Scott and Gary, if Scott has to talk to Gary about it, there's a greater likelihood that, A, Scott will feel comfortable talking, so this is the, you know, the jerk is no longer a competent person who is -- I'm intimidated by, I see all the things I have in common; it's also more likely now that Gary is going to respond to Scott because Gary can see all the things that they have in common. So what you're doing is you're still going with the expert, but now you're using these social networking techniques to be able to tell both the parties, both the nodes what are the social reasons that they may want to -- that helps lubricate it. So we know from theories what motivates people to do it, and we're taking that and saying here is what we want to do with this. In some cases you can tune these algorithms. Some cases you may not want homophily to be. You want to show differences. So you may want to say, yeah, I want to go to somebody who is different from me on this particular topic, but showing them all the things I have in common with them in other topics makes it more likely that I'm that person and I'm going to feel comfortable, etc., about it. Yes. >>: How about the next step, does it work? 
>> Noshir Contractor: Yes. So but that's the part of -- that's our existing NSF grant, right, is -- remember I started out saying there are two things here you can do. One is to get them to use it, and then the second is whether they actually do better as a result of it. Right? So the both of those questions. So we have two grants out of NSF that is looking at building this for different communities. So we have a project that was funded out of NSF just a few months ago, an effort called VOSS, the Virtual Organizations as Sociotechnical Systems. And what we're doing is we're working with five major cyberinfrastructure efforts out of NSF. One is called the nanoHUB. In fact, I'll show it to you if you want. On the nanoHUB site we have -- nanoHUB is basically a bunch of about 25,000 -- I think it's even more than that now -- people who are -- it's a cyberinfrastructure for people who want to do nanoscience, nanotechnology. They upload things there, they do tags, they do a lot of those kinds of things. Let's see if it's working here. And we have it on the site. So this is a community of people who are collaborating with each other, who are chatting, who are tagging. So we have that kind of information directly generated from the cyberinfrastructure. And if you look at it now, it has resources, they have tags, so it's a huge number of people in there. I think it's in the 20,000, 30,000 range. So they are piping data to us and then we can go to the bottom here, if you go to a particular resource, it shows you recommendations. So it's out of our labs, so at the bottom it says if people -- it's kind of like the Amazon on steroids. If you're interested in this topic, here are five people, here are four documents, different presentations, etc., and you see it's -- so we are making this available through our lab, but it's available on their site. So that's one group we're working with, and we have several other groups, including tobacco, that we are working with as well. 
The last example I'll show you, and really this is the end of it, is we talked about navigating these networks amongst the people. But sometimes NSF would like to be able to see from the top are the people they are funding well networked with each other, for example. And how well they're doing with each other. So it also works as a management tool, as an assessment tool, to look at the network. So, in that case, we did -- there was program officer at -- well, it was the director -- associate director of the Office of Cyberinfrastructure, Dan Atkins, and so Dan came to me when he took on this position and said, I'd like us to map all the projects at NSF that are connected with cyberinfrastructure. Every project. And I want to see who's connected to whom, etc. So what we did was we built this tool where we took all the proposals that NSF had funded and took all those principal investigators and went back to the [inaudible] and crawled it to do those kinds of things. We took the abstracts of the proposals and did the text mining, very similar to what we did in the tobacco context. But here the number of words were in the hundreds of thousands, so we couldn't have a list like we did here, so we put G, and it will show you all the names of people, concepts, etc., connected to that. And what I'm going to do is take the word genomic, and in this case I'm going to go ahead and submit it. We could do it -- I mean, you submit that network, you can view it as a Google Map or view it as a network to show locations where people are doing that kind of stuff. If you do it as a network, this is what that network looks like for the word genomic. By itself it may not tell you much, but the moment you cluster it, you see something very interesting happening here. So there are essentially two projects. On the left-hand side, so these are multidimensional networks, so the genomic is a concept, purple nodes are the project, this is a high-speed network connection for genomics research. 
This was another project also funded by NSF enhancing access to the bibliome for genome, genomics. These are some of the other keywords. The yellows are the other keywords associated with it. The salmon color here is the -- or, yeah, is -- are the principal investigators. And then that is the program officer within NSF, Doug Gatchell and Sylvia Spengler. So what you see out here is that there is a lot of ties amongst the researchers who are focused on this project in terms of who's citing and coauthoring with each other. You have a lot of ties on this side amongst these people. So one thing -- the good news at NSF is that these are not people who just came together to work on a project for the first time. They already have pretty good collaboration links amongst themselves. But there is no connection between the two. Even the program officer Sylvia Spengler was not aware initially that Doug, another program officer within the National Science Foundation, was doing this. When Dan pulled this up and looked at it, my first reaction as a researcher is, well, maybe they don't need to talk to each other, maybe, you know, the fact they don't have any other keywords in common means they may not have any reason to talk to each other. And his response which I think was very illuminating was he says that may be the case, but this starts the conversation. This allows us to say should these people be talking to each other, etc. And I think that what you see is that the same kinds of tools now allow us to be -- do diagnostics of this kind. I've obviously taken for visualization purposes clean examples, but a lot of this is driven, as I said, by an underlying logic on the modeling and there are indices that you can look at to see which nodes are important and what ways are they important, etc. 
So bottom line is where we started off, we said we have good theories, we have good data and we have good methods that I think creates a real sweet spot here for trying to build better Lovegetys and better SNiFs than we've been able to do so far. So I'm sorry this took much longer than I had anticipated, but I'm happy to take questions or offline or online as well as we go along. So I'll stop with that. [applause] >> Noshir Contractor: Of course I'll leave this slide on there just because these are the people who do all the real work on it. Yes. >>: So I was curious about the issue of, like, accessibility. So you had expertise as like the main component when you're going to talk about all the different reasons why somebody would talk to somebody. That's huge. But I'm just thinking from personal experience I can't go talk to the PI of my lab because he doesn't have time [inaudible] his e-mail. So then, you know, I'm more likely to go to somebody else. And I was thinking is that an explanation for maybe a little tiny messenger dot [inaudible]. >> Noshir Contractor: That's right. Right. Yeah. That's a big part of it. There are many of those mechanisms. One is that person's availability is one issue. The other is the person -- what we have found in a student -- this one. This postdoc right here. For her dissertation what she did was she found that there was also a strong tendency in organizations where you go to somebody who has just a little more expertise than you rather than the really big expert. And the reason that we are -- and this shows up systematically across is because a person who is a little more expert than you is in a much better position to explain to you these things because they have just been there, so to speak, and they can use a language. And very often when you go to the big expert they'll talk in a language and terminology that make it inaccessible. 
So in addition to availability or access in terms of geography, there's also availability and access in terms of being able to explain these ideas to someone. So that's -- so we have very interesting dynamics about who you go to, even within expertise, but it means that it was a reverse of what we had thought. We thought that the greater the difference in expertise, the more likely you are to go to someone. And instead what we found here was the reverse of that. >>: So does that dynamic capture when you were showing all the numbers of the [inaudible]? >> Noshir Contractor: Right. That particular statistic, that explanation was not -- we didn't use that in WoW, obviously. That's stuff that we've done more recently to look at relative expertise rather than absolute expertise. What we looked at there was absolute expertise. Yeah. Good question. Yeah. >>: So the last little demo that you showed, which is, I think, especially interesting is vis-a-vis gap finding, do you have -- and maybe this is what that -- maybe what your browser was about to bring this up, a more generalized gap finder, like help me as either NSF or as, you know, Ethan who's interested in X number of things, where are my gaps? Like what are the things that the system is able to establish that I don't know that I probably should. The Facebook, again, analogy is [inaudible] people that you probably know based on the fact that you have 12 mutual friends and yet you're not friends with each other? Unlikely. Or at least weird on Facebook's perspective. >> Noshir Contractor: Yeah. And so actually Facebook has that recommender system, but they do it basically -- as far as I can tell they use two things. They use same institution, so somehow they expect that I would know anyone who was at USC because I went to school there. Live in Seattle, that's another one. So there's a geography. And so it's a proximity homophily kind of idea. Or the mutual friends idea, the friend of a friend. 
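The mutual friends idea mentioned above can be sketched in a few lines. This is only an illustration of the friend-of-a-friend heuristic, not a description of how Facebook actually implements it; the function name `recommend_by_mutual_friends`, the `friends` dictionary, and the sample names are all hypothetical.

```python
from collections import Counter


def recommend_by_mutual_friends(friends, user, min_mutual=1):
    """Rank people `user` is not yet tied to by their number of mutual friends.

    `friends` maps each person to the set of people they are tied to
    (assumed symmetric). Returns (candidate, mutual_count) pairs, highest
    count first, with ties broken alphabetically.
    """
    own = friends.get(user, set())
    counts = Counter()
    for friend in own:
        # Every friend of a friend is a candidate, unless already tied to user.
        for candidate in friends.get(friend, set()):
            if candidate != user and candidate not in own:
                counts[candidate] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, n) for name, n in ranked if n >= min_mutual]


# Hypothetical sample network, for illustration only.
sample_friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
suggestions = recommend_by_mutual_friends(sample_friends, "ann")
```

Raising `min_mutual` keeps only candidates who share many friends with the user, which is the "12 mutual friends and yet you're not friends" situation raised in the question.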
So those are the ones they do. I would argue that in scientific engineering communities there are other kinds of data that we have that we could use and make it more likely. Now, that's as example the demo that I didn't do that you saw that I was heading towards, is in the case of the nanoHUB, for example, what you could do, what the recommendation will say is that if you're interested, say, in nanomedicine, okay, then what you would do is it will say, well, I can pull it up if you want to. But basically what it does is it says if I'm interested in nanomedicine, then the reason I'm recommending this particular presentation to you as opposed to another presentation is because you tagged something by this particular -- on this particular -- this guy who did this presentation, you tagged something else that he had presented, which was really -- you liked it a lot, so we're recommending that as a result you would go -- you may like his presentation on nanomedicine, especially because it was also tagged by other people as being a really good presentation. So it's a fairly sophisticated kind of explanation, but it customizes it to you in a way that I don't see right now in most of what is available publicly in terms of commercial software that does these kinds of things. So I'm going to log in here as John Doe, and that user ID is 21,000, so I'm just showing you from our internal engine as opposed to from their Web site, so I don't want to log in as someone else. But if I'm logging in as John Doe, it shows me the recommendations. So some of these are tools, some of these are users, some of these are tags, some of these are online presentation. You again get the Y score and you click on it. If I'm interested in James Leary, you click on that and it will say the reason I'm recommending, it is because you, John Doe, you tagged -- whoops. This Java new application keeps telling me that everything I had previously was old. >>: Do you sniff out the Harry Potters of the list? 
Like people who read this obscure text also read Harry Potter, because everyone reads Harry Potter. >> Noshir Contractor: Right. So what it is -- >>: [inaudible] using this tool, Microsoft Word, as it turns out, I've already installed it. >> Noshir Contractor: Yes. And I think what -- yes. So there is obviously inverse frequency applications of what you want to be able to do. And we call that an entropy figure. In other words, it says, for example, if you want to compute the similarity between two people, then the similarity between those two people is weighted not -- that if you have -- so, example, everyone in this room says that I'm interested in communication or I'm interested in computers, that doesn't count for a lot. If there are unique words that just two of us are interested in, that counts for a lot more. So that's basically the idea of using entropy as a way of saying that just the frequency of words doesn't count -- what is unique about it is what you may want to be able to pick up on and weight that more heavily. So this is an example of it, right? So John Doe is logged in, I'm interested -- it says I would be interested in looking at this presentation called "Nanofactories - In-situ Production of Therapeutic Genes" that was put together by James Leary. I had already -- sorry. I had already tagged this one. It was done -- it was prepared by James Leary. James Leary also has another lecture on nanomedicine called [inaudible] basic nanomedical systems design. That's why I'm recommending it. Now, we've already changed the visualization of this, because right now on this to understand what I just described you need to click on each link. And we found from user studies that instead if you just have the legend that says what type of link it is, then without having to click into it, you'd be able to understand what I just said verbally, etc. So that's a variation that we're working on, etc. 
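The inverse frequency weighting described above can be illustrated with a small sketch. The talk does not give the actual formula behind the "entropy figure", so this assumes one common choice: each shared keyword is weighted by its self-information, -log2(p), where p is the fraction of people who list that keyword. The function names and the sample profiles are hypothetical.

```python
import math


def keyword_weights(profiles):
    """Weight each keyword by -log2(fraction of people who list it).

    A keyword listed by everyone gets weight 0; rarer keywords get more.
    This particular formula is an assumption, not taken from the talk.
    """
    n_people = len(profiles)
    counts = {}
    for keywords in profiles.values():
        for kw in set(keywords):
            counts[kw] = counts.get(kw, 0) + 1
    return {kw: -math.log2(c / n_people) for kw, c in counts.items()}


def weighted_similarity(profiles, a, b):
    """Sum the weights of the keywords that persons `a` and `b` share."""
    weights = keyword_weights(profiles)
    shared = set(profiles[a]) & set(profiles[b])
    return sum(weights[kw] for kw in shared)


# Hypothetical interest profiles, for illustration only.
sample_profiles = {
    "p1": {"computers", "nanomedicine"},
    "p2": {"computers", "nanomedicine"},
    "p3": {"computers"},
    "p4": {"computers"},
}
rare_match = weighted_similarity(sample_profiles, "p1", "p2")
common_match = weighted_similarity(sample_profiles, "p3", "p4")
```

In this toy data everyone lists "computers", so sharing it contributes nothing, while "nanomedicine" is listed by only half the group and carries all of the similarity between p1 and p2. The same weighting would discount a Harry Potter style item that nearly everyone has.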
So I -- yeah, I mean, you know, we may be being crazy about these kinds of ideas, but I think there's a -- from what I've seen from these communities like nanoHUB, there's another one for earthquake engineering simulation called NEESgrid. There's another group that were working at [inaudible] laboratory for water. It's called WATERS. It's a nice acronym that has about water, about hydrology and so on. We're doing one called The Media Hub, which is a project that is handled out of the Social Science Research Council in New York funded in part by MacArthur and others to bring people from -- to help build a recommendation for people who are interested in media research around the world. So we have this project. And so that's when we can answer Danielle's question of, well, does it work at both levels in terms of other people are going to use it and more significantly is it really going to help make them more effective. Tough questions to assess in a short term, but I think we have a design that allows us -- at least we now have the data that allows us to get at some of those things. Well, thank you so much again. I really appreciate it. Great questions, too.