>> Scott Counts: Okay, folks. Let's go ahead and get started. Today we have Winter Mason here from Yahoo! Research, and we're pleased to welcome and introduce him. Winter has a unique and potent background. He got his Ph.D. at Indiana in a combination of cognitive science and social psychology, so he's adept at thinking about people at a very small scale, down to the neuron level. And he has spent the last year and a half or so at Yahoo! Research, where he's thinking at a much larger, social-network scale. So he's one of the rare people who can think at multiple levels of analysis with respect to social computing.

And today he's going to talk about social networks and success. So please welcome Winter Mason.

>> Winter Mason: Thanks for having me. Thanks, Scott, for introducing me. So as he said, I'll be talking today about social networks and success. I'm going to open by trying to answer: why could social networks affect success?

So one thing is that social networks affect the access that people have to information. People in different positions within a social network have access to different kinds of information, and they're able to broker information between the individuals they're connected to.

And as a result, you can see, you know, a person might be more successful because they're in a better position with respect to the information that they have available to them.

Similarly, with information diffusion: if you want to reach and affect a large number of people, you might think about how far and wide a person's influence can reach. And this, of course, will also be constrained by the social network they're embedded in.

Another way you can think about it is with respect to workgroups. Workgroups share information and resources, and the way those resources, knowledge, and capabilities are distributed within the social network of a workgroup can influence the way the group as a whole acts, responds, and behaves generally.

So you can actually think about how the internal social network affects the entire group's success.

So what I'm going to do -- I was going to talk about three projects, but I've pushed one to the back in case I don't have enough time.

Basically I'm going to talk about a couple of research projects: one that I did during my Ph.D., and another that I did just recently.

So the first thing that I'm going to talk about is a project that I did with my colleague Sid Suri. We're basically looking at social networks and the success of individuals. This has been addressed in the sociological literature under the rubric of social capital.

So social capital is meant to be analogous to human capital and real capital in that it consists of resources you can actually draw upon. Out of the numerous definitions we saw in the literature, we felt this one was the most representative of the way people have been thinking about social capital.

And we only tweaked it a little bit. This is Nan Lin who we're citing here. The definition: social capital can be defined as resources embedded in a social structure which can be accessed and/or mobilized in purposive actions. There are two key parts of this definition. The first is that social capital is actually embedded in the relationships. A classic example of social capital is simple reciprocity: I do a favor for you, and now you are a resource I can draw upon, because we have a relationship of established reciprocity where I know you owe me a favor and you are going to do me one because I've done you one.

This means I can access these resources, and maybe even get you to act in a particular way, by utilizing our relationship. But crucially, it's not just reciprocal relationships that are important. You can have entire networks.

And so the literature on social capital has tended to focus on two particular sources of social capital. The first is network closure, or cohesion. This is the position that's been advanced by Coleman, and basically the idea is that if you have a very cohesive network surrounding you, where your ties to the people you're connected to are very strong, you're going to have a lot of social support.

So if you run into any problems they'll be there to help you out. You get into debt. Your family can bail you out. Later on you get a job and you can pay them back, right? So if you have a community that supports you then that can actually buffer your losses and therefore improve your overall success, right?

You also, if you have a tightly knit network, you might have this network of reciprocity and trust, so you can say well I did a favor for John and John did a favor for you, so now you need to do a favor for me, right?

So if you have this kind of nice community and tight-knit network, that can actually improve the resources you have available to you. Finally, in this argument, a lot of these tight-knit networks are based in some real community. A lot of these studies have focused on ethnic communities, looking at how social norms get established; it's not just the trust, but also a shared understanding of behavior and of how people will react to certain things.

So if you have this tight-knit community, then you can enforce and sanction these norms, and you have this well-developed community-wide trust. And so there's this whole stream of literature arguing that individuals can be more successful if they're embedded in a network that is very cohesive and very strong.

However, there's also a competing view, which says that if you're in a position that bridges different communities, rather than being tightly embedded within one, that's going to be better for your success.

One reason for this is something I alluded to just moments ago: you have access to a wider range of information. If you have different communities that are each very homogeneous in their ideas, sharing the same norms and the same information, and you have access to multiple communities, then you can draw from all of them, combine it, and broker it through your relationships, as I mentioned. If you're a single seller in a network of buyers, you can play the buyers off each other, assuming they're not connected, to get the best price possible. Of course, if the buyers are actually in communication with each other and colluding, then you don't have this option. If you're in a position where you can broker, then this might be good for your social capital and for your success more generally.

However, Sid and I were looking at this literature and we felt there were some glaring omissions. The first is that there are other sources of social capital you can imagine, beyond being in a tight-knit community or bridging different communities. Maybe an important thing is simply having a whole lot of resources and people to draw from, right?

Maybe if you're Ashton Kutcher and you have a million followers on Twitter, that's actually a better resource to draw upon than if you only have a few friends. Another possibility is centrality in the network as a whole: maybe it's not important that you're bridging different communities; maybe being at the center, where information for the whole network has to flow through you, is what's most important.

And in fact there are probably many other sources of social capital you can imagine that would generally benefit an individual. Moreover, for any given source of social capital, there are likely to be multiple measures of that source. The sociologists who have been advocating the bridging perspective, the view that this is what's important in terms of network position for success, tend to focus on a single measure that Ron Burt came up with called network constraint, where the idea is that if your social network is highly constrained, then you don't have a lot of structural holes and therefore you don't have access to resources.

But of course there might be many different measures of bridging, or many different measures of cohesion. So we felt this was missing: why not look at all of these things? And on top of that, what about the possibility that a position where you have a strong network around you but also weak ties to other communities is even better than simply bridging, or simply being in a tight-knit community, in and of itself?

So what we did is we said, okay, we're going to take advantage of the fact that we have lots of data. We're going to get lots of data and we're going to look at different measures of success.

So what we did is we looked at four different datasets. CiteSeer and CiteSeerX are both co-authorship networks; they're both datasets about publications and citations. CiteSeer is a subset of CiteSeerX; basically, CiteSeerX was a subsequent crawl of the Web. Both of them were designed at UPENN to --

>>: Penn State.

>> Winter Mason: Penn State. Thank you. To crawl the different citations. So basically what we did was take a projection of the bipartite network between authors and papers, and say two authors have a tie if they co-authored a paper together.
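To make that construction concrete, here is a minimal sketch of the bipartite projection using networkx; the author and paper IDs are made-up placeholders, not the CiteSeer data.

```python
# Sketch of projecting an author-paper bipartite graph onto a
# co-authorship network; two authors get a tie iff they share a paper,
# weighted by how many papers they co-authored.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
authors = ["a1", "a2", "a3"]
papers = ["p1", "p2"]
B.add_nodes_from(authors, bipartite=0)
B.add_nodes_from(papers, bipartite=1)
B.add_edges_from([("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2")])

coauthors = bipartite.weighted_projected_graph(B, authors)
print(list(coauthors.edges(data=True)))
# e.g. [('a1', 'a2', {'weight': 1}), ('a2', 'a3', {'weight': 1})]
```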

Our measure of success in these two datasets is the number of citations received, just the raw count. Now, of course, this is not perfect; you could use eigenvector centrality on the citation graph to measure impact, for instance. There are a bunch of different ways of measuring success.

But certainly the raw number of citations is a reasonable measure of success in an academic community. So what we were interested in for these two is basically: does a person's position within their co-authorship network affect the number of citations they receive, and also the number of citations per publication that they receive, i.e., the average success of their papers.

For LiveJournal, what we did is very comparable to the co-authorship networks, except now we're looking at blogs. We have 2.2 million bloggers and four hundred million blog posts, and the social network we're looking at is the explicit friend network, where people actually say, I'm a friend of this other blogger.

And our measure of success is comments. We feel this is a proxy for readership: if they have a lot of readers, they're going to have a lot of comments. We also looked at the comments per post.

Now, the last dataset is very different, and basically the reason why we used it was because we wanted to look at a dataset that had been analyzed from this traditional social capital perspective.

And so Ron Burt has this paper where he analyzed a network built from this data: the Bureau of Labor Standards classifies all businesses into various industries, and every five years they release a report of the buying and selling between the different industries.

And so you can construct a graph out of this buying and selling data. There's a particular technique that Burt used, which we imitated, like cutting it off at the two percent line so you're eliminating the weakest ties, et cetera. But basically that's our buying and selling network, and the measure of success is the price-cost margin, which to a rough approximation is the profit margin for the industry.

So what we did is we had a huge set of features that we looked at. We could have done even more; we just did the ones that were computable, covering as wide a range as possible, using ones from the literature and ones that lined up with the sources of social capital I just talked about. So things like cohesion, where you've got the clustering coefficient; centrality, where you have closeness and betweenness; and reach, where you have degree and two-hop degree, et cetera.
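For readers who want to compute features like these, here is an illustrative sketch with networkx on a stand-in graph; it is not the study's feature pipeline, and two-hop degree is implemented here from its verbal definition.

```python
# A few of the network features named above, on an example graph.
import networkx as nx

G = nx.karate_club_graph()                  # stand-in for a real network
clustering = nx.clustering(G)               # cohesion: clustering coefficient
betweenness = nx.betweenness_centrality(G)  # centrality: betweenness
closeness = nx.closeness_centrality(G)      # centrality: closeness

def two_hop_degree(G, v):
    """Reach: distinct nodes within two hops of v, excluding v itself."""
    reachable = set(G[v])
    for u in list(reachable):
        reachable.update(G[u])
    reachable.discard(v)
    return len(reachable)

print(G.degree(0), two_hop_degree(G, 0))
```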

Now, we also had some nodal features, non-network features, that we looked at to see how the network features stacked up against them; so we're not just looking at the social capital stuff, but also other things that might be indicative of success.

And there are a couple of features in here that I want to talk about. One is the industry concentration score, which obviously only applies to our final dataset. Basically, it's just the percentage of output within an industry that is accounted for by the top four most productive businesses within that industry.

I don't know who came up with it, but it's supposed to be a proxy for how dominated a particular industry is by its top firms. So if you look at car manufacturing, there are obviously about four companies that dominate the market more than the rest, whereas with sheet metal manufacturing there's a whole lot of little businesses scattered across the U.S., so the industry concentration score is very low there.
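As a hypothetical illustration of that score, a four-firm concentration ratio can be computed like this; the numbers are invented.

```python
# Industry concentration score as described: the share of an industry's
# total output accounted for by its four largest producers.
def concentration_score(outputs):
    top4 = sum(sorted(outputs, reverse=True)[:4])
    return top4 / sum(outputs)

print(concentration_score([50, 30, 10, 5, 3, 2]))  # dominated: ~0.95
print(concentration_score([5] * 40))               # fragmented: 0.10
```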

The second one is ego components. We came up with this one independently, though it was actually created earlier by Stephen Borgatti. Basically, you take an actor and their immediate ego network, remove the actor, and count how many disconnected components are left.

So in this way it's kind of like a measure of bridging: if you take that node out, how many different pieces, how many different communities, were they connected to?
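A minimal sketch of the ego components measure, following the verbal definition above (delete the ego from its ego network and count the pieces left); the networkx usage is an assumption, not the authors' code.

```python
import networkx as nx

def ego_components(G, ego):
    H = nx.ego_graph(G, ego)   # ego plus immediate neighbors
    H.remove_node(ego)         # remove the ego itself
    return nx.number_connected_components(H)

# A star's center bridges three otherwise-disconnected neighbors:
print(ego_components(nx.star_graph(3), 0))  # 3
```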

So our first-pass question was: okay, we have all of these features; how do they relate to success? All of these are specifically network features, so they're capturing relationships, and therefore they should be indicative of the different sources of social capital. The first thing we did was look at simple correlations of each of these features with the various measures of success. So in CiteSeer we're looking at total citations. We looked at the cites per publication and the features, found that they had very heavy-tailed distributions, and took the log to normalize the errors.
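Here is a sketch of that first-pass analysis under assumed column names (the file and fields are hypothetical): log-transform the heavy-tailed columns, then correlate each feature with the success measure.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("author_features.csv")  # hypothetical per-author table
logged = np.log1p(df)                    # log(1 + x) tolerates zero counts
corrs = (logged.drop(columns=["total_citations"])
               .corrwith(logged["total_citations"]))
print(corrs.sort_values(ascending=False))
```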

So what we see here, just a couple of quick points. Basically, the most predictive feature for both the average citations per publication and the total number of citations is the number of publications you produce. So if you are highly productive and outputting a lot of papers, then the more you do that, the more citations you'll get, and the more citations per paper you'll get.

The most predictive network feature is the maximum weight. The weight is essentially the number of papers you've co-authored with a person; that's the weight on the edge. So if there is one person you've co-authored many, many papers with, that's also correlated with the number of citations you will receive.

The most predictive structural feature is two-hop degree. And this is interesting: the number of authors reachable within two hops in the co-authorship network is more predictive of the number of citations you receive than just the number of co-authors you have. I feel the most likely explanation for why this is true is that you're actually seeing information crossing.

So if I'm connected to somebody who has co-authored with other people, and the people I've co-authored with reach a wide range of people, that's actually more predictive of success and is going to influence my ability to get citations.

>>: Sorry, but your definition of success is the total number of citations unnormalized by anything.

>> Winter Mason: That's correct. We also have cites per pub.

>>: Yeah.

>> Winter Mason: So in fact the ordering of these features, in terms of how predictive they are of citations and of citations per publication, tends to be pretty similar. Obviously there's some variation, but, for instance, the maximum weight and the two-hop degree are up here, and then the maximum weight and two-hop degree are up here. So in terms of the overall ordering, you're seeing these particular features be predictive both of total citations and of citations per publication.

Oh, I had one other thing that I wanted to say on this one, which is that our measure of ego components, which is one of several possible measures of bridging, is actually, in this domain, way more predictive of the number of citations you receive than network constraint. So for the sociologists who have been using network constraint as a measure of bridging, maybe in this domain looking at other measures of bridging would be a better idea.

I'm not going to talk too much about CiteSeerX because what we see is similar to CiteSeer. That's not surprising; it's the same domain and there's some overlap in the data. But basically, number of publications is first, maximum weight is the most predictive network feature, and then two-hop degree is also very predictive.

So it's very similar to CiteSeer. What's interesting is that when we look at LiveJournal, we see a similar ordering of the variables again. The one difference is that degree is more predictive of the total number of comments you've received than the number of posts you have. I mean, not a whole lot more, but more. And we think there's a reasonable explanation for this: the reason you create an explicit friendship in this blogging network is that there's somebody whose blog you're interested in reading. And if you're interested in reading their blog, then you're likely to comment on it. So if you have a large degree, you have a lot of people who are potential readers of your blog, and therefore likely to comment on it.

But, again, it's not just the immediate people you're connected to; it's two hops away that turns out to be important. Incidentally, we thought about doing three-hop degree, but it turned out to be computationally intractable, at least on the time scales we were working with.

But, again, ego components ends up being a very predictive feature, yet again more predictive than network constraint. And here we've moved to a completely different domain, outside of co-authorship, where the edges are this explicit friendship and the measure of success is now comments. And we're still seeing a similar ordering.

So we've gone through three datasets, and we're feeling, okay, we're starting to see a trend here. We definitely feel like we're getting an idea of how things work and how these variables are related, at least on a first-pass, correlational level, to our measures of success. Great, it's telling us this nice coherent story.

Well, then we get here. And basically all of the orderings we had seen, the relationships with success, are completely different here. What we're seeing is that the industry concentration score is relatively correlated; it's a 27 percent correlation, which is not great. Then there's concentration-weighted network constraint: you take your network constraint measure and weight it by the concentration score of the industries you're connected to. These two, industry concentration and concentration-weighted network constraint, are basically the only variables that Burt used in the paper we were comparing our results to.

And what's interesting, and this is something that Scott and I were actually just talking about: over here, the most predictive features in LiveJournal and CiteSeer tend to be correlated at .5 to .6, maybe even .7. So you're seeing these very large correlations. And then over here in the industry dataset you're seeing these very, very small ones, and yet Burt has this long paper talking specifically about this and about how network constraint is very predictive of success, right? And how it's this clear measure of social capital.

But then you look at the R squareds and they're like .2 at best. Right? So it's questionable. Now we have this very simple, first-pass understanding of how these features relate to success, and we can say some things at this point. We can say that there is a relationship between these network features and success: a positive, measurable relationship, significantly different from zero, between these network measures and success.

We can also say that, at least for some of our datasets, we saw this consistent pattern: productivity is the most predictive thing, then the strongest tie, then measures of reach like two-hop degree. We also see bridging being important.

We start to see these kinds of relationships. But, of course, it might be a confluence of different network features that is important for predicting success.

And so what we wanted to do is look at sets of features. But before you do that, you want to look at the interrelationships. Now, I'm going to go through these; they're pretty pictures, but they're not that informative. There are just a couple of things I want to share about them. If it were true that there were only these two sources of social capital, bridging and cohesion, then you might expect to see the features falling into two groups, according to whether they're a bridging measure or a cohesion measure.

But you don't really see that.

And so basically the only thing you really need to take away from these is that there are features that are highly redundant. These are intercorrelations, where the more red it is, the more positively correlated, and the more blue, the more negatively correlated.

And I know you can't read any of these. Suffice it to say, clustering is negatively related to ego components, which is completely unsurprising, and positively related to network constraint, also unsurprising. So the relationships themselves are not very surprising or particularly important; the key thing to take away is that there are these strong correlations that we have to take into account.

So in order to do that, we adopted a technique from the biostatistical literature, this fast correlation-based filter (FCBF). Of course, we could just throw all of these features into a single regression model; we also did regression trees, and the results were comparable, so I just report the regression results. Throwing all of them into the model would really be maximizing your predictive accuracy.

But what you really want is some sort of interpretability. You want to be able to make statements like: well, controlling for the features in our model, an extra person in my social network is going to lead to so many more citations.

But you can't do that if you have collinearity among your variables; or you can do it, but the confidence intervals on the statements you'd be making would be way off, right? What this algorithm does is really quite simple. It's basically like forward stepwise regression: you put in the most predictive variable, but then you apply a filter that eliminates any features too highly correlated with the feature already in your model. After you've eliminated those, you look for the next most predictive one, put it in your model, and apply the same filter, asking what's too collinear with the model I already have, and eliminate those features. Essentially you're ensuring that you're always adding the most predictive feature after eliminating the ones that are too collinear. That's what allows you to interpret the variables.
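Here is a small sketch of that forward-selection-with-redundancy-filter idea, in the spirit of FCBF as described; the exact relevance criterion and threshold used in the actual analysis may differ.

```python
import pandas as pd

def fcbf_like(X: pd.DataFrame, y: pd.Series, threshold: float = 0.7):
    """Greedily add the most predictive feature, then drop the
    remaining features that are too collinear with it, and repeat."""
    remaining = list(X.columns)
    selected = []
    while remaining:
        best = max(remaining, key=lambda f: abs(X[f].corr(y)))
        selected.append(best)
        remaining.remove(best)
        remaining = [f for f in remaining
                     if abs(X[f].corr(X[best])) < threshold]
    return selected
```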

>>: So the data question here, so let's say that you did something like this analysis instead, which would be a blend of all your features you could quickly find -- given the amount of overlap [indiscernible] I would guess [indiscernible].

>> Winter Mason: Sure.

>>: What's going on there. So the problem, like you said, may well be interpretability, because you could then say this is a blend of this and a blend of that. But at the same time, the question I would ask of the social scientists would be: well, they came up with these heuristic features, right? There's nothing golden, nothing sacred about them. And it may in fact be much more effective to use blended features.

>> Winter Mason: I mean, I completely agree. And in fact I think, to some extent, when you look at the real core of what they're doing when they do these sorts of analyses and make these predictions, what they actually want is the whole model; they really want to predict success as best they can, right? But then when they're talking about it, they make very strong claims about the theoretical basis. These are always correlational studies, but they make nearly causal statements about why a feature is important, and they spin a story around it, right?

So, as perhaps you can tell, we were kind of pitching this to the sociologists, and of course for them to buy it at all, we had to make sure there was this interpretability. And maybe if you're really interested in organizational questions, you can imagine trying to tweak these variables so that you're getting the most success.

But --

>>: Alternatively, what you could imagine doing, to force the issue in the social psychology sphere, is something like: I'm going to use PCA and friends, run optimizations, and make justifiable features. Make a whole bag of them and correlate the best ones with the ones I know are actually optimal. And they would be like, wow.

>> Winter Mason: That's not a bad idea, actually. Maybe I'll try that approach.

>>: And I had another question, actually, which is semi-related. So you look at the correlation coefficients of these features. One thing I wonder about, in terms of actual predictive ability: there is the possibility of saying, I'm going to look at this person at epoch N, maybe their position in the network, and then look an epoch into the future and see if I can say something about how they might perform.

>> Winter Mason: We actually did that.

>>: You did?

>> Winter Mason: Yeah. I'm not going to talk about it, though. But yeah, we were essentially asking: given exactly your network position now, can we predict the increase in the number of citations you'll have? We did that. I decided to cut it just for time.

Right. So now what we're looking at are sets of features. We've applied this FCBF algorithm, and we're getting these coherent sets of features. This is CiteSeer; I'm going to focus on total number of citations first.

Basically, if you recall, the number of publications alone was correlated with the number of citations at about 71 percent, so the R squared was .49.

Even with all of the network features, that is, if you take all the network features, throw them into the model, and ask how well they predict success, it still doesn't do as well as the number of publications alone. But at the same time, when you have them all in there, they're accounting for about 41 percent of the variance. That's not terrible, especially given that the R squareds you see in other work on this tend to be around .2; that's actually pretty good. And in fact, when you take just these five network features and put them in the model, you still account for about 38.5 percent of the variance. What's nice is that, because of the FCBF algorithm, we're guaranteed these are not too collinear with each other. So we can actually make statements like: for every additional paper you co-author with your strongest tie, the person you've co-authored with the most, controlling for these other variables, you're going to get on average about five and a half additional citations.

Similarly, for every additional person in your two-hop network, that is, every additional person within two hops of you in your co-authorship network, on average you're going to get about a fifth of a citation, controlling for these other variables.
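To show the kind of "controlling for" statement being made, here is a synthetic illustration with statsmodels; the feature names and the planted coefficients (5.5 and 0.2, echoing the talk's numbers) are assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "max_weight": rng.poisson(3, 500),       # papers with strongest co-author
    "two_hop_degree": rng.poisson(40, 500),  # authors within two hops
})
df["total_citations"] = (5.5 * df["max_weight"]
                         + 0.2 * df["two_hop_degree"]
                         + rng.normal(0, 10, 500))

X = sm.add_constant(df[["max_weight", "two_hop_degree"]])
fit = sm.OLS(df["total_citations"], X).fit()
print(fit.params)      # recovers roughly 5.5 and 0.2
print(fit.conf_int())  # with interpretable confidence intervals
```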

Now, citations per publication. Again, I don't think I was entirely clear on this: what we're doing is taking just the network features and applying the FCBF algorithm to them. That's what I'm showing you here. We also did it for the case where we threw in all the features available to us and applied the FCBF algorithm; I have the R squareds here, but I'm not going to talk about the features selected in that case.

So when you take just the network features and try to predict the number of citations per publication, you really don't do a good job. You get about eight percent of the variance, which I'm just going to call none. So interpreting these features is not a good idea, basically.

For CiteSeerX, I'll flip back between these two, and you see that the features being selected are essentially the same in CiteSeer and CiteSeerX. Again, this is not terribly surprising, because you're looking at the same kind of data.

>>: I understand the ego components, but I see CiteSeerX doesn't [indiscernible].

>> Winter Mason: That's right. Basically it's not making a huge contribution there. And in fact, if you look at the single correlations, that's one of the small differences between CiteSeer and CiteSeerX: ego components is not as predictive.

>>: [indiscernible] do you have a handle on that?

>> Winter Mason: At a very rough level, it's just a measure of bridging; it's how many different communities you are connected to. So if you have a set of people over here co-authoring together, a set of people over here co-authoring together, and a set of people over here, then your ego components would be three. Whereas if everybody you co-author with also co-authors with each other, or there's a path connecting the people you co-author with, then ego components would be one.

>>: Why is it different from clustering?

>> Winter Mason: Because the key is that what you're looking at is the number of connected components. So.

>>: What about yourself? You take yourself out of it?

>> Winter Mason: Just look at your ego network you take yourself out.

>>: [indiscernible].

>> Winter Mason: So it's the paths between them. Okay. So, right: CiteSeer and CiteSeerX, not much difference. Ego components goes away. The effect size of each of these variables within the models is pretty comparable across them.

And, again, with just these four network features, you're still accounting for about 34.5 percent of the variance. And yet again you can't do a very good job predicting citations per publication; the R squareds are pretty low.

So this is a bit of a slog, but I'm almost through it. LiveJournal: honestly, we looked at this and essentially all the variance is being accounted for by degree. So that network feature does a very good job. Interestingly, you can predict comments per post a lot better than you can citations per publication.

The R squared here is about 31.5 percent. And we again think that this is because of the large effect of degree.

For the industry dataset, I was saying that with R squareds in this range you can't really interpret anything, and that's what we find again, basically because we don't have the concentration score in here. So honestly, the conclusion we reached about the industry dataset is that all of the results seen in the previous publication were due to this concentration score; it just seemed as though they were due to the network features.

I can't say that 100 percent conclusively, but at least in our explorations of the data, that's what it seems like. Which is unfortunate, because our whole point in using this dataset at all was to make comparisons to the past work. So I guess it's good in that maybe we're addressing issues in previous work, but it didn't really help in terms of us being able to predict something interesting about industries in the U.S.

So we can make some kinds of statements about looking across all of these domains and all of these measures. We can make some kinds of generalizations.

>>: Wait. That's a fantastic result, isn't it? I mean, you just said that similar work from a couple of years ago, which everyone has built a theory of how industries communicate with each other on, was actually measuring the wrong thing.

>> Winter Mason: Yeah. It depends on how much -- okay. The reason I glossed over it is because I don't think anybody here has hung their hat on that particular hook.

>>: No one in this room.

>> Winter Mason: Right. Exactly.

>>: But I mean, social network analysis people there was a lot of that tone.

>> Winter Mason: Yeah.

>>: Making a lot of money making that thing [phonetic].

>> Winter Mason: I think the thing is that if you look at the original work, like I said, you're seeing R squareds in the range of, at best, .2. So you kind of have to call into question why people were interpreting it that way in the first place. Honestly, you don't even need our analysis to know that it wasn't that great to begin with. And this is actually one of the reasons we got into it: we were looking at this paper and thinking, this just doesn't seem to make sense. So we tried to address it.

But what we had really hoped was that by throwing in these other network features, we were going to be able to say: oh, you know what, it's not network constraint that's important, it's not the industry concentration score that's important, it's X. And we weren't able to do that. Just having a negative result, kind of a null result, is not as strong a statement as having a really clear positive result.

Okay. So going across these different datasets, we can make a couple of summarizing statements. One is that in almost every single case, including the industry dataset, it was a nodal feature that outperformed the network features. The only exception was degree in the LiveJournal dataset predicting the number of comments received. In almost every other case, the measure of productivity or the industry concentration score was the most predictive.

However, and this is something we feel is obvious but overlooked in the literature: if you take combinations of sources, so bridging and cohesion and reach, and put them in a model, it does better at predicting success than any single source of social capital alone. The literature was all about it being either X or Y; in actuality it's X, Y, and Z.

Of the network features, measures of reach, in terms of how many resources you have available to you, almost always did the best job of predicting success.

In terms of addressing the literature on bridging, we found that our ego components measure almost always outperformed network constraint. So it's at least worthwhile to consider different measures and different sources when you're looking at different domains.

I feel like there's been this kind of whitewash of, this is how you treat social capital for everything, and in fact what we clearly see here is that there are differences between domains and datasets.

So.

>>: I had one more question about this. I'm curious whether you or the previous authors did this analysis, because so far you've shown cases where you have a continuous metric of success, [indiscernible] or dollars or volume of capital or whatever, and you're regressing on that and showing [indiscernible]. An alternative way to look at it is: I'm going to set a threshold of success, say more than so many citations, or some h-factor, and for businesses some capital amount or whatever. And say, okay, now I'm going to use a classifier, or any other means you have, and I don't care what combination I end up with or what crazy thing, but I want to see how well you do on the actual task of separating these things.

>> Winter Mason: So I can definitely tell you nobody has done that. [laughter] We did actually briefly think about that as we were approaching this problem, because one of the statements we were making was this claim that Ron Burt's whole argument was kind of flawed, and somebody came back and said, well, all he ever said is that you have higher potential for success if you have these features, right?

And so if you're just actually trying to regress on it and success is a rare event, then you might have this problem. So we did actually think about this classifier thing.

But again, we erred on the side of taking small steps toward the goal.

>>: The dataset?

>> Winter Mason: That's right. Absolutely. So the next thing I'm going to talk about is the relationship between social networks and the success of groups, which I alluded to at the beginning of the talk. This is work that I did with folks in my Ph.D. program, Andy Jones and Rob Goldstone. Maybe a couple of you have seen me talk about this bit; hopefully not, and hopefully you'll get more out of it if you have.

So specifically what we're looking at here is how distribution of knowledge within a group and how that knowledge is shared affects the success of the group as a whole.

So it's trivial to say that people share good ideas; that's definitely true. This includes skills, techniques, solutions, whatever. It is also true that this communication is situated in a social network, or at the very least in some sort of communication network, be it the work hierarchy or the social network or what have you; there are definitely constraints on the way the information flows. And therefore, QED, good ideas spread through social networks.

So given that that is true you can ask the question, well, what kinds of networks are best for spreading good ideas?

And beyond that: does the network interact with the problem space, and if so, how? This is the question we were trying to address. This was an experiment in the lab: we brought a bunch of people to a computer lab, sat them in front of computers, and had them play this boring game where they basically had to guess a number between 0 and 100.

They'd get a score that was based on some fitness function plus some stochastic noise, and they'd also see the guesses and scores of their neighbors in the social network that we placed them in.

And all we did is tell them to try to get the highest score. So, okay, I don't know what happened there. Basically, well, this is what they saw. Okay.

So they had to guess a number between 0 and 100. On the previous round, you know, they'd see this is what you guessed, this is the score you got. And those are the guesses and scores of your neighbors.

So you might see this and, if you were this user, say: oh, 39, I want to go to 39; that's where the highest score is. So the next round I'm going to also guess 39, or maybe 37 or 38, but I'm basically going to go there.

So there's this underlying fitness function.

And so you guessed here and the other people guessed here. So here's 39; you guessed 39, and the fitness function will give you a score around 42. But what you're missing is that over here, around 70, there's a better point where you would actually get a higher score. Their only goal is to get as high a score as possible. They have 15 rounds to do it, and after each round they see the guesses and scores of their neighbors.
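Here is a sketch of a payoff function like the one described, with invented peak locations, heights, and noise level (two Gaussian bumps, the better one near 70); it is an illustration, not the experiment's actual function.

```python
import numpy as np

def fitness(guess, rng):
    """Score for a guess in [0, 100]: mixture of Gaussians plus noise."""
    peaks = [(39, 42, 8), (70, 55, 5)]   # (center, height, width), made up
    score = sum(h * np.exp(-(guess - c) ** 2 / (2 * w ** 2))
                for c, h, w in peaks)
    return score + rng.normal(scale=2.0)  # stochastic noise

rng = np.random.default_rng(0)
print(fitness(39, rng), fitness(70, rng))  # local peak vs. global peak
```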

>>: Highest cumulative score or highest score on the 15th guess?

>> Winter Mason: Cumulative score.

>>: For example, wouldn't experimenting around the 70 space actually give a higher score?

>> Winter Mason: Oh, oh, because there's a penalty for exploration is what you're saying.

>>: Whether it goes straight down or comes up. So there's some incentive not to find out.

>> Winter Mason: Yes, that's true. That's absolutely true. There's definitely this trade-off between exploration and exploitation, but certainly in the earlier rounds, say round three, there's less of a penalty for exploration, essentially.

And so we had variable group sizes. It was very difficult, actually essentially impossible, to control group sizes, because you would recruit a bunch of people to come to an experiment and only some of them would show up.

So our group sizes were variable, but we built social networks, communication networks, for them to play in, based on a couple of systematic structures. We had a regular lattice, where we put them in a ring. We had a small world, which was basically the same as the lattice but with some edges randomly rewired.

>>: Time constrained or volume constrained, both.

>> Winter Mason: 20 seconds per round and there were 15 rounds.

>>: I see.

>> Winter Mason: And maybe the first round they actually needed the 20 seconds, but pretty much every other time they were done early.

>>: How big was it? Like, could they scan for the max, for instance?

>> Winter Mason: So we did a little bit of pilot testing to figure that out. We wanted to make the search space big enough that it wasn't going to be too easy, but also not impossible for them to search, right? And we had to do that while factoring in that they were going to be getting information from other people. So we did a little bit of pilot testing, not a whole lot, maybe not even as comprehensive as I would have liked in hindsight, but enough to get it roughly at the right level of difficulty.

So we had the small world, which is basically the lattice with some edges rewired, and we had the random one. And for these three we used techniques that ensured the average degree was always the same, around three.

And then we had the fully connected network, which was kind of like an upper bound. This is the case where everybody's got full information: rather than just seeing three of their friends, they see this big long list, depending on how big the group was. So they had essentially full information in the fully connected network.
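The four network conditions can be sketched in networkx roughly as follows; the parameters are illustrative (the ring-lattice helper needs an even k, whereas the study matched average degree around three).

```python
import networkx as nx

n = 10                                                # group size
lattice = nx.watts_strogatz_graph(n, k=4, p=0.0)      # ring lattice, no rewiring
small_world = nx.watts_strogatz_graph(n, k=4, p=0.1)  # a few edges rewired
random_net = nx.gnm_random_graph(n, m=lattice.number_of_edges())  # same avg degree
full = nx.complete_graph(n)                           # everyone sees everyone
```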

And so in our first experiment, we had nine groups of seven to 18 participants; the median size was ten. We had two different problem spaces. The unimodal one is trivial: it's hill climbing; you just search around the peak and you're there. The second one was multi-modal, which is like the first one I showed you, where you actually have local maxima, so you can get fooled into settling into one of them.

So we had two metrics that we were looking at in terms of success; we're interested in group success. The first metric was how many rounds, on average, it took a person to find the global maximum. Our second measure of success was the average number of points per round. So, for speed of discovery with the unimodal, it's very trivial and completely unsurprising: in the fully connected network it took them only three rounds, whereas in the others it was closer to four, with the random network a little bit in the middle. I don't have the error bars on here for whatever reason, but I promise the full network is significantly faster than the rest.

>>: When you say finding, you mean within some.

>> Winter Mason: Within a half standard deviation. These functions are constructed from a combination of Gaussians, and we just took a half standard deviation; if they guessed within that range, it counted.

So then when you look over rounds, you see this same kind of pattern. The fully connected is this one, and you see that they're finding the maximum much faster. This is the percentage of people guessing within that half standard deviation. If I had put up the plot of the total number of points accrued, it would look almost identical to this.

And so, right, the fully connected network does it really well. All of them eventually converge on it, except for the lattice, which is the ring. In the lattice, information takes a long time to get around: if one person finds the peak and other people haven't, and they're relying on information from somebody else, it's going to take longer to reach them. That's one of the things we specifically designed it for: to have this long path length.

>>: Everyone therefore is not [indiscernible] does everyone have a constant number of neighbors?

>> Winter Mason: No, no. In the lattice, random, and small world networks we controlled for the average number of neighbors, so it was always around three, but there was variation between people. And in the fully connected network, of course, everybody had the same degree. So here's one of the most interesting results, in my opinion. This is the multi-modal case, and this is how quickly they found the maximum. What you see is that the small world network on average found the global maximum faster than the fully connected network, where they had complete information.

And we're still seeing the lattice being the slowest. And when you look at the convergence, you would expect this curve to be the fully connected network, but I promise it's the small world. They get an early start, and everybody ends up converging, even the lattice, but the small world gets this kind of early lead.

And we don't know exactly why, but we looked at the exact patterns of where people were guessing in some of these, and we developed a hypothesis for why this was happening. With the fully connected network, the situation is the one I alluded to at the beginning: people would see this long list, and it just so happened that nobody had guessed in this particular range but somebody had guessed in that particular range. So they'd all quickly converge, within the second round, and about two-thirds of the time they'd converge to the local maximum rather than the global maximum, and as a result --

>>: Something you might find interesting: I recently saw another study of cases like this with an unknown fitness function, where people have this strong unimodal prior. There's a reward payoff, like a slot machine, where a bimodal function would include the wrong thing. They'd stick to the mode they know, so it's that [indiscernible]. It's interesting.

>>: Talk to you about that.

>>: When you look at the lattice curve, it marches constantly upwards. So, boredom? What I would expect from what you told me is that it would start low, near 0, and stay low, near 0, because no one would be near the global maximum. They'd all be near some local maximum.

>> Winter Mason: Right. Absolutely, boredom is the exact right answer. These are averages; if I plotted this for a single run, a lot of times what you'd see is exactly that: flat, flat, and then some jump up around round 7 or so, because people found the local maximum in the first two rounds, stuck there for a while, and then got bored. But boredom and exploration, you know [laughter], are kind of the same thing, right? So we thought this was really interesting. We really expected the fully connected network to be the upper bound; it's complete information. What we actually found was that the small world was doing better.

>>: Supposing you had no time constraints, do you think [indiscernible].

>> Winter Mason: Well, what we're seeing is this difference in the very early rounds. In the end, everybody's gotten there and they're staying there, so eventually this little difference would get washed out. But the difference we're seeing is in the early rounds, so it doesn't matter how much time we give them.

>>: But, again, it's not so surprising if you -- because they're not seeing the plots on the graph.

>> Winter Mason: Right, they're just seeing this boring list of numbers.

>>: Can you imagine the human having a mental model of the distribution? The natural strategy is to look near the max; it's got to be close to the max.

>> Winter Mason: Absolutely. I completely agree. I think that's a really good explanation, and like I say, I would love to see that paper, because I think it would be fantastic for this.

So we did another one. This is the same setup, the same social networks we were manipulating, but a different fitness function. We call this the needle, for obvious reasons, and it's again specifically designed to fool them, in this exact way: they see what looks like a unimodal distribution, and to find the needle they have to do some real exploration in a region of the problem space where they score nothing. In fact, the range in which they can get any points at all greater than 0 (there's stochastic noise, but essentially) is about a five-point range. And I forgot to mention that we did change where these peaks were located; it wasn't always the same points, we counterbalanced, and so on. What we found in this case was that looking at how quickly people converged on average was not a very informative measure, because it turns out that almost all the time nobody found it.

So then you just had censoring at their 15 rounds; that's as long as they had. This is a situation where we might actually have seen some differences if we had given them more time, but trust me, they didn't want to be there any longer than they had to. So instead we look at how many groups found the needle at all, that is, had somebody who actually made a guess within its range. And of course, when somebody made a guess within that range, they always stayed there; they immediately realized it was better than over here. In this case, when we look at the number of groups that found it, it was actually the lattice that did the best.

So in fact, when you look at the convergence: this is the global maximum, this is the percentage guessing within the range of the needle, and this is how many people were guessing within the range of the local maximum.

And what you see, these squares here, is the lattice going to the local maximum at first, but slowly they start to move away from the local maximum and toward the needle. This is, of course, on average; we have to take it across all the groups.

And so the idea is that in the lattice, the thing that was causing a problem in the unimodal case, that it takes a long time for information to travel around the social network, is actually benefitting them here, because people are doing more exploration. They're less likely to have seen somebody who has guessed within the local maximum, which means they're going to continue exploring, which means they're more likely to find the needle. And once they've found it, that good information can then propagate to the rest of the people in the network.

What we're seeing here is this very clear trade-off between exploration and exploitation. And what we see is that these different networks, based on the speed of information transmission within them, actually encourage exploration over exploitation, or vice versa.

And importantly, which one is going to be most beneficial for the group depends on the kind of problem they're working on. If it's a unimodal case, you really want fast transmission of information, because as soon as somebody's found the peak, everyone will converge to the correct solution; there is only one. But if you have something more like the needle, where there's a really good solution that's not easily found and requires a lot of effort, maybe you actually want to slow down communication between people so that they do more exploration.

There's work on brainstorming showing that people do a lot better when they brainstorm independently and then pool their ideas, because, even just as an aspect of turn-taking, this kind of very rapid communication actually diminishes people's creative ideas.

So if you have people -- you know, this is analogous to slowing down the communication. And so --

>>: [indiscernible] does anybody try the uniform [indiscernible] uniformity?

>> Winter Mason: Yes. It was not a very successful strategy. If you had infinite time, of course, it would be reasonable. So what I'm about to talk about, just briefly, because I'm essentially out of time, is where I want to go next.

Which is: I want to combine these two things and look at how the social network structure affects both individual success and group success differentially, simultaneously.

And so, from the two studies, we know that position in the network affects individual success, and overall structure affects group success, right?

So what I want to do is combine them. What I'm thinking of doing, and actually I've started building this and am going to be running it in the next month, is essentially the same kind of setup as the experiment I was just telling you about, except that instead of being one-dimensional, where they're just guessing a number, they're actually going to be searching a two-dimensional field as the problem space.

So another nice thing about this is that there's noise here -- where you guess, it's not a smooth landscape. But it's not stochastic noise.

So there's a definite answer. If you guess here and somebody else guesses here at the same time, they'll get the exact same points as you. Right? So there's no stochastic noise. And it's a little bit more fun, because you're clicking. I'll actually show a demo of this in a moment.
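One way to get a landscape like this -- rugged, but with no stochastic noise, so the same click always earns the same points -- is to add hash-based noise to a smooth function. A sketch with made-up peak locations:

```python
import math
import hashlib

def score(x, y, world_seed=42):
    """Deterministic payoff for a click at grid cell (x, y).

    A smooth underlying landscape (hypothetical peaks) plus hash-based
    'noise': rugged, but fixed per cell, so two players clicking the same
    spot always get identical scores.
    """
    # Smooth component: a broad hill plus a narrow, high-value needle.
    smooth = 50 * math.exp(-((x - 20) ** 2 + (y - 30) ** 2) / 400.0) \
           + 100 * math.exp(-((x - 70) ** 2 + (y - 65) ** 2) / 8.0)
    # Deterministic ruggedness: hash the coordinates with a fixed seed.
    h = hashlib.sha256(f"{world_seed}:{x}:{y}".encode()).digest()
    noise = (h[0] / 255.0 - 0.5) * 10  # in [-5, 5], the same every time
    return smooth + noise

# Same click, same points -- no stochastic noise:
assert score(70, 65) == score(70, 65)
```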

So this is just the setup. But what's interesting to me is to take these kinds of graphs, where in this one -- I actually built a little genetic algorithm to design these -- I'm maximizing the variance of the betweenness. So the person here is going to be very, very central and the people here are going to be very, very noncentral.

And what's nice about this is that I've fixed the degree, so every single person actually has a degree of three. So they're all getting the exact same amount of information. And so if we see differences here compared to here, then we'll actually have experimental evidence that these positions really matter as a result of betweenness versus not. Right? This one maximizes the average closeness.
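The talk mentions a little genetic algorithm; here is a simpler hill-climbing sketch of the same idea, using networkx: degree-preserving double edge swaps mutate a 3-regular graph, so every player keeps exactly three neighbors while the variance of betweenness is pushed up:

```python
import random
import networkx as nx

def betweenness_variance(G):
    vals = list(nx.betweenness_centrality(G).values())
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def evolve_graph(n=16, degree=3, generations=500, seed=0):
    """Hill-climb over connected 3-regular graphs to maximize the
    variance of betweenness centrality (a stand-in for the GA)."""
    rng = random.Random(seed)
    best = nx.random_regular_graph(degree, n, seed=seed)
    best_fit = betweenness_variance(best)
    for _ in range(generations):
        child = best.copy()
        try:
            # Mutation: one double edge swap preserves every node's degree.
            nx.double_edge_swap(child, nswap=1, max_tries=100,
                                seed=rng.randint(0, 10**9))
        except nx.NetworkXException:
            continue
        if not nx.is_connected(child):
            continue  # keep everyone reachable
        fit = betweenness_variance(child)
        if fit > best_fit:
            best, best_fit = child, fit
    return best, best_fit

G, fit = evolve_graph()
print(f"betweenness variance: {fit:.4f}")
```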

And there's another one that I didn't put up that maximizes the variance of the closeness. So we can look, within a single network, at how a very close individual does -- how these particular structural properties affect the way that individuals succeed.

I also want to look at, using this methodology, how groups specialize. So this is what I wanted to get at with your question, Chandra: what I'm interested in is the real ideal strategy. If you're working with the same group over time, the ideal strategy is to initially distribute yourselves uniformly. But if all you have to go on -- if the only thing you see is the people that you're connected to -- you might be searching the same place as somebody that's two or three hops away and not know it.

So that information has to propagate. But it's possible for it to propagate, right?

Look at the graph coloring experiments that Sid Suri and Michael Kearns did: you only have local information, but the graph is able to attain this kind of coordinated state. So I think you might be able to see the same sort of thing. If people are iterating on this and working with the same people over time, maybe by round 0 of the sixth trial they're going to have learned to optimally distribute themselves in the space.
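In the spirit of those experiments (this is not their actual protocol), here is a toy decentralized routine in which each node sees only its neighbors' current colors, yet the graph can settle into a globally conflict-free, coordinated state:

```python
import random
import networkx as nx

def local_coloring(G, n_colors, max_steps=2000, seed=0):
    """Each node sees only its neighbors' colors (local information) and
    greedily switches to a least-conflicting color, breaking ties randomly."""
    rng = random.Random(seed)
    color = {v: rng.randrange(n_colors) for v in G}
    nodes = list(G)
    for _ in range(max_steps):
        v = rng.choice(nodes)                 # one node updates at a time
        counts = [0] * n_colors
        for u in G[v]:                        # only the neighbors are visible
            counts[color[u]] += 1
        color[v] = min(range(n_colors), key=lambda c: (counts[c], rng.random()))
        # Global check (experimenter's view only): is the graph coordinated?
        if all(color[a] != color[b] for a, b in G.edges()):
            return color, True
    return color, False

coloring, solved = local_coloring(nx.cycle_graph(10), n_colors=2)
print("reached a coordinated state:", solved)
```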

So that's something that I also want to work on. So essentially this is how it's going to look. They're going to see feedback in terms of what the other people guessed, and so you can -- where is the mouse?

Here. So you can see this person -- the little yellow marker that's getting highlighted there -- that's what person A did, and this is the score they got. These are your guesses in previous rounds. Okay. And you can see the score that you got, and it's of course color-coded in the same way.

One of the things in the previous experiment that I felt like, oh, I wish I could have fixed, is that we could kind of look at copying by saying, oh, this person guessed near where a neighbor of theirs guessed. But you can't really differentiate between that and exploration -- them guessing the number because they wanted to guess it rather than because they saw it from somebody else, right? There's no clear signal.

Here, we'll give people the option to explicitly copy either their own past behavior or their neighbors' behavior. And the little problem space that I showed you before is actually going to be embedded in this map. They're not going to see anything; we'll frame it as an oil exploration thing. So they'll actually get to play this game, and hopefully it will be more fun than just guessing numbers. And here's the plug for HCOMP -- [laughter] -- I'm going to be doing this on Mechanical Turk, recruiting participants on Mechanical Turk. We figured out you can get maybe eight people to work simultaneously, right, if you're lucky -- that is, to show up at the same time and work.

But we've got this method where basically you can build panels, and by doing this you can actually get large synchronous groups working together. That's actually what is necessary in order to have those networks I was showing you before: you need to be able to get people participating at the same time. This is the reason why I haven't followed up on this since I left grad school -- I just didn't have the resources; I didn't know how to get that many people working on a problem simultaneously.
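The panel method itself is not spelled out here, but the synchronization step boils down to a waiting room on the experiment server: hold arrivals until a full group is present, then start everyone at once. A minimal sketch with hypothetical group size and timeout (the Mechanical Turk recruiting side is omitted):

```python
import threading

class WaitingRoom:
    """Hold arriving participants until a full group is present, then
    release them into the experiment together."""

    def __init__(self, group_size=16, timeout=120):
        # Barrier trips once group_size threads (participants) have arrived.
        self.barrier = threading.Barrier(group_size, timeout=timeout)

    def join(self, worker_id):
        try:
            self.barrier.wait()  # blocks until the group is complete
            return f"{worker_id}: group formed, starting experiment"
        except threading.BrokenBarrierError:
            return f"{worker_id}: not enough participants arrived in time"
```

Each participant's page would call join() on arrival; the panel just makes it likely that enough people show up inside the timeout window.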

And so now we have the method, and we're going to do it using Mechanical Turk. And then there's this human computation workshop that everybody should go to, where work like that will be presented.

>>: Is the color-coding according to the function value, or is it --?

>> Winter Mason: Right so that's the score that they're getting.

>>: I see. That's interesting, because compared to the previous one, where there's no visual representation of the function that you're exploring, this may introduce -- for better or worse -- some other aspect into what's going on. Because, for instance, if you had plotted the points on an axis, the multi-modality becomes more obvious.

>> Winter Mason: That's right.

>>: Because it's seen as opposed to -- but here you see it as a prospect, which I expect will help people.

>> Winter Mason: Yeah.

>>: In fact, it suggests things like keeping the color in between rounds so that they're gradually filling in a map or --

>> Winter Mason: Yeah, that's something that I'm not sure whether to be worried about or not -- the fact that I feel like this presentation might encourage exploration just by virtue of the presentation itself.

>>: The color of the map.

>> Winter Mason: Right. Exactly.

>>: Good question.

>> Winter Mason: Yeah.

>>: Might be interesting to look at the relevance [indiscernible] the list. Very hard for people to deal with the list and data.

>> Winter Mason: Absolutely. My hope is that -- so the way we're going to do it, we're actually going to pay people based on their points. And we're going to allow people to play this repeatedly.

>>: The other thing you could do, you could kind of -- you could enforce a limited memory where you only have to --

>> Winter Mason: Yeah, absolutely. I --

>>: Reduce the coloring instinct.

>> Winter Mason: Yeah. Yeah, actually, that's a very good idea. That's actually something that I had thought about earlier, because I was also thinking, do I want to show the history of the other players, right? Right now all I'm doing is showing your own personal history, rather than the history of both you and all of your neighbors. So you're not actually coloring in the whole map; you're just seeing your personal history.

>>: I see. I thought you were seeing the neighbors' ones too.

>> Winter Mason: No, right. But that's a relatively arbitrary decision I've made.

And the hope is that these features -- yeah, they're definitely going to affect the way people behave -- but the hope is that we'll still see the differences across networks and network positions regardless of which particular presentation we choose. So long as we're constant across the different networks and the different positions within the networks, hopefully we'll see the differences regardless of what kind of information presentation we choose. I mean, it is possible that if I select one particular presentation -- not showing the history, showing the history, whatever -- the differences will be greater or smaller. But hopefully we'll see them regardless.

>>: The fear I would have, in the particular case of not showing your neighbors, is that [indiscernible] people might just not realize that there's anything other than what they've seen -- you might chop off the network effect.

>> Winter Mason: I see what you're saying.

>>: You're right, it would still affect things uniformly, but this might be a really big effect.

>> Winter Mason: That's true. That's a good point. And actually, yeah, I'm still looking for feedback on this presentation. So this is actually very useful.

Because like I say I haven't launched this yet. So there's still room to play.

>>: Interesting.

>> Winter Mason: Any questions?

Okay. Thank you.
