>> Eric Horvitz: Thank you for coming. I’m Eric Horvitz, the managing director of the Redmond Lab. I’m thrilled to welcome Sandy Pentland today to Microsoft Research, to the visiting speakers series. Sandy is a friend and a long-term colleague, and he’s joining us today as part of his countrywide book tour on his new book “Social Physics: How Good Ideas Spread, The Lessons from a New Science”.

I’ve known Sandy as a pioneering scientist doing things that are close to my own interests and heart. He’s an early leader in computer-based perception and sensory fusion, including pioneering work in vision and also in the analysis of rich streams of data and how making inferences from those streams can give us a sense for human notions of context.

As one example of some of his work, several people in research, and maybe many of you in the audience, might know of his work with Matthew Turk to develop a methodology called Eigenfaces for face recognition that is used widely now.

Sandy’s publications have been inspirational. He’s also mentored in his years at MIT many fabulous leaders, now renowned computer scientists in their own right, including several folks who have graced Microsoft Research and Microsoft Corporation more widely, and these include our close colleagues Matthew Turk, who I mentioned before on the Eigenfaces, and also [inaudible] Oliver, who is now at [inaudible] after spending a number of years with us at Microsoft Research.

Over the last 20 years Sandy has moved deeply into an area that has come to be known as computational social science, and this includes data-centric studies of people and the networks and the larger organizations that they compose. And he’s formulated a really beautiful and rich research program in this area at MIT with multiple projects that help to define opportunities for new understanding about people and society that come via sensing and data resources that have come available through such things as mobile devices and online activities.

Sandy currently serves as director of MIT’s Human Dynamics Laboratory. He also is a director of the MIT Media Lab’s entrepreneurial program, I guess it’s called the entrepreneurship program, and co-leads the World Economic Forum big data and personal data initiatives.

His new book “Social Physics” comes on the heels of his prior book “Honest Signals: How They Shape Our World”. I just mentioned to him outside that I just finished that book. I have to get reading faster. I enjoyed it very much and I look forward to hearing about his latest writings.

Please join me and give Sandy a warm welcome.

[applause]

>> Sandy Pentland: Thank you very much. So is the microphone and everything happy? Good. Okay. So what I want to give you today is a talk that’s actually a sort of general talk. So I’ll talk a little bit about the mathematics, a little bit about the technology, but only a little bit because I think that the audience here is really quite broad, and so if I give something that’s sort of a technical talk it will cause problems. So you have to forgive me that I’m talking at a sort of more public, popular level.

I’ve created a website. It’s called socialphysics.org that has all the original research and it also has videos of talks from all of the students that worked on this, and other sorts of resources like that including several of the world’s biggest, richest datasets about human behavior, which are open for use and I would encourage you to do that.

So what I’m going to talk about today is this thing called social physics. So social physics has a long history. Originally the term came up in the early 1800s when the idea of sociology first came up, the science of people. And the coiner of that phrase first started with social physics, you know, thinking about it as a sort of mechanistic description of society and quickly sort of figured out that society is not that way and dropped it. And it came back again in the 1930s-1940s when people like Zipf [phonetic] and others discovered these statistical regularities in human behavior that looked very much like a lot of sort of physical type things, but those were not generative processes. So they were interesting descriptives, but they didn’t give you a handle on much to do.

So what I’m going to do is talk about what happens when modern big data meets social science. And what happens is that all the things that we know about political science, about economics, about psychology and a lot of medicine is a result of what are actually very small datasets, and they’re very sparse in terms of time. And with the advent of mobile telephones and credit cards and cameras and stuff like that you can now get very rich annotation of people, entire communities for a long period of time.

And what that lets you do is that lets you build really interesting mathematical descriptions that are quantitative and predictive of how communities evolve, how ideas change in communities in a way that was rather unimaginable previously.

So that’s the science part of it. The other side is the moment you realize that you can do this sort of stuff, it conjures up all these bad visions and you realize very quickly that Orwell was not a very creative fellow and we could do an awful lot more damage than he ever imagined. And so what we ought to think about is how can we use this new resource to understand people to build a better society, a more creative society, but also one that’s safe for people.

So that will be the second half of the talk. So the first thing I want to do is just show you this qualitative thing. This is something that along here is duration of observation, and this is something like number of bits per second or bits per minute. Bits aren’t quite the right measurement, but the big thing is that if you look at most social science it’s down here around zero-zero. So what it is is this freshman in Psych 101 filling out a survey: a couple bits per minute, short period of time, small number of people. Over here you’ve got things that are like the Framingham [phonetic] heart study, 30 years of tens of thousands of people, but they only talk to the people once every three years and they collect about 30 numbers.

So the bit rate’s about zero. Moreover they do it in a way that’s a snowball sampling technique so that you can’t ask what happened to the average population. You get to see this guy here, he has higher cholesterol there. Maybe he ate nothing but Kentucky Fried Chicken in between, but you don’t know, okay, because you never collected that.

Most recently we’ve had some big data studies. So telcos and credit card companies have given us quote unquote anonymous data that lets us look at human behavior.

So in my lab, for instance, we had six years of credit card data for about 100 million Americans. Think about that.

And it was actually behind firewalls and so it was pretty safe, right? Never got hacked. And we have Telco data. For instance, we have Telco data so we have calls, mobility, and credit card records for several million people in a mid-sized country south of us. So incredible things. But they’re very narrow because while you get lots of data points every day, you don’t know anything about the person, right? So it’s hard to know how this data is giving you a perspective about human behavior.

So what I did was I started a project to get rich data sets where it was really sort of the god’s eye view, everything we could collect. And I did a variety of things. We built hardware that has sensors in it, little badges, and we would get everybody in an organization to wear this for a month. And what we would do is we would know where they were, who they talked to, how their body language was, what their tone of voice was. We did not record content ever because people got freaked out about it. And also a lot of the things that you might imagine as an operational system in the real world you wouldn’t want to have content in there.

So we tried to stay away from that. So those are some of these studies here in the middle where we’d look at people for a month, 100 people for a month, and we could see every single conversation and we could look at also behavior because we would collect all the electronic stuff. And then other sorts of studies that we’d do are with phones.

So we would give out smart phones to everybody in a small community. So we would do, for instance, apartment buildings, 65 young families give them all phones.

We would collect the, and it’s human subjects, right, so we’d give them informed consent, we would pay them, we would give them the phone, and then we’d collect where do they go, who do they call, who do they spend time with? Their Facebook activity, their credit card, even we got things like these EEG things that Zeos [phonetic] makes so we got their EEG at night. And we discovered all sorts of interesting things. But it’s much richer than any other social science we’ve seen before. That’s the point. We have all these communication channels, we have it densely, you know, every 16 milliseconds for a year for all the people. And we tried to pick sites where the social networks were contained, so very insular communities.

And the part that’s interesting about this, of course, is the [inaudible] up there is where the world is heading. As things become more and more digitized the experiments I’m doing are going to look a little thin, okay? Because the world is going to be so digital that we’re going to know everything about everybody potentially.

And so we want to ask what is it that can be learned, what can we do with it? The short-term thing is that we can begin using this big data to build mathematical models of human behavior that are actually quantitative, predictive, verifiable, refinable. So that’s really been the problem with most social science and most medicine is that you do not have enough data over all the channels or all the ways of interaction to be able to ask what were the causal connections?

And the typical sort of modeling we do, which I’m not going to talk about much, is you think about all these people interacting with all the channels and there’s this huge Markov model, okay, except that that’s too many parameters to actually estimate. So we do a slight simplification where the influence that I have on you is limited to my state and your state, and then there’s a fixed parameter. What that does is give you log of the number of the parameters and it turns out that you can use EM methods to actually solve for everybody has their own [inaudible] machine, a hidden Markov model, and we have parameters of influence between them.
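The simplification described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the lab’s actual code: it assumes each person has a small discrete state, that person j’s effect on person i is a pairwise transition table, and that a fixed weight `influence[i][j]` mixes those effects. The names `next_state_distribution`, `parameter_counts`, `influence`, and `transition` are invented for the sketch, and the hidden-state and EM estimation steps are left out entirely.

```python
def next_state_distribution(i, prev_states, influence, transition):
    """Distribution over person i's next state, given everyone's previous state.

    influence[i][j]  : fixed weight of person j on person i (each row sums to 1)
    transition[j][i] : table indexed [previous state of j][next state of i]
    """
    n_states = len(transition[0][0][0])
    dist = [0.0] * n_states
    for j, s_j in enumerate(prev_states):
        row = transition[j][i][s_j]
        for s in range(n_states):
            dist[s] += influence[i][j] * row[s]
    return dist


def parameter_counts(n_people, n_states):
    """Table sizes: one joint Markov chain vs. the pairwise simplification."""
    joint = (n_states ** n_people) ** 2
    pairwise = n_people ** 2 * n_states ** 2 + n_people ** 2
    return joint, pairwise


# Two people, two states (0 = inactive, 1 = active); all numbers are made up.
copy_source = [[0.9, 0.1], [0.1, 0.9]]  # next state tends to match the source's state
transition = [[copy_source, copy_source], [copy_source, copy_source]]
influence = [[0.7, 0.3],   # person 0 mostly follows their own past
             [0.5, 0.5]]   # person 1 weighs both people equally
dist = next_state_distribution(0, [0, 1], influence, transition)
```

With ten people and two states, `parameter_counts(10, 2)` compares a joint table of over a million entries with a few hundred pairwise numbers, which is the kind of reduction that makes estimation feasible.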

And you can actually do this pretty realistically. We’ve done it for on the order of ten percent of all the people in a mid-sized country looking at their car patterns, for instance. So you can really do this sort of stuff. It scales halfway reasonably well.

But the big thing is you can actually solve the equations.

So what do you find when you solve the equations? Well, here are the things I really want you to pay attention to, because they’re things that we typically get wrong, and I think the people in this room get wrong. So one is this diagram comes from Danny Kahneman’s Nobel Prize lecture. It talks about having two ways of thinking in humans, there’s the fast and the slow. It’s a little bit of a cartoon, but not much of a cartoon, and he has a Nobel Prize and you don’t so we get to do this, okay? And what you might ask is, okay, the slow, for those who haven’t read the book, is the rule-based sequential thinking that we do. It’s the thing that we spend all of our time, we go to college about this. And then the fast is this associational mechanism, it’s tens if not hundreds of millions of years old, it’s based on experience. It’s actually interesting to think that this is better than this if you have the right experience and it’s a complicated problem.

If you have lots of variables, lots of tradeoffs, this is much better. It’s basically sort of a Parzen estimate of a complicated surface. This, we all know, has some computational complexity problems and is very unstable with respect to assumptions and initial conditions. Okay?

So in fact, if you look at people you see that probably this is 90-95 percent of everything we do because it’s very fast, it’s quite accurate, and this is what we use to catch the bugs. Okay? But another interesting thing is you might ask how do these types of thinking learn? Where does their database come from? And this is where we often make our mistake. In the data that we have I call learning in this system exploration. And it’s a thing that’s called simple contagion in the sociological literature.

So it only takes one exposure to an idea for it to go into your conscious thinking rule-based stuff. Somebody says Audis are very expensive. Okay, well that’s in your head now. You can sort of think about that, okay? And we’re very profligate with this. We like to have lots of ideas and so forth, lots of things that we think about and [inaudible] through, but what’s interesting is that this has very little to do with this.

It’s very hard to get from here to there. You can think about how hard is it to stop smoking even though I tell you smoking will kill you? Right?

Fifty years later people are still smoking, right? I mean all sorts of habits are very poorly coupled to what we think about the habits. So that’s perhaps as it should be.

When we look at habits, and I’ll say a little more about what counts as a habit, it’s really these automatic sorts of reactions that we have, that’s [inaudible] engagement and it has a different method of learning. The method of learning that seems to be the thing that describes this is called complex contagion. What that means is that if you see people that you interact with, so these are peers, experimenting with something and appearing to get good results, then you seem to absorb it without a great deal of consideration.

So if everybody around you suddenly starts drinking skinny lattés and everyone says oh this is so great, right, without really thinking about it you’re very likely to start doing skinny lattés. That’s a bad example, but in general it’s those sorts of things.

So what we’re doing, if you think about it, we live in this uncertain world, and if I see people who are like me, which means their knowledge will generalize to me, if I see them doing things that appear successful and I see several experiments of that sort, then it’s a good strategy for me to follow that up. I get to have an entire behavior, an actual piece of information without all the scar tissue of trying the bad things. Okay?

So this is probably why we’re a social species, is because we need to observe our peers and see the things that are successful and adopt them. And that’s in the title of the book, how do good ideas spread.

Here we can play around with a lot of stuff because it’s not tightly connected to our actual habits. So okay, that’s idea flow. So here is an example of the stuff that we do.

So this is a German bank. It’s got five departments: management, development, sales, support, and customer service. What they’re doing is making a new ad campaign. So it’s a creative process. You can’t see it but there’s the date down here.

Each frame is one day.

This is all the email, stuff like that. This is all the face-to-face stuff. So we have them wearing these badges for a month. Okay? And we can see when all the groups get together, the big circle means this group had a group meeting, right? All that other stuff, that’s in the halls and around the coffee pot. Now, let’s go through it, so let’s see, wait till the end of the month. You can’t see this, but don’t worry about it, I’ll tell you. Beginning of the month managers send out lots of stuff. Let’s have an ad campaign. Everybody has meetings, right? Figure out what we’re going to do. You will notice that nobody talks to customer service the whole time. And then they get the product and they release it, and it’s a disaster and they have all the meetings with customer service.

Okay? The rich channels of communication are not observed in almost any corporation or any organization in the world, but yet those are the ones that contain high-value nuanced information. Now, we might keep track of the email or phone calls, but we never keep track of the other stuff. But yet, that’s where the most valuable, important stuff is. So this organization, in response to this, moved customer service to sit in the middle of all the other organizations so that you had to run into them and talk to customer service, and that solved a lot of their problems. But a more interesting thing comes out of this.

And this is from a study of some dozens of corporations. You can divide this face-to-face stuff up into two pieces. First of all this pattern doesn’t seem to say much about productivity or creative output, sorry, but all that electronic stuff doesn’t seem to do very much. You need enough of it. You need to have different ideas flowing about, but it doesn’t affect behavior very much.

On the other hand if you look at the face-to-face stuff you see two things. One is you see engagement, which is within a group. Does everybody talk to each other? So the mathematical version of that is if I talk to you, and I talk to you, do you guys also talk to each other? So it’s a loop construct. If all the loops are closed it is very high engagement, which typically accounts for 30, sometimes 40 percent of the difference between low productivity groups and high productivity groups. You should sit and think about that because there is no other management thing you’ve ever heard that has that big a percentage of variance.
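The loop construct for engagement can be made concrete with a short sketch. It assumes the face-to-face data has already been reduced to a symmetric "who talks to whom" table, and it scores a group by the share of "I talk to both of you" pairs in which the two also talk to each other. The function name `engagement` and the example teams are hypothetical; the badge studies may have measured this differently.

```python
from itertools import combinations


def engagement(talks):
    """Fraction of 'I talk to both of you' pairs where the two also talk to each other.

    talks: dict mapping each person to the set of people they talk to
           (assumed symmetric for this sketch).
    """
    closed = 0
    total = 0
    for person, partners in talks.items():
        for a, b in combinations(sorted(partners), 2):
            total += 1
            if b in talks.get(a, set()):
                closed += 1
    return closed / total if total else 0.0


# Everyone talks to everyone: every loop is closed.
tight_team = {"ann": {"bob", "cat"}, "bob": {"ann", "cat"}, "cat": {"ann", "bob"}}
# Everything goes through one person: the loop is never closed.
hub_team = {"ann": {"bob", "cat"}, "bob": {"ann"}, "cat": {"ann"}}
```

Here `engagement(tight_team)` is 1.0 and `engagement(hub_team)` is 0.0, matching the idea that a group with all loops closed has very high engagement.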

You might say, well, 40 percent that’s not a big deal, right? That’s only as important as your genome, for instance, in medical outcomes, or only as important as IQ.

Okay? Interesting.

And then there’s another component to these, which is stuff your boss tells you probably not to do, which is going and talking to the sales guys and the janitor and everybody else, and we call that exploration.

So in organizations, when you talk to people that aren’t in the official org chart your productivity goes up because you have broader context, presumably. So what this is doing is this is passing implicit tacit knowledge around, it’s the stuff that you don’t write down. It’s stuff that maybe is work-life balance stuff. You know, there are a variety of things and we’ve done interventions. This is causal. The interventions can be things like you make the lunch tables different, or you have mixers of various sorts. There are things like that. And creative groups, like we do drug discovery groups. This exploration has a bigger role than the engagement. So you get these organizations, which are the key profit centers for drug companies, and a bunch of Ph.D.s, like highly skilled, and the biggest factor in what determines a productive drug researcher is how much they talk to the people they’re not supposed to talk to.

Okay? How much do they understand? You can talk about causality here, we’re happy to do that. I’m not going to lecture to emphasize that, but if you intervene you find that if you get them to talk more broadly the waiting, the inventiveness, the creativity goes up. So it has a causal thing.

So the bottom line there is the pattern of social ties predicts productivity and innovation. Why? Because of social learning. All right? If I see people doing things and I learn from it, I’m going to be able to make better decisions. In a work group it means that everybody is on the same page, you know? I’ve seen what you do and maybe it’s not what I thought we were supposed to do. And then I’ll ask you a question about it, okay?

But I have to see that to be able to get compatible behaviors between people. And I wrote a paper about this last year and it won this award, paper of the year, from Harvard Business Review, and it won a paper of the year from the Academy of Management and I believe that’s the first time that the Academy of Management or Harvard Business Review have ever agreed. So maybe it has some truth to it.

So that’s in companies. Now let’s look a little more broadly. So this is a company I cofounded some years ago. These are taxicabs moving around in San Francisco. It’s a proxy for people moving around. It’s all anonymous. The black dots here are the most common places that people go. It looks like a nicely mixed city but if you analyze it in terms of dynamic clustering, so you ask about these patterns of flow you find that the flows actually have these sort of like singularities that they flow around.

What that means is that there are subgroups that spend most of their time with each other and not with anybody else. They walk right by the other people but they don’t hang with them. That doesn’t mean they know each other, it just means that they’re this type of person and that type of person. And you can map these onto the city if you want, but what’s more interesting is if you actually now go and talk to these people you find interesting things. You find that fashion sense is different in the different groups.

So you’ll get these waves of fashion propagating through just one of these groups, not the other groups. So in one group all the girls will start wearing red dresses but not in the other group that walks right by them, and it seems to have something to do with learning from each other.

What seems to be successful? Now, we’ve done this in other things too. Like we’ve done this in a dorm, looked at weight gain. We’ve even done politics. If you spend time with people who are republican, you will tend to be more right wing, if you spend time with people who are more democratic you’ll tend to be more democratic. It’s not the stuff between your ears, it’s the sort of learning and implicit stuff that you soak up that seems to be the effect.

So red dresses, that’s maybe understandable here as one that’s not so good. Credit rating is very highly predictable, much more so than demographics. So the [inaudible] score is not so great, but if you know what group the person is part of you know a lot more about their risk profile. Different groups have different attitudes towards risk.

Some groups you give them a credit card it’s all wonderful, other groups not so good. And then this is one that is really important: different groups have different chronic disease liability or susceptibilities. So we don’t know precisely why it’s this way or the other, it’s some combination of behaviors that they learn from each other and adopt that makes them more susceptible to say diabetes or alcoholism. There’s probably some self-selection that goes into that also. But typically you get prevalences of these chronic diseases that differ by a factor of five between these groups.

And that’s important because now you know where to have a campaign for diabetes or where to test. So that’s important. So what’s going on here again is this idea flow, these cascades of behavior change as a function of exposure to other people, right?

Remember, all these guys read the same newspapers, see the same T.V. It’s not the information in the sense we usually think, it’s something having to do with interaction. So here’s one of the big things that I want to put out there. I think that we are seeing the beginning of a really new type of science about people, okay?

The science we have today is mostly cognitive science. It’s economics. And one of the fundamental things is you have individuals. I’m not going to argue about the rational part, okay? In other words it’s what happens between your ears that matters to your decisions, okay? So like in economics you assume you’ve got a bunch of more or less identical people, maybe they have slightly different utility curves, right? And then you apply an incentive and that changes their behavior.

And in fact, these sorts of things underlie markets and there’s this famous quote by Adam Smith about the invisible hand. How many people know where it’s from?

What?

>>: The Wealth of Nations.

>> Sandy Pentland: Wrong. It’s Moral Sentiments. It does not appear in The Wealth of Nations. And despite the fact that there are some Nobel Prizes around this, it turns out that this notion that markets distribute things in the best possible way, it’s called social efficiency, only happens for market parameters of measure zero, which means that it’s the regulator that makes the division, not the market itself. Market adds an interesting thing. But if you actually read Adam Smith’s Moral Sentiments book he also says things like this, “It’s human nature to exchange not only goods but ideas, assistance, and favors, and it’s these exchanges that guide solutions for the good of the community.”

In other words, it’s not the market, it’s the fact that we have these social transactions, this exchange network between us that causes us to come to compatible norms of behavior and allocate the resources in an appropriate way.

That is sort of a mildly [inaudible] version for those of you that know some economics. But the big thing is look, there’s a social fabric, and we come to compromise with each other, we learn from each other principally, but we also have social pressure between each other so that you don’t do things that screw me over and I don’t do things that screw you. So we have a little fight about it, we reach a compatible norm, the world is happy.

And it’s that that is the invisible hand. But think about it, when you write programs when you do the social networks you don’t ever think about this stuff. Almost nobody does. This irritates me to death. I go to places like Davos and there are all these world leaders and they use economic metaphors for everything. Government is a market, right? Well, no actually we’re people and we talk to each other and we have this social fabric, and the peer-to-peer effects are important. The peer-to-peer effects are where you get fads. It’s where you get new political movements. It’s where you get financial bubbles.

If you assume everybody is IID independent, you’re not going to get those things.

When you can model this you can account for them. But to model them you have enormously more parameters. You need enormously more data. Guess what? We now have enormously more data.

So, cool. Social fabric. Now, when we do incentives thinking the old way what we would do is we would give this guy some money to say become more red. Okay? Or some other sort of approval or something. But he’s being fought over by these other things, by this social fabric. So we see that the economic incentives and other sorts of incentives don’t work nearly as well as we think they ought to. In fact, when people get comfortable they don’t work reliably at all. And moreover when you remove them people snap back to their old behavior because they’re embedded in this exchange network. Okay? But this suggests another way to incent people.

Instead of giving people incentives, I could give the social network an incentive. So let me show you some examples. This is a very different way of thinking about things. It’s a fundamentally different way.

And if you do the math, because you can write down all the equations here, and you can solve for how much of an incentive is going to cause what amount of change, you get equations like this. It’s fairly standard econometrics, except that it has terms that have to do with social incentives and the cost of interaction between people. So it includes terms that have to do with the social network. And if you analyze these, what you find is that the total reward distributed to peers in this social incentive is at least twice as, at least half the size of individual incentives.

So what that means is we have a fixed budget, you want to change behavior, you can go twice as far with social incentives as you can with that. In fact, it’s a lot better than that. This is under fairly general assumptions. You can read the paper. Let me give you some examples. So here’s a really striking one. We took one of the cantons in Switzerland. They had to save energy because they’re hydroelectric primarily, and above a certain level they have to fire up diesel generators, which is not only expensive, but it makes the air really polluted up in the mountains, okay?

And they tried educating people, they tried giving them financial incentives.

Nothing had much effect. And we talked them into trying social network incentives.

So instead of saying if you save energy I give you a reward, let’s imagine you two guys interact, okay? I’m going to give him a reward if you save energy. Okay? And if you two guys interact I’m going to give him a reward if you save energy. And what is that going to do?

Well, that’s going to cause you two guys to talk. If you know each other and interact regularly and I set up that incentive you guys are going to start talking about this.

And it’s okay for you, because you get to talk to him about it, right?

And if he saves energy then you come out. So we had a budget of about fifty cents per week per person. And we got a reduction in energy use of 17 percent over the population that we signed up. Now, to give a comparison, there have been places where people have saved 17 percent energy, but it required a doubling in the price of energy.
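The payout rule in this kind of scheme can be written down as a short sketch: when one person in an interacting pair saves energy, the reward goes to the other person. The function name `peer_rewards`, the example names, and the 0.50 payout per saving partner are all hypothetical; the talk gives only an overall budget of about fifty cents per week per person, not the actual payout formula.

```python
def peer_rewards(pairs, saved_energy, reward=0.50):
    """Pay each person's partner, not the person, when the person saves energy.

    pairs        : list of (a, b) tuples of people who interact regularly
    saved_energy : set of people who cut their energy use this week
    reward       : payout per saving partner (hypothetical amount)
    """
    payout = {}
    for a, b in pairs:
        for saver, partner in ((a, b), (b, a)):
            payout.setdefault(partner, 0.0)
            if saver in saved_energy:
                payout[partner] += reward
    return payout


# "you" and "her" saved energy; "him" interacts with both of them.
payout = peer_rewards([("you", "him"), ("him", "her")], {"you", "her"})
```

In this example "him" collects 1.00 without saving anything himself, while the two savers collect nothing directly, which is what gives the pair a reason to talk about it.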

So that gives you a comparison of individual incentives and social incentives. That’s example one. Here’s example two. I took a community of young families and I divided them into two. So these are people I’d given smart phones to. And so they’d signed up as part of an experiment.

And the smart phones showed them their level of activity. This was in the fall in Boston when everybody gets under the blanket and doesn’t come out, right? And so we wanted them to be more active, and what we did is I gave them a reward if they became more active. That was one half of the community.

The other part of the community I gave them a reward if their buddy became more active, okay? So I took people who were either friends or who interacted a lot and I set up these social network things where the reward went not to the person, but to the buddy. And so what happened was the buddy rewards were between four and eight times more effective per dollar than the individual rewards, and when we ran out of money, because we’re just a university, the buddy rewards were far more sticky because what we’d done is sort of stretched the social fabric out of shape. So now being active was a topic of conversation.

It was a badge of sort of social brownie points or something like that. And it persisted for at least another month or two after that. And things change so it’s hard to tell, but had the properties that you want in terms of more of a long-term influence of behavior.

This is one done by James Fowler [phonetic] sort of working with Facebook. So he in 2010 sent out a get-out-the-vote message to 61 million Facebook users. The message itself had almost no effect, but a few people did tip and went and voted, and he also had a button where you say I voted. And if you say I voted then all your Facebook friends would see your face, and that had almost no effect too, except with one class of people: people that had taken pictures of you together that were on Flickr, so these are people that had face-to-face relationships enough so that they had posted a picture with them together, right? In that subset of things one person voting typically tipped two to three other people.

If you were just Facebook friends it had almost no effect. So which channels of communication matter? The face-to-face stuff. And it’s the social pressure of it here, just like the social pressure there or there, that did the job, not the information. So often I see people talking about well, we want to change behavior, we need to give them more information. Well, good luck. Okay, so those are some examples.

Now, why does this social network stuff, this social interaction, this idea flow between people, work? Let me give you an example. So this is a network called eToro. At this point it was 1.6 million people, and they’re buying and trading dollars, Euros, gold. They can trade more things and it’s now over 3 million people. And the thing that’s unusual about it is you can see what everybody does. So I can go on there and I can see what you’re buying and selling. I can’t see the dollar amounts, just your strategy about it. You’re going to short Euros and long dollars and use leverage times 50. And I can say well, he’s a really smart guy and so I’m going to take ten percent of my money and do just exactly what he does, okay?

And when that happens you get a dot. So these people are the same people as these people. Each dot is a following. So that means we’ve done some social learning for real, okay? And if you look at the community you see subgroups that do very little social learning and you see groups that look like this where I’m following you and you’re following him and he’s following me. Same things go round and around and around, and those sorts of situations while the flow of ideas is very good, it’s the same ideas over and over again. So you don’t get new strategies into the mix.
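The closed following loops described above can be made concrete with a small sketch. This is only an illustration, assuming the following relationships are available as a plain dictionary; the function name `find_following_loops` and the trader names are hypothetical, and this is not the influence model used in the study.

```python
from typing import Dict, List, Set


def find_following_loops(follows: Dict[str, List[str]]) -> List[List[str]]:
    """Return groups of traders who sit on a closed following loop.

    `follows[a]` lists the traders that `a` copies. A group is reported
    when each member can get back to itself, and to every other member,
    by walking along "follows" edges.
    """
    # Collect every trader that appears as a follower or as a leader.
    nodes: Set[str] = set(follows)
    for leaders in follows.values():
        nodes.update(leaders)

    def reachable(start: str) -> Set[str]:
        # Iterative depth-first search along "follows" edges.
        seen: Set[str] = set()
        stack = list(follows.get(start, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(follows.get(node, []))
        return seen

    reach = {n: reachable(n) for n in nodes}
    loops: List[List[str]] = []
    assigned: Set[str] = set()
    for n in sorted(nodes):
        if n in assigned or n not in reach[n]:
            continue  # already grouped, or not on any loop
        group = sorted(m for m in reach[n] if n in reach[m])
        assigned.update(group)
        loops.append(group)
    return loops


example = {
    "ana": ["ben"],   # ana copies ben
    "ben": ["cy"],    # ben copies cy
    "cy": ["ana"],    # cy copies ana, closing the loop
    "dee": ["ana"],   # dee follows into the loop but is not part of it
    "eli": [],        # eli follows nobody
}
loops = find_following_loops(example)
```

In this toy network only ana, ben and cy recirculate each other's strategies; dee draws on the loop without feeding it, and eli does no social learning at all.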

And in the middle you have people that are very diverse in who they’re following and there are no closed loops here. Okay? So you can actually take the models I showed at the beginning, the sort of reduced Markov models, we call it an influence model, and you can build a model of these millions of people and you can actually characterize the cascade behavior in that model. So I can talk about how the topology of following changes the cascade distribution of behavior change.

So idea flow if you would, okay? So here there is very little idea flow. You get tiny, tiny little cascades. There is very little following. Here you get lots of great big cascades all the time, but it’s over the same stuff. And here you get a much richer set of distributions. Yeah?

>>: [inaudible]. If you ignore the connections and just I see what the [inaudible].

>> Sandy Pentland: Here you can only see a certain number of people. I see this because I have an in with the guy who runs the platform. Most people have to go look at individuals. There’s a leader list and so forth, but they can’t see what everybody is doing. But remember, everybody here is reading the same newspapers, seeing the same TV shows, they have the same external information.

Okay?

And what I can do is I can build this model that talks about cascade behavior by looking at the behavior over time. And what you can do then is write down this equation, which gives you this. So here you get these big cascades all the time, here you get almost no cascades, in between you get a middle-sized distribution of cascades. Each dot here is the return on investment on all of the social traders for an entire day. So each dot is hundreds of thousands of trades, okay? And the whole thing is for an entire year.

And what happens over time is that some days the network becomes more congested, so like an echo chamber, and some days it’s very isolated. I can talk about that because there’s this tradeoff between individual thinking and social thinking. So when the market is very predictable you tend to be more isolated, when it’s not predictable you tend to go up here as a compensation strategy.

And this is market adjusted return on investment. It turns out that the isolated guys, the guys that are doing it between their ears, right, are market neutral. Not so good. And as you become more social, as you get more involved in these distributions of cascades, your return on investment goes up 30 percent. That’s huge. And now as you become even more social and you get into the echo chamber it goes down.

And what I’m not showing is that sometimes it goes all the way down. You get crashes. So there’s a case of this one guy in Latvia that had a long winning streak and he got followers and more followers and the followers got followers and they all thought they were following lots of different people, but really it was this one guy in Latvia that was the generation of all of it. And then one day his strategy failed and it wiped out all these tens of thousands of people. Okay, so that’s not in this graph.

So what I’ve said here is that this flow of ideas works when it’s correct, when it’s sufficiently diverse, ’cause that’s really the difference here. In the isolated part it’s not diverse, in the echo chamber it’s not diverse, in the middle when it’s sufficiently diverse you get better decisions.

Well I can actually map that in more-or-less real time in the real world doing things that are sort of surprising. So this is using [inaudible] tower data. And dark red means that there are people from different parts of the community in the same spot.

So this is mixing of different communities throughout the area of Mexico City, and you can do this in real time pretty much. We do it with about one day of lag. And you can compare this to all sorts of different things.

So what I’ve said here is that this mixing of different communities, this flow of ideas by seeing what other people do is the source of better decisions, and in companies it’s the source of great increases in productivity. How generally can we do this?

So earlier last year I talked [inaudible] carrier into releasing their cell tower data for the country of the Ivory Coast. Ivory Coast is a poor country but it also had a civil war about two years ago, so for instance, the government cannot go above about here because they get shot, right?

But what you can do is you can look at the behavior of the people in each of these little micro provinces and you can estimate their poverty index. It turns out that as people become more wealthy, more comfortable, they explore more both physically and in terms of phone calls. And when you get that greater mixing the community becomes more wealthy.

So you can make a map of poverty. And these are also correlated with things like crime rate and so forth. So you say oh, that’s pretty amazing that you can do a whole country every day instead of having a ten-year-old census. And of course here you can’t even do the census. But is this generally true? So one of my former students, Nathan Eagle, published this. So he had got data from UK councils. These are little administrative units in the UK, and what he looked at was the social [inaudible] patterns data, principal components analysis, communication within these council areas, and when you’ve got councils that were such that the people there didn’t talk to each other much and they didn’t talk outside of the community very much, they were very low on the socioeconomic percentile.

When you got communities that had very good patterns of idea flow they were very wealthy. Now, this indicator is a combination of things like GDP, crime rate, life expectancy, and infant mortality. So from the phone pattern or from the mobility pattern we can tell you how many babies die.

Quite accurately, right? It seems to be a basic property of people is that healthy communities have healthy patterns of communication, and ghettos just aren’t good.

Okay? And all these things co-vary with that.

So that’s pretty interesting. And here’s another thing that we think we did. So on the other end of things we analyzed data from 300 cities. This data was originally gotten from the Santa Fe guys, [inaudible] and what we did was we used things like Foursquare and CDRs to look at the amount of face-to-face interaction within the city.

And if you tell me the amount of face-to-face interaction, the pattern of face-to-face interaction in the city, I can tell you the GDP. And if you tell me the GDP I can tell you the amount of interaction, and it’s about 80-90 percent variance accounted for. So what goes into this?

Well, it turns out that patterns of interaction in cities follow a fairly nice law that’s an intervening opportunities law. And it’s a function of density of population, how many people live in an area, and the transportation infrastructure, which you can quantify as something like average commute distance, okay?

So you tell me those two things about the density and the commuting, I tell you the GDP. You tell me the GDP, I can tell you the commuting distance, all right?
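One way to get a feel for an intervening opportunities law is a rank-based sketch: the chance of forming a tie with someone depends on how many other people live closer, not on raw distance. The code below is a minimal illustration under that assumption (probability proportional to 1/rank); the function name `tie_probabilities` and the sample coordinates are hypothetical, and the exact functional form used in the city analysis is not given in the talk.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def tie_probabilities(home: str, positions: Dict[str, Point]) -> Dict[str, float]:
    """Probability that `home` forms a tie with each other person.

    Assumption: probability is proportional to 1 / rank, where rank is the
    other person's position when everyone is sorted by distance from `home`
    (rank 1 = nearest). Equal distances are ordered by name.
    """
    hx, hy = positions[home]
    others = [name for name in positions if name != home]
    others.sort(
        key=lambda n: (math.hypot(positions[n][0] - hx, positions[n][1] - hy), n)
    )
    # Each intervening person pushes the rank up and the weight down.
    weights = {name: 1.0 / rank for rank, name in enumerate(others, start=1)}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


positions = {"me": (0.0, 0.0), "near": (1.0, 0.0), "mid": (3.0, 0.0), "far": (10.0, 0.0)}
p = tie_probabilities("me", positions)
```

Under this assumption, packing more people into a short travel distance raises the number of likely ties per person, which is how density and commuting range would both enter the picture.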

It’s pretty interesting. In 300 cities, okay, 80-90. What that says is there’s this story that we’ve all heard, right? Bringing together different ideas as people meet and talk and share experiences results in innovation, and innovation results in increased and better outcomes like GDP.

Okay? Except we’ve never had data to prove that, and now we have massive data to get very, very high quality descriptions of this. But you’ll notice things that I did not say here, okay? I can predict the GDP without knowing the educational distribution.

I don’t know anything about specialization in the cities. I don’t know about class production and access to means of production if you’re a Marxian.

Not to say that those things don’t exist, and that they’re not important, but it seems that the big actor is this flow of ideas. The banging together of ideas face-to-face, and incidentally the electronic stuff doesn’t play much of a role at all.

It’s the face-to-face rich channels in today’s environment. That gives you the GDP.

Okay. So that’s pretty interesting. That’s an example of the sort of things that you can begin to do when you get big data meeting social science. You can sort through all the stories you’ve heard about development and say huh, actually this one’s 80 percent of the variance, and not that that other one isn’t a gating thing, or not important, but this is what you ought to do to be able to get change. So let’s think about how we might do that.

So you can design cities, and that’s what we’re beginning to do. So this is the traditional way you design cities. This is for Singapore. So some poor group of people had to carve out of wood every building in the central part of Singapore and glue it on this huge board, and that’s how they design cities. Want a new building?

You pull the old piece of wood up, you carve a new building, and you stick it in and say oh, that looks pretty good, right?

This is state-of-the-art urban planning. And some of the buddies with me in the city science project [inaudible] have come up with a different way to do it. So they build cities out of Legos, and then you have a laser range finder that extracts the 3D structure and lets you do things like simulate wind flow or shadowing.

But we can also simulate the mixing of the population. So these things that have to do with creative output, or poverty, we can look for places in the city that are ghettos. We can look for places in the cities that ought to be hot spots that weigh the mixing and the GDP of the city. So we can design for cities that have better properties of idea flow and better social outcomes. That’s the idea. So I’m trying to give you an intuition of what the equations tell you because there are classic examples of this.

So I mentioned engagement, which is everybody being in the loop, and exploration.

So a classic example of engagement is villages from the Middle Ages. So these are villages north of Zurich. So there everybody knew each other, everybody talked to each other. They all had compatible norms. It was a completely boring and oppressive place, right? It was good for the kids, because everybody knew what to do and would support the kids, and it would be good for elders too because the rules are pretty clear and there was a lot of engagement, but not much creativity because new ideas didn’t come in.

So how would you take this thing, which has some positive aspects to it, and make it more creative? Well, what you’d like to do is get the ideas from one village to bang against the ideas of another village, right? So here’s an example of that. This is Paris. Lots of little villages, they call them [inaudible], and then good transportation allows all the villages to talk to each other.

And that’s the same pattern that you see in Boston, the same pattern that you see in London, the same pattern that you see in New York City. And those cities are all a little overgrown now, transportation infrastructure is creaking, there are other things that have sort of come up.

But this is an intuition for what you need. So to have that sort of social support plus the banging together of ideas is a little bit like Jane Jacobs and the sort of battle between whole communities and Moses and his putting in big highways in New York. And there’s some mixture of those that actually seems to be the right thing.

So what we’re doing is we’re looking at different types of infrastructure you can put in to get the best of both worlds. So at MIT Bill Mitchell started the folding electric car, and Kent and Ryan actually built this in Europe for the city of Paris. I’m on the board of Nissan and we’re building the first commercial autonomous vehicles. And it will be something that addresses the same sorts of problems.

I can’t talk too much about it. But these are the sorts of things that you need. You need to have people get around amazingly well, and we can talk more about that if you want. The other thing you need, which we need to talk about is the trading of ideas and information by digital means. And of course you guys would be concerned about that.

And you have all these people talking about personal data, personal stories as being the new oil of the Internet. We’re going to mine it and make money, right? And so there’s a lot of concern. And as you’ve seen the things that I just talked about, you should be very concerned. Because if I can get at this stuff, I can influence you in ways that you’re not even aware of, okay? So what do we do?

Well, I started a discussion group at Davos about six years ago now, which brings together people like the chairman of the Federal Trade Commission, the justice commissioner of the EU, people from the [inaudible] of China, heads of many of the major companies for telcos and banks and so forth, as well as NGOs, advocacy bodies to talk about what to do, and the nice thing is we were looking for a win-win-win solution. Craig Mundie was part of this also.

And this is from the 2013 version, so when people talk they have these amazing artists that diagram the conversation as it -- you can’t interpret this, don’t worry about it.

But what’s interesting is we’ve gotten to a place where I think there’s broad agreement among regulated industries, among regulators, and among a lot of the advocacy bodies, and it basically is this. The same sort of things that I do when I do an experiment under human subjects law ought to be generally the case. Those are the sorts of protections that they have. So you have to know when someone’s collecting data about you, notification. This is a point of contention.

Regulators like this Do Not Track stuff. I think there are better ways to do it.

Informed consent, everybody agrees about this. That means that you have to understand and see what the data is, and be able to opt in to something that provides value to you. And you have to be able to opt out and the data goes away.

And in Europe they took this a little too far, this right to forget, but you know, they’ll come to something reasonable I think. But this sort of infrastructure, with auditing to make sure when I give permission to use my data it’s used correctly, right, is the thing that there seems to be broad consensus about.

So what that does is that lets me know what information there is about me. It lets me evaluate the value proposition. No more of these big terms and conditions, but individual value propositions. I can audit that they’re behaving themselves if I share things and I can get out of it if I don’t want. So we built software to do this.
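The notification, opt-in, opt-out and audit principles listed above can be sketched as a tiny data store. This is a hypothetical illustration only, not the software the group built; the class `PersonalDataStore`, its methods, and the service name are all assumptions made for the example, and "the data goes away" is modeled simply as the service losing all access.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PersonalDataStore:
    """Hypothetical store holding one person's data, consents, and audit log."""

    records: Dict[str, str] = field(default_factory=dict)
    consents: Set[Tuple[str, str]] = field(default_factory=set)  # (service, key)
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def opt_in(self, service: str, key: str) -> None:
        # Consent is granted per service and per data item, not wholesale.
        self.consents.add((service, key))
        self.audit_log.append(("opt_in", service, key))

    def opt_out(self, service: str, key: str) -> None:
        # Withdrawing consent cuts off the service's access to this item.
        self.consents.discard((service, key))
        self.audit_log.append(("opt_out", service, key))

    def read(self, service: str, key: str) -> str:
        allowed = (service, key) in self.consents and key in self.records
        # Every attempt is logged so the owner can audit behavior later.
        self.audit_log.append(("read" if allowed else "denied", service, key))
        if not allowed:
            raise PermissionError(f"{service} may not read {key}")
        return self.records[key]


store = PersonalDataStore(records={"steps": "8000"})
store.opt_in("fitness_app", "steps")
value = store.read("fitness_app", "steps")
store.opt_out("fitness_app", "steps")
try:
    store.read("fitness_app", "steps")
    denied = False
except PermissionError:
    denied = True
```

The audit log is the part that lets the owner check that a service is behaving itself: both successful and refused reads leave a trace.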

It’s not terribly difficult. We have some clever twists that we think everybody ought to do. And we can talk more about this and the NSA, because this is actually an interesting thing with respect to the NSA. And also interestingly the Department of Homeland Security has approached us about making this a more broad standard, because they’re not so worried about corporations spying on you, they’re worried about people from Romania or China going in and doing bad things.

So there’s a certain cyber resilience element to it. And what we’ve done with this is we’ve deployed it in various places. So for instance in the city of Trento, in Italy, which is actually an autonomous region, we’ve deployed this with Telecom Italia, Telefonica [phonetic], the city government to be able to ask if you give people more control over their data and greater security to audit it and know where their data is going, do you get better sharing? Do you get more sharing?

Or is it something that is actually dangerous and people do stupid things with it?

Right? That’s a debate. Now, people argue about this all the time. I’m sick of arguing. I want actual data on the ground. You actually build the sucker and see what happens, right?

And this is what we’re going to do at MIT. We’ve gotten approval to make MIT into a living lab where all faculty, all students, all members of the community will control data about them and share for greater services, and the idea is, well, MIT is sort of a hacker community. We’re going to see how safe it all is, right? But also that this is really forward management of big data in a community where you’ve got the dual thing. So the privacy debate is misinformed.

Privacy is not the only value. Privacy is a value. The greater thing is the pattern of sharing of information within the society. The pattern of sharing is the source of innovation, it’s the source of better health, better wealth, more stability, social integration. And so you have to preserve that. That’s almost the primary thing, but you also want to preserve protection of the individual.

So you have to have some balance. And the approach that we’ve taken that there seems to be broad agreement about is the way you do that is essentially a democratic approach, where you give individuals more control about their data.

Now, there are a lot of asterisks about this that you guys will be particularly sensitive to. For instance, [inaudible] too complicated for the average person? Yes.

And so one envisions that there will be services that come up, people like reputation.com, personal.com, there are a number of start up companies that are helping people manage their data for them and trying to have best practice and it’s a vision that competition between these will come up with really best practice for the users, and so that’s one example of a question that comes up.

Is this perfect? No. On the other hand, what the Snowden releases show us is that if you have a system like we have where not only do you have end-to-end encryption, but the ends are encrypted too, that’s pretty safe. They have to put really significant resources to be able to crack it.

Moreover, it’s modeled after the SWIFT network, which you might be familiar with, so this is the interbank transfer network. So you get, I believe it’s three trillion dollars a day going over this network. It operates in 164 countries, has a lot of squirrely banks in there that are a little questionable, but as far as we know it’s never been hacked, which is a little odd because, you know, where do bank robbers go? They go where the money is.

Well, the SWIFT network is where the money is, okay? And what do they do? Well, there are a couple principles. One is there is a legal agreement. It’s contract law. It’s not a new law, because it has to operate in all these countries, and it gives joint liability. So if you screw up, he gets hurt, so do I. So we’re watching you, okay?

The other sort of thing is that you don’t share personal information. No bank gives up its balance sheet. It says I promise to pay this much money, and I have this much money. That’s all you know. All right?

So it’s minimum information sharing for the purpose. And that’s not perfect, but it reduces the dimensionality of the data and the reuse of the data dramatically, and it makes it possible to do a lot of things without data protection.
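The "minimum information sharing for the purpose" idea can be illustrated with a toy example. The sketch below is hypothetical and says nothing about how the SWIFT network is actually implemented; the class `Bank` and the method `promise_to_pay` are invented for illustration. The point is only that the counterparty learns the promised amount and whether it is covered, never the balance itself.

```python
class Bank:
    """Hypothetical bank that never discloses its balance sheet."""

    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self._balance = balance  # private by convention; never shared

    def promise_to_pay(self, amount: int) -> dict:
        """Return the only facts a counterparty learns about this bank."""
        return {
            "payer": self.name,
            "amount": amount,
            # A single yes/no answer replaces the full balance.
            "covered": 0 < amount <= self._balance,
        }


north = Bank("north", balance=1000)
small = north.promise_to_pay(250)
large = north.promise_to_pay(5000)
```

Because each message carries one amount and one yes/no answer, there is far less data to leak or reuse than if balance sheets were exchanged.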

So that’s the story. If you want to know more about it you can go there, and this is the website, so we have the technical papers and videos of all my collaborators and students yakking about the stuff and so forth. So thank you.

[applause]

>>: [inaudible].

>> Sandy Pentland: Yeah.

>>: Can you [inaudible] so that people remotely --

>> Sandy Pentland: Yeah, absolutely. So the question is don’t you get a herding behavior in the sort of privacy thing, so everyone goes and does it? And yeah, you do, and so that’s like banks or financial bubbles, right? So we don’t do a good job of managing that now, but if you pay attention to the fact that herding behavior happens, if you actually have a model of that, I didn’t mention it but, for instance, in this financial network we were able to break up herding behavior with relatively small incentives.

So when we saw that the network was causing these big cascades we would pick particular people to clip and give them coupons to pay attention to other people, and what that did was destroy the cascade structure.

So you can actually manage things to be much safer from that point of view. I think he had his hand up first. Hey.

>>: Really impressive work on the identified patterns of a face time [inaudible] relationship with the community. How does this change in a world where Skype or [inaudible] are making the [inaudible]?

>> Sandy Pentland: So the question is, okay, so I’m showing all these effects for face-to-face, what about the digital world and telepresence and Skype and things like that, right?

And so in the data we have, which is backwards looking, Skype and telepresence are not so important, right? And we see this overwhelming difference between rich data, rich interactions and more abstract ones. In fact, we have some great data that shows if you’re having a really bad day emotionally or a really good day, you stop using the electronic media and you go for rich media, face-to-face or phones.

As you have a more normal day you use the electronic stuff more. So what happens as you get better video? Well, that’s a richer channel of communication and it ought to be better, but the trouble with a lot of these is that there is no serendipity. I have to set up a contact with you, right? I don’t run into you in the Skype hall, I don’t see you in the Skype coffee shop and so forth.

Now, I think it will get better over time. And it’s the case that we know that if you have rich communication with a person, like you spend a day with them, then electronic means are really good at maintaining that relationship for a while. So what the real story is here from a productivity point of view or things like that, is you need to mix rich channels of communication with more abstract.

And Skype is towards the rich end, but not completely because you don’t get the social context as much. It doesn’t happen with serendipity. There’s a story that the people who sell telepresence, can I say this? The people who sell telepresence stuff told me this story once where the president of their company gave a big all-hands speech and they set up these telepresence things all around him. And so he would be up there talking about his stuff, and they said the coolest thing is when he would say something you could look past him to the telepresence unit in Bangalore and you could see all the guys in Bangalore rolling their eyes.

And that told you a lot more than what the CEO was saying. So that social context, that sort of non-verbal incidental stuff is important in some cases.

So I think we’re getting there, but we’ve got a long way to go. Sir.

>>: I joined Microsoft last year. Before I joined I wrote a little report called Conceal or Reveal? It described how companies need a transparency policy and should spend as much time on that as they do on their privacy policy. So I have two parts to my question. First, which industrial sectors seem to get it and are sharing more anonymized data with their partners and with researchers? And since I live in Washington, does government have a role in moving towards transparency, getting this data out there so we can start --

>> Sandy Pentland: So the way the discussion has gone, right, is that the consumer privacy bill of rights, the data protection acts, the same things in China, industry complains about it a lot, but actually it’s a hunting license. It says that if you behave this way, you can use this personal information with impunity. Okay? And the way you have to behave is you have to actually be very transparent about what the data is and what you’re doing with it so that you can get informed consent.

And so, for instance, I’m on the board for Telefonica because Telefonica wants to use all the data they have, but they have to be very clear about how they do it, okay, because they operate in places like Germany and Brazil. Okay? The same thing with Nissan. I’m on their advisory board also. Same questions come up, among other questions. So banks, transportation companies, telcos, medical. Why medical? We have HIPAA, but all of a sudden you have Fitbits and everything else, and that data is not HIPAA data. That’s outside, but it has to somehow integrate with medical records and it doesn’t do it very well now at all.

So there has to be some level of control and privacy for non-traditional medical data in order to get maximum advantage of having those things. So all those regulated industries are looking at this and saying okay, well, the discussion has gotten us to a place where we now know what to do to be able to get into the data business, serve our customers better, make more money blah, blah, blah.

Then why did Orange give up data about the Ivory Coast? Because they feel like there’s a social tax that they have to pay, to help the society with their data, in order to get the right to monetize the data. Okay? And so certain data they’re more than willing to give up to start this conversation about data for good. Okay?

What does government have to do with this? Well, government is the one that’s holding people’s feet to the fire and setting the rules. So that’s really critical. And people always ask me about Facebook and LinkedIn and things like that, and when I talk to regulators, so this is like you get them late at night and a couple glasses of water. So the idea is that they have real control over regulated industries so they can hold their feet to the fire and they want to have best performance out of that, a really high bar for privacy and personal data control. And they think they can achieve that.

And industry is willing to go along for the right to actually play. Okay, but in the Internet all those guys grew up without any regulation, and there is really no good way to put that genie back in the bottle until you show that in the regulated industries these high standards of privacy work.

If Bank of America, Verizon, if Kaiser Permanente are all in the exchange business and doing things that people approve of and are safe, then a company that’s operating without that level of control by individuals has no fig leaf and then you can sort of put the thing back in the bottle.

>>: Follow up though. In the EU and certain industries there are strict rules on the reuse of data. You collect it for one purpose. It can [inaudible]. That makes it very hard to do some of the things that people want to do when there’s a new unexpected relationship.

>> Sandy Pentland: So the way you have to approach it is you have to engage the customer in a dialogue about what happens with their data and what are the values they can get from it. So the trouble is they’ve been sort of like a little coy about it.

It’s like click here and we’ll do something for you. But they don’t really tell you what the data is or what they’re doing very often. So they are very reluctant to go back and ask can we use this data for something else? They need to be up front about it and say here’s the data, here are some things that you can choose to do, and you have to offer the customer a good enough value proposition that they’ll click yes and feel comfortable about it.

And that’s of course a challenge for them, but at the same time it’s a hunting license because now you can go back and use it for lots of things, you just have to involve the customer.

>>: And do you see people being more and more willing to do that?

>> Sandy Pentland: So when we build these living labs we find people very willing to do that, and so it’s a risk reward thing. If you can really believe that your data is being handled safely and that you control it, you’re much more willing to share. The moment your trust in that process decreases you’re not going to share, okay?

So sir?

>>: You have time for one more.

>> Sandy Pentland: Okay.

>>: How close do you think we are to Minority Report? It seems like in this case you want to try to institute equilibrium, so I mean, I read the other day, [inaudible] but they were talking about predicting crime, right? Where they would actually be able to go and analyze all this data and say this person has a 99 percent probability that they’re going to commit a crime in the next 24 hours.

>> Sandy Pentland: So the question is about Minority Report and pre-crime and all that. So if you remember early on I showed this diagram from Kahneman about types of thinking. So all of this stuff only works with habits. It doesn’t work with conscious decision making. You cannot predict that very well, and the reason is obvious. It’s a generative process. It’s very highly dimensional, it’s very sensitive to basic assumptions.

So I can change one little tiny thing and you come up with a completely different set of conclusions. Habits are much more regular, they are repeated, and therefore they are predictable. So if somebody is really habitually doing something that’s not good you can predict that, okay? We’ve actually looked at crime prediction. It turns out you can do a halfway decent job, but not of individuals. You cannot, at least in our data, you cannot predict crime by individuals. What you can do is predict crime by place. And an interesting thing is the way you do it is you look for change in human behavior.

So if you have a plaza and suddenly all the old people don’t go there anymore, something changed and there’s a higher likelihood that there will be crime there. So the people somehow sensed that this was different, and you could see that the distribution of people was different there. You don’t know why, right? But something has changed that made them feel uncomfortable or uninterested and that’s a location where you’re likely to get higher crime.
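Looking for a change in who shows up at a place can be sketched as a comparison of two distributions. The code below is a minimal illustration, assuming visits are labeled by a coarse group such as age band; the function name `visitor_shift`, the labels, and the choice of total variation distance are assumptions for the example, not the method used in the crime work described here.

```python
from collections import Counter
from typing import Iterable


def visitor_shift(before: Iterable[str], after: Iterable[str]) -> float:
    """Total variation distance between two visitor-group distributions.

    0.0 means the mix of visitors is unchanged; 1.0 means no overlap at all.
    """
    b, a = Counter(before), Counter(after)
    nb, na = sum(b.values()), sum(a.values())
    if nb == 0 or na == 0:
        raise ValueError("both periods need at least one observation")
    groups = set(b) | set(a)
    # Half the summed absolute difference in each group's share of visits.
    return 0.5 * sum(abs(b[g] / nb - a[g] / na) for g in groups)


# Hypothetical plaza: older visitors disappear between the two periods.
last_month = ["old"] * 4 + ["young"] * 4 + ["adult"] * 2
this_month = ["young"] * 6 + ["adult"] * 4
shift = visitor_shift(last_month, this_month)
```

A score like this flags a place, not a person: it only says that the mix of people using the plaza has moved, which is the kind of place-level signal the answer describes.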

Now, is that precrime? No. That’s being sensitive to the social situation I think, and is much less problematic because it’s not about the individual it’s about the place.

And similarly we have not been able to predict things like those sorts of things on the individual level. And we looked at it. We haven’t really tried it super hard. I also don’t think it’s moral and something you want to do. On the other hand, if you have an entire community, like these ghetto communities, what we’re doing there is predicting crime. You know, when they have very poor patterns of idea flow there are very high levels of crime.

So I can go in and say this community, high level of crime and the babies die and they don’t live very long and so forth. Is that good? Well, I think that ought to be the sort of thing that’s in the public domain. That ought to be a data commons so everybody can look at it and say huh, that community is not very good. Now, what are we going to do about it? Are we going to just let them die off, or is government going to take that as a priority to do something about it?

I can’t tell you what to do. I can tell you to build better infrastructure for those people. I can tell you maybe some short-term things that they need. But society has to know that there’s a problem, and have that visible, before they take action. I think a lot of what we have today is a lack of transparency about that. So there are places that are really poor and really bad, and nobody is much aware of it.

And so they don’t get any help. If you could make that visible you could at least have a discussion about it.

>>: Thank you.

>> Sandy Pentland: Good. Pleasure.

[applause]
