16313
>> Kim Ricketts: Good afternoon, everyone, and welcome. My name is Kim Ricketts. I'm here today to
introduce and welcome Stephen Baker, who is visiting us as part of the Microsoft Research Visiting
Speaker series. Stephen is here today to discuss the world of The Numerati, a global elite of computer
scientists and mathematicians (don't you love being called elite?)
who are involved in every realm of human affairs, whether it be creating new political groupings, upping
our consumer power or transforming healthcare by diagnosing illnesses before you even have symptoms.
The Numerati is here with us to stay.
Stephen Baker has written for BusinessWeek for over 20 years, covering Latin America, the Rust Belt,
European technology and a host of other topics, including blogs, math and nanotechnology.
Baker has written for the Wall Street Journal, the Los Angeles Times, the Boston Globe and many other
publications. His portrait of the rising Mexican auto industry won an Overseas Press Club award. He's the
coauthor of blogspotting.net featured by the New York Times as one of the 50 blogs to watch. Join me in
welcoming Stephen Baker to Microsoft Research.
[Applause]
>> Stephen Baker: Thanks a lot. It's nice to be here. You know, the PR people at Houghton Mifflin put
some of these impressive sounding newspapers in my biography because once upon a time I wrote tiny
little dispatches for the Wall Street Journal, from places like Venezuela. But the real newspapers where I
got a lot of experience were now both defunct newspapers, like the El Paso Herald Post and Black River
Tribune in Ludlow, Vermont. Anyway, a little bit of a promo sometimes into those things.
One of the things that -- I'm on this tour for this book. I've been on it for two weeks. I have another half
week to go. And one of the things that I keep getting asked, for some reason or another is what these
people that I call the Numerati have to do with the financial problems we have in the world right now.
And it's funny that they ask, because when I started this book, you know, it was clear, one of the areas
where they're most important is in finance. And it was so clear that it didn't seem fresh or new and we just
decided everybody knew about the quants in finance, so I wasn't going to say anything new. Let's junk that
and go to sexier stuff like elections and voting and shopping and computer dating and things like that.
So but I still get asked. And just a couple days ago the people from Mifflin said: Come up with something
about finance. You can tie it to the book because then we can get you on TV. And if they get me on TV,
then my Amazon ranking will go like this and I'll get on the Today Show or something. So if any of you
have any ideas to help me figure out what these people had to do with the mess we're in or, better yet, how
they can help us get out of it, I'm all ears and I will channel you as I go on the Today Show or the Colbert
Report.
The one thing -- I've sent the few contacts I've had at Goldman and Lehman Brothers frantic e-mails asking
for their input. For some reason they're not answering them. I don't know why.
But one thing I did hear is if you look at credit applications or mortgage applications in around the year
2000, they asked for a lot of details about people. And what their employment background was and how
much money they made and their credit history and things like that.
And as the years passed, they asked for less and less information. And this kind of goes against the whole
theme of my book, which is that there's all of this information available about all of us so that people in
every realm can find all kinds of data about us and understand us and sell to us and give us advertising
and figure us out as voters and yet in finance they were moving in the other direction.
They were taking rounded people and turning them into ants. And so it kind of goes back -- it kind of takes
things the other way. I thought that was -- that's all I can say right now if they get me on the Colbert Report.
Anyway, I'll tell you about the genesis of this book. I was working -- I worked at BusinessWeek, and I
pitched this cover story in the summer of '05. And the idea was that the U.S. tech industry might be
heading into a decline because of fewer -- graduating fewer engineers and scientists, behind in broadband,
behind in wireless, post-9/11 visa regulations. I went on and on, and the editors all yawned and said we've kind
of heard that before.
It sounds like Thomas Friedman's book. I was like, okay. It's kind of an important theme, though. Is there
any other way we can discuss this? And one of the science editors said math is at the heart of all of these
competitive issues.
And the editor in chief said why don't we do a cover story about math. Nobody writes about math.
Incidentally I have something I want to show you.
So I don't have audio visuals but I have a couple of props. Anyway, so he said let's write a cover story on
math and let's get somebody who is not too brainy to do it and he appointed me. [laughter].
And I didn't really know much about math at all, and I still don't. But I went around and I talked to people at
MIT and I called up the usual suspects and asked them about math in the most general terms. I learned all
kinds of interesting things, but I had no idea what my story was going to be.
Then I went to IBM Research in Yorktown, New York. And the head of their stochastic analysis division,
Samer Takriti, told me he and his team of 40 were embarked on this project to build mathematical
models of 50,000 of their colleagues.
This was modeling the consultants at IBM, a group of them. And they were going to get data from all of
these different sources, the e-mails, the calendars and all that, and resumes, and try to work out what people were
allergic to, what airports they lived near and build models of them so they could be deployed more
efficiently.
I thought if he can do that with workers, then other people can do that with shoppers, voters, et cetera, et
cetera, and that's how the idea came together. And I did this cover story. And it really didn't have that
much to do with math, full disclosure. It was much more data mining and computer science, really. But this
was the cover story. And it really sold well, because people, even if it's not math, if it says it on the cover,
there's a crowd of people that are interested in things like that.
And later I pitched this as a book and I got this book contract. But as I was working on this cover story, I
came out of the IBM interview and I said, I called up my roommate who has a Ph.D. in computer science.
My roommate from college. And I called him and I said: I am going to do the most exciting cover story you
can imagine. I'm going to do this mathematical modeling of humanity.
I was full of the passion of the ignorant, but it was excitement. And I raved on in this e-mail for a couple of
minutes and then forgot about it. Then a couple of weeks later I got a phone call from him. He said I'm
really concerned about that cover story of yours.
And he said have you ever heard of garbage in, garbage out. I said I've heard of it. He said have you
heard the story about the drunk and the light and looking for the key? I'm sure you've all heard of that
story, the guy's looking for the key because that's where the light is even though that's not where the key is.
So he gave me sort of 101 on what to watch out for in this world. And I went on and I've told this story once
or twice before, and he actually called me as I was on my way for an interview with Google and I was
thinking this morning should I cut out the Google bit for the Microsoft talk? But then I thought well it's not
that flattering to Google so I think I'll go ahead and tell it.
So I went to Google and I talked to Craig Silverstein, who is, I guess, the first employee. And I
said that story about the drunk and the key, is that something I should be keeping in mind? And he said:
When I was in middle school this was a science fair project and I came up with this experiment and I came
up with all this terrific data, and then I realized the experiment was flawed and so I tried to come up with a
new experiment that I could hitch up to the data that I had already generated.
And it was at that point that I said: You know, this mathematical modeling of humanity, it might happen, but
if it does, it's going to happen first in areas where people can afford to make a whole lot of mistakes.
And so marketing and advertising are two key areas for that thing. Anyway, so my book is about this
nascent effort, this modeling of humanity. And it's just looking at where we are in trying to figure out
patients, trying to figure out voters, trying to understand blogs and use them for market research.
Modeling, went to IBM, did the modeling of the workers, just this tour through this world, and a lot of these
efforts are really, you know, I think we'll look back and say they're pretty primitive. They make lots of
mistakes. They really don't understand us in a lot of meaningful ways.
But the standard isn't whether they're true or not or whether they understand humans in all of our
complexity. The standard is if they understand us just a little bit better than what the status quo was
before enough so they can make money. And if they can, then they keep on doing it and they learn a little
bit more and it progresses.
And that's where I think we are in this thing. And so I'm not here to make any tremendous promises that
I'm sure you would -- I don't need to make them to you anyway. But anyway. And the other thing is the
important thing isn't that it's true, it's that it provides incredible scale and efficiency so you can deal with
millions of people at the same time.
And that's why these schemes that I'm talking about have such tremendous power. For the first time we
can compare people to a million or 10 million or 100 million other people.
I thought I'd walk you through a few of the case studies that I did. One has to do -- a lot of them put us into
new tribes. They take a look at the old ones where we were understood by our demographics or our region
or our race and replace it with new ones that are based more on our behavior.
One of these is in politics. I went to this political consultancy in Washington called Spotlight and like so
many others they're trying to microtarget swing voters.
And one of their ideas is that -- what are we, September 30th? -- if on September 30th an American voter
doesn't know who he or she is going to vote for, that person really isn't terribly engaged in the political
process and isn't thinking about the issues the way that the politicians and the politically involved people
are.
They're thinking about things in another way. But those are the people who are going to likely swing the
elections in key states like Ohio, Wisconsin, New Mexico, Nevada. So how do you understand those
people? You don't do it by the issues that they don't really spend a lot of time thinking about.
How do you find those people? So they did -- they basically co-opted corporate marketing techniques.
They took about 4,000 people that they thought represented a cross section of the American voters, and
they gave them lengthy interviews, where they talked about all kinds of things: what are you
scared of? What do you hope for? What do you want your kids to do? Sort of looking at the future through
their eyes at what scared them, what were they excited about, but not politics.
And then to fill out these profiles, they, of course, asked them a lot of questions about politics so that they
could look at the correlations.
So they had 4,000 people. They gave them to Yankelovich Partners, a company that analyzes
consumer behavior. They said are these people divided and can you divide these people into any sort of
recognizable groupings? And they could. And they did. And they said these are five tribes and you can
divide each tribe into a more zealous and a less zealous. So a total of 10 tribes.
And some were people who focused on righteousness. And some were people who focused on community
and those are pretty clearly Democrat and Republican. But they were interested in the ones in the middle
who really cared deeply about freedom.
Pie in the sky term, but they found it was something about these people around freedom. And there was
one group of them that they call barn-raisers. They have names for all of these people. Right clicks
[indiscernible]. But these barn-raisers care deeply about playing by the rules, those sorts of things. They care
about morality but they're not deeply religious as a rule. They're swing voters, represent 8 percent of the
population, which is 14 million voters and they voted for President Bush by 90 percent, 90 to 10 in '04 and
two years later they went Democrat 50 to 60 percent in the congressional elections. So they think that they
have their eyes on a new swing voting group. It's not a demographic, it's this tribe that exists only in their
database.
But how do they get 14 -- they have the 3,000 that they know about, and so the barn-raisers are 8 percent
of those 3,000, but how do they find barn-raisers in the rest of the country? They have to do a model
based on demographics and consumer behavior of the barn-raisers that they have. They test it against the
control group that they know, and then they take that model and they run it across 175 million voters to pick
out the 14 million barn-raisers. So they've done that with every one of us, everyone here who is a U.S.
voter exists in one of these tribes. And that's just for Spotlight. I'm sure we exist in other tribes for other
political consultants.
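A minimal sketch of the workflow described here: fit a model on the surveyed sample, check it against a control group whose tribe is already known, then run it across the full voter file. Everything in it is hypothetical. The two features, the numbers, the voter IDs and the nearest-centroid rule are stand-ins for illustration, not Spotlight's actual method.

```python
# Hypothetical surveyed voters:
# (church_visits_per_month, outdoor_spend_index, is_barn_raiser)
surveyed = [
    (1.0, 8.0, True), (0.5, 9.0, True), (1.5, 7.5, True), (1.0, 8.5, True),
    (4.0, 2.0, False), (3.5, 3.0, False), (0.0, 1.0, False), (4.5, 2.5, False),
]

# Hypothetical control group: people whose tribe is already known,
# held back so the model can be tested on them.
control = [(1.2, 8.2, True), (3.8, 2.2, False), (0.8, 7.8, True), (0.2, 1.5, False)]

def centroid(rows):
    """Average position of a group in feature space."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def fit(rows):
    """Learn one centroid for barn-raisers and one for everyone else."""
    pos = centroid([r for r in rows if r[2]])
    neg = centroid([r for r in rows if not r[2]])
    return pos, neg

def predict(model, x, y):
    """Label a voter a barn-raiser if they sit closer to that centroid."""
    pos, neg = model
    d_pos = (x - pos[0]) ** 2 + (y - pos[1]) ** 2
    d_neg = (x - neg[0]) ** 2 + (y - neg[1]) ** 2
    return d_pos < d_neg

model = fit(surveyed)

# Step 1: test against the control group.
hits = sum(predict(model, x, y) == label for x, y, label in control)
accuracy = hits / len(control)

# Step 2: run the model across the (hypothetical) voter file.
voter_file = {"v1": (0.9, 8.1), "v2": (4.2, 1.8), "v3": (1.1, 9.0)}
flagged = [vid for vid, (x, y) in voter_file.items() if predict(model, x, y)]

print(accuracy, flagged)
```

The toy data is cleanly separated, so the control-group check is perfect here; real consumer data would not be, which is where an accuracy figure well short of 100 percent comes from.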
What they want to do is hit those barn-raisers with specific ads in places like Milwaukee, Santa Fe, swing
states, that emphasize the points that they seem to care about.
They think that their technique is 75 percent accurate. So three out of four people that they call
barn-raisers are barn-raisers and the other 25 percent are something kind of close. One of those freedom
tribes but not one of the community or righteousness tribes.
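The arithmetic behind these figures can be checked in a few lines. The inputs (175 million voters, an 8 percent share, 75 percent accuracy) come from the talk; treating "75 percent accurate" as the share of flagged voters who really are barn-raisers is an assumption about what the consultancy means.

```python
electorate = 175_000_000      # voters the model is run across (from the talk)
barn_raiser_share = 0.08      # barn-raisers as a share of voters (from the talk)
precision = 0.75              # assumed reading of "75 percent accurate"

flagged = round(electorate * barn_raiser_share)   # people labeled barn-raisers
true_hits = round(flagged * precision)            # labeled correctly
near_misses = flagged - true_hits                 # "something kind of close"

print(f"{flagged:,} flagged, {true_hits:,} correct, {near_misses:,} near misses")
```

Under that reading, roughly 3.5 million of the 14 million people flagged would be receiving ads tuned for a tribe they do not belong to.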
So a lot of people that I talk to complain about this. They think that it's kind of weird and scary
and it's the automation of American politics and we're being treated like things. But I say we've always
been treated like herd animals.
They've looked at us as one ethnic group or another or one urban group or another or voting precinct. So
they're actually trying to understand us as something closer to the people we are, even though they use
strange statistical techniques.
Another area that I covered was medicine. I went to Intel down in Portland. And they've wired the homes
of several scores of elderly people with all kinds of sensors. And they're trying to measure absolutely
everything these people do in their homes. The nature of their strides, how they shift their weight on the
kitchen floor. They've got sensors under their tiles to measure how they shift their weight on the kitchen
floor.
The strength of their voice. The length of time it takes them to recognize a voice on the telephone. All
kinds of things. They establish baselines for each of those behaviors. If they see a deviation from the
baseline, that points to some problem, and eventually they want to be able to diagnose it automatically or at
least come up with a suggested diagnosis automatically. They're looking at things like Alzheimer's,
Parkinson's disease, loss of muscle mass in the legs or loss of balance that would lead to a
catastrophic fall.
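A minimal sketch of baseline-and-deviation monitoring as described here, assuming a simple rule: flag any reading more than three standard deviations from a person's own history. The stride numbers, the threshold and the function name are hypothetical; the talk does not say what statistics Intel actually uses.

```python
from statistics import mean, stdev

def deviates(baseline, reading, threshold=3.0):
    """Return True if `reading` is far from this person's own baseline.

    `baseline` is a list of past measurements for one behavior;
    `threshold` is how many standard deviations count as a deviation
    (an assumed cutoff, chosen for illustration).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # No variation on record: any change at all is a deviation.
        return reading != mu
    return abs(reading - mu) / sigma > threshold

# Hypothetical stride lengths in centimeters for one resident.
stride_baseline = [62.0, 61.5, 62.5, 61.0, 63.0, 62.0, 61.5, 62.5]

print(deviates(stride_baseline, 62.3))  # ordinary day
print(deviates(stride_baseline, 55.0))  # much shorter stride than usual
```

Because each person is compared with their own history rather than a population average, the same reading can be normal for one resident and a warning sign for another.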
Their theory is that a lot of people who -- well, right now a lot of middle aged people have aging parents
that they're hard-pressed to keep track of and take care of. And that as this generation ages, we're going
to need more and more of this home healthcare. I'm sure you people at Microsoft have lots of projects
using the same, following the same ideas.
I think eventually this is going to raise all kinds of questions for society about insurance, what happens if
the insurance company calls you and says I'll give you a 30 percent discount on your health insurance if
you put a few sensors in your house. I think, increasingly, we may be faced with those sorts of questions
which will raise further questions about the very nature of insurance, which is an industry that relies on a
certain amount of ignorance. And as we learn more, we're not going to be as ignorant, and what happens to
the insurance industry?
In the auto industry, there's a company called Progressive that's offering people discounts to put black
boxes in their car. And they measure where they go, how they drive, which neighborhoods they go in, what
times they drive. They're trying to assess their risk.
And I talked about this to one group, and I asked if anybody would be interested in that? And a guy said
not for me but for my kid. And I think that that's going to happen more and more, is that middle-aged
people are going to impose these surveillance systems on their parents and their kids and those are going
to be the test populations.
And if it works, and the results are good, then I think more and more of us are going to embrace it for
ourselves for the life enhancing qualities.
But like so many others, the business case for this starts out with really basic things that have less to do
with the Numerati and more to do with just reporting simple facts. One of the things is weighing people.
My mother actually participated in this Intel study in Portland.
And she was 90 and suffering from congestive heart failure and extremely weak and frail. And they told her
she should weigh herself every day and report the conclusion, report her weight every day. Well, she didn't
remember that often. She wouldn't remember to weigh herself every day at that point in her life. But I
bought a scale for her, one of these digital scales. And as soon as I gave it to her I realized it was
absolutely futile because it takes a strong tap to activate that, and she couldn't double click -- she had a hell
of a time with a mouse, double clicking.
And tapping that scale was beyond her. And then even if she would remember to weigh herself and
successfully tap the scale, she wouldn't be able to see the numbers. So there were like three data
collection obstacles right there with my mother.
Well, at Intel they've wired people's beds so they can weigh them in bed. And that's a useful thing, you
know? People might pay for that. It's very primitive but these things start with primitive hookups. The only
trouble was there was one case where a woman gained eight pounds in the middle of the night. They
thought she was taking on fluids. Should they get an ambulance over there? It turned out her little dog
jumped on her bed. So the data is not always that clean.
When I went to IBM -- well, I did this cover story, and the nice thing about writing cover stories for a
magazine like BusinessWeek is I can go to IBM and they can tell me we're going to model 50,000 workers,
and I can say IBM is going to model 50,000 workers, picking up data from e-mail and blah, blah and lay it
out in a paragraph and maybe even a second paragraph. After that, I don't need to know much about it. I
don't have to know how they do it because I'm off to my next example.
But when I'm writing my book, I had to go to IBM and say you know that thing I spent two paragraphs in the
book talking about could you walk me through that and tell me how you plan to model 50,000 consultants?
So they did. They walked me through it at some length, and they use old -- they use hand-me-down tools
from different disciplines. For example, they used financial tools to analyze the skills, to put a value on the
skills that people have so that they can do a business plan and say: This is where we project our
company's going to be in five years and these are the skills that we're going to need. So how much are
these skills worth and how many skills do we need? And so looking at skills, valuing people's contacts,
valuing, trying to create some kind of value for where they sit in the network according to their e-mail
patterns, all of these things go into numbers which each person becomes sort of like a mutual fund of
different skills going up and down.
And it's not at all what people are. But it gives them some way to try to get a handle on how to evaluate
them and project their value in the future.
So it's not -- it's not that close, but it might work to some degree. And then the other one that they use
that's a big hand-me-down is operations research. During World War II, when the convoys were crossing
the North Atlantic to arm Britain, they kept getting sunk by German U-boats, so the U.S. and Britain put
together teams of mathematicians that turned the North Atlantic into an entire mathematical battleground, if
you call an ocean a battleground.
Anyway, they figured out how to optimize the convoys to minimize the damage, how many destroyers
should surround each convoy, how many boats should be in each convoy. They figured which routes they
should take, they optimized it, lowered the casualties along the way, and it was very successful.
And later, after the war, IBM used that same science to optimize its own supply chain. So they developed
all kinds of efficiencies. They saved a lot of money and then they used that knowledge to create a new
service business and they sold their supply chain smarts to the rest of the world.
And everybody optimized their supply chain either with IBM's science or somebody else's. And now IBM
has moved from manufacturing to being much more of a service company, and if they were to try to optimize their
supply chain, it would be its people.
And so that's what the Takriti team is trying to do is to sort of optimize their people and they're using a lot
of the hand-me-down techniques from operations research. And again it doesn't really -- this wasn't built
for people. But if it works and provides some kind of incremental improvement then they'll go with it.
And I guess one of my questions for you, and I'd be interested in hearing what you have to say about this,
is if we have a system -- if we have systems that improve because they get better results and they try to
analyze people, and through the years and through the decades we fine tune them and fine tune them but
the very platforms that they're built upon were built for financial instruments and for machine parts, is this
the wrong way to try to understand people?
I don't really know. But that's the way that I think a lot of people are heading because that's the way that
works for today and tomorrow.
And this whole industry is based on today and tomorrow not based on a clean sheet of paper that might
work in five years. Maybe it's being done in universities. Maybe it's being done in research departments
like this one. But I think it's going to be basically built on the same systems that understand finance and
machine parts.
I went to Yahoo! and I asked the head of research there, Prabhakar Raghavan, about the challenges of trying to
dig through these mountains of data, trying to understand consumers and building services for them. And
he gave me kind of a primer on managing massive amounts of data. And he told me about overfitting and
all these other problems that you have with data and somehow you can get overwhelmed by it, you can
dive down rat holes chasing various correlations that turn out to not have any meaning.
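A small illustration of the rat-hole problem described here: with enough candidate variables and few people, some variable will correlate strongly with almost anything by chance alone. The data below is pure random noise, and the sample sizes and seed are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)          # fixed seed so the run is repeatable
n_people, n_features = 20, 500  # few people, many candidate variables

# The outcome and every feature are independent noise: nothing is related.
target = [rng.gauss(0, 1) for _ in range(n_people)]
features = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_features)]

# Go looking for the feature that best "explains" the outcome.
best = max(features, key=lambda f: abs(pearson(f, target)))
in_sample = abs(pearson(best, target))

# Measure the same kind of noise on 2,000 fresh people: the pattern vanishes.
fresh_feature = [rng.gauss(0, 1) for _ in range(2000)]
fresh_target = [rng.gauss(0, 1) for _ in range(2000)]
out_of_sample = abs(pearson(fresh_feature, fresh_target))

print(f"best in-sample correlation: {in_sample:.2f}")
print(f"same relationship on fresh data: {out_of_sample:.2f}")
```

The in-sample winner looks like a finding only because 500 variables were searched; checking it on data that played no part in the search is the usual guard against this kind of overfitting.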
So then I went to the National Security Agency, and I met with the chief mathematician there. This is in the
summer of '06, and they had gotten into a lot of trouble -- well, a lot of controversy, because they had been
consuming immense streams of Internet and telephone data. And so I was very worried about my
interview with him, I was worried he would object to my questions and storm out and slam the door.
So I was a little bit tentative when I asked him questions. I started out by saying you know these people at
Yahoo! were telling me that sometimes you get too much data. Is that a problem for you?
And he said the people at Yahoo! might not know how to store their data and they might not ask the right
questions and they might get confused by the data, but no, you can never have too much data.
And so that is my story. I've got this book. I'm happy to answer any questions or talk to you more about
the Numerati if you would like to.
>>: So your discussion about how you divided the political people into tribes, how that was being done,
was any follow-up done to see how successful they were in converting undecided into a decided, number
one? And number two, the [inaudible] because I assume they had an agenda. [inaudible].
>> Stephen Baker: Right. The question, I don't know if the question comes through.
>>: Not as well.
>> Stephen Baker: The question is about the conversion rate in the political thing. I would say if they are
doing those conversion studies, they're doing them and will only publicize them if they benefit their
consultancy. You know?
I think come December there's going to be a lot of chest thumping by whichever was the winning side, and
a lot of claims about having swung Ohio or Wisconsin for one candidate or the other. And there is a lot of
hype in this field. There's some truth and a whole lot of hype and a lot of marketing. I don't really know. It
will be interesting to find out. Maybe somebody will give me the inside look. But I don't know about their
luck in that.
Any others?
>>: I enjoyed hearing you talk, brought up a lot of stuff. I was just thinking about one place where they've
been very successful doing this, which is sort of the credit rating industry: they take a whole lot of potentially
unrelated information and turn it into a credit rating. And then in my mind they failed to adapt and they
failed to change but they still have so much power.
I mean they're so powerful that no matter how bad they are at this point, they're going to maintain at least
until they get so bad that the economy collapses, that they maintain a dominance in the industry. And all
sorts of -- seems to some extent you're talking about the democratization of predictions or something. So
I just thought that was interesting. It's a case of potential abuse of power and modeling.
>> Stephen Baker: You're talking about Fair Isaac or Standard & Poor's?
>>: More like individuals, I was thinking, not corporate.
>> Stephen Baker: Fair Isaac. FICO score.
>>: [inaudible] is one. The other thing is you said they predict like 75 percent of the time and we think
that's good. But what about the other 25 percent of the people? Isn't the foundation upon which our
country was based, isn't that just [inaudible] people who don't fit in models or treat them differently?
>> Stephen Baker: Well, you do that, when you're running a standard political operation and you think that
there's a Democrat -- like if you go into Philadelphia, which is a highly Democratic city, you run commercials
for the Democrats to get them out to vote and you just forget about all the Republicans that are there,
because they're a minority that you're not paying attention to.
>>: This is going well beyond politics this is diving down into people's lives, we're talking healthcare.
>> Stephen Baker: Right.
>>: I mean monitoring their homes and what if there's those 25 percent of the people and the healthcare
company looks at the data rather than the person?
>> Stephen Baker: Well, I think it's made for areas where it doesn't matter. If they think you're a
barn-raiser and Obama sends you an ad saying I really care about right and wrong and nobody's been playing
by the rules and blah, blah, blah, you know that's not a big deal. You just get the wrong advertisement
that's not micro fitted to you, but in medicine it's a whole different game.
So I think it's going to be longer before these people make great strides in medicine. That's where they
need it the most. But yeah.
Yeah?
>>: Have there been social and kind of legal responses to the ability to profile people, crunch numbers to
find an idea of this group or that group will go this way or that way? Have there been lots of ways of wrong
way and right way of using that and what's the responsibility of the company to do that?
>> Stephen Baker: You mean to privacy issues?
>>: Privacy or manipulation in some respect as well.
>> Stephen Baker: I don't know. Does anybody else have any thoughts on that? I can't say.
>>: Any government regulation or attempt at --
>> Stephen Baker: There's a lot of talk about different regulations. But there are very strict ones about
medicine. But as far as profiling for things like advertising and marketing, I don't think there's much -- I
don't think there's much of regulation at all. There are much stricter regulations in Europe than there are
here, I know that. I don't really have specifics on that, I'm sorry to say.
>>: So one of the key issues here when we are talking about this mathematical model of humanity is
people have a lot of privacy concerns. I think that's one thing that's maybe stopping a lot of things from
already being modeled more. How do you see that panning out? Do you see the Numerati becoming more
aware of people's privacy and building in maybe new I guess ways to protect people's privacy or do you
see people becoming less concerned about privacy?
>> Stephen Baker: I see both. I see people redefining privacy and trying to come to grips with what -- consider the secrets that you have and then which secrets should you keep in the future or should you
attempt to keep in the future.
And there's some secrets that traditionally you've kept them but you don't really need to. Then there are
other ones you want to keep. And I think a role for a company like Microsoft, and I know you're at work on
it, is to create tools for people to protect themselves and for industries to provide services where you can
get the benefits from sharing information without the costs of exposing yourself to loss of privacy or loss of
money.
I mean, especially important in -- I talked to one of your colleagues, Cynthia Dwork, I don't know if she still
works for Microsoft down in San Francisco, and she was talking to me about medical data and how you
could, if you zeroed in on it, it was impossible to see the individual. I mean that's the real key is it's a real
opportunity for companies like this one, I would think.
Now, I said that to Google. I went to Google two weeks ago. And I said: People at IBM are really
concerned about this article. The book excerpt ran as a cover, and it was about the IBM chapter. It was
about modeling workers and whatnot.
And it took out the most noteworthy stuff and so it didn't have some of the softening elements of the book.
And the IBM people were very upset about it. It made it look like they were a Big Brother company and
they wanted me in my talks about this to say that that was a pilot project and their surveillance of
employees is done on an opt-in basis now.
But then I went to Google and I said: You know these people at IBM were really concerned about that. But
I would assume at Google where all data is just considered information to be analyzed that you would
assume that people are looking at your patterns, your workplace patterns and trying to figure you out and
make you more productive or help you come up with better ideas or whatever.
And they were horribly offended by that idea. A couple of them denounced me for even suggesting it. And
the word "evil" came up into the conversation more than once.
And so it would be interesting. I don't know what the thinking is here at Microsoft. But I just assumed that if
companies aren't looking at that kind of data, it's just because they haven't gotten around to it yet. But I
mean not that you can build predictive models of IBM, I mean Microsoft Researchers, but there's something
to be learned. I don't know what you think about that.
Yeah?
>>: I wonder if you've come across any projects where people are trying to do real-time profiling. For
instance, I'm on vacation and I bought lunch, so maybe now I'm more prone to go buy dessert, based on
history.
>>: I don't think you need a model to figure that out. [laughter].
>> Stephen Baker: Well, the one company that I talked to that I thought was doing something interesting in
that area, do you know Sense Networks? They come out of MIT. And they've put this software into
telephones so that they can track all these people's movements. They're doing it in San Francisco.
Coolest thing in the world to look at this map of San Francisco and see these various people moving
through it. And so they think that if you look at a city as sort of like the physical Internet, then the corner,
such and such a corner of Lombard Street and something else in San Francisco is like a web page.
And if you stand on that corner between, let's say, 9:00 and 10:30 p.m., then you and everybody else who
stands on that corner at that time have something in common just like people who visit a certain web page.
And so then if you look back at their patterns, like where do most of the people who go to that corner
sleep? And you might see that a certain number of them come from this area. And where are they at 2:00
in the morning. You might see they're in this certain club. And then you can define the people who have
those patterns as a tribe or a group you can market to.
And that could bring real-time marketing, the kind you're talking about. Interestingly, this is just beginning,
this thing just launched a couple months ago, Sense Networks, but the investors in Sense Networks aren't
VCs, it's a hedge fund.
You figure for a hedge fund it's four million bucks, which is nothing, even in today's climate, and they get this
raw data of people's movements within New York, San Francisco and other cities, and if they can use that
to try to understand something about what consumers are up to, they might be able to understand the
economy just $4 million better.
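[Illustrative sketch, not from the talk: the "corner as a web page" idea described above can be
approximated by grouping people who were seen at the same place in the same time window, then
splitting that crowd by where each person is at 2:00 in the morning. The ping data, place names
and function names below are all hypothetical, the time window is rounded to whole hours, and
nothing here is claimed to be how Sense Networks actually works.]

```python
from collections import defaultdict

# Hypothetical location pings: (person, place, hour of day on a 24-hour clock).
pings = [
    ("ann", "lombard_corner", 21),
    ("bob", "lombard_corner", 22),
    ("cat", "lombard_corner", 21),
    ("dan", "market_st", 21),
    ("ann", "club_x", 2),
    ("bob", "club_x", 2),
    ("cat", "home_sunset", 2),
    ("dan", "club_x", 2),
]

def visitors(pings, place, start_hour, end_hour):
    """People seen at `place` with start_hour <= hour < end_hour."""
    return {person for person, where, hour in pings
            if where == place and start_hour <= hour < end_hour}

def tribes_by_late_night_spot(pings, people, hour=2):
    """Split `people` into groups by where each one is at `hour`."""
    groups = defaultdict(set)
    for person, where, h in pings:
        if person in people and h == hour:
            groups[where].add(person)
    return dict(groups)

# Everyone on the corner during the evening window is one "audience".
corner_crowd = visitors(pings, "lombard_corner", 21, 23)
# Looking back at their other patterns splits that audience into tribes.
tribes = tribes_by_late_night_spot(pings, corner_crowd)

print(sorted(corner_crowd))
print({place: sorted(members) for place, members in tribes.items()})
```

[In this toy data, the person who is at the club at 2:00 but never stood on the corner is left
out, which is the point of starting from the shared location.]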
>>: Have you thought about generational or cohort differences with regard to even the privacy or
acceptance of this? Because based on what I see, I think like the younger teens and things who are
brought up with technology and are much more comfortable and familiar with it would be more comfortable
with having this information used to --
>> Stephen Baker: That's right. If you look at the blogs and social networks, there are many parts of our
society that are spilling much of their lives including intimate details of their lives for the whole world to see.
It's a treasure trove for the data miners who want to do sentiment analysis on these people or
people in general, because they just get a big enough sample and they adjust for the age. But one of
the companies I visited was Umbria Communications, which was going through those blog posts and
coming up with sentiment analysis for marketing companies. And right now all they're doing is the thumbs
up or thumbs down for, say, the new Jerry Seinfeld commercials.
But in the future they're going to be able to understand those messages, those writings with a lot more
nuance and context.
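[Illustrative sketch, not from the talk: a thumbs-up or thumbs-down reading of a blog post can be
approximated by counting words from small positive and negative word lists. The word lists, the
sample posts and the function name are hypothetical, and this is not presented as Umbria's method.]

```python
import re

# Tiny illustrative word lists; a real system would need far larger ones.
POSITIVE = {"love", "great", "funny", "good", "like"}
NEGATIVE = {"hate", "boring", "awful", "bad", "confusing"}

def thumbs(post):
    """Return 'up', 'down', or 'neutral' from a count of lexicon words."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "neutral"

# Hypothetical blog posts about a commercial.
posts = [
    "I love the new commercial, really funny",
    "That ad was boring and confusing",
    "Saw the ad last night",
]
print([thumbs(p) for p in posts])
```

[A word count like this reads "not good" as a thumbs up, which is the kind of nuance and context
that the simple approach misses.]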
>>: You suggested earlier on that politics was borrowing from marketing. But I've done some -- I've
worked on some marketing campaigns for the company. I kind of got the impression it came the other way,
which is that there are certain industries like politics where marketing wannabes can get a start, and other
places, and then they go to conservative companies that are established and have a lot to lose. But when
you're on a campaign that has pretty much no downside and big upside, it's a chance for some young kid
with some innovation to make a mark.
And they're almost never going to do the safe, reliable thing that anybody sensible would do. This is, if
you're going to make a mark, you have to do something where you've got a candidate who is probably not
going to win. And you've got to bet the farm on double 0 and come up with something truly creative.
And if it works, you've now got a career. I think it's different when you're in the home stretch.
>> Stephen Baker: Yes. Just from my own experience, having kids and friends who go into politics, they
often get discouraged because of the old pros who know all the precincts and know the way things are done. I
find that young, sharp 20-year-olds or 22-year-olds often get stuck stuffing envelopes and not having that
kind of input.
>>: If you look at the creative new things coming out of the more recent campaign, they're things that
both the old pros never would have done and also they're the things that the marketing companies have
never done.
>> Stephen Baker: You're right. Certainly in terms of the Internet stuff.
>>: So, for example, let's take: give you a five-minute head start on knowing who the vice
presidential pick is going to be; in exchange for that, I give you permission to text message me.
And they get a massive number of opt-ins by this method. This is something that an old pro never would
have thought of.
>> Stephen Baker: Right.
>>: I'm saying this campaign [inaudible] with this kind of thing. If you look at the CVs of the people in
the conservative places that are doing marketing now, where they came from, they almost all came from
one of those places, something like politics, where you didn't need a long CV to get in.
>> Stephen Baker: Interesting, I really didn't think about it.
>>: Modern research is born from politics. I think that's pretty much the genesis of it.
>>: Furthermore, one of the big marketing things now is turning your most loyal customers into your
marketers, which is what politics has always been about.
>> Stephen Baker: That's true.
>>: And you have to go out there.
>>: It's about changing attitudes. I mean with politics it's easier because you have a very discrete choice
and a very discrete period. It's the election that you have to impact their position before that, and then it's
ubiquitous after that, the marketing products.
>>: I have a question. How do you think the census, the American census [inaudible] you know the way
the government categorizes.
>> Stephen Baker: Yeah, I'm thinking about it. What do you think?
>>: Well, it just seems archaic the questions they ask. It's not very far-reaching. But they must use it to
make decisions about us.
>> Stephen Baker: It's just that when you fool around with the census, I can imagine if people, if there
were an open source movement to try to figure out what the best census would be, it would be fascinating.
>>: Right.
>> Stephen Baker: But then it would come up with all these privacy implications and there would be a massive
debate about it, and it would seem that the census, for all of the potential that it gives us, is one of these
areas that's going to be really hard to change. But I don't know.
>>: Before the 2000 census, there was a proposal to actually do it by sampling. And all the
mathematicians who worked for the census bureau swear up and down it would be more accurate if they
were allowed to use sampling rather than individually trying to count everybody and the politicians just
would not hear of it.
>> Stephen Baker: Right.
>>: So it was voted down.
>> Stephen Baker: Yeah.
>>: I think a good example or one of the best examples of data privacy is the whole RFID thing, where
they were going to put a radio frequency ID tag maybe in a piece of clothing at a retail store or something.
And the benefits are great, right? Because potentially you could push your cart through the register and
pick up everything all at once. Nobody was having it. Putting RFID chips in licenses and have to be able
to disable it and passports now have them in there. There's websites telling you like take a hammer and
smash it in a certain spot so that it doesn't work.
If you do it any other way you're defacing the passport and it's illegal.
>>: Carry it wrapped in aluminum foil. [laughter].
>>: [inaudible].
>> Stephen Baker: Did you want to say something back there?
>>: I was going to mention that I was looking into some of these bills that were being passed and I was
researching some of the bills for some reason, and found out that some of the censuses were actually,
they were going back three decades to pull up numbers to support the bill as opposed to using the modern
census. So it's kind of ironic that our government will say oh no go count everybody in that census,
because at times when it's not beneficial to them they'll go use an older census and quote it in the bill.
>> Stephen Baker: Right. That reminds me of my time as a steel reporter when I was in Pittsburgh. I went
to this cutting edge rolling facility, steel rolling mill in Indiana that was half Japanese. It cost more than a
billion dollars, which was incredible for the steel industry.
And they take a band of steel, flat rolled steel that's this flat and in one continuous process they roll it until
it's like tin foil. And it's just the band goes on forever and ever. But the tricky part is that you have to weld
together the end of one band and the beginning of another and that's done -- that was done electronically,
and it was a high tech thing they were very proud of. And so they show it to me. They tell me how it's
welded together really fast and really strong. And I saw this hammer next to it.
And I said what's that hammer doing there? And they said oh those guys never trust the automatic welds.
So I think those --
>>: Just recently purchased your book. Although I haven't had a chance to read it yet -- it's a joke.
[laughter] I did notice that you had something in there related to RFID along the lines what Josh was just
mentioning. I was wondering if you had given any thought to anything that related to evening the value
propositions with some of these instruments that were designed initially or primarily for supply chain from
the perspective of the consumer. These devices in products that you purchased that still actually would
work when you have them in the home, have you thought about something that -- I know there's the long,
long set of discussion around privacy related to some of these things but perhaps maybe if the value was
sort of balanced out a little bit giving the consumer some value in having these in their products.
>> Stephen Baker: Yeah, I haven't come up with great ideas about it. But it just seems to me that
increasingly we're going to be making deals where if we agree to use services and provide people with our
data, we're going to be getting more and more -- they're going to have to offer us more -- as people
become aware of how valuable their data is, they're going to be in a position to ask for great deals and
great services in exchange for it.
And I think that's where there's going to be a lot of business opportunities for companies that figure that
out. But I don't have specific examples of that.
>>: Obviously the fear factor for people is huge. I mean when you're in the radio and stuff.
>> Stephen Baker: That's what people are calling that. They want to go off the grid.
>>: So but do you think as data collection becomes more sophisticated, if we have simultaneously much
more sophisticated security that that will allay some of it?
>> Stephen Baker: I think people a decade ago would never put their credit card information on line for
e-commerce, that was a big deal. Something happened to convince people that that was okay. And I think
there will be some demythification that goes on. There's real scary stuff and there's fake scary stuff and
they can divide the two, hopefully.
>>: One thing -- I'm in market research so I'm cheating a bit, but talking about data privacy is to think about
the privacy and data at the point of collection rather than at the point when you're going to use it or just stuff
it in a new database somewhere, so you can separate it from identifying information.
So a lot of times this data is collected inadvertently or sort of, I don't want to say subconsciously, but they're
not really thinking about it as collecting data, they're thinking about it as part of the process.
And if they had built in thoughts about privacy and retaining that information at the point where it was
collected, you stand a much better chance of maintaining privacy.
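[Illustrative sketch, not from the talk: one way to think about privacy at the point of collection
is to replace the identifying field with a keyed hash and drop everything else that is not needed
before the record is ever stored. The field names, the key and the function names are hypothetical.
This is pseudonymization only; it is weaker than full anonymization, and it assumes the key is kept
apart from the data.]

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the data.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonym(identifier: str) -> str:
    """Keyed hash of an identifier, so the raw value is never stored."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def collect(raw_event: dict) -> dict:
    """Keep only the fields needed for analysis, replacing the identity."""
    return {
        "user": pseudonym(raw_event["email"]),
        "item": raw_event["item"],
        "amount": raw_event["amount"],
    }

# A hypothetical purchase event as it arrives at the point of collection.
event = {"email": "pat@example.com", "name": "Pat", "item": "dessert", "amount": 6.5}
stored = collect(event)
print(stored)
```

[The same person still maps to the same pseudonym, so purchase patterns can be analyzed, but the
stored record no longer carries the name or e-mail address.]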
>> Stephen Baker: Right.
>>: Anyway.
>> Stephen Baker: No, I think that's true. I think this has been just a random helter-skelter evolution so far
without many rules guiding it, without best practices and a ton of really valuable wasted data that's not
analyzed.
And I think people are going to get, figure things out.
>>: On the subject of privacy, I found your NSA example intriguing. What are your thoughts on the amount
of infrastructure that they have in place to profile people, et cetera? Because they're not out to sell people
anything. They're trying to find the threats to America and mop them up. Are they miles ahead of the
private sector? Are they behind? What are your thoughts on that?
>> Stephen Baker: I don't know. I know they have real recruiting challenges because they're competing
with companies like this one when it comes to recruiting the top mathematicians and computer scientists
and it's really hard for them, on a civil service pay scale, to compete with these big web companies.
And plus they're limited to American citizens which further handicaps them. And they have had to turn their
mission from cryptography and code breaking during the entire Cold War into data mining and this type of
analysis. And I can't imagine that they're ahead of the cutting edge private sector companies.
But that said, I got no details from the guy. He said they supposedly have the biggest math shop in the
world. He went on and on about the huge challenge of weeding, of finding truth in these mountains of data
and leads and all the rest. And I mean we haven't had a terrorist attack in this country since 2001. So far
be it for me to say they're not doing an effective job, but somehow --
>>: Anthrax.
>> Stephen Baker: You're right.
>>: That was the wrong guy, though.
>> Stephen Baker: No, but it was a terrorist -- it was sort of a terrorist attack. I mean I don't know. It just
seems to me that that was one chapter I really had trouble writing because it was not -- they're not
successful -- it's not a good science for them, really, for the most part. They don't have any -- they don't
have good patterns of behavior or known patterns of behavior of terrorists the way they do Cheerios buyers
or home buyers. They don't have that kind of data.
And it's a little bit like the space shuttle, you know? The space shuttle has two accidents and so it doesn't
give them much of a sample to work with. And so what I did in that chapter is I looked at the two different
approaches you can have toward it. One is the statistical analysis of just data mining and the other is going
through databases looking for correlation, looking for phone numbers that overlap, names that overlap,
aliases and things like that.
There's this software called NORA, which is non-obvious relationship awareness or something. Anyway, it goes
through and tries to find people. The guy who created NORA thinks that he could have stopped 9/11, they
could have stopped it if they used something like his because it was clear that two guys who were
associated with the bombing of the U.S.S. Cole were living in Los Angeles and they were in the phone
book; the data was there.
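[Illustrative sketch, not from the talk: the second approach described above, going through
databases looking for phone numbers and names that overlap, can be approximated by indexing
records on each field value and reporting every pair of records that shares one. The records,
field names and function name are hypothetical, the matching is exact-value only, and this is
not claimed to be how NORA works.]

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records from unrelated databases.
records = [
    {"id": "watchlist-1", "name": "J. Doe", "phone": "555-0101"},
    {"id": "phonebook-7", "name": "John Doe", "phone": "555-0101"},
    {"id": "phonebook-8", "name": "A. Smith", "phone": "555-0199"},
    {"id": "lease-3", "name": "A. Smith", "phone": "555-0142"},
]

def shared_value_links(records, fields=("name", "phone")):
    """Return (id_a, id_b, field) for record pairs sharing an exact field value."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            index[(field, rec[field])].add(rec["id"])
    links = set()
    for (field, _value), ids in index.items():
        for a, b in combinations(sorted(ids), 2):
            links.add((a, b, field))
    return sorted(links)

for link in shared_value_links(records):
    print(link)
```

[Here the watch-list entry and the phone-book entry are linked by a shared phone number even
though the names are written differently, and the two "A. Smith" records are linked by name alone.]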
So any other questions? I appreciate you coming down today and spending some time with me. Be happy
to sign any books if anybody wants one.
>> Kim Ricketts: And the blog.
>> Stephen Baker: This is my blog, thenumerati.net. My contact information is there. If you want to get in
touch with me, my e-mail is there. Feel free to leave all kinds of comments on the blog. I hate having zero
comments on a post. [laughter] It's depressing.
>> Kim Ricketts: Thank you.
[applause]