Exam 3 Ted Talks

Susan
00:01
Technology has brought us so much: the moon landing, the Internet, the ability to sequence the human
genome. But it also taps into a lot of our deepest fears, and about 30 years ago, the culture critic Neil
Postman wrote a book called "Amusing Ourselves to Death," which lays this out really brilliantly. And here's
what he said, comparing the dystopian visions of George Orwell and Aldous Huxley. He said, Orwell feared
we would become a captive culture. Huxley feared we would become a trivial culture. Orwell feared the truth
would be concealed from us, and Huxley feared we would be drowned in a sea of irrelevance. In a nutshell,
it's a choice between Big Brother watching you and you watching Big Brother. (Laughter)
00:56
But it doesn't have to be this way. We are not passive consumers of data and technology. We shape the role
it plays in our lives and the way we make meaning from it, but to do that, we have to pay as much attention
to how we think as how we code. We have to ask questions, and hard questions, to move past counting
things to understanding them. We're constantly bombarded with stories about how much data there is in
the world, but when it comes to big data and the challenges of interpreting it, size isn't everything. There's
also the speed at which it moves, and the many varieties of data types, and here are just a few
examples: images, text, video, audio. And what unites these disparate types of data is that they're created by
people and they require context.
01:57
Now, there's a group of data scientists out of the University of Illinois-Chicago, and they're called the Health
Media Collaboratory, and they've been working with the Centers for Disease Control to better
understand how people talk about quitting smoking, how they talk about electronic cigarettes, and what
they can do collectively to help them quit. The interesting thing is, if you want to understand how people
talk about smoking, first you have to understand what they mean when they say "smoking." And on Twitter,
there are four main categories: number one, smoking cigarettes; number two, smoking marijuana; number
three, smoking ribs; and number four, smoking hot women. (Laughter)
02:46
So then you have to think about, well, how do people talk about electronic cigarettes? And there are so
many different ways that people do this, and you can see from the slide it's a complex kind of a query. And
what it reminds us is that language is created by people, and people are messy and we're complex and we
use metaphors and slang and jargon and we do this 24/7 in many, many languages, and then as soon as we
figure it out, we change it up.
03:15
So did these ads that the CDC put on, these television ads that featured a woman with a hole in her throat
and that were very graphic and very disturbing, did they actually have an impact on whether people
quit? And the Health Media Collaboratory respected the limits of their data, but they were able to
conclude that those advertisements — and you may have seen them — that they had the effect of jolting
people into a thought process that may have an impact on future behavior. And what I admire and
appreciate about this project, aside from the fact, including the fact that it's based on real human need, is
that it's a fantastic example of courage in the face of a sea of irrelevance.
04:04
And so it's not just big data that causes challenges of interpretation, because let's face it, we human beings
have a very rich history of taking any amount of data, no matter how small, and screwing it up. So many
years ago, you may remember that former President Ronald Reagan was very criticized for making a
statement that facts are stupid things. And it was a slip of the tongue, let's be fair. He actually meant to
quote John Adams' defense of British soldiers in the Boston Massacre trials that facts are stubborn
things. But I actually think there's a bit of accidental wisdom in what he said, because facts are stubborn
things, but sometimes they're stupid, too.
04:51
I want to tell you a personal story about why this matters a lot to me. I need to take a breath. My son Isaac,
when he was two, was diagnosed with autism, and he was this happy, hilarious, loving, affectionate little
guy, but the metrics on his developmental evaluations, which looked at things like the number of words — at
that point, none — communicative gestures and minimal eye contact, put his developmental level at that of
a nine-month-old baby. And the diagnosis was factually correct, but it didn't tell the whole story. And about
a year and a half later, when he was almost four, I found him in front of the computer one day running a
Google image search on women, spelled "w-i-m-e-n." And I did what any obsessed parent would do, which is
immediately started hitting the "back" button to see what else he'd been searching for. And they were, in
order: men, school, bus and computer. And I was stunned, because we didn't know that he could spell, much
less read, and so I asked him, "Isaac, how did you do this?" And he looked at me very seriously and
said, "Typed in the box."
06:19
He was teaching himself to communicate, but we were looking in the wrong place, and this is what happens
when assessments and analytics overvalue one metric — in this case, verbal communication — and
undervalue others, such as creative problem-solving. Communication was hard for Isaac, and so he found a
workaround to find out what he needed to know. And when you think about it, it makes a lot of
sense, because forming a question is a really complex process, but he could get himself a lot of the way
there by putting a word in a search box.
06:59
And so this little moment had a really profound impact on me and our family because it helped us change
our frame of reference for what was going on with him, and worry a little bit less and appreciate his
resourcefulness more.
07:17
Facts are stupid things. And they're vulnerable to misuse, willful or otherwise. I have a friend, Emily
Willingham, who's a scientist, and she wrote a piece for Forbes not long ago entitled "The 10 Weirdest
Things Ever Linked to Autism." It's quite a list. The Internet, blamed for everything, right? And of course
mothers, because. And actually, wait, there's more, there's a whole bunch in the "mother" category
here. And you can see it's a pretty rich and interesting list. I'm a big fan of being pregnant near freeways,
personally. The final one is interesting, because the term "refrigerator mother" was actually the original
hypothesis for the cause of autism, and that meant somebody who was cold and unloving.
08:11
And at this point, you might be thinking, "Okay, Susan, we get it, you can take data, you can make it mean
anything." And this is true, it's absolutely true, but the challenge is that we have this opportunity to try to
make meaning out of it ourselves, because frankly, data doesn't create meaning. We do. So as
businesspeople, as consumers, as patients, as citizens, we have a responsibility, I think, to spend more
time focusing on our critical thinking skills. Why? Because at this point in our history, as we've heard many
times over, we can process exabytes of data at lightning speed, and we have the potential to make bad
decisions far more quickly, efficiently, and with far greater impact than we did in the past. Great, right? And
so what we need to do instead is spend a little bit more time on things like the humanities and sociology, and
the social sciences, rhetoric, philosophy, ethics, because they give us context that is so important for big
data, and because they help us become better critical thinkers. Because after all, if I can spot a problem in an
argument, it doesn't much matter whether it's expressed in words or in numbers. And this means teaching
ourselves to find those confirmation biases and false correlations and being able to spot a naked emotional
appeal from 30 yards, because something that happens after something doesn't mean it happened because
of it, necessarily, and if you'll let me geek out on you for a second, the Romans called this "post hoc ergo
propter hoc," after which therefore because of which.
10:10
And it means questioning disciplines like demographics. Why? Because they're based on assumptions about
who we all are based on our gender and our age and where we live as opposed to data on what we actually
think and do. And since we have this data, we need to treat it with appropriate privacy controls and
consumer opt-in, and beyond that, we need to be clear about our hypotheses, the methodologies that we
use, and our confidence in the result. As my high school algebra teacher used to say, show your
math, because if I don't know what steps you took, I don't know what steps you didn't take, and if I don't
know what questions you asked, I don't know what questions you didn't ask. And it means asking ourselves,
really, the hardest question of all: Did the data really show us this, or does the result make us feel more
successful and more comfortable?
11:11
So the Health Media Collaboratory, at the end of their project, they were able to find that 87 percent of
tweets about those very graphic and disturbing anti-smoking ads expressed fear, but did they conclude that
they actually made people stop smoking? No. It's science, not magic.
11:32
So if we are to unlock the power of data, we don't have to go blindly into Orwell's vision of a totalitarian
future, or Huxley's vision of a trivial one, or some horrible cocktail of both. What we have to do is treat
critical thinking with respect and be inspired by examples like the Health Media Collaboratory, and as they
say in the superhero movies, let's use our powers for good.
12:05
Thank you.
Paul Strassman
when the history of computing will be
00:02
written I'm sure Google will be noted as
00:06
a major milestone in the development of
00:09
information science and information
00:12
management we are coming to an end of an
00:16
era and therefore we have to understand
00:20
that Google represents the future of
00:25
computing in a different way and I'll
00:28
explain why this is so I want to preface
00:31
my remarks by saying that you should not
00:34
buy a Google stock this is not a Google
00:36
stock promotion here Google in fact as a
00:39
company may fail that is not the point I
00:43
want to make here I want to just say
00:45
they are the harbinger of change that
00:50
will be imitated and copied and will
00:54
then set the tone for many of your
00:56
careers the students in here judging
01:01
from their age represent maybe a
01:05
generation that will be managing and
01:08
providing leadership for information
01:10
technology from year 2010 through year
01:14
2065 if that is so then the
01:20
understanding how Google thinking
01:23
changes the environment becomes a very
01:26
important part of your education history
01:32
is very important in understanding where
01:35
we have been basically the dimensions of
01:42
change over the last 50 years can be
01:45
quantified in terms of sources and time
01:49
delay responsiveness and during the data
01:52
centric era we had hundreds of sources
01:55
impacting a corporation and gradually
01:59
going from monthly to weekly cycling of
02:02
information as the information
02:05
technology
02:07
usage moved from finance which was
02:09
basically a monthly animal to marketing
02:13
which became a weekly animal in 1980 the
02:18
demand for responsiveness and utility
02:23
shifted and in this particular era
02:26
which I for lack of better definition
02:29
called a workgroup centric era we have
02:32
created millions of islands of
02:35
automation centered around servers but
02:40
these were little cottage shops and some
02:44
of them growing to substance but
02:46
nevertheless being scattered not
02:50
integrated not interoperable not very
02:54
reliable and having a rather slow way of
02:58
responding to external situation this is
03:02
what I call the Microsoft Intel era the
03:08
Google era which I will be discussing
03:11
today again this is symbolic deals not
03:16
with data or text but with multimedia it
03:20
deals with billions of sources of
03:23
information and basically shrinks the
03:27
information latency to real-time and by
03:32
the time we go into the generation of
03:36
2015 2025
03:39
a real-time responsiveness then becomes
03:43
the currency under which systems operate
03:48
these systems I call Network centric I
03:53
use just one of many examples but I'm
03:56
sure you recognize this particular
03:57
diagram this is where in real time
04:02
interaction between satellites drones
04:06
aircrafts cruisers guided missile
04:10
launchers and so forth are necessary in
04:12
order to execute
04:15
mission now this is a military example
04:19
which is appropriate in the setting here
04:21
in the Washington area but even when
04:23
you'll start looking at environments
04:26
like Federal Express for instance they
04:30
are all going to a network-centric real
04:33
time environment so we are already
04:36
moving in that direction now here are
04:41
the specifications ladies and gentlemen
04:44
that you will have to deliver in your
04:46
careers first the system that you
04:50
deliver has to be extremely reliable has
04:53
to be down less than five minutes a year
04:58
that's six to eight sigma reliability you
05:02
must be able to represent the real-time
05:07
awareness of the situation in a very
05:11
fine very rapid color high
05:15
discrimination display you must be able
05:19
to tap into a multimedia environment at
05:22
least at a gigabyte per second the latency
05:26
cannot exceed more than a quarter of a
05:30
second globally anywhere in the world
05:32
and if you want to innovate and change
05:35
the environment you must be able to
05:38
innovate in less than a day while at the
05:42
same time assuring security to an
05:45
extremely high level of fidelity
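The "down less than five minutes a year" target above can be turned into an availability percentage with simple arithmetic. The following is a minimal sketch, assuming a 365-day year; the function name `availability` is illustrative and not from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def availability(downtime_minutes_per_year: float) -> float:
    """Return the fraction of the year a system is up for a given downtime budget."""
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR


if __name__ == "__main__":
    # Five minutes of downtime per year is the target named in the talk.
    print(f"{availability(5) * 100:.4f}% available")
```

Five minutes a year works out to roughly 99.999 percent uptime, which is commonly called "five nines".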
05:48
those are ladies and gentlemen the tasks
05:52
before you and you don't have much time
05:55
to deliver because there are customers
05:57
out there who demand exactly the meeting
06:00
of these kind of specifications now the
06:06
good news is that it's an awesome
06:08
opportunity because when you compare
06:11
what we have today in the client-server
06:14
group or kind of a group work
06:17
environment we look pretty sick let me
06:23
just give you the highlights of what I
06:26
see
06:27
typical specifications first when you
06:31
look at any budget of any logical
06:35
operation and certainly when you look at
06:37
any of the government budgets especially
06:40
in the Department of Defense the much of
06:44
the effort of the IT spending is devoted
06:49
strictly to staying alive and getting
06:52
the thing patched so that it sort of
06:54
doesn't fall apart I call that the
06:57
infrastructure over 50% of the money is
07:01
being spent by people running around
07:04
during the break I talked to a number of
07:07
you who are night students who are
07:09
systems administrators who make a living
07:11
by making sure that there's paper in the
07:14
printers and that nobody has kicked the
07:17
cables at the desk and so forth
07:20
but you're part of infrastructure which
07:23
is important but there is no way of
07:26
creating value just from the
07:28
infrastructure I deliberated very much
07:32
on the subject of what is the current
07:34
performance of security the only
07:38
quantitative number I could find was an
07:41
on boolean question mark when you look
07:46
at the cost of fielding an application
07:49
you find that much of the bridging of an
07:53
application into systems environment
07:57
consists of integration sort of just
08:00
splicing things performing neurosurgery
08:05
on a runner who is bleeding now
08:10
that's a good analogy and then of course
08:15
you have network downtime and I'm being
08:19
very kind but network downtimes
08:23
particularly on email availability in
08:26
many of the operational functions is
08:28
deplorable and if you want to innovate
08:31
anything you have to have a feasibility
08:34
study and god help you if you have to go
08:36
through DoD acquisition
08:38
you know that can extend infinitely and
08:43
indefinitely so here is the gap between
08:46
what is needed and what we have the
08:52
question is can we get from where we are
08:56
to where we are going and the conclusion
09:01
is very simple namely that you cannot
09:06
design network centric systems with the
09:12
existing workgroup centric architecture
09:14
just not doable you cannot design an
09:20
airplane using railroad technology just
09:24
another example even good trains don't
09:29
fly so so the this is a very important
09:35
good-news-bad-news kind of a slide the
09:40
good news is that we know where we need
09:42
to go the bad news is that all those of
09:46
you who are wedded in your careers to
09:48
work group centric architecture based on
09:52
what you study these days is going to be
09:54
obsolete so you might as well get ready
09:57
for a different view of the world of the
10:00
future and here are four principles
10:05
which I see are manifested in Google and
10:11
there are of course many variants on this
10:15
thing but here it goes first if you want
10:20
to have reliability if you want to have
10:23
uptime if you want to have redundancy
10:26
you have to build and operate a protected
10:28
Information Network the current internet
10:33
is not a protected information
10:36
environment what you have is an
10:40
outgrowth of a very clever
10:42
DARPA actually ARPA research effort
10:46
built by the professors for professors
10:49
and for students
10:51
all you have now is an extrapolation of
10:54
that that is not going to be robust
10:57
enough to meet their requirements so if
11:00
you want to build your secure
11:04
environment on the existing
11:06
second-generation work group environment
11:09
internet it won't work you need a
11:11
different environment second you have to
11:16
offer Universal connectivity for
11:19
collection processing storing of
11:21
information and you must provide secure
11:24
communications now two the principle
11:29
number two is a mouthful which leads
11:32
really to principle number three namely
11:36
that in order to achieve the
11:39
interoperability in the collection you
11:42
must maintain shared data models in
11:45
other words when you study the Bible in
11:51
the book of Genesis chapter 11 you may
11:55
those of you who are biblical scholars
11:56
remember that when the good Lord wanted
12:00
to confound human affairs he scrambled a
12:03
data dictionary
12:04
now that's a loose interpretation of the
12:07
Bible but I'm sure you get what I mean
12:10
we must have shared data models in order
12:13
to have interoperability across various
12:16
functions but fourth is actually the
12:20
most important one is this cannot be
12:22
planned the future cannot be planned in
12:24
some kind of a master design it has to
12:27
be evolutionary it cannot be obtained by
12:31
issuing an RFP to the usual Beltway
12:37
consultants saying well give me a
12:40
network centric design it just cannot be
12:43
done the network centric principle as I
12:49
will demonstrate in a moment depends on
12:51
continual
12:51
continuous upgrading and innovation and
12:54
experimentation so
12:58
what I will do rather than talk in
13:01
abstract I will use the Google
13:05
principles as an illustration of how
13:10
these principles are actually executed
13:15
the first one is building and operating
13:18
protected information network the
13:25
fundamental reality of our Google lead
13:28
is not a search application Google is
13:34
not a search application it's the only
13:36
way how they make lots of money it's the
13:39
only way
13:40
it's the secret juice that they get the
13:43
secret formula for getting up what is
13:47
basically a massive parallel processing
13:52
environment consisting of clusters and
13:55
in each cluster you have racks and in
14:01
each rack you have machines you have
14:04
multiple CPUs and basically you
14:10
have clusters and whether they are 20 or
14:15
40 nobody knows this is something that
14:18
is held very tightly but the underlying
14:22
fact about the reality of the Google
14:26
architecture is the fact that it is a
14:28
massive parallel machine consisting of
14:33
lots of clusters which consists of lots
14:36
of servers maybe 200,000 maybe 300,000
14:40
servers all connected all working
14:42
together and therefore what Google has
14:45
is the world's largest computer although
14:49
it is in small pieces fairly simple
14:53
machine once you hook up logically
14:57
hundred thousand 200 or 300 thousand
15:00
servers you have a supercomputer the
15:04
likes of which nobody has ever seen now
15:07
I also want to point out to you that the
15:10
way how these things are done is that
15:14
each of the clusters is basically the
15:16
same architecture in other words each
15:20
cluster has what's called an index an
15:23
awareness of what exists in the network
15:26
it is all the documents in other words I
15:30
will be showing you for instance
15:31
references to Paul Strassman in Arabic
15:35
just happened to hit on that that is
15:39
most likely hosted in a document server
15:42
in Singapore it's surely not being
15:45
hosted in New York or in Washington but
15:49
if you make an inquiry in Connecticut
15:51
the index server would know that this
15:55
record is available in Singapore and the
16:01
duplicate of that record is also backed
16:04
up somewhere in the Pacific and then
16:08
brings that particular application
16:11
through a web server into a web switch
16:13
and brings it to my desktop in New
16:17
Canaan Connecticut in less than a quarter
16:19
of a second and so it is the duplication
16:26
of identical architectures which is the
16:30
secret sauce of Google it is the massive
16:35
parallel application of enormous amount
16:38
of computing power in a very organized
16:41
way self aware as part of the network in
16:46
order to find information and then
16:49
combine information from various sources
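The lookup described above, where an index server knows which sites hold a document and its backup, can be sketched roughly as follows. This is a minimal illustration, not Google's design: the site names, the latency figures, and the names `Replica`, `INDEX`, `LATENCY_MS`, and `locate` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str    # hypothetical data-center name
    doc_id: str  # hypothetical document identifier


# Hypothetical index: search term -> replicas of the matching document.
INDEX: dict[str, list[Replica]] = {
    "strassmann": [
        Replica("singapore", "doc-17"),
        Replica("pacific-backup", "doc-17"),
    ],
}

# Made-up round-trip estimates in milliseconds, keyed by requester region.
LATENCY_MS = {
    "connecticut": {"singapore": 240, "pacific-backup": 180},
}


def locate(term: str, requester: str, down: frozenset[str] = frozenset()) -> Replica | None:
    """Return the lowest-latency live replica for a term, or None if none is reachable."""
    candidates = [r for r in INDEX.get(term.lower(), []) if r.site not in down]
    if not candidates:
        return None
    return min(candidates, key=lambda r: LATENCY_MS[requester][r.site])


if __name__ == "__main__":
    print(locate("Strassmann", "connecticut"))
    print(locate("Strassmann", "connecticut", down=frozenset({"pacific-backup"})))
```

The point of the sketch is only that the index, not the requester, knows where the copies live, so a failed site can be skipped without the requester noticing.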
16:55
those of you who are involved in the
16:58
arcane art of building data centers I
17:02
was able by some subterfuge to actually
17:07
show you a picture
17:08
of what a cluster looks like I was told and
17:12
this was into an internal when you do a
17:16
search on Google they reveal
17:19
very little information unless you get
17:21
little devious and once in a while they
17:23
slip up and this particular operator
17:27
claimed that they put it up in three
17:28
days subsequently I found out that
17:32
setting up a Google cluster in three
17:35
days is just too slow and too
17:39
labor-intensive my latest intelligence
17:42
which I'm revealing today for the first
17:44
time is that Google has now
17:48
containerized clusters and they can be
17:52
drop shipped and put up in less than
17:55
eight hours anywhere in the world now it
17:59
has vast implications from the national
18:02
security and defense standpoint because
18:04
this is exactly the kind of capability
18:07
you need in the battlefield so what is
18:14
then the secret to Google
18:16
it is the infrastructure you have over
18:21
two hundred thousand custom-built
18:22
commodity servers these are custom built
18:25
by a foundry in Taiwan this is no fancy
18:31
architecture no RAID disk
18:33
these are off-the-shelf low-cost boxes
18:37
by the way you can buy one of those
18:39
boxes with some reservations so that you
18:42
can actually have a look at these things
18:44
these are basically pizza like kind of
18:47
boxes you slide them in and they are
18:50
built as fault-tolerant hardware
18:54
which means any one of those servers
18:57
when they fail it doesn't matter they are not
19:01
only plug replaceable but immediately at
19:05
least one and sometimes as many as three
19:07
servers pop in because the index knows
19:10
where the backup is now I don't know
19:19
whether your classes include petabytes
19:21
as a scale but a petabyte is a
19:28
thousand terabytes a terabyte is a thousand
19:32
gigabytes very quickly you can sort of
19:34
multiply the numbers or we know it's
19:37
more than five petabytes and growing
19:39
rapidly each server is 80 gigabytes
19:44
although as the disks become cheaper and
19:47
the prices drop they just yank out the
19:51
disks and factory refurbish them
19:55
and just put it right back one of the
19:58
interesting aspects of the servers on
20:01
the Google network is that when you move
20:05
from Pentium to the whatever the next
20:08
gizmo is it doesn't matter because the
20:12
functionality is not dictated by the
20:14
microprocessor which means it lowers the
20:17
cost you must understand everything that
20:19
I'm describing here is dirt cheap it's
20:23
standard low cost commodity hardware
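The storage figures quoted a moment ago (a petabyte as a thousand terabytes, 80 gigabytes per server, and on the order of 200,000 servers) can be multiplied out as a rough check. The following is a sketch under stated assumptions: decimal units are used, and the number of copies kept of each piece of data is a guess for illustration, since the talk only says that one to three backup servers can take over.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000 TB = 1,000,000 GB


def raw_capacity_pb(servers: int, gb_per_server: int) -> float:
    """Total disk capacity across all servers, in petabytes."""
    return servers * gb_per_server / GB_PER_PB


def unique_capacity_pb(servers: int, gb_per_server: int, copies: int) -> float:
    """Capacity left for distinct data if every item is stored `copies` times."""
    return raw_capacity_pb(servers, gb_per_server) / copies


if __name__ == "__main__":
    print(raw_capacity_pb(200_000, 80))                   # raw disk, in PB
    print(round(unique_capacity_pb(200_000, 80, 3), 2))   # assuming three copies
```

With 200,000 servers at 80 GB each, raw capacity comes to 16 PB. If three copies were kept, distinct data would be a little over 5 PB, which is in the same range as the "more than five petabytes" quoted, though the talk does not say how that figure was derived.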
20:30
when I compared some of the costs for
20:35
infrastructure the cheapest next
20:39
infrastructure is that of Sun
20:40
Microsystems which in my view is at
20:44
least four to six times more expensive
20:46
than this particular configuration when
20:50
you add the bloated Microsoft server
20:54
environment you are dealing with a large
20:58
multiplier of cost so you must
21:00
understand what drives this particular
21:03
environment is cost cost and cost now
21:08
of course it doesn't come for nothing
21:14
the way how you compensate for the
21:18
fact that you have lots of cheap
21:19
hardware you compensate by software and
21:23
that means that in addition to serving
21:26
machine cycles for the customer which
21:30
is you most of the Machine cycles are
21:34
really devoted to the system operating
21:37
itself and becoming aware in other words
21:40
the data gets moved
21:44
in the whole global network as the demand
21:48
arises and certain frequency
21:51
distributions take place so this is a
21:53
massively parallel self adjusted self
21:56
healing self adaptive environment the
22:00
mathematics is very complex
22:04
the indexing is complex it is a 500
22:10
million by two billion matrix if there
22:15
are any operations research people in
22:17
here or mathematicians they would
22:19
understand this is an awesome
22:24
mathematical exercise which of course
22:27
operates during the time as the demand
22:30
circles around the globe in the
22:33
world there's always machines and
22:36
clusters who are not active and during
22:39
that time the machines just go and work
22:41
on themselves by the way this is the way
22:44
the human brain works the reason you
22:46
need sleep whether you know it or not is
22:50
you need idle time for your sensory
22:52
perceptions in your brain to be
22:55
re-indexed that's why the denial of
22:58
sleep is basically a way of ultimately
23:01
disorienting a human being your brain
23:04
requires sleep in their particular case
23:07
it's called indexing re-indexing the
23:13
capital and operating costs are a fraction
23:15
of commercial servers they're extremely
23:18
scalable the traffic is growing 20 to 30
23:21
percent per month
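To see why a 20 to 30 percent monthly growth rate raises the question of how long it can last, the monthly rate can be compounded over a year. This is a small worked calculation, assuming the rate holds steady for twelve months; the function name is illustrative.

```python
def annual_growth_factor(monthly_rate: float, months: int = 12) -> float:
    """Compound a constant monthly growth rate over the given number of months."""
    return (1.0 + monthly_rate) ** months


if __name__ == "__main__":
    for rate in (0.20, 0.30):
        print(f"{rate:.0%} per month -> x{annual_growth_factor(rate):.1f} per year")
```

At 20 percent a month traffic would grow roughly ninefold in a year, and at 30 percent roughly twenty-three-fold.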
23:22
you sort of wonder how long can this go
23:24
on and and we don't know but the data
23:29
centers are growing and being drop
23:31
shipped as conditions warrant now
23:37
because we are here in the national
23:39
security area I want to point out that
23:42
replication is the way how you deal with
23:46
reliability much of the inheritance from
23:51
the IBM and Microsoft era is
23:54
if you want reliability you'll just pile
23:56
on more functions more code as well
24:00
you just put more code into
24:04
an operating system the operating system
24:07
is really a stripped-down Linux version
24:10
proprietary by the way but the reliance
24:15
is on redundancy rather than on layering
24:19
software and the replication is done for
24:24
proximity and response which means you
24:26
replicate depending on the way that
24:30
demand appears so if suddenly there is
24:36
interest in New Orleans many of the data
24:41
that deal with New Orleans would be
24:43
moved from wherever they are to where
24:46
the demand would be arising I think those
24:50
of you who are in the department of
24:54
systems engineering and in software
24:57
should understand that reliability can
25:01
be achieved with software and
25:03
architecture not with hardware those of
25:07
you who understand the way we do
25:10
reliability now right now for instance
25:12
you want to have a reliable desktop you
25:15
will take your hard disk and make it a
25:18
RAID disk to operate the array or a
25:22
RAID array they don't have any RAID in
25:25
the system the whole system as a system
25:28
is a RAID so you then rely on indexing
25:35
for response by moving transactions and
25:40
data to point of use we
25:43
do this dynamically and by the way
25:50
it's easy to do indexing of text they
25:55
are now doing indexing of images which
25:58
is really the future challenge and their
26:02
dynamic indexing is very demanding and
26:05
of course you
26:07
started taking messages and the way you
26:12
do it is you take the index and you
26:15
break up the index so the index is not
26:18
in one place
26:19
the index is broken into what
26:23
they call shards and they are
26:26
distributed across data centers so you
26:28
could kill any one data center now I
26:32
have no evidence corroborated evidence
26:35
that this is so but there was a
26:37
particular moment for a number of
26:39
reasons where they lost half of the
26:42
shards and they were still operating
26:44
only with a small delay in latency
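The idea of breaking the index into shards and spreading copies across data centers can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the four site names, eight shards, two copies per shard, and the round-robin placement rule are not from the talk, and this is not Google's actual scheme.

```python
from __future__ import annotations

import zlib

DATA_CENTERS = ["dc-a", "dc-b", "dc-c", "dc-d"]  # hypothetical sites
NUM_SHARDS = 8                                    # assumed shard count
COPIES = 2                                        # assumed replication factor


def shard_of(term: str) -> int:
    """Map a term to a shard with a stable hash (crc32 is not salted per run)."""
    return zlib.crc32(term.encode("utf-8")) % NUM_SHARDS


def placement(shard: int) -> list[str]:
    """Place each shard's copies on consecutive data centers, round-robin."""
    return [DATA_CENTERS[(shard + i) % len(DATA_CENTERS)] for i in range(COPIES)]


def reachable_shards(down: set[str]) -> list[int]:
    """Shards that still have at least one copy in a live data center."""
    return [s for s in range(NUM_SHARDS) if any(dc not in down for dc in placement(s))]


if __name__ == "__main__":
    print(len(reachable_shards({"dc-a"})))   # one site lost
    print(reachable_shards({"dc-a", "dc-b"}))  # two adjacent sites lost
```

With two copies per shard, losing any single data center leaves every shard reachable. Losing two adjacent sites in this placement makes the shards stored only on that pair unavailable, which is why a real system would choose placement and copy count with correlated failures in mind.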
26:52
how does this work well the issue
26:58
really has to deal with the query
27:00
serving infrastructure you in this kind
27:05
of a distributed environment you cannot
27:09
think in terms of a particular server
27:11
that has a particular database or Oracle
27:18
file and so forth and you hit against
27:20
and you want an answer you really when
27:25
you assemble a display and I'll be showing
27:28
you some displays in a moment you may
27:31
have to go to many places where there are
27:33
pieces of this display there are pieces
27:37
to the answer and you never know how
27:39
the question will be asked and the
27:43
question is not standard because the
27:45
question can be totally improvised it
27:49
can be in different languages which I'll
27:51
be demonstrating to you there's over 80
27:53
languages in which a particular question
27:56
can be asked yet the answer it still has
27:58
to come out to a particular desktop in
28:01
the context of that query and so that
28:05
means that in order to answer
28:08
certain complex questions particularly
28:11
when the question is some Boolean search
28:13
you may actually involve the cooperation
28:16
of more than a thousand servers index
28:19
servers
28:20
data servers and web servers in order to
28:24
answer their particular questions and
28:28
that means that the document servers
28:31
also have to then look at the pieces of
28:34
information which fits a particular
28:35
inquiry this is a subject this slide
28:40
alone would be worthy of a semester of a
28:45
course in engineering now how do you in
28:53
the world keep this thing going
28:56
particularly if you keep popping in
28:59
these clusters you know drop shipping
29:01
them you know and then plugging them in
29:03
and you suppose they're supposed to take
29:05
over they have a private proprietary
29:11
system called the MapReduce server
29:14
system which is highly proprietary and
29:19
it basically coordinates all the
29:22
servers in real time and distributes the
29:25
workload so that if the workload and
29:30
I've seen the diagram of the
29:32
workloads the workload shifts with the
29:35
time zone in which you operate and
29:39
therefore in order to optimize the use
29:42
of the system you basically have to
29:44
distribute the workload in order to
29:47
keep both reliability and latency
29:50
up-to-date and you also must have a
29:57
capability in this environment to
30:01
reconstitute the service in case any
30:04
particular server or pieces of a cluster
30:07
or a component fails and therefore you
30:12
must have an operating system the likes
30:15
of which the world has never seen and
30:18
it's not an operating system it's
30:20
basically a master scheduler
30:22
which then monitors the performance
30:25
because it's really an automatic
30:28
system that keeps track of itself
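A minimal Python sketch of the load-balancing and failover idea described above. Everything in it (the `Scheduler` class, the server names, the least-loaded rule) is a hypothetical illustration under simple assumptions, not Google's actual scheduler or the MapReduce system the speaker mentions.

```python
# Toy master scheduler: all names here are hypothetical, for illustration only.
class Scheduler:
    def __init__(self, servers):
        # Map each live server name to the list of tasks it currently holds.
        self.assignments = {name: [] for name in servers}

    def assign(self, task):
        # Send the task to the live server holding the fewest tasks
        # (ties are broken by server name so the result is deterministic).
        target = min(self.assignments,
                     key=lambda s: (len(self.assignments[s]), s))
        self.assignments[target].append(task)
        return target

    def fail(self, server):
        # Drop a failed server and hand its tasks to the surviving servers.
        orphaned = self.assignments.pop(server)
        for task in orphaned:
            self.assign(task)
        return orphaned


scheduler = Scheduler(["index-1", "data-1", "web-1"])
for query in ["q1", "q2", "q3", "q4"]:
    scheduler.assign(query)

# Simulate the loss of one server; its queries are redistributed.
scheduler.fail("data-1")
for name, tasks in sorted(scheduler.assignments.items()):
    print(name, tasks)
```

The point of the sketch is only that no task is lost when a server disappears and that the survivors end up with roughly equal loads; a real system would also have to detect the failure and account for each server's capacity.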
30:33
okay so so far this has been easy I hope
30:36
you appreciate the elegance of this but
30:40
it's a tough order and some of you who
30:44
have been in this business for a while
30:46
and you have built systems data centers
30:52
or a client-server system understand
30:54
that this is a degree of complexity
30:58
several orders of magnitude over what is
31:00
your experience now let's go now to the
31:03
second principle which is universal
31:05
connectivity by the way before I finish
31:11
I want to point out to you that all my
31:15
presentations here come with a 90-day
31:17
warranty which none of your professors
31:20
ever give you my warranty consists of
31:23
the fact if you have any question about
31:24
a specific slide or any of the handouts you
31:28
can email to me the question and if it's
31:34
decent and answerable and not another request
31:37
for free consulting I will put it on a
31:39
blog so you'll be able to see who is
31:42
asking what questions okay alright
31:48
universal connectivity principle number
31:50
two when we say universal it means that
31:56
it has to interface in many languages
32:01
although English is a dominant language
32:07
among computer experts computer experts
32:10
represent only a tiny fraction of the
32:13
human population and the question about
32:17
proportion of the human intelligence
32:18
that represents is still arguable
32:20
nevertheless the world is multilingual
32:23
multicultural and here's an example of
32:26
how the various languages get responded
32:32
to I know some of these languages I
32:34
tested it and I was amazed again
32:38
here is my inquiry and apparently I was
32:47
able to find a quotation in Iran in in
32:55
Arabic documentation published in
32:59
Philadelphia of all places and then some
33:05
kind of a reference to quotation and
33:11
these were searches of Arabic pages
33:15
again I'm just showing to you this as an
33:19
example of connectivity another example
33:25
of connectivity which is not only text
33:28
is that you may wish to go back and
33:33
actually go to Google video and see my
33:39
prior presentations as well as this
33:42
presentation that will be posted after
33:45
Google will make sure that this is
33:48
decent and awesome and doesn't violate
33:51
any laws so you will be seeing my
33:58
presentation of today most likely in
34:02
about a month or so now here is one that
34:08
is sort of interesting this is a new
34:11
application called Google Base and in a
34:14
moment I'll be talking more about
34:15
applications and innovation but now now
34:20
the thing becomes very interesting the
34:23
question that was asked locate events
34:26
within 45 miles of New York in November
34:29
of 2005 now that you know this is a
34:33
pretty sophisticated kind of a question
34:36
and it shows that there is A, B, C, D
34:43
and if you want to know D which is on
34:47
the near the docks on what looks to me
34:51
like 34th Street an unsavory district
34:54
by the way it would tell me it's souls
34:58
on all re which and there you can click
35:02
on that and you can find out
35:07
what's going on and the 531 West 25th
35:15
Street in the mid district and you can
35:19
then click on it and find out what that
35:21
is by the way since this is typically
35:26
done as a Google innovation all of that
35:31
is called a beta version they
35:34
have over a hundred beta applications as
35:37
you see report bad item they are asking
35:40
the customers this is one of the major
35:42
innovations on the part of the Google way
35:46
of doing things is the customer is
35:49
the test and the tester of
35:52
the application what is becoming very
36:01
important however from a standpoint
36:05
of the future has to deal with what's
36:09
called semantic parsing because if you
36:13
will just index on the keywords you
36:17
would get just too many answers and
36:19
therefore you really have to parse
36:21
things in context and in this particular
36:25
case somebody wanted to know Bay Area
36:28
cooking classes and apparently there are
36:33
related items this is called semantic
36:37
parsing this is how your brain works
36:38
your brain basically particularly parses
36:42
what people say and relates what you
36:45
don't know to what you know now going to
36:50
principle number three shared data
36:52
models I cannot
36:56
overestimate the importance of this
36:59
particular development particularly from
37:01
the standpoint of national security you
37:04
need to have a standard file system on
37:09
the hundreds of thousands of servers in
37:11
order to cooperate with one another now
37:14
the text may be written in Arabic or
37:18
in English or whatever else but nevertheless
37:23
from a structural standpoint they must
37:25
be interoperable and they must be able
37:29
to be scheduled so that somebody asking
37:32
a particular question can get the answer
37:34
and therefore what I'm trying to tell
37:38
you here if you ever want to build for
37:41
an organization like this a Google-like
37:44
system you have to think very hard about
37:47
the kind of environments
37:50
that you are going to put into the
37:52
infrastructure in order for that
37:55
environment to be responsive to what
37:58
increasingly in the battlefield becomes
38:01
totally ad-hoc unprecedented inquiry or
38:07
request for information
38:13
so the entire engine of detailed
38:18
profiles depends on a metadata
38:22
directory I don't want to go into the
38:25
subject of metadata directory this is
38:27
very important I must tell you that the
38:30
partners of the range sustained a great
38:32
deal of money
38:33
in the 1991-1992 period of building a
38:37
metadata directory so that friendly-fire
38:42
would not take place because the data
38:44
for the coordinates were not
38:47
compatible
38:49
I'm alluding to something that really
38:51
happened so that means that the data
38:55
transfers if you have a thousand servers
39:00
cooperating the data transfers from
39:04
machine to machine must take place at
39:07
the Machine level because of the speed
39:10
again if you're shooting for redundancy
39:13
accountability and low latency data
39:18
transfers cannot negotiate meaning and
39:21
that means that you must have an
39:24
extremely fast processing taking place
39:28
within the system in order to be able to
39:31
execute the transactions and again the
39:37
chunks as they are being processed are
39:39
replicated again in case something gets
39:41
lost so the entire network is
39:44
probabilistic not deterministic they are
39:51
here is an example for its data
39:54
dictionary for interoperability and by
39:57
the way this is this is the data tags
40:02
which are available to developers so for
40:07
instance
40:08
the Google Maps environment is now
40:11
available to developers to integrate
40:13
into their own application now in order
40:18
for my application which may be
40:22
organization specific and for instance
40:27
if I'm the head of Maxwell House coffee
40:29
and I want to know which stores or which
40:32
neighborhoods are deep consumers of Maxwell
40:35
House one-pound drip
40:40
coffee cans I would then
40:44
construct an application and then go to
40:47
Google Maps using their tags in order to
40:50
feed my Maxwell House application from
40:54
the lab so that the map display that it
40:58
shows where the stores are where the
41:00
coffee drinkers are which leads me
41:05
really to the whole issue of APIs this
41:09
is the key to the future of
41:12
network-centric operation you cannot
41:17
rely on a single organization Google
41:20
has only six thousand employees and
41:23
they are very choosy about over the edge
41:26
of the paper the objective really is to
41:30
engage developers thousands and tens of
41:33
thousands if you want to take one hot tip
41:36
away from a career standpoint from this
41:39
presentation the young student series if
41:43
you really want to make lots of money
41:45
then sign up as a Google developer and so
41:56
you can then create environments by
42:00
using their hooks into the database
42:03
which is out there which is growing and
42:07
then coming in
42:10
interfacing the environment to this API
42:15
then fix the massive parallel computer with
42:22
your particular application courses
42:26
feeds directly into principle number
42:28
four which is upgrading innovation this
42:32
thing will not work unless it's dynamic
42:36
you must understand there are lots of
42:38
search engines out there search engines
42:42
are not that hard to put up what is
42:45
different here is we've created a
42:49
massive parallel computer that's growing
42:51
and that's sucking in data and then
42:54
makes available the possibility to very
42:59
quickly use the infrastructure
43:02
without incurring the cost of the
43:04
infrastructure and then grow your
43:05
application on top of it here's a
43:08
partial list which you want to become
43:11
acquainted with some of the very neat
43:15
very smart
43:18
many of these are still experiments but
43:22
many of these online services have been
43:27
fielded in less than three months by a
43:31
team of programmers well the list
43:33
density of four people they have a
43:37
server and plus tree developed it's
43:39
strictly for development and all the
43:43
Google people are encouraged to spend at
43:47
least one day a week paid time to play
43:51
around with innovation it's a fun thing
43:55
to do
43:56
you need to do I'm lucky enough to have
43:57
that privilege and so you can then do an
44:02
infinite number of things this one I
44:05
like this is called Froogle and all these
44:10
things get very cute of course but you
44:13
can now use the master computer to
44:17
extract from Samsung who's a vendor not
44:22
only the catalog of what they have but
44:27
also the manuals for instance you know I've
44:30
got lots of the electronic widgets and
44:32
you know instead of this big folder
44:33
full of manuals which were always
44:37
obsolete I don't have any manuals
44:39
anymore I just go and if I look for
44:42
something I just
44:43
Froogle the manual for a particular
44:46
contraption I have and I get the latest
44:48
release and can analyze it and of course
44:51
it tells me where you can buy it and
44:54
then you can put cute things like the
44:55
price range you on the luggage stores
44:59
which are nearby and so forth and so
45:01
forth and of course you into
45:04
experimentation this is clearly an
45:10
environment that encourages innovation
45:14
and should be a prototype of how our new
45:21
transformational efforts in both the
45:23
national intelligence and in defense
45:25
should be guided here is an example
45:33
I mentioned that I will put all of your
45:35
inquiries on my blog
45:43
again this is an application which is
45:48
inside Google in 15 minutes
45:54
I can put up an application with my
45:56
picture with my archives and with a
45:59
whole history publish the bulletin board
46:20
now all this thing is sort of
46:23
interesting because there's a big how
46:26
far you can only go from the master
46:29
computer you actually for certain
46:32
applications you have to stop occupying
46:35
the desktop you have to lost God
46:37
occupying the input up here is a Google
46:42
video viewer I don't want to get into
46:46
the details it is very sophisticated but
46:49
it means that in certain instances you
46:52
would have to put a handle into your
46:56
machine in order to interface with a
46:58
particular application now whether this
47:01
is an attack of like yourself is being
47:03
debated as we stand here and of course
47:08
that means that for the first time I
47:11
identify myself as a person maybe
47:17
membership or security in participation
47:23
network centric environment so let me
47:28
now conclude I understand that I had 60
47:34
minutes
47:39
let me leave with you a set of
47:42
comparisons so that you could relate
47:45
what you have today the world today can
47:54
be represented by this diagram the blue
47:58
dots are servers mostly Microsoft
48:02
servers and then each service a little
48:05
island owned by a CIO or by an
48:09
operator or by a union or contracted to
48:12
a particular a contractor and so forth
48:15
there's zillions of them all over the
48:17
world providing employment to people
48:21
who then must do the updating do the
48:27
desertification or whatever you call the
48:30
elimination of doing the updates that
48:35
come in almost weekly and so forth and
48:38
so this is the application workgroup
48:42
computing today these millions of local
48:44
applications and local data the only
48:47
analogy to this world is you know in the
48:51
12th century where every little town
48:54
had their own shoemaker
48:56
their own cabinet maker and their own
49:00
spinner and so forth
49:03
no economies of scale and this was
49:08
basically a craft environment the
49:13
problem with that environment is it is very
49:16
vulnerable when you look at the
49:20
vulnerability today this is just keep
49:23
out is you see that in order for these
49:28
little enclaves of computing to manage
49:34
themselves in an era of increased
49:37
complexity are they heading covered in
49:41
complexity and any hardware and every
49:44
time they do it they become more
49:46
vulnerable so we are coming to an end of
49:50
a period where both dis da balloon
49:54
applications in elements faces 92
49:58
percent of all the desktops in the world
50:00
are Microsoft desktops and the
50:04
cross-platform applications are
50:06
increasingly vulnerable to attack and
50:09
compromise by the way the other systems
50:13
certainly unique new systems have other
50:16
and therefore Cisco system that deal
50:20
with the switching are also vulnerable but
50:23
the fact that we have designed network
50:26
switching zipper on the Cisco from the
50:30
WANs and from the LANs means that
50:33
each of them standing on their own each
50:36
be given time had to have their own
50:41
defenses their own moats their own
50:45
guards and so forth where the Turks were
50:47
coming and didn't do very well when the
50:51
Turks came in massive like I'm looking
50:54
from history of the town where I come
50:56
from the new Internet as I see it is that
51:02
there's going to be billions of browsers
51:05
and I mean literally billions every cell
51:09
phone is a browser and you have browsers
51:13
which are music browsers and I polished
51:17
and then you there an unlimited in
51:21
degree of imagination for doing this
51:23
thing and they have to all share because
51:27
each of these browsers has to be a window
51:31
into a multiplicity of functions and
51:34
therefore that is where we are going is
51:37
an architecture so where does it go
51:42
ahead the strategy of the last 20 years
51:46
is being captured in desktop it was done
51:48
very successfully very constructively by
51:51
Microsoft and they deserve all the
51:55
billions of profits that economy the new
52:00
era is that the desktop is not a
52:04
sustainable defense position for anybody
52:07
anymore
52:08
and the only defensible position
52:12
today these dozen can occupy the
52:14
Internet and by the way this doesn't
52:16
have to be one company multiple
52:19
organizations can occupy the Internet an
52:23
internet would be a different internet
52:25
than what we have today now what is
52:28
different from an economic standpoint is
52:30
in the workgroup-centric environment
52:33
the vendor sells you the software
52:35
license and you put in the labor and
52:39
capital costs you take the risk the
52:43
vendor takes no risk at all and you have
52:49
no recourse the network centric
52:52
environment fundamentally changes the
52:55
economics by placing the labor and
52:58
capital into the network with huge
53:02
economies of scale and where the
53:04
knowledge capital which is a term I will
53:08
be using in my next presentation by the
53:11
third lecture is really dominating the
53:14
solution the workgroup-centric
53:18
environment created isolated silos or
53:23
whatever term is being used these days
53:25
which are using specific infrastructures
53:29
the problem is that the infrastructures
53:33
are trying to communicate with one
53:35
another and it's done with great
53:37
difficulty
53:38
you must have an infrastructure that
53:41
is universal certainly from a national
53:43
security standpoint that is necessary
53:47
now the problem with workgroup-centric
53:50
environment is there is too much labor
53:52
too many consultants too many interns no
53:57
disrespect
54:00
I'll pay for their tuition no it just
54:05
means that the users put moats around
54:08
this little castle and there is lots of
54:11
labor to control what happens inside the
54:14
castle Google approach basically is that
54:18
this is too complicated cannot be done
54:22
by users or by people it is to be
54:25
automated the workgroup environment its
54:30
operating system depends in implement
54:34
the network-centric environment must be
54:37
open source browsers you have a totally
54:41
different economic model which really
54:43
deals with demand pricing and most
54:49
importantly has to do with intelligence
54:52
data read from file has no context it is
54:58
data and then your brain has to look
55:01
at the gaze of the screen and figure out
55:03
what it means the amount of data
55:06
increases exponentially the brain cycles
55:09
are not really enough to deal with that
55:11
so what you need to do is the sample
55:14
data holding context if you are an
55:16
infantry platoon leader in a ditch
55:20
somewhere in front of some godforsaken
55:22
village you don't want to have a data
55:26
dump from the satellite in gigabytes you need
55:31
only an answer to a very simple question
55:33
about is this village safe or is it
55:36
being tested out and do we have someone
55:40
in there that we can trust so the future
55:47
as I see it is that everything will be on
55:51
the Internet everything your cell phone
55:55
your oven your refrigerator
56:00
everything the selling product
56:04
electric meter on your car for
56:07
maintenance real-time maintenance doing
56:11
preventive maintenance or the karim
56:12
disorders have today you know their cars
56:15
we have GPS built-in and diagnostics
56:21
transmits to General Motors if
56:25
you paid enough for a car you get that
56:27
kind of service and that means that all
56:30
data voice video and sensor input will
56:32
be accessible selectively as needed and
56:36
that means that if you want to be in the
56:38
telephone business the TV business on
56:41
the print business and newspaper
56:42
business you better get yourself a
56:45
second job to get ready for the time
56:48
when people will stop this missing
56:50
staffs from these organizations the
56:55
future is in services looking technology
56:58
services that respond to questions as
57:02
needed by the consumer information is
57:07
displayed in the context that is
57:09
relevant to the culture personality and
57:12
habits of the customer and the
57:15
applications are there for making
57:18
decisions what do I buy where do I go
57:21
lovely boy
57:25
now why is this important a whole
57:31
national security environment whether
57:34
it's Homeland Security Department of
57:36
Defense
57:36
intelligence really depends on the
57:40
ability to have a superior intelligence
57:43
in order to deal with the challenges
57:46
competitive challenges which are
57:48
economic and terrorist threats of the
57:50
21st century we cannot do it with
57:54
workgroups anymore we must transform in
57:58
the Turkish transform to Network centric
58:00
services and we don't need much time to
58:02
do it now you cannot just go in and blow
58:06
up what you have today you must be able
58:09
to migrate and the way you migrate your
58:13
vibrates with displacement leave what
58:16
you had and put the new stuff and then
58:19
build it and they will come as those
58:22
of you who have seen the movie build it
58:25
and see and let the customer decide
58:28
whether they get viable service and then
58:31
pocket the money which these days is getting
58:34
scarcer and scarcer invested in
58:36
innovation so in conclusion then I hope
58:40
that the relevance of Google as a future
58:45
vision of the environment it would be
58:49
something that will be useful to you and
58:51
I certainly wish you well and I will
58:53
answer any question that you submit to
58:55
me by email
David
00:12
It feels like we're all suffering from information overload or data glut. And the good news is there might be
an easy solution to that, and that's using our eyes more. So, visualizing information, so that we can see the
patterns and connections that matter and then designing that information so it makes more sense, or it tells
a story, or allows us to focus only on the information that's important. Failing that, visualized information
can just look really cool.
00:38
So, let's see. This is the $Billion Dollar o-Gram, and this image arose out of frustration I had with the
reporting of billion-dollar amounts in the press. That is, they're meaningless without context: 500 billion for
this pipeline, 20 billion for this war. It doesn't make any sense, so the only way to understand it is visually and
relatively. So I scraped a load of reported figures from various news outlets and then scaled the boxes
according to those amounts. And the colors here represent the motivation behind the money. So purple is
"fighting," and red is "giving money away," and green is "profiteering." And what you can see straight
away is you start to have a different relationship to the numbers. You can literally see them. But more
importantly, you start to see patterns and connections between numbers that would otherwise be scattered
across multiple news reports.
01:30
Let me point out some that I really like. This is OPEC's revenue, this green box here -- 780 billion a year. And
this little pixel in the corner -- three billion -- that's their climate change fund. Americans, incredibly
generous people -- over 300 billion a year, donated to charity every year, compared with the amount of
foreign aid given by the top 17 industrialized nations at 120 billion. Then of course, the Iraq War, predicted to
cost just 60 billion back in 2003. And it mushroomed slightly. Afghanistan and Iraq mushroomed now to
3,000 billion. So now it's great because now we have this texture, and we can add numbers to it as well. So
we could say, well, a new figure comes out ... let's see African debt. How much of this diagram do you think
might be taken up by the debt that Africa owes to the West? Let's take a look. So there it is: 227 billion is
what Africa owes. And the recent financial crisis, how much of this diagram might that figure take up? What
has that cost the world? Let's take a look at that. Dooosh -- Which I think is the appropriate sound effect for
that much money: 11,900 billion. So, by visualizing this information, we turned it into a landscape that you
can explore with your eyes, a kind of map really, a sort of information map. And when you're lost in
information, an information map is kind of useful.
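The box scaling described above can be sketched in a few lines of Python. The amounts are the ones quoted in the talk; the assumption that each box's area (rather than its side) encodes the amount, and the names `amounts` and `box_side`, are illustrative choices and not taken from the original graphic.

```python
import math

# Amounts in billions of dollars, as quoted in the talk.
amounts = {
    "OPEC revenue": 780,
    "OPEC climate change fund": 3,
    "US charitable giving": 300,
    "Foreign aid, top 17 nations": 120,
    "African debt": 227,
    "Financial crisis": 11900,
}


def box_side(amount, pixels_per_billion=1.0):
    # Assumed encoding: area is proportional to the amount,
    # so the side of a square box is the square root of the scaled amount.
    return math.sqrt(amount * pixels_per_billion)


sides = {label: box_side(value) for label, value in amounts.items()}

# List the boxes from largest to smallest.
for label, side in sorted(sides.items(), key=lambda item: -item[1]):
    print(f"{label:30s} {side:7.1f} px per side")
```

Scaling the side directly by the amount would exaggerate the differences, because the eye compares areas; taking the square root keeps the visual ratio equal to the numeric ratio.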
02:55
So I want to show you another landscape now. We need to imagine what a landscape of the world's fears
might look like. Let's take a look. This is Mountains Out of Molehills, a timeline of global media
panic. (Laughter) So, I'll label this for you in a second. But the height here, I want to point out, is the intensity
of certain fears as reported in the media. Let me point them out. So this, swine flu -- pink. Bird flu. SARS -- brownish here. Remember that one? The millennium bug, terrible disaster. These little green peaks are
asteroid collisions. (Laughter) And in summer, here, killer wasps.
03:42
(Laughter)
03:50
So these are what our fears look like over time in our media. But what I love -- and I'm a journalist -- and
what I love is finding hidden patterns; I love being a data detective. And there's a very interesting and odd
pattern hidden in this data that you can only see when you visualize it. Let me highlight it for you. See this
line, this is a landscape for violent video games. As you can see, there's a kind of odd, regular pattern in the
data, twin peaks every year. If we look closer, we see those peaks occur at the same month every
year. Why? Well, November, Christmas video games come out, and there may well be an upsurge in the
concern about their content. But April isn't a particularly massive month for video games. Why April? Well, in
April 1999 was the Columbine shooting, and since then, that fear has been remembered by the media and
echoes through the group mind gradually through the year. You have retrospectives, anniversaries, court
cases, even copy-cat shootings, all pushing that fear into the agenda. And there's another pattern here as
well. Can you spot it? See that gap there? There's a gap, and it affects all the other stories. Why is there a
gap there? You see where it starts? September 2001, when we had something very real to be scared about.
05:06
So, I've been working as a data journalist for about a year, and I keep hearing a phrase all the time, which is
this: "Data is the new oil." Data is the kind of ubiquitous resource that we can shape to provide new
innovations and new insights, and it's all around us, and it can be mined very easily. It's not a particularly
great metaphor in these times, especially if you live around the Gulf of Mexico, but I would, perhaps, adapt
this metaphor slightly, and I would say that data is the new soil. Because for me, it feels like a fertile, creative
medium. Over the years, online, we've laid down a huge amount of information and data, and we irrigate it
with networks and connectivity, and it's been worked and tilled by unpaid workers and governments. And,
all right, I'm kind of milking the metaphor a little bit. But it's a really fertile medium, and it feels like
visualizations, infographics, data visualizations, they feel like flowers blooming from this medium. But if you
look at it directly, it's just a lot of numbers and disconnected facts. But if you start working with it and
playing with it in a certain way, interesting things can appear and different patterns can be revealed.
06:14
Let me show you this. Can you guess what this data set is? What rises twice a year, once in Easter and then
two weeks before Christmas, has a mini peak every Monday, and then flattens out over the summer? I'll take
answers. (Audience: Chocolate.) David McCandless: Chocolate. You might want to get some chocolate
in. Any other guesses? (Audience: Shopping.) DM: Shopping. Yeah, retail therapy might help. (Audience:
Sick leave.) DM: Sick leave. Yeah, you'll definitely want to take some time off. Shall we see?
06:50
(Laughter)
06:58
(Applause)
07:01
So, the information guru Lee Byron and myself, we scraped 10,000 Facebook status updates for the phrases
"break-up" and "broken-up" and this is the pattern we found -- people clearing out for Spring
Break, (Laughter) coming out of very bad weekends on a Monday, being single over the summer, and then
the lowest day of the year, of course: Christmas Day. Who would do that? So there's a titanic amount of data
out there now, unprecedented. But if you ask the right kind of question, or you work it in the right kind of
way, interesting things can emerge.
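A small Python sketch of the kind of counting this implies: filter status updates for the two phrases and tally the matches per day and per weekday. The sample `updates` list is invented for illustration, and the helper name `mentions_breakup` is hypothetical; the real analysis used scraped Facebook data that is not reproduced here.

```python
from collections import Counter
from datetime import date

PHRASES = ("break-up", "broken-up")  # the two phrases named in the talk

# Hypothetical sample; real input would be the scraped status updates.
updates = [
    (date(2010, 3, 1), "Worst weekend ever, we had a break-up"),
    (date(2010, 3, 1), "Finally broken-up, moving on"),
    (date(2010, 3, 2), "Great lunch today"),
    (date(2010, 12, 13), "Another break-up before the holidays"),
]


def mentions_breakup(text):
    # Case-insensitive substring match on either phrase.
    lowered = text.lower()
    return any(phrase in lowered for phrase in PHRASES)


# Matching updates per calendar day.
per_day = Counter(day for day, text in updates if mentions_breakup(text))

# Matching updates per weekday, where 0 is Monday and 6 is Sunday.
per_weekday = Counter(day.weekday() for day in per_day.elements())

print(per_day.most_common())
print(per_weekday.most_common())
```

Plotting `per_day` over a year is what would reveal seasonal peaks, while `per_weekday` would reveal a weekly rhythm such as the Monday bump mentioned in the talk.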
07:41
So information is beautiful. Data is beautiful. I wonder if I could make my life beautiful. And here's my visual
C.V. I'm not quite sure I've succeeded. Pretty blocky, the colors aren't that great. But I wanted to convey
something to you. I started as a programmer, and then I worked as a writer for many years, about 20
years, in print, online and then in advertising, and only recently have I started designing. And I've never been
to design school. I've never studied art or anything. I just kind of learned through doing. And when I started
designing, I discovered an odd thing about myself. I already knew how to design, but it wasn't like I was
amazingly brilliant at it, but more like I was sensitive to the ideas of grids and space and alignment and
typography. It's almost like being exposed to all this media over the years had instilled a kind of dormant
design literacy in me. And I don't feel like I'm unique.
08:36
I feel that every day, all of us now are being blasted by information design. It's being poured into our eyes
through the Web, and we're all visualizers now; we're all demanding a visual aspect to our
information. There's something almost quite magical about visual information. It's effortless, it literally
pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely
data visualization, it's a relief, it's like coming across a clearing in the jungle. I was curious about this, so it led
me to the work of a Danish physicist called Tor Norretranders, and he converted the bandwidth of the
senses into computer terms.
09:16
So here we go. This is your senses, pouring into your senses every second. Your sense of sight is the
fastest. It has the same bandwidth as a computer network. Then you have touch, which is about the speed of
a USB key. And then you have hearing and smell, which has the throughput of a hard disk. And then you
have poor old taste, which is like barely the throughput of a pocket calculator. And that little square in the
corner, a naught .7 percent, that's the amount we're actually aware of. So a lot of your vision -- the bulk of it
is visual, and it's pouring in. It's unconscious. The eye is exquisitely sensitive to patterns in variations in color,
shape and pattern. It loves them, and it calls them beautiful. It's the language of the eye. If you combine the
language of the eye with the language of the mind, which is about words and numbers and concepts, you
start speaking two languages simultaneously, each enhancing the other. So, you have the eye, and then you
drop in the concepts. And that whole thing -- it's two languages both working at the same time.
10:18
So we can use this new kind of language, if you like, to alter our perspective or change our views. Let me ask
you a simple question with a really simple answer: Who has the biggest military budget? It's got to be
America, right? Massive. 609 billion in 2008 -- 607, rather. So massive, in fact, that it can contain all the other
military budgets in the world inside itself. Gobble, gobble, gobble, gobble, gobble. Now, you can see Africa's
total debt there and the U.K. budget deficit for reference. So that might well chime with your view that
America is a sort of warmongering military machine, out to overpower the world with its huge industrial-military complex. But is it true that America has the biggest military budget? Because America is an
incredibly rich country. In fact, it's so massively rich that it can contain the four other top industrialized
nations' economies inside itself, it's so vastly rich. So its military budget is bound to be enormous. So, to be
fair and to alter our perspective, we have to bring in another data set, and that data set is GDP, or the
country's earnings. Who has the biggest budget as a proportion of GDP? Let's have a look. That changes the
picture considerably. Other countries pop into view that you, perhaps, weren't considering, and America
drops into eighth.
11:33
Now you can also do this with soldiers. Who has the most soldiers? It's got to be China. Of course, 2.1
million. Again, chiming with your view that China has a militarized regime ready to, you know, mobilize its
enormous forces. But of course, China has an enormous population. So if we do the same, we see a radically
different picture. China drops to 124th. It actually has a tiny army when you take other data into
consideration. So, absolute figures, like the military budget, in a connected world, don't give you the whole
picture. They're not as true as they could be.
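As a rough illustration of the point about absolute versus relative figures, the sketch below ranks a few countries first by raw military budget and then by budget as a share of GDP. The country names and every number in it are hypothetical, invented only to show how the ordering can flip; they are not the figures from the talk's visualization, and the helper names are assumptions made for this example.

```python
from typing import NamedTuple


class Country(NamedTuple):
    name: str
    military_budget: float  # absolute spending, any consistent currency unit
    gdp: float              # same currency unit as military_budget


def rank_by_absolute(countries):
    """Order country names by raw military budget, largest first."""
    ordered = sorted(countries, key=lambda c: c.military_budget, reverse=True)
    return [c.name for c in ordered]


def rank_by_share_of_gdp(countries):
    """Order country names by budget divided by GDP, largest first."""
    ordered = sorted(
        countries, key=lambda c: c.military_budget / c.gdp, reverse=True
    )
    return [c.name for c in ordered]


# Hypothetical, made-up figures purely for illustration (not real data).
EXAMPLE = [
    Country("Richland", military_budget=600.0, gdp=15000.0),  # 4% of GDP
    Country("Midland", military_budget=60.0, gdp=1000.0),     # 6% of GDP
    Country("Smallland", military_budget=5.0, gdp=50.0),      # 10% of GDP
]

absolute_order = rank_by_absolute(EXAMPLE)
relative_order = rank_by_share_of_gdp(EXAMPLE)

print("By absolute budget:", absolute_order)
print("By share of GDP:   ", relative_order)
```

With these invented inputs the biggest absolute spender ends up last once spending is divided by GDP, which is the kind of reversal described above. The same pattern would apply to soldiers per head of population.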
12:07
We need relative figures that are connected to other data so that we can see a fuller picture, and then that
can lead to us changing our perspective. As Hans Rosling, the master, my master, said, "Let the dataset
change your mindset." And if it can do that, maybe it can also change your behavior.
12:26
Take a look at this one. I'm a bit of a health nut. I love taking supplements and being fit, but I can never
understand what's going on in terms of evidence. There's always conflicting evidence. Should I take vitamin
C? Should I be taking wheatgrass? This is a visualization of all the evidence for nutritional supplements. This
kind of diagram is called a balloon race. So the higher up the image, the more evidence there is for each
supplement. And the bubbles correspond to popularity as regards to Google hits. So you can immediately
apprehend the relationship between efficacy and popularity, but you can also, if you grade the evidence, do
a "worth it" line. So supplements above this line are worth investigating, but only for the conditions listed
below, and then the supplements below the line are perhaps not worth investigating.
13:17
Now this image constitutes a huge amount of work. We scraped like 1,000 studies from PubMed, the
biomedical database, and we compiled them and graded them all. And it was incredibly frustrating for
me because I had a book of 250 visualizations to do for my book, and I spent a month doing this, and I only
filled two pages. But what it points to is that visualizing information like this is a form of knowledge
compression. It's a way of squeezing an enormous amount of information and understanding into a small
space. And once you've curated that data, and once you've cleaned that data, and once it's there, you can do
cool stuff like this.
13:55
So I converted this into an interactive app, so I can now generate this application online -- this is the
visualization online -- and I can say, "Yeah, brilliant." So it spawns itself. And then I can say, "Well, just show
me the stuff that affects heart health." So let's filter that out. So heart is filtered out, so I can see if I'm
curious about that. I think, "No, no. I don't want to take any synthetics, I just want to see plants and -- just
show me herbs and plants. I've got all the natural ingredients." Now this app is spawning itself from the
data. The data is all stored in a Google Doc, and it's literally generating itself from that data. So the data is
now alive; this is a living image, and I can update it in a second. New evidence comes out. I just change a row
on a spreadsheet. Doosh! Again, the image recreates itself. So it's cool. It's kind of living.
14:46
But it can go beyond data, and it can go beyond numbers. I like to apply information visualization to ideas
and concepts. This is a visualization of the political spectrum, an attempt for me to try and understand how
it works and how the ideas percolate down from government into society and culture, into families, into
individuals, into their beliefs and back around again in a cycle. What I love about this image is it's made up of
concepts, it explores our worldviews and it helps us -- it helps me anyway -- to see what others think, to see
where they're coming from. And it feels just incredibly cool to do that.
15:28
What was most exciting for me designing this was that, when I was designing this image, I desperately
wanted this side, the left side, to be better than the right side -- being a journalist, a Left-leaning person --
but I couldn't, because I would have created a lopsided, biased diagram. So, in order to really create a full
image, I had to honor the perspectives on the right-hand side and at the same time, uncomfortably
recognize how many of those qualities were actually in me, which was very, very annoying and
uncomfortable. (Laughter) But not too uncomfortable, because there's something unthreatening about
seeing a political perspective, versus being told or forced to listen to one. You're capable of holding
conflicting viewpoints joyously when you can see them. It's even fun to engage with them because it's
visual. So that's what's exciting to me, seeing how data can change my perspective and change my mind
midstream -- beautiful, lovely data.
16:35
So, just to wrap up, I wanted to say that it feels to me that design is about solving problems and providing
elegant solutions, and information design is about solving information problems. It feels like we have a lot of
information problems in our society at the moment, from the overload and the saturation to the breakdown
of trust and reliability and runaway skepticism and lack of transparency, or even just interestingness. I mean,
I find information just too interesting. It has a magnetic quality that draws me in.
17:06
So, visualizing information can give us a very quick solution to those kinds of problems. Even when the
information is terrible, the visual can be quite beautiful. Often we can get clarity or the answer to a simple
question very quickly, like this one, the recent Icelandic volcano. Which was emitting the most CO2? Was it
the planes or the volcano, the grounded planes or the volcano? So we can have a look. We look at the data
and we see: Yep, the volcano emitted 150,000 tons; the grounded planes would have emitted 345,000 if they
were in the sky. So essentially, we had our first carbon-neutral volcano.
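The comparison can be checked with a couple of lines of arithmetic. This is only a sketch using the two figures quoted above (in tons of CO2); the variable names are assumptions made for this example, and nothing beyond those two numbers comes from the talk.

```python
# Figures as quoted in the talk, in tons of CO2.
VOLCANO_TONS = 150_000
GROUNDED_PLANES_TONS = 345_000

# Net change in emissions: what the volcano added minus what the
# grounded planes would have emitted had they flown.
net_change = VOLCANO_TONS - GROUNDED_PLANES_TONS

# How many times larger the avoided plane emissions are than the
# volcano's own emissions.
ratio = GROUNDED_PLANES_TONS / VOLCANO_TONS

print(f"Net change: {net_change} tons")
print(f"Avoided plane emissions are {ratio:.1f}x the volcano's")
```

A negative net change means that, on these figures, the emissions avoided by grounding the planes outweighed what the volcano emitted.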
17:46
(Laughter)
17:48
(Applause)
17:57
And that is beautiful. Thank you.
18:00
(Applause)
Julia
00:00
So I'd like you to imagine for a moment that you're a soldier in the heat of battle. Maybe you're a Roman
foot soldier or a medieval archer or maybe you're a Zulu warrior. Regardless of your time and place, there
are some things that are constant. Your adrenaline is elevated, and your actions are stemming from these
deeply ingrained reflexes, reflexes rooted in a need to protect yourself and your side and to defeat the
enemy.
00:30
So now, I'd like you to imagine playing a very different role, that of the scout. The scout's job is not to attack
or defend. The scout's job is to understand. The scout is the one going out, mapping the terrain, identifying
potential obstacles. And the scout may hope to learn that, say, there's a bridge in a convenient location
across a river. But above all, the scout wants to know what's really there, as accurately as possible. And in a
real, actual army, both the soldier and the scout are essential. But you can also think of each of these roles as
a mindset -- a metaphor for how all of us process information and ideas in our daily lives. What I'm going to
argue today is that having good judgment, making accurate predictions, making good decisions, is mostly
about which mindset you're in.
01:26
To illustrate these mindsets in action, I'm going to take you back to 19th-century France, where this
innocuous-looking piece of paper launched one of the biggest political scandals in history. It was discovered
in 1894 by officers in the French general staff. It was torn up in a wastepaper basket, but when they pieced it
back together, they discovered that someone in their ranks had been selling military secrets to Germany.
01:54
So they launched a big investigation, and their suspicions quickly converged on this man, Alfred Dreyfus. He
had a sterling record, no past history of wrongdoing, no motive as far as they could tell. But Dreyfus was the
only Jewish officer at that rank in the army, and unfortunately at this time, the French Army was highly anti-Semitic. They compared Dreyfus's handwriting to that on the memo and concluded that it was a
match, even though outside professional handwriting experts were much less confident in the similarity, but
never mind that. They went and searched Dreyfus's apartment, looking for any signs of espionage. They
went through his files, and they didn't find anything. This just convinced them more that Dreyfus was not
only guilty, but sneaky as well, because clearly he had hidden all of the evidence before they had managed
to get to it.
02:45
Next, they went and looked through his personal history for any incriminating details. They talked to his
teachers, they found that he had studied foreign languages in school, which clearly showed a desire to
conspire with foreign governments later in life. His teachers also said that Dreyfus was known for having a
good memory, which was highly suspicious, right? You know, because a spy has to remember a lot of
things.
03:12
So the case went to trial, and Dreyfus was found guilty. Afterwards, they took him out into this public
square and ritualistically tore his insignia from his uniform and broke his sword in two. This was called the
Degradation of Dreyfus. And they sentenced him to life imprisonment on the aptly named Devil's
Island, which is this barren rock off the coast of South America. So there he went, and there he spent his
days alone, writing letters and letters to the French government begging them to reopen his case so they
could discover his innocence. But for the most part, France considered the matter closed.
03:51
One thing that's really interesting to me about the Dreyfus Affair is this question of why the officers were so
convinced that Dreyfus was guilty. I mean, you might even assume that they were setting him up, that they
were intentionally framing him. But historians don't think that's what happened. As far as we can tell, the
officers genuinely believed that the case against Dreyfus was strong. Which makes you wonder: What does
it say about the human mind that we can find such paltry evidence to be compelling enough to convict a
man?
04:24
Well, this is a case of what scientists call "motivated reasoning." It's this phenomenon in which our
unconscious motivations, our desires and fears, shape the way we interpret information. Some information,
some ideas, feel like our allies. We want them to win. We want to defend them. And other information or
ideas are the enemy, and we want to shoot them down. So this is why I call motivated reasoning, "soldier
mindset."
04:51
Probably most of you have never persecuted a French-Jewish officer for high treason, I assume, but maybe
you've followed sports or politics, so you might have noticed that when the referee judges that your team
committed a foul, for example, you're highly motivated to find reasons why he's wrong. But if he judges that
the other team committed a foul -- awesome! That's a good call, let's not examine it too closely. Or, maybe
you've read an article or a study that examined some controversial policy, like capital punishment. And, as
researchers have demonstrated, if you support capital punishment and the study shows that it's not
effective, then you're highly motivated to find all the reasons why the study was poorly designed. But if it
shows that capital punishment works, it's a good study. And vice versa: if you don't support capital
punishment, same thing.
05:44
Our judgment is strongly influenced, unconsciously, by which side we want to win. And this is
ubiquitous. This shapes how we think about our health, our relationships, how we decide how to vote, what
we consider fair or ethical. What's most scary to me about motivated reasoning or soldier mindset, is how
unconscious it is. We can think we're being objective and fair-minded and still wind up ruining the life of an
innocent man.
06:13
However, fortunately for Dreyfus, his story is not over. This is Colonel Picquart. He's another high-ranking
officer in the French Army, and like most people, he assumed Dreyfus was guilty. Also like most people in
the army, he was at least casually anti-Semitic. But at a certain point, Picquart began to suspect: "What if
we're all wrong about Dreyfus?" What happened was, he had discovered evidence that the spying for
Germany had continued, even after Dreyfus was in prison. And he had also discovered that another officer in
the army had handwriting that perfectly matched the memo, much closer than Dreyfus's handwriting. So he
brought these discoveries to his superiors, but to his dismay, they either didn't care or came up with
elaborate rationalizations to explain his findings, like, "Well, all you've really shown, Picquart, is that there's
another spy who learned how to mimic Dreyfus's handwriting, and he picked up the torch of spying after
Dreyfus left. But Dreyfus is still guilty." Eventually, Picquart managed to get Dreyfus exonerated. But it took
him 10 years, and for part of that time, he himself was in prison for the crime of disloyalty to the army.
07:26
A lot of people feel like Picquart can't really be the hero of this story because he was an anti-Semite and
that's bad, which I agree with. But personally, for me, the fact that Picquart was anti-Semitic actually makes
his actions more admirable, because he had the same prejudices, the same reasons to be biased as his fellow
officers, but his motivation to find the truth and uphold it trumped all of that.
07:55
So to me, Picquart is a poster child for what I call "scout mindset." It's the drive not to make one idea win or
another lose, but just to see what's really there as honestly and accurately as you can, even if it's not pretty
or convenient or pleasant. This mindset is what I'm personally passionate about. And I've spent the last few
years examining and trying to figure out what causes scout mindset. Why are some people, sometimes at
least, able to cut through their own prejudices and biases and motivations and just try to see the facts and
the evidence as objectively as they can?
08:35
And the answer is emotional. So, just as soldier mindset is rooted in emotions like defensiveness or
tribalism, scout mindset is, too. It's just rooted in different emotions. For example, scouts are
curious. They're more likely to say they feel pleasure when they learn new information or an itch to solve a
puzzle. They're more likely to feel intrigued when they encounter something that contradicts their
expectations. Scouts also have different values. They're more likely to say they think it's virtuous to test your
own beliefs, and they're less likely to say that someone who changes his mind seems weak. And above all,
scouts are grounded, which means their self-worth as a person isn't tied to how right or wrong they are
about any particular topic. So they can believe that capital punishment works. If studies come out showing
that it doesn't, they can say, "Huh. Looks like I might be wrong. Doesn't mean I'm bad or stupid."
09:41
This cluster of traits is what researchers have found -- and I've also found anecdotally -- predicts good
judgment. And the key takeaway I want to leave you with about those traits is that they're primarily not
about how smart you are or about how much you know. In fact, they don't correlate very much with IQ at
all. They're about how you feel. There's a quote that I keep coming back to, by Saint-Exupéry. He's the
author of "The Little Prince." He said, "If you want to build a ship, don't drum up your men to collect wood
and give orders and distribute the work. Instead, teach them to yearn for the vast and endless sea."
10:26
In other words, I claim, if we really want to improve our judgment as individuals and as societies, what we
need most is not more instruction in logic or rhetoric or probability or economics, even though those things
are quite valuable. But what we most need to use those principles well is scout mindset. We need to change
the way we feel. We need to learn how to feel proud instead of ashamed when we notice we might have
been wrong about something. We need to learn how to feel intrigued instead of defensive when we
encounter some information that contradicts our beliefs.
11:04
So the question I want to leave you with is: What do you most yearn for? Do you yearn to defend your own
beliefs? Or do you yearn to see the world as clearly as you possibly can?
11:18
Thank you.
11:19
(Applause)