Susan 00:01 Technology has brought us so much: the moon landing, the Internet, the ability to sequence the human genome. But it also taps into a lot of our deepest fears, and about 30 years ago, the culture critic Neil Postman wrote a book called "Amusing Ourselves to Death," which lays this out really brilliantly. And here's what he said, comparing the dystopian visions of George Orwell and Aldous Huxley. He said, Orwell feared we would become a captive culture. Huxley feared we would become a trivial culture. Orwell feared the truth would be concealed from us, and Huxley feared we would be drowned in a sea of irrelevance. In a nutshell, it's a choice between Big Brother watching you and you watching Big Brother. (Laughter) 00:56 But it doesn't have to be this way. We are not passive consumers of data and technology. We shape the role it plays in our lives and the way we make meaning from it, but to do that, we have to pay as much attention to how we think as how we code. We have to ask questions, and hard questions, to move past counting things to understanding them. We're constantly bombarded with stories about how much data there is in the world, but when it comes to big data and the challenges of interpreting it, size isn't everything. There's also the speed at which it moves, and the many varieties of data types, and here are just a few examples: images, text, video, audio. And what unites this disparate types of data is that they're created by people and they require context. 01:57 Now, there's a group of data scientists out of the University of Illinois-Chicago, and they're called the Health Media Collaboratory, and they've been working with the Centers for Disease Control to better understand how people talk about quitting smoking, how they talk about electronic cigarettes, and what they can do collectively to help them quit. 
The interesting thing is, if you want to understand how people talk about smoking, first you have to understand what they mean when they say "smoking." And on Twitter, there are four main categories: number one, smoking cigarettes; number two, smoking marijuana; number three, smoking ribs; and number four, smoking hot women. (Laughter) 02:46 So then you have to think about, well, how do people talk about electronic cigarettes? And there are so many different ways that people do this, and you can see from the slide it's a complex kind of a query. And what it reminds us is that language is created by people, and people are messy and we're complex and we use metaphors and slang and jargon and we do this 24/7 in many, many languages, and then as soon as we figure it out, we change it up. 03:15 So did these ads that the CDC put on, these television ads that featured a woman with a hole in her throat and that were very graphic and very disturbing, did they actually have an impact on whether people quit? And the Health Media Collaboratory respected the limits of their data, but they were able to conclude that those advertisements — and you may have seen them — that they had the effect of jolting people into a thought process that may have an impact on future behavior. And what I admire and appreciate about this project, aside from the fact, including the fact that it's based on real human need, is that it's a fantastic example of courage in the face of a sea of irrelevance. 04:04 And so it's not just big data that causes challenges of interpretation, because let's face it, we human beings have a very rich history of taking any amount of data, no matter how small, and screwing it up. So many years ago, you may remember that former President Ronald Reagan was very criticized for making a statement that facts are stupid things. And it was a slip of the tongue, let's be fair. 
He actually meant to quote John Adams' defense of British soldiers in the Boston Massacre trials that facts are stubborn things. But I actually think there's a bit of accidental wisdom in what he said, because facts are stubborn things, but sometimes they're stupid, too. 04:51 I want to tell you a personal story about why this matters a lot to me. I need to take a breath. My son Isaac, when he was two, was diagnosed with autism, and he was this happy, hilarious, loving, affectionate little guy, but the metrics on his developmental evaluations, which looked at things like the number of words — at that point, none — communicative gestures and minimal eye contact, put his developmental level at that of a nine-month-old baby. And the diagnosis was factually correct, but it didn't tell the whole story. And about a year and a half later, when he was almost four, I found him in front of the computer one day running a Google image search on women, spelled "w-i-m-e-n." And I did what any obsessed parent would do, which is immediately started hitting the "back" button to see what else he'd been searching for. And they were, in order: men, school, bus and computer. And I was stunned, because we didn't know that he could spell, much less read, and so I asked him, "Isaac, how did you do this?" And he looked at me very seriously and said, "Typed in the box." 06:19 He was teaching himself to communicate, but we were looking in the wrong place, and this is what happens when assessments and analytics overvalue one metric — in this case, verbal communication — and undervalue others, such as creative problem-solving. Communication was hard for Isaac, and so he found a workaround to find out what he needed to know. And when you think about it, it makes a lot of sense, because forming a question is a really complex process, but he could get himself a lot of the way there by putting a word in a search box. 
06:59 And so this little moment had a really profound impact on me and our family because it helped us change our frame of reference for what was going on with him, and worry a little bit less and appreciate his resourcefulness more. 07:17 Facts are stupid things. And they're vulnerable to misuse, willful or otherwise. I have a friend, Emily Willingham, who's a scientist, and she wrote a piece for Forbes not long ago entitled "The 10 Weirdest Things Ever Linked to Autism." It's quite a list. The Internet, blamed for everything, right? And of course mothers, because. And actually, wait, there's more, there's a whole bunch in the "mother" category here. And you can see it's a pretty rich and interesting list. I'm a big fan of being pregnant near freeways, personally. The final one is interesting, because the term "refrigerator mother" was actually the original hypothesis for the cause of autism, and that meant somebody who was cold and unloving. 08:11 And at this point, you might be thinking, "Okay, Susan, we get it, you can take data, you can make it mean anything." And this is true, it's absolutely true, but the challenge is that we have this opportunity to try to make meaning out of it ourselves, because frankly, data doesn't create meaning. We do. So as businesspeople, as consumers, as patients, as citizens, we have a responsibility, I think, to spend more time focusing on our critical thinking skills. Why? Because at this point in our history, as we've heard many times over, we can process exabytes of data at lightning speed, and we have the potential to make bad decisions far more quickly, efficiently, and with far greater impact than we did in the past. Great, right? And so what we need to do instead is spend a little bit more time on things like the humanities and sociology, and the social sciences, rhetoric, philosophy, ethics, because they give us context that is so important for big data, and because they help us become better critical thinkers. 
Because after all, if I can spot a problem in an argument, it doesn't much matter whether it's expressed in words or in numbers. And this means teaching ourselves to find those confirmation biases and false correlations and being able to spot a naked emotional appeal from 30 yards, because something that happens after something doesn't mean it happened because of it, necessarily, and if you'll let me geek out on you for a second, the Romans called this "post hoc ergo propter hoc," after which therefore because of which. 10:10 And it means questioning disciplines like demographics. Why? Because they're based on assumptions about who we all are based on our gender and our age and where we live as opposed to data on what we actually think and do. And since we have this data, we need to treat it with appropriate privacy controls and consumer opt-in, and beyond that, we need to be clear about our hypotheses, the methodologies that we use, and our confidence in the result. As my high school algebra teacher used to say, show your math, because if I don't know what steps you took, I don't know what steps you didn't take, and if I don't know what questions you asked, I don't know what questions you didn't ask. And it means asking ourselves, really, the hardest question of all: Did the data really show us this, or does the result make us feel more successful and more comfortable? 11:11 So the Health Media Collaboratory, at the end of their project, they were able to find that 87 percent of tweets about those very graphic and disturbing anti-smoking ads expressed fear, but did they conclude that they actually made people stop smoking? No. It's science, not magic. 11:32 So if we are to unlock the power of data, we don't have to go blindly into Orwell's vision of a totalitarian future, or Huxley's vision of a trivial one, or some horrible cocktail of both. 
What we have to do is treat critical thinking with respect and be inspired by examples like the Health Media Collaboratory, and as they say in the superhero movies, let's use our powers for good. 12:05 Thank you.

Paul Strassman: When the history of computing will be written, I'm sure Google will be noted as a major milestone in the development of information science and information management. We are coming to an end of an era, and therefore we have to understand that Google represents the future of computing in a different way, and I'll explain why this is so. 00:28 I want to preface my remarks by saying that you should not buy Google stock. This is not a Google stock promotion here. Google, in fact, as a company may fail. That is not the point I want to make here. I want to just say they are the harbinger of change that will be imitated and copied and will then set the tone for many of your careers. 00:56 The students in here, judging from their age, represent maybe a generation that will be managing and providing leadership for information technology from year 2010 through year 2065. If that is so, then understanding how Google thinking changes the environment becomes a very important part of your education. 01:26 History is very important in understanding where we have been. Basically, the dimensions of change over the last 50 years can be quantified in terms of sources and time delay, or responsiveness. During the data-centric era we had hundreds of sources impacting a corporation, gradually going from monthly to weekly cycling of information as information technology [unclear] moved from finance, which was basically a monthly animal, to marketing, which became a weekly animal. 02:13 In 1980 the demand for responsiveness and utility shifted, and in this
particular era, which I, for lack of a better definition, call the workgroup-centric era, we have created millions of islands of automation centered around servers. But these were little cottage shops, some of them growing to substance but nevertheless being scattered, not integrated, not interoperable, not very reliable, and having a rather slow way of responding to external situations. This is what I call the Microsoft-Intel era. 03:08 The Google era, which I will be discussing today (again, this is symbolic), deals not with data or text but with multimedia. It deals with billions of sources of information and basically shrinks the information latency to real time. By the time we go into the generation of 2015 to 2025, real-time responsiveness then becomes the currency under which systems operate. 03:48 These systems I call network-centric. I use just one of many examples, but I'm sure you recognize this particular diagram. This is where real-time interaction between satellites, drones, aircraft, cruisers, guided missile launchers and so forth is necessary in order to execute a mission. 04:15 Now this is a military example, which is appropriate in the setting here in the Washington area, but even when you start looking at environments like Federal Express, for instance, they are all going to a network-centric real-time environment. So we are already moving in that direction. 04:36 Now here are the specifications, ladies and gentlemen, that you will have to deliver in your careers. First, the system that you deliver has to be extremely reliable: it has to be down less than five minutes a year. That's six to eight sigma reliability. You must be able to represent the real-time awareness of the situation in a very fine, very rapid, color, high
discrimination display. You must be able to tap into a multimedia environment at least at a gigabyte per second. The latency cannot exceed a quarter of a second globally, anywhere in the world. And if you want to innovate and change the environment, you must be able to innovate in less than a day, while at the same time assuring security to an extremely high level of fidelity. 05:48 So those are, ladies and gentlemen, the tasks before you, and you don't have much time to deliver, because there are customers out there who are involved in exactly meeting these kinds of specifications. 06:06 Now the good news is that it's an awesome opportunity, because when you compare what we have today in the client-server, workgroup kind of environment, we look pretty sick. Let me just give you the highlights of what I see as typical specifications. 06:27 First, when you look at any budget of any logical operation, and certainly when you look at any of the government budgets, especially in the Department of Defense, much of the effort of the IT spending is devoted strictly to staying alive and getting the thing patched so that it sort of doesn't fall apart. I call that the infrastructure. Over 50% of the money is being spent by people running around. During the break I talked to a number of you who are night students, who are systems administrators, who make a living by making sure that there's paper in the printers and that nobody has kicked the cables at the desk and so forth. You're part of the infrastructure, which is important, but there is no way of creating value just from the infrastructure. 07:28 I deliberated very much on the subject of what is the current performance of security. The only quantitative number I could find was an
[unclear] question mark. When you look at the cost of fielding an application, you find that much of the bridging of an application into a systems environment consists of integration, sort of just splicing things, performing neurosurgery on a runner who is bleeding. Now that's a good analogy. 08:10 And then of course you have network downtime, and I'm being very kind, but network downtime, particularly on email availability in many of the operational functions, is deplorable. And if you want to innovate anything, you have to have a feasibility study, and God help you if you have to go through DoD acquisition. You know that can extend infinitely and indefinitely. 08:43 So here is the gap between what is needed and what we have. The question is, can we get from where we are to where we are going? And the conclusion is very simple, namely that you cannot design network-centric systems with the existing workgroup-centric architecture. It's just not doable. You cannot design an airplane using railroad technology. Just another example: even good trains don't fly. 09:29 So this is a very important good-news, bad-news kind of a slide. The good news is that we know where we need to go. The bad news is that, for all those of you who are wedded in your careers to workgroup-centric architecture, what you study these days is going to be obsolete. So you might as well get ready for a different view of the world of the future. 10:00 And here are four principles which I see are manifested in Google, and there are of course many variants on this thing, but here it goes. First, if you want to have reliability, if you want to have uptime, if you want to have redundancy, you have to build and operate a protected information network. The current internet is not a
protected information environment. What you have is an outgrowth of a very clever DARPA, actually ARPA, research effort, built by the professors for professors and for students. All you have now is an extrapolation of that, and that is not going to be robust enough to meet the requirements. So if you want to build your secure environment on the existing second-generation, workgroup-environment internet, it won't work. You need a different environment. 11:11 Second, you have to offer universal connectivity for collection, processing and storing of information, and you must provide secure communications. 11:24 Now, principle number two is a mouthful, which leads really to principle number three, namely that in order to achieve interoperability in the collection, you must maintain shared data models. In other words, when you study the Bible, in the book of Genesis, chapter 11, those of you who are biblical scholars may remember that when the good Lord wanted to confound human affairs, he scrambled the data dictionary. Now that's a loose interpretation of the Bible, but I'm sure you get what I mean. We must have shared data models in order to have interoperability across various functions. 12:16 But the fourth is actually the most important one: this cannot be planned. The future cannot be planned in some kind of a master design. It has to be evolutionary. It cannot be obtained by issuing an RFP to the usual Beltway consultants saying, "Give me a network-centric design." It just cannot be done. The network-centric principle, as I will demonstrate in a moment, depends on continual upgrading, innovation and experimentation. 12:54 So what I will do, rather than talk in the abstract, is use the Google principles as an illustration
of how these principles are actually executed. 13:15 The first one is building and operating a protected information network. The fundamental reality about Google is that it is not a search application. Google is not a search application. It's the only way they make lots of money; it's the only way. It's the secret juice, the secret formula, for getting up what is basically a massive parallel processing environment consisting of clusters. In each cluster you have racks, and in each rack you have machines with multiple CPUs. Basically you have clusters, and whether they are 20 or 40, nobody knows. This is something that is held very tightly. But the underlying fact about the reality of the Google architecture is that it is a massive parallel machine consisting of lots of clusters, which consist of lots of servers, maybe 200,000, maybe 300,000 servers, all connected, all working together. Therefore what Google has is the world's largest computer, although in small pieces it is a fairly simple machine. Once you hook up logically one hundred thousand, two or three hundred thousand servers, you have a supercomputer the likes of which nobody has ever seen. 15:04 Now I also want to point out to you that the way these things are done is that each of the clusters is basically the same architecture. In other words, each cluster has what's called an index, an awareness of what exists in the network, and it has all the documents. In other words, I will be showing you, for instance, references to Paul Strassman in Arabic; I just happened to hit on that. That is most likely hosted in a document server in Singapore. It's surely not being hosted in New York or in Washington. But if you make an inquiry in Connecticut, the index server
would know that this record is available in Singapore, and that the duplicate of that record is also backed up somewhere in the Pacific, and then brings that particular application through a web server into a web switch and brings it to my desktop in New Canaan, Connecticut, in less than a quarter of a second. 16:19 And so it is the duplication of identical architectures which is the secret sauce of Google. It is the massive parallel application of an enormous amount of computing power in a very organized way, self-aware as part of the network, in order to find information and then combine information from various sources. 16:55 For those of you who are involved in the arcane art of building data centers, I was able, by some subterfuge, to actually show you a picture of what a cluster looks like. I was told, and this was an internal [unclear]; when you do a search on Google, they reveal very little information unless you get a little devious, and once in a while they slip up. This particular operator claimed that they put it up in three days. Subsequently I found out that setting up a Google cluster in three days is just too slow and too labor-intensive. My latest intelligence, which I am revealing today for the first time, is that Google now has containerized clusters, and they can be drop-shipped and put up in less than eight hours anywhere in the world. Now this has vast implications from the national security and defense standpoint, because this is exactly the kind of capability you need in the battlefield. 18:07 So what is then the secret to Google? It is the infrastructure. You have over two hundred thousand custom-built commodity servers. These are custom-built by a foundry in Taiwan. This is no fancy architecture, no RAID disk. These are off-the-shelf, low-cost
boxes. By the way, you can buy one of those boxes, with some reservations, so that you can actually have a look at these things. These are basically pizza-box kinds of boxes. You slide them in, and they are built as fault-tolerant hardware, which means that when any one of those servers fails, it doesn't matter. They are not only plug-replaceable, but immediately at least one and sometimes as many as three servers pop in, because the index knows where the backup is. 19:10 Now I don't know whether your classes include petabytes as a scale. A petabyte is a thousand terabytes; a terabyte is a thousand gigabytes. Very quickly you can sort of multiply the numbers. We know it's more than five petabytes and growing rapidly. Each server is 80 gigabytes, although as the disks become cheaper and the prices drop, they just yank out the disks, factory-refurbish the box, and put it right back. 19:55 One of the interesting aspects of the servers on the Google network is that when you move from Pentium to whatever the next gizmo is, it doesn't matter, because the functionality is not dictated by the microprocessor, which means it lowers the cost. You must understand that everything I'm describing here is dirt cheap. It's standard, low-cost commodity hardware. 20:30 When I compared some of the costs for infrastructure, the next cheapest infrastructure is that of Sun Microsystems, which in my view is at least four to six times more expensive than this particular configuration. When you add the bloated Microsoft server environment, you are dealing with a large multiplier of cost. So you must understand that what drives this particular environment is cost, cost and cost. 21:03 Now, of course, this doesn't come for nothing. The way you compensate for the
fact that you have lots of cheap hardware is that you compensate by software. That means that in addition to serving machine cycles for the customer, which is you, most of the machine cycles are really devoted to the system operating itself and becoming aware. In other words, the data gets moved in the whole global network as the demand arises and as certain frequency distributions take place. So this is a massively parallel, self-adjusting, self-healing, self-adaptive environment. 22:00 It is mathematically very complex. The indexing is complex: it is a 500 million by two billion matrix. If there are any operations research people in here, or mathematicians, they would understand that this is an awesome mathematical exercise, which of course operates as the demand circles around the world. There are always machines and clusters which are not active, and during that time the machines just go and work on themselves. 22:41 By the way, this is the way the human brain works. The reason you need sleep, whether you know it or not, is that you need idle time for the sensory perceptions in your brain to be re-indexed. That's why the denial of sleep is basically a way of ultimately disorienting a human being. Your brain requires sleep. In their particular case it's called indexing, re-indexing. 23:07 The capital and operating costs are a fraction of those of commercial servers. They are extremely scalable. The traffic is growing 20 to 30 percent per month. You sort of wonder how long this can go on, and we don't know, but the data centers are growing and being drop-shipped as conditions warrant. 23:31 Now, because we are here in the national security area, I want to point out that replication is the way you deal with reliability. Much of the
inheritance from the IBM and Microsoft era is that if you want reliability, you just pile on more functions, more code [unclear]; you just put more code into an operating system. Their operating system is really a stripped-down Linux version, proprietary by the way, but the reliance is on redundancy rather than on layering software. The replication is done for proximity and response, which means you replicate depending on the way that the demand appears. So if suddenly there is interest in New Orleans, much of the data that deals with New Orleans would be moved from wherever it is to where the demand would be arising. 24:46 I think those of you who are in the department of systems engineering and in software should understand that reliability can be achieved with software and architecture, not with hardware. Those of you who understand the way we do reliability right now: for instance, if you want to have a reliable desktop, you will take your hard disk and make it a RAID disk, [unclear] a RAID array. They don't have any RAID in the system. The whole system, as a system, is a RAID. 25:28 So you then rely on indexing for response, by moving transactions and data to the point of use. They do this dynamically. And by the way, it's easy to do indexing of text. They are now doing indexing of images, which is really the future challenge, and their dynamic indexing is very demanding [unclear]. The way you do it is you take the index and you break up the index, so the index is not in one place. The index is broken into what they call shards, and they are distributed across data centers, so you could kill any one data center. 26:28 Now, I have no evidence, corroborated
evidence, that this is so, but there was a particular moment, for a number of reasons, where they lost half of the shards and they were still operating with only a small delay in latency. 26:44 Now, how does this work? Well, the issue really has to deal with the query-serving infrastructure. In this kind of a distributed environment, you cannot think in terms of a particular server that has a particular database or Oracle file and so forth, which you hit against when you want an answer. When you assemble a display, and I'll show you some displays in a moment, you may have to go to many places where there are pieces of this display, pieces of the answer. You never know how the question will be asked, and the question is not standard, because the question can be totally improvised. It can be in different languages, which I'll be demonstrating to you. There are over 80 languages in which a particular question can be asked, yet the answer still has to come out to a particular desktop in the context of that query. And so that means that in order to answer certain complex questions, particularly if the question is some Boolean search, you may actually involve the cooperation of more than a thousand servers, index servers, data servers and web servers, in order to answer that particular question. That means that the document servers also have to then look at the pieces of information which fit a particular inquiry. This slide alone would be worthy of a semester of a course in engineering. 28:45 Now how in the world do you keep this thing going, particularly if you keep popping in these clusters, you know, drop-shipping them and then plugging them in, and they're
supposed to take over? They have a proprietary system called the MapReduce server system, which is highly proprietary, and it basically coordinates all the servers in real time and distributes the workload. I've seen the diagram of the workloads: the workload shifts with the time zone in which you operate, and therefore in order to optimize the use of the system, you basically have to have distribution of the workload in order to keep both reliability and latency up to date. You also must have a capability in this environment to reconstitute the service in case any particular server, or pieces of a cluster, or a component fails. Therefore you must have an operating system the likes of which the world has never seen, and it's not an operating system, it's basically a master scheduler, which then monitors the performance, because it's really an automatic system that keeps track of itself. 30:33 Okay, so far this has been easy. I hope you appreciate the elegance of this, but it's a tough order, and some of you who have been in this business for a while and have built systems, data centers or a client-server system understand that this is a degree of complexity several orders of magnitude over what is your experience. 31:00 Now let's go to the second principle, which is universal connectivity. 31:05 By the way, before I finish, I want to point out to you that all my presentations here come with a 90-day warranty, which none of your professors ever give you. My warranty consists of the fact that if you have any question about a specific slide or any of the handouts, you can email me the question, and if it's decent and answerable, and not a request for free consulting, I will put it on a
blog, so you'll be able to see who is asking what questions. 31:48 Okay, all right: universal connectivity, principle number two. When we say universal, it means that it has to interface in many languages. Although English is the dominant language among computer experts, computer experts represent only a tiny fraction of the human population, and the question of what proportion of human intelligence that represents is still arguable. Nevertheless, the world is multilingual and multicultural, and here's an example of how the various languages get responded to. I know some of these languages; I tested it and I was amazed. 32:38 Again, here is my inquiry, and apparently I was able to find a quotation in Iran, in Arabic documentation published in Philadelphia, of all places, and then some kind of a reference to the quotation. These were searches of Arabic pages. Again, I'm just showing you this as an example of connectivity. 33:25 Another example of connectivity, which is not only text, is that you may wish to go back and actually go to Google Video and see my prior presentations, as well as this presentation, which will be posted after Google makes sure that it is decent and awesome and doesn't violate any laws. So you will be seeing my presentation of today most likely in about a month or so. 34:08 Now, here is one that is sort of interesting. This is a new application called Google Base, and in a moment I'll be talking more about applications and innovation. But now the thing becomes very interesting. The question that was asked: locate events within 45 miles of New York in November of 2005. Now, you know, this is a pretty sophisticated kind of a question, and it shows that there is A, B, C, D. And if you want to know D, which is near the docks
on what looks to me like 34th Street, an unsavory district by the way, it would tell me it's souls on all re which, and there you can click on that and you can find out what's going on. And then 531 West 25th Street, in the mid district, and you can then click on it and find out what that is. 35:26 By the way, since this is typically done as a Google innovation, all of that is called a beta version. They have over a hundred beta applications. As you see, "Report bad item": they are asking the customers. This is one of the major innovations of the Google way of doing things: the customer is the test and the tester of the application. 36:01 What is becoming very important, however, from the standpoint of the future has to do with what's called semantic parsing, because if you just index on the keywords, you would get just too many answers, and therefore you really have to parse things in context. In this particular case, somebody wanted to know about Bay Area cooking classes, and apparently there are related items. This is called semantic parsing. This is how your brain works: your brain basically parses what people say and relates what you don't know to what you know. 36:50 Now, going to principle number three: shared data models. I cannot overestimate the importance of this particular development, particularly from the standpoint of national security. You need to have a standard file system on the hundreds of thousands of servers in order for them to cooperate with one another. Now, the text may be written in Arabic or in English or whatever else, but nevertheless, from a structural standpoint, they must be interoperable, and they must be able to be scheduled so that somebody asking a particular question can
get the answer. 37:34 And therefore what I'm trying to tell you here is: if you ever want to build a Google-like system for an organization like this, you have to think very hard about the kind of environments that you are going to put into the infrastructure in order for that environment to be responsive to what increasingly, in the battlefield, becomes a totally ad hoc, unprecedented inquiry or request for information. 38:13 So the entire engine of detailed profiles sits on a metadata directory. I don't want to go into the subject of the metadata directory, but this is very important. I must tell you that the partners of the range sustained a great deal of money in the 1991-1992 period building a metadata directory so that friendly fire would not take place because the data for the coordinates were not compatible. I'm alluding to something that really happened. 38:55 So that means that the data transfers, if you have a thousand servers cooperating, the data transfers from machine to machine must take place at the machine level, because of the speed. Again, if you're shooting for redundancy, accountability and low latency, data transfers cannot negotiate meaning, and that means that you must have extremely fast processing taking place within the system in order to be able to execute the transactions. And again, the chunks, as they are being processed, are duplicated, again in case something gets lost. So the entire network is probabilistic, not deterministic. 39:51 Here is an example of its data dictionary for interoperability, and by the way, these are the data tags which are available to developers. So, for instance, the Google Maps environment is now available to developers to integrate into their own application.
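The point above about chunks being duplicated in case something gets lost can be illustrated with a small sketch. This Python toy is not Google's design: the class name `ChunkStore`, the server names and the placement rule are all assumptions made up for illustration. It only shows why keeping several copies of each chunk lets a read succeed while some servers are down.

```python
class ChunkStore:
    """Toy cluster where each chunk is copied to several servers.

    Hypothetical illustration only; names and placement rule are invented.
    """

    def __init__(self, servers, copies=3):
        self.servers = {name: {} for name in servers}  # server -> {chunk_id: data}
        self.down = set()                              # servers currently failed
        self.copies = min(copies, len(self.servers))   # never more copies than servers

    def _replicas(self, chunk_id):
        # Deterministic placement: derive a start position from the id,
        # then take the next `copies` servers in ring order.
        names = sorted(self.servers)
        start = sum(chunk_id.encode()) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.copies)]

    def write(self, chunk_id, data):
        # Store the same data on every replica server.
        for name in self._replicas(chunk_id):
            self.servers[name][chunk_id] = data

    def read(self, chunk_id):
        # Try each replica in turn; one live copy is enough.
        for name in self._replicas(chunk_id):
            if name not in self.down and chunk_id in self.servers[name]:
                return self.servers[name][chunk_id]
        raise LookupError(f"all replicas of {chunk_id!r} are unavailable")


if __name__ == "__main__":
    store = ChunkStore(["s1", "s2", "s3", "s4", "s5"], copies=3)
    store.write("doc-17", "payload")
    store.down.update(store._replicas("doc-17")[:2])  # two of three copies fail
    print(store.read("doc-17"))                       # the third copy still answers
```

With three copies, a read fails only when all three replica servers are down at once, which is the sense in which such a network is probabilistic rather than deterministic.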
40:18 Now, my application may be organization-specific. For instance, if I'm the head of Maxwell House coffee and I want to know which stores or which neighborhoods are deep consumers of Maxwell House one-pound drip coffee, I would then construct an application and then go to Google Maps, using their tags, in order to feed my Maxwell House application from the lab, so that the map display would show where the stores are, where the coffee drinkers are. 41:05 Which leads me really to the whole issue of APIs. This is the key to the future of network-centric operation. You cannot rely on a single organization. Google has only six thousand employees, and they are very choosy about over the edge of the paper. The objective really is to engage developers, thousands and tens of thousands. If you want to take one hot tip away from this presentation from a career standpoint, the young students here, if you really want to make lots of money, sign up as a Google developer. 41:56 And so you can then create environments by using their hooks into the database which is out there, which is growing, and then interfacing the environment to this API, then fix the massive parallel computer with your particular application. 42:26 Which of course feeds directly into principle number four, which is upgrading innovation. This thing will not work unless it's dynamic. You must understand there are lots of search engines out there. Search engines are not that hard to put up. What is different here is that we've created a massive parallel computer that's growing and that's sucking in data, and that then makes available the possibility to very quickly use the infrastructure, without incurring the cost of the infrastructure, and then
grow your application on top of it. 43:08 Here's a partial list, which you may want to become acquainted with, some of them very neat, very smart. Many of these are still experiments, but many of these online services have been fielded in less than three months by a team of programmers well the list density of four people. They have a server and plus tree developed; it's strictly for development. And all the Google people are encouraged to spend at least one day a week of paid time to play around with innovation. It's a fun thing to do; you need to do it. I'm lucky enough to have that privilege. 44:02 And so you can then do an infinite number of things. This one I like; this is called Froogle, and all these things get very cute, of course. But you can now use the master computer to extract from Samsung, who's a vendor, not only the catalog of what they have but also the manuals. For instance, you know, I've got lots of electronic widgets, and instead of this big folder full of manuals, which were always obsolete, I don't have any manuals anymore. If I look for something, I just Froogle the manual for a particular contraption I have, and I get the latest release and can analyze it. And of course it tells me where you can buy it, and then you can put in cute things like the price range you on the luggage stores which are nearby, and so forth and so forth. 45:04 And of course you get into experimentation. This is clearly an environment that encourages innovation, and it should be a prototype of how our new transformational efforts in both national intelligence and defense should be guided. 45:33 Here is an example. I mentioned that I will put all of your inquiries on my blog. Again, this is an application which is
inside Google. In 15 minutes I can put up an application with my picture, with my archives and with a whole history, and publish the bulletin board. 46:20 Now, all this is sort of interesting, because you can only go so far from the master computer. For certain applications you have to stop occupying the desktop you have to lost God occupying the input up. Here is a Google video viewer. I don't want to get into the details; it is very sophisticated, but it means that in certain instances you would have to put a handle into your machine in order to interface with a particular application. Now, whether this is an attack of like yourself is being debated as we stand here. And of course that means that for the first time I identify myself as a person, maybe membership or security in participation network centric environment. 47:28 So let me now conclude. I understand that I had 60 minutes. Let me leave with you a set of comparisons so that you can relate to what you have today. 47:54 The world today can be represented by this diagram. The blue dots are servers, mostly Microsoft servers, and each server is a little island owned by a seesaw or by an operator or by a union, or contracted to a particular contractor, and so forth. There are zillions of them all over the world, providing employment to people who then must do the updating, do the desertification or whatever you call the elimination of doing the updates that come in almost weekly, and so forth. And so this is workgroup computing today: these millions of local applications and local data. The only analogy to this world is, you know, the 12th century, where every little town had their own shoemaker, their own cabinet maker and their
own spinner and so forth. No economies of scale, and this was basically a craft environment. 49:13 The problem with that environment is that it is very vulnerable. When you look at the vulnerability today, this is just keep out, you see that in order for these little enclaves of computing to manage themselves in an era of increased complexity, they are adding complexity and adding hardware, and every time they do it they become more vulnerable. So we are coming to the end of a period where both dis da balloon applications in elements faces; 92 percent of all the desktops in the world are Microsoft desktops, and the cross-platform applications are increasingly vulnerable to attack and compromise. By the way, the other system certainly unique new systems have other, and therefore Cisco systems that deal with the switching are also vulnerable. But the fact that we have designed network switching zipper on the Cisco from the glands and from the lands means that each of them is standing on their own. Each be given time had to have their own defenses, their own moats, their own guards and so forth when the Turks were coming, and didn't do very well when the Turks came in mass. I'm looking from the history of the town where I come from. 51:02 The new Internet, as I see it, is that there's going to be billions of browsers, and I mean literally billions. Every cell phone is a browser, and you have browsers which are music browsers and I polished, and then there is an unlimited degree of imagination for doing this thing. And they all have to share, because each of these browsers has to be a window into a multiplicity of functions, and therefore that is where we are going as an architecture. 51:42 So where does it go ahead? The strategy of the last
20 years has been capturing the desktop. It was done very successfully, very constructively, by Microsoft, and they deserve all the billions of profits that economy. 52:00 The new era is that the desktop is not a sustainable defense position for anybody anymore, and the only defensible position today is to occupy the Internet. And by the way, this doesn't have to be one company; multiple organizations can occupy the Internet, and that internet would be a different internet than what we have today. 52:28 Now, what is different from an economic standpoint is that in the workgroup-centric environment the vendor sells you the software license, and you put in the labor and capital costs. You take the risk, the vendor takes no risk at all, and you have no recourse. The network-centric environment fundamentally changes the economics by placing the labor and capital into the network, with huge economies of scale, and where the knowledge capital, which is a term I will be using in my next presentation, my third lecture, is really dominating the solution. 53:18 The workgroup-centric environment created isolated silos, or whatever term is being used these days, which are using specific infrastructures. The problem is that the infrastructures are trying to communicate with one another, and it's done with great difficulty. You must have an infrastructure that is universal; certainly from a national security standpoint that is necessary. 53:47 Now, the problem with the workgroup-centric environment is that there is too much labor, too many consultants, too many interns. No disrespect; I'll pay for their tuition. It just means that the users put moats around this little castle, and there is lots of labor to control what happens inside the castle.
The Google approach basically is that this is too complicated; it cannot be done by users or by people; it has to be automated. The workgroup environment is operating-system dependent; the network-centric environment must be open-source browsers. You have a totally different economic model, which really deals with demand pricing. 54:49 And most importantly, it has to do with intelligence. Data read from a file has no context. It is data, and then your brain has to look at the gaze of the screen and figure out what it means. The amount of data increases exponentially; the brain cycles are not really enough to deal with that. So what you need to do is the sample data holding context. If you are an infantry platoon leader in a ditch somewhere in front of some godforsaken village, you don't want to have a data dump from the satellite in gigabytes. You need only an answer to a very simple question: is this village safe or is it being tested out, and do we have someone in there that we can trust? 55:47 So the future, as I see it, is that everything will be on the Internet. Everything: your cell phone, your oven, your refrigerator, everything, the selling product, the electric meter, your car, for maintenance, real-time maintenance, doing preventive maintenance. Or the karim disorders have today, you know, their cars, we have GPS built in, and diagnostics transmits to General Motors; if you paid enough for a car you get that kind of service. 56:30 And that means that all data, voice, video and sensor input will be accessible selectively, as needed. And that means that if you want to be in the telephone business, the TV business, or the print business and newspaper business, you'd better get yourself a second job to get ready for the time when people will stop this missing
staffs from these organizations. 56:55 The future is in services, looking technology services that respond to questions as needed by the consumer. Information is displayed in the context that is relevant to the culture, personality and habits of the customer, and the applications are there for making decisions: what do I buy, where do I go, lovely boy. 57:25 Now, why is this important? The whole national security environment, whether it's Homeland Security, the Department of Defense or intelligence, really depends on the ability to have superior intelligence in order to deal with the challenges, competitive challenges which are economic, and the terrorist threats of the 21st century. We cannot do it with workgroups anymore. We must transform to network-centric services, and we don't need much time to do it. 58:02 Now, you cannot just go in and blow up what you have today. You must be able to migrate, and the way you migrate is with displacement: leave what you had and put in the new stuff. And then build it and they will come, as those of you who have seen the movie know. Build it and see, and let the customer decide whether they get viable service, and then pocket the money, which these days is getting scarcer and scarcer, and invest it in innovation. 58:40 So, in conclusion, I hope that the relevance of Google as a future vision of the environment will be something that will be useful to you. I certainly wish you well, and I will answer any question that you submit to me by email. David 00:12 It feels like we're all suffering from information overload or data glut. And the good news is there might be an easy solution to that, and that's using our eyes more.
So, visualizing information, so that we can see the patterns and connections that matter and then designing that information so it makes more sense, or it tells a story, or allows us to focus only on the information that's important. Failing that, visualized information can just look really cool. 00:38 So, let's see. This is the $Billion Dollar o-Gram, and this image arose out of frustration I had with the reporting of billion-dollar amounts in the press. That is, they're meaningless without context: 500 billion for this pipeline, 20 billion for this war. It doesn't make any sense, so the only way to understand it is visually and relatively. So I scraped a load of reported figures from various news outlets and then scaled the boxes according to those amounts. And the colors here represent the motivation behind the money. So purple is "fighting," and red is "giving money away," and green is "profiteering." And what you can see straight away is you start to have a different relationship to the numbers. You can literally see them. But more importantly, you start to see patterns and connections between numbers that would otherwise be scattered across multiple news reports. 01:30 Let me point out some that I really like. This is OPEC's revenue, this green box here -- 780 billion a year. And this little pixel in the corner -- three billion -- that's their climate change fund. Americans, incredibly generous people -- over 300 billion a year, donated to charity every year, compared with the amount of foreign aid given by the top 17 industrialized nations at 120 billion. Then of course, the Iraq War, predicted to cost just 60 billion back in 2003. And it mushroomed slightly. Afghanistan and Iraq mushroomed now to 3,000 billion. So now it's great because now we have this texture, and we can add numbers to it as well. So we could say, well, a new figure comes out ... let's see African debt. 
How much of this diagram do you think might be taken up by the debt that Africa owes to the West? Let's take a look. So there it is: 227 billion is what Africa owes. And the recent financial crisis, how much of this diagram might that figure take up? What has that cost the world? Let's take a look at that. Dooosh -- Which I think is the appropriate sound effect for that much money: 11,900 billion. So, by visualizing this information, we turned it into a landscape that you can explore with your eyes, a kind of map really, a sort of information map. And when you're lost in information, an information map is kind of useful. 02:55 So I want to show you another landscape now. We need to imagine what a landscape of the world's fears might look like. Let's take a look. This is Mountains Out of Molehills, a timeline of global media panic. (Laughter) So, I'll label this for you in a second. But the height here, I want to point out, is the intensity of certain fears as reported in the media. Let me point them out. So this, swine flu -- pink. Bird flu. SARS -brownish here. Remember that one? The millennium bug, terrible disaster. These little green peaks are asteroid collisions. (Laughter) And in summer, here, killer wasps. 03:42 (Laughter) 03:50 So these are what our fears look like over time in our media. But what I love -- and I'm a journalist -- and what I love is finding hidden patterns; I love being a data detective. And there's a very interesting and odd pattern hidden in this data that you can only see when you visualize it. Let me highlight it for you. See this line, this is a landscape for violent video games. As you can see, there's a kind of odd, regular pattern in the data, twin peaks every year. If we look closer, we see those peaks occur at the same month every year. Why? Well, November, Christmas video games come out, and there may well be an upsurge in the concern about their content. But April isn't a particularly massive month for video games. Why April? 
Well, in April 1999 was the Columbine shooting, and since then, that fear has been remembered by the media and echoes through the group mind gradually through the year. You have retrospectives, anniversaries, court cases, even copy-cat shootings, all pushing that fear into the agenda. And there's another pattern here as well. Can you spot it? See that gap there? There's a gap, and it affects all the other stories. Why is there a gap there? You see where it starts? September 2001, when we had something very real to be scared about. 05:06 So, I've been working as a data journalist for about a year, and I keep hearing a phrase all the time, which is this: "Data is the new oil." Data is the kind of ubiquitous resource that we can shape to provide new innovations and new insights, and it's all around us, and it can be mined very easily. It's not a particularly great metaphor in these times, especially if you live around the Gulf of Mexico, but I would, perhaps, adapt this metaphor slightly, and I would say that data is the new soil. Because for me, it feels like a fertile, creative medium. Over the years, online, we've laid down a huge amount of information and data, and we irrigate it with networks and connectivity, and it's been worked and tilled by unpaid workers and governments. And, all right, I'm kind of milking the metaphor a little bit. But it's a really fertile medium, and it feels like visualizations, infographics, data visualizations, they feel like flowers blooming from this medium. But if you look at it directly, it's just a lot of numbers and disconnected facts. But if you start working with it and playing with it in a certain way, interesting things can appear and different patterns can be revealed. 06:14 Let me show you this. Can you guess what this data set is? What rises twice a year, once in Easter and then two weeks before Christmas, has a mini peak every Monday, and then flattens out over the summer? I'll take answers. (Audience: Chocolate.) 
David McCandless: Chocolate. You might want to get some chocolate in. Any other guesses? (Audience: Shopping.) DM: Shopping. Yeah, retail therapy might help. (Audience: Sick leave.) DM: Sick leave. Yeah, you'll definitely want to take some time off. Shall we see? 06:50 (Laughter) 06:58 (Applause) 07:01 So, the information guru Lee Byron and myself, we scraped 10,000 Facebook status updates for the phrases "break-up" and "broken-up" and this is the pattern we found -- people clearing out for Spring Break, (Laughter) coming out of very bad weekends on a Monday, being single over the summer, and then the lowest day of the year, of course: Christmas Day. Who would do that? So there's a titanic amount of data out there now, unprecedented. But if you ask the right kind of question, or you work it in the right kind of way, interesting things can emerge. 07:41 So information is beautiful. Data is beautiful. I wonder if I could make my life beautiful. And here's my visual C.V. I'm not quite sure I've succeeded. Pretty blocky, the colors aren't that great. But I wanted to convey something to you. I started as a programmer, and then I worked as a writer for many years, about 20 years, in print, online and then in advertising, and only recently have I started designing. And I've never been to design school. I've never studied art or anything. I just kind of learned through doing. And when I started designing, I discovered an odd thing about myself. I already knew how to design, but it wasn't like I was amazingly brilliant at it, but more like I was sensitive to the ideas of grids and space and alignment and typography. It's almost like being exposed to all this media over the years had instilled a kind of dormant design literacy in me. And I don't feel like I'm unique. 08:36 I feel that every day, all of us now are being blasted by information design.
It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless, it literally pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it's a relief, it's like coming across a clearing in the jungle. I was curious about this, so it led me to the work of a Danish physicist called Tor Norretranders, and he converted the bandwidth of the senses into computer terms. 09:16 So here we go. This is your senses, pouring into your senses every second. Your sense of sight is the fastest. It has the same bandwidth as a computer network. Then you have touch, which is about the speed of a USB key. And then you have hearing and smell, which has the throughput of a hard disk. And then you have poor old taste, which is like barely the throughput of a pocket calculator. And that little square in the corner, a naught .7 percent, that's the amount we're actually aware of. So a lot of your vision -- the bulk of it is visual, and it's pouring in. It's unconscious. The eye is exquisitely sensitive to patterns in variations in color, shape and pattern. It loves them, and it calls them beautiful. It's the language of the eye. If you combine the language of the eye with the language of the mind, which is about words and numbers and concepts, you start speaking two languages simultaneously, each enhancing the other. So, you have the eye, and then you drop in the concepts. And that whole thing -- it's two languages both working at the same time. 10:18 So we can use this new kind of language, if you like, to alter our perspective or change our views. Let me ask you a simple question with a really simple answer: Who has the biggest military budget? It's got to be America, right? Massive. 609 billion in 2008 -- 607, rather. 
So massive, in fact, that it can contain all the other military budgets in the world inside itself. Gobble, gobble, gobble, gobble, gobble. Now, you can see Africa's total debt there and the U.K. budget deficit for reference. So that might well chime with your view that America is a sort of warmongering military machine, out to overpower the world with its huge industrial-military complex. But is it true that America has the biggest military budget? Because America is an incredibly rich country. In fact, it's so massively rich that it can contain the four other top industrialized nations' economies inside itself, it's so vastly rich. So its military budget is bound to be enormous. So, to be fair and to alter our perspective, we have to bring in another data set, and that data set is GDP, or the country's earnings. Who has the biggest budget as a proportion of GDP? Let's have a look. That changes the picture considerably. Other countries pop into view that you, perhaps, weren't considering, and America drops into eighth. 11:33 Now you can also do this with soldiers. Who has the most soldiers? It's got to be China. Of course, 2.1 million. Again, chiming with your view that China has a militarized regime ready to, you know, mobilize its enormous forces. But of course, China has an enormous population. So if we do the same, we see a radically different picture. China drops to 124th. It actually has a tiny army when you take other data into consideration. So, absolute figures, like the military budget, in a connected world, don't give you the whole picture. They're not as true as they could be. 12:07 We need relative figures that are connected to other data so that we can see a fuller picture, and then that can lead to us changing our perspective. As Hans Rosling, the master, my master, said, "Let the dataset change your mindset." And if it can do that, maybe it can also change your behavior. 12:26 Take a look at this one. I'm a bit of a health nut.
I love taking supplements and being fit, but I can never understand what's going on in terms of evidence. There's always conflicting evidence. Should I take vitamin C? Should I be taking wheatgrass? This is a visualization of all the evidence for nutritional supplements. This kind of diagram is called a balloon race. So the higher up the image, the more evidence there is for each supplement. And the bubbles correspond to popularity as regards to Google hits. So you can immediately apprehend the relationship between efficacy and popularity, but you can also, if you grade the evidence, do a "worth it" line. So supplements above this line are worth investigating, but only for the conditions listed below, and then the supplements below the line are perhaps not worth investigating. 13:17 Now this image constitutes a huge amount of work. We scraped like 1,000 studies from PubMed, the biomedical database, and we compiled them and graded them all. And it was incredibly frustrating for me because I had a book of 250 visualizations to do for my book, and I spent a month doing this, and I only filled two pages. But what it points to is that visualizing information like this is a form of knowledge compression. It's a way of squeezing an enormous amount of information and understanding into a small space. And once you've curated that data, and once you've cleaned that data, and once it's there, you can do cool stuff like this. 13:55 So I converted this into an interactive app, so I can now generate this application online -- this is the visualization online -- and I can say, "Yeah, brilliant." So it spawns itself. And then I can say, "Well, just show me the stuff that affects heart health." So let's filter that out. So heart is filtered out, so I can see if I'm curious about that. I think, "No, no. I don't want to take any synthetics, I just want to see plants and -- just show me herbs and plants. I've got all the natural ingredients." 
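The filtering just described (keep what sits above the "worth it" line, then narrow to heart health, then to plants only) can be sketched in a few lines of Python. This is a rough illustration, not the code behind the actual app: the `Supplement` fields, the 0 to 6 evidence scale, the threshold value and every row in `SAMPLE` are invented placeholders, not real study results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supplement:
    name: str          # placeholder label, not a real product
    evidence: float    # graded evidence score; the 0-6 scale is an assumption
    popularity: int    # e.g. search hits; would set the bubble size
    conditions: tuple  # conditions the evidence applies to
    kind: str          # "plant" or "synthetic"


WORTH_IT = 3.0  # assumed height of the "worth it" line


def select(items, condition=None, kind=None, threshold=WORTH_IT):
    """Return items above the line, optionally narrowed, best evidence first."""
    chosen = [
        s for s in items
        if s.evidence > threshold
        and (condition is None or condition in s.conditions)
        and (kind is None or s.kind == kind)
    ]
    return sorted(chosen, key=lambda s: s.evidence, reverse=True)


# Invented rows for illustration only.
SAMPLE = [
    Supplement("herb_a", 5.0, 900, ("heart",), "plant"),
    Supplement("pill_b", 4.5, 300, ("heart", "sleep"), "synthetic"),
    Supplement("herb_c", 1.5, 5000, ("heart",), "plant"),
    Supplement("herb_d", 3.5, 120, ("sleep",), "plant"),
]

if __name__ == "__main__":
    for s in select(SAMPLE, condition="heart", kind="plant"):
        print(s.name, s.evidence)
```

Note that `herb_c` is the most popular row but falls below the line, which is the gap between popularity and evidence that the chart is meant to expose.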
Now this app is spawning itself from the data. The data is all stored in a Google Doc, and it's literally generating itself from that data. So the data is now alive; this is a living image, and I can update it in a second. New evidence comes out. I just change a row on a spreadsheet. Doosh! Again, the image recreates itself. So it's cool. It's kind of living. 14:46 But it can go beyond data, and it can go beyond numbers. I like to apply information visualization to ideas and concepts. This is a visualization of the political spectrum, an attempt for me to try and understand how it works and how the ideas percolate down from government into society and culture, into families, into individuals, into their beliefs and back around again in a cycle. What I love about this image is it's made up of concepts, it explores our worldviews and it helps us -- it helps me anyway -- to see what others think, to see where they're coming from. And it feels just incredibly cool to do that. 15:28 What was most exciting for me designing this was that, when I was designing this image, I desperately wanted this side, the left side, to be better than the right side -- being a journalist, a Left-leaning person -- but I couldn't, because I would have created a lopsided, biased diagram. So, in order to really create a full image, I had to honor the perspectives on the right-hand side and at the same time, uncomfortably recognize how many of those qualities were actually in me, which was very, very annoying and uncomfortable. (Laughter) But not too uncomfortable, because there's something unthreatening about seeing a political perspective, versus being told or forced to listen to one. You're capable of holding conflicting viewpoints joyously when you can see them. It's even fun to engage with them because it's visual. So that's what's exciting to me, seeing how data can change my perspective and change my mind midstream -- beautiful, lovely data.
16:35 So, just to wrap up, I wanted to say that it feels to me that design is about solving problems and providing elegant solutions, and information design is about solving information problems. It feels like we have a lot of information problems in our society at the moment, from the overload and the saturation to the breakdown of trust and reliability and runaway skepticism and lack of transparency, or even just interestingness. I mean, I find information just too interesting. It has a magnetic quality that draws me in. 17:06 So, visualizing information can give us a very quick solution to those kinds of problems. Even when the information is terrible, the visual can be quite beautiful. Often we can get clarity or the answer to a simple question very quickly, like this one, the recent Icelandic volcano. Which was emitting the most CO2? Was it the planes or the volcano, the grounded planes or the volcano? So we can have a look. We look at the data and we see: Yep, the volcano emitted 150,000 tons; the grounded planes would have emitted 345,000 if they were in the sky. So essentially, we had our first carbon-neutral volcano. 17:46 (Laughter) 17:48 (Applause) 17:57 And that is beautiful. Thank you. 18:00 (Applause)
Julia 00:00 So I'd like you to imagine for a moment that you're a soldier in the heat of battle. Maybe you're a Roman foot soldier or a medieval archer or maybe you're a Zulu warrior. Regardless of your time and place, there are some things that are constant. Your adrenaline is elevated, and your actions are stemming from these deeply ingrained reflexes, reflexes rooted in a need to protect yourself and your side and to defeat the enemy. 00:30 So now, I'd like you to imagine playing a very different role, that of the scout. The scout's job is not to attack or defend. The scout's job is to understand. The scout is the one going out, mapping the terrain, identifying potential obstacles.
And the scout may hope to learn that, say, there's a bridge in a convenient location across a river. But above all, the scout wants to know what's really there, as accurately as possible. And in a real, actual army, both the soldier and the scout are essential. But you can also think of each of these roles as a mindset -- a metaphor for how all of us process information and ideas in our daily lives. What I'm going to argue today is that having good judgment, making accurate predictions, making good decisions, is mostly about which mindset you're in. 01:26 To illustrate these mindsets in action, I'm going to take you back to 19th-century France, where this innocuous-looking piece of paper launched one of the biggest political scandals in history. It was discovered in 1894 by officers in the French general staff. It was torn up in a wastepaper basket, but when they pieced it back together, they discovered that someone in their ranks had been selling military secrets to Germany. 01:54 So they launched a big investigation, and their suspicions quickly converged on this man, Alfred Dreyfus. He had a sterling record, no past history of wrongdoing, no motive as far as they could tell. But Dreyfus was the only Jewish officer at that rank in the army, and unfortunately at this time, the French Army was highly anti-Semitic. They compared Dreyfus's handwriting to that on the memo and concluded that it was a match, even though outside professional handwriting experts were much less confident in the similarity, but never mind that. They went and searched Dreyfus's apartment, looking for any signs of espionage. They went through his files, and they didn't find anything. This just convinced them more that Dreyfus was not only guilty, but sneaky as well, because clearly he had hidden all of the evidence before they had managed to get to it. 02:45 Next, they went and looked through his personal history for any incriminating details.
They talked to his teachers, they found that he had studied foreign languages in school, which clearly showed a desire to conspire with foreign governments later in life. His teachers also said that Dreyfus was known for having a good memory, which was highly suspicious, right? You know, because a spy has to remember a lot of things. 03:12 So the case went to trial, and Dreyfus was found guilty. Afterwards, they took him out into this public square and ritualistically tore his insignia from his uniform and broke his sword in two. This was called the Degradation of Dreyfus. And they sentenced him to life imprisonment on the aptly named Devil's Island, which is this barren rock off the coast of South America. So there he went, and there he spent his days alone, writing letters and letters to the French government begging them to reopen his case so they could discover his innocence. But for the most part, France considered the matter closed. 03:51 One thing that's really interesting to me about the Dreyfus Affair is this question of why the officers were so convinced that Dreyfus was guilty. I mean, you might even assume that they were setting him up, that they were intentionally framing him. But historians don't think that's what happened. As far as we can tell, the officers genuinely believed that the case against Dreyfus was strong. Which makes you wonder: What does it say about the human mind that we can find such paltry evidence to be compelling enough to convict a man? 04:24 Well, this is a case of what scientists call "motivated reasoning." It's this phenomenon in which our unconscious motivations, our desires and fears, shape the way we interpret information. Some information, some ideas, feel like our allies. We want them to win. We want to defend them. And other information or ideas are the enemy, and we want to shoot them down. So this is why I call motivated reasoning, "soldier mindset." 
04:51 Probably most of you have never persecuted a French-Jewish officer for high treason, I assume, but maybe you've followed sports or politics, so you might have noticed that when the referee judges that your team committed a foul, for example, you're highly motivated to find reasons why he's wrong. But if he judges that the other team committed a foul -- awesome! That's a good call, let's not examine it too closely. Or, maybe you've read an article or a study that examined some controversial policy, like capital punishment. And, as researchers have demonstrated, if you support capital punishment and the study shows that it's not effective, then you're highly motivated to find all the reasons why the study was poorly designed. But if it shows that capital punishment works, it's a good study. And vice versa: if you don't support capital punishment, same thing. 05:44 Our judgment is strongly influenced, unconsciously, by which side we want to win. And this is ubiquitous. This shapes how we think about our health, our relationships, how we decide how to vote, what we consider fair or ethical. What's most scary to me about motivated reasoning or soldier mindset, is how unconscious it is. We can think we're being objective and fair-minded and still wind up ruining the life of an innocent man. 06:13 However, fortunately for Dreyfus, his story is not over. This is Colonel Picquart. He's another high-ranking officer in the French Army, and like most people, he assumed Dreyfus was guilty. Also like most people in the army, he was at least casually anti-Semitic. But at a certain point, Picquart began to suspect: "What if we're all wrong about Dreyfus?" What happened was, he had discovered evidence that the spying for Germany had continued, even after Dreyfus was in prison. And he had also discovered that another officer in the army had handwriting that perfectly matched the memo, much closer than Dreyfus's handwriting. 
So he brought these discoveries to his superiors, but to his dismay, they either didn't care or came up with elaborate rationalizations to explain his findings, like, "Well, all you've really shown, Picquart, is that there's another spy who learned how to mimic Dreyfus's handwriting, and he picked up the torch of spying after Dreyfus left. But Dreyfus is still guilty." Eventually, Picquart managed to get Dreyfus exonerated. But it took him 10 years, and for part of that time, he himself was in prison for the crime of disloyalty to the army. 07:26 A lot of people feel like Picquart can't really be the hero of this story because he was an anti-Semite and that's bad, which I agree with. But personally, for me, the fact that Picquart was anti-Semitic actually makes his actions more admirable, because he had the same prejudices, the same reasons to be biased as his fellow officers, but his motivation to find the truth and uphold it trumped all of that. 07:55 So to me, Picquart is a poster child for what I call "scout mindset." It's the drive not to make one idea win or another lose, but just to see what's really there as honestly and accurately as you can, even if it's not pretty or convenient or pleasant. This mindset is what I'm personally passionate about. And I've spent the last few years examining and trying to figure out what causes scout mindset. Why are some people, sometimes at least, able to cut through their own prejudices and biases and motivations and just try to see the facts and the evidence as objectively as they can? 08:35 And the answer is emotional. So, just as soldier mindset is rooted in emotions like defensiveness or tribalism, scout mindset is, too. It's just rooted in different emotions. For example, scouts are curious. They're more likely to say they feel pleasure when they learn new information or an itch to solve a puzzle. They're more likely to feel intrigued when they encounter something that contradicts their expectations. 
Scouts also have different values. They're more likely to say they think it's virtuous to test your own beliefs, and they're less likely to say that someone who changes his mind seems weak. And above all, scouts are grounded, which means their self-worth as a person isn't tied to how right or wrong they are about any particular topic. So they can believe that capital punishment works. If studies come out showing that it doesn't, they can say, "Huh. Looks like I might be wrong. Doesn't mean I'm bad or stupid." 09:41 This cluster of traits is what researchers have found -- and I've also found anecdotally -- predicts good judgment. And the key takeaway I want to leave you with about those traits is that they're primarily not about how smart you are or about how much you know. In fact, they don't correlate very much with IQ at all. They're about how you feel. There's a quote that I keep coming back to, by Saint-Exupéry. He's the author of "The Little Prince." He said, "If you want to build a ship, don't drum up your men to collect wood and give orders and distribute the work. Instead, teach them to yearn for the vast and endless sea." 10:26 In other words, I claim, if we really want to improve our judgment as individuals and as societies, what we need most is not more instruction in logic or rhetoric or probability or economics, even though those things are quite valuable. But what we most need to use those principles well is scout mindset. We need to change the way we feel. We need to learn how to feel proud instead of ashamed when we notice we might have been wrong about something. We need to learn how to feel intrigued instead of defensive when we encounter some information that contradicts our beliefs. 11:04 So the question I want to leave you with is: What do you most yearn for? Do you yearn to defend your own beliefs? Or do you yearn to see the world as clearly as you possibly can? 11:18 Thank you. 11:19 (Applause)