>> Eric Horvitz: So it's an honor to have David Karger with us today. David is a professor of computer science at MIT in the CSAIL laboratory. He's a fellow of the ACM. He's on the Scientific Advisory Board of Web Science Research Institute. He received a 2003 National Academy of Sciences Award for Initiatives in Research. Sounds like a great thing to get, and sounds like you, David, taking initiatives in research. His work has spanned a variety of areas, including algorithms, information retrieval, networking, peer-to-peer systems, machine learning, communication coding. I can go on. I'll just say that when I look at people that are out there in the world, I always think that David Karger's breadth of interest sort of reminds me of myself. And one of the best compliments I've ever gotten was when Jaime Teevan, one of your students who is now on our team, said you remind me of David Karger. I said oh, really? That was a big compliment for me. His recent work is focused on developing tools to let users and groups gather, manage, visualize and share information. And today he'll be talking about helping regular people communicate on the Web through rich interactive data visualizations. David. >> David Karger: Thanks, Eric. It's a pleasure to be here. Some of you know, some of you may not, that I've actually been working for Microsoft for a year -- well, I spent a sabbatical here and I've been working here half time for the past six months or so pushing forward some of these ideas. Oh, maybe we'll get the KAI announcements on the projector. [laughter]. So this reflects the work that I sort of started doing at MIT but which brought me into Microsoft and which I'm still trying to sort of advance here as well as back at MIT. The title's a bit of a riff on all of the massive data work that everybody is talking about these days. Everybody talks about these huge datasets and all of the challenges that we face in trying to process them and so on. 
I want to bring a somewhat alternative message that there's a ton of stuff for us to do with small datasets that can be very valuable. And I'd like to see us advance in that direction. Okay. So I always like to start with the conclusion in case people have to leave early. And so here it is. What I'm going to be arguing today is that structure really enhances the value of information over sort of unstructured information like text, if you can get it, in that it allows all kinds of rich visualization and interactions and makes it easier for you to take that information and repurpose it, combine it with other information, so on and so forth. And I'm also going to argue that working with structured information can and should be done by end users. And that there's a big gap between what they can do and what we are allowing them to do with today's tools. They can author structured information, they can author the interactions of that information. They can combine and repurpose the information. And they can do all of this with very simple editing tools, as opposed to learning to become database engineers. And I'll show you some of those tools that you can use right now that we've got out in the -- out in the wild. The key to all of this is a real focus on sort of a data-centric architecture. Instead of thinking about complex applications, I think the right perspective for this class of users is to think of sort of lightweight skins of style over datasets and helping regular people author these themes or these styles through direct manipulation without any -- using any kind of fancy APIs. Now, I also wanted to sort of give an early alert. There was some debate about which talk I should give and Eric said well, you've got an hour and a half, so give both. 
And so what I'm going to do is I'm going to give this talk that I've just sort of given the conclusion for and then, after giving anybody who wants to a chance to leave, I'm going to spend a few minutes sort of surveying a bunch of the other projects that my research group is involved with that I'd be more than happy to talk to everybody about -- to talk to people about or, you know, find some interesting collaborations and such. Okay? So get set for a marathon or to leave early. So talk about these five problems that I think we see a lot on the Web today. The main one being that individuals cannot effectively communicate their ideas as effectively as large, wealthy, well skilled organizations that can hire teams of developers to do fancier things. On the flip side, there's the problem that a lot of the information that people would like to find or work with on the Web they can't find it because either it's never been published or because it's been published in a way that they can't effectively make use of it. As particular funny examples of this that I find particularly interesting, we've had this whole data.gov effort, this effort by the United States government to put all sorts of datasets up online. And there's tons and tons of datasets. But somehow they're not really -- they haven't had as much impact as some of us imagined that they would have when the government started putting all this data out. They just kind of sit there. Conversely, scientists have lots and lots of data which they never actually release. It stays inside of their -- they hug it close and just publish little -- you know, little line charts about their data, but the data itself doesn't come out. So I want to tackle all these problems with one hammer. You know, every academic has to have their hammer. And that is to make gathering and publishing rich interactive data visualizations as easy as publishing text. So to motivate this, let's look at -- back at the Web in the early days. 
We had these boring but exciting at the time static Web pages, right? We could -- you know, anybody who wanted to could come along and author a little bit of HTML and put it on their website, and all of a sudden they were one of these early adopters of the Web. In fact, most people didn't even author a webpage. Right? All that they would do is copy somebody else's and make a couple of changes to make it into their own. Yes. So this sort of monkey see, monkey do, was a really important part of the spread of the early Web. Nobody learned anything. It was just copying what other people do. >>: [inaudible]. >> David Karger: What? Okay. There were also plenty of typos in the early Web. All right. So now things have gotten a little bit easier. People don't really have to author source code anymore, instead they put up blogs or post in forums or collaboratively edit wikis that are out on the Web. So now, you know, just about anybody can put stuff out on the Web if they want to. And so the Web created this whole society of authors. All of a sudden huge numbers of people began to publish information on the Web and consume information that other people had published. But it's worth asking. What really changed with the Web? Was it the ability to fetch remote content from a server anywhere in the world? Well, no, we could always do that, right? You just start up your FTP client and, you know, somebody else has authored a document and put it on their FTP server and you type in a few arcane commands and there you've got the document and you can look at it in your word processor or whatever other tool's appropriate. Right? So no real change. We could always do this. But the Web introduced these very minor workflow changes, right, it created these URLs that packaged up this whole fetching process in one string, right? 
And the click as a way to access that without typing in all of those arcane commands and then it created this browser as a place where you would just stay as you were accessing all of this remote content, you wouldn't have to keep launching new applications in order to see everything that you were looking at. And on top of this, we've got this copy, paste, tweak ecology that I've already mentioned that let people author without learning anything complicated. So the Web really didn't make anything new possible, it just made stuff, certain stuff simpler, and that was -- that caused the revolution. It's just making things a little bit easier to do so that you didn't have to think about them. So there was this real virtuous cycle created of Web authoring where it was really easy for people to find stuff on the Web, and that created an incentive for other people to create stuff, because they knew that people were looking for it. And they would get kudos from having put it up there. There was also on the flip side the fact that it was really easy to create this content. You didn't have to host anything -- you didn't have to build or install anything yourself, you just put this file on to the Web server and you were all done. So that was great. But now things have changed a little bit. Instead of these 1990s Web pages, we've got fancy Web pages with all sorts of rich interaction, right? You can filter and search and sort the information you're looking at. It's presented in all sorts of fancy templates and just looks a lot more interesting and is a lot more useful for navigation. You've got rich visualizations like maps and timelines and such. And if you look, I think that there's been a real split now, that people who have the resources, the money or the skills, are able to create these incredible rich visualizations and powerful interactive exploration and navigation. But plain user websites haven't changed, okay? 
These professional websites can afford to implement a rich data model where they put all the information into databases and extract it using complex queries and then feed it into templating Web servers. And on top of that they can exploit the structure to do rich information interactions with filtering and sorting and all of these fancy views. Plain authors don't know how to install a database, don't know how to define all of the data that goes into the database, don't know how to write the queries, don't know how to write to the APIs of these rich visualizations, so they're still basically limited to text pages. Okay. Whether it be wikis or forums or blogs, it's basically text. And so they have less power to communicate effectively and therefore less incentive to publish the information that they could communicate. So this is actually a modern plain user webpage. Somebody's really interested in breakfast cereal. Right? And so he's made this webpage. But it's just a static webpage, right. You can click to navigate to a particular brand and inside of the particular brand you can see lists of breakfast cereal characters but there's no filtering or sorting or any of the other stuff that we have grown to expect on a fancy website. There are content carriers like Flickr and Amazon and Epicurious that are hosting sites for a particular kind of content that a user can contribute. Okay? So you can post a book review to Amazon or you can post a recipe on a recipe site or a photo on Flickr. And these sites are -- do have rich interactions driven by their structured data that the -- the site owners manage. Plain users can contribute to these repositories and benefit from the structure when they explore or consume the data. But there are real limits, right? You have to publish exactly the way that the content carrier has chosen to arrange the information. You can't say I don't like your schema, I don't like your organization. Okay? 
There are plenty of book sites out there, but my wife actually keeps her books at home sorted by the public -- by the birth date of the author. Okay? Now, is that something that your typical book site is going to support? No. It's something that works for her, and she might want to share that. But she can't change the site to do that. I maintain a folk dance video collection. So it's a video collection, but there's all sorts of bizarre metadata for folk dances that is not part of the YouTube metadata. So nobody can use that to navigate through my folk dance video collection, okay? If a scientist wants to change paradigms their stuff is not going to fit into the usual scientific databases. And if you get into really weird stuff you're completely doomed. There isn't even a starting point for the kind of information that you want to share. It gets even worse between sites, of course, because each of these sites is managing its own kind of information, and if you want to combine the information from one place with the information from another place then neither of the places is a good starting point. You have to create a third place that integrates the information from these two sources and create your own visualizations of that information. And this is the whole sort of mash-up scene that is popular now. But again, in order to create a mash-up you have to be a Web developer. You can't be just a regular person taking information from two places and putting it together. And of course the result is just another vertical website that again can't be changed by anybody. Now, ideally, we want to democratize all of this. Anybody should be able to do what the big content creators are doing, create interesting data or find it on multiple sites, put it together, create rich visualizations of it with interaction and make that available to anybody on the Web without knowing how to program or install a database or even what a database schema is. 
And if we can do this, then we're going to take this whole long tail of the Web, the small people, and instead of having them only working on text, they'll become sort of full fledged contributors to all of this rich information that we can find on the Web today. So that's the motivation. What about the how? How do we actually do all of this? Well, most of the Web, if you look at it, is CRUD, okay, which means that there is a process of people creating information, reading that information, occasionally updating it and even more rarely, deleting it. There isn't necessarily a lot of computation over that information. Basically the Web is a big storage bank and you're looking at what has been stored. And this is true even on these professional websites. And so if we can just democratize that much, this process of creating and showing, we don't have to worry about the computational aspects of data. So I'm going to outline an approach which starts by observing that of course publishing data is very easy. Anybody can put a spreadsheet on their Web server. So the only challenging part is the visualization side. And so what we're going to do is we're going to identify the key elements of the interactive visualizations that we see on the Web, and we're just going to add them to the HTML document vocabulary that everybody is already familiar with, okay. We're going to make up some new tags that talk about data the same way as we talk about images or video in Web documents today. And we are going to configure those data visualizations by binding them to the data that's in the page, the same way as you attach a chart to a spreadsheet in Excel. Okay? So let's look at a typical webpage. Here's that Epicurious page with its searching, sorting, and filtering. And let's look at the elements on this page. So some of them are very familiar, right? We've got images. Okay? 
We're all familiar with images and how you create an image on a webpage by putting an image tag into the webpage. But let's look at the data -- the interesting data part of the webpage. What do we have here? Well, we've got recipes, okay? So there's a bunch of items listed on this webpage. And each of those items has a bunch of properties, a title, a source magazine, a publication date, a rating, okay? And these are all presented to you in this template that is being used over and over again to render the items on the webpage. Okay? At a higher level, we have a view, okay? So the view is this actually -- I don't know why my boxes are breaking, but this whole big thing here which has some summary information about a set of items and here a sorted list, what -- I guess I only have 59 minutes to finish my talk. So there's here a list which can be sorted by a variety of properties like best match and a template that's used to present every item in the list. Up here we've got what are called facets. Faceted navigation has become typical on the Web. You get various categories that you can use to filter your items. So here we're filtering according to a main ingredient. But it also offers the possibility of filtering by course or dish or season. All of these are ways to filter -- to narrow down the set of information that you're looking at. And of course there's also a text search. So these in fact are the basic keys -- the key elements of basically all the interactive Web pages that we see on the Web, right? There's data, there's templates for rendering individual data items that tell you basically how should the properties of each item be laid out. There are views which are ways of looking at collections of items. Are we going to have to do this over and over again? Okay. Let's just go hide that over there. And let's -- as I said, 59 minutes from now, we will see what happens. 
And these -- so there are these views like lists or thumbnail collections on Flickr or maps or scatter plots, whatever you want to -- whatever you want to do to display a collection of items. And again, these are connected to data by specifying which properties of the information determine the layout or the position of the individual items in the view. So for a map, you need a latitude and longitude for every item in order to plot it. And then these facets for doing the filtering. So what we're going to do is make it possible for people to author all of these. And, in fact, they already can. Okay? So for example, if you think about data, well people use spreadsheets to author those all the time. Is it closing my -- didn't it say that I had 59 minutes? >>: [inaudible]. >> David Karger: I guess we're going to take a small break now while the computer reboots. >>: [inaudible]. >> David Karger: Yes. Go ahead. I can certainly take questions. >>: So you have data, templates, views and facets [inaudible] and building blocks. Do you also have grouping? Is that part of views maybe? >> David Karger: Yes. Grouping is one of the things that you might want to do within a view, sort of when you're looking at a collection, right, and so, in fact, the list -- now it's not even -- there we go. And so in fact the kind of list views that I will show you once we get that started up again do support grouping as well as sorting of the items in the list. >>: [inaudible]. >> David Karger: But of course [inaudible] spreadsheets isn't going to work and that's sort of the argument that I want to make because they're boring, right? Who wants to look at a table full of -- you know, at a bunch of columns of data on the Web, right, when we've got all these beautiful -- >>: And moreover, there's [inaudible]. >> David Karger: Yes, there is indeed. >>: And [inaudible]. >> David Karger: Yes. And so actually this comes back to -- this comes back to what I was just saying to Eric. 
Because there's so much -- it just made me change my password. And now we have to see if I can remember what it is. There's a ton of text on the Web and so there's a tremendous amount of work going into figuring out how to extract structured information from where it got put into the text on the Web. Okay? And this is obviously necessary work, right, that, you know, we're not -- this text isn't going anywhere, there's a ton of value in it, let's get it out, okay. But what I think is being neglected is the question of how do we get people to author the information as structured data in the first place so that we don't have to figure out sophisticated algorithms for extracting structured information from text? >>: But that requires [inaudible]. >> David Karger: Well [inaudible]. >>: [inaudible]. >> David Karger: It doesn't necessarily require planning ahead if -- and this is the point of my talk, if we can give them an incentive to do it right away. If we can say you'll actually have a better time, it will be better for you to publish this data as structured data than it would be for you to publish it as text. >>: [inaudible]. >> David Karger: What was that? >>: And as easy. That's right. We have to create incentives and remove disincentives. >>: [inaudible] and the Epicurious people certainly somewhere [inaudible] they've decided not to publish it. >> David Karger: Correct. And there's a great -- but there's a great discussion about incentives for sharing structured data. And disincentives for sharing structured data which parallels but is -- sorry, is close to but not identical to the discussions about sharing information on the Web in general. And I've got some slides about that towards the end because I think it's a very -- it's a very important question. Okay. Where were we? So actually it was a reasonable time to stop because we had just gotten through the sort of motivation story and I was going to tell you how to do everything. 
What's that doing -- no, that's not the right place. Okay. Let's charge forward. Okay. Communicating with data. Okay. So we identified these elements: views, facets, templates and the data. Can people author them? Well, they're authoring data in spreadsheets. They create views all the time inside of those spreadsheets by making charts, okay, by specifying which columns in the spreadsheet go to which chart -- where in which chart. Facets are a lot like views. You need to specify which column of the data you want to filter on. And again, that's something that spreadsheets make available. And templates, well templates are everywhere. We have document templates in Microsoft Word for example and people are comfortable working with those. But they aren't doing it so much on the Web yet. Okay? So we created a proof of concept implementation of this idea, of creating new tags that will make things as easy to author in a webpage as text and images. It's designed to let somebody publish an interactive data visualization by putting two files on their website. One of the files is a data file, which can be a spreadsheet or a CSV or a variety of other formats, and the other is an HTML document with these -- with these added tags like lens tags and view tags and facet tags to specify the different data interactive elements. We have a JavaScript library that interprets these tags and makes the right things happen, so the user doesn't see -- doesn't program anything and doesn't know how it happens. If they just put this magic JavaScript file -- link into their header, everything just works. And it all runs in the visitor's browser. So there's nothing to install on the server side. So let's walk through some demos of that. Hello. Anybody see a browser? There's a browser. Okay. So here is an exhibit that was created using our framework. I'm afraid I have a lot of pages to load, so there's going to have to be some waiting here. Good. 
So this is an exhibit about the presidents of the United States. We have a timeline showing when they were in office. We have a map showing where they were born. On each of them you've got an icon showing them, and you've got some additional information that you can pull up in a bubble. Over on the left we've got -- no, I don't want to install more updates. We've got filters so you can filter on the religion of the president, for example and see the -- both of the views update with that. There -- it's a sort of a combination filter. You can add in multiple values. And you can also intersect with other restrictions. So here I'm looking only at the democratic Presbyterians and, you know, the view shows me how interesting. They all lived on the East Coast. Not clear why. Okay. There's text search if I want to home in on a particular -- on a particular president. There are also alternative visualizations. So besides this map of where people were born, here we've got maps of death -- of where people died, okay? Again, we've still got the same ability to pull up information about any one of them. We also have here color coding according to party. We've got a detail view which is your more typical tabular view. You can filter on different properties. And basically continue to use the facets to filter this information. So now we're getting only the rows of the Episcopalians. So that's an -- that's a typical rich information visualization that's got the things that you're used to, views and facets and text search and templates for individual elements. How is it created? Well, let's look at the HTML. As I said, we've got a JavaScript library that interprets these new HTML tags that we've created. And down here inside of the perfectly normal HTML document are the tags themselves. So we start with a data file. We should take a look at that. Which in this case is represented as JSON but as I say could just as easily be a spreadsheet. 
And it's basically a collection of items, each of which has some properties like their name and what terms they were in and whether they died in office and their date of birth and some values for each of those. On top of that, we create these special tags. So the reason that there's a facet on the left that lets you filter by religion is that we put a facet tag into the document, okay? And all it says is make a facet and use the religion property as the thing that you filter on. Similarly for the party and whether or not they died in office, okay? They're just simple tags like any other. Lower down, much like inserting a -- an image, you use one HTML tag to make a timeline, okay? And so actually the only required properties for this timeline are the in date and out date which are the things that specify the start -- or just the in date to specify a start date for the timeline and then you can optionally specify an end for each element of the timeline using another property. So those are start and end. Everything else here is optional. But by specifying a color key, you can tell it to color code the lines on the timeline according to a different property. And then there are things for setting the dimensions and so on and so forth. Similarly, further down we have a map view. For the map view, the minimum that you need to specify is what property in the data contains the latitudes and longitudes that should be used to plot items on the map. Here we've specified as well what property of the data contains a URL of an icon or an image that should be put into the point that's plotted on to the map? And then there are again the usual, you know, center the map here and make it this big and so on and so forth. Here's the death places map, which is done in much the same way, but we're using a different property now to plot points on the map. So this is again very similar to charting in Excel, right, you specify some columns and the roles that they play in the chart. 
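The facet and view bindings described above can be sketched in a few lines of Python. This is an illustration only, not the framework's actual code (which, per the talk, is a JavaScript library running in the visitor's browser). The item records, the property names such as `birthLatLng`, and the helper functions are all hypothetical. The sketch assumes each facet filters on one property, that several values chosen within one facet are combined as "any of these", and that different facets intersect, as in the presidents demo.

```python
from collections import Counter

# Hypothetical data file contents: a collection of items, each with properties.
ITEMS = [
    {"label": "President A", "religion": "Presbyterian", "party": "Democratic",
     "birthLatLng": "38.0,-78.5"},
    {"label": "President B", "religion": "Episcopalian", "party": "Democratic",
     "birthLatLng": "42.3,-71.1"},
    {"label": "President C", "religion": "Presbyterian", "party": "Republican",
     "birthLatLng": "39.1,-84.5"},
    {"label": "President D", "religion": "Presbyterian", "party": "Democratic"},
]


def apply_facets(items, selections):
    """Keep items that satisfy every facet.

    `selections` maps a property name to the set of values picked in that
    facet. An empty set means the facet imposes no restriction.
    """
    def matches(item):
        for prop, allowed in selections.items():
            if not allowed:
                continue
            if item.get(prop) not in allowed:
                return False
        return True

    return [item for item in items if matches(item)]


def facet_counts(items, prop):
    """Count how many items carry each value of `prop` (what a facet lists)."""
    return Counter(item[prop] for item in items if prop in item)


def map_points(items, latlng_prop):
    """Bind a map view to one property holding a 'lat,lng' string.

    Items without that property cannot be plotted and are skipped.
    """
    points = []
    for item in items:
        raw = item.get(latlng_prop)
        if raw is None:
            continue
        lat, lng = (float(part) for part in raw.split(","))
        points.append((item["label"], lat, lng))
    return points


if __name__ == "__main__":
    selected = apply_facets(
        ITEMS, {"religion": {"Presbyterian"}, "party": {"Democratic"}}
    )
    print([item["label"] for item in selected])
    print(map_points(selected, "birthLatLng"))
    print(facet_counts(ITEMS, "party"))
```

Pointing `map_points` at a different property would give a second map over the same items, which is the sense in which a view is configured by naming the columns that play each role.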
The tabular view is equally easy. You just specify a list of the properties that should be placed in columns of the tabular view. The last part is the templates and these are implemented using tags that we call lens tags. And so here is a lens, okay. And all it is is a fragment of HTML representing how the individual items should look. And inside of that HTML there -- it's like Mad Libs, you just specify how different blanks in the template should be filled in using the properties that are drawn from the specific item. Okay? So that's the extended HTML vocabulary for specifying these kinds of visualizations. So we put up a couple of visualizations. Here's one of our department directory. This is using a thumbnail view. And you'll notice -- you asked about grouping so this actually supports grouping. So if we for example go by floor it groups on this category but then you can also sort by other things within that grouped category. Okay? We've got the obvious facets and each of these thumbnails is one of these lenses, a template for a particular item. Here's another one we made, more of a chart kind of visualization. Okay. So this is a -- this is a line -- a line chart view of some tab -- of some structured information. Again, you can facet in order to filter certain parts of the -- of the dataset. Okay? By team or by year, so on and so forth. Once we put up a few and sort of announced it, other people started making some, which was nice to prove that it wasn't only the designers of the tool that could -- that could build visualizations. So let me load up a few since the network -- >>: [inaudible]. So in the long run will [inaudible] simple queries to large databases, other interesting issues [inaudible]. >> David Karger: Absolutely. But of course one of the arguments that I made right at the beginning was that we shouldn't jump right ahead to thinking about large databases because there are a ton of really small databases. 
And just the challenge of working effectively with those if we could do that, I think we would have tremendous progress. >>: I heard that. But I was asking because we're actually facing something like this in our healthcare area, and question would be, you know, what would -- in the long-term and maybe wait until the end of your talk, what do you see coming in terms of large scale data mining that's interactive with Web tools to help you do that kind of thing. >> David Karger: Right. >>: In regular, kind of normal consumer oriented Web [inaudible]. >> David Karger: Yes. Well, so I think there's actually two very -- an important differentiation in that data mining is one stage, someone investigating in data, trying to figure out what's going on. But then there tends to be a communication step. Somebody has figured something out and wants to share what it is that they know. Okay? And you need -- the tools that you need for that are different. And the datasets that you're working with are not going to be as large or as confusing because you've already sort of homed in on what you want to convey. >>: Like where the consumer model might be very simple but -- and but like trends of various kinds that [inaudible]. >> David Karger: Right. >>: But the back end is complicated. >> David Karger: Yes. Yes. I mean again, that sort of trend computation, that's part of this computation aspect which I sort of explicitly pushed to the side and said let's concentrate on just authoring and seeing the results of a computation. Okay? I think that's a lower bar than asking end users to actually be able to carry out complex computations. Okay? So let's run through a few other examples. Here's a map of ozone concentrations around the world that were generated from actually a dataset on data.gov. Here's some local newspaper using exhibits to show locations for their Fringe Festival in Minneapolis. Sorry. These take a while to load. 
But they interact very quickly because all of the information is actually on the client, right? There's no -- there's no interaction with the server to do the visualization. So here Gina Trapani made a nice little map of all of the places that she's been, all the Broadway shows that she's been to and where they are in New York City, musical versus play, what theater. Okay? Here are vegetarian restaurants in Glasgow. I was going to take a little more time on these, but I'm going to try to rush through them to make up what we lost. Here's a nice visualization of sort of the history of classical music. Somebody made a relatively complex template where inside of the template you actually have a video, you can watch a performance of the music by that composer. Okay. You can filter on periods. He's color coded the timeline by periods. You can switch to alternative visualizations. Here's a map of Europe that is showing -- you know, it's very dangerous to be a classical composer in Europe. Because you -- because they all die there. Here my -- here's my own publications page. Obviously every academic is interested in making their publications as accessible as possible. So here you can get a nice filter according to the different areas that I've worked in or the different types of publication, the venues. You can filter by different co-authors. Up here you can see a tag cloud -- sort of a cloud view of my publications by year so that you can see I became more and more productive until I got tenure and then it sort of drained off after that. Okay? Here's somebody showing microloans throughout the world, drawing off of a spreadsheet that they maintained somewhere else. European Court of Human Rights cases. All of this slowness is page loads from the net. The framework itself is nice and fast. Showing, you know, what's their status within the court, what's their type of violation, who -- what the location of the court case is or where the violation took place. 
The law library at Columbia uses it in order to help you sort of find your way through the different available law resources. Somebody really interested in soccer is using it to show the history of the World Cup. >>: [inaudible]. >> David Karger: Well, then they're all going to be fighting each other for the network bandwidth. >>: [inaudible] authors become aware of this work? >> David Karger: Word of mouth, okay? Or our publication. Or they ran into another exhibit. Although actually this is one of the real problems with this undertaking which I was going to talk about a little bit later is how do people become aware of it. So you come and you visit -- you visit this webpage and you say cool, it's got a timeline and interactive filtering and bubbles and so on and so forth. I don't think it occurs to the regular user that they could have done it themselves. They figure oh, somebody installed the database and wrote a templating engine and so on and so forth. And I'll come back to that later on. But sort of making people realize what they can do is I think one of the biggest challenges here. Somebody decided to use this for -- for genealogy and they actually authored a new view, a family tree view, to let you look at the information that way. It's integrated with the rest of the exhibits so that you can see the usual, you know, timeline of when people were alive or a map of where they lived and so on. And again, this is I think important for the ecology that now somebody instead of having to write a whole new application for genealogy just has to address this one particular need of a particular visualization, throw it into the framework and they get everything else for free. So it's a finer grained approach to the development of information and visualization than writing whole new applications. Some newspapers have picked up on exhibits. So this is the St.
Petersburg Chronicle showing double dippers, those public officials who retire and then draw big salaries as well as pensions with filtering by agency. Here we've got -- let's see, where did my other newspapers go? Oh, yes, foreclosures in San Francisco. >>: [inaudible]. >> David Karger: Yes. Yes. Absolutely. >>: [inaudible]. >> David Karger: Yes. That's a really, really good point. And -- so I'm a really big fan of Many Eyes and for the beauty of the visualizations that we can create. But, you know, I look and I see how often people want to have something that's theirs, right? They own it. It's on their webpage. It's not -- and so even these tools that sort of let you go build a visualization somewhere else and then embed it in your webpage, it somehow, it's not yours. And I really want to -- to address that demand for ownership that people have. So there's another interesting thing about this one, which is a map of foreclosures in the Bay Area which is that all it is is a map, right? The Google Maps API is perfectly capable of rendering this map. Why did they use the exhibit framework? Well, because they didn't have to learn an API in order to use the exhibit framework, they just had to make a dataset of points and point the map thing out. Yeah? >>: [inaudible] exhibit API. >> David Karger: Well, they had to learn those HTML tags, but they never had to write any JavaScript, right? And if it was somebody who didn't know JavaScript, they didn't have to learn JavaScript in order to be able to write to the API. >>: [inaudible] I mean the template for each is almost exactly the same and is almost exactly the same number of lines. >> David Karger: Yeah, but I -- okay. I believe that this is simpler in people's minds, but I can't prove it. So I will sort of leave it as a debatable point. >>: [inaudible]. >> David Karger: Right. That's just it. Authoring HTML is something that people have done since the '90s. >>: [inaudible]. >> David Karger: Yes, yes, I'll get to that.
So here is the Star Tribune in Minneapolis showing schools failing to meet standards. And they also showed bridges, failing bridges. I'll let these load up while I'm talking. It's also been used in some interesting scientific contexts. So here's somebody who was studying language acquisition and this is all of their data for their PhD. So this shows, you know, what interview questions were asked and unfortunately I can't tell you too much about this exhibit because I don't know Japanese but they provide a way to sort of navigate through their different acquisition subjects and look at that information. Here we've got some sort of brain gene expression data which again I can't tell you anything about. I don't know what this means. But that's a good thing, right, that somebody was able to create this without calling me and saying okay, I need a computer scientist to help me deal with this brain expression -- this gene expression data? Because what do I know about gene expression? They should be able to do it themselves. Okay. So here we have a pretty fancy view that they've created and here we've got their -- the facets that they decided were important. More biology stuff. Here is gene mutations. And again, I don't know what -- I don't know what AA position and SNP SV and SIFT score are, but they seem to matter to biologists so they put them in. Here a biologist actually took our timeline and turned it into a gene viewer. So the gene is a nice long sequence and so you can scroll over that sequence the same way as you scroll over a timeline. And you can filter it on things like protease and MEROPS families, whatever those are. Okay. All right. So enough with the different visualizations. I basically wanted to throw a lot at you in order to argue that lots of different visualizations can be created using this framework without writing any new code. Now, it has some scalability limitations because it is JavaScript, so Eric asked about this.
It's nice and interactive if you have less than 500 items. Somebody made an exhibit of all the Lego sets that were ever sold, and there are 2,733 of those. And it still works, it just slows down linearly with the number of items. The problem actually is in manipulation of the DOM of the webpage to insert all of these items that are being shown. It isn't a limitation per se, because as I've argued, there are tons of small datasets out there. What I would like to see happen is for this ability to template and render data to become part of the browser layer. We're already seeing this, the latest releases of browsers now have this latest database storage API in them. What they don't yet have is a set of HTML tags that people can use to access the data that's in that store without being JavaScript programmers. And that's the kind of thing that I'm trying to demonstrate with the exhibit. Okay. At that point, I think we would easily scale to 50,000 data items. Because that's less than the amount of data that's in a typical webpage today, the sort of two megabyte Web pages that you pull down with all of their style files and things. All right. I kept these in case something crashed, but these are just other visualizations. Here's the Lego sets, all 2,700 of them. So the argument here is that these kinds of pages can show people a reason to publish structured data, right, if you publish structured data then you can have all this fancy rich interactivity visualization. Great. It lets you communicate better. It's also actually easier to maintain, right? If you have a map of all of the places that you've -- I'm sorry. Map's a bad example. But if I've got a publications page, okay, then every time I create a new publication I have to go and edit some new HTML and make sure that it looks right with all of the other HTML.
If instead I just have a spreadsheet where I keep all of my publications, I just write a new row into that spreadsheet, I don't have to do the formatting again, the page takes care of it for me. So this is the same sort of motivation as CSS, right, the style sheets that there's a certain part of your webpage that's just not going to change, and you should just concentrate on changing the part that does change. Now, an interesting side effect of all of this is that we're convincing people to author the structured data. Well, that structured data is now exposed. Other people can access it and use it for other visualizations or critique it. So the selfish incentive to communicate better is leading to this social benefit of making data available. And we tried to make this explicit. I can show you on any of the visualizations that we've created where somebody hasn't turned it off. We produce a data copy button. So if there's data on this page, you click that button and you can take the data out in whatever format is suitable for your needs, okay? So we're trying to create the same sort of copy-paste ecology as we saw in the earlier days of the Web. You see something, you like it, you copy it and you change it. So you pull it out, and you put it in. Now, we've seen signs of this. So actually here is an exhibit that we found on the Web in the early days. And you can see there's a small issue with it, right? So they started with an exhibit that we created of MIT Nobel Prize winners, and they said that looks about like what I want. I just want to put some different data there. So they downloaded the visualization, which after all is just an HTML document, and they changed the data file. And they just forgot to update the title of the HTML document. So this is nice documented proof. So there's two kinds of copying that can go on, right? You can copy down the data or you can copy down the visualization because both of them are just static files.
There's nothing to install or program. We've created this small set of views. There are many others. But as I showed you, we -- our framework makes it possible for people to actually instantiate some new kind of view like the genealogy view as part of the framework. And I think that -- this is sort of the future that I wish on Web designers or on Web developers. Instead of developing whole applications, they can develop information visualizations that can be incorporated as part of entire visual -- entire pages that are authored by the end user. Let's see. Let me skip over the -- well, okay. So another thing that we did with exhibit was we replaced the MIT course catalog, okay? So I called in four undergraduates and said make me a new course catalog that's better than this sort of big page of course listings that they have right now. Took them two days to write the UI and two days to reformat the data. After it took six months to get the data that we needed. Okay? So we contacted the registrar and we said we'd like to make a better visualization of the MIT course catalog. And they said why would you want to do that? Ours is perfectly good. And then they said, well, wait, if you do that and there are any mistakes in what you do then people are going to come see it and they're going to register for the wrong courses and it's going to be our fault that we gave you the data. So we had to argue all sorts of issues to convince them that opening up their data would be a good thing. Once we put the course catalog up and they looked at it, they said oh, that's why you wanted the data? That's really nice. And then sort of a couple weeks later they gave us a direct line into their database to be able to pull out the data whenever we needed it, because now there was a reason for that data to be open and so they made it open. So again, I think the visualization drives the creation -- or drives the opening up of data, and I think that that's very valuable. All right.
Now, I left a big problem on the floor -- on the table here which is who edits HTML source code these days? Right? Only -- only people like us, right? Geeks. So we need to -- I mean so I believe that the framework, the idea of these visualization tags is correct, but we have to -- to get to everybody, we have to give them appropriate authoring tools for dealing with those tags. Okay? And so we've built three different tools that show the kind -- that show the way that a regular person can actually author. So instead of being in the Web of the 1990s with source code authoring, you're in the modern Web where people work with things like wikis or WYSIWYG editable documents or blogs. And what we've done is we've said, well, since we're just extending the HTML vocabulary, we should be able to go with the flow, use the tools that people currently use to author different kinds of HTML and just incorporate data authoring as part of those tools. And so we've done this three different ways. We've done it in a Wiki, we've done it through an editable stand alone document with a WYSIWYG editor, and we've done it as part of the blog publication process. So let me show you two of them briefly and then I'll spend a lot of time on data blogging. So one of the things we did was we added data visualization to MediaWiki, to the software that runs Wikipedia. Not on that site, but to the underlying software. We started with something called Semantic MediaWiki which was a preexisting extension for Wikipedia that lets you -- for MediaWiki that lets you put structured data into the Wiki. You might think there's already structured data in the Wiki because all of those pages have info boxes in them, you know. If you go to the page for a particular President of the United States, there's this nice table on the right-hand side which shows when they were born and when they went into office and what was their party and so on and so forth. But that's not actually structured data.
Because the MediaWiki treats it as text. But somebody wrote an extension to take all the information that you're putting into those templates and put it into an actual database so that you can query that database and get back the information -- and get back structured information from what people are putting into the templates. We simply enriched that with our exhibit framework. So the underlying extension just gives you back a table of results. We shoved that table of results into all of the rich information visualizations that we've created. You author them the same way as you author any Wiki page. So if I go over here to my beer page, beer. So I had a German student come out and do this. And he was very interested in beer. And he made this visualization of different varieties of beer throughout the world. And it's an exhibit like many of the other ones I've already shown you. Okay. This is the list view where you can sort by type or what country it was brewed in. We have a tabular view, and we also have a map showing where everything was -- is created and little bubbles showing what kind of things happen, showing the brand of the beer. Now, if you look at the source of this page, this is the whole thing. So this is the text extension where you can query the data inside of the Wiki and get back columns of structured information. So you write in that query. And this was already available as part of the MediaWiki extension. What we added was the ability to say format the results as an exhibit and show me a list view, a tabular view, and a map view of those results and put in a brewed in country facet for filtering them. So anybody who's comfortable authoring Wiki text can author a visualization like this. You can also use the Wiki framework to edit the individual pages. And here again, you'll see typical Wiki text in the Wiki, but as you edit it, it goes into the database and then the view gets updated to reflect the modified visualization.
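[Editorial sketch: the wiki workflow above amounts to one table of query results feeding several views plus a facet. The Python below illustrates that shape only; it is not the Semantic MediaWiki query syntax or the Exhibit extension, and the rows, column names, and the helpers `list_view`, `table_view`, and `facet` are assumptions made for the example.]

```python
# Hypothetical query results: one row per beer, as a query might return them.
ROWS = [
    {"name": "Beer A", "type": "Pilsner", "brewed_in": "Germany"},
    {"name": "Beer B", "type": "Stout", "brewed_in": "Ireland"},
    {"name": "Beer C", "type": "Weissbier", "brewed_in": "Germany"},
]


def facet(rows, field, value):
    """Narrow the result set to rows whose field equals the selected value."""
    return [row for row in rows if row[field] == value]


def list_view(rows, sort_by):
    """List view: item names ordered by the chosen column."""
    return [row["name"] for row in sorted(rows, key=lambda r: r[sort_by])]


def table_view(rows, columns):
    """Tabular view: a header line followed by one line per row."""
    header = " | ".join(columns)
    body = [" | ".join(str(row[c]) for c in columns) for row in rows]
    return "\n".join([header] + body)


# The same filtered rows drive every view; no view holds its own copy of the data.
german = facet(ROWS, "brewed_in", "Germany")
print(list_view(german, "type"))
print(table_view(german, ["name", "type"]))
```

[A map view would be a third function over the same rows; the point is that editing a row in the underlying store changes every view at once.]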
So that was our attempt to fit into the Wiki workflow. And you can play with that right here, projects.csail.mit.edu/wibit. And it's world writable, so you can spam it if you want. The next thing that we worked with was documents. So everybody edits documents but the documents that we edit right now have mainly text. Let's introduce structured data into the documents. So here again -- here we go. So here's a structured data document. And it's an exhibit but it's an editable one. So I can edit the information right here. And it immediately updates in the visualization. I can add new items, I can delete items, I can do all of this while I am interacting with the data. I can also interact with the visualization itself. So for example I can grab my WYSIWYG editor and say I would like to add a -- sorry. My screen is a little small. A facet to filter on the discipline within which these Nobel Prize winners won their prize. And I would also like to go over here and add a timeline view based on the Nobel year. And when I'm done with all this editing, I now have the ability to filter on discipline or look at a different visualization that I've just thrown in. And this is just a file. So once I finished with editing the data and editing the visualization I click save, and it's saved. And now I can e-mail this file to somebody else, I can put it into a version control repository, I can upload it to SharePoint, I can put it on the Web and let people interact with it that way. It's just a file, and all of the things that you do with a file you can do with this document. So that also -- and again, this works because we're just editing tags. So I grabbed an open source HTML editor and just -- just told it about our tags as something else that it should be able to edit and so it was very quick to build this -- to build this prototype. Now, the last thing that I want to talk about -- oh, but I have to say.
Just this morning, as I was putting these slides together, there was an announcement of the executable paper grand challenge from Elsevier which they want a way for scientists to be able to publish their papers, including the data and the ability for somebody who's reading the paper to interact with the data. >>: [inaudible]. >> David Karger: Yes. But it also sounds very DIDOish. DIDO is the tool that I just showed you, right, a data integrated active document is exactly what these scientists need. And again, I think that the opportunity to publish not just a boring old line chart but some data that a reader can actually read can provide the kind of incentive that we're looking for for the scientists to put their data out in their publications. Last thing I want to talk about is blogging of data. So we built a tool to integrate these sort of rich visualizations into -- into WordPress. Before we started we thought we would use this to test whether there is actually a need for data publication. So we grabbed a bunch of the blogs off of Technorati which tracks lots of blogs and looked through the articles that were being posted on those blogs. And here's what we found. 21 percent of the articles and 81 percent of the blogs in their postings would enumerate the properties of a structured item. Okay? So, you know, when somebody posts a review, they tend to throw lots of properties of the item being reviewed into their text. Okay? Also, 30 percent of the articles and 86 percent of the blogs did sort of data comparison. So here we were -- this was election season and so people were talking about polls and different results for different candidates. 91 percent of the articles actually referenced some data that were somewhere else. 32 percent of them did explicitly and 59 percent of them just did it implicitly by sort of referring to some data set that you couldn't actually link to.
What was -- but invariably this data was conveyed either as text or at best as an HTML table, or maybe as a picture of a chart that they made with some other tool. And I have to tell -- so Eric, were you involved with CPOF? So I heard this great story about the command post of the future which was this very fancy system for the military to let them -- >>: [inaudible]. >> David Karger: Okay. So they -- >>: [inaudible]. >> David Karger: They built this big fancy tool that would let the military create these very rich information visualizations in this tool of all, you know, all of the units and the resources and the combatants and so on and so forth interact with it. The way this was used was that people with one instance of this command post of the future would create this really rich visualization. Then they would take a screenshot of it and send that over to another installation where they would put that screenshot into the command post of the future installation that they had. So instead of carrying all of this rich data from one application to another, all that you got was an image of the output, and you lost all of the richness, even though it was the same tool on both sides. Okay? And we get some of the same thing in data, in blogs, where people will create a rich visualization using some other tool and put a picture of it in their blog posting. So we made Datapress, which is a WordPress plugin. And it uses the standard WordPress workflow. You upload or link to some data, and then you WYSIWYG your visualization of that data using the regular WordPress blog post editor. So here is sort of a worked example for somebody writing a blog post at a conference. We gave a paper about this at the Semantic Web Conference last month. They're writing their blog post. They go to the editor, and they notice that above the toolbar there's some new stuff, namely a -- an upload for data and a visualization button.
So you click on the visualization button and you point it at a spreadsheet say, where there is some data. Okay? And you type in that URL, okay? We've got a little wizard so you say here's the data that I want to visualize. Then you can go through and specify some visualizations that should be created over that data. So here it's creating a table and you're specifying which columns from the dataset should be included in that table. Similarly you go through and you add some facets and you configure whether you want it to be light boxed or part of the blog post or so on and so forth. Okay? And then you have the ability to configure templates for the items, but unfortunately since we said it was advanced nobody ever actually used that. So we should have left that off. So we put this out, and we actually studied -- we studied the way this tool got used. So here we have some honest usage reporting. So if I go over here, here's one -- here's a dataset, and here is a website called QuantNet that's one of the users of our tools. So this is a preexisting blog, which is still loading. And they made this nice visualization of programs for different degrees in quantitative finance that you can get. You can filter by the type of degree you want, where they're -- you can look at a map of where they're located. Sort of usual exhibity stuff. But they did this right in their blog. We also had somebody use the blog to -- use Datapress to manage their publications. And the third sort of case that we looked at closely was somebody who used it to maintain a blog about the music scene in Portland, Maine. >>: [inaudible]. >> David Karger: So we've created a plugin for WordPress and you download that plugin from the WordPress site. You install it the same way you install any WordPress plugin, and there you go. Okay? >>: And could you also go a slightly different direction which is to be able to use exhibit to manage the publications in the blog itself? >> David Karger: Yes.
So I haven't yet connected sort of the data editing tool with the blog tool. Although I have to say people seem pretty happy, you know, having their dataset somewhere else and just using -- so here's music in Portland, okay? Now, talking a little bit about these -- that wasn't what I wanted. Hang on. So, yes, we have this quantitative finance guy. Here are some other visualizations that were created using Datapress. Some data dump from data.gov. Publications. The Semantic Web Conference itself. Here is a bar graph of where people -- where the sources of the people at this conference. Here's one that I thought was quite interesting. This was a blog post that somebody made. And notice this part. So this guy basically says, you know, this is a thing that lets me use exhibit to put something into my blog. I thought about visualizing this data using exhibit, but I didn't have time or was too lazy to program and so Datapress saved me. It was sort of easy enough to do that I finally went and did it. And again, I think this is the key is if you can make it simple enough, then all of this restrained creativity will burst forth and you'll see lots of this stuff being created. The data can come from a spreadsheet or from -- that is uploaded to your blog if you really want it to be yours or else you can just link to a data source that's somewhere else. So for example a Google spreadsheet. You can't link to a Windows Live spreadsheet because they haven't provided a data output for their spreadsheets. Or you can link to the data that's stored in a Wiki somewhere -- in our Wiki extension in wibit somewhere. Or you can link to data that's on somebody else's blog because obviously a lot of what people like to do in their blog postings is talk about other people's blog posting. And so this way you can visualize the way somebody else visualized the data and put up your own visualization of that data to compete with it.
You can also create sort of a blog data feed where for example if you are writing a blog of book reviews, then each time you put in a book review you can also put in the structured data for that book. And all of those individual data items become a single feed of data that you can incorporate into a sort of an aggregate posting. So here in the blog post editor you can insert a data item and say what data item you're talking about. So here's a set of templates you can use and you specify what kind of thing that you want to add and fill in the fields for it. And that goes into the blog's database. So we studied this. We got about 120 downloads at the time we were writing the paper. 36 people participated in the study, created 94 visualizations. We got 75,000 page views and only one bug report. So that suggests that we're pretty robust. Because the exhibit framework's been around for quite a few years and had its bug fix -- bugs fixed. And then we have these three in-depth interviews on the sites that I showed you, the publications, the quantitative finance network and the factory Portland music website. So here they are again. This is the botanist with their publications, the music lover and the pro blogger about quantitative finance. So here are some observations about their experiences. Our subjects, they found the Datapress plugin after months of looking. Okay? Which means that in those months of looking, they had a need and it wasn't being met by the tools they were able to find. So I think this provides some evidence that there is a need and that we don't have the tools for it. The botanist actually went so far as to install WordPress just to be able to use the Datapress plugin. The others had preexisting WordPress sites. None of them wanted to be hacking HTML source code. Okay? And what was very interesting to me was their limited ambitions. If you look at it, two of the three sites, all they -- all they created using Datapress was a table.
And HTML already has tables. Okay? But they wanted the filtering and they wanted the ease of maintenance that instead of having to go in and hack that table, they could just edit their spreadsheet with additional data. Okay? Now, what was also interesting was that we asked them why they had only made the tables, and two hours later the finance network had a map and a timeline of the stuff. They just hadn't again realized -- thought about what they could do with the tool. And so they didn't try. Okay? Again, I made this point before, but the ability to visualize drives the structuring of data. Right? The botanist had an HTML publications page, but once we told her you could do all this visualization stuff, then she turned it into a Google spreadsheet. And that added structure to the Web. The musician had a hand-edited table. But they moved their data into a data file in order to create the visualization that they wanted. Two of these three people actually asked for a collaborative -- they wanted their users to be able to collaborate in the management of the data. They didn't know about our wibit project, but that would have been the natural way for them to do that. Somebody asked about sharing back at the beginning. And there were a lot of -- there was a very interesting spread of perspectives about what is -- about the sharing. You're making this data available. Does that mean anybody should be able to take it and use it for something else? Well, one person was perfectly happy to share everything. One thought that it was fine to reuse the stuff if there was proper attribution, okay. Another thought this is a little odd but they thought it was fine for somebody to copy the visualization or to copy the data but not both because that would be copying. The musician was the most interesting of all.
He actually sort of thought about trying to go out and convince other music oriented people in Portland to turn -- to expose their data so that he could aggregate it into his website and so on. But he also went and learned just enough CSS to hide the data copy button that we provided. Because he didn't want other people to take his data. >>: [inaudible]. >> David Karger: Okay? So making this data -- so, you know, looking at data like this doesn't magically solve all of the copying problems that we -- that we already have with text. In fact, it makes them worse. Because with text you have copyright. With data you can't copyright data. There are court cases, right? If you have facts, you're out of luck. If they're available, anybody else can copy them. And so this -- this -- this does pose a challenge. But what can we do? It's something that we'll have to address the same way as we address other forms of copying on the Web. So I'm going to finish with a couple of perspectives on all of this. First, this is the first webpage from CERN back in 1994 or something like that. This is the CERN webpage today, okay? It's gotten a lot fancier. The big difference is that, you know, there's a lot of CSS, there's guidelines so that how you should style things so that they should be looked at. Well, I think that if we moved in 15 years to splitting content from styling then the next step is to split data from content from styling, that every webpage should actually have three parts. Or not every, but a data carrying webpage should explicitly have a data portion and it should have a content part that explains how that data should be formed for visualization and then a style part much in the same way as we're using CSS now. And I think that the common case will be that just like the CSS, the tags for visualization will be static. And it's only the data that will be changing on the page.
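[Editorial sketch: the three-part page described above (data, a content part that says how each item is presented, and a style part) can be illustrated with a few lines of Python. This is an assumption-laden toy, not how Exhibit is implemented; `DATA`, `ITEM_TEMPLATE`, `STYLE`, and `render` are hypothetical names, and the records are made up.]

```python
import html
from string import Template

# Part 1, data: the only part expected to change over time (hypothetical records).
DATA = [
    {"title": "Paper One", "year": 2008},
    {"title": "Paper Two", "year": 2009},
]

# Part 2, content: a static template saying how one item is presented.
ITEM_TEMPLATE = Template('<li class="pub">$title ($year)</li>')

# Part 3, style: static styling, kept apart from both data and template.
STYLE = ".pub { font-family: sans-serif; }"


def render(data, template, style):
    """Combine the three parts into one HTML fragment, escaping every data value."""
    items = "\n".join(
        template.substitute({k: html.escape(str(v)) for k, v in row.items()})
        for row in data
    )
    return f"<style>{style}</style>\n<ul>\n{items}\n</ul>"


print(render(DATA, ITEM_TEMPLATE, STYLE))

# Adding a publication means appending a row; template and style stay untouched.
DATA.append({"title": "Paper Three", "year": 2010})
print(render(DATA, ITEM_TEMPLATE, STYLE).count("<li"))
```

[Under this split, maintaining the page is a matter of editing rows, in the same way a spreadsheet row feeds a publications page.]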
And this may dramatically improve the ability of people to maintain their information easily, or use it for multiple datasets. All of what we did was client side. Okay? So do we need servers at all? Well, of course. Once the data gets big enough, you're not going to be able to do these kinds of client side computations that we're running, you're going to want support from a server. And in fact we have a -- we've just gotten a small grant from the Library of Congress to work on this because they have -- they want to use exhibit but their datasets are too large. So they want some further development to provide server side support for the kinds of data interactions that exhibit lets you author. But even as we push down that direction I continue to argue that there's tons of valuable data that comes in small packages where the real challenge is not performance and scaling but simply ease of authoring the visualizations. Authoring data is not complicated and you don't need a computer scientist to do it. Okay? Everybody is happy putting images and videos into their Web pages and inserting data should be no harder. A lot of data is in fact available on the Web. This comes back to the argument I was having with Daniel at the beginning. All sorts of companies offer data APIs and say we're good citizens, we're offering data APIs. Well, this is great for programmers, but what about everybody else? I would much rather see or I would like to see us add on a Web ecology where it becomes normal for people to have data copy buttons on their Web pages, okay? We see a little bit of this in microformat -- in microformat land. Okay? Where you should just be able to say, okay, I have navigated around on this website, I've gone through its query process and so on and so forth, now there's a -- this page is showing a bunch of data that I like. Well, let me copy out that data and do something with it.
Ideally copy it out as a feed, an updatable URL that I can use if the data ever changes so that I can keep my visualization up to date. I won't even talk about the Semantic Web vision. It's not important for this talk right now. But I'll just conclude by going back to these five problems that I listed at the beginning, right, that people can't communicate effectively, that as a result people don't put data on the Web, and other people can't find it. So I'd like -- I hope -- I believe that the approach that we're taking of making easy data visualization part of the Web ecology would actually help us to address these problems. We do so by separating data from presentation, thinking of data visualization as an authoring process rather than a programming process. And if we did so, anyone would now be able to create interesting data and visualizations which would motivate them to do so and put that data out where other people could access it. And so coming back to what I said right at the beginning to Eric, this is not about creating sophisticated information tools, it's about creating simple tools that let people do the sophisticated work. Okay? These are all of the students who have worked on different pieces of the tools that I've described. I had to asterisk David Huynh because he was the one who really kind of led us off in this direction of simple Web authoring of information visualizations by creating the initial Exhibit framework. Okay? So you can play with all of the tools that I've shown you by going to these various websites. And of course you can e-mail me if you want to -- if you want to discuss any of them. >> Eric Horvitz: Thank you very much. [applause]. >>: You said [inaudible] about what you were doing at Microsoft and [inaudible]. >> David Karger: Sure. Sure.
So I gave a 20-minute version of this, without the reboot in the middle, to Harry Shum [phonetic] about a year ago, and he thought it was pretty exciting, and we thought that Bing might be an environment where this might happen. And I've got a white paper about this that I'm happy to circulate around. The idea is to provide an environment somewhat like Many Eyes but different in important ways where people can -- are supported in their authoring of data and visualizations of that data. The idea would be that you would go over to a site that's part of Bing and you would there find tools. So suppose, for example, that you have a small storefront and you want to sell a bunch of products or something. And you want to create, you know, a -- some product catalog that people can navigate. Well, Bing would give you the tools for creating that product catalog without doing all sorts of fancy software installation and so on. As a first step the products that you want to sell are probably already in Bing. So we would give you like a little shopping cart to go through Bing's structured data repository and pluck out the items that are part of your catalog. Now you have a dataset. Well, you would perhaps want to augment that dataset, for example, with your comments about the products and the prices of the products that you want to sell. Now you want to create a visualization on top of that. Well, that's a Web authoring task. So you -- you know, you use something like DIDO that I just showed to sort of WYSIWYG edit up the way you want your product page to look. When you finish, you pull down that product page and you put it on your website, okay? And the advantage to Bing is that by going through this process, you've actually authored additional structured information, right? Your prices, your comments.
And because it's structured and Bing knows how it was authored, Bing has an easy time gathering that additional structured information and enriching its own structured data repository. >>: [inaudible]. >> David Karger: Exactly. Yeah. I've got a whole sort of virtuous cycle picture in my white paper to sort of show how it helps everybody along the way. >>: [inaudible]. >> David Karger: What was that? >>: How has that gone? >> David Karger: How has that gone. I've gotten quite a lot of enthusiasm and not enough development resources. So we're still trying, is the basic story, to figure out how to make -- how to move it forward. >>: [inaudible] doing the same thing, trying to bring the [inaudible] SharePoint. >> David Karger: Yeah. Well, you know, I -- and so again, lots of enthusiasm, many presentations to different people on the SharePoint team. And sort of invitations to talk to this person and that person. But actually just as I was coming out this week, I sent an e-mail to ask them if they'd like to meet again and they said oh, we've gotten distracted with something else very big and don't have time to talk about it right now. So that may not be going anywhere. We'll see. >>: [inaudible]. >> David Karger: Okay. So that's the story. So I fear -- so like I said, you know, I do have -- I do have a few words about my other projects, but I think that the time may have been eaten in the reboot. So -- >> Eric Horvitz: We could arrange another seminar while you're here if you would like to do that. >> David Karger: I could do that if you want. [applause]