>> Scott Klemmer: It's great to be here today, and, as Jaime mentioned, I'll be talking about some work that many of you actually did, or at least a couple of you actually did. I'd like to start out with an example from my student, Joel Brandt, who was doing a study of how programmers build software in the modern age. And I think many of you are going to resonate with this example. So we asked programmers to come in and build a Web-based chat room, and one of the folks in the study said, you know, good grief, I don't even remember the syntax for forms. Now, Jenny's a great programmer. She just happened not to know the syntax. So she goes to a search engine, types in HTML forms, clicks on the first result, goes through to that page where it has an example of how to work with HTML forms, she scrolls down to the bottom, copies this code, pastes it back into her editor, and then, you know, 60 seconds after she went looking for this example, she's now got running code. A few minutes later, she's able to elaborate on that, and she's got a chat room up and running. She doesn't know this language. And her experience was really consistent with what we saw from other folks. 20 percent of the time that programmers in our study were programming, the active thing that they were working with was a Web browser, not an editor. And I think this is really changing how programming works. And I think people have always worked with examples and have always written code by borrowing from stuff that was already around. But I think the Web is really changing the scale at which this is happening. And a lot of what's changing here is the cost of creating, sharing, and accessing examples. And so we thought it might be interesting to explore how it would work to integrate an editor and a search engine into one tool, and so that's what we set off to do. And in collaboration with our friends at Adobe, we created a system called Blueprint. And here's how Blueprint works. It's a plug-in for the Flex Integrated Development Environment, which is in turn written on top of Eclipse. And so here you can see an example of some code in Flex. And what I can do in Blueprint is I can -- we'll zoom in a little bit -- and I can type in a key phrase, much like the auto complete you have today. This extends this metaphor to being able to search for examples. I can type in busy cursor, it goes off, finds some results. These are using results that live in a Google search appliance already. And we're presenting them in an example-centric manner as opposed to a page-centric manner. I can grab the line that I need, bring that back into -- see some other bits about this, bring that back in. Great. We copy that, paste it in. Notably, it gives me the provenance of where I got it from so the person who created this can get credit, and also if bugs or updates happen I can be notified. And I know what's going on. I mean, my code editor today, many of the lines that I write with it are actually pasted in from somewhere else. But my editor has no idea. And so now my editor knows where all the code came from, whether I typed it or whether it came from somewhere else. Now, without Blueprint, here's what you'd need to do. You'd need to go into a separate application, type this in. You'd need to add in the development environment that you're working in. That's something that Blueprint will do automatically for you. You go down and you see a set of page-centric results, and there's a whole bunch of stuff here. 
If you click on the first result, here you go. And it gives you a whole bunch of stuff, which is much more complex than you need. You know, you got to page down to be able to find all of it. And then to paste that in, it takes a while. Here's another thing that you can do with Blueprint. So if you know, say, the class that's involved but not exactly what the method or the syntax is, you can start typing a few characters, use the existing auto complete functionality to be able to get that class name, hit go from there. And then we'll give you some examples of what you can do with the URLLoader. And so I can poke around and look for the thing that I'm looking for. I can grab this piece of code, and then I can paste that in. Here you can see a better view of the example-centric results that you get in Blueprint and what you get if you type that. And all of the content that we've got in Blueprint right now is the content that's indexed by the Adobe Community Help search engine. And this is really important because there is another tool out there for searching for code. Google has a system called Google Code, and Blueprint is importantly different than Google Code. So Google Code searches everything that it can get its hands on. It's a giant corpus of code. The problem with that is a lot of real software is huge, and so if you search Google Code for set busy cursor, you're going to find line 7,642 of somebody's text file, and they're using busy cursor. And it's very difficult to figure out what is it in this code that I need that's relevant to my example, and what is it that's actually stuff that they're doing that I don't need. And so by searching the Adobe Community Help documentation and not other stuff, we're only searching things that were meant to be used as examples. This also helps from a legal perspective as well. And it shows you the URL that you got it from, and it can give you a bit of extra information in addition to that. Okay. One of the fun things that we can do additionally is because we're working specifically with Flex in this system, if you've got a running example that we know what to do with, we can have that running example be right in the search results view, which is something that's not possible using the existing search engine. And so I can have this button that when I press it shows the busy cursor. So that's pretty darn cool. The way that we built Blueprint is we've got this existing Google search appliance that sits at Adobe and that indexes this particular content. And that's really nice for us because, you know, all of you in industry have figured out how to do search much, much better than we'll ever be able to do at a research lab. And we just want to leverage that. We don't want to write our own search engine. So if the user queries for something like chart, I get that query, it goes to the Blueprint server that sits in the middle before you get to the search engine. And Blueprint caches all of the results from the search appliance for the reason that we're going to present them in a different way, and so we just keep everything cached. It turns out to be a lot faster. And -- but the query will go off to Google, we augment it with some information about the development environment, it's going to give us some URLs, also things like suggestions that we can pass back to the user. And then in the cached version, we've got the examples that are preformatted for being able to show to you. And then we give you back these results in the example-centric way. 
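To make the flow just described concrete, here is a minimal sketch of a Blueprint-style caching server sitting between an editor plug-in and an existing search backend. All class, field, and key names here are hypothetical illustrations, not Adobe's actual implementation; the only assumption is some search appliance object exposing a search() call that returns page records with a URL, a title, and a code block.

import hashlib

class BlueprintServer:
    """Hypothetical sketch of the server between the IDE plug-in and the search appliance."""

    def __init__(self, search_appliance):
        self.search_appliance = search_appliance  # anything with .search(query) -> list of page dicts
        self.cache = {}                           # query key -> preformatted example-centric results

    def query(self, terms, ide_context):
        # Augment the raw query with information about the development
        # environment (e.g., the framework name), as the plug-in does automatically.
        augmented = f"{terms} {ide_context.get('framework', '')}".strip()
        key = hashlib.sha1(augmented.encode("utf-8")).hexdigest()

        if key not in self.cache:
            pages = self.search_appliance.search(augmented)
            # Preformat each hit as an example-centric result: the code itself,
            # its source URL (kept as provenance for pasted code), and a short description.
            self.cache[key] = [self._to_example(p) for p in pages]
        return self.cache[key]

    @staticmethod
    def _to_example(page):
        return {
            "code": page.get("code_block", ""),
            "source_url": page.get("url", ""),
            "description": page.get("title", ""),
        }

Keeping the source_url with every example is what lets the editor record where pasted code came from, which is the provenance behavior described above.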
And so we show those like you saw. So does example-centric search affect the quality and efficiency of programmers' work? And we did a couple of studies to get at this. The first one we did in the lab, where we had 20 professional Flex programmers come in for a between-subjects design, and we gave them some relatively short time frame tasks like retrieve text from a URL and place it in a text box. What we did is compare the programmers who were using Blueprint to programmers using the exact same IDE with the exact same corpus of examples; the only difference is the user interface. And it's kind of cool that we were able to run this study. I think it's often difficult to pull that off. We found that the Blueprint users were able to produce code significantly faster. Another thing that we did is we had experts who didn't participate in the study and who were blind to condition rate the quality of the code that the people produced. And they rated the quality of the code of people working with Blueprint as being higher. And when they had a more open-ended task -- for example, do a weather visualization on this set of weather data -- they produced higher quality designs as rated by an outside expert who was blind to condition. Now, one thing that you wonder is: is this just an artifact of the lab? One of the nice things about the lab is we can control a whole lot, but does it scale beyond a couple hours. And so we rolled this out through Adobe Labs for 12 months and we logged user queries and all of their interface actions. After three months -- so we just wrote it to disk for three months. We just let it go and write to disk for three months. After three months we opened it up and we found who the power users were. And we sent in a little note through the user interface that says, hey, if you'd like to talk with us about your experience using Blueprint so that we can make it better, please drop us a line. And we used these interviews in part as a way to generate hypotheses about what we might see in this larger corpus of data. And so we had a couple of hypotheses that I'll share with you. And our comparison point for this is, again, going to be the Adobe Community Help logs, which is within epsilon the identical set of content in both cases, the only difference is the user interface. And we got in Blueprint -- over these first three months we had about 17,000 queries from 2,000 users. And in Community Help, it was about 26,000 queries from about 13,000 users. So those are the two datasets that we're working with. The first thing that our interview participants told us when we talked to them about why they're using Blueprint is that the benefits of being able to have an example-centric view outweigh the drawbacks of missing the context that you might get in a larger Web page. And so one person told us, you know, highlighting the search term in the code is really key. I can scroll through the results quickly. And when I find the code that has it, I can understand the code much faster than I could English. These are professional programmers. These are people who know how to work with code. And so if your answer is in code, that's often quicker. And so what we guessed was that if we're seeing people be able to work with the examples directly, they don't need to click through to the page. And so Blueprint is going to have a much lower click-through rate to the final page than you'll see in a traditional search engine. And in fact that's exactly what you see. 
People using Blueprint click through less than a third as often as they do with the more traditional snippet view that you'd see in a traditional Web search engine. And so we're able to give people the snippets that are valuable for them. Our second hypothesis is that people were able to use the features of the IDE and the features of code search synergistically. And so as you saw in that URLLoader case where I can type a few characters and it would give me the class name and then I could use that as a way to get my search query, people reported doing that a lot. So does it show up in the data? And what we looked for was are people searching using code more frequently with Blueprint than they would be in a traditional search engine. And in fact that's exactly so. Here's an example of -- we used CamelCase and a few other heuristics as a way to figure out what's code. And in fact that's what you see, is that there are -- half of all query terms fed into Blueprint have CamelCase or other code heuristics in them as opposed to only about a sixth of the stuff without Blueprint. A third thing that we found will, I think, resonate with all of you who at some point had the debate about whether spell check is rotting your brain. So when word processors first got spell checkers, teachers worried is spell check going to rot our brain, we're no longer going to need to remember how to spell things. And so, for example, I've decided that I'm going to delegate the spelling of questionnaire to Microsoft Word. I let it remember how to spell questionnaire, and I have better things that I can do with my time. And I think with tools like this we're going to see a similar thing with searching for examples; that there is some stuff where you're just going to delegate to the Web the remembering of specific syntax. And so what one interviewee told us is that Blueprint is really useful for this mid space between when you don't know what you're doing at all and when you're not needing help because you know exactly what you're doing. You have a rough sense of things. You're going to delegate the remembering of the exact syntax to Blueprint. And so we thought, inspired by some of Jaime Teevan's work, that people would re-find more often in Blueprint than they would in Community Help. The same people would search for the same stuff more often. And in fact that's exactly what we see; that people re-find about 57 percent more often with Blueprint than they do with Community Help. So what we've seen so far is that the Web is significantly changing the way that people are programming and that by leveraging the power of examples online we can improve people's ability to program. I think the Web is also changing how people do design work. And so -- yeah. >> Before you move on -- >> Scott Klemmer: Please. >> -- so I want to know a little bit more about the example corpus that you're actually mining. So it sounded like these are mostly authored help documents that are in there? >> Scott Klemmer: That's right. So Adobe has tagged a set of stuff, much of which lives on something.adobe.com, but not all of it does. And it includes all of the tutorials and help docs that Adobe offers. It includes a bunch of bloggers who offer the weekly Flex tip update. There's a bunch of other stuff that's been written by third-party people that's been decided is good quality code. And so -- >> [inaudible] these are okay -- >> Scott Klemmer: That's right. That's right. And then a human judge did so before we showed up. 
I'm not entirely sure -- it's a little bit surprising to me that people take the time to go specifically to the Adobe Community Help search engine, but in fact they do. And I'm guessing the reason for that is that having this curated set of stuff is really handy. So we've seen the value of examples for code, and now I'd like to show a little bit of the value of examples for design. And so we're going to do a poll. Raise your hand if you've ever made a Web page. All right. When you were making the Web page, raise your hand if you used viewing other people's source as part of your strategy for making that Web page. And almost every hand goes up. Great. So that's been my experience too. And you're not alone. Here's my good friend from graduate school, Jimmy Lin. He made this. Several years ago he made this Web page. It's a great Web page. And he wasn't the only one that thought so, and neither was I. So his advisor at the time said, hey, that's a pretty good Web page. I can save myself a whole lot of time by borrowing from Jimmy's Web page. This isn't a wholesale copy. The colors got changed to be James's school colors, he's got an extra gadget here. Several other things have changed. He was able to borrow some things that worked for him and change the things that didn't. Bonnie John saw that and she said, hey, that's pretty cool, I'd like to use this for my homepage too. And so Bonnie's now got this page. She's changed the tabs. She's got six tabs up here, she moved her picture over to the top right. Several other things changed. Borrowing some things, changing others. Mike Krieger made a great Web page for me a couple years ago, and then Jim Holland came to visit and Jim said, hey, that's a nice Web page. So Jim borrowed that and adapted it for himself. And my friend and neighbor, Dan Jurafsky, liked this page also, and so Dan borrowed many of the same structures for his work. And I think that one of the most powerful user interfaces that I've ever seen from the perspective of being able to scaffold learning is this view source user interface that you see up in your Web browser. And this has been in the Web since the beginning. I'd love to talk to somebody that built one of the early Mosaic browsers to say how intentional was this as a way of getting other people to learn Web pages and how much was this something that just sort of happened by accident or was easy for debugging. I don't know. But it's been in there for a very long time. And what you see, as all of you know, when you go and click on view source is that you'll get a page that looks like this. And you can see how that was implemented. And this stands in stark contrast to the desktop world. So if there's a desktop application that I like something about how it's implemented, it's very, very difficult to say, hey, how did you make that thing. How do I make something like that. Yeah, maybe it's open source, but to be able to get to the exact point where that exact thing is implemented will take you a very long time. And I think what's important about this is not just that the user interface of being able to query for how is something implemented has changed, but rather that the Web has offered us this big, giant corpus of trillions of Web pages that offer examples of what you can do in terms of Web design, and that is really inspiring. And designers know this. So here is one example from Flickr of a designer who catalogs pages that she likes for being able to reuse and reference those later. 
And I think if we were to tell a story that examples are really valuable, we would say that the insight that we get from looking at other stuff helps us figure out how to solve current problems. And that's true not just in design. So here's a classic problem in cognitive psychology experiments: Please connect all nine dots using only four lines. And I bet for those of you who have seen this, the solution is jumping out. And for those of you who haven't seen this, you're going how the hell is anybody going to be able to do this. So you start to do -- let's work it through. So you draw one line. Okay. Let's draw a second line, let's draw a third line, let's draw a fourth line, miss two dots. And we could go around and around again. It's very difficult. Almost nobody gets this by being able to simply reason it through, because the trick, of course, for those of you that have seen this before, is -- so we draw our first line and we draw our second line, and to be able to pull this off, we need to go outside the box. This is where the phrase "thinking outside the box" comes from, is from this nine-dot problem. And then I can draw another line outside the box, and then I draw my fourth line, and now I've connected all nine dots. Consultants love this problem because nobody gets it on their own. And so if you show a client a problem like this, they can't -- the client can't get it. And the consultant says I can show you how to solve this. All of a sudden the consultant looks really smart. And so this has been a mainstay of business consulting for decades. So let's ask a question: Is the color red good? Totally nonsensical question. Makes absolutely no sense because the answer has to be something along the lines of, well, it depends on what you're trying to do. The answer is contextual. If you're trying to make a homepage for Berkeley, red is a terrible color to select as your homepage. If you're trying to make a homepage for Stanford, red is an excellent color to select for your homepage. So there aren't abstract truths as much as there are things that work contextually. And this is one of the challenges of designing with templates. I think that templates are really valuable. However, it's giving me dummy content. And so it's difficult to see is this right or not for my context. They're trying to abstract away a lot of the cues that would help us understand whether it makes a design good. And another challenge of templates is that they take a long time to author, and so it's a pain, and so the number of them is limited. And I think design patterns have a similar drawback; that it takes an -- in fact, it takes much longer to be able to construct a design pattern which has some examples, it's got an explanation, it's got the principle. Design patterns are great for the stuff where there is sort of a principle that you can abstract and then reapply in new situations. And there's a bunch of things that this really worked for. Checkout filter. How many times have you seen a Web site that has a terrible checkout flow. We can tell you how to do it well. We know that answer. We can encapsulate it in a pattern, hand it off to you, you'll be better off for it. But not everything works that way. In fact, the famous photographer Ansel Adams said that there are no rules of composition in photography, there are only good photographs. He's clearly lying. 
This is a guy who -- I mean, among being one of the most famous photographers, he also wrote the canonical set of photography technique books that was used for decades and decades, and he invented the Zone System, which is an algorithm for figuring out how to meter your photographs. And I think the point that Ansel is trying to make is not that there are no rules for photography or there are no heuristics for photography, but rather that the abstract knowledge isn't going to work in every single case and the crucible for success is not whether you're implementing a particular principle but rather whether it works in a particular context. So we wanted to know the answer to the question, and this is the work that Seville [phonetic] helped out on, which is can examples scaffold design ability. And so to pull that off, we built a really simple Web editor. We took Firefox's built-in direct editor. It's like Dreamweaver or any other direct manipulation editor. And we augmented it by having a corpus of examples that Seville harvested off the Web. And you can zoom in and look at those in more detail. And so if I wanted to make a Web page, this is where I do my design work. That's where I look at a bunch of examples, and here's the focus page. And so I can poke around, and we zoom in on that bit in the bottom right, and you can see a bunch of different Web pages, so I can grab one that I like. And so we'll grab that one right there. And then we'll go and we can grab the background color off of that, and it gets applied to this page, and then we find another bit of stuff that we like and so on and so forth. And we can build a page up that way. And one of the hardest parts about asking a question about effective design experimentally is how do you figure out what constitutes good and what's your experimental paradigm in doing this work. And so what we did is we gave people a scenario. And we've done a bunch of experiments in this genre. Here's one of the scenarios that we've given people. So we say Elaine Marsh is a 21-year-old Stanford student. She'd like a page. This is her goal. She's looking for a job. She wants to present this about herself. And then we had people come into the lab and we had -- we did a between-subjects comparison between people who were creating the Elaine -- designing for the Elaine scenario with our examples editor and people who were designing for the Elaine scenario, exact same editor only no examples. And what you see is -- and then -- oh, the fun part about this is that after all of these pages were created, we had people who were blind to condition rate how well the pages that were created met what Elaine asked for as a designer. And so our dependent variable here is not is this a good page, but how does this -- how well does this deliver on Elaine's goals. And what you see is I think all of the things that you would expect. Some of the participants were better designers. Some people get rated much more highly than others. There's a bunch of variation in the raters. Raters don't exactly agree on what's good and what's bad. But you do see some trends emerge. And so pages that were created in the examples condition were rated by these independent raters significantly more highly than those that were in the control condition. As a good manipulation check we found that experienced participants created more highly rated pages than novices. And in this particular study we found no real interaction between expertise and manipulation. 
So experts and novices in this one task benefited equally from examples. I think this is going to be -- the answer to this is going to be contextually dependent. But that's one data point for you. So one worry that you might have about working with examples is you say, well, we're going to end up with just everybody doing the same thing. There's going to be no variation. We're going to end up with mono culture and that's our worry. And Steven Smith at Texas was really worried about this. And so he ran a study where he asked people to create aliens. If you ran a study where you asked people to come up with the most creative alien that you can, mostly what you get are Martians. It's a really difficult task for people on the spot to come up with something where you're like be creative. Really difficult. And for him, he had two conditions. So on one condition people were able to create aliens without any priming ahead of time. And the other condition he showed them several aliens that all had a couple attributes, like having four legs. And what he found was that the examples do increase conformity; that if you prime people with a bunch of aliens who all have four legs, you're likely to see in your results aliens that have four legs. So we might conclude from this that uh-oh, this is bad news, we're all going to be brainwashed exactly the same way if we follow down this path. But I don't think that's exactly what's going on. And here's why. If you think about the space of all possible designs, most of this space is bad. Most of this space is not what you want. Most of this space is junk. The space of good designs is relatively small. And so Marsh cleverly asked a slightly different question. So same study, same paradigm. The only difference is we're now asking -- as opposed to how diverse are the aliens that people create, we're going to ask the question how many novel features do people's aliens have, where novel is defined as a few other participants came up with that same idea. And if you ask the question how many novel features do people have in their aliens, priming them with four-legged aliens ahead of time has absolutely no effect. Same number of novel features in both conditions. And why might this be? And Marsh's argument that I'm really persuaded by is that if I don't have a good idea, I'm going to borrow from whatever's on the table in front of me that seems to be better than what I've got in my head. But if I do have a good idea, if I do have a creative idea, then I'm going to go with that and I'm not going to be dissuaded by the fact that, well, their alien has four legs on it and I have this idea for one that has three legs. And so in this case what we're seeing is they're not reducing novelty. What's fun about comparing this work with the work that we did is here they don't ask the question of quality at all. There's no notion of what's a good alien. They're purely looking at the diversity of designs. And in the study that we did with Seville, we purely asked the question how well does this achieve this scenario. We didn't ask the question of novelty. And so they answered two slightly different questions. Now that we've got this motivation that, working with examples in design, may really offer a big win, how can we come up with tools that can leverage this. And I think we would want three attributes in such a tool. So one of these is you'd want to have a large pool to draw from. If I can really work with those trillions of pages on the Web, that sure would be cool. 
It's important that whatever our tool is that's going to give us examples shows the context; that it's not lorem ipsum. It's hi, I'm Elaine and here's my Web page, and I can see whether red is going to be appropriate for me or not. And, lastly, it should be easy to adapt. And I think this is one of the problems that we saw with something like Google Code, is that when you got these real-world examples it can often be difficult to adapt them to your own context. So these are our three goals. And as we speak, we're working on a really exciting tool called Bricolage that I'll show you some early vision and results from. So here's the scenario. This is work by Ranjitha Kumar and colleagues. And if I've got something like the Stanford Women in Computer Science homepage, Ranjitha says this page is lame. I want a better page. I want a more exciting page than this. And so in our vision you could go out on the Web and find some page that you like the design of better. So here's one that we like the design of better. And then what I can do is I can take my content and this page's layout and automatically synthesize a new page. And this right here is a -- actually, this probably is built with our system, but we'll call it a vision for now, just in case it's not. And you can see all of the content from the Women in Computer Science is slotted in here, but it's got the design and the style of the page that Ranjitha found on the Web. In order to be able to do this, what we're going to do is we're going to say every Web page is a tree, and we're going to start out by saying that tree, for starters, is its DOM. And saying that the DOM is a rough approximation of the tree works out pretty well as a starting point. But as any of you who do Web development know, the underlying tree representation and the perceptual tree representation aren't the same. So it's a good seed. And then what we're going to do is we're going to use computer vision to be able to take this -- you know, the DOM and transform it to what we would want the perceptual representation to be. And we extend an existing algorithm called the Vision-based Page Segmentation algorithm. And the game that we're going to play is can we correspond, once we've done this transformation into a perceptual tree, the nodes of one page onto the nodes of another page. And if we've got the correspondences, then we can shuffle the content across. So here's the kind of thing that we're going to do. So at the high level we've got these two pages, and so I'm going to say aha, the root node here maps to the root node here, and here's a big content node that maps to a big content node here. But one challenge is that ancestry gets violated. And so the classic computer science way of doing tree mapping generally enforces that ancestry must be maintained. So if one node is a child of another node on one side of the mapping, its counterpart has to be a child of that node's counterpart on the other side of the mapping. And in order to be able to get around this, we're going to use an optimization-based approach where we say in an ideal world we would like ancestry to be maintained, but if, given the semantics of the page, it just really doesn't work, then we're going to allow it to be violated. And so we're going to assign a cost to that. How do we know what constitutes a good mapping? And I think this isn't really something that can be solved formally. It's really an empirical question. So we've gone out on the Web. And we've gone to Mechanical Turk. 
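Before the Mechanical Turk data, here is a toy sketch of the cost-based matching idea just described: prefer mappings that preserve ancestry between the two perceptual trees, but charge a penalty for violations rather than forbidding them outright. This is an illustrative reconstruction, not the actual Bricolage algorithm; the node attributes, the dissimilarity measure, and the penalty weight are all hypothetical placeholders.

ANCESTRY_PENALTY = 1.0  # hypothetical weight; in practice this would be tuned from data

def is_ancestor(tree, a, b):
    # tree: dict of node id -> {"parent": id or None, "area": float, ...}
    # Walk b's parent pointers up toward the root looking for a.
    node = tree[b]
    while node["parent"] is not None:
        if node["parent"] == a:
            return True
        node = tree[node["parent"]]
    return False

def node_dissimilarity(a, b):
    # Placeholder visual distance between two page regions, here just comparing
    # rendered area; a real system would use richer visual and semantic features.
    return abs(a["area"] - b["area"]) / max(a["area"], b["area"], 1e-9)

def mapping_cost(mapping, source_tree, target_tree):
    # mapping: dict of source node id -> target node id
    cost = sum(node_dissimilarity(source_tree[s], target_tree[t])
               for s, t in mapping.items())
    # Soft ancestry constraint: if s1 is an ancestor of s2 in the source, we would
    # like its image to be an ancestor of s2's image in the target; if not, add a
    # penalty instead of ruling the mapping out.
    for s1, t1 in mapping.items():
        for s2, t2 in mapping.items():
            if s1 != s2 and is_ancestor(source_tree, s1, s2) and not is_ancestor(target_tree, t1, t2):
                cost += ANCESTRY_PENALTY
    return cost

A search over candidate mappings would then keep the lowest-cost one, and human judgments like the Turk data described next are one way to learn what those costs should be.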
And we showed Turkers pairs of pages and we asked them, for this thing on the left, where does it match on the right. And the screen shot with the green shows the result after they clicked on the appropriate spot on the right. Okay. So then I can show you another bit, and we find the correspondence for that, find the correspondence for that. And when you're done, at some random interval, I think every five or seven things that you match, it asks why, because we're interested in gathering not just what is the mapping but why is the mapping what it is. And we do this only sporadically because otherwise it gets really annoying, and I think it may actually, if we did it all the time, change what people reported. And so the question that we're all dying to know is how often do the raters agree. Because if everybody has a different opinion about what goes where, we're toast. There's no way that we're going to be able to use that corpus of ratings to be able to design new stuff. Turns out people agree pretty often. So if we cleave our dataset into things where they're structurally dissimilar and things that are structurally similar, and that's really just an eyeball test of do they feel like they're about the same or not. In both cases, you see that people agree on at least, you know, roughly three quarters of the mappings. If you look at pairwise similarity between two raters -- do two raters agree? -- three quarters of the time the answer is yes, even when they're structurally dissimilar, and much more often than that when they're structurally similar. And so what this says is that we could actually probably leverage this training corpus. As you'd expect, there are some things where everybody agrees on the same thing. And so here's an example of an organizational element that every single person in our study maps as the same thing between the two pages. So here's the one on the left, and here's the one on the right, and they're shown in green boxes. Here's one that has similar semantics. And so in this case you've got a search bar on the left and a search bar on the right, and everybody mapped those two together. There are other things where people had much more divergence. And so I think one thing that we may see come out of this work is somehow confidence gets baked into the propagation algorithm. And I don't know yet whether that's going to be -- that the UI says we've mapped all the high-confidence stuff, you're on your own for the low-confidence stuff, or whether we say here are our three best mappings, you can pick which one you like. But I think this is going to be -- this is going to be really exciting. And so stay tuned for more results. The last thing I want to talk about today is we've seen how having a bunch of examples that are extant in the world can help me come up with better designs. And the next thing I'd like to explore is purely on the generation side are there design strategies that we can use that are relatively simple that simply make designers more creative. And I'd like to start with an anecdote. So Bayles and Orland report a possibly [inaudible] tale about a ceramics teacher who divided his class in two. He tells the first half of the class, you're going to be graded on quantity. Your grade is totally volume. Make as many different things as you can. Tells the other half of the class, you're going to be graded on quality. Come up with the best piece of ceramics that you can; that's going to be your grade for this class. 
And what they found is that while the quantity group was busily churning out piles of work and learning from their mistakes, the quality group sat theorizing about perfection and in the end had little more to show for their efforts than grandiose theories and piles of dead clay. And so in this story the value of coming up with a thousand songs to have that one great song is really made salient. And we kind of wanted to know, you know, can we measure this in the lab. And I think one of the reasons why we wanted to do this is that if I tell this story to folks in industry, many people really resonate with it. And a large group of other people says that's a great story, Scott, but you have to understand in our work we have really limited time constraints. And so while it would be wonderful to come up with many different design alternatives, we don't have the time. And so we wanted to be able to explore if time is really constrained are people better off with exploration and iteration or should they gun for that one perfect thing. And we needed a Petri dish for being able to explore this work. And here's the first one that we came up with. So something that was going to be a good Petri dish needs to have a couple attributes. We need to have some measure of success. But unlike most of the work that's been done in the psychological literature, think the nine-dot problem, there need to be many paths to success. With the nine-dot problem, there's one way to go and you either see it or you don't. But design isn't like that. There's a bunch of different ways to achieve good. And so we need something where good is measurable and there's many different paths. So we came up with the egg drop. And for those of you who haven't done an egg drop yourself, here we are at my office window. And we're throwing a contraption we built three stories out the window onto the ground. And here's the egg that survived. So that's an example of -- so our dependent variable is going to be how high can you throw this thing from without the egg breaking. And what's really nice about this is that there are many different paths to success. What we saw was that the iteration group, where we forced them to iterate rapidly, did much better than the noniteration group. But out of this study actually I don't think the quantitative results are the interesting part. I think it's the qualitative stuff that turned out to be much more interesting. And so what we saw is that, independent of condition, participants picked one idea and stuck with it. [video playing] >> Scott Klemmer: He's not the only guy who feels that way. So Karl Duncker, back in the 1940s, was fascinated by this idea of functional fixation; that you get stuck into seeing the world one way and you can't believe that there is another way to think about things. And so he gave participants -- this is a drawing of what he gave people physically, is you give people a box of tacks, a candle, and a book of matches and you say please affix the candle to the wall such that none of the wax drips on the table. And it takes people a long time to be able to figure out a solution to this problem, and success rates are relatively low. Now, if you make one small twist to the way that you make this go -- well, I'll show you the solution first. So the solution to this is that you need to be able to take the box itself and use that as a holder for the candle. And, as you can imagine, the reason why it takes people so long to see this problem is that they see the box just as a box. 
They don't see it as an element that they can work with. So we wondered can we do a simple intervention that will limit the amount of functional fixation that people have. And so we asked the question how does prototyping in parallel as opposed to a serial approach affect design performance. And here we're going to shift away from the egg drop. As fun as it was, we wanted to get back to something that was more computer-like, and we wanted to have something where the dependent measure was something that really resonated with the software world. And the insight that we had here was to have participants create an ad. And so we have some friends that run a design magazine, and we convinced them that we were going to have participants create ads for their design magazine. And MySpace has an ad creation tool that is really easy to use. And so the general strategy here is people create ads using this tool, and then we're going to roll them out over the Web. So last summer we hosted 2 million ad impressions on MySpace. And we had 33 folks come into the lab and they were -- we put them in one of two conditions. They either got put in a serial condition where we marched them through creating six iterations of a design, or we put them in a parallel condition where they created three, got feedback, created two more, got feedback, and created one. Here they're getting feedback after each one. So the number of units of feedback -- and I'll explain what that is in a moment -- is held constant across conditions. The number of prototypes that they create is held constant across conditions. And the total amount of time is held constant across conditions. And then we took the final ad that they created and we sent that out, and that's what we hosted up on MySpace last summer. Here's the critique that we gave people. We went to two advertising professionals, and we had them give us critique feedback on a bunch of designs. And then we took their specific critiques and we generalized them to be more like a pearl of design wisdom. And so you see things about the overall theme, about the composition and layout, or about surface features of the design. And so each -- for each ad, participants got three units of feedback that we gave them about their design. And in addition to -- so the performance measures that we've got here -- yeah. >> [inaudible] >> Scott Klemmer: Please. Yeah. >> So you're saying that you give them these generic nuggets of wisdom as opposed to giving them feedback that was actually tailored specifically to their -- >> Scott Klemmer: I think it's both, actually, is the answer. This is an actual design that one of the participants came up with, and this is the actual feedback that they got for that design. And so we selected feedback that was meant to be relevant for them. But the feedback was all precanned as opposed to us generating it on the spot, for a host of reasons. But we did have a big bag of choices with which to pick three that were going to be relevant for them. The question about feedback is an excellent one, and we can -- I have a much longer answer to that question that we can answer offline. And so the dependent variables that we got here are we've got click-through rate. So the fun thing about doing ads is you get to ask the question how many people click through. 
There's a danger of using just click-through rates as your dependent variable, which is that if you told me have as many clicks as possible, I think I would have an ad that said something like free iPod, and then everybody clicks through and then they get to this design magazine site and they say, well, what's the deal, that's not what I signed up for. And so we also measured -- the nice thing about having friends as our client was that we put Google Analytics on their site and we're measuring how long people spend on the site once they get there. And so are these people getting what they were looking for. And we're getting expert ratings from both the client, the editorial board of the magazine, and advertising professionals who are all blind to condition. And so one of the fun things that's going to pop out of this work, in addition to the question about parallel versus serial, is that it's the first time that I've been able to find where you're taking these common, modern, quantitative measures and asking how well they correlate with the much fuzzier measures that you get out of professionals. So I think the most important result is that people came up with a whole bunch of different ads. Some of them are great, some of them are terrible, some of them are creative, some of them are banal. They employed a whole bunch of different strategies. And it was really neat to see what people came up with. Here they all are right here. And what we see -- and I'm actually a little surprised that this actually worked, is that people who created ads in the parallel condition had ads that were clicked through at a significantly higher rate than ads that were created in the serial condition. Pretty cool. And, additionally, visitors from the parallel condition spent more time on the client site than those in the serial condition. And so not only were more people coming through but for each person that clicked through, they were happier by this measure with what they got as a result. >> Do you also have numbers for how much of a difference there was between iterations, like how much did that bias actually help? >> Scott Klemmer: Your question about how much of a difference was there between iterations is excellent. And in this study we only rolled out on MySpace the final one. We did -- we've thought about rolling out all the intermediate ones. One of the challenges of this paradigm is that you can eventually saturate the market for ads about design magazines. And so that would suggest that if you're going to implement this paradigm, which I think is a great way for studying design, you may want to pick something where the appetite for ads of this sort is really big. One thing that you could do that would be very cool is have your domain be something like donate to Haiti and your dependent variable is how much cash did the group get. That could be really fun. All right. And experts rated the parallel ads more highly than the serial ads, and this difference is significant also. And so in general the experts and the numbers agree, though on any individual ad you may well see some variance. So why did parallel outperform serial? And I'm -- >> Do you have a picture of [inaudible]? >> Scott Klemmer: What's that? >> Do you have a picture of [inaudible]? >> Scott Klemmer: I do. It's the one with a bunch of hands, and the ad is ambidextrous. It's really poetic. It's really clever. And what's notable about it -- let me see -- it may be one of the ones that I showed at the beginning. 
So here's what's cool about this: what you see -- I think it's fair to say that we selectively picked these two -- is that in the serial case they got an idea and they're just kind of tweaking it through the whole time. Whereas in the parallel case, somebody comes up with three initial ideas. And it's not until the third one that you see this hand thing emerge at all. And then they come up with a fourth idea that's totally different. And on their fifth idea when they're searching around for images, they come across this design. And then after doing all of that it's not until the final one that the thing really coheres. And so that was really -- I think you do see this pattern in general, and we'll see some numbers that back that up. And so my first theory about why parallel outperforms serial -- >> Can I just -- >> Scott Klemmer: Yeah, please. >> These were professional designers? Design students? Naive? >> Scott Klemmer: These were all design students. Doing it with professionals would also be really fun. Great question. So I think one of the things that you're getting out of parallel is implicitly the ability to compare the effect of your designs -- you'll be able to compare multiple different versions. And we see this in the educational psychology literature. So Dedre Gentner and colleagues did a study where they had business students, and interestingly they either gave them a classic case-based approach or, in addition to giving a couple of cases, asked people to draw the parallels between the situations. And what they found is that there was about a factor of three transfer win when you ask people to explicitly compare than if you just gave them multiple cases. And so if we're seeing a win already simply by merely having multiple alternatives, we might see an even bigger win of parallel if we asked people to explicitly draw a comparison. And I think that's an exciting opportunity for future work. The second reason why I think that parallel offered a big win is the ability to ideate broadly. So what we see in something like a serial participant is this same fixation. So somebody says, you know, I tried to find a good idea and they use that idea to keep improving it, so I pretty much stuck with the same idea. And here's another thread of serial where, yeah, they pretty much stuck with the same idea. We wanted to be able to test this. And so what we did is in this case we took all six ads that this person produced, and this is a within-person measure. And we're going to take all 15 edges that those six nodes construct, and we're going to ask Mechanical Turkers online how similar are these two ads. And we're going to have Turkers do a whole bunch of these, because the first time you see this question it's gibberish and it takes a little while to be able to calibrate. And so within each cluster of six how similar are all of the edge-wise pairs. And what you see is that ads in the parallel condition are rated as significantly less similar than ads in the serial condition. So it is in fact the case that people in the parallel condition were exploring the design space more broadly than those in the serial condition. I think the last reason why this is valuable is parallel gives people a better critique orientation. And we have a short video that shows exactly what that means, which is here's somebody who was in the serial condition, and they're talking about the feedback they got. 
>> Video playing: These guys, you know, are telling me that I am completely doing something wrong here. So it took me a while to get past the I'm a failure at this and to, okay, how can I go about fixing it in the ways they suggested. So there's a short period where the emotional response overwhelmed any positive like logical impact that this ended up having. >> Scott Klemmer: So I think in the serial condition I think people really felt beaten down by getting repeated critique, whereas in the parallel condition people felt like I have another avenue, there's something else. Oh, I got this negative feedback here, but not in the other case. And this I think really resonates with one of Bill Buxton and Maryam Tohidi's results about the value of paper prototyping and testing multiple alternatives with users. And what they found, unsurprisingly, is that users don't really have a vocabulary for talking about the quality of interfaces. So if you give people one, they mostly say I like it. Whereas if you give people three, it offers them a vocabulary of talking about the differences between the interfaces, and you can get much more useful feedback. This idea of examples being valuable in getting people to be more creative, we've seen it here sort of at the micro level, improving a design, improving a piece of software. I think this holds at the macro level as well. And it goes back to the slide show I was showing at the beginning that some of you may have seen, with the quote from Picasso that good artists borrow, great artists steal. And he wasn't kidding. So this is the guy who in many ways invented modern art and cubism. And here's one of the paintings that was seen as a real turning point for that, Les Demoiselles d'Avignon. And about four years before painting this painting, Picasso's friend takes him to the Trocadéro museum, the ethnographic museum of Paris, where there are all these African sculptures including this 19th century Fang sculpture, and it's there that Picasso first sees the -- a bunch of artistic styles that really work and were by and large at that time unavailable in European art. And so you can say in a lot of ways that the insight of cubism was being able to take stuff that was preexisting but in a totally different domain and being able to see how that was made relevant in this case. And what this shows to me is that if you're talking about novices or if you're talking about minor increments, you want to be able to see examples that are proximal and have experiences that are proximal. Once you become an expert, all of that proximal stuff you've got baked into RAM. You know that already. And so I think the real opportunity for expert-oriented tools is asking how do we get people to transcend and go to further afield domains. And this leads me to what I wanted to share with all of you, which I think are a couple of exciting research questions. One is how can we get people to take on bigger tasks using examples. So as opposed to just one Web page, how do you say let's adapt this navigation element that spans an entire site. How can we move this to other domains. So we've looked at Web design, we've looked at programming, we've looked at egg drops. What else might we be able to do here. Another is how can we scaffold expertise. Can we come up with a bunch of, in essence, learning tools that help people become more expert. And then once you're an expert in Web design you may not need that tool that was valuable when you're a novice. You may need a different tool. 
I think search is a huge opportunity here. It's pretty clear from the various sets of designs that I've showed that it would be difficult to find any of those using traditional Web search. Keyword based search just isn't going to give you minimalist page designs or a baroque page design. It just -- you need something else entirely. And the optimal solution may not be language based at all. But I think the design space for working here is really exciting. I think we've seen that patterns are valuable and that templates are valuable and examples are valuable. And each has sort of a different set of benefits and drawbacks. How can we integrate the best of all of these. And, lastly, inspired by the Picasso quote, how can we enable people to find and adapt distant examples, things that are much further away, and how can we facilitate things that are both real and technical that enable content creators to share what they're willing to share, to not share what they're not willing to share, and to the extent possible facilitate an ethical open culture. And with that I'll take questions. Thanks very much. [applause] >> Scott Klemmer: Yeah. >> So when you were showing the videos with the egg drop things [inaudible]. >> Scott Klemmer: Right. >> Right? And I'm so used to working in teams in my profession that it didn't even occur to me that you're trying to do something like that [inaudible] first of all. And then I thought your parallel study was like working in a group in that you had -- when you work in a group you always have parallel ideas that competed and you quickly sort out the good from the bad and then you focus in. And I was wondering if you could somehow construct a study where you compared all people working in pairs versus people working alone and you get double the results from the people in pairs weighted against the same result from a person alone or something. >> Scott Klemmer: Your question is excellent. I think that -- well, the simplest answer is to say groups are clearly different than individuals. And we wanted to do individuals first to sort of get a baseline upon which we could do more complex stuff. Because groups are not only more different than individuals, they're more complex. And I think that's clearly a next frontier to tackle. One obvious benefit is the one that you mentioned, you know, two heads are better than one, many hands make light work. A drawback that we often see is this sense of ideas being mine or yours, and that can influence decision-making. And I think a lot of the -- Bob Sutton, who's done a lot of the literature on group brainstorming, points out that while -- when people seek measures for brainstorming being a win, where they look at something like numbers of ideas created, they don't find anything, and they conclude that groups brainstorming is a foolish endeavor. What Bob pointed out is that one of the values of group brainstorming is the ability to launder who came up with the idea. And so in a more traditional team process, it's very clear whose idea is what. One of the things that emerges out of a well-structured brainstorm is that you launder who the origin of that idea is, and by the end of the brainstorm all of us feel like all of the ideas are ours. And so I think working with groups is a really exciting thing to do. Scott. 
>> So just kind of building on that, I'm kind of curious if you have -- [inaudible] worked with designers [inaudible] do you have any kind of intuition for how different you think that design -- the ad design task would be if it was like a three-person design firm who works together on a daily basis churning these things out? Like the type of feedback and when it comes in -- but would the impact be totally different or similar or do you guys have any intuition for that? I don't know how it would work. >> Scott Klemmer: My hunch is that a lot of the basic principles that we're seeing here would still hold. I think that in academia and industry, and every other setting I've been in, I do believe that people don't diverge enough early in ideation. So we see serial iteration far too often. And I think even if you're talking about experts and even if you're talking about groups, you're going to see big benefits of parallel exploration earlier on. I think another thing that we see out in the real world is that people are loath to actually commit to something. And so it's amazing -- in the design classes that I teach, I'm amazed that students are reluctant to diverge early on when the best thing to do is to diverge, and they're reluctant to converge later on because they feel like they're not quite -- they're not ready to commit and they wait too long to actually commit to something. And I think we see that pattern out in the real world too. >> What do you think the sweet spot is for where the computer leaves off and the human picks up? >> Scott Klemmer: I just read a couple of days ago a report -- there's a blog that claims that in some markets Facebook has rolled out a tool that will automatically create an ad for you. And I haven't gotten a chance to try it yet. I think there are a couple of cool opportunities. One is to say can we get computers to automatically do some of these things, and I think some of the time the answer is yeah. I think with Ranjitha's work we may be able to make -- take some content and automatically synthesize something that looks pretty good, at least as a starting point. And so for me the question -- really I kind of view it as everybody races to the finish and everybody's going to win a little bit. So I think automated techniques, there's a lot of really exciting work to be done there. I think one interesting midpoint is to do a design galleries approach. So a bunch of years ago in computer graphics Joe Marks and colleagues built this system called Design Galleries where it would auto render a bunch of different designs for you that had different parameters of, say, a planet blowing up, which is the kind of thing -- it's very difficult to specify in a language or specify a priori to a system I want the planet to blow up 7 on a scale of 1 to 10, whereas if you have the system render out 30 versions, you can certainly say I want that one. And so I think the recognition over recall benefit will be one of the ways to split the work between the human and the computer, where the computer generates a bunch of ideas and the human says that one, either just that one or that one as a starting point that I'll then tweak further. >> So to me one of the things that was really interesting about your parallel versus the serial design study is that the serial folks, while they lose, they don't lose by much. They're only 20 percent worse than the folks who did it in parallel. And it could be for a couple of reasons. So one could be they go with your gut idea. You know, your first idea is your best one. 
>> So to me one of the things that was really interesting about your parallel versus serial design study is that the serial folks, while they lose, they don't lose by much. They're only 20 percent worse than the folks who did it in parallel. And it could be for a couple of reasons. One could be that they go with their gut idea -- you know, your first idea is your best one -- so maybe that's why they did almost as well. And then the other one is the power of polish: maybe the fact [inaudible] five more times. Maybe their first idea kind of sucked, but then after five iterations it was almost as good as the parallel one. So it would be interesting to try to tease that out a little bit. I guess one way you could do it with your existing data is to see how often in the parallel case they chose to publish their very first idea, versus how often it was the second or third one that made it to the finish line.
>> Scott Klemmer: Right. So you raised several good questions. One is that they didn't win by much. On one hand, I think that's right. I think our hope is that we can come up with, let's say, four or five of these nuggets that together give you a huge win. The other thing to say is I believe that 20 percent is actually a lot. You know, if you win a hundred-yard dash by 20 percent, that'd be insane. I think it's probably fair to say that one's favorite interactive phone and the second-best interactive phone are less than 20 percent different, but the market-share difference has been huge. So 20 percent may be big. As to your point about when people diverged and when they came up with different alternatives: part of the answer comes out of our Mechanical Turk study of the aggregate difference between all of the designs, and the fact that you see the parallel designs as being more different suggests that those participants did explore multiple different options. It doesn't tell you when, and I agree that that would be an interesting thing to look at more. And I certainly don't think that there's some magic 3-2-1; I think how much to diverge has to be contextually dependent.
>> So with your parallel study, it seems like -- or my interpretation at least was that the conclusion you're proposing is that working on multiple concepts at the same time is beneficial in terms of the quality of your final product.
>> Scott Klemmer: Beautiful way to say it, yes.
>> It seems to me, based on the method you presented, that it would be equally valid to conclude that receiving feedback on multiple concepts at the same time is more beneficial than receiving feedback on only one concept at a time. So I'm wondering if you [inaudible] in any studies.
>> Scott Klemmer: So in our current study we've conflated "come up with three designs simultaneously" with "come up with three however you like, but get feedback on them simultaneously." And that was sort of intentional, because it's actually kind of hard to work on three things literally in parallel. So we've operationalized the meaning of parallel to be: bang out multiple designs before you start to evaluate them or get feedback on them.
>> Well, I guess why I was wondering is that, at least from what you presented, it was 3-2-1: the first three were pretty different, and then the two in stage 2 were pretty different, and that led to one final product. And I was thinking maybe there could be a control condition where you have versions 1.0, 1.1, and 1.2 -- kind of all the same idea -- you get feedback, then you have 2.0 and 2.1, and then 3.0. And I was wondering if you think that might provide an additional comparison that would be meaningful beyond what you have in your current data.
>> Scott Klemmer: I think then you're asking whether we want some way of manipulating how broadly people explore, how broadly they fan out. And I think, yeah, absolutely.
And I think some things that would be meaningful in terms of the impact of fan-out would be: do people stumble on ideas that matter if we get them to fan out more? And I think a big one is going to be sunk-cost reasoning. In the serial case you really see a lot of not only fixation but also some sunk-cost reasoning. So one benefit of parallel may be that if I don't have a horse in the race in the same way that I do with serial, then I have less sunk-cost reasoning in the parallel case. And I think what's notable is that, post hoc, I offered you three reasons why I believe parallel won, and clearly one wants to go on and figure out: of the three stories I told you that explain the data, is it all of those, is it some of those, can you manipulate them independently? I agree.
>> [inaudible] the Blueprint stuff was great, and you did on the SQL team [inaudible] study some of our stuff. And, yeah, everyone -- the first thing they do when [inaudible] a problem to them, they open up Google or Bing or whatever and they start searching. Unfortunately, there were quite a few cases where people picked the first example; it looked like the problem we gave them, but it wasn't at all. It only had some of the surface characteristics, but they didn't really think about what the question was actually asking them. They just grabbed something -- there's a little picture, and the picture kind of looked like what we asked them to build from a diagram point of view -- but they didn't really get what the example was giving them, and they went off on it. In these studies we're not supposed to interfere, but finally, after about 15 minutes, we just told the guy, look, you're way off track here, this isn't at all what you're supposed to be doing. And he got worse and worse on [inaudible] because he just kept trying it. He was convinced that the example was going to help him, and he kept banging on it harder and harder.
>> Scott Klemmer: I've been there. I piloted a recent study in our group that was a follow-on to one of these, and I was that guy. So I think I have two insights for you, hopefully. One of them is that I'm going to claim Blueprint largely solves the problem you've seen: by presenting things in an example-centric way, we've seen that erroneous rat-holing a lot less than we did when we were running studies on a page-based search engine. And so I believe that example-centric search will solve much of your problem there.
>> I agree. I think it's more suited to atoms than molecules, though.
>> Scott Klemmer: I totally agree. I think that's a great way of saying it. What's exciting about this is that we've only solved the atom problem; we haven't solved the molecule problem. Cool area for future work.
>> [inaudible]
>> Scott Klemmer: Here's one initial direction that I think would be fruitful if what I'm searching for is more structural in nature. One of the big differences between novice performance and expert performance across a wide variety of domains is that novices over-cue on surface features, whereas experts are more likely to see the deep structure of the problem and not be distracted by its surface structure. That's a general difference between novices and experts. The best medicine I've seen against novices making that error is to ask them to explicitly generate the deep structure of the problem.
And so if you have an explicit reflection step, my hypothesis is that you'd see a whole lot less rat-holing on the wrong surface features. You could do a simple study, which I think would show a huge win and which we've talked about doing: simply add a reflection step into any of these tools and see whether you get a notable difference in search behavior. My bet is yeah.
>> Translate that into [inaudible] it's an excellent observation.
>> Scott Klemmer: I have some thoughts on that; we'll talk more offline. I think it's a really exciting direction. It's hard to get people to do. I mean, I think the classroom is the best leverage we have, and even there it's hard.
>> Yeah. I mean, highly reactive environments where you can do fast iterations -- you get feedback immediately about whether the thing is working or not. I know those certainly help from a development point of view.
>> Scott Klemmer: Yes.
>> Jaime Teevan: Thank you, Scott. [applause]