>> Scott Klemmer: It's great to be here today, and, as Jaime mentioned, I'll be talking about
some work that many of you actually did, or at least a couple of you actually did.
I'd like to start out with an example from my student, Joel Brandt, who was doing a study of how
programmers build software in the modern age. And I think many of you are going to resonate
with this example.
So we asked programmers to come in and build a Web-based chat room, and one of the folks in
the study said, you know, good grief, I don't even remember the syntax for forms.
Now, Jenny's a great programmer. She just happened not to know the syntax. So she goes to a
search engine, types in HTML forms, clicks on the first result, goes through to that page where it
has an example of how to work with HTML forms, she scrolls down to the bottom, copies this
code, pastes it back into her editor, and then, you know, 60 seconds after she went looking for
this example, she's now got running code. A few minutes later, she's able to elaborate on that,
and she's got a chat room up and running. She doesn't know this language.
And her experience was really consistent with what we saw from other folks. 20 percent of the
time that programmers in our study were programming, the active thing that they were working
with was a Web browser, not an editor. And I think this is really changing how programming
works.
And I think people have always worked with examples and have always written code by
borrowing from stuff that was already around. But I think the Web is really changing the scale at
which this is happening. And a lot of what's changing here is the cost of creating, sharing, and
accessing examples.
And so we thought it might be interesting to explore how it would work to integrate an editor and
a search engine into one tool, and so that's what we set off to do. And in collaboration with our
friends at Adobe, we created a system called Blueprint. And here's how Blueprint works. It's a
plug-in for the Flex Integrated Development Environment, which is in turn written on top of
Eclipse.
And so here you can see an example of some code in Flex. And what I can do in Blueprint is I
can -- we'll zoom in a little bit -- and I can type in a key phrase, much like the autocomplete you
have today. This extends that metaphor to being able to search for examples. I can type in busy
cursor, it goes off, finds some results.
These are results that already live in a Google search appliance. And we're presenting
them in an example-centric manner as opposed to a page-centric manner. I can grab the line that
I need, bring that back into -- see some other bits about this, bring that back in. Great. We copy
that, paste it in.
Notably, it gives me the provenance of where I got it from so the person who created this can get
credit, and also if bugs or updates happen I can be notified. And I know what's going on. I
mean, my code editor today, many of the lines that I write with it are actually pasted in from
somewhere else. But my editor has no idea. And so now my editor knows where all the code
came from, whether I typed it or whether it came from somewhere else.
Now, without Blueprint, here's what you'd need to do. You'd need to go into a separate
application and type this in. You'd need to add the development environment that you're working
in to the query. That's something that Blueprint will do automatically for you. You go down and you see a
set of page-centric results, and there's a whole bunch of stuff here.
If you click on the first result, here you go. And it gives you a whole bunch of stuff, which is
much more complex than you need. You know, you got to page down to be able to find all of it.
And then to paste that in, it takes a while.
Here's another thing that you can do with Blueprint. So if you know, say, the class that's
involved but not exactly what the method or the syntax is, you can start typing a few characters,
use the existing auto complete functionality to be able to get that class name, hit go from there.
And then we'll give you some examples of what you can do with the URLLoader. And so I can
poke around and look for the thing that I'm looking for. I can grab this piece of code, and then I
can paste that in.
Here you can see a better view of the example-centric results that you get in Blueprint and what
you get if you type that. And all of the content that we've got in Blueprint right now is the
content that's indexed by the Adobe Community Help search engine. And this is really
important because there is another tool out there for searching for code. Google has a system
called Google Code, and Blueprint is importantly different than Google Code.
So Google Code searches everything that it can get its hands on. It's a giant corpus of code. The
problem with that is a lot of real software is huge, and so if you search Google Code for set busy
cursor, you're going to find line 7,642 of somebody's text file where they're using a busy cursor.
And it's very difficult to figure out what is it in this code that's relevant to my
example, and what is it that's actually stuff that they're doing that I don't need.
And so by searching the Adobe Community Help documentation and not other stuff, we're only
searching things that were meant to be used as examples. This also helps from a legal
perspective as well.
And it shows you the URL that you got it from, and it can give you a bit of extra information in
addition to that.
Okay. One of the fun things that we can do additionally is, because we're working specifically
with Flex in this system, if we've got a running example that we know how to execute, we can have
that running example be right in the search results view, which is something that's not possible
using the existing search engine. And so I can have this button that when I press it shows the
busy cursor. So that's pretty darn cool.
The way that we built Blueprint is we've got this existing Google search appliance that sits at
Adobe and that indexes this particular content. And that's really nice for us because, you know,
all of you in industry have figured out how to do search much, much better than we'll ever be
able to do at a research lab. And we just want to leverage that. We don't want to write our own
search engine.
So if the user queries for something like chart, I get that query, it goes to the Blueprint server that
sits in the middle before you get to the search engine. And Blueprint caches all of the results
from the search appliance for the reason that we're going to present them in a different way, and
so we just keep everything cached. It turns out to be a lot faster.
And -- but the query will go off to Google, we augment it with some information about the
development environment, it's going to give us some URLs, also things like suggestions that we can pass back
to the user. And then in the cached version, we've got the examples that are preformatted for
being able to show to you. And then we give you back these results in the example-centric way.
And so we show those like you saw.
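To make that flow concrete, here is a minimal sketch, in Python, of a caching middle layer of this
kind. The endpoint, function names, and SDK tag are placeholders for illustration; this is not
Blueprint's actual implementation.

    from functools import lru_cache

    SEARCH_APPLIANCE_URL = "https://community-help.example/search"  # placeholder endpoint

    @lru_cache(maxsize=4096)
    def example_search(query, sdk="flex"):
        """Return example-centric results for a query typed inside the IDE."""
        augmented = f"{query} {sdk}"               # add development-environment context
        pages = query_search_appliance(augmented)  # URLs and snippets from the appliance
        return tuple(extract_example(p) for p in pages)

    def query_search_appliance(query):
        # In a real system this would be an HTTP call to the search appliance;
        # it is stubbed out here so the sketch stays self-contained.
        return [{"url": SEARCH_APPLIANCE_URL + "?q=" + query, "html": "<pre>...</pre>"}]

    def extract_example(page):
        # Keep just the code block plus its provenance (the source URL), so the
        # editor can credit the author and watch the page for later updates.
        return {"code": page["html"], "source_url": page["url"]}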
So does example-centric search affect the quality and efficiency of programmers' work? And we
did a couple of studies to get at this. The first one we did in the lab, where we had 20
professional Flex programmers come in for a between-subjects design, and we gave them some
relatively short time frame tasks like retrieve text from a URL and place it in a text box.
We compared the programmers who were using Blueprint to programmers using the exact same
IDE with the exact same corpus of examples; the only difference was the user interface. And it's
kind of cool that we were able to run this study. I think it's often difficult to pull that off. We
found that the Blueprint users were able to produce code significantly faster.
Another thing that we did is we had experts who didn't participate in the study and who were
blind to condition rate the quality of the code that the people produced. And they rated the
quality of the code of people working with Blueprint as being higher. And when they had a
more open-ended task -- for example, do a weather visualization on this set of weather data -- they
produced higher quality designs as rated by an outside expert who was blind to condition.
Now, one thing that you wonder is whether this is just an artifact of the lab. One of the nice things
about the lab is we can control a whole lot, but does it scale beyond a couple hours. And so we
rolled this out through Adobe Labs for 12 months, and we logged user queries and all of their
interface actions. For the first three months we just let it go and write to disk.
After three months we opened it up and we found who the power users were. And we sent in a
little note through the user interface that says, hey, if you'd like to talk with us about your
experience using Blueprint so that we can make it better, please drop us a line.
And we used these interviews in part as a way to generate hypotheses about what we might see
in this larger corpus of data.
And so we had a couple of hypotheses that I'll share with you. And our comparison point for this
is, again, going to be the Adobe Community Help logs, which is within epsilon the identical set
of content in both cases, the only difference is the user interface.
And we got in Blueprint -- over this first three months we had about 17,000 queries from 2,000
users. And in Community Help, it was about 26,000 queries from about 13,000 users. So those
are the two datasets that we're working with.
The first thing that our interview participants told us when we talked to them about why they're
using Blueprint is that the benefits of being able to have an example-centric view outweigh the
drawbacks of missing the context that you might get in a larger Web page.
And so one person told us, you know, highlighting the search term in the code is really key. I
can scroll through the results quickly. And when I find the code that has it, I can understand the
code much faster than I could English. These are professional programmers. These are people
who know how to work with code. And so if your answer is in code, that's often quicker.
And so what we guessed was that if we're seeing people be able to work with the examples
directly, they don't need to click through the page. And so Blueprint is going to have a much
lower click-through rate to the final page than you'll see in a traditional search engine.
And in fact that's exactly what you see. People using Blueprint click through less than a third as
often as they do with the more traditional snippet view that you'd see in a traditional Web search
engine. And so we're able to give people the snippets that are valuable for them.
Our second hypothesis is that people were able to use the features of the IDE and the features of
code search synergistically. And so as you saw in that URLLoader case where I can type a few
characters and it would give me the class name and then I could use that as a way to get my
search query, people reported doing that a lot. So does it show up in the data?
And what we looked for was are people searching using code more frequently with Blueprint
than they would be in a traditional search engine. And in fact that's exactly so. Here's an
example of -- we used CamelCase and a few other heuristics as a way to figure out what's code.
And in fact that's what you see, is that there are -- half of all query terms fed into Blueprint have
CamelCase or other code heuristics in them as opposed to only about a sixth of the stuff without
Blueprint.
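A rough sketch of a heuristic in that spirit, assuming CamelCase detection plus a few other cues;
the exact rules used in the log analysis aren't spelled out here, so treat this as illustrative.

    import re

    def looks_like_code(term):
        """Guess whether a query term is code rather than plain English."""
        has_internal_capital = bool(re.search(r"[a-z][A-Z]|[A-Z][a-z]", term[1:]))
        return (
            has_internal_capital   # e.g. setBusyCursor, URLLoader
            or "." in term         # qualified names like mx.controls.Button
            or "(" in term         # call syntax pasted into the query box
            or "_" in term         # snake_case identifiers
        )

    print(looks_like_code("URLLoader"))    # True
    print(looks_like_code("busy cursor"))  # False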
A third thing that we found will I think resonate with all of you who at some point had the debate
about is spell check rotting your brain. So when word processors first got spell checkers,
teachers worried is spell check going to rot our brain, we're no longer going to need to remember
how to spell things. And so, for example, I've decided that I'm going to delegate the spelling of
questionnaire to Microsoft Word. I let it remember how to spell questionnaire, and I have better
things that I can do with my time.
And I think with tools like this we're going to see a similar thing with searching for examples;
that there is some stuff where you're just going to delegate to the Web the remembering of
specific syntax.
And so what one interviewee told us is that Blueprint is really useful for this middle space between
when you don't know what you're doing at all and when you don't need help because you
know exactly what you're doing. You have a rough sense of things. You're going to delegate the
remembering of the exact syntax to Blueprint.
And so we thought, inspired by some of Jaime Teevan's work, that people would re-find more
often in Blueprint than they would in Community Help. The same people would search for the
same stuff more often. And in fact that's exactly what we see; that people re-find about 57
percent more often with Blueprint than they do with Community Help.
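A small sketch of how such a re-finding rate could be computed from query logs: the fraction of
queries that repeat something the same user has already searched for. The field layout is made up
for illustration and is not the study's actual analysis script.

    from collections import defaultdict

    def refind_rate(log):
        """log: an iterable of (user_id, query) pairs in time order."""
        seen = defaultdict(set)
        refinds = total = 0
        for user, query in log:
            q = query.strip().lower()
            total += 1
            if q in seen[user]:
                refinds += 1
            seen[user].add(q)
        return refinds / total if total else 0.0

    sample = [("u1", "URLLoader"), ("u1", "busy cursor"), ("u1", "URLLoader")]
    print(refind_rate(sample))  # one of three queries is a re-find -> 0.33...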
So what we've seen so far is that the Web is significantly changing the way that people are
programming and that by leveraging the power of examples online we can improve people's
ability to program. I think the Web is also changing how people do design work. And so -- yeah.
>> Before you move on --
>> Scott Klemmer: Please.
>> -- so I want to know a little bit more about the example corpus that you're actually mining.
So it sounded like these are mostly authored help documents that are in there?
>> Scott Klemmer: That's right. So Adobe has tagged a set of stuff, much of which lives on
something.adobe.com, but not all of it does. And it includes all of the tutorials and help docs that
Adobe offers. It includes a bunch of bloggers who offer the weekly Flex tip update. There's a
bunch of other stuff that's been written by third-party people that's been decided is good quality
code. And so --
>> [inaudible] these are okay --
>> Scott Klemmer: That's right. That's right. And a human judge did so before we showed
up. I'm not entirely sure -- it's a little bit surprising to me that people take the time to go
specifically to the Adobe Community Help search engine, but in fact they do. And I'm guessing
the reason for that is that having this gardened set of stuff is really handy.
So we've seen the value of examples for code, and now I'd like to show a little bit of a value of
examples for design. And so we're going to do a poll. Raise your hand if you've ever made a
Web page. All right. When you were making the Web page, raise your hand if you used
viewing other people's source as part of your strategy for making that Web page. And almost
every hand goes up. Great.
So that's been my experience too. And you're not alone. Here's my good friend from graduate
school, Jimmy Lin. He made this. Several years ago he made this Web page. It's a great Web
page. And he wasn't the only one that thought so, and neither was I.
So his advisor at the time said, hey, that's a pretty good Web page. I can save myself a whole lot
of time by borrowing from Jimmy's Web page. This isn't a wholesale copy. The colors got
changed to be James's school colors, he's got an extra gadget here. Several other things have
changed. He was able to borrow some things that worked for him and change the things that
didn't.
Bonnie John saw that and she said, hey, that's pretty cool, I'd like to use this for my homepage
too. And so Bonnie's now got this page. She's changed the tabs. She's got six tabs up here, she
moved her picture over to the top right. Several other things changed. Borrowing some things,
changing others.
Mike Krieger made a great Web page for me a couple years ago, and then Jim Holland came to
visit and Jim said, hey, that's a nice Web page. So Jim borrowed that and adapted it for himself. And
my friend and neighbor, Dan Jurafsky, liked this page also, and so Dan borrowed many of the
same structures for his work.
And I think that one of the most powerful user interfaces that I've ever seen from the perspective
of being able to scaffold learning is this view source user interface that you see up in your Web
browser. And this has been in the Web since the beginning.
I'd love to talk to somebody that built one of the early Mosaic browsers to ask how intentional
this was as a way of getting other people to learn to make Web pages and how much was this something
that just sort of happened by accident or was easy for debugging. I don't know. But it's been in
there for a very long time. And what you see, as all of you know, when you go and click on
view source is that you'll get a page that looks like this. And you can see how that was
implemented.
And this stands in stark contrast to the desktop world. So if there's a desktop application that I
like something about how it's implemented, it's very, very difficult to say, hey, how did you
make that thing. How do I make something like that. Yeah, maybe it's open source, but to be
able to get to the exact point where that exact thing is implemented will take you a very long
time.
And I think what's important about this is not just that the user interface of being able to query
for how something is implemented has changed, but rather that the Web has offered us this big, giant
corpus of trillions of Web pages that offer examples of what you can do in terms of Web design,
and that is really inspiring. And designers know this.
So here is one example from Flickr of a designer who catalogs pages that she likes for being able
to reuse and reference those later. And I think if we were to tell a story that examples are really
valuable, we would say that the insight that we get from looking at other stuff helps us figure out
how to solve current problems.
And that's true not just in design. So here's a classic problem in cognitive psychology
experiments: Please connect all nine dots using only four lines. And I bet for those of you who
have seen this, the solution is jumping out. And for those of you who haven't seen this, you're
going how the hell is anybody going to be able to do this.
So you start to do -- let's work it through. So you draw one line. Okay. Let's draw a second
line, let's draw a third line, let's draw a fourth line, miss two dots. And we could go around and
around again. It's very difficult. Almost nobody gets this by being able to simply reason it
through, because the trick, of course, for those of you that have seen this before, is -- so we draw
our first line and we draw our second line, and to be able to pull this off, we need to go outside
the box.
This is where the phrase "thinking outside the box" comes from, is from this nine-dot problem.
And then I can draw another line outside the box, and then I draw my fourth line, and now I've
connected all nine dots.
Consultants love this problem because nobody gets it on their own. And so if you show a client
a problem like this, they can't -- the client can't get it. And the consultant says I can show you
how to solve this. All of a sudden the consultant looks really smart. And so this has been a
mainstay of business consulting for decades.
So let's ask a question: Is the color red good? Totally nonsensical question. Makes absolutely
no sense because the answer has to be something along the lines of, well, it depends on what
you're trying to do. The answer is contextual. If you're trying to make a homepage for Berkeley,
red is a terrible color to select as your homepage. If you're trying to make a homepage for
Stanford, red is an excellent color to select for your homepage.
So there aren't abstract truths as much as there are things that work contextually. And this is one
of the challenges of designing with templates. I think that templates are really valuable.
However, it's giving me dummy content. And so it's difficult to see whether this is right or not for my
context. Templates try to abstract away a lot of the cues that would help us understand what
makes a design good.
And another challenge of templates is that they take a long time to author, and so it's a pain, and
so the number of them is limited. And I think design patterns have a similar drawback; that it
takes an -- in fact, it takes much longer to be able to construct a design pattern which has some
examples, it's got an explanation, it's got the principle. Design patterns are great for the stuff
where there is sort of a principle that you can abstract and then reapply in new situations.
And there's a bunch of things that this really worked for. Checkout flow. How many times have
you seen a Web site that has a terrible checkout flow. We can tell you how to do it well. We
know that answer. We can encapsulate it in a pattern, hand it off to you, you'll be better off for
it.
But not everything works that way. In fact, the famous photographer Ansel Adams said that
there are no rules of composition in photography, there are only good photographs. He's clearly
lying. This is a guy who -- I mean, among being one of the most famous photographers, he also
wrote the canonical set of photography technique books that was used for decades and decades,
and he invented the Zone System, which is an algorithm for figuring out how to meter your
photographs.
And I think the point that Ansel is trying to make is not that there are no rules for photography or
there are no heuristics for photography, but rather the abstract knowledge isn't going to work in
every single case, and the crucible for success is not whether you're implementing a particular
principle but rather whether it works in a particular context.
So we wanted to know the answer to the question, and this is the work that Seville [phonetic]
helped out on, which is can examples scaffold design ability.
And so to pull that off, we built a really simple Web editor. We took Firefox's built-in direct
editor. It's like Dreamweaver or any other direct manipulation editor. And we augmented it by
having a corpus of examples that Seville harvested off the Web. And you can zoom in and look
at those in more detail. And so if I wanted to make a Web page, this is where I do my design
work. That's where I look at the bunch of examples, and here's the focus page.
And so I can poke around, and we zoom in on that bit in the bottom right, and you can see a
bunch of different Web pages, so I can grab one that I like. And so we'll grab that one right
there. And then we'll go and we can grab the background color off of that, and it gets applied to
this page, and then we find another bit of stuff that we like and so on and so forth. And we can
build a page up that way.
And one of the hardest parts about asking a question about effective design experimentally is
how do you figure out what constitutes good and what's your experimental paradigm in doing
this work. And so what we did is we gave people a scenario. And we've done a bunch of
experiments in this genre.
Here's one of the scenarios that we've given people. So we say Elaine Marsh is a 21-year-old
Stanford student. She'd like a page. This is her goal. She's looking for a job. She wants to
present this about herself. And then we had people come into the lab and we had -- we did a
between-subjects comparison between people who were creating the Elaine -- designing for the
Elaine scenario with our examples editor and people who were designing for the Elaine scenario,
exact same editor only no examples.
And what you see is -- and then -- oh, the fun part about this is that after all of these pages were
created, we had people who were blind to condition rate how well the pages that were created
met what Elaine asked for as a designer. And so our dependent variable here is not is this a good
page, but how does this -- how well does this deliver on Elaine's goals.
And what you see is I think all of the things that you would expect. Some of the participants
were better designers. Some people get rated much more highly than others. There's a bunch of
variation in the raters. Raters don't exactly agree on what's good and what's bad. But you do see
some trends emerge. And so pages that were created in the examples condition were rated by
these independent raters significantly more highly than those that were in the control condition.
As a good manipulation check we found that experienced participants created more highly rated
pages than novices. And in this particular study we found no real interaction between expertise
and manipulation. So experts and novices in this one task benefited equally from examples. I
think this is going to be -- the answer to this is going to be contextually dependent. But that's
one data point for you.
So one worry that you might have about working with examples is you say, well, we're going to
end up with just everybody doing the same thing. There's going to be no variation. We're going
to end up with monoculture, and that's our worry. And Steven Smith at Texas was really worried
about this.
And so he ran a study where he asked people to create aliens. If you ran a study where you
asked people to come up with the most creative alien that you can, mostly what you get are
Martians. It's a really difficult task for people on the spot to come up with something where
you're like be creative. Really difficult.
And for him, he had two conditions. So on one condition people were able to create aliens
without any priming ahead of time. And the other condition he showed them several aliens that
all had a couple attributes, like having four legs. And what he found was that the examples do
increase conformity; that if you prime people with a bunch of aliens who all have four legs,
you're likely to see in your results aliens that have four legs.
So we might conclude from this that uh-oh, this is bad news, we're all going to be brainwashed
exactly the same way if we follow down this path. But I don't think that's exactly what's going
on. And here's why. If you think about the space of all possible designs, most of this space is
bad. Most of this space is not what you want. Most of this space is junk. The space of good
designs is relatively small.
And so Marsh cleverly asked a slightly different question. So same study, same paradigm. The
only difference is we're now asking -- as opposed to how diverse are the aliens that people create,
we're going to ask the question how many novel features do people's aliens have, where novel is
defined as few other participants having come up with that same idea.
And if you ask the question how many novel features do people have in their aliens, priming
them with four-legged aliens ahead of time has absolutely no effect.
Same number of novel features in both conditions. And why might this be? And Marsh's
argument that I'm really persuaded by is that if I don't have a good idea, I'm going to borrow
from whatever's on the table in front of me that seems to be better than what I've got in my head.
But if I do have a good idea, if I do have a creative idea, then I'm going to go with that and I'm
not going to be dissuaded by the fact that, well, their alien has four legs on it and I have this idea
for one that has three legs.
And so in this case what we're seeing is they're not reducing novelty. What's fun about
comparing this work with the work that we did is here they don't ask the question of quality at
all. There's no notion of what's a good alien. They're purely looking at the diversity of designs.
And in the study that we did with Seville, we purely asked the question how well does this
achieve this scenario. We didn't ask the question of novelty. And so they answered two slightly
different questions.
Now that we've got this motivation that working with examples in design may really offer a big
win, how can we come up with tools that can leverage this. And I think we would want three
attributes in such a tool. So one of these is you'd want to have a large pool to draw from. If I
can really work with those trillions of pages on the Web, that sure would be cool.
It's important that whatever our tool is that's going to give us examples shows the context; that
it's not lorem ipsum. It's hi, I'm Elaine and here's my Web page, and I can see whether red is
going to be appropriate for me or not.
And, lastly, it should be easy to adapt. And I think this is one of the problems that we saw with
something like Google Code, is that when you get these real-world examples it can often be
difficult to adapt them to your own context.
So these are our three goals. And as we speak, we're working on a really exciting tool called
Bricolage that I'll show you some early vision and results from.
So here's the scenario. This is work by Ranjitha Kumar and colleagues. And if I've got
something like the Stanford Women in Computer Science homepage, Ranjitha says this page is
lame. I want a better page. I want a more exciting page than this.
And so in our vision you could go out on the Web and find some page that you like the design of
better. So here's one that we like the design of better. And then what I can do is I can take my
content and this page's layout and automatically synthesize a new page. And this right here is
a -- actually, this probably is built with our system, but we'll call it a vision for now, just in case
it's not. And you can see all of the content from the Women in Computer Science is slotted in
here, but it's got the design and the style of the page that Ranjitha found on the Web.
In order to be able to do this, what we're going to do is we're going to say every Web page is a
tree, and we're going to start out by saying that the tree, for starters, is its DOM. Saying
that the DOM is a rough approximation of the perceptual tree works out pretty well as a starting point.
But as any of you who do Web development know, the underlying tree representation and the
perceptual tree representation aren't the same. So it's a good seed. And then what we're going to
do is we're going to use computer vision to be able to take this -- you know, the DOM and
transform it to what we would want the perceptual representation to be. And we extend an
existing algorithm called the Vision-based Page Segmentation algorithm.
And the game that we're going to play is can we correspond, once we've done this transformation
into a perceptual tree, the nodes of one place into the nodes of another place. And if we've got
the correspondences, then we can shuffle the content across.
So here's the kind of thing that we're going to do. So at the high level we've got these two pages,
and so I'm going to say aha, the root node is here and then map to the root node here, and here's a
big content node that maps to a big content node here.
But one challenge is that ancestry gets violated. And so the classic computer science way of
doing tree mapping generally enforces that ancestry must be maintained. So if I'm the child of
another node on one side of the mapping, I have to be a child of another node on the other side of
the mapping.
And in order to be able to get around this, we're going to use an optimization-based approach
where we say in an ideal world we would like ancestry to be maintained, but if, given the semantics
of the page, it just really doesn't work, then we're going to allow it to be violated. And so we're
going to assign a cost to that.
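Here is a toy Python sketch of that idea: score a candidate mapping between two page trees, and
charge an extra penalty whenever a parent/child relationship is not preserved. The real Bricolage
cost model is richer than this, and the penalty weight below is just an assumed placeholder.

    ANCESTRY_PENALTY = 5.0  # assumed weight, not the real system's value

    def mapping_cost(mapping, parent_a, parent_b, node_cost):
        """
        mapping:   dict from nodes of page A to the nodes of page B they map to
        parent_a:  dict giving each A node's parent (None/absent for the root); same for parent_b
        node_cost: function(a, b) -> how dissimilar the two matched nodes are
        """
        cost = sum(node_cost(a, b) for a, b in mapping.items())
        for a, b in mapping.items():
            pa, pb = parent_a.get(a), parent_b.get(b)
            if pa is None or pb is None:
                preserved = pa is None and pb is None   # both nodes are roots
            else:
                preserved = mapping.get(pa) == pb       # A's parent maps to B's parent
            if not preserved:
                cost += ANCESTRY_PENALTY                # soft constraint, not a hard one
        return cost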
How do we know what constitutes a good mapping? And I think this isn't really something that
can be solved formally. It's really an empirical question. So we've gone out on the Web. And
we've gone to Mechanical Turk. And we showed Turkers pairs of pages and we asked them, for
this thing on the left, where does it match on the right. And you can see the screen shot with the
green shows after they clicked on the appropriate spot on the right.
Okay. So then I can show you another bit, and we find the correspondence for that, find the
correspondence for that. And when you're done, at some random interval, I think every five or
seven things that you match, it asks why, because we're interested in gathering not just what is
the mapping but why is the mapping what it is.
And we do this only sporadically because otherwise it gets really annoying, and I think it may
actually, if we did it all the time, change what people reported.
And so the question that we're all dying to know is how often do the raters agree. Because if
everybody has a different opinion about what goes where, we're toast. There's no way that we're
going to be able to use that corpus of ratings to be able to design new stuff.
Turns out people agree pretty often. So if we cleave our dataset into things where they're
structurally dissimilar and things that are structurally similar, and that's really just an eyeball test
of do they feel like they're about the same or not.
In both cases, you see that people agree on at least, you know, ish, three quarters of the
mappings. If you look at pairwise similarity between two raters -- do two raters agree? -- three
quarters of the time the answer is yes, even when the pages are structurally dissimilar, and much
more often than that when they're structurally similar. And so what this says is that we could
actually probably leverage
this training corpus.
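The agreement measure described here can be sketched as follows: for each source element, look at
every pair of raters and count how often both picked the same target region. The data layout is
invented for illustration.

    from itertools import combinations

    def pairwise_agreement(ratings):
        """ratings: dict mapping source_element -> {rater_id: chosen_target_region}."""
        agree = pairs = 0
        for choices in ratings.values():
            for r1, r2 in combinations(choices, 2):
                pairs += 1
                agree += choices[r1] == choices[r2]
        return agree / pairs if pairs else 0.0

    print(pairwise_agreement({"search_bar": {"t1": "B7", "t2": "B7", "t3": "B2"}}))  # 1 of 3 pairs agree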
As you'd expect, there are some things where everybody agrees on the same thing. And so here's
an example of an organizational element that every single person in our study maps as the same
thing between the two pages. So here's the one on the left, and here's the one on the right, and
they're shown in green boxes.
Here's one that has similar semantics. And so in this case you've got a search bar on the left and
a search bar on the right, and everybody mapped those two together. There are other things
where people had much more divergence.
And so I think one thing that we may see come out of this work is somehow confidence gets
baked into the propagation algorithm. And I don't know yet whether that's going to be that the
UI says we've mapped all the high-confidence stuff, you're on your own for the low-confidence
stuff, or whether we say here are our three best mappings, you can pick which one you like. But I
think this is going to be -- this is going to be really exciting. And so stay tuned for more results.
The last thing I want to talk about today is we've seen how having a bunch of examples that are
out in the world can help me come up with better designs.
And the next thing I'd like to explore is, purely on the generation side, are there design strategies
that we can use that are relatively simple and that simply make designers more creative. And I'd like
to start with an anecdote.
So Bayles and Orland report a possibly [inaudible] tale about a ceramics teacher who divided his
class in two. He tells the first half of the class, you're going to be graded on quantity: make as many
different things as you can; your grade is totally volume. He tells the other half of the class, you're
going to be graded on quality. Come up with the best piece of ceramics that you can; that's going to
be your grade for this class.
And what they found is that while the quantity group was busily churning out piles of work and
learning from their mistakes, the quality group sat theorizing about perfection and in the end had
little more to show for their efforts than grandiose theories and piles of dead clay.
And so in this story the value of coming up with a thousand songs to have that one great song is
really made salient. And we kind of wanted to know, you know, can we measure this in the lab.
And I think one of the reasons why we wanted to do this is that if I tell this story to folks in
industry, many people really resonate with it. And a large group of other people says that's a
great story, Scott, but you have to understand in our work we have really limited time
constraints. And so while it would be wonderful to come up with many different design
alternatives, we don't have the time.
And so we wanted to be able to explore, if time is really constrained, are people better off with
exploration and iteration or should they gun for that one perfect thing. And we needed a Petri
dish for being able to explore this work. And here's the first one that we came up with.
So something that was going to be a good Petri dish needs to have a couple attributes. We need
to have some measure of success. But unlike most of the work that's been done in the
psychological literature, think the nine-dot problem, there need to be many paths to success.
With the nine-dot problem, there's one way to go and you either see it or you don't. But design
isn't like that. There's a bunch of different ways to achieve good. And so we need something
where good is measurable and there's many different paths.
So we came up with the egg drop. And for those of you who haven't done an egg drop yourself,
here we are with my office window. And we're throwing a contraption we built three stories out
the window onto the ground. And here's the egg that survived.
So that's an example of -- so our dependent variable is going to be how high can you throw this
thing from without the egg breaking. And what's really nice about this is that there are many
different paths to success.
What we saw was that the iteration group, where we forced them to iterate rapidly, did much
better than the noniteration group. But out of this study actually I don't think the quantitative
results are the interesting part. I think it's the qualitative stuff that turned out to be much more
interesting. And so what we saw is that, independent of condition, participants picked one idea
and stuck with it.
[video playing]
>> Scott Klemmer: He's not the only guy who feels that way. So Karl Duncker, back in the
1940s, was fascinated by this idea of functional fixation; that you get stuck into seeing the world
one way and you can't believe that there is another way to think about things. And so he gave
participants -- this is a drawing of what he gave people physically, is you give people a box of
tacks, a candle, and a book of matches and you say please affix the candle to the wall such that
none of the wax drips on the table. And it takes people a long time to be able to figure out a
solution to this problem, and success rates are relatively low.
Now, if you make one small twist to the way that you make this go -- well, I'll show you the
solution first. So the solution to this is that you need to be able to take the box itself and use
that as a holder for the candle. And, as you can imagine, the reason why it takes people so long
to see this problem is that they don't see the box as a box. They don't see it as an element that
they can work with. So we wondered can we do a simple intervention that will limit the amount
of functional fixation that people have.
And so we asked the question how does prototyping in parallel as opposed to a serial approach
affect the design performance. And here we're going to shift away from the egg drop. As fun as
it was, we wanted to get back to something that was more computer-like, and we wanted to have
something where the dependent measure was something that really resonated with the software
world. And the insight that we had here was to have participants create an ad.
And so we have some friends that run a design magazine, and we convinced them that we were
going to have participants create ads for their design magazine. And MySpace has an ad creation
tool that is really easy to use. And so the general strategy here is people create ads using this
tool, and then we're going to roll them out over the Web.
So last summer we hosted 2 million ad impressions on MySpace. And we had 33 folks come
into the lab and they were -- we put them in one of two conditions. They either got put in a serial
condition where we marched them through creating six iterations of a design, or we put them in a
parallel condition where they created three, got feedback, created two more, got feedback, and
created one. Here they're getting feedback after each one.
So the number of units of feedback -- and I'll explain what that is in a moment -- is held constant
across conditions. The number of prototypes that they create is held constant across conditions.
And the total amount of time is held constant across conditions.
And then we took the final ad that each participant created, and that's what we hosted up on
MySpace last summer.
Here's the critique that we gave people. We went to two advertising professionals, and we had
them give us critique feedback on a bunch of designs. And then we took their specific critiques
and we generalized them to be more like a pearl of design wisdom.
And so you see things about the overall theme, about the composition and layout, or about
surface features of the design. And so each -- for each ad, participants got three units of
feedback that we gave them about their design.
And in addition to -- so the performance measures that we've got here -- yeah.
>> [inaudible]
>> Scott Klemmer: Please. Yeah.
>> So you're saying that you give them these generic nuggets of wisdom as opposed to giving
them feedback that was actually tailored specifically to their --
>> Scott Klemmer: I think it's both, actually, is the answer. This is an actual design that one of
the participants came up with, and this is the actual feedback that they got for that design. And
so we selected feedback that was meant to be relevant for them. But the feedback was all
precanned as opposed to us generating it on the spot, for a host of
reasons. But we did have a big bag of choices with which to pick three that were going to be
relevant for them.
The question about feedback is an excellent one, and we can -- I have a much longer answer to
that question that we can answer offline.
And so the dependent variables that we got here are we've got click-through rate. So the fun
thing about doing ads is you get to ask the question how many people click through. There's a
danger of using just click-through rates as your dependent variable, which is that if you told me
have as many clicks as possible, I think I would have an ad that said something like free iPod,
and then everybody clicks through and then they get to this design magazine site and they say,
well, what's the deal, that's not what I signed up for.
And so we also measured -- the nice thing about having friends as our client was that we put
Google Analytics on their site and we're measuring how long people spend on the site once they
get there. And so are these people getting what they were looking for.
And we're getting expert ratings from both the client, the editorial board of the magazine, and
advertising professionals who are all blind to condition. And so one of the fun things that's
going to pop out of this work, in addition to the question about parallel versus serial, is that it's the
first time that I've been able to find where you take these common, modern, quantitative
measures and ask how well they correlate with the much fuzzier measures that you get out of
professionals.
So I think the most important result is that people came up with a whole bunch of different ads.
Some of them are great, some of them are terrible, some of them are creative, some of them are
banal. They employed a whole bunch of different strategies. And it was really neat to see what
people came up with. Here they all are right here.
And what we see -- and I'm actually a little surprised that this actually worked, is that people
who created ads in the parallel condition had ads that were clicked through at a significantly
higher rate than ads that were created in the serial condition. Pretty cool.
And, additionally, visitors from the parallel condition spent more time on the client site than
those in the serial condition. And so not only were more people coming through but for each
person that clicked through, they were happier by this measure with what they got as a result.
>> Do you also have numbers for how much of a difference there was between iterations, like
how much did that bias actually help?
>> Scott Klemmer: Your question about how much of a difference was there between iterations
is excellent. And in this study we only rolled out on MySpace the final one. We did -- we've
thought about rolling out all the intermediate ones.
One of the challenges of this paradigm is that you can eventually saturate the market for ads
about design magazines. And so that would suggest that if you're going to implement this
paradigm, which I think is the great way for studying design, you may want to pick something
where the appetite for ads of this sort is really big.
One thing that you could do that would be very cool is have your domain be something like
donate to Haiti and your dependent variable is how much cash did the group get. That could be
really fun.
All right. And experts rated the parallel ads more highly than the serial ads, and this difference is
significant also. And so in general the experts and the numbers agree, though on any individual
ad you may well see some variance.
So why did parallel outperform serial. And I'm --
>> Do you have a picture of [inaudible]?
>> Scott Klemmer: What's that?
>> Do you have a picture of [inaudible]?
>> Scott Klemmer: I do. It's the one with a bunch of hands, and the ad is ambidextrous. It's
really poetic. It's really clever. And what's notable about it -- let me see -- it may be one of the
ones that I showed at the beginning.
So here's what's cool about this is that what you see -- I think it's fair to say that we selectively
picked these two -- is that in the serial case they got an idea and they're just kind of tweaking it
through the whole time. Whereas in the parallel case, somebody comes up with three initial
ideas. And it's not until the third one that you see this hand thing emerge at all. And then they
come up with a fourth idea that's totally different. And on their fifth idea when they're searching
around for images, they come across this design. And then after doing all of that it's not until the
final one that the thing really coheres. And so that was really -- I think you do see this pattern in
general, and we'll see some numbers that back that up.
And so my first theory about why parallel outperforms serial --
>> Can I just --
>> Scott Klemmer: Yeah, please.
>> These were professional designers? Design students? Naive?
>> Scott Klemmer: These were all design students. Doing it with professionals would also be
really fun. Great question. So I think one of the things that you're getting out of parallel is
implicitly the ability to compare the effect of your designs -- you'll be able to compare multiple
different versions.
And we see this in the educational psychology literature. So Dedre Gentner and colleagues did a
study where they had business students, and interestingly they either gave them a classic
case-based approach or, in addition to giving them a couple of cases, asked
people to draw the parallels between the situations.
And what they found is that there was about a factor of three transfer win when you ask people
to compare -- explicitly compare than if you just gave them multiple cases. And so if we're
seeing a win already simply by merely having multiple alternatives, we might see an even bigger
win of parallel if we asked people to explicitly draw a comparison. And I think that's an exciting
opportunity for future work.
The second reason why I think that parallel offered a big win is the ability to ideate broadly. So
what we see in something like a serial participant is this same fixation. So somebody says, you
know, I tried to find a good idea and they use that idea to keep improving it, so I pretty much
stuck with the same idea. And here's another thread of serial where, yeah, they pretty much
stuck with the same idea.
We wanted to be able to test this. And so what we did is in this case we took all six ads that this
person produced, and this is a within-person measure. And we're going to take all 15 edges that
those six nodes construct, and we're going to ask Mechanical Turkers online how similar are
these two ads. And we're going to have Turkers do a whole bunch of these, because the first
time you see this question it's gibberish and it takes a little while to be able to calibrate. And so
within each cluster of six how similar are all of the edge-wise pairs.
And what you see is that ads in the parallel condition are rated as significantly less similar than
ads in the serial condition. So it is in fact the case that people in the parallel condition were
exploring the design space more broadly than those in the serial condition.
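A sketch of that within-person measure, assuming each participant's six ads and a crowd similarity
score for each of the 15 pairs; the names and the similarity function are placeholders.

    from itertools import combinations
    from statistics import mean

    def within_person_similarity(ads, similarity):
        """
        ads:        the six ads one participant produced
        similarity: function(ad_x, ad_y) -> mean crowd rating of how similar the pair is
        """
        pairs = list(combinations(ads, 2))  # six ads yield 15 edges
        return mean(similarity(x, y) for x, y in pairs)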
I think the last reason why this is valuable is parallel gives people a better critique orientation.
And we have a short video that shows exactly what that means, which is here's somebody who
was in the serial condition, and they're talking about the feedback they got.
>> Video playing: These guys, you know, are telling me that I am completely doing something
wrong here. So it took me a while to get past the I'm a failure at this and to, okay, how can I go
about fixing it in the ways they suggested. So there's a short period where the emotional
response overwhelmed any positive like logical impact that this ended up having.
>> Scott Klemmer: So I think in the serial condition I think people really felt beaten down by
getting repeated critique, whereas in the parallel condition people felt like I have another avenue,
there's something else. Oh, I got this negative feedback here, but not in the other case.
And this I think really resonates with one of Bill Buxton and Maryam Tohidi's results about the
value of paper prototyping and testing with users multiple alternatives. And what they found,
unsurprisingly, is that users don't really have a vocabulary for talking about the quality of
interfaces. So if you give people one, they mostly say I like it. Whereas if you give people
three, it offers them a vocabulary of talking about the differences between the interfaces, and you
can get much more useful feedback.
This idea of examples being valuable in getting people to be more creative, we've seen it here
sort of at the micro level, improving a design, improving a piece of software. I think this holds at
the macro level as well. And it goes back to the slide show I was showing at the beginning that
some of you may have seen, with a quote from Picasso, the good artists borrow, great artists
steal.
And he wasn't kidding. So this is the guy who in many ways invented modern art and cubism.
And here's one of the paintings that was seen as a real turning point for that, Les Demoiselles
d'Avignon. And about four years before painting this painting, Picasso's friend takes him to the
Trocadéro museum, the ethnographic museum of Paris, where there are all these African
sculptures including this 19th century Fang sculpture, and it's there that Picasso first sees the -- a
bunch of artistic styles that really work and were by and large at that time unavailable in
European art.
And so you can say in a lot of ways that the insight of cubism was being able to take stuff that
was preexisting but in a totally different domain and being able to see how that was made
relevant in this case.
And what this shows to me is that if you're talking about novices or if you're talking about minor
increments, you want to be able to see examples that are proximal and have experiences that are
proximal. Once you become an expert, all of that proximal stuff you've got baked into RAM.
You know that already. And so I think the real opportunity for expert-oriented tools is asking
how do we get people to transcend and go to further afield domains.
And this leads me to what I wanted to share with all of you I think are a couple of exciting
research questions. One is how can we get people to take on bigger tasks using examples. So as
opposed to just one Web page, how do you say let's adapt this navigation element that spans an
entire site. How can we move this to other domains.
So we've looked at Web design, we've looked at programming, we've looked at egg drops. What
else might we be able to do here. One is how can we get people to scaffold expertise. Can we
come up with a bunch of, in essence, learning tools that help people become more expert. And
then once you're an expert in Web design you may not need that tool that was valuable when
you're a novice. You may need a different tool.
I think search is a huge opportunity here. It's pretty clear from the various sets of designs that
I've showed that it would be difficult to find any of those using traditional Web search. Keyword-based
search just isn't going to give you minimalist page designs or a baroque page design. It
just -- you need something else entirely. And the optimal solution may not be language based at
all. But I think the design space for working here is really exciting.
I think we've seen that patterns are valuable and that templates are valuable and examples are
valuable. And each has sort of a different set of benefits and drawbacks. How can we integrate
the best of all of these.
And, lastly, inspired by the Picasso quote, how can we enable people to find and adapt distant
examples, things that are much further away, and how can we facilitate things that are both legal
and technical that enable content creators to share what they're willing to share, to not share what
they're not willing to share, and to the extent possible facilitate an ethical open culture.
And with that I'll take questions. Thanks very much.
[applause]
>> Scott Klemmer: Yeah.
>> So when you were showing the videos with the egg drop things [inaudible].
>> Scott Klemmer: Right.
>> Right? And I'm so used to working in teams in my profession that it didn't even occur to me
that you're trying to do something like that [inaudible] first of all. And then I thought your
parallel study was like working in a group in that you had -- when you work in a group you
always have parallel ideas that competed and you quickly sort out the good from the bad and
then you focus in.
And I was wondering if you could somehow construct a study where you compared all people
working in pairs versus people working alone and you get double the results from the people in
pairs weighted against the same result from a person alone or something.
>> Scott Klemmer: Your question is excellent. I think that -- well, the simplest answer is to say
groups are clearly different than individuals. And we wanted to do individuals first to sort of get
a baseline upon which we could do more complex stuff. Because groups are not only more
different than individuals, they're more complex. And I think that's clearly a next frontier to
tackle. One obvious benefit is the one that you mentioned, you know, two heads are better than
one, many hands make light work.
A drawback that we often see is this sense of ideas being mine or yours, and that can influence
decision-making. And I think a lot of the -- Bob Sutton, who's done a lot of the literature on
group brainstorming, points out that when people seek measures for brainstorming being
a win, where they look at something like the number of ideas created, they don't find anything, and
they conclude that group brainstorming is a foolish endeavor.
What Bob pointed out is that one of the values of group brainstorming is the ability to launder
who came up with the idea. And so in a more traditional team process, it's very clear whose idea
is what.
One of the things that emerges out of a well-structured brainstorm is that you launder who the
origin of that idea is, and by the end of the brainstorm all of us feel like all of the ideas are ours.
And so I think working with groups is a really exciting thing to do.
Scott.
>> So just kind of building on that, I'm kind of curious if you have -- [inaudible] worked with
designers [inaudible] do you have any kind of intuition for how different you think that design -- the
ad design class would be if it was like a three-person design firm who works together on a
daily basis churning these things out? Like the type of feedback and when it comes in -- but
would the impact be totally different or similar or do you guys have any intuition for that? I
don't know how it would work.
>> Scott Klemmer: My hunch is that a lot of the basic principles that we're seeing here would
still hold. I think that in academia and industry, and every other setting I've been in, I do believe
that people don't diverge enough early in ideation. So we see serial iteration far too often. And I
think even if you're talking about experts and even if you're talking about groups, you're going to
see big benefits of parallel exploration earlier on.
I think another thing that we see out in the real world is that people are loath to actually commit
to something. And so it's amazing -- in the design classes that I teach, I'm amazed that students
are reluctant to diverge early on, when the best thing to do is to diverge, and they're reluctant to
converge later on because they feel like they're not ready to commit, and they wait too long to
actually commit to something. And I think we see that pattern out in the real world too.
>> What do you think the sweet spot is for where the computer leaves off and the human picks
up?
>> Scott Klemmer: I just read a couple of days ago a report -- there's a blog that claims that in
some markets Facebook has rolled out a tool that will automatically create an ad for you. And I
haven't gotten a chance to try it yet.
I think there are a couple of cool opportunities. One is to say, can we get computers to
automatically do some of these things, and I think some of the time the answer is yeah. I think
with Ranjitha's work we may be able to take some content and automatically synthesize
something that looks pretty good, at least as a starting point.
And so for me, really, I kind of view it as everybody races to the finish and everybody's going to
win a little bit. So I think for automated techniques, there's a lot of really exciting work to be
done there.
I think one interesting midpoint is a design-galleries approach. A bunch of years ago in
computer graphics, Joe Marks and colleagues built this system called Design Galleries, where it
would auto-render a bunch of different designs for you with different parameters of, say, a
planet blowing up. That's the kind of thing where it's very difficult to specify in a language, or
a priori to a system, "I want the planet to blow up 7 on a scale of 1 to 10," whereas if you have
the system render out 30 versions, you can certainly say, "I want that one."
And so I think the recognition-over-recall benefit will be one of the ways to split the work
between the human and the computer, where the computer generates a bunch of ideas and the
human says "that one" -- either just that one, or that one as a starting point that I'll then tweak
further.
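As a rough, hypothetical illustration of that design-galleries loop -- this sketch is not from the
talk, and the parameter names and render stub are invented -- it might look something like this
in Python:

    # Minimal "design gallery" sketch: sample many parameter settings, show them all,
    # and let the person recognize the one they want rather than specify it up front.
    import random

    PARAMS = {
        "debris_count": (10, 500),      # how many fragments the explosion throws
        "shockwave_speed": (0.1, 5.0),  # arbitrary units
        "glow_intensity": (0.0, 1.0),
    }

    def sample_design():
        # Draw one random point in parameter space.
        return {name: random.uniform(lo, hi) for name, (lo, hi) in PARAMS.items()}

    def render(design):
        # Stand-in for an expensive renderer; here we just describe the design as text.
        return ", ".join(f"{k}={v:.2f}" for k, v in design.items())

    def gallery(n=30):
        # Render many variants up front so the user picks by recognition, not recall.
        designs = [sample_design() for _ in range(n)]
        for i, d in enumerate(designs):
            print(f"[{i}] {render(d)}")
        choice = int(input("Which one? "))
        return designs[choice]

    if __name__ == "__main__":
        picked = gallery()
        print("Starting point:", picked)

The point of the sketch is only the division of labor: the computer enumerates options, and the
person chooses one, possibly as a starting point to tweak further.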
>> So to me, one of the things that was really interesting about your parallel versus serial design
study is that the serial folks, while they lose, they don't lose by much. They're only 20 percent
worse than the folks who did it in parallel. And it could be for a couple of reasons.
So one could be that you go with your gut idea -- you know, your first idea is your best one -- so
maybe that's why they did almost as well. And then the other one is the power of polish, like
maybe the fact [inaudible] five more times. Maybe their first idea kind of sucked, but then after
five iterations it was almost as good as the parallel one. So it would be interesting to try to tease
that out a little bit.
I guess one way you could do it with your existing data is to see how often, in the parallel case,
they chose to publish their very first idea versus how often it was the second or third one that
made it to the finish line.
>> Scott Klemmer: Right. So you raised several good questions. One is they didn't win by
much. On one hand I think that's right. I think our hope is that we can come up with, let's say,
four or five of these nuggets that together give you a huge win.
I think the other thing to say is I believe that 20 percent is actually a lot. You know, if you win a
hundred-yard dash by 20 percent, that'd be insane. I think it's probably fair to say that one's
favorite interactive phone and the second-best interactive phone are less than 20 percent
different, but the market-share difference has been huge. So 20 percent may be big.
As to your point about when people diverged and when they came up with different alternatives,
part of the answer comes out of our Mechanical Turk study of the aggregate difference between
all of the designs, and the fact that you see the parallel designs as being more different suggests
that they did explore multiple different options. It doesn't tell you when, and I agree that that
would be an interesting thing to look at more. And I certainly don't think that there's a magic
3-2-1. I think how much to diverge has to be contextually dependent.
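As a rough, hypothetical illustration of that kind of aggregate-difference analysis -- this is not
from the talk, and the ratings and data layout are invented -- one way to summarize how
different a participant's designs were is the mean of crowd-rated pairwise dissimilarities:

    # Summarize a participant's design diversity from pairwise dissimilarity ratings.
    from statistics import mean

    def diversity(pairwise_ratings):
        # pairwise_ratings maps a pair of design ids to a dissimilarity score in [0, 1],
        # e.g. averaged over crowd raters. Higher mean = more diverse set of designs.
        return mean(pairwise_ratings.values())

    # Illustrative crowd-averaged ratings for one parallel and one serial participant.
    parallel_participant = {("A", "B"): 0.8, ("A", "C"): 0.7, ("B", "C"): 0.6}
    serial_participant = {("A", "B"): 0.3, ("A", "C"): 0.4, ("B", "C"): 0.2}

    print("parallel diversity:", diversity(parallel_participant))
    print("serial diversity:  ", diversity(serial_participant))

Comparing these per-participant diversity scores across conditions is one way to ask whether
the parallel group actually explored more different options, though, as noted above, it still does
not say when in the process the divergence happened.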
>> So with your parallel study it seems like -- or my interpretation at least was that the
conclusion you're proposing is that working on multiple concepts at the same time is beneficial
in terms of the quality of your final product.
>> Scott Klemmer: Beautiful way to say it, yes.
>> It seems to me like based on the method you presented, it would be equally valid to conclude
that receiving feedback on multiple concepts at the same time is more beneficial than receiving
feedback on only one concept at a time. So I'm wondering if you [inaudible] in any studies.
>> Scott Klemmer: So in our current study we've conflated "come up with three designs
simultaneously" with "come up with three however you like, but get feedback on them
simultaneously." And that was sort of intentional, because it's actually kind of hard to work on
three things literally in parallel. And so we've operationalized the meaning of parallel to be:
bang out multiple designs before you start to evaluate them or get feedback on them.
>> Well, I guess the reason I was wondering is that -- at least from what you presented, it was
3-2-1, and the first three were pretty different, and then the two in stage 2 were pretty different,
and that led to one final product. And I was thinking maybe there could be a control condition
where you have versions 1.0, 1.1, and 1.2, kind of all the same idea, you get feedback, then you
have 2.1 and 2.2, and then 3.0. And I was wondering if you think that might provide you with
an additional comparison that would be meaningful above what you have in your current data.
>> Scott Klemmer: I think then you're asking whether we want some way of manipulating how
broadly people explore, how broadly they fan out. And I think, yeah, absolutely. And I think
some things that would be meaningful in terms of the impact of fanning out would be: do people
stumble on ideas that matter if we get them to fan out more, and, I think a big one, sunk-cost
reasoning. In the serial case you really see a lot of not only fixation but also some sunk-cost
reasoning.
And so one benefit of parallel may be that if I don't have a horse in the race in the same way that
I do with serial, then I have less sunk-cost reasoning in the parallel case. And so I think what's
notable is that post hoc I offered you three reasons why I believe parallel won, and clearly one
wants to go on and figure out: of the three stories I told you that explain the data, is it all of them,
is it some of them, can you manipulate them independently? I agree.
>> [inaudible] the Blueprint stuff was great, and what you did on the SQL team [inaudible] study
some of our stuff. And, yeah, everyone -- the first thing they do when [inaudible] a problem to
them is open up Google or Bing or whatever and start searching. Unfortunately, however, there
were quite a few cases where people picked the first example; it looked like the problem we gave
them, but it wasn't at all.
It only had some of the surface characteristics, but they didn't really think about what the
question was actually asking them. They just grabbed something -- it's like, whoa, there's a little
picture, and the picture kind of looked like what we asked them to build from a diagram point of
view -- and they didn't really get what the example was giving them. And they went off on that
-- in these studies we're not supposed to interfere -- and finally, after about 15 minutes, we just
told the guy, look, you're way off track here, this isn't at all what you're supposed to be doing.
And he got worse and worse on [inaudible] because he just kept trying it. He was convinced that
the example was going to help him, and he kept banging on it harder and harder.
>> Scott Klemmer: I've been there. I piloted a recent study in our group that was a follow-on to
one of these, and I was that guy. And so I think I have two insights for you, hopefully. One of
them is I'm going to claim that Blueprint largely solves the problem that you've seen: by
presenting things in an example-centric way, we've seen that kind of erroneous rat-holing a lot
less than we did when we were running studies on a page-based search engine. And so I believe
that example-centric search will solve much of your problem there.
>> I agree. I think it's more suited to atoms than molecules, though.
>> Scott Klemmer: I totally agree. I think that's a great way of saying it. And I agree. I think
what's exciting about this is that we've only solved the atom problem. We haven't solved the
molecule problem. Cool area for future work.
>> [inaudible]
>> Scott Klemmer: So here's one initial direction that I think would be fruitful, for when what
I'm searching for is more structural in nature. One of the big differences between novice
performance and expert performance, across a wide variety of domains, is that novices over-cue
on surface features, whereas experts are more likely to see the deep structure of the problem and
not be distracted by its surface structure. That's a general difference between novices and
experts. The best medicine I've seen against novices making that error is to ask them to
explicitly generate the deep structure of the problem. And so if you have an explicit reflection
step, my hypothesis is that you'd see a whole lot less rat-holing on following the wrong surface
features.
And so you could do a simple study, which I think would show a huge win and which we've
talked about doing: simply add a reflection step into any of these tasks and see whether you get
a notable difference in search behavior. My bet is yeah.
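As a rough, hypothetical illustration of such a reflection step -- this is not from the talk, and the
prompt wording, function names, and log format are invented -- a search wrapper in a study tool
might look like this in Python:

    # Ask the user to articulate the deep structure of their problem before searching,
    # and log the reflection alongside the query so the manipulation can be analyzed.
    import json
    import time

    def reflective_search(query, search_fn, log_path="search_log.jsonl"):
        reflection = input("In one sentence, what is the underlying problem you're solving? ")
        results = search_fn(query)
        with open(log_path, "a") as log:
            log.write(json.dumps({
                "time": time.time(),
                "query": query,
                "reflection": reflection,
                "n_results": len(results),
            }) + "\n")
        return results

    # Example use with a stand-in search function.
    if __name__ == "__main__":
        dummy_search = lambda q: [f"result {i} for {q}" for i in range(3)]
        for r in reflective_search("sql outer join example", dummy_search):
            print(r)

The hypothesis above is simply that forcing this one extra articulation step before the query
would reduce rat-holing on examples that only match on surface features.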
>> Translate that into [inaudible] it's an excellent observation.
>> Scott Klemmer: I have some thoughts; we'll talk more offline about that. I think it's a really
exciting direction. It's hard to get people to do. I mean, I think the classroom is the best leverage
we have, and even there it's hard.
>> Yeah. I mean, highly reactive environments where you can do fast iterations -- you get
feedback immediately about whether the thing is working or not. I know those certainly help
from a development point of view.
>> Scott Klemmer: Yes.
>> Jaime Teevan: Thank you, Scott.
[applause]