>> Amy Karlson: Okay. Well, I'm Amy Karlson from the visualization and interaction
group at Microsoft Research. And today I'm welcoming Daniel Ashbrook from Georgia
Tech where he is a member of the contextual computing group and has been advised by
Thad Starner.
His primary research focus is human-computer interaction for wearable and mobile
computing and also ubiquitous computing. So he recently defended his Ph.D. and will
be starting at Nokia Research next month.
So I'm very pleased to have him here to speak with us today. And without further ado,
I'll hand the floor over.
>> Daniel Ashbrook: Thanks. So today I'm going to talk about largely my thesis
research, which involves microinteractions, and I'm going to talk about mobile computing
and all sorts of good stuff.
So, first of all, I'm going to talk about mobility, and that's what I really like. So basically
in my research life I've discovered that if we can be mobile, we're going to be mobile.
And A.J. understands this issue. She was walking down the hall with her laptop reading.
I'd be interested if you could type while doing that also. But can you type while doing that, like this
lady? This is a bad idea. So I don't recommend that.
So you can't stop people trying to be mobile. So the question is if you can't stop people
from doing something, you might as well try to support it. So how are we going to
support it. So that's basically what my life concentrates on.
I'm going to talk about several things. The first one is what I'm calling microinteractions.
So a microinteraction is an interaction with a device that takes a really short amount of
time, and I will talk more about that shortly.
But first I want to talk about impairments and disabilities from being mobile. So being
mobile can almost be like only having one arm or being in a wheelchair or something like
that, just because it prevents you from acting on your full capabilities.
And so let's take a look at this next young lady here. She's got her phone, she's walking
down this rainy street in Japan somewhere, and she's got several things that are
preventing her from acting to her full capability.
So one of them is what I'm calling accessibility impairment. If she's got her phone in her
bag -- you can see this is a giant bag and her phone is probably going to fall to the bottom
of the bag -- and her phone rings, she's got to dig through all the stuff in the bag to try to
get the phone out. So she has a problem with accessing the device, getting at it.
There's also an impairment because she's walking and it's raining and she's got like this
one hand carrying the bag and the umbrella and the other hand is carrying the phone and
there's probably like 400 people in front of her on the street going all sorts of directions.
And I've noticed in places in Japan there's little stumps about this high and you can trip
over them in places. And so she's got all this stuff she's got to watch out for while she's
trying to do whatever it is she's doing on her phone.
So she's got this impairment because she's actually using this while she's in motion.
Finally, we've got this poor guy here, who it looks like maybe he's her friend, and he's
like why are you paying attention to your phone and not paying attention to me. So there
can be this level of social awkwardness involved in using devices as well.
So the question is, you know, is this a problem with the woman or is it a problem with
her technology. So I posit that it's her technology and we should think about that and
what can we do to help her and people like her use their technology in ways that are
going to be -- ways that are going to avoid these things.
So in queuing theory, there's something called balking, which is deciding not to join a
line if it's too long. So I go to the movie theater and the line's wrapping all around the
outside of the theater. I'm like, you know what, I really didn't want to see the new
Twilight movie anyway. Let's go see something else.
So in mobility we can think about balking as deciding not to use a device for some
reason. So, for example, if it takes too long to get to the device. So I want to take a note
on my phone but it's buried at the bottom of my backpack, I'm just going to give up and
write it on my hand or something.
If it is going to be difficult to use, so if I'm on a train that's really vibrating a lot or if it's
raining and I've got an umbrella and I've got a shopping bag, I'll probably not want to use
it.
Or if there's the issue of being rude, so I don't want to start doing stuff with my phone
right now because I'm talking to you all, so that's the social aspect of it.
So I'm looking at solving these through microinteractions. So basically microinteraction
is a single-purpose interaction with the device that takes a really short amount of time.
I'm using four seconds based on some earlier research on the response time of devices,
how long you're willing to wait for a device to respond to you. This is on the input side,
but it's the only number I've got, so I'm using it.
So basically the question is how can we make interacting with your mobile devices as quick as
looking at the time on your watch. I can look at the time on my watch right now and it
totally doesn't interrupt anything -- there are certainly social aspects to this that you
might have to worry about, but in general it's really fast and it gets around a lot of these
problems.
So how can we do that with other stuff. So looking back at these impairments.
Microinteractions can be applied in various ways.
So for the social impairment we can think about subtle interfaces, what can I do that is
going to be not necessarily secret but isn't going to be socially awkward for me to interact
with a device.
When we think about on the go, we want to think about mobile usability, how can I use
the device when I'm actually -- when it's not my primary task, when I'm not stopped,
hunched on the corner bent over my device, but when I'm actually out in the world using
it. And when it comes to accessibility, how can we access the devices really quickly.
So then the question is, you know, how do we actually do this, how can we make
microinteractions into a reality.
So the first thing is to think about access time. And access time is basically how long
does it take you to get at the device. So the poor lady with her purse, it might be 10 or 15
seconds as she digs through her stuff.
Then there's also the amount of time it takes to further get the device ready, to open it, to
unlock it, to do whatever you've got to do to actually get to the thing that you care about.
So I did a study called Quickdraw. I presented this at CHI in Florence, so a couple of
you might have seen it. I had a big mustache on. It was awesome.
So in thinking about this, thinking about what are the stages of device usage, so, first of
all, you want to get your device out of wherever you've got it. Then you get it into the
position ready to use. If it's, you know, my phone, I'm going to get it into my hand. Then
I'm going to unlock the device, get it ready, say, hey, device, you need to start paying
attention to me, I'm going to give you some input.
Next you're going to navigate to your application. You're going to start it up, get to
whatever you actually want to use, and you're going to use your application. And,
finally, you're going to lock up your device and put it back away.
Now, the thing to notice with all these steps here is that the only one that you actually
care about is No. 5. All the rest of this is just like stuff getting in your way.
And really 1 through 4 is the stuff that gets in your way the most. Locking
your device and putting it away, that's a practiced kind of motion that you can do almost
without thought. You can just lock your thing and shove it in your pocket or whatever,
and so that's not too bad.
But the first stuff tends to be the complicated bits. So we were really interested in what is
going on with these steps.
So to give more motivation as to why you care about that, let's say it takes four seconds
to get your phone out of wherever you've got it and to start it up and to get to your
application.
So that's not too bad if you're going to be writing an e-mail. So it takes me, you know, 30
seconds or something to type an e-mail. It took me four seconds to get the phone out.
Not a big deal.
On the other hand, if I'm wondering what the weather is, that only takes a second to look
at the weather. Probably takes less than a second to look at the weather. So now the
four seconds that I spent digging the device out and starting it up is a huge penalty relative
to this one second of actual usage.
So we took a look at three different ways of carrying your device. We've got the pocket,
we've got a holster, and we've got a wrist. A lot of people have asked why not bags,
because lots of people store stuff in bags. Because there's really no standard bag.
We thought about it and we're like, okay, there's not a standard pocket either, but
everybody puts their stuff in their pockets, so we have to test that, but, you know, you
could have many, many sizes of bags and stuff in the bags.
So we did pocket, holster, and wrist. We had people either stand there or walking around
a track in the lab, and we basically were looking at how quickly can you get at this.
So what we asked them to do was we'd have this incredibly loud and obnoxious alarm go
off and they would have to get at it and respond to it. And so the first thing they'd see is
this screen here on the left with the big number 11. The number would change every
time. And then they'd pull it out, look at the number and slide the little thing to unlock
the device. And they'd have to tap the number.
And the point of this was to make sure that they actually looked at the device; that they
weren't just like pulling it halfway out of their pocket and pushing the screen or
something like that.
So we wanted them -- we didn't actually care about what number they picked; we just
wanted to make sure they were doing it.
So here's the track we had them walk around, a figure-eight thing, just walking, walking,
walking.
So we were measuring a number of things. In particular we were measuring how long
did it take you to get the device out of your pocket and then how long did it take you to
get the device actually into your hand to get it into the ready-to-use state.
And so the total of those are shown up here. So obviously the wrist is going to be a lot
faster.
Now, the thing that's interesting is looking at these other two, what is the division of the
amount of time getting the device from your pocket and then getting it actually into your
hand.
And it turns out that most of the time is actually getting this device out of where it's
stored. So it's not so much the navigation time, the orienting of your hand, you're really
quick with that, but it's the digging out of wherever you've got it stored that takes up so
much time.
So this really provides justification for thinking about the wrist and other really
accessible on-body locations to put interaction technology.
So I've concentrated a lot of work on the wrist, how can we interact with devices that are
placed on the wrist, what sort of stuff can we do.
So I've looked at two different ways, looking at touch and gesture. So I'll start off talking
about touch, and then gesture is going to form actually most of the talk.
So I was thinking about touchscreen watches and so on, and I was looking at what kind of
advanced watches do we have today.
So these are both cell phone watches. And they probably have got PDA functionality and
so on. So how do you interact with these things. Well, you know, buttons are certainly
one way to do it. So this guy's got three buttons on one side, probably got three on the
other side, menus and so on. No problem. You can certainly go overboard with buttons.
So, you know, the other way to do it is touchscreens. So here are some touchscreen
watches. These two guys are cell phone watches again. This is a Palm-based OS device
that is no longer sold. And there's reasons for that. One of the reasons is that you have
these tiny screens and you inevitably get styli. And there's these little tiny toothpicks. In
some cases, like this one, they actually unfold.
So it's this little thing in the band of your watch. You have to pull it out, then you have to
unfold it, and then you can use it on your device. And so when you've got stuff like this,
it starts to get a little bit silly.
So that's one thing that I really think is dumb and wanted to think about fixing.
So the other thing is here's some more random technological watches. There's a
commonality in all these. They've got these square screens and then they've got all these
round elements to them. So nobody's really made a round like digital watch, as it were,
and certainly no round touchscreen watches.
So I thought, you know, let's think about what we can do with the circular watch.
So the goal, then, of this research was a finger-usable, round touchscreen watch. And
when I was thinking about how to use this thing with your finger, I thought about Jake
Wobbrock's EdgeWrite. So you use your stylus here and you use this little template and
it helps keep the stylus steady. You slam the stylus very quickly from one corner of the
thing to the other. There's very little precision involved in what you're doing. It's a very
quick, easy technique.
And so watches have got these bezels around the edges, and I thought, well, we can run
our finger around the edge of that and get some stability off of that too.
And that makes sense when you start thinking about how you would make an
interface on something like this anyway. You'd probably want to put buttons around the
edge and have something in the middle.
So then the question is how many buttons do you have, what do you have in the middle,
how much space is there for these things. So essentially you can have lots of buttons,
you can have few buttons, you can have lots of area in the center, you can have not very
much area in the center.
So what I did was I was looking at the tradeoff between these, you know, what effect do
these various choices have on the usability of the device, your error rate essentially in
using it.
So since we -- since the round touchscreen watch doesn't exist, we made one. It's pretty
awesome. So we have this steel plate here in the middle. It's got this bezel cut around it.
So of course you can't have the bezel like on a real watch, because we don't have a round
screen, but we've got this bezel like we have on EdgeWrite. So that provides support
for your finger to run around.
And the basic idea was let's do a Fitts-style task. So we're doing reciprocal pointing back
and forth between two targets. And I thought, well, you know, the sliding might be good,
but maybe there's other ways to do it.
So I have a standard tapping-style interaction, where you're just tapping back and forth
between two targets. We have the straight through-style interaction where you're moving
your finger just in a straight line like you do on a standard square touchscreen. And then
there's this rim-based interaction where you're sliding your finger along the edge.
So we had people -- you know, this is some actual data. So we had people run their
fingers back and forth between these various targets as quickly and accurately as they
could. And we ended up calculating an error metric based on this.
So you can actually predict the error based on the size and basically the shape of your
button. So how many buttons do you have around the edge, how thick are they,
essentially how much space do you have left in the middle gives you a pretty good little
curve here.
And so you can stick in these constants that are based on the kind of movement and
actually predict the error rate based on the number of buttons you've got.
So just to give you an idea what this might look like, here's a sample layout at 4.8 percent
error. I can get 12 buttons and I can have 75 percent of the area left in the middle for
other stuff. I can put a display there or whatever else I want. And this is for the sliding
your finger around the rim interaction.
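To make that concrete, here is a hypothetical sketch in Python of how a designer might use a fitted model like this to explore layouts. The talk does not give the actual functional form or the fitted constants, so predicted_error, its shape, and the values of a and b below are made up for illustration; only the idea -- error predicted from button count and rim thickness, with constants depending on the movement type -- comes from the study.

    import math

    def predicted_error(num_buttons, rim_fraction, a=0.01, b=0.002):
        # Hypothetical model: error grows as each button's arc gets narrower.
        # a and b stand in for constants that would be fit per movement type
        # (tapping, dragging through, or sliding along the rim).
        arc_per_button = 2 * math.pi / num_buttons
        return a + b / (arc_per_button * rim_fraction)

    # Explore the tradeoff: more buttons and a thinner rim leave more center
    # area free, but push the predicted error rate up.
    for n in (4, 8, 12, 16):
        for rim in (0.15, 0.25):
            center_area = (1 - rim) ** 2   # fraction of the face left inside the ring
            print(n, rim, round(predicted_error(n, rim), 3), round(center_area, 2))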
So what would interaction like this actually look like, what would you do with it. We
built a couple applications. I can't remember if that video's in this first -- the talk or not.
But basically the idea would be let's make it stable, let's make it so you're not accidentally
going to brush it and have something happen.
So let's say you run your finger around the watch 360 degrees to say I'm going to do
something. And so this is actually really super fast. I'm going to talk about a couple of
ideas, and it's going to take me way longer to describe them than it would actually take
for you to do them.
So let's say you want to see what the weather is. So you can touch the face of the watch
and get a bunch of icons for various selections. You can drag your finger down to the
weather icon and then sweep your finger around very quickly, at 360 degrees, and
essentially reveal your weather application in this really fast manner. And so this could
take under a second to actually do. We've done a couple prototypes.
So the other thing you could do is let's look at my schedule for next week. And so I can,
you know, again touch the face and pick the month and sweep my finger around and pick
out -- actually, having 28 buttons is too many for a decent error rate, so you could pick
the week and sweep your finger around again and then pick the day and have it actually
pull up little bands or something to show you what your schedule might be.
So I do have the video. So video's going to play three times. The first two times are
going to be actual speed, and then the third time slowed down just so you can see what's
going on.
So this super quick, round, 360-degree interaction, it takes -- this one it actually took
exactly a second. So you can see this is actually a super fast interaction. And here I
didn't actually show you choosing from icons, but it would just be that three o'clock
position you would know this is where the weather is, I'm going to zip around.
So that's it for touch. I'm going to go on and spend a lot of time now talking about
gesture and how we make gesture for mobile devices.
So this was the bulk of my dissertation work. It's a piece of software, a concept, really,
called Magic. And so when I talk about gesture, now I'm talking about motion gesture as
opposed to the touchscreen gesture that I just showed.
And so the idea here is you are moving some part of your body in a freeform fashion. It's
being sensed by something and the computer is going to respond to it.
So gesture is useful in a variety of situations. So we've got these places where you have
access problems, like the lady with her umbrella and all that. You could have your hands
full, but you could still gesture with a shrug of the shoulder, something like that.
Social situations where it would be inappropriate to dig around in your bag to shut off
your ringing phone, maybe you could just do it with a flick of the wrist or something like
that.
You can solve these first three with speech, but there's certainly situations in which
speech is inappropriate. Or in a crowd, for example, it's probably too loud to use speech
effectively.
So there's a number of situations in which you want to think about gesture. So there's a
problem with gesture. Let's say I define this gesture to delete my e-mail, which seems
great. It's a pretty recognizable gesture. But then it turns out that I'm doing this all the
time in conversation and, oops, there goes all my e-mail. So that's a little contrived, but
there are plenty of other things that you wouldn't think about that you might do.
So you want to avoid this problem of having your false positives of gestures. So I know
the cell phone industry has come up -- has had this problem. You know, you sit on your
phone and call somebody. So they came up with a solution to it, which is push to
activate.
So, you know, on the iPhone you push a button and you slide your little slider and that
tells your phone, okay, I am ready to give you input, you need to pay attention to me
now. Or you open your phone or whatever.
Lots of other stuff use this too. So this watch on the left, you tap the face and the hands
move to be a compass or to be an altimeter or whatever. It's really cool, but it could be
very confusing, you know, if the hands are moving around wildly as you are looking at it.
So you have to hold down the crown for like a second to indicate that you're about to talk
to it. In the same way in speech recognition or Star Trek you say "computer" first and
then "call Commander Riker" and the computer responds to you.
And even on the Wii, various game systems, you're doing gestures, you hold down a
button to indicate that you're about to do something. So in the bowling game I hold down
A and then I bowl, so it knows that I'm not just making random motions.
Now, the problem with push to gesture or push to activate for gesture is you are involving
your other hand again. So if I've got something on my wrist that I'm going to gesture
with and I've got to push a button and then gesture, first of all, if my hands are full of
shopping bags, that doesn't work.
Second of all, if I'm pushing a button, I might as well just push the button and just have it
do the task instead of doing a gesture. So I'm trying to avoid push to activate for doing
gestures.
So the way I do that is with the software I built called Magic. And Magic is actually a
general gesture creation tool. So the idea is let's help people figure out what kind of
gestures are going to work well for this sort of situation.
So in my work, I'm using an accelerometer on the wrist. You could use lots of different
sensors in lots of different places. We have a bunch of these, so this is what I did. This is
the Bluetooth accelerometer. And I'm going to talk briefly about the recognition I use so
it will make sense.
So I'm using dynamic time warping. And dynamic time warping basically is for saying
how similar are two signals. So these might be me doing this once, and then the second
one might be doing this again. And so I did them slightly differently each time. I want to
say are these things similar enough that I can say that they are essentially the same
gesture.
So we have two signals here, the green and red one. They look really similar. There are
several places where they're different. So, for example, one thing that's really clear here
is that the peak on the green one happens earlier than the red one. So they're different but
we want to be able to say, you know, how close are they really or can I pretend that
they're the same.
And so what we do with dynamic time warping is essentially figure out how do these
things match up. So can I -- where's the peak of this one, where's the peak of this one, let
me draw a line between them. So I do that for the entire set of signals and find the best
way to draw these lines between them.
And then I plop the two signals on top of each other and basically now the lines have
collapsed to be as short as they can, and so add up all the lengths of the lines, all the little
gray lines, and that is my similarity.
So intuitively, if the green and red were exactly the same, all those gray lines
would now be of zero length, and so my score would be zero -- exactly the same.
And so the more different they are, the longer those lines get and the higher the score is.
So that's how the recognition works in a nutshell.
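As a minimal sketch of the distance computation being described -- the standard dynamic time warping recurrence, not the actual Magic code -- in Python:

    import math

    def dtw_distance(a, b):
        # a, b: lists of accelerometer samples, e.g. (x, y, z) tuples.
        # Returns the summed length of the "gray lines" after the best alignment.
        n, m = len(a), len(b)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = math.dist(a[i - 1], b[j - 1])          # one matching line
                cost[i][j] = d + min(cost[i - 1][j],       # stretch signal a
                                     cost[i][j - 1],       # stretch signal b
                                     cost[i - 1][j - 1])   # advance both
        return cost[n][m]

    g1 = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
    g2 = [(0, 0, 1), (0, 1, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
    print(dtw_distance(g1, g1))   # 0.0 -- identical signals
    print(dtw_distance(g1, g2))   # also 0.0 here, since g2 is just g1 stretched in time

Identical signals give zero, and the more the two signals differ, the longer the matching lines get and the larger the score, exactly as described above.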
So how the gesture design process works. Currently this is what people do. You create a
bunch of gestures and you compare them to each other with, for example, dynamic time
warping and you say are these gestures too similar. So if I have a gesture that looks like
this for let's say playing my MP3 player, and then I have another gesture that looks like
this for calling my mother, they're somewhat similar.
And if they're too similar, then when I go to play my MP3 player, then I might
accidentally call my mother instead. And so this is something you don't want to have
happen. So you've got to make sure that the gestures don't conflict with each other.
So that's the first stage. If they do, then you've got to go back and start over again and
pick some new gestures, especially in a system -- you know, say I've got a DVD player
I'm doing and I want fast forward and rewind. Fast forward and rewind are related
conceptually, so I might make gestures that are related in their motions, and then if fast
forward doesn't work well, then I probably am going to redo rewind, so it will still be
related. And so it's -- it can be pretty involved to do this.
So once I've gotten through that, that's great. Now I'm going to go out and I'm going to
make sure that stuff's going to work in the real world. So for my DVD player I'm going
to, you know, put some people in the usability lab with the living room furniture and I'm
going to have them, you know, eat pizza and whatever and make sure that they aren't
accidentally fast-forwarding their movies.
For a mobile phone or something like that, I'm going to send them out and put them on
buses and walking around and so on, and I make sure it doesn't accidentally stop and start
their music. And that can really take a long time.
We had one guy who did a watch where you made gestures above it, and it got activated
by everything -- his sleeves and walking through doors and people going by
and so on. So it took him two weeks to come up with a set of gestures that he could
actually use reliably to do this.
And the problem is if you get any gesture that is activated all the time in real life, then
you've got to go all the way back to the beginning and start over making new gestures
and then go through this whole process again. So that sucks.
So what I've got instead is a more parallel process. So you can check to see if your
gestures are going to conflict with each other, and you can check if your gestures are
going to conflict with things that people do in everyday life at more or less the same time.
And this part takes minutes instead of days.
So the top part is the interface for the system. It gives you visualizations and so on, and
I'll show some examples.
The bottom part here, figuring out if your gestures are going to conflict with things, is
through something called the Everyday Gesture Library. And so I'll talk about that next.
So the basic idea is I'm going to send people out like I would normally when I'm testing
the thing, except I'm just going to stick the sensor on them.
So in my case I'd stick the wrist-mounted accelerometer on them and just set it to record
and ask them to forget it's there, just go on and go about your life, drive around and have
meetings and eat your breakfast and whatever. And then when they come back, I get all
those motions that they did and I stick it into the Everyday Gesture Library.
So now later on when I say, okay, I'm going to make this interface and it's going to be
awesome and I want to figure out what people are doing, so I say, okay, here's a bunch of
gestures that I want and here's my fast forward and rewind and call my mom and so on.
And so I ask the library, hey, do people actually do this stuff? And it might say, oh, well,
you know, some of these are good, but, you know, this one looks an awful lot like this
guy eating a doughnut, and this one looks just like this guy waving to his friend and so
on -- so...
>>: But you don't know what they were doing, right? I mean, it's not like the data --
>> Daniel Ashbrook: I'm glad you said that. So I do actually know what they're doing. I
need a picture of it so I remember to say it. So in my study I actually had a hat with a
fisheye camera pointing down.
>>: [inaudible] they wore the hat?
>> Daniel Ashbrook: And so they wore the hat and went around and did their stuff. And
so you could actually see what the hands are doing, and figured out here's what's actually
happening. So I can actually see you're eating the doughnut instead of being like, you
know, thumbs up for calling your mom or whatever and so on.
So we actually have contextual information, and I'll talk about how useful that was later.
So now the EGL doesn't actually say okay or bad; it says here's how many times this
happens and it allows the designer to make the choice about whether or not it's a good
idea to include that gesture. So if it happens once in a day is that okay or is that still too
often. And so you can make a decision based on that.
So I collected a bunch of data from my study.
>>: I have a question.
>> Daniel Ashbrook: Yeah.
>>: Are the gestures in the library presegmented, or is it just continuous data?
>> Daniel Ashbrook: It is continuous data, but you'll see a bit of this later, basically I
look for -- solely for the purpose of speeding things up, I look for places that are
interesting, which is that they're higher energy. So if you're sitting with your hand on the
desk for an hour, it just skips right over that, because there's nothing happening there that
could possibly be a gesture.
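A rough sketch of that kind of energy filter, with made-up window and threshold values (the talk does not give the real parameters):

    def interesting_windows(samples, window=32, step=16, threshold=0.5):
        # samples: list of (x, y, z) accelerometer readings.
        # Yields (start, end) index ranges whose motion energy is high enough
        # to be worth comparing against gesture templates.
        for start in range(0, max(len(samples) - window, 0), step):
            chunk = samples[start:start + window]
            energy = sum(
                (x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(chunk, chunk[1:])
            ) / (window - 1)
            if energy > threshold:
                yield (start, start + window)

    # An hour of a hand resting on a desk produces no high-energy windows,
    # so the search skips right over it.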
So I collected these EGLs from a lot of people and got almost 60 hours of data total.
>>: Are you No. 1?
>> Daniel Ashbrook: I am No. 1, yes.
>>: [inaudible]
>> Daniel Ashbrook: Everybody else were volunteers doing various other things. They
didn't know any details of the study. So that's been one question raised.
And there's a lot of interesting activities that I got in here. So I asked people not to just
hang out and watch TV or sit around and use the computers, because that presumably
would after about an hour be exactly the same data over and over again.
So I got things like, you know, attending a conference. I was at CHI with this last year, and
you didn't even notice the hat. See? You know, brewing beer and knitting and hiking
and making cheese and all sorts of great stuff. And most of these are not me.
So just so you can see what the video looks like, here are a couple of examples. So it is
really funny to see your nose like that. So but you can see what's going on. You can see
in the upper left one there I'm opening a refrigerator and getting out some chipotle or
something to microwave. The bottom one is another one of the subjects hiking and
manipulating the water bottle or something.
So you can tell what's going on, especially if you're the person in it. But even if you're
not, you can sort of get a general idea of what's happening.
So what I gave people to test an interface against was part of the library that I collected. So
in machine learning you have a testing set and you have a training set of data. So I'm
making a new algorithm. I want to see how well it's going to work. If I just get a whole
bunch of data and tune the algorithm on that data and test on it, then that's invalid because I have
trained on my testing data.
So in the same way what I've done here is I've separated out a little segment of it to give
the users to tune on, and then I reserved the rest of it, plus all of the other EGLs I've got
for the testing data. And so everybody got the same little bit of training data, and then I
tested their stuff on the rest of it.
There are some limitations to the Everyday Gesture Library. It is not -- it can never be
guaranteed to span the space of everyday life. I might collect five years of data
continuously, and then one day I'll decide to go bungee jumping. And I've never done that before
and it turns out that the high acceleration of bungee jumping triggers the thing to call my
mom and she hears me screaming. Can't guarantee against that. You can probably get
enough data to -- so that that sort of thing is fairly uncommon.
The other thing is if you change your sensor, if you change where you're putting it on
your body -- you know, I move from a wrist-mounted sensor to a forehead-mounted
sensor or something, obviously it's not going to work anymore. If I decide I want to do
an interface for fighter pilots, the data I collected while attending CHI is probably not
going to be useful for whether or not they're going to accidentally activate things in the
cockpit.
And, finally, people with varying actual abilities are going to be different too. So if I'm
making something for people with Parkinson's disease, it's not going to be use -- the EGL
I collected for that is not going to be useful for people with broken arms basically.
So there's some limitations. Even so, it's pretty useful.
So I had a study -- I'm going to talk about these more briefly, but you'll see these in the
videos I'm about to show. So, you know, I had people design for a mobile audio player.
So there are three different things you can do in Magic, and you can move between them
fluidly.
But the first thing you've got to do is you've got to make some gestures. And so that's the
first stage. And I will now show you what that looks like.
Oh, but first I'm going to talk about examples. So when you're doing the kind of
recognition I'm doing, you do something called template matching. And so basically the
idea is I want to make a particular gesture, and I'm going to use pen gestures here as an
example because they're easier to show on the screen.
So I want to do a cut gesture for Microsoft Word or something. So my gesture is going to
be cut, and that's going to be essentially a category. And then within that I say, okay,
here is the shape that I want my pen gesture to look like. It's going to be like that. So
that's one example.
And so now I'm going to record a bunch of examples. I'm going to say here's all the
different ways that I could do the cut gesture, so I want to account for variation in the
way that people might actually do it, so I have really big ones and small ones and sloppy
ones and so on.
And so these are a bunch of examples. Later on the system will go and it will take these
examples and it will compare them to the input that it got and say, okay, does whatever
somebody did match up closely enough to one of these, that we can say, yeah, they were
probably trying to do cut.
And so that's the same way that the gesture recognition in Magic works, except it's with
motion gesture instead of pen gesture.
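A minimal sketch of that template matching, reusing the dtw_distance function sketched earlier; the threshold value here is only illustrative, not Magic's actual setting:

    def classify(motion, templates, threshold=6.0):
        # templates: dict mapping a category name like "cut" or "play/pause"
        # to the list of recorded examples for that category.
        best_label, best_dist = None, float("inf")
        for label, examples in templates.items():
            for example in examples:
                d = dtw_distance(motion, example)
                if d < best_dist:
                    best_label, best_dist = label, d
        if best_dist > threshold:
            return None      # nothing close enough -- probably not a gesture at all
        return best_label

    # classify(new_motion, {"cut": cut_examples, "paste": paste_examples})
    # returns "cut", "paste", or None.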
Okay. So now the video. This is what the interface looks like. And you're going to see
up here is the live input from the sensor, so that's actually zipping along as the person
uses their arm. So over here this person is making some gestures. And so, again, these
are the categories and he's making them for all eight of them that he was supposed to do.
So play/pause, next track, previous track, and so on.
So here he's made a whole bunch of gestures. And now what's going to happen is you're
going to see him actually creating some of the examples.
So I have another hat cam and I have a monitor cam. And so he's going to make four or
five of these things, so you can actually see him actually moving his arm as he does them.
So he makes that swinging gesture. You can see the motion up there, and then it appears
over there, and here's the new example. And so he's going to do this several times, so
you can see what's going on in all the different parts.
And one more I think. Yeah.
So he's made this gesture now and wants to know is this any good. So the system gives
several pieces of feedback, which are these columns here. So it tells you here's how long
each of your examples are, so you can say, you know, are -- is one of them way longer
than the other, did I do something wrong.
It says here's what the system thinks your gesture looks like. So if it says -- if it doesn't
say next playlist there, even though I just made next playlist, then I've got a problem.
Finally it's got something called goodness, which says basically how well do my
examples match up with the other examples in my class or not. So if -- here we have all
of those examples are actually recognized as next playlist, and so goodness is a hundred
percent.
Up here we have, you know, a lower goodness because maybe the next -- it recognizes
next track but maybe it could potentially be confused with some other stuff, and so you
want to have -- you want to have high goodness. That indicates that you're going to have
your gestures be fairly reliable.
So here's a case in which there are some problems. So this guy is recognized as previous
playlist even though it should be volume up. And this one is recognized as volume up,
but it's got 25 percent goodness, which says that, you know, there's a pretty high chance
that it's going to be misrecognized as something, so maybe it looks like another gesture
too much and -- yeah.
>>: Sorry. Is goodness taken into account for the gesture library?
>> Daniel Ashbrook: Huh-uh.
>>: Okay. This is just --
>> Daniel Ashbrook: Nope. This is just --
>>: -- self-consistency.
>> Daniel Ashbrook: Just self-consistency. Yeah. Yeah. So the gesture library will
come in a minute.
I've also got some visualizations. So, for example, it's very clear that one of these things
is not like the other. Each of these represents an example. And so this is a really quick
way to tell when something has gone amiss.
And so what we can do is look at the traces for all of them, which are hard to interpret,
but I can see, oh, look, this one is way longer than all these other ones, so that's really
easy for me to tell, you know, there was something wrong here.
And, you know, if -- the width of the bar there is the standard deviation of the distances
to everything else in its class, so if one's really wide, I can say, well, you know, it doesn't
match things very precisely. If it's really far to the one side, then I can say it's -- you
know, it really doesn't match all the rest of the gestures in its class and so on.
So the next stage of Magic is gesture testing, which for time's sake I'm not going to talk
about much. Basically you can do a set of freeform gestures and see if your gestures that
you defined in gesture creation show up. So I'm just going to test them out and make
sure that when I do this several times, then every time it's recognized properly.
So the last stage is the Everyday Gesture Library. And here's where I'm going to see if
what I'm doing shows up in somebody's everyday life. And so I'll show another video of
what's going on here.
So in the top left we've got the actual Everyday Gesture Library. And so you can drag
your mouse through it and see the actual hat video. So this is actually from CHI.
And so just to point out, these gray bits here are the boring
areas. So there's not much going on in those. And so, you know, we can see
talking and gesturing and so on.
So I have this and I say, okay, I want to see if play/pause is going to occur, and so I'm
going to search and it's going to go through this entire thing and look to see if any of the
examples of play/pause are recognized -- are basically -- would accidentally be triggered
by the motions that I made during this set of recording.
And so I'll click check, and this actually takes about 15 seconds because it's a huge
amount of data. And I say, oh, look, here are five places where this has shown up. So I
would have had my music player accidentally start playing five times when I hadn't
intended to, which is probably bad in the space of five hours.
So what we can do is go back and look and say, okay, where do these things happen. So
here in the EGL graph you can see that little green box shows here's the place where this
one actually happened, and so I can double-click on this and it's going to show me the
video. It's going to show the video of what was happening when the EGL was recorded
and it's going to show the video that was recorded when the person created the gestures.
And so you can compare all these simultaneously and actually see what was happening.
And so you can see here I'm rubbing the bridge of my nose, and it looks -- that forward
motion as I come up, it looks a lot like his flicking-forward motion that he's doing there.
And so this can give me a clue as to what was happening when the gestures accidentally
were triggered, and maybe that can help me figure out what to do to fix it.
So there are a couple of things you can do to fix it. One of them is you can adjust
basically the sensitivity of the system. And so you see these -- they're a little bit hard to
read, but these similarity numbers here. Those are the distances that you get from a
dynamic time warping process.
And so I can actually adjust the sensitivity downward and I can say, okay, anything that
is above -- you know, this is 6.64. So anything above 6, let's just ignore that. And so that
will get rid of the occurrences in the Everyday Gesture Library, but it also influences
how the gestures relate to each other. And so if I make that too low, then it will be
impossible for me to actually make the gesture I want to make and have it be recognized
when I'm trying to do it.
So there's sort of a balance here between not having stuff show up and having the gestures
be reliable enough for me to make.
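Putting the pieces together, the EGL check being described amounts to something like the following sketch, reusing the helpers above. The real search inside Magic is not spelled out in the talk, and 6.64 is just the example value mentioned from the slide:

    def egl_hits(examples, egl_samples, threshold=6.64):
        # Count the places in the recorded everyday data where any example of
        # one gesture would have accidentally triggered. Lowering the threshold
        # removes hits, but also makes the real gesture harder to recognize.
        hits = []
        for start, end in interesting_windows(egl_samples):
            window = egl_samples[start:end]
            if any(dtw_distance(window, ex) <= threshold for ex in examples):
                hits.append((start, end))
        return hits

    # len(egl_hits(play_pause_examples, five_hours_of_data)) == 5 would mean
    # the music player starts by itself five times in five hours.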
Now, the other thing I can do is -- you'll notice that over here that it was only play/pause
1 that showed up. It showed up five times. And the rest of them showed up zero times.
So I can just say, okay, maybe play/pause 1 is just a bad example. Let me just get rid of
that, and then it will never show up. And that's good, but then, again, we have the
problem of maybe play/pause 1 looks like -- is a really good representation of what
people are going to actually do.
And so, again, there's this tradeoff between having the gestures be recognized reliably
and not.
And so what you'd really want to do is after you go through this process, you'd want to
bring in other people and have them do the gestures and basically iterate through to make
sure your gestures are going to be self-consistent enough that people can use them.
>>: I have a question.
>> Daniel Ashbrook: Yeah.
>>: So if you eliminated play/pause 1 from your set of positive examples, does that mean
that none of the other play/pauses would have shown up in the gesture library?
>> Daniel Ashbrook: Right. Because they've got the zeros next to them, so they're tested
independently. So none of those have shown up. The other gestures here have not been
tested yet, and I think actually if you see the rest of the video, they show up thousands of
times, so they were really poor examples.
And showing up thousands of times is one of the weaknesses of the Everyday Gesture
Library, because you're not going to watch thousands of clips of video to see what was
going on.
So -- yeah.
>>: What is the individual that you collected the data from [inaudible] if I find --
>> Daniel Ashbrook: Yeah. So this was just me. The other people I collected the data
from, I had them come and use this as well. So actually I will get to that organically here
in a couple of slides, talking about the various people.
So I ran a study. The goals were to figure out is it usable, what strategies are people
using to design these gestures, is the EGL useful at all, and, given a common task, are
people going to make a common gesture set. You know, I think that people are starting
to settle on a common set of things for editing operations with pen gestures. Is there
anything like that for controlling your music player.
So, again, here are the gestures. So I've got play/pause, shuffle. And then the rest of
them are paired gestures. So next and previous track, volume up and down, and next and
previous playlist. And volume up and down are about 10 percent because the system
doesn't do continuous. So I couldn't say as long as I'm doing this keep on turning the
volume up. I've got to have a discrete gesture for it. And that's just the limitation of the
recognition.
So I got a bunch of participants. So there's actually a third category here that I haven't
shown. But since you asked, I also had a My EGL category of people who volunteered to
collect the EGLs and I ran them through it as well using their own Everyday Gesture
Libraries. The rest of these people were tested against that one segment of my library.
And it turns out there wasn't that much difference. People didn't use the video very
much. And I'll talk about that very shortly.
So I have them do this task and looked at how they did. People did really well with the
EGL. And when they didn't have it, they, as you might expect, did terribly. So there was
a very significant difference between people using the Everyday Gesture Library and
people not.
So if you had the Everyday Gesture Library, you got about two accidental activations per
gesture per hour. You might point out that that's lousy performance. If you didn't have
the EGL, you got 52 per hour, so almost one per minute per gesture. So that's really bad.
Now, the reason this one is so bad is because it turns out an accelerometer is not the right
sensor for this. Any small movement looks like any other small movement to the
accelerometer essentially because they're so low magnitude that they're very similar.
There's probably some algorithmic things you can do to fix that. Even better would be to
use a gyroscope in conjunction with some things that will improve the sensing.
But, in principle, the Everyday Gesture Library is great because it vastly reduces the
number of hits, and that would translate to other sensors.
I had four people who managed to get no occurrences at all.
>>: And success means the participant defined a set of gestures that didn't occur
naturally?
>> Daniel Ashbrook: Right. So I'm going to talk real quick about how goodness is
calculated now. Goodness is the harmonic mean of precision and recall. And precision and recall
are a little bit slippery, so I've got some pictures.
So on the left one there, basically what we're doing is trying to capture all of the orange
dots within the orange circle. And so on the left one there I've got all the
orange dots but I've accidentally got a couple white ones. And so there my precision has
been lowered.
And so in the same way, if I have a gesture example and it matches really well all of the
other gesture examples of that class but also matches a couple extra ones, then it's got
lower precision. So the goodness goes down.
The next one I didn't get any white dots, but I missed one of the orange dots, so my recall
has gone down. Here we've got both of them happened, and so the goodness score has
gone down even more. And then here we've done exactly what we wanted to do, and so
we have a hundred percent goodness. So that -- just to give an intuitive notion of what's
going on.
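In other words, goodness is the F1 score. A tiny sketch of the calculation:

    def goodness(true_positives, false_positives, false_negatives):
        # Harmonic mean of precision and recall.
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    print(goodness(5, 0, 0))   # caught every orange dot, no white ones: 1.0
    print(goodness(4, 2, 1))   # a couple of white dots and one miss: about 0.73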
And so when it came to goodness, the goodness actually basically -- or self-consistency
of gesture examples was really high. And people had 86 percent goodness in general.
Seven of them had nearly a hundred percent goodness on average for all of the gestures
they created.
As you would expect, it doesn't have anything to do with the condition of whether or not
they got the EGL, because it doesn't have anything to do with the EGL. We could make
it depend on the EGL as well. That's probably something to do later.
Now, the more fun stuff is the qualitative results, what people thought about it. It turned
out to be a really hard task to say sit down, define eight gestures that are going to control
something, the gestures can't conflict with each other, you can't have play activate fast
forward. And on top of that you can't have them look like something people do in their
everyday life.
That was really, really hard. And I was surprised people actually managed to do it at all.
But they did. And people actually really liked it.
In fact, one person even said this is so awesome I want to come back tomorrow and keep
on doing the experiment. So that was really gratifying. I thought that was great.
In terms of the Everyday Gesture Library, people were afraid of it because every time
they'd do a new gesture it would get thousands of hits in it. People didn't care about the
video. So that's kind of a -- I mean, it's good and bad, right? So the video was not
helpful, so that was sort of a failure. At the same time it says that we don't have to have that
hat -- as stylish as it is, you don't actually need to give it to people.
And, again, I think this is because with thousands of hits you're not going to go through and
you're not going to watch every single one of those. On top of that, people said I can't do
anything about it. You know, if it turns out that you rubbing the bridge of your nose is
conflicting with my gesture, I can't stop you from doing that, so I've got to change my
gesture. So it really doesn't matter what you were doing, it just matters that I had a
conflict.
This is one of the guys who recorded his own Everyday Gesture Library and he thought it
was interesting. He said it was really useful, but when I pressed him he couldn't really
say why it was useful. He just I guess got an intuitive feeling of what was going on and
maybe it helped him somehow. So I thought that was interesting.
So my graphs that I briefly showed you were not widely enjoyed. They were
complicated. There were some other graphs that you didn't see that were even worse.
And so people said, you know, these are -- these are -- they're useful but they are difficult
to understand. There's a high learning curve in figuring out what's going on.
So I think there's some interesting research to be done in making machine learning
concepts accessible for people who are not machine learning professionals to -- my
graphs, for anybody who is a machine learning person, my graphs are basically taking
your confusion matrix and splitting it out and visualizing it in various ways. Confusion
matrices are confusing for nonexperts. My graphs are a little bit better, but probably not
much. So I think there's some more interesting work to be done there.
So I gave a questionnaire. One of the questions I asked them was, well, would you like
to actually own this thing that you've just designed with all your gestures. And the
[inaudible] deviation here don't tell the story very much. Some people said absolutely
not, it was horrible. A lot of people said, yeah, I'd love to have that.
What was really interesting is that there's no correlation here between how well they did
and the quality of their gestures. So people with really awful gestures might have said
anyway, yeah, that would be great, and people who made good gestures might have said
no, that was terrible. So I thought that was pretty interesting.
I also asked them what else would you like to control with gestures, if anything. And I
had a lot of really interesting responses. So there were several people who said cell
phones, presentation software. I had a bunch of media equipment in the house and in the
car. There was controlling a robot to basically pick things up for people with physical
disabilities. Replacing the mouse.
But my favorite one was the Roomba mess indicator. And there was no other
information given on the survey that was given, but, you know, you can imagine what
that would end up looking like. So I thought that was pretty cool.
So now I'm going to talk about the strategies that people used for designing these
gestures. And I thought this was really interesting. So I basically went through all the
videos and figured out here's what people are doing to make these gestures memorable or
make them not show up in the Everyday Gesture Library. And I'm going to show a video
for each one so you can see what's going on.
So the first strategy is to make things iconic. An iconic gesture is basically one that looks
like something else. And so this guy said, well, I'm doing this play/pause gesture and it's
like pushing a button. And so he does this push thing. So I'll play that a couple more
times so you can see, because it's short.
And so iconic is sort of in your head. It's this is related to what I want. And so that's the
push for play/pause.
So directional gestures and paired gestures are often associated with each other. So this
person did next and previous track, and they were related, and the directions were also
important. So for next track she's going to go to the right, and for previous track she's
going to go to the left. So play next track first.
So she just does the sweeping thing. And previous track is just the opposite direction. I'll
play them simultaneously so you can see. So the directions are important. And the fact
that they're opposite is also important because they're opposite functions.
Some people did impacts. So this guy's clapping. There's also hitting, just hitting your
leg or hitting the table, things like that. So these -- I thought that these might actually end
up not showing up in the EGL, but there's actually -- I should mention none of the
strategies people used actually had any significant impact on anything. So they're
interesting, but the person who's doing it was the most important thing. So that's
interesting as well.
I like these. This is the best gesture in the whole set. Basically these are people who
independently figured out push to activate. So this guy said he was explicitly thinking
about Star Trek and saying "computer" before addressing the computer. And so he does
this pre-fix thing.
So what it is, for every gesture he does he makes a particular motion and then he does the
gesture. So what he does is, first of all, he cups his hand on his ear, because it's an audio
player, then he brings his hand down and he spins it in a circle, and then he does his
gesture.
Now, in this case it's the shuffle gesture. I should also mention that this, during my
defense, got my committee making Three Stooges impersonations. So cupping his hand,
moving his thing, then shuffle.
And I'll play that a second time because it's awesome.
>>: Very elaborate.
>> Daniel Ashbrook: Yes. Very elaborate. He actually explicitly said I know this looks
crazy, but we have people wandering around with Bluetooth headsets on talking to
themselves, and so I'm just assuming explicitly that whatever I'm doing here is going to
become as accepted as Bluetooth headsets. I didn't tell him I don't think Bluetooth
headsets are very acceptable, but...
So this next guy, he did a post-fix gesture, which was much less elaborate and not as
much fun. But basically at the end of every gesture he does he rotates his wrist. And so
I'll show you what that looks like.
So basically this could be thought of as a confirmatory gesture. So I do the gesture and
then, oops, I didn't actually want to do that, so I just don't rotate my wrist and then it
doesn't happen. So I thought that was pretty interesting. He was the only one who came
up with a post-fix gesture instead of a pre-fix.
So some people did jerks and directional changes. So, for example, if I was drawing the
letter A, that change of direction at the top would be a directional change. I could also do
something like that, which, you know, it's not an impact, but it's going to show up in the
accelerometer in much the same way. There's going to be a huge spike in acceleration
because I'm changing direction so quickly.
He's got just this sort of sweeping thing going on. That is -- here the direction doesn't
actually matter. So that's why I didn't file it under directional, because he's moving in a
particular -- he's making a directional change, but it doesn't have any relationship to the
gesture he's actually doing.
A lot of people did repeated gestures. A couple of them explicitly said this is because I
don't think people in the Everyday Gesture Library have done this thing multiple
times, so he does this thing a couple of times. You know, I had people like brushing their
arms several times and just various things that they did multiple times.
Finally, this one's pretty interesting. This is what I'm calling retrospective realization,
and it's not exactly a strategy. Basically what happened is a couple of people realized that
unintentional movements were affecting them. So the way that the recording works is
you say start recording, and then it listens, and when you start moving, it records, and
when you stop moving, it stops recording.
So what this guy did is for this top one he does this sideways brushing motion, and then
he stops and it stops recording. Well, then he noticed that one of the
examples had a really high goodness, as opposed to the rest of them, which were really
low.
So we went and watched the video for it and did the brushing and realized, oh, I didn't
pause long enough, I put my arm down. So I'll play those again so you can see them.
And so it turns out that putting the arm down motion actually vastly improved the
goodness of his gesture, and I think it even improved the number of hits in the Everyday
Gesture Library. So we went back and deleted all these and made his arm go down in all
of them.
So I thought that was really interesting, a really fascinating use of video in the system.
Okay. So nearly done. So I -- these are the goals of my study. I found out that people,
yes, they liked Magic. They thought it was great. They had some interesting strategies.
The EGL was very useful, and users did not make anything even close to a common gesture set, and that was very interesting too.
So the only thing that was even vaguely close to common was for shuffle. I think four of
the 20 people I had did a shaking motion. So we had the guy you saw with the shaking
on both sides of his head, and the other people just did a shaking kind of thing.
So shake to shuffle was the only thing that was even close. All the rest of them I had
widely varying things, from one guy who said he was visualizing a box in front of him
and he was hitting different parts of the box to indicate things, another guy who shot his
arm up in the air and then he'd draw like a plus sign for up and then a V for volume.
Yeah. I just had this huge variety of stuff. So that was interesting. There was nothing
that was even close to having a consensus about things.
Okay. Last part is I want to talk about some ideas I have for stuff in the future.
So, again, the things I've been thinking about are these various impairments. Gestures may not, in fact, at least as implemented in my experiment, solve the social impairment problem, certainly not with putting your hand on your wrist and rotating and shaking and so on.
But I think the other ones might have been solved. But going forward, what other places
can you put technology? I've been thinking about the wrists. I think there's really a
strong possibility for basically an on-body ecology of devices that all communicate with
each other in various ways and you can put them on and take them off and use them in
various situations for different things.
So I've also been thinking about what else can you do with the wristwatch. So, you
know, I talked about the face for input. But what can you do for output. And I think that
looking at output for various things could be really interesting.
So one of them is RSVP, Rapid Serial Visual Presentation. So this is where you flash
words in one place really quickly so you don't have to move your eyes. So you could
imagine doing that on a watch. And if you look at the orange circle, I will show an
example that will go by very quickly.
So just a very fast thing. And so you could imagine using that, looking at your watch,
and then when you don't want to do it, you can train yourself to just turn your wrist and
have the presentation stop. So that's something I think would be interesting to look at.
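A minimal sketch of that RSVP idea: words flashed one at a time in the same spot until the wrist turns. The display and wrist-turn callbacks here are hypothetical placeholders, not a real watch API:

    import time

    def rsvp(text, show_word, wrist_turned, words_per_minute=300):
        """show_word(word) draws a word; wrist_turned() returns True to stop."""
        delay = 60.0 / words_per_minute
        for word in text.split():
            if wrist_turned():        # user rotates the wrist to dismiss
                break
            show_word(word)
            time.sleep(delay)

    # Example usage with stand-in callbacks:
    # rsvp("you have a meeting at three", print, lambda: False)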
Now, I talked about using the bezel as something to put your finger against. You could also think
about doing input in other ways, like rotating it. I'd love to get haptic feedback on that so
I could have it resist as I turned it or have different detents as I turned it in various ways
to do interesting things.
You could think about using the sides of the watch to maybe put little pegs that would
stick out in various situations. You know, at CHI we had the inflatable buttons on
various surfaces so you could imagine something like that, so you could just feel it and
get some indication of what's happening.
There's actually a watch -- a commercial product that allows you to tell time by touching
your watch. As you move your finger around the rim of the watch, there's little divots or
little pegs that stick out. And when you pass the one that represents the hour it buzzes, and when you pass the one that represents the minute it buzzes twice. So you can be sitting in a meeting and just move your finger around it without ever looking at your watch, and you can tell what time it is.
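A rough sketch of that touch-the-rim behavior: the finger's angle around the bezel is compared to where the hour and minute hands would point, with one buzz for the hour and two for the minute. The buzz() call and the angular tolerance are assumptions for illustration, not the commercial product's design:

    def check_rim_touch(finger_angle_deg, hour, minute, buzz, tolerance=7.5):
        hour_angle = (hour % 12) * 30 + minute * 0.5   # hour hand position in degrees
        minute_angle = minute * 6                      # minute hand position in degrees
        if abs((finger_angle_deg - hour_angle + 180) % 360 - 180) < tolerance:
            buzz(times=1)    # once for the hour marker
        if abs((finger_angle_deg - minute_angle + 180) % 360 - 180) < tolerance:
            buzz(times=2)    # twice for the minute marker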
We also have this giant surface on the band that nobody's thought about taking advantage
of. You know, you could have that be a touch surface or, you know, thinking way far in
the future you could have extra display on there, something. You can imagine an entire
bracelet that does various things.
One of my favorite ideas, I often twiddle with my wedding ring. I turn it around and
around and around. So you could imagine having some sort of a connection between the
ring and the watch. You could have a tuned circuit so the watch could actually sense
what position your ring is in, and you could use that to scroll or something.
You could also have a ring on each finger and use it as a password authentication. So I
turn -- turn the rings to a particular position, and then every keystroke I type, it makes
sure that I'm doing the right thing. And so if anybody steals the rings, they've got to
know exactly what position they have to be in and what fingers they've got to be on or
something like that.
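A minimal sketch of that ring-combination idea, where each ring's sensed rotation has to match an enrolled combination before input is accepted. Everything here, the angles, the tolerance, and the sensing itself, is hypothetical:

    def rings_match(sensed_angles, enrolled_angles, tolerance_deg=15):
        """Both arguments are lists of ring rotations in degrees, one per finger."""
        if len(sensed_angles) != len(enrolled_angles):
            return False
        for sensed, enrolled in zip(sensed_angles, enrolled_angles):
            diff = abs((sensed - enrolled + 180) % 360 - 180)  # wrap-around difference
            if diff > tolerance_deg:
                return False
        return True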
Finally, I think it would be very interesting to interact with other surfaces. So I have a -- for example, a microphone in the watch. It turns out if you put a microphone on your
wrist you can hear your fingers move, you can hear your tendons creak, which is kind of
creepy but cool. You can also hear your fingers tap.
And so one of the guys in our lab did some playing around with that. You could actually
distinguish between rubbing your fingers like that and snapping them and so on. But I
was thinking I'd put that down on the table. Now, if I tap, then I can easily pick that up
on the watch. And I can imagine, you know, putting down your cell phone and putting
your wrist down and then triangulating between the two of them or something and being
able to interact with the surface in some interesting way.
Turns out there's a company in France doing that already. Not exactly this, but they have
I think three microphones they stuck on a surface, and then you have to train it and it
builds a huge table of what the different sounds are like and where they are. But that's
pretty cool.
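A rough sketch of that trained-table approach to tap localization: record taps at known spots, store a feature vector for each, then look up new taps by nearest neighbor. The feature extraction is left as a stand-in; a real system would use something richer than a bare feature vector comparison:

    def nearest_location(tap_features, trained_table):
        """trained_table: list of (feature_vector, (x, y)) pairs from training taps."""
        best_loc, best_dist = None, float("inf")
        for features, location in trained_table:
            dist = sum((a - b) ** 2 for a, b in zip(tap_features, features))
            if dist < best_dist:
                best_dist, best_loc = dist, location
        return best_loc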
So that's the end. Happy to take any questions.
[applause]
>>: When people were making up their gestures, you apparently didn't tell them to do
things that were socially acceptable, right?
>> Daniel Ashbrook: Oh, yeah, so that's a slide that got lost from my dissertation
defense. But I did actually. I had several criteria. I said the gestures should be reliably --
should be able to reliably activate the function that they're supposed to. They shouldn't
activate other functions. They shouldn't be something that happens in everyday life.
And I had the qualitative ones of they should be easy to remember, and they should be
socially acceptable. And basically because the gesture recognition was so hard, you
know, if I do a little jump like that to the left and a little jump like that to the right, they're
going to show up in the system as exactly the same gesture. Just because they're so low
amplitude, they look just like each other.
So because of that sort of thing, people quickly ignored social acceptability and gravitated to huge, weird gestures just to try to get something that, A, is not going to conflict with other gestures and, B, isn't going to show up in the library.
You know, I had people do a little hop to the right, and then that would show up in the gesture library as me doing this on the keyboard -- you know, moving my hand from the trackpad to the keyboard. So a better sensor and better algorithms would very likely make social acceptability more achievable.
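A minimal sketch of what counting "hits" against an Everyday Gesture Library could look like: slide a window the length of the candidate gesture over the everyday recording and count how often it looks too similar. The plain squared-distance measure here is deliberately simplistic; a real matcher would more likely use something like dynamic time warping:

    def egl_hits(gesture, egl, threshold):
        """gesture, egl: lists of (x, y, z) samples; returns the number of hits."""
        n, hits = len(gesture), 0
        step = n // 2 or 1                                   # 50% window overlap
        for start in range(0, len(egl) - n + 1, step):
            window = egl[start:start + n]
            dist = sum((a - b) ** 2
                       for g, w in zip(gesture, window)
                       for a, b in zip(g, w))
            if dist < threshold:
                hits += 1                                    # everyday motion looks like the gesture
        return hits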
>>: I wonder if -- so you would be an expert on making up these gestures, right? Do you think yours would be -- do you think your gestures would be appreciably better --
>> Daniel Ashbrook: No. Not at all.
>>: -- than these amateurs'? No?
>> Daniel Ashbrook: So I sort of breezed through it. The people I had doing it were
largely HCI people. So I specifically recruited HCI Ph.D. students and so on from
Georgia Tech and asked them, you know, put the full force of your HCI behind this and
figure things out.
You know, I had one girl who spent an entire hour just drawing on paper trying to think
about different things and so on.
No, I think my gestures would be just as lousy, to be completely honest. I mean, I think
it's almost impossible to make good gestures with that particular sensor. So -- yeah.
>>: How are you interpreting John's notion of better? Does this [inaudible] are you
thinking accuracy?
>> Daniel Ashbrook: I'm thinking everything. I'm thinking accuracy, maybe I could -- I
mean, I had people -- I had people do like a hundred percent goodness on every single
gesture. So people were good at making a bunch of gestures that were differentiable
from each other. I had people get zero hits on the Everyday Gesture Library, so people
were able to do that.
So I could probably also do that, but I don't think I could do that in any more socially
acceptable way or any more memorable way or anything like that. I think that that would
be very difficult to actually do.
>>: Given your experience, what do you think is the acceptably large or small set that's
relatively easy to create? Given the constraints of not overlapping and socially -- relatively socially acceptable [inaudible] --
>> Daniel Ashbrook: Yeah. So --
>>: -- [inaudible] memorable?
>> Daniel Ashbrook: I had a hard time thinking about that for the experiment. I was
like, how many gestures do I want people to make. Because I can come up with an
infinite variety of tasks. And I was thinking, well, do I want to -- you know, these are
all -- the ones I had them do are fairly -- things that were fairly iconic in general. You know, play and pause maybe aren't, but fast forward and rewind and so on have directions, and you naturally associate them with that.
Then I thought, you know, should I have -- should I like ask people to make a gesture for
like Celtic music, like bring up that genre. Or, you know, play playlist No. 7, would
everybody draw a 7 in the air or would I have other things.
So I was trying to think of what's a reasonable number of these things to ask people to do
and so on. And in pilot testing, I found that people were just able to do eight, and it
seemed to be hard. I think more than eight would start getting very difficult. Again, it's very dependent on the sensor.
>>: [inaudible] including if you have an eight [inaudible] relatively well [inaudible]
making a ninth one is really difficult?
>> Daniel Ashbrook: Not necessarily.
>>: [inaudible] memorization problem [inaudible].
>> Daniel Ashbrook: So I didn't test memorization. I thought about doing that. But in
the end I realized, you know, we have sign language. People can clearly memorize a
large set of gestures.
And by the same token, you can clearly make a large set of gestures that are recognizable
to people. So American Sign Language is an entire language. It's got lots of different
constructs and so on. But because it's human-to-human communication, it's incredibly
high fidelity. You know, there are things that are very subtle about sign language that
involve finger position and so on. If you could sense all of these things perfectly, you
could probably make an unlimited variety of gestures.
With this particular sensor, it's very low fidelity, it, you know, has lots of problems with
gestures looking exactly the same as each other and so on. And so with that particular
sensor I think that you would have a difficult time making more than -- you know, I
would wonder if you could easily make a dozen gestures.
Now, I only gave people three hours also. And so in that -- you know, if I gave you a
month, you could probably have more time and -- but yeah.
>>: I think [inaudible] all this stuff and all the gesture recognition stuff you only did
discrete gestures, did you think about -- I mean, certainly some of the applications or
commands you're thinking of don't necessarily map well to discrete things, like, you
know, volume up, volume down, fast forward, rewind. When I build a tabletop for doing
video controls or whatever, right, I specifically don't do it with buttons, right --
>> Daniel Ashbrook: Right. Exactly. Yeah.
>>: -- I usually do it with all sorts of other controls because there's better mechanisms.
Have you thought about how you can take what you've done so far and maybe make
gestures that give you better control than --
>> Daniel Ashbrook: Yeah. So --
>>: -- like rewinding, stop rewinding?
>> Daniel Ashbrook: Certainly you could do that. I mean, it would be very easy to do.
You could sort of fake it with discrete gestures that would be like my fast-forward button,
my rewind button or my, you know, fast forward and stop fast forward kind of buttons.
To do actual continuous gestures where I say, okay, as long as I'm doing this, you should
keep on doing volume up, I think that's definitely an interesting thing to look at. I'm sure
that somebody has --
>>: I mean, that just [inaudible].
>> Daniel Ashbrook: Yeah, yeah, I mean --
>>: But there's [inaudible].
>> Daniel Ashbrook: Yeah, there's a whole -- there's a huge range of expressiveness of
the human body that I've completely ignored.
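A minimal sketch of the "fake it with discrete gestures" idea mentioned above: a recognized start gesture keeps an action repeating until the matching stop gesture arrives. The gesture labels and the set_volume callback are hypothetical:

    def run_volume_control(gesture_stream, set_volume, step=2, start=50):
        """gesture_stream yields labels such as 'volume_up_start', 'volume_up_stop', or None per tick."""
        volume, ramping = start, False
        for label in gesture_stream:
            if label == "volume_up_start":
                ramping = True
            elif label == "volume_up_stop":
                ramping = False
            if ramping:
                volume = min(100, volume + step)   # keep ramping while the gesture is "held"
                set_volume(volume)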
>>: Like if I was trying to tell you silent -- like let's say somebody was doing a talk and
you were standing next to them and you were in charge of controlling the microphone
control and I was trying to tell you from across the room inaudibly how to do it, right,
and we hadn't talked about this ahead of time, right, I could come up with a way really,
really fast to tell you how to do it and to do it more, no, no, no, bring it back down, and
we would probably be able to communicate --
>> Daniel Ashbrook: Yeah, absolutely.
>>: -- and you'd probably be able to tell me that I've got your attention and we'd be
able to do this without ever having done it before, right?
>> Daniel Ashbrook: Yeah.
>>: Do you think that that's just because, you know, our visual system, all that, is just too
rich? I mean, can you replicate any of that in this kind of [inaudible]?
>> Daniel Ashbrook: To a certain extent. I mean, I think that starts being an AI-complete problem where you need to have the complete range of like culture embedded
in the computer to understand. I mean, in some cultures, you know, they shake their
heads for yes and bob their heads for no. Stuff like that.
But a lot of that's just a question of gesture recognition in general. You know, I could
certainly build a system that would work in very constrained situations where I wasn't
doing anything else and I can very specifically define the gestures in a particular way.
But I think as you get more free, it gets a lot harder.
But I'm not necessarily an expert on the gesture recognition aspect of it, because there are
huge math conferences devoted to gesture recognition techniques. So there might be
something out there. Yeah.
>>: By the way, have you [inaudible] how well people can remember the gestures? For example, you ask them to come back a week --
>> Daniel Ashbrook: Yeah. I thought about that. I mean, the -- the real answer is I
didn't do that because this was good enough to get me graduated. But I did think about
that. I think that would be really interesting.
I actually ran into somebody and asked them if they remembered any of their gestures,
and they actually remembered at least a couple of them. But I -- yeah, I mean, they came
up with these really weird things, how much did they connect those really weird things
to, you know, how visceral were they.
You know, some -- I'm sure it would vary hugely with how much time they spent on each
one. And, you know, I had some people who right at the end went back and changed a
gesture and then they were done. And so they probably wouldn't remember that one as
well as the other ones. So I think it would be interesting.
>>: I can also -- can see the case actually they have a conflict, the creation [inaudible].
>> Daniel Ashbrook: Yeah. Yeah. I mean, it's a huge question. And, you know,
there's -- that's kind of related to the question of if I'm Microsoft and I'm making a new
watch to give to people and I -- do I spend a huge amount of time figuring out a set of
gestures that I'm going to give to everybody, or do I send it out and let them use a tool
like Magic to generate their own gestures.
If I say here's the standard set of gestures for fast forward and rewind or so on, is that
better or worse or just different than saying here you go, figure it out yourself.
>>: What's your intuition? Because you made a -- your study setup was almost in a -- I
couldn't tell if there was an assumption or you really wanted to know what different
people made as gestures.
>> Daniel Ashbrook: So -- what do you mean?
>>: Like there could be an assumption that letting you make your own gestures is more
memorable and, thus, you know, I gave all these people these tasks to do and they
[inaudible].
>> Daniel Ashbrook: So --
>>: -- and it didn't really matter that your gestures are different than mine.
>> Daniel Ashbrook: Right. So -- so my -- my assumption was basically you guys are
HCI professionals. I want you to be HCI professionals doing this. So my supposition
was I'm pretending you are working for Microsoft doing this. You're going to be the
people who are actually defining the default set of gestures that are going to go out to the people who are going to use them.
>>: So they didn't think they were making, you know, Daniel's set of gestures --
>> Daniel Ashbrook: Right.
>>: -- they thought they were making these gestures [inaudible]?
>> Daniel Ashbrook: Right. Although, that being said, that being said, I -- my -- the
people who recorded their own EGLs and came to the experiment, they were not HCI
professionals. They were just like, you know, friends of my wife and stuff.
So when they came and did it, it was very different. And I think they were thinking more
I'm going to make this for myself. And so I asked one of them, and I said, you know,
would you like to, you know -- would you ever want to do this for real. And she said it
was really hard and it was really frustrating, but, yeah, I think I would -- I think given this product I would actually take it into my home and I would probably take a week to do it. I would spend a couple hours at a time on it, figuring these things out. But I think
it would be enjoyable to do and I think it would be neat to have something that I could
actually customize myself. So, yeah, I don't know.
>>: You have this library of everyday gestures. Is it even feasible to sort of somehow
search it for spaces that are -- would be reasonable by the rule sets for [inaudible]?
>> Daniel Ashbrook: Not with that sensor. So the problem with an accelerometer is you
can't back-trace from the acceleration to the motion that generated it.
So if I -- if I was really good at math and I was going to do another Ph.D., I think it
would be interesting to get an actual like motion capture kind of thing, build an inverse
kinematic model of the body, and then say, okay, you know, here's an input gesture, oh,
that doesn't work, but now I've got this model of what it looks like, let me try to permute
it in various ways and then predict what the sensor values will look like based on those
and try them.
And so do that really quick and figure out what sort of motions -- what sort of variations
on the input motion can I then come up with. I think that would be super, super hard,
though.
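A minimal sketch of just the forward-prediction step of that idea: given a (possibly permuted) 3D wrist trajectory, the accelerometer reading is roughly the second derivative of position plus gravity. A real pipeline would also need sensor orientation, noise, and a proper kinematic model; none of this is from an existing system:

    def predict_accelerations(positions, dt, gravity=(0.0, 0.0, 9.81)):
        """positions: list of (x, y, z) in meters sampled every dt seconds."""
        preds = []
        for i in range(1, len(positions) - 1):
            accel = []
            for axis in range(3):
                # Central second difference approximates the acceleration on this axis.
                second_diff = (positions[i + 1][axis]
                               - 2 * positions[i][axis]
                               + positions[i - 1][axis]) / (dt * dt)
                accel.append(second_diff + gravity[axis])
            preds.append(tuple(accel))
        return preds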
>>: Let's try and get back to [inaudible] so you said that, you know, maybe with better
sensors or something like that we might be able to sense gestures that didn't conflict with
the Everyday Gesture Library but weren't socially awkward. Do you really think that's
true? I mean, what do you think the advance in -- what advances in sensing technology
[inaudible] to make that upon? And like how does -- how do you [inaudible] gestures of
people [inaudible] and there would be appropriate kind of fit with what you're talking
about in the very beginning, like the motivation in terms of here are the situations where
speech doesn't work or here are the situations where other things don't work. I mean, how
do these gestures fit in with that landscape?
>> Daniel Ashbrook: Right. So the gestures people -- so I'll answer the second part first.
The gestures that people actually created probably don't fit in very well with that
landscape. You know, I can't do this thing while I'm holding a shopping bag. It's clearly
socially inappropriate. I can't do that in a crowd. You know, these are clearly very poor
gestures for actual real use.
But at the same time I think that, again, given a better sensor, and so let's say I have -- so
with the gyroscope and a magnetometer and a temperature sensor and an accelerometer
all in a little thousand dollar box, you can actually make an entire motion capture system.
So this company called Xsens [phonetic], they're Dutch, they sell this like tight-fitting
cat suit that has these little sensors in them. And so their demo video is this woman
standing there doing this stuff, and the figure on the screen is doing it. She goes off the
video screen, comes around -- she's gone outside the building and has come back around
and is standing next to the window now and it's still synchronized with her perfectly.
So given these kind of sensors, I think that you can start doing much more subtle things.
I could just do little motions like that.
>>: Do you think those will be subtle things that don't conflict with Everyday Gesture
Library?
>> Daniel Ashbrook: Maybe. Maybe not. So one that I've thought of that I'd like is this
sort of wrist flick kind of thing. You know, shut off the cell phone, something like that. I
think that, you know, I can do that down here. It's not --
>>: So I guess a related question is, is that the right way to go? Or is the right way to go
to be as clever as possible about the [inaudible]?
>> Daniel Ashbrook: Right.
>>: Right? Like if you spend all -- if you spend two months or three months or two
years trying to figure out the best, most subtle, reliable, robust activation system possible,
do you need to even worry about EGL after that?
>> Daniel Ashbrook: Yeah. It's a really good question. I don't know. Certainly there
are a lot of situations in which you can basically cheat. You know, it only responds to
my motion to shut off the cell phone when the cell phone's ringing, for example. I mean,
that's a no-brainer.
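A minimal sketch of that kind of context gating: a silencing gesture is only acted on while the phone is actually ringing. The phone_is_ringing() check and the gesture label are hypothetical placeholders:

    def handle_gesture(label, phone_is_ringing, silence_ringer):
        if label == "wrist_flick":
            if phone_is_ringing():
                silence_ringer()      # context makes the gesture unambiguous
                return True
        return False                  # otherwise treat it as everyday motion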
We can pull in a lot of context and start making things better. There are certain things
where that won't work. You know, when I'm responding to the computer, then context is easy. When I'm instructing the computer, then context is harder. Did I -- you know, is
there any context that will say did I really mean to start playing my music. Probably
some. But --
>>: If you go with a pre-fix gesture or something to do the activation, right, I don't care if it's grinding your teeth or whatever --
>> Daniel Ashbrook: Yeah. There's a --
>>: -- if you use the EGL to figure out -- I think this was Scott's point, right, using the
EGL or something to figure out the perfect activation.
>> Daniel Ashbrook: Yeah. Exactly. Yeah. So there's --
>>: [inaudible] thing.
>> Daniel Ashbrook: Yeah.
>>: And then you use the prefix technique [inaudible].
>> Daniel Ashbrook: Right. Yeah. So I think there's a lot of things you could do.
>>: You might want to actually map the activation to something that's not [inaudible].
>>: Yeah, yeah, yeah, like a button.
>>: Yeah. I'm just saying --
[multiple people speaking at once]
>>: If you figure out something that's perfect activation, it almost doesn't matter what -- I
mean, you can still -- you still need to be sure the gestures don't overlap with each other, not necessarily [inaudible] against everything else --
>> Daniel Ashbrook: Although, imagine jogging. So all of my body is in motion, there's
lots of acceleration. I figure out the perfect activation gesture, and then every motion I
make while -- after I've done the gesture I still have to worry about conflicting with what
I'm doing.
So if I have a perfect activation gesture, the perfect activation could be me stopping stock still and then doing the gesture I want. Something like that. But, you know, in real
situations, it's going to be a lot harder. And this may just be an impossible task.
>>: With accelerometers.
>> Daniel Ashbrook: With accelerometers.
>>: And on that note...
>>: I have one more question.
>> Daniel Ashbrook: Yeah.
>>: You outlined a bunch of scenarios that people use to create these gestures. Given
the experience of watching all these people do that, do you have a recipe of if I give you a
gesture what would you do first to try to make it more reliable?
>> Daniel Ashbrook: No. People were really like ridiculously varied. Like --
>>: I know. But in terms of like you as [inaudible] --
>> Daniel Ashbrook: You mean if you give me what would I do. Ah. Interesting.
>>: Do you have an observation of like -- I see [inaudible] I give you a gesture --
>> Daniel Ashbrook: Right.
>>: -- designed with your system, it overlaps, what's the first thing that you would try to
do to fix it?
>> Daniel Ashbrook: Probably impact, I think. Just it feels like -- it feels like a small
motion that I can do that --
>>: Just making corrections --
>> Daniel Ashbrook: -- impact or a really hard jerk to -- something that's maybe -- isn't
going to be -- isn't going to show up as much.
But, you know, again, we have impacts all the time. I accidentally bump things, you
know, I grab things real quick. So it might not be a good strategy. So, I mean, none of
the strategies had any significant impact on anything. So, yeah, it's a really super hard
problem as it turns out.
>> Amy Karlson: Well, I don't know if I'd wrap this up officially, but we'll let everybody
go. And thank you so much for joining us today.
>> Daniel Ashbrook: Thank you. And thank you all for hanging around and asking
really interesting questions.