>> Amy Karlson: Okay. Well, I'm Amy Karlson from the visualization and interaction
group at Microsoft Research. And today I'm welcoming Daniel Ashbrook from Georgia
Tech where he is a member of the contextual computing group and has been advised by
Thad Starner.
His primary research focus is human-computer interaction for wearable and mobile
computing and also ubiquitous computing. So he recently defended his Ph.D. and will
be starting at Nokia Research next month.
So I'm very pleased to have him here to speak with us today. And without further ado,
I'll hand the floor over.
>> Daniel Ashbrook: Thanks. So today I'm going to talk about largely my thesis
research, which involves microinteractions, and I'm going to talk about mobile computing
and all sorts of good stuff.
So, first of all, I'm going to talk about mobility, and that's what I really like. So basically
in my research life I've discovered that if we can be mobile, we're going to be mobile.
And A.J. understands this issue. She was walking down the hall with her laptop reading.
I'd be interested if you could type while doing that also. But can you type while doing that, like this
lady? This is a bad idea. So I don't recommend that.
So you can't stop people trying to be mobile. So the question is if you can't stop people
from doing something, you might as well try to support it. So how are we going to
support it. So that's basically what my life concentrates on.
I'm going to talk about several things. The first one is what I'm calling microinteractions.
So a microinteraction is an interaction with a device that takes a really short amount of
time, and I will talk more about that shortly.
But first I want to talk about impairments and disabilities from being mobile. So being
mobile can almost be like only having one arm or being in a wheelchair or something like
that, just because it prevents you from acting on your full capabilities.
And so let's take a look at this next young lady here. She's got her phone, she's walking
down this rainy street in Japan somewhere, and she's got several things that are
preventing her from acting to her full capability.
So one of them is what I'm calling accessibility impairment. If she's got her phone in her
bag -- you can see this is a giant bag and her phone is probably going to fall to the bottom
of the bag -- and her phone rings, she's got to dig through all the stuff in the bag to try to
get the phone out. So she has a problem with accessing the device, getting at it.
There's also an impairment because she's walking and it's raining and she's got like this
one hand carrying the bag and the umbrella and the other hand is carrying the phone and
there's probably like 400 people in front of her on the street going all sorts of directions.
And I've noticed in places in Japan there's little stumps about this high and you can trip
over them in places. And so she's got all this stuff she's got to watch out for while she's
trying to do whatever it is she's doing on her phone.
So she's got this impairment because she's actually using this while she's in motion.
Finally, we've got this poor guy here, who it looks like maybe he's her friend, and he's
like why are you paying attention to your phone and not paying attention to me. So there
can be this level of social awkwardness involved in using devices as well.
So the question is, you know, is this a problem with the woman or is it a problem with
her technology. So I posit that it's her technology and we should think about that and
what can we do to help her and people like her use their technology in ways that are
going to be -- ways that are going to avoid these things.
So in queuing theory, there's something called balking, which is deciding not to join a
line if it's too long. So I go to the movie theater and the line's wrapping all around the
outside of the theater. I'm like, you know what, I really didn't want to see the new
Twilight movie anyway. Let's go see something else.
So in mobility we can think about balking as deciding not to use a device for some
reason. So, for example, if it takes too long to get to the device. So I want to take a note
on my phone but it's buried at the bottom of my backpack, I'm just going to give up and
write it on my hand or something.
If it is going to be difficult to use, so if I'm on a train that's really vibrating a lot or if it's
raining and I've got an umbrella and I've got a shopping bag, I'll probably not want to use
it.
Or if there's the issue of being rude, so I don't want to start doing stuff with my phone
right now because I'm talking to you all, so that's the social aspect of it.
So I'm looking at solving these through microinteractions. So basically microinteraction
is a single-purpose interaction with the device that takes a really short amount of time.
I'm using four seconds based on some earlier research on the response time of devices,
how long you're willing to wait for a device to respond to you. This is on the input side,
but it's the only number I've got, so I'm using it.
So basically the question is how can we make interacting with your mobile devices as quick as
looking at the time on your watch. I can look at the time on my watch right now and it
totally doesn't interrupt anything -- there are certainly social aspects to this that you
might have to worry about, but in general it's really fast and it gets around a lot of these
problems.
So how can we do that with other stuff. So looking back at these impairments.
Microinteractions can be applied in various ways.
So for the social impairment we can think about subtle interfaces, what can I do that is
going to be not necessarily secret but isn't going to be socially awkward for me to interact
with a device.
When we think about on the go, we want to think about mobile usability, how can I use
the device when I'm actually -- when it's not my primary task, when I'm not stopped,
hunched on the corner bent over my device, but when I'm actually out in the world using
it. And when it comes to accessibility, how can we access the devices really quickly.
So then the question is, you know, how do we actually do this, how can we make
microinteractions into a reality.
So the first thing is to think about access time. And access time is basically how long
does it take you to get at the device. So the poor lady with her purse, it might be 10 or 15
seconds as she digs through her stuff.
Then there's also the amount of time it takes to further get the device ready, to open it, to
unlock it, to do whatever you've got to do to actually get to the thing that you care about.
So I did a study called Quickdraw. I presented this at CHI in Florence, so a couple of
you might have seen it. I had a big mustache on. It was awesome.
So in thinking about this, thinking about what are the stages of device usage, so, first of
all, you want to get your device out of wherever you've got it. Then you get it into the
position ready to use. If it's, you know, my phone, I'm going to get it into my hand. Then
I'm going to unlock the device, get it ready, say, hey, device, you need to start paying
attention to me, I'm going to give you some input.
Next you're going to navigate to your application. You're going to start it up, get to
whatever you actually want to use, and you're going to use your application. And,
finally, you're going to lock up your device and put it back away.
Now, the thing to notice with all these steps here is that the only one that you actually
care about is No. 5. All the rest of this is just like stuff getting in your way.
And really 1 through 4 is the stuff that gets in your way the most. Locking
your device and putting it away, that's a practiced kind of motion that you can do almost
without thought. You can just lock your thing and shove it in your pocket or whatever,
and so that's not too bad.
But the first stuff tends to be the complicated bits. So we were really interested in what is
going on with these steps.
So to give more motivation as to why you care about that, let's say it takes four seconds
to get your phone out of wherever you've got it and to start it up and to get to your
application.
So that's not too bad if you're going to be writing an e-mail. So it takes me, you know, 30
seconds or something to type an e-mail. It took me four seconds to get the phone out.
Not a big deal.
On the other hand, if I'm wondering what the weather is, that only takes a second to look
at the weather. Probably takes less than a second to look at the weather. So now the
four seconds that I spent digging the device out and starting it up is a huge penalty relative
to this one second of actual usage.
So we took a look at three different ways of carrying your device. We've got the pocket,
we've got a holster, and we've got a wrist. A lot of people have asked why not bags,
because lots of people store stuff in bags. Because there's really no standard bag.
We thought about it and we're like, okay, there's not a standard pocket either, but
everybody puts their stuff in their pockets, so we have to test that, but, you know, you
could have many, many sizes of bags and stuff in the bags.
So we did pocket, holster, and wrist. We had people either stand there or walking around
a track in the lab, and we basically were looking at how quickly can you get at this.
So what we asked them to do was we'd have this incredibly loud and obnoxious alarm go
off and they would have to get at it and respond to it. And so the first thing they'd see is
this screen here on the left with the big number 11. The number would change every
time. And then they'd pull it out, look at the number and slide the little thing to unlock
the device. And they'd have to tap the number.
And the point of this was to make sure that they actually looked at the device; that they
weren't just like pulling it halfway out of their pocket and pushing the screen or
something like that.
So we wanted them -- we didn't actually care about what number they picked; we just
wanted to make sure they were doing it.
So here's the track we had them walk around, a figure-eight thing, just walking, walking,
walking.
So we were measuring a number of things. In particular we were measuring how long
did it take you to get the device out of your pocket and then how long did it take you to
get the device actually into your hand to get it into the ready-to-use state.
And so the total of those are shown up here. So obviously the wrist is going to be a lot
faster.
Now, the thing that's interesting is looking at these other two, what is the division of the
amount of time getting the device from your pocket and then getting it actually into your
hand.
And it turns out that most of the time is actually getting this device out of where it's
stored. So it's not so much the navigation time, the orienting of your hand, you're really
quick with that, but it's the digging out of wherever you've got it stored that takes up so
much time.
So this really provides justification for thinking about the wrist and other really
accessible on-body locations to put interaction technology.
So I've concentrated a lot of work on the wrist, how can we interact with devices that are
placed on the wrist, what sort of stuff can we do.
So I've looked at two different ways, looking at touch and gesture. So I'll start off talking
about touch, and then gesture is going to form actually most of the talk.
So I was thinking about touchscreen watches and so on, and I was looking at what kind of
advanced watches do we have today.
So these are both cell phone watches. And they probably have got PDA functionality and
so on. So how do you interact with these things. Well, you know, buttons are certainly
one way to do it. So this guy's got three buttons on one side, probably got three on the
other side, menus and so on. No problem. You can certainly go overboard with buttons.
So, you know, the other way to do it is touchscreens. So here are some touchscreen
watches. These two guys are cell phone watches again. This is a Palm-based OS device
that is no longer sold. And there's reasons for that. One of the reasons is that you have
these tiny screens and you inevitably get styli. And there's these little tiny toothpicks. In
some cases, like this one, they actually unfold.
So it's this little thing in the band of your watch. You have to pull it out, then you have to
unfold it, and then you can use it on your device. And so when you've got stuff like this,
it starts to get a little bit silly.
So that's one thing that I really think is dumb and wanted to think about fixing.
So the other thing is here's some more random technological watches. There's a
commonality in all these. They've got these square screens and then they've got all these
round elements to them. So nobody's really made a round like digital watch, as it were,
and certainly no round touchscreen watches.
So I thought, you know, let's think about what we can do with the circular watch.
So the goal, then, of this research was a finger-usable, round touchscreen watch. And
when I was thinking about how to use this thing with your finger, I thought about Jake
Wobbrock's EdgeWrite. So you use your stylus here and you use this little template and
it helps keep the stylus steady. You slam the stylus very quickly from one corner of the
thing to the other. There's very little precision involved in what you're doing. It's a very
quick, easy technique.
And so watches have got these bezels around the edges, and I thought, well, we can run
our finger around the edge of that and get some stability off of that too.
And that makes sense when you start thinking about how you would make an
interface on something like this anyway. You'd probably want to put buttons around the
edge and have something in the middle.
So then the question is how many buttons do you have, what do you have in the middle,
how much space is there for these things. So essentially you can have lots of buttons,
you can have few buttons, you can have lots of area in the center, you can have not very
much area in the center.
So what I did was I was looking at the tradeoff between these, you know, what effect do
these various choices have on the usability of the device, your error rate essentially in
using it.
So since we -- since the round touchscreen watch doesn't exist, we made one. It's pretty
awesome. So we have this steel plate here in the middle. It's got this bezel cut around it.
So of course you can't have the bezel like on a real watch, because we don't have a round
screen, but we've got this bezel like we have on EdgeWrite. So that provides support
for your finger to run around.
And the basic idea was let's do a Fitts-style task. So we're doing reciprocal pointing back
and forth between two targets. And I thought, well, you know, the sliding might be good,
but maybe there's other ways to do it.
So I have a standard tapping-style interaction, where you're just tapping back and forth
between two targets. We have the straight through-style interaction where you're moving
your finger just in a straight line like you do on a standard square touchscreen. And then
there's this rim-based interaction where you're sliding your finger along the edge.
So we had people -- you know, this is some actual data. So we had people run their
fingers back and forth between these various targets as quickly and accurately as they
could. And we ended up calculating an error metric based on this.
So you can actually predict the error based on the size and basically the shape of your
button. So how many buttons do you have around the edge, how thick are they,
essentially how much space do you have left in the middle gives you a pretty good little
curve here.
And so you can stick in these constants that are based on the kind of movement and
actually predict the error rate based on the number of buttons you've got.
So just to give you an idea what this might look like, here's a sample layout at 4.8 percent
error. I can get 12 buttons and I can have 75 percent of the area left in the middle for
other stuff. I can put a display there or whatever else I want. And this is for the sliding
your finger around the rim interaction.
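To make that concrete, here is a hypothetical sketch in Python of how a designer might use a fitted model like this to explore layouts. The talk does not give the actual functional form or the fitted constants, so predicted_error, its shape, and the values of a and b below are made up for illustration; only the idea -- error predicted from button count and rim thickness, with constants depending on the movement type -- comes from the study.

    import math

    def predicted_error(num_buttons, rim_fraction, a=0.01, b=0.002):
        # Hypothetical model: error grows as each button's arc gets narrower.
        # a and b stand in for constants that would be fit per movement type
        # (tapping, dragging through, or sliding along the rim).
        arc_per_button = 2 * math.pi / num_buttons
        return a + b / (arc_per_button * rim_fraction)

    # Explore the tradeoff: more buttons and a thinner rim leave more center
    # area free, but push the predicted error rate up.
    for n in (4, 8, 12, 16):
        for rim in (0.15, 0.25):
            center_area = (1 - rim) ** 2   # fraction of the face left inside the ring
            print(n, rim, round(predicted_error(n, rim), 3), round(center_area, 2))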
So what would interaction like this actually look like, what would you do with it. We
built a couple applications. I can't remember if that video's in this first -- the talk or not.
But basically the idea would be let's make it stable, let's make it so you're not accidentally
going to brush it and have something happen.
So let's say you run your finger around the watch 360 degrees to say I'm going to do
something. And so this is actually really super fast. I'm going to talk about a couple of
ideas, and it's going to take me way longer to describe them than it would actually take
for you to do them.
So let's say you want to see what the weather is. So you can touch the face of the watch
and get a bunch of icons for various selections. You can drag your finger down to the
weather icon and then sweep your finger around very quickly, at 360 degrees, and
essentially reveal your weather application in this really fast manner. And so this could
take under a second to actually do. We've done a couple prototypes.
So the other thing you could do is let's look at my schedule for next week. And so I can,
you know, again touch the face and pick the month and sweep my finger around and pick
out -- actually, having 28 buttons is too many for a decent error rate, so you could pick
the week and sweep your finger around again and then pick the day and have it actually
pull up little bands or something to show you what your schedule might be.
So I do have the video. So video's going to play three times. The first two times are
going to be actual speed, and then the third time slowed down just so you can see what's
going on.
So this super quick, round, 360-degree interaction, it takes -- this one it actually took
exactly a second. So you can see this is actually a super fast interaction. And here I
didn't actually show you choosing from icons, but it would just be that three o'clock
position you would know this is where the weather is, I'm going to zip around.
So that's it for touch. I'm going to go on and spend a lot of time now talking about
gesture and how we make gesture for mobile devices.
So this was the bulk of my dissertation work. It's a piece of software, a concept, really,
called Magic. And so when I talk about gesture, now I'm talking about motion gesture as
opposed to the touchscreen gesture that I just showed.
And so the idea here is you are moving some part of your body in a freeform fashion. It's
being sensed by something and the computer is going to respond to it.
So gesture is useful in a variety of situations. So we've got these places where you have
access problems, like the lady with her umbrella and all that. You could have your hands
full, but you could still gesture with a shrug of the shoulder, something like that.
Social situations where it would be inappropriate to dig around in your bag to shut off
your ringing phone, maybe you could just do it with a flick of the wrist or something like
that.
You can solve these first three with speech, but there's certainly situations in which
speech is inappropriate. Or in a crowd, for example, it's probably too loud to use speech
effectively.
So there's a number of situations in which you want to think about gesture. So there's a
problem with gesture. Let's say I define this gesture to delete my e-mail, which seems
great. It's a pretty recognizable gesture. But then it turns out that I'm doing this all the
time in conversation and, oops, there goes all my e-mail. So that's a little contrived, but
there are plenty of other things that you wouldn't think about that you might do.
So you want to avoid this problem of having your false positives of gestures. So I know
the cell phone industry has come up -- has had this problem. You know, you sit on your
phone and call somebody. So they came up with a solution to it, which is push to
activate.
So, you know, on the iPhone you push a button and you slide your little slider and that
tells your phone, okay, I am ready to give you input, you need to pay attention to me
now. Or you open your phone or whatever.
Lots of other stuff use this too. So this watch on the left, you tap the face and the hands
move to be a compass or to be an altimeter or whatever. It's really cool, but it could be
very confusing, you know, if the hands are moving around wildly as you are looking at it.
So you have to hold down the crown for like a second to indicate that you're about to talk
to it. In the same way in speech recognition or Star Trek you say "computer" first and
then "call Commander Riker" and the computer responds to you.
And even on the Wii, various game systems, you're doing gestures, you hold down a
button to indicate that you're about to do something. So in the bowling game I hold down
A and then I bowl, so it knows that I'm not just making random motions.
Now, the problem with push to gesture or push to activate for gesture is you are involving
your other hand again. So if I've got something on my wrist that I'm going to gesture
with and I've got to push a button and then gesture, first of all, if my hands are full of
shopping bags, that doesn't work.
Second of all, if I'm pushing a button, I might as well just push the button and just have it
do the task instead of doing a gesture. So I'm trying to avoid push to activate for doing
gestures.
So the way I do that is with the software I built called Magic. And Magic is actually a
general gesture creation tool. So the idea is let's help people figure out what kind of
gestures are going to work well for this sort of situation.
So in my work, I'm using an accelerometer on the wrist. You could use lots of different
sensors in lots of different places. We have a bunch of these, so this is what I did. This is
the Bluetooth accelerometer. And I'm going to talk briefly about the recognition I use so
it will make sense.
So I'm using dynamic time warping. And dynamic time warping basically is for saying
how similar are two signals. So these might be me doing this once, and then the second
one might be doing this again. And so I did them slightly differently each time. I want to
say are these things similar enough that I can say that they are essentially the same
gesture.
So we have two signals here, the green and red one. They look really similar. There are
several places where they're different. So, for example, one thing that's really clear here
is that the peak on the green one happens earlier than the red one. So they're different but
we want to be able to say, you know, how close are they really or can I pretend that
they're the same.
And so what we do with dynamic time warping is essentially figure out how do these
things match up. So can I -- where's the peak of this one, where's the peak of this one, let
me draw a line between them. So I do that for the entire set of signals and find the best
way to draw these lines between them.
And then I plop the two signals on top of each other and basically now the lines have
collapsed to be as short as they can, and so add up all the lengths of the lines, all the little
gray lines, and that is my similarity.
So intuitively, if the green and red were exactly the same, all those gray lines
would now be of zero length, and so my score would be zero -- exactly the same.
And so the more different they are, the longer those lines get and the higher the score is.
So that's how the recognition works in a nutshell.
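As a minimal sketch of the distance computation being described -- the standard dynamic time warping recurrence, not the actual Magic code -- in Python:

    import math

    def dtw_distance(a, b):
        # a, b: lists of accelerometer samples, e.g. (x, y, z) tuples.
        # Returns the summed length of the "gray lines" after the best alignment.
        n, m = len(a), len(b)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = math.dist(a[i - 1], b[j - 1])          # one matching line
                cost[i][j] = d + min(cost[i - 1][j],       # stretch signal a
                                     cost[i][j - 1],       # stretch signal b
                                     cost[i - 1][j - 1])   # advance both
        return cost[n][m]

    g1 = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
    g2 = [(0, 0, 1), (0, 1, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
    print(dtw_distance(g1, g1))   # 0.0 -- identical signals
    print(dtw_distance(g1, g2))   # also 0.0 here, since g2 is just g1 stretched in time

Identical signals give zero, and the more the two signals differ, the longer the matching lines get and the larger the score, exactly as described above.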
So how the gesture design process works. Currently this is what people do. You create a
bunch of gestures and you compare them to each other with, for example, dynamic time
warping and you say are these gestures too similar. So if I have a gesture that looks like
this for let's say playing my MP3 player, and then I have another gesture that looks like
this for calling my mother, they're somewhat similar.
And if they're too similar, then when I go to play my MP3 player, then I might
accidentally call my mother instead. And so this is something you don't want to have
happen. So you've got to make sure that the gestures don't conflict with each other.
So that's the first stage. If they do, then you've got to go back and start over again and
pick some new gestures, especially in a system -- you know, say I've got a DVD player
I'm doing and I want fast forward and rewind. Fast forward and rewind are related
conceptually, so I might make gestures that are related in their motions, and then if fast
forward doesn't work well, then I probably am going to redo rewind, so it will still be
related. And so it's -- it can be pretty involved to do this.
So once I've gotten through that, that's great. Now I'm going to go out and I'm going to
make sure that stuff's going to work in the real world. So for my DVD player I'm going
to, you know, put some people in the usability lab with the living room furniture and I'm
going to have them, you know, eat pizza and whatever and make sure that they aren't
accidentally fast-forwarding their movies.
For a mobile phone or something like that, I'm going to send them out and put them on
buses and walking around and so on, and I make sure it doesn't accidentally stop and start
their music. And that can really take a long time.
We had one guy who did a watch where you made gestures above it, and it got activated
by everything -- his sleeves and walking through doors and people going by
and so on. So it took him two weeks to come up with a set of gestures that he could
actually use reliably to do this.
And the problem is if you get any gesture that is activated all the time in real life, then
you've got to go all the way back to the beginning and start over making new gestures
and then go through this whole process again. So that sucks.
So what I've got instead is a more parallel process. So you can check to see if your
gestures are going to conflict with each other, and you can check if your gestures are
going to conflict with things that people do in everyday life at more or less the same time.
And this part takes minutes instead of days.
So the top part is the interface for the system. It gives you visualizations and so on, and
I'll show some examples.
The bottom part here, figuring out if your gestures are going to conflict with things, is
through something called the Everyday Gesture Library. And so I'll talk about that next.
So the basic idea is I'm going to send people out like I would normally when I'm testing
the thing, except I'm just going to stick the sensor on them.
So in my case I'd stick the wrist-mounted accelerometer on them and just set it to record
and ask them to forget it's there, just go on and go about your life, drive around and have
meetings and eat your breakfast and whatever. And then when they come back, I get all
those motions that they did and I stick it into the Everyday Gesture Library.
So now later on when I say, okay, I'm going to make this interface and it's going to be
awesome and I want to figure out what people are doing, so I say, okay, here's a bunch of
gestures that I want and here's my fast forward and rewind and call my mom and so on.
And so I ask the library, hey, do people actually do this stuff? And it might say, oh, well,
you know, some of these are good, but, you know, this one looks an awful lot like this
guy eating a doughnut, and this one looks just like this guy waving to his friend and so
on -- so...
>>: But you don't know what they were doing, right? I mean, it's not like the data --
>> Daniel Ashbrook: I'm glad you said that. So I do actually know what they're doing. I
need a picture of it so I remember to say it. So in my study I actually had a hat with a
fisheye camera pointing down.
>>: [inaudible] they wore the hat?
>> Daniel Ashbrook: And so they wore the hat and went around and did their stuff. And
so you could actually see what the hands are doing, and figured out here's what's actually
happening. So I can actually see you're eating the doughnut instead of being like, you
know, thumbs up for calling your mom or whatever and so on.
So we actually have contextual information, and I'll talk about how useful that was later.
So now the EGL doesn't actually say okay or bad; it says here's how many times this
happens and it allows the designer to make the choice about whether or not it's a good
idea to include that gesture. So if it happens once in a day is that okay or is that still too
often. And so you can make a decision based on that.
So I collected a bunch of data from my study.
>>: I have a question.
>> Daniel Ashbrook: Yeah.
>>: Are the gestures in the library presegmented, or is it just continuous data?
>> Daniel Ashbrook: It is continuous data, but you'll see a bit of this later, basically I
look for -- solely for the purpose of speeding things up, I look for places that are
interesting, which is that they're higher energy. So if you're sitting with your hand on the
desk for an hour, it just skips right over that, because there's nothing happening there that
could possibly be a gesture.
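A rough sketch of that kind of energy filter, with made-up window and threshold values (the talk does not give the real parameters):

    def interesting_windows(samples, window=32, step=16, threshold=0.5):
        # samples: list of (x, y, z) accelerometer readings.
        # Yields (start, end) index ranges whose motion energy is high enough
        # to be worth comparing against gesture templates.
        for start in range(0, max(len(samples) - window, 0), step):
            chunk = samples[start:start + window]
            energy = sum(
                (x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(chunk, chunk[1:])
            ) / (window - 1)
            if energy > threshold:
                yield (start, start + window)

    # An hour of a hand resting on a desk produces no high-energy windows,
    # so the search skips right over it.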
So I collected these EGLs from a lot of people and got almost 60 hours of data total.
>>: Are you No. 1?
>> Daniel Ashbrook: I am No. 1, yes.
>>: [inaudible]
>> Daniel Ashbrook: Everybody else were volunteers doing various other things. They
didn't know any details of the study. So that's been one question raised.
And there's a lot of interesting activities that I got in here. So I asked people not to just
hang out and watch TV or sit around and use the computers, because that presumably
would after about an hour be exactly the same data over and over again.
So I got things like, you know, attending a conference. I was at CHI with this last year, and
you didn't even notice the hat. See? You know, brewing beer and knitting and hiking
and making cheese and all sorts of great stuff. And most of these are not me.
So just so you can see what the video looks like, here are a couple of examples. So it is
really funny to see your nose like that. So but you can see what's going on. You can see
in the upper left one there I'm opening a refrigerator and getting out some chipotle or
something to microwave. The bottom one is another one of the subjects hiking and
manipulating the water bottle or something.
So you can tell what's going on, especially if you're the person in it. But even if you're
not, you can sort of get a general idea of what's happening.
So what I gave people to test an interface against was part of the library that I collected. So
in machine learning you have a testing set and you have a training set of data. So I'm
making a new algorithm. I want to see how well it's going to work. If I just get a whole
bunch of data and tune the algorithm on that data and test on it, then that's invalid because I have
trained on my testing data.
So in the same way what I've done here is I've separated out a little segment of it to give
the users to tune on, and then I reserved the rest of it, plus all of the other EGLs I've got
for the testing data. And so everybody got the same little bit of training data, and then I
tested their stuff on the rest of it.
There are some limitations to the Everyday Gesture Library. It is not -- it can never be
guaranteed to span the space of everyday life. I might collect five years of data
continuously, and then one day I'll decide to go bungee jumping. And I've never done that before
and it turns out that the high acceleration of bungee jumping triggers the thing to call my
mom and she hears me screaming. Can't guarantee against that. You can probably get
enough data to -- so that that sort of thing is fairly uncommon.
The other thing is if you change your sensor, if you change where you're putting it on
your body -- you know, I move from a wrist-mounted sensor to a forehead-mounted
sensor or something, obviously it's not going to work anymore. If I decide I want to do
an interface for fighter pilots, the data I collected while attending CHI is probably not
going to be useful for whether or not they're going to accidentally activate things in the
cockpit.
And, finally, people with varying actual abilities are going to be different too. So if I'm
making something for people with Parkinson's disease, it's not going to be use -- the EGL
I collected for that is not going to be useful for people with broken arms basically.
So there's some limitations. Even so, it's pretty useful.
So I had a study -- I'm going to talk about these more briefly, but you'll see these in the
videos I'm about to show. So, you know, I had people design for a mobile audio player.
So there are three different things you can do in Magic, and you can move between them
fluidly.
But the first thing you've got to do is you've got to make some gestures. And so that's the
first stage. And I will now show you what that looks like.
Oh, but first I'm going to talk about examples. So when you're doing the kind of
recognition I'm doing, you do something called template matching. And so basically the
idea is I want to make a particular gesture, and I'm going to use pen gestures here as an
example because they're easier to show on the screen.
So I want to do a cut gesture for Microsoft Word or something. So my gesture is going to
be cut, and that's going to be essentially a category. And then within that I say, okay,
here is the shape that I want my pen gesture to look like. It's going to be like that. So
that's one example.
And so now I'm going to record a bunch of examples. I'm going to say here's all the
different ways that I could do the cut gesture, so I want to account for variation in the
way that people might actually do it, so I have really big ones and small ones and sloppy
ones and so on.
And so these are a bunch of examples. Later on the system will go and it will take these
examples and it will compare them to the input that it got and say, okay, does whatever
somebody did match up closely enough to one of these, that we can say, yeah, they were
probably trying to do cut.
And so that's the same way that the gesture recognition in Magic works, except it's with
motion gesture instead of pen gesture.
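A minimal sketch of that template matching, reusing the dtw_distance function sketched earlier; the threshold value here is only illustrative, not Magic's actual setting:

    def classify(motion, templates, threshold=6.0):
        # templates: dict mapping a category name like "cut" or "play/pause"
        # to the list of recorded examples for that category.
        best_label, best_dist = None, float("inf")
        for label, examples in templates.items():
            for example in examples:
                d = dtw_distance(motion, example)
                if d < best_dist:
                    best_label, best_dist = label, d
        if best_dist > threshold:
            return None      # nothing close enough -- probably not a gesture at all
        return best_label

    # classify(new_motion, {"cut": cut_examples, "paste": paste_examples})
    # returns "cut", "paste", or None.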
Okay. So now the video. This is what the interface looks like. And you're going to see
up here is the live input from the sensor, so that's actually zipping along as the person
uses their arm. So over here this person is making some gestures. And so, again, these
are the categories and he's making them for all eight of them that he was supposed to do.
So play/pause, next track, previous track, and so on.
So here he's made a whole bunch of gestures. And now what's going to happen is you're
going to see him actually creating some of the examples.
So I have another hat cam and I have a monitor cam. And so he's going to make four or
five of these things, so you can actually see him actually moving his arm as he does them.
So he makes that swinging gesture. You can see the motion up there, and then it appears
over there, and here's the new example. And so he's going to do this several times, so
you can see what's going on in all the different parts.
And one more I think. Yeah.
So he's made this gesture now and wants to know is this any good. So the system gives
several pieces of feedback, which are these columns here. So it tells you here's how long
each of your examples are, so you can say, you know, are -- is one of them way longer
than the other, did I do something wrong.
It says here's what the system thinks your gesture looks like. So if it says -- if it doesn't
say next playlist there, even though I just made next playlist, then I've got a problem.
Finally it's got something called goodness, which says basically how well do my
examples match up with the other examples in my class or not. So if -- here we have all
of those examples are actually recognized as next playlist, and so goodness is a hundred
percent.
Up here we have, you know, a lower goodness because maybe the next -- it recognizes
next track but maybe it could potentially be confused with some other stuff, and so you
want to have -- you want to have high goodness. That indicates that you're going to have
your gestures be fairly reliable.
So here's a case in which there are some problems. So this guy is recognized as previous
playlist even though it should be volume up. And this one is recognized as volume up,
but it's got 25 percent goodness, which says that, you know, there's a pretty high chance
that it's going to be misrecognized as something, so maybe it looks like another gesture
too much and -- yeah.
>>: Sorry. Is goodness taken into account for the gesture library?
>> Daniel Ashbrook: Huh-uh.
>>: Okay. This is just --
>> Daniel Ashbrook: Nope. This is just --
>>: -- self-consistency.
>> Daniel Ashbrook: Just self-consistency. Yeah. Yeah. So the gesture library will
come in a minute.
I've also got some visualizations. So, for example, it's very clear that one of these things
is not like the other. Each of these represents an example. And so this is a really quick
way to tell when something has gone amiss.
And so what we can do is look at the traces for all of them, which are hard to interpret,
but I can see, oh, look, this one is way longer than all these other ones, so that's really
easy for me to tell, you know, there was something wrong here.
And, you know, if -- the width of the bar there is the standard deviation of the distances
to everything else in its class, so if one's really wide, I can say, well, you know, it doesn't
match things very precisely. If it's really far to the one side, then I can say it's -- you
know, it really doesn't match all the rest of the gestures in its class and so on.
So the next stage of Magic is gesture testing, which for time's sake I'm not going to talk
about much. Basically you can do a set of freeform gestures and see if your gestures that
you defined in gesture creation show up. So I'm just going to test them out and make
sure that when I do this several times, then every time it's recognized properly.
So the last stage is the Everyday Gesture Library. And here's where I'm going to see if
what I'm doing shows up in somebody's everyday life. And so I'll show another video of
what's going on here.
So in the top left we've got the actual Everyday Gesture Library. And so you can drag
your mouse through it and see the actual hat video. So this is actually from CHI.
And so just to point out, these gray bits here are the boring
areas. So there's not much going on in those. And so, you know, we can see
talking and gesturing and so on.
So I have this and I say, okay, I want to see if play/pause is going to occur, and so I'm
going to search and it's going to go through this entire thing and look to see if any of the
examples of play/pause are recognized -- are basically -- would accidentally be triggered
by the motions that I made during this set of recording.
And so I'll click check, and this actually takes about 15 seconds because it's a huge
amount of data. And I say, oh, look, here are five places where this has shown up. So I
would have had my music player accidentally start playing five times when I hadn't
intended to, which is probably bad in the space of five hours.
So what we can do is go back and look and say, okay, where do these things happen. So
here in the EGL graph you can see that little green box shows here's the place where this
one actually happened, and so I can double-click on this and it's going to show me the
video. It's going to show the video of what was happening when the EGL was recorded
and it's going to show the video that was recorded when the person created the gestures.
And so you can compare all these simultaneously and actually see what was happening.
And so you can see here I'm rubbing the bridge of my nose, and it looks -- that forward
motion as I come up, it looks a lot like his flicking-forward motion that he's doing there.
And so this can give me a clue as to what was happening when the gestures accidentally
were triggered, and maybe that can help me figure out what to do to fix it.
So there are a couple of things you can do to fix it. One of them is you can adjust
basically the sensitivity of the system. And so you see these -- they're a little bit hard to
read, but these similarity numbers here. Those are the distances that you get from a
dynamic time warping process.
And so I can actually adjust the sensitivity downward and I can say, okay, anything that
is above -- you know, this is 6.64. So anything above 6, let's just ignore that. And so that
will get rid of the occurrences in the Everyday Gesture Library, but it also influences
how the gestures relate to each other. And so if I make that too low, then it will be
impossible for me to actually make the gesture I want to make and have it be recognized
when I'm trying to do it.
So there's sort of a balance here between not having stuff show up and having the gestures
be reliable enough for me to make.
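Putting the pieces together, the EGL check being described amounts to something like the following sketch, reusing the helpers above. The real search inside Magic is not spelled out in the talk, and 6.64 is just the example value mentioned from the slide:

    def egl_hits(examples, egl_samples, threshold=6.64):
        # Count the places in the recorded everyday data where any example of
        # one gesture would have accidentally triggered. Lowering the threshold
        # removes hits, but also makes the real gesture harder to recognize.
        hits = []
        for start, end in interesting_windows(egl_samples):
            window = egl_samples[start:end]
            if any(dtw_distance(window, ex) <= threshold for ex in examples):
                hits.append((start, end))
        return hits

    # len(egl_hits(play_pause_examples, five_hours_of_data)) == 5 would mean
    # the music player starts by itself five times in five hours.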
Now, the other thing I can do is -- you'll notice that over here that it was only play/pause
1 that showed up. It showed up five times. And the rest of them showed up zero times.
So I can just say, okay, maybe play/pause 1 is just a bad example. Let me just get rid of
that, and then it will never show up. And that's good, but then, again, we have the
problem of maybe play/pause 1 looks like -- is a really good representation of what
people are going to actually do.
And so, again, there's this tradeoff between having the gestures be recognized reliably
and not.
And so what you'd really want to do is after you go through this process, you'd want to
bring in other people and have them do the gestures and basically iterate through to make
sure your gestures are going to be self-consistent enough that people can use them.
>>: I have a question.
>> Daniel Ashbrook: Yeah.
>>: So if you eliminated play/pause 1 from your set of positive examples, does that mean
that none of the other play/pauses would have shown up in the gesture library?
>> Daniel Ashbrook: Right. Because they've got the zeros next to them, so they're tested
independently. So none of those have shown up. The other gestures here have not been
tested yet, and I think actually if you see the rest of the video, they show up thousands of
times, so they were really poor examples.
And showing up thousands of times is one of the weaknesses of the Everyday Gesture
Library, because you're not going to watch thousands of clips of video to see what was
going on.
So -- yeah.
>>: What is the individual that you collected the data from [inaudible] if I find --
>> Daniel Ashbrook: Yeah. So this was just me. The other people I collected the data
from, I had them come and use this as well. So actually I will get to that organically here
in a couple of slides, talking about the various people.
So I ran a study. The goals were to figure out is it usable, what strategies are people
using to design these gestures, is the EGL useful at all, and, given a common task, are
people going to make a common gesture set. You know, I think that people are starting
to settle on a common set of things for editing operations with pen gestures. Is there
anything like that for controlling your music player.
So, again, here are the gestures. So I've got play/pause, shuffle. And then the rest of
them are paired gestures. So next and previous track, volume up and down, and next and
previous playlist. And volume up and down are about 10 percent because the system
doesn't do continuous. So I couldn't say as long as I'm doing this keep on turning the
volume up. I've got to have a discrete gesture for it. And that's just the limitation of the
recognition.
So I got a bunch of participants. So there's actually a third category here that I haven't
shown. But since you asked, I also had a My EGL category of people who volunteered to
collect the EGLs and I ran them through it as well using their own Everyday Gesture
Libraries. The rest of these people were tested against that one segment of my library.
And it turns out there wasn't that much difference. People didn't use the video very
much. And I'll talk about that very shortly.
So I have them do this task and looked at how they did. People did really well with the
EGL. And when they didn't have it, they, as you might expect, did terribly. So there was
a very significant difference between people using the Everyday Gesture Library and
people not.
So if you had the Everyday Gesture Library, you got about two accidental activations per
gesture per hour. You might point out that that's lousy performance. If you didn't have
the EGL, you got 52 per hour, so almost one per minute per gesture. So that's really bad.
Now, the reason this one is so bad is because it turns out an accelerometer is not the right
sensor for this. Any small movement looks like any other small movement to the
accelerometer essentially because they're so low magnitude that they're very similar.
There's probably some algorithmic things you can do to fix that. Even better would be to
use a gyroscope in conjunction with some things that will improve the sensing.
But, in principle, the Everyday Gesture Library is great because it vastly reduces the
number of hits, and that would translate to other sensors.
I had four people who managed to get no occurrences at all.
>>: And success means the participant defined a set of gestures that didn't occur
naturally?
>> Daniel Ashbrook: Right. So I'm going to talk real quick about how goodness is
calculated now. Goodness is the harmonic mean of precision and recall. And precision and recall
are a little bit slippery, so I've got some pictures.
So on the left one there, basically what we're doing is trying to capture all of the orange
dots within the orange circle. And so on the left one there I've got all the
orange dots but I've accidentally got a couple white ones. And so there my precision has
been lowered.
And so in the same way, if I have a gesture example and it matches really well all of the
other gesture examples of that class but also matches a couple extra ones, then it's got
lower precision. So the goodness goes down.
The next one I didn't get any white dots, but I missed one of the orange dots, so my recall
has gone down. Here we've got both of them happened, and so the goodness score has
gone down even more. And then here we've done exactly what we wanted to do, and so
we have a hundred percent goodness. So that -- just to give an intuitive notion of what's
going on.
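In other words, goodness is the F1 score. A tiny sketch of the calculation:

    def goodness(true_positives, false_positives, false_negatives):
        # Harmonic mean of precision and recall.
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    print(goodness(5, 0, 0))   # caught every orange dot, no white ones: 1.0
    print(goodness(4, 2, 1))   # a couple of white dots and one miss: about 0.73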
And so when it came to goodness, the goodness actually basically -- or self-consistency
of gesture examples was really high. And people had 86 percent goodness in general.
Seven of them had nearly a hundred percent goodness on average for all of the gestures
they created.
As you would expect, it doesn't have anything to do with the condition of whether or not
they got the EGL, because it doesn't have anything to do with the EGL. We could make
it depend on the EGL as well. That's probably something to do later.
Now, the more fun stuff is the qualitative results, what people thought about it. It turned
out to be a really hard task to say sit down, define eight gestures that are going to control
something, the gestures can't conflict with each other, you can't have play activate fast
forward. And on top of that you can't have them look like something people do in their
everyday life.
That was really, really hard. And I was surprised people actually managed to do it at all.
But they did. And people actually really liked it.
In fact, one person even said this is so awesome I want to come back tomorrow and keep
on doing the experiment. So that was really gratifying. I thought that was great.
In terms of the Everyday Gesture Library, people were afraid of it because every time
they'd do a new gesture it would get thousands of hits in it. People didn't care about the
video. So that's kind of a -- I mean, it's good and bad, right? So the video was not
helpful, so that was sort of a failure. At the same time it says that we don't have to have that
hat -- as stylish as it is, you don't actually need to give it to people.
And, again, I think this is because with thousands of hits you're not going to go through and
you're not going to watch every single one of those. On top of that, people said I can't do
anything about it. You know, if it turns out that you rubbing the bridge of your nose is
conflicting with my gesture, I can't stop you from doing that, so I've got to change my
gesture. So it really doesn't matter what you were doing, it just matters that I had a
conflict.
This is one of the guys who recorded his own Everyday Gesture Library and he thought it
was interesting. He said it was really useful, but when I pressed him he couldn't really
say why it was useful. He just I guess got an intuitive feeling of what was going on and
maybe it helped him somehow. So I thought that was interesting.
So my graphs that I briefly showed you were not widely enjoyed. They were
complicated. There were some other graphs that you didn't see that were even worse.
And so people said, you know, these are -- these are -- they're useful but they are difficult
to understand. There's a high learning curve in figuring out what's going on.
So I think there's some interesting research to be done in making machine learning
concepts accessible for people who are not machine learning professionals to -- my
graphs, for anybody who is a machine learning person, my graphs are basically taking
your confusion matrix and splitting it out and visualizing it in various ways. Confusion
matrices are confusing for nonexperts. My graphs are a little bit better, but probably not
much. So I think there's some more interesting work to be done there.
So I gave a questionnaire. One of the questions I asked them was, well, would you like
to actually own this thing that you've just designed with all your gestures. And the
[inaudible] deviation here don't tell the story very much. Some people said absolutely
not, it was horrible. A lot of people said, yeah, I'd love to have that.
What was really interesting is that there's no correlation here between how well they did
and the quality of their gestures. So people with really awful gestures might have said
anyway, yeah, that would be great, and people who made good gestures might have said
no, that was terrible. So I thought that was pretty interesting.
I also asked them what else would you like to control with gestures, if anything. And I
had a lot of really interesting responses. So there were several people who said cell
phones, presentation software. I had a bunch of media equipment in the house and in the
car. There was controlling a robot to basically pick things up for people with physical
disabilities. Replacing the mouse.
But my favorite one was the Roomba mess indicator. And there was no other
information given on the survey that was given, but, you know, you can imagine what
that would end up looking like. So I thought that was pretty cool.
So now I'm going to talk about the strategies that people used for designing these
gestures. And I thought this was really interesting. So I basically went through all the
videos and figured out here's what people are doing to make these gestures memorable or
make them not show up in the Everyday Gesture Library. And I'm going to show a video
for each one so you can see what's going on.
So the first strategy is to make things iconic. An iconic gesture is basically one that looks
like something else. And so this guy said, well, I'm doing this play/pause gesture and it's
like pushing a button. And so he does this push thing. So I'll play that a couple more
times so you can see, because it's short.
And so iconic is sort of in your head. It's this is related to what I want. And so that's the
push for play/pause.
So directional gestures and paired gestures are often associated with each other. So this
person did next and previous track, and they were related, and the directions were also
important. So for next track she's going to go to the right, and for previous track she's
going to go to the left. So play next track first.
So she just does the sweeping thing. And previous track is just the opposite direction. I'll
play them simultaneously so you can see. So the directions are important. And the fact
that they're opposite is also important because they're opposite functions.
Some people did impacts. So this guy's clapping. There's also hitting, just hitting your
leg or hitting the table, things like that. So these -- I thought that these might actually end
up not showing up in the EGL, but there's actually -- I should mention none of the
strategies people used actually had any significant impact on anything. So they're
interesting, but the person who's doing it was the most important thing. So that's
interesting as well.
I like these. This is the best gesture in the whole set. Basically these are people who
independently figured out push to activate. So this guy said he was explicitly thinking
about Star Trek and saying "computer" before addressing the computer. And so he does
this pre-fix thing.
So what it is, for every gesture he does he makes a particular motion and then he does the
gesture. So what he does is, first of all, he cups his hand on his ear, because it's an audio
player, then he brings his hand down and he spins it in a circle, and then he does his
gesture.
Now, in this case it's the shuffle gesture. I should also mention that this, during my
defense, got my committee making Three Stooges impersonations. So cupping his hand,
moving his thing, then shuffle.
And I'll play that a second time because it's awesome.
>>: Very elaborate.
>> Daniel Ashbrook: Yes. Very elaborate. He actually explicitly said I know this looks
crazy, but we have people wandering around with Bluetooth headsets on talking to
themselves, and so I'm just assuming explicitly that whatever I'm doing here is going to
become as accepted as Bluetooth headsets. I didn't tell him I don't think Bluetooth
headsets are very acceptable, but...
So this next guy, he did a post-fix gesture, which was much less elaborate and not as
much fun. But basically at the end of every gesture he does he rotates his wrist. And so
I'll show you what that looks like.
So basically this could be thought of as a confirmatory gesture. So I do the gesture and
then, oops, I didn't actually want to do that, so I just don't rotate my wrist and then it
doesn't happen. So I thought that was pretty interesting. He was the only one who came
up with a post-fix gesture instead of a pre-fix.
So some people did jerks and directional changes. So, for example, if I was drawing the
letter A, that change of direction at the top would be a directional change. I could also do
something like that, which, you know, it's not an impact, but it's going to show up in the
accelerometer in much the same way. There's going to be a huge spike in acceleration
because I'm changing direction so quickly.
He's got just this sort of sweeping thing going on. That is -- here the direction doesn't
actually matter. So that's why I didn't file it under directional, because he's moving in a
particular -- he's making a directional change, but it doesn't have any relationship to the
gesture he's actually doing.
A lot of people did repeated gestures. A couple of them explicitly said this is because I
don't think people in the Everyday Gesture Library have done this thing multiple
times, so he does this thing a couple of times. You know, I had people like brushing their
arms several times and just various things that they did multiple times.
Finally, this one's pretty interesting. This is what I'm calling retrospective realization,
and it's not exactly a strategy. Basically what happened is a couple of people realized that
unintentional movements were affecting them. So the way that the recording works is
you say start recording, and then it listens, and when you start moving, it records, and
when you stop moving, it stops recording.
So what this guy did is for this top one he does this sideways brushing motion, and then
he stops and it stops recording. Well, then he noticed that one of the
examples had a really high goodness, as opposed to the rest of them, which were really
low.
So we went and watched the video for it and did the brushing and realized, oh, I didn't
pause long enough, I put my arm down. So I'll play those again so you can see them.
And so it turns out that putting the arm down motion actually vastly improved the
goodness of his gesture, and I think it even improved the number of hits in the Everyday
Gesture Library. So we went back and deleted all these and made his arm go down in all
of them.
So I thought that was really interesting, a really fascinating use of video in the system.
Okay. So nearly done. So I -- these are the goals of my study. I found out that people,
yes, they liked Magic. They thought it was great. They had some interesting strategies.
The EGL was very useful, and users did not make anything even close to a common gesture set, and that was very interesting too.
So the only thing that was even vaguely close to common was for shuffle. I think four of
the 20 people I had did a shaking motion. So we had the guy you saw with the shaking
on both sides of his head, and the other people just did a shaking kind of thing.
So shake to shuffle was the only thing that was even close. All the rest of them I had
widely varying things, from one guy who said he was visualizing a box in front of him
and he was hitting different parts of the box to indicate things, another guy who shot his
arm up in the air and then he'd draw like a plus sign for up and then a V for volume.
Yeah. I just had this huge variety of stuff. So that was interesting. There was nothing
that was even close to having a consensus about things.
Okay. Last part is I want to talk about some ideas I have for stuff in the future.
So, again, the things I've been thinking about are these various impairments. Gestures may not, in fact, at least as implemented in my experiment, solve the social impairment problem, certainly not with putting your hand on your wrist and rotating and shaking and so on.
But I think the other ones might have been solved. But going forward, what other places
can you put technology? I've been thinking about the wrists. I think there's really a
strong possibility for basically an on-body ecology of devices that all communicate with
each other in various ways and you can put them on and take them off and use them in
various situations for different things.
So I've also been thinking about what else can you do with the wristwatch. So, you
know, I talked about the face for input. But what can you do for output. And I think that
looking at output for various things could be really interesting.
So one of them is RSVP, Rapid Serial Visual Presentation. So this is where you flash
words in one place really quickly so you don't have to move your eyes. So you could
imagine doing that on a watch. And if you look at the orange circle, I will show an
example that will go by very quickly.
So just a very fast thing. And so you could imagine using that, looking at your watch,
and then when you don't want to do it, you can train yourself to just turn your wrist and
have the presentation stop. So that's something I think would be interesting to look at.
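A minimal sketch of that RSVP idea: words flashed one at a time in the same spot until the wrist turns. The display and wrist-turn callbacks here are hypothetical placeholders, not a real watch API:

    import time

    def rsvp(text, show_word, wrist_turned, words_per_minute=300):
        """show_word(word) draws a word; wrist_turned() returns True to stop."""
        delay = 60.0 / words_per_minute
        for word in text.split():
            if wrist_turned():        # user rotates the wrist to dismiss
                break
            show_word(word)
            time.sleep(delay)

    # Example usage with stand-in callbacks:
    # rsvp("you have a meeting at three", print, lambda: False)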
Now, I talked about using the bezel as something to put your finger against. You could also think
about doing input in other ways, like rotating it. I'd love to get haptic feedback on that so
I could have it resist as I turned it or have different detents as I turned it in various ways
to do interesting things.
You could think about using the sides of the watch to maybe put little pegs that would
stick out in various situations. You know, at CHI we had the inflatable buttons on
various surfaces so you could imagine something like that, so you could just feel it and
get some indication of what's happening.
There's actually a watch -- a commercial product that allows you to tell time by touching
your watch. As you move your finger around the rim of the watch, there's little divots or
little pegs that stick out. And when you pass the one that represents the hour it buzzes, and when you pass the one that represents the minute it buzzes twice. So you can be sitting in a meeting and just move your finger around it without ever looking at your watch, and you can tell what time it is.
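A rough sketch of that touch-the-rim behavior: the finger's angle around the bezel is compared to where the hour and minute hands would point, with one buzz for the hour and two for the minute. The buzz() call and the angular tolerance are assumptions for illustration, not the commercial product's design:

    def check_rim_touch(finger_angle_deg, hour, minute, buzz, tolerance=7.5):
        hour_angle = (hour % 12) * 30 + minute * 0.5   # hour hand position in degrees
        minute_angle = minute * 6                      # minute hand position in degrees
        if abs((finger_angle_deg - hour_angle + 180) % 360 - 180) < tolerance:
            buzz(times=1)    # once for the hour marker
        if abs((finger_angle_deg - minute_angle + 180) % 360 - 180) < tolerance:
            buzz(times=2)    # twice for the minute marker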
We also have this giant surface on the band that nobody's thought about taking advantage
of. You know, you could have that be a touch surface or, you know, thinking way far in
the future you could have extra display on there, something. You can imagine an entire
bracelet that does various things.
One of my favorite ideas, I often twiddle with my wedding ring. I turn it around and
around and around. So you could imagine having some sort of a connection between the
ring and the watch. You could have a tuned circuit so the watch could actually sense
what position your ring is in, and you could use that to scroll or something.
You could also have a ring on each finger and use it as a password authentication. So I
turn -- turn the rings to a particular position, and then every keystroke I type, it makes
sure that I'm doing the right thing. And so if anybody steals the rings, they've got to
know exactly what position they have to be in and what fingers they've got to be on or
something like that.
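A minimal sketch of that ring-combination idea, where each ring's sensed rotation has to match an enrolled combination before input is accepted. Everything here, the angles, the tolerance, and the sensing itself, is hypothetical:

    def rings_match(sensed_angles, enrolled_angles, tolerance_deg=15):
        """Both arguments are lists of ring rotations in degrees, one per finger."""
        if len(sensed_angles) != len(enrolled_angles):
            return False
        for sensed, enrolled in zip(sensed_angles, enrolled_angles):
            diff = abs((sensed - enrolled + 180) % 360 - 180)  # wrap-around difference
            if diff > tolerance_deg:
                return False
        return True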
Finally, I think it would be very interesting to interact with other surfaces. So I have a -- for example, a microphone in the watch. It turns out if you put a microphone on your
wrist you can hear your fingers move, you can hear your tendons creak, which is kind of
creepy but cool. You can also hear your fingers tap.
And so one of the guys in our lab did some playing around with that. You could actually
distinguish between rubbing your fingers like that and snapping them and so on. But I
was thinking I'd put that down on the table. Now, if I tap, then I can easily pick that up
on the watch. And I can imagine, you know, putting down your cell phone and putting
your wrist down and then triangulating between the two of them or something and being
able to interact with the surface in some interesting way.
Turns out there's a company in France doing that already. Not exactly this, but they have
I think three microphones they stuck on a surface, and then you have to train it and it
builds a huge table of what the different sounds are like and where they are. But that's
pretty cool.
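A rough sketch of that trained-table approach to tap localization: record taps at known spots, store a feature vector for each, then look up new taps by nearest neighbor. The feature extraction is left as a stand-in; a real system would use something richer than a bare feature vector comparison:

    def nearest_location(tap_features, trained_table):
        """trained_table: list of (feature_vector, (x, y)) pairs from training taps."""
        best_loc, best_dist = None, float("inf")
        for features, location in trained_table:
            dist = sum((a - b) ** 2 for a, b in zip(tap_features, features))
            if dist < best_dist:
                best_dist, best_loc = dist, location
        return best_loc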
So that's the end. Happy to take any questions.
[applause]
>>: When people were making up their gestures, you apparently didn't tell them to do
things that were socially acceptable, right?
>> Daniel Ashbrook: Oh, yeah, so that's a slide that got lost from my dissertation
defense. But I did actually. I had several criteria. I said the gestures should be reliably --
should be able to reliably activate the function that they're supposed to. They shouldn't
activate other functions. They shouldn't be something that happens in everyday life.
And I had the qualitative ones of they should be easy to remember, and they should be
socially acceptable. And basically because the gesture recognition was so hard, you
know, if I do a little jump like that to the left and a little jump like that to the right, they're
going to show up in the system as exactly the same gesture. Just because they're so low
amplitude, they look just like each other.
So because of that sort of thing, people quickly ignored social acceptability and gravitated to huge, weird gestures just to try to get something that, A, is not going to conflict with other gestures and, B, isn't going to show up in the library.
You know, I had people do a little hop to the right, and then that would show up in the gesture library as me doing this on the keyboard -- you know, moving my hand from the trackpad to the keyboard. So a better sensor and better algorithms would very likely make social acceptability more achievable.
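A minimal sketch of what counting "hits" against an Everyday Gesture Library could look like: slide a window the length of the candidate gesture over the everyday recording and count how often it looks too similar. The plain squared-distance measure here is deliberately simplistic; a real matcher would more likely use something like dynamic time warping:

    def egl_hits(gesture, egl, threshold):
        """gesture, egl: lists of (x, y, z) samples; returns the number of hits."""
        n, hits = len(gesture), 0
        step = n // 2 or 1                                   # 50% window overlap
        for start in range(0, len(egl) - n + 1, step):
            window = egl[start:start + n]
            dist = sum((a - b) ** 2
                       for g, w in zip(gesture, window)
                       for a, b in zip(g, w))
            if dist < threshold:
                hits += 1                                    # everyday motion looks like the gesture
        return hits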
>>: I wonder if -- so you would be an expert on making up these gestures, right? Do you think yours would be -- do you think your gestures would be appreciably better --
>> Daniel Ashbrook: No. Not at all.
>>: -- than these amateurs'? No?
>> Daniel Ashbrook: So I sort of breezed through it. The people I had doing it were
largely HCI people. So I specifically recruited HCI Ph.D. students and so on from
Georgia Tech and asked them, you know, put the full force of your HCI behind this and
figure things out.
You know, I had one girl who spent an entire hour just drawing on paper trying to think
about different things and so on.
No, I think my gestures would be just as lousy, to be completely honest. I mean, I think
it's almost impossible to make good gestures with that particular sensor. So -- yeah.
>>: How are you interpreting John's notion of better? Does this [inaudible] are you
thinking accuracy?
>> Daniel Ashbrook: I'm thinking everything. I'm thinking accuracy, maybe I could -- I
mean, I had people -- I had people do like a hundred percent goodness on every single
gesture. So people were good at making a bunch of gestures that were differentiable
from each other. I had people get zero hits on the Everyday Gesture Library, so people
were able to do that.
So I could probably also do that, but I don't think I could do that in any more socially
acceptable way or any more memorable way or anything like that. I think that that would
be very difficult to actually do.
>>: Given your experience, what do you think is the acceptably large or small set that's
relatively easy to create? Given the constraints of not overlapping and socially -- relatively socially acceptable [inaudible] --
>> Daniel Ashbrook: Yeah. So --
>>: -- [inaudible] memorable?
>> Daniel Ashbrook: I had a hard time thinking about that for the experiment. I was
like, how many gestures do I want people to make. Because I can come up with an
infinite variety of tasks. And I was thinking, well, do I want to -- you know, these are
all -- the ones I had them do are fairly -- things that were fairly iconic in general. You know, play and pause maybe aren't, but fast forward and rewind and so on have directions, and you naturally associate them with that.
Then I thought, you know, should I have -- should I like ask people to make a gesture for
like Celtic music, like bring up that genre. Or, you know, play playlist No. 7, would
everybody draw a 7 in the air or would I have other things.
So I was trying to think of what's a reasonable number of these things to ask people to do
and so on. And in pilot testing, I found that people were just able to do eight, and it
seemed to be hard. I think more than eight would start getting very difficult. Again, it's very dependent on the sensor.
>>: [inaudible] including if you have an eight [inaudible] relatively well [inaudible]
making a ninth one is really difficult?
>> Daniel Ashbrook: Not necessarily.
>>: [inaudible] memorization problem [inaudible].
>> Daniel Ashbrook: So I didn't test memorization. I thought about doing that. But in
the end I realized, you know, we have sign language. People can clearly memorize a
large set of gestures.
And by the same token, you can clearly make a large set of gestures that are recognizable
to people. So American Sign Language is an entire language. It's got lots of different
constructs and so on. But because it's human-to-human communication, it's incredibly
high fidelity. You know, there are things that are very subtle about sign language that
involve finger position and so on. If you could sense all of these things perfectly, you
could probably make an unlimited variety of gestures.
With this particular sensor, it's very low fidelity, it, you know, has lots of problems with
gestures looking exactly the same as each other and so on. And so with that particular
sensor I think that you would have a difficult time making more than -- you know, I
would wonder if you could easily make a dozen gestures.
Now, I only gave people three hours also. And so in that -- you know, if I gave you a
month, you could probably have more time and -- but yeah.
>>: I think [inaudible] all this stuff and all the gesture recognition stuff you only did
discrete gestures, did you think about -- I mean, certainly some of the applications or
commands you're thinking of don't necessarily map well to discrete things, like, you
know, volume up, volume down, fast forward, rewind. When I build a tabletop for doing
video controls or whatever, right, I specifically don't do it with buttons, right --
>> Daniel Ashbrook: Right. Exactly. Yeah.
>>: -- I usually do it with all sorts of other controls because there's better mechanisms.
Have you thought about how you can take what you've done so far and maybe make
gestures that give you better control than --
>> Daniel Ashbrook: Yeah. So --
>>: -- like rewinding, stop rewinding?
>> Daniel Ashbrook: Certainly you could do that. I mean, it would be very easy to do.
You could sort of fake it with discrete gestures that would be like my fast-forward button,
my rewind button or my, you know, fast forward and stop fast forward kind of buttons.
To do actual continuous gestures where I say, okay, as long as I'm doing this, you should
keep on doing volume up, I think that's definitely an interesting thing to look at. I'm sure
that somebody has --
>>: I mean, that just [inaudible].
>> Daniel Ashbrook: Yeah, yeah, I mean --
>>: But there's [inaudible].
>> Daniel Ashbrook: Yeah, there's a whole -- there's a huge range of expressiveness of
the human body that I've completely ignored.
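A minimal sketch of the "fake it with discrete gestures" idea mentioned above: a recognized start gesture keeps an action repeating until the matching stop gesture arrives. The gesture labels and the set_volume callback are hypothetical:

    def run_volume_control(gesture_stream, set_volume, step=2, start=50):
        """gesture_stream yields labels such as 'volume_up_start', 'volume_up_stop', or None per tick."""
        volume, ramping = start, False
        for label in gesture_stream:
            if label == "volume_up_start":
                ramping = True
            elif label == "volume_up_stop":
                ramping = False
            if ramping:
                volume = min(100, volume + step)   # keep ramping while the gesture is "held"
                set_volume(volume)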
>>: Like if I was trying to tell you silent -- like let's say somebody was doing a talk and
you were standing next to them and you were in charge of controlling the microphone
control and I was trying to tell you from across the room inaudibly how to do it, right,
and we hadn't talked about this ahead of time, right, I could come up with a way really,
really fast to tell you how to do it and to do it more, no, no, no, bring it back down, and
we would probably be able to communicate --
>> Daniel Ashbrook: Yeah, absolutely.
>>: -- and you'd probably be able to tell me that I've got your attention and we'd be
able to do this without ever having done it before, right?
>> Daniel Ashbrook: Yeah.
>>: Do you think that that's just because, you know, our visual system, all that, is just too
rich? I mean, can you replicate any of that in this kind of [inaudible]?
>> Daniel Ashbrook: To a certain extent. I mean, I think that starts being an AI-complete problem where you need to have the complete range of like culture embedded
in the computer to understand. I mean, in some cultures, you know, they shake their
heads for yes and bob their heads for no. Stuff like that.
But a lot of that's just a question of gesture recognition in general. You know, I could
certainly build a system that would work in very constrained situations where I wasn't
doing anything else and I can very specifically define the gestures in a particular way.
But I think as you get more free, it gets a lot harder.
But I'm not necessarily an expert on the gesture recognition aspect of it, because there are
huge math conferences devoted to gesture recognition techniques. So there might be
something out there. Yeah.
>>: By the way, have you [inaudible] how well people can remember the gestures? For example, you ask them to come back a week --
>> Daniel Ashbrook: Yeah. I thought about that. I mean, the -- the real answer is I
didn't do that because this was good enough to get me graduated. But I did think about
that. I think that would be really interesting.
I actually ran into somebody and asked them if they remembered any of their gestures,
and they actually remembered at least a couple of them. But I -- yeah, I mean, they came
up with these really weird things, how much did they connect those really weird things
to, you know, how visceral were they.
You know, some -- I'm sure it would vary hugely with how much time they spent on each
one. And, you know, I had some people who right at the end went back and changed a
gesture and then they were done. And so they probably wouldn't remember that one as
well as the other ones. So I think it would be interesting.
>>: I can also -- can see the case actually they have a conflict, the creation [inaudible].
>> Daniel Ashbrook: Yeah. Yeah. I mean, it's a huge question. And, you know,
there's -- that's kind of related to the question of if I'm Microsoft and I'm making a new
watch to give to people and I -- do I spend a huge amount of time figuring out a set of
gestures that I'm going to give to everybody, or do I send it out and let them use a tool
like Magic to generate their own gestures.
If I say here's the standard set of gestures for fast forward and rewind or so on, is that
better or worse or just different than saying here you go, figure it out yourself.
>>: What's your intuition? Because you made a -- your study setup was almost in a -- I
couldn't tell if there was an assumption or you really wanted to know what different
people made as gestures.
>> Daniel Ashbrook: So -- what do you mean?
>>: Like there could be an assumption that letting you make your own gestures is more
memorable and, thus, you know, I gave all these people these tasks to do and they
[inaudible].
>> Daniel Ashbrook: So --
>>: -- and it didn't really matter that your gestures are different than mine.
>> Daniel Ashbrook: Right. So -- so my -- my assumption was basically you guys are
HCI professionals. I want you to be HCI professionals doing this. So my supposition
was I'm pretending you are working for Microsoft doing this. You're going to be the
people who are actually defining the default set of gestures that are going to go out to the people who are going to use them.
>>: So they didn't think they were making, you know, Daniel's set of gestures --
>> Daniel Ashbrook: Right.
>>: -- they thought they were making these gestures [inaudible]?
>> Daniel Ashbrook: Right. Although, that being said, that being said, I -- my -- the
people who recorded their own EGLs and came to the experiment, they were not HCI
professionals. They were just like, you know, friends of my wife and stuff.
So when they came and did it, it was very different. And I think they were thinking more
I'm going to make this for myself. And so I asked one of them, and I said, you know,
would you like to, you know -- would you ever want to do this for real. And she said it
was really hard and it was really frustrating, but, yeah, I think I would -- I think given this product I would actually take it into my home and I would probably take a week to do it. I would spend a couple hours at a time on it, figuring these things out. But I think
it would be enjoyable to do and I think it would be neat to have something that I could
actually customize myself. So, yeah, I don't know.
>>: You have this library of everyday gestures. Is it even feasible to sort of somehow
search it for spaces that are -- would be reasonable by the rule sets for [inaudible]?
>> Daniel Ashbrook: Not with that sensor. So the problem with an accelerometer is you
can't back-trace from the acceleration to the motion that generated it.
So if I -- if I was really good at math and I was going to do another Ph.D., I think it
would be interesting to get an actual like motion capture kind of thing, build an inverse
kinematic model of the body, and then say, okay, you know, here's an input gesture, oh,
that doesn't work, but now I've got this model of what it looks like, let me try to permute
it in various ways and then predict what the sensor values will look like based on those
and try them.
And so do that really quick and figure out what sort of motions -- what sort of variations
on the input motion can I then come up with. I think that would be super, super hard,
though.
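A minimal sketch of just the forward-prediction step of that idea: given a (possibly permuted) 3D wrist trajectory, the accelerometer reading is roughly the second derivative of position plus gravity. A real pipeline would also need sensor orientation, noise, and a proper kinematic model; none of this is from an existing system:

    def predict_accelerations(positions, dt, gravity=(0.0, 0.0, 9.81)):
        """positions: list of (x, y, z) in meters sampled every dt seconds."""
        preds = []
        for i in range(1, len(positions) - 1):
            accel = []
            for axis in range(3):
                # Central second difference approximates the acceleration on this axis.
                second_diff = (positions[i + 1][axis]
                               - 2 * positions[i][axis]
                               + positions[i - 1][axis]) / (dt * dt)
                accel.append(second_diff + gravity[axis])
            preds.append(tuple(accel))
        return preds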
>>: Let's try and get back to [inaudible] so you said that, you know, maybe with better
sensors or something like that we might be able to sense gestures that didn't conflict with
the Everyday Gesture Library but weren't socially awkward. Do you really think that's
true? I mean, what do you think the advance in -- what advances in sensing technology
[inaudible] to make that upon? And like how does -- how do you [inaudible] gestures of
people [inaudible] and there would be appropriate kind of fit with what you're talking
about in the very beginning, like the motivation in terms of here are the situations where
speech doesn't work or here are the situations where other things don't work. I mean, how
do these gestures fit in with that landscape?
>> Daniel Ashbrook: Right. So the gestures people -- so I'll answer the second part first.
The gestures that people actually created probably don't fit in very well with that
landscape. You know, I can't do this thing while I'm holding a shopping bag. It's clearly
socially inappropriate. I can't do that in a crowd. You know, these are clearly very poor
gestures for actual real use.
But at the same time I think that, again, given a better sensor, and so let's say I have -- so
with the gyroscope and a magnetometer and a temperature sensor and an accelerometer
all in a little thousand dollar box, you can actually make an entire motion capture system.
So this company called Xsens [phonetic], they're Dutch, they sell this like tight-fitting
cat suit that has these little sensors in them. And so their demo video is this woman
standing there doing this stuff, and the figure on the screen is doing it. She goes off the
video screen, comes around -- she's gone outside the building and has come back around
and is standing next to the window now and it's still synchronized with her perfectly.
So given these kind of sensors, I think that you can start doing much more subtle things.
I could just do little motions like that.
>>: Do you think those will be subtle things that don't conflict with Everyday Gesture
Library?
>> Daniel Ashbrook: Maybe. Maybe not. So one that I've thought of that I'd like is this
sort of wrist flick kind of thing. You know, shut off the cell phone, something like that. I
think that, you know, I can do that down here. It's not --
>>: So I guess a related question is, is that the right way to go? Or is the right way to go
to be as clever as possible about the [inaudible]?
>> Daniel Ashbrook: Right.
>>: Right? Like if you spend all -- if you spend two months or three months or two
years trying to figure out the best, most subtle, reliable, robust activation system possible,
do you need to even worry about EGL after that?
>> Daniel Ashbrook: Yeah. It's a really good question. I don't know. Certainly there
are a lot of situations in which you can basically cheat. You know, it only responds to
my motion to shut off the cell phone when the cell phone's ringing, for example. I mean,
that's a no-brainer.
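A minimal sketch of that kind of context gating: a silencing gesture is only acted on while the phone is actually ringing. The phone_is_ringing() check and the gesture label are hypothetical placeholders:

    def handle_gesture(label, phone_is_ringing, silence_ringer):
        if label == "wrist_flick":
            if phone_is_ringing():
                silence_ringer()      # context makes the gesture unambiguous
                return True
        return False                  # otherwise treat it as everyday motion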
We can pull in a lot of context and start making things better. There are certain things
where that won't work. You know, when I'm responding to the computer, then context is easy. When I'm instructing the computer, then context is harder. Did I -- you know, is
there any context that will say did I really mean to start playing my music. Probably
some. But --
>>: If you go with a pre-fix gesture or something to do the activation, right, I don't care if it's grinding your teeth or whatever --
>> Daniel Ashbrook: Yeah. There's a --
>>: -- if you use the EGL to figure out -- I think this was Scott's point, right, using the
EGL or something to figure out the perfect activation.
>> Daniel Ashbrook: Yeah. Exactly. Yeah. So there's --
>>: [inaudible] thing.
>> Daniel Ashbrook: Yeah.
>>: And then you use the prefix technique [inaudible].
>> Daniel Ashbrook: Right. Yeah. So I think there's a lot of things you could do.
>>: You might want to actually map the activation to something that's not [inaudible].
>>: Yeah, yeah, yeah, like a button.
>>: Yeah. I'm just saying --
[multiple people speaking at once]
>>: If you figure out something that's perfect activation, it almost doesn't matter what -- I
mean, you can still -- you still need to be sure the gestures don't overlap with each other, not necessarily [inaudible] against everything else --
>> Daniel Ashbrook: Although, imagine jogging. So all of my body is in motion, there's
lots of acceleration. I figure out the perfect activation gesture, and then every motion I
make while -- after I've done the gesture I still have to worry about conflicting with what
I'm doing.
So if I have a perfect activation gesture, the perfect activation could be me stopping stock still and then doing the gesture I want. Something like that. But, you know, in real
situations, it's going to be a lot harder. And this may just be an impossible task.
>>: With accelerometers.
>> Daniel Ashbrook: With accelerometers.
>>: And on that note...
>>: I have one more question.
>> Daniel Ashbrook: Yeah.
>>: You outlined a bunch of scenarios that people use to create these gestures. Given
the experience of watching all these people do that, do you have a recipe of if I give you a
gesture what would you do first to try to make it more reliable?
>> Daniel Ashbrook: No. People were really like ridiculously varied. Like --
>>: I know. But in terms of like you as [inaudible] --
>> Daniel Ashbrook: You mean if you give me what would I do. Ah. Interesting.
>>: Do you have an observation of like -- I see [inaudible] I give you a gesture --
>> Daniel Ashbrook: Right.
>>: -- designed with your system, it overlaps, what's the first thing that you would try to
do to fix it?
>> Daniel Ashbrook: Probably impact, I think. Just it feels like -- it feels like a small
motion that I can do that --
>>: Just making corrections --
>> Daniel Ashbrook: -- impact or a really hard jerk to -- something that's maybe -- isn't
going to be -- isn't going to show up as much.
But, you know, again, we have impacts all the time. I accidentally bump things, you
know, I grab things real quick. So it might not be a good strategy. So, I mean, none of
the strategies had any significant impact on anything. So, yeah, it's a really super hard
problem as it turns out.
>> Amy Karlson: Well, I don't know if I'd wrap this up officially, but we'll let everybody
go. And thank you so much for joining us today.
>> Daniel Ashbrook: Thank you. And thank you all for hanging around and asking
really interesting questions.