>> Andrew Kun: Okay. So I guess the one thing is please feel free to interrupt with questions at any time you like. And I do want to start this by thanking two people: Tim Paek, who's been collaborating with me for almost two years now, and a lot of the work that I'll be talking about here he's been participating in, so thanks, Tim, and looking forward to continuing this; and also Ivan (phonetic), who invited to me to this talk, and hopefully we'll have chance to collaborate as well. I want to acknowledge my students who are generating all the data that we're going to be talking about here today: John, who's working on a navigation experiment; Oskar, who's in the audience here who is working on a press to talk experiment; Jacob, who's a Microsoft Research intern this summer actually with Tim and also working on the mapping experiment; and Owen, who did a lot of the video work you'll see here; Alex, who's working on a human-human study that I'm not actually going to have a chance to talk about much, but we're basically learning from human-human interaction, trying to form our human computer interactions from that. And then Puneet (phonetic), a summer intern who's working on one of our next steps, which is creating the UNH obstacle test, which I'll talk about soon. So a short outline. I just want to introduce the topic by talking about ubiquitous computing in cars and what the relationship is, and then basically talk about the studies that we've worked on and say a couple of words about what's next. So ubi-comp in cars, the idea is ubiquitous computing, of computer being everywhere, networked but sort of fading into the background. I just did a little quick online search, and, you know, Intel certainly thinks that there's something to this. This is a thing from their Web site where they're thinking about the car being at the center of this entertainment and communications and all these things coming together in the car. Navigation is certainly something that we see a lot in cars. And I did like this picture from the Web site of one of the major manufacturers. Does it look like the person's driving to you? The left hand there seems to imply that this is a person driving. And as you'll -- you can probably guess and you'll also see from our data, this would not be such a good idea, right, driving and doing pointing at the same time. But, anyway, so I like that picture. The Zune, you can buy that with the in-car attachment and, you know, iPhones are everywhere. So certainly cars are getting into this age of ubi-comp where things are getting into cars. So how will this progress for cars? And one person who's in this field, Russell Shields, who is the CEO, I believe, of Ygomi, thinks of cars as docking ports, or cars are going to become docking ports in his opinion. So what is a good docking port? In my opinion that would be something that provides you with an open interface so that you could in fact dock into it. Now, if you're a car manufacturer, this sounds good on the one hand because, as we've seen in the slides before, people are bringing in these brought-in, third-party aftermarket devices. And that's sort of a reality they have to deal with, and perhaps there is value to saying, look, my car is an open interface. But then there's the liability issue too. So if you're a car manufacturer and someone plugged in their MP3 player and then crashed because they were playing with it, will you get sued. And also there is the profit issue. 
So if someone else is producing the faceplate for the new radio and you're not the one determining what the size and the shape of it is, then you're not making that extra money. So there are certainly things that drive car manufacturers towards opening up the car and sort of going into the ubi-comp age, and then there are definitely pressures that are pushing in the opposite direction. Now, having said that, the background we come from is police cars. Because, as many of you know, we have this effort called Project54, and I'll talk about that in a second. But police cars, the difference between a police car and my car is that while I might have electronic devices in my car, they're really toys. So if I don't have my MP3 player, even if I don't have my cell phone or navigation device, I can probably drive. People have done this for many years and they were okay. But you're not a police officer if you don't have lights. You're not a police officer if you don't have a radio. You're not a police officer really now even if you don't have a computer that you can run license plates on. That just is part of policing. So if you're a police officer, there are things that you have to have, these electronic devices. And so in a very real sense, police cars are the vanguard of ubi-comp in cars. Because they actually both have these devices and really need them and want to use them on a daily basis. It's easy enough for us to say, look, just don't use your cell phone, it's prohibited, and ubi-comp-problem solved. But for police this is not a way to go. So I think that if you're going to do research in this area, in fact working with police is a very nice place to be. So for some of you a reminder, for perhaps others a quick introduction into this Project54 system, basically lots of devices, as I just said. We provide a way to integrate them into a single system and provide a single user interface which has a voice modality so you can issue voice commands to turn lights on, to run license plates and these sorts of things. And there's also an LCD touchscreen as well as the original user interfaces. So if you're resistant to technology or, for that matter, if your computer crashes, you still have the fallback of everything works the way it used to work 50 years ago, but ideally you can actually take advantage of the new technology as well. Okay. So hopefully that sets the stage for some of the studies that we've done. There are two problems that the studies that I'm going to talk about address. And the first one is that there's no clear -- we don't quite know how in-car devices affect driving performance. So there are obviously people working on this. But there's no formulaic way to figure out if you put a device into a car how will that affect driving performance. And then the other question is -- related is how does driving performance and your likelihood of getting into a crash -- how are those related. Just because you're not driving as well does not necessarily mean you're going to get into an accident. Primarily, so far we've been interested in the first question, really, which is how do in-car devices, the various in-car devices affect driving performance. And that's what I'm going to concentrate on today. Our goals are twofold. One, we'd love to have an evaluation tool. So it would be really nice if we could evaluate these in-car devices and say, well, this is how they affect driving, this is a safe device and this is not a safe device, or this is a safe user interaction design and this is not. 
And then we would like to propose ways to reduce potential distractions. We know that there are distractions. We can identify them hopefully with the evaluation tool, and hopefully we can propose ways to reduce them. Our major hypotheses in all the studies in this work are that what affects driving performance are things such as the user interface characteristics. For example, for our speech user interface, this could be the speech recognition accuracy, whether you have a press-to-talk button or not, and so forth. What we would call road conditions: are you driving at night or during the day, are you driving in the city or on a highway, are there curves or not. And then the psychological state of the driver: are you frustrated, are you happy. And, of course, all the interactions between all of these things. And in evaluating these hypotheses, we use a driving simulator. So what I'm going to talk about really today is the driving simulator studies that evaluate the major hypotheses and then, in fact, very specific hypotheses that come out of those. So the driving simulator looks something like this. And, in fact, there is a link up there if you want to learn more about this. There is a video and a little explanation about what the driving simulator does. But, in short, this is a driving simulator with a 180-degree view in the front. And so that's with three channels, so-called channels, so three screens. And you also have these side-view mirrors, as well as a rear-view mirror, and then a motion platform. The motion platform allows for feeling acceleration and deceleration only. But that turns out to be a nice feature because you can actually tell -- as it turns out, it's very difficult to stop at a line without feeling that deceleration. So that really helps with that. The driving simulator helps us evaluate our hypotheses through driving performance. And what constitutes driving performance? Well, primarily, we've looked at the variance of things like lane position, steering wheel angle, velocity, and the distance if you're following a car. So what position you are in the lane exactly doesn't really matter as long as you are in the lane. So are you at zero centimeters or are you at plus ten centimeters. It doesn't matter. What does matter, though, is if you're weaving in and out or if you're changing that position a lot. So basically the variance. Similarly, for the steering wheel angle, the mean is probably going to be zero, otherwise you're going to be in a ditch. But the question is how hard are you working. So you could imagine a situation where, in fact, your lane position variance is low, you're able to keep yourself in the lane, but you're really working hard at it. And that's a sign that something's going on; you're probably overwhelmed by something, whether it's the road being too difficult or there is something else going on in the car that's distracting you. People tend to slow down if they're overwhelmed. So velocity and velocity variance matter. And then people tend to have a harder time keeping their distance to a vehicle. They might just lose it and the variance goes up. Of course, there are simple things like lane departures. So if you're departing your lane very often, that means that you're not doing so well. Collisions and other things you can look at as well.
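To make these driving-performance measures concrete, here is a minimal sketch of how they might be computed from a simulator log; the column layout, sampling rate, and the half-lane-width threshold are assumptions for illustration, not the actual Project54 analysis code.

```python
"""Minimal sketch: summarizing driving performance from a simulator log.

Assumes the log holds lane offset (m), steering angle (deg), speed (m/s),
and headway to a lead vehicle (m) at a fixed sampling rate. The names and
the half-lane-width threshold are illustrative assumptions only.
"""
import numpy as np

def driving_performance(lane_offset_m, steering_deg, speed_mps, headway_m,
                        lane_half_width_m=1.8):
    lane_offset_m = np.asarray(lane_offset_m, dtype=float)
    # Lane departures: count transitions from inside the lane to outside it.
    outside = np.abs(lane_offset_m) > lane_half_width_m
    departures = int(np.count_nonzero(outside[1:] & ~outside[:-1]))
    return {
        # Mean lane position matters little; weaving shows up as variance.
        "lane_position_var_m2": float(np.var(lane_offset_m)),
        # Mean steering angle is near zero on a sane drive; effort shows up as variance.
        "steering_angle_var_deg2": float(np.var(np.asarray(steering_deg, dtype=float))),
        # Overloaded drivers tend to slow down and to vary their speed more.
        "speed_var_mps2": float(np.var(np.asarray(speed_mps, dtype=float))),
        # Car following gets sloppier under load: headway variance rises.
        "headway_var_m2": float(np.var(np.asarray(headway_m, dtype=float))),
        "lane_departures": departures,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 600  # e.g. 60 seconds at 10 Hz
    print(driving_performance(
        lane_offset_m=rng.normal(0.0, 0.15, n),
        steering_deg=rng.normal(0.0, 2.0, n),
        speed_mps=rng.normal(25.0, 0.5, n),
        headway_m=rng.normal(30.0, 2.0, n)))
```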
Now, the simulator is also equipped with a Seeing Machines eye tracker. Again, there is a URL if you want to take a look at more information about this eye tracker. The eye tracker has two cameras and also a couple of IR pods that we can use to illuminate the subject. And then what you do with the eye tracker is basically look at visual attention. So things like fixations, which would be looking at a particular spot for, let's say, over 100 milliseconds or over 200 milliseconds, however you feel the definition is appropriate. And then the number and the timing of these things, so how often do you actually look at the GPS screen. What is the timing; do you happen to do this before or after turns, or is there a voice announcement that prompts you to look at the GPS screen, and so forth. Scanning matters. Scanning meaning scanning left to right. Because especially in cities, that's very important; that's how you find out if there's a pedestrian lurking behind that parked car. And people tend to focus in on the road ahead as they get overwhelmed with, for example, in-car activities. Percent eye closure, or percent closed time, is another one people look at -- as it turns out, the way to find out if you're getting tired is whether you're starting to close your eyes for longer and longer periods of time. So if you have one of those devices -- truck drivers have these now in trucks where they start beeping because it notices that you're falling asleep. And one way to notice that you're falling asleep is that this percent closed time is increasing. Time looking at the road. Just, are you looking at things in the car or are you looking at things outside the car. If you're looking at things outside the car, that's probably a good sign that you're not going to crash. And so these would be things that we can use to evaluate hypotheses through our studies in the driving simulator, as well as with the eye tracker. So I wanted to talk about four studies today, and, again, please do feel free to interrupt if you have questions. An older study, the first study we've done, had to do with a police radio. And, again, having this police background for the project, it made sense to look at the police radio. And we looked at the speech user interface versus the hardware user interface interaction and how this affects driving performance. In the following study, the speech user interface characteristics were varied, and we looked at how that affects driving performance. And this is work with Tim Paek that was actually published at Interspeech last year. And then Oskar's work, also with Tim's help, was looking at the glove -- I'm sorry, we call it the glove press-to-talk button. So the idea here is what if you didn't have a fixed location for the press-to-talk button but rather you had it sort of floating on the steering wheel. But instead of instrumenting the steering wheel, you just put a glove on someone, just to get results quicker. And then we're also currently working with Tim on a navigation experiment where we're looking at the differences between people getting printed instructions, a graphical user interface plus speech, so that would be the state-of-the-art type of instructions, and then speech-only instructions, and this in the city, on the highway, and so forth. So let's take a look at this police radio study. For some of you this may be a reminder, but what we've looked at here is -- this is the picture of one of my students who's demonstrating this -- so we have the speech user interface, which here is a press-to-talk button, there is the microphone so you can talk to the computer and basically issue commands such as change the radio station to X, Y, Z.
Or you can do that by operating tiny buttons and looking at a tiny screen on a commercial police radio which sits in -- in fact, this particular radio is in all of the New Hampshire police cruisers, from maybe 1,500 of them between the state and local police cruisers And so you have relatively small buttons. You have to take your hand off the wheel and you have to look for feedback on this small screen. In fact, given that there is something like 200 channels, this actually isn't trivial; you kind of have to look at where you're going. And we found drastic changes in lane position variance, for example. You see that highly significant with the speech user interface, a lot smaller variance with the radio, hardware interface, a lot larger variance. And by the way, this was on a straight road. There were no curves. People were just driving on a straight road. And they had a harder time keeping in their lane. And then even more significant and even more dramatic difference in the steering wheel angle variance where the hardware interaction showed huge variance compared to the speech user interface variance. >>: (Inaudible?) >> Andrew Kun: Degree squared, sorry. In the previous one it was meter squared, so this is degree squared, yes. So having realized that there's -- having gone through past this, we wanted to get a little more information about how does speech user interface characteristics actually -- and which speech user interface characteristics affect your driving performance. So we've designed another experiment in which we had a sort of similar task, where you were supposed to interact with a police radio. But we actually took the police radio away. We made this a Wizard of Oz experiment basically. And we looked at speech control only. So the secondary task -- the primary task being driving, the secondary task was speech control of the radio. You have to issue commands such as change the channel to this, retransmit the message that just came in, go back to this channel and so forth. And then we varied a couple -- three things, specifically. We varied speech recognition accuracy. And, again, given that this was Wizard of Oz, it was easy to vary speech recognition accuracy, right? And we had a high condition or a low condition. And then we varied whether you have to use a press-to-talk button. So in one case you have to push down, hold, and then release, a fixed press-to-talk button, which in fact in this case was in the center console. And then the other option was that you had ambient recognition. So you did not need to be using a press-to-talk button. And finally we looked at the dialogue repair strategy; that is to say when the computer did not understand what you said. It either misunderstood -- that is, it executed the wrong command -- or it just said I didn't understand, please repeat. >>: For the accuracy, how high was high and how low was low? >> Andrew Kun: So 88 percent was high, and 44 percent was low. So it was truly low. We wanted to have extremes for this first study. And that's a good question. Of course you would want to have more graded result at some point presumably, but... People drove a scenario which the map looked like this, so basically it was a curvy road. And you want to have a curvy road so that people actually have to struggle a little bit of driving. Straight driving is obviously a lot easier than having some sort of curves in your driving scenario. 
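As a way to picture the Wizard-of-Oz manipulation just described, here is a minimal sketch of what the wizard is effectively simulating: a recognizer that is right with a fixed probability (0.88 or 0.44 in this study) and, when it is wrong, applies one of the two repair behaviors. The function and command strings are made up for illustration and are not the experiment's actual software.

```python
"""Minimal sketch of the Wizard-of-Oz 'recognizer' behavior described above.

With probability `accuracy` the spoken command is carried out correctly;
otherwise one of two repair behaviors applies: execute a wrong command
(a misunderstanding) or ask the driver to repeat. Illustrative only.
"""
import random

COMMANDS = ["change channel to 5", "retransmit last message", "return to previous channel"]

def wizard_response(spoken_command, accuracy=0.88, repair="misunderstand"):
    if random.random() < accuracy:
        return ("execute", spoken_command)                 # recognized correctly
    if repair == "misunderstand":
        wrong = random.choice([c for c in COMMANDS if c != spoken_command])
        return ("execute", wrong)                          # wrong command carried out
    return ("prompt", "I didn't understand, please repeat.")

if __name__ == "__main__":
    random.seed(1)
    for _ in range(4):
        print(wizard_response("change channel to 5", accuracy=0.44, repair="repeat"))
```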
And then what happened was that you see that we basically varied the accuracy from 89 to 44, and then with or without press-to-talk and then particular parts of the road they had to do one or the other. And so this is about one kilometer, so this is a reasonably long scenario. >>: In which direction they drive, clockwise or counterclockwise? >> Andrew Kun: Clockwise. So this was the starting point. So they headed up. And then there was a little bit of a warmup here. In fact, I'm not telling you the complete story, so there was training obviously that went on. >>: (Inaudible) the second half of the road is practically nothing. >> Andrew Kun: Yes. So in fact they did not -- you're right. So I probably could have given you a slide that left us off, because they did not actually have to drive the rest of it. Or they drove a little bit so that we could do some comparisons of baseline driving and so forth. >>: (Inaudible) I just thought that the initial part of this, on the lower right part, which means that training can -- it seemed quite long, so this is (inaudible). >> Andrew Kun: Yeah. It's about -- this is where they ended up starting, and they started the interactions here. So, again -- or what we found here were really two important results. One is that the recognition accuracy does influence your driving performance, specifically steering wheel angle. Variance was higher for low-recognition accuracy. So for low-recognition accuracy, you ended up with a higher average steering wheel variance than for high recognition. And then the second result was that if you had low accuracy, then using the press-to-talk button also influenced your driving performance, specifically (inaudible) position, and you see that on the X axis you have to the press-to-talk, so you didn't have to use it in the ambient-recognition condition and, yes, you did have to use it. In there, the variance of the lane position was higher. So this we thought was important results. So if you're going to put a speech recognizer in a car, you better make it work well enough, for one thing. And then be careful because if it starts not working so well, then the press-to-talk button might become a problem. And honestly it just kind of gives you an inkling that the press-to-talk button, there might be something there, so this might actually be worth some further -- some further study >>: Just so I'm clear on this, so this is only for the low accuracy conditions (inaudible) so the idea is that if you're low accuracy and you've got press-to-talk, you're getting a higher variance. >> Andrew Kun: Right. And so we can think about this, why would this be the case if presumably -- and that's another hypothesis that kind of came out of this, is you're probably frustrated. And perhaps you take it out on the press-to-talk. >>: And the press-to-talk was on the wheel or on the -- >> Andrew Kun: Yes. It was on the wheel. In this case we moved it onto the wheel. >>: So who are the drivers? Are those policemen or -- >> Andrew Kun: These are not policemen. These are subjects that we recruited from the community, and it's mostly -- >>: (Inaudible) diverse. >> Andrew Kun: It's diverse, yes. It's -- I don't have the statistics off the top of my head, but it was on the order of ten drivers. And they were from UNH, some students and stuff. >>: If you needed some related work, there's a kids' show on called Myth Busters where they actually did a comparison. 
They didn't have the speech or the push-to-talk, but they compared cell phone -- driving with cell phones versus alcohol. And I've forgotten what the result was, but you might consider alcohol as another baseline condition. >> Andrew Kun: Well, I'm sure that we'll find students who are going to be willing too -- (Laughter.) >> Andrew Kun: See, UNH is a dry campus, so we'll have to get special permission to do these sorts of -- but, you know, I mean if that's what it takes to do science, right, then... Okay. So motivated by this result that the press-to-talk button does have an influence on driving performance, we wanted to drill a little deeper. And this is an ongoing study that Oskar is working on in which, again, we had the same task of driver on the same route, do the same business of controlling this police radio to Wizard of Oz again. But now it's, again, let's review this same speech recognition accuracy high and low, but also let's take a look at what the activation sequence is for the press-to-talk button. So is it push, hold while you're talking and then release, or is it push and release and then speak and let the end pointing be done automatically. Or no push, which, thanks to Ed Catrell (phonetic) we realized this is how we really designed our experiment. >>: (Inaudible.) >> Andrew Kun: Well, and so let me just say that in fact if you look at the last line here right up, if you start out with this, so push-to-talk button, do you have no push-to-talk button, there's ambient recognition, or you have a fixed push-to-talk button, which is fixed on the steering wheel, or you have this glove, which allows you to -- here's the picture of the glove, so it's basically a glove with a couple of sensors in the thumb and in the index finger, and those are basically the switches, the press-to-talk switches. So this sort of -- instead of instrumenting the steering wheel, which is harder, right now you have basically the ability to push any point on the steering wheel and get the push-to-talk button. So back to this, so you have either -- you don't have to use the push-to-talk button, you have this fixed push-to-talk button, which is on the steering wheel, or you have this glove, which is sort of a floating push-to-talk button, and these are your three conditions. Now, for these too, you can have push-hold-release or push-release. But in fact for ambient, you can't. So hopefully this here explains it a little better, this table. So you have the push-hold-release and push-release, which makes sense for fixed and glove, but in fact don't make sense for ambient. So that's really sort of a no push condition. So just if you want to set this up statistically, this would be a good way to look at it. But at any rate, I think that what's important to take from this slide is we have these three conditions for, you know, do you have a push-to-talk button at all and, if you do, is it floating or is it fixed. And then if you do have a push-to-talk button, do you have to do push-hold-release or push-release. Now, let me show you a couple of videos of what the interactions look like. So this is again the glove, and then here is a video where you'll see a person using the fixed push-to-talk button. And, by the way, I don't know how the lighting is. Hopefully you guys can see. But one option is to potentially turn it lower. >>: They tend not to like to because (inaudible) -- >> Andrew Kun: Okay. Well, let's see how it works out. Let's see how it works out. 
And, anyway, you'll see the press-to-talk button -- here's the steering wheel and the press-to-talk button is right here. This person will operate it. There will be a red circle pointing out that she's pushing it, so that will help. And then there is the leading vehicle. You can see that there's a curve coming up, so this is that curvy road this person is driving on and she is basically following that leading vehicle. That's the primary task, and the secondary task, again, is issue these commands to this police radio about changing channels. (Video played.) >> Andrew Kun: Okay. So that's the fixed push-to-talk, and you could see those were the misunderstanding where the computer would misunderstand things and you'd have to fix it and so forth. Now, one thing that -- so we're still looking at the data, and I think we're probably going to end up probably collecting some more data. But one thing that we can certainly say, one thing that came out of this that was interesting, we thought, was that people actually tended to glance down at the fixed push-to-talk button. And I was surprised because I thought that it's fixed, there's only one button, it's not -- what's there to look down for. But as it turns out, as you'll see in this video, people do. And a lot -- a lot of the subjects -- most of the subjects actually look down very often to see where the button is So what you'll see here -- by the way, this is a video from the viewpoint of the eye tracker. So, I don't know, Ed can tell us how this compares to the video that he sees on his eye tracker. But a couple of things here. The two vectors here would show you the direction of the eye gaze. And you'll see that will be moving. This is the head position, so the direction the head is pointed at. And the two numbers here, the green will be counting up how many times a person looks down before he presses the button, and then the red will be counting up how many times he does not look down before he presses the button. You'll be able to hear a beep when the button is pressed. So, ideally, and what's going to happen here is you'll have a bunch of short snippets of basically the person looking down, looking up, pressing. And then I cut off to the next interaction where the person looks down, looks up and presses. So you can try to synch yourself up with the look down, look up, listen for the beep, look down, look up, listen for the beep. And the beeps are basically when the person will start to issue a command and I cut that off. That's not there. It's basically -- the original video shows all the interactions, and this one in the middle there is going to be a blacked-out spot just because it's going to be a little too long. So here we go. (Video played.) >> Andrew Kun: So look down. There's a beep. Okay. So in this particular case, 31 to 23, quite a large difference. So more than half the time this person looked down, even though it's a fixed push-to-talk button, it's not going anywhere. And there's really only one button you could possibly press. >>: So these are all first-time users. >> Andrew Kun: That's true. And I'll get to that. And that's a question that is worth asking, is does that make a difference. >>: And was the button on the left side of the wheel? >> Andrew Kun: It was. And I was wondering if anybody was going to catch that. But for some reason -- >>: (Inaudible) >> Andrew Kun: It was on the right side of the wheel. And for some reason the eye tracker gives us the mirror image of the (inaudible). 
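To make the green and red counters in that video concrete, here is a minimal sketch of the bookkeeping behind them: given gaze samples labeled with a region of interest and the press-to-talk press times, count for each press whether a downward glance at the button preceded it. The data layout and the look-back window are illustrative assumptions, not the actual analysis code.

```python
"""Minimal sketch of the glance-before-press tally shown in the video.

Inputs: gaze samples as (time_s, region) pairs, where region is e.g.
"road", "ptt_button", or "gps"; and a list of press-to-talk press times.
A press counts as preceded by a glance if the gaze hit the button region
within `window_s` seconds before the press. Layout and window are assumed.
"""

def glance_before_press_counts(gaze_samples, press_times, window_s=2.0):
    button_times = [t for t, region in gaze_samples if region == "ptt_button"]
    with_glance = without_glance = 0
    for press_t in press_times:
        # Any glance at the button shortly before this press?
        preceded = any(press_t - window_s <= t <= press_t for t in button_times)
        if preceded:
            with_glance += 1        # the "green" counter in the video
        else:
            without_glance += 1     # the "red" counter in the video
    return with_glance, without_glance

if __name__ == "__main__":
    gaze = [(0.1, "road"), (4.6, "ptt_button"), (5.0, "road"),
            (9.8, "road"), (10.2, "road")]
    presses = [5.2, 10.5]
    print(glance_before_press_counts(gaze, presses))   # -> (1, 1)
```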
>>: Is there any attempt to test maybe push-to-talk being on the left foot? >> Andrew Kun: We did not do that. I think people have done that. We haven't done that. And I can't quote a paper, but I'm not positive that that worked out so well for people. But we haven't tested it. >>: People looking at their foot at some point (inaudible). >> Andrew Kun: That would be more exciting, right? >>: The video went through real quick, but was there some learning even on these 50 trials -- >> Andrew Kun: Yes. >>: That is, was it the first half that you look more and then it would be more red on the second one? >> Andrew Kun: Well, in this particular example -- so whether that's a representative example is a question for some statistical work. But you could see -- I don't know if you noticed, but in fact in this case the last few glances -- the last ten probably were glances down. So he definitely glanced -- it almost looked like it's the opposite. So that's a good question. We have to look into that a little more. And then one thing that is kind of interesting is just the difficulties that the eye tracker has to go through and the difficulty that Oskar then has cleaning up the data, because of the various -- the person looking one way or the other, the eye tracker only has so much of an angle that it can track. And then this last shot here is interesting. See the hand right in front of the camera, which then, if you put it right in front of the illumination, that kind of messes up the contrast and so forth, so just interesting things that keep -- that go on. Now, let me show you what the glove interaction looks like. So again: (Video played.) >> Andrew Kun: So that's how the glove interaction happens. And then take a look at this person again and listen for the beeps and look for the glances. (Video played.) >> Andrew Kun: So the point is that there are few, right? So at least for this person -- obviously there's no reason to look at your -- you know where your index finger is. You've learned that very early on, or your thumb. So you can basically do this without looking. So we thought that was -- I certainly thought that was a surprising but interesting result. One thing that we also looked at, that we've had a chance to look at already, is where do people actually interact with the push-to-talk button? By this I mean, if you think of the car coordinate system, where on the steering wheel do they push the button. So if you have a fixed push-to-talk button, that depends only on how much you're turning the wheel. That gives you exactly the angle of where you're pushing. If you have the glove, then you have to kind of transcribe it. So for example: (Video played.) >> Andrew Kun: So Owen, one of the students, went through this where he basically overlaid this fixed coordinate system on the steering wheel and transcribed where people push the glove button. And when you do that, you get this sort of a graph, which shows a couple of things. For one, the red, which is the fixed push-to-talk button, is centered around roughly the 75-degree bin. So we binned this. Obviously it wasn't super precise, but roughly the 75-degree bin, which is where, if you're heading straight, that's where on the steering wheel the push-to-talk button is. The glove push-to-talk is more towards the 30-to-45-degree bins, which, when you think about it, is the ten o'clock-two o'clock setup. So if you do what you're told to do in driving school, then that's basically where you're going to push the button. So that's a nice result, we thought.
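A minimal sketch of the angle bookkeeping just described: for the fixed button, the press location in the car's frame is simply the button's mounting angle on the wheel offset by however much the wheel is turned at press time; for the glove, the angle is read off the video transcription directly. The 75-degree mounting angle is the value mentioned above; the bin width, names, and sample values are illustrative assumptions.

```python
"""Minimal sketch of putting press-to-talk presses into car-frame angle bins.

For the fixed button the car-frame angle is the button's mounting angle on
the wheel plus the wheel's rotation at press time; for the glove, the angle
comes straight from the video overlay. Bin width and data are illustrative.
"""
import numpy as np

FIXED_BUTTON_MOUNT_DEG = 75.0   # where the fixed button sits when heading straight

def fixed_button_press_angles(steering_deg_at_press):
    """Car-frame angle of each fixed-button press, given wheel rotation at press time."""
    return (FIXED_BUTTON_MOUNT_DEG + np.asarray(steering_deg_at_press, dtype=float)) % 360.0

def bin_press_angles(angles_deg, bin_deg=15.0):
    """Histogram of press angles, like the red/blue distributions in the plot."""
    edges = np.arange(0.0, 360.0 + bin_deg, bin_deg)
    counts, _ = np.histogram(np.asarray(angles_deg, dtype=float) % 360.0, bins=edges)
    return {float(edge): int(c) for edge, c in zip(edges[:-1], counts) if c}

if __name__ == "__main__":
    # Fixed button: small steering corrections keep presses near the 75-degree bin.
    fixed = fixed_button_press_angles([-5.0, 2.0, 10.0, -12.0])
    # Glove: angles transcribed from video, clustered nearer the ten-and-two grip.
    glove = [30.0, 42.0, 48.0, 315.0]
    print(bin_press_angles(fixed))
    print(bin_press_angles(glove))
```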
And, also, see how it's more spread out, right, the blue versus the red. So this is something that we thought would happen, that people would feel more comfortable pressing the button sort of in a wider range of the steering wheel, and that is coming across. Now, you might ask is that a good thing, but that's sort of a separate -- >>: (Inaudible) this is where the push-to-talk button was actually fixed on the steering wheel? >> Andrew Kun: The button is fixed on the steering wheel, and that happens to be in that 75 degree bend when you -- and then the only thing you have to do is really look at the -- >>: (Inaudible) or just on the trial set? >> Andrew Kun: Well, that is kind of -- I mean, we could have really placed it anywhere. But that's roughly where if you have a -- if you buy a car with a bunch of press-to-talk buttons, that's where they will be, roughly the center of the steering wheel. >>: (Inaudible) because of the possible false alarms, you have a more (inaudible) -- >> Andrew Kun: You could. And so that is -- I don't know that we have -- Oskar, do we have any false -- we probably have a couple, but not too many >> Oskar Palinko: (Inaudible) significant. >> Andrew Kun: The push-to-talk buttons are a little -- they're little microswitches, and they do give you -- what do you call that -- tactile feedback by virtue of being the way they are. So it's actually not very easy to press them if you don't mean to press them. And they are on the index finger and the thumb, which, when you're driving, are not -- you don't necessarily drive like this. Is it's a valid question, you're absolutely right, but it's -- I think the setup is such that that does not happen very often. And so that brings me to the last thing I wanted to discuss today. This is also an ongoing study about navigation -- yes. >>: You didn't show any of the -- at least I don't think I saw any of the data having to do with variance, et cetera, from lane position. Was there any difference between the push-to-talk and the -- >> Andrew Kun: So we're in the process of actually looking at that right now. In our preliminary analysis, we see that the accuracy is showing up, so we're basically reproducing the result that was the previous study, so that's good. And we need to drill down, given that we have this unbalanced data set. We need to actually think about a little more how we're going to process it. So, presumably, if this talk was given in a month, I'd have slightly more results from both. But that's how it is. So navigation, the motivation is sort of obvious. The personal navigation devices are proliferating. So it's interesting to see how they affect driving. And when you think back to that picture that I showed you at the beginning of the talk where the person was I think driving and definitely pushing a button, is that such a good idea. And, of course, even from the very first study I showed you where the -- the radio interaction, you know that that's not a good idea. So pressing buttons on a tiny display, that's not going to be so good. But, anyway, even if you don't do that, we wanted to look at the following. 
What if you had this task of following directions to get to a destination, but the directions are given in three different possible ways: one would be you print out your directions from the Web and you get them on a piece of paper and then off you go; two would be the state of the art, you have a personal navigation device that has a graphical user interface and it also gives you voice prompts to help you make turns; and then the third one is sort of, well, what if you just took away the graphical user interface and you kept the voice prompts only, would that actually do just as well in terms of getting you to the destination but perhaps better in terms of driving performance. And then we varied road conditions, because we figured that highway, and some sort of a -- I call it suburban highway, so basically multiple lanes but perhaps more curves and more buildings around, and then city, obviously, those probably are going to be different -- are going to act differently. So the cabin setup looks something like this. Here is our personal navigation device, Goowy. So somewhere where you'd probably put one if you rented a car and you had a beanbag to put it on your dashboard. And then we also have a video camera, and you'll see some video from this angle of what goes on in the car. So the map -- the city part of the map looks something like this. And something to notice here is that we have some short segments, we have a longer segment, and some sort of a middle-length segment. Because one hypothesis is that the length of the segment, the amount of time you have to wait for the next prompt, will actually influence your driving performance; perhaps you'll get nervous or fidgety or you have to look over. And then one thing we're also interested in is, well, how many times do you actually look over at the GPS screen, just to look, to see where you are. And the argument would be you probably shouldn't look if you don't have to, because looking at the GPS screen is probably not the best idea. And then this is the highway, the straight segment here of the highway, and then what I called earlier this suburban highway or something like that. So there are curves here basically and also there is some more built-up -- it's a more built-up area. Okay. So here's a video of a person using printed directions. And what you'll see is four video segments. There are three video camera angles that you'll see. This one, so behind the driver, you'll see that video camera from the side that I also showed you, and then you'll also see the eye tracker video angle. And then at the end you'll see a segment created in MATLAB, which will just show you what happened with the lane position. And you'll see in this particular case that this person will deviate from the lane that he's in. You see that right now he's on the highway, he's in this particular lane, the printed directions are right here, he'll pick them up and, as he picks them up, he'll start moving into the next lane. And you'll see that on that last segment as well. So let's take a look. (Video played.) >> Andrew Kun: Okay. I see the glance. It's interesting to see how it becomes yellow, by the way, there, which sort of -- and then -- okay. So here's -- a glance is coming up. There's a glance. Look at how he started moving and then he decided, all right, fine, I'll just go. And then multiple glances. So here are the glances that he took down, and you see that basically as he was getting ready and then took that glance, he basically moved from his lane. This is the lane marker here.
He ends up in the next lane, and then at that point he just says, well, fine, I'll just go to that lane. So this is about one meter, so this is a serious distance that he moved, and this is about five seconds of travel time on the X axis. Now, we did compare that already to what is the state of the art. Well, here's the same driver using the graphical user interface and the speech interface, so that's the state of the art that you have right now. This now is a (inaudible) scenario, and partly because the video's more interesting when you actually hear some instructions. And so what you'll hear here, or what you will see is -- so there's the graphical user interface. You'll see the person glancing over. You'll see three segments. The MATLAB segment will not be here. But you'll see, again, the -- this angle, the angle from the side, and then also from the point of view of the eye tracker. You will also hear spoken directions. And I'll turn this up just to make sure that you hear that. And the spoken directions will say something to the effect of turn right. (Video played.) >> Andrew Kun: See those glances. And see, the glance in this case is to the left because it's a mirror image. And there's another glance at the GPS unit after he was done with the curve, with the turn. And then finally that same person: (Video played.) This is only speech. The graphical user interface is turned off. Notice that there are no glances anymore. All right. So he hears the instructions and follows them and basically looks at the road. So look at the steady gaze, right? So while we don't actually -- we're still actually in the middle of collecting data. The data that I've seen show clearly, as you'd expect, that when you have a piece of paper in your hand, the variance of your steering wheel angle and your lane position is visible to the naked eye, even zoomed out on the map. And just this gives us a good indication that there will be some interesting data as far as the glances are concerned. So we're looking forward to the data collection being completed. >>: (Inaudible.) >> Andrew Kun: So a quick overview of what we've learned. Certainly low speech recognition accuracy is a problem. If you're going to put a -- and of course it may not be that big of a problem, because people will not use your system, but if they decide to use it, they will have issues with their driving performance. The press-to-talk button is an issue. So the design of the press-to-talk button, where you put it, what kind of interaction it is, you should pay attention to. And then there's the question about training. So this business of glancing down. So one question is would this person have done this week after week after week. So if we bring back the same subject over and over and over, are they going to stop doing it or not. And I'm not sure that they will, because in fact there is no training going on. No one is telling this person, look, don't look down. So perhaps they'll figure it out, but there is really no feedback that says, you know, you really shouldn't do this. So I'm wondering if in certain situations the bad habits that are formed at the beginning, while the user interaction is being figured out by the user, are going to stay, because what exactly is the training that we give our users and what exactly is the training that they're willing to accept.
So we might have to really design for this and think about this ahead of time, because the bad -- I'm really of the opinion that the bad habits that they develop early on, unless they crash and then they're told, hey, by the way, that was because you were glancing down, what exactly is the feedback that makes them stop. >>: Kind of curious as to whether we could train the users by simply having nothing interesting to look at when they look down. So they look down, blank screen. Okay. Maybe eventually they'll stop looking because they're not finding anything there. >>: And hopefully the navigation experiment where you actually have this third condition of the speech only, my guess is that that's going to be (inaudible) -- >>: (Inaudible) still was looking at that direction over and over and over each time (inaudible) the same, nothing changes. >> Andrew Kun: And what I wonder about is what is the training that tells them don't do this. I don't think that there's anything. So unless he has some self-feedback of, oh, boy, I just almost ran something over because I wasn't looking. >>: (Inaudible) possibility might be that people actually believe what they see more than what they hear more. >> Andrew Kun: True. >>: So maybe if they gain more confidence that this system is actually doing a good job (inaudible) -- >> Andrew Kun: You're right. You're absolutely right. It's an open question. I don't have an answer. >>: (Inaudible) happen to see anything, I mean, there's something to see (inaudible) they want to see. >>: So intelligibility, too, is also -- if they don't hear what was said, they're not sure, the visual confirmation -- >> Andrew Kun: Right. >>: That's why we use voice prompts. >> Andrew Kun: Right. But it -- >>: (Inaudible.) >> Andrew Kun: In this case, the video may not have sounded great, but, in fact, in the car it was pretty clear. So I don't think that -- yeah. >>: A comparable question may be whether or not they look down when they're pressing their cruise control buttons as well. >> Andrew Kun: Right. >>: Because that's a completely driving -- I mean, I don't know how people use their cruise control generally, but use accelerator and resume (inaudible) and they look down on all those as well or (inaudible). >>: College kids can (inaudible). >> Andrew Kun: And also, I'm sorry, but that's actually an even -- well, the difference is that there are a lot of buttons there, more than one anyway. So that might give you more of a reason. And also I wonder how often you use it. So there's something to tactile feedback or just -- people make buttons that are a different shape and feel, so that might help. >>: How were your subjects motivated to do a good job of driving? The stakes are actually pretty low -- >> Andrew Kun: You're right. And that's certainly something that -- we don't actually -in these designs, we actually don't have a reward for a particularly job well done. But what we do ask them to do is drive as you normally would. And it seems to me, and Oskar can correct me if I'm wrong -- but I think that people are pretty excited to be there. They're not unhappy. And they're getting paid reasonably well. It's $20 an hour. >>: I think (inaudible) said that they would get $5 more. >> Andrew Kun: Okay. So I'm wrong. So in your experiment -- oh, that's right. So in your experiment we said that for -- in what case? >>: (Inaudible) get $15 if they complete the test and $5 more if they do it right -- I mean, if they tried to drive (inaudible). 
>>: Is there any options to introduce traffic into those simulations? >> Andrew Kun: There is traffic. And you may not have noticed it, but there is ambient traffic. And you can actually control individual cars and make them do things that kind of cut you off or turn or whatnot. Yeah. >>: Sort of a related question to that is thinking about other kinds of metrics and performance that you might use (inaudible) and I know one thing that (inaudible) is following position. So in the task where it's basically just sort of your position to the car in front of you (inaudible). >> Andrew Kun: Yeah. So we do have distance to cars; however, actually Tim was suggesting a similar metric, which would be the two-second rule, so are you tailgating too close. >>: Sure. I'm just thinking generally. I'm looking at the speed variability -- >> Andrew Kun: Yeah, we have that number, so we (inaudible). >>: It would just be a nice thing to look at, I think. But it seems like the more you have there -- I mean, some of those differences were real (inaudible). >>: It seemed like some of the conditions you were seeing lots of glances, it felt like there was a lot of stress in the video. I wonder if you could actually measure that (inaudible). >> Andrew Kun: Yes. Well, so you can't really measure stress, but one thing you can measure is skin conductance, for example. And actually one of my slides talks about that. So, yeah, we're -- we think that one reason that you may be doing poorly is some sort of frustration or stress. And you can -- if you know that you're inducing it, then you can -- when you measure a change in the skin conductance, you can argue that that's related and that's a measure -- an effect of it. >>: In fact, that's what we're having to do before Jacob decided, before I (inaudible). So the next set of experiments were choosing a part, frustration. That frustration caused by -- different types of frustration measured, physiological, depending on frustration that's caused by not being able to fulfill the task or if it's caused by some property of the speech interface and trying to (inaudible) as well. >>: For the navigation study, if you were deciding the speech versus the speech plus screen interfaces, do you think that the speech might be different between the two, like if it was only speech you might say different things? >> Andrew Kun: Yeah. And you're absolutely right. So, I mean, one thing is ideally you'd have -- or probably you might be better off with landmarks. Right now what we're doing is sort of trying to get a baseline, I think that's fair to say, of what happens if you don't really change anything but move the Goowy away. However, what we're hoping to learn, both from the Goowy plus speech as well as the speech only is, well, how do people react? For example, what happens with the glances. And that might actually give you an idea of how you should say things differently. So, for example, if it turns out that they get fidgety in a long stretch, that they normally would have actually looked down onto the GPS to confirm that they're on the right road, the red line is still ahead of them and there are no -- they haven't lost it. Maybe you should say something along the lines of you're still fine, or repeat, by the way, we'll be taking the exit in a mile. >>: So I think the drivers will be quite -- it's easy to tell which way they prefer between a paper map and GPS. 
Have they -- have you asked them the question, say, hey, do you really prefer the like (inaudible) touchless speech instruction versus (inaudible)? >> Andrew Kun: We are asking the question of how do they like it and are they happy with it. >> : (Inaudible?) >> Andrew Kun: I don't have the numbers, unfortunately. We're still collecting the data, so we're -- but I -- and haven't looked at those numbers, to be honest with you. >>: I think that would be very interesting because -- >> Andrew Kun: I agree. They may not like -- they may really like the visual feedback. So that's -- but then again, that goes back to John's question, so there might be things that you can do differently into speech that will reassure them and -- >>: When I was driving -- because sometimes I get frustrated using the GPS because once they say it, they won't let me say, what did you say? This kind of interaction may actually (inaudible) users' confidence (inaudible). >> Andrew Kun: Sure. Yeah. >>: (Inaudible) listening to music (inaudible) I may even switch off the GPS and just use the (inaudible) the music shut off everything (inaudible)? >> Andrew Kun: Yeah. And that's a good question. On the other hand, I think one argument about the visual display is that it's really not that safe. I mean, looking away from the road is probably not advisable, especially -- I mean, when do you actually look at navigation devices? When you're actually lost or when you're in a new place. So you in fact need the information. It's unlikely -- at least I can't do a quick glance. I usually -I like to travel with my wife, and she looks at it. That's the safest setup that we have. >>: (Inaudible) confusion is the way you look at it. (Laughter.) >> Andrew Kun: Exactly. Well, I know, because there's a delay, right? The GPS is slightly delayed and I -- >>: (Inaudible.) >> Andrew Kun: I know. I've missed turns many times because of that, because is it really now? Oh, no. >>: The trend in the navigation displays are for higher and higher resolution and more and more detailed graphics and more and more information, kind of away from what we used to have the low res displays where it has a big arrow turning right, which was much easier to glance at and interpret what that meant. >>: (Inaudible.) >>: We have the 3D buildings and we have perspective rendering and we have shadows and drop shadows here and you see the angle of the sun and whether it's raining over there. >> Andrew Kun: I don't know, have you guys sat in a Prius, the Toyota Prius? So what's with that display? They have this display that shows you when you started braking and then the energy goes from the wheels into the whatever, the battery, and then when you step on the accelerator, the reverse -- how does this matter and how is this safe. >>: (Inaudible) impressive. >> Andrew Kun: It is impressive, but, boy, it just doesn't seem safe at all. Because people look at it and then they tell you, look. >>: It's fun to look at from the passenger's -- >> Andrew Kun: That's right. >>: I owned one of those cars, and I got one of the first in this country. And you learn to ignore that display because it's so -- I mean, what it's basically doing is teaching you how to drive. >> Andrew Kun: Sure. >>: Whenever you are heavy on any change, then you lose gas mileage and it teaches you (inaudible). And then with once you learn how to drive that way, then -- >> Andrew Kun: Yeah. 
>>: So now that we've seen these experiments in some detail, can you give a little bit more information about the human-human experiments that you guys are talking about? >> Andrew Kun: Sure. I can tell you about them. I don't have the slide. But in the human-human interactions, basically we're trying to find out -- in the latest study we've run is we looked at a task that's similar to the map task, which is there is a driver and there is a dispatcher and the dispatcher is trying to get you from point A to point B, and the problem is that the dispatcher's map and reality don't match. So the dispatcher is telling you to take a right but there is no right turn. And the reason to do this is so that there will be an ongoing conversation. And our interest is in the question what happens if you have multiple overlapping dialogues with a machine as well as a hands-busy eyes-busy task. So, for example, the analogous situation would be that I'm driving and I'm discussing with you the study and Tim is in the car and every now and then he interrupts with, oh, you need to take a right here, or something along the lines. That, and then I go talk to him about the directions and then I go back to you. And most likely I'm going to be able to do this without crashing. But if I do that with a computer, it's not clear that -- it's probably going to influence my driving performance, but also the speech interaction performance, right? And so what do people -- right now what we're interested in, how do people do this and what is it that we can learn from human-human interaction. So one study that we've done was looked at adjacency pairs because that's a nice, easy way -- that's something the electrical engineers that we are can understand and take a look at if you have this ongoing task and it's made up of adjacency pairs, where do people interrupt, within, without, and also depending on what kind of urgency that interrupting task has. And we're actually continuing to look at that, and now actually the next study -- I was just talking to Peter Heeman. Some of you probably know him from OGI. He's my collaborator on this. And we're designing the next experiment where we're thinking about sort of a 20-question duel being the ongoing task, where you ask a question -- you have a turn and the other person has a turn, and whoever gets to the answer first, and then having another interrupting task. But probably have something along the lines of a driver and then a person at another location connected with headsets, so that we -- we realize that we didn't have enough basically in the original -- the last experiment we had data, but it wasn't probably enough data at the right places. So we'd like to actually force a lot more question and answer pairs, and then more carefully figure out where we're going to insert the interruptions. >>: I believe there's a study that says that cell phones are dangerous, but having a passenger in the front safe actually is safer than driving (inaudible) and so that it actually matters if that person is in the car (inaudible). >> Andrew Kun: Yeah. >>: I wonder if your 20 questions could actually (inaudible) >>: Andrew Kun: Right. Interesting. >>: You might want to talk about (inaudible). >> Andrew Kun: Yes, I'm about to do that. Someone asked about measuring frustration. So I wanted to talk about a few things of what's next. So on a smaller scale, perhaps, we're looking at frustration and then specifically in the small scale referring to, well, how do you measure frustration. 
So one way to do that is to measure skin conductance, which could be a physiological effect of frustration. And for skin conductance, we have a nice device where you're supposed to strap little electrodes onto your fingers, which of course doesn't really work if you're driving because there is -- for one thing, motion artifact. If it's right on your fingers and you squeeze, then that creates problems. So Owen, one of the students, is designing this glove. We like gloves. And he's trying to fix the electrodes in places where the motion artifact will not be so pronounced and you'll still get a decent reading. So that hopefully will be operational soon and then we can run some studies. >>: (Inaudible) here or here and to measure the skin (inaudible) -- >> Andrew Kun: We could. But the signal is not as strong. So the best signal is on your palm and on your hand. So you're right. We might end up having to do that. We wanted to give this a try first because the signal is nicer. But you're right, we might -- we might have to do that. And then, if I can remind you of the problems that we wanted to address throughout our studies: the in-car devices versus driving performance, and then driving performance versus probability of accidents. And what Tim and I have been discussing now for a while and we're hoping to reach is this UNH -- Tim graciously gave up the naming rights here -- obstacle test. So the idea is, how about if you could design an obstacle test that, if you're driving and not being distracted, you can get through. So things like you're driving in the city and people are pulling out in front of you or a pedestrian is jumping out or a car is braking in front of you. And you can handle this fine because you're not distracted and it happens such that it gives you enough time, if you drive in a reasonable way. But then what happens if you put a device in there? Does that distract you enough that basically you cannot now pass this UNH-OT test? And if that's the case, perhaps this is a good way to then measure the impact of these devices and even to use it as a quantitative test of should you put this thing in the car or not. So this is certainly a large goal that we have set out for ourselves. And then tied to this -- this is the simulator-based world. Now, to remind you, we have this Project54 system, which is deployed in roughly a thousand police cruisers in New Hampshire and maybe a couple hundred around the country. So what you could do is actually tie this UNH-OT test to some law enforcement vehicles as well and get the UNH-OT test to tell you whether this thing will work well, and then tie that to some perhaps naturalistic studies, right, that go on in a police cruiser. And we've actually completed a pilot recently where we looked at how police officers use the Project54 system, meaning do they use the speech interface, do they use the graphical interface, or do they use the original hardware interface. And let me see if I can -- so, yeah, and this slide, just to remind you, so we have the speech interface, press-to-talk, and microphone. You have the graphical user interface and you also have in the center console the original hardware interface.
So if you don't want to use whichever, or perhaps you're really used to flipping the light switch on, that's the fastest way to do it; or, as it turns out, the radar actually has a remote, and if you -- that's the fastest way to catch someone, because you really have to do it quickly: cars fly by at 80 miles an hour, you can't -- if you issue a speech command that says lock, meaning lock the speed, by the time it gets recognized, the car is gone. So speech is just not a realistic scenario here. So in this slide here you see what people would see on the graphical user interface, and you have an overlay here, sort of a heat map of, well, how many times did a particular speech command get issued. So dark blue would say yes, often, and lighter blue, less often. So some of the -- basically we can collect this type of data, so how do people use things in the car. So we think that between the UNH-OT giving us a nice way to predict what's going to happen, and then perhaps informing the design of this UNH-OT test from some of the naturalistic data that we can collect in really a large deployed base that we have a good relationship with, we think that we have something that could be interesting. Now, I said law enforcement here, but I do want to point out that from the point of view of things getting into cars, devices getting into cars, law enforcement is the vanguard, because they really use it on a daily basis and they really need it on a daily basis, so this is a nice place to -- it's a nice place to study. >>: Are the physical button presses and all that stuff also instrumented? >> Andrew Kun: We can log them. They're not -- they're instrumented by software. So we basically -- given that everything is synched up, we know that someone actually pressed a button and we can tell, so -- which is really important, because that's a key. I doubt that they -- they sometimes use the graphical user interface, but my guess is they'll flip switches and then they'll talk. >>: I just think that it would be really fun to sort of map out all the commands that are used as well as the actual interactions and see where is the balance (inaudible) which is the classic things that are always touched. >> Andrew Kun: That's exactly what we're hoping to do -- in fact, the pilot was run I think last summer, and now we're gearing up to basically deploy this in probably 20-ish cruisers. And we have a nice statewide setup where we can wirelessly get the data back, so I think that the data should start flowing sometime soon. So a quick set of acknowledgments: the NSF for funding us, as well as the USDOJ, where the majority of funding comes from; Microsoft Research for multiple things: certainly Tim's collaboration; also Jacob, one of my grad students, is a Microsoft intern, so that's very much appreciated; and also the in-kind contribution of software which we're receiving to compensate our subjects in the navigation study. And Tellme, who provided the voice talent recordings for the navigation study, the turn directions. So, with that, I'm going to plug my blog. I run this EC blogger blog where Oskar is one of the main contributors. And we have stories that are relevant to this particular type of research, as well as other stories. So if you feel like checking it out, please do. (Applause.) >> Andrew Kun: Thank you.