>> A.J. Brush: So I'm A.J. Brush. I'm delighted to introduce Kevin Li, who is here to talk about his work in enabling eyes-free interaction by exploiting human experience. And usually I make people tell me a fun fact about themselves, but I didn't grill Kevin enough. But he's just recently back from Germany as of yesterday. So you can ask him tough time zone math questions if you should wish to at the end.

So Kevin.

>> Kevin Li: Thanks, A.J. So at the core of today's talk is really this problem that people have difficulty interacting with their mobile devices when they can't actually look at them.

And the main take-away I have for you today is that we can actually address this by using touch-based messages, and if we build them in such a way that they take advantage of human experience, then they'll have some prelearned quality to them.

What I mean is that if we base these touch-based messages on something people are familiar with, like music, human touch, and speech, then whatever information mapping they have from those experiences will carry over to these touch-based messages.

And so in today's talk I'm going to open up with a brief project I worked on here actually a couple of years ago on using auditory feedback to solve this eyes-free problem. For the remainder of the talk I'll focus on touch-based communication. I'm going to first present a taxonomy that I've used to guide my exploration in this space of mapping things to touch.

And then I'm going to talk about a series of projects that I've worked on looking at how you might map music, human touch and speech to touch-based messaging.

But before I dive in, I want to briefly talk about why we even need these new ways of interacting with devices when we can't look at them.

So to see why this is a problem, I want to start by looking at your typical desktop environment.

So I'm sure you all have something like this at home or in your office. You have this nice big display for viewing information, and you have a keyboard and a mouse dedicated for input.

Unfortunately, kind of an inherent assumption in this setup is that the screen is always available to the user, and as a result it has been designed with the visual channel in mind.

Unfortunately, this assumption no longer holds as people go mobile. If it did, then today's user would look very much like this, with the PC strapped on their back and a screen glued in front of them.

Today, instead, users look much like this: highly mobile, on the move, places to go and people to see. And lucky for us we have these cell phones we carry with us everywhere, which have become the de facto standard for mobile computing.

They're very cool, but you can see the interfaces are still highly visual because they've kind of grown out of traditional UI design.

Unfortunately, there's a number of scenarios where this highly visual design won't work out so well. So using visual interfaces such as the iPhone when driving can be incredibly dangerous.

In other situations, using the visual channel can be highly disruptive to your situation. And then there are other times when using devices can be socially unacceptable.

There's also a huge class of users who are visually impaired and can't take advantage of visual interfaces at all. And finally, as devices get smaller and smaller, we're seeing a new class of screenless devices. These devices, such as the iPod Shuffle, don't even have displays. So obviously a visual interface would not work here.

Okay. So obviously we need a way to support eyes-free interaction where people can't actually look at their devices. And so then the next question becomes: how do we actually enable this?

So one possibility would be to use auditory feedback. And so I imagine many of you are familiar with this project. So I'm just going to very briefly go through it. So to ground this example in kind of a concrete usage scenario I want to first show a typical scenario where a person might be trying to access phone content while talking on the phone.

[Demonstration] Hi, Kevin. Can we meet sometime next week? Yeah, John, sounds great. Let me check my calendar. Hold on. When did you have in mind? How about Tuesday morning sometime? Let me check, hold on. It would have to be after 11. What did you say? I was saying it has to be after 11. Though, yeah, no, I'm only free from 11 to four. I'm sorry, I have meetings all afternoon. How does Wednesday afternoon look? Hold on, let me see.

You can see how looking at the device disrupted the conversation. To address this problem we built an application called BlindSight, a Windows Mobile application that runs on the phone and essentially launches when a phone call initiates.

So users can provide input by pressing these buttons on the phone and receive output in the form of auditory feedback. So this allows a user to access information content on the phone when they can't necessarily look at it.

You might be wondering why I'm holding the phone upside down here; that's just one of the many form factors we tried out when we were exploring that project.

So you might think, well, with auditory feedback this should be a fairly simple problem: whatever information prompts show up on the screen, you could just read them out to the user. It turns out it's not that simple, because that would essentially give you what you get with interactive voice response menus.

So you've all experienced this where you call into customer service, you get this long listing of details and different departments that you can contact, and you sort of have to wait for a really long time before you can actually press the option that you want.

So we had to implement a number of different design options. And so in the interests of time, I'm only going to show one of the design options that we had to implement to actually make this work.

And the video clip just describes it.

[Video] Hitting the button once describes the button's functionality. Calendar. Hitting the button again launches the function. Once users are familiar with the system, they can preempt announcements by double pressing in quick succession. Monday, 9 a.m.

That gives you an idea of how users are able to use the applications running on this phone.
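As a rough sketch of that press-twice interaction (this is my reconstruction, not the actual BlindSight code; the text-to-speech object and launch callback are hypothetical stand-ins):

class EyesFreeButton:
    # First press announces the function, the next press launches it; a quick
    # second press simply cuts the announcement short, which is how expert
    # users preempt it.
    def __init__(self, name, tts, launch):
        self.name = name        # e.g. "Calendar"
        self.tts = tts          # hypothetical text-to-speech object
        self.launch = launch    # callback that actually opens the function
        self.pending = False

    def on_press(self):
        if self.pending:
            self.tts.stop()          # preempt any announcement still playing
            self.launch(self.name)
            self.pending = False
        else:
            self.tts.say(self.name)  # announce what this button does
            self.pending = True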

Again, I'm leaving out a lot of details here, but I want to show you a brief demo video to give you an idea of what we ended up with. I want to point out this is fast usage by an expert user, namely myself, and it might seem a little confusing to someone who is seeing it for the first time. But I should point out that we did give this to completely inexperienced users in the user study, and they were able to pick it up within a couple of minutes of use.

[Video] Using BlindSight, the previous conversation might go like this. Hi, Kevin, can we meet sometime next week? Oh, hi John. Sounds great. What did you have in mind? How about Tuesday morning? Noon. I'm busy in the morning; I'm free in the early afternoon. Sorry, I have meetings all afternoon. How does Wednesday afternoon look? Yeah, Wednesday sounds good. I'm free after one. Okay, let's make it one then. Call me on my mobile phone if anything comes up. Will do. Can you give me your number again? Sure. Do you have something to write with? Yep. It's 555-123-7324. Got it? Of course. Oh, if anything comes up, call me at the AI lab. Their number is 420 -- wait. Let me find something to write that down with.

So I think it's worth discussing a number of the different properties of audio feedback. On the plus side, it's easily learnable; you can see anyone who understands English can more or less pick up the system.

That's what we saw in our user study as well. You can convey lots of information just by reading it out to the user. Unfortunately, it's not usable in all scenarios. In loud environments such as concerts, you wouldn't be able to use something like this because you wouldn't be able to hear the feedback. And obviously having a phone talk to you in the middle of a meeting is probably still going to be socially disruptive.

So we need something else. And so for this touch is kind of an interesting alternative to auditory feedback as it might address some of the shortcomings that I just discussed.

So conveying binary information using touch is pretty easy. And I'm sure you all have experience with this. Your vibrating phone tells you when someone's actually calling. It doesn't really tell you much more than that but you are able to receive this information.

And so to convey more complex information, you could imagine using multiple vibrators. And this is essentially what Lorna Brown did with her multidimensional Tactons project. She mounted a couple of actuators on users' arms, and by varying amplitude and duration she created a bunch of unique vibration patterns that users were able to identify and distinguish.

And you could take this approach a step further and essentially generate vibration Morse code.

So psychologist Frank Geldard did this in the '50s, where he mounted five different actuators on a user's chest, varied intensity and duration for each of the actuators, and mapped different signals to different letters of the alphabet.

He found that after 65 hours of training users were able to start receiving messages at about 35 words per minute. This is great because it shows you can convey a lot of information using touch. But obviously we would like something where you don't have to spend so much time learning these messages.

So the question now becomes: Can we come up with something that's not so abstract, something that's a little easier to learn?

And so as we start thinking about this space, we decided to kind of look at how people interact with the world eyes-free in the absence of technology. So if you think about it, as people we have five different senses available to us.

Obviously the sense of sight is out. But we still have our sense of hearing, which we use all the time, plus smell, taste, and touch. And so if we think about how we might map these different senses to touch-based messages, obviously taste and smell are very interesting, but they don't have very straightforward computational equivalents. I know people are working in that space, but I'm going to leave them out of this discussion. And so for the remainder of today I'm going to focus on how you might map things that you experience through touch and through hearing to touch-based messages.

And so, if we talk first only about the touch-based channel, you can imagine breaking up the space into touching other objects and touching other people. And within each one of these categories, you can imagine a different project looking at the options in that space.

And so in the case of people touching other objects, you could imagine exploring a project looking at how you might simulate the feeling of buttons on touch screens, for example.

A lot of phone manufacturers are working on that problem. So I will not talk about that today.

In the humans-touching-other-humans category, I'll be talking about a project looking at how you might map human touch to a form of computer-mediated touch. And then moving over to the hearing category, you could again divide that space into non-speech auditory cues and speech auditory cues, and I'll be talking about a project looking at mapping music and sound effects to vibrations, as well as a project looking at a tactile encoding of speech.

And so the first project I'll be talking about is looking at how we can map music to vibrations. And obviously sound is really just another form of vibration. When I talk about vibrations here I really mean mobile phone vibrations, which are on the order of 250 hertz. So you might be wondering, why are we even talking about this? Why do we need a vibrotactile encoding of music? The most common example I give is the vibrotactile ring tone. We currently have auditory ring tones where, when you hear the ring tone, you know exactly who is calling you.

But we don't have a straightforward tactile equivalent.

You could imagine that if we could map that music cue to a vibration, then users could tell who was calling based on this vibrotactile cue. The problem we're faced with here is that we want to convert music to these different vibrations. To do so we're going to need vibrations of varying intensity. Unfortunately, the phone's vibrator, as exposed through the typical phone developer API, only turns on and off. And it does so only at a single frequency and a single amplitude.

So to generate these different levels we're going to need something here. And so what we did was we applied something similar to pulse-width modulation, which is commonly used with motors.

So for those who aren't familiar, the general idea here is something modern light dimmers actually use.

If you enter a room and the light's off it's too dark. You turn the light on but the room's too bright.

So what you can do is stand there and rapidly flip the switch on and off, and what you'll get, if you switch it fast enough, is kind of this medium level of light intensity in the room.

And by varying how long you turn the light switch on for, before you turn it off, you can kind of vary the intensity of light in the room.

And so if you apply this same concept to motors, what you essentially get is you get the ability to change the rate at which your motor rotates. And when you do this with vibration motors, you essentially get different or varying levels of vibration intensity.

And so we did this using a commodity smartphone motor on a phone running Windows Mobile, and we found we were able to produce 10 user-differentiable intensities, each about 20 milliseconds long. When we tried playing them shorter than 20 milliseconds it actually didn't work very well, and I think that has to do with the fact that the motor doesn't spin up quickly enough in those situations.
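As a minimal sketch of the pulse-width-modulation idea, assuming hypothetical motor_on/motor_off calls standing in for the platform's on/off-only vibration API; the 20 millisecond frame matches the level duration mentioned above, though a real implementation would use hardware timers rather than sleeps:

import time

PERIOD_MS = 20  # one intensity frame, matching the ~20 ms levels in the talk

def vibrate_at(intensity, frames, motor_on, motor_off):
    # Approximate an intensity in [0.0, 1.0] on a motor the API only lets you
    # switch on or off, by toggling it rapidly, PWM-style.
    level = max(0.0, min(1.0, intensity))
    on_ms = PERIOD_MS * level
    off_ms = PERIOD_MS - on_ms
    for _ in range(frames):
        if on_ms > 0:
            motor_on()
            time.sleep(on_ms / 1000.0)   # illustrative timing only
        if off_ms > 0:
            motor_off()
            time.sleep(off_ms / 1000.0)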

And so moving back to this problem of this higher level problem of trying to map music to vibrations, obviously there's a ton of information in the music information retrieval literature for how people identify different types of music. And it turns out beat and lyrics are kind of the two most important characteristics for identifying music.

And that kind of makes sense if you think about how you yourself think about different types of songs. Unfortunately -- yes.

>>: 20 milliseconds is 50 hertz. 50 hertz is roughly the normal spin frequency of a pager motor, so how could you get pulses of 20 milliseconds? That means you can spin it once and turn it off.

>>: Kevin Li: So the pulse, the width of the pulse is actually much shorter than 20 milliseconds.

>>: The peak of the impulse or the actual vibration pulse that you feel.

>>: Kevin Li: The PWM. The full duty cycle is 20 milliseconds if you flick it on real quickly.

So you've got beat and you've got lyrics.

It turns out that when you try to map these to their 50 hertz equivalent, it really doesn't work out so well, just because of the high disparity in fidelity. And so what we ended up doing was trying to convey the current energy level of the music instead of these kind of macro-level properties.

And so shown over here is the high level process that we used, and I'll go into each step in more detail now. So the first step is to apply a common band pass filter to the signal to remove noise.

And when I say noise, I'm referring both to noise in the very traditional signal-to-noise-ratio sense and to a lot of those elements that I talked about earlier, such as lyrics and things like that.

And then we use amplitude thresholding to further reduce some of the less important elements.

And all this is done in the time domain.

And then we take a running sum of absolute values, effectively performing rectification, and we take one value for every 20 milliseconds of signal, and that allows us to maintain signal duration to be consistent between the audio and the vibration equivalent.

And then in the last step we just exaggerate the features by composing the previous output with an exponential curve. And shown here is the generalized form for that and we had to apply a number of different constants just because we had to tweak it a little bit for different songs.
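To make those steps concrete, here is a minimal Python sketch of the pipeline as described, assuming a WAV input; the band-pass cutoffs, threshold, and exponent k are illustrative placeholders, not the constants used in the actual system.

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter

def music_to_vibration(path, low_hz=60.0, high_hz=400.0,
                       frame_ms=20, threshold=0.05, k=2.0):
    # Read and normalize the audio to [-1, 1], mixing stereo down to mono.
    rate, x = wavfile.read(path)
    x = x.astype(np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)
    peak = np.max(np.abs(x))
    if peak > 0:
        x /= peak

    # 1. Band-pass filter to strip noise, lyrics, and high-frequency detail.
    b, a = butter(4, [low_hz / (rate / 2), high_hz / (rate / 2)], btype="band")
    x = lfilter(b, a, x)

    # 2. Amplitude thresholding removes the less important elements.
    x[np.abs(x) < threshold] = 0.0

    # 3. One rectified-energy value per 20 ms frame keeps the vibration the
    #    same duration as the audio.
    frame = int(rate * frame_ms / 1000)
    n = len(x) // frame
    energy = np.abs(x[:n * frame]).reshape(n, frame).sum(axis=1)
    if energy.max() > 0:
        energy /= energy.max()

    # 4. Exaggerate features with an exponential curve, then quantize to the
    #    ~10 intensity levels the motor can render.
    intensity = np.expm1(k * energy) / np.expm1(k)
    return np.round(intensity * 9).astype(int)   # one level per 20 ms frame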

So to give you -- to give you an idea what this actually feels like I have a phone demo with me but obviously I can't put a phone in everyone's hands but you're welcome to try it after the talk.

What I've done is I've recorded the sound of the vibrating phone on my laptop. This will require a little bit of imagination on your part. The first is the audio equivalent of Beethoven's 5th, which I'm sure you're all familiar with. And then the vibro tactile equivalent. It feels much better than it sounds. And then this song. And the corresponding vibro tactile sequence.

Okay. So now we have a way of mapping music and sound to different vibrations. And so then the next question becomes: Will this work in the wild? And so to try this out we built a buddy proximity notification system and so this is an application that's running on Windows mobile phones and it basically supports two different states of proximity. Nearby and far away.

So when a buddy is close, it plays the corresponding song associated with that buddy. And if the phone is in vibrate mode, then we play the matching vibrotactile cue. Obviously there are a number of system-level components I won't talk about today, but the details are in the MobiSys paper; here I'll just focus on the user study.
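As a minimal sketch of that cue-selection logic (the playback callables here are hypothetical stand-ins for the phone's media and vibration APIs):

def on_proximity_change(buddy, nearby, ringer_mode, cues, play_audio, play_vibration):
    # cues maps each buddy to a (song, vibration_pattern) pair chosen for them.
    if not nearby:            # only the "nearby" state triggers a cue
        return
    song, vibration = cues[buddy]
    if ringer_mode == "vibrate":
        play_vibration(vibration)   # vibrotactile equivalent of the song
    else:
        play_audio(song)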

We gave this to three groups of friends who carried it around with them over the course of two weeks.

So these 17 participants were divided into three different usage groups. And so the main thing to focus on here is the different cue conditions. We had nature sounds, which are commonly used nature sounds based on the peripheral-cue literature; we had "your choice," where you would select what your friends would hear when you became close to each other; and then the "my choice" condition, where you yourself would select what you would hear when your friends came close by.

And to give you an idea of how users use this, here's a quote taken from a user at the end of the study.

"One time at the library I wanted to eat with someone and so I went outside to call someone. The phone vibrated. I just called the person immediately." You can see how this supports this nice to know information without kind of requiring users to have to look at the phone.

And so the main take-away from the study is really shown here, where we asked users whether they could identify who it was after the cue played. So every time after a cue, a form popped up on the phone, and they would answer it either then or later.

And again the high level point is really just that the two music conditions performed better than the nature cue condition.

>>: Does the phone vibrate when it goes to sleep?

>>: Kevin Li: That was one thing that wasn't well controlled, because depending on their context people would often switch their phones between the different conditions. Unfortunately we didn't actually record what percentage of the time people had their phone in vibrate versus audio.

>>: Is there a vibrate equivalent of the nature cues?

>>: Kevin Li: It's the same algorithm as the music.

>>: After that, did you do a test with them or were you unable?

>>: Kevin Li: We did a prestudy and a post-study. I won't go into the details here, but the thing we were interested in, I guess I could talk about it briefly: I personally expected to see some form of learning occurring over the course of the study. But it turns out there were no significant learning effects. People were very good at identifying the cues right off the bat for songs they knew really well, both before and after the study.

But for like the sound cues and for the songs that people weren't familiar with, they were equally bad in the beginning and at the end.

>>: For the vibration?

>>: Kevin Li: For the vibration, yeah. So that was kind of an interesting finding.

So based on some of these findings from our study, at a higher level there are some interesting take-aways for designing peripheral cues in the wild. The first finding is that we get these higher comprehension rates when users select their own cues, which kind of alludes to what I mentioned a few minutes ago.

I also touched on this that mapping music to vibration was more successful for people who actually knew the songs well.

And the third point is that it turns out that semantic association is sort of key to learnability. And I don't mean learnability over time; I guess I mean more of this prelearned ability. So people were able to map these vibrations to songs: 75 percent of the participants were able to just get it for all of the songs that we mapped, right off the bat.

This might seem like an obvious finding to people outside of HCI, and I just wanted to point out that in the peripheral cue literature it's often suggested to use nature cues, because they're unobtrusive and they kind of blend in with the environment.

I guess I wanted to point out that for certain types of applications, music actually might be more effective if it has this strong semantic association.

Okay. So at a higher level, it turns out that vibration seems to work well for mapping music to these tactile cues. But we do need more expressive touch for other things. As I pointed out, vibration is a very low-fidelity signalling mechanism; we need something more interesting for conveying more information. And so that leads me to the next project that I'll talk about, which is looking at computer-mediated human touch.

So the idea here is that there are certain things we convey with human touch that make much more sense than communicating with vibration. If you look into the psychology literature, human touch is actually really good for conveying emotive aspects or intent.

You can imagine some kind of tapping or rubbing coming from your phone that would presumably tell you how urgent a phone call is, for example.

And so again everyone here is familiar with the tap on the shoulder, or a rub on the back when kind of your friend might be feeling down and out. But modern day mobile phones don't support any of that. All they do is vibrate.

Shown here is a vibrotactile actuator pulled out of a typical phone; I'm sure it's in every single one of your phones in the audience.

And I think it's worth talking about briefly here, because it really has two problems. The way it works is it's got this offset weight connected to a rotational motor. When you turn the motor on, this offset weight starts spinning around, and after it reaches full speed, the entire body of the device starts vibrating. The two problems that I see with this are, first of all, that because it is this rotational device, all you get out of it is vibration.

So you can turn it on and off quickly, as I mentioned earlier, or you can turn it on at different rates.

But at the end of the day all it does is vibrate. And the second problem, which I think is the bigger problem, is that people don't actually go around vibrating each other as a form of communication.

And when they do it's usually not a good thing. And so the observation here is that people instead tap and rub each other. And so if people do -- if we as people do this why can't we do this with our devices? That's kind of the motivation behind the project that we looked at here which was trying to recreate tapping and rubbing. So I'll be talking about two prototypes that we built to explore this idea. And both of them are really based on this underlying technology of voice coil motors.

So this technology is commonly found in hard disk drives. You'll find one of these. This is taken out of a standard hard drive that you find in your computer, and I'll briefly talk about it. So the way this works is that you have these coils of wires sandwiched between two permanent magnets. When you kind of change the voltage going through these coils, that allows you to move the head back and forth.

And so what's nice about this, as opposed to the vibration actuator that I showed earlier is, first of all, the coils are really light. Right? There's no offset weight and the magnet isn't actually attached to the armature. So the result is you get a nice low mechanical latency with kind of the start-up of this device.

And the second thing is it kind of sweeps laterally instead of this rotational action that the vibration motor was giving us. And so this allows us to explore a couple of different alternatives to vibration.

And the final point I want to make is that you might think this is really big and heavy and there's no way it could ever actually be in a mobile device. But it turns out that with customized forms of these actuators, these things could be much smaller. So shown here is IBM's Microdrive technology, which they developed in the early 2000s, 2001, 2002, as an alternative to solid-state memory for CompactFlash. This is really just a small hard drive that would basically sit in a CompactFlash form factor.

The idea is you would be able to support higher memory densities than what solid state could support at the time.

Which is great for us, just because it shows that you could really have a smaller hard drive motor.

And so shown here is kind of the setup for how we actually use this actuator to drive our different prototypes. So to generate a signal, we wanted to use the audio output on the computer. But this signal actually turned out to be too weak. And so we had to connect it to a custom audio amplifier, which allowed us to boost the signal along with some filters to filter out kind of low level noise, which we then connected to a voice coil motor shown over there.

And this allowed us to basically control the device by using the laptop. Shown here is our tapping prototype. We've attached a hammer to the voice coil motor which allows us to generate tapping sensations by moving the contact head orthogonally to the contact surface.

Shown here is the rubbing prototype again using the voice coil motor but moving the contact head laterally across the skin surface to generate a rubbing sensation.

Here's another view of the rubbing prototype. You can see that the user would place their hand over this gray window, and the white head that you see is the contact head that we used for generating the rubbing. What it is, is Teflon tape coated over a hollow plastic shell, which kind of has this human flesh-like quality to it. And if you remove the window you can see that the head is attached to this arm, again using the voice coil motor's motion to move it laterally across the skin surface.

Now that we've got a prototype, we wanted to see if this actually felt like tapping and rubbing. So we ran some user studies. We recruited 16 participants in a within-subjects design for the tapping study. We generated taps using 250 millisecond square waves. I know Mike is going to call me out on this, so I'll say up front it wasn't a true square wave because of the band-pass properties of the sound card, but more of an approximated one. It was for the most part good enough for our purposes.
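As a hedged illustration of how a drive signal like this could be generated through the sound card and amplifier chain described earlier, here is a small Python sketch; the 250 ms duration comes from the talk, but the carrier frequency, amplitude, and tap spacing are made-up values, not the study's parameters.

import numpy as np
from scipy.io import wavfile

RATE = 44100  # sound-card sample rate

def tap_waveform(dur_ms=250, freq_hz=25, amplitude=0.8):
    # One 250 ms square-wave "tap"; freq_hz and amplitude are illustrative.
    t = np.arange(int(RATE * dur_ms / 1000)) / RATE
    return (amplitude * np.sign(np.sin(2 * np.pi * freq_hz * t))).astype(np.float32)

def tap_sequence(n_taps=5, gap_ms=200):
    # A sequence of taps separated by silence; varying n_taps and gap_ms
    # gives different speed conditions.
    gap = np.zeros(int(RATE * gap_ms / 1000), dtype=np.float32)
    return np.concatenate([np.concatenate([tap_waveform(), gap]) for _ in range(n_taps)])

# Written to a WAV file, this can be played out the audio jack, through the
# amplifier, and into the voice coil motor.
wavfile.write("taps.wav", RATE, tap_sequence())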

So we exposed users to taps of different strength and speed, and we had a number of different speed conditions. We originally wanted to keep these tap sequences at a constant duration, but what we found in some of our pilot studies is that if you change speed but leave duration constant, then you end up with sequences with different numbers of taps.

And people said, I would just count the number of taps and that would tell me which sequence was faster. So we had to break the speed condition into two sub-conditions: a constant number of taps, or a constant duration where you essentially vary the number of taps.

We collected both quantitative and qualitative data. In the interests of time, I'm not going to talk about the quantitative aspects; we'll instead be focusing on the qualitative aspects, because we were most interested in whether we could actually recreate this human experience. After the study we asked users: if you had to describe these sensations to someone who wasn't familiar with this, what would you say? How would you describe it? And we asked them which aspects of the experience felt natural.

And, finally, we said if your phone could generate this in an open-ended session what would you want to use these kind of sensations for?

So we were very careful throughout this process to not use the terms tap or rub or anything suggestive of human contact. Because we wanted to see what they would say. And so it turns out that tapping actually mimics the real thing. So 13 of our 16 participants in the tapping study actually used the term tap when describing the sensation.

And the people who didn't used terms like drumming their fingers on a table, or kind of a tickling sensation, these very human-like qualities. It's also worth pointing out that the results for fast and slow taps are consistent: people consistently said that the slow taps felt much more human, whereas the fast taps felt more mechanical, like a vibration.

That's also interesting, I think, because it further reinforces this idea that vibrations really just aren't a natural or human form of contact.

The rubbing study setup was very much the same. The main difference was that we didn't have an amplitude condition, because if you think about it, sweeping the rubbing actuator farther across the skin in the same amount of time is really the same as moving it faster. So there's no need for an amplitude condition, and so we only had eight participants.

Again, I'm only going to focus on the qualitative aspects of the study. So half of the participants said that this actually felt like rubbing. They actually used the term rubbing in their commentary.

And so for the people who didn't, they said it felt kind of like a light grazing or light sweeping across their hand. And interestingly one participant said it felt strangely comfortable, almost like the touch of someone else. It was more like a finger touching my skin than an object, which I think is interesting. If not a bit creepy.

Okay. So it also turns out that taps and rubs are good for different things. Based on what people said they would use these types of things for, it turns out that they're good for things that vibration isn't. On the one hand, people said that taps are good because they're quiet, so they're not going to make a sound if the device is sitting on the table at the library, whereas a vibrating phone makes this sound and is still disruptive.

On the other hand, they said it's strong, which is good for loud environments where you might be on the move and wouldn't necessarily hear or feel a vibration, say, in your pocket. But because these taps bordered on the level of pain, they imagined it was very likely they would be able to feel them if the phone was in their pocket.

And I think it's worth pointing out that the rubbing is a little more subtle, both from a perceptual standpoint and from kind of an applications standpoint.

So from an applications standpoint, participants said that this would be useful for in-hand scenarios where a vibrating phone is just uncomfortable. They mentioned you probably wouldn't be able to feel the phone rubbing in your pocket, just because it's the light sensation that I mentioned earlier.

The other thing I wanted to touch on is just the fact that not all of our participants said it felt like rubbing. It had kind of this light grazing quality to it. We think part of the reason for that is because we really couldn't control how much the actuator would push into the user's skin when it was rubbing across the surface.

And you could imagine that if we combined the lateral motion of the rubbing device with the into-the-surface motion of the tapping device, you could basically regulate exactly how much you're pushing into the skin as you rub across.

That would essentially be combining tapping and rubbing into one device. At a very high level, it turns out that, yes, we can recreate tapping and rubbing to an extent using computer-mediated communication, and they're good for certain things that vibration isn't good for. But it also turns out that human touch isn't necessarily good for everything. I alluded to this earlier: psychology tells us human touch is really good for emotive aspects, where you might be trying to get someone's attention or convey that you're really mad at them, but at the same time it's not very good for semantics.

You can imagine trying to organize a lunch meeting, for example, by using only human touch. I mean, I think that would be an incredibly hard problem and I don't even know how you would do that.

It's not so great for expressing semantics. And so the question here is: can we bridge this gap? We've kind of got this emotive aspect of touch going, but how can we convey semantics using touch?

That leads me to the final project I'm going to talk about, which is something I'm currently wrapping up: a vibrotactile encoding of speech.

And the idea here is that you could imagine this being used for two main applications. The first one that we started off with was looking at a kind of messaging back channel. You can imagine vibration sequences coming from your phone telling you, you know, someone is thinking about you, or some other lightweight communication message like "I'm running a little late", something that doesn't necessarily need SMS to convey the information.

A second, potentially more interesting, application would be a form of augmented SMS. So right now there's this ambiguity associated with computer-mediated communication using text, because people are unable to express certain emotive aspects of their speech; these elements of prosody are actually missing from textual messaging. You can imagine an augmented SMS message where you receive a text message along with an accompanying vibration that would potentially tell the user exactly which words in the sentence are emphasized.

And so as a pilot study to see whether this would even work, we took five common phrases from the text messaging literature and we generated four different vibration sequences for each phrase.

We recruited 20 participants, and in a forced-choice response we repeatedly presented these text phrases to users, played these different vibrations, and asked, hey, what does this vibration feel like? We made them select one of these sequences even though they might have felt a little unsure about it.

It turns out that participants actually do agree across the board on certain phrases mapping with certain vibrations.

So shown here is the plot for the "hello" vibration. I also have a demo of this on a phone, which you can feel after the talk. On the Y axis is amplitude and on the X axis is the sample. So you can see the pulse matching most closely to "hello" is kind of this medium-level vibration, followed by a pause, and then a really strong vibration.

And shown here are the plots for "are you busy?", "I miss you", "goodbye", and "where are you?". So this was kind of a somewhat unstructured study, just to get an idea of which elements of speech really are important for this kind of communication language. And so moving forward, looking at some of our results, it turns out that there seems to be some overlap between linguistic properties and the different vibrations that we came up with.

So we're currently testing a number of hypotheses based on this. First of all, it seems like the number of syllables is important, based both on the confusion matrix, which I didn't show, and on the most successful phrases. It seems like different syllables should map to demarcated vibration pulses. And looking at linguistic prosody, which is generally considered to be rhythm, stress, and intonation, it seems like these also map to different durations and intensities of these vibrations.
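Purely as an illustration of the hypothesis being tested (not a validated encoding), a mapping along these lines might look like the following sketch, where the syllable and stress annotations and all the pulse parameters are assumed values:

def phrase_to_pulses(syllables):
    # syllables: list of (text, stressed) pairs, e.g. for "are you busy?":
    #   [("are", False), ("you", False), ("bu", True), ("sy", False)]
    # Each syllable becomes one demarcated pulse; stressed syllables get a
    # longer, stronger pulse. All parameter values here are made up.
    pulses = []
    for _, stressed in syllables:
        pulses.append({
            "duration_ms": 300 if stressed else 150,
            "intensity": 9 if stressed else 5,   # of the ~10 motor levels
            "gap_ms": 100,                       # pause that demarcates syllables
        })
    return pulses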

And an interesting insight here, kind of as a side note, is that linguistics tells us that prosody conveys a lot of emotive communication and information, which got us thinking about this augmented SMS application that I talked about earlier.

So one insight is that rather than trying to map things like emotion and a bunch of other things that other HCI researchers are looking at, one possibility is that once we get the prosody down, if we just send this vibration sequence to someone else, they will, based on that vibration sequence, be able to extract the prosody and presumably get a certain level of emotion out of that.

I'm currently working on testing those hypotheses that I mentioned in some user studies back in San Diego.

Okay. So just to summarize today's talk. I opened up with an auditory feedback example. I presented a taxonomy that we use for kind of guiding our exploration in this space and then I talked about a number of different projects I worked on looking at how we might map music, human touch and speech to the tactile channel.

And really the high-level take-away from today's talk, if you get nothing else, is that we really can take advantage of things that people are familiar with and map those to the tactile channel, and they come with some kind of information mapping for free, largely based on things people are already familiar with.

And so looking forward, you can imagine taking this idea further. Everything that I've talked about today, and for the most part even the greater [inaudible] literature, really only focuses on how users receive information. It really focuses on tactile feedback rather than tactile messages.

But it says nothing about how users could actually convey this kind of information in a tactile messaging channel, for example.

You can imagine using the exact same approach that I talked about, where we try to leverage this pre-learned quality of human experience, to build different tools for conveying information in these eyes-free scenarios.

Presumably this would lead to new tactile languages that people would be able to communicate with, and a whole new class of emotive communication where you would be able to convey just lightweight emotional information. And finally, when you combine this sending and receiving of tactile messages, I think looking forward we're going to see a new class of devices that supports this lightweight messaging channel that isn't really supported by SMS or e-mail or phone calls currently.

And so with that I'll take any questions you have.

[applause]

>>: Looking forward, say all of a sudden some [inaudible] wants to make some version of what you have. What form factor would you see them embodied in? Obviously it wouldn't be a big device. How would you make it noninvasive and unobtrusive?

>>: Kevin Li: Fine. So given what I've discovered so far, and without being able to predict what I'll discover in the future, I could imagine, first of all, taking out the current vibrotactile actuator in the phone and replacing it with two of the voice coil motors that I talked about, in smaller form.

So I imagine one being kind of attached to the side which would generate this tapping sensation and another one orthogonally mounted on the back that could generate this rubbing sensation.

The other thing I would mount is something similar to Ken Perlin's UnMousePad, if you know what it is.

But for the people who don't know what it is, it uses force-sensitive resistors. Basically it's a really thin, paper-thin sensor which allows you to pick up both pressure and location on the pad.

So I have --

>>: That still requires you to get access to the device. Whereas with the vibrotactile actuator's off-center weight, in your pocket or close to your person, you still get that one bit of information. So how would you --

>>: Kevin Li: So I think the actuators would be used for actually generating the actual taps and rubs. I bring up the pad because I imagine that to be an extremely useful sensor for picking up this generation of tapping and rubbing and other kinds of tactile messages.

So before I knew about the UnMousePad, I actually developed a prototype for kind of end-user-generated taps and rubs, which hasn't been published yet.

But the idea here is kind of a touch pad mounted on a pressure sensor. So you tap it to generate taps. The harder you tap, the harder these taps are, and kind of the faster you rub, the faster these rubs get. So you can imagine mounting this on the back of a phone. Obviously this isn't nearly as cool as the unmouse, and so if you had the unmouse on the back you could generate taps and rubs that way and send it to your other people.

>>: With the rubbing, is that something that will have to move like a windshield wiper on the outside of the phone, or could it be inside the case?

>>: Kevin Li: You could transmit it through the body. All the rubbing that we've done has been more of a skin-on-skin type of feeling. In some of the early prototypes we actually played with doing the rubbing through a thin membrane and you can imagine that's exactly what you would feel if the phone was actually in your pocket, right? Because unless you have a hole in your pocket it's not going to be rubbing directly against your skin.

So an alternative to the voice coil motor approach would be to use kind of two rotational motors.

So it's kind of like train treads -- sorry tank treads but you replace the treads with fabric and generate some kind of rubbing sensation that way.

>>: I was just wondering [inaudible] achieve the same results.

>>: Kevin Li: I'm sorry, what?

>>: Allowing -- [indiscernible].

>>: Kevin Li: So you know I know some people in Japan are working on a blowing kind of feedback device. So I think it's an interesting idea. My hunch is that --

>>: [indiscernible].

>>: Kevin Li: Yeah. So I mean, first of all, my hunch is that it would probably generate a different type of sensation. First of all. My second concern with that is really miniaturizability. Because if you look at kind of their setup, they're all really big and that kind of pneumatic setup would probably -- I'm not sure how that would actually be miniaturized.

>>: Crossed my mind when you --

>>: Kevin Li: No, it's definitely interesting stuff.

>>: I was interested in how you talked about how we use different auditory ring tones to represent different people calling. You could essentially do that with the vibrator as well.

Based on the experiments you did turning the music into vibrations, how feasible is that to do on the fly, with existing technology?

>>: Kevin Li: So I guess everything that we did in that application could be done on the fly. I mean, the conversion wasn't that computationally intensive. It was using MATLAB, but the tools exist.

The main concern is right now it's somewhat semi-automatic. You notice I had those constants in there, because it required a little bit of tweaking for different songs. The other thing obviously we didn't map the entire song, we only selected a portion of the song that was particularly good for this kind of mapping.

You can imagine some kind of automated mechanism, like a two-stage classifier, first detecting which subset or which interval of the song would be good for this kind of mapping, and then doing the conversion server-side.

But currently it's not automated. But it could potentially be in the near future. Maybe two, three years out.

>>: Do you have any sense of how large a vocabulary [indiscernible] in the last project? Did you ask people?

>>: Kevin Li: The speech one?

>>: The speech one, yes. So how large a set of sentences could one be tested against?

>>: Kevin Li: That's a good question. So, first of all, I don't know exactly how big a vocabulary we could get in terms of trying to express text. But I actually think that by trying to express prosody instead of text, you, first of all, have a much more limited vocabulary of what you need to express, so that's good, but you're also able to express new things that existing mechanisms can't currently convey.

But off the top of my head I would wager that users could fairly quickly be able to pick up somewhere between 10 and 20 different messages in a closed language that was used over, say, an hour, I imagine, they could probably pick that up.

>>: What about direction, like turn-by-turn guidance?

>>: Kevin Li: Yeah, so Hong Tan did this in the late '90s when she was a grad student, which is actually using sensory [inaudible]. That's basically where you have two actuators physically located apart, and you fire them off in quick succession, and that kind of gives you a directional feeling.

And so she actually used that as kind of a navigational guide. So if you imagine having kind of four actuators mounted on a device and firing off like the bottom one and then the top one in kind of with phase offset, then you kind of get this directional feeling.

And so yes, I think that's a great idea. Someone should actually do it. But it has been studied.

>>: Have you tried any of your experiments with people who are deaf?

>>: Kevin Li: I have not. Actually, that's a good question. So for some of the prosody work, I've actually been talking with some of the other folks at UCSD who are looking at hearing-impaired users. And we actually think that once we get the details of the algorithm down, we can use that to help users who recently became hearing impaired learn sign language a little better than with the current mechanisms, which seem to be really hard.

So I think in that regard there is some value, yes.

>>: With the vibro technology, could this be built into something you wear, so it's more intimate and works even when you're away from the phone, as a more casual form of alerting a person?

>>: Kevin Li: Yes, so I think that's a great idea. I think one of the nice things you also touch on is the fact that you can use different body locations instead of the thigh, which isn't so great for tactile feedback.

So things like headsets or a watch kind of in terms of accessories I think would be really good for this kind of tactile messaging system.

In terms of actual clothes, I don't know how you would weave those actuators into clothes in a way that they can still go through the washer and dryer.

>>: There's this concept I saw, like the Hug Shirt.

>>: Kevin Li: I saw that, too.

>>: I don't know how they do it but --

>>: Kevin Li: That's using pneumatics, I think. But it's a cool idea. I didn't actually get to try it out, but it's a good idea.

>> A.J. Brush: Let's thank Kevin.

[applause]
