>> Merrie Morris: So thanks. Sorry, we're getting started a little late. But welcome back to the
second Microsoft installment of the Microsoft and UW accessibility seminar. This week's seminar
is going to be Scott Saponas, who has both UW and Microsoft roots. So he just finished his
Ph.D. in the computer science department at UW over the summer. And he just joined Microsoft
Research this fall. So this is a good way for everyone to get to know what Scott's been up to and
what he might do next.
>> Scott Saponas: Thanks, Merrie. I actually left the prototypes for this in my office. So at the conclusion of questions, I am going to sprint out to my office and sprint back down again, and then you guys can all see -- I mean, see and feel -- what these things that I'm going to talk about today actually look like.
So you have that to look forward to after questions at the end of the talk.
Interrupt me as much as you want during the talk. What I wanted to talk to you guys about today
is some of the work that we did the last couple of years on optically sensing tongue gestures for
computer input or tongue computer interfaces.
Sounds a little wacky, but it's interesting stuff. This is work I did here at Microsoft Research with Desney Tan, also here at MSR and my manager, as well as Dan Kelly, a Master's student in the EE department who has graduated now, and Babak Parviz, who is a professor in the EE department at UW.
A lot of the work that I do -- oh, look at this, Job Talk Leftovers -- a lot of the work I do is trying to figure out how we can utilize some of the untapped bandwidth of the human body for computer input.
My dissertation research was focused on muscle-computer interfaces: how we can, from the surface of the skin, detect the electrical activity associated with the muscles, so we can do things like interact with a computer with finger gestures without having an input device in our hands.
The work I'm going to talk to you about today is actually in the same theme, but here the untapped bandwidth that we're going to try to utilize is our tongues.
So it turns out our tongues are a pretty interesting and unique part of our body; they're the only muscle that's attached at only one end. If you move your tongue around in your mouth, you can feel that it's pretty dexterous. It also has sensors on it.
You can feel each one of your individual teeth. You can manipulate it pretty precisely. You use it to do things like manipulate food in your mouth, but you also use it in things like speech, very precisely, without even really thinking about it. So our tongues are super powerful.
But you'll notice when we interact with computers, we don't utilize any of that part of our
physiology or our brain.
One of the reasons why I'm interested in looking at tongue-computer interfaces is for use by people who have traumatic brain injuries, spinal injuries, or diseases of the nervous system like Lou Gehrig's disease. In this accessibility class, you've probably talked about at least a few of these things before, but there's a variety of interfaces out there for people who are, for example, quadriplegic. Things like sip and puff, voice recognition, brain-computer interfaces, head tracking, eye tracking -- and these all work pretty well for a bunch of different applications, things like voice recognition for doing dictation.
But they're also pretty limited in how many bits of input they give you. And for somebody who doesn't have any motor control, it's hard to do things like drive a motorized chair with these modes of input.
So instead, like I said before, what about trying to use our tongues to control it? We thought, why not use the tongue, because most people who have a spinal injury or brain injury usually still have full control of their tongues.
But how do you put some kind of computing in the mouth so you can actually sense how your tongue's moving around? It's not entirely obvious at first.
Before I get to how we did it, there are actually a few different pieces of work that people have done in the past looking at tongue movement. There are kind of two different classes.
So one of them is taking your tongue and kind of using it like you would use your finger to press a button on a keyboard or a mouse or a game controller, something like that.
Here what people have done is basically put something that's the equivalent of a directional pad
or a number pad in your mouth, like against the roof of your mouth. You can kind of imagine
curling your tongue back and then trying to press those buttons for like on a number pad if you're
dialing a phone or directional pad if you're trying to control something.
Then the other main approach that people have done is augment the tongue in some way. In this
case with a magnet, and then put sensors on the outside of the mouth to try and detect the
movement of that magnet that's affixed to your tongue, and then be able to use your tongue
movement in that way to control the computer.
So we don't really like either of these approaches. The reason we don't like the approach on the left there is because it's not very comfortable to curl your tongue up in your mouth and be constantly pressing against things. That's not really how we use our tongue in speech or most of the time.
And the other approach does let you leave your tongue kind of in its natural resting position and move it around in your mouth, but you have to augment the tongue with something like a magnet. And you have to have sensors, most likely on the outside of your mouth, possibly on the inside of your mouth. So we don't like that either.
>>: [inaudible] of the magnetic sensor.
>> Scott Saponas: What do you mean?
>>: What's the form?
>> Scott Saponas: There's one magnet on the tongue. Is that what you mean?
>>: Yeah.
>> Scott Saponas: Yeah. So the first thing that I thought of when we started working on this project is, well, mice work really, really well. Can't you just take the innards of a mouse, put them against the roof of your mouth, and move your tongue against the bottom of that mouse? Then you could control anything that you can control with a mouse or with mouse input.
So believe it or not, we actually tried that out. I took one of these little miniature mice apart and stuck it in my mouth, and convinced a few other of my colleagues, like Johnny Lee, to do the same.
We played around with it a bunch. And it turns out that for a couple of different reasons, it doesn't
track very well.
>>: How big is the sensor?
>> Scott Saponas: The sensor that's inside this is probably less than a centimeter square
footprint. And vertically maybe half a centimeter at most, something like that.
So we just took the innards of this mouse and stuck it in our mouths. First, if you don't put anything on the bottom of the mouse, it has a big cavity there that starts to fill with saliva, which doesn't do any good. You can try putting a piece of plastic down at the bottom, where the focus point of the mouse is. But when you press your tongue up against that so it's flat, your tongue, at least for this mouse technology, becomes a little featureless. As you can imagine, that's hard to track. If you take a mouse and put it on glass or a surface like that, it's fairly featureless for that type of tracking technology, and then it's hard for it to track the movement of the mouse.
So I thought, okay, I can solve that problem. At least temporarily. I'll take some medical tape and
wrap that around my tongue. Turns out that doesn't work really well either and it tastes really
bad.
So this started to feel like maybe this whole tongue-computer interface thing wasn't a good idea.
Then we had a small insight that actually panned out to be pretty good. And that was: instead of trying to use the technology of a mouse to look at the image of your tongue and track how it's moving around, how about we use a much, much simpler version of that -- just using the fact that your tongue reflects light -- and put multiple of those sensors in your mouth and try to detect how your tongue's moving around that way, instead of tracking it precisely like you would with a mouse.
The basic idea here is that your mouth is pretty dark. As long as your mouth's closed, there are no other sources of infrared light in your mouth. Your tongue reflects infrared light pretty well. And there's this nice curve where light falls off in intensity over distance, roughly an inverse-square falloff, or something close to that.
So you can have an infrared emitter that sends light out and an infrared detector, and essentially detect how far away something is from your sensor. This is used in industrial applications, in robotics, and in different places for detecting how close two things are together, like detecting when you're about to close something.
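To make that concrete, here is a minimal sketch of the idea in Python. It assumes an ideal inverse-square falloff and a single per-sensor calibration constant; the constant, the noise floor, and the function name are hypothetical, not values from the actual system.

```python
import math

# Minimal sketch: invert reflected intensity into a distance estimate,
# assuming intensity ~ k / d^2. k and noise_floor are hypothetical
# per-sensor calibration values; the real response curve is messier.
def estimate_distance_mm(intensity, k=1000.0, noise_floor=0.05):
    if intensity <= noise_floor:
        return None  # too little reflected light: nothing in range
    return math.sqrt(k / intensity)

# e.g. estimate_distance_mm(10.0) -> ~10 mm with k = 1000
```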
So we thought, this technology actually can be done pretty small. It can potentially be fabricated in a very small way. Why not try to put it in the mouth?
So what we did -- this is one of our early prototypes of the system -- is we bought four commercial infrared proximity sensors that are used in industrial applications. These guys are pretty small. I can't remember the dimensions off the top of my head, but they have a profile of like five millimeters high, something like that. So they're not very big.
And we went to an orthodontist and watched them take molds of somebody's mouth and make a retainer, opened an orthodontic textbook on how to make a retainer, and bought all the supplies at UW -- because when you say you're at UW trying to buy orthodontic supplies, they just assume you're in the orthodontics department, so that's not a problem.
And we basically learned how to fabricate our own retainers. Now, these retainers that we were making aren't for correcting your teeth. So there's no wire on the front of it or anything like that, and there's nothing affixed to individual teeth. It's the part of a retainer that fits up against your palate and hooks around your molars.
What we did was get a mold of our mouths and then pour a retainer, but in the process of pouring the retainer, we set down these infrared proximity sensors and ran some wires out the front of our mouths.
A little uncomfortable having a wire up the front of your mouth but I'll address that in a little bit.
>>: The sensors, are they like on the top of it or are they inside?
>> Scott Saponas: Yeah, I should have mentioned that. These sensors are completely enclosed in the acrylic. The idea is that you can essentially make all those sensors biocompatible, if you will, because the acrylic is biocompatible and the infrared light can be emitted and detected through the acrylic.
There's actually a little bit of a complication there, because now you have extra interfaces. Normally when you have an infrared proximity sensor, you basically just have an air interface: you have your sensor and you have air, and if you remember back to your physics and optics classes, you have the emitter, then air, then something the light bounces off. The only interface is that sensor-to-air interface.
Now you have sensor to acrylic to maybe saliva, then some air, then some saliva, and it comes back again. So it's a little bit more complicated -- it doesn't work the same. You can't just take their performance curves and do the same math. Which brings up a little anecdote about how we assessed the performance of these. We really don't have a lot of white space in here, unfortunately, but most of you are on this side of the room, so I'll write over here.
Since you asked this question, I'm going to go off the path here for just a second. One of the things that we did is try to assess how well these sensors work for our situation.
So can any of you guess how we would build a test harness to produce a graph of voltage over distance for these sensors, and what the performance curve might look like -- how you might build a test apparatus to simulate infrared proximity sensors in your mouth?
>>: Did you get a cadaver tongue?
>> Scott Saponas: No, but that's a good guess. Another guess would be a cow tongue. It's really easy to get a cow tongue, it turns out.
Yes, I could try to use my dog. But you want a controlled situation. A cadaver tongue or cow tongue might not be a bad way to do it, but we went cheaper than that. I won't hold you in suspense anymore. What we did is we have a little platform here, and then we have an adjustable thing here, and then another platform. And then here's your infrared proximity sensor, sending out light and receiving it back.
So at the bottom here -- and we encase this sensor in acrylic, so you have the acrylic interface -- what we put on the bottom is a piece of pepperoni. We take pepperoni here, and then we put water on the pepperoni to simulate saliva. And then we put water on the acrylic right here, and then we can try to come up with a performance graph of what it looks like.
Why this is useful is because there's a whole circuit behind this, and there are different ways you can interpret the signal coming out of it, different ways you can modulate the infrared transmitter. You have to keep experimenting with this, figuring out what's going to work in this situation, because what you get off the data sheet from the manufacturer doesn't really address pepperoni or tongues or saliva or any of these situations.
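As a rough sketch of what the data collection for that rig could look like in software: step the platform through known gaps and average some ADC readings at each one. The `read_voltage` helper is a hypothetical stand-in for whatever ADC interface the real setup used.

```python
# Hypothetical data-collection loop for a voltage-vs-distance performance
# curve, using a rig like the pepperoni harness described above.
def read_voltage():
    # placeholder: replace with a real ADC read from the sensor under test
    return 0.0

def collect_curve(distances_mm, samples_per_point=100):
    curve = []
    for d in distances_mm:
        input(f"Set the platform gap to {d} mm, then press Enter")
        avg = sum(read_voltage() for _ in range(samples_per_point)) / samples_per_point
        curve.append((d, avg))  # one (distance, mean voltage) point
    return curve

# e.g. collect_curve([1, 2, 3, 5, 8, 12, 16, 20, 25])
```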
>>: How does the local absorption of that radiation affect the proximity, the distance? Because with really dark meat as opposed to really white meat, do you get a different reading?
>> Scott Saponas: I think you do get a little bit different absorption. But in everything that we did,
the much bigger variable seemed to be the acrylic as well as the saliva.
>>: Is that [inaudible] or an amplitude sensor?
>> Scott Saponas: Amplitude.
>>: It should vary quite a bit with contrast of the subject, right?
>> Scott Saponas: Subject to subject? Yeah, it will vary. We do calibration per subject for that reason. So one of the problems here -- I'll draw another version of this figure. What this sensor really looks like is you have the transmitter and the receiver, and then you have some amount of acrylic no matter what, right? Because we're trying to make this biocompatible. So you have some amount of acrylic. Then you have saliva or air or some combination right here. And again, if you remember back to your physics classes, some of the light is going to go out like this and reflect off your tongue and come back.
But some of the light is also going to reflect off this interface and come back, which is not good. That really messes with your sensors, especially because the intensity of this is much more than the intensity of this.
So we were trying to come up with a biocompatible way of dealing with this. One of the first things we did was carve a notch in here and fill it with Sharpie ink. It worked really well, except we didn't think it was a good idea to have Sharpie ink in your mouth. The thing we ended up doing, which is kind of funny, is to carve a notch in the acrylic here and fill it with a compound of essentially acrylic and coffee.
So you basically have an opaque dental acrylic in there. It does an okay job of isolating the transmitter and the receiver.
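On top of that physical barrier, a software-side mitigation is also plausible: since the reflection off the acrylic interface is roughly a fixed additive offset, you could record it once and subtract it. This is a generic sketch, not something the talk says the team did.

```python
# Hypothetical baseline subtraction for the fixed acrylic-interface
# reflection: sample each sensor with the tongue at rest and away,
# then subtract that offset from live readings.
def calibrate_baseline(read_sensor, n=500):
    return sum(read_sensor() for _ in range(n)) / n

def corrected_reading(read_sensor, baseline):
    return max(0.0, read_sensor() - baseline)
```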
All right. I diverged quite a bit. Go ahead.
>>: So why can't you just put it -- why can't you just embed it so it's like sticking out?
>> Scott Saponas: Because the components of the infrared proximity sensor and detector have various properties, like toxicity, that we don't really want to have in your mouth. So we need to have the acrylic there.
For test purposes we could try something like that but we know eventually you can't do that.
>>: Are there any triangulation sensors you could acquire that would eliminate that problem? Like the Sharp range sensors?
>> Scott Saponas: That's a good question, like ultrasonic.
>>: Triangulation of infrared.
>> Scott Saponas: So based on finding?
>>: No, just an off-axis illuminator; depending on where it comes back, it's a position sensor. We can talk offline.
>> Scott Saponas: That's a good question. Yeah, I guess it depends a lot on how the tongue
reflects the light.
>>: And the short distance.
>> Scott Saponas: Yeah, I'm not sure if it will work at that short a distance, or with how the tongue reflects light. We should talk about it. It's worth trying.
Anyway, back to trying to figure out how to observe tongue movement in the mouth using this infrared proximity sensing approach that I described over there.
So in our setup we put four of these proximity sensors in the retainer that's in your mouth. You can think of them as roughly overlaid on the front of your teeth, on the left side of your teeth, on the right side of your teeth near the front, and one over the middle of your palate. Did you have a question?
>>: What kind of time resolution do you have for this kind of measurement?
>> Scott Saponas: Like what's the resolution?
>>: The time. So is it continuous measurement?
>> Scott Saponas: Something like -- so the infrared proximity -- [phone ringing] maybe I should have turned my phone off before giving a talk. The infrared proximity sensors, the receivers and the transmitters, because they're photodiode-based, you can turn them on and off in like six nanoseconds or something like that.
You can modulate them very fast, and they react to the light really fast, too. It's the speed of light to your tongue and back. They settle really, really quick. So it's more about the sampling rate we can do with our analog-to-digital converter and our microcontroller and stuff like that.
We sample at like 250 hertz, something like that. But you can sample even faster than that if you wanted to.
>>: [inaudible] okay, so like --
>> Scott Saponas: So like a couple of milliseconds.
>>: That's good.
>> Scott Saponas: That, I don't think, becomes too much of an issue. It's more the accuracy: when you get really close to the acrylic, you have these refraction issues. That stuff is much more of a concern than sampling rate. I'll say sampling rate is not a problem.
>>: So what do you measure in this case?
>> Scott Saponas: That's what I'm going to describe. We have these sensors in the four positions in the mouth. So what we want to do is come up with some set of tongue movements that we can detect using those sensors, so that we can then map them to some type of input.
In our work -- we tried a few different things -- what we mostly focused on was four different movements, plus a couple of extras that I'll describe, that we can detect somewhat robustly.
And those are basically: taking your tongue from its resting position -- I'm using my hand here as a tongue -- and swiping it from rest to the left side of your mouth and then returning to rest; swiping your tongue to the right, imagine against your teeth, just left to right -- sorry, they're down here, I'm blocking them; tapping your tongue against the top of your mouth; or holding your tongue briefly against the top of your mouth.
The way we detect these is actually a very simple heuristic that requires a little calibration step. No fancy machine learning or anything like that. Basically, what we're looking at is the timing of when your tongue slides past these sensors. Depending on the shape of a person's mouth and how they move their tongue, the mapping can actually be backwards. So one of the things that's kind of interesting -- again, I'll need the white board. Can you see anything I'm writing on the white board over here? I'll erase this and use this far side again.
One of the things we learned when we started taking molds of people's mouths is that people's mouth shapes are wildly different. Which I hadn't thought about, but I guess it makes sense -- like wildly, wildly different. Some mouth cavities are really, really tiny and some are really, really large; some are really short and some are really long. So people's mouth cavities are pretty different sizes, and that affects how this problem works quite a bit.
So in some people's mouths, when they swipe their tongue from left to right, the way these infrared proximity sensors detect that -- I'll overlay the sensors like this -- is you see the tongue kind of curve up and do a swiping motion against their teeth, and what you actually see is the back of the tongue getting close to this sensor and then getting close to this sensor.
In other people's mouths you see the front of the tongue be on this side and then be on this side.
So for some people, a left-to-right swipe looks like passing very close to the left sensor and then the right sensor, and for some other people it looks like right to left. What's good is people tend to swipe their tongue the same way over and over again, or they can learn to. It's robust per person, but it can be totally backwards depending on who is doing the tongue swiping.
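A sketch of that timing heuristic, with the per-user "backwards" mapping folded in. The threshold and time window stand in for the values the calibration step would produce; the names are illustrative.

```python
# Sketch of the swipe heuristic: which side sensor the tongue passes first
# determines direction, and a per-user flag handles people whose swipes
# read backwards. THRESHOLD and SWIPE_WINDOW_S are hypothetical
# calibration outputs.
THRESHOLD = 0.4       # proximity level that counts as "tongue passed by"
SWIPE_WINDOW_S = 0.5  # both side sensors must fire within this window

def detect_swipe(left_time, right_time, reversed_user=False):
    """left_time / right_time: timestamps when each side sensor last
    crossed THRESHOLD, or None. Returns 'left', 'right', or None."""
    if left_time is None or right_time is None:
        return None
    if abs(left_time - right_time) > SWIPE_WINDOW_S:
        return None
    direction = "right" if left_time < right_time else "left"
    if reversed_user:  # calibration found this user's mapping is flipped
        direction = "left" if direction == "right" else "right"
    return direction
```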
>>: How do you debug this stuff? How can you get a sense of what's going on inside the mouth?
>> Scott Saponas: That's a good question. Things like: try that again and have your mouth open a little bit while I look in there. And then also kind of getting used to what the signal looks like while they're doing it over and over again.
It actually is really hard. In fact, I have a fixed retainer on the back of my teeth, so it's not easy to take a mold of my mouth without messing with that and build one of these retainers for myself.
So I've actually never had this in my mouth. So as a person trying to design these interactions and figure out how they work, I build a retainer, I use my finger on it, and I have to come up with hypotheses about how somebody's tongue might work with it, try it out, and watch somebody else do it. It's actually kind of a hard problem. It's a lot different than watching someone use a mouse or keyboard or game controller or gestural interface, where you can kind of watch what's happening.
You have to infer it a little bit. You can watch a lot and try to figure out what people are doing in their mouth, whether they're actually following your instructions, what's happening externally with their mouth and the expression on their face while they're trying to do something.
But, yeah, it's a little funny. That's a good question.
>>: How natural is it for people to talk when this device is put in?
>> Scott Saponas: The thicker the retainer is, the more it's going to become a problem for your speech. I'll have to draw another thing for you guys.
So here is my poor drawing of what your teeth look like from the top down. And then from the side view, you have teeth and then you have the palate of your mouth. If the retainer can occupy as small a space this way as possible and be as thin this way as possible -- the thinner it is that way, and the more you can have a little cut-in right here -- the more you can keep your regular speech pattern and not really notice that it's in there when you're talking.
If you make it really thick in either this direction, or make this like a flat line, or bulge it out this way, it's harder to put your tongue in the normal positions it's in when you're speaking.
But to answer your question, you can talk with it in your mouth as long as it's sort of thin. The first couple we made were so thick you had saliva pouring out of your mouth the whole time. Once we made it a little bit thinner, that wasn't so much of a problem anymore. You can ask Desney about that.
>>: [inaudible] fixed set of movements?
>> Scott Saponas: Try to detect them robustly.
>>: I don't know how much you know about speech -- I worked on speech for a few years. If you can actually measure the tongue tip, however you want to do that, and also a few sensors in the mouth, you can do recognition from it.
>> Scott Saponas: Actually, that's a very interesting question, one we talked about a lot. If you get some of the fabrication problems right and put kind of a micro-array of a bunch of those sensors in the retainer, I think you can do it. Especially if you modulate them, because the photodiodes respond so fast you can do cool stuff where you're turning on different ones at different times really fast and modulating it. I think it's possible.
>>: In Europe, I think there's some magnetic sensor that actually collects about five points in the mouth, and also the rhythm is very hard for you, I guess. Once you have that, you actually can do recognition.
>> Scott Saponas: I think you actually can do it here, too. We didn't focus on that problem, because you can't do it with commercial sensors; they're too big, essentially.
But Babak Parviz, the professor who I worked with on this, is like a nanofab guy. He's done a lot of really small LEDs, like on contact lenses and stuff like that, with self-assembly techniques onto all sorts of different substrates.
It's a little bit harder -- he's not sure how well that same technique will work for the infrared receiver. But he thinks it can work. So I think you could actually do a good, small array of many of these for really low cost.
>>: I think many years ago there was a retainer similar to that -- a fairly high-granularity capacitive sensor array -- for teaching kids who were deaf from birth where to put their tongue for speech. That was one of the hardest things to teach those kids: where to put their tongue.
>> Scott Saponas: We actually tried doing a capacitive thing. I didn't describe that in the earlier slides, but we ran into the problem of capacitive sensing and saliva. We couldn't figure out a good solution to that problem. You had to dry it out a lot. If you're willing to spend a couple of minutes, it works for a couple of minutes, and then you've got to dry it out again.
Then it will work for a couple of minutes. I don't know, maybe they came up with a better solution than we did.
>>: It was something we saw on TV many, many years ago.
>>: What about lip movement, where the lips close?
>> Scott Saponas: In this version we didn't mess with that at all, because these sensors are all behind your teeth in the center of your mouth.
You could imagine that a corrective retainer actually has a wire that goes around your teeth. If you're doing the nanofabrication version, instead of these sensors that are relatively big for inside your mouth, it might be possible to look at whether your mouth's open and whether your lip is eclipsing a sensor or not. And depending on how dense the array is, whether you could get some lip shape. I'm not sure, but it's possible.
I think it would be very interesting.
>>: Once you get the tongue in there, then the camera can look at the lips.
>> Scott Saponas: Right.
>>: And you can do recognition with that. So in a meeting you can actually talk about --
>> Scott Saponas: There are a couple of other things that I didn't describe that we also looked into and tried a little bit. Since you're bringing it up, I'll stay on this slide a little longer and describe what they are. One of the other techniques that we started to mess with was putting a camera in the back of your mouth and like a quarter-sphere mirror in the front of your mouth, and basically having these be really small, pointing from the back to the front, to look back down at your tongue and see what's going on.
And doing it with infrared illumination and that kind of stuff in your mouth. Some of the other things we talked about were taking a bundle of optical fibers, pushing it into a camera in the back here, and then having those fibers spaced out at kind of an arbitrary density all the way through the acrylic of the retainer.
And it seemed like that would be a really good idea, because there are a bunch of ways you could think about that as a sensor. If you had a bunch of infrared illuminators spaced within it and an infrared filter, you could think of it as a dense version of this. You could detect other things; you could get an image out of that.
But we decided that was going to be very difficult and expensive and hard for us to manufacture and test in a short period of time. So we kind of abandoned that route. But that's another approach that could potentially work even better than this and might be feasible.
So the high-level point is that using these four sensors, you can relatively easily get these four gestures in a robust way. There might be others you can do, but we wanted to look at at least these four. I described this a little bit in words, but I'll put it on a slide for a second. The basics of how this system works -- this is our first prototype of it -- is you have those proximity sensors in the retainer. The signal goes through an amplifier circuit and an analog-to-digital converter, and there's a microcontroller that's sampling that and then sending it, in this case over a serial port, to a computer, where we do the calibration and the detection of the swipes and of tapping your tongue on the top of your mouth or holding it there.
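On the computer side, a debug reader for that serial stream might look something like the sketch below. The port name, baud rate, and frame format are all assumptions; the talk doesn't specify them.

```python
# Hypothetical host-side reader for the debug serial stream: the
# microcontroller is assumed to send one ASCII line per sample with four
# comma-separated ADC counts, e.g. "512,498,130,77".
import serial  # pyserial

with serial.Serial("COM3", 115200, timeout=1) as port:
    while True:
        line = port.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue
        try:
            left, right, front, palate = (int(v) for v in line.split(","))
        except ValueError:
            continue  # skip malformed frames
        # ...feed the four samples into calibration / gesture detection...
```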
>>: How weird is it [inaudible]?
>> Scott Saponas: It's weird. In fact, it's hard to talk with the wire coming out of the front of your mouth. It also tends to cause a fair amount of drooling.
Which Desney claims is uncomfortable. I don't know; I've never tried the thing. He might just be a complainer.
>>: Can you speak to the wireless version of the sensor?
>> Scott Saponas: I'll talk about that in a minute. This is just the first prototype; I thought I'd show it to you. I already drew this curve with my hands and talked about some of these problems on the board, but here's a figure -- I think this figure's actually from the data sheet. You have an LED transmitting light out, and it comes back into a photodiode. The way the sensor works is you essentially get a voltage that depends on how much light's reflected back in. And assuming a somewhat constant setup, the amount of light that's reflected back depends on how far away the thing is. And this is kind of what that curve looks like: voltage over distance.
So, as you would guess, the farther away you get, the less light is reflected back. So the closer these dots get together -- the farther out you are -- the less sensitivity you have about whether it's 25 millimeters or 26 millimeters, whereas when you get really close, with how this curve works, the change in intensity lets you actually tell those apart pretty well.
The one caveat here that you'll notice, and it looks kind of funny, is this curve doesn't start here and go down. The curve starts here and goes up and then comes back down.
So there's this bump. When you're really close to the sensor, it's hard to tell whether you're like one millimeter away or five millimeters away.
And part of that is because when you get really close to the sensor, less of the reflected light actually makes it back into the detector.
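The talk mentions, but doesn't recall, a clever trick for telling the two sides of that non-monotonic curve apart. Purely as an illustration of one generic approach -- not the technique from the talk -- temporal continuity can disambiguate: of the two distances a voltage could map to, pick the one nearer the previous estimate.

```python
# Generic disambiguation sketch for a non-monotonic response curve: a given
# voltage maps to one distance on the rising branch and another on the
# falling branch; prefer whichever is closer to the last estimate.
# curve_near / curve_far are hypothetical inverse mappings per branch.
def disambiguate(voltage, curve_near, curve_far, prev_distance=None):
    d_near, d_far = curve_near(voltage), curve_far(voltage)
    if prev_distance is None:
        return d_far  # default to the far, monotonic branch
    if abs(d_near - prev_distance) < abs(d_far - prev_distance):
        return d_near
    return d_far
```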
>>: I notice schematically it looks like the sensor and the LED are domed on the top, giving them some optical properties. If you fill that in with acrylic, doesn't that change the optical properties?
>> Scott Saponas: Yes. And so we tried to push this curve as far this way as we can by doing things like that -- by filling it with acrylic, by putting the coffee or the Sharpie in between, so the first place you can have crosstalk is up here instead of up here, and stuff like that. We were able to push this curve around a little bit, which is good. And I was trying to remember -- we actually have a clever technique for being able to tell which side of this curve you're on, but I can't remember what that technique was now. I came up with it, too, and I can't remember. But there are actually a couple of neat little tricks we came up with so that you can look at this a different way and figure out which side of the curve you're on.
I wish I could remember what that was, because it was clever. But anyway -- fabrication, I talked about it before. You take a mold of your mouth, and then you make a mold from that, and then you lay out your parts on that mold, and then you mix acrylic and pour it in there, and it hardens, and you have a retainer. This is one of the many iterations; this one only had two proximity sensors.
So what's the first application that you would try with this? First you convince Dan to put it in his mouth, and then you ask him to play Tetris with it -- or at least that's my approach to research problems: get Dan to try it out and get him to play Tetris. Tetris obviously isn't the main application of tongue interfaces, but this will show you how fast he can make gestures and have them be recognized, and how accurately.
What he's doing is swiping his tongue to the left to move a piece to the left, swiping his tongue to the right to move a piece to the right, tapping his tongue on the top of his mouth to rotate a piece, and holding it there for a second to make the piece drop down.
And so in normal Tetris that might be left, right, up, down -- kind of a directional pad mapped to those four things that we could detect.
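That mapping is about as simple as it sounds -- something like a four-entry lookup. The gesture and key names here are illustrative, not from the actual code.

```python
# The gesture-to-Tetris mapping described above, as a trivial lookup.
TETRIS_KEYS = {
    "swipe_left":  "LEFT",   # move piece left
    "swipe_right": "RIGHT",  # move piece right
    "tap":         "UP",     # rotate piece
    "hold":        "DOWN",   # drop the piece
}
```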
But the thing that we did after that, which I think is actually a lot more interesting, is trying to map those same controls to the control of a motorized chair. So we grabbed a motorized chair and built an interface to control its direction.
We actually talked to the manufacturer of the control unit for this chair, and they were super helpful. They thought our project was cool, and they spent time with us on the phone and sent us a bunch of debugging hardware so we could reprogram the chair and do all sorts of stuff. They were like, as long as you send it back, it's fine with us.
They sent us all their hardware. They were really helpful. It was a really positive experience, actually.
And so we built an interface for controlling it -- just the hookup to the computer. We came up with a slightly different way of using those four sensors to control the chair. To turn the chair to the left, you swipe your tongue left to begin turning, then you just hold your tongue on the left side of your mouth while you want the chair to continue turning, and then drop your tongue back to its rest position to stop.
The same thing for going right: swipe your tongue right, the chair starts to turn, and then when you put your tongue back in the rest position, the chair stops, and that's your new heading.
The way we were controlling the chair, the software interface would only let us either drive the chair forward or backward, or turn one direction or the other. So we can't do nice sweeping curves. We can only go forward, stop, turn, go forward, stop, turn, go backward, that kind of thing.
And then to go forward, what we do is hold the tongue at the top of the mouth, so that's your accelerator. You kind of hit the gas and it starts to move. Once it starts moving, because the infrared proximity sensors have that nice curve, we can use them not just as a binary -- whether the tongue is near the sensor -- but also to sense how far away it is. So we control the speed of the chair by how high your tongue is in your mouth.
You can move the tongue up and down to speed up, slow down, and then speed up again as you're moving forward.
Usually you would just kind of speed up, go fast, and then gracefully slow down. But you can do different things that way.
And then if, for example, you steer and turn and you're trying to go this way and you clip a chair, so you're kind of stuck against that chair, what we did was use the tap against the top of the mouth to make the chair back off by like a foot, and then you can turn and go forward again.
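Putting that control scheme together, here's a rough sketch of how one frame of sensor readings might map to a chair command. The 0-to-1 proximity scale and threshold are assumptions, and the tap-to-back-up move would be detected at the gesture layer and handled separately.

```python
# Sketch of the chair-control mapping described above: hold the tongue on
# a side to keep turning, hold it against the palate to drive forward with
# analog speed, drop it to rest to stop. A separate "tap" gesture triggers
# the one-foot backup.
def chair_command(left_prox, right_prox, palate_prox, thresh=0.3):
    if palate_prox > thresh:
        speed = palate_prox            # analog: higher tongue = faster
        return ("forward", speed)
    if left_prox > thresh:
        return ("turn_left", None)     # keep turning while tongue held left
    if right_prox > thresh:
        return ("turn_right", None)
    return ("stop", None)              # tongue at rest
```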
So how do you test this kind of thing out? Well, we just went out here in front of the building and set up a little obstacle course with these chairs, and formed a little route to go through the chairs like that, just to see how fast we can do it and how it compares to controlling the chair normally or controlling it with a keyboard, as a way to kind of ballpark whether this interface is feasible at all.
Dan was tired of my interfaces so I had to convince Desney to do it this time. So this is Desney
controlling the chair with his tongue. He's got a laptop in his lap that has a kill switch, which
needed to be used occasionally.
But other than that he's just controlling it with his tongue here. So swiping his tongue, initiating a
turn, then returning his tongue to rest to stop the turn and then moving forward.
So he'll execute like another turn and then straight and then another turn and then come through
the finish line here.
>>: To stop he just has to relax.
>> Scott Saponas: Yeah, just has to relax.
>>: To go back to the original --
>> Scott Saponas: While it's in his mouth, he can actually sit there and talk to you and nothing happens. Because if you think about how this works, you have to swipe and hold to start turning left or right; the only thing that you really have to worry about is something being interpreted as a tap on the top of your mouth and having it back up a little. But for the most part you can talk with it in your mouth and nothing happens.
I think really you want a more robust way of turning the system on and off with your tongue or with voice or something like that. Another thing that people ask about is: what about an emergency situation, when something runs in front of you and the infrared proximity sensors are never going to be perfect -- what do you do then? My answer is that I think you go back and borrow a little bit from those other interfaces that we didn't like, and you just put a physical button on the top of that retainer. You don't want to be manipulating a physical button all the time, curling your tongue back and pressing up, because it's uncomfortable over time, but that doesn't mean you can't have a kill switch that's a physical button and use it in combination with the infrared proximity sensors that are doing most of the control.
As you alluded to, it's not necessarily comfortable to have a wire hanging out the front of your mouth. So this was the first version of us building a wireless version of this -- putting the radio, microcontroller, and everything in one retainer. You can see it fits in his mouth. That version is a little bit thick, so it impedes speech a fair amount, but it works, which is kind of cool.
And then this is the second version, where we spun our own boards. Here's the main microcontroller and radio chip, and then we have the antenna out here on another board, some power stuff on that board, and the battery over here. So we were able to split it apart and form it in there.
These are all fairly rigid PCBs, stuff like that. You could make this stuff smaller. You could put it in custom silicon if you wanted to, to make it small enough to fit in there, and you can nanofabricate most of those sensors and make them really small. One of the biggest problems, I would say, is that it's hard to make the battery smaller. This is a lithium polymer battery.
But I think that if you're doing most of the processing on board and you're only transmitting the commands wirelessly, you can actually make this thing run for a day on a battery that would fit in the retainer.
What we were doing in our debug setup, though, was just blasting out the sensor data, sampled at a high rate, the whole time. When you do that, a battery this size is only going to last an hour or something like that.
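The hour-versus-day difference is just average-current arithmetic. A back-of-envelope version, with made-up numbers standing in for the real current draws:

```python
# Back-of-envelope battery-life arithmetic. All numbers are hypothetical
# stand-ins, not measurements from the prototype.
BATTERY_MAH = 40.0  # small lithium-polymer cell that might fit a retainer

def battery_hours(avg_current_ma):
    return BATTERY_MAH / avg_current_ma

print(battery_hours(40.0))  # debug mode, radio streaming raw data: ~1 h
print(battery_hours(1.5))   # on-board processing, radio mostly idle: ~27 h
```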
>>: Can you change the battery --?
>> Scott Saponas: What we did, we had a couple of different methods for doing that. The
easiest one was to use these two metal hooks to recharge the battery.
>>: [inaudible].
>> Scott Saponas: You could.
>>: Seems kind of basic.
>> Scott Saponas: This is the basic version -- you would do induction in a commercial product.
>>: For this particular application, where you're only able to detect the four gestures, you really don't need to sample the [inaudible].
>> Scott Saponas: Right.
>>: Maybe at a much lower rate?
>> Scott Saponas: Maybe something like 30 hertz, something like that. It depends. If you're swiping your tongue fast, and you're trying to robustly catch the timing of which sensor it swiped past first and then detect that it's held over there, you have to be sampling pretty fast to catch that tongue swipe.
But I don't think that's the main power problem. Using the radio is like an order of magnitude more power than doing a couple of additions on the microcontroller while sampling.
So I think you can make this run for a day. It might be harder to push it longer than that, but I think you can make it run for a day. I don't think that's a problem. So I think that's everything I have. Here's a little summary slide. Basically, the future directions are looking at whether you can use an array of these sensors -- make them a little bit smaller, add a few more, and expand the gesture set, maybe double it -- so that for applications like controlling the motorized chair you can do a little bit more, and a little more robustly: also control the tilt functions, or have a nice mode switch for going between driving the chair and doing the tilt functions, stuff like that. I also think it could be useful for other things -- there are a bunch of different standards out there for how you can do remote control of things like doors and elevators for people who are in motorized chairs. You can imagine having a mode switch and being able to use the same interface to control those things. So you're driving a motorized chair and you come up to an elevator; if you don't have any motor control, you can't tap the button.
Some people who are quadriplegic do have some motor control, so they can do things like still operate an elevator. But a lot of people can't.
>>: How do you think it would compare in bandwidth, as an interface, to, say, [inaudible] or a mouth stick?
>> Scott Saponas: That's a good question. We haven't tried to do things like figure out how many bits each of those gives you. What we were mostly trying to address is kind of a comfort thing. From experience -- I've spent a little bit of time in rehab hospitals with people who have just suffered a spinal injury and become quadriplegic -- sip-and-puff control of a computer or a motorized chair, just anecdotally, is a lot slower learning process than the tongue retainer.
With the tongue retainer -- we did a study with just four people from around the office -- they could all play Tetris in about five minutes. People trying to do the same thing with sip and puff have a harder time, because with sip and puff, to get four controls, it's two levels of negative pressure and two levels of positive pressure. And that's just really hard to control.
The other thing that's really bad about it is that, of people who have had spinal injuries, I think way more than half have much more limited respiratory ability than an able-bodied person. So sip-and-puff switches are kind of tough from that standpoint, especially early on, when they're trying to learn it right after the injury, while they're still in the hospital doing rehab.
Their lung capacity tends to be very diminished, so it's hard to do that -- and they're trying to talk to you and try it at the same time. Sip and puff is tough from that standpoint. A mouth stick can be good also, but both require apparatus right in front of you.
One of the things that's cool about this, actually, is that it's totally inside your mouth. And if you make it thin enough, it doesn't really get in the way of your speech. With sip and puff and the mouth stick, you have to have them out in front of you; when you're in a motorized chair, it's one more thing between you and the people you're interacting with.
>>: Taking that factor into account is huge [inaudible].
>> Scott Saponas: Exactly. So that's been my experience in talking with people.
>>: Is there any [inaudible] continuous motion? Can you do that with --
>> Scott Saponas: We only have one little piece of continuous control, in the speed.
>>: The speed.
>> Scott Saponas: I think as you have a bigger array of those, so that there are more parts of the mouth where you can do things like sense the tongue shape -- you use the tongue in an analog way in speech all the time, right? That's how you talk.
I think there actually are a bunch of opportunities for that, especially not for a motorized chair but for doing things on a computer. That's the place where I think you'd really have to leverage the analog ability to have interesting and useful controls of applications. I'm not sure how much you need it for the motorized chair beyond a couple of things like speed.
>>: Has there been any studies about just how tired the tongue gets over time?
>> Scott Saponas: That's a good question. No, not that we know of, not for this kind of stuff. That's something we're really interested in, though, because it really impacts your ability to use it.
We talked to a few different people, like speech pathologists and a few other physicians, and their response was basically that your tongue's a muscle: you should be able to build up some endurance and stuff like that, and you should be able to operate a device like this. But I don't know whether that's true or not.
>>: I'm sure you can. But it's also used for other purposes.
>> Scott Saponas: Right. So I don't know the answer to that question.
>>: I'm just trying to think about whether this way of detecting tongue position with the [inaudible] gets at whether it's touching the palate, on any part --
>> Scott Saponas: If you had an array of 100 emitter-detectors with analog distance.
>>: That would detect the whole shape?
>> Scott Saponas: Yes.
>>: So this is just [inaudible] -- this is four sensors that can only detect these gross moves?
>> Scott Saponas: It can detect fine movement; it just can't detect things like shape very easily. When you have a big array of sensors that you're sampling really fast, you can alternate which emitter you have turned on at each time. Also, you don't have to have a one-to-one pairing of emitters and detectors. You might have one emitter on somewhere in your array and at that same time sample 12 different detectors, and then you might move over to another emitter that's just a little ways away and sample a whole bunch of detectors again. Then you can take all of those different signals that you're sampling and average them -- there are all sorts of tricks you can use to get a smoother, more accurate version of the shape of the tongue and its movement, getting a better-shaped path and stuff like that.
But with four sensors, you can't really do any of those tricks.
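A sketch of that time-multiplexed scan: light one emitter, read many detectors, move on. The `set_emitter` and `read_detector` helpers are hypothetical hardware-interface stand-ins.

```python
# Time-multiplexed emitter/detector scan: because the photodiodes settle
# in nanoseconds, one full frame of n_emitters x n_detectors readings can
# be collected very quickly.
def scan_frame(n_emitters, n_detectors, set_emitter, read_detector):
    frame = []
    for e in range(n_emitters):
        set_emitter(e, on=True)
        frame.append([read_detector(d) for d in range(n_detectors)])
        set_emitter(e, on=False)
    return frame  # frame[e][d] = reading at detector d with emitter e lit
```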
>>: Also, these sensors go only one direction, so it actually doesn't detect the bigger part of the tongue; it just detects whether something's there.
>> Scott Saponas: And how far away it is.
>>: Moving the tongue like this, you actually detect different parts of the tongue, not a fixed part of the tongue.
>> Scott Saponas: Yes. When you think of it as being one emitter and one detector, that's true. But in the scenario I was just describing, where you have a variety of emitters and a variety of detectors, turning on different ones at different times, now you're actually getting a much more global look at what that tongue position looks like.
So imagine, if you will, when you take a camera and point it at something and you run an edge detector or something like that, and you see those lines move around. Or take a Kinect camera and its depth output -- a Kinect has, what, a depth resolution of like 320 by 240 or something like that, at like 30 hertz maybe, and it has a fairly decent amount of error, right? If you did an array of infrared proximity emitter-detectors -- they don't have to be paired up -- you could get something that kind of approximates that in the mouth, and you can see how you might be able to do tongue shape that way. We haven't tried that; I don't know if it actually would work.
>>: Or a resistive touch pad.
>> Scott Saponas: For when you're in contact, yeah. Actually, I think pairing up things like a resistive touch pad, plus this, plus a couple of physical buttons as a kill switch, stuff like that -- then you would start to have a really complete package of interaction possibilities, all on a retainer, wirelessly, in the mouth. Then you start to be able to get to a really, really high-bandwidth device, much higher than all those technologies I showed at the beginning, with the exception, obviously, of speech.
I would think of this technology as being complementary to speech; you would just use it in different situations. But everything else besides speech from the beginning is pretty low bandwidth, and this could be a lot higher bandwidth.
>>: Any way to detect the opening or closing? That's one [inaudible] feature for speech.
>> Scott Saponas: In the back of your mouth?
>>: Yeah, in the back -- no, right behind the palate.
>> Scott Saponas: Yeah. That's a good question.
>>: Soft tissue there. So close or open.
>> Scott Saponas: Yeah, I think you could. I mean, with the right set of sensors, either a camera or infrared detectors -- how far back is that from the front of your mouth?
>>: It's the opening, it's --
>> Scott Saponas: How far back is the opening?
>>: [inaudible].
>> Scott Saponas: Because the farther back it is, in some ways, if there's a direct line, the easier the sensing problem is.
>>: [inaudible] farther back, then you can put in --
>> Scott Saponas: If it's a straight line. I don't know if it's a straight line or not. If it's line of sight, if you will -- when you move beyond a few centimeters, a lot of these sensing problems get a little bit easier, because a lot of sensors have a bunch of interference problems when you're close up. So you might be able to use other things.
I don't think I'd put ultrasonic in there, but it's possible it would be useful. It's hard to use ultrasonic really close, but when you get a little bit of distance, you can.
>>: Aliph, who makes the Jawbone headset, started their technology by using an ultrasonic imaging device they hold up to the throat; it would actually image the throat cavity for speech.
>>: [inaudible] it was over --
>>: Then they would use the volume feedback to modulate.
>>: Various oscillators to try to synthesize speech.
>>: It could include entire touch.
>>: Pardon?
>>: Does it measure entire touch.
>>: No [inaudible] but it tries to image the part of the throat that.
>>: That's the --
>>: That affects speech, and they use that metric for modulating whatever to synthesize speech.
>>: Cool.
>> Merrie Morris: Thank you, Scott.
>> Scott Saponas: You bet. If you guys are patient, I will run up to my office and run back down, and you guys can actually see what these physical prototypes look like.