>> Bodhi Priyantha: Good afternoon everybody. It's my pleasure to introduce Jeremy Gummeson, a fifth year PhD student from the University of Massachusetts, Amherst. During his summer with us he built an audio and accelerometer sensor-based user input device that works on a ring platform. Jeremy.
>> Jeremy Gummeson: Thanks for the introduction, Bodhi. Before I came for my internship
this summer I knew I was going to be working on some kind of ring platform. We weren't
exactly sure what the application was going to be and what we converged on was doing gesture
recognition and we do that specifically by doing something called sensor fusion, so I'll go more
into what that means later. Over the last 50 years or so, I mean we've seen this rather obvious
trend where we have computers that are kind of very far away from us that are coming closer
to our bodies. In the ‘60s and ‘70s we're using dumb terminals and maybe remotely accessing
computers in a remote data center. Personal computers kind of emerged in the 1980s where
people actually own their own computers. Laptops, we finally had computers we could carry
with us. Smartphones have been really popular over the last decade and as we all know kind of
the big trend over the next ten years is wearable computing where we can think of devices such
as Google Glass. There was someone wearing that around the building today. There's the
Pebble Watch which got a lot of exposure through Kickstarter, and then augmented reality so
that's the Oculus Rift Platform that you can actually wear and experience augmented -- it's like
virtual reality for playing video games, right? So we want to put this a little bit further and have
a platform that's even less obtrusive, so something that you could wear every day and maybe
not even notice that it's there. So why not a ring, right? In popular culture this has been kind
of a very popular thing. You have this ring that kind of gives you some kind of magical powers, right? In the ‘80s there was this television show where these kids had rings they could use to summon a superhero. There's this guy here, the Green Lantern; he can create force fields using a ring. And then of course we're all familiar with Lord of the Rings. You have this magic ring that lets you disappear. Unfortunately, I learned very early in the internship that
superpowers aren't going to be feasible. The big reason I can think why, right, is the energy
bottleneck. I mean if you wanted to do all of these things, you'd need to store a lot of energy in
a small ring. On the left here, this is a ring prototype that Bodhi had actually built and it has a
small area where you can use buttons to enter inputs. And this is a picture of the battery that's actually part of the ring; it's underneath the board here and it only stores about 1 milliamp-hour of energy, which is a small amount. So the trick that we use to keep this ring powered -- if you can see it here, there are some copper windings -- is that there's actually an inductive charging coil
and the idea is that you can recharge the small battery kind of opportunistically. If you are
wearing a ring on your ring finger and you are holding your phone -- a lot of phones now have NFC -- you can kind of opportunistically recharge the ring’s battery when you use your phone normally throughout the day. Actually this year at MobiSys Bodhi and I had a paper that explored that idea in a little bit more detail, in that case looking at a security application. The big takeaway there that is relevant to this work is that we found that using a very aggressive harvesting strategy we can harvest up to 30 milliwatts of power
from a phone. That's a lot of power when we compare kind of the low power consumption
costs of sensors now. So this is an opportunity, right? We have this remote charging source
that we can use to replenish the small battery on a ring platform. So what do we do with the
ring? There are a few ideas that we explored early in the internship. The first was continuous
health monitoring, so one thing that we thought about was monitoring, you know, something
like a pulse ox sensor or maybe doing galvanic skin response to understand someone's
emotional state. Unfortunately, we found that the signal integrity that you need for those different health-related things is hard to get for various reasons; I mean, the surface area on the ring is so small. For example, for galvanic skin response, that would mean the electrodes would have to be very close together and you get a very weak signal which isn't necessarily really good for detecting health. Related to that, we had this idea of doing emotion-based HCI, so maybe you can detect that a user is frustrated and then maybe somehow change web search results, something of that nature, right? But we couldn't do that because we couldn't implement those sensors. So instead we focused on this idea of the ring as a gesture
input device. There are actually a couple of ring like things out there that do this. They're a
little bit clunky though, right? The thing on the left there that's basically a miniature trackpad
that you can wear on your finger, so if I was giving this talk and I wanted to be able to flip
through my slides, I could do that from a ring. But we think that we could do better and have
something that is much more seamless and something that you might be willing to wear all day.
Our project goal is to implement a ring based always available data input device. We can think
of data input in a variety of ways. In the first place there are UI actions, so say you are in a web
browser, you want to navigate back and forth between pages that you visited, you know, you
could implement those gestures, maybe scrolling up and down on a page, that would be
another two gestures. Another way you can enter input is kind of like a virtual keypad like we
have on our phones. You have a virtual keyboard and you can enter in individual letters.
There's more advanced ways of doing this. It's called shape writing. Yes?
>>: Have you thought about doing auto unlock on my phone with a ring?
>> Jeremy Gummeson: There are actually a couple of applications out there where I've seen
there's a ring that kind of has just a passive NFC tag in it and basically when it sees that tag ID it
does some single action on the phone. But we're looking at doing more than just one action,
kind of a richer set of inputs to a device.
>>: Following that, this is a problem we all have at Microsoft [laughter].
>>: [indiscernible] there's another project that [indiscernible] is doing with [indiscernible] as
part of the security. We could talk about it off-line.
>> Jeremy Gummeson: So what we're going to be focusing on in this talk is kind of the two last
cases here, so character input and shape writing, so basically looking at characters as shapes
and trying to detect those shapes accurately so we can emulate characters. There's obviously a
variety of challenges in getting something like this to work effectively. The first challenge is
related to energy, so how do we keep the ring always available for input when we are only
relying on this, you know, harvested power from a phone and a small battery? So the second is
sensing. If you have sensors that are located, you know, in this segment of maybe the index finger or the ring finger, you might not be able to get, you know, accurate sensor readings to have accurate gesture recognition. The third is computation, so how should the
ring process the raw sensor input? Should you do all of the processing on the ring? Should you
do some processing on the ring and then push those results and have something that has more
computational facilities do the rest? This trade-off between computation and communication
is also important. Otherwise, you'll end up killing the battery on the ring. Yes?
>>: [indiscernible] if I'm [indiscernible] the ring, is it easy for me to [indiscernible] the phone
through my ring or just [indiscernible] how about the perception of me looking at the device
phone [indiscernible] because it's already there? How about the perception of [indiscernible]
providing advice [indiscernible] that's not? How do you think [indiscernible] do that? Would
you feel comfortable the ring that?
>> Jeremy Gummeson: So you are saying that you have some kind of remote display in wearing
the ring and you want to be able to interact with it?
>>: [indiscernible] I think [indiscernible] would be the user perception how does the user…
>> Jeremy Gummeson: Sure. That would be a fifth challenge, right.
>>: Like [indiscernible] the watching screen [indiscernible]
>> Jeremy Gummeson: That would require, you know, a user study to understand what that is.
Yeah right. Okay. So there are several different approaches towards entering symbols on
different types of devices. So BlackBerry is kind of on its way out but it was really popular
because it had this very accurate tactile method of input where you have these keys that you
actually press, so a few advantages of pressing buttons, right? You can do this at really low
power. You are basically processing interrupts instead of having to continuously process sensor
data. It's very accurate, right? I get a nice tactile feedback when I want to push a button. You
might be faster with buttons as well in entry. A couple of the limitations are form factors,
right? If I want to have a lot of buttons, I need a lot of space. If we're thinking about something
like a ring, you don't have a lot of space to work with. Then how do you map a rich set of symbols -- how do you have a rich set of inputs -- if you are kind of constrained in terms of space?
Another thing that became popular I think starting with the Nintendo Wii a few years ago is
doing gestures in 3-D space. One of the advantages here is it's very verbose, so you can
move your arm in very large areas and do a lot of different types of gestures and these motions
might be really natural to people. But one of the issues here, right, there's a lack of intent. If
I'm wearing a ring and I wanted to process things in 3-D, you don't want the user to have to
press a button, so I'm moving my hands around all during the day and I don't want these
spurious movements to be misinterpreted as gesture inputs. So this is a problem, and these
can also be low accuracy, right? It's kind of hard for me to perceive where I am in this 3-D space in front of me. Like if I wanted to accurately reproduce a gesture, it's not exactly clear
where the exact boundaries are. You also need to deal with things like variable orientations
because you're in 3-D space and then this can also be higher power because you have to stream
out maybe a lot of accelerometer readings in order to localize the thing in 3-D space. So a third
way that we can consider is doing something in 2-D space. One of the advantages is -- so this is
one of the first things that we learn in school, so handwriting. We learn how to write letters on
a flat surface. So some of the reasons why this is attractive, so you get active feedback, so
when I'm sliding a pencil on a surface I can feel the vibration of the pencil into my hand and I
know maybe how fast the object is moving. If I'm thinking of even just doing fingerpainting, right, I can feel my finger moving along the paper. This is something natural
because people have been doing this, you know, for ages. You look at cave paintings and
tablets, I mean, this has been happening for a while. And surfaces for doing this are available
all around us, so if you aren't restricting yourself to like capacitive touch surfaces, maybe if you're considering tables or whiteboards. Maybe even your trousers could be a 2-D surface that you would use to enter input. I mean it's always readily available. A couple of
the challenges here are surface detection. So how do I disambiguate when my hand is in the air
and when it's actually on a surface? That's an open question. So it's less verbose than 3-D, so
I'm kind of constrained maybe to a smaller space where maybe I can move my finger in a
comfortable way and then power is a question. So we don't really know how much power it's
going to cost to do this surface detection. Okay. So how does a user input 2-D gestures on a
ring? This is our idea on how you would do this. Step one, the user will initially tap on the
desired surface. That's when you first put your finger down on the surface that you want to
interact with. Step two, this is actually an optional step. This is, so you have this challenge that
you need to understand what coordinate space you're talking about, so the user might first
have to enter some reference gestures to let the system know kind of how the user’s oriented
relative to the surface. Thirdly, the user enters a series of strokes on the surface to interact
with some other device. The fourth step which the user doesn't do and it's implicit is they stop
entering strokes and then the ring will go back into a low-power state where it's not
interpreting gestures anymore. With a 1 milliamp-hour battery that will actually fit in a ring-like platform, based on some back-of-the-envelope calculations we did on hardware that's available on the market, hardware that we actually used to do some prototyping, we found that given a 3,700 microwatt-hour battery capacity you can do about 4,000 gestures on a full battery. And given
our previous results on NFC harvesting, you can recharge all of that energy from the phone in
about 20 minutes. The idea is you are using your phone periodically during the day and that
battery will kind of keep being topped off and then when you want to do another type of
interaction without the phone there will be energy available for you to do that. We haven't
actually evaluated how effective that is. It would require a longer-term study; you'd have to have people wear these things and understand how often they use their phone, but that's another topic.
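As a rough sketch of that arithmetic in Python -- treating the quoted 3,700 figure as microwatt-hours (roughly a 1 milliamp-hour cell at 3.7 volts) and picking illustrative average harvesting rates, since 30 milliwatts is a peak number:

```python
# Back-of-the-envelope energy budget for the ring, using the figures from the talk.
BATTERY_CAPACITY_UWH = 3700                      # assumed: ~1 mAh lithium cell at 3.7 V nominal
battery_j = BATTERY_CAPACITY_UWH * 1e-6 * 3600   # capacity in joules, ~13.3 J

GESTURES_PER_CHARGE = 4000
print(f"energy budget per gesture: ~{battery_j / GESTURES_PER_CHARGE * 1e3:.1f} mJ")

# Recharge time depends on the average power actually harvested over NFC; 30 mW is
# the peak from the MobiSys paper, so the sustained average during phone use is lower.
for avg_harvest_mw in (30, 10):                  # assumed average rates, for illustration
    minutes = battery_j / (avg_harvest_mw * 1e-3) / 60
    print(f"full recharge at {avg_harvest_mw} mW average: ~{minutes:.0f} minutes")
```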
The first challenge that I want to talk about is segmenting gesture data. We need to know
when one gesture starts and another stops. This is just me sitting at my computer and I'm
sliding on my trackpad and I'm actually inputting an up gesture now, but it isn't clear from the
video at least whether I'm moving up or down. You can't really tell when my finger is raised off
the surface or when it's down on the surface. It's kind of this fuzzy notion. How do we do this?
Doing this there are two main challenges, the first of which is detecting whether or not the finger is on the surface. First we need to know if it's up or down and then if it's on the surface. And the
second is distinguishing between different gestures while the finger is moving on the surface.
We know that the finger came in contact. Now what is it doing while it's on the surface? The
first part of the talk is going to be how we detect the surface. Related to those steps that I
showed earlier, we want to first detect when a finger lands on the surface. A great way to do
this, so you use an accelerometer. What you look for is a sudden deceleration in the z-axis.
When the finger stops when it hits the surface, you'll see a spike and one of the great parts
about currently off-the-shelf available accelerometers is you can do a threshold based wake up
for very low power costs. For less than a micro ampere you can determine when these spikes
occur. Then the second part is continuing to detect the finger as it moves across the surface.
Surface friction is emitted from most surfaces as audible, band-limited noise. What we're
doing here is we are using a low-power microphone and some signal processing techniques to
observe the surface friction noise and we are able to do this at reasonably low power
consumption, less than a milliamp with an optimized circuit design. What this lets us do, so if
you combine this with the accelerometer, another added benefit is that it reduces false
positives from spurious taps. Say if I'm wearing this ring and I'm nervous and I'm tapping, you
know, on the side of my trousers, you'll have a lot of false wake ups and you waste a lot of
power. In order to keep the system up in an active state what you look for is a tap followed by
the surface friction noise, and if you don't hear both you quickly turn the system back off.
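A minimal sketch of that wake-up policy, assuming hypothetical `accel` and `audio` driver objects; the thresholds and listening window below are illustrative, not the values actually used:

```python
import time

TAP_THRESHOLD_G = 2.0          # deceleration spike seen when the finger strikes the surface
AUDIO_CONFIRM_WINDOW_S = 0.3   # how long to listen for surface-friction noise after a tap
ENVELOPE_THRESHOLD_V = 0.2     # envelope-detector level that indicates the finger is sliding

def wait_for_surface_contact(accel, audio):
    """Sleep until a tap is confirmed by surface-friction noise; reject spurious taps."""
    while True:
        # Accelerometer threshold wake-up: everything else stays asleep until this fires.
        accel.wait_for_threshold_interrupt(TAP_THRESHOLD_G)
        audio.power_on()                       # briefly power the mic, filter and envelope chain
        deadline = time.monotonic() + AUDIO_CONFIRM_WINDOW_S
        confirmed = False
        while time.monotonic() < deadline:
            if audio.read_envelope() > ENVELOPE_THRESHOLD_V:
                confirmed = True               # tap followed by friction noise: real contact
                break
        if confirmed:
            return                             # stay awake and start interpreting strokes
        audio.power_off()                      # no friction noise: spurious tap, go back to sleep
```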
>>: Is there some sort of input to the user that tells you it's [indiscernible]? I know that…
>> Jeremy Gummeson: There is one prototype now, but you could envision maybe quickly
pulsing an LED, something like that to…
>>: [indiscernible]
>>: For the accelerometer the z-axis is always vertical with [indiscernible]
>> Jeremy Gummeson: You can actually use, you don't have to just use the z-axis. I think it
looks for actually any of the axes. I just said the z-axis here because I'm assuming you have a
horizontal surface and maybe that's the one you're looking at in particular.
>>: So is the wake-up power for the ring itself?
>> Jeremy Gummeson: The wake-up power, so that is just for the accelerometer. I'll have a slide with a number for this a little bit later, but basically you're in this low-power state where you have everything asleep. The accelerometer is using only 0.27 microamperes of current and it's
just waiting for these spikes.
>>: And what about the battery [indiscernible]
>> Jeremy Gummeson: I have more slides on that. This is our initial experimental setup. I have
to be honest. It's not a ring yet. That's kind of a work in progress, but what we've built here,
this is a bicycle glove I bought at the Commons and this is a commodity accelerometer on
an evaluation board and it's in the location on the glove where a ring would be located. So the
sensor data that you get from it would be reasonably close to what you would get from a ring.
And then it's connected to a microcontroller that basically outputs the accelerometer readings
over a serial port. We also have a low-power MEMS microphone basically on the other side of the finger, on the opposite side of the finger from the accelerometer, and we basically are outputting all of the data from that microphone into a PC sound card so that we can analyze what these surfaces sound like.
>>: Why did you use the index finger?
>> Jeremy Gummeson: The index finger is easier to write with.
>>: Yeah, but nobody wears rings on their index finger.
>> Jeremy Gummeson: They might. [laughter]. They might. It's possible that you could do it
on the ring finger as well. Maybe the motion within the two are correlated, but we started with
the index finger. The first part of detecting the surface is the finger impact. This is just a time
series trace of data that I get from the three axis accelerometer and these spikes that you see
around two g’s here, that's actually when the finger is striking the surface. It's reasonably easy
to detect that. But what about the other part? I said there were two parts. First there's the
impact. Then there's the sound that the surface makes. Let's do an evaluation kind of right
now. What does this sound like? The first is a really loud sound. This is a piece of Styrofoam
and this is what it sounds like. It's almost like nails on a chalkboard, very loud and easy to hear.
This is going to play all the way through. Okay. And the next is a surface like wood. You might
not even be able to hear this. We are actually able to pick it up with the microphone. And then
the other scenario you can think about, right, is when there's external noise that might be
swamping the signal that you're looking for. So here's some children playing on a playground
and I was playing this sound from a laptop next to where I was performing a series of gestures
with the ring. I'll show in one slide that we're actually able to disambiguate the two signals. I
actually evaluated 12 different types of surfaces and they all produced band-limited noise in some common frequency range, and in addition to those evaluations we did, we looked at this journal paper from the Acoustical Society of America and they kind of confirmed our suspicion that a lot of surfaces have these common frequencies. These are actually the results
that I referred to in the last slide about what this looks like when we have the noise of these
children playing on a playground and then me performing a series of gestures. On the left here,
these red regions -- first I should explain the plot here. What we have is a spectrogram. The x-axis is the frequency components of the signal that we're looking at and the y-axis is the time that that spectral content was present. What we see here, all this red is the children playing and
these yellow bands here that are separated by blue bands; this is me dragging my finger across
the surface for about a second, picking it up and dragging it again for a second. There's a lot of
space here where they don't overlap. One place where they do: in the lab where I was doing the experiments there are actually some servers on, and they generated this yellow band that is kind
of there present through the whole trace, but there's plenty of frequency space around that
that doesn't overlap. If we wanted to be even more immune to noise, there's a couple of other
techniques that we could use. We could use something like dynamic filtering where we have a
programmable filter that can look at different regions of frequency content. The second might be time-domain analysis, where if you have really short-lived noise you could filter that out because it doesn't look anything like these surface movements. It's much shorter. To do this
audio processing, there are three steps that we need to do. We need to do processing because the ring isn't going to be able to look at raw audio samples. You'd have to sample audio at 44 kHz and that would very quickly drain that 1 milliamp-hour battery I showed earlier, so we use a hardware filter. First you apply a bandpass filter that's constructed to look at that region that's
usually separated from human speech. Then we apply some gain, so we need the signal to be
bigger in order to be interpreted by a microcontroller with say analog-to-digital conversion.
Then we also want to use something like an envelope detector so we can actually sample that
signal at a relatively low frequency and understand when that noise is present and when it isn't
present. What does this look like after bandpass filtering? What we see here, this is another
spectrogram plot. This time I was drawing the letter L on a surface, so these red bands that you
see here that are very close together, those are individual strokes of the letter L. We're able to
do this with filtering and gain and so after the envelope detector we can actually see the two
peaks that correspond to the strokes of the letter L. This is a promising result. This is all
implemented in Matlab. This isn't from an actual hardware implementation of the filters, so we
were kind of guided by these results to do an actual filter design. This is what it looks like. It's a
little bit of a mess right now, but what we have here is, so these two boards here are the
envelope detector. This one does some filtering and some gain. We have a microcontroller,
the same accelerometer from before and a similar microphone from the previous setup. We
chose the particular components that we used so that the op amps that are used for the
filtering are very low power so all of them together used 620 microamperes. The
accelerometer even when it's active uses only three microamperes, and the microcontroller -- I think there are more efficient ones on the market -- the one that we used was around 270, but if you sum all of those up you have less than a milliamp budget. In standby, so this is with the accelerometer in its low-power mode and when the microcontroller is also asleep, we consume around only one microampere when everything is off and waiting for a user to interact with the
surface. Let's revisit writing the letter L but this time with our actual hardware. These are
traces that were actually output from our serial port, so basically our microcontroller was just
reporting the ADC samples that it got from our audio filter and we're actually able to distinguish
the letter L. These are spaced a little bit further apart than the previous plot, so this is actually
from an actual user, so I had a few people enter gestures for me and this person happens to
enter it more slowly than I did in the previous, but because of the filter characteristics even if
the strokes were closer together it would still work. Yeah?
>>: What is the precision of the recall? Even though how long, if I write a T would it be
recognized as L?
>> Jeremy Gummeson: I'm going to have a few results on that later. This was just kind of to
prove that the envelope detector is doing the right thing.
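A sketch of that band-pass, gain, and envelope chain done offline in Python, in the spirit of the Matlab analysis; the band edges, gain, and envelope cutoff are illustrative assumptions, not the actual circuit values:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def stroke_envelope(audio, fs=44100, band=(6000, 12000), gain=100.0, env_cutoff_hz=20.0):
    """Approximate the analog front end: band-pass -> gain -> rectify -> low-pass envelope."""
    # Band-pass to a region where surface-friction noise sits, away from speech.
    bp = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = gain * sosfilt(bp, audio)
    # Envelope detection: rectify, then low-pass so the result can be sampled slowly by an ADC.
    lp = butter(2, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
    return sosfilt(lp, np.abs(filtered))

# Each stroke then shows up as a distinct hump in the returned envelope,
# for example two humps for the two strokes of the letter L.
```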
>>: [indiscernible] fingers distinguish it from…
>> Jeremy Gummeson: This is just showing the fact that L contains two strokes. I move down and I
move right. All this is showing is that I can get two strokes per L. They may not be the right
strokes, but there are two. That's what this plot is showing. To get ground truth about how
accurate this surface detection is on the movement, we actually used an off-the-shelf capacitive
touchscreen. What we did is we recorded the coordinates of the finger on the capacitive
touchscreen over time, so we knew whether or not the finger was actually moving. We also
recorded audio from the finger motion. What we did is we compared timestamps of the audio
and also the timestamps of the touchscreen coordinates and we found that they correlated
pretty well. I don't have a plot here to show that, but it proved that our audio detection
scheme is doing the right thing. So I just described how we detected the finger is moving on a
surface. I'm going to go to the second part which is how do we construct a symbol based on
maybe movement on a surface. One thing to note, right, is that symbols can be arbitrarily
complex. Think of maybe not American English, where the characters are simple, but maybe something like Japanese or Chinese. You can think of very complicated symbols that a user might enter. One naïve strategy that you might do is you might try to put all of the computation on
the ring and compute the entire symbol and then tell the end device this is the character that
they entered. This could be computationally challenging, so you might need to use advanced
machine learning techniques, things like language models. You might have to update
vocabulary specific to a user. Maybe one user writes a little bit differently than another and
you might have to do customization there. Instead of trying to compute the entire symbol on
the ring, we break it up into a series of strokes. For example, if I write the letter A that's going
to be two diagonal lines in different directions followed by a horizontal line. I would send each of
those segments to the end device and let it figure out that that is the letter A. So kind of an
architectural view of what we're doing here, so at the bottom we have fingers on a surface and
we're detecting some signals. The ring is detecting those signals, converting them into strokes
and then those individual strokes are sent to an end device; maybe it's a Windows Phone and
it's being converted into symbols and words. An example that I've been referring to is
handwriting. This is the letter B, this is the letter L and the letter W. This is how it's decomposed
into strokes and kind of the ordering of those strokes. Then the way that you might report this
to the end device, you might have different IDs corresponding to the different stroke primitives
you support, plus some timing information. It might be important how quickly and how the strokes are grouped together. That might help disambiguate one symbol from another and maybe even within the character: if I write the letter A, for example, the two diagonal lines might be close together in time and the horizontal might be further apart, so that timing information could be helpful in interpreting it later. This is great because we can send a few
bytes of data instead of sending out 400 Hz accelerometer data which would completely kill our
battery. So the two core system challenges here, again, so it's identifying the beginning and the
end of a stroke reliably. And then the second is using sensor fusion between the microphone
and audio circuit and the accelerometer to understand the relative directional properties of an
individual stroke. This is a user that I was actually collecting data from wearing our prototype.
What we have here, so the x and y coordinates are oriented, so they're facing the table and a
piece of paper, so x is positive in the right direction and then y is positive in the direction facing
away from the user, right? What the ring actually sees is something a little bit different. We're
going to have some tilt in the x and y; basically what will happen is gravity is detected by the accelerometer and it's going to show up in some combination of these axes. You
have to normalize kind of the coordinate space of the ring to the surface and then the other
thing that can happen is the user can actually have their finger rotated, so that will actually
confuse your x and y axes. So combining the microphone and accelerometer, how does it
work? Step one, after that tap happens and the finger is first touching the surface you can
compute the finger angle relative to the surface during idle periods. That's to get rid of that z
component from gravity. And then the second step is identifying that the finger is moving on
the surface, so you get that audio envelope and that lets you know when the stroke is actually
being performed. Then step three, you can observe the finger accelerating and decelerating in
different directions depending on what gesture the user is inputting. Then you can use some
physics-based heuristics to figure out what that direction is. If you want to move a finger, you
have to accelerate and if you want to stop you have to decelerate, so the accelerometer is
going to definitely pick up those signals. Kind of a laundry list of different stroke primitives that
we want to deal with, so first there's the easy ones. There's up, down, left and right, so
basically you're just looking at the signs of the different axes of the accelerometer. Then kind of
a medium difficulty; I call it medium because you have to actually look at combinations of the x
and y axes to detect what type of diagonal motion you're talking about. Then the third is hard.
I call it hard because now we actually care about the shape of the accelerometer motion so you
can detect things like centripetal motion in order to understand that a curve is happening as
opposed to a straight line. During the course we wanted to do all of these. That's the end goal.
During the internship we focused on the easy and the medium difficulty strokes. Now I'm going
to go into kind of how this works. This is data from an actual user performing an up gesture.
The red plot here is the output of our envelope detector. What that lets us do is, so say if you
set a threshold and you look at your analog to digital converter and you see that the voltage
went above .2 volts and goes back below .2 volts, you decide okay. That's the boundary of
when the finger was moving and then I can draw a line down the middle. There's the first half
of the stroke and the second half of the stroke. What we see in black and green: we have the x-axis in green and the y-axis in black. We actually see the finger accelerating and decelerating. Because the axes are actually backwards, negative means acceleration and positive means deceleration, but if we look at the two halves of the finger movement, we can
clearly see this by looking at the y-axis and that signal is much larger than what we are getting
on the x-axis. We can probably figure out that that is a vertical motion as opposed to
horizontal. Let's look a little bit more at this. I just took away the envelope detector plot and
this is just the accelerometer. Basically what we do, it's very simple, we look at both halves of
those intervals, T1 and T2 and we do an integration. We integrate the first half of the x-axis and
the second half of the x-axis and do the same thing for the y. Based on the relative signs of those integrations, and on which axis's total integral is larger than the other's, we can determine which axis had the motion and what direction that was, whether it was forward or
backward or left or right. In this case we did up, so we see that the prominent axis is the y-axis.
So this is what we did. We compared the integrals of the dominant axis during the two halves
of the movement period.
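A sketch of that decision rule over one envelope-delimited stroke, in Python; the sign conventions are illustrative and depend on how the accelerometer is mounted and corrected:

```python
import numpy as np

def classify_cardinal_stroke(ax, ay):
    """Classify one stroke as up/down/left/right from its x/y accelerometer samples.

    ax and ay are the tilt-corrected acceleration samples taken between the points
    where the audio envelope crosses above and back below the threshold.
    """
    mid = len(ax) // 2
    # "Integrate" each half by summing the samples, as described in the talk.
    ix1, ix2 = np.sum(ax[:mid]), np.sum(ax[mid:])
    iy1, iy2 = np.sum(ay[:mid]), np.sum(ay[mid:])

    # The dominant axis is the one with more integrated activity overall; the direction
    # follows from the accelerate-then-decelerate sign pattern of that axis.
    if abs(iy1) + abs(iy2) > abs(ix1) + abs(ix2):
        return "up" if iy1 > 0 > iy2 else "down"
    return "right" if ix1 > 0 > ix2 else "left"
```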
>>: What is this thing on the right, the y-axis?
>> Jeremy Gummeson: So what is the figure?
>>: [indiscernible]
>> Jeremy Gummeson: Yeah. That's computing this integral right? So it's negative.
>>: [indiscernible]
>> Jeremy Gummeson: So basically I'm just summing over the whole thing and then
representing that the entire sum as kind of the -- you can think of it as sort of the average, I
think it's the average speed over the first half.
>>: [indiscernible]
>> Jeremy Gummeson: Total integral value is what I computed over that half.
>>: Your gesture is from bottom to top and then returned, or just bounded up?
>> Jeremy Gummeson: Just bottom to top and then we define down as top to bottom.
>>: Then why is the [indiscernible]
>> Jeremy Gummeson: Okay. These are actual, this is the raw data that is from the ring, but
because I know how the accelerometer is actually mounted on the ring itself, I actually, so in
software you can actually reverse the axes and figure out the sign you're looking for.
>>: [indiscernible] graph on the envelope, you are going to the peak and then going back down.
>> Jeremy Gummeson: This is just the magnitude of the audio signal.
>>: Oh. So during the peak it's actually transitioning, from bottom to top.
>> Jeremy Gummeson: Yes. Exactly, so I'm speeding up during the first half of the finger
motion and then I'm slowing down.
>>: And the gesture is finished when your fingers at the top. You don't bring the finger back
down?
>> Jeremy Gummeson: You don't bring the finger back down; that's right. In order to do up
down, left and right, it's a simple process. I just filled out a table here for the signs you are
looking for. Say if you correct for the axes, what you are looking for is an initial
acceleration in the y-axis and then a deceleration in the y-axis and you're looking for basically
no activity in the x. The opposite is true for the down and then you look at the other, the x-axis
for the horizontal movements. You can come up with a very simple algorithm, right, that does
the integration and then compares which axis is dominant and what the sign is. This isn't too
hard. One of the advantages of only looking at these four gestures is that you're relatively
immune to rotational drift, so there's 90 degrees of difference between say up versus right or
right versus down, so if the person kind of drifted while they were entering the gesture and
maybe entered something that looked like a slight diagonal line, you would get what they
intended and actually get up. One of the cons is that you are limited to four features. That's not to say that we couldn't, say, string a bunch of these horizontal and vertical primitives
together to do something more complex. You could do that, but you might want to have a
richer set of strokes to begin with. Yeah?
>>: How did you send it to [indiscernible]
>> Jeremy Gummeson: It's not actually that sensitive to that threshold, so basically the
important part is picking it to be the same on both sides. As long as you pick points
[indiscernible] on both sides, you're going to have a symmetric view of the motion and when
you do the integration the math works out. So you want something that's wide enough, right,
to be able to see as much of the signal as possible, but not so wide that you're actually getting
some of the accelerometer noise as part of your computation.
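A sketch of that segmentation step on the low-rate envelope samples; the 0.2 volt level is the one mentioned a moment ago, and the minimum-length filter is an added assumption:

```python
def segment_strokes(envelope, threshold=0.2, min_samples=3):
    """Return (start, end) index pairs where the envelope stays above the threshold."""
    segments, start = [], None
    for i, v in enumerate(envelope):
        if v > threshold and start is None:
            start = i                            # rising crossing: a stroke begins
        elif v <= threshold and start is not None:
            if i - start >= min_samples:         # ignore very short bursts of noise
                segments.append((start, i))
            start = None
    if start is not None and len(envelope) - start >= min_samples:
        segments.append((start, len(envelope)))  # stroke still in progress at end of trace
    return segments
```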
>>: [indiscernible] right now instead of observations [indiscernible]
>> Jeremy Gummeson: Okay. To understand how well this works, so first we looked at four
gesture classes, up, down, left, right, so I had five helpful people enter gestures for me and I
looked at these four gesture classes. I asked each participant to basically enter one of these
gestures ten times in a row for each of the gestures. I collected all the data with the glove. I
outputted it over the serial port and then I analyzed it off-line using Matlab, so basically what I
did is I implemented that simple algorithm in Matlab and saw if I could reliably determine which
gesture was which. For one user -- this user did particularly well -- we compute the correct gesture among the set of gestures that are available a hundred percent of the time, except one time we falsely interpreted the down gesture as a left gesture. That's not perfect.
>>: [indiscernible] not for now, but to look at if the user on the input, so how are they actually
interpreting it.
>> Jeremy Gummeson: That's actually a great idea and I should've done that. What I did do, so
after each user was done entering the gestures, I took pictures of their hand and kind of what
kind of orientation they were using during the whole session. But to understand the dynamics
while this was happening that would've been really valuable and the next time I collect data on
this that will definitely be something that I will do. Then when we look across all five users, of
course things degrade a little bit, but not by much. Again, we confused the down and the left
gestures a little bit more, so we went down to an 86 percent accuracy there.
>>: Is this all right-handed people or left-handed people too?
>> Jeremy Gummeson: They were all right-handed. So the glove, it's a right-handed glove so I
didn't actually check but based on the…
>>: [indiscernible]
>> Jeremy Gummeson: Yeah. If we add a little bit more complexity, right, we can think of doing
diagonals. The way that we do this, so first we have up, down, left and right, and they look
exactly the same as they did before. But then we kind of have this fuzzy notion of where, if
you're doing a diagonal line, you are going to see some amount of activity in the x and then also
some amount of activity in the y. If you're asking people to do diagonal lines, if they were
drawing a 45° angle and if your calibration is correct, you should see the same magnitude. But
in reality, the users as they input the gestures are going to drift a little bit and they might
actually even rotate their finger, so they might be entering the angle correctly on the table, but
then you misinterpret it because your axes aren't aligned anymore. In this case the user was
entering a down right stroke. This is after I do the integration, you see a comparable amount of
activity in both the x and y-axis, but the signs are opposite, so that's how you figure out the
directionality. Yeah. As I mentioned, it's very susceptible to individual user variations and also
to finger rotational drift. The first could be fixed if you had a scheme that kind of adapted to different users, if you use one of these more advanced learning techniques, and that type of approach could also help you handle more of the rotational drift. If you actually wanted to completely solve the rotational drift problem, you'd have to add a gyro, but it turns out that gyros right now cost significantly more power than an
accelerometer so we're trying to avoid using that. It's certainly something that you could add if
you were willing to deal with a bigger battery. First I'm going to look at what I call the best
user. They got almost a hundred percent accuracy across all eight gestures. In one case, so it
was the down gesture, it was misinterpreted as a down-left, and that's a reasonable mix-up. So maybe when they went down that time it looked more like a 45° angle than like they were going straight down.
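One way to extend the cardinal-direction rule above to the diagonals, as a sketch; the ratio deciding when x and y activity counts as comparable is an assumed heuristic, not a tuned value from the talk:

```python
import numpy as np

def classify_stroke_8way(ax, ay, diagonal_ratio=0.5):
    """Classify a stroke into one of eight directions from its first-half integrals."""
    mid = len(ax) // 2
    ix, iy = np.sum(ax[:mid]), np.sum(ay[:mid])   # first half carries the acceleration sign
    x_dir = "right" if ix > 0 else "left"
    y_dir = "up" if iy > 0 else "down"
    # Comparable activity on both axes suggests a diagonal; otherwise keep the dominant axis.
    if min(abs(ix), abs(iy)) > diagonal_ratio * max(abs(ix), abs(iy)):
        return f"{y_dir}-{x_dir}"                 # e.g. "down-right"
    return y_dir if abs(iy) > abs(ix) else x_dir
```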
>>: [indiscernible]
>> Jeremy Gummeson: Yes. It was. It was neither myself nor Bodhi, I might add. [laughter]. So
when we added all users to the mix, things actually degrade significantly. In the worst case, so
when we're entering the up right gesture, right, so we only get it right 54 percent of the time.
20 percent of the time we think that it's right and then 26 percent of the time we think that it's
up. So some of the users were probably entering an angle that looked more like right or up than a diagonal that's in between. This is something that we want to
address in the future using some lightweight machine learning approach. SVM was actually one
of the suggestions that came to us from a machine learning expert. But I mean this is an
encouraging result that this is possible. This is good enough that maybe machine learning can
help push it up to maybe 80 percent plus accuracy.
>>: So you instructed before the experiment to like try to draw like straight lines.
>> Jeremy Gummeson: Yeah. It wasn't specific. I didn't say draw a 45° line. I said draw an up-right gesture. So
maybe if I was more clear, maybe I would have gotten more accurate results, but I wanted to
kind of -- I mean, I wanted to observe variation in users, so how do people actually use this
stuff.
>>: [indiscernible] understand, you know.
>>: And they were not following the [indiscernible]
>> Jeremy Gummeson: There were no reference lines drawn in the table either, so what I had
was a blank white piece of paper and they used their imagination.
>>: [indiscernible] up to a hundred.
>> Jeremy Gummeson: Okay. Slight math mistake there.
>>: Do you have a diagram as to how online your accelerometer axis to the actual table?
>> Jeremy Gummeson: Yeah. I can sort of compute that for each individual stroke that users
entered, so I can actually adjust for it after the fact. When I do the data analysis, right, at
the beginning I calibrate the axes and then I leave it alone for when I look across the whole
trace. But if I wanted to determine how much it drifted, because I know the ground truth of the
gesture that they entered, I could actually adjust how that angle is.
>>: I think it would be [indiscernible] to see how drift versus a constant bias. For example,
[indiscernible] draw with the line, I would want to see where things are slightly sort of…
>>: Yeah. But that's [indiscernible] you are not doing [indiscernible] instruction in the
beginning?
>> Jeremy Gummeson: I am.
>>: You are? Okay. And where are you doing that?
>> Jeremy Gummeson: I'm doing this when the finger first touches the surface and it's not
moving.
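A sketch of that idle-time calibration: average the accelerometer while the finger rests still on the surface to estimate the gravity component, then subtract it from later readings. This is a simplification done with a plain vector subtraction rather than any rotation:

```python
import numpy as np

def calibrate_gravity(idle_samples):
    """Average 3-axis readings taken while the finger is resting still on the surface."""
    return np.mean(np.asarray(idle_samples, dtype=float), axis=0)   # shape (3,), in g

def remove_gravity(sample, gravity):
    """Subtract the static gravity estimate so only the finger's motion remains."""
    return np.asarray(sample, dtype=float) - gravity
```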
>>: Are you doing it [indiscernible]
>> Jeremy Gummeson: Yes.
>>: Oh wow. You are going to force power [indiscernible]
>> Jeremy Gummeson: Right now I am doing it off-line in Matlab.
>>: But you are doing it through [indiscernible] so what you are doing is rudimentary
[indiscernible] and this like [indiscernible] and [indiscernible]
>> Jeremy Gummeson: So we are not computing sine cosine.
>>: But that [indiscernible] error, protection error.
>> Jeremy Gummeson: That doesn't help; that's for sure. Yeah. Okay. So now that I've kind of
shown you what eight gestures look like, I want to show you what combining strokes together
looks like. I talked earlier about combining strokes into doing letters and so this is actually a
user entering the letter Z. That's a combination of diagonal lines and horizontal lines and the
four peaks you see here in red are individual strokes of the Z. So this is someone that drew Z with the
line in the middle, so you see these are the three first strokes of the Z and then the fourth is a
line in the middle. So this is just an example of three instances of this particular user entering Z.
When you look across users, in the general case it didn't do that well, but for two users I was able to get 70 percent accuracy in detecting the Z just using these simple
heuristics that we've developed so far. Future work for this is doing what I call advanced
gesture detection, so if you have something like a left circle up gesture, so say I am writing the
letter B and the B will consist of a vertical line and then kind of two half circles to the right.
Maybe the direction that those half circles are made is important as to how the letter is
constructed. We can actually see some interesting features of those motions on the
accelerometer right now; we're just not completely sure how to deal with the signals. For
example, I have two entries here from one of my users where they did a left circle up and a left
circle down and so, for example, the y-axis here so you see that it's concave and convex in
different parts of the curve during the first and second half? And then, for example, another
thing we noticed is that in the x-axis if you look at kind of the energy of the signal, there's more
energy in the signal in the left half than on the right, so that might be able to give you your
directionality. So one of those two axes will give you directionality and then the other might
tell you that you're drawing a circle, right, because of the characteristics of the centripetal
force. I'm not going to say much more here because this is all fairly speculative, but I think that
a machine learning approach might be able to deal with more complex signals like this. Yeah?
>>: On the previous slide, the slide with the Z, were you disambiguating that from other
characters like from an S or an N?
>> Jeremy Gummeson: Not in this case. This was basically the user entering the letter Z ten
times in a row and even that was, so I'm not doing any time domain analysis right now either,
so I think the key thing to distinguish different characters from each other would be the gap
between groupings of strokes. But this was just knowing beforehand the letter Z and then
trying to figure it out based on the sequence of strokes.
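Since that piece isn't built yet, a purely hypothetical sketch of the intuition: group strokes by the gap between them and match each group against per-letter stroke templates. The stroke names, templates, and gap threshold are all made up for illustration:

```python
# Hypothetical stroke-sequence templates; a real system would need many more entries,
# per-user variants, and probably a language model on the end device.
LETTER_TEMPLATES = {
    ("down", "right"): "L",
    ("right", "down-left", "right"): "Z",
    ("up-right", "down-right", "right"): "A",
}

def group_and_match(strokes, gap_threshold_s=0.6):
    """Group (direction, start_time, end_time) strokes by inter-stroke gap, then look up letters."""
    letters, current, last_end = [], [], None
    for direction, start, end in strokes:
        if last_end is not None and current and start - last_end > gap_threshold_s:
            letters.append(LETTER_TEMPLATES.get(tuple(current), "?"))
            current = []
        current.append(direction)
        last_end = end
    if current:
        letters.append(LETTER_TEMPLATES.get(tuple(current), "?"))
    return letters
```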
>>: [indiscernible]
>> Jeremy Gummeson: That's right. That's all I have for kind of current work. To kind of
conclude I showed you the sensor fusion approach that we used just for detecting gestures on a
ring using an accelerometer and a microphone. So our next step we want to add more audio
noise robustness, so there are a couple of things I mentioned about doing this kind of time
domain analysis of the envelope signature to know what is spurious noise and what is a stroke.
And then another way might be having this adjustable filter so that you could look at the region
of frequencies that don't overlap with things that you detected as noise. I mean, you could
maybe change the characteristics of that filter as the user is using the ring. Next, you know,
being able to adjust for finger rotations using reference gestures, so the idea there is maybe
you could even enforce this. Maybe like once every ten strokes you have the user draw a
vertical line and then a horizontal line and if they've changed their finger orientation a little bit,
you could use kind of that reference gesture as a way to realign your coordinate space. The
third thing, so machine learning, so you could use an SVM classifier to actually look at all the 12
gestures including the curves. So far I just showed heuristic-based results for doing eight. We
would like to be able to get up to 12 and to do complete letters and sequences of letters. We
need to do a more extensive NFC harvesting evaluation. Say someone is wearing a form factor
ring and they are using their phone during the day. How much do we actually get in practice?
So in our MobiSys paper this year we actually did some analysis; there is this LiveLab project at Rice University where they had lock/unlock traces from phone usage, and that is indicative of
an opportunity that you have to harvest energy from NFC because when the phone is unlocked
you are able to harvest power. By looking at those kinds of characteristics, you get an idea of
how often you could recharge the ring. The next really important thing is like building a form
factor platform. The components that we've chosen thus far are quite amenable to
miniaturization, so these are all just simple off-the-shelf op amps that are available in smaller
packages. There's nothing that would prevent us from putting this in a ring size object because
we've designed it around a small battery. It's a power efficient implementation. We want to do
more user studies doing things like different characters in the same session, looking at having a
camera to be better able to understand how people move their finger around while they are
performing gestures. And then finally, doing an end-to-end evaluation with the ring as an
actual UI device. Maybe I'm in my living room and sitting at the coffee table and I am playing
games on my Xbox and then I decide I want to be able to navigate around the dashboard and
select different media and maybe a different game. I happen to be wearing the ring, so instead of using the controller, I could use the table, right? So maybe I'm not even playing a game and I don't want to use
the controller, this might be like a more seamless way to interact with things that are in your
living room. Of course there are a bunch of acknowledgments. First I want to acknowledge
Bodhi; he's been a great mentor, a lot of really valuable guidance in steering the project in the
direction we took. Thanks to Jie and the rest of the Sensors and Energy Group for having me as
an intern. I had a lot of really valuable discussions with different people in the lab that helped
mature the project. I had a couple of discussions with Matthai Philipose and Tim Paek. One of
them is a machine learning guy and the other does stuff with UI so they had a lot of nice ideas
that we incorporated. Of course all of the people that contributed gesture data and finally my
fellow interns. We had a lot of great discussions with people and sometimes the things that
help the project the most are random ideas that people will have over a dinner conversation, so
thanks to them. And thank you all for attending. If you want to get in touch with me after my
internship is over, this is my e-mail. At this point I would be happy to take any questions.
[applause]. Yeah.
>>: You use the x and the y-axis [indiscernible] using the z-axis [indiscernible]
>> Jeremy Gummeson: It could. If the finger tilt changes along the z-axis maybe that would
give you additional hints that might be able to let you tell one character from another, for
example. It might let you more accurately choose the beginning and end of the audio
envelope. Yeah.
>>: [indiscernible] or just the response of that and the sound, right?
>> Jeremy Gummeson: We did time the movement start just based on the sound, and then after the fact, based on when that envelope went above and below a threshold, that's how we'd know where to look for the accelerometer data.
>>: [indiscernible] surfaces over time [indiscernible] how you differentiate the random task
surface [indiscernible] me adjust my glasses, me scratching my head, me scratching my legs.
How do you differentiate that versus an actual gesture?
>> Jeremy Gummeson: Great. First, if you have a strong -- you can enforce like a strong user
tap to start interaction with the device, so that might help eliminate some of the tapping type
things. Maybe I'm not tapping really hard throughout the day; it might happen sometimes, but
less often. So maybe you can do more careful frequency analysis. Maybe not all surfaces look
exactly the same. The other bit too is say that you are trying to identify different gesture inputs
and all the time I'm just getting garbage. Obviously, you would probably want to turn the thing
back off.
>>: You are looking for a tap and a slide, not just a tap?
>> Jeremy Gummeson: Dimitrios was saying that maybe I tap my face and then I slide.
>>: [indiscernible] [laughter] maybe tap twice.
>> Jeremy Gummeson: Tap twice and then slide. And if that's not good enough, three times
[laughter] and two slides.
>>: Every six weeks we're [indiscernible] [laughter]
>>: So how [indiscernible] regarding some applications of this, how is this sensor techniques of,
are they sensitive kind of let’s say angry touch? I mean in speech recognition say [indiscernible]
common mistake is maybe the subject will speak louder or slower and that actually makes this
worse. So in this case if I touch say harder and will that change the characteristics…
>> Jeremy Gummeson: So you are saying, based on -- so you are saying the user's emotional
state, the characteristics of the way that they enter strokes and characters might change and
that might…
>>: [indiscernible] user may have a chance to learn to adapt to the device so that if they miss it
then the next time they would immediately know how to adjust that compensate. So do you
see this possibility? I think this device basically everybody will have a learning curve to adapt to
it.
>> Jeremy Gummeson: Yeah. So you do need learning in both ways. You can have the ring
learn what the user does and also you can have the user learn when their gestures are not
being input properly. One way you can do that is maybe you can have a plug in or something
on the device that you are interacting with and maybe you have some non-obtrusive,
something like a colored region or something that lets you know whether your inputs are good
or bad. You could also think of having something like a multicolor LED that turns on very briefly
on the ring that lets you know kind of how you're doing.
>>: Obtaining the threshold [indiscernible] because if you are moving it pretty fast you get a
high-voltage but if you are moving it very slowly the actuation could be slower.
>> Jeremy Gummeson: Sure.
>>: [indiscernible] use [indiscernible] most likely one of these things [indiscernible] feedback
that you said, right? [laughter]
>> Jeremy Gummeson: Yep?
>>: How do you do segmentations? So you've got people writing multiple letters in a row; how
do you know which strokes go together to form a letter?
>> Jeremy Gummeson: Right. We don't actually have a technique developed to do that, but
our intuition is that strokes that correspond to one letter should usually be grouped more
closely together than ones that are part of different characters. I don't know whether or not
that's true. It's probably true maybe 80 percent of the time and then 20 percent of the time
you have to do something.
>>: I thought maybe the [indiscernible] weighs more than [indiscernible]
>> Jeremy Gummeson: Right. So the context of the use case matters, so maybe if I'm doing -- it
depends on what type of text you are entering, or I mean, if you are doing simpler
gestures that might not matter so much.
>>: [indiscernible] character recognition systems like PalmPilot [indiscernible]? Pretty fine
characters you had to use because they couldn't recognize the actual letters, but I relied on the
fact that they could [indiscernible] while you put your finger on them, but you can't actually
detect when the person's got their finger off versus down as long as it’s still. By doing up and to
the side you don't know if I pick my finger up or not necessarily.
>> Jeremy Gummeson: That's not necessarily true. You might be able to detect something
from the accelerometer, but right now we don't depend on that. You might see a change in the
z-axis to detect that the finger is moving up. We do know that those are two distinct strokes,
but we don't necessarily know right now whether the finger has been lifted.
>>: [indiscernible] character anything that has a vertical and a horizontal, like a T or a plus and
an L, they all look…
>> Jeremy Gummeson: Again, you can look at the gap between the strokes. I looked at a lot of
the user data, right, and say for example you are writing the letter A and you have those two
diagonal lines, those two are spaced very, very close together. But say if you are writing the
letter T you are drawing a vertical line, lifting your finger, moving over to draw the horizontal,
you see a lot more space between the two.
>>: I think another answer would be you end up writing because the hand is the most
[indiscernible] to do. Maybe that is the main argument scenario for [indiscernible] the benefit
of some UI space.
>> Jeremy Gummeson: Right. Maybe I'm drawing x’s, triangles, circles, squares, you know, that
kind of thing.
>>: If you start to rely on a gap, the time between the strokes, the angry strokes might begin to
affect things. People tend to do things very slowly and deliberately and then you can't use that
[indiscernible]
>>: [indiscernible] that's why you push forward [indiscernible]
>>: That's when you need a [indiscernible] [laughter]
>> Jeremy Gummeson: We need to get the GSR sensor to work so we know how angry they
are, yeah.
>> Bodhi Priyantha: Okay. Let's thank the speaker. [applause]