>> Bodhi Priyantha: Good afternoon everybody. It's my pleasure to introduce Jeremy Gummeson, a fifth year PhD student from the University of Massachusetts, Amherst. During his summer with us he built an audio and accelerometer sensor-based user input device that works on a ring platform. Jeremy.
>> Jeremy Gummeson: Thanks for the introduction, Bodhi. Before I came for my internship
this summer I knew I was going to be working on some kind of ring platform. We weren't
exactly sure what the application was going to be and what we converged on was doing gesture
recognition and we do that specifically by doing something called sensor fusion, so I'll go more
into what that means later. Over the last 50 years or so, I mean we've seen this rather obvious
trend where we have computers that are kind of very far away from us that are coming closer
to our bodies. In the ‘60s and ‘70s we're using dumb terminals and maybe remotely accessing
computers in a remote data center. Personal computers kind of emerged in the 1980s where
people actually own their own computers. Laptops, we finally had computers we could carry
with us. Smartphones have been really popular over the last decade and as we all know kind of
the big trend over the next ten years is wearable computing where we can think of devices such
as Google Glass. There was someone wearing that around the building today. There's the
Pebble Watch which got a lot of exposure through Kickstarter, and then augmented reality so
that's the Oculus Rift Platform that you can actually wear and experience augmented -- it's like
virtual reality for playing video games, right? So we want to put this a little bit further and have
a platform that's even less obtrusive, so something that you could wear every day and maybe
not even notice that it's there. So why not a ring, right? In popular culture this has been kind
of a very popular thing. You have this ring that kind of gives you some kind of magical powers, right? In the ‘80s there was this television show where these kids had rings they could use to summon a superhero. There's this guy here, the Green Lantern; he can create force fields using a ring. And then of course we're all familiar with Lord of the Rings. You have this magic ring that lets you disappear. Unfortunately, I learned very early in the internship that
superpowers aren't going to be feasible. The big reason I can think why, right, is the energy
bottleneck. I mean if you wanted to do all of these things, you'd need to store a lot of energy in
a small ring. On the left here, this is a ring prototype that Bodhi had actually built and it has a
small area where you can use buttons to enter inputs. And this is a picture of the battery that's actually part of the ring; it's underneath the board here and it only stores about 1 milliamp-hour of energy, which is a small amount. So the trick that we use to keep this ring powered -- if you can see it here, there are some copper windings -- is that there's actually an inductive charging coil
and the idea is that you can recharge the small battery kind of opportunistically. If you are
wearing a ring on your ring finger and you are holding your phone -- a lot of phones now have NFC -- you can kind of opportunistically recharge the ring’s battery when you use your phone normally throughout the day. Actually this year at MobiSys Bodhi and I had a paper that explored that idea in a little bit more detail, in that case looking at a security application. The big takeaway there that is relevant to this work is that we found that using a very aggressive harvesting strategy we can harvest up to 30 milliwatts of power
from a phone. That's a lot of power when we compare kind of the low power consumption
costs of sensors now. So this is an opportunity, right? We have this remote charging source
that we can use to replenish the small battery on a ring platform. So what do we do with the
ring? There are a few ideas that we explored early in the internship. The first was continuous
health monitoring, so one thing that we thought about was monitoring, you know, something
like a pulse ox sensor or maybe doing galvanic skin response to understand someone's
emotional state. Unfortunately, we found that the signal integrity that you need for those different health-related things is hard to get for various reasons; I mean, the surface area on the ring is so small. For example, for galvanic skin response, that would mean the electrodes would have to be very close together and you get a very weak signal which isn't necessarily really good for detecting health. Related to that, we had this idea of doing emotion-based HCI, so maybe you can detect that a user is frustrated and then maybe somehow change web search results, something of that nature, right? But we couldn't do that because we couldn't implement those sensors. So instead we focused on this idea of the ring as a gesture
input device. There are actually a couple of ring like things out there that do this. They're a
little bit clunky though, right? The thing on the left there that's basically a miniature trackpad
that you can wear on your finger, so if I was giving this talk and I wanted to be able to flip
through my slides, I could do that from a ring. But we think that we could do better and have
something that is much more seamless and something that you might be willing to wear all day.
Our project goal is to implement a ring based always available data input device. We can think
of data input in a variety of ways. In the first place there are UI actions, so say you are in a web
browser, you want to navigate back and forth between pages that you visited, you know, you
could implement those gestures, maybe scrolling up and down on a page, that would be
another two gestures. Another way you can enter input is kind of like a virtual keypad like we
have on our phones. You have a virtual keyboard and you can enter in individual letters.
There's more advanced ways of doing this. It's called shape writing. Yes?
>>: Have you thought about doing auto unlock on my phone with a ring?
>> Jeremy Gummeson: There are actually a couple of applications out there where I've seen
there's a ring that kind of has just a passive NFC tag in it and basically when it sees that tag ID it
does some single action on the phone. But we're looking at doing more than just one action,
kind of a richer set of inputs to a device.
>>: Following that, this is a problem we all have at Microsoft [laughter].
>>: [indiscernible] there's another project that [indiscernible] is doing with [indiscernible] as
part of the security. We could talk about it off-line.
>> Jeremy Gummeson: So what we're going to be focusing on in this talk is kind of the two last
cases here, so character input and shape writing, so basically looking at characters as shapes
and trying to detect those shapes accurately so we can emulate characters. There's obviously a
variety of challenges in getting something like this to work effectively. The first challenge is
related to energy, so how do we keep the ring always available for input when we are only
relying on this, you know, harvested power from a phone and a small battery? So the second is
sensing. If you have sensors that are located, you know, in this segment of maybe the index finger or the ring finger, you might not be able to get, you know, accurate sensor readings to have accurate gesture recognition. The third is computation, so how should the
ring process the raw sensor input? Should you do all of the processing on the ring? Should you
do some processing on the ring and then push those results and have something that has more
computational facilities do the rest? This trade-off between computation and communication
is also important. Otherwise, you'll end up killing the battery on the ring. Yes?
>>: [indiscernible] if I'm [indiscernible] the ring, is it easy for me to [indiscernible] the phone
through my ring or just [indiscernible] how about the perception of me looking at the device
phone [indiscernible] because it's already there? How about the perception of [indiscernible]
providing advice [indiscernible] that's not? How do you think [indiscernible] do that? Would
you feel comfortable the ring that?
>> Jeremy Gummeson: So you are saying that you have some kind of remote display in wearing
the ring and you want to be able to interact with it?
>>: [indiscernible] I think [indiscernible] would be the user perception how does the user…
>> Jeremy Gummeson: Sure. That would be a fifth challenge, right.
>>: Like [indiscernible] the watching screen [indiscernible]
>> Jeremy Gummeson: That would require, you know, a user study to understand what that is.
Yeah right. Okay. So there are several different approaches towards entering symbols on
different types of devices. So BlackBerry is kind of on its way out but it was really popular
because it had this very accurate tactile method of input where you have these keys that you
actually press, so a few advantages of pressing buttons, right? You can do this at really low
power. You are basically processing interrupts instead of having to continuously process sensor
data. It's very accurate, right? I get a nice tactile feedback when I want to push a button. You
might be faster with buttons as well in entry. A couple of the limitations are form factors,
right? If I want to have a lot of buttons, I need a lot of space. If we're thinking about something
like a ring, you don't have a lot of space to work with. Then how do you map a rich set of symbols -- how do you have a rich set of inputs -- if you are kind of constrained in terms of space?
Another thing that became popular I think starting with the Nintendo Wii a few years ago is
doing gestures in 3-D space. One of the advantages here is it's very verbose, so you can
move your arm in very large areas and do a lot of different types of gestures and these motions
might be really natural to people. But one of the issues here, right, there's a lack of intent. If
I'm wearing a ring and I wanted to process things in 3-D, you don't want the user to have to
press a button, so I'm moving my hands around all during the day and I don't want these
spurious movements to be misinterpreted as gesture inputs. So this is a problem, and these
can also be low accuracy, right? It's kind of hard for me to perceive where I am in this 3-D space in front of me. Like if I wanted to accurately reproduce a gesture, it's not exactly clear
where the exact boundaries are. You also need to deal with things like variable orientations
because you're in 3-D space and then this can also be higher power because you have to stream
out maybe a lot of accelerometer readings in order to localize the thing in 3-D space. So a third
way that we can consider is doing something in 2-D space. One of the advantages is -- so this is
one of the first things that we learn in school, so handwriting. We learn how to write letters on
a flat surface. So some of the reasons why this is attractive, so you get active feedback, so
when I'm sliding a pencil on a surface I can feel the vibration of the pencil into my hand and I
know maybe how fast the object is moving. If I'm thinking of even just doing fingerpainting, right, I can feel my finger moving along the paper. This is something natural
because people have been doing this, you know, for ages. You look at cave paintings and
tablets, I mean, this has been happening for a while. And surfaces for doing this are available
all around us, so if you aren't restricting yourself to like capacitive touch surfaces, maybe if you're considering tables or whiteboards. Maybe even your trousers could be a 2-D surface that you would use to enter input. I mean it's always readily available. A couple of
the challenges here are surface detection. So how do I disambiguate when my hand is in the air
and when it's actually on a surface? That's an open question. So it's less verbose than 3-D, so
I'm kind of constrained maybe to a smaller space where maybe I can move my finger in a
comfortable way and then power is a question. So we don't really know how much power it's
going to cost to do this surface detection. Okay. So how does a user input 2-D gestures on a
ring? This is our idea on how you would do this. Step one, the user will initially tap on the
desired surface. That's when you first put your finger down on the surface that you want to
interact with. Step two, this is actually an optional step. This is, so you have this challenge that
you need to understand what coordinate space you're talking about, so the user might first
have to enter some reference gestures to let the system know kind of how the user’s oriented
relative to the surface. Thirdly, the user enters a series of strokes on the surface to interact
with some other device. The fourth step which the user doesn't do and it's implicit is they stop
entering strokes and then the ring will go back into a low-power state where it's not
interpreting gestures anymore. With a 1 milliamp-hour battery that will actually fit in a ring-like platform, based on some back-of-the-envelope calculations we did on hardware that's available on the market, hardware that we actually used to do some prototyping, we found that given a 3,700 microwatt-hour battery capacity you can do about 4,000 gestures on a full battery. And given
our previous results on NFC harvesting, you can recharge all of that energy from the phone in
about 20 minutes. The idea is you are using your phone periodically during the day and that
battery will kind of keep being topped off and then when you want to do another type of
interaction without the phone there will be energy available for you to do that. We haven't
actually evaluated how effective that is. It would require a longer-term study; you'd have to have people wear these things and understand how often they use their phone, but that's another topic.
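As a rough sketch of that arithmetic in Python -- treating the quoted 3,700 figure as microwatt-hours (roughly a 1 milliamp-hour cell at 3.7 volts) and picking illustrative average harvesting rates, since 30 milliwatts is a peak number:

```python
# Back-of-the-envelope energy budget for the ring, using the figures from the talk.
BATTERY_CAPACITY_UWH = 3700                      # assumed: ~1 mAh lithium cell at 3.7 V nominal
battery_j = BATTERY_CAPACITY_UWH * 1e-6 * 3600   # capacity in joules, ~13.3 J

GESTURES_PER_CHARGE = 4000
print(f"energy budget per gesture: ~{battery_j / GESTURES_PER_CHARGE * 1e3:.1f} mJ")

# Recharge time depends on the average power actually harvested over NFC; 30 mW is
# the peak from the MobiSys paper, so the sustained average during phone use is lower.
for avg_harvest_mw in (30, 10):                  # assumed average rates, for illustration
    minutes = battery_j / (avg_harvest_mw * 1e-3) / 60
    print(f"full recharge at {avg_harvest_mw} mW average: ~{minutes:.0f} minutes")
```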
The first challenge that I want to talk about is segmenting gesture data. We need to know
when one gesture starts and another stops. This is just me sitting at my computer and I'm
sliding on my trackpad and I'm actually inputting an up gesture now, but it isn't clear from the
video at least whether I'm moving up or down. You can't really tell when my finger is raised off
the surface or when it's down on the surface. It's kind of this fuzzy notion. How do we do this?
Doing this there are two main challenges, the first of which is detecting whether or not the finger is on the surface. First we need to know if it's up or down and then if it's on the surface. And the
second is distinguishing between different gestures while the finger is moving on the surface.
We know that the finger came in contact. Now what is it doing while it's on the surface? The
first part of the talk is going to be how we detect the surface. Related to those steps that I
showed earlier, we want to first detect when a finger lands on the surface. A great way to do
this, so you use an accelerometer. What you look for is a sudden deceleration in the z-axis.
When the finger stops when it hits the surface, you'll see a spike and one of the great parts
about currently off-the-shelf available accelerometers is you can do a threshold based wake up
for very low power costs. For less than a micro ampere you can determine when these spikes
occur. Then the second part is continuing to detect the finger as it moves across the surface.
Surface friction is emitted from most surfaces as audible, band-limited noise. What we're
doing here is we are using a low-power microphone and some signal processing techniques to
observe the surface friction noise and we are able to do this at reasonably low power
consumption, less than a milliamp with an optimized circuit design. What this lets us do, so if
you combine this with the accelerometer, another added benefit is that it reduces false
positives from spurious taps. Say if I'm wearing this ring and I'm nervous and I'm tapping, you
know, on the side of my trousers, you'll have a lot of false wake ups and you waste a lot of
power. In order to keep the system up in an active state what you look for is a tap followed by
the surface friction noise, and if you don't hear both you quickly turn the system back off.
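A minimal sketch of that wake-up policy, assuming hypothetical `accel` and `audio` driver objects; the thresholds and listening window below are illustrative, not the values actually used:

```python
import time

TAP_THRESHOLD_G = 2.0          # deceleration spike seen when the finger strikes the surface
AUDIO_CONFIRM_WINDOW_S = 0.3   # how long to listen for surface-friction noise after a tap
ENVELOPE_THRESHOLD_V = 0.2     # envelope-detector level that indicates the finger is sliding

def wait_for_surface_contact(accel, audio):
    """Sleep until a tap is confirmed by surface-friction noise; reject spurious taps."""
    while True:
        # Accelerometer threshold wake-up: everything else stays asleep until this fires.
        accel.wait_for_threshold_interrupt(TAP_THRESHOLD_G)
        audio.power_on()                       # briefly power the mic, filter and envelope chain
        deadline = time.monotonic() + AUDIO_CONFIRM_WINDOW_S
        confirmed = False
        while time.monotonic() < deadline:
            if audio.read_envelope() > ENVELOPE_THRESHOLD_V:
                confirmed = True               # tap followed by friction noise: real contact
                break
        if confirmed:
            return                             # stay awake and start interpreting strokes
        audio.power_off()                      # no friction noise: spurious tap, go back to sleep
```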
>>: Is there some sort of input to the user that tells you it's [indiscernible]? I know that…
>> Jeremy Gummeson: There is one prototype now, but you could envision maybe quickly
pulsing an LED, something like that to…
>>: [indiscernible]
>>: For the accelerometer the z-axis is always vertical with [indiscernible]
>> Jeremy Gummeson: You can actually use, you don't have to just use the z-axis. I think it
looks for actually any of the axes. I just said the z-axis here because I'm assuming you have a
horizontal surface and maybe that's the one you're looking at in particular.
>>: So is the wake-up power for the ring itself?
>> Jeremy Gummeson: The wake-up power, so that is just for the accelerometer. I'll have a slide with a number for this a little bit later, but basically you're in this low-power state where you have everything asleep. The accelerometer is using only 0.27 microamperes of current and it's
just waiting for these spikes.
>>: And what about the battery [indiscernible]
>> Jeremy Gummeson: I have more slides on that. This is our initial experimental setup. I have
to be honest. It's not a ring yet. That's kind of a work in progress, but what we've built here,
this is a bicycle glove I bought at the Commons and this is a commodity accelerometer on
an evaluation board and it's in the location on the glove where a ring would be located. So the
sensor data that you get from it would be reasonably close to what you would get from a ring.
And then it's connected to a microcontroller that basically outputs the accelerometer readings
over a serial port. We also have a low-power MEMS microphone basically on the other side of the finger, on the opposite side of the finger from the accelerometer, and we basically are outputting all of the data from that microphone into a PC sound card so that we can analyze what these surfaces sound like.
>>: Why did you use the index finger?
>> Jeremy Gummeson: The index finger is easier to write with.
>>: Yeah, but nobody wears rings on their index finger.
>> Jeremy Gummeson: They might. [laughter]. They might. It's possible that you could do it
on the ring finger as well. Maybe the motion within the two are correlated, but we started with
the index finger. The first part of detecting the surface is the finger impact. This is just a time
series trace of data that I get from the three axis accelerometer and these spikes that you see
around two g’s here, that's actually when the finger is striking the surface. It's reasonably easy
to detect that. But what about the other part? I said there were two parts. First there's the
impact. Then there's the sound that the surface makes. Let's do an evaluation kind of right
now. What does this sound like? The first is a really loud sound. This is a piece of Styrofoam
and this is what it sounds like. It's almost like nails on a chalkboard, very loud and easy to hear.
This is going to play all the way through. Okay. And the next is a surface like wood. You might
not even be able to hear this. We are actually able to pick it up with the microphone. And then
the other scenario you can think about, right, is when there's external noise that might be
swamping the signal that you're looking for. So here's some children playing on a playground
and I was playing this sound from a laptop next to where I was performing a series of gestures
with the ring. I'll show in one slide that we're actually able to disambiguate the two signals. I
actually evaluated 12 different types of surfaces and they all produced band-limited noise in some common frequency range, and in addition to those evaluations we did, we looked at this journal paper from the Acoustical Society of America and they kind of confirmed our suspicion that a lot of surfaces have these common frequencies. These are actually the results
that I referred to in the last slide about what this looks like when we have the noise of these
children playing on a playground and then me performing a series of gestures. On the left here,
these red regions -- first I should explain the plot here. What we have is a spectrogram. The x-axis is the frequency components of the signal that we're looking at and the y-axis is the time that that spectral content was present. What we see here, all this red is the children playing and
these yellow bands here that are separated by blue bands; this is me dragging my finger across
the surface for about a second, picking it up and dragging it again for a second. There's a lot of
space here where they don't overlap. One place where they do: in the lab where I was doing the experiments there are actually some servers on, and they generated this yellow band that is kind
of there present through the whole trace, but there's plenty of frequency space around that
that doesn't overlap. If we wanted to be even more immune to noise, there's a couple of other
techniques that we could use. We could use something like dynamic filtering where we have a
programmable filter that can look at different regions of frequency content. The second might be time-domain analysis, where if you have really short-lived noise you could filter that out because it doesn't look anything like these surface movements. It's much shorter. To do this
audio processing, there are three steps that we need to do. We need to do processing because the ring isn't going to be able to look at raw audio samples. You'd have to sample audio at 44 kHz and that would very quickly drain that 1 milliamp-hour battery I showed earlier, so we use a hardware filter. First you apply a bandpass filter that's constructed to look at that region that's
usually separated from human speech. Then we apply some gain, so we need the signal to be
bigger in order to be interpreted by a microcontroller with say analog-to-digital conversion.
Then we also want to use something like an envelope detector so we can actually sample that
signal at a relatively low frequency and understand when that noise is present and when it isn't
present. What does this look like after bandpass filtering? What we see here, this is another
spectrogram plot. This time I was drawing the letter L on a surface, so these red bands that you
see here that are very close together, those are individual strokes of the letter L. We're able to
do this with filtering and gain and so after the envelope detector we can actually see the two
peaks that correspond to the strokes of the letter L. This is a promising result. This is all
implemented in Matlab. This isn't from an actual hardware implementation of the filters, so we
were kind of guided by these results to do an actual filter design. This is what it looks like. It's a
little bit of a mess right now, but what we have here is, so these two boards here are the
envelope detector. This one does some filtering and some gain. We have a microcontroller,
the same accelerometer from before and a similar microphone from the previous setup. We
chose the particular components that we used so that the op amps that are used for the
filtering are very low power so all of them together used 620 microamperes. The
accelerometer even when it's active uses only three microamperes, and the microcontroller -- I think there are more efficient ones on the market -- the one that we used was around 270, but if you sum all of those up you have less than a milliamp budget. In standby, so this is with the accelerometer in its low-power mode and when the microcontroller is also asleep, we consume around only one microampere when everything is off and waiting for a user to interact with the
surface. Let's revisit writing the letter L but this time with our actual hardware. These are
traces that were actually output from our serial port, so basically our microcontroller was just
reporting the ADC samples that it got from our audio filter and we're actually able to distinguish
the letter L. These are spaced a little bit further apart than the previous plot, so this is actually
from an actual user, so I had a few people enter gestures for me and this person happens to
enter it more slowly than I did in the previous, but because of the filter characteristics even if
the strokes were closer together it would still work. Yeah?
>>: What is the precision of the recall? Even though how long, if I write a T would it be
recognized as L?
>> Jeremy Gummeson: I'm going to have a few results on that later. This was just kind of to
prove that the envelope detector is doing the right thing.
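A sketch of that band-pass, gain, and envelope chain done offline in Python, in the spirit of the Matlab analysis; the band edges, gain, and envelope cutoff are illustrative assumptions, not the actual circuit values:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def stroke_envelope(audio, fs=44100, band=(6000, 12000), gain=100.0, env_cutoff_hz=20.0):
    """Approximate the analog front end: band-pass -> gain -> rectify -> low-pass envelope."""
    # Band-pass to a region where surface-friction noise sits, away from speech.
    bp = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = gain * sosfilt(bp, audio)
    # Envelope detection: rectify, then low-pass so the result can be sampled slowly by an ADC.
    lp = butter(2, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
    return sosfilt(lp, np.abs(filtered))

# Each stroke then shows up as a distinct hump in the returned envelope,
# for example two humps for the two strokes of the letter L.
```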
>>: [indiscernible] fingers distinguish it from…
>> Jeremy Gummeson: This is just showing the fact that L contains two strokes. I move down and I
move right. All this is showing is that I can get two strokes per L. They may not be the right
strokes, but there are two. That's what this plot is showing. To get ground truth about how
accurate this surface detection is on the movement, we actually used an off-the-shelf capacitive
touchscreen. What we did is we recorded the coordinates of the finger on the capacitive
touchscreen over time, so we knew whether or not the finger was actually moving. We also
recorded audio from the finger motion. What we did is we compared timestamps of the audio
and also the timestamps of the touchscreen coordinates and we found that they correlated
pretty well. I don't have a plot here to show that, but it proved that our audio detection
scheme is doing the right thing. So I just described how we detected the finger is moving on a
surface. I'm going to go to the second part which is how do we construct a symbol based on
maybe movement on a surface. One thing to note, right, is that symbols can be arbitrarily
complex. Think of maybe not American English, where the characters are simple, but maybe something like Japanese or Chinese. You can think of very complicated symbols that a user might enter. One naïve strategy that you might do is you might try to put all of the computation on
the ring and compute the entire symbol and then tell the end device this is the character that
they entered. This could be computationally challenging, so you might need to use advanced
machine learning techniques, things like language models. You might have to update
vocabulary specific to a user. Maybe one user writes a little bit differently than another and
you might have to do customization there. Instead of trying to compute the entire symbol on
the ring, we break it up into a series of strokes. For example, if I write the letter A that's going
to be two diagonal lines in different directions followed by a horizontal line. I would send each of
those segments to the end device and let it figure out that that is the letter A. So kind of an
architectural view of what we're doing here, so at the bottom we have fingers on a surface and
we're detecting some signals. The ring is detecting those signals, converting them into strokes
and then those individual strokes are sent to an end device; maybe it's a Windows Phone and
it's being converted into symbols and words. An example that I've been referring to is
handwriting. This is the letter B, this is the letter L and the letter W. This is how it's decomposed
into strokes and kind of the ordering of those strokes. Then the way that you might report this
to the end device, you might have different IDs corresponding to the different stroke primitives
you support, plus some timing information. It might be important how quickly and how the strokes are grouped together. That might help disambiguate one symbol from another and maybe even within the character: if I write the letter A, for example, the two diagonal lines might be close together in time and the horizontal might be further apart, so that timing information could be helpful in interpreting it later. This is great because we can send a few
bytes of data instead of sending out 400 Hz accelerometer data which would completely kill our
battery. So the two core system challenges here, again, so it's identifying the beginning and the
end of a stroke reliably. And then the second is using sensor fusion between the microphone
and audio circuit and the accelerometer to understand the relative directional properties of an
individual stroke. This is a user that I was actually collecting data from wearing our prototype.
What we have here, so the x and y coordinates are oriented, so they're facing the table and a
piece of paper, so x is positive in the right direction and then y is positive in the direction facing
away from the user, right? What the ring actually sees is something a little bit different. We're
going to have some tilt in the x and y; basically what will happen is gravity is detected by the accelerometer and it's going to show up in some combination of these axes. You
have to normalize kind of the coordinate space of the ring to the surface and then the other
thing that can happen is the user can actually have their finger rotated, so that will actually
confuse your x and y axes. So combining the microphone and accelerometer, how does it
work? Step one, after that tap happens and the finger is first touching the surface you can
compute the finger angle relative to the surface during idle periods. That's to get rid of that z
component from gravity. And then the second step is identifying that the finger is moving on
the surface, so you get that audio envelope and that lets you know when the stroke is actually
being performed. Then step three, you can observe the finger accelerating and decelerating in
different directions depending on what gesture the user is inputting. Then you can use some
physics-based heuristics to figure out what that direction is. If you want to move a finger, you
have to accelerate and if you want to stop you have to decelerate, so the accelerometer is
going to definitely pick up those signals. Kind of a laundry list of different stroke primitives that
we want to deal with, so first there's the easy ones. There's up, down, left and right, so
basically you're just looking at the signs of the different axes of the accelerometer. Then kind of
a medium difficulty; I call it medium because you have to actually look at combinations of the x
and y axes to detect what type of diagonal motion you're talking about. Then the third is hard.
I call it hard because now we actually care about the shape of the accelerometer motion so you
can detect things like centripetal motion in order to understand that a curve is happening as
opposed to a straight line. During the course we wanted to do all of these. That's the end goal.
During the internship we focused on the easy and the medium difficulty strokes. Now I'm going
to go into kind of how this works. This is data from an actual user performing an up gesture.
The red plot here is the output of our envelope detector. What that lets us do is, so say if you
set a threshold and you look at your analog to digital converter and you see that the voltage
went above .2 volts and goes back below .2 volts, you decide okay. That's the boundary of
when the finger was moving and then I can draw a line down the middle. There's the first half
of the stroke and the second half of the stroke. What we see in black and green: we have the x-axis in green and the y-axis in black. We actually see the finger accelerating and decelerating. Because the axes are actually backwards, negative means acceleration and positive means deceleration, but if we look at the two halves of the finger movement, we can
clearly see this by looking at the y-axis and that signal is much larger than what we are getting
on the x-axis. We can probably figure out that that is a vertical motion as opposed to
horizontal. Let's look a little bit more at this. I just took away the envelope detector plot and
this is just the accelerometer. Basically what we do, it's very simple, we look at both halves of
those intervals, T1 and T2 and we do an integration. We integrate the first half of the x-axis and
the second half of the x-axis and do the same thing for the y. Based on the relative signs of those integrations, and on which axis's total integral is larger than the other's, we can determine which axis had the motion and what direction that was, whether it was forward or
backward or left or right. In this case we did up, so we see that the prominent axis is the y-axis.
So this is what we did. We compared the integrals of the dominant axis during the two halves
of the movement period.
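A sketch of that decision rule over one envelope-delimited stroke, in Python; the sign conventions are illustrative and depend on how the accelerometer is mounted and corrected:

```python
import numpy as np

def classify_cardinal_stroke(ax, ay):
    """Classify one stroke as up/down/left/right from its x/y accelerometer samples.

    ax and ay are the tilt-corrected acceleration samples taken between the points
    where the audio envelope crosses above and back below the threshold.
    """
    mid = len(ax) // 2
    # "Integrate" each half by summing the samples, as described in the talk.
    ix1, ix2 = np.sum(ax[:mid]), np.sum(ax[mid:])
    iy1, iy2 = np.sum(ay[:mid]), np.sum(ay[mid:])

    # The dominant axis is the one with more integrated activity overall; the direction
    # follows from the accelerate-then-decelerate sign pattern of that axis.
    if abs(iy1) + abs(iy2) > abs(ix1) + abs(ix2):
        return "up" if iy1 > 0 > iy2 else "down"
    return "right" if ix1 > 0 > ix2 else "left"
```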
>>: What is this thing on the right, the y-axis?
>> Jeremy Gummeson: So what is the figure?
>>: [indiscernible]
>> Jeremy Gummeson: Yeah. That's computing this integral right? So it's negative.
>>: [indiscernible]
>> Jeremy Gummeson: So basically I'm just summing over the whole thing and then
representing that the entire sum as kind of the -- you can think of it as sort of the average, I
think it's the average speed over the first half.
>>: [indiscernible]
>> Jeremy Gummeson: Total integral value is what I computed over that half.
>>: Your gesture is from bottom to top and then returned, or just bounded up?
>> Jeremy Gummeson: Just bottom to top and then we define down as top to bottom.
>>: Then why is the [indiscernible]
>> Jeremy Gummeson: Okay. These are actual, this is the raw data that is from the ring, but
because I know how the accelerometer is actually mounted on the ring itself, I actually, so in
software you can actually reverse the axes and figure out the sign you're looking for.
>>: [indiscernible] graph on the envelope, you are going to the peak and then going back down.
>> Jeremy Gummeson: This is just the magnitude of the audio signal.
>>: Oh. So during the peak it's actually transitioning, from bottom to top.
>> Jeremy Gummeson: Yes. Exactly, so I'm speeding up during the first half of the finger
motion and then I'm slowing down.
>>: And the gesture is finished when your fingers at the top. You don't bring the finger back
down?
>> Jeremy Gummeson: You don't bring the finger back down; that's right. In order to do up
down, left and right, it's a simple process. I just filled out a table here for the signs you are
looking for. Say if you correct for the axes, what you are looking for is an initial
acceleration in the y-axis and then a deceleration in the y-axis and you're looking for basically
no activity in the x. The opposite is true for the down and then you look at the other, the x-axis
for the horizontal movements. You can come up with a very simple algorithm, right, that does
the integration and then compares which axis is dominant and what the sign is. This isn't too
hard. One of the advantages of only looking at these four gestures is that you're relatively
immune to rotational drift, so there's 90 degrees of difference between say up versus right or
right versus down, so if the person kind of drifted while they were entering the gesture and
maybe entered something that looked like a slight diagonal line, you would get what they
intended and actually get up. One of the cons is that you are limited to four features. That's not to say that we couldn't, say, string a bunch of these horizontal and vertical primitives
together to do something more complex. You could do that, but you might want to have a
richer set of strokes to begin with. Yeah?
>>: How did you send it to [indiscernible]
>> Jeremy Gummeson: It's not actually that sensitive to that threshold, so basically the
important part is picking it to be the same on both sides. As long as you pick points
[indiscernible] on both sides, you're going to have a symmetric view of the motion and when
you do the integration the math works out. So you want something that's wide enough, right,
to be able to see as much of the signal as possible, but not so wide that you're actually getting
some of the accelerometer noise as part of your computation.
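A sketch of that segmentation step on the low-rate envelope samples; the 0.2 volt level is the one mentioned a moment ago, and the minimum-length filter is an added assumption:

```python
def segment_strokes(envelope, threshold=0.2, min_samples=3):
    """Return (start, end) index pairs where the envelope stays above the threshold."""
    segments, start = [], None
    for i, v in enumerate(envelope):
        if v > threshold and start is None:
            start = i                            # rising crossing: a stroke begins
        elif v <= threshold and start is not None:
            if i - start >= min_samples:         # ignore very short bursts of noise
                segments.append((start, i))
            start = None
    if start is not None and len(envelope) - start >= min_samples:
        segments.append((start, len(envelope)))  # stroke still in progress at end of trace
    return segments
```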
>>: [indiscernible] right now instead of observations [indiscernible]
>> Jeremy Gummeson: Okay. To understand how well this works, so first we looked at four
gesture classes, up, down, left, right, so I had five helpful people enter gestures for me and I
looked at these four gesture classes. I asked each participant to basically enter one of these
gestures ten times in a row for each of the gestures. I collected all the data with the glove. I
outputted it over the serial port and then I analyzed it off-line using Matlab, so basically what I
did is I implemented that simple algorithm in Matlab and saw if I could reliably determine which
gesture was which. For one user -- this user did particularly well -- we compute the correct gesture among the set of gestures that are available a hundred percent of the time, except one time we falsely interpreted the down gesture as a left gesture. That's not perfect.
>>: [indiscernible] not for now, but to look at if the user on the input, so how are they actually
interpreting it.
>> Jeremy Gummeson: That's actually a great idea and I should've done that. What I did do, so
after each user was done entering the gestures, I took pictures of their hand and kind of what
kind of orientation they were using during the whole session. But to understand the dynamics
while this was happening that would've been really valuable and the next time I collect data on
this that will definitely be something that I will do. Then when we look across all five users, of
course things degrade a little bit, but not by much. Again, we confused the down and the left
gestures a little bit more, so we went down to an 86 percent accuracy there.
>>: Is this all right-handed people or left-handed people too?
>> Jeremy Gummeson: They were all right-handed. So the glove, it's a right-handed glove so I
didn't actually check but based on the…
>>: [indiscernible]
>> Jeremy Gummeson: Yeah. If we add a little bit more complexity, right, we can think of doing
diagonals. The way that we do this, so first we have up, down, left and right, and they look
exactly the same as they did before. But then we kind of have this fuzzy notion of where, if
you're doing a diagonal line, you are going to see some amount of activity in the x and then also
some amount of activity in the y. If you're asking people to do diagonal lines, if they were
drawing a 45° angle and if your calibration is correct, you should see the same magnitude. But
in reality, the users as they input the gestures are going to drift a little bit and they might
actually even rotate their finger, so they might be entering the angle correctly on the table, but
then you misinterpret it because your axes aren't aligned anymore. In this case the user was
entering a down right stroke. This is after I do the integration, you see a comparable amount of
activity in both the x and y-axis, but the signs are opposite, so that's how you figure out the
directionality. Yeah. As I mentioned, it's very susceptible to individual user variations and also
to finger rotational drift. The first could be fixed if you had a scheme that kind of adapted to different users, if you use one of these more advanced learning techniques, and that type of approach could also help you handle more of the rotational drift. If you actually wanted to completely solve the rotational drift problem, you'd have to add a gyro, but it turns out that gyros right now cost significantly more power than an
accelerometer so we're trying to avoid using that. It's certainly something that you could add if
you were willing to deal with a bigger battery. First I'm going to look at what I call the best
user. They got almost a hundred percent accuracy across all eight gestures. In one case, so it
was the down gesture, it was misinterpreted as a down-left, and that's a reasonable mix-up. So maybe when they went down that time it looked more like a 45° angle than like they were going straight down.
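One way to extend the cardinal-direction rule above to the diagonals, as a sketch; the ratio deciding when x and y activity counts as comparable is an assumed heuristic, not a tuned value from the talk:

```python
import numpy as np

def classify_stroke_8way(ax, ay, diagonal_ratio=0.5):
    """Classify a stroke into one of eight directions from its first-half integrals."""
    mid = len(ax) // 2
    ix, iy = np.sum(ax[:mid]), np.sum(ay[:mid])   # first half carries the acceleration sign
    x_dir = "right" if ix > 0 else "left"
    y_dir = "up" if iy > 0 else "down"
    # Comparable activity on both axes suggests a diagonal; otherwise keep the dominant axis.
    if min(abs(ix), abs(iy)) > diagonal_ratio * max(abs(ix), abs(iy)):
        return f"{y_dir}-{x_dir}"                 # e.g. "down-right"
    return y_dir if abs(iy) > abs(ix) else x_dir
```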
>>: [indiscernible]
>> Jeremy Gummeson: Yes. It was. It was neither myself nor Bodhi, I might add. [laughter]. So
when we added all users to the mix, things actually degrade significantly. In the worst case, so
when we're entering the up right gesture, right, so we only get it right 54 percent of the time.
20 percent of the time we think that it's right and then 26 percent of the time we think that it's
up. So some of the users were probably entering an angle that looked more like right or up than a diagonal that's in between. This is something that we want to
address in the future using some lightweight machine learning approach. SVM was actually one
of the suggestions that came to us from a machine learning expert. But I mean this is an
encouraging result that this is possible. This is good enough that maybe machine learning can
help push it up to maybe 80 percent plus accuracy.
>>: So you instructed before the experiment to like try to draw like straight lines.
>> Jeremy Gummeson: Yeah. It wasn't specific. I didn't say draw a 45° line. I said draw an up-right gesture. So
maybe if I was more clear, maybe I would have gotten more accurate results, but I wanted to
kind of -- I mean, I wanted to observe variation in users, so how do people actually use this
stuff.
>>: [indiscernible] understand, you know.
>>: And they were not following the [indiscernible]
>> Jeremy Gummeson: There were no reference lines drawn in the table either, so what I had
was a blank white piece of paper and they used their imagination.
>>: [indiscernible] up to a hundred.
>> Jeremy Gummeson: Okay. Slight math mistake there.
>>: Do you have a diagram as to how online your accelerometer axis to the actual table?
>> Jeremy Gummeson: Yeah. I can sort of compute that for each individual stroke that users
entered, so I can actually adjust for it after the fact. When I do the data analysis, right, at
the beginning I calibrate the axes and then I leave it alone for when I look across the whole
trace. But if I wanted to determine how much it drifted, because I know the ground truth of the
gesture that they entered, I could actually adjust how that angle is.
>>: I think it would be [indiscernible] to see how drift versus a constant bias. For example,
[indiscernible] draw with the line, I would want to see where things are slightly sort of…
>>: Yeah. But that's [indiscernible] you are not doing [indiscernible] instruction in the
beginning?
>> Jeremy Gummeson: I am.
>>: You are? Okay. And where are you doing that?
>> Jeremy Gummeson: I'm doing this when the finger first touches the surface and it's not
moving.
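A sketch of that idle-time calibration: average the accelerometer while the finger rests still on the surface to estimate the gravity component, then subtract it from later readings. This is a simplification done with a plain vector subtraction rather than any rotation:

```python
import numpy as np

def calibrate_gravity(idle_samples):
    """Average 3-axis readings taken while the finger is resting still on the surface."""
    return np.mean(np.asarray(idle_samples, dtype=float), axis=0)   # shape (3,), in g

def remove_gravity(sample, gravity):
    """Subtract the static gravity estimate so only the finger's motion remains."""
    return np.asarray(sample, dtype=float) - gravity
```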
>>: Are you doing it [indiscernible]
>> Jeremy Gummeson: Yes.
>>: Oh wow. You are going to force power [indiscernible]
>> Jeremy Gummeson: Right now I am doing it off-line in Matlab.
>>: But you are doing it through [indiscernible] so what you are doing is rudimentary
[indiscernible] and this like [indiscernible] and [indiscernible]
>> Jeremy Gummeson: So we are not computing sine cosine.
>>: But that [indiscernible] error, protection error.
>> Jeremy Gummeson: That doesn't help; that's for sure. Yeah. Okay. So now that I've kind of
shown you what eight gestures look like, I want to show you what combining strokes together
looks like. I talked earlier about combining strokes into doing letters and so this is actually a
user entering the letter Z. That's a combination of diagonal lines and horizontal lines and the
four peaks you see here in red are individual strokes of the Z. So this is someone that drew Z with the
line in the middle, so you see these are the three first strokes of the Z and then the fourth is a
line in the middle. So this is just an example of three instances of this particular user entering Z.
When you look across users, in the general case it didn't do that well, but for two users I was able to get 70 percent accuracy in detecting the Z just using these simple
heuristics that we've developed so far. Future work for this is doing what I call advanced
gesture detection, so if you have something like a left circle up gesture, so say I am writing the
letter B and the B will consist of a vertical line and then kind of two half circles to the right.
Maybe the direction that those half circles are made is important as to how the letter is
constructed. We can actually see some interesting features of those motions on the
accelerometer right now; we're just not completely sure how to deal with the signals. For
example, I have two entries here from one of my users where they did a left circle up and a left
circle down and so, for example, the y-axis here so you see that it's concave and convex in
different parts of the curve during the first and second half? And then, for example, another
thing we noticed is that in the x-axis if you look at kind of the energy of the signal, there's more
energy in the signal in the left half than on the right, so that might be able to give you your
directionality. So one of those two axes will give you directionality and then the other might
tell you that you're drawing a circle, right, because of the characteristics of the centripetal
force. I'm not going to say much more here because this is all fairly speculative, but I think that
a machine learning approach might be able to deal with more complex signals like this. Yeah?
>>: On the previous slide, the slide with the Z, were you disambiguating that from other
characters like from an S or an N?
>> Jeremy Gummeson: Not in this case. This was basically the user entering the letter Z ten
times in a row and even that was, so I'm not doing any time domain analysis right now either,
so I think the key thing to distinguish different characters from each other would be the gap
between groupings of strokes. But this was just knowing beforehand the letter Z and then
trying to figure it out based on the sequence of strokes.
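Since that piece isn't built yet, a purely hypothetical sketch of the intuition: group strokes by the gap between them and match each group against per-letter stroke templates. The stroke names, templates, and gap threshold are all made up for illustration:

```python
# Hypothetical stroke-sequence templates; a real system would need many more entries,
# per-user variants, and probably a language model on the end device.
LETTER_TEMPLATES = {
    ("down", "right"): "L",
    ("right", "down-left", "right"): "Z",
    ("up-right", "down-right", "right"): "A",
}

def group_and_match(strokes, gap_threshold_s=0.6):
    """Group (direction, start_time, end_time) strokes by inter-stroke gap, then look up letters."""
    letters, current, last_end = [], [], None
    for direction, start, end in strokes:
        if last_end is not None and current and start - last_end > gap_threshold_s:
            letters.append(LETTER_TEMPLATES.get(tuple(current), "?"))
            current = []
        current.append(direction)
        last_end = end
    if current:
        letters.append(LETTER_TEMPLATES.get(tuple(current), "?"))
    return letters
```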
>>: [indiscernible]
>> Jeremy Gummeson: That's right. That's all I have for kind of current work. To kind of
conclude I showed you the sensor fusion approach that we used just for detecting gestures on a
ring using an accelerometer and a microphone. So our next step we want to add more audio
noise robustness, so there are a couple of things I mentioned about doing this kind of time
domain analysis of the envelope signature to know what is spurious noise and what is a stroke.
And then another way might be having this adjustable filter so that you could look at the region
of frequencies that don't overlap with things that you detected as noise. I mean, you could
maybe change the characteristics of that filter as the user is using the ring. Next, you know,
being able to adjust for finger rotations using reference gestures, so the idea there is maybe
you could even enforce this. Maybe like once every ten strokes you have the user draw a
vertical line and then a horizontal line and if they've changed their finger orientation a little bit,
you could use kind of that reference gesture as a way to realign your coordinate space. The
third thing, so machine learning, so you could use an SVM classifier to actually look at all the 12
gestures including the curves. So far I just showed heuristic-based results for doing eight. We
would like to be able to get up to 12 and to do complete letters and sequences of letters. We
need to do a more extensive NFC harvesting evaluation. Say someone is wearing a form factor
ring and they are using their phone during the day. How much do we actually get in practice?
So in our MobiSys paper this year we actually did some analysis; there is this LiveLab project at Rice University where they had lock/unlock traces from phone usage, and that is indicative of
an opportunity that you have to harvest energy from NFC because when the phone is unlocked
you are able to harvest power. By looking at those kinds of characteristics, you get an idea of
how often you could recharge the ring. The next really important thing is like building a form
factor platform. The components that we've chosen thus far are quite amenable to
miniaturization, so these are all just simple off-the-shelf op amps that are available in smaller
packages. There's nothing that would prevent us from putting this in a ring size object because
we've designed it around a small battery. It's a power efficient implementation. We want to do
more user studies doing things like different characters in the same session, looking at having a
camera to be better able to understand how people move their finger around while they are
performing gestures. And then finally, doing an end-to-end evaluation with the ring as an
actual UI device. Maybe I'm in my living room and sitting at the coffee table and I am playing
games on my Xbox and then I decide I want to be able to navigate around the dashboard and
select different media and maybe a different game. I happen to be wearing the ring, so instead of using the controller, I could use the table, right? So maybe I'm not even playing a game and I don't want to use
the controller, this might be like a more seamless way to interact with things that are in your
living room. Of course there are a bunch of acknowledgments. First I want to acknowledge
Bodhi; he's been a great mentor, a lot of really valuable guidance in steering the project in the
direction we took. Thanks to Jie and the rest of the Sensors and Energy Group for having me as
an intern. I had a lot of really valuable discussions with different people in the lab that helped
mature the project. I had a couple of discussions with Matthai Philipose and Tim Paek. One of
them is a machine learning guy and the other does stuff with UI so they had a lot of nice ideas
that we incorporated. Of course all of the people that contributed gesture data and finally my
fellow interns. We had a lot of great discussions with people and sometimes the things that
help the project the most are random ideas that people will have over a dinner conversation, so
thanks to them. And thank you all for attending. If you want to get in touch with me after my
internship is over, this is my e-mail. At this point I would be happy to take any questions.
[applause]. Yeah.
>>: You use the x and the y-axis [indiscernible] using the z-axis [indiscernible]
>> Jeremy Gummeson: It could. If the finger tilt changes along the z-axis maybe that would
give you additional hints that might be able to let you tell one character from another, for
example. It might let you more accurately choose the beginning and end of the audio
envelope. Yeah.
>>: [indiscernible] or just the response of that and the sound, right?
>> Jeremy Gummeson: We did time the movement start just based on the sound, and then after the fact, based on when that envelope went above and below a threshold, that's how we'd know where to look for the accelerometer data.
>>: [indiscernible] surfaces over time [indiscernible] how you differentiate the random task
surface [indiscernible] me adjust my glasses, me scratching my head, me scratching my legs.
How do you differentiate that versus an actual gesture?
>> Jeremy Gummeson: Great. First, if you have a strong -- you can enforce like a strong user
tap to start interaction with the device, so that might help eliminate some of the tapping type
things. Maybe I'm not tapping really hard throughout the day; it might happen sometimes, but
less often. So maybe you can do more careful frequency analysis. Maybe not all surfaces look
exactly the same. The other bit too is say that you are trying to identify different gesture inputs
and all the time I'm just getting garbage. Obviously, you would probably want to turn the thing
back off.
>>: You are looking for a tap and a slide, not just a tap?
>> Jeremy Gummeson: Dimitrios was saying that maybe I tap my face and then I slide.
>>: [indiscernible] [laughter] maybe tap twice.
>> Jeremy Gummeson: Tap twice and then slide. And if that's not good enough, three times
[laughter] and two slides.
>>: Every six weeks we're [indiscernible] [laughter]
>>: So how [indiscernible] regarding some applications of this, how is this sensor techniques of,
are they sensitive kind of let’s say angry touch? I mean in speech recognition say [indiscernible]
common mistake is maybe the subject will speak louder or slower and that actually makes this
worse. So in this case if I touch say harder and will that change the characteristics…
>> Jeremy Gummeson: So you are saying, based on -- so you are saying the user's emotional
state, the characteristics of the way that they enter strokes and characters might change and
that might…
>>: [indiscernible] user may have a chance to learn to adapt to the device so that if they miss it
then the next time they would immediately know how to adjust that compensate. So do you
see this possibility? I think this device basically everybody will have a learning curve to adapt to
it.
>> Jeremy Gummeson: Yeah. So you do need learning in both ways. You can have the ring
learn what the user does and also you can have the user learn when their gestures are not
being input properly. One way you can do that is maybe you can have a plug in or something
on the device that you are interacting with and maybe you have some non-obtrusive,
something like a colored region or something that lets you know whether your inputs are good
or bad. You could also think of having something like a multicolor LED that turns on very briefly
on the ring that lets you know kind of how you're doing.
>>: Obtaining the threshold [indiscernible] because if you are moving it pretty fast you get a
high-voltage but if you are moving it very slowly the actuation could be slower.
>> Jeremy Gummeson: Sure.
>>: [indiscernible] use [indiscernible] most likely one of these things [indiscernible] feedback
that you said, right? [laughter]
>> Jeremy Gummeson: Yep?
>>: How do you do segmentations? So you've got people writing multiple letters in a row; how
do you know which strokes go together to form a letter?
>> Jeremy Gummeson: Right. We don't actually have a technique developed to do that, but
our intuition is that strokes that correspond to one letter should usually be grouped more
closely together than ones that are part of different characters. I don't know whether or not
that's true. It's probably true maybe 80 percent of the time and then 20 percent of the time
you have to do something.
>>: I thought maybe the [indiscernible] weighs more than [indiscernible]
>> Jeremy Gummeson: Right. So the context of the use case matters, so maybe if I'm doing -- it
depends on what type of text you are entering, or I mean, if you are doing simpler
gestures that might not matter so much.
>>: [indiscernible] character recognition systems like PalmPilot [indiscernible]? Pretty fine
characters you had to use because they couldn't recognize the actual letters, but I relied on the
fact that they could [indiscernible] while you put your finger on them, but you can't actually
detect when the person's got their finger off versus down as long as it’s still. By doing up and to
the side you don't know if I pick my finger up or not necessarily.
>> Jeremy Gummeson: That's not necessarily true. You might be able to detect something
from the accelerometer, but right now we don't depend on that. You might see a change in the
z-axis to detect that the finger is moving up. We do know that those are two distinct strokes,
but we don't necessarily know right now whether the finger has been lifted.
>>: [indiscernible] character anything that has a vertical and a horizontal, like a T or a plus and
an L, they all look…
>> Jeremy Gummeson: Again, you can look at the gap between the strokes. I looked at a lot of
the user data, right, and say for example you are writing the letter A and you have those two
diagonal lines, those two are spaced very, very close together. But say if you are writing the
letter T you are drawing a vertical line, lifting your finger, moving over to draw the horizontal,
you see a lot more space between the two.
>>: I think another answer would be you end up writing because the hand is the most
[indiscernible] to do. Maybe that is the main argument scenario for [indiscernible] the benefit
of some UI space.
>> Jeremy Gummeson: Right. Maybe I'm drawing x’s, triangles, circles, squares, you know, that
kind of thing.
>>: If you start to rely on a gap, the time between the strokes, the angry strokes might begin to
affect things. People tend to do things very slowly and deliberately and then you can't use that
[indiscernible]
>>: [indiscernible] that's why you push forward [indiscernible]
>>: That's when you need a [indiscernible] [laughter]
>> Jeremy Gummeson: We need to get the GSR sensor to work so we know how angry they
are, yeah.
>> Bodhi Priyantha: Okay. Let's thank the speaker. [applause]