>> Ganesh Ananthanarayanan: We are excited to have He Wang today. He's a PhD student at
UIUC who works with Romit Roy Choudhury, who is quite familiar to many in MSR. He
himself is no stranger to MSR. He's interned here twice. Today, he'll be talking about his work
on multi-modal sensing, gesture recognition and indoor localization.
>> He Wang: Thank you. It's my pleasure to be here to talk about my research, which is about
using mobile devices to sense a user's location. Let me start by saying a few words about the
background of this research. In a matter of 10 years, from 2005 to 2015, the mobile phone has
transformed from a basic communication device into a smart device that performs sensing,
computing and communication. Today's smartphones have around 14 sensors on them, more than
five communication interfaces and more CPU power than the Apollo guidance computer
that landed man on the Moon. Now, given that these devices are always on and always with us, we
can view them as general-purpose human sensors capable of zooming into our lives and
understanding our daily activities, preferences and behavior patterns. Silicon Valley has
been calling this the quantified self, essentially suggesting that the data from these devices can be
used to draw inferences about ourselves and ultimately enable a wide variety of applications.
So let me just talk about a few example applications that are already on the market.
Using the accelerometer data on the smartphone, it is possible to count the number of steps that a
user has walked, and many calorie-tracking applications have emerged. Newer smartphones
have a skin-conduction sensor that can measure heart rate and enable many mobile health
applications. And of course, GPS gives us driving directions, right? While these were really
cool and important a few years back, they now seem obvious, and the bar users set for
the next generation of mobile applications is much higher. If we want to deliver better
applications in the future, research is necessary toward robust, efficient and practical
inferencing techniques on humans. Here are just a few examples that are still quite hard
today: finding a user's indoor location, estimating a person's posture, tracking precise
hand gestures, and various forms of context awareness. My research focuses on developing
inferencing techniques using multi-modal sensing data from mobile devices, such as
smartphones and smartwatches, and I believe these inferences can enable new kinds of human-centric mobile applications. For example, consider physical business analytics: if we understand a
user's location, postures and gestures, then we can understand that user's shopping behavior in a
grocery store. Interestingly, this is already happening extensively on the web. Our clicking
patterns and our mouse movements are driving a billion-dollar business called web analytics. I
believe our footsteps in an indoor environment are like click streams online, and
when we look at a cereal box in a grocery store, it is like clicking on an online item. So
physical analytics and web analytics are really similar, but physical analytics is still not
available today, simply because we don't have the ability to understand the user's location, postures
and gestures. Suppose we have location, postures and gestures as building blocks. Then I
believe we can enable this physical business analytics, and many other applications open up -- for example, understanding the location and orientation of the phone can enable augmented
reality in indoor museums and -- sure.
>>: [Indiscernible] do you also consider using the infrastructure and cameras and things, which
might be a more direct indicator of where you are?
>> He Wang: That's a possible solution. If you run the analytics using a camera, that's
another possible solution, but there will be other challenges, such as [indiscernible] or
something else. So there are different tradeoffs. Yes. Also, gesture-based control can be done
with just a smartwatch. And finally, virtual gaming and virtual reality are around the corner and
emerging, and understanding the postures and gestures of a user's hand can be a key
building block for such immersive gaming experiences. If you look at these building
blocks across the different applications, the common denominator is really location: the location of
the human body for indoor localization, the location of the arm for posture estimation, and the
location of the hand and fingers for precise hand gestures. So my talk is really about how we can
leverage smart device sensors to do this macro-to-micro localization on the human body. So let
me start by talking about indoor localization. The question in indoor localization is
essentially: where am I in the indoor environment? There has been a huge amount of work
on this topic, so let me just quickly lay out the key ideas. The first question is, why don't
we just use outdoor localization technology, for example GPS? Well, the problem with GPS is
that GPS signals do not penetrate well into buildings, and even the signals that do get in bounce
around too much in the indoor environment, leaving the GPS receiver with poor accuracy, if it gets a
fix at all. Wi-Fi turns out to be a reasonable alternative, especially given its wide availability, and it's probably
the most popular approach thus far. The assumption is that we have a couple of Wi-Fi access points in the
environment, and when a phone runs Wi-Fi scans, it gets a list of Wi-Fi signals. As
the user walks around, the received signal strengths change, so from the signals we can infer
the user's location. But to make this happen, somebody has to go everywhere in the building and
construct the mapping between signal strength and location. That takes a lot of effort, and
even worse, this mapping changes over time, so to ensure high accuracy, we need to
periodically recalibrate the signals and rebuild this map.
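Just to make the fingerprinting idea concrete, here is a minimal sketch of the classic nearest-neighbor approach in Python; the access points, survey values and distance metric are made-up illustrations of the general technique, not the system or data discussed in this talk.

    # A minimal Wi-Fi fingerprinting sketch: nearest neighbors in signal-strength
    # space. The survey table and access-point names are made up for illustration;
    # a real deployment would collect thousands of such fingerprints.
    import math

    # Offline survey: location -> {access point: RSSI in dBm}
    SURVEY = {
        (0.0, 0.0):  {"ap1": -40, "ap2": -70, "ap3": -80},
        (5.0, 0.0):  {"ap1": -55, "ap2": -60, "ap3": -75},
        (5.0, 5.0):  {"ap1": -70, "ap2": -50, "ap3": -65},
        (10.0, 5.0): {"ap1": -80, "ap2": -45, "ap3": -55},
    }

    def rssi_distance(scan_a, scan_b, missing=-100):
        """Euclidean distance between two scans in RSSI space."""
        aps = set(scan_a) | set(scan_b)
        return math.sqrt(sum((scan_a.get(ap, missing) - scan_b.get(ap, missing)) ** 2
                             for ap in aps))

    def localize(scan, k=2):
        """Centroid of the k surveyed locations whose fingerprints best match."""
        nearest = sorted(SURVEY, key=lambda loc: rssi_distance(scan, SURVEY[loc]))[:k]
        return (sum(x for x, _ in nearest) / k, sum(y for _, y in nearest) / k)

    print(localize({"ap1": -58, "ap2": -57, "ap3": -72}))  # lands near (5.0, 2.5)

The hand-collected SURVEY table is exactly the calibration effort -- and the re-surveying as signals drift -- that makes this approach hard to scale.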
And Wi-Fi is not the only approach that people have considered. We have also looked at deploying beacons, for example Bluetooth beacons
and sound beacons. In the sound-beacon case, people deploy a couple of sound beacons in an
environment, and then the smartphone microphone can use the timing to calculate the distance
between the phone and each beacon; from that, they can triangulate and figure out the user's
location. Bluetooth beacons are another approach with a smaller range: when the user passes by,
the phone can hear the whisper of the Bluetooth beacon and figure out the phone's location. But
the problem with beacon-based approaches is that the deployment requires a lot of effort and cost,
so many companies like Google, Cisco, Intel and Samsung have been insisting on a system that is
software-based and scalable. So under this context, we started thinking, can we build something
that does not rely on the infrastructure? That seemed possible, because around the same time
smartphones had gained motion sensors like accelerometers, compasses and gyroscopes. So we
started thinking, can we use these motion sensors themselves to figure out the user's location, as
opposed to relying on Wi-Fi or acoustic signals from the environment? That seemed possible,
because in the accelerometer data, when the user walks, the phone bounces, right? With simple
filtering techniques, that should tell us the distance a user has walked, and the compass gives us
the direction in which the user is walking. If we combine the direction and the distance, we can
estimate the actual path that the user has walked.
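As a rough sketch of that combination -- counting steps from accelerometer peaks and advancing the position along the compass heading -- something like the following could work; the stride length and peak threshold are illustrative assumptions, not values from this work.

    # A minimal dead-reckoning sketch: detect steps as rising edges in the
    # accelerometer magnitude and advance the position along the compass heading.
    # Stride length and the peak threshold are illustrative assumptions.
    import math

    STRIDE_M = 0.7          # assumed average step length in meters
    PEAK_THRESHOLD = 11.0   # accel magnitude (m/s^2) above which we call it a step

    def dead_reckon(accel_mag, heading_rad, start=(0.0, 0.0)):
        """accel_mag[i] and heading_rad[i] are sampled at the same instants."""
        x, y = start
        path = [(x, y)]
        above = False
        for a, h in zip(accel_mag, heading_rad):
            if a > PEAK_THRESHOLD and not above:   # rising edge = one step
                x += STRIDE_M * math.cos(h)
                y += STRIDE_M * math.sin(h)
                path.append((x, y))
            above = a > PEAK_THRESHOLD
        return path

    # Two simulated steps heading east, then one heading north.
    accel = [9.8, 12.0, 9.8, 12.5, 9.8, 12.1, 9.8]
    head = [0.0, 0.0, 0.0, 0.0, math.pi / 2, math.pi / 2, math.pi / 2]
    print(dead_reckon(accel, head))  # roughly [(0, 0), (0.7, 0), (1.4, 0), (1.4, 0.7)]

In a loop like this, every error in the heading and in the assumed stride length accumulates into the position estimate, which is exactly the failure described next.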
So we tried this. We collected some data, and it turned out we failed miserably. We came back to
this later and found that this method, called dead reckoning, is essentially fundamentally difficult,
because our environment has metal all around. The metal causes the compass data to fluctuate,
and when the user plays with the phone, the phone also bounces. All of these errors together make
the dead-reckoning track deviate from the actual path over time.
But actually, in the past, when people did not have GPS, dead reckoning was routinely used by
ships and planes to figure out their location. For example, in 1927, Charles Lindbergh
landed in Paris after a nonstop flight from New York. At that time, there was no GPS, and
Charles Lindbergh just used dead reckoning. How could he do that on such a long trip with dead
reckoning alone? The trick was that he obtained fixes from the stars, and he used
that to guide himself. Once we knew this, we started thinking: hey, can we magically bring
some star-like landmarks into the indoor environment? The idea is that when a user walks, the
moment he hits a landmark, we can reset the user's location, and the user can dead reckon from
there. If we keep doing the same, then over time the new estimated path, this
green line, stays much closer to the actual path. That would be awesome, but the problem is that in the
indoor environment, we don't have stars to serve as landmarks. So we started looking at our
environment and thinking, can we find some landmarks? At first, we could not find any,
but then we collected some data, went back to the lab, and suddenly realized that if we look at
the environment through the eyes of the sensors, we can find many unique patterns in it.
For example, we found this unique magnetic fluctuation, and that could
potentially be used as a landmark. Out of curiosity, we also checked where this happens; it
turned out to be near a water fountain. We also observed other patterns, such as this one in the
accelerometer data, where we see an overweight phase followed by a weightless phase, and by the
way, our smartphones today have a barometer that can monitor pressure changes, so
we also observed pressure changes along with this accelerometer pattern. It turned out
these patterns are caused by the elevator. And at this particular spot, when a user passes by, the
cellular signal drops significantly and then comes back, so it can also be used as a
landmark, and there are many other examples, such as Wi-Fi fluctuation, turning around a
corner, taking stairs and so on and so forth. So we can use them as landmarks to help us
calibrate the user's location. But of course we cannot predefine all of these patterns, because we
want our system to work all over the world, in all kinds of buildings, and we don't know whether a
given building has a water fountain or elevators. So the idea is to automatically learn these
patterns from the environment. Yes. So using this cool idea, we developed a system called
UnLoc, unsupervised indoor localization, and we need to solve three main design questions.
One, how to automatically detect landmarks, and second, how to localize the landmarks, and
third, how to localize the users. Our solutions to these problems are interdependent and
recursive, but let me try to explain them one by one. So let me start by talking about how to
automatically detect landmarks. Consider a user walking in this building, and let's say
that for the next five minutes, we have a reasonable estimate of the user's location -- say,
from dead reckoning -- but it's a rough estimate. We will soon relax this assumption and
explain how we can bootstrap our system even when dead reckoning is bad. So let's say we have a basic
estimate, and at the same time we can collect the sensor data from the user. Therefore,
we will have location-to-sensor tuples. As more and more people walk in this environment, we
get more and more of such tuples. We can then extract features from the sensor data
to get location-to-feature tuples. Because we are interested in unique sensor
patterns in the feature domain, we run an unsupervised clustering algorithm on them. This is
just an illustration; in the real system, we have multiple features, so it is a higher-dimensional space.
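One way such landmark mining could look in Python is sketched below, assuming scikit-learn's DBSCAN for the unsupervised clustering; the feature scaling, eps, min_samples and the geographic-spread threshold are my own illustrative assumptions rather than UnLoc's parameters.

    # A minimal sketch of the landmark-mining step: cluster (feature, location)
    # tuples in feature space, then keep only clusters that are also tightly
    # confined in physical space. Thresholds are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import DBSCAN

    def mine_landmarks(features, locations, geo_radius_m=4.0):
        """features: (n, d) sensor-feature vectors; locations: (n, 2) crude (x, y)
        estimates from dead reckoning. Returns candidate landmark positions."""
        labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(features)
        landmarks = []
        for label in set(labels) - {-1}:          # -1 is DBSCAN's noise label
            pts = locations[labels == label]
            center = pts.mean(axis=0)
            spread = np.linalg.norm(pts - center, axis=1).mean()
            # A distinctive sensor pattern that also recurs in one small physical
            # area (the magnetic spike near the water fountain) is a landmark;
            # a pattern seen all over the floor (plain walking) is not.
            if spread < geo_radius_m:
                landmarks.append(center)
        return landmarks

The Wi-Fi disambiguation mentioned next -- splitting a sensor cluster that shows up in several distinct areas by its ambient Wi-Fi signature -- is omitted from this sketch.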
The idea is that the points in the same cluster share the same feature pattern, but that's not
enough, because we want to know where this pattern actually happens in the real
world. Well, we can do that because we already have a crude location estimate, so we can
map the pattern back to the floor plan. Let's see one example, say this first cluster of points. It
turns out these points also fall in a small cluster in geographic space, which gives us
a sense that it can be used as a landmark. But not everything is as good as this. For example, if
you look at these blue patterns, they happen in multiple locations. The way to deal with that is
to leverage the Wi-Fi signal, but this is different from Wi-Fi-based indoor localization:
you don't need to manually label the location-to-Wi-Fi mapping. As long as you know that the
Wi-Fi signals of these two clusters are different, that's enough to distinguish them, so even when we use
Wi-Fi, it still comes for free. And some patterns, like this green one, just happen everywhere.
It turns out this pattern arises because walking itself creates a distinctive sensor pattern, but
since it happens everywhere, it's not helpful at all. So just because something is a cluster in the
sensor domain does not necessarily mean it's a landmark. And because we have this feature -- sure.
>>: So does this work best in narrowly constrained hallway type of environments? What about
like a mall, where the hall is much, much larger than some of the examples that you've shown?
>> He Wang: Yes, so the system will work better in narrow corridors than in open spaces, but the
system can still work in those areas, with a little worse accuracy.
>>: [Indiscernible] more people in the mall, for example.
>> He Wang: Yes, it's not really impacted by that, because we -- so when more people walk around,
the Wi-Fi signal can fluctuate a little bit, but the magnetic signal is still quite stable, and so is the
inertial pattern of the user walking past.
>>: So in which departments did you test on?
>> He Wang: We tested our system in an engineering building and in office buildings at different
places, and we also tested it in a shopping mall. Yes. So if you play with these
features, we can generate different kinds of landmarks. In our system, we use inertial
landmarks, meaning we use the gyroscope and accelerometer data, and we also have magnetic
landmarks and Wi-Fi landmarks. Because our environment has so much sensing data, these
landmarks are likely to be reasonably evenly distributed -- and we can always use the Wi-Fi signal
to divide the area into subzones -- so our hypothesis is that our indoor environment has
enough landmarks for localization. That just means we have found the stars in our
indoor environment, and we can use them to periodically calibrate the user's location and
provide high accuracy. Sure.
>>: Part of the assumption is that at the landmarks, the set of sensor signals that you see
doesn't change over time -- that it's invariant?
>> He Wang: So the signal doesn't change on a short time scale, but over a longer time, it
does change. So what we do is, after a period, we regenerate the landmarks, such that they are
tuned to the latest state of the environment. That is possible because the data is collected
automatically, so there is no manual effort to go there and label things again and
again. So potentially, what we can do is regenerate at the end of each day, and that should
hold for at least one day.
>>: So what's an example of something that changes over the long term versus the short term?
Do you have an intuition for why that is?
>> He Wang: For example, the Wi-Fi signal, it should work for at least one or two days, and
magnetic signal, that works even longer. I think it works for more than one or two months.
>>: That would be like if someone was installing a new drinking fountain or something like that,
then the magnetic signatures would change.
>> He Wang: Yes, in those cases. If the building structure changes, then that could change, but
typically, it's quite stable.
>>: [Indiscernible] coordinate system do?
>> He Wang: Yes, so we need to manually label two places to anchor the output to the building's
coordinate system.
>>: [Indiscernible] walk around the place manually? So how do you bootstrap --
>> He Wang: Right. Yes, I will talk about how to bootstrap in a minute, but we need to
manually label where the building entrance is and where one of the stairs or elevators is. But you
don't need to go there. Let's say I can stand here and label which one is the building entrance of
Building 42.
>>: [Indiscernible].
>> He Wang: That is the assumption, but the thing is that we don't need a detailed map labeling
where the walls are. You just need to tell me two pieces of information: where is
the entrance, and where is one of the elevators? That's our requirement.
>>: Could you -- maybe you'll get to it later, I don't know. Are there any applications to a home
environment?
>> He Wang: Yes, I think there will be applications for the home environment. For example, if
you leave the room and want to automatically turn off the lights, or you want to keep an eye on
elderly people who are home alone and how things are going, I think there will be many
applications.
>>: I was thinking you would get fewer Wi-Fi signals, most likely.
>> He Wang: Yes, I think for the home environment, you have fewer Wi-Fi, but we still can
rely on the sensors to track your steps. Yes, sure.
>>: Just curious, so what if many people are using a Wi-Fi hotspot, and they are moving, so will
it impact this accuracy?
>> He Wang: What if multiple people are using a hotspot and they are moving -- under this
scenario, what is the accuracy? In our system, because we can rely on step counting and
tracking, and we also rely on magnetic and inertial patterns, we don't rely heavily on Wi-Fi.
That means we only use the most stable Wi-Fi signals; we can afford to be highly selective about
which Wi-Fi signals we use and to use only coarse features, because we rely on the other sensors,
and that improves accuracy. Yes. So that sounds good, but wait, right? So far we have assumed that we have
reasonably good dead reckoning, but what if the dead reckoning is not that good, which may
well always be the case in the real world? What happens then? Our idea is this: let's say
the user walks past three sensor patterns, shown here in white, blue and green, and different users
walk the same path. Just because of tracking error, their estimated paths are going to diverge from
each other over time, but the initial portion shouldn't be too bad -- dead reckoning is reasonably
good near the starting point. So if we can use those traces to figure out that this first
pattern actually happens in a small area, it should be recognized as a landmark, and we can then
use the [indiscernible] of the landmark as its location estimate. This is just an initial estimate; it is
not accurate or perfect at all, but it's reasonable. And the errors from different devices --
different hardware errors, different phones, different users' step sizes -- are uncorrelated.
If you plot real-world data, and this is one example, this square
is the true landmark location, and each of the blue points shows an estimate
from one of the dead-reckoning paths. As you can see, they are really uncorrelated,
because of the hardware noise and the human step sizes and so on, so they can essentially
be averaged into an initial estimate. But the problem here is that the blue and
green patterns cannot yet be recognized as landmarks, because their estimates are scattered over
quite a large area -- you are not sure what is happening there, right? But since we know the first
one is a landmark, we can reset the user's location at the first one and dead reckon from
there, and now the second pattern, the blue one, collapses into a smaller area, so we can recognize
it as a landmark too. We can keep doing the same and finally recognize that the last one is
also a landmark. In other words, we can gradually grow the set of landmarks outward from the
origin and finally fill the whole building in a bootstrapping phase. But these estimates are
not perfect; they have errors. The good news is that we always have users walking in our
building, so we can take these new user traces, feed them to our system and
use them to improve the estimates of the landmark locations. Here's how we do this. Let's say a user
walks in the building; we can check whether there's a new landmark, or whether the user hit an
existing landmark, and then update the landmark list. As more
users walk in the building, the landmark location estimates become better, which in turn
improves the user location error, because users rely on the landmarks to reset themselves,
right? Better landmarks give better user estimates. As more and more users
walk and feed new data into the system, the landmark location error and the user location
error both keep decreasing.
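To make the recursion concrete, here is a rough sketch of that update loop; feature_similar, the matching radius and the running-average update are my own illustrative assumptions about the shape of the algorithm, not UnLoc's actual implementation.

    # A rough sketch of the bootstrapping loop: every new trace is corrected by
    # the landmarks found so far, and the corrected observations refine the
    # landmark positions in turn.
    import math

    landmarks = []   # each entry: {"feature": ..., "position": (x, y), "count": n}

    def feature_similar(a, b, tol=1.0):
        """Placeholder similarity test; a real system compares feature vectors."""
        return abs(a - b) < tol

    def match(feature, position, radius_m=5.0):
        """A known landmark whose signature and rough location both match, if any."""
        for lm in landmarks:
            if (math.dist(lm["position"], position) < radius_m
                    and feature_similar(feature, lm["feature"])):
                return lm
        return None

    def process_trace(sensor_events):
        """sensor_events: list of (feature, dead_reckoned_position) along one walk."""
        corrected, offset = [], (0.0, 0.0)
        for feature, pos in sensor_events:
            pos = (pos[0] + offset[0], pos[1] + offset[1])
            lm = match(feature, pos)
            if lm is not None:
                # Refine the landmark with this new, largely uncorrelated estimate.
                n = lm["count"]
                lm["position"] = ((lm["position"][0] * n + pos[0]) / (n + 1),
                                  (lm["position"][1] * n + pos[1]) / (n + 1))
                lm["count"] = n + 1
                # Reset the user onto the landmark (the indoor "fix from the stars")
                # and shift the rest of this trace accordingly.
                offset = (offset[0] + lm["position"][0] - pos[0],
                          offset[1] + lm["position"][1] - pos[1])
                pos = lm["position"]
            corrected.append((feature, pos))
        return corrected   # corrected traces are later mined for brand-new landmarks

    landmarks.append({"feature": 5.0, "position": (10.0, 0.0), "count": 3})
    print(process_trace([(0.0, (2.0, 0.0)), (5.2, (9.0, 0.5)), (0.0, (12.0, 0.5))]))

Each pass both snaps the user's track onto known landmarks and tightens the landmark estimates, which is the mutual improvement just described.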
And we have demonstrated this. We tested our system in more than six buildings, including
shopping malls, university buildings and office buildings, using five different Android models
and more than 20 users. Currently, it's running live in our lab, CSL, and even though we use
landmarks generated half a year ago, it still works robustly. To get a quantitative evaluation of
the system, we collected ground truth from the shopping mall and the ECE and CS buildings, and
this graph shows the result: the X-axis shows the error in meters, the Y-axis shows the CDF, and
the different lines show the system performance over time. As you can see, the performance
improves, and after around two hours of walking, we can achieve a median accuracy of 1.63
meters. So, in summary: nature had the diversity that guided Charles Lindbergh to find his way
to Paris, and our man-made indoor environment also has diversity that can be used to find
landmarks, which helps improve the accuracy of our indoor localization system and allows us to
achieve a median accuracy of 1.63 meters with no infrastructure cost and no manual calibration. Sure.
>>: This goes back to the question asked earlier. Your summary numbers are kind of averaged
over all the places that you tested, right? So in a place like this, what would you expect? In a
place like that mall?
>> He Wang: In a place like this, I think the accuracy will be worse than in narrow corridors.
We don't have an exact number for that right now. Yes. So after publishing the paper, we
continued working on this for another year, optimized different things, and deployed our system
in different buildings, and here is one example. The user is using our system, and what you can
see on the phone screen is shown on the left part of the video; when the user passes by Room
246, our UnLoc system can precisely capture this. We also have a back-end server that
automatically collects data from different users in the building, generates the landmarks,
localizes the traces and does all the localization management. And we have demonstrated our
work to different companies at different places, and TKE is an elevator company who is
interested in our system, because they want to use our system for their elevator scheduling. They
have elevators in Wall Street buildings -- again, super-tall buildings -- and people are tired of
waiting there for the elevator for a long time. They want to use indoor localization to schedule their
elevator, and during these demos, we assumed that the phone is in the user's hand and the user is
looking at the screen and walking like this. But in our demo to Intel, they tried our system at
different places. They tried to put the phone in the pant pocket, in the shirt pocket, and all kinds
of places and orientations, and the system just failed in those cases. This reminded us that
indoor localization is not just about navigation: you cannot always assume the phone is in the
user's hand, with the user looking at it, while you are tracking them. So the question is how we can still
estimate a user's walking direction even if the phone is in the user's pocket or in other
places and orientations, right? We solved this challenge in another paper; because of time
constraints, I cannot go into the details, but I will just show a very brief
demo of that. In this video, the green line shows the estimated walking direction of the
user, and while the user is walking, the user tries to hack the system by changing the
orientation of the phone. We can still estimate the pose of the phone and then quickly
recover the user's walking direction. We then integrated that system and UnLoc together,
and then Samsung purchased a research license of our work, and they are interested in pushing
this to their Android platforms. So with that, I will move to the next part, but before that, any
questions?
>>: [Indiscernible] you said your accuracy was 1.63 meters.
>> He Wang: Yes, in the median case.
>>: Yes, so again, I'm not totally familiar with this space, but I do see recent work on indoor
localization using Wi-Fi that talks about decimeter level -- decimeter, one-tenth of a meter. How
does your thing compare?
>> He Wang: Yes, so I am aware that many systems can achieve better accuracy, but typically,
they will require deploying infrastructure, deploying hardware or you need to [indiscernible] the
detailed Wi-Fi information, such that it can provide that accuracy.
>>: The hardware -- it has been known for ages that you can take one approach, which is to
deploy infrastructure and hardware, and that can get you a significant level of accuracy. In fact,
we [indiscernible] deployed the Cricket system a long time back, which was ultrasound-based
and it [indiscernible]. And then the other one with [Kyle] and [Suchi] and all these guys --
>>: Yes, this was [Suchi] and --
>>: Yes, they're deploying -- [Bridget's] deploying an extra piece of hardware at the access
points.
>> He Wang: Yes, so in these cases, it may not scale to all buildings easily.
>>: I actually missed -- I'm so sorry. I thought it was 10:30. I came in at exactly 10:30.
Anyway, what is the weakness of your system?
>> He Wang: Yes, so the weakness, I think, is that our system still relies in some sense on dead
reckoning -- on step counting and direction estimation. Even though we kind of
solved the problem of the phone changing orientation or being put in different places, user
behavior may not match our expectations. A user can just wave the phone around like this, like
I'm doing while talking, and then the system thinks they are walking. In some of those cases we
can still fall back on Wi-Fi as a lower bound, but the system needs to handle those cases
carefully. User behavior may not always be as expected.
>>: Did you try -- well, what were the challenges of doing it passively? So when you said that it
might be in your [indiscernible], but if the user is just -- they're not really trying to locate right
now, they've been walking around in the mall for 10 minutes, they can't find their store, and now
they want to start localizing. And so you'd have to keep all these sensors on, for example, when
it passes them, so people might not like the battery drain.
>> He Wang: So in those cases, we do have to keep the sensors on. The tracking itself is not really a
problem, because the phone is in a relatively stable place, even if it's in the pocket, right? But what
about energy? We don't have an energy number, but continuous sensing is becoming quite popular --
I know that many of you are working on continuous sensing projects -- and newer
smartphones have additional hardware, a lightweight co-processor, that can handle
easy tasks such as sampling and simple calculations. Using that will reduce the
energy cost. Yes. So let me move to this pose-estimation problem. So
here, what we want to do is understand the pose of the user -- this moves us to a smaller spatial
scale. Can we track the arm pose of the user using just a smartwatch? By
posture, I mean the 3D locations of the wrist and elbow, and there are a couple of challenges. First of
all, there is noise in the sensors; the accelerometer and gyroscope are not perfect, and over
time these errors accumulate, so you cannot just use double integration. Second, the
sensor data comes only from the user's wrist, since the smartwatch is on the wrist, so how can you
infer the elbow location? It seems we don't have enough data. And the third challenge is that we
don't have training data. We don't want to train the user on a specific set of gestures so that the
system only works for those gestures; we want to handle freeform gestures. Those are the
challenges we are facing. But this arm posture tracking problem is not new at all, and many
researchers have looked at it -- for example, people in the robotics, biology and signal-processing
domains have been looking at this problem. The work closest to ours is one that also tries to use
the smartwatch to figure out the user's pose. What they do is leverage a couple of
opportunities. First of all, our elbow always lies on a sphere centered at the shoulder,
and the wrist lies on another sphere centered at the elbow, so based on these two constraints,
we can actually narrow down the search space quite a lot. People also borrow information from
the biomedical domain: our arm has five degrees of freedom, and each of them
has a range constraint. For example, at this [indiscernible] here, the arm can only move
from 0 to 150 degrees; it cannot move in the negative direction. If we combine these constraints,
we can narrow down the search space further, but that's still not enough. That's why they also
train on a preset of 15 gestures, and then they can do a pretty good job there. But what we want
is to use only the smartwatch and also handle freeform gestures. So there are a
couple of opportunities we are going to leverage. First of all, once we know the orientation of
the watch, that is very valuable to us: it helps us infer the user's wrist and elbow locations.
Second, the acceleration from the watch carries information about the user's movement, and if
we combine that with a hidden Markov model, we can probably get a better estimate. And of
course, we can also leverage data structures to improve the speed of our tracking system. Let me
first explain what I mean by watch orientation: I mean the pointing directions of the three axes
of the smartwatch. For a given orientation, we basically iterate over all possible joint angles, and
from that we can find the subset of angle combinations that satisfies this particular orientation.
With this subset theta, we can map theta to the possible locations of the user's wrist and elbow. Sure.
>>: You said five degrees of freedom -- three here, two here. What is this? Isn't this another
one?
>> He Wang: Yes, yes, so there are --
>>: Wrist rotation?
>> He Wang: So here I think we have --
>>: Move the slides back.
>> He Wang: So here we have three -- one, two, three -- and there is one more. There's one here,
actually. So that's one that comes up to here; when you do this, it comes to the upper arm, so we
call it there. Yes. So basically, from these five angles, we can try all the combinations
and see which ones satisfy this orientation constraint, and from that we can map back to the
points -- these are the possible wrist and elbow locations that satisfy this orientation constraint.
Our key observation is that, given the three-axis orientation of the watch, the possible wrist
and elbow locations are quite limited. For example, in this case, when the orientation is like
this, the possible wrist and elbow locations will be like this, right? As shown by these green and
red dots, the set is quite limited.
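A rough sketch of that search, with simple forward kinematics over the five joint angles, is shown below; the segment lengths, axis conventions, grid resolution and matching tolerance are all my own illustrative assumptions rather than the actual kinematic model used in this work.

    # Enumerate the five arm angles on a coarse grid, compute the watch
    # orientation each combination would produce, and keep the elbow/wrist
    # positions consistent with the measured orientation. All constants are
    # illustrative assumptions; the shoulder sits at the origin.
    import numpy as np

    UPPER_ARM, FOREARM = 0.30, 0.25   # segment lengths in meters (assumed)

    def rot(axis, angle):
        """Rotation matrix about a principal axis ('x', 'y' or 'z')."""
        c, s = np.cos(angle), np.sin(angle)
        if axis == "x":
            return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
        if axis == "y":
            return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

    def arm_pose(a1, a2, a3, a4, a5):
        """Elbow/wrist positions and watch axes for one set of joint angles."""
        r_shoulder = rot("z", a1) @ rot("y", a2) @ rot("x", a3)   # 3 shoulder angles
        elbow = r_shoulder @ np.array([UPPER_ARM, 0.0, 0.0])
        r_elbow = r_shoulder @ rot("z", a4) @ rot("x", a5)        # flexion + twist
        wrist = elbow + r_elbow @ np.array([FOREARM, 0.0, 0.0])
        watch_x = r_elbow @ np.array([1.0, 0.0, 0.0])             # along the forearm
        watch_z = r_elbow @ np.array([0.0, 0.0, 1.0])             # out of the watch face
        return elbow, wrist, watch_x, watch_z

    def feasible_points(meas_x, meas_z, steps=9, tol=0.3):
        """All elbow/wrist positions whose predicted orientation matches the watch."""
        grid = np.linspace(-np.pi, np.pi, steps)
        flexion = np.linspace(0.0, np.radians(150), steps)   # the 0-150 degree limit
        candidates = []
        for a1 in grid:
            for a2 in grid:
                for a3 in grid:
                    for a4 in flexion:
                        for a5 in grid:
                            elbow, wrist, wx, wz = arm_pose(a1, a2, a3, a4, a5)
                            if (np.linalg.norm(wx - meas_x) < tol
                                    and np.linalg.norm(wz - meas_z) < tol):
                                candidates.append((elbow, wrist))
        return candidates

    # Example: watch pointing forward along x with its face up along z.
    pts = feasible_points(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
    print(len(pts), "elbow/wrist candidates consistent with this orientation")

The candidate list is the green-and-red point cloud on the slide; how confined it is, relative to the whole sphere of possible positions, is what gets quantified next.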
So let's try to quantify how good this opportunity is. We can easily do this, because we can
always try different combinations and see how it goes, right? The metric we use is the area of the
feasible point cloud divided by the whole sphere area -- what fraction of the sphere the region
covers. If the ratio is low, that's good news for us, and we plot the CDF: the X-axis shows the
ratio and the Y-axis shows the CDF, and we found that in the median case we typically cover
about 9% of the sphere, so it's a really small area. If you just use the centroid, you can probably
do a reasonably good job. But how can we figure out where within this narrowed-down area the
user's arm actually is? So the idea --
>>: [Indiscernible] the graph like that? But your 80th or 90th percentile is, what, 30%, 40%?
It could be anywhere in that sphere.
>> He Wang: Yes, so in many cases, it's not that good, and that's true, and I will just explain
how we can improve this. Yes. So the opportunity is using the accelerometer data, and from
that, we can infer the user's wrist movement. Take one example: when
the user is punching like this, the orientation of the watch doesn't change at all, but your
elbow is going to move backward and forward. The naive method would think you don't
move -- a static point. But since we know the acceleration of the watch, we should be able to
do better than a static estimate. That's the intuition. So how can we leverage the
accelerometer data here? What we want to understand is the real sequence of locations, right --
something like location A, then B, then C. From that sequence we can actually infer the
acceleration: for any candidate trajectory, two adjacent locations give us a velocity, and two
adjacent velocities give us an acceleration. We also have the real-world acceleration measured
by the user's watch. So basically, the question is how to combine the acceleration inferred from
a candidate trajectory with the acceleration the smartwatch gives us; we can bind these
together, consider the noise model, and then figure out the best possible estimate. The
straightforward thing to do is to use a third-order hidden Markov model, where each observation
depends on three consecutive location states, because three consecutive locations are what
determine an acceleration, and from that we can build the arm trajectory. But the problem is that
this is slow -- a third-order hidden Markov model cannot be solved efficiently -- so we need to
reorganize things so that an efficient algorithm applies. If we combine three consecutive
locations into one state, the model becomes a first-order hidden Markov model, and then we can
use the Viterbi decoding algorithm, which is efficient for first-order hidden Markov models. So
the idea is to bundle three adjacent locations into one state; because two adjacent locations give
you the velocity, each state then embeds the acceleration information. If you also have the
acceleration measurement from the watch, we can combine these, take the noise model into
consideration, and from that infer the trajectory. We also enforce continuity, because the
trajectory is going to be smooth, so the overlapping locations of adjacent states must be the
same, and we also enforce the point-cloud limitation: given the orientation constraint at a
particular time, the possible candidates must lie within that region.
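Here is a toy sketch of that bundled-state Viterbi idea, assuming the per-time-step point clouds come from the orientation filter; the time step, the Gaussian noise sigma, the finite-difference details and the toy example at the end are my own illustrative assumptions, and this is the slow triple-state formulation that the discussion below then reduces.

    # A toy bundled-state Viterbi: a state is three consecutive candidate wrist
    # positions, its implied acceleration is scored against the watch's measured
    # acceleration, and transitions are only allowed between states that overlap
    # in two positions (the continuity constraint).
    import itertools

    import numpy as np

    DT = 0.1      # seconds between samples (assumed)
    SIGMA = 1.0   # accelerometer noise std dev in m/s^2 (assumed)

    def implied_accel(p0, p1, p2):
        """Acceleration implied by three consecutive positions (finite differences)."""
        v1 = (np.array(p1) - np.array(p0)) / DT
        v2 = (np.array(p2) - np.array(p1)) / DT
        return (v2 - v1) / DT

    def log_emission(state, measured_accel):
        """Log-likelihood of the measurement under a Gaussian noise model."""
        err = np.linalg.norm(implied_accel(*state) - np.array(measured_accel))
        return -0.5 * (err / SIGMA) ** 2

    def viterbi(point_clouds, accels):
        """point_clouds[t]: feasible wrist positions (tuples) at time t, from the
        orientation filter; accels[t]: measured acceleration. Returns a position path."""
        states = [list(itertools.product(point_clouds[t - 2], point_clouds[t - 1],
                                         point_clouds[t]))
                  for t in range(2, len(point_clouds))]
        score = {s: log_emission(s, accels[1]) for s in states[0]}
        back = [{}]
        for t in range(1, len(states)):
            new_score, back_t = {}, {}
            for s in states[t]:
                # Continuity: the new triple must overlap the previous one in two points.
                prevs = [p for p in score if p[1:] == s[:2]]
                if not prevs:
                    continue
                best = max(prevs, key=lambda p: score[p])
                new_score[s] = score[best] + log_emission(s, accels[t + 1])
                back_t[s] = best
            score, back = new_score, back + [back_t]
        s = max(score, key=score.get)          # best final triple
        path = list(s)
        for back_t in reversed(back[1:]):      # walk the back-pointers to the start
            s = back_t[s]
            path.insert(0, s[0])
        return path

    # Toy example: two candidate positions per time step, zero measured acceleration.
    clouds = [[(0.0, 0.0, 0.0), (0.3, 0.0, 0.0)]] * 5
    print(viterbi(clouds, [(0.0, 0.0, 0.0)] * 5))

With N candidate points per step this formulation has on the order of N cubed states, which is exactly the cost that the questions and the state-reduction discussion below are about.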
>>: [Indiscernible] Viterbi?
>> He Wang: Yeah.
>>: Viterbi is already N-squared in the number of states, and this is a state explosion -- it's like an
[indiscernible] in the number of states over which you're doing the Viterbi. How does that help? So
why does this benefit in terms of speed?
>> He Wang: So I think what you are saying is exactly right, because --
>>: Something like this is usually not done with full Viterbi but with some kind of beam search, so
you don't actually look at all the Viterbi paths.
>>: And why would you use -- you're not using multiple states here, right? You're first order,
essentially --
>> He Wang: Right. So previously, if you consider the three locations separately, that would be a
third-order model, so we merge them together so that we can use Viterbi. If you treat it as three
separate states, you would probably have to search in exponential time; now at least we can do it
in polynomial time.
>>: Are you actually doing full Viterbi over all the states?
>> He Wang: I try to reduce that in the next slides. On this slide, let's say we have N possible
locations, right, and T time steps. Then our state count will be N cubed, and the running time
will be on the order of N to the sixth times T, and N will be a huge number, like 1,000
possible locations on a sphere, so that's not acceptable. So how can we reduce the number
of states? We reduce the number of states by looking at only two adjacent
locations. From the two locations, each state now encodes the velocity -- not
the acceleration -- but we can build the acceleration into the state transition instead, so the
transition from one state to another now captures the acceleration. In other words, we move the
acceleration from the observation to the transition, and at that point we can run the forward pass.
Of course, we still have the continuity and the orientation point-cloud constraints, but we can
reduce the state count to N squared, and the running time to roughly N to the fourth times T.
And we can do even better: because of the continuity constraint -- the trajectory has to be
smooth -- many entries in the Viterbi transition matrix will be zero, and if we reorganize things
so that the nonzero entries fall in contiguous chunks and record their start and end points, we can
further reduce the complexity to something like N cubed times T.
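As a quick sanity check on those orders of magnitude, here is a tiny back-of-the-envelope calculation; the N of 1,000 points and the one minute of data at 10 Hz are illustrative assumptions, not measurements from the system.

    # Rough state-space and work estimates for the reductions just described.
    N, T = 1_000, 600   # candidate points per step; one minute at 10 Hz (assumed)

    for label, states in [("triples (naive)", N ** 3),
                          ("pairs (acceleration moved to transitions)", N ** 2)]:
        # Plain first-order Viterbi over S states and T steps costs about S^2 * T.
        print(f"{label}: {states:.1e} states, ~{states ** 2 * T:.1e} operations")
    # Continuity means each pair can only follow pairs that overlap in one
    # location, cutting the pair version down to roughly N^3 * T transitions.
    print(f"with continuity pruning: ~{N ** 3 * T:.1e} operations")

Even the pruned figure helps explain why the decoding runs offline on a server rather than on the watch.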
>>: [Indiscernible] large-scale instrument tracking, you don't try to run full Viterbi, right? You run
some kind of beam search -- you just keep the most promising K paths or something like that. So
even N cubed seems large, and you're running an N-cubed algorithm with a large N on a phone --
is that what's happening? You're running this on a phone?
>> He Wang: No, not on the phone. We run that on the server.
>>: The model there now, I thought it was running on the watch.
>> He Wang: On the watch, right. So there are two system design points, two tradeoffs. One is
that you can achieve real time by using a simple method: you just calculate the centroid of the
point cloud, and that can be done in real time. The other is that you offload the data to the cloud,
and that gives you an offline result for other purposes -- say, understanding your activity over
the day or precise hand-activity recognition -- but that cannot be done in real time.
>>: The transition probabilities?
>> He Wang: I'm sorry?
>>: How do you determine the transition probabilities to begin with for your --
>>: How do you train the model?
>> He Wang: Right, so the transition probability is actually here, right? It basically depends on
the noise model, and for the noise model, we put the watch on a table, look at the variance of the
readings, and generate the parameters from that. Yes.
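Concretely, that calibration step could look something like the sketch below, assuming the watch lies face-up so gravity appears on one axis; the sampling rate and the simulated noise level are made-up values.

    # Estimate the accelerometer noise model from a static recording: with the
    # watch at rest on a table, any variation in the readings is sensor noise.
    import numpy as np

    def noise_params(static_samples):
        """static_samples: (n, 3) accelerometer readings (m/s^2) from a watch at rest."""
        samples = np.asarray(static_samples)
        bias = samples.mean(axis=0) - np.array([0.0, 0.0, 9.81])  # face-up: gravity on z
        sigma = samples.std(axis=0)                               # per-axis noise std dev
        return bias, sigma

    # Ten simulated seconds of a resting watch sampled at 100 Hz.
    rng = np.random.default_rng(0)
    rest = rng.normal([0.0, 0.0, 9.81], 0.05, size=(1000, 3))
    bias, sigma = noise_params(rest)
    print(bias, sigma)   # bias near zero, sigma near the simulated 0.05 m/s^2

The resulting sigma is what would parameterize the Gaussian scoring in the HMM sketch above.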
And then this figure shows the accuracy: the X-axis shows the error, the Y-axis shows the CDF,
the black line shows the real-time accuracy -- where we just use the centroid, which does not
always work well -- and the red line shows what we can do offline using the hidden Markov
model. I think what you are saying is right: it may not be the fastest method, but at
least in theory it gives you the best estimate. It is not real time yet.
>>: There's been a bunch of work in the graphics community and the motion capture world,
where they've actually done similar work with accelerometers, as well, and the way they solve
the scaling problem there is you know how in Viterbi, you look at all the paths using dynamic
programming and you get the N-squared. Instead of using all the paths at every point, you can
just pick the K most promising paths. It's a standard thing called a beam search, so you focus on
that, so then you can do the HMM much faster, so you might want to try that.
>> He Wang: I think that's a good suggestion.
>>: So this notion of getting point estimates from accelerometers -- the motion capture
community has done that, and they use similar insights there, so you might want to
look it over.
>>: But will the speedup be enough to run it on the watch?
>>: Well, I don't know about the watch, but it's certainly not N-squared at that point. So N-squared is what you're starting with.
>>: It might be [indiscernible].
>>: It's probably implemented at this point, right?
>> He Wang: We implemented this. Currently, for a one-minute trace, we take 10 minutes to
process -- it's like 10x real time.
>>: On the watch or on the server?
>> He Wang: On the server, on the server. But suppose we have more computing power, we can
probably do better, or you can try a different decoding for the hidden Markov model, like beam
search; that could also speed it up. But in the near future, I don't think this is really
possible on a watch -- offline, though, it's fine. And we have -- yes, sure.
>>: How did you measure the ground truth here?
>> He Wang: We used Kinect. We used Kinect.
>>: The main use of Kinect.
>>: Going back to the question about determining transition probabilities, you could put people
in front of a Kinect and also have the watch on and have them move, and then record data that
way. Did you guys do that?
>> He Wang: So if you do that, there are many problems, because the acceleration tracking
error also depends on the orientation estimation, and with all of this mixed together it's hard to
record a precise watch orientation, which also has an impact on how good your acceleration is, right?
>>: At least you can record elbow position, right?
>> He Wang: First of all, your watch is on the wrist, and even if you put it here and try to
estimate this, Kinect is not good enough yet to give you a precise acceleration-level model. It's not
that good yet, right? It can do --
>>: Its frame rate is sufficient for that.
>> He Wang: So you can see the acceleration is roughly similar, but that model is not good
enough, because it's not precise enough. Say you have a two-centimeter error; that has a large
impact on your acceleration estimate, and Kinect cannot give two-centimeter
accuracy today, especially for this kind of tracking.
>>: We found its accuracy to be quite good, centimeter for sure.
>> He Wang: Maybe you are using the latest Kinect.
>>: We're using actually an old one, but that's okay. We can take it offline.
>> He Wang: If that's good, then we can do that, too, right? We can build a better model if
possible, right? And we have this demo: the user is wearing the smartwatch and trying to write
in the air, writing ABCDE. This red line shows the Kinect result, and this dot
shows our system's tracking result. Even though here we have the user write ABCDE, our
system also handles freeform gestures, and there are all kinds of evaluations in the paper.
>>: Do you do this on the watch, or is the one that's done --
>> He Wang: This is a version that runs offline, so this is an offline result.
>>: What would it look like if you ran it on the watch?
>> He Wang: We have that video online, too, and the [indiscernible]. I think I linked that, too.
We have a side-by-side comparison of real time, offline and ground truth. Yeah. So the offline
accuracy here is around 8 to 9 centimeters, and the online real-time tracking is about 5 to 13
centimeters median accuracy. And with that, I want to very quickly talk about the last piece of work, which
is about hand tracking -- hand gestures. I'll spend only one minute on this. The question here is,
can we use the smartwatch to understand what the user is typing on a keyboard? That may
be a security problem, a privacy leakage, because we wear the watch to track steps or count
calories -- what if someone can use that smartwatch data to understand what you're typing? We
have a couple of challenges. First of all, as before, the sensor data on the watch is noisy.
Secondly, your watch location is not necessarily your finger location: if you type ASDF, your
wrist doesn't move at all, right? Third, we don't have the right-hand data -- the watch is only on
one hand. And also, of course, we don't have training data: when we attack somebody, that
person won't provide typing training data for us. So we solve these challenges
using [indiscernible], Kalman filtering, and we also borrow the structure of English words from the
dictionary; we combine these together, and we have detailed results in the paper. In
short, when you type a word longer than six characters, our system can on average shortlist 10
words that include the word you have just typed. For example, if you type 'confident',
out of 5,000 words in our dictionary we can rank 'confident' second.
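As a toy illustration of the dictionary step only: suppose the watch on the left hand let us recover the left-hand keystrokes of a word while the right-hand keystrokes remain unknown; then the dictionary prunes the candidates as sketched below. The keyboard split, the tiny word list and the assumption of exact left-hand recovery are all illustrative simplifications -- the actual system works with noisy estimates and Kalman filtering rather than exact keys.

    # Shortlist dictionary words consistent with an observed left-hand keystroke
    # pattern, where right-hand letters are unknown ('.'). Everything here is a
    # toy illustration of the dictionary constraint, not the real attack.
    LEFT_HAND = set("qwertasdfgzxcvb")

    def left_pattern(word):
        """Keep left-hand letters; replace right-hand letters with '.'."""
        return "".join(ch if ch in LEFT_HAND else "." for ch in word.lower())

    def shortlist(observed_pattern, dictionary):
        """Words whose left-hand letter pattern matches the observation."""
        return [w for w in dictionary
                if len(w) == len(observed_pattern)
                and left_pattern(w) == observed_pattern]

    words = ["confident", "confluent", "reference", "wonderful", "computers"]
    print(shortlist(left_pattern("confident"), words))  # only 'confident' survives here

With a realistic dictionary and noisy key estimates the surviving list is of course longer -- on the order of the roughly 10-word shortlist quoted above.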
So in summary, our devices have this computation and sensing power, and we can use them as a
sensing and computing lens on society. By using motion data precisely, we can build interesting
inferences about humans, and my research focuses on designing these inferencing techniques --
using multi-modal sensor data from mobile devices to do location, posture and gesture
estimation. Our ultimate research goal is building systems that have impact. Beyond the work I
have talked about, I also work on outdoor localization, context awareness, search, mobile
security and augmented reality. Moving forward, I plan to spend several years on these human
inferencing techniques, but I will broaden beyond just the motion sensors to other sensors and
sensing dimensions, and I will do both bottom-up and top-down research. For the bottom-up
part, I will not only do indoor localization and postures; I also want to understand finer-grained
finger gestures, crowdsourcing and other behavioral analyses. And I also look at the privacy
problem, because as that last project showed, even though the sensing is good, it's a
double-edged sword -- it can raise privacy concerns. At the top-down level, I will try to leverage
these underlying system building blocks to build systems and applications, such as augmented
reality, smart home, vehicular analytics and mobile health applications. With that, I would like
to thank you for your patience, and I am happy to take any questions and comments. Thank you.
>>: Do you [indiscernible]. What do you think is the single biggest opportunity here in that
space that you talked about? You talked about many, many things. What do you think is --
>>: Can you show us your slide?
>>: What do you think is the single biggest bet or opportunity that we can make in this space?
Because to some extent, things like accelerometers and these sensors, it seems like everybody's
doing something or the other, and lots of stuff has already happened, so what's the biggest
possibility in your mind?
>> He Wang: I think I would focus on mobile health and physical analytics, which will require
many underlying techniques that are still not ready today. And also, we now have these wearable
devices and HoloLens, right, all these devices together -- how can we leverage them together
to build detailed, fine-grained inferences?
>>: How do you use HoloLens for mobile health, for example?
>> He Wang: For mobile health, I'm not really sure, but if you have some sensors on your head,
I think that could maybe -- I'm not fully sure, of course, and have never played with that, but I
think maybe you can use the sensor data to do detailed activity recognition, and you can
combine those sensors with your smartwatch and mobile phone together to do something
interesting.
>>: And do you think we are -- you grayed out indoor localization. Are we done?
>> He Wang: Not yet, I think. It's still not there, of course, and yeah.
>>: Okay. Are you done?
>> He Wang: For indoor localization? I think there are still interesting ideas, and I'm definitely
open-minded about collaborating on that, but I'm not saying I will necessarily work on this
for the next couple of years. I'm not sure. Yeah. Okay. Thank you so much.