>> Sharad Agarwal: Okay. We'll get started now. It's my pleasure to have Professor Junehwa Song from KAIST. KAIST is in Daejeon in Korea. And Professor Song actually spent a number of years at IBM T.J. Watson before he was at KAIST. He's going to be showing us today some interesting applications that he's built, context-aware applications using wearable sensors and sensors in the environment. Thanks.

>> Junehwa Song: So parts of this work are getting quite old, but I packed it up to present essentially everything I have done, and have been doing, so far. It started around the year 2006 or 2007 and continues on to now. So we have been developing a mobile platform to support context-aware applications. Context-aware applications are good, and I believe they are the future direction of mobile applications. However, the problem is that they are difficult to develop. First of all, it's complex: sensing, feature extraction, recognition, and all those things. The problem here is that, I believe, not every developer is good at those; it should be those who know something about machine learning or pattern recognition, and that's not common yet, right? That's one thing. The second problem is that it's computationally heavy. And the third is that if it is mobile, and also if we use limited sensing devices, then it's resource and energy limited. So we have two problems: complexity in the logic and complexity in the environment. So we wanted to provide developers with a platform that includes a supportive runtime environment as well as easy-to-use APIs. I recently saw a somewhat related presentation from MSR at this year's SenSys; I was also there to present different things.

So the key building block of context-aware applications is context monitoring. This is common to many different applications. It is good to note that it is, again, multi-step continuous processing, and it should be done with multiple sensors and multiple devices to get intelligence. So we use built-in sensors, body sensors, and also embedded sensors to get the most out of what we could extract. So look at the problems, the challenges here.

>>: What is PAN-scale?

>> Junehwa Song: [inaudible] network. That's what I'm saying. It includes built-in phone sensors, body sensors, and personal-area sensors. So we support a dynamic system that connects to the sensors in those three tiers. So here is the phone, and the sensors, and the applications. So this is a reasonably good system; however, it is still small. It will get better. And we have a number of sensors -- we can use a number of sensors -- and we should also support a number of applications here. One big difference is that, again, we should not only support things application by application. If it's a platform, it should support multiple applications at the same time, I believe, right? So that should be the platform. So if we think about multiple applications at the same time, then in fact a scalability problem occurs, because the system is rather small scale. The hardware is rather small. So even if it's not as serious a system as what we see in the Internet, considering the resources here it still gets quite complex. The further problem is that sensing should occur continuously, and all the computations that follow should occur continuously. That complicates the problem further.
So we have a natural scalability problem considering the resources and energy, and also, once we say that multiple applications run at the same time, we should do resource coordination -- resource management -- because the current way of developing applications using this kind of sensor is rather ad hoc. Start from the application layer: we are providing APIs to the applications, so that application developers just use that high-level API and don't care about what's going on underneath. And the system also provides an abstraction over the hardware sensor devices, and in the middle it should do a bunch of things. So there are two parts to this. I planned to talk about each of those bit by bit, but I don't think I will have enough time, so I will talk a little about the first and spend more on the second. I'll probably stop around there.

So SeeMon provides a very high-level -- I will go very fast about this. The users, I mean the developers, are given a query-language type of interface, so that they can use it to specify their requests in a declarative way using high-level representations. So context is, for example, represented in this way: activity is running, temperature is hot, humidity is wet. And this one is from false to true, which means that the program wants to know if this condition becomes satisfied. So it wants to know the change, not the context value itself. That is important because it's rather an event-based system -- I mean, what the application usually wants is not to know the value; it's more to know the changes, so that it can trigger services. So we include that as part of the language, and the duration specifies the time.

Given this high-level representation, we do a translation. This is the data structure in the system for doing the translation: we have a context translation map. All of these provide the mapping, so the result is something like accelerometer Y-axis energy value is larger than 52, and accelerometer DC is less than 500, et cetera. The good thing is that once we have this lower-level, medium-level representation -- we map the high-level representation onto it, including the resource information, so we have sensor information here -- we can do system optimization to some extent. It's just like what a compiler does.

One example is that from that medium-level representation we can extract which sensors we really should use and which ones we don't have to use. Here we did things like short-circuit evaluation. If we are given this kind of query, like "is the weather hot and humid," and let's say we know that it's already hot or humid -- I mean, I'm sorry, the other way: if we know it is cold, then we don't have to know the other one, right? So in that way we can identify which sensors we don't have to use, and we only evaluate the rest. So what we do is that once we are given a false context -- a complex context that is false -- we can identify what makes it false, right? And then we don't have to worry about all the other things until something comes true, right?
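To make that concrete, here is a minimal sketch, in Python, of what such a declarative query, its context translation map, and the short-circuit selection of sensors could look like. The names (query, TRANSLATION_MAP, translate, required_sensors) and most thresholds are illustrative assumptions rather than the real API; the accelerometer cutoffs echo the 52 and 500 values mentioned above.

```python
# Hypothetical sketch, not the actual platform API.

# High-level query: notify when (activity == running AND temperature == hot
# AND humidity == wet) changes from false to true, watched for a duration.
query = {
    "conditions": [("activity", "running"),
                   ("temperature", "hot"),
                   ("humidity", "wet")],
    "transition": ("false", "true"),
    "duration_s": 600,
}

# Context translation map: each high-level term maps to low-level
# sensor-feature predicates (sensor, feature, operator, threshold).
TRANSLATION_MAP = {
    ("activity", "running"): [("acc_wrist", "y_energy", ">", 52),
                              ("acc_wrist", "dc", "<", 500)],
    ("temperature", "hot"):  [("thermometer", "celsius", ">", 30)],
    ("humidity", "wet"):     [("hygrometer", "rel_humidity", ">", 70)],
}

def translate(q):
    """Expand a high-level query into per-condition low-level predicates."""
    return {cond: TRANSLATION_MAP[cond] for cond in q["conditions"]}

def required_sensors(low_level, known_truth):
    """Short-circuit idea: while some conjunct is known to be false, only the
    sensors needed to re-check one false conjunct must stay on; the others
    can be left off until that conjunct might become true."""
    false_conds = [c for c in low_level if known_truth.get(c) is False]
    watch = false_conds[:1] if false_conds else list(low_level)
    return {pred[0] for c in watch for pred in low_level[c]}

low = translate(query)
# If we already know it is not hot, only the thermometer has to keep running.
print(required_sensors(low, {("temperature", "hot"): False}))  # {'thermometer'}
```

The last function is exactly the trick described above: as long as one conjunct of a conjunctive context is known to be false, the sensors behind the other conjuncts can be switched off.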
>>: So for more complex sensors -- how do you do the translation into the --

>> Junehwa Song: No, no. Video, I don't. To be honest, I think video is a quite different, well-studied type of sensor. We could do it to some extent if we really dig into the different parts of the video encoders and decoders, right? But I'd rather use that as a canned system. So I would rather exclude video. But I do some of the sound.

With sound as well, if we do the regular sound recognition, I would not be very interested, because, again, it's a very well understood process. However, if we use part of it -- for example, let's say we want to understand who is speaking, male or female, or how many people are there -- all those things are variations of it, and then we can use part of it, I mean, that kind of idea. The reason I am saying that is that whatever I do, there will be better ways of doing it which have already been developed, if it is the regular processing.

>>: So it sounds a little bit like that case, where you guys go in and inspect that by hand to reverse engineer how it maps.

>> Junehwa Song: Yeah, we did. We did. Yeah. But it's not easy. It takes a lot of time for students. However, it's relatively easier than doing it with video. That's what I'm saying.

So anyway, I will just go briefly here. The high-level idea is that once we know this is false, we don't have to turn on all the sensors; we can extract the set of sensors which we don't have to worry about, right? So that's one idea. Here we could save about half of the sensors, and the reason for that is -- think about the context, the situation. Let's say there are tons of questions from the applications. Then how many of them will be true at any given moment? I think it should be just a few. For example, let's say there are tons of questions about locations, like ten different location-based applications. They can ask if I'm in this room, if I'm in this building, if I'm in this area -- all different kinds of questions. And there could be many of them, but I believe not too many of them will be true, because I'm in one particular situation, right? So many of those will be false. So even if there are a large number of context monitoring requests, the number of those which are true at the moment will be quite limited, right? So this heuristic really worked, and we could save a lot.

And the second idea. The second idea we developed was -- here we are dealing with multiple questions at the same time, right? So we could do optimization considering all of them together, and we developed shared processing as well as incremental processing, exploiting some characteristics of the requests. Again, this helped us improve the performance by a minimum of like three or four times in CPU time. So those were the first ideas we developed, which are quite old now.
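The shared and incremental processing is only described at this high level, so the following is a rough sketch of one way it could work: identical low-level predicates are shared across all registered queries, and a sensor update re-evaluates only the queries that reference the predicate that actually changed. The class and method names (SharedEvaluator, register, on_predicate_update) are hypothetical.

```python
from collections import defaultdict

class SharedEvaluator:
    """Shares identical low-level predicates across many registered queries
    and re-evaluates a query only when one of its predicates changes."""

    def __init__(self):
        self.queries = {}                    # query id -> list of predicate ids
        self.subscribers = defaultdict(set)  # predicate id -> query ids using it
        self.pred_state = {}                 # predicate id -> last boolean value
        self.query_state = {}                # query id -> last boolean value

    def register(self, qid, predicate_ids):
        self.queries[qid] = list(predicate_ids)
        for pid in predicate_ids:
            self.subscribers[pid].add(qid)   # identical predicates are shared

    def on_predicate_update(self, pid, value, notify):
        if self.pred_state.get(pid) == value:
            return                           # incremental: no change, no work
        self.pred_state[pid] = value
        for qid in self.subscribers[pid]:    # only the affected queries
            new = all(self.pred_state.get(p, False) for p in self.queries[qid])
            if new != self.query_state.get(qid, False):
                self.query_state[qid] = new
                notify(qid, new)             # report the false<->true transition
```

A real engine would also handle the duration clause and share partial results, but the fan-out structure is what lets one sensor update touch only the queries that actually use it.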
Now, once we have that, the second question is, again: if we run multiple applications at the same time, then we should do more. Let's look into the environment a little bit further. Here we have a number of applications which share the scarce resources of the system. For example, this MicaZ mote [phonetic], which is quite old now, can run even less than one FFT, even a lightweight one. So it's very limited, and we are considering running multiple applications -- it's problematic, right? And also it's dynamic. Users are moving around, so we should identify sensors at runtime, dynamically, and connect to them. And applications will come and go. So one issue is resource scarcity and energy scarcity, and the other is the dynamics of the system. Applications cannot do this themselves. See, it's not a matter of difficulty: to run at the same time together, sharing the system, applications would have to know what the others are doing. So we should provide system support. But the problem was that in this kind of environment we didn't have that kind of support. So we designed a system to do it -- okay.

So briefly, again -- current systems happen to do this in the application layer. They specify low-level resource status or resource requirements from the application layer, and the system essentially says, do it if you can, right? Here, instead, the application specifies the request at a high level, right? And the system receives it, translates it, investigates and analyzes it, and identifies what kind of resource requirements it has. It also understands what other applications are doing and what the system has now, and it does the resource binding at runtime, dynamically. So the system should provide a holistic view of the applications and resources.

The key idea here is that we are providing high-level representations, APIs, to the users. For example, if somebody is running, there are many different ways to identify that fact, at two levels. One is the sensor level: we can use different sets of sensors to identify whether the user is running. And the second is that we can also use different logics for it, right? So we use those alternative resource usages and logics to provide different plans to the system. And, sitting in between the applications and the resources, the Orchestrator has a holistic view and understanding of the system as well as the applications, and it uses that flexibility to orchestrate the whole system.

Here is an example. Let's say we have applications A, B, and C. Application A is translated and the system prepares two different plans, application B is translated into two different plans, and C into three. Among those seven -- so for each plan we have -- this plan B-1 uses an accelerometer on the wrist, and, in more detail, it also has a particular processing method: in this case it uses a frequency-domain feature extractor and a decision tree, and all of these are done on the mobile side in this plan. And plan B-2 uses an accelerometer attached to the belt, and uses statistical features and a decision tree again; however, in this case the feature processing is done on the sensor, and on the mobile side only the classification is done. So this way we can select different nodes and different processing methods, and also do the computation in different parts of the system.

We also have a resource demand map from analyzing the plans: we prepare, for each plan, its demand for CPU and bandwidth and energy, et cetera, and the system in the background prepares the availability map by monitoring. Basically, the system does continuous system monitoring to abstract what's available now -- memory, bandwidth, and energy in this case. And comparing the different plans against what is available now, we can select which ones are possible. For example, this one is available according to what we have on the resource side, but part of this one is not. We further developed a framework to define and enforce different policies. For example, one policy can be maximizing the number of concurrent applications; in that case we translate the policy and enforce it on top of this constraint matching. Another can be optimizing resource or energy usage, so we can keep the amount of energy to a minimum. We can do all different kinds of things using the policies.
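As a toy illustration of that plan matching, here is a short sketch -- with made-up numbers and hypothetical names (PLANS, feasible, select) -- of checking alternative plans against a monitored availability map and then applying a simple per-application selection policy.

```python
# Each plan records which sensor it uses, where features are computed, and a
# rough resource demand (CPU %, bandwidth kbps, energy mW). Values are made up.
PLANS = {
    "B-1": {"sensor": "acc_wrist", "features_on": "mobile",
            "demand": {"cpu": 12, "bw": 40, "energy": 35}},
    "B-2": {"sensor": "acc_belt", "features_on": "sensor",
            "demand": {"cpu": 4, "bw": 8, "energy": 50}},
}

def feasible(plans, availability):
    """Keep only the plans whose demand fits what monitoring says is available."""
    return {name: p for name, p in plans.items()
            if all(p["demand"][r] <= availability.get(r, 0) for r in p["demand"])}

def select(plans, policy="min_energy"):
    """Pick one plan for this application according to a simple policy."""
    if not plans:
        return None
    if policy == "min_energy":
        return min(plans, key=lambda name: plans[name]["demand"]["energy"])
    return next(iter(plans))  # stand-in for other policies

availability = {"cpu": 10, "bw": 100, "energy": 60}  # from the resource monitor
print(select(feasible(PLANS, availability)))         # B-1 exceeds CPU -> "B-2"
```

A policy such as maximizing the number of concurrent applications would have to be applied across the plan sets of all applications at once, which is the global constraint matching mentioned above.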
Yes?

>>: So I didn't understand this. Plans B-1 and B-2, those are both provided by -- is it the developer?

>> Junehwa Song: Prepared by the system.

>>: From the query? A single query B derives both?

>> Junehwa Song: You're right. If you remember the context translation map, that's prepared by the system, so it should be configured with external expertise, right? So eventually what we should do is have a kind of ontology, which probably exists on the Internet, so that we can collect all the knowledge from different people, and it can be imported into the system and used as an extension of the map.

So this shows the overall architecture of the Orchestrator. We have the APIs and the application broker, and we have the processing-planning part. We have the resource monitor, which monitors the system resources in the background, and we have the policy manager, the plan generator, and the plan selector. This is the part for the processing, feature extraction and recognition. And the sensor broker part is the part that communicates with the sensors. On the sensor side we use TinyOS in this version, and on top of TinyOS we have a resource monitor. So the sensor itself monitors its resources, including CPU and memory and also energy, and communicates through the mobile broker with the mobile-side Orchestrator, so that the Orchestrator can do the orchestration. Also, the plan processor on the sensor side makes it possible for the sensor to participate in the processing if it is requested to do so. For the communication protocols, we developed a suite of protocols, including sensor detection, control, and data reporting protocols. So this provides the infrastructure to build a kind of micro distributed system around a mobile device, dynamically. We tested with around ten sensors or fewer; many of them are on the body, some are in the environment.

I will just go very briefly. To show the performance of the system, this was different from simply showing throughput or response time, because we had to show that it works under different environments -- dynamic, changing environments. So what my students did was simulate a changing environment. They divided the timeline into four different phases. In the first phase they controlled the number of available sensors to two to four, in the second phase to four to six, and then to six to eight. And about every 2.5 minutes they changed the number -- every 2.5 minutes they increased it by one or decreased it by one, with the total number kept within that range. They did a similar thing with the number of requests from the applications: they controlled it in different phases, between five to ten, ten to 15, and 15 to 20. And that gives the workload.
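Purely as a reading aid, here is a hypothetical reconstruction of that phased workload driver; the exact per-phase ranges are an assumption, since this part of the recording is a bit garbled.

```python
import random

# Assumed per-phase ranges for the number of available sensors; the count
# random-walks by one step roughly every 2.5 minutes within the active range.
PHASES = [(2, 4), (4, 6), (6, 8), (6, 8)]
STEP_SECONDS = 150  # ~2.5 minutes between changes

def sensor_count_trace(steps_per_phase=10):
    trace, n = [], PHASES[0][0]
    for lo, hi in PHASES:
        n = min(max(n, lo), hi)                        # clamp into the new range
        for _ in range(steps_per_phase):
            n = min(max(n + random.choice((-1, 1)), lo), hi)
            trace.append(n)                            # one entry per STEP_SECONDS
    return trace
```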
So this shows the results. This is phase A, B, C, and D. In this case the number of queries is fixed to 20, and we looked at the number of activated queries and the number of activated sensors. In phases A and B, the Orchestrator supported many more requests with a somewhat smaller number of sensors. And in phases C and D, again, it could fulfill not all of the requests, but the other case fulfilled even fewer, and still the number of sensors was controlled very tightly. The average energy consumption, because it used fewer sensors, was a little bit lower, but the number of active queries was almost doubled. And a similar thing was observed when we controlled -- when we fixed the number of sensors to six and looked at the same thing, and, what may be confusing here, in this case this color shows -- remember, I was just showing the performance. So, again, it's about half of the sensors: even if we have six, we need to control it to less, and the performance in terms of supported queries was similar.

So that's, at a very high level, what we have. In the background we have an infrastructure for monitoring the system, both the sensor part and the mobile part. We also have the plan generation part, the sensor detection part, and the policies, although those are regular system things. It's actually becoming a larger system now, and we are still extending it toward larger scale.

>>: [inaudible] format?

>> Junehwa Song: Right. That's -- I wanted to show a video, but I was given -- okay, the second part. So we have the mobile device, and it should really support mobile situations, but it does not. For example, let's assume you're running -- let's assume you're coming to work and you get a phone call. If you stop and pull out your phone, at a minimum you have to touch it five times: you push this button and touch it probably five or six times. And think about when you're jogging and listening to music and you want to skip to the next song: you have to first stop and touch it around ten times. You don't want to do that. So the mobile device should really support mobile situations, but it does not, because of the limited interfaces. So now even Microsoft is doing gesture or voice, so we can interact with the mobile device without much attention.

So this work is about developing a mobile gesture interaction platform. Again, it's a platform, so we want to support application developers: we don't want them to spend time developing the detailed, complex logic and caring about the system. By providing this platform, developers can build their own applications -- I mean, interactions -- rather easily. The problems in this case were, again, energy, and also the gesture recognition accuracy. The energy issue is natural: in this case, because we don't want to hold the phone, we put it in the pocket and we use a watch-style sensor node, and that requires more energy. And what's interesting is that a lot of people have worked on gesture recognition; however, in mobile -- they say mobile, but in reality they did it in nomadic situations. All their experiments were: you move, then stop, do the gesture, run the experiment, and then you move again. So we wanted to do a real mobile experiment. So these are the two things we have done.

What we did for the energy problem was use a collaborative architecture. We developed an architecture where the two devices, the sensor node and the mobile node, collaborate to save energy. And the second thing is that the sensor node itself uses two different sensors, a gyro and an accelerometer, but in a clever way, because these two sensors have very different characteristics. The gyro is very good for gesture recognition; however, it requires a lot of energy. So instead of using the gyro all the time, we have the accelerometer in front. It is cheap, but its accuracy is bad in mobile situations -- it is not robust to mobility errors. So what we did was put the accelerometer in front of the gyro and make a feedback loop between the two. The accelerometer part does the segmentation, but in an adaptive way, through this closed control loop. In that way we can achieve the same accuracy level the gyro provides, while saving much of the energy by using the accelerometer. So what it does is that the gyro makes the accelerometer adaptive to the mobility noise situation.
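A much-simplified sketch of that accelerometer-gated, gyro-confirmed segmentation with its feedback loop might look as follows; the class name, threshold values, and the exact adaptation rule are all assumptions for illustration.

```python
class GestureSegmenter:
    """Cheap accelerometer stage proposes gesture segments; the power-hungry
    gyro is only consulted to confirm them, and its verdict feeds a threshold
    back so the accelerometer stage adapts to the current mobility noise."""

    def __init__(self, threshold=1.5):
        self.threshold = threshold               # accelerometer-energy gate

    def process(self, accel_energy, confirm_with_gyro):
        if accel_energy <= self.threshold:
            return None                          # gyro stays off; mobile can keep sleeping
        confirmed = confirm_with_gyro()          # brief, expensive check
        # Feedback loop: false positives (e.g. running noise) raise the gate,
        # confirmed gestures relax it so the next gesture is not missed.
        self.threshold *= 1.1 if not confirmed else 0.95
        return confirmed

# Usage sketch: seg.process(window_energy, lambda: gyro_path.looks_like_gesture(window))
```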
So by that we could save the energy on the sensor node by about 2.4 times, and the energy on the mobile side was reduced by about this much -- 43 percent. One interesting thing here is that the energy savings on the mobile side are due to the segmentation on the sensor node. The reason is that the segmentation identifies the potential segments of gestures, so rather than sending all the sensor data to the mobile node, it sends chunks of data. And the interesting thing is that human gesture interaction is sporadic: you don't do gesture input all the time. You do it for a while and then don't, do it for a while and then don't. So that sending behavior is chunked, right? For the other times, the mobile device can go into low-power sleep mode more easily, and that's the big issue for energy in this environment. So this could achieve a lot of savings.

So in that way -- actually, I should stop here. In that way we could solve the problem of energy, and the accuracy as well, but there was some more optimization we had to go through to deal with different mobility situations. In this experiment we did a mobility experiment in the standing position, the walking position, the running position, and the situation in a car. We picked those four cases as representative mobility situations, and we could achieve about 15 -- 90 -- I'm sorry, 96 percent accuracy, which is good enough for this kind of rather coarse-level gesture interaction. We should learn more to come up with rather fine-level things, but so far we could just do this kind of remote-control type of thing. Like, you're running and you have a phone call coming in: if you want to receive the call while you're running, you do this gesture, and if you don't want to receive it, you do this. If you do this, the volume goes up, and if you do this, the volume goes down. And skip, and go back. So this kind of gesture could be successfully implemented. We implemented an application for MP3 player control in running or walking situations, but we are expanding the system further toward different applications and different things.

So those are a bit of our mobile platform to support developers, and users as well, at two levels: one is the context monitoring level and the other is the interaction level. Okay. So I will stop here.

>> Sharad Agarwal: All right. Thank you.

>>: Thank you.

[applause]