>> Sharad Agarwal: Okay. We'll get started now. It's my pleasure to have
Professor Junehwa Song from KAIST. KAIST is in Daejeon, Korea. And
Professor Song actually spent a number of years before he was at KAIST at IBM
TJ Watson. He's going to be telling us today about some interesting context-aware applications that he's built using wearable sensors and sensors in the environment. Thanks.
>> Junehwa Song: So parts of the world is quite getting old. But I packed it up
to present potentially everything what I have done, I have been doing so far.
So started around near the year 2006 or 7 and continuing on to now. So we
have been developing a platform for -- mobile platform to support contextual
aware applications. Context aware applications is good and I believe it's the
future direction of mobile applications.
However, the problem is they are difficult to develop. First of all, they are complex: sensing, feature extraction, recognition, and all those things. The problem here is that most developers are not good at those; it should be someone who knows something about machine learning or pattern recognition, and that expertise is not common yet, right? That's one thing.
The second problem is that it is computationally heavy. And third, if it is mobile, and if we also use limited sensing devices, then it is resource- and energy-limited. So we have two problems: complexity in the logic and complexity in the environment.
So we wanted to provide developers with a platform that includes a supportive runtime environment as well as easy-to-use APIs.
I recently saw a somewhat related presentation from MSR at this year's SenSys; I was also there to present different things.
So the key building block of context-aware applications is context monitoring. This is common to many different applications. And it is challenging: it is, again, multi-step continuous processing, and we should do it with multiple sensors and multiple devices to get intelligence. So we use built-in sensors, body sensors, and also embedded sensors to extract the most that we can.
So let's look at the problems, the challenges here.
>>: What is a PAN sensor?
>> Junehwa Song: [inaudible] network. That's what I'm saying. It includes built-in phone sensors, body sensors, and personal-area-network sensors. So we support dynamically connecting to the sensors at those three levels.
So here is the phone, and the sensors, and the applications. This is a reasonably good system; however, it is still small. It will get better. And we have a number of sensors -- we can use a number of sensors.
And we should also support a number of applications here. One big difference is that, again, we should not only support things application by application. If it's a platform, it should support multiple applications at the same time, I believe, right? That's what a platform should be.
If we think about multiple applications at the same time, then in fact a scalability problem occurs here, because the system is rather small-scale; the hardware is rather small. So even if it's not as large a system as what we see in the Internet, considering the resources here, it still gets quite complex.
A further problem is that sensing has to happen continuously, and all the computation that follows has to happen continuously. That complicates the problem further.
So we have a natural scalability problem considering the resources and the energy problem, and also, once we say that multiple applications run at the same time, we should do resource management, because the current way of developing applications using these kinds of sensors is rather ad hoc.
Start from the application layer: we are providing APIs to applications, so that application developers just use the high-level API and don't care about what's going on underneath. The system also provides an abstraction over the hardware sensor devices, and in the middle it has to do a bunch of things.
So there are two things; I will talk a little bit about the first but spend more time on the second.
I planned to talk about each of those bit by bit, but I don't think I will have enough time; I'll probably stop around there. So SeeMon: it provides a very high-level -- I will go very quickly through this.
The users -- I mean, the developers -- are given a query-language type of interface, so that they can specify their requests in a declarative way using high-level representations.
Context is, for example, represented in this way: activity is running, temperature is hot, humidity is wet. And this one says from false to true, which means the program wants to know when this condition becomes satisfied. So it wants to know the change, not the context value itself. That is important because it's rather an event-based system -- what the application usually wants is not the value but the changes, so that it can trigger its services.
We include that as part of the language, and the duration specifies the time window. So given this high-level representation, we do a translation. This is the data structure in the system that does the translation: we have a context translation map.
All of these provide the mapping, so the result is something like: accelerometer Y-axis energy value is larger than 52, and accelerometer DC is less than 500, et cetera.
The good thing is that once we have this representation -- this lower-level, medium-level representation -- the high-level representation has been mapped onto one that includes the resource information; we have the sensor information here. From this we can do system optimization to some extent, just like a compiler does.
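To make that concrete, here is a minimal sketch of a declarative context query and its translation through a context translation map. This is not the actual SeeMon interface; only the accelerometer conditions echo the numbers quoted above, and the other thresholds and all field names are illustrative assumptions.

```python
# A minimal sketch (not the actual SeeMon interface) of a declarative context
# monitoring query and its translation through a context translation map.
# Only the accelerometer conditions echo the numbers quoted above; the other
# thresholds and all field names are illustrative assumptions.

# High-level query: notify on the false -> true transition of this context,
# monitored over a given duration.
query = {
    "context": [("Activity", "==", "Running"),
                ("Temperature", "==", "Hot"),
                ("Humidity", "==", "Wet")],
    "transition": ("false", "true"),   # interested in the change, not the value
    "duration": "30min",
}

# Context translation map: each high-level (attribute, value) pair maps to
# low-level sensor feature conditions, including which sensor is needed.
translation_map = {
    ("Activity", "Running"): [
        {"sensor": "accelerometer", "feature": "y_axis_energy", "op": ">", "value": 52},
        {"sensor": "accelerometer", "feature": "dc", "op": "<", "value": 500},
    ],
    ("Temperature", "Hot"): [
        {"sensor": "thermometer", "feature": "celsius", "op": ">", "value": 30},
    ],
    ("Humidity", "Wet"): [
        {"sensor": "hygrometer", "feature": "relative_humidity", "op": ">", "value": 70},
    ],
}

def translate(q, tmap):
    """Expand a high-level query into its low-level sensor conditions."""
    conditions = []
    for attr, _, value in q["context"]:
        conditions.extend(tmap[(attr, value)])
    return conditions

low_level = translate(query, translation_map)
# The medium-level form also exposes which sensors are involved, which the
# system can use for optimization (e.g., deciding which sensors to turn on).
required_sensors = {c["sensor"] for c in low_level}
```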
One example: from that medium-level representation we can extract which sensors we really have to use and which ones we don't have to use.
Here we did things like short-circuit evaluation. If we are given a query like "is the weather hot and humid", and let's say we already know that it is hot or humid -- I'm sorry, the other way: if we know it is cold, then we don't have to evaluate the other condition, right? In that way we can identify which sensors we don't have to use.
So we evaluate based on that. What we do is, given a false context -- a complex context that is false -- we identify what makes it false, right? And then we don't have to worry about all the other parts until that comes true, right?
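A minimal sketch of that short-circuit idea, assuming a conjunctive query and a sense() callback that powers up and reads one sensor-backed condition at a time; both are placeholders, not the real system's interfaces.

```python
# A minimal sketch of the short-circuit evaluation idea for a conjunctive
# context query. sense(condition) is a placeholder that turns on the needed
# sensor and returns whether that single low-level condition currently holds.

def evaluate_conjunction(conditions, sense):
    """Return (is_true, falsifier); stops sensing at the first false condition."""
    for cond in conditions:
        if not sense(cond):          # only this sensor had to be turned on
            return False, cond       # this condition alone keeps the query false
    return True, None

def monitoring_step(conditions, sense, falsifier):
    """One monitoring cycle: while the query is false, re-check only the
    condition that made it false; all other sensors can stay off."""
    if falsifier is not None:
        if not sense(falsifier):
            return False, falsifier              # still false; nothing else sensed
        # The falsifying condition flipped, so re-evaluate the whole conjunction.
    return evaluate_conjunction(conditions, sense)
```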
>>: So for more complex -- how do you do the translation into the --
>> Junehwa Song: No, no. Video, I don't. To be honest, I think video is a quite different, well-studied type of sensor.
We could do it to some extent if we really dig into the different parts of the video encoders and decoders, right? But I'd rather use that as a canned system; so I would rather exclude video. But I do use some of the sound. For sound as well, if we were doing regular speech recognition I would not be very interested, because, again, it's a very well-understood process. However, if we use part of it -- for example, let's say we want to understand whether a speaker is male or female, or how many people are there -- all those things are variations of it, and then we can use part of the pipeline; that kind of idea.
The reason I'm saying that is that whatever I do, if it is the regular processing, there will be better ways of doing it which have already been developed.
>>: So it sounds a little bit like that case, where you guys go in and inspect it by hand to reverse-engineer how it maps.
>> Junehwa Song: Yeah, we did. We did. Yeah. But it's not easy. It takes a lot
of time for students. However, it's relatively easier than doing it with video.
That's what I'm saying.
So anyway, I will just go through this briefly. The high-level idea is that once we know this is false, we don't have to turn on all the sensors; we can extract the set of sensors we don't have to worry about, right?
So that's one idea. Here we could save about half of the sensors, and the reason for that is, think about the context, the situation. Let's say there are tons of questions from the applications. How many of them will be true at any given moment?
I think it should be just a small number. For example, let's say there are tons of questions about location, like ten different location-based applications. They can ask if I'm in this room, if I'm in this building, if I'm in this area -- all different kinds of questions. There could be many of them, but I believe not too many of them will be true, because I'm in one particular situation, right? So many of those will be false.
So even if there are a number of context monitoring requests, the number of those which are true at any moment will be quite limited, right?
So this heuristic really worked, and we could save a lot.
The second idea we developed was -- here we are dealing with multiple queries at the same time, right? So we can do optimization considering all of them together, and we developed shared processing as well as incremental processing based on characteristics of the requests.
Again, this helped us improve the performance by at least three or four times in CPU time. So those were the first ideas we developed, which are quite old by now.
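As a rough illustration of what sharing and incremental evaluation across concurrent queries can look like, a minimal sketch; the class and method names are made up, and this is the general idea rather than the actual implementation.

```python
# A minimal sketch of shared plus incremental processing across concurrent
# queries: a low-level feature used by several queries is computed once per
# cycle, and predicates are re-evaluated only when that feature's value
# changes. Names are made up; this is the general idea, not the real system.

from collections import defaultdict

class SharedEvaluator:
    def __init__(self):
        # feature name -> list of (query_id, predicate over the feature value)
        self.subscriptions = defaultdict(list)
        self.last_value = {}                    # cache for incremental evaluation

    def register(self, query_id, feature, predicate):
        self.subscriptions[feature].append((query_id, predicate))

    def cycle(self, compute_feature):
        """Run one sensing cycle; compute_feature(name) is called at most once
        per feature, no matter how many queries subscribed to it."""
        results = {}
        for feature, subscribers in self.subscriptions.items():
            value = compute_feature(feature)     # shared across all subscribers
            if self.last_value.get(feature) == value:
                continue                         # unchanged: skip re-evaluation
            self.last_value[feature] = value
            for query_id, predicate in subscribers:
                results[query_id] = predicate(value)
        return results
```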
Now, once we have that, the second question is, again: if we run multiple applications at the same time, then we should do more. Let's look a little bit more at the environment.
Here we have a number of applications which share the scarce resources of the system. For example, this MicaZ mote, which is quite old now, can barely run even one FFT, even a lightweight one. So it's very limited.
And we are considering running multiple applications; that's problematic, right? It's also dynamic: users are moving around, so we should identify sensors at runtime, dynamically, and connect to them, and applications will come and go.
So one issue is resource scarcity and energy scarcity, and we also have the dynamics of the system. Applications cannot handle this by themselves -- see, it's not a matter of difficulty. To run together at the same time, sharing the system, applications would have to know what the others are doing. So we should provide system support.
But the problem was that in this kind of environment we didn't have that kind of support. So we designed a system for it -- okay. So, briefly: in current systems it happens to be done in the application layer. Applications specify low-level resource status or resource requirements from the application layer, and the system just tries to serve them if it can, right?
Here, instead, the application specifies its request at a high level, right? The system receives it, translates it, investigates and analyzes it, and identifies what kind of resource requirements it has. It also understands what other applications are doing and what the system has available now, and it does the resource binding at runtime, dynamically.
So the system should provide a holistic view of the applications and resources. The key idea here is that we are providing high-level representations, APIs, to the users. For example, if somebody is running, there are many different ways to identify that fact, at two levels.
One is the sensor level: we can use different sets of sensors to identify whether the user is running. The second is that we can also use different logics for it, right? So we use those alternative resource usages and logics to provide different plans to the system.
And so, sitting in between the applications and the resources, the orchestrator has the holistic view and understanding of the system as well as of the applications, and it uses that flexibility to orchestrate the whole system.
Here is an example. Let's say we have applications A, B, and C. Application A is translated and the system prepares two different plans; application B is translated into two different plans, and C into three.
Among those seven plans -- for each plan we have -- this plan, B-1, uses an accelerometer on the wrist, and in more detail it also has its own processing method: in this case it uses a frequency-domain feature extractor and a decision tree, and all of this is done on the mobile side. And this plan, B-2, uses an accelerometer attached to the belt, and uses statistical features and, again, a decision tree.
However, in this case the processing of this part is done on the sensor, and only the classification is done on the mobile side.
So in this way we can select different nodes and different processing methods, and also do the computation in different parts of the system.
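A minimal sketch of how such alternative plans might be represented -- which sensor, where each processing stage runs, and an estimated resource demand. The field names and all the numbers are made-up illustrations, not values from the actual Orchestrator system.

```python
# A minimal sketch of alternative processing plans for one request, following
# the B-1 / B-2 example above. Field names and all numbers are illustrative,
# not values from the actual Orchestrator system.

plans_for_B = [
    {
        "id": "B-1",
        "sensor": {"type": "accelerometer", "position": "wrist"},
        "pipeline": [
            ("frequency_domain_features", "mobile"),   # feature extraction on the phone
            ("decision_tree", "mobile"),               # classification on the phone
        ],
        "demand": {"cpu": 0.20, "bandwidth_kbps": 40, "energy_mw": 25},
    },
    {
        "id": "B-2",
        "sensor": {"type": "accelerometer", "position": "belt"},
        "pipeline": [
            ("statistical_features", "sensor"),        # feature extraction on the sensor node
            ("decision_tree", "mobile"),               # only classification on the phone
        ],
        "demand": {"cpu": 0.08, "bandwidth_kbps": 5, "energy_mw": 35},
    },
]
```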
We also have a resource demand map from analyzing the plans: for each plan we prepare its resource demand -- CPU, bandwidth, energy, et cetera. And the system, in the background, prepares an availability map by monitoring; basically, the system does continuous monitoring to abstract what's available now -- memory, bandwidth, and energy in this case.
By comparing the different plans against what is available now, we can select which ones are possible. For example, this one is feasible according to what we have on the resource side, but part of this one is not.
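Continuing the sketch above, a minimal illustration of matching plan demands against the availability map; the simple additive resource model and the numbers are assumptions for illustration.

```python
# Continuing the sketch above: matching plan demands against the availability
# map maintained by the background resource monitor. The additive resource
# model and the numbers are assumptions for illustration.

availability = {"cpu": 0.5, "bandwidth_kbps": 60, "energy_mw": 80}

def feasible(plan, available):
    """A plan is feasible if every resource it demands fits what is left."""
    return all(plan["demand"][r] <= available.get(r, 0) for r in plan["demand"])

def allocate(plan, available):
    """Reserve the plan's demand out of the availability map."""
    for r, amount in plan["demand"].items():
        available[r] -= amount

candidate_plans = [p for p in plans_for_B if feasible(p, availability)]
```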
We further developed a framework to define and enforce different policies. For example, one policy can be maximizing the number of concurrent applications.
In that case we translate the policy and enforce it on top of this constraint matching. Another policy can be to optimize resource or energy usage, so we keep the amount of energy used to a minimum.
We can do all different kinds of things using this -- yes?
>>: So I didn't understand this. B-1 and B-2 -- are those both provided by, is it the developer?
>> Junehwa Song: Prepared by the system.
>>: From the query. A single query B, and you derive both?
>> Junehwa Song: You're right. If you remember the context translation map -- that's prepared by the system, and it should be configured with external expertise, right?
So eventually what we should do is have a kind of ontology, which probably already exists on the Internet, so that we can collect all the knowledge from different people and import it into the system as an extension of the map.
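Tying the policy framework mentioned a moment ago to plan selection, here is a minimal sketch that builds on the earlier plan and feasibility sketches; the greedy strategies are illustrative simplifications, not the actual selection algorithm.

```python
# A minimal sketch tying the policy framework to plan selection, reusing the
# feasible()/allocate() helpers above. The greedy strategies are illustrative
# simplifications, not the actual selection algorithm.

def select_max_concurrency(apps_to_plans, available):
    """Policy: maximize concurrent applications by admitting each application
    with its cheapest feasible plan."""
    selection = {}
    for app, plans in apps_to_plans.items():
        options = [p for p in plans if feasible(p, available)]
        if options:
            best = min(options, key=lambda p: sum(p["demand"].values()))
            allocate(best, available)
            selection[app] = best
    return selection

def select_min_energy(apps_to_plans, available):
    """Policy: among feasible plans, prefer the one with the lowest energy."""
    selection = {}
    for app, plans in apps_to_plans.items():
        options = [p for p in plans if feasible(p, available)]
        if options:
            best = min(options, key=lambda p: p["demand"]["energy_mw"])
            allocate(best, available)
            selection[app] = best
    return selection
```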
So this shows the overall architecture of the Orchestrator. We have the APIs and the application broker, and we have the processing-planning part. We have a resource monitor, which monitors the system resources in the background, and we have a policy manager, a plan generator, and a plan selector. This is the part for the processing -- feature extraction and the recognizer. And the sensor broker part is where it communicates with the sensors.
On the sensor side we use TinyOS in this version, and on top of TinyOS we have a resource monitor. So the sensor itself monitors its resources, including CPU, memory, and also energy, and communicates through the broker with the mobile-side orchestrator, so that the orchestrator can do the orchestration.
Also, the plan processor on the sensor side makes it possible for the sensor to participate in the processing if it is requested to do so.
For the communication we have developed a suite of protocols, including a sensor detection protocol, a sensing protocol, a control protocol, and data reporting protocols.
So this provides the infrastructure to dynamically build a kind of micro-distributed system around a mobile device.
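Purely as a hypothetical illustration of what such a protocol suite might exchange between the mobile-side broker and a sensor node -- the actual message formats were not described in the talk:

```python
# A purely hypothetical sketch of the kinds of messages such a protocol suite
# might exchange between the mobile-side broker and a sensor node; the actual
# message formats were not described in the talk.

from enum import Enum, auto

class MsgType(Enum):
    DETECT_REQUEST = auto()     # sensor detection: "who is out there?"
    DETECT_REPLY = auto()       # a sensor announces its type and capabilities
    CONTROL = auto()            # control: start/stop sensing, set sampling rate
    PLAN_INSTALL = auto()       # push part of a processing plan to the sensor
    DATA_REPORT = auto()        # data reporting: features or raw data chunks
    RESOURCE_REPORT = auto()    # sensor-side resource status (CPU, memory, energy)

def make_control(sensor_id, sampling_hz, enabled):
    """Build a control message for one sensor node."""
    return {"type": MsgType.CONTROL, "sensor": sensor_id,
            "sampling_hz": sampling_hz, "enabled": enabled}
```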
We tested with up to around ten sensors; many of them on the body, some in the environment. I will go through this very briefly.
Showing the performance of this system was different from simply showing throughput or response time, because we had to show that it works under different environments -- a dynamic, changing environment.
So what my students did was simulate the changing environment. They divided the timeline into four phases. In the first phase they kept the number of available sensors around two; in the second phase around four, then four to six, and six to eight.
And roughly every 2.5 minutes they changed the number: every 2.5 minutes they increased or decreased it by one, keeping the total within the range for that phase.
They also did a similar thing with the number of requests from the applications: across the phases they kept it between five and ten, ten and fifteen, and fifteen and twenty. That was the workload.
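A minimal sketch of such a phased workload generator, following the experiment as described; the exact phase bounds were ambiguous in the talk, so the ranges below are illustrative.

```python
# A minimal sketch of the phased workload generator described above: within
# each phase, every 2.5 minutes the number of available sensors (or pending
# requests) moves up or down by one while staying inside the phase's range.
# The phase bounds are illustrative where the talk was ambiguous.

import random

SENSOR_PHASES = [(2, 4), (4, 6), (6, 8), (6, 8)]   # (min, max) per phase; illustrative
STEP_MINUTES = 2.5

def simulate(phases, steps_per_phase):
    count = phases[0][0]
    timeline = []
    for lo, hi in phases:
        count = max(lo, min(hi, count))             # enter the new phase's range
        for _ in range(steps_per_phase):
            count += random.choice((-1, +1))        # every 2.5 minutes, +/- one
            count = max(lo, min(hi, count))         # stay within the phase's range
            timeline.append(count)
    return timeline

# e.g., four 10-minute phases -> 4 steps of 2.5 minutes each per phase
trace = simulate(SENSOR_PHASES, steps_per_phase=4)
```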
So this shows the result, over phases A, B, C, and D. In this case the number of queries was fixed at 20, and we looked at the number of activated queries and the number of activated sensors. In phases A and B, the Orchestrator supported a much larger number of requests with a somewhat smaller number of sensors.
In phases C and D it could fulfill, not all of the requests, but the other case fulfilled less than that, and still the number of sensors was controlled very tightly.
So the average energy consumption, because it used fewer sensors, was kept a little lower, but the number of active queries was almost doubled.
A similar thing was observed when we fixed the number of sensors to six and looked at the same metrics. What may be confusing here is that in this case this color shows -- remember, I was just showing performance. So, again, it's about half of the sensors, even though we have six; we kept it lower.
And it showed similar performance for the supported queries. So that, at a very high level, is what we have. In the background we have an infrastructure for monitoring the system, both the sensor part and the mobile part. We also have the plan generation part, the sensor detection part, and the policies. Those are regular systems things, and it's actually becoming a larger system now; we are still extending it toward a larger scale.
>>: [inaudible] format?
>> Junehwa Song: Right. That's -- I wanted to show a video, but given the time -- let me move on to the second part.
So we have the mobile device, and it should really support mobile situations, but it doesn't. For example, let's assume you're running -- let's assume you're running to work and you get a phone call. You have to stop and pull out your phone, and at the minimum you have to touch it five times: you push this button and touch it probably five or six times. Or think about jogging while listening to music, and you want to skip to the next song: you have to stop first and touch it up to around ten times. You don't want to do that.
So the mobile device should really support mobile situations, but it doesn't, due to its limited interfaces.
Now Microsoft is doing gesture or voice so that you can interact with the mobile device without much attention. This work is about developing a mobile gesture interaction platform. Again, it's a platform, so we want to support application developers; we don't want them to spend time developing the detailed, complex logic and worrying about the system issues.
By providing this platform, the developers can build their own applications -- I mean, interactions -- rather easily.
The problems in that case were, again, energy, and in this case one of the problems was the gesture recognition accuracy. The energy problem is natural: because we don't want to hold the phone, we put it in the pocket and we use a watch-style sensor node, and that requires more energy.
And what's interesting is that a lot of people have worked on gesture recognition -- they say mobile, but in reality they did it in nomadic situations. All their experiments were: you move, you stop, you do the gesture for the experiment, and then you move again. We wanted to do a real mobile experiment. So these are the two things we have done.
For the energy problem, what we did was use a collaborative architecture. We developed an architecture where two devices, the sensor node and the mobile node, collaborate to save energy. The second point is that the sensor node itself uses two different sensors, a gyro and an accelerometer, but in a clever way, because these two sensors have very different characteristics.
The gyro is very good for gesture recognition; however, it requires a lot of energy. So instead of using the gyro all the time, we put the accelerometer in front. It is cheap, but its accuracy is bad in mobile situations; it is not robust to mobility errors.
So what we did was put this one in front of that and made a feedback loop between the two. On this part it does the segmentation, but in an adaptive way, by having this closed control loop. In that way we can achieve the same accuracy level the gyro provides, and much of the energy is saved by using the accelerometer.
So what it does is: the gyro makes the accelerometer adaptive to the mobility-noise situation.
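A minimal sketch of that closed control loop, assuming a simple energy-threshold segmenter on the accelerometer side; the threshold update rule, constants, and function names are illustrative assumptions rather than the actual design.

```python
# A minimal sketch of the accelerometer-in-front, gyro-in-the-loop idea: a
# cheap accelerometer-based segmenter gates when the power-hungry gyro
# pipeline runs, and feedback from the gyro-side recognizer adapts the
# segmentation threshold to the current mobility noise. The update rule and
# constants are illustrative assumptions.

class AdaptiveSegmenter:
    def __init__(self, threshold=1.2, step=0.05):
        self.threshold = threshold   # accelerometer energy that triggers a candidate segment
        self.step = step

    def is_candidate_gesture(self, accel_energy):
        """Cheap front-end check running continuously on the sensor node."""
        return accel_energy > self.threshold

    def feedback(self, was_real_gesture):
        """Closed control loop: the gyro-side recognizer reports whether the
        candidate was a real gesture or mobility noise, and the threshold adapts."""
        if was_real_gesture:
            self.threshold = max(0.5, self.threshold - self.step)  # a bit more sensitive
        else:
            self.threshold += self.step                            # too many false segments

def process_window(segmenter, accel_energy, run_gyro_recognizer):
    if segmenter.is_candidate_gesture(accel_energy):
        gesture = run_gyro_recognizer()        # gyro powered only for candidate segments
        segmenter.feedback(gesture is not None)
        return gesture
    return None                                # gyro stays off; energy saved
```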
By doing that we could reduce the energy on the sensor node by about 2.4 times, and the energy on the mobile side was reduced by about this much -- 43 percent. One interesting thing here is that the energy savings come from doing the segmentation on the sensor node.
The reason is that segmentation identifies the potential gesture segments. So rather than sending all the sensor data to the mobile node, it sends chunks of data, and the interesting thing is that human gesture interaction is sporadic. You don't do gesture input all the time; you do it for a while, then you don't, then you do it for a while, then you don't.
So the sending behavior is chunked, right? For the other times, the mobile device can go into a low-power sleep mode more easily, and that's the big issue for energy in this environment.
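To illustrate the chunked-reporting idea, a small sketch contrasting it with continuous streaming; send() and send_chunk() and the sample representation are hypothetical placeholders.

```python
# A small sketch contrasting continuous streaming with the chunked reporting
# described above: data is sent only for detected gesture segments, so the
# phone can sleep between sporadic interactions. send() / send_chunk() are
# hypothetical placeholders.

def report_continuously(samples, send):
    for s in samples:
        send(s)                       # the phone must stay awake the whole time

def report_chunked(samples, segmenter, send_chunk):
    buffer = []
    for s in samples:
        if segmenter.is_candidate_gesture(s):
            buffer.append(s)          # accumulate the candidate gesture segment
        elif buffer:
            send_chunk(buffer)        # one burst per segment
            buffer = []               # between bursts the phone can go to sleep
    if buffer:
        send_chunk(buffer)
```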
So this could achieve a lot of savings. In that way -- actually, I should stop here soon -- in that way we could solve the energy problem, and the accuracy problem as well, but there was some more optimization we had to go through to deal with different mobility situations. In this experiment we tested mobility in a standing position, a walking position, a running position, and the situation in a car.
We picked those four cases as representative mobility situations, and we could achieve about -- 90 -- I'm sorry, 96 percent accuracy, which is good enough for this kind of rather coarse-level gesture interaction. We should learn more to come up with finer-level things, but so far we can do this kind of remote-control type of thing: you're running and a phone call comes in. If you want to take the call while you're running, you do this gesture, and if you don't want to take it, you do this. If you do this, it's volume up; if you do this, it's volume down. And skip, and go back. This kind of gesture set could be successfully implemented, so we implemented an application for MP3 player control in running or walking situations.
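As an illustration of the gesture-to-command mapping in such an application, a minimal sketch; the gesture labels and the player methods are hypothetical placeholders, since the talk did not name the actual gestures or API.

```python
# A minimal sketch of mapping recognized gestures to the call-handling and
# MP3-player commands mentioned above. The gesture labels and the player
# methods (accept_call, volume_up, ...) are hypothetical placeholders; the
# talk did not name the actual gestures or API.

GESTURE_COMMANDS = {
    "gesture_accept": "accept_call",
    "gesture_reject": "reject_call",
    "gesture_up": "volume_up",
    "gesture_down": "volume_down",
    "gesture_forward": "skip_next",
    "gesture_back": "skip_previous",
}

def dispatch(gesture_label, player):
    """Invoke the player action bound to a recognized gesture, if any."""
    command = GESTURE_COMMANDS.get(gesture_label)
    if command is not None:
        getattr(player, command)()    # e.g., player.volume_up()
```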
We are further expanding the system toward different applications and different things. So those are two pieces of our mobile platform to support developers, and users as well, at two levels: one is the context monitoring level and the other is the interaction level. Okay.
So I will stop here.
>> Sharad Agarwal: All right. Thank you.
>>: Thank you.
[applause]