>>: Hello everyone. It's my pleasure to introduce Matt Welsh. Matt is an
assistant professor in the computer science department of Harvard.
Matt has been working on many different aspects of distributed systems,
operating systems, programming languages, and so on. And in the recent past, he has
been very active in the sensor net community. And today he will be talking about
the stuff that he has been working on in this area, essentially.
>>Matt Welsh: Thanks for having me. Okay, so since this is a very small group,
we can keep it highly interactive, which will be fun. So feel free to let me have it.
But let me get through a couple of slides before you let me have it because some
people like to jump down my throat on slide two and it's -- it's too early.
Okay. So you all know what sensor networks are; as a refresher, these are small,
low-power devices with limited amounts of computation, communication, and
sensing capability. There have been a lot of interesting platforms developed to do
this, so this is kind of exciting.
Now, the applications that have been developed for sensor nets over the last
decade or so are fairly diverse. There have been groups monitoring redwood
forests. There's a group at Princeton that put these on zebras and tracked the
herds in the African savannah. There's a group at Vanderbilt that's placed
microphones throughout a city and been able to localize sniper fire in real time.
This is --
>>: [inaudible].
>>Matt Welsh: Oh, well, I think they actually tested it -- it was designed to work
in a city, and they tested it at an Army training facility or something like that. But
it's an urban combat, you know, training facility.
A group at Berkeley placed them across the Golden Gate Bridge to
monitor vibrations.
We've done work on volcanoes, and we've also done work on emergency
medical care. Okay, so lots of exciting applications.
Now, in all these cases the key challenge is managing resources because the
individual devices are very resource constrained. They have about eight
megahertz CPUs, tens of kilobytes of memory, very limited radio bandwidth, and
tiny batteries, generally.
And so my claim is that these resource constraints mean that we really need a
completely new approach to software design. We need to rethink the way we
design software, center it around resource management. And that involves both
managing resources at the node level, adaptation at -- the need for the
application to change what its doing over time, and managing resources at the
network level, okay.
So my mantra here is what I call Resource-Aware Programming, which is: make
resources a first-class primitive in the programming model, so that the
programmer is always reasoning about the resource use that the application is
requiring, okay.
And I'm going to show you how we kind of developed that into a system design,
okay.
So I'll talk about a couple of the applications we've developed, the challenges,
and then I'll talk about two systems. The first is Pixie, which is an operating system
for sensor nodes that enables this resource-aware programming concept.
Then I'll talk about Lance, which is a framework for managing resources across an
entire sensor network, and then I'll wrap up.
The first application we worked on extensively is using wireless sensors for
volcano monitoring. Have all of you seen something on this before? I think this
should be somewhat familiar to you. So the idea is that we placed seismic and
acoustic sensors across a volcano. We have a single GPS receiver that
establishes a time synchronization for the whole network. We have a long
distance radio modem that enables connectivity to an observatory which is a safe
distance from the volcano itself.
So the cartoon version is: the volcano does something interesting. This happens
tens to hundreds of times a day on the volcanoes we've worked at. The sensors
would record the seismic and acoustic signal associated with the event, route the
data back over a multihop spanning tree to a base station. And then when you're
done with this process, you get a picture that looks like this. And this is time, and
each of the colored traces is one of the signals from one of the sensor nodes.
And a seismologist would look at this and say, okay, that's an interesting little
volcano tectonic earthquake, okay.
So we've done three sensor network deployments on two different volcanoes in
Ecuador -- Yes?
>>: [inaudible].
>>Matt Welsh: There's a lot of stuff. The -- I'm doing this kind of quickly, but the
high level is they are trying to understand the physical processes going on inside
the volcano that lead to deformation, to eruptions, to earthquakes, to rock falls, to
lava flow. Everything volcanoes do. They are basically using these signals to
understand the -- the pressures and the movements inside the volcano. So
there's a lot of things they do with it, right. But basic data collection is kind of the
key challenge, okay.
So --
>>: [inaudible].
>>Matt Welsh: Yes. It's important to know where they are physically so that you
can do the analysis of the data.
>>: [inaudible] more than this kind.
>>Matt Welsh: There's only one GPS receiver, and we use a multihop time
synch protocol. When we install the sensors in the ground, we use a handheld
GPS to mark the location.
Okay. So here is a picture of one of our deployments. This is Konrad Lorincz,
one of my former Ph.D. students. And what you see here is the radio antennas,
this is the GPS receiver, and this is the radio modem. And this is one of the
sensor nodes, and the rest of the sensor nodes are strung up in a line along the
volcano, so they are quite far from each other. In fact, this -- this here is the
antenna for the next node in the network which is several hundred meters away,
okay.
And what he's installing there is a sensor node that looks like this, and this is the
mote, and the radio -- I'm sorry, the mote and the radio antenna, and this is our
ADC card and there's D cell batteries powering it.
What this is replacing is a monitoring station that looks like this, which a
seismologist would typically use. And this has two car batteries inside of it and a
data logger. And, you know, this is extremely heavy, very bulky, lots of power,
and would take maybe four to five people to get all of the equipment for a single
station to the site; whereas, with our design, a single -- one person can carry like
eight of these in a backpack, okay.
Yes?
>>: You said you have solar power for base stations. Do you actually have solar
power for the little nodes, too?
>>Matt Welsh: Not yet, but we are working on that right now.
>>: It seems like it changes the power assumption to an unlimited supply.
>>Matt Welsh: It's not really unlimited, though. The power is that --
>>: [inaudible] unlimited energy, I meant.
>>Matt Welsh: Well, the problem is so the solar panels charge up the battery,
but at night, for example, the volcano doesn't know whether it's day or night. So
it's active all the time. So the battery, you would still need a really large battery to
power the whole thing sort of continuously overnight.
>>: [inaudible].
>>Matt Welsh: Even with solar charging.
>>: [inaudible] for days right.
>>Matt Welsh: What's that?
>>: Without solar panels you power continuously for days, right?
>>Matt Welsh: With D cell batteries, yeah.
>>: [inaudible].
>>Matt Welsh: Okay. So the other application, and this is the one that pushes the
power constraints even further, is the use of wearable sensors for monitoring
limb movements in patients being treated for neuromotor diseases like
Parkinson's disease. So we are working with a group at the Spaulding
Rehabilitation Hospital in Boston on this.
This is the sensor node here. Basically, it's the size of a Zippo lighter. It has a
triaxial accelerometer, triaxial gyroscope, a little rechargeable battery. It actually
has two gigabytes of micro SD flash on there which is fantastic, so we can log all
the signals.
And the idea is that the patient wears, you know, eight to ten of these on different
body segments continuously for several weeks and they record the movements
of the limbs, and the data is collected from them and processed and then used to
understand the progression of the disease, okay. So I could give you a whole
talk on how the sensor data is used for understanding Parkinson's Disease, but
basically, it comes down to understanding whether the patient is undergoing
normal versus abnormal movements. Does it make sense?
Okay. So the things that I want you to notice about the two applications, the first
is that the data rates are fairly high, at least for sensor network people. You
know, typical sensor networks are designed to support, you know, sampling once
every ten minutes or something like that. Now we are sampling multiple
channels of data, a hundred hertz per channel on each of the nodes.
The timing accuracy is very important so we can compare signals across nodes.
The processing of the signals is domain specific in the sense that, you know, it's
the seismologist that's going to study the, you know, P-wave arrivals of the
seismic waves, or it's going to be a clinician that studies the limb movements
using classification algorithms. I don't want to write that code. Yeah?
And then finally, the applications really do have to adapt their behavior based on
the changing resource availability. So if the wearable sensors are being -- if they
are being worn by someone and the radio link bandwidth is changing over time,
that's going to affect, you know, what data they transmit and when and how.
And likewise, if you've got solar powered sensor nodes, you've got to tune the
processing and the overheads based on how much energy you've got in the
battery.
Okay. So -- and that adaptation is also highly application specific, because it
depends on the application, what's important, and what's not, okay.
So the standard approach to doing this kind of resource tuning is very meticulous
and very painful, in my opinion. So this is a TinyOS application for the limb
movement application that I described earlier. And I don't want you to worry so
much about what's inside of each of the boxes, but I want you to notice that
basically the application consists of a bunch of software components with fairly
complex wirings between them. And the other thing is that inside each of these
boxes, there tends to be one or more knobs that one can tune to change the
overhead and the fidelity and the resource consumption of that box.
So, for example, I could change the interval at which I do the low-power
listening (LPL) checks on the radio MAC layer. I can change the processing rate
of the data. I can change the refresh rate of the routing protocol. And so there's
a gazillion knobs scattered throughout the software that affect the quality of the
data and the resource consumption.
So my belief is that a domain scientist, especially, is not going to want to think
about it this way. And this is a lot like sitting in the cockpit of a 747, where you're
having to, you know, fiddle with all these knobs and controls in order to get the
thing to fly.
What's that?
>>: [inaudible].
>>Matt Welsh: Yeah, great. So some people like doing this, but I would argue
that a volcanologist shouldn't be handed this and say, "Okay. Here you go."
So what we really want is something like this, right. Which gives you several
degrees of freedom, but it's much easier to sort of reason about.
Yeah?
>>: [inaudible] these cheating on benchmarks. I'm sure that thought occurred to
you here, but --
>>Matt Welsh: Yeah, we are trying to avoid using them to cheat on benchmarks
as well, that's right.
>>: [inaudible].
>>Matt Welsh: But the point is, in this regime, tuning the application to get the
right data quality and resource consumption under varying conditions is very
hard, all right. So we would like to provide a programming interface that makes
this much more straightforward.
So here is what we've done. We've designed a new OS called Pixie. This is a
sensor node operating system that treats resources as a first-class
programming primitive. So I would argue that in most operating systems,
resources are in some sense implicit. So, you know, the only place in, say, UNIX
that you really reason about resource availability is when you call malloc, and if
malloc returns null. Right? And, of course, if malloc returns null, then you have
to crash or something. I mean, there's very little you can do in most cases.
So the point is that conventional operating systems are designed to
shield the application from having to think about resources at all, because the OS
says, look, the application does whatever it wants, and it's my job as the OS to sort
of allow it to do that and to play the shell game to try to make it happen.
I'm going to argue that with the severe resource constraints on sensor nodes, you
can't get away with that anymore, okay. Virtualizing away the
resource constraints is not practical.
Yeah?
>>: So modern OSes do abstract away things like the virtual memory system?
>>Matt Welsh: That's right.
>>: [inaudible] so where you thrash, is this going to help?
>>Matt Welsh: That's what I'm arguing is that that is exactly what you don't want.
So virtualizing away resources is bad.
>>: So would these techniques also work in regular operating systems?
>>Matt Welsh: Potentially, yes. Yes. Absolutely. But I'm going to argue it's less
important to do that in conventional systems, because we usually have plentiful
resources. I'm going to argue that in a sensor node context, we usually have scarce
resources. And so when, 90 percent of the time, resources are scarce, you really have to
think about rationing them out; versus when, 90 percent of the time, things work just
fine, virtualization is probably the right approach.
So the argument is -- and I'm going to talk about the design in a minute.
Basically, the application must contend with varying resource conditions, that you
can't hide it away. And the fundamental challenge is how do you allow
programmers to deal with this without too much pain?
All right. So here is a design -- here is an application in Pixie. This is that motion
analysis application for wearable sensors that I mentioned earlier. Again, it's
somewhat simplified in the picture. And basically what we are doing here is we
are sampling the sensors, we are looking at the data, if it's not interesting, we
drop it. So if the sensor is like not moving, for example. Otherwise, we log the
data to flash, and we pass the signals to several feature extractors that compute
different features on the raw signal and then transmit them over radio link to the
base station. Again, I'm simplifying the hell out of this, but this is just to make it a
nice picture, okay.
So each of these little boxes we call a stage. And there are queues in front of the
stages that are optional. So I would typically only put a queue in front of a stage that
has a variable amount of time to execute. For example, accessing an I/O device.
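The stage-and-queue dataflow just described can be sketched roughly as follows. This is an illustrative Python sketch, not the real Pixie code (which runs on mote-class hardware); the names `Stage`, `push`, and `run_once` are assumptions for the sake of the example.

```python
# A minimal sketch of a stage pipeline with optional input queues, as in
# the talk: sample -> filter (drop uninteresting signals) -> log.
# All names here are illustrative assumptions, not the Pixie API.
from collections import deque

class Stage:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.queue = deque()   # optional queue for variable-latency stages

    def push(self, item):
        self.queue.append(item)

    def run_once(self):
        if self.queue:
            return self.fn(self.queue.popleft())

log = []
# Drop "quiet" signals (e.g. the sensor is not moving), keep the rest.
filter_stage = Stage("filter", lambda s: s if max(s) > 0.5 else None)
log_stage = Stage("log", lambda s: log.append(s))

for sample in ([0.1, 0.2], [0.9, 0.4]):   # one quiet sample, one interesting
    filter_stage.push(sample)
    kept = filter_stage.run_once()
    if kept is not None:                   # only interesting data flows on
        log_stage.push(kept)
        log_stage.run_once()

assert log == [[0.9, 0.4]]                 # the quiet sample was dropped
```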
Now here is how resource management works in Pixie. Let's say that this stage
wants to perform some computation on the input signal. In order to do that, it has
to request energy from the OS, because you don't get anything for free in Pixie.
All right? You have to get permission to use the energy to do the computation.
So it would make a request to the Pixie energy allocator saying, "I would like to
use 700 millijoules over the next ten seconds," for example.
Then if the energy was available in the battery, it would receive what we call a
resource ticket, which is a time-bounded right to consume some amount of energy.
Yes?
>>: When it makes a request, doesn't that consume energy?
>>Matt Welsh: No, no, no. It's a very good question. So we do give you some
amount of energy for free, that is, the ability to negotiate these things; and
there is some overhead for that, but it's a marginal, tiny amount of
overhead, so it doesn't matter, okay. That's a good point.
It would get the ticket, and then when it wants to perform the computation, it
would redeem the ticket by passing it back to the OS and then being able to
perform the computation, okay. So the OS can track how much resource it's
promised to different modules in the system, and it can track how much resource
is being used.
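The request/ticket/redeem cycle just described might look roughly like this. This is a hedged Python sketch under stated assumptions: the class names (`Ticket`, `EnergyAllocator`) and method signatures are invented for illustration, and the "tell the caller how much is available on refusal" behavior reflects a detail Welsh mentions later in the talk.

```python
# Sketch of Pixie-style energy tickets: request a time-bounded right to
# consume energy, then redeem it later before doing the work.
# Names and signatures are assumptions, not the real Pixie API.
import time

class Ticket:
    def __init__(self, resource, amount, expires_at):
        self.resource = resource      # e.g. "energy"
        self.amount = amount          # e.g. millijoules
        self.expires_at = expires_at  # absolute expiration time (seconds)
        self.revoked = False

class EnergyAllocator:
    """Policy-free allocator: grants a ticket iff the energy is available."""
    def __init__(self, battery_mj):
        self.available = battery_mj

    def request(self, amount_mj, window_s):
        if amount_mj > self.available:
            # Refuse, but report how much we *do* have.
            return None, self.available
        self.available -= amount_mj   # reserve it
        return Ticket("energy", amount_mj, time.time() + window_s), None

    def redeem(self, ticket):
        # Not a guarantee: a revoked or expired ticket is a hint to adapt.
        return not ticket.revoked and time.time() <= ticket.expires_at

alloc = EnergyAllocator(battery_mj=1000)
ticket, _ = alloc.request(700, window_s=10)   # "700 mJ over ten seconds"
assert ticket is not None
assert alloc.redeem(ticket)   # later: redeem before doing the computation
```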
Yes?
>>: Maybe I'm looking at this the wrong way, because when I see you passing
the tickets back and forth, those look like security tickets like in case somebody is
cheating. That's not what you're worried about.
>>Matt Welsh: We are not worried about cheating. We trust the application
not -- we could make them unforgeable.
>>: Right, right. And that's not in your goal. So I'm trying to understand what role
the tickets provide for the programmer. Are they helping the programmer? Is the
idea that the ticket now is a data structure that you can pass around to keep track
of what?
>>Matt Welsh: Yes. Let me come to that. It will become clear in the next slide
or two, okay. But you're asking exactly the right question.
So now let's say I want to transmit some data, so I need a bandwidth ticket, so I
say I want to transmit a couple of packets over the next few seconds. I get
back the bandwidth ticket, and here is what I do. I attach the ticket to the packet
that I want to transmit. I pass both the ticket and the packet to the radio link
layer. The radio link layer can redeem the ticket and transmit the data. So the
one thing the tickets do is they decouple the request from the use. And so I can
accumulate tickets in advance of my needing them, and then I can redeem them
later, and I can pass tickets to different parts of the application to allow them to
use resource.
>>: [inaudible] In ordinary operating systems, in ordinary programs, you have to
reason about how many resources you use, and the call sites are
responsible for spending the resource. Whereas here you can propagate it back to
the source site, which is basically responsible for it.
>>Matt Welsh: That's right.
>>: And track that use to make sure they've accounted for it all the way through
the part that spent it.
>>Matt Welsh: That's exactly right. And I think the other thing that's important
here is, you know, the site that's using the resources can be different
from the site that's requesting them. And that's very important, because in typical
OSes you are sort of requesting a resource and using it at the same time;
you can't decouple those two things, and so you can't do planning.
>>: [inaudible] essentially pushing the knobs all the way upstream to the
guy who is turning --
>>Matt Welsh: Let me come to that in a moment. Where the knobs are getting
turned is the next step of this.
Yes?
>>: There's [inaudible] sometimes you want to transmit the stuff, sometimes you
don't care about the --
>>Matt Welsh: That's correct.
>>: If you want the [inaudible].
>>Matt Welsh: Well, within the time window that I'm requesting. So the --
>>: But what happens if a bunch of tickets hit the radio at once with limited
bandwidth?
>>Matt Welsh: I'm going to come to that in a moment. This is not a guarantee, is
the short version.
>>: This is --
>>: So I'm concerned that the novice programmer will find it easiest to just ask for
the tickets right before he needs them.
>>Matt Welsh: That's correct. And that's typically what happens, but you don't
typically get good behavior when that happens. Let me come to this in a
moment. You're asking good questions, so I believe I'm going to address them
all in a moment.
So a ticket is a revocable right, and this is important and I'll talk about that in a
moment, to consume some amount of resource until the expiration time. So you
can think of it like a short-term reservation for some resource. We can define a
simple algebra on the ticket. I can redeem a ticket. I can forfeit a ticket if I don't
need it. That tells the OS I don't need the resource. The ticket can be revoked,
so the OS can say, "I'm sorry, but that ticket is no longer valid." That's good
feedback to the application that it needs to change what it's doing.
We can combine tickets, and we can split them. So typically an application would
hoard or request some number of tickets and combine them into a single larger
ticket. And then when it needs to use some resource, it would split the ticket,
just peel off the part that it needs, and redeem it, okay.
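The ticket algebra described above (combine, split, forfeit, revoke) can be sketched as follows. This is an assumption-laden illustration: the field layout and, in particular, the rule that a combined ticket takes the earlier expiration time are my guesses, not details stated in the talk.

```python
# Sketch of the Pixie ticket algebra: hoard small tickets, combine them
# into one large ticket, then split off just what you need to redeem.
# Field names and the min-expiration rule are assumptions.
class Ticket:
    def __init__(self, resource, amount, expires_at):
        self.resource, self.amount, self.expires_at = resource, amount, expires_at

def combine(a, b):
    """Merge two tickets for the same resource; the earlier expiration wins
    (assumption: a combined right can't outlive either part)."""
    assert a.resource == b.resource
    return Ticket(a.resource, a.amount + b.amount,
                  min(a.expires_at, b.expires_at))

def split(t, amount):
    """Peel `amount` off ticket t; returns (peeled, remainder)."""
    assert 0 < amount <= t.amount
    return (Ticket(t.resource, amount, t.expires_at),
            Ticket(t.resource, t.amount - amount, t.expires_at))

# Hoard two energy tickets, merge them, then peel off what we need now.
big = combine(Ticket("energy", 300, 10.0), Ticket("energy", 400, 12.0))
use, rest = split(big, 250)
assert big.amount == 700 and use.amount == 250 and rest.amount == 450
assert big.expires_at == 10.0   # combined ticket expires with the earlier one
```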
So the granularity of a ticket depends on the resource that we are talking about.
For example, bandwidth might fluctuate very rapidly, so the expiration time
on a bandwidth ticket should be very short. But if I go and ask for, say, energy or
storage on the flash, I can have a long expiration time, because it's not going
anywhere if I'm not using it.
So the way of thinking about tickets, it's not a guarantee. It's very difficult to
make strict guarantees in these kind of systems, so we are trying to move away
from that. I don't want to do the kind of, you know, hard realtime OS type
guarantees because I think that's too strong, and it's not that useful in these kind
of applications. And so in Pixie, a ticket is revocable by the OS even before the
expiration time, so you think of the expiration time as a hint. But there are no
guarantees.
So here is the mental model you should have for a ticket, which is this kind of
ticket, an airline ticket. Because an airline ticket by no means gives you the right
to get on that airplane, right? At best it gives you the right to go through security
and to attempt to get on the airplane. But the airplane may not be there, it may
be broken, it may be delayed. And so the airlines have -- it may be overbooked.
There's all kinds of things that can happen. And so this is a good mental model
that most of the time you're going to get on the airplane with the ticket, but
sometimes things are going to not happen that way, okay.
So this is a very important -- but I think that this provides a useful programming
abstraction, in the sense that it sort of trades off ease of use for efficiency. Okay,
we can't guarantee things forever, so we do our best to give you good indication
of what you're going to be able to get. So in Pixie, every physical resource,
energy, bandwidth, storage, and memory has a corresponding physical allocator
that's responsible for handing out tickets. And the key thing about an allocator in
Pixie is that it's policy free. If you make a request for a ticket, it will give you the
ticket as long as the resource is available. It does not impose any kind of policy.
There's no prioritization. There's no preferences. There's no scheduling. It's just
give me what's available right now. And I'll talk in a moment about how we
impose policy on top of that.
Yes?
>>: The contract for the scheduler is [inaudible] a yes-or-no thing: a request for
immediate [inaudible], yes you get it or no. I was wondering if you want to make
it -- is that the right assumption, or can you not [inaudible] "this is as much
as is available"?
>>Matt Welsh: That's exactly what happens. So you can -- you can optionally --
you can say, "Give me a ticket." It can optionally say, "Look, I can't give that to
you, but here is how much I do have." Because that's usually what you want to
find out. Otherwise the application is forced to do a search and keep asking
for smaller and smaller and smaller amounts until it gets what's available, which
is not what you want to do. So what we do is, when you don't get a ticket, we
tell you we are not giving it to you, but here's how much we do have.
>>: There's also a notion of the order in which you redeem a ticket,
because that's your question, too. Now you can come in and essentially
stop -- even make -- single out the applications [inaudible], so one may actually
starve the other completely.
>>Matt Welsh: That's exactly right. There's nothing that I've told you so far that
would prevent a single component from being greedy and starving others, okay.
So, yes?
>>: Terminology: when you say available, do you mean not already
reserved?
>>Matt Welsh: That's correct. Not already reserved. So the point is, when I ask
for a ticket, it gives me a ticket. I'm not going to promise -- well, that's up to
the OS. The OS could oversubscribe itself, so it could say, "Look,
probabilistically speaking, not all the tickets get used, so I can promise more
energy than is usually available." But that would be kind of cutting it close. We don't
usually find that that's a useful thing to do.
Okay. So, now, this is addressing some of the questions that have come up so
far. So tickets are actually kind of hard to use, and it's kind of low level -- it's a
low-level mechanism, and I find it very useful, but people tend not to want to deal
with them. So we ventured into this concept of a resource broker, which is basically
a software module that mediates the ticket allocations on behalf of the
application. So there's a word called middleware that some of you may have
heard, and I don't like that word, so I tend not to use it, but if you've heard that word,
you may consider this to be like middleware, okay.
Basically, a broker is a high-level -- it's a software module, and it has its own
separate API, and you can specify, you know, what you want. And it will go and
get the tickets for you, and potentially perform operations on your behalf. So a
good example of this is, because it's just a stage, it can interpose on the data flow
path, so the broker can be responsible, for example, for redirecting the data that's
flowing through it down several downstream paths based on the available
resource, all right. So then the application modules don't need to care about
tickets at all, because the broker is doing that for them, yeah. So this is just like
back before you bought your airline tickets directly from the airline: you
had to go to a travel agent, and you'd tell the travel agent, "I want to go to
Honolulu," and they'd find you the best deal and whatever, and they'd deal with all
the issuing of the tickets and then sort of hand you a dossier and an itinerary, and off
you would go, right.
Did you have a question? Did you have a question?
>>: I'm confused. What exactly are tickets kind of buying you, in that sense?
Like, I mean, what -- I mean, so you want to make application resource over here
just a simple [inaudible] that would let applications poll how much resource is
available right now. Wouldn't that be enough versus --
>>Matt Welsh: I'm arguing that's not quite going far enough, in the sense that I
want to be able to give hints to an application about the future ability to use a
resource up to some time limit. Because if I say, "Well, what's available right
now," and then I say okay, well, right now I'm able to transmit ten packets, then
I've got to go do a whole bunch of work to be able to plan on being able to
transmit those ten packets, but I don't know another software module jumped in
there and tried to transmit before I got around to it. So I really do think we need
the concept of a reservation in there, so an application can commit some
resources to preparing some work that it's going to do that's going to later
consume that resource down the line. Does that kind of make sense?
>>: In an abstract way, yes. I'm thinking, like, are there applications like that, that
know they want to do a certain amount of work ahead of time?
>>Matt Welsh: A very good example is, I don't want to go spend a whole bunch
of joules computing a bunch of stuff if I'm not going to be able to transmit the
result at the end. So I'd like to know in advance that I'm going to be able to
transmit the result. So I want to get the bandwidth ticket, and then I want to
spend some time computing the thing that I'm going to transmit, and then I'll
transmit later on. That's what I mean by decoupling the request from the usage.
It's very important to be able to do that.
>>: Typically, are the applications written like that? Is that class large enough, in the
sense --
>>Matt Welsh: Not all, but I want to make it possible. I guess what I'm getting at
is there are going to be a lot of cases where we don't need all this complexity. But I'm
trying to come up with what is the fundamental set of primitives that we need to
manage resources.
Yeah?
>>: Do multiple applications typically run on these sensor nets?
>>Matt Welsh: No. Okay. This is a very good point. So we are assuming
throughout this whole thing that there's a single application, or at least, if there are
multiple applications, that they are cooperative somehow. Typically, I've never
seen a viable sensor network that has multiple applications running on it. I
mean, no one's talked about that, and --
>>: [inaudible] volcano in Ecuador, that's covered with it.
>>Matt Welsh: The volcano that's covered in zebras, right. Nothing.
Okay. So let me keep moving. Let me show you an example of brokers at work.
So here is an example where these four stages would like to transmit some
amount of data, two packets a second, four packets a second, and so forth. But
the radio link may not be able to support all of the bandwidth based on the
current link conditions. So each of these stages could go and request its own
tickets, and then they would have to talk to each other and decide who gets to
transmit first and second and so forth in order to use the radio link effectively.
Rather than do that, we can introduce a bandwidth broker that would receive
information from the stages on the nominal transmission rate that they'd like, and
then it can request the tickets on behalf of those stages and get them back. And in this
case, it asks for 20 packets a second, it only gets 12 packets a second back as
a ticket, and it realizes that it can't satisfy all these needs. So what it can do is
split the ticket and hand these stages a subset of the ticket, okay, and the math
happens to work out in this case to add up to 12 packets a second. And what
that has the effect of doing is disabling this stage, because that stage is not
getting the resource to transmit anything.
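The broker behavior just described can be sketched as follows. This is a hedged illustration: the talk only says a priority scheme exists, so the strict descending-priority fill used here, along with the stage names and demand numbers, are my assumptions.

```python
# Sketch of a bandwidth broker: stages declare nominal rates, the broker
# requests one big ticket, and when it gets less than the total it peels
# off sub-tickets in priority order, leaving the lowest priority with zero.
# Stage names, rates, and the priority policy are illustrative assumptions.

def broker_allocate(granted_pps, demands):
    """demands: list of (stage, rate_pps) in descending priority order."""
    shares = {}
    left = granted_pps
    for stage, rate in demands:
        give = min(rate, left)   # peel a sub-ticket off the big ticket
        shares[stage] = give
        left -= give
    return shares

demands = [("features", 6), ("summary", 4), ("raw", 2), ("debug", 8)]
shares = broker_allocate(12, demands)   # asked for 20 pps, got only 12
assert shares == {"features": 6, "summary": 4, "raw": 2, "debug": 0}
# "debug" is effectively disabled: it got no bandwidth at all.
```

When the full 20 packets a second are available, every stage gets its nominal rate; the broker only starves stages when the link ticket falls short.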
>>: I'm confused. Why would you request 20 rather than 20 plus 6 plus 4?
>>Matt Welsh: The reason is simply that -- I didn't go through the details, but basically
there's a priority scheme here, and if I'm able to send this amount of data, then I
don't need to send this stuff.
>>: [inaudible].
>>Matt Welsh: So it's basically this dominates all those guys.
>>: So that's the API that --
>>Matt Welsh: I'm grossly simplifying it for the purpose of the presentation, but
yes.
>>: So the bandwidth broker is not an abstract thing; it actually
has some knowledge of the set of things.
>>Matt Welsh: That's correct. And each one of these actually has a priority
associated with it.
>>: Then, then --
>>Matt Welsh: Okay. But it's PowerPoint, not code, so I just wanted to kind of
make it -- make it simple.
Okay. So the other thing we can do is use this to schedule energy. So I can say,
for example, let's say I've got my battery and there's a lifetime target, and I can
define a nominal rate at which I'd like to allocate energy tickets to the application
to meet the lifetime target.
So then I can have an energy broker that goes and gets energy tickets and
hands them to the application to ensure that we meet this rate, right? And so a
very simple approach is a conservative strategy that says make sure that the
amount of energy that's left in the battery at each moment in time never falls
below this nominal schedule. That will guarantee I meet the lifetime target, but
it's conservative.
A different approach is to allow the energy to temporarily fall below the schedule,
and this is what I call a credit based scheme. And this allows me to incur some
energy debt and when I do go into debt, then I basically have to pull back on the
amount of energy I give out to the application so that it will recoup that debt and
go back above the schedule. And then when I'm above the schedule, I can
increase the rate at which energy is allocated.
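The credit-based scheme just described can be sketched as follows. This is a minimal illustration, not Pixie's actual broker interface; the function name and the proportional pay-back rule are assumptions.

```python
def allocate_rate(nominal_rate, battery_level, schedule_level):
    """Energy allocation rate for the next interval under a credit scheme.

    nominal_rate   -- rate that meets the lifetime target exactly
    battery_level  -- energy actually remaining in the battery
    schedule_level -- energy that should remain under the nominal schedule
    """
    debt = schedule_level - battery_level  # positive => below the schedule
    if debt > 0:
        # In debt: throttle the allocation so the battery recoups the debt
        # and climbs back above the schedule.
        return max(0.0, nominal_rate - debt)
    # At or above schedule: the surplus can be spent as burst credit.
    return nominal_rate - debt
```

Here the pay-back is simply proportional to the debt; any controller that keeps the battery level near the schedule would do.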
Yes?
>>: Getting back to your battery model, that red line, does that matter?
>>Matt Welsh: I'm -- Yeah, it's true for the purpose of the presentation, I'm kind
of simplifying the battery model. It turns out we use a very simple battery model
and it still works very well. So it's true that real batteries, especially
alkaline batteries, are very sensitive to voltage fluctuations and all kinds of
things. We tend to use lithium polymer batteries that have a somewhat
simpler model to them. But, you know, battery modeling is a very complex topic.
I kind of didn't want to get into it, but you're right. Right now we are kind of using
a very simplistic battery model.
>>: [inaudible].
>>Matt Welsh: It works very well. And so some of it is like, well, it may end up
hurting us like 10 to 20 percent in the end, but for the most part, being able to
reason about the energy consumption over time gets you most of the way there,
that's a good point.
>>: But does your model allow, if you tomorrow came up with a better --
>>Matt Welsh: We can plug in a better battery model and this will work. It's just
the shape of this curve would not be so simple, okay.
So let me show you a quick evaluation of this thing. This is an application based
on using a network of sensors with microphones to detect acoustic events in the
environment. And the one that we are looking at is the alarm call of this type of
marmot. And the reason is that Lew Girod has this wonderful rich data set that
he's collected out in forests, and so we just wanted to use his data set. It could
be anything. It could be detecting gunshots or whatever, but we show marmot
calls because he had the data.
Here is the application. Basically we are sampling the acoustic signal at 24
kilohertz. And here is an example of a marmot call in the -- in the time domain.
And we drop the signal if there's nothing happening, if it's quiet. So, right. And
then there's an energy switch. And basically what it does is it looks at the
amount of energy left in the battery, and if there's plentiful energy, it will pass the
signal to a very good detector that uses an FFT and ramps up the CPU to
400 megahertz and does the full processing. And it's very energy expensive, but
it's a very good detector. Otherwise, you can use less good detectors, like the
simplest one is a threshold detector that says as long as the acoustic signal is
over some threshold, then assume it's a marmot call. Obviously, there's a lot of
false positives, but it's very cheap.
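The energy switch can be sketched roughly like this. The detectors, thresholds, and frequency band are all illustrative stand-ins, not the real classifier:

```python
import cmath

def threshold_detector(samples, threshold=0.5):
    # Cheap detector: fire whenever any sample exceeds a fixed amplitude.
    return max(abs(s) for s in samples) > threshold

def fft_detector(samples):
    # Expensive detector: naive DFT, then ask whether one frequency band
    # dominates the average spectral energy (the band is illustrative).
    n = len(samples)
    spectrum = [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n)))
                for k in range(n // 2)]
    band = spectrum[n // 8:n // 4]
    return max(band) > 3 * sum(spectrum) / len(spectrum)

def detect(samples, energy_fraction):
    # The energy switch: plentiful energy -> good detector, else cheap one.
    if energy_fraction > 0.5:
        return fft_detector(samples)
    return threshold_detector(samples)
```

A real node would use a fast FFT rather than this O(n²) DFT; the point is only the switch between detectors of different cost and fidelity.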
So if I look at the marmot call, the frequency domain, you can see the marmot
call right here. This is the same signal. This is the time domain, it's a frequency
domain. The threshold detector would trigger on anything that went over some
threshold. And this is -- the FFT detector is going to be able to detect the
spectral contents of the signal.
Yes?
>>: It seems like a scientist wouldn't really appreciate having data that's
generated this way because sometimes it was detected using one --
>>Matt Welsh: That's right, that's right.
>>: And they don't know that --
>>Matt Welsh: Well, we can tell you, when it was detected, which detector was in
use at the time, and so they can go back and sort of understand whether that
was good or bad.
>>: But they don't know if it wasn't detected what detectors were used.
>>Matt Welsh: That's correct. Well, we could easily tell them that information as
well. That's not a big deal.
>>: [inaudible] if it's more than one frequency, wouldn't the resonances cost only
marginally more than high cost filters?
>>Matt Welsh: Yeah, I think -- yeah, I'm trying to remember. We played with
different filter designs, and I don't remember exactly what we settled -- why we
settled on this set, but I think this is more of an illustration of the idea that I've got
different levels of processing of different fidelity.
I think your point is very well taken which is that yes, does a scientist want a
scientific instrument that's got variable fidelity over time based on energy
availability. That's a good question. We could have a healthy debate about it.
But I just think --
>>: [inaudible] want to be a scientist.
>>Matt Welsh: Yes. In general, yeah. I mean, in general they are quite happy
with something that's able to tune itself as long as they can go back later and
say, "Why did it behave the way it did?"
Okay. So, here's a -- here's a -- there's a lot of stuff on the graph, so I'm going to
walk you through it step by step.
Basically, this is the energy consumption of the node and it's normalized to a
target schedule of 40 days. So if we were to hug this black line, we would
exactly meet a 40-day battery lifetime target.
And down here, each of the dots represents the detection of a marmot call,
including false positives. And the black dots are the ground truth. Those are
the true marmot calls, okay.
So the simplest scheme is ignore energy altogether and always run the FFT
detector. We call that an optimistic policy. As you can see, it doesn't meet the
energy schedule. It uses too much energy. But, it's 100 percent accurate, okay.
A conservative strategy always meets the schedule, it always stays below the
black line, but it's forced to use these cheaper detectors most of the time, and so
it has a lot of false positives.
If I use the credit-based scheme, this is kind of interesting, what this allows me to
do is occasionally the line goes above the black line, meaning that we are
incurring energy debt up to a certain point. And that allows it to use the good
FFT detector. So it's much more accurate at some periods, but at other times it
has to repay the energy debt, so -- and forces it to use the threshold detector.
Does this kind of make sense? So the credit-based scheme, it allows me to use
energy in bursts that might temporarily violate the schedule, just like having a
credit card means that you can go buy something that you might not have
enough money for in your bank account right now. And that's a good thing.
Yeah?
All right. So, all right. So how much -- I mean, how much time do we have,
because it's almost 11:15. And I know that -- do you want to go for another
10 minutes? I mean, is that okay? I can do the second part sort of very quickly if
that's okay with you guys. I'll give you the -- I'll give you the -- because I don't -- I
think I'm out of time to do the whole thing. So I'll do it kind of quickly, if that's all
right, and then you can stop me if you have questions.
So I talked about managing the node level resource. Let's talk about the network
as a whole, okay. So I'm going to focus on reliable signal collections. So the
problem is that the rate at which the sensors can generate data outstrips the
capacity of the network to download that data, both because of bandwidth
constraints and energy constraints. So typically the rate that we can
sample at outstrips the multi-hop bandwidth of these radio links.
The other observation is that data differs in value. Not all data is created equally.
This is a trace of a seismic signal from one of the volcano monitoring nodes. And
a seismologist would look at this and say, "Well, this is the interesting stuff.
That's the volcano doing something that I care about, and everything else is
really just noise. So I don't necessarily need to download all the data from all
the nodes all the time. I care about the interesting signals first," right?
The third observation is that data differs substantially in terms of the energy cost
to download it from the network. So if I'm downloading data from a node that's
one hop away, I might be able to get that signal for 522 millijoules, and if it's
from multiple hops away, it's going to cost a lot more, and it's going to be a lower
bandwidth. And the reason is that it's passing through multiple intermediate
nodes, and we are interfering with nearby nodes that are overhearing those radio
transmissions, okay.
>>: [inaudible] because you obviously don't want to bias it to prefer data from one,
right? You have to --
>>Matt Welsh: Well, I'm going to come to that. So this is the optimization
problem. So what we've done is we've created a system called Lance, which is
named after this guy, and which is basically a priority-driven data collection system for
sensor nets. And the goal is to optimize the quality of the data subject to both
the energy and the bandwidth constraints. And one thing I'm not going to talk
about today, but Lance basically allows us to have different policies for driving
the data download process to let us target many different optimization metrics,
priority maximization, fairness, spatial or temporal data distribution, and so forth.
So we can -- basically, domain scientists can come in and parameterize how this
thing works for their needs, okay. But I'm not going to have time to talk about
that. So I'm just stating this, but you have to believe me, okay.
So basically here is the design. So I've got my sensor nodes. They are sampling
data into what I call an ADU, an application data unit. And that's a chunk of data
that is sampled by a node. And in our case an ADU might be 60 seconds of data
sampled at a hundred hertz, and that would be about 18 kilobytes, okay.
They compute a summary of each ADU that gives a concise representation of
what's in the ADU and whether it's interesting or not, and they send those
summaries to the base station. So the base station is getting these periodic
summaries of what the sensor nodes are sampling. And the nodes log the raw
data to flash.
Then the base station takes the summaries in, scores each ADU according to a
scoring function, which I'll describe in a minute, and then downloads the
top-scoring ADU from the network, okay. So it all comes down to the scoring
function, how we decide which ADU to download next. Okay.
So the first thing we need to do to make this meaningful is we need to have some
definition of the value of data. And coming back to my original trace here, one
way to define the value of the data very easily, at least in volcanology, is
something called the real-time seismic amplitude measurement, or RSAM,
which is basically just taking the average of the signal amplitude over some time
window. So if I look at the signal here, and I compute the RSAM, the RSAM is
just the envelope of the signal here. And this is very, very low-rate, because I
actually only compute this once for every 60-second time window. This is multiple
hours of data, that's why it looks like a lot here, okay.
Now, the definition of the data value is going to be very application dependent.
So that's where we let the domain scientist plug in a function, okay.
The assumption that we make is that computing this can be done efficiently on
the sensor nodes themselves. So it may be crude and may not be exactly what
you want, but it's very efficient to compute it on the nodes.
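Under that assumption, RSAM itself is about as cheap as on-node processing gets. A sketch, with the window size taken from the talk's numbers (60 seconds at 100 hertz):

```python
def rsam(samples, window=6000):  # 60 s * 100 Hz = 6000 samples per window
    # One average-absolute-amplitude value per window: the envelope shown
    # in the talk, and a natural per-ADU value summary to send upstream.
    return [sum(abs(s) for s in samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]
```

Each returned value is one point on the envelope, so a node sends a single number per 60-second ADU rather than 18 kilobytes of raw signal.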
Yes?
>>: You're [inaudible].
>>Matt Welsh: I'm coming to that in a moment. I have a whole slide on that, okay.
So the optimization goal is as follows. If I define the universe of all the possible
ADU's that are sampled by the network, and then I have a vector of the energy
capacity of each node, then -- and I define both the cost and the value for each
ADU, and the cost is a vector of the energy consumption across a set of nodes
that are responsible for downloading the data, I'll show that in a moment. Then
the optimal set is the set of ADUs that maximize the sum of their value subject to
the sum of the cost being less than the energy capacity of the nodes, right? So
this is just a multidimensional Knapsack problem. The dimensions of the
Knapsack are the energy capacities of the nodes. And the values of the objects
that you're sticking into the knapsack is the value of the ADU. And the cost of
each object is the vector of the energy consumption to download that ADU, okay.
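Written out, the goal just stated is the multidimensional knapsack (notation mine, matching the quantities in the talk):

```latex
\max_{S \subseteq U} \; \sum_{a \in S} v(a)
\qquad \text{subject to} \qquad
\sum_{a \in S} c_i(a) \le E_i \quad \text{for every node } i,
```

where $U$ is the universe of sampled ADUs, $v(a)$ is the value of ADU $a$, $c_i(a)$ is the energy that downloading $a$ costs node $i$, and $E_i$ is node $i$'s energy capacity.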
So we know how to do this in an offline way, and so I can compute the optimal
set, but I can only do that sort of knowing all the future ADU's that are sampled
by the network. So what we need is an online greedy approximation to this. So
here is what we've done basically. As we learn about an ADU sampled by the
network, we assign a score to it. And I'll talk about the scoring functions next.
And then I exclude the data stored on nodes that currently do not have enough
energy. So this is a local greedy thing. It says just like I showed before, where I
have the slope, and if a node has fallen below that slope, I just consider it to be
offline temporarily. So I don't try to download data from nodes that are being
impacted from an energy point of view. And then I download the remaining ADU
with the highest score. This is a very, very simple heuristic. I'm going to show
you that it works incredibly well, though, so that's the advantage.
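One round of that greedy heuristic can be sketched as below; the ADU record layout and parameter names are assumptions for illustration, and the scoring function is passed in (the candidates are discussed next):

```python
def next_download(adus, node_energy, schedule_level, score):
    """Pick the next ADU to download, or None if every node is throttled.

    adus           -- pending ADUs, e.g. {'node': id, 'value': v, ...}
    node_energy    -- remaining energy per node id
    schedule_level -- energy each node should still have under its schedule
    score          -- scoring function: adu -> float
    """
    # Exclude ADUs stored on nodes that have fallen below their schedule;
    # those nodes are treated as temporarily offline.
    candidates = [a for a in adus
                  if node_energy[a['node']] >= schedule_level[a['node']]]
    # Download the remaining ADU with the highest score.
    return max(candidates, key=score) if candidates else None
```

The base station would call this repeatedly as new summaries arrive and energy estimates are updated.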
Okay. So here is the energy. You were asking about the energy cost for
downloading. So let's say that I've got this -- an ADU stored here and I want to
download it. Well, there's three costs involved. First, the energy drain on the
node that I'm downloading from. And we've measured this empirically, so it's 58
millijoules per second. The energy drain of the nodes that are forwarding the
packet, and that's about 55 millijoules per second. And then the other thing we
have to consider is that as these nodes are transmitting packets, other nodes
nearby are going to overhear those packets, and that's going to consume slightly
more energy on them, yeah? And so that's what we call the overhearing cost,
and that turns out to be about 6 millijoules per second, okay. So we can model
this pretty well.
And then so here is the scoring. This is the key is the scoring function. So if I
have a set of ADU's and let's say that I've got an ADU with value ten, and a cost
vector here which is downloading from this node plus the overhearing costs plus
the -- you know, this is the cost of each of the sensor nodes. So I can define
three ADU's and each one has a corresponding cost vector, okay. I've got three
possible scoring functions that we've looked at.
The first is just to consider the value. Ignore the cost and just say that the
highest-valued ADU is the highest-scoring thing, okay. Yeah?
The second one weights the value of the ADU by the total cost. So I would
sum up the elements of the cost vector and divide the value by that sum, and
that would give me the score, yeah?
Turns out that that doesn't work so well because it prefers only ADUs from nodes that
are close to the base station, because the further you are from the base station,
the more cost accumulates to download something.
So the third scoring function is what we call cost bottleneck. And here is the idea
is I weight the value of the ADU by the energy cost on the node that is the most
energy constrained. So I look at the amount of energy available across the set of
nodes that are involved, and I divide, so let's say that this is the node that's the
most energy constrained, for example, I would divide the value by the cost to that
node. Does that kind of make sense? Okay. So the intuition is I'm optimizing by
looking at who I hurt the most as I'm downloading something, and I scale the
value of the ADU by that amount, okay.
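The three scoring functions can be sketched side by side. The value, cost-vector, and remaining-energy representations here are assumptions for illustration:

```python
def score_value_only(value, costs):
    # Ignore cost entirely: just the value.
    return value

def score_total_cost(value, costs):
    # Weight by total cost; in practice biased toward nodes near the sink,
    # since far-away ADUs accumulate more cost. costs: node id -> energy.
    return value / sum(costs.values())

def score_cost_bottleneck(value, costs, energy_left):
    # Weight by the cost charged to the most energy-constrained node among
    # those with a nonzero entry in the cost vector (zero-cost nodes are
    # unaffected by the download and can never be the bottleneck).
    involved = [n for n, c in costs.items() if c > 0]
    bottleneck = min(involved, key=lambda n: energy_left[n])
    return value / costs[bottleneck]
```

Only the cost vector and the nodes' remaining energy are needed, both of which the base station already tracks.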
So let me show you how this works. Go ahead.
>>: Are you ignoring zeroes or not?
>>Matt Welsh: What do you mean by that?
>>: So if there's another node that has a zero --
>>Matt Welsh: Yes.
>>: Is that -- do you sometimes divide by that zero, or?
>>Matt Welsh: Well, we have to -- let me see if I can explain this. So we are not
going to divide by zero, first of all.
>>: [inaudible].
>>Matt Welsh: Indeed, but that node would not be -- it's not affected by this
download at all by definition because it's zero, so it can't possibly be in the
bottleneck set. It's only the set of nodes with nonzero -- I should have been
careful. The set of nodes with nonzero entries in the cost vector are considered
to be potential bottlenecks. Then I take a look at all those nodes, figure out who
is the bottleneck, then I divide by the energy cost of that node.
>>: [inaudible], there might be a zero and there might be a 0.1. Is
0.1 insignificant? I mean, is there a threshold?
>>Matt Welsh: It doesn't seem to happen.
>>: You're talking about overhearing, which is kind of like --
>>Matt Welsh: Right. It's very small, it's not --
>>: But those minor costs are going to turn into really large denominators.
>>Matt Welsh: I guess it's conceivably possible that it would happen. I can't tell
you off the bat whether any of our experiments have seen that effect, where the
bottleneck node happens to be one with a very small energy cost, so therefore
the value just explodes on you. But, I mean, it works, even if that is the case. So
let me show you that it works.
Okay, so this is a simple -- this is a simulation where we've taken ten nodes in a
linear chain, and we are feeding it realistic data, okay. And then I'm looking at
the fraction of the available data from each node that's downloaded, okay. So if
this was a hundred percent, that would have meant that I downloaded all the data
sampled by that node. And if I just use the value, then what you notice is that it
downloads basically the same amount of data from all the nodes. So this ignores
the energy cost. If I only consider the cost, then it prefers nodes near the sink,
because they have lower cost and this is bad. This is a bias that we would like to
avoid.
If I use the cost bottleneck function, and then I look at the optimal, this is what's
optimal according to the Knapsack solution, you can see they are basically
equivalent, that this bottleneck function, which is an online greedy heuristic, matches
very closely an offline optimal solution to this problem. This, I think, is pretty
significant. It says that we can make greedy local decisions about what to
download from the network, and in the end, it ends up giving us basically the
optimal set.
Yes?
>>: It's still biased, though.
>>Matt Welsh: It is biased, but this is optimal, right? This is the set of data that
we should have downloaded that maximized the sum of the value subject to the
energy constraints.
>>: You have to be very careful as you're interpreting this data not to use the
number of samples in your -- [inaudible] I mean -- I mean, I guess it depends on
your value computation. I guess you made the assumption that you're going to
determine the value perfectly, though.
>>Matt Welsh: That's basically correct, yes.
>>: [inaudible].
>>Matt Welsh: We have looked at this in dozens of -- I don't know if I have the
slides here. Yeah, here is another one. So here is a run on an actual test bed of
about -- well, this is 50 nodes. We have a 200-node test bed. We took 50 of
them, and we created a spanning tree. So this is a real network running on real
nodes with real data, et cetera, et cetera, et cetera. And if I look at the optimal, that's
the green, and I look at what Lance did, that's the blue. And what you notice is
that it closely matches. It's not perfect. So sometimes we download less data
from a node than we should, and sometimes we download more data than we
should. But if I look at the total value downloaded out of the network, they match
almost perfectly. So it's not perfect, I'm not claiming it is, but it's way better than
something that just tries to consider either the cost or the value separately.
And we've looked at different data distributions. We've looked at real -- you
know, all kinds of stuff, and this holds across the -- across the spectrum.
>>: So how do you get the cost [inaudible] to the [inaudible] What are the
inferences of the cost?
>>Matt Welsh: That's right. So the base station knows its topology because it's
getting heartbeat messages from each of the nodes about who its parents are in
the routing tree, and we discover the neighbor set based on other heartbeat
information from the node. So then we use the model that gives the
energy cost for downloading or for overhearing or for routing data, which I
described earlier, and that's applied to the known topology of the network, and so
that determines the cost vectors. So we are using and making a modeling
assumption there that we know how much energy is eaten up on a node when it's
doing one of those three operations. Okay.
So to test this stuff in a field setting, we went back to the volcano, actually a
different volcano this time. We deployed a small network of about eight nodes
here. And this is Stephen Dawson-Haggerty, who is a Ph.D. student of David
Culler's now. He was an undergrad at Harvard working with me. And he's
setting up one of the nodes. This is my old laptop being used to reprogram one
of the nodes because we had a bug in the software. So I had to climb back up
the volcano and reprogram the nodes by hand. Don't do this. The MagSafe
connector on a Mac is magnetic. So is volcanic dust. So I had to clean that thing
out, because I couldn't plug the power cable back in until I cleaned out all the dust.
Yes.
So anyway. So we went out and did this -- so basically, the lesson of this ends
up being we spent $10,000 to get a paper accepted in SenSys, right, because
this was a $10,000 deployment. And it took us about a week and a half to do it
and to fly to Ecuador with all the equipment and do all the stuff and to deploy the
network for only three days and get the data and fly back home. So -- but it got
the paper in, so it was probably worth it in the end.
When we went out there the day before, the volcano was extremely active, it was
erupting every hour. There was some very exciting activity. We set up the
sensor on the volcano and it went quiet. And so the joke is that we fixed the
volcano.
So here is one of the signals we downloaded, and that's like a little, little tiny
earthquake. It's not even that interesting. And here is another one, great. And
we saw this, too. And we said, "Oh, that's exciting. That looks like real
interesting data." Well, no, it turns out that when we installed each sensor node,
to check that the seismometer was working, I asked the student to stomp the
ground five times next to the sensor so I could see the signal. And that's all that
is there. Okay. So unfortunately the volcano did not cooperate with us. But the
result is basically that Lance did the right thing anyway and ended up
downloading 99.5 percent of the optimal data according to what an offline oracle
solution would have done, right. So we came home later and analyzed all the
data from all the nodes and so forth, and we could tell you that it did the right
thing in the field. So, you know, it's unfortunate that the signals were not all that
interesting in the end, but that's sort of, nature did not cooperate with us.
>>: [inaudible] treating the distribution of a value to.
>>Matt Welsh: The distribution was good enough to see this, that Lance did the
right thing. I mean, in some sense this is pessimistic, because normally what we
would have expected was some very exciting earthquake activity punctuated by
long periods of dormancy. But in this case, we are having mostly dormancy with
the little tiny earthquakes. So it's really nice that Lance actually did the right
thing, even though there wasn't a big difference between the noise and the
earthquake signals. Does that make sense?
>>: Wouldn't it be the opposite way? Because if there was a big difference, wouldn't one
number be significantly --
>>Matt Welsh: I don't think so. I think it would have been better. I think we
would have been more like 99.9 percent had the earthquakes been much
stronger, because Lance would have had an easier time discriminating the
earthquakes from the noise, right? So in some sense this is a pessimistic
answer.
>>: [inaudible] all these applications, for example, it's not just how far from
a base station you are, it's also very [inaudible] different locations you are in,
and is that also included in your model?
>>Matt Welsh: Yeah, well that's just based on the energy model that I showed
you earlier, the energy cost to download something.
>>: [inaudible] that, you know, for volcanic activity, whether you record the
sensor data at the core of that versus essentially the edge.
>>Matt Welsh: Oh, no, no, no, no. The way -- this is a good point. No, it turns
out for volcanic activity, it basically impinges on all the nodes simultaneously. So
you're not really that concerned about which specific node you're getting sensor
data from. Does that make sense? That's a good point. It's not localized
activity, it's generalized.
Okay. So I'm out of time, but this is -- let me do two quick slides and I'll say three
sentences on each one in terms of what we are doing next.
So I talked about local node resource management. I've talked about network
wide resource management. The next step of this is to enable sensor nodes to
work together to make these decisions in the network in a decentralized fashion
so that we don't have to have global coordination, okay. And so for that we are
designing a new distributed OS called Peloton, and this is based on -- this is a
Peloton of cyclists. And the reason they ride close together like this is so that
they reduce the wind drag for each other, and so they are much more efficient
together, okay. So the idea in Peloton is that it allows sensor nodes to share
information with one another and make localized decisions about the resource
allocations.
>>: Not drag behind, but actually slightly diagonal?
>>Matt Welsh: Probably that's true, and the other problem with a peloton is if
any one of these guys falls over, the whole thing collapses. So that's the thing
we want to avoid in the system. The whole point is it was just meant to --
because we have got Lance, so now we have the Peloton, okay.
>>: [inaudible]. That's probably if it --
>>Matt Welsh: It's a slightly different effect. But anyway, I think. So speaking of
flying, speaking of flying things, so we also just got a large NSF grant to develop
basically a robotic colony of artificial bees. And this is called the RoboBees
Project. You can check this out on my website. Basically, we have a prototype
flapping-wing micro insect that looks like this. It's got biomimetic wing design.
Right now it doesn't have any onboard computation or sensors. So that's part of
the project is going to be to turn this thing into an autonomous vehicle. The other
problem is that, you see this lead here, the way it gets actuated is it needs to be
plugged into a 1.2 kilovolt power supply, so that's probably not so practical for
allowing it to be untethered. So there's a lot of interesting challenges in terms of
making this thing real. The project is broken into the body, the brain,
and the colony. And my group is focused on the colony side of it, and in
particular providing a swarm operating system to basically allow us to program a
whole colony of these things to do coordinated activities, like search and
rescue, pollination, environmental monitoring and so forth.
Anyway, so basically that's it. I talked about resource management challenges
in sensor nets. I talked about Pixie to manage resources at the node level and
Lance to manage resources at the network level. And there's a lot of stuff on my
web page. So thank you for your time.
[applause.]
>>Matt Welsh: Do you have any last questions? Yeah?
>>: Most of what you have, I guess, is looking at [inaudible] resource-aware
programming, but there's also [inaudible] that when, you know, essentially, it's not
just forwarding information, you're also trying to correlate something.
>>Matt Welsh: That's correct.
>>: So essentially you want to, maybe in the case of data arising from a node
which has become unavailable for some [inaudible], rates of the node,
capacities of energy, or situations that vary. So in terms of the Lance
application, is it just resource awareness? Because I sort of get the idea that it's
more of a single-node view of resources, but also maybe what you would call
network-wide failure-aware programming.
>>Matt Welsh: Yeah. I'm sort of putting that all under the rubric of resource
aware programming, but that's just because it's a nice term. But I think it's
exactly right that a lot of the problems that we are concerned with at the network
wide level really do concern themselves with like whether a node is online or
offline or even has enough energy to even provide data to me right now. So
Lance does let us reason about those things. I didn't talk about -- there's a
bunch of parts about Lance that let us define policies that can drive the data
collection in different ways based on domain science needs. So a good example
is if a couple of nodes detect an interesting event, I might want to download the
data from all the nodes in the network simultaneously, even if some of them are
reporting low value because I want to correlate that across the entire network.
So I can override the kind of default optimization strategy in that case. So that
would also give me the power to do some of these other things, which is to
account for node failures and so forth.
>>: Is it exposed to the developer now?
>>Matt Welsh: Yeah, all of that is through a python API, basically. And the way
I -- the way you program an application in Lance is you define this very simple
policy module chain that dictates how it should treat the data that's being
sampled by the sensor nodes and prioritize what should be downloaded next.
Those things can be stateful, they can track what the nodes are doing over time,
there's lots of things they are capable of doing, but just for time, I didn't get into
that today.
Yeah?
>>: Is it kind of risky to have them program on these platforms here?
>>Matt Welsh: No, I mean that -- it's definitely true that, you know, getting the
domain scientists to use this stuff is challenging, but they don't program the sensor
nodes at all. So it's up to us to work with them and enable what they need in the
future. But we are in the process like in the medical -- in the Parkinson's
monitoring application, I'm in the process of moving that project so that they are
the ones dealing with it and doing all the programming and we are not going to
be in the loop anymore. So the hope is that over the next, you know, year, that
they are going to be able to pick the stuff up and tweak it themselves, that's the
plan. All right. But we haven't done the user study, so to speak. That's hard.
Okay. Thanks.